Commit Graph

5115 Commits

Author SHA1 Message Date
brian m. carlson
7eda0e4fbb streaming: make stream_blob_to_fd take struct object_id
Since all of its callers have been updated, modify stream_blob_to_fd to
take a struct object_id.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-07 12:59:42 -07:00
brian m. carlson
acad70d106 builtin: convert textconv_object to use struct object_id
Since all of its callers have been updated, make textconv_object take a
struct object_id.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-07 12:59:42 -07:00
brian m. carlson
63ecb99e0d builtin/cat-file: convert some static functions to struct object_id
Convert all of the static functions that are not callbacks to use struct
object_id.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-07 12:59:42 -07:00
brian m. carlson
cd4f77beb7 builtin/cat-file: convert struct expand_data to use struct object_id
Convert struct cache_entry to use struct object_id by applying the
following semantic patch and the object_id transforms from contrib,
plus the actual change to the struct:

@@
struct expand_data E1;
@@
- E1.sha1
+ E1.oid.hash

@@
struct expand_data *E1;
@@
- E1->sha1
+ E1->oid.hash

@@
struct expand_data E1;
@@
- E1.delta_base_sha1
+ E1.delta_base_oid.hash

@@
struct expand_data *E1;
@@
- E1->delta_base_sha1
+ E1->delta_base_oid.hash

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-07 12:59:42 -07:00
brian m. carlson
d801627b0c builtin/log: convert some static functions to use struct object_id
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-07 12:59:42 -07:00
brian m. carlson
a7bcfa126b builtin/blame: convert struct origin to use struct object_id
Convert struct origin to use struct object_id by applying the
following semantic patch and the object_id transforms from contrib,
plus the actual change to the struct:

@@
struct origin E1;
@@
- E1.blob_sha1
+ E1.blob_oid.hash

@@
struct origin *E1;
@@
- E1->blob_sha1
+ E1->blob_oid.hash

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-07 12:59:42 -07:00
brian m. carlson
eb1c9c7328 builtin/apply: convert static functions to struct object_id
There were several static functions using unsigned char arrays for SHA-1
values.  Convert them to use struct object_id.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-07 12:59:42 -07:00
brian m. carlson
99d1a9861a cache: convert struct cache_entry to use struct object_id
Convert struct cache_entry to use struct object_id by applying the
following semantic patch and the object_id transforms from contrib, plus
the actual change to the struct:

@@
struct cache_entry E1;
@@
- E1.sha1
+ E1.oid.hash

@@
struct cache_entry *E1;
@@
- E1->sha1
+ E1->oid.hash

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-07 12:59:42 -07:00
Christian Couder
edfac5ebff builtin/am: use apply API in run_apply()
This replaces run_apply() implementation with a new one that
uses the apply API that has been previously prepared in
apply.c and apply.h.

This shoud improve performance a lot in certain cases.

As the previous implementation was creating a new `git apply`
process to apply each patch, it could be slow on systems like
Windows where it is costly to create new processes.

Also the new `git apply` process had to read the index from
disk, and when the process was done the calling process
discarded its own index and read back from disk the new
index that had been created by the `git apply` process.

This could be very inefficient with big repositories that
have big index files, especially when the system decided
that it was a good idea to run the `git apply` processes on
a different processor core.

Also eliminating index reads enables further performance
improvements by using:

`git update-index --split-index`

For example here is a benchmark of a multi hundred commit
rebase on the Linux kernel on a Debian laptop with SSD:

command: git rebase --onto 1993b17 52bef0c 29dde7c

Vanilla "next" without split index:                1m54.953s
Vanilla "next" with split index:                   1m22.476s
This series on top of "next" without split index:  1m12.034s
This series on top of "next" with split index:     0m15.678s

(using branch "next" from mid April 2016.)

Benchmarked-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-07 12:29:54 -07:00
Christian Couder
7e1bad24e3 apply: refactor git apply option parsing
Parsing `git apply` options can be useful to other commands that
want to call the libified apply functionality, because this way
they can easily pass some options from their own command line to
the libified apply functionality.

This will be used by `git am` in a following patch.

To make this possible, let's refactor the `git apply` option
parsing code into a new libified apply_parse_options() function.

Doing that makes it possible to remove some functions definitions
from "apply.h" and make them static in "apply.c".

Helped-by: Ramsay Jones <ramsay@ramsayjones.plus.com>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-07 12:29:53 -07:00
Christian Couder
a46160d27e apply: make it possible to silently apply
This changes 'int apply_verbosely' into 'enum apply_verbosity', and
changes the possible values of the variable from a bool to
a tristate.

The previous 'false' state is changed into 'verbosity_normal'.
The previous 'true' state is changed into 'verbosity_verbose'.

The new added state is 'verbosity_silent'. It should prevent
anything to be printed on both stderr and stdout.

This is needed because `git am` wants to first call apply
functionality silently, if it can then fall back on 3-way merge
in case of error.

Printing on stdout, and calls to warning() or error() are not
taken care of in this patch, as that will be done in following
patches.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-07 12:29:53 -07:00
Christian Couder
13b5af22f3 apply: move libified code from builtin/apply.c to apply.{c,h}
As most of the apply code in builtin/apply.c has been libified by a number of
previous commits, it can now be moved to apply.{c,h}, so that more code can
use it.

Helped-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Helped-by: Ramsay Jones <ramsay@ramsayjones.plus.com>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-07 12:29:53 -07:00
Christian Couder
803bf4e012 apply: rename and move opt constants to apply.h
The constants for the "inaccurate-eof" and the "recount" options will
be used in both "apply.c" and "builtin/apply.c", so they need to go
into "apply.h", and therefore they need a name that is more specific
to the API they belong to.

Helped-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-07 12:29:53 -07:00
Christian Couder
da8e30dcd9 builtin/apply: rename option parsing functions
As these functions are going to be part of the libified
apply API, let's give them a name that is more specific
to the apply API.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-07 12:29:53 -07:00
Christian Couder
603752a88d builtin/apply: make create_one_file() return -1 on error
To libify `git apply` functionality we have to signal errors to the
caller instead of exit()ing.

To do that in a compatible manner with the rest of the error handling
in "builtin/apply.c", create_one_file() should return -1 instead of
calling exit().

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-07 12:29:53 -07:00
Christian Couder
739d8a16b5 builtin/apply: make try_create_file() return -1 on error
To libify `git apply` functionality we have to signal errors to the
caller instead of die()ing.

To do that in a compatible manner with the rest of the error handling
in "builtin/apply.c", try_create_file() should return -1 in case of
error.

Unfortunately try_create_file() currently returns -1 to signal a
recoverable error. To fix that, let's make it return 1 in case of
a recoverable error and -1 in case of an unrecoverable error.

Helped-by: Eric Sunshine <sunshine@sunshineco.com>
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-07 12:29:53 -07:00
Jeff King
b773ddea2c pack-objects: walk tag chains for --include-tag
When pack-objects is given --include-tag, it peels each tag
ref down to a non-tag object, and if that non-tag object is
going to be packed, we include the tag, too. But what
happens if we have a chain of tags (e.g., tag "A" points to
tag "B", which points to commit "C")?

We'll peel down to "C" and realize that we want to include
tag "A", but we do not ever consider tag "B", leading to a
broken pack (assuming "B" was not otherwise selected).
Instead, we have to walk the whole chain, adding any tags we
find to the pack.

Interestingly, it doesn't seem possible to trigger this
problem with "git fetch", but you can with "git clone
--single-branch". The reason is that we generate the correct
pack when the client explicitly asks for "A" (because we do
a real reachability analysis there), and "fetch" is more
willing to do so. There are basically two cases:

  1. If "C" is already a ref tip, then the client can deduce
     that it needs "A" itself (via find_non_local_tags), and
     will ask for it explicitly rather than relying on the
     include-tag capability. Everything works.

  2. If "C" is not already a ref tip, then we hope for
     include-tag to send us the correct tag. But it doesn't;
     it generates a broken pack. However, the next step is
     to do a follow-up run of find_non_local_tags(),
     followed by fetch_refs() to backfill any tags we
     learned about.

     In the normal case, fetch_refs() calls quickfetch(),
     which does a connectivity check and sees we have no
     new objects to fetch. We just write the refs.

     But for the broken-pack case, the connectivity check
     fails, and quickfetch will follow-up with the remote,
     asking explicitly for each of the ref tips. This picks
     up the missing object in a new pack.

For a regular "git clone", we are similarly OK, because we
explicitly request all of the tag refs, and get a correct
pack. But with "--single-branch", we kick in tag
auto-following via "include-tag", but do _not_ do a
follow-up backfill. We just take whatever the server sent us
via include-tag and write out tag refs for any tag objects
we were sent. So prior to c6807a4 (clone: open a shortcut
for connectivity check, 2013-05-26), we actually claimed the
clone was a success, but the result was silently
corrupted!  Since c6807a4, index-pack's connectivity
check catches this case, and we correctly complain.

The included test directly checks that pack-objects does not
generate a broken pack, but also confirms that "clone
--single-branch" does not hit the bug.

Note that tag chains introduce another interesting question:
if we are packing the tag "B" but not the commit "C", should
"A" be included?

Both before and after this patch, we do not include "A",
because the initial peel_ref() check only knows about the
bottom-most level, "C". To realize that "B" is involved at
all, we would have to switch to an incremental peel, in
which we examine each tagged object, asking if it is being
packed (and including the outer tag if so).

But that runs contrary to the optimizations in peel_ref(),
which avoid accessing the objects at all, in favor of using
the value we pull from packed-refs. It's OK to walk the
whole chain once we know we're going to include the tag (we
have to access it anyway, so the effort is proportional to
the pack we're generating). But for the initial selection,
we have to look at every ref. If we're only packing a few
objects, we'd still have to parse every single referenced
tag object just to confirm that it isn't part of a tag
chain.

This could be addressed if packed-refs stored the complete
tag chain for each peeled ref (in most cases, this would be
the same cost as now, as each "chain" is only a single
link). But given the size of that project, it's out of scope
for this fix (and probably nobody cares enough anyway, as
it's such an obscure situation). This commit limits itself
to just avoiding the creation of a broken pack.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-07 11:45:31 -07:00
Junio C Hamano
12cfa792b8 symbolic-ref -d: do not allow removal of HEAD
If you delete the symbolic-ref HEAD from a repository, Git no longer
considers the repository valid, and even "git symbolic-ref HEAD
refs/heads/master" would not be able to recover from that state
(although "git init" can, but that is a sure sign that you are
talking about a "broken" repository).

In the spirit similar to afe5d3d5 ("symbolic ref: refuse non-ref
targets in HEAD", 2009-01-29), forbid removal of HEAD to avoid
corrupting a repository.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-02 09:01:38 -07:00
Jacob Keller
660e113ce1 graph: add support for --line-prefix on all graph-aware output
Add an extension to git-diff and git-log (and any other graph-aware
displayable output) such that "--line-prefix=<string>" will print the
additional line-prefix on every line of output.

To make this work, we have to fix a few bugs in the graph API that force
graph_show_commit_msg to be used only when you have a valid graph.
Additionally, we extend the default_diff_output_prefix handler to work
even when no graph is enabled.

This is somewhat of a hack on top of the graph API, but I think it
should be acceptable here.

This will be used by a future extension of submodule display which
displays the submodule diff as the actual diff between the pre and post
commit in the submodule project.

Add some tests for both git-log and git-diff to ensure that the prefix
is honored correctly.

Signed-off-by: Jacob Keller <jacob.keller@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-08-31 18:07:09 -07:00
Junio C Hamano
4762bf36d9 Merge branch 'mh/blame-worktree'
* mh/blame-worktree:
  blame: fix segfault on untracked files
2016-08-31 10:03:50 -07:00
Ralf Thielow
af74128f4a help: introduce option --exclude-guides
Introduce option --exclude-guides to the help command.  With this option
being passed, "git help" will open man pages only for actual commands.

Since we know it is a command, we can use function help_unknown_command
to give the user advice on typos.

Helped-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Ralf Thielow <ralf.thielow@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-08-30 16:09:41 -07:00
Junio C Hamano
a77598ef44 am: refactor read_author_script()
By splitting the part that reads from a file and the part that
parses the variable definitions from the contents, make the latter
can be more reusable in the future.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-08-30 12:36:42 -07:00
Thomas Gummerer
bc6b13a7d2 blame: fix segfault on untracked files
Since 3b75ee9 ("blame: allow to blame paths freshly added to the index",
2016-07-16) git blame also looks at the index to determine if there is a
file that was freshly added to the index.

cache_name_pos returns -pos - 1 in case there is no match is found, or
if the name matches, but the entry has a stage other than 0.  As git
blame should work for unmerged files, it uses strcmp to determine
whether the name of the returned position matches, in which case the
file exists, but is merely unmerged, or if the file actually doesn't
exist in the index.

If the repository is empty, or if the file would lexicographically be
sorted as the last file in the repository, -cache_name_pos - 1 is
outside of the length of the active_cache array, causing git blame to
segfault.  Guard against that, and die() normally to restore the old
behaviour.

Reported-by: Simon Ruderich <simon@ruderich.org>
Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-08-29 11:57:33 -07:00
Junio C Hamano
3b1c6a9b6e Merge branch 'js/no-html-bypass-on-windows' into rt/help-unknown
* js/no-html-bypass-on-windows:
  Revert "display HTML in default browser using Windows' shell API"
2016-08-26 11:29:07 -07:00
Junio C Hamano
13e11ff707 Merge branch 'js/no-html-bypass-on-windows'
On Windows, help.browser configuration variable used to be ignored,
which has been corrected.

* js/no-html-bypass-on-windows:
  Revert "display HTML in default browser using Windows' shell API"
2016-08-25 13:55:06 -07:00
Jeff King
c08db5a2d0 receive-pack: allow a maximum input size to be specified
Receive-pack feeds its input to either index-pack or
unpack-objects, which will happily accept as many bytes as
a sender is willing to provide. Let's allow an arbitrary
cutoff point where we will stop writing bytes to disk.

Cleaning up what has already been written to disk is a
related problem that is not addressed by this patch.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-08-24 12:31:05 -07:00
Christian Couder
5ad2186733 unpack-objects: add --max-input-size=<size> option
When receiving a pack-file, it can be useful to abort the
`git unpack-objects`, if the pack-file is too big.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-08-24 12:31:05 -07:00
Jeff King
411481be6f index-pack: add --max-input-size=<size> option
When receiving a pack-file, it can be useful to abort the
`git index-pack`, if the pack-file is too big.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-08-24 12:31:05 -07:00
Jacob Keller
957ed3a56c format-patch: show 0/1 and 1/1 for singleton patch with cover letter
Change the default behavior of git-format-patch to generate numbered
sequence of 0/1 and 1/1 when generating both a cover-letter and a single
patch. This standardizes the cover letter to have 0/N which helps
distinguish the cover letter from the patch itself. Since the behavior
is easily changed via configuration as well as the use of -n and -N this
should be acceptable default behavior.

Add tests for the new default behavior.

Signed-off-by: Jacob Keller <jacob.keller@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-08-23 15:59:11 -07:00
Junio C Hamano
e6dab9f62f Merge branch 'sb/checkout-explit-detach-no-advice'
"git checkout --detach <branch>" used to give the same advice
message as that is issued when "git checkout <tag>" (or anything
that is not a branch name) is given, but asking with "--detach" is
an explicit enough sign that the user knows what is going on.  The
advice message has been squelched in this case.

* sb/checkout-explit-detach-no-advice:
  checkout: do not mention detach advice for explicit --detach option
2016-08-19 15:34:15 -07:00
Johannes Schindelin
6db5967d4e Revert "display HTML in default browser using Windows' shell API"
Since 4804aab (help (Windows): Display HTML in default browser using
Windows' shell API, 2008-07-13), Git for Windows used to call
`ShellExecute()` to launch the default Windows handler for `.html`
files.

The idea was to avoid going through a shell script, for performance
reasons.

However, this change ignores the `help.browser` config setting. Together
with browsing help not being a performance-critical operation, let's
just revert that patch.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-08-19 13:47:28 -07:00
Stefan Beller
31224cbdc7 clone: recursive and reference option triggers submodule alternates
When `--recursive` and `--reference` is given, it is reasonable to
expect that the submodules are created with references to the submodules
of the given alternate for the superproject.

  An initial attempt to do this was presented to the mailing list, which
  used flags that are passed around ("--super-reference") that instructed
  the submodule clone to look for a reference in the submodules of the
  referenced superproject. This is not well thought out, as any further
  `submodule update` should also respect the initial setup.

  When a new  submodule is added to the superproject and the alternate
  of the superproject does not know about that submodule yet, we rather
  error out informing the user instead of being unclear if we did or did
  not use a submodules alternate.

To solve this problem introduce new options that store the configuration
for what the user wanted originally.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-08-17 17:19:11 -07:00
Junio C Hamano
2f664566c5 Merge branch 'jk/tighten-alloc'
Small code and comment clean-up.

* jk/tighten-alloc:
  receive-pack: use FLEX_ALLOC_MEM in queue_command()
  correct FLEXPTR_* example in comment
2016-08-17 14:07:46 -07:00
Stefan Beller
f7415b4d71 clone: implement optional references
In a later patch we want to try to create alternates for submodules,
but they might not exist in the referenced superproject. So add a way
to skip the non existing references and report them.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-08-15 15:28:45 -07:00
Stefan Beller
5e40800df2 clone: clarify option_reference as required
In the next patch we introduce optional references; To better distinguish
between optional and required references we rename the variable.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-08-15 15:28:42 -07:00
Stefan Beller
9eeea7d2bc clone: factor out checking for an alternate path
In a later patch we want to determine if a path is suitable as an
alternate from other commands than builtin/clone. Move the checking
functionality of `add_one_reference` to `compute_alternate_path` that is
defined in cache.h.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-08-15 15:28:01 -07:00
Stefan Beller
779b88a91f checkout: do not mention detach advice for explicit --detach option
When a user asked for a detached HEAD specifically with `--detach`,
we do not need to give advice on what a detached HEAD state entails as
we can assume they know what they're getting into as they asked for it.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-08-15 15:01:45 -07:00
René Scharfe
ddd0bfac7c receive-pack: use FLEX_ALLOC_MEM in queue_command()
Use the macro FLEX_ALLOC_MEM instead of open-coding it.  This shortens
and simplifies the code a bit.

Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-08-13 19:49:30 -07:00
Stefan Beller
5f50f33e87 submodule--helper update-clone: allow multiple references
Allow the user to pass in multiple references to update_clone.
Currently this is only internal API, but once the shell script is
replaced by a C version, this is needed.

This fixes an API bug between the shell script and the helper.
Currently the helper accepts "--reference" "--reference=foo"
as a OPT_STRING whose value happens to be "--reference=foo", and
then uses

        if (suc->reference)
                argv_array_push(&child->args, suc->reference)

where suc->reference _is_ "--reference=foo" when invoking the
underlying "git clone", it cancels out.

With this change we omit one of the "--reference" arguments when
passing references from the shell script to the helper.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-08-12 15:00:17 -07:00
Stefan Beller
965dbea09a submodule--helper module-clone: allow multiple references
Allow users to pass in multiple references, just as clone accepts multiple
references as well.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-08-12 15:00:17 -07:00
Junio C Hamano
dd610aeda6 Merge branch 'kw/patch-ids-optim'
When "git rebase" tries to compare set of changes on the updated
upstream and our own branch, it computes patch-id for all of these
changes and attempts to find matches. This has been optimized by
lazily computing the full patch-id (which is expensive) to be
compared only for changes that touch the same set of paths.

* kw/patch-ids-optim:
  rebase: avoid computing unnecessary patch IDs
  patch-ids: add flag to create the diff patch id using header only data
  patch-ids: replace the seen indicator with a commit pointer
  patch-ids: stop using a hand-rolled hashmap implementation
2016-08-12 09:47:39 -07:00
Junio C Hamano
2c44b7a53b Merge branch 'js/mv-dir-to-new-directory'
"git mv dir non-existing-dir/" did not work in some environments
the same way as existing mainstream platforms.  The code now moves
"dir" to "non-existing-dir", without relying on rename("A", "B/")
that strips the trailing slash of '/'.

* js/mv-dir-to-new-directory:
  git mv: do not keep slash in `git mv dir non-existing-dir/`
2016-08-12 09:47:37 -07:00
Junio C Hamano
0a315befa7 Merge branch 'rs/use-strbuf-add-unique-abbrev'
A small code clean-up.

* rs/use-strbuf-add-unique-abbrev:
  use strbuf_add_unique_abbrev() for adding short hashes
2016-08-12 09:47:37 -07:00
Junio C Hamano
b32d7c524b Merge branch 'rs/merge-add-strategies-simplification'
A small code clean-up.

* rs/merge-add-strategies-simplification:
  merge: use string_list_split() in add_strategies()
2016-08-12 09:47:36 -07:00
Junio C Hamano
18f3ce8841 Merge branch 'rs/child-process-init'
A small code clean-up.

* rs/child-process-init:
  use CHILD_PROCESS_INIT to initialize automatic variables
2016-08-12 09:47:36 -07:00
Junio C Hamano
2f9c615efb Merge branch 'sb/submodule-clone-retry'
Fix-up to an error codepath in a topic already in 'master'.

* sb/submodule-clone-retry:
  submodule--helper: use parallel processor correctly
2016-08-12 09:47:34 -07:00
Junio C Hamano
f4fd627661 Merge branch 'jk/reset-ident-time-per-commit' into maint
Not-so-recent rewrite of "git am" that started making internal
calls into the commit machinery had an unintended regression, in
that no matter how many seconds it took to apply many patches, the
resulting committer timestamp for the resulting commits were all
the same.

* jk/reset-ident-time-per-commit:
  am: reset cached ident date for each patch
2016-08-12 09:16:56 -07:00
Kevin Willford
b3dfeebb92 rebase: avoid computing unnecessary patch IDs
The `rebase` family of Git commands avoid applying patches that were
already integrated upstream. They do that by using the revision walking
option that computes the patch IDs of the two sides of the rebase
(local-only patches vs upstream-only ones) and skipping those local
patches whose patch ID matches one of the upstream ones.

In many cases, this causes unnecessary churn, as already the set of
paths touched by a given commit would suffice to determine that an
upstream patch has no local equivalent.

This hurts performance in particular when there are a lot of upstream
patches, and/or large ones.

Therefore, let's introduce the concept of a "diff-header-only" patch ID,
compare those first, and only evaluate the "full" patch ID lazily.

Please note that in contrast to the "full" patch IDs, those
"diff-header-only" patch IDs are prone to collide with one another, as
adjacent commits frequently touch the very same files. Hence we now
have to be careful to allow multiple hash entries with the same hash.
We accomplish that by using the hashmap_add() function that does not even
test for hash collisions.  This also allows us to evaluate the full patch ID
lazily, i.e. only when we found commits with matching diff-header-only
patch IDs.

We add a performance test that demonstrates ~1-6% improvement.  In
practice this will depend on various factors such as how many upstream
changes and how big those changes are along with whether file system
caches are cold or warm.  As Git's test suite has no way of catching
performance regressions, we also add a regression test that verifies
that the full patch ID computation is skipped when the diff-header-only
computation suffices.

Signed-off-by: Kevin Willford <kcwillford@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-08-11 14:39:16 -07:00
Christian Couder
ccceb7bb13 builtin/apply: make write_out_results() return -1 on error
To libify `git apply` functionality we have to signal errors to the
caller instead of exit()ing.

To do that in a compatible manner with the rest of the error handling
in "builtin/apply.c", write_out_results() should return -1 instead of
calling exit().

Helped-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-08-11 12:41:47 -07:00
Christian Couder
434389deb1 builtin/apply: make write_out_one_result() return -1 on error
To libify `git apply` functionality we have to signal errors to the
caller instead of exit()ing.

To do that in a compatible manner with the rest of the error handling
in "builtin/apply.c", write_out_one_result() should just return what
remove_file() and create_file() are returning instead of calling
exit().

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-08-11 12:41:47 -07:00
Christian Couder
8f5b5445d7 builtin/apply: make create_file() return -1 on error
To libify `git apply` functionality we have to signal errors to the
caller instead of exit()ing.

To do that in a compatible manner with the rest of the error handling
in "builtin/apply.c", create_file() should just return what
add_conflicted_stages_file() and add_index_file() are returning
instead of calling exit().

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-08-11 12:41:47 -07:00
Christian Couder
69e1609f81 builtin/apply: make add_index_file() return -1 on error
To libify `git apply` functionality we have to signal errors to the
caller instead of die()ing.

To do that in a compatible manner with the rest of the error handling
in "builtin/apply.c", add_index_file() should return -1 instead of
calling die().

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-08-11 12:41:47 -07:00
Christian Couder
a902edceeb builtin/apply: make add_conflicted_stages_file() return -1 on error
To libify `git apply` functionality we have to signal errors to the
caller instead of die()ing.

To do that in a compatible manner with the rest of the error handling
in "builtin/apply.c", add_conflicted_stages_file() should return -1
instead of calling die().

Helped-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-08-11 12:41:47 -07:00
Christian Couder
6e8df31469 builtin/apply: make remove_file() return -1 on error
To libify `git apply` functionality we have to signal errors to the
caller instead of die()ing.

To do that in a compatible manner with the rest of the error handling
in "builtin/apply.c", remove_file() should return -1 instead of
calling die().

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-08-11 12:41:47 -07:00
Christian Couder
fe41b80225 builtin/apply: make build_fake_ancestor() return -1 on error
To libify `git apply` functionality we have to signal errors to the
caller instead of die()ing.

To do that in a compatible manner with the rest of the error handling
in "builtin/apply.c", build_fake_ancestor() should return -1 instead
of calling die().

Helped-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-08-11 12:41:47 -07:00
Christian Couder
119ab159e6 builtin/apply: change die_on_unsafe_path() to check_unsafe_path()
To libify `git apply` functionality we have to signal errors to the
caller instead of die()ing.

To do that in a compatible manner with the rest of the error handling
in "builtin/apply.c", die_on_unsafe_path() should return a negative
integer instead of calling die(), so while doing that let's change
its name to check_unsafe_path().

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-08-11 12:41:47 -07:00
Christian Couder
dbf1b5fb6a builtin/apply: make gitdiff_*() return -1 on error
To libify `git apply` functionality we have to signal errors to the
caller instead of die()ing.

To do that in a compatible manner with the rest of the error handling
in "builtin/apply.c", gitdiff_*() functions should return -1 instead
of calling die().

A previous patch made it possible for gitdiff_*() functions to
return -1 in case of error. Let's take advantage of that to
make gitdiff_verify_name() return -1 on error, and to have
gitdiff_oldname() and gitdiff_newname() directly return
what gitdiff_verify_name() returns.

Helped-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-08-11 12:41:47 -07:00
Christian Couder
70af7662d4 builtin/apply: make gitdiff_*() return 1 at end of header
The gitdiff_*() functions that are called as p->fn() in parse_git_header()
should return 1 instead of -1 in case of end of header or unrecognized
input, as these are not real errors. It just instructs the parser to break
out.

This makes it possible for gitdiff_*() functions to return -1 in case of a
real error. This will be done in a following patch.

Helped-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-08-11 12:41:47 -07:00
Christian Couder
9724e6ff48 builtin/apply: make parse_traditional_patch() return -1 on error
To libify `git apply` functionality we have to signal errors to the
caller instead of die()ing.

To do that in a compatible manner with the rest of the error handling
in "builtin/apply.c", parse_traditional_patch() should return -1
instead of calling die().

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-08-11 12:41:47 -07:00
Christian Couder
fef7ba5353 builtin/apply: make apply_all_patches() return 128 or 1 on error
To finish libifying the apply functionality, apply_all_patches() should not
die() or exit() in case of error, but return either 128 or 1, so that it
gives the same exit code as when die() or exit(1) is called. This way
scripts relying on the exit code don't need to be changed.

While doing that we must take care that file descriptors are properly closed
and, if needed, reset to a sensible value.

Also, according to the lockfile API, when finished with a lockfile, one
should either commit it or roll it back.

This is even more important now that the same lockfile can be passed
to init_apply_state() many times to be reused by series of calls to
the apply lib functions.

Helped-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Helped-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Helped-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-08-11 12:41:47 -07:00
Christian Couder
b6446d54ec builtin/apply: move check_apply_state() to apply.c
To libify `git apply` functionality we must make check_apply_state()
usable outside "builtin/apply.c".

Let's do that by moving it into "apply.c".

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-08-11 12:41:47 -07:00
Christian Couder
f36538d88b builtin/apply: make check_apply_state() return -1 instead of die()ing
To libify `git apply` functionality we have to signal errors to the
caller instead of die()ing.

To do that in a compatible manner with the rest of the error handling
in "builtin/apply.c", check_apply_state() should return -1 instead of
calling die().

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-08-11 12:41:47 -07:00
Christian Couder
2f5a6d1218 apply: make init_apply_state() return -1 instead of exit()ing
To libify `git apply` functionality we have to signal errors to the
caller instead of exit()ing.

To do that in a compatible manner with the rest of the error handling
in "builtin/apply.c", init_apply_state() should return -1 instead of
calling exit().

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-08-11 12:41:47 -07:00
Christian Couder
bb493a5c14 builtin/apply: move init_apply_state() to apply.c
To libify `git apply` functionality we must make init_apply_state()
usable outside "builtin/apply.c".

Let's do that by moving it into a new "apply.c".

Helped-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-08-11 12:41:47 -07:00
Christian Couder
f95fdc256b builtin/apply: make parse_ignorewhitespace_option() return -1 instead of die()ing
To libify `git apply` functionality we have to signal errors to the
caller instead of die()ing.

To do that in a compatible manner with the rest of the error handling
in "builtin/apply.c", parse_ignorewhitespace_option() should return
-1 instead of calling die().

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-08-11 12:41:47 -07:00
Christian Couder
aaf6c447aa builtin/apply: make parse_whitespace_option() return -1 instead of die()ing
To libify `git apply` functionality we have to signal errors to the
caller instead of die()ing.

To do that in a compatible manner with the rest of the error handling
in builtin/apply.c, parse_whitespace_option() should return -1 instead
of calling die().

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-08-11 12:41:46 -07:00
Christian Couder
dae197f753 builtin/apply: make parse_single_patch() return -1 on error
To libify `git apply` functionality we have to signal errors to the
caller instead of die()ing.

To do that in a compatible manner with the rest of the error handling
in builtin/apply.c, parse_single_patch() should return a negative
integer instead of calling die().

Let's do that by using error() and let's adjust the related test
cases accordingly.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-08-11 12:41:46 -07:00
Christian Couder
b654b34c1c builtin/apply: make parse_chunk() return a negative integer on error
To libify `git apply` functionality we have to signal errors to the
caller instead of die()ing or exit()ing.

To do that in a compatible manner with the rest of the error handling
in builtin/apply.c, parse_chunk() should return a negative integer
instead of calling die() or exit().

As parse_chunk() is called only by apply_patch() which already
returns either -1 or -128 when an error happened, let's make it also
return -1 or -128.

This makes it compatible with what find_header() and parse_binary()
already return.

Helped-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-08-11 12:41:46 -07:00
Christian Couder
5950851e44 builtin/apply: make find_header() return -128 instead of die()ing
To libify `git apply` functionality we have to signal errors to the
caller instead of die()ing.

To do that in a compatible manner with the rest of the error handling
in builtin/apply.c, let's make find_header() return -128 instead of
calling die().

We could make it return -1, unfortunately find_header() already
returns -1 when no header is found.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-08-11 12:41:46 -07:00
Christian Couder
3bee345d7b builtin/apply: read_patch_file() return -1 instead of die()ing
To libify `git apply` functionality we have to signal errors to the
caller instead of die()ing. Let's do that by returning -1 instead of
die()ing in read_patch_file().

Helped-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-08-11 12:41:46 -07:00
Christian Couder
f07a9f7643 builtin/apply: make apply_patch() return -1 or -128 instead of die()ing
To libify `git apply` functionality we have to signal errors
to the caller instead of die()ing.

As a first step in this direction, let's make apply_patch() return
-1 or -128 in case of errors instead of dying. For now its only
caller apply_all_patches() will exit(128) when apply_patch()
return -128 and it will exit(1) when it returns -1.

We exit() with code 128 because that was what die() was doing
and we want to keep the distinction between exiting with code 1
and exiting with code 128.

Helped-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-08-11 12:41:46 -07:00
Christian Couder
71501a71d0 apply: move 'struct apply_state' to apply.h
To libify `git apply` functionality we must make 'struct apply_state'
usable outside "builtin/apply.c".

Let's do that by creating a new "apply.h" and moving
'struct apply_state' there.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-08-11 12:41:46 -07:00
Christian Couder
4d5acae0ca apply: make some names more specific
To prepare for some structs and constants being moved from
builtin/apply.c to apply.h, we should give them some more
specific names to avoid possible name collisions in the global
namespace.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-08-11 12:41:09 -07:00
Jeff King
07e7dbf0db gc: default aggressive depth to 50
This commit message is long and has lots of background and
numbers. The summary is: the current default of 250 doesn't
save much space, and costs CPU. It's not a good tradeoff.
Read on for details.

The "--aggressive" flag to git-gc does three things:

  1. use "-f" to throw out existing deltas and recompute from
     scratch

  2. use "--window=250" to look harder for deltas

  3. use "--depth=250" to make longer delta chains

Items (1) and (2) are good matches for an "aggressive"
repack. They ask the repack to do more computation work in
the hopes of getting a better pack. You pay the costs during
the repack, and other operations see only the benefit.

Item (3) is not so clear. Allowing longer chains means fewer
restrictions on the deltas, which means potentially finding
better ones and saving some space. But it also means that
operations which access the deltas have to follow longer
chains, which affects their performance. So it's a tradeoff,
and it's not clear that the tradeoff is even a good one.

The existing "250" numbers for "--aggressive" come
originally from this thread:

  http://public-inbox.org/git/alpine.LFD.0.9999.0712060803430.13796@woody.linux-foundation.org/

where Linus says:

  So when I said "--depth=250 --window=250", I chose those
  numbers more as an example of extremely aggressive
  packing, and I'm not at all sure that the end result is
  necessarily wonderfully usable. It's going to save disk
  space (and network bandwidth - the delta's will be re-used
  for the network protocol too!), but there are definitely
  downsides too, and using long delta chains may
  simply not be worth it in practice.

There are some numbers in that thread, but they're mostly
focused on the improved window size, and measure the
improvement from --depth=250 and --window=250 together.
E.g.:

  http://public-inbox.org/git/9e4733910712062006l651571f3w7f76ce64c6650dff@mail.gmail.com/

talks about the improved run-time of "git-blame", which
comes from the reduced pack size. But most of that reduction
is coming from --window=250, whereas most of the extra costs
come from --depth=250. There's a link in that thread showing
that increasing the depth beyond 50 doesn't seem to help
much with the size:

  https://vcscompare.blogspot.com/2008/06/git-repack-parameters.html

but again, no discussion of the timing impact.

In an earlier thread from Ted Ts'o which discussed setting
the non-aggressive default (from 10 to 50):

  http://public-inbox.org/git/20070509134958.GA21489%40thunk.org/

we have more numbers, with the conclusion that going past 50
does not help size much, and hurts the speed of normal
operations.

So from that, we might guess that 50 is actually a sweet
spot, even for aggressive, if we interpret aggressive to
"spend time now to make a better pack". It is not clear that
"--depth=250" is actually a better pack. It may be slightly
_smaller_, but it carries a run-time penalty.

Here are some more recent timings I did to verify that. They
show three things:

  - the size of the resulting pack (so disk saved to store,
    bandwidth saved on clones/fetches)

  - the cost of "rev-list --objects --all", which shows the
    effect of the delta chains on trees (commits typically
    don't delta, and the command doesn't touch the blobs at
    all)

  - the cost of "log -Sfoo", which will additionally access
    each blob

All cases were repacked with "git repack -adf --depth=$d
--window=250" (so basically, what would happen if we tweaked
the "gc --aggressive" default depth).

The timings are all wall-clock best-of-3. The machine itself
has plenty of RAM compared to the repositories (which is
probably typical of most workstations these days), so we're
really measuring CPU usage, as the whole thing will be in
disk cache after the first run.

The core.deltaBaseCacheLimit is at its default of 96MiB.
It's possible that tweaking it would have some impact on the
tests, as some of them (especially "log -S" on a large repo)
are likely to overflow that. But bumping that carries a
run-time memory cost, so for these tests, I focused on what
we could do just with the on-disk pack tradeoffs.

Each test is done for four depths: 250 (the current value),
50 (the current default that tested well previously), 100
(to show something on the larger side, which previous tests
showed was not a good tradeoff), and 10 (the very old
default, which previous tests showed was worse than 50).

Here are the numbers for linux.git:

   depth |  size |  %    | rev-list |  %     | log -Sfoo |   %
  -------+-------+-------+----------+--------+-----------+-------
    250  | 967MB |  n/a  | 48.159s  |   n/a  | 378.088   |   n/a
    100  | 971MB | +0.4% | 41.471s  | -13.9% | 342.060   |  -9.5%
     50  | 979MB | +1.2% | 37.778s  | -21.6% | 311.040s  | -17.7%
     10  | 1.1GB | +6.6% | 32.518s  | -32.5% | 279.890s  | -25.9%

and for git.git:

   depth |  size |  %    | rev-list |  %     | log -Sfoo |   %
  -------+-------+-------+----------+--------+-----------+-------
    250  |  48MB |  n/a  |  2.215s  |   n/a  |  20.922s  |   n/a
    100  |  49MB | +0.5% |  2.140s  |  -3.4% |  17.736s  | -15.2%
     50  |  49MB | +1.7% |  2.099s  |  -5.2% |  15.418s  | -26.3%
     10  |  53MB | +9.3% |  2.001s  |  -9.7% |  12.677s  | -39.4%

You can see that that the CPU savings for regular operations improves as we
decrease the depth. The savings are less for "rev-list" on a smaller repository
than they are for blob-accessing operations, or even rev-list on a larger
repository. This may mean that a larger delta cache would help (though setting
core.deltaBaseCacheLimit by itself doesn't).

But we can also see that the space savings are not that great as the depth goes
higher. Saving 5-10% between 10 and 50 is probably worth the CPU tradeoff.
Saving 1% to go from 50 to 100, or another 0.5% to go from 100 to 250 is
probably not.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-08-11 11:53:19 -07:00
Jeff Hostetler
d9fc746cd7 status: print branch info with --porcelain=v2 --branch
Expand porcelain v2 output to include branch and tracking
branch information. This includes the commit id, the branch,
the upstream branch, and the ahead and behind counts.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-08-11 11:15:40 -07:00
Jeff Hostetler
1ecdecce62 status: collect per-file data for --porcelain=v2
Collect extra per-file data for porcelain V2 format.

The output of `git status --porcelain` leaves out many
details about the current status that clients might like
to have.  This can force them to be less efficient as they
may need to launch secondary commands (and try to match
the logic within git) to accumulate this extra information.
For example, a GUI IDE might want the file mode to display
the correct icon for a changed item (without having to stat
it afterwards).

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-08-11 11:14:43 -07:00
Jeff King
c9af708b1a pack-objects: use mru list when iterating over packs
In the original implementation of want_object_in_pack(), we
always looked for the object in every pack, so the order did
not matter for performance.

As of the last few patches, however, we can now often break
out of the loop early after finding the first instance, and
avoid looking in the other packs at all. In this case, pack
order can make a big difference, because we'd like to find
the objects by looking at as few packs as possible.

This patch switches us to the same packed_git_mru list that
is now used by normal object lookups.

Here are timings for p5303 on linux.git:

Test                      HEAD^                HEAD
------------------------------------------------------------------------
5303.3: rev-list (1)      31.31(31.07+0.23)    31.28(31.00+0.27) -0.1%
5303.4: repack (1)        40.35(38.84+2.60)    40.53(39.31+2.32) +0.4%
5303.6: rev-list (50)     31.37(31.15+0.21)    31.41(31.16+0.24) +0.1%
5303.7: repack (50)       58.25(68.54+2.03)    47.28(57.66+1.89) -18.8%
5303.9: rev-list (1000)   31.91(31.57+0.33)    31.93(31.64+0.28) +0.1%
5303.10: repack (1000)    304.80(376.00+3.92)  87.21(159.54+2.84) -71.4%

The rev-list numbers are unchanged, which makes sense (they
are not exercising this code at all). The 50- and 1000-pack
repack cases show considerable improvement.

The single-pack repack case doesn't, of course; there's
nothing to improve. In fact, it gives us a baseline for how
fast we could possibly go. You can see that though rev-list
can approach the single-pack case even with 1000 packs,
repack doesn't. The reason is simple: the loop we are
optimizing is only part of what the repack is doing. After
the "counting" phase, we do delta compression, which is much
more expensive when there are multiple packs, because we
have fewer deltas we can reuse (you can also see that these
numbers come from a multicore machine; the CPU times are
much higher than the wall-clock times due to the delta
phase).

So the good news is that in cases with many packs, we used
to be dominated by the "counting" phase, and now we are
dominated by the delta compression (which is faster, and
which we have already parallelized).

Here are similar numbers for git.git:

Test                      HEAD^               HEAD
---------------------------------------------------------------------
5303.3: rev-list (1)      1.55(1.51+0.02)     1.54(1.53+0.00) -0.6%
5303.4: repack (1)        1.82(1.80+0.08)     1.82(1.78+0.09) +0.0%
5303.6: rev-list (50)     1.58(1.57+0.00)     1.58(1.56+0.01) +0.0%
5303.7: repack (50)       2.50(3.12+0.07)     2.31(2.95+0.06) -7.6%
5303.9: rev-list (1000)   2.22(2.20+0.02)     2.23(2.19+0.03) +0.5%
5303.10: repack (1000)    10.47(16.78+0.22)   7.50(13.76+0.22) -28.4%

Not as impressive in terms of percentage, but still
measurable wins.  If you look at the wall-clock time
improvements in the 1000-pack case, you can see that linux
improved by roughly 10x as many seconds as git. That's
because it has roughly 10x as many objects, and we'd expect
this improvement to scale linearly with the number of
objects (since the number of packs is kept constant). It's
just that the "counting" phase is a smaller percentage of
the total time spent for a git.git repack, and hence the
percentage win is smaller.

The implementation itself is a straightforward use of the
MRU code. We only bother marking a pack as used when we know
that we are able to break early out of the loop, for two
reasons:

  1. If we can't break out early, it does no good; we have
     to visit each pack anyway, so we might as well avoid
     even the minor overhead of managing the cache order.

  2. The mru_mark() function reorders the list, which would
     screw up our traversal. So it is only safe to mark when
     we are about to break out of the loop. We could record
     the found pack and mark it after the loop finishes, of
     course, but that's more complicated and it doesn't buy
     us anything due to (1).

Note that this reordering does have a potential impact on
the final pack, as we store only a single "found" pack for
each object, even if it is present in multiple packs. In
principle, any copy is acceptable, as they all refer to the
same content. But in practice, they may differ in whether
they are stored as deltas, against which base, etc. This may
have an impact on delta reuse, and even the delta search
(since we skip pairs that were already in the same pack).

It's not clear whether this change of order would hurt or
even help average cases, though. The most likely reason to
have duplicate objects is from the completion of thin packs
(e.g., you have some objects in a "base" pack, then receive
several pushes; the packs you receive may be thin on the
wire, with deltas that refer to bases outside the pack, but
we complete them with duplicate base objects when indexing
them).

In such a case the current code would always find the thin
duplicates (because we currently walk the packs in reverse
chronological order). Whereas with this patch, some of those
duplicates would be found in the base pack instead.

In my tests repacking a real-world case of linux.git with
3600 thin-pack pushes (on top of a large "base" pack), the
resulting pack was about 0.04% larger with this patch. On
the other hand, because we were more likely to hit the base
pack, there were more opportunities for delta reuse, and we
had 50,000 fewer objects to examine in the delta search.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-08-11 10:44:23 -07:00
Jeff King
4cf2143e02 pack-objects: break delta cycles before delta-search phase
We do not allow cycles in the delta graph of a pack (i.e., A
is a delta of B which is a delta of A) for the obvious
reason that you cannot actually access any of the objects in
such a case.

There's a last-ditch attempt to notice cycles during the
write phase, during which we issue a warning to the user and
write one of the objects out in full. However, this is
"last-ditch" for two reasons:

  1. By this time, it's too late to find another delta for
     the object, so the resulting pack is larger than it
     otherwise could be.

  2. The warning is there because this is something that
     _shouldn't_ ever happen. If it does, then either:

       a. a pack we are reusing deltas from had its own
          cycle

       b. we are reusing deltas from multiple packs, and
          we found a cycle among them (i.e., A is a delta of
          B in one pack, but B is a delta of A in another,
          and we choose to use both deltas).

       c. there is a bug in the delta-search code

     So this code serves as a final check that none of these
     things has happened, warns the user, and prevents us
     from writing a bogus pack.

Right now, (2b) should never happen because of the static
ordering of packs in want_object_in_pack(). If two objects
have a delta relationship, then they must be in the same
pack, and therefore we will find them from that same pack.

However, a future patch would like to change that static
ordering, which will make (2b) a common occurrence. In
preparation, we should be able to handle those kinds of
cycles better. This patch does by introducing a
cycle-breaking step during the get_object_details() phase,
when we are deciding which deltas can be reused. That gives
us the chance to feed the objects into the delta search as
if the cycle did not exist.

We'll leave the detection and warning in the write_object()
phase in place, as it still serves as a check for case (2c).

This does mean we will stop warning for (2a). That case is
caused by bogus input packs, and we ideally would warn the
user about it.  However, since those cycles show up after
picking reusable deltas, they look the same as (2b) to us;
our new code will break the cycles early and the last-ditch
check will never see them.

We could do analysis on any cycles that we find to
distinguish the two cases (i.e., it is a bogus pack if and
only if every delta in the cycle is in the same pack), but
we don't need to. If there is a cycle inside a pack, we'll
run into problems not only reusing the delta, but accessing
the object data at all. So when we try to dig up the actual
size of the object, we'll hit that same cycle and kick in
our usual complain-and-try-another-source code.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-08-11 10:44:13 -07:00
Jeff King
27b5c1a065 provide an initializer for "struct object_info"
An all-zero initializer is fine for this struct, but because
the first element is a pointer, call sites need to know to
use "NULL" instead of "0". Otherwise some static checkers
like "sparse" will complain; see d099b71 (Fix some sparse
warnings, 2013-07-18) for example.  So let's provide an
initializer to make this easier to get right.

But let's also comment that memset() to zero is explicitly
OK[1]. One of the callers embeds object_info in another
struct which is initialized via memset (expand_data in
builtin/cat-file.c). Since our subset of C doesn't allow
assignment from a compound literal, handling this in any
other way is awkward, so we'd like to keep the ability to
initialize by memset(). By documenting this property, it
should make anybody who wants to change the initializer
think twice before doing so.

There's one other caller of interest. In parse_sha1_header(),
we did not initialize the struct fully in the first place.
This turned out not to be a bug because the sub-function it
calls does not look at any other fields except the ones we
did initialize. But that assumption might not hold in the
future, so it's a dangerous construct. This patch switches
it to initializing the whole struct, which protects us
against unexpected reads of the other fields.

[1] Obviously using memset() to initialize a pointer
    violates the C standard, but we long ago decided that it
    was an acceptable tradeoff in the real world.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-08-11 10:42:23 -07:00
Junio C Hamano
11b53957ac Merge branch 'sb/submodule-update-dot-branch'
A few updates to "git submodule update".

Use of "| wc -l" break with BSD variant of 'wc'.

* sb/submodule-update-dot-branch:
  t7406: fix breakage on OSX
  submodule update: allow '.' for branch value
  submodule--helper: add remote-branch helper
  submodule-config: keep configured branch around
  submodule--helper: fix usage string for relative-path
  submodule update: narrow scope of local variable
  submodule update: respect depth in subsequent fetches
  t7406: future proof tests with hard coded depth
2016-08-10 12:33:20 -07:00
Junio C Hamano
1a5f1a3f25 Merge branch 'js/am-3-merge-recursive-direct'
"git am -3" calls "git merge-recursive" when it needs to fall back
to a three-way merge; this call has been turned into an internal
subroutine call instead of spawning a separate subprocess.

* js/am-3-merge-recursive-direct:
  merge-recursive: flush output buffer even when erroring out
  merge_trees(): ensure that the callers release output buffer
  merge-recursive: offer an option to retain the output in 'obuf'
  merge-recursive: write the commit title in one go
  merge-recursive: flush output buffer before printing error messages
  am -3: use merge_recursive() directly again
  merge-recursive: switch to returning errors instead of dying
  merge-recursive: handle return values indicating errors
  merge-recursive: allow write_tree_from_memory() to error out
  merge-recursive: avoid returning a wholesale struct
  merge_recursive: abort properly upon errors
  prepare the builtins for a libified merge_recursive()
  merge-recursive: clarify code in was_tracked()
  die(_("BUG")): avoid translating bug messages
  die("bug"): report bugs consistently
  t5520: verify that `pull --rebase` shows the helpful advice when failing
2016-08-10 12:33:20 -07:00
Junio C Hamano
7a3ea66633 Merge branch 'js/commit-slab-decl-fix'
* js/commit-slab-decl-fix:
  commit-slab.h: avoid duplicated global static variables
  config.c: avoid duplicated global static variables
2016-08-10 12:33:20 -07:00
Junio C Hamano
db40a62239 Merge branch 'jt/format-patch-from-config'
"git format-patch" learned format.from configuration variable to
specify the default settings for its "--from" option.

* jt/format-patch-from-config:
  format-patch: format.from gives the default for --from
2016-08-10 12:33:18 -07:00
Junio C Hamano
24fbe00490 Merge branch 'jk/reset-ident-time-per-commit'
Not-so-recent rewrite of "git am" that started making internal
calls into the commit machinery had an unintended regression, in
that no matter how many seconds it took to apply many patches, the
resulting committer timestamp for the resulting commits were all
the same.

* jk/reset-ident-time-per-commit:
  am: reset cached ident date for each patch
2016-08-10 12:33:17 -07:00
Junio C Hamano
574a31b5b7 Merge branch 'rs/use-strbuf-addstr' into maint
* rs/use-strbuf-addstr:
  use strbuf_addstr() instead of strbuf_addf() with "%s"
  use strbuf_addstr() for adding constant strings to a strbuf
2016-08-10 11:55:34 -07:00
Junio C Hamano
d9d7ab3b1d Merge branch 'os/no-verify-skips-commit-msg-too' into maint
"git commit --help" said "--no-verify" is only about skipping the
pre-commit hook, and failed to say that it also skipped the
commit-msg hook.

* os/no-verify-skips-commit-msg-too:
  commit: describe that --no-verify skips the commit-msg hook in the help text
2016-08-10 11:55:25 -07:00
Junio C Hamano
b7fb136bf6 Merge branch 'rs/rm-strbuf-optim' into maint
The use of strbuf in "git rm" to build filename to remove was a bit
suboptimal, which has been fixed.

* rs/rm-strbuf-optim:
  rm: reuse strbuf for all remove_dir_recursively() calls
2016-08-10 11:55:24 -07:00
Junio C Hamano
60b84ba26c Merge branch 'jk/parse-options-concat' into maint
Users of the parse_options_concat() API function need to allocate
extra slots in advance and fill them with OPT_END() when they want
to decide the set of supported options dynamically, which makes the
code error-prone and hard to read.  This has been corrected by tweaking
the API to allocate and return a new copy of "struct option" array.

* jk/parse-options-concat:
  parse_options: allocate a new array when concatenating
2016-08-10 11:55:24 -07:00
Stefan Beller
2201ee09b5 submodule--helper: use parallel processor correctly
When developing another patch series I had a temporary state in
which git-clone would segfault, when the call was prepared in
prepare_to_clone_next_submodule. This lead to the call failing,
i.e. in `update_clone_task_finished` the task was scheduled to be
tried again.  The second call to prepare_to_clone_next_submodule
would return 0, as the segfaulted clone did create the .git file
already, such that was not considered to need to be cloned again. I
was seeing the "BUG: ce was a submodule before?\n" message, which
was the correct behavior at the time as my local code was
buggy. When trying to debug this failure, I tried to use printing
messages into the strbuf that is passed around, but these messages
were never printed as the die(..) doesn't flush the `err` strbuf.

When implementing the die() in 665b35ecc (2016-06-09, "submodule--helper:
initial clone learns retry logic"), I considered this condition to be
a severe condition, which should lead to an immediate abort as we do not
trust ourselves any more. However the queued messages in `err` are valuable
so let's not toss them out by immediately dying, but a graceful return.

Another thing to note: The error message itself was misleading. A return
value of 0 doesn't indicate the passed in `ce` is not a submodule any more,
but just that we do not consider cloning it any more.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-08-09 14:54:16 -07:00
Johannes Sixt
dc29ddebb9 config.c: avoid duplicated global static variables
Repeating the definition of a static variable seems to be valid in C.
Nevertheless, it is bad style because it can cause confusion, definitely
when it becomes necessary to change the type.

d64ec16 (git config: reorganize to use parseopt, 2009-02-21) added two
static variables near the top of the file config.c without removing the
definitions of the two variables that occurs later in the file.

The two variables were needed earlier in the file in the newly
introduced parseopt structure. These references were removed later in
d0e08d6 (config: fix parsing of "git config --get-color some.key -1",
2014-11-20).

Remove the redundant, younger, definitions near the top of the file and
keep the original definitions that occur later.

Signed-off-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-08-09 10:19:24 -07:00
Junio C Hamano
19492555ca Merge branch 'jk/parseopt-string-list'
A small memory leak in the command line parsing of "git blame"
has been plugged.

* jk/parseopt-string-list:
  blame: drop strdup of string literal
2016-08-08 14:48:44 -07:00
Junio C Hamano
940622bc8b Merge branch 'rs/use-strbuf-addstr'
* rs/use-strbuf-addstr:
  use strbuf_addstr() instead of strbuf_addf() with "%s"
  use strbuf_addstr() for adding constant strings to a strbuf
2016-08-08 14:48:41 -07:00
Junio C Hamano
78849622ec Merge branch 'jk/pack-objects-optim'
"git pack-objects" has a few options that tell it not to pack
objects found in certain packfiles, which require it to scan .idx
files of all available packs.  The codepaths involved in these
operations have been optimized for a common case of not having any
non-local pack and/or any .kept pack.

* jk/pack-objects-optim:
  pack-objects: compute local/ignore_pack_keep early
  pack-objects: break out of want_object loop early
  find_pack_entry: replace last_found_pack with MRU cache
  add generic most-recently-used list
  sha1_file: drop free_pack_by_name
  t/perf: add tests for many-pack scenarios
2016-08-08 14:48:39 -07:00
Junio C Hamano
768ededa9c Merge branch 'va/i18n'
More i18n marking.

* va/i18n:
  i18n: config: unfold error messages marked for translation
  i18n: notes: mark comment for translation
2016-08-08 14:48:38 -07:00
Junio C Hamano
0d3279962a Merge branch 'jk/reflog-date'
The reflog output format is documented better, and a new format
--date=unix to report the seconds-since-epoch (without timezone)
has been added.

* jk/reflog-date:
  date: clarify --date=raw description
  date: add "unix" format
  date: document and test "raw-local" mode
  doc/pretty-formats: explain shortening of %gd
  doc/pretty-formats: describe index/time formats for %gd
  doc/rev-list-options: explain "-g" output formats
  doc/rev-list-options: clarify "commit@{Nth}" for "-g" option
2016-08-08 14:48:37 -07:00
Junio C Hamano
a220e2bbbf Merge branch 'pb/commit-editmsg-path' into maint
Code clean-up.

* pb/commit-editmsg-path:
  builtin/commit.c: memoize git-path for COMMIT_EDITMSG
2016-08-08 14:21:38 -07:00
Junio C Hamano
aa9136a87e Merge branch 'nd/pack-ofs-4gb-limit' into maint
"git pack-objects" and "git index-pack" mostly operate with off_t
when talking about the offset of objects in a packfile, but there
were a handful of places that used "unsigned long" to hold that
value, leading to an unintended truncation.

* nd/pack-ofs-4gb-limit:
  fsck: use streaming interface for large blobs in pack
  pack-objects: do not truncate result in-pack object size on 32-bit systems
  index-pack: correct "offset" type in unpack_entry_data()
  index-pack: report correct bad object offsets even if they are large
  index-pack: correct "len" type in unpack_data()
  sha1_file.c: use type off_t* for object_info->disk_sizep
  pack-objects: pass length to check_pack_crc() without truncation
2016-08-08 14:21:36 -07:00
Junio C Hamano
327b3f8459 Merge branch 'mh/blame-worktree' into maint
"git blame file" allowed the lineage of lines in the uncommitted,
unadded contents of "file" to be inspected, but it refused when
"file" did not appear in the current commit.  When "file" was
created by renaming an existing file (but the change has not been
committed), this restriction was unnecessarily tight.

* mh/blame-worktree:
  t/t8003-blame-corner-cases.sh: Use here documents
  blame: allow to blame paths freshly added to the index
2016-08-08 14:21:32 -07:00
Johannes Schindelin
189d035e67 git mv: do not keep slash in git mv dir non-existing-dir/
When calling `rename("dir", "non-existing-dir/")` on Linux, it silently
succeeds, stripping the trailing slash of the second argument.

This is all good and dandy but this behavior disagrees with the specs at

http://pubs.opengroup.org/onlinepubs/9699919799/functions/rename.html

that state clearly regarding the 2nd parameter (called `new`):

	If the `new` argument does not resolve to an existing directory
	entry for a file of type directory and the `new` argument
	contains at least one non- <slash> character and ends with one
	or more trailing <slash> characters after all symbolic links
	have been processed, `rename()` shall fail.

Of course, we would like `git mv dir non-existing-dir/` to succeed (and
rename the directory "dir" to "non-existing-dir"). Let's be extra
careful to remove the trailing slash in that case.

This lets t7001-mv.sh pass in Bash on Windows.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-08-08 10:43:20 -07:00
René Scharfe
1eb47f167d use strbuf_add_unique_abbrev() for adding short hashes
Call strbuf_add_unique_abbrev() to add abbreviated hashes to strbufs
instead of taking detours through find_unique_abbrev() and its static
buffer.  This is shorter and a bit more efficient.

Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-08-06 10:33:57 -07:00
Jeff Hostetler
c4f596b98e status: support --porcelain[=<version>]
Update --porcelain argument to take optional version parameter
to allow multiple porcelain formats to be supported in the future.

The token "v1" is the default value and indicates the traditional
porcelain format.  (The token "1" is an alias for that.)

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-08-05 15:46:42 -07:00
Jeff Hostetler
be7e795efe status: cleanup API to wt_status_print
Refactor the API between builtin/commit.c and wt-status.[ch].

Hide the details of the various wt_*status_print() routines inside
wt-status.c behind a single (new) wt_status_print() routine.
Eliminate the switch statements from builtin/commit.c.
Allow details of new status formats to be isolated within wt-status.c

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-08-05 15:46:08 -07:00
Jeff Hostetler
957a0fe2e5 status: rename long-format print routines
Rename the various wt_status_print*() routines to be
wt_longstatus_print*() to make it clear that these
routines are only concerned with the normal/long
status output and reduce developer confusion as other
status formats are added in the future.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-08-05 15:45:47 -07:00
René Scharfe
02a8cfa478 merge: use string_list_split() in add_strategies()
Call string_list_split() for cutting a space separated list into pieces
instead of reimplementing it based on struct strategy.  The attr member
of struct strategy was not used split_merge_strategies(); it was a pure
string operation.  Also be nice and clean up once we're done splitting;
the old code didn't bother freeing any of the allocated memory.

Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-08-05 15:11:06 -07:00
René Scharfe
542aa25d97 use CHILD_PROCESS_INIT to initialize automatic variables
Initialize struct child_process variables already when they're defined.
That's shorter and saves a function call.

Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-08-05 15:10:05 -07:00
René Scharfe
bc57b9c0cc use strbuf_addstr() instead of strbuf_addf() with "%s"
Call strbuf_addstr() for adding a simple string to a strbuf instead of
using the heavier strbuf_addf().  This is shorter and documents the
intent more clearly.

Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-08-05 15:09:25 -07:00
Junio C Hamano
1e9a4856fb Merge branch 'sb/submodule-clone-retry'
An earlier tweak to make "submodule update" retry a failing clone
of submodules was buggy and caused segfault, which has been fixed.

* sb/submodule-clone-retry:
  submodule-helper: fix indexing in clone retry error reporting path
  git-submodule: forward exit code of git-submodule--helper more faithfully
2016-08-04 14:39:17 -07:00
Stefan Beller
4d7bc52b17 submodule update: allow '.' for branch value
Gerrit has a "superproject subscription" feature[1], that triggers a
commit in a superproject that is subscribed to its submodules.
Conceptually this Gerrit feature can be done on the client side with
Git via (except for raciness, error handling etc):

  while [ true ]; do
    git -C <superproject> submodule update --remote --force
    git -C <superproject> commit -a -m "Update submodules"
    git -C <superproject> push
  done

for each branch in the superproject. To ease the configuration in Gerrit
a special value of "." has been introduced for the submodule.<name>.branch
to mean the same branch as the superproject[2], such that you can create a
new branch on both superproject and the submodule and this feature
continues to work on that new branch.

Now we find projects in the wild with such a .gitmodules file.
The .gitmodules used in these Gerrit projects do not conform
to Gits understanding of how .gitmodules should look like.
This teaches Git to deal gracefully with this syntax as well.

The redefinition of "." does no harm to existing projects unaware of
this change, as "." is an invalid branch name in Git, so we do not
expect such projects to exist.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-08-03 16:13:22 -07:00
Stefan Beller
92bbe7ccf1 submodule--helper: add remote-branch helper
In a later patch we want to enhance the logic for the branch selection.
Rewrite the current logic to be in C, so we can directly use C when
we enhance the logic.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-08-03 16:11:35 -07:00
Junio C Hamano
a58a8e3f71 Merge branch 'jk/push-progress'
"git push" and "git clone" learned to give better progress meters
to the end user who is waiting on the terminal.

* jk/push-progress:
  receive-pack: send keepalives during quiet periods
  receive-pack: turn on connectivity progress
  receive-pack: relay connectivity errors to sideband
  receive-pack: turn on index-pack resolving progress
  index-pack: add flag for showing delta-resolution progress
  clone: use a real progress meter for connectivity check
  check_connected: add progress flag
  check_connected: relay errors to alternate descriptor
  check_everything_connected: use a struct with named options
  check_everything_connected: convert to argv_array
  rev-list: add optional progress reporting
  check_everything_connected: always pass --quiet to rev-list
2016-08-03 15:10:28 -07:00
Junio C Hamano
d083d420b7 Merge branch 'jk/parse-options-concat'
Users of the parse_options_concat() API function need to allocate
extra slots in advance and fill them with OPT_END() when they want
to decide the set of supported options dynamically, which makes the
code error-prone and hard to read.  This has been corrected by tweaking
the API to allocate and return a new copy of "struct option" array.

* jk/parse-options-concat:
  parse_options: allocate a new array when concatenating
2016-08-03 15:10:25 -07:00
Junio C Hamano
cf27c7996e Merge branch 'sb/push-options'
"git push" learned to accept and pass extra options to the
receiving end so that hooks can read and react to them.

* sb/push-options:
  add a test for push options
  push: accept push options
  receive-pack: implement advertising and receiving push options
  push options: {pre,post}-receive hook learns about push options
2016-08-03 15:10:24 -07:00
Junio C Hamano
a35031240b Merge branch 'os/no-verify-skips-commit-msg-too'
"git commit --help" said "--no-verify" is only about skipping the
pre-commit hook, and failed to say that it also skipped the
commit-msg hook.

* os/no-verify-skips-commit-msg-too:
  commit: describe that --no-verify skips the commit-msg hook in the help text
2016-08-03 15:10:22 -07:00
Eric Sunshine
aa59e14b23 blame: drop strdup of string literal
This strdup was added as part of 58dbfa2 (blame: accept
multiple -L ranges, 2013-08-06) to be consistent with
parse_opt_string_list(), which appends to the same list.

But as of 7a7a517 (parse_opt_string_list: stop allocating
new strings, 2016-06-13), we should stop using strdup (to
match parse_opt_string_list, and for all the reasons
described in that commit; namely that it does nothing useful
and causes us to leak the memory).

Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-08-03 08:52:46 -07:00
Jeff King
4d9c7e6f45 am: reset cached ident date for each patch
When we compute the date to go in author/committer lines of
commits, or tagger lines of tags, we get the current date
once and then cache it for the rest of the program.  This is
a good thing in some cases, like "git commit", because it
means we do not racily assign different times to the
author/committer fields of a single commit object.

But as more programs start to make many commits in a single
process (e.g., the recently builtin "git am"), it means that
you'll get long strings of commits with identical committer
timestamps (whereas before, we invoked "git commit" many
times and got true timestamps).

This patch addresses it by letting callers reset the cached
time, which means they'll get a fresh time on their next
call to git_committer_info() or git_author_info(). The first
caller to do so is "git am", which resets the time for each
patch it applies.

It would be nice if we could just do this automatically
before filling in the ident fields of commit and tag
objects. Unfortunately, it's hard to know where a particular
logical operation begins and ends.

For instance, if commit_tree_extended() were to call
reset_ident_date() before getting the committer/author
ident, that doesn't quite work; sometimes the author info is
passed in to us as a parameter, and it may or may not have
come from a previous call to ident_default_date(). So in
those cases, we lose the property that the committer and the
author timestamp always match.

You could similarly put a date-reset at the end of
commit_tree_extended(). That actually works in the current
code base, but it's fragile. It makes the assumption that
after commit_tree_extended() finishes, the caller has no
other operations that would logically want to fall into the
same timestamp.

So instead we provide the tool to easily do the reset, and
let the high-level callers use it to annotate their own
logical operations.

There's no automated test, because it would be inherently
racy (it depends on whether the program takes multiple
seconds to run). But you can see the effect with something
like:

  # make a fake 100-patch series
  top=$(git rev-parse HEAD)
  bottom=$(git rev-list --first-parent -100 HEAD | tail -n 1)
  git log --format=email --reverse --first-parent \
          --binary -m -p $bottom..$top >patch

  # now apply it; this presumably takes multiple seconds
  git checkout --detach $bottom
  git am <patch

  # now count the number of distinct committer times;
  # prior to this patch, there would only be one, but
  # now we'd typically see several.
  git log --format=%ct $bottom.. | sort -u

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Helped-by: Paul Tan <pyokagan@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-08-01 14:49:41 -07:00
Stefan Beller
2de26ae1dc submodule--helper: fix usage string for relative-path
Internally we call the underscore version of relative_path, but externally
we present an API with no underscores.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-08-01 14:41:53 -07:00
René Scharfe
02962d3684 use strbuf_addstr() for adding constant strings to a strbuf
Replace uses of strbuf_addf() for adding strings with more lightweight
strbuf_addstr() calls.

In http-push.c it becomes easier to see what's going on without having
to verfiy that the definition of PROPFIND_ALL_REQUEST doesn't contain
any format specifiers.

Signed-off-by: Rene Scharfe <l.s.r@web.de>
Reviewed-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-08-01 13:42:10 -07:00
Josh Triplett
6bc6b6c0dc format-patch: format.from gives the default for --from
This helps users who would prefer format-patch to default to --from,
and makes it easier to change the default in the future.

Signed-off-by: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-08-01 13:13:02 -07:00
Johannes Schindelin
548009c0d5 merge_trees(): ensure that the callers release output buffer
The recursive merge machinery accumulates its output in an output
buffer, to be flushed at the end of merge_recursive(). At this point,
we forgot to release the output buffer.

When calling merge_trees() (i.e. the non-recursive part of the recursive
merge) directly, the output buffer is never flushed because the caller
may be merge_recursive() which wants to flush the output itself.

For the same reason, merge_trees() cannot release the output buffer: it
may still be needed.

Forgetting to release the output buffer did not matter much when running
git-checkout, or git-merge-recursive, because we exited after the
operation anyway. Ever since cherry-pick learned to pick a commit range,
however, this memory leak had the potential of becoming a problem.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-08-01 11:45:30 -07:00
Jeff King
56dfeb6263 pack-objects: compute local/ignore_pack_keep early
In want_object_in_pack(), we can exit early from our loop if
neither "local" nor "ignore_pack_keep" are set. If they are,
however, we must examine each pack to see if it has the
object and is non-local or has a ".keep".

It's quite common for there to be no non-local or .keep
packs at all, in which case we know ahead of time that
looking further will be pointless. We can pre-compute this
by simply iterating over the list of packs ahead of time,
and dropping the flags if there are no packs that could
match.

Another similar strategy would be to modify the loop in
want_object_in_pack() to notice that we have already found
the object once, and that we are looping only to check for
"local" and "keep" attributes. If a pack has neither of
those, we can skip the call to find_pack_entry_one(), which
is the expensive part of the loop.

This has two advantages:

  - it isn't all-or-nothing; we still get some improvement
    when there's a small number of kept or non-local packs,
    and a large number of non-kept local packs

  - it eliminates any possible race where we add new
    non-local or kept packs after our initial scan. In
    practice, I don't think this race matters; we already
    cache the packed_git information, so somebody who adds a
    new pack or .keep file after we've started will not be
    noticed at all, unless we happen to need to call
    reprepare_packed_git() because a lookup fails.

    In other words, we're already racy, and the race is not
    a big deal (losing the race means we might include an
    object in the pack that would not otherwise be, which is
    an acceptable outcome).

However, it also has a disadvantage: we still loop over the
rest of the packs for each object to check their flags. This
is much less expensive than doing the object lookup, but
still not free. So if we wanted to implement that strategy
to cover the non-all-or-nothing cases, we could do so in
addition to this one (so you get the most speedup in the
all-or-nothing case, and the best we can do in the other
cases). But given that the all-or-nothing case is likely the
most common, it is probably not worth the trouble, and we
can revisit this later if evidence points otherwise.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-07-29 11:05:08 -07:00
Jeff King
cd37996795 pack-objects: break out of want_object loop early
When pack-objects collects the list of objects to pack
(either from stdin, or via its internal rev-list), it
filters each one through want_object_in_pack().

This function loops through each existing packfile, looking
for the object. When we find it, we mark the pack/offset
combo for later use. However, we can't just return "yes, we
want it" at that point. If --honor-pack-keep is in effect,
we must keep looking to find it in _all_ packs, to make sure
none of them has a .keep. Likewise, if --local is in effect,
we must make sure it is not present in any non-local pack.

As a result, the sum effort of these calls is effectively
O(nr_objects * nr_packs). In an ordinary repository, we have
only a handful of packs, and this doesn't make a big
difference. But in pathological cases, it can slow the
counting phase to a crawl.

This patch notices the case that we have neither "--local"
nor "--honor-pack-keep" in effect and breaks out of the loop
early, after finding the first instance. Note that our worst
case is still "objects * packs" (i.e., we might find each
object in the last pack we look in), but in practice we will
often break out early. On an "average" repo, my git.git with
8 packs, this shows a modest 2% (a few dozen milliseconds)
improvement in the counting-objects phase of "git
pack-objects --all <foo" (hackily instrumented by sticking
exit(0) right after list_objects).

But in a much more pathological case, it makes a bigger
difference. I ran the same command on a real-world example
with ~9 million objects across 1300 packs. The counting time
dropped from 413s to 45s, an improvement of about 89%.

Note that this patch won't do anything by itself for a
normal "git gc", as it uses both --honor-pack-keep and
--local.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-07-29 11:05:07 -07:00
Junio C Hamano
c81d283675 Merge branch 'dk/blame-move-no-reason-for-1-line-context' into maint
"git blame -M" missed a single line that was moved within the file.

* dk/blame-move-no-reason-for-1-line-context:
  blame: require 0 context lines while finding moved lines with -M
2016-07-28 11:26:01 -07:00
Junio C Hamano
174f9e622f Merge branch 'js/am-call-theirs-theirs-in-fallback-3way' into maint
One part of "git am" had an oddball helper function that called
stuff from outside "his" as opposed to calling what we have "ours",
which was not gender-neutral and also inconsistent with the rest of
the system where outside stuff is usuall called "theirs" in
contrast to "ours".

* js/am-call-theirs-theirs-in-fallback-3way:
  am: counteract gender bias
2016-07-28 11:25:59 -07:00
Junio C Hamano
87be95b6f9 Merge branch 'ew/gc-auto-pack-limit-fix' into maint
"gc.autoPackLimit" when set to 1 should not trigger a repacking
when there is only one pack, but the code counted poorly and did
so.

* ew/gc-auto-pack-limit-fix:
  gc: fix off-by-one error with gc.autoPackLimit
2016-07-28 11:25:56 -07:00
Junio C Hamano
c12c71fabb Merge branch 'nd/ita-cleanup' into maint
Git does not know what the contents in the index should be for a
path added with "git add -N" yet, so "git grep --cached" should not
show hits (or show lack of hits, with -L) in such a path, but that
logic does not apply to "git grep", i.e. searching in the working
tree files.  But we did so by mistake, which has been corrected.

* nd/ita-cleanup:
  grep: fix grepping for "intent to add" files
  t7810-grep.sh: fix a whitespace inconsistency
  t7810-grep.sh: fix duplicated test name
2016-07-28 11:25:51 -07:00
Junio C Hamano
4966b58f3e Merge branch 'js/find-commit-subject-ignore-leading-blanks' into maint
A helper function that takes the contents of a commit object and
finds its subject line did not ignore leading blank lines, as is
commonly done by other codepaths.  Make it ignore leading blank
lines to match.

* js/find-commit-subject-ignore-leading-blanks:
  reset --hard: skip blank lines when reporting the commit subject
  sequencer: use skip_blank_lines() to find the commit subject
  commit -C: skip blank lines at the beginning of the message
  commit.c: make find_commit_subject() more robust
  pretty: make the skip_blank_lines() function public
2016-07-28 11:25:50 -07:00
Junio C Hamano
ad2d777604 Merge branch 'nd/pack-ofs-4gb-limit'
"git pack-objects" and "git index-pack" mostly operate with off_t
when talking about the offset of objects in a packfile, but there
were a handful of places that used "unsigned long" to hold that
value, leading to an unintended truncation.

* nd/pack-ofs-4gb-limit:
  fsck: use streaming interface for large blobs in pack
  pack-objects: do not truncate result in-pack object size on 32-bit systems
  index-pack: correct "offset" type in unpack_entry_data()
  index-pack: report correct bad object offsets even if they are large
  index-pack: correct "len" type in unpack_data()
  sha1_file.c: use type off_t* for object_info->disk_sizep
  pack-objects: pass length to check_pack_crc() without truncation
2016-07-28 10:34:42 -07:00
Junio C Hamano
2c608e0f7c Merge branch 'nd/worktree-lock'
"git worktree prune" protected worktrees that are marked as
"locked" by creating a file in a known location.  "git worktree"
command learned a dedicated command pair to create and remove such
a file, so that the users do not have to do this with editor.

* nd/worktree-lock:
  worktree.c: find_worktree() search by path suffix
  worktree: add "unlock" command
  worktree: add "lock" command
  worktree.c: add is_worktree_locked()
  worktree.c: add is_main_worktree()
  worktree.c: add find_worktree()
2016-07-28 10:34:42 -07:00
Vasco Almeida
996ee6d27a i18n: notes: mark comment for translation
Mark comment displayed when editing a note for translation.

Signed-off-by: Vasco Almeida <vascomalmeida@sapo.pt>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-07-28 09:09:18 -07:00
Jeff King
642833db78 date: add "unix" format
We already have "--date=raw", which is a Unix epoch
timestamp plus a contextual timezone (either the author's or
the local). But one may not care about the timezone and just
want the epoch timestamp by itself. It's not hard to parse
the two apart, but if you are using a pretty-print format,
you may want git to show the "finished" form that the user
will see.

We can accomodate this by adding a new date format, "unix",
which is basically "raw" without the timezone.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-07-27 14:15:51 -07:00
Orgad Shaneh
def480fe99 commit: describe that --no-verify skips the commit-msg hook in the help text
This brings the short help in line with the documentation.

Signed-off-by: Orgad Shaneh <orgads@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-07-26 13:44:55 -07:00
Johannes Schindelin
3f338f43b0 am -3: use merge_recursive() directly again
Last October, we had to change this code to run `git merge-recursive`
in a child process: git-am wants to print some helpful advice when the
merge failed, but the code in question was not prepared to return, it
die()d instead.

We are finally at a point when the code *is* prepared to return errors,
and can avoid the child process again.

This reverts commit c63d4b2 (am -3: do not let failed merge from
completing the error codepath, 2015-10-09), with the necessary changes
to adjust for the fact that Git's source code changed in the meantime
(such as: using OIDs instead of hashes in the recursive merge, and a
removed gender bias).

Note: the code now calls merge_recursive_generic() again. Unlike
merge_trees() and merge_recursive(), this function returns 0 upon success,
as most of Git's functions. Therefore, the error value -1 naturally is
handled correctly, and we do not have to take care of it specifically.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-07-26 11:13:44 -07:00
Johannes Schindelin
f241ff0d0a prepare the builtins for a libified merge_recursive()
Previously, callers of merge_trees() or merge_recursive() expected that
code to die() with an error message. This used to be okay because we
called those commands from scripts, and had a chance to print out a
message in case the command failed fatally (read: with exit code 128).

As scripting incurs its own set of problems (portability, speed,
idiosyncrasies of different shells, limited data structures leading to
inefficient code), we are converting more and more of these scripts into
builtins, using library functions directly.

We already tried to use merge_recursive() directly in the builtin
git-am, for example. Unfortunately, we had to roll it back temporarily
because some of the code in merge-recursive.c still deemed it okay to
call die(), when the builtin am code really wanted to print out a useful
advice after the merge failed fatally. In the next commits, we want to
fix that.

The code touched by this commit expected merge_trees() to die() with
some useful message when there is an error condition, but merge_trees()
is going to be improved by converting all die() calls to return error()
instead (i.e. return value -1 after printing out the message as before),
so that the caller can react more flexibly.

This is a step to prepare for the version of merge_trees() that no
longer dies,  even if we just imitate the previous behavior by calling
exit(128): this is what callers of e.g. `git merge` have come to expect.

Note that the callers of the sequencer (revert and cherry-pick) already
fail fast even for the return value -1; The only difference is that they
now get a chance to say "<command> failed".

A caller of merge_trees() might want handle error messages themselves
(or even suppress them). As this patch is already complex enough, we
leave that change for a later patch.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-07-26 11:13:44 -07:00
Johannes Schindelin
ef1177d18e die("bug"): report bugs consistently
The vast majority of error messages in Git's source code which report a
bug use the convention to prefix the message with "BUG:".

As part of cleaning up merge-recursive to stop die()ing except in case of
detected bugs, let's just make the remainder of the bug reports consistent
with the de facto rule.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-07-26 11:13:44 -07:00
Junio C Hamano
37e9c7f5e1 Merge branch 'mh/blame-worktree'
"git blame file" allowed the lineage of lines in the uncommitted,
unadded contents of "file" to be inspected, but it refused when
"file" did not appear in the current commit.  When "file" was
created by renaming an existing file (but the change has not been
committed), this restriction was unnecessarily tight.

* mh/blame-worktree:
  t/t8003-blame-corner-cases.sh: Use here documents
  blame: allow to blame paths freshly added to the index
2016-07-25 14:13:45 -07:00
Junio C Hamano
9db3979784 Merge branch 'js/fsck-name-object'
When "git fsck" reports a broken link (e.g. a tree object contains
a blob that does not exist), both containing object and the object
that is referred to were reported with their 40-hex object names.
The command learned the "--name-objects" option to show the path to
the containing object from existing refs (e.g. "HEAD~24^2:file.txt").

* js/fsck-name-object:
  fsck: optionally show more helpful info for broken links
  fsck: give the error function a chance to see the fsck_options
  fsck_walk(): optionally name objects on the go
  fsck: refactor how to describe objects
2016-07-25 14:13:44 -07:00
Junio C Hamano
03f25e85d9 Merge branch 'rs/rm-strbuf-optim'
The use of strbuf in "git rm" to build filename to remove was a bit
suboptimal, which has been fixed.

* rs/rm-strbuf-optim:
  rm: reuse strbuf for all remove_dir_recursively() calls
2016-07-25 14:13:36 -07:00
Junio C Hamano
87492cb24d Merge branch 'mh/ref-iterators'
The API to iterate over all the refs (i.e. for_each_ref(), etc.)
has been revamped.

* mh/ref-iterators:
  for_each_reflog(): reimplement using iterators
  dir_iterator: new API for iterating over a directory tree
  for_each_reflog(): don't abort for bad references
  do_for_each_ref(): reimplement using reference iteration
  refs: introduce an iterator interface
  ref_resolves_to_object(): new function
  entry_resolves_to_object(): rename function from ref_resolves_to_object()
  get_ref_cache(): only create an instance if there is a submodule
  remote rm: handle symbolic refs correctly
  delete_refs(): add a flags argument
  refs: use name "prefix" consistently
  do_for_each_ref(): move docstring to the header file
  refs: remove unnecessary "extern" keywords
2016-07-25 14:13:33 -07:00
Junio C Hamano
6b34ce90a7 Merge branch 'mh/split-under-lock'
Further preparatory work on the refs API before the pluggable
backend series can land.

* mh/split-under-lock: (33 commits)
  lock_ref_sha1_basic(): only handle REF_NODEREF mode
  commit_ref_update(): remove the flags parameter
  lock_ref_for_update(): don't resolve symrefs
  lock_ref_for_update(): don't re-read non-symbolic references
  refs: resolve symbolic refs first
  ref_transaction_update(): check refname_is_safe() at a minimum
  unlock_ref(): move definition higher in the file
  lock_ref_for_update(): new function
  add_update(): initialize the whole ref_update
  verify_refname_available(): adjust constness in declaration
  refs: don't dereference on rename
  refs: allow log-only updates
  delete_branches(): use resolve_refdup()
  ref_transaction_commit(): correctly report close_ref() failure
  ref_transaction_create(): disallow recursive pruning
  refs: make error messages more consistent
  lock_ref_sha1_basic(): remove unneeded local variable
  read_raw_ref(): move docstring to header file
  read_raw_ref(): improve docstring
  read_raw_ref(): rename symref argument to referent
  ...
2016-07-25 14:13:32 -07:00
Johannes Sixt
eb09121b74 submodule-helper: fix indexing in clone retry error reporting path
'git submodule--helper update-clone' has logic to retry failed clones
a second time. For this purpose, there is a list of submodules to clone,
and a second list that is filled with the submodules to retry. Within
these lists, the submodules are identified by an index as if both lists
were just appended.

This works nicely except when the second clone attempt fails as well. To
report an error, the identifying index must be adjusted by an offset so
that it can be used as an index into the second list. However, the
calculation uses the logical total length of the lists so that the result
always points one past the end of the second list.

Pick the correct index.

Signed-off-by: Johannes Sixt <j6t@kdbg.org>
Acked-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-07-22 13:43:53 -07:00
Jeff King
83558686ce receive-pack: send keepalives during quiet periods
After a client has sent us the complete pack, we may spend
some time processing the data and running hooks. If the
client asked us to be quiet, receive-pack won't send any
progress data during the index-pack or connectivity-check
steps. And hooks may or may not produce their own progress
output. In these cases, the network connection is totally
silent from both ends.

Git itself doesn't care about this (it will wait forever),
but other parts of the system (e.g., firewalls,
load-balancers, etc) might hang up the connection. So we'd
like to send some sort of keepalive to let the network and
the client side know that we're still alive and processing.

We can use the same trick we did in 05e9515 (upload-pack:
send keepalive packets during pack computation, 2013-09-08).
Namely, we will send an empty sideband data packet every `N`
seconds that we do not relay any stderr data over the
sideband channel. As with 05e9515, this means that we won't
bother sending keepalives when there's actual progress data,
but will kick in when it has been disabled (or if there is a
lull in the progress data).

The concept is simple, but the details are subtle enough
that they need discussing here.

Before the client sends us the pack, we don't want to do any
keepalives. We'll have sent our ref advertisement, and we're
waiting for them to send us the pack (and tell us that they
support sidebands at all).

While we're receiving the pack from the client (or waiting
for it to start), there's no need for keepalives; it's up to
them to keep the connection active by sending data.
Moreover, it would be wrong for us to do so. When we are the
server in the smart-http protocol, we must treat our
connection as half-duplex. So any keepalives we send while
receiving the pack would potentially be buffered by the
webserver. Not only does this make them useless (since they
would not be delivered in a timely manner), but it could
actually cause a deadlock if we fill up the buffer with
keepalives. (It wouldn't be wrong to send keepalives in this
phase for a full-duplex connection like ssh; it's simply
pointless, as it is the client's responsibility to speak).

As soon as we've gotten all of the pack data, then the
client is waiting for us to speak, and we should start
keepalives immediately. From here until the end of the
connection, we send one any time we are not otherwise
sending data.

But there's a catch. Receive-pack doesn't know the moment
we've gotten all the data. It passes the descriptor to
index-pack, who reads all of the data, and then starts
resolving the deltas. We have to communicate that back.

To make this work, we instruct the sideband muxer to enable
keepalives in three phases:

  1. In the beginning, not at all.

  2. While reading from index-pack, wait for a signal
     indicating end-of-input, and then start them.

  3. Afterwards, always.

The signal from index-pack in phase 2 has to come over the
stderr channel which the muxer is reading. We can't use an
extra pipe because the portable run-command interface only
gives us stderr and stdout.

Stdout is already used to pass the .keep filename back to
receive-pack. We could also send a signal there, but then we
would find out about it in the main thread. And the
keepalive needs to be done by the async muxer thread (since
it's the one writing sideband data back to the client). And
we can't reliably signal the async thread from the main
thread, because the async code sometimes uses threads and
sometimes uses forked processes.

Therefore the signal must come over the stderr channel,
where it may be interspersed with other random
human-readable messages from index-pack. This patch makes
the signal a single NUL byte.  This is easy to parse, should
not appear in any normal stderr output, and we don't have to
worry about any timing issues (like seeing half the signal
bytes in one read(), and half in a subsequent one).

This is a bit ugly, but it's simple to code and should work
reliably.

Another option would be to stop using an async thread for
muxing entirely, and just poll() both stderr and stdout of
index-pack from the main thread. This would work for
index-pack (because we aren't doing anything useful in the
main thread while it runs anyway). But it would make the
connectivity check and the hook muxers much more
complicated, as they need to simultaneously feed the
sub-programs while reading their stderr.

The index-pack phase is the only one that needs this
signaling, so it could simply behave differently than the
other two. That would mean having two separate
implementations of copy_to_sideband (and the keepalive
code), though. And it still doesn't get rid of the
signaling; it just means we can write a nicer message like
"END_OF_INPUT" or something on stdout, since we don't have
to worry about separating it from the stderr cruft.

One final note: this signaling trick is only done with
index-pack, not with unpack-objects. There's no point in
doing it for the latter, because by definition it only kicks
in for a small number of objects, where keepalives are not
as useful (and this conveniently lets us avoid duplicating
the implementation).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-07-20 12:11:11 -07:00
Jeff King
6b4cd2f827 receive-pack: turn on connectivity progress
When we receive a large push, the server side of the
connection may spend a lot of time (30s or more for a full
push of linux.git) walking the object graph without
producing any output. Let's give the user some indication
that we're actually working.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-07-20 12:11:11 -07:00
Jeff King
d415092ac4 receive-pack: relay connectivity errors to sideband
If the connectivity check encounters a problem when
receiving a push, the error output goes to receive-pack's
stderr, whose destination depends on the protocol used
(ssh tends to send it to the user, though without a "remote"
prefix; http will generally eat it in the server's error
log).

The information should consistently go back to the user, as
there is a reasonable chance their client is buggy and
generating a bad pack.

We can do so by muxing it over the sideband as we do with
other sub-process stderr.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-07-20 12:11:10 -07:00
Jeff King
d06303bb9a receive-pack: turn on index-pack resolving progress
When we receive a large push, the server side may have to
spend a lot of CPU processing the incoming packfile.

During the "receiving" phase, we are typically network
bound, and the client is writing its own progress to the
user. But during the delta resolution phase, we may spend
minutes (e.g., for a full push of linux.git) without
making any indication to the user that the connection has
not hung.

Let's ask index-pack to produce progress output for this
phase (unless the client asked us to be quiet, of course).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-07-20 12:11:10 -07:00
Jeff King
e376f17fd1 index-pack: add flag for showing delta-resolution progress
The index-pack command has two progress meters: one for
"receiving objects", and one for "resolving deltas". You get
neither by default, or both with "-v".

But for a push through receive-pack, we would want only the
"resolving deltas" phase, _not_ the "receiving objects"
progress. There are two reasons for this.

One is simply that existing clients are already printing
"writing objects" progress at the same time.  Arguably
"receiving" from the far end is more useful, because it
tells you what has actually gotten there, as opposed to what
might be stuck in a buffer somewhere between the client and
server. But that would require a protocol extension to tell
clients not to print their progress. Possible, but
complexity for little gain.

The second reason is much more important. In a full-duplex
connection like git-over-ssh, we can print progress while
the pack is incoming, and it will immediately get to the
client. But for a half-duplex connection like git-over-http,
we should not say anything until we have received the full
request.  Anything we write is subject to being stuck in a
buffer by the webserver.  Worse, we can end up in a deadlock
if that buffer fills up.

So our best bet is to avoid writing anything that isn't a
small fixed size until we've received the full pack.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-07-20 12:11:10 -07:00
Jeff King
38e590ea12 clone: use a real progress meter for connectivity check
Because the initial connectivity check for a cloned
repository can be slow, 0781aa4 (clone: let the user know
when check_everything_connected is run, 2013-05-03) added a
"fake" progress meter; we simply say "Checking connectivity"
when it starts, and "done" at the end, with nothing between.

Since check_connected() now knows how to do a real progress
meter, we can drop our fake one and use that one instead.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-07-20 12:11:09 -07:00
Jeff King
7043c7071c check_everything_connected: use a struct with named options
The number of variants of check_everything_connected has
grown over the years, so that the "real" function takes
several possibly-zero, possibly-NULL arguments. We hid the
complexity behind some wrapper functions, but this doesn't
scale well when we want to add new options.

If we add more wrapper variants to handle the new options,
then we can get a combinatorial explosion when those options
might be used together (right now nobody wants to use both
"shallow" and "transport" together, so we get by with just a
few wrappers).

If instead we add new parameters to each function, each of
which can have a default value, then callers who want the
defaults end up with confusing invocations like:

  check_everything_connected(fn, 0, data, -1, 0, NULL);

where it is unclear which parameter is which (and every
caller needs updated when we add new options).

Instead, let's add a struct to hold all of the optional
parameters. This is a little more verbose for the callers
(who have to declare the struct and fill it in), but it
makes their code much easier to follow, because every option
is named as it is set (and unused options do not have to be
mentioned at all).

Note that we could also stick the iteration function and its
callback data into the option struct, too. But since those
are required for each call, by avoiding doing so, we can let
very simple callers just pass "NULL" for the options and not
worry about the struct at all.

While we're touching each site, let's also rename the
function to check_connected(). The existing name was quite
long, and not all of the wrappers even used the full name.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-07-20 12:10:53 -07:00
Jeff King
434ea3cdad rev-list: add optional progress reporting
It's easy to ask rev-list to do a traversal that may takes
many seconds (e.g., by calling "--objects --all"). In theory
you can monitor its progress by the output you get to
stdout, but this isn't always easy.

Some operations, like "--count", don't make any output until
the end.

And some callers, like check_everything_connected(), are
using it just for the error-checking of the traversal, and
throw away stdout entirely.

This patch adds a "--progress" option which can be used to
give some eye-candy for a user waiting for a long traversal.
This is just a rev-list option and not a regular traversal
option, because it needs cooperation from the callbacks in
builtin/rev-list.c to do the actual count.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-07-20 12:10:44 -07:00
Junio C Hamano
3d55eea805 Merge branch 'js/am-call-theirs-theirs-in-fallback-3way'
One part of "git am" had an oddball helper function that called
stuff from outside "his" as opposed to calling what we have "ours",
which was not gender-neutral and also inconsistent with the rest of
the system where outside stuff is usuall called "theirs" in
contrast to "ours".

* js/am-call-theirs-theirs-in-fallback-3way:
  am: counteract gender bias
2016-07-19 13:22:23 -07:00
Junio C Hamano
2b6456b808 Merge branch 'jk/write-file'
General code clean-up around a helper function to write a
single-liner to a file.

* jk/write-file:
  branch: use write_file_buf instead of write_file
  use write_file_buf where applicable
  write_file: add format attribute
  write_file: add pointer+len variant
  write_file: use xopen
  write_file: drop "gently" form
  branch: use non-gentle write_file for branch description
  am: ignore return value of write_file()
  config: fix bogus fd check when setting up default config
2016-07-19 13:22:23 -07:00