Commit Graph

8404 Commits

Author SHA1 Message Date
Jeff King
55cb10f9b5 rev-list: make --count work with --objects
The current behavior from "rev-list --count --objects" is nonsensical:
we enumerate all of the objects except commits, but then give a count of
commits. This wasn't planned, and is just what the code happens to do.

Instead, let's give the answer the user almost certainly wanted: the
full count of objects.

Note that there are more complicated cases around cherry-marking, etc.
We'll punt on those for now, but let the user know that we can't produce
an answer (rather than giving them something useless).

We'll test both the new feature as well as a vanilla --count of commits,
since that surprisingly doesn't seem to be covered in the existing
tests.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-02-14 10:46:22 -08:00
Jeff King
792f811998 rev-list: factor out bitmap-optimized routines
There are a few operations in rev-list that are optimized for bitmaps.
Rather than having the code inline in cmd_rev_list(), let's move them
into helpers. This not only makes the flow of the main function simpler,
but it lets us replace the complex "can we do the optimization?"
conditionals with a series of early returns from the functions. That
also makes it easy to add comments explaining those conditions.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-02-14 10:46:22 -08:00
Jeff King
d90fe06ea7 pack-bitmap: refuse to do a bitmap traversal with pathspecs
rev-list has refused to use bitmaps with pathspec limiting since
c8a70d3509 (rev-list: disable --use-bitmap-index when pruning commits,
2015-07-01). But this is true not just for rev-list, but for anyone who
calls prepare_bitmap_walk(); the code isn't equipped to handle this
case.  We never noticed because the only other callers would never pass
a pathspec limiter.

But let's push the check down into prepare_bitmap_walk() anyway. That's
a more logical place for it to live, as callers shouldn't need to know
the details (and must be prepared to fall back to a regular traversal
anyway, since there might not be bitmaps in the repository).

It would also prepare us for a day where this case _is_ handled, but
that's pretty unlikely. E.g., we could use bitmaps to generate the set
of commits, and then diff each commit to see if it matches the pathspec.
That would be slightly faster than a naive traversal that actually walks
the commits. But you'd probably do better still to make use of the newer
commit-graph feature to make walking the commits very cheap.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-02-14 10:46:22 -08:00
Jeff King
e03f928e2a rev-list: fallback to non-bitmap traversal when filtering
The "--use-bitmap-index" option is usually aspirational: if we have
bitmaps and the request can be fulfilled more quickly using them we'll
do so, but otherwise fall back to a non-bitmap traversal.

The exception is object filtering, which explicitly dies if the two
options are combined. Let's convert this to the usual fallback behavior.
This is a minor convenience for now (since the caller can easily know
that --filter and --use-bitmap-index don't combine), but will become
much more useful as we start to support _some_ filters with bitmaps, but
not others.

The test infrastructure here is bigger than necessary for checking this
one small feature. But it will serve as the basis for more filtering
bitmap tests in future patches.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-02-13 09:08:58 -08:00
Junio C Hamano
44cba9c4b3 Merge branch 'jc/skip-prefix'
Code simplification.

* jc/skip-prefix:
  C: use skip_prefix() to avoid hardcoded string length
2020-02-12 12:41:37 -08:00
Junio C Hamano
556ccd4dd2 Merge branch 'pb/do-not-recurse-grep-no-index'
"git grep --no-index" should not get affected by the contents of
the .gitmodules file but when "--recurse-submodules" is given or
the "submodule.recurse" variable is set, it did.  Now these
settings are ignored in the "--no-index" mode.

* pb/do-not-recurse-grep-no-index:
  grep: ignore --recurse-submodules if --no-index is given
2020-02-12 12:41:36 -08:00
Junio C Hamano
f08132f889 rebase: --fork-point regression fix
"git rebase --fork-point master" used to work OK, as it internally
called "git merge-base --fork-point" that knew how to handle short
refname and dwim it to the full refname before calling the
underlying get_fork_point() function.

This is no longer true after the command was rewritten in C, as its
internall call made directly to get_fork_point() does not dwim a
short ref.

Move the "dwim the refname argument to the full refname" logic that
is used in "git merge-base" to the underlying get_fork_point()
function, so that the other caller of the function in the
implementation of "git rebase" behaves the same way to fix this
regression.

Signed-off-by: Alex Torok <alext9@gmail.com>
[jc: revamped the fix and used Alex's tests]
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-02-11 09:59:39 -08:00
Derrick Stolee
ef07659926 sparse-checkout: work with Windows paths
When using Windows, a user may run 'git sparse-checkout set A\B\C'
to add the Unix-style path A/B/C to their sparse-checkout patterns.
Normalizing the input path converts the backslashes to slashes before we
add the string 'A/B/C' to the recursive hashset.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-02-11 09:06:47 -08:00
Derrick Stolee
2631dc879d sparse-checkout: create 'add' subcommand
When using the sparse-checkout feature, a user may want to incrementally
grow their sparse-checkout pattern set. Allow adding patterns using a
new 'add' subcommand. This is not much different from the 'set'
subcommand, because we still want to allow the '--stdin' option and
interpret inputs as directories when in cone mode and patterns
otherwise.

When in cone mode, we are growing the cone. This may actually reduce the
set of patterns when adding directory A when A/B is already a directory
in the cone. Test the different cases: siblings, parents, ancestors.

When not in cone mode, we can only assume the patterns should be
appended to the sparse-checkout file.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-02-11 09:06:46 -08:00
Derrick Stolee
4bf0c06c71 sparse-checkout: extract pattern update from 'set' subcommand
In anticipation of adding "add" and "remove" subcommands to the
sparse-checkout builtin, extract a modify_pattern_list() method from the
sparse_checkout_set() method. This command will read input from the
command-line or stdin to construct a set of patterns, then modify the
existing sparse-checkout patterns after a successful update of the
working directory.

Currently, the only way to modify the patterns is to replace all of the
patterns. This will be extended in a later update.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-02-11 08:47:13 -08:00
Derrick Stolee
6fb705abcb sparse-checkout: extract add_patterns_from_input()
In anticipation of extending the sparse-checkout builtin with "add"
and "remove" subcommands, extract the code that fills a pattern list
based on the input values. The input changes depending on the
presence of "--stdin" or the value of core.sparseCheckoutCone.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-02-11 08:47:11 -08:00
Bert Wesarg
b3fd6cbf29 remote rename/remove: gently handle remote.pushDefault config
When renaming a remote with

    git remote rename X Y
    git remote remove X

Git already renames or removes any branch.<name>.remote and
branch.<name>.pushRemote configurations if their value is X.

However remote.pushDefault needs a more gentle approach, as this may be
set in a non-repo configuration file. In such a case only a warning is
printed, such as:

warning: The global configuration remote.pushDefault in:
	$HOME/.gitconfig:35
now names the non-existent remote origin

It is changed to remote.pushDefault = Y or removed when set in a repo
configuration though.

Signed-off-by: Bert Wesarg <bert.wesarg@googlemail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-02-10 10:52:10 -08:00
Bert Wesarg
923d4a5ca4 remote rename/remove: handle branch.<name>.pushRemote config values
When renaming or removing a remote with

    git remote rename X Y
    git remote remove X

Git already renames/removes any config values from

    branch.<name>.remote = X

to

    branch.<name>.remote = Y

As branch.<name>.pushRemote also names a remote, it now also renames
or removes these config values from

    branch.<name>.pushRemote = X

to

    branch.<name>.pushRemote = Y

Signed-off-by: Bert Wesarg <bert.wesarg@googlemail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-02-10 10:52:10 -08:00
Bert Wesarg
ceff1a1308 remote: clean-up config callback
Some minor clean-ups in function `config_read_branches`:

 * remove hardcoded length in `key += 7`
 * call `xmemdupz` only once
 * use a switch to handle the configuration type and add a `BUG()`

Suggested-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Bert Wesarg <bert.wesarg@googlemail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-02-10 10:52:10 -08:00
Bert Wesarg
1a83068c26 remote: clean-up by returning early to avoid one indentation
Signed-off-by: Bert Wesarg <bert.wesarg@googlemail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-02-10 10:52:10 -08:00
Bert Wesarg
88f8576eda pull --rebase/remote rename: document and honor single-letter abbreviations rebase types
When 46af44b07d (pull --rebase=<type>: allow single-letter abbreviations
for the type, 2018-08-04) landed in Git, it had the side effect that
not only 'pull --rebase=<type>' accepted the single-letter abbreviations
but also the 'pull.rebase' and 'branch.<name>.rebase' configurations.

However, 'git remote rename' did not honor these single-letter
abbreviations when reading the 'branch.*.rebase' configurations.

We now document the single-letter abbreviations and both code places
share a common function to parse the values of 'git pull --rebase=*',
'pull.rebase', and 'branches.*.rebase'.

The only functional change is the handling of the `branch_info::rebase`
value. Before it was an unsigned enum, thus the truth value could be
checked with `branch_info::rebase != 0`. But `enum rebase_type` is
signed, thus the truth value must now be checked with
`branch_info::rebase >= REBASE_TRUE`

Signed-off-by: Bert Wesarg <bert.wesarg@googlemail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-02-10 10:52:10 -08:00
Matthew Rogers
145d59f482 config: add '--show-scope' to print the scope of a config value
When a user queries config values with --show-origin, often it's
difficult to determine what the actual "scope" (local, global, etc.) of
a given value is based on just the origin file.

Teach 'git config' the '--show-scope' option to print the scope of all
displayed config values.  Note that we should never see anything of
"submodule" scope as that is only ever used by submodule-config.c when
parsing the '.gitmodules' file.

Signed-off-by: Matthew Rogers <mattr94@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-02-10 10:49:12 -08:00
Matthew Rogers
e37efa40e1 config: teach git_config_source to remember its scope
There are many situations where the scope of a config command is known
beforehand, such as passing of '--local', '--file', etc. to an
invocation of git config.  However, this information is lost when moving
from builtin/config.c to /config.c.  This historically hasn't been a big
deal, but to prepare for the upcoming --show-scope option we teach
git_config_source to keep track of the source and the config machinery
to use that information to set current_parsing_scope appropriately.

Signed-off-by: Matthew Rogers <mattr94@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-02-10 10:49:10 -08:00
René Scharfe
a91cc7fad0 strbuf: add and use strbuf_insertstr()
Add a function for inserting a C string into a strbuf.  Use it
throughout the source to get rid of magic string length constants and
explicit strlen() calls.

Like strbuf_addstr(), implement it as an inline function to avoid the
implicit strlen() calls to cause runtime overhead.

Helped-by: Taylor Blau <me@ttaylorr.com>
Helped-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-02-10 09:04:45 -08:00
Heba Waly
887a0fd573 add: change advice config variables used by the add API
advice.addNothing config variable is used to control the visibility of
two advice messages in the add library. This config variable is
replaced by two new variables, whose names are more clear and relevant
to the two cases.

Also add the two new variables to the documentation.

Signed-off-by: Heba Waly <heba.waly@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-02-06 11:08:00 -08:00
Junio C Hamano
d0e70cd32e Merge branch 'am/checkout-file-and-ref-ref-ambiguity'
"git checkout X" did not correctly fail when X is not a local
branch but could name more than one remote-tracking branches
(i.e. to be dwimmed as the starting point to create a corresponding
local branch), which has been corrected.

* am/checkout-file-and-ref-ref-ambiguity:
  checkout: don't revert file on ambiguous tracking branches
  parse_branchname_arg(): extract part as new function
2020-02-05 14:34:58 -08:00
Junio C Hamano
9a5315edfd Merge branch 'js/patch-mode-in-others-in-c'
The effort to move "git-add--interactive" to C continues.

* js/patch-mode-in-others-in-c:
  commit --interactive: make it work with the built-in `add -i`
  built-in add -p: implement the "worktree" patch modes
  built-in add -p: implement the "checkout" patch modes
  built-in stash: use the built-in `git add -p` if so configured
  legacy stash -p: respect the add.interactive.usebuiltin setting
  built-in add -p: implement the "stash" and "reset" patch modes
  built-in add -p: prepare for patch modes other than "stage"
2020-02-05 14:34:58 -08:00
René Scharfe
079f970971 name-rev: sort tip names before applying
name_ref() is called for each ref and checks if its a better name for
the referenced commit.  If that's the case it remembers it and checks if
a name based on it is better for its ancestors as well.  This in done in
the the order for_each_ref() imposes on us.

That might not be optimal.  If bad names happen to be encountered first
(as defined by is_better_name()), names derived from them may spread to
a lot of commits, only to be replaced by better names later.  Setting
better names first can avoid that.

is_better_name() prefers tags, short distances and old references.  The
distance is a measure that we need to calculate for each candidate
commit, but the other two properties are not dependent on the
relationships of commits.  Sorting the refs by them should yield better
performance than the essentially random order we currently use.

And applying older references first should also help to reduce rework
due to the fact that older commits have less ancestors than newer ones.

So add all details of names to the tip table first, then sort them
to prefer tags and older references and then apply them in this order.
Here's the performance as measures by hyperfine for the Linux repo
before:

Benchmark #1: ./git -C ../linux/ name-rev --all
  Time (mean ± σ):     851.1 ms ±   4.5 ms    [User: 806.7 ms, System: 44.4 ms]
  Range (min … max):   845.9 ms … 859.5 ms    10 runs

... and with this patch:

Benchmark #1: ./git -C ../linux/ name-rev --all
  Time (mean ± σ):     736.2 ms ±   8.7 ms    [User: 688.4 ms, System: 47.5 ms]
  Range (min … max):   726.0 ms … 755.2 ms    10 runs

Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-02-05 10:36:33 -08:00
René Scharfe
2d53975488 name-rev: release unused name strings
name_rev() assigns a name to a commit and its parents and grandparents
and so on.  Commits share their name string with their first parent,
which in turn does the same, recursively to the root.  That saves a lot
of allocations.  When a better name is found, the old name is replaced,
but its memory is not released.  That leakage can become significant.

Can we release these old strings exactly once even though they are
referenced multiple times?  Yes, indeed -- we can make use of the fact
that name_rev() visits the ancestors of a commit after it set a new name
for it and tries to update their names as well.

Members of the first ancestral line have the same taggerdate and
from_tag values, but a higher distance value than their child commit at
generation 0.  These are the only criteria used by is_better_name().
Lower distance values are considered better, so a name that is better
for a child will also be better for its parent and grandparent etc.

That means we can free(3) an inferior name at generation 0 and rely on
name_rev() to replace all references in ancestors as well.

If we do that then we need to stop using the string pointer alone to
distinguish new empty rev_name slots from initialized ones, though, as
it technically becomes invalid after the free(3) call -- even though its
value is still different from NULL.

We can check the generation value first, as empty slots will have it
initialized to 0, and for the actual generation 0 we'll set a new valid
name right after the create_or_update_name() call that releases the
string.

For the Chromium repo, releasing superceded names reduces the memory
footprint of name-rev --all significantly.  Here's the output of GNU
time before:

0.98user 0.48system 0:01.46elapsed 99%CPU (0avgtext+0avgdata 2601812maxresident)k
0inputs+0outputs (0major+571470minor)pagefaults 0swaps

... and with this patch:

1.01user 0.26system 0:01.28elapsed 100%CPU (0avgtext+0avgdata 1559196maxresident)k
0inputs+0outputs (0major+314370minor)pagefaults 0swaps

It also gets faster; hyperfine before:

Benchmark #1: ./git -C ../chromium/src name-rev --all
  Time (mean ± σ):      1.534 s ±  0.006 s    [User: 1.039 s, System: 0.494 s]
  Range (min … max):    1.522 s …  1.542 s    10 runs

... and with this patch:

Benchmark #1: ./git -C ../chromium/src name-rev --all
  Time (mean ± σ):      1.338 s ±  0.006 s    [User: 1.047 s, System: 0.291 s]
  Range (min … max):    1.327 s …  1.346 s    10 runs

For the Linux repo it doesn't pay off; memory usage only gets down from:

0.76user 0.03system 0:00.80elapsed 99%CPU (0avgtext+0avgdata 292848maxresident)k
0inputs+0outputs (0major+44579minor)pagefaults 0swaps

... to:

0.78user 0.03system 0:00.81elapsed 100%CPU (0avgtext+0avgdata 284696maxresident)k
0inputs+0outputs (0major+44892minor)pagefaults 0swaps

The runtime actually increases slightly from:

Benchmark #1: ./git -C ../linux/ name-rev --all
  Time (mean ± σ):     828.8 ms ±   5.0 ms    [User: 797.2 ms, System: 31.6 ms]
  Range (min … max):   824.1 ms … 838.9 ms    10 runs

... to:

Benchmark #1: ./git -C ../linux/ name-rev --all
  Time (mean ± σ):     847.6 ms ±   3.4 ms    [User: 807.9 ms, System: 39.6 ms]
  Range (min … max):   843.4 ms … 854.3 ms    10 runs

Why is that?  In the Chromium repo, ca. 44000 free(3) calls in
create_or_update_name() release almost 1GB, while in the Linux repo
240000+ calls release a bit more than 5MB, so the average discarded
name is ca.  1000x longer in the latter.

Overall I think it's the right tradeoff to make, as it helps curb the
memory usage in repositories with big discarded names, and the added
overhead is small.

Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-02-05 10:24:15 -08:00
René Scharfe
977dc1912b name-rev: generate name strings only if they are better
Leave setting the tip_name member of struct rev_name to callers of
create_or_update_name().  This avoids allocations for names that are
rejected by that function.  Here's how this affects the runtime when
working with a fresh clone of Git's own repository; performance numbers
by hyperfine before:

Benchmark #1: ./git -C ../git-pristine/ name-rev --all
  Time (mean ± σ):     437.8 ms ±   4.0 ms    [User: 422.5 ms, System: 15.2 ms]
  Range (min … max):   432.8 ms … 446.3 ms    10 runs

... and with this patch:

Benchmark #1: ./git -C ../git-pristine/ name-rev --all
  Time (mean ± σ):     408.5 ms ±   1.4 ms    [User: 387.2 ms, System: 21.2 ms]
  Range (min … max):   407.1 ms … 411.7 ms    10 runs

Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-02-05 10:24:15 -08:00
René Scharfe
1c56fc2084 name-rev: pre-size buffer in get_parent_name()
We can calculate the size of new name easily and precisely. Open-code
the xstrfmt() calls and grow the buffers as needed before filling them.
This provides a surprisingly large benefit when working with the
Chromium repository; here are the numbers measured using hyperfine
before:

Benchmark #1: ./git -C ../chromium/src name-rev --all
  Time (mean ± σ):      5.822 s ±  0.013 s    [User: 5.304 s, System: 0.516 s]
  Range (min … max):    5.803 s …  5.837 s    10 runs

... and with this patch:

Benchmark #1: ./git -C ../chromium/src name-rev --all
  Time (mean ± σ):      1.527 s ±  0.003 s    [User: 1.015 s, System: 0.511 s]
  Range (min … max):    1.524 s …  1.535 s    10 runs

Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-02-05 10:24:15 -08:00
René Scharfe
ddc42ec786 name-rev: factor out get_parent_name()
Reduce nesting by moving code to come up with a name for the parent into
its own function.

Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-02-05 10:24:15 -08:00
René Scharfe
f13ca7cef5 name-rev: put struct rev_name into commit slab
The commit slab commit_rev_name contains a pointer to a struct rev_name,
and the actual struct is allocated separatly.  Avoid that allocation and
pointer indirection by storing the full struct in the commit slab.  Use
the tip_name member pointer to determine if the returned struct is
initialized.

Performance in the Linux repository measured with hyperfine before:

Benchmark #1: ./git -C ../linux/ name-rev --all
  Time (mean ± σ):     953.5 ms ±   6.3 ms    [User: 901.2 ms, System: 52.1 ms]
  Range (min … max):   945.2 ms … 968.5 ms    10 runs

... and with this patch:

Benchmark #1: ./git -C ../linux/ name-rev --all
  Time (mean ± σ):     851.0 ms ±   3.1 ms    [User: 807.4 ms, System: 43.6 ms]
  Range (min … max):   846.7 ms … 857.0 ms    10 runs

Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-02-05 10:24:15 -08:00
René Scharfe
d689d6d82f name-rev: don't _peek() in create_or_update_name()
Look up the commit slab slot for the commit once using
commit_rev_name_at() and populate it in case it is empty, instead of
checking for emptiness in a separate step using commit_rev_name_peek()
via get_commit_rev_name().

Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-02-05 10:24:15 -08:00
René Scharfe
15a4205d96 name-rev: don't leak path copy in name_ref()
name_ref() duplicates the path string and passes it to name_rev(), which
either puts it into a commit slab or ignores it if there is already a
better name, leaking it.  Move the duplication to name_rev() and release
the copy in the latter case.

Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-02-05 10:24:15 -08:00
René Scharfe
36d2419c9a name-rev: respect const qualifier
Keep the const qualifier of the first parameter of get_rev_name() even
when casting the object pointer to a commit pointer, and further for the
parameter of get_commit_rev_name(), as all these uses are read-only.

Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-02-05 10:24:15 -08:00
René Scharfe
71620ca86c name-rev: remove unused typedef
The type alias became unused with bf43abc6e6 (name-rev: use sizeof(*ptr)
instead of sizeof(type) in allocation, 2019-11-12); remove it.

Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-02-05 10:24:15 -08:00
Martin Ågren
3e2feb0d64 name-rev: rewrite create_or_update_name()
This code was moved straight out of name_rev(). As such, we inherited
the "goto" to jump from an if into an else-if. We also inherited the
fact that "nothing to do -- return NULL" is handled last.

Rewrite the function to first handle the "nothing to do" case. Then we
can handle the conditional allocation early before going on to populate
the struct. No need for goto-ing.

Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-02-05 10:23:42 -08:00
Jeff King
a21781011f index-pack: downgrade twice-resolved REF_DELTA to die()
When we're resolving a REF_DELTA, we compare-and-swap its type from
REF_DELTA to whatever real type the base object has, as discussed in
ab791dd138 (index-pack: fix race condition with duplicate bases,
2014-08-29). If the old type wasn't a REF_DELTA, we consider that a
BUG(). But as discussed in that commit, we might see this case whenever
we try to resolve an object twice, which may happen because we have
multiple copies of the base object.

So this isn't a bug at all, but rather a sign that the input pack is
broken. And indeed, this case is triggered already in t5309.5 and
t5309.6, which create packs with delta cycles and duplicate bases. But
we never noticed because those tests are marked expect_failure.

Those tests were added by b2ef3d9ebb (test index-pack on packs with
recoverable delta cycles, 2013-08-23), which was leaving the door open
for cases that we theoretically _could_ handle. And when we see an
already-resolved object like this, in theory we could keep going after
confirming that the previously resolved child->real_type matches
base->obj->real_type. But:

  - enforcing the "only resolve once" rule here saves us from an
    infinite loop in other parts of the code. If we keep going, then the
    delta cycle in t5309.5 causes us to loop infinitely, as
    find_ref_delta_children() doesn't realize which objects have already
    been resolved. So there would be more changes needed to make this
    case work, and in the meantime we'd be worse off.

  - any pack that triggers this is broken anyway. It either has a
    duplicate base object, or it has a cycle which causes us to bring in
    a duplicate via --fix-thin. In either case, we'd end up rejecting
    the pack in write_idx_file(), which also detects duplicates.

So the tests have little value in documenting what we _could_ be doing
(and have been neglected for 6+ years). Let's switch them to confirming
that we handle this case cleanly (and switch out the BUG() for a more
informative die() so that we do so).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-02-04 13:19:11 -08:00
Taylor Blau
a7df60cac8 commit-graph.h: use odb in 'load_commit_graph_one_fd_st'
Apply a similar treatment as in the previous patch to pass a 'struct
object_directory *' through the 'load_commit_graph_one_fd_st'
initializer, too.

This prevents a potential bug where a pointer comparison is made to a
NULL 'g->odb', which would cause the commit-graph machinery to think
that a pair of commit-graphs belonged to different alternates when in
fact they do not (i.e., in the case of no '--object-dir').

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-02-04 11:36:51 -08:00
Taylor Blau
ad2dd5bb63 commit-graph.c: remove path normalization, comparison
As of the previous patch, all calls to 'commit-graph.c' functions which
perform path normalization (for e.g., 'get_commit_graph_filename()') are
of the form 'ctx->odb->path', which is always in normalized form.

Now that there are no callers passing non-normalized paths to these
functions, ensure that future callers are bound by the same restrictions
by making these functions take a 'struct object_directory *' instead of
a 'const char *'. To match, replace all calls with arguments of the form
'ctx->odb->path' with 'ctx->odb' To recover the path, functions that
perform path manipulation simply use 'odb->path'.

Further, avoid string comparisons with arguments of the form
'odb->path', and instead prefer raw pointer comparisons, which
accomplish the same effect, but are far less brittle.

This has a pleasant side-effect of making these functions much more
robust to paths that cannot be normalized by 'normalize_path_copy()',
i.e., because they are outside of the current working directory.

For example, prior to this patch, Valgrind reports that the following
uninitialized memory read [1]:

  $ ( cd t && GIT_DIR=../.git valgrind git rev-parse HEAD^ )

because 'normalize_path_copy()' can't normalize '../.git' (since it's
relative to but above of the current working directory) [2].

By using a 'struct object_directory *' directly,
'get_commit_graph_filename()' does not need to normalize, because all
paths are relative to the current working directory since they are
always read from the '->path' of an object directory.

[1]: https://lore.kernel.org/git/20191027042116.GA5801@sigill.intra.peff.net.
[2]: The bug here is that 'get_commit_graph_filename()' returns the
     result of 'normalize_path_copy()' without checking the return
     value.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-02-04 11:36:51 -08:00
Taylor Blau
13c2499249 commit-graph.h: store object directory in 'struct commit_graph'
In a previous patch, the 'char *object_dir' in 'struct commit_graph' was
replaced with a 'struct object_directory'. This patch applies the same
treatment to 'struct commit_graph', which is another intermediate step
towards getting rid of all path normalization in 'commit-graph.c'.

Instead of taking a 'char *object_dir', functions that construct a
'struct commit_graph' now take a 'struct object_directory *'. Any code
that needs an object directory path use '->path' instead.

This ensures that all calls to functions that perform path normalization
are given arguments which do not themselves require normalization. This
prepares those functions to drop their normalization entirely, which
will occur in the subsequent patch.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-02-04 11:36:51 -08:00
Taylor Blau
0bd52e27e3 commit-graph.h: store an odb in 'struct write_commit_graph_context'
There are lots of places in 'commit-graph.h' where a function either has
(or almost has) a full 'struct object_directory *', accesses '->path',
and then throws away the rest of the struct.

This can cause headaches when comparing the locations of object
directories across alternates (e.g., in the case of deciding if two
commit-graph layers can be merged). These paths are normalized with
'normalize_path_copy()' which mitigates some comparison issues, but not
all [1].

Replace usage of 'char *object_dir' with 'odb->path' by storing a
'struct object_directory *' in the 'write_commit_graph_context'
structure. This is an intermediate step towards getting rid of all path
normalization in 'commit-graph.c'.

Resolving a user-provided '--object-dir' argument now requires that we
compare it to the known alternates for equality.  Prior to this patch,
an unknown '--object-dir' argument would silently exit with status zero.

This can clearly lead to unintended behavior, such as verifying
commit-graphs that aren't in a repository's own object store (or one of
its alternates), or causing a typo to mask a legitimate commit-graph
verification failure. Make this error non-silent by 'die()'-ing when the
given '--object-dir' does not match any known alternate object store.

[1]: In my testing, for example, I can get one side of the commit-graph
code to fill object_dir with "./objects" and the other with just
"objects".

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-02-04 11:36:37 -08:00
Derrick Stolee
e53ffe2704 sparse-checkout: escape all glob characters on write
The sparse-checkout patterns allow special globs according to
fnmatch(3). When writing cone-mode patterns for paths containing
these characters, they must be escaped.

Use is_glob_special() to check which characters must be escaped
this way, and add a path to the tests that contains all glob
characters at once. Note that ']' is not special, since the
initial bracket '[' is escaped.

Reported-by: Jeff King <peff@peff.net>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-01-31 13:05:29 -08:00
Derrick Stolee
e55682ea26 sparse-checkout: use C-style quotes in 'list' subcommand
When in cone mode, the 'git sparse-checkout list' subcommand lists
the directories included in the sparse cone. When these directories
contain odd characters, such as a backslash, then we need to use
C-style quotes similar to 'git ls-tree'.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-01-31 13:05:29 -08:00
Derrick Stolee
bd64de42de sparse-checkout: unquote C-style strings over --stdin
If a user somehow creates a directory with an asterisk (*) or backslash
(\), then the "git sparse-checkout set" command will struggle to provide
the correct pattern in the sparse-checkout file. When not in cone mode,
the provided pattern is written directly into the sparse-checkout file.
However, in cone mode we expect a list of paths to directories and then
we convert those into patterns.

Even more specifically, the goal is to always allow the following from
the root of a repo:

  git ls-tree --name-only -d HEAD | git sparse-checkout set --stdin

The ls-tree command provides directory names with an unescaped asterisk.
It also quotes the directories that contain an escaped backslash. We
must remove these quotes, then keep the escaped backslashes.

Use unquote_c_style() when parsing lines from stdin. Command-line
arguments will be parsed as-is, assuming the user can do the correct
level of escaping from their environment to match the exact directory
names.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-01-31 13:05:29 -08:00
Derrick Stolee
d585f0e799 sparse-checkout: write escaped patterns in cone mode
If a user somehow creates a directory with an asterisk (*) or backslash
(\), then the "git sparse-checkout set" command will struggle to provide
the correct pattern in the sparse-checkout file. When not in cone mode,
the provided pattern is written directly into the sparse-checkout file.
However, in cone mode we expect a list of paths to directories and then
we convert those into patterns.

However, there is some care needed for the timing of these escapes. The
in-memory pattern list is used to update the working directory before
writing the patterns to disk. Thus, we need the command to have the
unescaped names in the hashsets for the cone comparisons, then escape
the patterns later.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-01-31 13:05:29 -08:00
Junio C Hamano
145136a95a C: use skip_prefix() to avoid hardcoded string length
We often skip an optional prefix in a string with a hardcoded
constant, e.g.

	if (starts_with(string, "prefix"))
		string += 6;

which is less error prone when written

	skip_prefix(string, "prefix", &string);

Note that this changes a few error messages from "git reflog expire
--expire=nonsense.timestamp", which used to complain by saying

    '--expire=nonsense.timestamp' is not a valid timestamp

but with this change, we say

    'nonsense.timestamp' is not a valid timestamp

which is more technically correct (the string with --expire= as
a prefix obviously cannot be a valid timestamp, but the error is
about the part of the input without that prefix).

Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-01-31 13:03:45 -08:00
Matheus Tavares
b98d188581 sha1-file: allow check_object_signature() to handle any repo
Some callers of check_object_signature() can work on arbitrary
repositories, but the repo does not get passed to this function.
Instead, the_repository is always used internally. To fix possible
inconsistencies, allow the function to receive a struct repository and
make those callers pass on the repo being handled.

Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-01-31 10:45:39 -08:00
Matheus Tavares
2dcde20e1c sha1-file: pass git_hash_algo to hash_object_file()
Allow hash_object_file() to work on arbitrary repos by introducing a
git_hash_algo parameter. Change callers which have a struct repository
pointer in their scope to pass on the git_hash_algo from the said repo.
For all other callers, pass on the_hash_algo, which was already being
used internally at hash_object_file(). This functionality will be used
in the following patch to make check_object_signature() be able to work
on arbitrary repos (which, in turn, will be used to fix an
inconsistency at object.c:parse_object()).

Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-01-31 10:45:39 -08:00
Matheus Tavares
c8123e72f6 streaming: allow open_istream() to handle any repo
Some callers of open_istream() at archive-tar.c and archive-zip.c are
capable of working on arbitrary repositories but the repo struct is not
passed down to open_istream(), which uses the_repository internally. For
now, that's not a problem since the said callers are only being called
with the_repository. But to be consistent and avoid future problems,
let's allow open_istream() to receive a struct repository and use that
instead of the_repository. This parameter addition will also be used in
a future patch to make sha1-file.c:check_object_signature() be able to
work on arbitrary repos.

Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-01-31 10:45:39 -08:00
Junio C Hamano
11ad30b887 Merge branch 'hi/gpg-mintrustlevel'
gpg.minTrustLevel configuration variable has been introduced to
tell various signature verification codepaths the required minimum
trust level.

* hi/gpg-mintrustlevel:
  gpg-interface: add minTrustLevel as a configuration option
2020-01-30 14:17:08 -08:00
Jonathan Tan
2df1aa239c fetch: forgo full connectivity check if --filter
If a filter is specified, we do not need a full connectivity check on
the contents of the packfile we just fetched; we only need to check that
the objects referenced are promisor objects.

This significantly speeds up fetches into repositories that have many
promisor objects, because during the connectivity check, all promisor
objects are enumerated (to mark them UNINTERESTING), and that takes a
significant amount of time.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-01-30 10:55:47 -08:00
Jonathan Tan
50033772d5 connected: verify promisor-ness of partial clone
Commit dfa33a298d ("clone: do faster object check for partial clones",
2019-04-21) optimized the connectivity check done when cloning with
--filter to check only the existence of objects directly pointed to by
refs. But this is not sufficient: they also need to be promisor objects.
Make this check more robust by instead checking that these objects are
promisor objects, that is, they appear in a promisor pack.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-01-30 10:55:31 -08:00
Philippe Blain
c56c48dd07 grep: ignore --recurse-submodules if --no-index is given
Since grep learned to recurse into submodules in 0281e487fd
(grep: optionally recurse into submodules, 2016-12-16),
using --recurse-submodules along with --no-index makes Git
die().

This is unfortunate because if submodule.recurse is set in a user's
~/.gitconfig, invoking `git grep --no-index` either inside or outside
a Git repository results in

    fatal: option not supported with --recurse-submodules

Let's allow using these options together, so that setting submodule.recurse
globally does not prevent using `git grep --no-index`.

Using `--recurse-submodules` should not have any effect if `--no-index`
is used inside a repository, as Git will recurse into the checked out
submodule directories just like into regular directories.

Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-01-30 10:15:58 -08:00
Peter Kaestle
3b2885ec9b submodule: fix status of initialized but not cloned submodules
Original bash helper for "submodule status" was doing a check for
initialized but not cloned submodules and prefixed the status with
a minus sign in case no .git file or folder was found inside the
submodule directory.

This check was missed when the original port of the functionality
from bash to C was done.

Signed-off-by: Peter Kaestle <peter.kaestle@nokia.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-01-27 10:14:00 -08:00
Derrick Stolee
47dbf10d8a clone: fix --sparse option with URLs
The --sparse option was added to the clone builtin in d89f09c (clone:
add --sparse mode, 2019-11-21) and was tested with a local path clone
in t1091-sparse-checkout-builtin.sh. However, due to a difference in
how local paths are handled versus URLs, this mechanism does not work
with URLs.

Modify the test to use a "file://" URL, which would output this error
before the code change:

  Cloning into 'clone'...
  fatal: cannot change to 'file://.../repo': No such file or directory
  error: failed to initialize sparse-checkout

These errors are due to using a "-C <path>" option to call 'git -C
<path> sparse-checkout init' but the URL is being given instead of
the target directory.

Update that target directory to evaluate this correctly. I have also
manually tested that https:// URLs are handled correctly as well.

Acked-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-01-24 13:26:54 -08:00
Derrick Stolee
3c754067a1 sparse-checkout: create leading directories
The 'git init' command creates the ".git/info" directory and fills it
with some default files. However, 'git worktree add' does not create
the info directory for that worktree. This causes a problem when running
"git sparse-checkout init" inside a worktree. While care was taken to
allow the sparse-checkout config to be specific to a worktree, this
initialization was untested.

Safely create the leading directories for the sparse-checkout file. This
is the safest thing to do even without worktrees, as a user could delete
their ".git/info" directory and expect Git to recover safely.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-01-24 13:26:54 -08:00
Matthew Rogers
329e6ec397 config: fix typo in variable name
In git config use of the end_null variable to determine if we should be
null terminating our output.  While it is correct to say a string is
"null terminated" the character is actually the "nul" character, so this
malapropism is being fixed.

Signed-off-by: Matthew Rogers <mattr94@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-01-24 10:42:13 -08:00
Alban Gruin
767a9c417e rebase -i: stop checking out the tip of the branch to rebase
One of the first things done when using a sequencer-based
rebase (ie. `rebase -i', `rebase -r', or `rebase -m') is to make a todo
list.  This requires knowledge of the commit range to rebase.  To get
the oid of the last commit of the range, the tip of the branch to rebase
is checked out with prepare_branch_to_be_rebased(), then the oid of the
head is read.  After this, the tip of the branch is not even modified.
The `am' backend, on the other hand, does not check out the branch.

On big repositories, it's a performance penalty: with `rebase -i', the
user may have to wait before editing the todo list while git is
extracting the branch silently, and "quiet" rebases will be slower than
`am'.

Since we already have the oid of the tip of the branch in
`opts->orig_head', it's useless to switch to this commit.

This removes the call to prepare_branch_to_be_rebased() in
do_interactive_rebase(), and adds a `orig_head' parameter to
get_revision_ranges().  prepare_branch_to_be_rebased() is removed as it
is no longer used.

This introduces a visible change: as we do not switch on the tip of the
branch to rebase, no reflog entry is created at the beginning of the
rebase for it.

Unscientific performance measurements, performed on linux.git, are as
follow:

  Before this patch:

    $ time git rebase -m --onto v4.18 463fa44eec2fef50~ 463fa44eec2fef50

    real    0m8,940s
    user    0m6,830s
    sys     0m2,121s

  After this patch:

    $ time git rebase -m --onto v4.18 463fa44eec2fef50~ 463fa44eec2fef50

    real    0m1,834s
    user    0m0,916s
    sys     0m0,206s

Reported-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Alban Gruin <alban.gruin@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-01-24 10:29:42 -08:00
Jeff King
92fb0db94c pack-objects: add checks for duplicate objects
Additional checks are added in have_duplicate_entry() and
obj_is_packed() to avoid duplicate objects in the reuse
bitmap. It was probably buggy to not have such a check
before.

Git as a client would never both asks for a tag by sha1 and
specify "include-tag", but libgit2 will, so a libgit2 client
cloning from a Git server would trigger the bug.

If a client both asks for a tag by sha1 and specifies
"include-tag", we may end up including the tag in the reuse
bitmap (due to the first thing), and then later adding it to
the packlist (due to the second). This results in duplicate
objects in the pack, which git chokes on. We should notice
that we are already including it when doing the include-tag
portion, and avoid adding it to the packlist.

The simplest place to fix this is right in add_ref_tag(),
where we could avoid peeling the tag at all if we know that
we are already including it. However, this pushes the check
instead into have_duplicate_entry(). This fixes not only
this case, but also means that we cannot have any similar
problems lurking in other code.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-01-23 10:51:50 -08:00
Jeff King
bb514de356 pack-objects: improve partial packfile reuse
The old code to reuse deltas from an existing packfile
just tried to dump a whole segment of the pack verbatim.
That's faster than the traditional way of actually adding
objects to the packing list, but it didn't kick in very
often. This new code is really going for a middle ground:
do _some_ per-object work, but way less than we'd
traditionally do.

The general strategy of the new code is to make a bitmap
of objects from the packfile we'll include, and then
iterate over it, writing out each object exactly as it is
in our on-disk pack, but _not_ adding it to our packlist
(which costs memory, and increases the search space for
deltas).

One complication is that if we're omitting some objects,
we can't set a delta against a base that we're not
sending. So we have to check each object in
try_partial_reuse() to make sure we have its delta.

About performance, in the worst case we might have
interleaved objects that we are sending or not sending,
and we'd have as many chunks as objects. But in practice
we send big chunks.

For instance, packing torvalds/linux on GitHub servers
now reused 6.5M objects, but only needed ~50k chunks.

Helped-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-01-23 10:51:50 -08:00
Jeff King
ff483026a9 builtin/pack-objects: introduce obj_is_packed()
Let's refactor the way we check if an object is packed by
introducing obj_is_packed(). This function is now a simple
wrapper around packlist_find(), but it will evolve in a
following commit.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-01-23 10:51:50 -08:00
Jeff King
e704fc7978 pack-objects: introduce pack.allowPackReuse
Let's make it possible to configure if we want pack reuse or not.

The main reason it might not be wanted is probably debugging and
performance testing, though pack reuse _might_ cause larger packs,
because we wouldn't consider the reused objects as bases for
finding new deltas.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-01-23 10:51:50 -08:00
Junio C Hamano
09e393d913 Merge branch 'nd/switch-and-restore'
"git restore --staged" did not correctly update the cache-tree
structure, resulting in bogus trees to be written afterwards, which
has been corrected.

* nd/switch-and-restore:
  restore: invalidate cache-tree when removing entries with --staged
2020-01-22 15:07:32 -08:00
Junio C Hamano
9403e5dcdd Merge branch 'hw/commit-advise-while-rejecting'
"git commit" gives output similar to "git status" when there is
nothing to commit, but without honoring the advise.statusHints
configuration variable, which has been corrected.

* hw/commit-advise-while-rejecting:
  commit: honor advice.statusHints when rejecting an empty commit
2020-01-22 15:07:30 -08:00
Elijah Newren
22a69fda19 git-rebase.txt: update description of --allow-empty-message
Commit b00bf1c9a8 ("git-rebase: make --allow-empty-message the
default", 2018-06-27) made --allow-empty-message the default and thus
turned --allow-empty-message into a no-op but did not update the
documentation to reflect this.  Update the documentation now, and hide
the option from the normal -h output since it is not useful.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-01-17 13:58:30 -08:00
Matheus Tavares
f1928f04b2 grep: use no. of cores as the default no. of threads
When --threads is not specified, git-grep will use 8 threads by default.
This fixed number may be too many for machines with fewer cores and too
little for machines with more cores. So, instead, use the number of
logical cores available in the machine, which seems to result in the
best overall performance: The following measurements correspond to the
mean elapsed times for 30 git-grep executions in chromium's
repository[1] with a 95% confidence interval (each set of 30 were
performed after 2 warmup runs). Regex 1 is 'abcd[02]' and Regex 2 is
'(static|extern) (int|double) \*'.

      |          Working tree         |           Object Store
------|-------------------------------|--------------------------------
 #ths |  Regex 1      |  Regex 2      |   Regex 1      |   Regex 2
------|---------------|---------------|----------------|---------------
  32  |  2.92s ± 0.01 |  3.72s ± 0.21 |   5.36s ± 0.01 |   6.07s ± 0.01
  16  |  2.84s ± 0.01 |  3.57s ± 0.21 |   5.05s ± 0.01 |   5.71s ± 0.01
>  8  |  2.53s ± 0.00 |  3.24s ± 0.21 |   4.86s ± 0.01 |   5.48s ± 0.01
   4  |  2.43s ± 0.02 |  3.22s ± 0.20 |   5.22s ± 0.02 |   6.03s ± 0.02
   2  |  3.06s ± 0.20 |  4.52s ± 0.01 |   7.52s ± 0.01 |   9.06s ± 0.01
   1  |  6.16s ± 0.01 |  9.25s ± 0.02 |  14.10s ± 0.01 |  17.22s ± 0.01

The above tests were performed in a desktop running Debian 10.0 with
Intel(R) Xeon(R) CPU E3-1230 V2 (4 cores w/ hyper-threading), 32GB of
RAM and a 7200 rpm, SATA 3.1 HDD.

Bellow, the tests were repeated for a machine with SSD: a Manjaro laptop
with Intel(R) i7-7700HQ (4 cores w/ hyper-threading) and 16GB of RAM:

      |          Working tree          |           Object Store
------|--------------------------------|--------------------------------
 #ths |  Regex 1      |  Regex 2       |   Regex 1      |   Regex 2
------|---------------|----------------|----------------|---------------
  32  |  3.29s ± 0.21 |   4.30s ± 0.01 |   6.30s ± 0.01 |   7.30s ± 0.02
  16  |  3.19s ± 0.20 |   4.14s ± 0.02 |   5.91s ± 0.01 |   6.83s ± 0.01
>  8  |  2.90s ± 0.04 |   3.82s ± 0.20 |   5.70s ± 0.02 |   6.53s ± 0.01
   4  |  2.84s ± 0.02 |   3.77s ± 0.20 |   6.19s ± 0.02 |   7.18s ± 0.02
   2  |  3.73s ± 0.21 |   5.57s ± 0.02 |   9.28s ± 0.01 |  11.22s ± 0.01
   1  |  7.48s ± 0.02 |  11.36s ± 0.03 |  17.75s ± 0.01 |  21.87s ± 0.08

[1]: chromium’s repo at commit 03ae96f (“Add filters testing at DSF=2”,
     04-06-2019), after a 'git gc' execution.

Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-01-17 13:52:15 -08:00
Matheus Tavares
70a9fef240 grep: move driver pre-load out of critical section
In builtin/grep.c:add_work() we pre-load the userdiff drivers before
adding the grep_source in the todo list. This operation is currently
being performed after acquiring the grep_mutex, but as it's already
thread-safe, we don't need to protect it here. So let's move it out of
the critical section which should avoid thread contention and improve
performance.

Running[1] `git grep --threads=8 abcd[02] HEAD` on chromium's
repository[2], I got the following mean times for 30 executions after 2
warmups:

        Original         |  6.2886s
-------------------------|-----------
 Out of critical section |  5.7852s

[1]: Tests performed on an i7-7700HQ with 16GB of RAM and SSD, running
     Manjaro Linux.
[2]: chromium’s repo at commit 03ae96f (“Add filters testing at DSF=2”,
         04-06-2019), after a 'git gc' execution.

Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-01-17 13:52:14 -08:00
Matheus Tavares
1184a95ea2 grep: re-enable threads in non-worktree case
They were disabled at 53b8d93 ("grep: disable threading in non-worktree
case", 12-12-2011), due to observable performance drops (to the point
that using a single thread would be faster than multiple threads). But
now that zlib inflation can be performed in parallel we can regain the
speedup, so let's re-enable threads in non-worktree grep.

Grepping 'abcd[02]' ("Regex 1") and '(static|extern) (int|double) \*'
("Regex 2") at chromium's repository[1] I got:

 Threads |   Regex 1  |  Regex 2
---------|------------|-----------
    1    |  17.2920s  |  20.9624s
    2    |   9.6512s  |  11.3184s
    4    |   6.7723s  |   7.6268s
    8**  |   6.2886s  |   6.9843s

These are all means of 30 executions after 2 warmup runs. All tests were
executed on an i7-7700HQ (quad-core w/ hyper-threading), 16GB of RAM and
SSD, running Manjaro Linux. But to make sure the optimization also
performs well on HDD, the tests were repeated on another machine with an
i5-4210U (dual-core w/ hyper-threading), 8GB of RAM and HDD (SATA III,
5400 rpm), also running Manjaro Linux:

 Threads |   Regex 1  |  Regex 2
---------|------------|-----------
    1    |  18.4035s  |  22.5368s
    2    |  12.5063s  |  14.6409s
    4**  |  10.9136s  |  12.7106s

** Note that in these cases we relied on hyper-threading, and that's
   probably why we don't see a big difference in time.

Unfortunately, multithreaded git-grep might be slow in the non-worktree
case when --textconv is used and there're too many text conversions.
Probably the reason for this is that the object read lock is used to
protect fill_textconv() and therefore there is a mutual exclusion
between textconv execution and object reading. Because both are
time-consuming operations, not being able to perform them in parallel
can cause performance drops. To inform the users about this (and other
threading details), let's also add a "NOTES ON THREADS" section to
Documentation/git-grep.txt.

[1]: chromium’s repo at commit 03ae96f (“Add filters testing at DSF=2”,
     04-06-2019), after a 'git gc' execution.

Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-01-17 13:52:14 -08:00
Matheus Tavares
6c307626f1 grep: protect packed_git [re-]initialization
Some fields in struct raw_object_store are lazy initialized by the
thread-unsafe packfile.c:prepare_packed_git(). Although this function is
present in the call stack of git-grep threads, all paths to it are
currently protected by obj_read_lock() (and the main thread usually
indirectly calls it before firing the worker threads, anyway). However,
it's possible that future modifications add new unprotected paths to it,
introducing a race condition. Because errors derived from it wouldn't
happen often, it could be hard to detect. So to prevent future
headaches, let's force eager initialization of packed_git when setting
git-grep up. There'll be a small overhead in the cases where we didn't
really need to prepare packed_git during execution but this shouldn't be
very noticeable.

Also, packed_git may be re-initialized by
packfile.c:reprepare_packed_git(). Again, all paths to it in git-grep
are already protected by obj_read_lock() but it may suffer from the same
problem in the future. So let's also internally protect it with
obj_read_lock() (which is a recursive mutex).

Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-01-17 13:52:14 -08:00
Matheus Tavares
c441ea4edc grep: allow submodule functions to run in parallel
Now that object reading operations are internally protected, the
submodule initialization functions at builtin/grep.c:grep_submodule()
are very close to being thread-safe. Let's take a look at each call and
remove from the critical section what we can, for better performance:

- submodule_from_path() and is_submodule_active() cannot be called in
  parallel yet only because they call repo_read_gitmodules() which
  contains, in its call stack, operations that would otherwise be in
  race condition with object reading (for example parse_object() and
  is_promisor_remote()). However, they only call repo_read_gitmodules()
  if it wasn't read before. So let's pre-read it before firing the
  threads and allow these two functions to safely be called in
  parallel.

- repo_submodule_init() is already thread-safe, so remove it from the
  critical section without other necessary changes.

- The repo_read_gitmodules(&subrepo) call at grep_submodule() is safe as
  no other thread is performing object reading operations in the subrepo
  yet. However, threads might be working in the superproject, and this
  function calls add_to_alternates_memory() internally, which is racy
  with object readings in the superproject. So it must be kept
  protected for now. Let's add a "NEEDSWORK" to it, informing why it
  cannot be removed from the critical section yet.

- Finally, add_to_alternates_memory() must be kept protected for the
  same reason as the item above.

Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-01-17 13:52:14 -08:00
Matheus Tavares
d7992421e1 submodule-config: add skip_if_read option to repo_read_gitmodules()
Currently, submodule-config.c doesn't have an externally accessible
function to read gitmodules only if it wasn't already read. But this
exact behavior is internally implemented by gitmodules_read_check(), to
perform a lazy load. Let's merge this function with
repo_read_gitmodules() adding a 'skip_if_read' which allows both
internal and external callers to access this functionality. This
simplifies a little the code. The added option will also be used in
the following patch.

Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-01-17 13:52:14 -08:00
Matheus Tavares
1d1729caeb grep: replace grep_read_mutex by internal obj read lock
git-grep uses 'grep_read_mutex' to protect its calls to object reading
operations. But these have their own internal lock now, which ensures a
better performance (allowing parallel access to more regions). So, let's
remove the former and, instead, activate the latter with
enable_obj_read_lock().

Sections that are currently protected by 'grep_read_mutex' but are not
internally protected by the object reading lock should be surrounded by
obj_read_lock() and obj_read_unlock(). These guarantee mutual exclusion
with object reading operations, keeping the current behavior and
avoiding race conditions. Namely, these places are:

  In grep.c:

  - fill_textconv() at fill_textconv_grep().
  - userdiff_get_textconv() at grep_source_1().

  In builtin/grep.c:

  - parse_object_or_die() and the submodule functions at
    grep_submodule().
  - deref_tag() and gitmodules_config_oid() at grep_objects().

If these functions become thread-safe, in the future, we might remove
the locking and probably get some speedup.

Note that some of the submodule functions will already be thread-safe
(or close to being thread-safe) with the internal object reading lock.
However, as some of them will require additional modifications to be
removed from the critical section, this will be done in its own patch.

Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-01-17 13:52:14 -08:00
Matheus Tavares
d5b0bac528 grep: fix racy calls in grep_objects()
deref_tag() calls is_promisor_object() and parse_object(), both of which
perform lazy initializations and other thread-unsafe operations. If it
was only called by grep_objects() this wouldn't be a problem as the
latter is only executed by the main thread. However, deref_tag() is also
present in read_object_file()'s call stack. So calling deref_tag() in
grep_objects() without acquiring the grep_read_mutex may incur in a race
condition with object reading operations (such as the ones internally
performed by fill_textconv(), called at fill_textconv_grep()). The same
problem happens with the call to gitmodules_config_oid() which also has
parse_object() in its call stack. Fix that protecting both calls with
the said grep_read_mutex.

Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-01-17 13:52:14 -08:00
Matheus Tavares
faf123c730 grep: fix race conditions at grep_submodule()
There're currently two function calls in builtin/grep.c:grep_submodule()
which might result in race conditions:

- submodule_from_path(): it has config_with_options() in its call stack
  which, in turn, may have read_object_file() in its own. Therefore,
  calling the first function without acquiring grep_read_mutex may end
  up causing a race condition with other object read operations
  performed by worker threads (for example, at the fill_textconv()
  call in grep.c:fill_textconv_grep()).
- parse_object_or_die(): it falls into the same problem, having
  repo_has_object_file(the_repository, ...) in its call stack. Besides
  that, parse_object(), which is also called by parse_object_or_die(),
  is thread-unsafe and also called by object reading functions.

It's unlikely to really fall into a data race with these operations as
the volume of calls to them is usually very low. But we better protect
ourselves against this possibility, anyway. So, to solve these issues,
move both of these function calls into the critical section of
grep_read_mutex.

Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-01-17 13:52:14 -08:00
Hans Jerry Illikainen
54887b4689 gpg-interface: add minTrustLevel as a configuration option
Previously, signature verification for merge and pull operations checked
if the key had a trust-level of either TRUST_NEVER or TRUST_UNDEFINED in
verify_merge_signature().  If that was the case, the process die()d.

The other code paths that did signature verification relied entirely on
the return code from check_commit_signature().  And signatures made with
a good key, irregardless of its trust level, was considered valid by
check_commit_signature().

This difference in behavior might induce users to erroneously assume
that the trust level of a key in their keyring is always considered by
Git, even for operations where it is not (e.g. during a verify-commit or
verify-tag).

The way it worked was by gpg-interface.c storing the result from the
key/signature status *and* the lowest-two trust levels in the `result`
member of the signature_check structure (the last of these status lines
that were encountered got written to `result`).  These are documented in
GPG under the subsection `General status codes` and `Key related`,
respectively [1].

The GPG documentation says the following on the TRUST_ status codes [1]:

    """
    These are several similar status codes:

    - TRUST_UNDEFINED <error_token>
    - TRUST_NEVER     <error_token>
    - TRUST_MARGINAL  [0  [<validation_model>]]
    - TRUST_FULLY     [0  [<validation_model>]]
    - TRUST_ULTIMATE  [0  [<validation_model>]]

    For good signatures one of these status lines are emitted to
    indicate the validity of the key used to create the signature.
    The error token values are currently only emitted by gpgsm.
    """

My interpretation is that the trust level is conceptionally different
from the validity of the key and/or signature.  That seems to also have
been the assumption of the old code in check_signature() where a result
of 'G' (as in GOODSIG) and 'U' (as in TRUST_NEVER or TRUST_UNDEFINED)
were both considered a success.

The two cases where a result of 'U' had special meaning were in
verify_merge_signature() (where this caused git to die()) and in
format_commit_one() (where it affected the output of the %G? format
specifier).

I think it makes sense to refactor the processing of TRUST_ status lines
such that users can configure a minimum trust level that is enforced
globally, rather than have individual parts of git (e.g. merge) do it
themselves (except for a grace period with backward compatibility).

I also think it makes sense to not store the trust level in the same
struct member as the key/signature status.  While the presence of a
TRUST_ status code does imply that the signature is good (see the first
paragraph in the included snippet above), as far as I can tell, the
order of the status lines from GPG isn't well-defined; thus it would
seem plausible that the trust level could be overwritten with the
key/signature status if they were stored in the same member of the
signature_check structure.

This patch introduces a new configuration option: gpg.minTrustLevel.  It
consolidates trust-level verification to gpg-interface.c and adds a new
`trust_level` member to the signature_check structure.

Backward-compatibility is maintained by introducing a special case in
verify_merge_signature() such that if no user-configurable
gpg.minTrustLevel is set, then the old behavior of rejecting
TRUST_UNDEFINED and TRUST_NEVER is enforced.  If, on the other hand,
gpg.minTrustLevel is set, then that value overrides the old behavior.

Similarly, the %G? format specifier will continue show 'U' for
signatures made with a key that has a trust level of TRUST_UNDEFINED or
TRUST_NEVER, even though the 'U' character no longer exist in the
`result` member of the signature_check structure.  A new format
specifier, %GT, is also introduced for users that want to show all
possible trust levels for a signature.

Another approach would have been to simply drop the trust-level
requirement in verify_merge_signature().  This would also have made the
behavior consistent with other parts of git that perform signature
verification.  However, requiring a minimum trust level for signing keys
does seem to have a real-world use-case.  For example, the build system
used by the Qubes OS project currently parses the raw output from
verify-tag in order to assert a minimum trust level for keys used to
sign git tags [2].

[1] https://git.gnupg.org/cgi-bin/gitweb.cgi?p=gnupg.git;a=blob;f=doc/doc/DETAILS;h=bd00006e933ac56719b1edd2478ecd79273eae72;hb=refs/heads/master
[2] 9674c1991d/scripts/verify-git-tag (L43)

Signed-off-by: Hans Jerry Illikainen <hji@dyntopia.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-01-15 14:06:06 -08:00
Heba Waly
bf66db37f1 add: use advise function to display hints
Use the advise function in advice.c to display hints to the users, as
it provides a neat and a standard format for hint messages, i.e: the
text is colored in yellow and the line starts by the word "hint:".

Also this will enable us to control the messages using advice.*
configuration variables.

Signed-off-by: Heba Waly <heba.waly@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-01-15 12:15:04 -08:00
Junio C Hamano
4d924528d8 Revert "Merge branch 'ra/rebase-i-more-options'"
This reverts commit 5d9324e0f4, reversing
changes made to c58ae96fc4.

The topic turns out to be too buggy for real use.

cf. <f2fe7437-8a48-3315-4d3f-8d51fe4bb8f1@gmail.com>
2020-01-12 13:25:18 -08:00
Jeff King
e701bab3e9 restore: invalidate cache-tree when removing entries with --staged
When "git restore --staged <path>" removes a path that's in the index,
it marks the entry with CE_REMOVE, but we don't do anything to
invalidate the cache-tree. In the non-staged case, we end up in
checkout_worktree(), which calls remove_marked_cache_entries(). That
actually drops the entries from the index, as well as invalidating the
cache-tree and untracked-cache.

But with --staged, we never call checkout_worktree(), and the CE_REMOVE
entries remain. Interestingly, they are dropped when we write out the
index, but that means the resulting index is inconsistent: its
cache-tree will not match the actual entries, and running "git commit"
immediately after will create the wrong tree.

We can solve this by calling remove_marked_cache_entries() ourselves
before writing out the index. Note that we can't just hoist it out of
checkout_worktree(); that function needs to iterate over the CE_REMOVE
entries (to drop their matching worktree files) before removing them.

One curiosity about the test: without this patch, it actually triggers a
BUG() when running git-restore:

  BUG: cache-tree.c:810: new1 with flags 0x4420000 should not be in cache-tree

But in the original problem report, which used a similar recipe,
git-restore actually creates the bogus index (and the commit is created
with the wrong tree). I'm not sure why the test here behaves differently
than my out-of-suite reproduction, but what's here should catch either
symptom (and the fix corrects both cases).

Reported-by: Torsten Krah <krah.tm@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-01-08 09:03:36 -08:00
Alexandr Miloslavskiy
fa74180d08 checkout: don't revert file on ambiguous tracking branches
For easier understanding, here are the existing good scenarios:

  1) Have *no* file 'foo', *no* local branch 'foo' and a *single*
     remote branch 'foo'
  2) `git checkout foo` will create local branch foo, see [1]

  and

  1) Have *a* file 'foo', *no* local branch 'foo' and a *single*
     remote branch 'foo'
  2) `git checkout foo` will complain, see [3]

This patch prevents the following scenario:

  1) Have *a* file 'foo', *no* local branch 'foo' and *multiple*
     remote branches 'foo'
  2) `git checkout foo` will successfully... revert contents of
     file `foo`!

That is, adding another remote suddenly changes behavior significantly,
which is a surprise at best and could go unnoticed by user at worst.
Please see [3] which gives some real world complaints.

To my understanding, fix in [3] overlooked the case of multiple remotes,
and the whole behavior of falling back to reverting file was never
intended:

  [1] introduces the unexpected behavior. Before, there was fallback
  from not-a-ref to pathspec. This is reasonable fallback. After, there
  is another fallback from ambiguous-remote to pathspec. I understand
  that it was a copy&paste oversight.

  [2] noticed the unexpected behavior but chose to semi-document it
  instead of forbidding, because the goal of the patch series was
  focused on something else.

  [3] adds `die()` when there is ambiguity between branch and file. The
  case of multiple tracking branches is seemingly overlooked.

The new behavior: if there is no local branch and multiple remote
candidates, just die() and don't try reverting file whether it
exists (prevents surprise) or not (improves error message).

[1] Commit 70c9ac2f ("DWIM "git checkout frotz" to "git checkout -b frotz origin/frotz"" 2009-10-18)
    https://public-inbox.org/git/7vaazpxha4.fsf_-_@alter.siamese.dyndns.org/
[2] Commit ad8d5104 ("checkout: add advice for ambiguous "checkout <branch>"", 2018-06-05)
    https://public-inbox.org/git/20180502105452.17583-1-avarab@gmail.com/
[3] Commit be4908f1 ("checkout: disambiguate dwim tracking branches and local files", 2018-11-13)
    https://public-inbox.org/git/20181110120707.25846-1-pclouds@gmail.com/

Signed-off-by: Alexandr Miloslavskiy <alexandr.miloslavskiy@syntevo.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-01-07 13:15:38 -08:00
Alexandr Miloslavskiy
2957709bd4 parse_branchname_arg(): extract part as new function
This is done for the next commit to avoid crazy 7x tab code padding.

Signed-off-by: Alexandr Miloslavskiy <alexandr.miloslavskiy@syntevo.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-01-07 13:15:37 -08:00
Junio C Hamano
037f067587 Merge branch 'ds/commit-graph-set-size-mult'
The code to write split commit-graph file(s) upon fetching computed
bogus value for the parameter used in splitting the resulting
files, which has been corrected.

* ds/commit-graph-set-size-mult:
  commit-graph: prefer default size_mult when given zero
2020-01-06 14:17:51 -08:00
Junio C Hamano
c20d4fd44a Merge branch 'ds/sparse-list-in-cone-mode'
"git sparse-checkout list" subcommand learned to give its output in
a more concise form when the "cone" mode is in effect.

* ds/sparse-list-in-cone-mode:
  sparse-checkout: document interactions with submodules
  sparse-checkout: list directories in cone mode
2020-01-06 14:17:51 -08:00
Derrick Stolee
63020f175f commit-graph: prefer default size_mult when given zero
In 50f26bd ("fetch: add fetch.writeCommitGraph config setting",
2019-09-02), the fetch builtin added the capability to write a
commit-graph using the "--split" feature. This feature creates
multiple commit-graph files, and those can merge based on a set
of "split options" including a size multiple. The default size
multiple is 2, which intends to provide a log_2 N depth of the
commit-graph chain where N is the number of commits.

However, I noticed during dogfooding that my commit-graph chains
were becoming quite large when left only to builds by 'git fetch'.
It turns out that in split_graph_merge_strategy(), we default the
size_mult variable to 2 except we override it with the context's
split_opts if they exist. In builtin/fetch.c, we create such a
split_opts, but do not populate it with values.

This problem is due to two failures:

 1. It is unclear that we can add the flag COMMIT_GRAPH_WRITE_SPLIT
    with a NULL split_opts.
 2. If we have a non-NULL split_opts, then we override the default
    values even if a zero value is given.

Correct both of these issues. First, do not override size_mult when
the options provide a zero value. Second, stop creating a split_opts
in the fetch builtin.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-01-02 13:46:34 -08:00
Junio C Hamano
e0e1ac5db0 Merge branch 'en/rebase-signoff-fix'
"git rebase --signoff" stopped working when the command was written
in C, which has been corrected.

* en/rebase-signoff-fix:
  rebase: fix saving of --signoff state for am-based rebases
2020-01-02 12:38:30 -08:00
Derrick Stolee
de11951b03 sparse-checkout: list directories in cone mode
When core.sparseCheckoutCone is enabled, the 'git sparse-checkout set'
command takes a list of directories as input, then creates an ordered
list of sparse-checkout patterns such that those directories are
recursively included and all sibling entries along the parent directories
are also included. Listing the patterns is less user-friendly than the
directories themselves.

In cone mode, and as long as the patterns match the expected cone-mode
pattern types, change the output of 'git sparse-checkout list' to only
show the directories that created the patterns.

With this change, the following piped commands would not change the
working directory:

	git sparse-checkout list | git sparse-checkout set --stdin

The only time this would not work is if core.sparseCheckoutCone is
true, but the sparse-checkout file contains patterns that do not
match the expected pattern types for cone mode.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-12-30 09:07:18 -08:00
Junio C Hamano
65099bd775 Merge branch 'mr/bisect-save-pointer-to-const-string'
Code cleanup.

* mr/bisect-save-pointer-to-const-string:
  bisect--helper: convert `*_warning` char pointers to char arrays.
2019-12-25 11:22:01 -08:00
Junio C Hamano
45b96a6fa1 Merge branch 'js/add-p-in-c'
The effort to move "git-add--interactive" to C continues.

* js/add-p-in-c:
  built-in add -p: show helpful hint when nothing can be staged
  built-in add -p: only show the applicable parts of the help text
  built-in add -p: implement the 'q' ("quit") command
  built-in add -p: implement the '/' ("search regex") command
  built-in add -p: implement the 'g' ("goto") command
  built-in add -p: implement hunk editing
  strbuf: add a helper function to call the editor "on an strbuf"
  built-in add -p: coalesce hunks after splitting them
  built-in add -p: implement the hunk splitting feature
  built-in add -p: show different prompts for mode changes and deletions
  built-in app -p: allow selecting a mode change as a "hunk"
  built-in add -p: handle deleted empty files
  built-in add -p: support multi-file diffs
  built-in add -p: offer a helpful error message when hunk navigation failed
  built-in add -p: color the prompt and the help text
  built-in add -p: adjust hunk headers as needed
  built-in add -p: show colored hunks by default
  built-in add -i: wire up the new C code for the `patch` command
  built-in add -i: start implementing the `patch` functionality in C
2019-12-25 11:22:01 -08:00
Junio C Hamano
87cbb1ca66 Merge branch 'rs/ref-read-cleanup'
Code cleanup.

* rs/ref-read-cleanup:
  remote: pass NULL to read_ref_full() because object ID is not needed
  refs: pass NULL to refs_read_ref_full() because object ID is not needed
2019-12-25 11:22:00 -08:00
Junio C Hamano
4bfc9ccfb6 Merge branch 'mr/bisect-use-after-free'
Use-after-free fix.

* mr/bisect-use-after-free:
  bisect--helper: avoid use-after-free
2019-12-25 11:21:59 -08:00
Junio C Hamano
bd72a08d6c Merge branch 'ds/sparse-cone'
Management of sparsely checked-out working tree has gained a
dedicated "sparse-checkout" command.

* ds/sparse-cone: (21 commits)
  sparse-checkout: improve OS ls compatibility
  sparse-checkout: respect core.ignoreCase in cone mode
  sparse-checkout: check for dirty status
  sparse-checkout: update working directory in-process for 'init'
  sparse-checkout: cone mode should not interact with .gitignore
  sparse-checkout: write using lockfile
  sparse-checkout: use in-process update for disable subcommand
  sparse-checkout: update working directory in-process
  sparse-checkout: sanitize for nested folders
  unpack-trees: add progress to clear_ce_flags()
  unpack-trees: hash less in cone mode
  sparse-checkout: init and set in cone mode
  sparse-checkout: use hashmaps for cone patterns
  sparse-checkout: add 'cone' mode
  trace2: add region in clear_ce_flags
  sparse-checkout: create 'disable' subcommand
  sparse-checkout: add '--stdin' option to set subcommand
  sparse-checkout: 'set' subcommand
  clone: add --sparse mode
  sparse-checkout: create 'init' subcommand
  ...
2019-12-25 11:21:58 -08:00
Junio C Hamano
f3c520e17f Merge branch 'sg/name-rev-wo-recursion'
Redo "git name-rev" to avoid recursive calls.

* sg/name-rev-wo-recursion:
  name-rev: cleanup name_ref()
  name-rev: eliminate recursion in name_rev()
  name-rev: use 'name->tip_name' instead of 'tip_name'
  name-rev: drop name_rev()'s 'generation' and 'distance' parameters
  name-rev: restructure creating/updating 'struct rev_name' instances
  name-rev: restructure parsing commits and applying date cutoff
  name-rev: pull out deref handling from the recursion
  name-rev: extract creating/updating a 'struct name_rev' into a helper
  t6120: add a test to cover inner conditions in 'git name-rev's name_rev()
  name-rev: use sizeof(*ptr) instead of sizeof(type) in allocation
  name-rev: avoid unnecessary cast in name_ref()
  name-rev: use strbuf_strip_suffix() in get_rev_name()
  t6120-describe: modernize the 'check_describe' helper
  t6120-describe: correct test repo history graph in comment
2019-12-25 11:21:58 -08:00
Junio C Hamano
17066bea38 Merge branch 'dl/format-patch-notes-config-fixup'
"git format-patch" can take a set of configured format.notes values
to specify which notes refs to use in the log message part of the
output.  The behaviour of this was not consistent with multiple
--notes command line options, which has been corrected.

* dl/format-patch-notes-config-fixup:
  notes.h: fix typos in comment
  notes: break set_display_notes() into smaller functions
  config/format.txt: clarify behavior of multiple format.notes
  format-patch: move git_config() before repo_init_revisions()
  format-patch: use --notes behavior for format.notes
  notes: extract logic into set_display_notes()
  notes: create init_display_notes() helper
  notes: rename to load_display_notes()
2019-12-25 11:21:58 -08:00
Junio C Hamano
135365dd99 Merge branch 'am/pathspec-f-f-checkout'
A few more commands learned the "--pathspec-from-file" command line
option.

* am/pathspec-f-f-checkout:
  checkout, restore: support the --pathspec-from-file option
  doc: restore: synchronize <pathspec> description
  doc: checkout: synchronize <pathspec> description
  doc: checkout: fix broken text reference
  doc: checkout: remove duplicate synopsis
  add: support the --pathspec-from-file option
  cmd_add: prepare for next patch
2019-12-25 11:21:57 -08:00
Junio C Hamano
ff0cb70d45 Merge branch 'am/pathspec-from-file'
An earlier series to teach "--pathspec-from-file" to "git commit"
forgot to make the option incompatible with "--all", which has been
corrected.

* am/pathspec-from-file:
  commit: forbid --pathspec-from-file --all
2019-12-25 11:21:57 -08:00
Johannes Schindelin
c480eeb574 commit --interactive: make it work with the built-in add -i
The built-in `git add -i` machinery obviously has its `the_repository`
structure initialized at the point where `cmd_commit()` calls it, and
therefore does not look at the environment variable `GIT_INDEX_FILE`.

But when being called from `commit --interactive`, it has to, because
the index was already locked in that case, and we want to ask the
interactive add machinery to work on the `index.lock` file instead of
the `index` file.

Technically, we could teach `run_add_i()`, or for that matter
`run_add_p()`, to look specifically at that environment variable, but
the entire idea of passing in a parameter of type `struct repository *`
is to allow working on multiple repositories (and their index files)
independently.

So let's instead override the `index_file` field of that structure
temporarily.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-12-21 16:06:22 -08:00
Johannes Schindelin
cee6cb7300 built-in add -p: implement the "worktree" patch modes
This is a straight-forward port of 2f0896ec3a (restore: support
--patch, 2019-04-25) which added support for `git restore -p`.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-12-21 16:06:22 -08:00
Johannes Schindelin
52628f94fc built-in add -p: implement the "checkout" patch modes
This patch teaches the built-in `git add -p` machinery all the tricks it
needs to know in order to act as the work horse for `git checkout -p`.

Apart from the minor changes (slightly reworded messages, different
`diff` and `apply --check` invocations), it requires a new function to
actually apply the changes, as `git checkout -p` is a bit special in
that respect: when the desired changes do not apply to the index, but
apply to the work tree, Git does not fail straight away, but asks the
user whether to apply the changes to the worktree at least.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-12-21 16:06:22 -08:00
Johannes Schindelin
6610e4628a built-in stash: use the built-in git add -p if so configured
The scripted version of `git stash` called directly into the Perl script
`git-add--interactive.perl`, and this was faithfully converted to C.

However, we have a much better way to do this now: call the internal API
directly, which will now incidentally also respect the
`add.interactive.useBuiltin` setting. Let's just do this.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-12-21 16:06:21 -08:00
Johannes Schindelin
90a6bb98d1 legacy stash -p: respect the add.interactive.usebuiltin setting
As `git add` traditionally did not expose the `--patch=<mode>` modes via
command-line options, the scripted version of `git stash` had to call
`git add--interactive` directly.

But this prevents the built-in `add -p` from kicking in, as
`add--interactive` is the scripted version (which does not have a
"fall-back" to the built-in version).

So let's introduce support for internal switch for `git add` that the
scripted `git stash` can use to call the appropriate backend (scripted
or built-in, depending on `add.interactive.useBuiltin`).

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-12-21 16:06:21 -08:00
Johannes Schindelin
36bae1dc0e built-in add -p: implement the "stash" and "reset" patch modes
The `git stash` and `git reset` commands support a `--patch` option, and
both simply hand off to `git add -p` to perform that work. Let's teach
the built-in version of that command to be able to perform that work, too.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-12-21 16:06:21 -08:00
Johannes Schindelin
d2a233cb8b built-in add -p: prepare for patch modes other than "stage"
The Perl script backing `git add -p` is used not only for that command,
but also for `git stash -p`, `git reset -p` and `git checkout -p`.

In preparation for teaching the C version of `git add -p` to support
also the latter commands, let's abstract away what is "stage" specific
into a dedicated data structure describing the differences between the
patch modes.

Finally, please note that the Perl version tries to make sure that the
diffs are only generated for the modified files. This is not actually
necessary, as the calls to Git's diff machinery already perform that
work, and perform it well. This makes it unnecessary to port the
`FILTER` field of the `%patch_modes` struct, as well as the
`get_diff_reference()` function.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-12-21 16:06:21 -08:00
Elijah Newren
4fe7e43c53 rebase: fix saving of --signoff state for am-based rebases
This was an error introduced in the conversion from shell in commit
21853626ea ("built-in rebase: call `git am` directly", 2019-01-18),
which was noticed by a random browsing of the code.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-12-20 11:29:49 -08:00
Heba Waly
5c4f55f1f6 commit: honor advice.statusHints when rejecting an empty commit
In ea9882bfc4 (commit: disable status hints when writing to
COMMIT_EDITMSG, 2013-09-12) the intent was to disable status hints
when writing to COMMIT_EDITMSG, because giving the hints in the "git
status" like output in the commit message template are too late to
be useful (they say things like "'git add' to stage", but that is
only possible after aborting the current "git commit" session).

But there is one case that the hints can be useful: When the current
attempt to commit is rejected because no change is recorded in the
index.  The message is given and "git commit" errors out, so the
hints can immediately be followed by the user.  Teach the codepath
to honor the configuration variable.

Signed-off-by: Heba Waly <heba.waly@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-12-19 11:58:08 -08:00
Alexandr Miloslavskiy
509efef789 commit: forbid --pathspec-from-file --all
I forgot this in my previous patch `--pathspec-from-file` for
`git commit` [1]. When both `--pathspec-from-file` and `--all` were
specified, `--all` took precedence and `--pathspec-from-file` was
ignored. Before `--pathspec-from-file` was implemented, this case was
prevented by this check in `parse_and_validate_options()` :

    die(_("paths '%s ...' with -a does not make sense"), argv[0]);

It is unfortunate that these two cases are disconnected. This came as
result of how the code was laid out before my patches, where `pathspec`
is parsed outside of `parse_and_validate_options()`. This branch is
already full of refactoring patches and I did not dare to go for another
one.

Fix by mirroring `die()` for `--pathspec-from-file` as well.

[1] Commit e440fc58 ("commit: support the --pathspec-from-file option" 2019-11-19)

Reported-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Alexandr Miloslavskiy <alexandr.miloslavskiy@syntevo.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-12-18 14:14:14 -08:00
Tanushree Tumane
7c5cea7242 bisect--helper: convert *_warning char pointers to char arrays.
Instead of using a pointer that points at a constant string,
just give name directly to the constant string; this way, we
do not have to allocate a pointer variable in addition to
the string we want to use.

Let's convert `need_bad_and_good_revision_warning` and
`need_bisect_start_warning` char pointers to char arrays.

Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Tanushree Tumane <tanushreetumane@gmail.com>
Signed-off-by: Miriam Rubio <mirucam@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-12-17 14:55:36 -08:00
Junio C Hamano
59d0b3be45 Merge branch 'rs/patch-id-use-oid-to-hex'
Code cleanup.

* rs/patch-id-use-oid-to-hex:
  patch-id: use oid_to_hex() to print multiple object IDs
2019-12-16 13:14:48 -08:00
Junio C Hamano
e3b72391d1 Merge branch 'rs/commit-export-env-simplify'
Code cleanup.

* rs/commit-export-env-simplify:
  commit: use strbuf_add() to add a length-limited string
2019-12-16 13:14:47 -08:00
Junio C Hamano
3a44db2ed2 Merge branch 'dr/branch-usage-casefix'
Message fix.

* dr/branch-usage-casefix:
  l10n: minor case fix in 'git branch' '--unset-upstream' description
2019-12-16 13:14:46 -08:00
Junio C Hamano
d1c0fe8d9b Merge branch 'dl/range-diff-with-notes'
Code clean-up.

* dl/range-diff-with-notes:
  range-diff: clear `other_arg` at end of function
  range-diff: mark pointers as const
  t3206: fix incorrect test name
2019-12-16 13:08:46 -08:00
Junio C Hamano
71a7de7a99 Merge branch 'dl/rebase-with-autobase'
"git rebase" did not work well when format.useAutoBase
configuration variable is set, which has been corrected.

* dl/rebase-with-autobase:
  rebase: fix format.useAutoBase breakage
  format-patch: teach --no-base
  t4014: use test_config()
  format-patch: fix indentation
  t3400: demonstrate failure with format.useAutoBase
2019-12-16 13:08:32 -08:00
Junio C Hamano
37c2619d91 Merge branch 'ag/sequencer-todo-updates'
Reduce unnecessary reading of state variables back from the disk
during sequencer operation.

* ag/sequencer-todo-updates:
  sequencer: directly call pick_commits() from complete_action()
  rebase: fill `squash_onto' in get_replay_opts()
  sequencer: move the code writing total_nr on the disk to a new function
  sequencer: update `done_nr' when skipping commands in a todo list
  sequencer: update `total_nr' when adding an item to a todo list
2019-12-16 13:08:31 -08:00
Johannes Schindelin
f6aa7ecc34 built-in add -i: start implementing the patch functionality in C
In the previous steps, we re-implemented the main loop of `git add -i`
in C, and most of the commands.

Notably, we left out the actual functionality of `patch`, as the
relevant code makes up more than half of `git-add--interactive.perl`,
and is actually pretty independent of the rest of the commands.

With this commit, we start to tackle that `patch` part. For better
separation of concerns, we keep the code in a separate file,
`add-patch.c`. The new code is still guarded behind the
`add.interactive.useBuiltin` config setting, and for the moment,
it can only be called via `git add -p`.

The actual functionality follows the original implementation of
5cde71d64a (git-add --interactive, 2006-12-10), but not too closely
(for example, we use string offsets rather than copying strings around,
and after seeing whether the `k` and `j` commands are applicable, in the
C version we remember which previous/next hunk was undecided, and use it
rather than looking again when the user asked to jump).

As a further deviation from that commit, We also use a comma instead of
a slash to separate the available commands in the prompt, as the current
version of the Perl script does this, and we also add a line about the
question mark ("print help") to the help text.

While it is tempting to use this conversion of `git add -p` as an excuse
to work on `apply_all_patches()` so that it does _not_ want to read a
file from `stdin` or from a file, but accepts, say, an `strbuf` instead,
we will refrain from this particular rabbit hole at this stage.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-12-13 12:37:13 -08:00
Derrick Stolee
190a65f9db sparse-checkout: respect core.ignoreCase in cone mode
When a user uses the sparse-checkout feature in cone mode, they
add patterns using "git sparse-checkout set <dir1> <dir2> ..."
or by using "--stdin" to provide the directories line-by-line over
stdin. This behaviour naturally looks a lot like the way a user
would type "git add <dir1> <dir2> ..."

If core.ignoreCase is enabled, then "git add" will match the input
using a case-insensitive match. Do the same for the sparse-checkout
feature.

Perform case-insensitive checks while updating the skip-worktree
bits during unpack_trees(). This is done by changing the hash
algorithm and hashmap comparison methods to optionally use case-
insensitive methods.

When this is enabled, there is a small performance cost in the
hashing algorithm. To tease out the worst possible case, the
following was run on a repo with a deep directory structure:

	git ls-tree -d -r --name-only HEAD |
		git sparse-checkout set --stdin

The 'set' command was timed with core.ignoreCase disabled or
enabled. For the repo with a deep history, the numbers were

	core.ignoreCase=false: 62s
	core.ignoreCase=true:  74s (+19.3%)

For reproducibility, the equivalent test on the Linux kernel
repository had these numbers:

	core.ignoreCase=false: 3.1s
	core.ignoreCase=true:  3.6s (+16%)

Now, this is not an entirely fair comparison, as most users
will define their sparse cone using more shallow directories,
and the performance improvement from eb42feca97 ("unpack-trees:
hash less in cone mode" 2019-11-21) can remove most of the
hash cost. For a more realistic test, drop the "-r" from the
ls-tree command to store only the first-level directories.
In that case, the Linux kernel repository takes 0.2-0.25s in
each case, and the deep repository takes one second, plus or
minus 0.05s, in each case.

Thus, we _can_ demonstrate a cost to this change, but it is
unlikely to matter to any reasonable sparse-checkout cone.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-12-13 12:01:02 -08:00
Denton Liu
1d7297513d notes: break set_display_notes() into smaller functions
In 8164c961e1 (format-patch: use --notes behavior for format.notes,
2019-12-09), we introduced set_display_notes() which was a monolithic
function with three mutually exclusive branches. Break the function up
into three small and simple functions that each are only responsible for
one task.

This family of functions accepts an `int *show_notes` instead of
returning a value suitable for assignment to `show_notes`. This is for
two reasons. First of all, this guarantees that the external
`show_notes` variable changes in lockstep with the
`struct display_notes_opt`. Second, this prompts future developers to be
careful about doing something meaningful with this value. In fact, a
NULL check is intentionally omitted because causing a segfault here
would tell the future developer that they are meant to use the value for
something meaningful.

One alternative was making the family of functions accept a
`struct rev_info *` instead of the `struct display_notes_opt *`, since
the former contains the `show_notes` field as well. This does not work
because we have to call git_config() before repo_init_revisions().
However, if we had a `struct rev_info`, we'd need to initialize it before
it gets assigned values from git_config(). As a result, we break the
circular dependency by having standalone `int show_notes` and
`struct display_notes_opt notes_opt` variables which temporarily hold
values from git_config() until the values are copied over to `rev`.

To implement this change, we need to get a pointer to
`rev_info::show_notes`. Unfortunately, this is not possible with
bitfields and only direct-assignment is possible. Change
`rev_info::show_notes` to a non-bitfield int so that we can get its
address.

Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-12-13 11:07:15 -08:00
René Scharfe
99f86bde83 remote: pass NULL to read_ref_full() because object ID is not needed
read_ref_full() wraps refs_read_ref_full(), which in turn wraps
refs_resolve_ref_unsafe(), which handles a NULL oid pointer of callers
not interested in the resolved object ID.  Make use of that feature to
document that mv() is such a caller.

Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-12-11 13:48:46 -08:00
Tanushree Tumane
51a0a4ed95 bisect--helper: avoid use-after-free
In 5e82c3dd22 (bisect--helper: `bisect_reset` shell function in C,
2019-01-02), the `git bisect reset` subcommand was ported to C. When the
call to `git checkout` failed, an error message was reported to the
user.

However, this error message used the `strbuf` that had just been
released already. Let's switch that around: first use it, then release
it.

Mentored-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Tanushree Tumane <tanushreetumane@gmail.com>
Signed-off-by: Miriam Rubio <mirucam@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-12-11 09:24:27 -08:00
Junio C Hamano
930078ba39 Merge branch 'hi/gpg-use-check-signature'
Hide lower-level verify_signed-buffer() API as a pure helper to
implement the public check_signature() function, in order to
encourage new callers to use the correct and more strict
validation.

* hi/gpg-use-check-signature:
  gpg-interface: prefer check_signature() for GPG verification
2019-12-10 13:11:45 -08:00
Junio C Hamano
5dd1d59d35 Merge branch 'jt/clone-recursesub-ref-advise'
The interaction between "git clone --recurse-submodules" and
alternate object store was ill-designed.  The documentation and
code have been taught to make more clear recommendations when the
users see failures.

* jt/clone-recursesub-ref-advise:
  submodule--helper: advise on fatal alternate error
  Doc: explain submodule.alternateErrorStrategy
2019-12-10 13:11:43 -08:00
Junio C Hamano
c58ae96fc4 Merge branch 'am/pathspec-from-file'
A few commands learned to take the pathspec from the
standard input or a named file, instead of taking it as the command
line arguments.

* am/pathspec-from-file:
  commit: support the --pathspec-from-file option
  doc: commit: synchronize <pathspec> description
  reset: support the `--pathspec-from-file` option
  doc: reset: synchronize <pathspec> description
  pathspec: add new function to parse file
  parse-options.h: add new options `--pathspec-from-file`, `--pathspec-file-nul`
2019-12-10 13:11:41 -08:00
Junio C Hamano
5d9324e0f4 Merge branch 'ra/rebase-i-more-options'
"git rebase -i" learned a few options that are known by "git
rebase" proper.

* ra/rebase-i-more-options:
  rebase -i: finishing touches to --reset-author-date
  rebase: add --reset-author-date
  rebase -i: support --ignore-date
  sequencer: rename amend_author to author_to_rename
  rebase -i: support --committer-date-is-author-date
  sequencer: allow callers of read_author_script() to ignore fields
  rebase -i: add --ignore-whitespace flag
2019-12-10 13:11:41 -08:00
Junio C Hamano
7034cd094b Sync with Git 2.24.1 2019-12-09 22:17:55 -08:00
Denton Liu
09ac67a183 format-patch: move git_config() before repo_init_revisions()
In 13cdf78094 (format-patch: teach format.notes config option,
2019-05-16), the order in which git_config() and repo_init_revisions()
were swapped so that `rev.notes_opt` would be initialized before
git_config() was called. This is problematic, however, as git_config()
should generally be called before repo_init_revisions().

Break this circular dependency by creating `show_notes` and `notes_opt`
which git_config() reads into. Then, copy these values over to
`rev.show_notes` and `rev.notes_opt` after repo_init_revisions() is
called.

Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-12-09 13:37:21 -08:00
Denton Liu
8164c961e1 format-patch: use --notes behavior for format.notes
When we had multiple `format.notes` config values where we had `<ref1>`,
`false`, `<ref2>` (in that order), then we would print out the notes for
both `<ref1>` and `<ref2>`. This doesn't make sense, however, since we
parse the config in a top-down manner and a `false` should be able to
override previous configurations, just like how `--no-notes` will
override previous `--notes`.

Duplicate the logic that handles the `--[no-]notes[=]` option to
`format.notes` for consistency. As a result, when parsing the config
from top to bottom, `format.notes = true` will behave like `--notes`,
`format.notes = <ref>` will behave like `--notes=<ref>` and
`format.notes = false` will behave like `--no-notes`.

This change isn't strictly backwards compatible but since it is an edge
case where a sane user would not mix notes refs with `false` and this
feature is relatively new (released only in v2.23.0), this change should
be harmless.

Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-12-09 13:37:20 -08:00
Denton Liu
1e6ed5441a notes: rename to load_display_notes()
According to the function comment, init_display_notes() was supposed to
"Load the notes machinery for displaying several notes trees." Rename
this function to load_display_notes() so that its use is more accurately
represented.

This is done because, in a future commit, we will reuse the name
init_display_notes().

Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-12-09 13:36:42 -08:00
SZEDER Gábor
2866fd284c name-rev: cleanup name_ref()
Earlier patches in this series moved a couple of conditions from the
recursive name_rev() function into its caller name_ref(), for no other
reason than to make eliminating the recursion a bit easier to follow.

Since the previous patch name_rev() is not recursive anymore, so let's
move all those conditions back into name_rev().

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-12-09 13:33:01 -08:00
SZEDER Gábor
49f7a2fde9 name-rev: eliminate recursion in name_rev()
The name_rev() function calls itself recursively for each interesting
parent of the commit it got as parameter, and, consequently, it can
segfault when processing a deep history if it exhausts the available
stack space.  E.g. running 'git name-rev --all' and 'git name-rev
HEAD~100000' in the gcc, gecko-dev, llvm, and WebKit repositories
results in segfaults on my machine ('ulimit -s' reports 8192kB of
stack size limit), and nowadays the former segfaults in the Linux repo
as well (it reached the necessasry depth sometime between v5.3-rc4 and
-rc5).

Eliminate the recursion by inserting the interesting parents into a
LIFO 'prio_queue' [1] and iterating until the queue becomes empty.

Note that the parent commits must be added in reverse order to the
LIFO 'prio_queue', so their relative order is preserved during
processing, i.e. the first parent should come out first from the
queue, because otherwise performance greatly suffers on mergy
histories [2].

The stacksize-limited test 'name-rev works in a deep repo' in
't6120-describe.sh' demonstrated this issue and expected failure.  Now
the recursion is gone, so flip it to expect success.  Also gone are
the dmesg entries logging the segfault of that segfaulting 'git
name-rev' process on every execution of the test suite.

Note that this slightly changes the order of lines in the output of
'git name-rev --all', usually swapping two lines every 35 lines in
git.git or every 150 lines in linux.git.  This shouldn't matter in
practice, because the output has always been unordered anyway.

This patch is best viewed with '--ignore-all-space'.

[1] Early versions of this patch used a 'commit_list', resulting in
    ~15% performance penalty for 'git name-rev --all' in 'linux.git',
    presumably because of the memory allocation and release for each
    insertion and removal. Using a LIFO 'prio_queue' has basically no
    effect on performance.

[2] We prefer shorter names, i.e. 'v0.1~234' is preferred over
    'v0.1^2~5', meaning that usually following the first parent of a
    merge results in the best name for its ancestors.  So when later
    we follow the remaining parent(s) of a merge, and reach an already
    named commit, then we usually find that we can't give that commit
    a better name, and thus we don't have to visit any of its
    ancestors again.

    OTOH, if we were to follow the Nth parent of the merge first, then
    the name of all its ancestors would include a corresponding '^N'.
    Those are not the best names for those commits, so when later we
    reach an already named commit following the first parent of that
    merge, then we would have to update the name of that commit and
    the names of all of its ancestors as well.  Consequently, we would
    have to visit many commits several times, resulting in a
    significant slowdown.

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-12-09 13:33:01 -08:00
SZEDER Gábor
fee984bcab name-rev: use 'name->tip_name' instead of 'tip_name'
Following the previous patches in this series we can get the value of
'name_rev()'s 'tip_name' parameter from the 'struct rev_name'
associated with the commit as well.

So let's use 'name->tip_name' instead, which makes the patch
eliminating the recursion of name_rev() a bit easier to follow.

Note that at this point we could drop the 'tip_name' parameter as
well, but that parameter will be necessary later, after the recursion
is eliminated.

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-12-09 13:33:01 -08:00
Dimitriy Ryazantcev
11de8dd7ef l10n: minor case fix in 'git branch' '--unset-upstream' description
Signed-off-by: Dimitriy Ryazantcev <dimitriy.ryazantcev@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-12-09 12:30:55 -08:00
René Scharfe
4507ecc771 patch-id: use oid_to_hex() to print multiple object IDs
flush_current_id() prints the hexadecimal representation of two object
IDs.  When the code was added in f97672225b (Add "git-patch-id" program
to generate patch ID's., 2005-06-23), sha1_to_hex() had only a single
internal static buffer, so the result of one invocation had to be stored
in a local buffer.

Since dcb3450fd8 (sha1_to_hex() usage cleanup, 2006-05-03) it rotates
through four buffers, which allows to print up to four object IDs at the
same time.  1a876a69af (patch-id: convert to use struct object_id,
2015-03-13) replaced sha1_to_hex() with oid_to_hex(), which has the same
feature.  Use it to simplify the code.

Signed-off-by: René Scharfe <l.s.r@web.de>
Acked-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-12-09 12:26:40 -08:00
René Scharfe
147ee35558 commit: use strbuf_add() to add a length-limited string
This is shorter and simpler.

Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-12-09 11:25:27 -08:00
Junio C Hamano
391fb22ac7 Merge branch 'rs/use-skip-prefix-more'
Code cleanup.

* rs/use-skip-prefix-more:
  name-rev: use skip_prefix() instead of starts_with()
  push: use skip_prefix() instead of starts_with()
  shell: use skip_prefix() instead of starts_with()
  fmt-merge-msg: use skip_prefix() instead of starts_with()
  fetch: use skip_prefix() instead of starts_with()
2019-12-06 15:09:22 -08:00
SZEDER Gábor
8c5724c585 name-rev: drop name_rev()'s 'generation' and 'distance' parameters
Following the previous patches in this series we can get the values of
name_rev()'s 'generation' and 'distance' parameters from the 'stuct
rev_name' associated with the commit as well.

Let's simplify the function's signature and remove these two
unnecessary parameters.

Note that at this point we could do the same with the 'tip_name',
'taggerdate' and 'from_tag' parameters as well, but those parameters
will be necessary later, after the recursion is eliminated.

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-12-06 13:29:04 -08:00
SZEDER Gábor
3a52150301 name-rev: restructure creating/updating 'struct rev_name' instances
At the beginning of the recursive name_rev() function it creates a new
'struct rev_name' instance for each previously unvisited commit or, if
this visit results in better name for an already visited commit, then
updates the 'struct rev_name' instance attached to the commit, or
returns early.

Restructure this so it's caller creates or updates the 'struct
rev_name' instance associated with the commit to be passed as
parameter, i.e. both name_ref() before calling name_rev() and
name_rev() itself as it iterates over the parent commits.

This makes eliminating the recursion a bit easier to follow, and the
condition moved to name_ref() will be moved back to name_rev() after
the recursion is eliminated.

This change also plugs the memory leak that was temporarily unplugged
in the earlier "name-rev: pull out deref handling from the recursion"
patch in this series.

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-12-06 13:29:04 -08:00
SZEDER Gábor
dd432a6ecf name-rev: restructure parsing commits and applying date cutoff
At the beginning of the recursive name_rev() function it parses the
commit it got as parameter, and returns early if the commit is older
than a cutoff limit.

Restructure this so the caller parses the commit and checks its date,
and doesn't invoke name_rev() if the commit to be passed as parameter
is older than the cutoff, i.e. both name_ref() before calling
name_rev() and name_rev() itself as it iterates over the parent
commits.

This makes eliminating the recursion a bit easier to follow, and the
condition moved to name_ref() will be moved back to name_rev() after
the recursion is eliminated.

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-12-06 13:29:04 -08:00
SZEDER Gábor
dd090a8a37 name-rev: pull out deref handling from the recursion
The 'if (deref) { ... }' condition near the beginning of the recursive
name_rev() function can only ever be true in the first invocation,
because the 'deref' parameter is always 0 in the subsequent recursive
invocations.

Extract this condition from the recursion into name_rev()'s caller and
drop the function's 'deref' parameter.  This makes eliminating the
recursion a bit easier to follow, and it will be moved back into
name_rev() after the recursion is eliminated.

Furthermore, drop the condition that die()s when both 'deref' and
'generation' are non-null (which should have been a BUG() to begin
with).

Note that this change reintroduces the memory leak that was plugged in
in commit 5308224633 (name-rev: avoid leaking memory in the `deref`
case, 2017-05-04), but a later patch (name-rev: restructure
creating/updating 'struct rev_name' instances) in this series will
plug it in again.

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-12-06 13:29:04 -08:00
SZEDER Gábor
766f9e39c0 name-rev: extract creating/updating a 'struct name_rev' into a helper
In a later patch in this series we'll want to do this in two places.

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-12-06 13:29:04 -08:00
SZEDER Gábor
bf43abc6e6 name-rev: use sizeof(*ptr) instead of sizeof(type) in allocation
Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-12-06 13:29:04 -08:00
SZEDER Gábor
e0c4da6f2a name-rev: avoid unnecessary cast in name_ref()
Casting a 'struct object' to 'struct commit' is unnecessary there,
because it's already available in the local 'commit' variable.

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-12-06 13:29:04 -08:00
René Scharfe
c3794d4ccb name-rev: use strbuf_strip_suffix() in get_rev_name()
get_name_rev() basically open-codes strip_suffix() before adding a
string to a strbuf.

Let's use the strbuf right from the beginning, i.e. add the whole
string to the strbuf and then use strbuf_strip_suffix(), making the
code more idiomatic.

Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-12-06 13:29:04 -08:00
Denton Liu
abcf857300 range-diff: clear other_arg at end of function
We were leaking memory by not clearing `other_arg` after we were done
using it. Clear it after we've finished using it.

Note that this isn't strictly necessary since the memory will be
reclaimed once the command exits. However, since we are releasing the
strbufs, we should also clear `other_arg` for consistency.

Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-12-06 12:36:53 -08:00
Phillip Wood
430b75f720 commit: give correct advice for empty commit during a rebase
In dcb500dc16 (cherry-pick/revert: advise using --skip, 2019-07-02),
`git commit` learned to suggest to run `git cherry-pick --skip` when
trying to cherry-pick an empty patch.

However, it was overlooked that there are more conditions than just a
`git cherry-pick` when this advice is printed (which originally
suggested the neutral `git reset`): the same can happen during a rebase.

Let's suggest the correct command, even during a rebase.

While at it, we adjust more places in `builtin/commit.c` that
incorrectly assumed that the presence of a `CHERRY_PICK_HEAD` meant that
surely this must be a `cherry-pick` in progress.

Note: we take pains to handle the situation when a user runs a `git
cherry-pick` _during_ a rebase. This is quite valid (e.g. in an `exec`
line in an interactive rebase). On the other hand, it is not possible to
run a rebase during a cherry-pick, meaning: if both `rebase-merge/` and
`sequencer/` exist or CHERRY_PICK_HEAD and REBASE_HEAD point to the same
commit , we still want to advise to use `git cherry-pick --skip`.

Original-patch-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-12-06 09:32:02 -08:00
Phillip Wood
901ba7b1ef commit: encapsulate determine_whence() for sequencer
Working out which command wants to create a commit requires detailed
knowledge of the sequencer internals and that knowledge is going to
increase in subsequent commits. With that in mind lets encapsulate that
knowledge in sequencer.c rather than spreading it into builtin/commit.c.

Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-12-06 09:32:02 -08:00
Phillip Wood
8d57f75749 commit: use enum value for multiple cherry-picks
Add FROM_CHERRY_PICK_MULTI for a sequence of cherry-picks rather than
using a separate variable.

Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-12-06 09:32:01 -08:00
Johannes Schindelin
67af91c47a Sync with 2.23.1
* maint-2.23: (44 commits)
  Git 2.23.1
  Git 2.22.2
  Git 2.21.1
  mingw: sh arguments need quoting in more circumstances
  mingw: fix quoting of empty arguments for `sh`
  mingw: use MSYS2 quoting even when spawning shell scripts
  mingw: detect when MSYS2's sh is to be spawned more robustly
  t7415: drop v2.20.x-specific work-around
  Git 2.20.2
  t7415: adjust test for dubiously-nested submodule gitdirs for v2.20.x
  Git 2.19.3
  Git 2.18.2
  Git 2.17.3
  Git 2.16.6
  test-drop-caches: use `has_dos_drive_prefix()`
  Git 2.15.4
  Git 2.14.6
  mingw: handle `subst`-ed "DOS drives"
  mingw: refuse to access paths with trailing spaces or periods
  mingw: refuse to access paths with illegal characters
  ...
2019-12-06 16:31:39 +01:00
Johannes Schindelin
7fd9fd94fb Sync with 2.22.2
* maint-2.22: (43 commits)
  Git 2.22.2
  Git 2.21.1
  mingw: sh arguments need quoting in more circumstances
  mingw: fix quoting of empty arguments for `sh`
  mingw: use MSYS2 quoting even when spawning shell scripts
  mingw: detect when MSYS2's sh is to be spawned more robustly
  t7415: drop v2.20.x-specific work-around
  Git 2.20.2
  t7415: adjust test for dubiously-nested submodule gitdirs for v2.20.x
  Git 2.19.3
  Git 2.18.2
  Git 2.17.3
  Git 2.16.6
  test-drop-caches: use `has_dos_drive_prefix()`
  Git 2.15.4
  Git 2.14.6
  mingw: handle `subst`-ed "DOS drives"
  mingw: refuse to access paths with trailing spaces or periods
  mingw: refuse to access paths with illegal characters
  unpack-trees: let merged_entry() pass through do_add_entry()'s errors
  ...
2019-12-06 16:31:30 +01:00
Johannes Schindelin
5421ddd8d0 Sync with 2.21.1
* maint-2.21: (42 commits)
  Git 2.21.1
  mingw: sh arguments need quoting in more circumstances
  mingw: fix quoting of empty arguments for `sh`
  mingw: use MSYS2 quoting even when spawning shell scripts
  mingw: detect when MSYS2's sh is to be spawned more robustly
  t7415: drop v2.20.x-specific work-around
  Git 2.20.2
  t7415: adjust test for dubiously-nested submodule gitdirs for v2.20.x
  Git 2.19.3
  Git 2.18.2
  Git 2.17.3
  Git 2.16.6
  test-drop-caches: use `has_dos_drive_prefix()`
  Git 2.15.4
  Git 2.14.6
  mingw: handle `subst`-ed "DOS drives"
  mingw: refuse to access paths with trailing spaces or periods
  mingw: refuse to access paths with illegal characters
  unpack-trees: let merged_entry() pass through do_add_entry()'s errors
  quote-stress-test: offer to test quoting arguments for MSYS2 sh
  ...
2019-12-06 16:31:23 +01:00
Johannes Schindelin
fc346cb292 Sync with 2.20.2
* maint-2.20: (36 commits)
  Git 2.20.2
  t7415: adjust test for dubiously-nested submodule gitdirs for v2.20.x
  Git 2.19.3
  Git 2.18.2
  Git 2.17.3
  Git 2.16.6
  test-drop-caches: use `has_dos_drive_prefix()`
  Git 2.15.4
  Git 2.14.6
  mingw: handle `subst`-ed "DOS drives"
  mingw: refuse to access paths with trailing spaces or periods
  mingw: refuse to access paths with illegal characters
  unpack-trees: let merged_entry() pass through do_add_entry()'s errors
  quote-stress-test: offer to test quoting arguments for MSYS2 sh
  t6130/t9350: prepare for stringent Win32 path validation
  quote-stress-test: allow skipping some trials
  quote-stress-test: accept arguments to test via the command-line
  tests: add a helper to stress test argument quoting
  mingw: fix quoting of arguments
  Disallow dubiously-nested submodule git directories
  ...
2019-12-06 16:31:12 +01:00
Jonathan Nieder
c154745074 submodule: defend against submodule.update = !command in .gitmodules
In v2.15.4, we started to reject `submodule.update` settings in
`.gitmodules`. Let's raise a BUG if it somehow still made it through
from anywhere but the Git config.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
2019-12-06 16:30:50 +01:00
Johannes Schindelin
d851d94151 Sync with 2.19.3
* maint-2.19: (34 commits)
  Git 2.19.3
  Git 2.18.2
  Git 2.17.3
  Git 2.16.6
  test-drop-caches: use `has_dos_drive_prefix()`
  Git 2.15.4
  Git 2.14.6
  mingw: handle `subst`-ed "DOS drives"
  mingw: refuse to access paths with trailing spaces or periods
  mingw: refuse to access paths with illegal characters
  unpack-trees: let merged_entry() pass through do_add_entry()'s errors
  quote-stress-test: offer to test quoting arguments for MSYS2 sh
  t6130/t9350: prepare for stringent Win32 path validation
  quote-stress-test: allow skipping some trials
  quote-stress-test: accept arguments to test via the command-line
  tests: add a helper to stress test argument quoting
  mingw: fix quoting of arguments
  Disallow dubiously-nested submodule git directories
  protect_ntfs: turn on NTFS protection by default
  path: also guard `.gitmodules` against NTFS Alternate Data Streams
  ...
2019-12-06 16:30:49 +01:00
Johannes Schindelin
7c9fbda6e2 Sync with 2.18.2
* maint-2.18: (33 commits)
  Git 2.18.2
  Git 2.17.3
  Git 2.16.6
  test-drop-caches: use `has_dos_drive_prefix()`
  Git 2.15.4
  Git 2.14.6
  mingw: handle `subst`-ed "DOS drives"
  mingw: refuse to access paths with trailing spaces or periods
  mingw: refuse to access paths with illegal characters
  unpack-trees: let merged_entry() pass through do_add_entry()'s errors
  quote-stress-test: offer to test quoting arguments for MSYS2 sh
  t6130/t9350: prepare for stringent Win32 path validation
  quote-stress-test: allow skipping some trials
  quote-stress-test: accept arguments to test via the command-line
  tests: add a helper to stress test argument quoting
  mingw: fix quoting of arguments
  Disallow dubiously-nested submodule git directories
  protect_ntfs: turn on NTFS protection by default
  path: also guard `.gitmodules` against NTFS Alternate Data Streams
  is_ntfs_dotgit(): speed it up
  ...
2019-12-06 16:30:38 +01:00
Johannes Schindelin
14af7ed5a9 Sync with 2.17.3
* maint-2.17: (32 commits)
  Git 2.17.3
  Git 2.16.6
  test-drop-caches: use `has_dos_drive_prefix()`
  Git 2.15.4
  Git 2.14.6
  mingw: handle `subst`-ed "DOS drives"
  mingw: refuse to access paths with trailing spaces or periods
  mingw: refuse to access paths with illegal characters
  unpack-trees: let merged_entry() pass through do_add_entry()'s errors
  quote-stress-test: offer to test quoting arguments for MSYS2 sh
  t6130/t9350: prepare for stringent Win32 path validation
  quote-stress-test: allow skipping some trials
  quote-stress-test: accept arguments to test via the command-line
  tests: add a helper to stress test argument quoting
  mingw: fix quoting of arguments
  Disallow dubiously-nested submodule git directories
  protect_ntfs: turn on NTFS protection by default
  path: also guard `.gitmodules` against NTFS Alternate Data Streams
  is_ntfs_dotgit(): speed it up
  mingw: disallow backslash characters in tree objects' file names
  ...
2019-12-06 16:29:15 +01:00
Johannes Schindelin
bdfef0492c Sync with 2.16.6
* maint-2.16: (31 commits)
  Git 2.16.6
  test-drop-caches: use `has_dos_drive_prefix()`
  Git 2.15.4
  Git 2.14.6
  mingw: handle `subst`-ed "DOS drives"
  mingw: refuse to access paths with trailing spaces or periods
  mingw: refuse to access paths with illegal characters
  unpack-trees: let merged_entry() pass through do_add_entry()'s errors
  quote-stress-test: offer to test quoting arguments for MSYS2 sh
  t6130/t9350: prepare for stringent Win32 path validation
  quote-stress-test: allow skipping some trials
  quote-stress-test: accept arguments to test via the command-line
  tests: add a helper to stress test argument quoting
  mingw: fix quoting of arguments
  Disallow dubiously-nested submodule git directories
  protect_ntfs: turn on NTFS protection by default
  path: also guard `.gitmodules` against NTFS Alternate Data Streams
  is_ntfs_dotgit(): speed it up
  mingw: disallow backslash characters in tree objects' file names
  path: safeguard `.git` against NTFS Alternate Streams Accesses
  ...
2019-12-06 16:27:36 +01:00
Johannes Schindelin
9ac92fed5b Sync with 2.15.4
* maint-2.15: (29 commits)
  Git 2.15.4
  Git 2.14.6
  mingw: handle `subst`-ed "DOS drives"
  mingw: refuse to access paths with trailing spaces or periods
  mingw: refuse to access paths with illegal characters
  unpack-trees: let merged_entry() pass through do_add_entry()'s errors
  quote-stress-test: offer to test quoting arguments for MSYS2 sh
  t6130/t9350: prepare for stringent Win32 path validation
  quote-stress-test: allow skipping some trials
  quote-stress-test: accept arguments to test via the command-line
  tests: add a helper to stress test argument quoting
  mingw: fix quoting of arguments
  Disallow dubiously-nested submodule git directories
  protect_ntfs: turn on NTFS protection by default
  path: also guard `.gitmodules` against NTFS Alternate Data Streams
  is_ntfs_dotgit(): speed it up
  mingw: disallow backslash characters in tree objects' file names
  path: safeguard `.git` against NTFS Alternate Streams Accesses
  clone --recurse-submodules: prevent name squatting on Windows
  is_ntfs_dotgit(): only verify the leading segment
  ...
2019-12-06 16:27:18 +01:00
Johannes Schindelin
d3ac8c3f27 Sync with 2.14.6
* maint-2.14: (28 commits)
  Git 2.14.6
  mingw: handle `subst`-ed "DOS drives"
  mingw: refuse to access paths with trailing spaces or periods
  mingw: refuse to access paths with illegal characters
  unpack-trees: let merged_entry() pass through do_add_entry()'s errors
  quote-stress-test: offer to test quoting arguments for MSYS2 sh
  t6130/t9350: prepare for stringent Win32 path validation
  quote-stress-test: allow skipping some trials
  quote-stress-test: accept arguments to test via the command-line
  tests: add a helper to stress test argument quoting
  mingw: fix quoting of arguments
  Disallow dubiously-nested submodule git directories
  protect_ntfs: turn on NTFS protection by default
  path: also guard `.gitmodules` against NTFS Alternate Data Streams
  is_ntfs_dotgit(): speed it up
  mingw: disallow backslash characters in tree objects' file names
  path: safeguard `.git` against NTFS Alternate Streams Accesses
  clone --recurse-submodules: prevent name squatting on Windows
  is_ntfs_dotgit(): only verify the leading segment
  test-path-utils: offer to run a protectNTFS/protectHFS benchmark
  ...
2019-12-06 16:26:55 +01:00
Junio C Hamano
88cf80949e Merge branch 'mg/submodule-status-from-a-subdirectory'
"git submodule status" that is run from a subdirectory of the
superproject did not work well, which has been corrected.

* mg/submodule-status-from-a-subdirectory:
  submodule: fix 'submodule status' when called from a subdirectory
2019-12-05 12:52:47 -08:00
Junio C Hamano
6b3cb32f43 Merge branch 'nl/reset-patch-takes-a-tree'
"git reset --patch $object" without any pathspec should allow a
tree object to be given, but incorrectly required a committish,
which has been corrected.

* nl/reset-patch-takes-a-tree:
  reset: parse rev as tree-ish in patch mode
2019-12-05 12:52:47 -08:00
Junio C Hamano
36fd304d81 Merge branch 'jk/fail-show-toplevel-outside-working-tree'
"git rev-parse --show-toplevel" run outside of any working tree did
not error out, which has been corrected.

* jk/fail-show-toplevel-outside-working-tree:
  rev-parse: make --show-toplevel without a worktree an error
2019-12-05 12:52:45 -08:00
Junio C Hamano
cf91c31688 Merge branch 'sg/unpack-progress-throughput'
"git unpack-objects" used to show progress based only on the number
of received and unpacked objects, which stalled when it has to
handle an unusually large object.  It now shows the throughput as
well.

* sg/unpack-progress-throughput:
  builtin/unpack-objects.c: show throughput progress
2019-12-05 12:52:45 -08:00
Junio C Hamano
f3c7bfdde2 Merge branch 'dl/range-diff-with-notes'
"git range-diff" learned to take the "--notes=<ref>" and the
"--no-notes" options to control the commit notes included in the
log message that gets compared.

* dl/range-diff-with-notes:
  format-patch: pass notes configuration to range-diff
  range-diff: pass through --notes to `git log`
  range-diff: output `## Notes ##` header
  t3206: range-diff compares logs with commit notes
  t3206: s/expected/expect/
  t3206: disable parameter substitution in heredoc
  t3206: remove spaces after redirect operators
  pretty-options.txt: --notes accepts a ref instead of treeish
  rev-list-options.txt: remove reference to --show-notes
  argv-array: add space after `while`
2019-12-05 12:52:44 -08:00
Junio C Hamano
f7998d9793 Merge branch 'js/builtin-add-i'
The beginning of rewriting "git add -i" in C.

* js/builtin-add-i:
  built-in add -i: implement the `help` command
  built-in add -i: use color in the main loop
  built-in add -i: support `?` (prompt help)
  built-in add -i: show unique prefixes of the commands
  built-in add -i: implement the main loop
  built-in add -i: color the header in the `status` command
  built-in add -i: implement the `status` command
  diff: export diffstat interface
  Start to implement a built-in version of `git add --interactive`
2019-12-05 12:52:43 -08:00
Johannes Schindelin
a8dee3ca61 Disallow dubiously-nested submodule git directories
Currently it is technically possible to let a submodule's git
directory point right into the git dir of a sibling submodule.

Example: the git directories of two submodules with the names `hippo`
and `hippo/hooks` would be `.git/modules/hippo/` and
`.git/modules/hippo/hooks/`, respectively, but the latter is already
intended to house the former's hooks.

In most cases, this is just confusing, but there is also a (quite
contrived) attack vector where Git can be fooled into mistaking remote
content for file contents it wrote itself during a recursive clone.

Let's plug this bug.

To do so, we introduce the new function `validate_submodule_git_dir()`
which simply verifies that no git dir exists for any leading directories
of the submodule name (if there are any).

Note: this patch specifically continues to allow sibling modules names
of the form `core/lib`, `core/doc`, etc, as long as `core` is not a
submodule name.

This fixes CVE-2019-1387.

Reported-by: Nicolas Joly <Nicolas.Joly@microsoft.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2019-12-05 15:36:51 +01:00
Denton Liu
cae0bc09ab rebase: fix format.useAutoBase breakage
With `format.useAutoBase = true`, running rebase resulted in an
error:

	fatal: failed to get upstream, if you want to record base commit automatically,
	please use git branch --set-upstream-to to track a remote branch.
	Or you could specify base commit by --base=<base-commit-id> manually
	error:
	git encountered an error while preparing the patches to replay
	these revisions:

	    ede2467cdedc63784887b587a61c36b7850ebfac..d8f581194799ae29bf5fa72a98cbae98a1198b12

	As a result, git cannot rebase them.

Fix this by always passing `--no-base` to format-patch from rebase so
that the effect of `format.useAutoBase` is negated.

Reported-by: Christian Biesinger <cbiesinger@google.com>
Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-12-05 06:06:18 -08:00
Denton Liu
945dc55dda format-patch: teach --no-base
If `format.useAutoBase = true`, there was no way to override this from
the command-line. Teach the `--no-base` option in format-patch to
override `format.useAutoBase`.

Helped-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-12-05 06:06:18 -08:00
Denton Liu
a749d01e1d format-patch: fix indentation
Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-12-05 06:06:18 -08:00
Alexandr Miloslavskiy
a9aecc7abb checkout, restore: support the --pathspec-from-file option
Decisions taken for simplicity:
1) For now, `--pathspec-from-file` is declared incompatible with
   `--patch`, even when <file> is not `stdin`. Such use case it not
   really expected.
2) It is not allowed to pass pathspec in both args and file.

`you must specify path(s) to restore` block was moved down to be able to
test for `pathspec.nr` instead, because testing for `argc` is no longer
correct.

`git switch` does not support the new options because it doesn't expect
`<pathspec>` arguments.

Signed-off-by: Alexandr Miloslavskiy <alexandr.miloslavskiy@syntevo.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-12-04 10:10:37 -08:00
Alexandr Miloslavskiy
bebb5d6d6b add: support the --pathspec-from-file option
Decisions taken for simplicity:
1) For now, `--pathspec-from-file` is declared incompatible with
   `--interactive/--patch/--edit`, even when <file> is not `stdin`.
   Such use case it not really expected. Also, it would require changes
   to `interactive_add()` and `edit_patch()`.
2) It is not allowed to pass pathspec in both args and file.

Signed-off-by: Alexandr Miloslavskiy <alexandr.miloslavskiy@syntevo.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-12-04 10:10:37 -08:00
Alexandr Miloslavskiy
21bb3083c3 cmd_add: prepare for next patch
Some code blocks were moved down to be able to test for `pathspec.nr`
in the next patch. Blocks are moved as is without any changes. This
is done as separate patch to reduce the amount of diffs in next patch.

Signed-off-by: Alexandr Miloslavskiy <alexandr.miloslavskiy@syntevo.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-12-04 10:10:37 -08:00
Johannes Schindelin
0060fd1511 clone --recurse-submodules: prevent name squatting on Windows
In addition to preventing `.git` from being tracked by Git, on Windows
we also have to prevent `git~1` from being tracked, as the default NTFS
short name (also known as the "8.3 filename") for the file name `.git`
is `git~1`, otherwise it would be possible for malicious repositories to
write directly into the `.git/` directory, e.g. a `post-checkout` hook
that would then be executed _during_ a recursive clone.

When we implemented appropriate protections in 2b4c6efc82 (read-cache:
optionally disallow NTFS .git variants, 2014-12-16), we had analyzed
carefully that the `.git` directory or file would be guaranteed to be
the first directory entry to be written. Otherwise it would be possible
e.g. for a file named `..git` to be assigned the short name `git~1` and
subsequently, the short name generated for `.git` would be `git~2`. Or
`git~3`. Or even `~9999999` (for a detailed explanation of the lengths
we have to go to protect `.gitmodules`, see the commit message of
e7cb0b4455 (is_ntfs_dotgit: match other .git files, 2018-05-11)).

However, by exploiting two issues (that will be addressed in a related
patch series close by), it is currently possible to clone a submodule
into a non-empty directory:

- On Windows, file names cannot end in a space or a period (for
  historical reasons: the period separating the base name from the file
  extension was not actually written to disk, and the base name/file
  extension was space-padded to the full 8/3 characters, respectively).
  Helpfully, when creating a directory under the name, say, `sub.`, that
  trailing period is trimmed automatically and the actual name on disk
  is `sub`.

  This means that while Git thinks that the submodule names `sub` and
  `sub.` are different, they both access `.git/modules/sub/`.

- While the backslash character is a valid file name character on Linux,
  it is not so on Windows. As Git tries to be cross-platform, it
  therefore allows backslash characters in the file names stored in tree
  objects.

  Which means that it is totally possible that a submodule `c` sits next
  to a file `c\..git`, and on Windows, during recursive clone a file
  called `..git` will be written into `c/`, of course _before_ the
  submodule is cloned.

Note that the actual exploit is not quite as simple as having a
submodule `c` next to a file `c\..git`, as we have to make sure that the
directory `.git/modules/b` already exists when the submodule is checked
out, otherwise a different code path is taken in `module_clone()` that
does _not_ allow a non-empty submodule directory to exist already.

Even if we will address both issues nearby (the next commit will
disallow backslash characters in tree entries' file names on Windows,
and another patch will disallow creating directories/files with trailing
spaces or periods), it is a wise idea to defend in depth against this
sort of attack vector: when submodules are cloned recursively, we now
_require_ the directory to be empty, addressing CVE-2019-1349.

Note: the code path we patch is shared with the code path of `git
submodule update --init`, which must not expect, in general, that the
directory is empty. Hence we have to introduce the new option
`--force-init` and hand it all the way down from `git submodule` to the
actual `git submodule--helper` process that performs the initial clone.

Reported-by: Nicolas Joly <Nicolas.Joly@microsoft.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2019-12-04 13:20:05 +01:00
Jonathan Tan
4f3e57ef13 submodule--helper: advise on fatal alternate error
When recursively cloning a superproject with some shallow modules
defined in its .gitmodules, then recloning with "--reference=<path>", an
error occurs. For example:

  git clone --recurse-submodules --branch=master -j8 \
    https://android.googlesource.com/platform/superproject \
    master
  git clone --recurse-submodules --branch=master -j8 \
    https://android.googlesource.com/platform/superproject \
    --reference master master2

fails with:

  fatal: submodule '<snip>' cannot add alternate: reference repository
  '<snip>' is shallow

When a alternate computed from the superproject's alternate cannot be
added, whether in this case or another, advise about configuring the
"submodule.alternateErrorStrategy" configuration option and using
"--reference-if-able" instead of "--reference" when cloning.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Acked-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-12-03 08:49:45 -08:00
Junio C Hamano
723a8adba5 Merge branch 'ds/test-read-graph'
Dev support for commit-graph feature.

* ds/test-read-graph:
  test-tool: use 'read-graph' helper
2019-12-01 09:04:39 -08:00
Junio C Hamano
fce9e836d3 Merge branch 'jt/fetch-remove-lazy-fetch-plugging'
"git fetch" codepath had a big "do not lazily fetch missing objects
when I ask if something exists" switch.  This has been corrected by
marking the "does this thing exist?" calls with "if not please do not
lazily fetch it" flag.

* jt/fetch-remove-lazy-fetch-plugging:
  promisor-remote: remove fetch_if_missing=0
  clone: remove fetch_if_missing=0
  fetch: remove fetch_if_missing=0
2019-12-01 09:04:38 -08:00
Junio C Hamano
3c3e5d0ea2 Merge branch 'tg/stash-refresh-index'
Recent update to "git stash pop" made the command empty the index
when run with the "--quiet" option, which has been corrected.

* tg/stash-refresh-index:
  stash: make sure we have a valid index before writing it
2019-12-01 09:04:37 -08:00
Junio C Hamano
ca5c8aa8e1 Merge branch 'rj/bundle-ui-updates'
"git bundle" has been taught to use the parse options API.  "git
bundle verify" learned "--quiet" and "git bundle create" learned
options to control the progress output.

* rj/bundle-ui-updates:
  bundle-verify: add --quiet
  bundle-create: progress output control
  bundle: framework for options before bundle file
2019-12-01 09:04:36 -08:00
Junio C Hamano
d3096d2ba6 Merge branch 'en/doc-typofix'
Docfix.

* en/doc-typofix:
  Fix spelling errors in no-longer-updated-from-upstream modules
  multimail: fix a few simple spelling errors
  sha1dc: fix trivial comment spelling error
  Fix spelling errors in test commands
  Fix spelling errors in messages shown to users
  Fix spelling errors in names of tests
  Fix spelling errors in comments of testcases
  Fix spelling errors in code comments
  Fix spelling errors in documentation outside of Documentation/
  Documentation: fix a bunch of typos, both old and new
2019-12-01 09:04:35 -08:00
Junio C Hamano
bcb06e204c Merge branch 'js/fetch-multi-lockfix'
Fetching from multiple remotes into the same repository in parallel
had a bad interaction with the recent change to (optionally) update
the commit-graph after a fetch job finishes, as these parallel
fetches compete with each other.  Which has been corrected.

* js/fetch-multi-lockfix:
  fetch: avoid locking issues between fetch.jobs/fetch.writeCommitGraph
  fetch: add the command-line option `--write-commit-graph`
2019-12-01 09:04:33 -08:00
Junio C Hamano
7ab2088255 Merge branch 'rt/fetch-message-fix'
A small message update.

* rt/fetch-message-fix:
  fetch.c: fix typo in a warning message
2019-12-01 09:04:32 -08:00
Junio C Hamano
05fc6471e3 Merge branch 'pb/no-recursive-reset-hard-in-worktree-add'
"git worktree add" internally calls "reset --hard" that should not
descend into submodules, even when submodule.recurse configuration
is set, but it was affected.  This has been corrected.

* pb/no-recursive-reset-hard-in-worktree-add:
  worktree: teach "add" to ignore submodule.recurse config
2019-12-01 09:04:31 -08:00
Junio C Hamano
532d983823 Merge branch 'sg/blame-indent-heuristics-is-now-the-default'
Message update.

* sg/blame-indent-heuristics-is-now-the-default:
  builtin/blame.c: remove '--indent-heuristic' from usage string
2019-12-01 09:04:30 -08:00
Junio C Hamano
dfc03e48ec Merge branch 'mr/clone-dir-exists-to-path-exists'
Code cleanup.

* mr/clone-dir-exists-to-path-exists:
  clone: rename static function `dir_exists()`.
2019-12-01 09:04:30 -08:00
Junio C Hamano
0e07c1cd83 Merge branch 'jk/cleanup-object-parsing-and-fsck'
Crufty code and logic accumulated over time around the object
parsing and low-level object access used in "git fsck" have been
cleaned up.

* jk/cleanup-object-parsing-and-fsck: (23 commits)
  fsck: accept an oid instead of a "struct tree" for fsck_tree()
  fsck: accept an oid instead of a "struct commit" for fsck_commit()
  fsck: accept an oid instead of a "struct tag" for fsck_tag()
  fsck: rename vague "oid" local variables
  fsck: don't require an object struct in verify_headers()
  fsck: don't require an object struct for fsck_ident()
  fsck: drop blob struct from fsck_finish()
  fsck: accept an oid instead of a "struct blob" for fsck_blob()
  fsck: don't require an object struct for report()
  fsck: only require an oid for skiplist functions
  fsck: only provide oid/type in fsck_error callback
  fsck: don't require object structs for display functions
  fsck: use oids rather than objects for object_name API
  fsck_describe_object(): build on our get_object_name() primitive
  fsck: unify object-name code
  fsck: require an actual buffer for non-blobs
  fsck: stop checking tag->tagged
  fsck: stop checking commit->parent counts
  fsck: stop checking commit->tree value
  commit, tag: don't set parsed bit for parse failures
  ...
2019-12-01 09:04:28 -08:00
Hans Jerry Illikainen
72b006f4bf gpg-interface: prefer check_signature() for GPG verification
This commit refactors the use of verify_signed_buffer() outside of
gpg-interface.c to use check_signature() instead.  It also turns
verify_signed_buffer() into a file-local function since it's now only
invoked internally by check_signature().

There were previously two globally scoped functions used in different
parts of Git to perform GPG signature verification:
verify_signed_buffer() and check_signature().  Now only
check_signature() is used.

The verify_signed_buffer() function doesn't guard against duplicate
signatures as described by Michał Górny [1].  Instead it only ensures a
non-erroneous exit code from GPG and the presence of at least one
GOODSIG status field.  This stands in contrast with check_signature()
that returns an error if more than one signature is encountered.

The lower degree of verification makes the use of verify_signed_buffer()
problematic if callers don't parse and validate the various parts of the
GPG status message themselves.  And processing these messages seems like
a task that should be reserved to gpg-interface.c with the function
check_signature().

Furthermore, the use of verify_signed_buffer() makes it difficult to
introduce new functionality that relies on the content of the GPG status
lines.

Now all operations that does signature verification share a single entry
point to gpg-interface.c.  This makes it easier to propagate changed or
additional functionality in GPG signature verification to all parts of
Git, without having odd edge-cases that don't perform the same degree of
verification.

[1] https://dev.gentoo.org/~mgorny/articles/attack-on-git-signature-verification.html

Signed-off-by: Hans Jerry Illikainen <hji@dyntopia.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-11-30 13:52:35 -08:00
René Scharfe
2059e79c0d name-rev: use skip_prefix() instead of starts_with()
Let skip_prefix() advance refname to get rid of two magic numbers.

Signed-off-by: René Scharfe <l.s.r@web.de>
Acked-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-11-27 11:21:18 +09:00
René Scharfe
1768aaf01d push: use skip_prefix() instead of starts_with()
Get rid of a magic number by using skip_prefix().

Signed-off-by: René Scharfe <l.s.r@web.de>
Acked-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-11-27 11:18:39 +09:00
René Scharfe
7e412e8a34 fmt-merge-msg: use skip_prefix() instead of starts_with()
Get rid of two magic numbers by using skip_prefix().

Signed-off-by: René Scharfe <l.s.r@web.de>
Acked-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-11-27 11:17:30 +09:00
René Scharfe
a6293f5d28 fetch: use skip_prefix() instead of starts_with()
Get rid of magic numbers by letting skip_prefix() set the pointer
"what".

Signed-off-by: René Scharfe <l.s.r@web.de>
Acked-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-11-27 11:17:22 +09:00
Junio C Hamano
d82dfa7f5b rebase -i: finishing touches to --reset-author-date
Clarify the way the `--reset-author-date` option is described,
and mark its usage string translatable.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-11-25 15:09:29 +09:00
Manish Goregaokar
1f3aea22c7 submodule: fix 'submodule status' when called from a subdirectory
When calling `git submodule status` while in a subdirectory, we are
incorrectly not detecting modified submodules and
thus reporting that all of the submodules are unchanged.

This is because the submodule helper is calling `diff-index` with the
submodule path assuming the path is relative to the current prefix
directory, however the submodule path used is actually relative to the root.

Always pass NULL as the `prefix` when running diff-files on the
submodule, to make sure the submodule's path is interpreted as relative
to the superproject's repository root.

Signed-off-by: Manish Goregaokar <manishsmail@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-11-25 14:08:25 +09:00
Alban Gruin
a2dd67f105 rebase: fill `squash_onto' in get_replay_opts()
When sequencer_continue() is called by complete_action(), `opts' has
been filled by get_replay_opts().  Currently, it does not initialise the
`squash_onto' field (used by the `--root' mode), only
read_populate_opts() does.  It’s not a problem yet since
sequencer_continue() calls it before pick_commits(), but it would lead
to incorrect results once complete_action() is modified to call
pick_commits() directly.

Let’s change that.

Signed-off-by: Alban Gruin <alban.gruin@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-11-25 12:24:49 +09:00
Nika Layzell
0a8e3036a3 reset: parse rev as tree-ish in patch mode
Since 2f328c3d ("reset $sha1 $pathspec: require $sha1 only to be
treeish", 2013-01-14), we allowed "git reset $object -- $path" to reset
individual paths that match the pathspec to take the blob from a tree
object, not necessarily a commit, while the form to reset the tip of the
current branch to some other commit still must be given a commit.

Like resetting with paths, "git reset --patch" does not update HEAD, and
need not require a commit. The path-filtered form, "git reset --patch
$object -- $pathspec", has accepted a tree-ish since 2f328c3d.

"git reset --patch" is documented as accepting a <tree-ish> since
bf44142f ("reset: update documentation to require only tree-ish with
paths", 2013-01-16). Documentation changes are not required.

Loosen the restriction that requires a commit for the unfiltered "git
reset --patch $object".

Signed-off-by: Nika Layzell <nika@thelayzells.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-11-25 11:01:22 +09:00
Derrick Stolee
cff4e9138d sparse-checkout: check for dirty status
The index-merge performed by 'git sparse-checkout' will erase any staged
changes, which can lead to data loss. Prevent these attempts by requiring
a clean 'git status' output.

Helped-by: Szeder Gábor <szeder.dev@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-11-22 16:11:45 +09:00
Derrick Stolee
416adc8711 sparse-checkout: update working directory in-process for 'init'
The 'git sparse-checkout init' subcommand previously wrote directly
to the sparse-checkout file and then updated the working directory.
This may fail if there are modified files not included in the initial
pattern set. However, that left a populated sparse-checkout file.

Use the in-process working directory update to guarantee that the
init subcommand only changes the sparse-checkout file if the working
directory update succeeds.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-11-22 16:11:45 +09:00
Derrick Stolee
fb10ca5b54 sparse-checkout: write using lockfile
If two 'git sparse-checkout set' subcommands are launched at the
same time, the behavior can be unexpected as they compete to write
the sparse-checkout file and update the working directory.

Take a lockfile around the writes to the sparse-checkout file. In
addition, acquire this lock around the working directory update
to avoid two commands updating the working directory in different
ways.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-11-22 16:11:45 +09:00
Derrick Stolee
99dfa6f970 sparse-checkout: use in-process update for disable subcommand
The 'git sparse-checkout disable' subcommand returns a user to a
full working directory. The old process for doing this required
updating the sparse-checkout file with the "/*" pattern and then
updating the working directory with core.sparseCheckout enabled.
Finally, the sparse-checkout file could be removed and the config
setting disabled.

However, it is valuable to keep a user's sparse-checkout file
intact so they can re-enable the sparse-checkout they previously
used with 'git sparse-checkout init'. This is now possible with
the in-process mechanism for updating the working directory.

Reported-by: Szeder Gábor <szeder.dev@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-11-22 16:11:45 +09:00
Derrick Stolee
e091228e17 sparse-checkout: update working directory in-process
The sparse-checkout builtin used 'git read-tree -mu HEAD' to update the
skip-worktree bits in the index and to update the working directory.
This extra process is overly complex, and prone to failure. It also
requires that we write our changes to the sparse-checkout file before
trying to update the index.

Remove this extra process call by creating a direct call to
unpack_trees() in the same way 'git read-tree -mu HEAD' does. In
addition, provide an in-memory list of patterns so we can avoid
reading from the sparse-checkout file. This allows us to test a
proposed change to the file before writing to it.

An earlier version of this patch included a bug when the 'set' command
failed due to the "Sparse checkout leaves no entry on working directory"
error. It would not rollback the index.lock file, so the replay of the
old sparse-checkout specification would fail. A test in t1091 now
covers that scenario.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-11-22 16:11:44 +09:00
Derrick Stolee
e9de487aa3 sparse-checkout: sanitize for nested folders
If a user provides folders A/ and A/B/ for inclusion in a cone-mode
sparse-checkout file, the parsing logic will notice that A/ appears
both as a "parent" type pattern and as a "recursive" type pattern.
This is unexpected and hence will complain via a warning and revert
to the old logic for checking sparse-checkout patterns.

Prevent this from happening accidentally by sanitizing the folders
for this type of inclusion in the 'git sparse-checkout' builtin.
This happens in two ways:

1. Do not include any parent patterns that also appear as recursive
   patterns.

2. Do not include any recursive patterns deeper than other recursive
   patterns.

In order to minimize duplicate code for scanning parents, create
hashmap_contains_parent() method. It takes a strbuf buffer to
avoid reallocating a buffer when calling in a tight loop.

Helped-by: Eric Wong <e@80x24.org>
Helped-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-11-22 16:11:44 +09:00
Derrick Stolee
af09ce24a9 sparse-checkout: init and set in cone mode
To make the cone pattern set easy to use, update the behavior of
'git sparse-checkout (init|set)'.

Add '--cone' flag to 'git sparse-checkout init' to set the config
option 'core.sparseCheckoutCone=true'.

When running 'git sparse-checkout set' in cone mode, a user only
needs to supply a list of recursive folder matches. Git will
automatically add the necessary parent matches for the leading
directories.

When testing 'git sparse-checkout set' in cone mode, check the
error stream to ensure we do not see any errors. Specifically,
we want to avoid the warning that the patterns do not match
the cone-mode patterns.

Helped-by: Eric Wong <e@80x24.org>
Helped-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-11-22 16:11:44 +09:00
Derrick Stolee
72918c1ad9 sparse-checkout: create 'disable' subcommand
The instructions for disabling a sparse-checkout to a full
working directory are complicated and non-intuitive. Add a
subcommand, 'git sparse-checkout disable', to perform those
steps for the user.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-11-22 16:11:44 +09:00
Derrick Stolee
7bffca95ea sparse-checkout: add '--stdin' option to set subcommand
The 'git sparse-checkout set' subcommand takes a list of patterns
and places them in the sparse-checkout file. Then, it updates the
working directory to match those patterns. For a large list of
patterns, the command-line call can get very cumbersome.

Add a '--stdin' option to instead read patterns over standard in.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-11-22 16:11:44 +09:00
Derrick Stolee
f6039a9423 sparse-checkout: 'set' subcommand
The 'git sparse-checkout set' subcommand takes a list of patterns
as arguments and writes them to the sparse-checkout file. Then, it
updates the working directory using 'git read-tree -mu HEAD'.

The 'set' subcommand will replace the entire contents of the
sparse-checkout file. The write_patterns_and_update() method is
extracted from cmd_sparse_checkout() to make it easier to implement
'add' and/or 'remove' subcommands in the future.

If the core.sparseCheckout config setting is disabled, then enable
the config setting in the worktree config. If we set the config
this way and the sparse-checkout fails, then re-disable the config
setting.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-11-22 16:11:43 +09:00
Derrick Stolee
d89f09c828 clone: add --sparse mode
When someone wants to clone a large repository, but plans to work
using a sparse-checkout file, they either need to do a full
checkout first and then reduce the patterns they included, or
clone with --no-checkout, set up their patterns, and then run
a checkout manually. This requires knowing a lot about the repo
shape and how sparse-checkout works.

Add a new '--sparse' option to 'git clone' that initializes the
sparse-checkout file to include the following patterns:

	/*
	!/*/

These patterns include every file in the root directory, but
no directories. This allows a repo to include files like a
README or a bootstrapping script to grow enlistments from that
point.

During the 'git sparse-checkout init' call, we must first look
to see if HEAD is valid, since 'git clone' does not have a valid
HEAD at the point where it initializes the sparse-checkout. The
following checkout within the clone command will create the HEAD
ref and update the working directory correctly.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-11-22 16:11:43 +09:00
Derrick Stolee
bab3c35908 sparse-checkout: create 'init' subcommand
Getting started with a sparse-checkout file can be daunting. Help
users start their sparse enlistment using 'git sparse-checkout init'.
This will set 'core.sparseCheckout=true' in their config, write
an initial set of patterns to the sparse-checkout file, and update
their working directory.

Make sure to use the `extensions.worktreeConfig` setting and write
the sparse checkout config to the worktree-specific config file.
This avoids confusing interactions with other worktrees.

The use of running another process for 'git read-tree' is sub-
optimal. This will be removed in a later change.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-11-22 16:11:43 +09:00
Derrick Stolee
94c0956b60 sparse-checkout: create builtin with 'list' subcommand
The sparse-checkout feature is mostly hidden to users, as its
only documentation is supplementary information in the docs for
'git read-tree'. In addition, users need to know how to edit the
.git/info/sparse-checkout file with the right patterns, then run
the appropriate 'git read-tree -mu HEAD' command. Keeping the
working directory in sync with the sparse-checkout file requires
care.

Begin an effort to make the sparse-checkout feature a porcelain
feature by creating a new 'git sparse-checkout' builtin. This
builtin will be the preferred mechanism for manipulating the
sparse-checkout file and syncing the working directory.

The documentation provided is adapted from the "git read-tree"
documentation with a few edits for clarity in the new context.
Extra sections are added to hint toward a future change to
a more restricted pattern set.

Helped-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-11-22 16:11:43 +09:00
Denton Liu
5b583e6a09 format-patch: pass notes configuration to range-diff
Since format-patch accepts `--[no-]notes`, one would expect the
range-diff generated to also respect the setting. Unfortunately, the
range-diff we currently generate only uses the default option (which
always outputs default notes, even when notes are not being used
elsewhere).

Pass the notes configuration to range-diff so that it can honor it.

Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-11-21 09:29:52 +09:00