Commit Graph

16879 Commits

Author SHA1 Message Date
Chris Torek
9b906af657 git-mv: improve error message for conflicted file
'git mv' has always complained about renaming a conflicted
file, as it cannot handle multiple index entries for one file.
However, the error message it uses has been the same as the
one for an untracked file:

    fatal: not under version control, src=...

which is patently wrong.  Distinguish the two cases and
add a test to make sure we produce the correct message.

Signed-off-by: Chris Torek <chris.torek@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-07-20 14:35:43 -07:00
Martin Ågren
cada7308ad dir: check pathspecs before returning path_excluded
In 95c11ecc73 ("Fix error-prone fill_directory() API; make it only
return matches", 2020-04-01), we taught `fill_directory()`, or more
specifically `treat_path()`, to check against any pathspecs so that we
could simplify the callers.

But in doing so, we added a slightly-too-early return for the "excluded"
case. We end up not checking the pathspecs, meaning we return
`path_excluded` when maybe we should return `path_none`. As a result,
`git status --ignored -- pathspec` might show paths that don't actually
match "pathspec".

Move the "excluded" check down to after we've checked any pathspecs.

Reported-by: Andreas Schwab <schwab@linux-m68k.org>
Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-07-20 13:25:07 -07:00
Junio C Hamano
ae46588be0 Merge branch 'dl/branch-cleanup' into master
Last minute fix-up to tests for portability.

* dl/branch-cleanup:
  t3200: don't grep for `strerror()` string
2020-07-18 16:35:22 -07:00
Martin Ågren
d223e85407 t3200: don't grep for strerror() string
In 6b7093064a ("t3200: test for specific errors", 2020-06-15), we
learned to grep stderr to ensure that the failing `git branch`
invocations fail for the right reason. In two of these tests, we grep
for "File exists", expecting the string to show up there since config.c
calls `error_errno()`, which ends up including `strerror(errno)` in the
error message.

But as we saw in 4605a73073 ("t1091: don't grep for `strerror()`
string", 2020-03-08), there exists at least one implementation where
`strerror()` yields a slightly different string than the one we're
grepping for. In particular, these tests fail on the NonStop platform.

Similar to 4605a73073, grep for the beginning of the string instead to
avoid relying on `strerror()` behavior.

Reported-by: Randall S. Becker <rsbecker@nexbridge.com>
Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-07-18 13:47:05 -07:00
Junio C Hamano
d13b7f2198 Merge branch 'jn/v0-with-extensions-fix' into master
In 2.28-rc0, we corrected a bug that some repository extensions are
honored by mistake even in a version 0 repositories (these
configuration variables in extensions.* namespace were supposed to
have special meaning in repositories whose version numbers are 1 or
higher), but this was a bit too big a change.

* jn/v0-with-extensions-fix:
  repository: allow repository format upgrade with extensions
  Revert "check_repository_format_gently(): refuse extensions for old repositories"
2020-07-16 17:58:42 -07:00
Han-Wen Nienhuys
0b7de6c683 t1400: use git rev-parse for testing PSEUDOREF existence
This will allow these tests to run with alternative ref backends

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-07-16 14:19:03 -07:00
Jonathan Tan
77aa0941ce upload-pack: do not lazy-fetch "have" objects
When upload-pack receives a request containing "have" hashes, it (among
other things) checks if the served repository has the corresponding
objects. However, it does not do so with the
OBJECT_INFO_SKIP_FETCH_OBJECT flag, so if serving a partial clone, a
lazy fetch will be triggered first.

This was discovered at $DAYJOB when a user fetched from a partial clone
(into another partial clone - although this would also happen if the
repo to be fetched into is not a partial clone).

Therefore, whenever "have" hashes are checked for existence, pass the
OBJECT_INFO_SKIP_FETCH_OBJECT flag. Also add the OBJECT_INFO_QUICK flag
to improve performance, as it is typical that such objects do not exist
in the serving repo, and the consequences of a false negative are minor
(usually, a slightly larger pack sent).

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-07-16 14:07:19 -07:00
Christian Couder
b6839fda68 ref-filter: add support for %(contents:size)
It's useful and efficient to be able to get the size of the
contents directly without having to pipe through `wc -c`.

Also the result of the following:

`git for-each-ref --format='%(contents)' refs/heads/my-branch | wc -c`

is off by one as `git for-each-ref` appends a newline character
after the contents, which can be seen by comparing its output
with the output from `git cat-file`.

As with %(contents), %(contents:size) is silently ignored, if a
ref points to something other than a commit or a tag:

```
$ git update-ref refs/mytrees/first HEAD^{tree}
$ git for-each-ref --format='%(contents)' refs/mytrees/first

$ git for-each-ref --format='%(contents:size)' refs/mytrees/first

```

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-07-16 10:46:55 -07:00
Jeff King
ec91ffca04 verify_repository_format(): complain about new extensions in v0 repo
We made the mistake in the past of respecting extensions.* even when the
repository format version was set to 0. This is bad because forgetting
to bump the repository version means that older versions of Git (which
do not know about our extensions) won't complain. I.e., it's not a
problem in itself, but it means your repository is in a state which does
not give you the protection you think you're getting from older
versions.

For compatibility reasons, we are stuck with that decision for existing
extensions. However, we'd prefer not to extend the damage further. We
can do that by catching any newly-added extensions and complaining about
the repository format.

Note that this is a pretty heavy hammer: we'll refuse to work with the
repository at all. A lesser option would be to ignore (possibly with a
warning) any new extensions. But because of the way the extensions are
handled, that puts the burden on each new extension that is added to
remember to "undo" itself (because they are handled before we know
for sure whether we are in a v1 repo or not, since we don't insist on a
particular ordering of config entries).

So one option would be to rewrite that handling to record any new
extensions (and their values) during the config parse, and then only
after proceed to handle new ones only if we're in a v1 repository. But
I'm not sure if it's worth the trouble:

  - ignoring extensions is likely to end up with broken results anyway
    (e.g., ignoring a proposed objectformat extension means parsing any
    object data is likely to encounter errors)

  - this is a sign that whatever tool wrote the extension field is
    broken. We may be better off notifying immediately and forcefully so
    that such tools don't even appear to work accidentally.

The only downside is that fixing the situation is a little tricky,
because programs like "git config" won't want to work with the
repository. But:

  git config --file=.git/config core.repositoryformatversion 1

should still suffice.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-07-16 10:39:45 -07:00
Jonathan Nieder
62f2eca606 repository: allow repository format upgrade with extensions
Now that we officially permit repository extensions in repository
format v0, permit upgrading a repository with extensions from v0 to v1
as well.

For example, this means a repository where the user has set
"extensions.preciousObjects" can use "git fetch --filter=blob:none
origin" to upgrade the repository to use v1 and the partial clone
extension.

To avoid mistakes, continue to forbid repository format upgrades in v0
repositories with an unrecognized extension.  This way, a v0 user
using a misspelled extension field gets a chance to correct the
mistake before updating to the less forgiving v1 format.

While we're here, make the error message for failure to upgrade the
repository format a bit shorter, and present it as an error, not a
warning.

Reported-by: Huan Huan Chen <huanhuanchen@google.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-07-16 09:36:39 -07:00
Jonathan Nieder
11664196ac Revert "check_repository_format_gently(): refuse extensions for old repositories"
This reverts commit 14c7fa269e.

The core.repositoryFormatVersion field was introduced in ab9cb76f66
(Repository format version check., 2005-11-25), providing a welcome
bit of forward compatibility, thanks to some welcome analysis by
Martin Atukunda.  The semantics are simple: a repository with
core.repositoryFormatVersion set to 0 should be comprehensible by all
Git implementations in active use; and Git implementations should
error out early instead of trying to act on Git repositories with
higher core.repositoryFormatVersion values representing new formats
that they do not understand.

A new repository format did not need to be defined until 00a09d57eb
(introduce "extensions" form of core.repositoryformatversion,
2015-06-23).  This provided a finer-grained extension mechanism for
Git repositories.  In a repository with core.repositoryFormatVersion
set to 1, Git implementations can act on "extensions.*" settings that
modify how a repository is interpreted.  In repository format version
1, unrecognized extensions settings cause Git to error out.

What happens if a user sets an extension setting but forgets to
increase the repository format version to 1?  The extension settings
were still recognized in that case; worse, unrecognized extensions
settings do *not* cause Git to error out.  So combining repository
format version 0 with extensions settings produces in some sense the
worst of both worlds.

To improve that situation, since 14c7fa269e
(check_repository_format_gently(): refuse extensions for old
repositories, 2020-06-05) Git instead ignores extensions in v0 mode.
This way, v0 repositories get the historical (pre-2015) behavior and
maintain compatibility with Git implementations that do not know about
the v1 format.  Unfortunately, users had been using this sort of
configuration and this behavior change came to many as a surprise:

- users of "git config --worktree" that had followed its advice
  to enable extensions.worktreeConfig (without also increasing the
  repository format version) would find their worktree configuration
  no longer taking effect

- tools such as copybara[*] that had set extensions.partialClone in
  existing repositories (without also increasing the repository format
  version) would find that setting no longer taking effect

The behavior introduced in 14c7fa269e might be a good behavior if we
were traveling back in time to 2015, but we're far too late.  For some
reason I thought that it was what had been originally implemented and
that it had regressed.  Apologies for not doing my research when
14c7fa269e was under development.

Let's return to the behavior we've had since 2015: always act on
extensions.* settings, regardless of repository format version.  While
we're here, include some tests to describe the effect on the "upgrade
repository version" code path.

[*] ca76c0b1e1

Reported-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-07-16 09:36:37 -07:00
Jeff King
180a4d76ac t9100: stop depending on commit timestamps
An earlier "fix" to this script gave up updating it not to rely on
the current time because we cannot control what timestamp subversion
gives its commits.  We however could solve the issue in a different
way and still use deterministic timestamps on Git commits.

One fix would be to sort the list of trees before removing duplicates,
but that loses information:

  - we do care that the fetched history is in the same order

  - there's a tree which appears twice in the history, and we'd want to
    make sure that it's there both times

So instead, let's de-duplicate using a hash (preserving the order), and
drop only lines with identical trees and subjects (preserving the tree
which appears twice, since it has different subjects each time).

Signed-off-by: Jeff King <peff@peff.net>
Acked-by: Eric Wong <e@80x24.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-07-15 08:02:58 -07:00
Jeff King
f2e3937d94 test-lib: set deterministic default author/committer date
We always set the name and email for committer and author idents to make
the test suite more deterministic, but not timestamps. Many scripts use
test_tick to get consistent and sensibly incrementing timestamps as they
create commits. But other scripts don't particularly care about the
timestamp, and are happy to use whatever the current system time is.

This non-determinism can be annoying:

  - when debugging a test, comparing results between two runs can be
    difficult, because the commit ids change

  - this can sometimes cause tests to be racy. E.g., traversal order
    depends on timestamp order. Even in a well-ordered set of commands,
    because our timestamp granularity is one second, two commits might
    sometimes have the same timestamp and sometimes differ.

Let's set a default timestamp for all scripts to use. Any that use
test_tick already will be unaffected (because their first test_tick call
will overwrite our default), but it will make things a bit more
deterministic for those that don't.

We should be able to choose any time we want here. I picked this one
because:

  - it differs from the initial test_tick default, which may make it
    easier to distinguish when debugging tests. I picked "April 1st
    13:14:15" in the hope that it might stand out.

  - it's slightly before the test_tick default. Some tests create some
    commits before the first call to test_tick, so using an older
    timestamps for those makes sense chronologically. Note that this
    isn't how things currently work (where system times are usually more
    recent than test_tick), but that also allows us to flush out a few
    hidden timestamp dependencies (like the one recently fixed in
    t5539).

  - we could likewise pick any timezone we want. Choosing +0000 would
    have required fixing up fewer tests, but we're more likely to turn
    up interesting cases by not matching $TZ exactly. And since
    test_tick already checks "-0700", let's try something in the "+"
    zone range for variety.

It's possible that the non-deterministic times could help flush out bugs
(e.g., if something broke when the clock flipped over to 2021, our test
suite would let us know). But historically that hasn't been the case;
all time-dependent outcomes we've seen turned out to be accidentally
flaky tests (which we fixed by using test_tick). If we do want to cover
handling the current time, we should dedicate one script to doing so,
and have it unset GIT_COMMITTER_DATE explicitly.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-07-14 14:28:11 -07:00
Jeff King
96ac26fd05 t9100: explicitly unset GIT_COMMITTER_DATE
The early part of t9100 creates an unusual "doubled" history in the
"git-svn" ref. When we get to t9100.17, it looks like this:

  $ git log --oneline --graph git-svn
  [...]
  *   efd0303 detect node change from file to directory #2
  |\
  * | 3e727c0 detect node change from file to directory #2
  |/
  *   3b00468 try a deep --rmdir with a commit
  |\
  * | b4832d8 try a deep --rmdir with a commit
  |/
  * f0d7bd5 import for git svn

Each commit we make with "git commit" is paired with one from "git svn
set-tree", with the latter as a merge of the first and its grandparent.

Later, t9100.17 wants to check that "git svn fetch" gets the same trees.
And it does, but just one copy of each. So it uses rev-list to get the
tree of each commit and pipes it to "uniq" to drop the duplicates. Our
input isn't sorted, but it will find adjacent duplicates. This works
reliably because the order of commits from rev-list always shows the
duplicates next to each other. For any one of those merges, we could
choose to show its duplicate or the grandparent first. But barring
clocks running backwards, the duplicate will always have a time equal to
or greater than the grandparent. Even if equal, we break ties by showing
the first-parent first, so the duplicates remain adjacent.

But this would break if the timestamps stopped moving in chronological
order. Normally we would rely on test_tick for this, but we have _two_
sources of time here:

  - "git commit" creates one commit based on GIT_COMMITTER_DATE (which
    respects test_tick)

  - the "svn set-tree" one is based on subversion, which does not have
    an easy way to specify a timestamp

So using test_tick actually breaks the test, because now the duplicates
are far in the past, and we'll show the grandparent before the
duplicate. And likewise, a proposed change to set GIT_COMMITTER_DATE in
all scripts will break it.

We _could_ fix this by sorting before removing duplicates, but
presumably it's a useful part of the test to make sure the trees appear
in the same order in both spots. Likewise, we could use something like:

  perl -ne 'print unless $seen{$_}++'

to remove duplicates without impacting the order. But that doesn't work
either, because there are actually multiple (non-duplicate) commits with
the same trees (we change a file mode and then change it back). So we'd
actually have to de-duplicate the combination of subject and tree. Which
then further throws off t9100.18, which compares the tree hashes
exactly; we'd have to strip the result back down.

Since this test _isn't_ buggy, the simplest thing is to just work around
the proposed change by documenting our expectation that git-created
commits are correctly interleaved using the current time.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-07-14 14:27:56 -07:00
Han-Wen Nienhuys
ce57d85645 t3432: use git-reflog to inspect the reflog for HEAD
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-07-10 13:53:37 -07:00
Christian Couder
6e2ef8eb06 t6300: test refs pointing to tree and blob
Adding tests for refs pointing to tree and blob shows that
we care about testing both positive ("see, my shiny new toy
does work") and negative ("and it won't do nonsensical
things when given an input it is not designed to work with")
cases.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-07-10 13:15:44 -07:00
Jeff King
f4eec0ba84 t5539: make timestamp requirements more explicit
The test for "no shallow lines after receiving ACK ready" is very
sensitive to the timestamps of the commits we create. It's looking for
the fetch negotiation to send a "ready", which in turn depends on the
order in which we traverse commits during the negotiation.

It works reliably now because the base commit "7" is created without
test_commit, and thus gets a commit time matching the current system
clock. Whereas the new commits created in this test do use test_commit,
and get the usual test_tick time from 2005. So the fetch into the
"clone" repository results in a commit graph like this (I omitted some
of the "unrelated" commits for clarity; they're all just a sequence of
test_ticks):

  $ git log --graph --format='%ct %s %d'
  * 1112912953 new  (origin/master, origin/HEAD)
  * 1594322236 7  (grafted, master)
  * 1112912893 unrelated15  (origin/unrelated15, unrelated15)
  [...]
  * 1112912053 unrelated1  (origin/unrelated1, unrelated1)
  * 1112911993 new-too  (HEAD -> newnew, tag: new-too)

The important things to see are:

  - "7" is way in the future compared to the other commits

  - "new-too" in the fetching repo is older than "new" (and its
    "unrelated" ancestors) in the shallow repo

If we change our "setup shallow clone" step to use test_tick, too (and
get rid of the dependency on the system clock), then the test will fail.
The resulting graph looks like this:

  $ git log --graph --format='%ct %s %d'
  * 1112913373 new  (origin/master, origin/HEAD)
  * 1112912353 7  (grafted, master)
  * 1112913313 unrelated15  (origin/unrelated15, unrelated15)
  [...]
  * 1112912473 unrelated1  (origin/unrelated1, unrelated1)
  * 1112912413 new-too  (HEAD -> newnew, tag: new-too)

Our "new-too" is still older than "new" and "unrelated", but now "7" is
older than all of them (because it advanced test_tick, which the other
tests built on top of). In the original, we advertised "7" as the first
"have" before anything else, but now "new-too" is more recent. You'd see
the same thing in the unlikely event that the system clock was set
before our test_tick default in 2005.

Let's make the timing requirements more explicit. The important thing is
that the client advertise all of its shared commits first, before
presenting its unique "new-too" commit. We can do that and get rid of
the system clock dependency at the same time by creating all of the
shared commits around time X (using test_tick), and then creating
"new-too" with some time long before X. The resulting graph looks like
this:

  $ git log --graph --format='%ct %s %d'
  * 1500001380 new  (origin/master, origin/HEAD)
  * 1500000420 7  (grafted, master)
  * 1500001320 unrelated15  (origin/unrelated15, unrelated15)
  [...]
  * 1500000480 unrelated1  (origin/unrelated1, unrelated1)
  * 1400000060 new-too  (HEAD -> newnew, tag: new-too)

That also lets us get rid of the hacky test_tick added by f0e802ca20
(t5539: update a flaky test, 2014-07-14). That was clearly dancing
around the same problem, but only addressed the relationship between
commits created in the two subshells (which did use test_tick, but
overlapped because increments of test_tick in subshells are lost). Now
that we're using consistent and well-placed times for both lines of
history, we don't have to care about a one-tick difference between the
two sides.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-07-10 11:56:01 -07:00
Jeff King
fccf41e35a t9700: loosen ident timezone regex
A few of the perl tests in t9700 ask for the author and committer ident,
and then make sure we get something sensible. For the timestamp portion,
we just match [0-9]+, because the actual value will depend on when the
test is run. However, we do require that the timezone be "+0000". This
works reliably because we set $TZ in test-lib.sh. But in preparation for
changing the default timezone, let's be a bit more flexible. We don't
actually care about the exact value here, just that we were able to get
a sensible output from the perl module's access methods.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-07-10 11:56:01 -07:00
Ben Wijen
dfaa209a79 git clone: don't clone into non-empty directory
When using git clone with --separate-git-dir realgitdir and
realgitdir already exists, it's content is destroyed.

So, make sure we don't clone into an existing non-empty directory.

When d45420c1 (clone: do not clean up directories we didn't create,
2018-01-02) tightened the clean-up procedure after a failed cloning
into an empty directory, it assumed that the existing directory
given is an empty one so it is OK to keep that directory, while
running the clean-up procedure that is designed to remove everything
in it (since there won't be any, anyway).  Check and make sure that
the $GIT_DIR is empty even cloning into an existing repository.

Signed-off-by: Ben Wijen <ben@wijen.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-07-10 11:43:29 -07:00
Junio C Hamano
24ecfdf206 Merge branch 'tb/fix-persistent-shallow' into master
When "fetch.writeCommitGraph" configuration is set in a shallow
repository and a fetch moves the shallow boundary, we wrote out
broken commit-graph files that do not match the reality, which has
been corrected.

* tb/fix-persistent-shallow:
  commit.c: don't persist substituted parents when unshallowing
2020-07-09 14:00:44 -07:00
Junio C Hamano
20d451c4da Merge branch 'rs/line-log-until' into master
"git log -Lx,y:path --before=date" lost track of where the range
should be because it didn't take the changes made by the youngest
commits that are omitted from the output into account.

* rs/line-log-until:
  revision: disable min_age optimization with line-log
2020-07-09 14:00:42 -07:00
Junio C Hamano
b7ebe8f047 Merge branch 'ra/send-email-in-reply-to-from-command-line-wins' into master
"git send-email --in-reply-to=<msg>" did not use the In-Reply-To:
header with the value given from the command line, and let it be
overridden by the value on In-Reply-To: header in the messages
being sent out (if exists).

* ra/send-email-in-reply-to-from-command-line-wins:
  send-email: restore --in-reply-to superseding behavior
2020-07-09 14:00:42 -07:00
Taylor Blau
ce16364e89 commit.c: don't persist substituted parents when unshallowing
Since 37b9dcabfc (shallow.c: use '{commit,rollback}_shallow_file',
2020-04-22), Git knows how to reset stat-validity checks for the
$GIT_DIR/shallow file, allowing it to change between a shallow and
non-shallow state in the same process (e.g., in the case of 'git fetch
--unshallow').

However, when $GIT_DIR/shallow changes, Git does not alter or remove any
grafts (nor substituted parents) in memory.

This comes up in a "git fetch --unshallow" with fetch.writeCommitGraph
set to true. Ordinarily in a shallow repository (and before 37b9dcabfc,
even in this case), commit_graph_compatible() would return false,
indicating that the repository should not be used to write a
commit-graphs (since commit-graph files cannot represent a shallow
history). But since 37b9dcabfc, in an --unshallow operation that check
succeeds.

Thus even though the repository isn't shallow any longer (that is, we
have all of the objects), the in-core representation of those objects
still has munged parents at the shallow boundaries.  When the
commit-graph write proceeds, we use the incorrect parentage, producing
wrong results.

There are two ways for a user to work around this: either (1) set
'fetch.writeCommitGraph' to 'false', or (2) drop the commit-graph after
unshallowing.

One way to fix this would be to reset the parsed object pool entirely
(flushing the cache and thus preventing subsequent reads from modifying
their parents) after unshallowing. That would produce a problem when
callers have a now-stale reference to the old pool, and so this patch
implements a different approach. Instead, attach a new bit to the pool,
'substituted_parent', which indicates if the repository *ever* stored a
commit which had its parents modified (i.e., the shallow boundary
prior to unshallowing).

This bit needs to be sticky because all reads subsequent to modifying a
commit's parents are unreliable when unshallowing. Modify the check in
'commit_graph_compatible' to take this bit into account, and correctly
avoid generating commit-graphs in this case, thus solving the bug.

Helped-by: Derrick Stolee <dstolee@microsoft.com>
Helped-by: Jonathan Nieder <jrnieder@gmail.com>
Reported-by: Jay Conrod <jayconrod@google.com>
Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-07-08 16:13:46 -07:00
Jeff King
f421e029ae t6000: use test_tick consistently
The first two commits created in t6000 are done without test_tick,
meaning they use the current system clock. After that, we create one
with test_tick, which means it uses a deterministic time in the past.

The result of the "symleft flag bit is propagated down from tag" test
relies on the output order of commits from git-log, which in turn
depends on these timestamps. So this test is technically dependent on
the system clock time, though in practice it would only matter if your
system clock was set before test_tick's default time (which is in 2005).

However, let's use test_tick consistently for those early commits (and
update the expected output to match). This makes the test deterministic,
which is in turn easier to reason about and debug.

Note that there's also a fourth commit here, and it does not use
test_tick. It does have a deterministic timestamp because of the prior
use of test_tick in the script, but it will always be the same time as
the third commit. Let's use test_tick here, too, for consistency.  The
matching timestamps between the third and fourth commit are not an
important part of the test.

We could also use test_commit in all of these cases, as it runs
test_tick under the hood. But it would be awkward to do so, as these
tests diverge from the usual test_commit patterns (e.g., by creating
multiple files in a single commit).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-07-07 16:18:53 -07:00
Denton Liu
6a67c75948 test-lib-functions: restrict test_must_fail usage
In previous commits, we removed the usage of test_must_fail() for most
commands except for a set of pre-approved commands. Since that's done,
only allow test_must_fail() to run those pre-approved commands.

Obviously, we should allow `git`.

We allow `__git*` as some completion functions return an error code that
comes from a git invocation. It's good to avoid using test_must_fail
unnecessarily but it wouldn't hurt to err on the side of caution when
we're potentially wrapping a git command (like in these cases).

We also allow `test-tool` and `test-svn-fe` because these are helper
commands that are written by us and we want to catch their failure.

Finally, we allow `test_terminal` because `test_terminal` just wraps
around git commands. Also, we cannot rewrite
`test_must_fail test_terminal` as `test_terminal test_must_fail` because
test_must_fail() is a shell function and as a result, it cannot be
invoked from the test-terminal Perl script.

We opted to explicitly list the above tools instead of using a catch-all
such as `test[-_]*` because we want to be as restrictive as possible so
that in the future, someone would not accidentally introduce an
unrelated usage of test_must_fail() on an "unapproved" command.

Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-07-07 15:47:16 -07:00
Denton Liu
41feac6f74 t9400: don't use test_must_fail with cvs
We are using `test_must_fail cvs` to test that the cvs command fails as
expected. However, test_must_fail() is used to ensure that commands fail
in an expected way, not due to something like a segv. Since we are not
in the business of verifying the sanity of the external world, replace
`test_must_fail cvs` with `! cvs` and assume that the cvs command does
not die unexpectedly.

Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-07-07 15:46:35 -07:00
Denton Liu
6e7b0ea864 t9834: remove use of test_might_fail p4
The test_must_fail() family of functions (including test_might_fail())
should only be used on git commands. Replace test_might_fail() with
a compound command wrapping the old p4 invocation that always returns 0.

Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-07-07 13:07:27 -07:00
Denton Liu
c96050ff34 t7107: don't use test_must_fail()
We had a `test_must_fail verify_expect`. However, the git command in
verify_expect() was not expected to fail; the test_cmp() was the failing
command. Be more precise about testing failure by accepting an optional
first argument of '!' which causes the result of the file comparison to
be negated.

Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-07-07 13:07:27 -07:00
Denton Liu
6861ac806b t5324: reorder run_with_limited_open_files test_might_fail
In the future, we plan on only allowing `test_might_fail` to work on a
restricted subset of commands, including `git`. Reorder the commands so
that `run_with_limited_open_files` comes before `test_might_fail`. This
way, `test_might_fail` operates on a git command.

Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-07-07 13:07:27 -07:00
Denton Liu
4d9e7c153d t3701: stop using env in force_color()
In a future patch, we plan on making the test_must_fail()-family of
functions accept only git commands. Even though force_color() wraps an
invocation of `env git`, test_must_fail() will not be able to figure
this out since it will assume that force_color() is just some random
function which is disallowed.

Instead of using `env` in force_color() (which does not support shell
functions), export the environment variables in a subshell. Write the
invocation as `force_color test_must_fail git ...` since shell functions
are now supported.

Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-07-07 13:07:26 -07:00
Junio C Hamano
efafdca421 Merge branch 'dl/test-must-fail-fixes-5'
The effort to avoid using test_must_fail on non-git command continues.

* dl/test-must-fail-fixes-5:
  lib-submodule-update: pass 'test_must_fail' as an argument
  lib-submodule-update: prepend "git" to $command
  lib-submodule-update: consolidate --recurse-submodules
  lib-submodule-update: add space after function name
2020-07-06 22:09:18 -07:00
Junio C Hamano
0a23331aa6 Merge branch 'jk/fast-export-anonym-alt'
"git fast-export --anonymize" learned to take customized mapping to
allow its users to tweak its output more usable for debugging.

* jk/fast-export-anonym-alt:
  fast-export: use local array to store anonymized oid
  fast-export: anonymize "master" refname
  fast-export: allow seeding the anonymized mapping
  fast-export: add a "data" callback parameter to anonymize_str()
  fast-export: move global "idents" anonymize hashmap into function
  fast-export: use a flex array to store anonymized entries
  fast-export: stop storing lengths in anonymized hashmaps
  fast-export: tighten anonymize_mem() interface to handle only strings
  fast-export: store anonymized oids as hex strings
  fast-export: use xmemdupz() for anonymizing oids
  t9351: derive anonymized tree checks from original repo
2020-07-06 22:09:17 -07:00
Junio C Hamano
0ac0947b14 Merge branch 'js/diff-files-i-t-a-fix-for-difftool'
"git difftool" has trouble dealing with paths added to the index
with the intent-to-add bit.

* js/diff-files-i-t-a-fix-for-difftool:
  difftool -d: ensure that intent-to-add files are handled correctly
  diff-files --raw: show correct post-image of intent-to-add files
2020-07-06 22:09:17 -07:00
Junio C Hamano
11cbda2add Merge branch 'js/default-branch-name'
The name of the primary branch in existing repositories, and the
default name used for the first branch in newly created
repositories, is made configurable, so that we can eventually wean
ourselves off of the hardcoded 'master'.

* js/default-branch-name:
  contrib: subtree: adjust test to change in fmt-merge-msg
  testsvn: respect `init.defaultBranch`
  remote: use the configured default branch name when appropriate
  clone: use configured default branch name when appropriate
  init: allow setting the default for the initial branch name via the config
  init: allow specifying the initial branch name for the new repository
  docs: add missing diamond brackets
  submodule: fall back to remote's HEAD for missing remote.<name>.branch
  send-pack/transport-helper: avoid mentioning a particular branch
  fmt-merge-msg: stop treating `master` specially
2020-07-06 22:09:17 -07:00
Junio C Hamano
67d99b82de Merge branch 'bc/http-push-flagsfix'
The code to push changes over "dumb" HTTP had a bad interaction
with the commit reachability code due to incorrect allocation of
object flag bits, which has been corrected.

* bc/http-push-flagsfix:
  http-push: ensure unforced pushes fail when data would be lost
2020-07-06 22:09:17 -07:00
Junio C Hamano
8a78e4d615 Merge branch 'js/pu-to-seen'
The documentation and some tests have been adjusted for the recent
renaming of "pu" branch to "seen".

* js/pu-to-seen:
  tests: reference `seen` wherever `pu` was referenced
  docs: adjust the technical overview for the rename `pu` -> `seen`
  docs: adjust for the recent rename of `pu` to `seen`
2020-07-06 22:09:16 -07:00
Junio C Hamano
0258ed1e08 Merge branch 'cb/is-descendant-of'
Code clean-up.

* cb/is-descendant-of:
  commit-reach: avoid is_descendant_of() shim
2020-07-06 22:09:16 -07:00
Junio C Hamano
645f63111b Merge branch 'es/get-worktrees-unsort'
API cleanup for get_worktrees()

* es/get-worktrees-unsort:
  worktree: drop get_worktrees() unused 'flags' argument
  worktree: drop get_worktrees() special-purpose sorting option
2020-07-06 22:09:15 -07:00
Junio C Hamano
e7e113a1df Merge branch 'bc/sha-256-cvs-svn-updates'
CVS/SVN interface have been prepared for SHA-256 transition

* bc/sha-256-cvs-svn-updates:
  git-cvsexportcommit: port to SHA-256
  git-cvsimport: port to SHA-256
  git-cvsserver: port to SHA-256
  git-svn: set the OID length based on hash algorithm
  perl: make SVN code hash independent
  perl: make Git::IndexInfo work with SHA-256
  perl: create and switch variables for hash constants
  t/lib-git-svn: make hash size independent
  t9101: make hash independent
  t9104: make hash size independent
  t9100: make test work with SHA-256
  t9108: make test hash independent
  t9168: make test hash independent
  t9109: make test hash independent
2020-07-06 22:09:14 -07:00
Junio C Hamano
d80bea479d Merge branch 'ak/commit-graph-to-slab'
A few fields in "struct commit" that do not have to always be
present have been moved to commit slabs.

* ak/commit-graph-to-slab:
  commit-graph: minimize commit_graph_data_slab access
  commit: move members graph_pos, generation to a slab
  commit-graph: introduce commit_graph_data_slab
  object: drop parsed_object_pool->commit_count
2020-07-06 22:09:14 -07:00
Junio C Hamano
33a22c1a88 Merge branch 'ps/ref-transaction-hook'
A new hook.

* ps/ref-transaction-hook:
  refs: implement reference transaction hook
2020-07-06 22:09:13 -07:00
Junio C Hamano
12210859da Merge branch 'bc/sha-256-part-2'
SHA-256 migration work continues.

* bc/sha-256-part-2: (44 commits)
  remote-testgit: adapt for object-format
  bundle: detect hash algorithm when reading refs
  t5300: pass --object-format to git index-pack
  t5704: send object-format capability with SHA-256
  t5703: use object-format serve option
  t5702: offer an object-format capability in the test
  t/helper: initialize the repository for test-sha1-array
  remote-curl: avoid truncating refs with ls-remote
  t1050: pass algorithm to index-pack when outside repo
  builtin/index-pack: add option to specify hash algorithm
  remote-curl: detect algorithm for dumb HTTP by size
  builtin/ls-remote: initialize repository based on fetch
  t5500: make hash independent
  serve: advertise object-format capability for protocol v2
  connect: parse v2 refs with correct hash algorithm
  connect: pass full packet reader when parsing v2 refs
  Documentation/technical: document object-format for protocol v2
  t1302: expect repo format version 1 for SHA-256
  builtin/show-index: provide options to determine hash algo
  t5302: modernize test formatting
  ...
2020-07-06 22:09:13 -07:00
Han-Wen Nienhuys
9e35a6a986 lib-t6000.sh: write tag using git-update-ref
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-07-06 21:38:32 -07:00
René Scharfe
01faa91cb7 revision: disable min_age optimization with line-log
If one of the options --before, --min-age or --until is given,
limit_list() filters out younger commits early on.  Line-log needs all
those commits to trace the movement of line ranges, though.  Skip this
optimization if both are used together.

Reported-by: Мария Долгополова <dolgopolovamariia@gmail.com>
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-07-06 18:38:03 -07:00
Johannes Schindelin
3080c50980 difftool -d: ensure that intent-to-add files are handled correctly
In https://github.com/git-for-windows/git/issues/2677, a `git difftool
-d` problem was reported. The underlying cause was a bug in `git
diff-files --raw` that we just fixed: it reported intent-to-add files
with the empty _tree_ as the post-image OID, when we need to show
an all-zero (or, "null") OID instead, to indicate to the caller that
they have to look at the worktree file.

The symptom of that problem shown by `git difftool` was this:

	error: unable to read sha1 file of <path> (<empty-tree-OID>)
	error: could not write '<filename>'

Make sure that the reported `difftool` problem stays fixed.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-07-01 16:15:45 -07:00
Johannes Schindelin
85953a3187 diff-files --raw: show correct post-image of intent-to-add files
The documented behavior of `git diff-files --raw` is to display

	[...] 0{40} if creation, unmerged or "look at work tree".

on the right hand (i.e. postimage) side. This happens for files that
have unstaged modifications, and for files that are unmodified but
stat-dirty.

For intent-to-add files, we used to show the empty blob's hash instead.
In c26022ea8f (diff: convert diff_addremove to struct object_id,
2017-05-30), we made that worse by inadvertently changing that to the
hash of the empty tree.

Let's make the behavior consistent with files that have unstaged
modifications (which applies to intent-to-add files, too) by showing
all-zero values also for intent-to-add files.

Accordingly, this patch adjusts the expectations set by the regression
test introduced in feea6946a5 (diff-files: treat "i-t-a" files as
"not-in-index", 2020-06-20).

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-07-01 16:15:43 -07:00
Rafael Aquini
f9f60d7066 send-email: restore --in-reply-to superseding behavior
git send-email --in-reply-to= fails to override In-Reply-To email headers,
if they're present in the output of format-patch, even when explicitly
told to do so by the option --no-thread, which breaks the contract of the
command line switch option, per its man page.

"
   --in-reply-to=<identifier>
       Make the first mail (or all the mails with --no-thread) appear as
       a reply to the given Message-Id, which avoids breaking threads to
       provide a new patch series.
"

This patch fixes the aformentioned issue, by bringing --in-reply-to's old
overriding behavior back.

The test was donated by Carlo Marcelo Arenas Belón.

Signed-off-by: Rafael Aquini <aquini@redhat.com>
Helped-by: Carlo Marcelo Arenas Belón <carenas@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-07-01 16:12:21 -07:00
SZEDER Gábor
c525ce95b4 commit-graph: check all leading directories in changed path Bloom filters
The file 'dir/subdir/file' can only be modified if its leading
directories 'dir' and 'dir/subdir' are modified as well.

So when checking modified path Bloom filters looking for commits
modifying a path with multiple path components, then check not only
the full path in the Bloom filters, but all its leading directories as
well.  Take care to check these paths in "deepest first" order,
because it's the full path that is least likely to be modified, and
the Bloom filter queries can short circuit sooner.

This can significantly reduce the average false positive rate, by
about an order of magnitude or three(!), and can further speed up
pathspec-limited revision walks.  The table below compares the average
false positive rate and runtime of

  git rev-list HEAD -- "$path"

before and after this change for 5000+ randomly* selected paths from
each repository:

                    Average false           Average        Average
                    positive rate           runtime        runtime
                  before     after     before     after   difference
  ------------------------------------------------------------------
  git             3.220%   0.7853%     0.0558s   0.0387s   -30.6%
  linux           2.453%   0.0296%     0.1046s   0.0766s   -26.8%
  tensorflow      2.536%   0.6977%     0.0594s   0.0420s   -29.2%

*Path selection was done with the following pipeline:

	git ls-tree -r --name-only HEAD | sort -R | head -n 5000

The improvements in runtime are much smaller than the improvements in
average false positive rate, as we are clearly reaching diminishing
returns here.  However, all these timings depend on that accessing
tree objects is reasonably fast (warm caches).  If we had a partial
clone and the tree objects had to be fetched from a promisor remote,
e.g.:

  $ git clone --filter=tree:0 --bare file://.../webkit.git webkit.notrees.git
  $ git -C webkit.git -c core.modifiedPathBloomFilters=1 \
        commit-graph write --reachable
  $ cp webkit.git/objects/info/commit-graph webkit.notrees.git/objects/info/
  $ git -C webkit.notrees.git -c core.modifiedPathBloomFilters=1 \
        rev-list HEAD -- "$path"

then checking all leading path component can reduce the runtime from
over an hour to a few seconds (and this is with the clone and the
promisor on the same machine).

This adjusts the tracing values in t4216-log-bloom.sh, which provides a
concrete way to notice the improvement.

Helped-by: Taylor Blau <me@ttaylorr.com>
Helped-by: René Scharfe <l.s.r@web.de>
Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-07-01 14:17:43 -07:00
Taylor Blau
f3c2a36810 revision: empty pathspecs should not use Bloom filters
The prepare_to_use_bloom_filter() method was not intended to be called
on an empty pathspec. However, 'git log -- .' and 'git log' are subtly
different: the latter reports all commits while the former will simplify
commits that do not change the root tree.

This means that the path used to construct the bloom_key might be empty,
and that value is not added to the Bloom filter during construction.
That means that the results are likely incorrect!

To resolve the issue, be careful about the length of the path and stop
filling Bloom filters. To be completely sure we do not use them, drop
the pointer to the bloom_filter_settings from the commit-graph. That
allows our test to look at the trace2 logs to verify no Bloom filter
statistics are reported.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-07-01 14:17:43 -07:00
Derrick Stolee
0087a87ba8 commit-graph: persist existence of changed-paths
The changed-path Bloom filters were released in v2.27.0, but have a
significant drawback. A user can opt-in to writing the changed-path
filters using the "--changed-paths" option to "git commit-graph write"
but the next write will drop the filters unless that option is
specified.

This becomes even more important when considering the interaction with
gc.writeCommitGraph (on by default) or fetch.writeCommitGraph (part of
features.experimental). These config options trigger commit-graph writes
that the user did not signal, and hence there is no --changed-paths
option available.

Allow a user that opts-in to the changed-path filters to persist the
property of "my commit-graph has changed-path filters" automatically. A
user can drop filters using the --no-changed-paths option.

In the process, we need to be extremely careful to match the Bloom
filter settings as specified by the commit-graph. This will allow future
versions of Git to customize these settings, and the version with this
change will persist those settings as commit-graphs are rewritten on
top.

Use the trace2 API to signal the settings used during the write, and
check that output in a test after manually adjusting the correct bytes
in the commit-graph file.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-07-01 14:17:43 -07:00
Derrick Stolee
949197420e bloom: fix logic in get_bloom_filter()
The get_bloom_filter() method is a bit complicated in some parts where
it does not need to be. In particular, it needs to return a NULL filter
only when compute_if_not_present is zero AND the filter data cannot be
loaded from a commit-graph file. This currently happens by accident
because the commit-graph does not load changed-path Bloom filters from
an existing commit-graph when writing a new one. This will change in a
later patch.

Also clean up some style issues while we are here.

One side-effect of returning a NULL filter is that the filters that are
reported as "too large" will now be reported as NULL insead of length
zero. This case was not properly covered before, so add a test. Further,
remote the counting of the zero-length filters from revision.c and the
trace2 logs.

Helped-by: René Scharfe <l.s.r@web.de>
Helped-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-07-01 14:17:43 -07:00
Junio C Hamano
298d704e70 Merge branch 'sk/diff-files-show-i-t-a-as-new'
"git diff-files" has been taught to say paths that are marked as
intent-to-add are new files, not modified from an empty blob.

* sk/diff-files-show-i-t-a-as-new:
  diff-files: treat "i-t-a" files as "not-in-index"
2020-06-29 14:17:27 -07:00
Junio C Hamano
1033b98291 Merge branch 'xl/upgrade-repo-format'
Allow runtime upgrade of the repository format version, which needs
to be done carefully.

There is a rather unpleasant backward compatibility worry with the
last step of this series, but it is the right thing to do in the
longer term.

* xl/upgrade-repo-format:
  check_repository_format_gently(): refuse extensions for old repositories
  sparse-checkout: upgrade repository to version 1 when enabling extension
  fetch: allow adding a filter after initial clone
  repository: add a helper function to perform repository format upgrade
2020-06-29 14:17:24 -07:00
Jeff King
8a49495583 fast-export: anonymize "master" refname
Running "fast-export --anonymize" will leave "refs/heads/master"
untouched in the output, for two reasons:

  - it helped to have some known reference point between the original
    and anonymized repository

  - since it's historically the default branch name, it doesn't leak any
    information

Now that we can ask fast-export to retain particular tokens, we have a
much better tool for the first one (because it works for any ref, not
just master).

For the second, the notion of "default branch name" is likely to become
configurable soon, at which point the name _does_ leak information.
Let's drop this special case in preparation.

Note that we have to adjust the test a bit, since it relied on using the
name "master" in the anonymized repos. We could just use
--anonymize-map=master to keep the same output, but then we wouldn't
know if it works because of our hard-coded master or because of the
explicit map.

So let's flip the test a bit, and confirm that we anonymize "master",
but keep "other" in the output.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-25 14:19:23 -07:00
Jeff King
65b5d9fae7 fast-export: allow seeding the anonymized mapping
After you anonymize a repository, it can be hard to find which commits
correspond between the original and the result, and thus hard to
reproduce commands that triggered bugs in the original.

Let's make it possible to seed the anonymization map. This lets users
either:

  - mark names to be retained as-is, if they don't consider them secret
    (in which case their original commands would just work)

  - map names to new values, which lets them adapt the reproduction
    recipe to the new names without revealing the originals

The implementation is fairly straight-forward. We already store each
anonymized token in a hashmap (so that the same token appearing twice is
converted to the same result). We can just introduce a new "seed"
hashmap which is consulted first.

This does make a few more promises to the user about how we'll anonymize
things (e.g., token-splitting pathnames). But it's unlikely that we'd
want to change those rules, even if the actual anonymization of a single
token changes. And it makes things much easier for the user, who can
unblind only a directory name without having to specify each path within
it.

One alternative to this approach would be to anonymize as we see fit,
and then dump the whole refname and pathname mappings to a file. This
does work, but it's a bit awkward to use (you have to manually dig the
items you care about out of the mapping).

Helped-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-25 14:19:23 -07:00
Junio C Hamano
f33b5bddaf Merge branch 'pb/t4014-unslave'
A branch name used in a test has been clarified to match what is
going on.

* pb/t4014-unslave:
  t4014: do not use "slave branch" nomenclature
2020-06-25 12:27:48 -07:00
Junio C Hamano
34e849b05a Merge branch 'jt/cdn-offload'
The "fetch/clone" protocol has been updated to allow the server to
instruct the clients to grab pre-packaged packfile(s) in addition
to the packed object data coming over the wire.

* jt/cdn-offload:
  upload-pack: fix a sparse '0 as NULL pointer' warning
  upload-pack: send part of packfile response as uri
  fetch-pack: support more than one pack lockfile
  upload-pack: refactor reading of pack-objects out
  Documentation: add Packfile URIs design doc
  Documentation: order protocol v2 sections
  http-fetch: support fetching packfiles by URL
  http-fetch: refactor into function
  http: refactor finish_http_pack_request()
  http: use --stdin when indexing dumb HTTP pack
2020-06-25 12:27:47 -07:00
Junio C Hamano
7b2685ef2d Merge branch 'dl/branch-cleanup'
Code clean-up around "git branch" with a minor bugfix.

* dl/branch-cleanup:
  branch: don't mix --edit-description
  t3200: test for specific errors
  t3200: rename "expected" to "expect"
2020-06-25 12:27:47 -07:00
Junio C Hamano
1457886ce2 Merge branch 'ct/diff-with-merge-base-clarification'
"git diff" used to take arguments in random and nonsense range
notation, e.g. "git diff A..B C", "git diff A..B C...D", etc.,
which has been cleaned up.

* ct/diff-with-merge-base-clarification:
  Documentation: usage for diff combined commits
  git diff: improve range handling
  t/t3430: avoid undefined git diff behavior
2020-06-25 12:27:46 -07:00
Junio C Hamano
320421840e Merge branch 'jk/complete-git-switch'
The command line completion (in contrib/) learned to complete
options that the "git switch" command takes.

* jk/complete-git-switch:
  completion: improve handling of --orphan option of switch/checkout
  completion: improve handling of -c/-C and -b/-B in switch/checkout
  completion: improve handling of --track in switch/checkout
  completion: improve handling of --detach in checkout
  completion: improve completion for git switch with no options
  completion: improve handling of DWIM mode for switch/checkout
  completion: perform DWIM logic directly in __git_complete_refs
  completion: extract function __git_dwim_remote_heads
  completion: replace overloaded track term for __git_complete_refs
  completion: add tests showing subpar switch/checkout --orphan logic
  completion: add tests showing subpar -c/C argument completion
  completion: add tests showing subpar -c/-C startpoint completion
  completion: add tests showing subpar switch/checkout --track logic
  completion: add tests showing subar checkout --detach logic
  completion: add tests showing subpar DWIM logic for switch/checkout
  completion: add test showing subpar git switch completion
2020-06-25 12:27:45 -07:00
Johannes Schindelin
6dca5dbf93 tests: reference seen wherever pu was referenced
As our test suite partially reflects how we work in the Git project, it
is natural that the branch name `pu` was used in a couple places.

Since that branch was renamed to `seen`, let's use the new name
consistently.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-25 09:18:56 -07:00
Johannes Schindelin
0068f2116e testsvn: respect init.defaultBranch
The default name of the initial branch in new repositories can now be
configured. The `testsvn` remote helper translates the remote Subversion
repository's branch name `trunk` to the hard-coded name `master`.
Clearly, the intention was to make the name align with Git's defaults.

So while we are not talking about a newly-created repository in the
`testsvn` context, it is a newly-created _Git_ repository, si it _still_
makes sense to use the overridden default name for the initial branch
whenever users configured it.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-24 09:14:21 -07:00
Johannes Schindelin
a471214bd6 remote: use the configured default branch name when appropriate
When guessing the default branch name of a remote, and there are no refs
to guess from, we want to go with the preference specified by the user
for the fall-back, i.e. the default name to be used for the initial
branch of new repositories (because as far as the user is concerned, a
remote that has no branches yet is a new repository).

At the same time, when talking to an older Git server that does not
report a symref for `HEAD` (but instead reports a commit hash), let's
try to guess the configured default branch name first. If it does not
match the reported commit hash, let's fall back to `master` as before.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-24 09:14:21 -07:00
Johannes Schindelin
0cc1b475bb clone: use configured default branch name when appropriate
When cloning a repository without any branches, Git chooses a default
branch name for the as-yet unborn branch.

As part of the implicit initialization of the local repository, Git just
learned to respect `init.defaultBranch` to choose a different initial
branch name. We now really want that branch name to be used as a
fall-back.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-24 09:14:21 -07:00
Don Goodman-Wilson
8747ebb7cd init: allow setting the default for the initial branch name via the config
We just introduced the command-line option
`--initial-branch=<branch-name>` to allow initializing a new repository
with a different initial branch than the hard-coded one.

To allow users to override the initial branch name more permanently
(i.e. without having to specify the name manually for each and every
`git init` invocation), let's introduce the `init.defaultBranch` config
setting.

Helped-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Helped-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Don Goodman-Wilson <don@goodman-wilson.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-24 09:14:21 -07:00
Johannes Schindelin
32ba12dab2 init: allow specifying the initial branch name for the new repository
There is a growing number of projects and companies desiring to change
the main branch name of their repositories (see e.g.
https://twitter.com/mislav/status/1270388510684598272 for background on
this).

To change that branch name for new repositories, currently the only way
to do that automatically is by copying all of Git's template directory,
then hard-coding the desired default branch name into the `.git/HEAD`
file, and then configuring `init.templateDir` to point to those copied
template files.

To make this process much less cumbersome, let's introduce a new option:
`--initial-branch=<branch-name>`.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-24 09:14:21 -07:00
Johannes Schindelin
f0a96e8d4c submodule: fall back to remote's HEAD for missing remote.<name>.branch
When `remote.<name>.branch` is not configured, `git submodule update`
currently falls back to using the branch name `master`. A much better
idea, however, is to use the remote `HEAD`: on all Git servers running
reasonably recent Git versions, the symref `HEAD` points to the main
branch.

Note: t7419 demonstrates that there _might_ be use cases out there that
_expect_ `git submodule update --remote` to update submodules to the
remote `master` branch even if the remote `HEAD` points to another
branch. Arguably, this patch makes the behavior more intuitive, but
there is a slight possibility that this might cause regressions in
obscure setups.

Even so, it should be okay to fix this behavior without anything like a
longer transition period:

- The `git submodule update --remote` command is not really common.

- Current Git's behavior when running this command is outright
  confusing, unless the remote repository's current branch _is_ `master`
  (in which case the proposed behavior matches the old behavior).

- If a user encounters a regression due to the changed behavior, the fix
  is actually trivial: setting `submodule.<name>.branch` to `master`
  will reinstate the old behavior.

Helped-by: Philippe Blain <levraiphilippeblain@gmail.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-24 09:14:21 -07:00
Johannes Schindelin
4d04658d8b send-pack/transport-helper: avoid mentioning a particular branch
When trying to push all matching branches, but none match, we offer a
message suggesting to push the `master` branch.

However, we want to step away from making that branch any more special
than any other branch, so let's reword that message to mention no branch
in particular.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-24 09:14:21 -07:00
Denton Liu
5b0ac09fb1 lib-submodule-update: pass 'test_must_fail' as an argument
When we run a test helper function in test_submodule_switch_common(), we
sometimes specify a whole helper function as the $command. When we do
this, in some test cases, we just mark the whole function with
`test_must_fail`. However, it's possible that the helper function might
fail earlier or later than expected due to an introduced bug. If this
happens, then the test case will still report as passing but it should
really be marked as failing since it didn't actually display the
intended behaviour.

Instead of invoking `test_must_fail $command`, pass the string
"test_must_fail" as the second argument in case where the git command is
expected to fail.

When $command is a helper function, the parent function calling
test_submodule_switch_common() is test_submodule_switch_func(). For all
test_submodule_switch_func() invocations, increase the granularity of
the argument test helper function by prefixing the git invocation which is
meant to fail with the second argument like this:

	$2 git checkout "$1"

In the other cases, test_submodule_switch() and
test_submodule_forced_switch(), instead of passing in the git command
directly, wrap it using the git_test_func() and pass the git arguments
using the global variable $gitcmd. Unfortunately, since closures aren't
a thing in shell scripts, the global variable is necessary. Another
unfortunate result is that the "git_test_func" will used as the test
case name when $command is printed but it's worth it for the cleaner
code.

Finally, as an added bonus, `test_must_fail` will now only run on git
commands.

Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-24 08:54:18 -07:00
Jeff King
b897bf5f37 fast-export: use xmemdupz() for anonymizing oids
Our anonymize_mem() function is careful to take a ptr/len pair to allow
storing binary tokens like object ids, as well as partial strings (e.g.,
just "foo" of "foo/bar"). But it duplicates the hash key using
xstrdup()! That means that:

  - for a partial string, we'd store all bytes up to the NUL, even
    though we'd never look at anything past "len". This didn't produce
    wrong behavior, but was wasteful.

  - for a binary oid that doesn't contain a zero byte, we'd copy garbage
    bytes off the end of the array (though as long as nothing complained
    about reading uninitialized bytes, further reads would be limited by
    "len", and we'd produce the correct results)

  - for a binary oid that does contain a zero byte, we'd copy _fewer_
    bytes than intended into the hashmap struct. When we later try to
    look up a value, we'd access uninitialized memory and potentially
    falsely claim that a particular oid is not present.

The most common reason to store an oid is an anonymized gitlink, but our
test case doesn't have any gitlinks at all. So let's add one whose oid
contains a NUL and is present at two different paths. ASan catches the
memory error, but even without it we can detect the bug because the oid
is not anonymized the same way for both paths.

And of course the fix is to copy the correct number of bytes. We don't
technically need the appended NUL from xmemdupz(), but it doesn't hurt
as an extra protection against anybody treating it like a string (plus a
future patch will push us more in that direction).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-23 19:56:26 -07:00
Jeff King
b8c0689bb9 t9351: derive anonymized tree checks from original repo
Our tests of the anonymized repo just hard-code the expected set of
objects in the root and subdirectory trees. This makes them brittle to
the test setup changing (e.g., adding new paths that need tested).

Let's look at the original repo to compute our expected set of objects.
Note that this isn't completely perfect (e.g., we still rely on there
being only one tree in the root), but it does simplify later patches.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-23 19:56:26 -07:00
Johannes Schindelin
489947cee5 fmt-merge-msg: stop treating master specially
In the context of many projects renaming their primary branch names away
from `master`, Git wants to stop treating the `master` branch specially.

Let's start with `git fmt-merge-msg`.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-23 17:22:35 -07:00
Derrick Stolee
7b671f8c2b commit-graph: change test to die on parse, not load
43d3561 (commit-graph write: don't die if the existing graph is corrupt,
2019-03-25) introduced the GIT_TEST_COMMIT_GRAPH_DIE_ON_LOAD environment
variable. This was created to verify that commit-graph was not loaded
when writing a new non-incremental commit-graph.

An upcoming change wants to load a commit-graph in some valuable cases,
but we want to maintain that we don't trust the commit-graph data when
writing our new file. Instead of dying on load, instead die if we ever
try to parse a commit from the commit-graph. This functionally verifies
the same intended behavior, but allows a more advanced feature in the
next change.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-23 17:12:08 -07:00
Carlo Marcelo Arenas Belón
c1ea625f72 commit-reach: avoid is_descendant_of() shim
d91d6fbf26 (commit-reach: create repo_is_descendant_of(), 2020-06-17)
adds a repository aware version of is_descendant_of() and a backward
compatibility shim that is barely used.

Update all callers to directly use the new repo_is_descendant_of()
function instead; making the codebase simpler and pushing more
the_repository references higher up the stack.

Helped-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com>
Reviewed-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-23 16:36:53 -07:00
brian m. carlson
64472d15e9 http-push: ensure unforced pushes fail when data would be lost
When we push using the DAV-based protocol, the client is the one that
performs the ref updates and therefore makes the checks to see whether
an unforced push should be allowed.  We make this check by determining
if either (a) we lack the object file for the old value of the ref or
(b) the new value of the ref is not newer than the old value, and in
either case, reject the push.

However, the ref_newer function, which performs this latter check, has
an odd behavior due to the reuse of certain object flags.  Specifically,
it will incorrectly return false in its first invocation and then
correctly return true on a subsequent invocation.  This occurs because
the object flags used by http-push.c are the same as those used by
commit-reach.c, which implements ref_newer, and one piece of code
misinterprets the flags set by the other.

Note that this does not occur in all cases.  For example, if the example
used in the tests is changed to use one repository instead of two and
rewind the head to add a commit, the test passes and we correctly reject
the push.  However, the example provided does trigger this behavior, and
the code has been broken in this way since at least Git 2.0.0.

To solve this problem, let's move the two sets of object flags so that
they don't overlap, since we're clearly using them at the same time.
The new set should not conflict with other usage because other users are
either builtin code (which is not compiled into git http-push) or
upload-pack (which we similarly do not use here).

Reported-by: Michael Ward <mward@smartsoftwareinc.com>
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-23 15:40:59 -07:00
Junio C Hamano
9740ef888e Merge branch 'es/worktree-duplicate-paths'
The same worktree directory must be registered only once, but
"git worktree move" allowed this invariant to be violated, which
has been corrected.

* es/worktree-duplicate-paths:
  worktree: make "move" refuse to move atop missing registered worktree
  worktree: generalize candidate worktree path validation
  worktree: prune linked worktree referencing main worktree path
  worktree: prune duplicate entries referencing same worktree path
  worktree: make high-level pruning re-usable
  worktree: give "should be pruned?" function more meaningful name
  worktree: factor out repeated string literal
2020-06-22 15:55:03 -07:00
Junio C Hamano
b8a5299594 Merge branch 'jt/redact-all-cookies'
The interface to redact sensitive information in the trace output
has been simplified.

* jt/redact-all-cookies:
  http: redact all cookies, teach GIT_TRACE_REDACT=0
2020-06-22 15:55:02 -07:00
brian m. carlson
148f193d16 t/lib-git-svn: make hash size independent
The record size used in the git svn storage is four bytes plus the
length of the binary hash.  Pass the hash length into our Perl
invocation and use it to compute the size of the records.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Acked-by: Eric Wong <e@80x24.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-22 11:21:07 -07:00
Srinidhi Kaushik
feea6946a5 diff-files: treat "i-t-a" files as "not-in-index"
The `diff-files' command and related commands which call the function
`cmd_diff_files()', consider the "intent-to-add" files as a part of the
index when comparing the work-tree against it. This was previously
addressed in commits [1] and [2] by turning the option
`--ita-invisible-in-index' (introduced in [3]) on by default.

For `diff-files' (and `add -p' as a consequence) to show the i-t-a
files as as new, `ita_invisible_in_index' will be enabled by default
here as well.

[1] 0231ae71d3 (diff: turn --ita-invisible-in-index on by default,
                2018-05-26)
[2] 425a28e0a4 (diff-lib: allow ita entries treated as "not yet exist
                in index", 2016-10-24)
[3] b42b451919 (diff: add --ita-[in]visible-in-index, 2016-10-24)

Signed-off-by: Srinidhi Kaushik <shrinidhi.kaushik@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-22 10:46:45 -07:00
Eric Sunshine
03f2465bb1 worktree: drop get_worktrees() unused 'flags' argument
get_worktrees() accepts a 'flags' argument, however, there are no
existing flags (the lone flag GWT_SORT_LINKED was recently retired) and
no behavior which can be tweaked. Therefore, drop the 'flags' argument.

Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-22 10:31:15 -07:00
brian m. carlson
3e04b6e1b6 t9101: make hash independent
Instead of hard-coding the object ID for our test .gitignore file, let's
compute it.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Acked-by: Eric Wong <e@80x24.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-22 09:52:02 -07:00
brian m. carlson
bbe0616cd8 t9104: make hash size independent
The size of a record in the database used by git svn is four bytes plus
the length of the binary hash.  Instead of hard-coding 24, compute this
value based on the size of the hash in use.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Acked-by: Eric Wong <e@80x24.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-22 09:52:02 -07:00
brian m. carlson
407527ba44 t9100: make test work with SHA-256
Compute the relevant tree objects for SHA-256 and use those when
appropriate instead of using the SHA-1 ones.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Acked-by: Eric Wong <e@80x24.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-22 09:52:02 -07:00
brian m. carlson
606b9749c6 t9108: make test hash independent
Instead of stripping off the first 41 characters of git log output,
let's just strip off the first space-separated component, which will
work for any size hash.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Acked-by: Eric Wong <e@80x24.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-22 09:52:02 -07:00
brian m. carlson
5aa6877540 t9168: make test hash independent
Instead of stripping off the first 41 characters of git log output,
let's just strip off the first space-separated component, which will
work for any size hash.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Acked-by: Eric Wong <e@80x24.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-22 09:52:02 -07:00
brian m. carlson
62814dfd17 t9109: make test hash independent
Instead of stripping off the first 41 characters of git log output,
let's just strip off the first space-separated component, which will
work for any size hash.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Acked-by: Eric Wong <e@80x24.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-22 09:52:02 -07:00
brian m. carlson
3716d50dd5 remote-testgit: adapt for object-format
When using an algorithm other than SHA-1, we need the remote helper to
advertise support for the object-format extension and provide
information back to us so that we can properly parse refs and return
data. Ensure that the test remote helper understands these extensions.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-19 14:04:09 -07:00
brian m. carlson
371c4079f4 t5300: pass --object-format to git index-pack
git index-pack by default reads the repository to determine the object
format. However, when outside of a repository, it's necessary to specify
the hash algorithm in use so that the pack can be properly indexed. Add
an --object-format argument when invoking git index-pack outside of a
repository.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-19 14:04:09 -07:00
brian m. carlson
4ddd3f5063 t5704: send object-format capability with SHA-256
When we speak protocol v2 in this test, we must pass the object-format
header if the algorithm is not SHA-1.  Otherwise, git upload-pack fails
because the hash algorithm doesn't match and not because we've failed to
speak the protocol correctly.  Pass the header so that our assertions
test what we're really interested in.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-19 14:04:09 -07:00
brian m. carlson
f7c6a3bf08 t5703: use object-format serve option
When we're using an algorithm other than SHA-1, we need to specify the
algorithm in use so we don't get a failure with an "unknown format"
message. Add a wrapper function that specifies this header if required.
Skip specifying this header for SHA-1 to test that it works both with an
without this header.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-19 14:04:09 -07:00
brian m. carlson
8fc7003540 t5702: offer an object-format capability in the test
In order to make this test work with SHA-256, offer an object-format
capability so that both sides use the same algorithm.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-19 14:04:09 -07:00
brian m. carlson
54cbbe4c6e t/helper: initialize the repository for test-sha1-array
test-sha1-array uses the_hash_algo under the hood. Since t0064 wants to
use the value that is correct for the hash algorithm that we're testing,
make sure the test helper initializes the repository to set
the_hash_algo correctly.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-19 14:04:08 -07:00
brian m. carlson
793731f742 t1050: pass algorithm to index-pack when outside repo
When outside a repository, git index-pack is unable to guess the hash
algorithm in use for a pack, since packs don't contain any information
on the algorithm in use. Pass an option to index-pack to help it out in
this test.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-19 14:04:08 -07:00
brian m. carlson
ac093d0790 remote-curl: detect algorithm for dumb HTTP by size
When reading the info/refs file for a repository, we have no explicit
way to detect which hash algorithm is in use because the file doesn't
provide one. Detect the hash algorithm in use by the size of the first
object ID.

If we have an empty repository, we don't know what the hash algorithm is
on the remote side, so default to whatever the local side has
configured.  Without doing this, we cannot clone an empty repository
since we don't know its hash algorithm.  Test this case appropriately,
since we currently have no tests for cloning an empty repository with
the dumb HTTP protocol.

We anonymize the URL like elsewhere in the function in case the user has
decided to include a secret in the URL.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-19 14:04:08 -07:00
Patrick Steinhardt
6754159767 refs: implement reference transaction hook
The low-level reference transactions used to update references are
currently completely opaque to the user. While certainly desirable in
most usecases, there are some which might want to hook into the
transaction to observe all queued reference updates as well as observing
the abortion or commit of a prepared transaction.

One such usecase would be to have a set of replicas of a given Git
repository, where we perform Git operations on all of the repositories
at once and expect the outcome to be the same in all of them. While
there exist hooks already for a certain subset of Git commands that
could be used to implement a voting mechanism for this, many others
currently don't have any mechanism for this.

The above scenario is the motivation for the new "reference-transaction"
hook that reaches directly into Git's reference transaction mechanism.
The hook receives as parameter the current state the transaction was
moved to ("prepared", "committed" or "aborted") and gets via its
standard input all queued reference updates. While the exit code gets
ignored in the "committed" and "aborted" states, a non-zero exit code in
the "prepared" state will cause the transaction to be aborted
prematurely.

Given the usecase described above, a voting mechanism can now be
implemented via this hook: as soon as it gets called, it will take all
of stdin and use it to cast a vote to a central service. When all
replicas of the repository agree, the hook will exit with zero,
otherwise it will abort the transaction by returning non-zero. The most
important upside is that this will catch _all_ commands writing
references at once, allowing to implement strong consistency for
reference updates via a single mechanism.

In order to test the impact on the case where we don't have any
"reference-transaction" hook installed in the repository, this commit
introduce two new performance tests for git-update-refs(1). Run against
an empty repository, it produces the following results:

  Test                         origin/master     HEAD
  --------------------------------------------------------------------
  1400.2: update-ref           2.70(2.10+0.71)   2.71(2.10+0.73) +0.4%
  1400.3: update-ref --stdin   0.21(0.09+0.11)   0.21(0.07+0.14) +0.0%

The performance test p1400.2 creates, updates and deletes a branch a
thousand times, thus averaging runtime of git-update-refs over 3000
invocations. p1400.3 instead calls `git-update-refs --stdin` three times
and queues a thousand creations, updates and deletes respectively.

As expected, p1400.3 consistently shows no noticeable impact, as for
each batch of updates there's a single call to access(3P) for the
negative hook lookup. On the other hand, for p1400.2, one can see an
impact caused by this patchset. But doing five runs of the performance
tests where each one was run with GIT_PERF_REPEAT_COUNT=10, the overhead
ranged from -1.5% to +1.1%. These inconsistent performance numbers can
be explained by the overhead of spawning 3000 processes. This shows that
the overhead of assembling the hook path and executing access(3P) once
to check if it's there is mostly outweighed by the operating system's
overhead.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-19 10:46:13 -07:00
Paolo Bonzini
08dc26061f t4014: do not use "slave branch" nomenclature
Git branches have been qualified as topic branches, integration branches,
development branches, feature branches, release branches and so on.
Git has a branch that is the master *for* development, but it is not
the master *of* any "slave branch": Git does not have slave branches,
and has never had, except for a single testcase that claims otherwise. :)

Independent of any future change to the naming of the "master" branch,
removing this sole appearance of the term is a strict improvement: it
avoids divisive language, and talking about "feature branch" clarifies
which developer workflow the test is trying to emulate.

Reported-by: Till Maas <tmaas@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-19 10:26:34 -07:00
Junio C Hamano
653a3514cc Merge branch 'dl/t-readme-spell-git-correctly'
Doc updates.

* dl/t-readme-spell-git-correctly:
  t/README: avoid poor-man's small caps GIT
2020-06-17 21:54:05 -07:00
Junio C Hamano
64efa11e6b Merge branch 'en/do-match-pathspec-fix'
Use of negative pathspec, while collecting paths including
untracked ones in the working tree, was broken.

* en/do-match-pathspec-fix:
  dir: fix treatment of negated pathspecs
2020-06-17 21:54:03 -07:00
Junio C Hamano
a554228ffb Merge branch 'en/sparse-checkout'
The behaviour of "sparse-checkout" in the state "git clone
--no-checkout" left was changed accidentally in 2.27, which has
been corrected.

* en/sparse-checkout:
  sparse-checkout: avoid staging deletions of all files
2020-06-17 21:54:02 -07:00
Junio C Hamano
524caf8035 Merge branch 'js/reflog-anonymize-for-clone-and-fetch'
The reflog entries for "git clone" and "git fetch" did not
anonymize the URL they operated on.

* js/reflog-anonymize-for-clone-and-fetch:
  clone/fetch: anonymize URLs in the reflog
2020-06-17 21:54:01 -07:00
Junio C Hamano
abacefe865 Merge branch 'tb/t5318-cleanup'
Code cleanup.

* tb/t5318-cleanup:
  t5318: test that '--stdin-commits' respects '--[no-]progress'
  t5318: use 'test_must_be_empty'
2020-06-17 21:54:01 -07:00
Abhishek Kumar
6da43d937c object: drop parsed_object_pool->commit_count
14ba97f8 (alloc: allow arbitrary repositories for alloc functions,
2018-05-15) introduced parsed_object_pool->commit_count to keep count of
commits per repository and was used to assign commit->index.

However, commit-slab code requires commit->index values to be unique
and a global count would be correct, rather than a per-repo count.

Let's introduce a static counter variable, `parsed_commits_count` to
keep track of parsed commits so far.

As commit_count has no use anymore, let's also drop it from the struct.

Signed-off-by: Abhishek Kumar <abhishekkumar8222@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-17 14:37:14 -07:00
Denton Liu
6b7093064a t3200: test for specific errors
In the "--set-upstream-to" and "--unset-upstream" tests, specific error
conditions are being tested. However, there is no way of ensuring that a
test case is failing because of some specific error.

Check stderr of failing commands to ensure that they are failing in the
expected way.

Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-17 11:12:32 -07:00
Denton Liu
6d504d5b0f t3200: rename "expected" to "expect"
Clean up style of test by changing some filenames from "expected" to
"expect", which follows typical test convention.

Also, change a space-indent into a tab-indent.

Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-17 11:12:31 -07:00
Junio C Hamano
eebb51ba8c Merge branch 'hn/refs-cleanup'
Preliminary clean-ups around refs API, plus file format
specification documentation for the reftable backend.

* hn/refs-cleanup:
  reftable: define version 2 of the spec to accomodate SHA256
  reftable: clarify how empty tables should be written
  reftable: file format documentation
  refs: improve documentation for ref iterator
  t: use update-ref and show-ref to reading/writing refs
  refs.h: clarify reflog iteration order
2020-06-12 13:57:13 -07:00
Denton Liu
aa06180ac9 lib-submodule-update: prepend "git" to $command
Since all invocations of test_submodule_forced_switch() are git
commands, automatically prepend "git" before invoking
test_submodule_switch_common().

Similarly, many invocations of test_submodule_switch() are also git
commands so automatically prepend "git" before invoking
test_submodule_switch_common() as well.

Finally, for invocations of test_submodule_switch() that invoke a custom
function, rename the old function to test_submodule_switch_func().

This is necessary because in a future commit, we will be adding some
logic that needs to distinguish between an invocation of a plain git
comamnd and an invocation of a test helper function.

Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-12 11:33:40 -07:00
Chris Torek
8bfcb3a690 git diff: improve range handling
When git diff is given a symmetric difference A...B, it chooses
some merge base from the two specified commits (as documented).

This fails, however, if there is *no* merge base: instead, you
see the differences between A and B, which is certainly not what
is expected.

Moreover, if additional revisions are specified on the command
line ("git diff A...B C"), the results get a bit weird:

 * If there is a symmetric difference merge base, this is used
   as the left side of the diff.  The last final ref is used as
   the right side.
 * If there is no merge base, the symmetric status is completely
   lost.  We will produce a combined diff instead.

Similar weirdness occurs if you use, e.g., "git diff C A...B D".
Likewise, using multiple two-dot ranges, or tossing extra
revision specifiers into the command line with two-dot ranges,
or mixing two and three dot ranges, all produce nonsense.

To avoid all this, add a routine to catch the range cases and
verify that that the arguments make sense.  As a side effect,
produce a warning showing *which* merge base is being used when
there are multiple choices; die if there is no merge base.

Signed-off-by: Chris Torek <chris.torek@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-12 10:53:44 -07:00
Jonathan Tan
dd4b732df7 upload-pack: send part of packfile response as uri
Teach upload-pack to send part of its packfile response as URIs.

An administrator may configure a repository with one or more
"uploadpack.blobpackfileuri" lines, each line containing an OID, a pack
hash, and a URI. A client may configure fetch.uriprotocols to be a
comma-separated list of protocols that it is willing to use to fetch
additional packfiles - this list will be sent to the server. Whenever an
object with one of those OIDs would appear in the packfile transmitted
by upload-pack, the server may exclude that object, and instead send the
URI. The client will then download the packs referred to by those URIs
before performing the connectivity check.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-10 18:06:34 -07:00
Jonathan Tan
8d5d2a34df http-fetch: support fetching packfiles by URL
Teach http-fetch the ability to download packfiles directly, given a
URL, and to verify them.

The http_pack_request suite has been augmented with a function that
takes a URL directly. With this function, the hash is only used to
determine the name of the temporary file.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-10 18:06:34 -07:00
Eric Sunshine
810382ed37 worktree: make "move" refuse to move atop missing registered worktree
"git worktree add" takes special care to avoid creating a new worktree
at a location already registered to an existing worktree even if that
worktree is missing (which can happen, for instance, if the worktree
resides on removable media). "git worktree move", however, is not so
careful when validating the destination location and will happily move
the source worktree atop the location of a missing worktree. This leads
to the anomalous situation of multiple worktrees being associated with
the same path, which is expressly forbidden by design. For example:

    $ git clone foo.git
    $ cd foo
    $ git worktree add ../bar
    $ git worktree add ../baz
    $ rm -rf ../bar
    $ git worktree move ../baz ../bar
    $ git worktree list
    .../foo beefd00f [master]
    .../bar beefd00f [bar]
    .../bar beefd00f [baz]
    $ git worktree remove ../bar
    fatal: validation failed, cannot remove working tree:
        '.../bar' does not point back to '.git/worktrees/bar'

Fix this shortcoming by enhancing "git worktree move" to perform the
same additional validation of the destination directory as done by "git
worktree add".

While at it, add a test to verify that "git worktree move" won't move a
worktree atop an existing (non-worktree) path -- a restriction which has
always been in place but was never tested.

Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-10 10:54:49 -07:00
Eric Sunshine
916133ef8e worktree: prune linked worktree referencing main worktree path
"git worktree prune" detects when multiple entries are associated with
the same path and prunes the duplicates, however, it does not detect
when a linked worktree points at the path of the main worktree.
Although "git worktree add" disallows creating a new worktree with the
same path as the main worktree, such a case can arise outside the
control of Git even without the user mucking with .git/worktree/<id>/
administrative files. For instance:

    $ git clone foo.git
    $ git -C foo worktree add ../bar
    $ rm -rf bar
    $ mv foo bar
    $ git -C bar worktree list
    .../bar deadfeeb [master]
    .../bar deadfeeb [bar]

Help the user recover from such corruption by extending "git worktree
prune" to also detect when a linked worktree is associated with the path
of the main worktree.

Reported-by: Jonathan Müller <jonathanmueller.dev@gmail.com>
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-10 10:54:49 -07:00
Eric Sunshine
4a3ce479ce worktree: prune duplicate entries referencing same worktree path
A fundamental restriction of linked working trees is that there must
only ever be a single worktree associated with a particular path, thus
"git worktree add" explicitly disallows creation of a new worktree at
the same location as an existing registered worktree. Nevertheless,
users can still "shoot themselves in the foot" by mucking with
administrative files in .git/worktree/<id>/. Worse, "git worktree move"
is careless[1] and allows a worktree to be moved atop a registered but
missing worktree (which can happen, for instance, if the worktree is on
removable media). For instance:

    $ git clone foo.git
    $ cd foo
    $ git worktree add ../bar
    $ git worktree add ../baz
    $ rm -rf ../bar
    $ git worktree move ../baz ../bar
    $ git worktree list
    .../foo beefd00f [master]
    .../bar beefd00f [bar]
    .../bar beefd00f [baz]

Help users recover from this form of corruption by teaching "git
worktree prune" to detect when multiple worktrees are associated with
the same path.

[1]: A subsequent commit will fix "git worktree move" validation to be
     more strict.

Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-10 10:54:49 -07:00
Chris Torek
bafa2d741e t/t3430: avoid undefined git diff behavior
The autosquash-and-exec test used "git diff HEAD^!" to mean
"git diff HEAD^ HEAD".  Use these directly instead of relying
on the undefined but actual-current behavior of "HEAD^!".

Signed-off-by: Chris Torek <chris.torek@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-09 15:13:56 -07:00
Junio C Hamano
0b925a469e Merge branch 'jt/curl-verbose-on-trace-curl'
Rewrite support for GIT_CURL_VERBOSE in terms of GIT_TRACE_CURL.

Looking good.

* jt/curl-verbose-on-trace-curl:
  http, imap-send: stop using CURLOPT_VERBOSE
  t5551: test that GIT_TRACE_CURL redacts password
2020-06-08 18:06:32 -07:00
Junio C Hamano
63e50b8678 Merge branch 'cb/bisect-helper-parser-fix'
The code to parse "git bisect start" command line was lax in
validating the arguments.

* cb/bisect-helper-parser-fix:
  bisect--helper: avoid segfault with bad syntax in `start --term-*`
2020-06-08 18:06:32 -07:00
Junio C Hamano
2bdf00e66a Merge branch 'js/checkout-p-new-file'
"git checkout -p" did not handle a newly added path at all.

* js/checkout-p-new-file:
  checkout -p: handle new files correctly
2020-06-08 18:06:31 -07:00
Junio C Hamano
b37fd14beb Merge branch 'dl/remote-curl-deadlock-fix'
On-the-wire protocol v2 easily falls into a deadlock between the
remote-curl helper and the fetch-pack process when the server side
prematurely throws an error and disconnects.  The communication has
been updated to make it more robust.

* dl/remote-curl-deadlock-fix:
  stateless-connect: send response end packet
  pkt-line: define PACKET_READ_RESPONSE_END
  remote-curl: error on incomplete packet
  pkt-line: extern packet_length()
  transport: extract common fetch_pack() call
  remote-curl: remove label indentation
  remote-curl: fix typo
2020-06-08 18:06:30 -07:00
Junio C Hamano
ded44afa02 Merge branch 'bc/filter-process'
Code simplification and test coverage enhancement.

* bc/filter-process:
  t2060: add a test for switch with --orphan and --discard-changes
  builtin/checkout: simplify metadata initialization
2020-06-08 18:06:30 -07:00
Junio C Hamano
7e75aeb290 Merge branch 'rs/fsck-duplicate-names-in-trees'
The check in "git fsck" to ensure that the tree objects are sorted
still had corner cases it missed unsorted entries.

* rs/fsck-duplicate-names-in-trees:
  fsck: detect more in-tree d/f conflicts
  t1450: demonstrate undetected in-tree d/f conflict
  t1450: increase test coverage of in-tree d/f detection
  fsck: fix a typo in a comment
2020-06-08 18:06:29 -07:00
Junio C Hamano
dc57a9be5e Merge branch 'tb/commit-graph-no-check-oids'
Clean-up the commit-graph codepath.

* tb/commit-graph-no-check-oids:
  commit-graph: drop COMMIT_GRAPH_WRITE_CHECK_OIDS flag
  t5318: reorder test below 'graph_read_expect'
  commit-graph.c: simplify 'fill_oids_from_commits'
  builtin/commit-graph.c: dereference tags in builtin
  builtin/commit-graph.c: extract 'read_one_commit()'
  commit-graph.c: peel refs in 'add_ref_to_set'
  commit-graph.c: show progress of finding reachable commits
  commit-graph.c: extract 'refs_cb_data'
2020-06-08 18:06:27 -07:00
Junio C Hamano
f4cec40dbd Merge branch 'cb/t4210-illseq-auto-detect'
As FreeBSD is not the only platform whose regexp library reports
a REG_ILLSEQ error when fed invalid UTF-8, add logic to detect that
automatically and skip the affected tests.

* cb/t4210-illseq-auto-detect:
  t4210: detect REG_ILLSEQ dynamically and skip affected tests
  t/helper: teach test-regex to report pattern errors (like REG_ILLSEQ)
2020-06-08 18:06:27 -07:00
Junio C Hamano
c3a02824cf Merge branch 'ds/line-log-on-bloom'
"git log -L..." now takes advantage of the "which paths are touched
by this commit?" info stored in the commit-graph system.

* ds/line-log-on-bloom:
  line-log: integrate with changed-path Bloom filters
  line-log: try to use generation number-based topo-ordering
  line-log: more responsive, incremental 'git log -L'
  t4211-line-log: add tests for parent oids
  line-log: remove unused fields from 'struct line_log_data'
2020-06-08 18:06:26 -07:00
SZEDER Gábor
2ad4f1a7c4 commit-graph: simplify parse_commit_graph() #1
While we iterate over all entries of the Chunk Lookup table we make
sure that we don't attempt to read past the end of the mmap-ed
commit-graph file, and check in each iteration that the chunk ID and
offset we are about to read is still within the mmap-ed memory region.
However, these checks in each iteration are not really necessary,
because the number of chunks in the commit-graph file is already known
before this loop from the just parsed commit-graph header.

So let's check that the commit-graph file is large enough for all
entries in the Chunk Lookup table before we start iterating over those
entries, and drop those per-iteration checks.  While at it, take into
account the size of everything that is necessary to have a valid
commit-graph file, i.e. the size of the header, the size of the
mandatory OID Fanout chunk, and the size of the signature in the
trailer as well.

Note that this necessitates the change of the error message as well,
and, consequently, have to update the 'detect incorrect chunk count'
test in 't5318-commit-graph.sh' as well.

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-08 12:28:49 -07:00
SZEDER Gábor
cb9daf16db commit-graph: fix parsing the Chunk Lookup table
The commit-graph file format specifies that the chunks may be in any
order.  However, if the OID Lookup chunk happens to be the last one in
the file, then any command attempting to access the commit-graph data
will fail with:

  fatal: invalid commit position. commit-graph is likely corrupt

In this case the error is wrong, the commit-graph file does conform to
the specification, but the parsing of the Chunk Lookup table is a bit
buggy, and leaves the field holding the number of commits in the
commit-graph zero-initialized.

The number of commits in the commit-graph is determined while parsing
the Chunk Lookup table, by dividing the size of the OID Lookup chunk
with the hash size.  However, the Chunk Lookup table doesn't actually
store the size of the chunks, but it stores their starting offset.
Consequently, the size of a chunk can only be calculated by
subtracting the starting offsets of that chunk from the offset of the
subsequent chunk, or in case of the last chunk from the offset
recorded in the terminating label.  This is currenly implemented in a
bit complicated way: as we iterate over the entries of the Chunk
Lookup table, we check the ID of each chunk and store its starting
offset, then we check the ID of the last seen chunk and calculate its
size using its previously saved offset if necessary (at the moment
it's only necessary for the OID Lookup chunk).  Alas, while parsing
the Chunk Lookup table we only interate through the "real" chunks, but
never look at the terminating label, thus don't even check whether
it's necessary to calulate the size of the last chunk.  Consequently,
if the OID Lookup chunk is the last one, then we don't calculate its
size and turn don't run the piece of code determining the number of
commits in the commit graph, leaving the field holding that number
unchanged (i.e. zero-initialized), eventually triggering the sanity
check in load_oid_from_graph().

Fix this by iterating through all entries in the Chunk Lookup table,
including the terminating label.

Note that this is the minimal fix, suitable for the maintenance track.
A better fix would be to simplify how the chunk sizes are calculated,
but that is a more invasive change, less suitable for 'maint', so that
will be done in later patches.

This additional flexibility of scanning more chunks breaks a test for
"git commit-graph verify" so alter that test to mutate the commit-graph
to have an even lower chunk count.

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-08 12:28:48 -07:00
SZEDER Gábor
35a9f1e99c tree-walk.c: don't match submodule entries for 'submod/anything'
Submodules should be handled the same as regular directories with
respect to the presence of a trailing slash, i.e. commands like:

  git diff rev1 rev2 -- $path
  git rev-list HEAD -- $path

should produce the same output whether $path is 'submod' or 'submod/'.
This has been fixed in commit 74b4f7f277 (tree-walk.c: ignore trailing
slash on submodule in tree_entry_interesting(), 2014-01-23).

Unfortunately, that commit had the unintended side effect to handle
'submod/anything' the same as 'submod' and 'submod/' as well, e.g.:

  $ git log --oneline --name-only -- sha1collisiondetection/whatever
  4125f78222 sha1dc: update from upstream
  sha1collisiondetection
  07a20f569b Makefile: fix unaligned loads in sha1dc with UBSan
  sha1collisiondetection
  23e37f8e9d sha1dc: update from upstream
  sha1collisiondetection
  86cfd61e6b sha1dc: optionally use sha1collisiondetection as a submodule
  sha1collisiondetection

Fix this by rejecting submodules as partial pathnames when their
trailing slash is followed by anything.

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-08 12:28:48 -07:00
Denton Liu
788db145c7 t/README: avoid poor-man's small caps GIT
In 48a8c26c62 (Documentation: avoid poor-man's small caps GIT,
2013-01-21), the documentation was amended to spell Git's name as Git
when talking about the system as a whole. However, t/README was skipped
over when the treatment was applied.

Bring t/README into conformance with the CodingGuidelines by casing
"Git" properly.

While we're at it, fix a small typo. Change "the git internal" to "the
Git internals".

Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-08 10:32:24 -07:00
Jonathan Tan
827e7d4da4 http: redact all cookies, teach GIT_TRACE_REDACT=0
In trace output (when GIT_TRACE_CURL is true), redact the values of all
HTTP cookies by default. Now that auth headers (since the implementation
of GIT_TRACE_CURL in 74c682d3c6 ("http.c: implement the GIT_TRACE_CURL
environment variable", 2016-05-24)) and cookie values (since this
commit) are redacted by default in these traces, also allow the user to
inhibit these redactions through an environment variable.

Since values of all cookies are now redacted by default,
GIT_REDACT_COOKIES (which previously allowed users to select individual
cookies to redact) now has no effect.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-05 15:05:04 -07:00
Elijah Newren
f1f061e11d dir: fix treatment of negated pathspecs
do_match_pathspec() started life as match_pathspec_depth_1() and for
correctness was only supposed to be called from match_pathspec_depth().
match_pathspec_depth() was later renamed to match_pathspec(), so the
invariant we expect today is that do_match_pathspec() has no direct
callers outside of match_pathspec().

Unfortunately, this intention was lost with the renames of the two
functions, and additional calls to do_match_pathspec() were added in
commits 75a6315f74 ("ls-files: add pathspec matching for submodules",
2016-10-07) and 89a1f4aaf7 ("dir: if our pathspec might match files
under a dir, recurse into it", 2019-09-17).  Of course,
do_match_pathspec() had an important advantge over match_pathspec() --
match_pathspec() would hardcode flags to one of two values, and these
new callers needed to pass some other value for flags.  Also, although
calling do_match_pathspec() directly was incorrect, there likely wasn't
any difference in the observable end output, because the bug just meant
that fill_diretory() would recurse into unneeded directories.  Since
subsequent does-this-path-match checks on individual paths under the
directory would cause those extra paths to be filtered out, the only
difference from using the wrong function was unnecessary computation.

The second of those bad calls to do_match_pathspec() was involved -- via
either direct movement or via copying+editing -- into a number of later
refactors.  See commits 777b420347 ("dir: synchronize
treat_leading_path() and read_directory_recursive()", 2019-12-19),
8d92fb2927 ("dir: replace exponential algorithm with a linear one",
2020-04-01), and 95c11ecc73 ("Fix error-prone fill_directory() API; make
it only return matches", 2020-04-01).  The last of those introduced the
usage of do_match_pathspec() on an individual file, and thus resulted in
individual paths being returned that shouldn't be.

The problem with calling do_match_pathspec() instead of match_pathspec()
is that any negated patterns such as ':!unwanted_path` will be ignored.
Add a new match_pathspec_with_flags() function to fulfill the needs of
specifying special flags while still correctly checking negated
patterns, add a big comment above do_match_pathspec() to prevent others
from misusing it, and correct current callers of do_match_pathspec() to
instead use either match_pathspec() or match_pathspec_with_flags().

One final note is that DO_MATCH_LEADING_PATHSPEC needs special
consideration when working with DO_MATCH_EXCLUDE.  The point of
DO_MATCH_LEADING_PATHSPEC is that if we have a pathspec like
   */Makefile
and we are checking a directory path like
   src/module/component
that we want to consider it a match so that we recurse into the
directory because it _might_ have a file named Makefile somewhere below.
However, when we are using an exclusion pattern, i.e. we have a pathspec
like
   :(exclude)*/Makefile
we do NOT want to say that a directory path like
   src/module/component
is a (negative) match.  While there *might* be a file named 'Makefile'
somewhere below that directory, there could also be other files and we
cannot pre-emptively rule all the files under that directory out; we
need to recurse and then check individual files.  Adjust the
DO_MATCH_LEADING_PATHSPEC logic to only get activated for positive
pathspecs.

Reported-by: John Millikin <jmillikin@stripe.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-05 15:02:16 -07:00
Xin Li
14c7fa269e check_repository_format_gently(): refuse extensions for old repositories
Previously, extensions were recognized regardless of repository format
version.  If the user sets an undefined "extensions" value on a
repository of version 0 and that value is used by a future git version,
they might get an undesired result.

Because all extensions now also upgrade repository versions, tightening
the check would help avoid this for future extensions.

Signed-off-by: Xin Li <delphij@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-05 10:13:30 -07:00
Xin Li
98564d8059 sparse-checkout: upgrade repository to version 1 when enabling extension
The 'extensions' configuration variable gets special meaning in the new
repository version, so when enabling the extension we should upgrade the
repository to version 1.

Signed-off-by: Xin Li <delphij@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-05 10:13:30 -07:00
Xin Li
01bbbbd9da fetch: allow adding a filter after initial clone
Retroactively adding a filter can be useful for existing shallow clones as
they allow users to see earlier change histories without downloading all
git objects in a regular --unshallow fetch.

Without this patch, users can make a clone partial by editing the
repository configuration to convert the remote into a promisor, like:

  git config core.repositoryFormatVersion 1
  git config extensions.partialClone origin
  git fetch --unshallow --filter=blob:none origin

Since the hard part of making this work is already in place and such
edits can be error-prone, teach Git to perform the required configuration
change automatically instead.

Note that this change does not modify the existing git behavior which
recognizes setting extensions.partialClone without changing
repositoryFormatVersion.

Signed-off-by: Xin Li <delphij@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-05 10:13:30 -07:00
Elijah Newren
b5bfc08a97 sparse-checkout: avoid staging deletions of all files
sparse-checkout's purpose is to update the working tree to have it
reflect a subset of the tracked files.  As such, it shouldn't be
switching branches, making commits, downloading or uploading data, or
staging or unstaging changes.  Other than updating the worktree, the
only thing sparse-checkout should touch is the SKIP_WORKTREE bit of the
index.  In particular, this sets up a nice invariant: running
sparse-checkout will never change the status of any file in `git status`
(reflecting the fact that we only set the SKIP_WORKTREE bit if the file
is safe to delete, i.e. if the file is unmodified).

Traditionally, we did a _really_ bad job with this goal.  The
predecessor to sparse-checkout involved manual editing of
.git/info/sparse-checkout and running `git read-tree -mu HEAD`.  That
command would stage and unstage changes and overwrite dirty changes in
the working tree.

The initial implementation of the sparse-checkout command was no better;
it simply invoked `git read-tree -mu HEAD` as a subprocess and had the
same caveats, though this issue came up repeatedly in review comments
and workarounds for the problems were put in place before the feature
was merged[1, 2, 3, 4, 5, 6; especially see 4 & 6].

[1] https://lore.kernel.org/git/CABPp-BFT9A5n=_bx5LsjCvbogqwSjiwgr5amcjgbU1iAk4KLJg@mail.gmail.com/
[2] https://lore.kernel.org/git/CABPp-BEmwSwg4tgJg6nVG8a3Hpn_g-=ZjApZF4EiJO+qVgu4uw@mail.gmail.com/
[3] https://lore.kernel.org/git/CABPp-BFV7TA0qwZCQpHCqx9N+JifyRyuBQ-pZ_oGfe-NOgyh7A@mail.gmail.com/
[4] https://lore.kernel.org/git/CABPp-BHYCCD+Vx5fq35jH82eHc1-P53Lz_aGNpHJNcx9kg2K-A@mail.gmail.com/
[5] https://lore.kernel.org/git/CABPp-BF+JWYZfDqp2Tn4AEKVp4b0YMA=Mbz4Nz62D-gGgiduYQ@mail.gmail.com/
[6] https://lore.kernel.org/git/20191121163706.GV23183@szeder.dev/

However, these workarounds, in addition to disabling the feature in a
number of important cases, also missed one special case.  I'll get back
to it later.

In the 2.27.0 cycle, the disabling of the feature was lifted by finally
replacing the internal equivalent of `git read-tree -mu HEAD` with
something that did what we wanted: the new update_sparsity() function in
unpack-trees.c that only ever updates SKIP_WORKTREE bits in the index
and updates the working tree to match.  This new function handles all
the cases that were problematic for the old implementation, except that
it breaks the same special case that avoided the workarounds of the old
implementation, but broke it in a different way.

So...that brings us to the special case: a git clone performed with
--no-checkout.  As per the meaning of the flag, --no-checkout does not
check out any branch, with the implication that you aren't on one and
need to switch to one after the clone.  Implementationally, HEAD is
still set (so in some sense you are partially on a branch), but
  * the index is "unborn" (non-existent)
  * there are no files in the working tree (other than .git/)
  * the next time git switch (or git checkout) is run it will run
    unpack_trees with `initial_checkout` flag set to true.
It is not until you run, e.g. `git switch <somebranch>` that the index
will be written and files in the working tree populated.

With this special --no-checkout case, the traditional `read-tree -mu
HEAD` behavior would have done the equivalent of acting like checkout --
switch to the default branch (HEAD), write out an index that matches
HEAD, and update the working tree to match.  This special case slipped
through the avoid-making-changes checks in the original sparse-checkout
command and thus continued there.

After update_sparsity() was introduced and used (see commit f56f31af03
("sparse-checkout: use new update_sparsity() function", 2020-03-27)),
the behavior for the --no-checkout case changed:  Due to git's
auto-vivification of an empty in-memory index (see do_read_index() and
note that `must_exist` is false), and due to sparse-checkout's
update_working_directory() code to always write out the index after it
was done, we got a new bug.  That made it so that sparse-checkout would
switch the repository from a clone with an "unborn" index (i.e. still
needing an initial_checkout), to one that had a recorded index with no
entries.  Thus, instead of all the files appearing deleted in `git
status` being known to git as a special artifact of not yet being on a
branch, our recording of an empty index made it suddenly look to git as
though it was definitely on a branch with ALL files staged for deletion!
A subsequent checkout or switch then had to contend with the fact that
it wasn't on an initial_checkout but had a bunch of staged deletions.

Make sure that sparse-checkout changes nothing in the index other than
the SKIP_WORKTREE bit; in particular, when the index is unborn we do not
have any branch checked out so there is no sparsification or
de-sparsification work to do.  Simply return from
update_working_directory() early.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-05 08:05:50 -07:00
Johannes Schindelin
46da295a77 clone/fetch: anonymize URLs in the reflog
Even if we strongly discourage putting credentials into the URLs passed
via the command-line, there _is_ support for that, and users _do_ do
that.

Let's scrub them before writing them to the reflog.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-04 13:20:21 -07:00
Taylor Blau
94fbd9149a t5318: test that '--stdin-commits' respects '--[no-]progress'
The following lines were not covered in a recent line-coverage test
against Git:

  builtin/commit-graph.c
  5b6653e5 244) progress = start_delayed_progress(
  5b6653e5 268) stop_progress(&progress);

These statements are executed when both '--stdin-commits' and
'--progress' are passed. Introduce a trio of tests that exercise various
combinations of these options to ensure that these lines are covered.

More importantly, this is exercising a (somewhat) previously-ignored
feature of '--stdin-commits', which is that it respects '--progress'.
Prior to 5b6653e523 (builtin/commit-graph.c: dereference tags in
builtin, 2020-05-13), dereferencing input from '--stdin-commits' was
done inside of commit-graph.c.

Now that an additional progress meter may be generated from outside of
commit-graph.c, add a corresponding test to make sure that it also
respects '--[no]-progress'.

The other location that generates progress meter output (from d335ce8f24
(commit-graph.c: show progress of finding reachable commits,
2020-05-13)) is already covered by any test that passes '--reachable'.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Acked-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-04 07:54:08 -07:00
Taylor Blau
6334c5ff97 t5318: use 'test_must_be_empty'
A handful of tests in t5318 use 'test_line_count = 0 ...' to make sure
that some command does not write any output. While correct, it is more
idiomatic to use 'test_must_be_empty' instead. Switch the former
invocations to use the latter instead.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Acked-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-04 07:52:54 -07:00
Junio C Hamano
54041832d7 Merge branch 'en/fast-import-looser-date'
Some repositories in the wild have commits that record nonsense
committer timezone (e.g. rails.git); "git fast-import" learned an
option to pass these nonsense timestamps intact to allow recreating
existing repositories as-is.

* en/fast-import-looser-date:
  fast-import: add new --date-format=raw-permissive format
2020-06-02 13:35:05 -07:00
Junio C Hamano
e34df9a6e5 Merge branch 'la/diff-relative-config'
The commands in the "diff" family learned to honor "diff.relative"
configuration variable.

* la/diff-relative-config:
  diff: add config option relative
2020-06-02 13:35:04 -07:00
Junio C Hamano
de82fb45db Merge branch 'rs/checkout-b-track-error'
The error message from "git checkout -b foo -t bar baz" was
confusing.

* rs/checkout-b-track-error:
  checkout: improve error messages for -b with extra argument
  checkout: add tests for -b and --track
2020-06-02 13:35:04 -07:00
Junio C Hamano
1ab0dfde2c Merge branch 'cb/t5608-cleanup'
Test fixup.

* cb/t5608-cleanup:
  t5608: avoid say() and use "skip_all" instead for consistency
2020-06-02 13:35:03 -07:00
Junio C Hamano
56219baf1e Merge branch 'cb/test-use-ere-for-alternation'
Portability fix for tests added recently.

* cb/test-use-ere-for-alternation:
  t: avoid alternation (not POSIX) in grep's BRE
2020-05-31 11:38:44 -07:00
Elijah Newren
d42a2fb72f fast-import: add new --date-format=raw-permissive format
There are multiple repositories in the wild with random, invalid
timezones.  Most notably is a commit from rails.git with a timezone of
"+051800"[1].  A few searches will find other repos with that same
invalid timezone as well.  Further, Peff reports that GitHub relaxed
their fsck checks in August 2011 to accept any timezone value[2], and
there have been multiple reports to filter-repo about fast-import
crashing while trying to import their existing repositories since they
had timezone values such as "-7349423" and "-43455309"[3].

The existing check on timezone values inside fast-import may prove
useful for people who are crafting fast-import input by hand or with a
new script.  For them, the check may help them avoid accidentally
recording invalid dates.  (Note that this check is rather simplistic and
there are still several forms of invalid dates that fast-import does not
check for: dates in the future, timezone values with minutes that are
not divisible by 15, and timezone values with minutes that are 60 or
greater.)  While this simple check may have some value for those users,
other users or tools will want to import existing repositories as-is.
Provide a --date-format=raw-permissive format that will not error out on
these otherwise invalid timezones so that such existing repositories can
be imported.

[1] 4cf94979c9
[2] https://lore.kernel.org/git/20200521195513.GA1542632@coredump.intra.peff.net/
[3] https://github.com/newren/git-filter-repo/issues/88

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-05-31 09:03:10 -07:00
Carlo Marcelo Arenas Belón
46022ca34f t: avoid alternation (not POSIX) in grep's BRE
f1e3df3169 (t: increase test coverage of signature verification output,
2020-03-04) adds GPG dependent tests to t4202 and t6200 that were found
problematic with at least OpenBSD 6.7.

Using an escaped '|' for alternations works only in some implementations
of grep (e.g. GNU and busybox).

It is not part of POSIX[1] and not supported by some BSD, macOS, and
possibly other POSIX compatible implementations.

Use `grep -E`, and write it using extended regular expression.

[1] https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html#tag_09_03

Helped-by: Torsten Bögershausen <tboegi@web.de>
Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-05-29 15:04:03 -07:00
Jacob Keller
91439928ec completion: improve handling of --orphan option of switch/checkout
The --orphan option is used to create a local branch which is detached
from the current history. In git switch, it always resets to the empty
tree, and thus the only completion we can provide is a branch name.
Follow the same rules for -c/-C (and -b/-B) when completing the argument
to --orphan.

In the case of git switch, after we complete the argument, there is
nothing more we can complete for git switch, so do not even try. Nothing
else would be valid.

In the case of git checkout, --orphan takes a start point which it uses
to determine the checked out tree, even though it created orphaned
history.

Update the previously added test cases as they are now passing.

Signed-off-by: Jacob Keller <jacob.keller@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-05-28 12:57:07 -07:00
Jacob Keller
acb658fe7d completion: improve handling of -c/-C and -b/-B in switch/checkout
A previous commit added several test cases highlighting the subpar
completion logic for -c/-C and -b/-B when completing git switch and git
checkout.

In order to distinguish completing the argument vs the start-point for
this option, we now use the wordlist to determine the previous full word
on the command line.

If it's -c or -C (-b/-B for checkout), then we know that we are
completing the argument for the branch name.

Given that a user who already knows the branch name they want to
complete will simply not use completion, it makes sense to complete the
small subset of local branches when completing the argument for -c/-C.

In all other cases, if -c/-C are on the command line but are not the
most recent option, then we must be completing a start-point, and should
allow completing against all references.

Update the -c/-C and -b/-B tests to indicate they now pass.

Signed-off-by: Jacob Keller <jacob.keller@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-05-28 12:57:07 -07:00
Jacob Keller
00e7bd2b00 completion: improve handling of --track in switch/checkout
Current completion for the --track option of git switch and git checkout
is sub par. In addition to the DWIM logic of a bare branch name, --track
has DWIM logic to convert specified remote/branch names into a local
branch tracking that remote. For example

  $git switch --track origin/master

This will create a local branch name master, that tracks the master
branch of the origin remote.

In fact, git switch --track on its own will not accept other forms of
references. These must instead be specified manually via the -c/-C/-b/-B
options.

Introduce __git_remote_heads() and the "remote-heads" mode for
__git_complete_refs. Use this when the --track option is provided while
completing in _git_switch and _git_checkout. Just as in the --detach
case, we never enable DWIM mode for --track, because it doesn't make
sense.

It should be noted that completion support is still a bit sub par when
it comes to handling -c/-C and --orphan. This will be resolved in
a future change.

Signed-off-by: Jacob Keller <jacob.keller@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-05-28 12:57:07 -07:00
Jacob Keller
6d76a5cc7f completion: improve handling of --detach in checkout
Just like git switch, we should not complete DWIM remote branch names
if --detach has been specified. To avoid this, refactor _git_checkout in
a similar way to _git_switch.

Note that we don't simply clear dwim_opt when we find -d or --detach, as
we will be adding other modes and checks, making this flow easier to
follow.

Update the previously failing tests to show that the breakage has been
resolved.

Signed-off-by: Jacob Keller <jacob.keller@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-05-28 12:57:07 -07:00
Jacob Keller
68d97c7fdd completion: improve completion for git switch with no options
Add a new --mode option to __git_complete_refs, which allows changing
the behavior to call __git_heads instead of __git_refs.

By passing --mode=heads, __git_complete_refs will only output local
branches. This enables using "--mode=heads --dwim" to enable listing
local branches and the remote unique branch names for DWIM.

Refactor completion support to use the new mode option, rather than
calling __git_heads directly. This has the advantage that we can now
correctly allow local branches along with suitable DWIM refs, rather
than only allowing DWIM when we complete all references.

Choose what mode it uses when calling __git_complete_refs. If -d or
--detach have been provided, then simply complete all refs, but
*without* the DWIM option as these DWIM names won't work properly in
--detach mode.

Otherwise, call __git_complete_refs with the default dwim_opt value and
use the new "heads" mode.

In this way, the basic support for completing just "git switch <TAB>"
will result in only local branches and remote unique names for DWIM.

The basic no-options tests for git switch, as well as several of the
-c/-C tests now pass, so remove the known breakage tags.

Signed-off-by: Jacob Keller <jacob.keller@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-05-28 12:57:07 -07:00
Jacob Keller
4e79adf4e5 completion: improve handling of DWIM mode for switch/checkout
A new helper, __git_find_last_on_cmdline is introduced, similar to the
already existing __git_find_on_cmdline, but which operates in reverse,
finding the *last* matching word of the provided wordlist.

Use this in a new __git_checkout_default_dwim_mode() function that will
determine when to enable listing of DWIM remote branches.

The __git_find_last_on_cmdline() function is used to determine which
--guess or --no-guess is in effect. If either one is provided, then we
unconditionally enable or disable the DWIM mode based on the last
provided option.

If neither --guess nor --no-guess is provided, then we check for
--no-track, and finally for GIT_COMPLETION_CHECKOUT_NO_GUESS=1.

This function is then used in _git_switch and _git_checkout to improve
the handling for when we enable listing of these DWIM remote branches.

This new logic is more robust, as we will correctly identify superseded
options, and ensure that both _git_switch and _git_checkout enable DWIM
in similar ways.

We can now update a few tests to indicate they pass. A few of the tests
previously added to highlight issues with the old DWIM logic still fail.
This is because of a separate issue related to the default completion
behavior of git switch, which will be addressed in a future change.

Additionally, due to this change, a few tests for the -b/-B handling of
git checkout now fail. This is a minor regression, and will be fixed by
a following change that improves the overall handling of -b/-B. Mark
these tests as known breakages for now.

Signed-off-by: Jacob Keller <jacob.keller@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-05-28 12:53:24 -07:00
Jacob Keller
c81ca56bca completion: add tests showing subpar switch/checkout --orphan logic
Similar to -c/-C, --orphan takes an argument which is the branch name to
use. We ought to complete this branch name using similar rules as to how
we complete new branch names for -c/-C and -b/-B. Namely, limit the
total number of options provided by completing to the local branches.

Additionally, git switch --orphan does not take any start point and will
always create using the empty-tree. Thus, after the branch name is
completed, git switch --orphan should not complete any references.

Add test cases showing the expected behavior of --orphan, for both the
argument and starting point.

Signed-off-by: Jacob Keller <jacob.keller@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-05-28 12:53:24 -07:00
Jacob Keller
7f59d60429 completion: add tests showing subpar -c/C argument completion
When using the branch creation argument for git switch or git checkout
(-c/-C or -b/-B), the commands switch to a different mode: `git switch
-c <branch> <some-referance>` means to create a branch named <branch> at
the commit referred to by <some-reference>.

When completing git switch or git checkout, it makes sense to complete
the branch name differently from the start point.

When completing a branch, one might consider that we do not have
anything worth completing. After all, a new branch must have an entirely
new name. Consider, however, that if a user names branches using some
similar scheme, they might wish to name a new branch by modifying the
name of an existing branch.

To avoid overloading completion for the argument, it seems reasonable to
complete only the local branch names and the valid "Do What I Mean"
remote branch names.

Add tests for the completion of the argument to -c/-C and -b/-B,
highlighting this preferred completion behavior.

Signed-off-by: Jacob Keller <jacob.keller@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-05-28 12:53:24 -07:00