Commit Graph

59228 Commits

Author SHA1 Message Date
Jeff King
7383b25d76 bisect: stop referring to sha1_array
Our join_sha1_array_hex() function long ago switched to using an
oid_array; let's change the name to match.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-03-30 10:59:08 -07:00
Jeff King
ed4b804e46 test-tool: rename sha1-array to oid-array
This matches the actual data structure name, as well as the source file
that contains the code we're testing. The test scripts need updating to
use the new name, as well.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-03-30 10:59:08 -07:00
Jeff King
fe299ec5ae oid_array: rename source file from sha1-array
We renamed the actual data structure in 910650d2f8 (Rename sha1_array to
oid_array, 2017-03-31), but the file is still called sha1-array. Besides
being slightly confusing, it makes it more annoying to grep for leftover
occurrences of "sha1" in various files, because the header is included
in so many places.

Let's complete the transition by renaming the source and header files
(and fixing up a few comment references).

I kept the "-" in the name, as that seems to be our style; cf.
fc1395f4a4 (sha1_file.c: rename to use dash in file name, 2018-04-10).
We also have oidmap.h and oidset.h without any punctuation, but those
are "struct oidmap" and "struct oidset" in the code. We _could_ make
this "oidarray" to match, but somehow it looks uglier to me because of
the length of "array" (plus it would be a very invasive patch for little
gain).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-03-30 10:59:08 -07:00
Jeff King
eccce5253b oid_array: use size_t for iteration
The previous commit started using size_t for our allocations. There are
some iterations that use int or unsigned, though. These aren't dangerous
with respect to memory, but they could produce incorrect results.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-03-30 10:59:08 -07:00
Jeff King
600bee4e70 oid_array: use size_t for count and allocation
The oid_array object uses an "int" to store the number of items and the
allocated size. It's rather unlikely for somebody to have more than 2^31
objects in a repository (the sha1's alone would be 40GB!), but if they
do, we'd overflow our alloc variable.

You can reproduce this case with something like:

  git init repo
  cd repo

  # make a pack with 2^24 objects
  perl -e '
    my $nr = 2**24;

    for (my $i = 0; $i < $nr; $i++) {
	    print "blob\n";
	    print "data 4\n";
	    print pack("N", $i);
    }
  ' | git fast-import

  # now make 256 copies of it; most of these objects will be duplicates,
  # but oid_array doesn't de-dup until all values are read and it can
  # sort the result.
  cd .git/objects/pack/
  pack=$(echo *.pack)
  idx=$(echo *.idx)
  for i in $(seq 0 255); do
    # no need to waste disk space
    ln "$pack" "pack-extra-$i.pack"
    ln "$idx" "pack-extra-$i.idx"
  done

  # and now force an oid_array to store all of it
  git cat-file --batch-all-objects --batch-check

which results in:

  fatal: size_t overflow: 32 * 18446744071562067968

So the good news is that st_mult() sees the problem (the large number is
because our int wraps negative, and then that gets cast to a size_t),
doing the job it was meant to: bailing in crazy situations rather than
causing an undersized buffer.

But we should avoid hitting this case at all, and instead limit
ourselves based on what malloc() is willing to give us. We can easily do
that by switching to size_t.

The cat-file process above made it to ~120GB virtual set size before the
integer overflow (our internal hash storage is 32-bytes now in
preparation for sha256, so we'd expect ~128GB total needed, plus
potentially more to copy from one realloc'd block to another)). After
this patch (and about 130GB of RAM+swap), it does eventually read in the
whole set. No test for obvious reasons.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-03-30 10:59:08 -07:00
brian m. carlson
2149b6748f docs: add a FAQ
Git is an enormously flexible and powerful piece of software.  However,
it can be intimidating for many users and there are a set of common
questions that users often ask.  While we already have some new user
documentation, it's worth adding a FAQ to address common questions that
users often have.  Even though some of this is addressed elsewhere in
the documentation, experience has shown that it is difficult for users
to find, so a centralized location is helpful.

Add such a FAQ and fill it with some common questions and answers.
While there are few entries now, we can expand it in the future to cover
more things as we find new questions that users have.  Let's also add
section markers so that people answering questions can directly link
users to the proper answer.

The FAQ also addresses common configuration questions that apply not
only to Git as an independent piece of software but also the ecosystem
of CI tools and hosting providers that people use, since these are the
source of common questions.  An attempt has been made to avoid
mentioning any particular provider or tool, but to nevertheless cover
common configurations that apply to a wide variety of such tools.

Note that the long lines for certain questions are required, since
Asciidoctor does not permit broken lines there.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-03-30 10:39:48 -07:00
Patrick Steinhardt
bd021f3910 strbuf: provide function to append whole lines
While the strbuf interface already provides functions to read a line
into it that completely replaces its current contents, we do not have an
interface that allows for appending lines without discarding current
contents.

Add a new function `strbuf_appendwholeline` that reads a line including
its terminating character into a strbuf non-destructively. This is a
preparatory step for git-update-ref(1) reading standard input line-wise
instead of as a block.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-03-30 10:34:11 -07:00
Patrick Steinhardt
faa35eec4d git-update-ref.txt: add missing word
The description for the "verify" command is lacking a single word "is",
which this commit corrects.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-03-30 10:34:11 -07:00
Patrick Steinhardt
edc30691e5 refs: fix segfault when aborting empty transaction
When cleaning up a transaction that has no updates queued, then the
transaction's backend data will not have been allocated. We correctly
handle this for the packed backend, where the cleanup function checks
whether the backend data has been allocated at all -- if not, then there
is nothing to clean up. For the files backend we do not check this and
as a result will hit a segfault due to dereferencing a `NULL` pointer
when cleaning up such a transaction.

Fix the issue by checking whether `backend_data` is set in the files
backend, too.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-03-30 10:34:11 -07:00
Jonathan Tan
2b98478c6f connected: always use partial clone optimization
With 50033772d5 ("connected: verify promisor-ness of partial clone",
2020-01-30), the fast path (checking promisor packs) in
check_connected() now passes a subset of the slow path (rev-list) - if
all objects to be checked are found in promisor packs, both the fast
path and the slow path will pass; otherwise, the fast path will
definitely not pass. This means that we can always attempt the fast path
whenever we need to do the slow path.

The fast path is currently guarded by a flag; therefore, remove that
flag. Also, make the fast path fallback to the slow path - if the fast
path fails, the failing OID and all remaining OIDs will be passed to
rev-list.

The main user-visible benefit is the performance of fetch from a partial
clone - specifically, the speedup of the connectivity check done before
the fetch. In particular, a no-op fetch into a partial clone on my
computer was sped up from 7 seconds to 0.01 seconds. This is a
complement to the work in 2df1aa239c ("fetch: forgo full
connectivity check if --filter", 2020-01-30), which is the child of the
aforementioned 50033772d5. In that commit, the connectivity check
*after* the fetch was sped up.

The addition of the fast path might cause performance reductions in
these cases:

 - If a partial clone or a fetch into a partial clone fails, Git will
   fruitlessly run rev-list (it is expected that everything fetched
   would go into promisor packs, so if that didn't happen, it is most
   likely that rev-list will fail too).

 - Any connectivity checks done by receive-pack, in the (in my opinion,
   unlikely) event that a partial clone serves receive-pack.

I think that these cases are rare enough, and the performance reduction
in this case minor enough (additional object DB access), that the
benefit of avoiding a flag outweighs these.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Reviewed-by: Josh Steadmon <steadmon@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-03-29 10:37:44 -07:00
Junio C Hamano
9fadedd637 Merge branch 'ds/default-pack-use-sparse-to-true'
The 'pack.useSparse' configuration variable now defaults to 'true',
enabling an optimization that has been experimental since Git 2.21.

* ds/default-pack-use-sparse-to-true:
  pack-objects: flip the use of GIT_TEST_PACK_SPARSE
  config: set pack.useSparse=true by default
2020-03-29 09:32:51 -07:00
Martin Ågren
5a80d85bbe INSTALL: drop support for docbook-xsl before 1.74
Several of the previous commits have been bumping the minimum supported
version of docbook-xsl and dropping various workarounds. Most recently,
we made the minimum be 1.73.0.

In INSTALL, we claim that with 1.73, one needs a certain patch in
contrib/patches/. There is no such patch. It was added in 2ec39edad9
("INSTALL: add warning on docbook-xsl 1.72 and 1.73", 2007-08-03) and
dropped in 9721ac9010 ("contrib: remove continuous/ and patches/",
2013-06-03).

Rather than resurrecting version 1.73 and the patch and testing them,
just raise our minimum supported docbook-xsl version to 1.74.

Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-03-29 09:25:38 -07:00
Martin Ågren
f7421a1438 manpage-normal.xsl: fold in manpage-base.xsl
After an earlier commit, we only include manpage-base.xsl from a single
file, manpage-normal.xsl. Fold the former into the latter.

We only ever needed the "base, normal and non-normal" construct to
support a single non-normal case, namely to work around issues with
docbook-xsl 1.72 handling backslashes and dots. If we ever need
something like this again, we can re-introduce manpage-base.xsl and
friends. Whatever issue we'd be trying to work around, it probably
wouldn't involve dots and backslashes like this, anyway.

Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-03-29 09:25:38 -07:00
Martin Ågren
4344be057e manpage-bold-literal.xsl: stop using git.docbook.backslash
We used to assign git.docbook.backslash one of two different values --
one "normal" and one for working around a problem with docbook-xsl 1.72.
After the previous commit, we don't support that version anymore and
always use the "normal" value, a literal backslash.

Just explicitly use a backslash instead of using git.docbook.backslash.
The next commit will drop the definition of git.docbook.backslash
entirely.

Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-03-29 09:25:38 -07:00
Martin Ågren
388f5b52b0 Doc: drop support for docbook-xsl before 1.73.0
Drop the DOCBOOK_XSL_172 config knob, which was needed with docbook-xsl
1.72 (but neither 1.71 nor 1.73). Version 1.73.0 is more than twelve
years old.

Together with the last few commits, we are now at a point where we don't
have any Makefile knobs to cater to old/broken versions of docbook-xsl.

Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-03-29 09:25:38 -07:00
Martin Ågren
40b970078b Doc: drop support for docbook-xsl before 1.72.0
docbook-xsl 1.72.0 is thirteen years old. Drop the ASCIIDOC_ROFF knob
which was needed to support 1.68.1 - 1.71.1. The next commit will
increase the required/assumed version further.

Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-03-29 09:25:38 -07:00
Martin Ågren
def3ce00ae Doc: drop support for docbook-xsl before 1.71.1
Drop the DOCBOOK_SUPPRESS_SP mechanism, which needs to be used with
docbook-xsl versions 1.69.1 through 1.71.0.

We probably broke this for Asciidoctor builds in f6461b82b9
("Documentation: fix build with Asciidoctor 2", 2019-09-15). That is, we
should/could fix this similar to 55aca515eb ("manpage-bold-literal.xsl:
match for namespaced "d:literal" in template", 2019-10-31). But rather
than digging out such an old version of docbook-xsl to test that, let's
just use this as an excuse for dropping this decade-old workaround.

DOCBOOK_SUPPRESS_SP was not needed with docbook-xsl 1.69.0 and older.
Maybe such old versions still work fine on our docs, or maybe not. Let's
just refer to everything before 1.71.1 as "not supported". The next
commit will increase the required/assumed version further.

Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-03-29 09:25:38 -07:00
Julien Moutinho
2ecfcdecc6 gitweb: fix UTF-8 encoding when using CGI::Fast
FCGI streams are implemented using the older stream API: TIEHANDLE,
therefore applying PerlIO layers using binmode() has no effect to them.
The solution in this patch is to redefine the FCGI::Stream::PRINT function
to use UTF-8 as output encoding, except within git_blob_plain() and git_snapshot()
which must still output in raw binary mode.

This problem and solution were previously reported back in 2012:
- http://git.661346.n2.nabble.com/Gitweb-running-as-FCGI-does-not-print-its-output-in-UTF-8-td7573415.html
- http://stackoverflow.com/questions/5005104

Signed-off-by: Julien Moutinho <julm+git@sourcephile.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-03-29 09:06:51 -07:00
Jeff King
cacae4329f test-lib-functions: simplify packetize() stdin code
The code path in packetize() for reading stdin needs to handle NUL
bytes, so we can't rely on shell variables. However, the current code
takes a whopping 4 processes and uses a temporary file. We can do this
much more simply and efficiently by using a single perl invocation (and
we already rely on perl in the matching depacketize() function).

We'll keep the non-stdin code path as it is, since that uses zero extra
processes.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-03-29 08:49:47 -07:00
Junio C Hamano
78725ebda9 CodingGuidelines: allow ${#posix} == strlen($posix)
The construct has been in POSIX for the past 10+ years, and we have
used in t9xxx (subversion) series of the tests, so we know it is at
portable across systems that people have run those tests, which is
almost everything we'd care about.

Let's loosen the rule; luckily, the check-non-portable-shell script
does not have any rule to find its use, so the only change needed is
a removal of one paragraph from the documentation.

Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-03-29 08:41:54 -07:00
Junio C Hamano
7cc112dc95 t/README: suggest how to leave test early with failure
Over time, we added the support to our test framework to make it
easy to leave a test early with failure, but it was not clearly
documented in t/README to help developers writing new tests.

Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-03-29 08:39:40 -07:00
Philippe Blain
344420bf0f git-rebase.txt: fix typo
Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-03-28 18:14:33 -07:00
René Scharfe
13ac5edbfa pull: pass documented fetch options on
The fetch options --deepen, --negotiation-tip, --server-option,
--shallow-exclude, and --shallow-since are documented for git pull as
well, but are not actually accepted by that command.  Pass them on to
make the code match its documentation.

Reported-by: 天几 <muzimuzhi@gmail.com>
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-03-28 18:13:11 -07:00
René Scharfe
f05558f3e2 pull: remove --update-head-ok from documentation
'git pull' implicitly passes --update-head-ok to 'git fetch', but
doesn't itself accept that option from users.  That makes sense, as it
wouldn't work without the possibility to update HEAD.  Remove the option
from the command's documentation to match its actual behavior.

Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-03-28 18:13:09 -07:00
Alban Gruin
4d55d63bde sequencer: mark messages for translation
Signed-off-by: Alban Gruin <alban.gruin@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-03-28 18:11:25 -07:00
Denton Liu
7cd54d37dc wrapper: indent with tabs
The codebase uses tabs for indentation. Convert an erroneous space
indent into a tab indent.

Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-03-28 18:06:51 -07:00
Damien Robert
796d61cdc0 midx.c: fix an integer underflow
When verifying a midx index with 0 objects, the
    m->num_objects - 1
underflows and wraps around to 4294967295.

Fix this both by checking that the midx contains at least one oid,
and also that we don't write any midx when there is no packfiles.

Update the tests to check that `git multi-pack-index write` does
not write an midx when there is no objects, and another to check
that `git multi-pack-index verify` warns when it verifies an midx with no
objects. For this last test, use t5319/no-objects.midx which was
generated by an older version of git.

Signed-off-by: Damien Robert <damien.olivier.robert+git@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-03-28 16:50:40 -07:00
Elijah Newren
fbae70ddc6 pull: avoid running both merge and rebase
When opt_rebase is true, we still first check if we can fast-forward.
If the branch is fast-forwardable, then we can avoid the rebase and just
use merge to do the fast-forward logic.  However, when commit a6d7eb2c7a
("pull: optionally rebase submodules (remote submodule changes only)",
2017-06-23) added the ability to rebase submodules it accidentally
caused us to run BOTH a merge and a rebase.  Add a flag to avoid doing
both.

This was found when a user had both pull.rebase and rebase.autosquash
set to true.  In such a case, the running of both merge and rebase would
cause ORIG_HEAD to be updated twice (and match HEAD at the end instead
of the commit before the rebase started), against expectation.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-03-27 15:54:54 -07:00
Jeff King
897d68e7af Makefile: use curl-config --cflags
We add the result of "curl-config --libs" when linking curl programs,
but we never bother calling "curl-config --cflags". Presumably nobody
noticed because:

  - a system libcurl installed into /usr/include/curl wouldn't need any
    flags ("/usr/include" is already in the search path, and the
    #include lines all look <curl/curl.h>, etc).

  - using CURLDIR sets up both the includes and the library path

However, if you prefer CURL_CONFIG to CURLDIR, something simple like:

  make CURL_CONFIG=/path/to/curl-config

doesn't work. We'd link against the libcurl specified by that program,
but not find its header files when compiling.

Let's invoke "curl-config --cflags" similar to the way we do for
"--libs". Note that we'll feed the result into BASIC_CFLAGS. The rest of
the Makefile doesn't distinguish which files need curl support during
compilation and which do not. That should be OK, though. At most this
should be adding a "-I" directive, and this is how CURLDIR already
behaves. And since we follow the immediate-variable pattern from
CURL_LDFLAGS, we won't accidentally invoke curl-config once per
compilation.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-03-27 15:11:54 -07:00
Jeff King
94a88e2524 Makefile: avoid running curl-config multiple times
If the user hasn't set the CURL_LDFLAGS Makefile variable, we invoke
curl-config like this:

  CURL_LIBCURL += $(shell $(CURL_CONFIG) --libs)

Because the shell function is run when the value is expanded, we invoke
curl-config each time we need to link something (which generally ends up
being four times for a full build).

Instead, let's use an immediate Makefile variable, which only needs
expanding once. We can't combine that with the existing "+=", but since
we only do this when CURL_LDFLAGS is undefined, we can just set that
variable.

That also allows us to simplify our conditional a bit, since both sides
will then put the result into CURL_LIBCURL. While we're touching it,
let's fix the indentation to match the nearby code (we're inside an
outer conditional, so everything else is indented one level).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-03-27 15:11:53 -07:00
Jeff King
14d277879c p5310: stop timing non-bitmap pack-to-disk
Commit 645c432d61 (pack-objects: use reachability bitmap index when
generating non-stdout pack, 2016-09-10) added two timing tests for
packing to an on-disk file, both with and without bitmaps. However, the
non-bitmap one isn't interesting to have as part of p5310's regression
suite. It _could_ be used as a baseline to show off the improvement in
the bitmap case, but:

  - the point of the t/perf suite is to find performance regressions,
    and it won't help with that. We don't compare the numbers between
    two tests (which the perf suite has no idea are even related), and
    any change in its numbers would have nothing to do with bitmaps.

  - it did show off the improvement in the commit message of 645c432d61,
    but it wasn't even necessary there. The bitmap case already shows an
    improvement (because before the patch, it behaved the same as the
    non-bitmap case), and the perf suite is even able to show the
    difference between the before and after measurements.

On top of that, it's one of the most expensive tests in the suite,
clocking in around 60s for linux.git on my machine (as compared to 16s
for the bitmapped version). And by default when using "./run", we'd run
it three times!

So let's just drop it. It's not useful and is adding minutes to perf
runs.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-03-27 15:11:21 -07:00
Jeff King
4845b77245 upload-pack: handle unexpected delim packets
When processing the arguments list for a v2 ls-refs or fetch command, we
loop like this:

  while (packet_reader_read(request) != PACKET_READ_FLUSH) {
          const char *arg = request->line;
	  ...handle arg...
  }

to read and handle packets until we see a flush. The hidden assumption
here is that anything except PACKET_READ_FLUSH will give us valid packet
data to read. But that's not true; PACKET_READ_DELIM or PACKET_READ_EOF
will leave packet->line as NULL, and we'll segfault trying to look at
it.

Instead, we should follow the more careful model demonstrated on the
client side (e.g., in process_capabilities_v2): keep looping as long
as we get normal packets, and then make sure that we broke out of the
loop due to a real flush. That fixes the segfault and correctly
diagnoses any unexpected input from the client.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-03-27 12:18:48 -07:00
Jeff King
88124ab263 test-lib-functions: make packetize() more efficient
The packetize() function takes its input on stdin, and requires 4
separate sub-processes to format a simple string. We can do much better
by getting the length via the shell's "${#packet}" construct. The one
caveat is that the shell can't put a NUL into a variable, so we'll have
to continue to provide the stdin form for a few calls.

There are a few other cleanups here in the touched code:

 - the stdin form of packetize() had an extra stray "%s" when printing
   the packet

 - the converted calls in t5562 can be made simpler by redirecting
   output as a block, rather than repeated appending

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-03-27 11:50:54 -07:00
Elijah Newren
5644ca28cd sparse-checkout: provide a new reapply subcommand
If commands like merge or rebase materialize files as part of their work,
or a previous sparse-checkout command failed to update individual files
due to dirty changes, users may want a command to simply 'reapply' the
sparsity rules.  Provide one.

Reviewed-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-03-27 11:33:31 -07:00
Elijah Newren
681c637b4a unpack-trees: failure to set SKIP_WORKTREE bits always just a warning
Setting and clearing of the SKIP_WORKTREE bit is not only done when
users run 'sparse-checkout'; other commands such as 'checkout' also run
through unpack_trees() which has logic for handling this special bit.
As such, we need to consider how they handle special cases.  A couple
comparison points should help explain the rationale for changing how
unpack_trees() handles these bits:

    Ignoring sparse checkouts for a moment, if you are switching
    branches and have dirty changes, it is only considered an error that
    will prevent the branch switching from being successful if the dirty
    file happens to be one of the paths with different contents.

    SKIP_WORKTREE has always been considered advisory; for example, if
    rebase or merge need or even want to materialize a path as part of
    their work, they have always been allowed to do so regardless of the
    SKIP_WORKTREE setting.  This has been used for unmerged paths, but
    it was often used for paths it wasn't needed just because it made
    the code simpler.  It was a best-effort consideration, and when it
    materialized paths contrary to the SKIP_WORKTREE setting, it was
    never required to even print a warning message.

In the past if you trying to run e.g. 'git checkout' and:
  1) you had a path that was materialized and had some dirty changes
  2) the path was listed in $GITDIR/info/sparse-checkout
  3) this path did not different between the current and target branches
then despite the comparison points above, the inability to set
SKIP_WORKTREE was treated as a *hard* error that would abort the
checkout operation.  This is completely inconsistent with how
SKIP_WORKTREE is handled elsewhere, and rather annoying for users as
leaving the paths materialized in the working copy (with a simple
warning) should present no problem at all.

Downgrade any errors from inability to toggle the SKIP_WORKTREE bit to a
warning and allow the operations to continue.

Reviewed-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-03-27 11:33:30 -07:00
Elijah Newren
ebb568b9e2 unpack-trees: provide warnings on sparse updates for unmerged paths too
When sparse-checkout runs to update the list of sparsity patterns, it
gives warnings if it can't remove paths from the working tree because
those files have dirty changes.  Add a similar warning for unmerged
paths as well.

Reviewed-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-03-27 11:33:30 -07:00
Elijah Newren
22ab0b37d8 unpack-trees: make sparse path messages sound like warnings
The messages for problems with sparse paths are phrased as errors that
cause the operation to abort, even though we are not making the
operation abort.  Reword the messages to make sense in their new
context.

Reviewed-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-03-27 11:33:30 -07:00
Elijah Newren
6271d77cb1 unpack-trees: split display_error_msgs() into two
display_error_msgs() is never called to show messages of both ERROR_*
and WARNING_* types at the same time; it is instead called multiple
times, separately for each type.  Since we want to display these types
differently, make two slightly different versions of this function.

A subsequent commit will further modify unpack_trees() and how it calls
the new display_warning_msgs().

Reviewed-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-03-27 11:33:30 -07:00
Elijah Newren
1ac83f42da unpack-trees: rename ERROR_* fields meant for warnings to WARNING_*
We want to treat issues with setting the SKIP_WORKTREE bit as a warning
rather than an error; rename the enum values to reflect this intent as
a simple step towards that goal.

Reviewed-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-03-27 11:33:30 -07:00
Elijah Newren
cd002c1561 unpack-trees: move ERROR_WOULD_LOSE_SUBMODULE earlier
A minor change, but we want to convert the sparse messages to warnings
and this allows us to group warnings and errors.

Reviewed-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-03-27 11:33:30 -07:00
Elijah Newren
4ee5d50fc3 sparse-checkout: use improved unpack_trees porcelain messages
setup_unpack_trees_porcelain() provides much improved error/warning
messages; instead of a message that assumes that there is only one path
with a given problem despite being used by code that intentionally is
grouping and showing errors together, it uses a message designed to be
used with groups of paths.  For example, this transforms

    error: Entry '	folder1/a
	folder2/a
    ' not uptodate. Cannot update sparse checkout.

into

    error: Cannot update sparse checkout: the following entries are not up to date:
	folder1/a
	folder2/a

In the past the suboptimal messages were never actually triggered
because we would error out if the working directory wasn't clean before
we even called unpack_trees().  The previous commit changed that,
though, so let's use the better error messages.

Reviewed-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-03-27 11:33:30 -07:00
Elijah Newren
f56f31af03 sparse-checkout: use new update_sparsity() function
Remove the equivalent of 'git read-tree -mu HEAD' in the sparse-checkout
codepaths for setting the SKIP_WORKTREE bits and instead use the new
update_sparsity() function.

Note that when an issue is hit, the error message splits 'error' and
'Cannot update sparse checkout' on separate lines.  For now, we use two
greps to find both pieces of the error message but subsequent commits
will clean up the messages reported to the user.

Reviewed-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-03-27 11:33:30 -07:00
Elijah Newren
7af7a25853 unpack-trees: add a new update_sparsity() function
Previously, the only way to update the SKIP_WORKTREE bits for various
paths was invoking `git read-tree -mu HEAD` or calling the same code
that this codepath invoked.  This however had a number of problems if
the index or working directory were not clean.  First, let's consider
the case:

  Flipping SKIP_WORKTREE -> !SKIP_WORKTREE (materializing files)

If the working tree was clean this was fine, but if there were files or
directories or symlinks or whatever already present at the given path
then the operation would abort with an error.  Let's label this case
for later discussion:

    A) There is an untracked path in the way

Now let's consider the opposite case:

  Flipping !SKIP_WORKTREE -> SKIP_WORKTREE (removing files)

If the index and working tree was clean this was fine, but if there were
any unclean paths we would run into problems.  There are three different
cases to consider:

    B) The path is unmerged
    C) The path has unstaged changes
    D) The path has staged changes (differs from HEAD)

If any path fell into case B or C, then the whole operation would be
aborted with an error.  With sparse-checkout, the whole operation would
be aborted for case D as well, but for its predecessor of using `git
read-tree -mu HEAD` directly, any paths that fell into case D would be
removed from the working copy and the index entry for that path would be
reset to match HEAD -- which looks and feels like data loss to users
(only a few are even aware to ask whether it can be recovered, and even
then it requires walking through loose objects trying to match up the
right ones).

Refusing to remove files that have unsaved user changes is good, but
refusing to work on any other paths is very problematic for users.  If
the user is in the middle of a rebase or has made modifications to files
that bring in more dependencies, then for their build to work they need
to update the sparse paths.  This logic has been preventing them from
doing so.  Sometimes in response, the user will stage the files and
re-try, to no avail with sparse-checkout or to the horror of losing
their changes if they are using its predecessor of `git read-tree -mu
HEAD`.

Add a new update_sparsity() function which will not error out in any of
these cases but behaves as follows for the special cases:
    A) Leave the file in the working copy alone, clear the SKIP_WORKTREE
       bit, and print a warning (thus leaving the path in a state where
       status will report the file as modified, which seems logical).
    B) Do NOT mark this path as SKIP_WORKTREE, and leave it as unmerged.
    C) Do NOT mark this path as SKIP_WORKTREE and print a warning about
       the dirty path.
    D) Mark the path as SKIP_WORKTREE, but do not revert the version
       stored in the index to match HEAD; leave the contents alone.

I tried a different behavior for A (leave the SKIP_WORKTREE bit set),
but found it very surprising and counter-intuitive (e.g. the user sees
it is present along with all the other files in that directory, tries to
stage it, but git add ignores it since the SKIP_WORKTREE bit is set).  A
& C seem like optimal behavior to me.  B may be as well, though I wonder
if printing a warning would be an improvement.  Some might be slightly
surprised by D at first, but given that it does the right thing with
`git commit` and even `git commit -a` (`git add` ignores entries that
are marked SKIP_WORKTREE and thus doesn't delete them, and `commit -a`
is similar), it seems logical to me.

Reviewed-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-03-27 11:33:30 -07:00
Elijah Newren
30e89c12f0 unpack-trees: pull sparse-checkout pattern reading into a new function
Create a populate_from_existing_patterns() function for reading the
path_patterns from $GIT_DIR/info/sparse-checkout so that we can re-use
it elsewhere.

Reviewed-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-03-27 11:33:30 -07:00
Elijah Newren
3cc7c50402 unpack-trees: do not mark a dirty path with SKIP_WORKTREE
If a path is dirty, removing from the working tree risks losing data.
As such, we want to make sure any such path is not marked with
SKIP_WORKTREE.  While the current callers of this code detect this case
and re-populate with a previous set of sparsity patterns, we want to
allow some paths to be marked with SKIP_WORKTREE while others are left
unmarked without it being considered an error.  The reason this
shouldn't be considered an error is that SKIP_WORKTREE has always been
an advisory-only setting; merge and rebase for example were free to
materialize paths and clear the SKIP_WORKTREE bit in order to accomplish
their work even though they kept the SKIP_WORKTREE bit set for other
paths.  Leaving dirty working files in the working tree is thus a
natural extension of what we have already been doing.

Reviewed-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-03-27 11:33:30 -07:00
Elijah Newren
b0a5a12a60 unpack-trees: allow check_updates() to work on a different index
check_updates() previously assumed it was working on o->result.  We want
to use this function in combination with a different index_state, so
take the intended index_state as a parameter.

Reviewed-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-03-27 11:33:29 -07:00
Elijah Newren
72064ee578 t1091: make some tests a little more defensive against failures
Reviewed-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-03-27 11:33:29 -07:00
Elijah Newren
fa0bde45cd unpack-trees: simplify pattern_list freeing
commit e091228e17 ("sparse-checkout: update working directory
in-process", 2019-11-21) allowed passing a pre-defined set of patterns
to unpack_trees().  However, if o->pl was NULL, it would still read the
existing patterns and use those.  If those patterns were read into a
data structure that was allocated, naturally they needed to be free'd.
However, despite the same function being responsible for knowing about
both the allocation and the free'ing, the logic for tracking whether to
free the pattern_list was hoisted to an outer function with an
additional flag in unpack_trees_options.  Put the logic back in the
relevant function and discard the now unnecessary flag.

Reviewed-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-03-27 11:33:29 -07:00
Elijah Newren
d61633ae18 unpack-trees: simplify verify_absent_sparse()
verify_absent_sparse() was introduced in commit 08402b0409
("merge-recursive: distinguish "removed" and "overwritten" messages",
2010-08-11), and has always had exactly one caller which always passes
error_type == ERROR_WOULD_LOSE_UNTRACKED_OVERWRITTEN.  This function
then checks whether error_type is this value, and if so, sets it instead
to ERROR_WOULD_LOSE_ORPHANED_OVERWRITTEN.  It has been nearly a decade
and no other caller has been created, and no other value has ever been
passed, so just pass the expected value to begin with.

Reviewed-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-03-27 11:33:29 -07:00
Elijah Newren
d7dc1e1668 unpack-trees: remove unused error type
commit 08402b0409 ("merge-recursive: distinguish "removed" and
"overwritten" messages", 2010-08-11) split
    ERROR_WOULD_LOSE_UNTRACKED
into both
    ERROR_WOULD_LOSE_UNTRACKED_OVERWRITTEN
    ERROR_WOULD_LOSE_UNTRACKED_REMOVED
and also split
    ERROR_WOULD_LOSE_ORPHANED
into both
    ERROR_WOULD_LOSE_ORPHANED_OVERWRITTEN
    ERROR_WOULD_LOSE_ORPHANED_REMOVED

However, despite the split only three of these four types were used.
ERROR_WOULD_LOSE_ORPHANED_REMOVED was not put into use when it was
introduced and nothing else has used it in the intervening decade
either.  Remove it.

Reviewed-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-03-27 11:33:29 -07:00