"git checkout -f <branch>" while the index has an unmerged path
incorrectly left some paths in an unmerged state, which has been
corrected.
* nd/checkout-f-while-conflicted-fix:
unpack-trees: fix oneway_merge accidentally carry over stage index
"git format-patch" used overwrite an existing patch/cover-letter
file. A new "--no-clobber" option stops it.
* jc/format-patch-error-check:
format-patch: notice failure to open cover letter for writing
builtin/log: downcase the beginning of error messages
A corner-case object name ambiguity while the sequencer machinery
is working (e.g. "rebase -i -x") has been (half) fixed.
* js/get-short-oid-drop-cache:
get_oid(): when an object was not found, try harder
sequencer: move stale comment into correct location
sequencer: improve error message when an OID could not be parsed
rebase -i: demonstrate obscure loose object cache bug
"git init" forgot to read platform-specific repository
configuration, which made Windows port to ignore settings of
core.hidedotfiles, for example.
* js/init-db-update-for-mingw:
mingw: respect core.hidedotfiles = false in git-init again
Error messages given from the http transport have been updated so
that they can be localized.
* js/remote-curl-i18n:
remote-curl: mark all error messages for translation
remote-http transport did not anonymize URLs reported in its error
messages at places.
* js/anonymize-remote-curl-diag:
curl: anonymize URLs in error messages and warnings
Documentation mark-up fixes.
* ma/asciidoctor-fixes-more:
Documentation: turn middle-of-line tabs into spaces
git-svn.txt: drop escaping '\' that ends up being rendered
git.txt: remove empty line before list continuation
config/fsck.txt: avoid starting line with dash
config/diff.txt: drop spurious backtick
Build fix around use of asciidoctor instead of asciidoc
* ma/asciidoctor-fixes:
asciidoctor-extensions: fix spurious space after linkgit
Documentation/Makefile: add missing dependency on asciidoctor-extensions
Documentation/Makefile: add missing xsl dependencies for manpages
Help developers by making it easier to run most of the tests under
different versions of over-the-wire protocols.
* jt/test-protocol-version:
t5552: compensate for v2 filtering ref adv.
tests: fix protocol version for overspecifications
t5700: only run with protocol version 1
t5512: compensate for v0 only sending HEAD symrefs
t5503: fix overspecification of trace expectation
tests: always test fetch of unreachable with v0
t5601: check ssh command only with protocol v0
tests: define GIT_TEST_PROTOCOL_VERSION
Once upon a time the force flag meant something when writing info/refs,
but it hasn't done anything since 60d0526aaa (Unoptimize info/refs
creation., 2005-09-14).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When writing objects/info/packs, we use the basename of each pack
(i.e., just the "pack-1234abcd.pack" part). We compute that manually by
adding "objdirlen + 6" to the name.
This _should_ work consistently, as we do not include non-local packs,
meaning everything should be in $objdir/pack/. Before f13d7db4af
(server-info.c: use pack_local like everybody else., 2005-12-05), this
was definitely true, since we computed "local" based on comparing the
objdir string. Since then, we're relying on the code on packfile.c to
match our expectations of p->pack_name and p->local.
I think our expectations do still hold today, but we can be a bit more
defensive by just using pack_basename() to get the base. That
future-proofs us, and should hopefully be more obviously safe to
somebody reading the code.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We keep an array of struct pointers, with each one representing a single
packfile. But for some reason there is a nr_alloc parameter inside each
struct, which has never been used.
This is probably cruft left over from development, where we might have
wanted a nr_alloc to dynamically grow the list. But as it turns out, we
do not dynamically grow the list at all, but rather count up the total
number of packs and use that as a maximum size. So while we're thinking
of this, let's add an assert() that documents (and checks!) that our
allocation and fill loops stay in sync.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This old code uses fgets with a fixed-size buffer. Let's use a strbuf
instead, so we don't have to wonder if "1000" is big enough, or what
happens if we see a long line.
This also lets us drop our custom code to trim the newline.
Probably nobody actually cares about the 1000-char limit (after all, the
lines generally only say "P pack-[0-9a-f]{40}.pack"), so this is mostly
just about cleanup/readability.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We have two exits from the function: either we jump to the out_stale
label or not. But in both exits we repeat our cleanup, and the only
difference is our return value. Let's just use a variable for the return
value to avoid repeating ourselves.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When we're writing out a new objects/info/packs file, we read back the
old one to try to keep the ordering the same. When we see a line
starting with "P", we expect "P pack-1234..." and blindly jump to "line
+ 2" to parse the pack name. If we saw a line with _just_ "P" and
nothing else, we'd jump past the end of the buffer and start reading
arbitrary memory.
This shouldn't be a big attack vector, as the files are local to the
repository and written by us, but it's clearly worth fixing (we do read
remote copies of the file for dumb-http fetches, but using a totally
different parser!).
Let's instead use skip_prefix() here, which avoids pointer arithmetic
altogether. Note that this converts our switch statement to an if/else
chain, making it slightly more verbose. But it will also make it easier
to do a few follow-on cleanups.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We can use skip_prefix() and parse_oid_hex() to continuously increment
our pointer, rather than dealing with magic numbers. This also fixes a
few small shortcomings:
- if we see a line with the right prefix, suffix, and length, i.e.
matching /P pack-.{40}.pack\n/, we'll interpret the middle part as
hex without checking if it could be parsed. This could lead to us
looking at uninitialized garbage in the hash array. In practice this
means we'll just make a garbage request to the server which will
fail, though it's interesting that a malicious server could convince
us to leak 40 bytes of uninitialized stack to them.
- the current code is picky about seeing a newline at the end of file,
but we can easily be more liberal
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When we have a multi-pack-index that covers many packfiles, we try to
avoid opening the .idx for those packfiles. To do that we feed the pack
name to midx_contains_pack(). But that function wants to see only the
basename, which we compute using strrchr() to find the final slash. But
that leaves an extra "/" at the start of our string.
We can fix this by incrementing the pointer. That also raises the
question of what to do when the name does not have a '/' at all. This
should generally not happen (we always find files in "pack/"), but it
doesn't hurt to be defensive here.
Let's wrap all of that up in a helper function and make it publicly
available, since a later patch will need to use it, too.
The tests don't notice because there's nothing about opening those .idx
files that would cause us to give incorrect output. It's just a little
slower. The new test checks this case by corrupting the covered .idx,
and then making sure we don't complain about it.
We also have to tweak t5570, which intentionally corrupts a .idx file
and expects us to notice it. When run with GIT_TEST_MULTI_PACK_INDEX,
this will fail since we now will (correctly) not bother opening the .idx
at all. We can fix that by unconditionally dropping any midx that's
there, which ensures we'll have to read the .idx.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A midx file (and the struct we parse from it) contains a list of all of
the covered packfiles, mentioned by their ".idx" names (e.g.,
"pack-1234.idx", etc). And thus calls to midx_contains_pack() expect
callers to provide the idx name.
This works for most of the calls, but the one in open_packed_git_1()
tries to feed a packed_git->pack_name, which is the ".pack" name,
meaning we'll never find a match (even if the pack is covered by the
midx).
We can fix this by converting the ".pack" to ".idx" in the caller.
However, that requires allocating a new string. Instead, let's make
midx_contains_pack() a bit friendlier, and allow it take _either_ the
.pack or .idx variant.
All cleverness in the matching code is credited to René. Bugs are mine.
There's no test here, because while this does fix _a_ bug, it's masked
by another bug in that same caller. That will be covered (with a test)
in the next patch.
Helped-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The cat-file --buffer option is the default already when using
--batch-all-objects. It doesn't hurt to specify it, but it's nice for
the test scripts to model good usage.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There's no such argument as "--unsorted"; it's spelled "--unordered".
But our test failed to notice that cat-file didn't run at all because:
1. It lost the exit code of git on the left-hand side of a pipe.
2. It was comparing two runs of the broken invocation with and without
a particular config variable (and indeed, both cases produced no
output!).
Let's fix the option, but also tweak the helper function to check the
exit code.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We can't create a pack revindex if we haven't actually looked at the
index. Normally we would never get as far as creating a revindex without
having already been looking in the pack, so this code never bothered to
double-check that pack->index_data had been loaded.
But with the new multi-pack-index feature, many code paths might not
load the individual pack .idx at all (they'd find objects via the midx
and then open the .pack, but not its index).
This can't yet be triggered in practice, because a bug in the midx code
means we accidentally open up the individual .idx files anyway. But in
preparation for fixing that, let's have the revindex code check that
everything it needs has been loaded.
In most cases this will just be a quick noop. But note that this does
introduce a possibility of error (if we have to open the index and it's
corrupt), so load_pack_revindex() now returns a result code, and callers
need to handle the error.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As CodingGuidelines recommends, we do not need an "extern" when
declaring a public function. Let's drop these. Note that we leave the
extern on report_garbage(), as that is actually a function pointer, not
a function itself.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Git's command-line parsers support uniquely abbreviated options, e.g.
`git init --ba` would automatically expand `--ba` to `--bare`.
This is a very convenient feature in every day life for Git users, in
particular when tab completion is not available.
However, it is not a good idea to rely on that in Git's test suite, as
something that is a unique abbreviation of a command line option today
might no longer be a unique abbreviation tomorrow.
For example, if a future contribution added a new mode
`git init --babyproofing` and a previously-introduced test case used the
fact that `git init --ba` expanded to `git init --bare`, that future
contribution would now have to touch seemingly unrelated tests just to
keep the test suite from failing.
So let's disallow abbreviated options in the test suite by default.
Note: for ease of implementation, this patch really only touches the
`parse-options` machinery: more and more hand-rolled option parsers are
converted to use that internal API, and more and more scripts are
converted to built-ins (naturally using the parse-options API, too), so
in practice this catches most issues, and is definitely the biggest bang
for the buck.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This teaches git-submodule the set-branch subcommand which allows the
branch of a submodule to be set through a porcelain command without
having to manually manipulate the .gitmodules file.
Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git rebase --rebase-merges" replaces its old "--preserve-merges"
option; the latter is now marked as deprecated.
* js/rebase-deprecate-preserve-merges:
rebase: deprecate --preserve-merges
"git worktree add" used to do a "find an available name with stat
and then mkdir", which is race-prone. This has been fixed by using
mkdir and reacting to EEXIST in a loop.
* ms/worktree-add-atomic-mkdir:
worktree: fix worktree add race
"git log -L<from>,<to>:<path>" with "-s" did not suppress the patch
output as it should. This has been corrected.
* jk/line-log-with-patch:
line-log: detect unsupported formats
line-log: suppress diff output with "-s"
A GSoC micro.
* ra/t3600-test-path-funcs:
t3600: use helpers to replace test -d/f/e/s <path>
t3600: modernize style
test functions: add function `test_file_not_empty`
"git rebase" uses the refs/rewritten/ hierarchy to store its
intermediate states, which inherently makes the hierarchy per
worktree, but it didn't quite work well.
* nd/rewritten-ref-is-per-worktree:
Make sure refs/rewritten/ is per-worktree
files-backend.c: reduce duplication in add_per_worktree_entries_to_dir()
files-backend.c: factor out per-worktree code in loose_fill_ref_dir()
When the "clean" filter can reduce the size of a huge file in the
working tree down to a small "token" (a la Git LFS), there is no
point in allocating a huge scratch area upfront, but the buffer is
sized based on the original file size. The convert mechanism now
allocates very minimum and reallocates as it receives the output
from the clean filter process.
* jh/resize-convert-scratch-buffer:
convert: avoid malloc of original file size
In 'ci/test-documentation.sh' we save the standard error of 'make
doc', and, in an attempt to make sure that neither AsciiDoc nor
Asciidoctor printed any warnings, we check the emptiness of the
resulting file with '! test -s stderr.log'. This check has never
actually worked, because in our 'ci/*' build scripts we rely on 'set
-e' aborting the build job when a command exits with error, and,
unfortunately, the combination of the two doesn't work as intended.
According to POSIX [1]:
"The -e setting shall be ignored when executing [...] a pipeline
beginning with the ! reserved word" [2]
Watch and learn:
$ echo unexpected >file
$ ( set -e; ! test -s file ; echo "should not reach this" ) ; echo $?
should not reach this
0
This is why we haven't noticed the warnings from Asciidoctor that were
fixed in the first patches of this patch series, though some of them
were already there in the build of v2.18.0-rc0 [3].
Check the emptiness of that file with 'test ! -s' instead, which works
properly with 'set -e':
$ ( set -e; test ! -s file ; echo "should not reach this" ) ; echo $?
1
Furthermore, dump the contents of that file to the log for our
convenience, so if it were to unexpectedly end up being non-empty,
then we wouldn't have to scroll through all that long build log
looking for warnings, but could see them right away near the end of
the log.
Note that we are only really interested in the standard error of
AsciiDoc and Asciidoctor, but by saving the stderr of 'make doc' we
also save any error output from the make rules. Currently there is
only one such line: we build the docs with Asciidoctor right after a
'make clean', meaning that 'make USE_ASCIIDOCTOR=1 doc' always starts
with running 'GIT-VERSION-GEN', which in turn prints the version to
stderr. A 'sed' command was supposed to remove this version line to
prevent it from triggering that (previously defunct) emptiness check,
but, unfortunately, this command doesn't work as intended, either,
because it leaves the file to be checked intact, but that defunct
emptiness check hid this issue, too... Furthermore, in the near
future there will be an other line on stderr, because commit
9a71722b4d (Doc: auto-detect changed build flags, 2019-03-17) in the
currently cooking branch 'ma/doc-diff-doc-vs-doctor-comparison' will
print "* new asciidoc flags" at the beginning of both 'make doc'
invokations.
Extend that 'sed' command to remove this line, too, wrap it in a
helper function so the output of both 'make doc' is filtered the same
way, and change its invokation to actually write the logfile to be
checked.
[1] http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#set
[2] POSIX doesn't discuss the meaning of '! cmd' in case of simple
commands, but it defines that "A pipeline is a sequence of one or
more commands separated by the control operator '|'", so
apparently a simple command is considered as pipeline as well.
http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_09_02
[3] https://travis-ci.org/git/git/jobs/385932007#L1463
Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The recent release of Asciidoctor v2.0.0 broke our documentation
build job on Travis CI, where we 'gem install asciidoctor', which
always brings us the latest and (supposedly) greatest. Alas, we are
not ready for that just yet, because it removed support for DocBook
4.5, and we have been requiring that particular DocBook version to
build 'user-manual.xml' with Asciidoctor, resulting in:
ASCIIDOC user-manual.xml
asciidoctor: FAILED: missing converter for backend 'docbook45'. Processing aborted.
Use --trace for backtrace
make[1]: *** [user-manual.xml] Error 1
Unfortunately, we can't simply switch to DocBook 5 right away, as
doing so leads to validation errors from 'xmlto', and working around
those leads to yet another errors... [1]
So let's stick with Asciidoctor v1.5.8 (latest stable release before
v2.0.0) in our documentation build job on Travis CI for now, until we
figure out how to deal with the fallout from Asciidoctor v2.0.0.
[1] https://public-inbox.org/git/20190324162131.GL4047@pobox.com/
Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Ensure that a FLEX_MALLOC_MEM that uses 'strlen' for its 'len' uses
FLEX_ALLOC_STR instead, since these are equivalent forms.
Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This patch fixes a quadratic list insertion in rewrite_one() when
pathspec limiting is combined with --parents. What happens is something
like this:
1. We see that some commit X touches the path, so we try to rewrite
its parents.
2. rewrite_one() loops forever, rewriting parents, until it finds a
relevant parent (or hits the root and decides there are none). The
heavy lifting is done by process_parent(), which uses
try_to_simplify_commit() to drop parents.
3. process_parent() puts any intermediate parents into the
&revs->commits list, inserting by commit date as usual.
So if commit X is recent, and then there's a large chunk of history that
doesn't touch the path, we may add a lot of commits to &revs->commits.
And insertion by commit date is O(n) in the worst case, making the whole
thing quadratic.
We tried to deal with this long ago in fce87ae538 (Fix quadratic
performance in rewrite_one., 2008-07-12). In that scheme, we cache the
oldest commit in the list; if the new commit to be added is older, we
can start our linear traversal there. This often works well in practice
because parents are older than their descendants, and thus we tend to
add older and older commits as we traverse.
But this isn't guaranteed, and in fact there's a simple case where it is
not: merges. Imagine we look at the first parent of a merge and see a
very old commit (let's say 3 years old). And on the second parent, as we
go back 3 years in history, we might have many commits. That one
first-parent commit has polluted our oldest-commit cache; it will remain
the oldest while we traverse a huge chunk of history, during which we
have to fall back to the slow, linear method of adding to the list.
Naively, one might imagine that instead of caching the oldest commit,
we'd start at the last-added one. But that just makes some cases faster
while making others slower (and indeed, while it made a real-world test
case much faster, it does quite poorly in the perf test include here).
Fundamentally, these are just heuristics; our worst case is still
quadratic, and some cases will approach that.
Instead, let's use a data structure with better worst-case performance.
Swapping out revs->commits for something else would have repercussions
all over the code base, but we can take advantage of one fact: for the
rewrite_one() case, nobody actually needs to see those commits in
revs->commits until we've finished generating the whole list.
That leaves us with two obvious options:
1. We can generate the list _unordered_, which should be O(n), and
then sort it afterwards, which would be O(n log n) total. This is
"sort-after" below.
2. We can insert the commits into a separate data structure, like a
priority queue. This is "prio-queue" below.
I expected that sort-after would be the fastest (since it saves us the
extra step of copying the items into the linked list), but surprisingly
the prio-queue seems to be a bit faster.
Here are timings for the new p0001.6 for all three techniques across a
few repositories, as compared to master:
master cache-last sort-after prio-queue
--------------------------------------------------------------------------------------------
GIT_PERF_REPO=git.git
0.52(0.50+0.02) 0.53(0.51+0.02) +1.9% 0.37(0.33+0.03) -28.8% 0.37(0.32+0.04) -28.8%
GIT_PERF_REPO=linux.git
20.81(20.74+0.07) 20.31(20.24+0.07) -2.4% 0.94(0.86+0.07) -95.5% 0.91(0.82+0.09) -95.6%
GIT_PERF_REPO=llvm-project.git
83.67(83.57+0.09) 4.23(4.15+0.08) -94.9% 3.21(3.15+0.06) -96.2% 2.98(2.91+0.07) -96.4%
A few items to note:
- the cache-list tweak does improve the bad case for llvm-project.git
that started my digging into this problem. But it performs terribly
on linux.git, barely helping at all.
- the sort-after and prio-queue techniques work well. They approach
the timing for running without --parents at all, which is what you'd
expect (see below for more data).
- prio-queue just barely outperforms sort-after. As I said, I'm not
really sure why this is the case, but it is. You can see it even
more prominently in this real-world case on llvm-project.git:
git rev-list --parents 07ef786652e7 -- llvm/test/CodeGen/Generic/bswap.ll
where prio-queue routinely outperforms sort-after by about 7%. One
guess is that the prio-queue may just be more efficient because it
uses a compact array.
There are three new perf tests:
- "rev-list --parents" gives us a baseline for running with --parents.
This isn't sped up meaningfully here, because the bad case is
triggered only with simplification. But it's good to make sure we
don't screw it up (now, or in the future).
- "rev-list -- dummy" gives us a baseline for just traversing with
pathspec limiting. This gives a lower bound for the next test (and
it's also a good thing for us to be checking in general for
regressions, since we don't seem to have any existing tests).
- "rev-list --parents -- dummy" shows off the problem (and our fix)
Here are the timings for those three on llvm-project.git, before and
after the fix:
Test master prio-queue
------------------------------------------------------------------------------
0001.3: rev-list --parents 2.24(2.12+0.12) 2.22(2.11+0.11) -0.9%
0001.5: rev-list -- dummy 2.89(2.82+0.07) 2.92(2.89+0.03) +1.0%
0001.6: rev-list --parents -- dummy 83.67(83.57+0.09) 2.98(2.91+0.07) -96.4%
Changes in the first two are basically noise, and you can see we
approach our lower bound in the final one.
Note that we can't fully get rid of the list argument from
process_parents(). Other callers do have lists, and it would be hard to
convert them. They also don't seem to have this problem (probably
because they actually remove items from the list as they loop, meaning
it doesn't grow so large in the first place). So this basically just
drops the "cache_ptr" parameter (which was used only by the one caller
we're fixing here) and replaces it with a prio_queue. Callers are free
to use either data structure, depending on what they're prepared to
handle.
Reported-by: Björn Pettersson A <bjorn.a.pettersson@ericsson.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>