After a partial clone, repeated fetches from promisor remote would
have accumulated many packfiles marked with .promisor bit without
getting them coalesced into fewer packfiles, hurting performance.
"git repack" now learned to repack them.
* jt/repack-promisor-packs:
repack: repack promisor objects if -a or -A is set
repack: refactor setup of pack-objects cmd
A test prerequisite defined by various test scripts with slightly
different semantics has been consolidated into a single copy and
made into a lazily defined one.
* wc/make-funnynames-shared-lazy-prereq:
t: factor out FUNNYNAMES as shared lazy prereq
"git tbdiff" that lets us compare individual patches in two
iterations of a topic has been rewritten and made into a built-in
command.
* js/range-diff: (21 commits)
range-diff: use dim/bold cues to improve dual color mode
range-diff: make --dual-color the default mode
range-diff: left-pad patch numbers
completion: support `git range-diff`
range-diff: populate the man page
range-diff --dual-color: skip white-space warnings
range-diff: offer to dual-color the diffs
diff: add an internal option to dual-color diffs of diffs
color: add the meta color GIT_COLOR_REVERSE
range-diff: use color for the commit pairs
range-diff: add tests
range-diff: do not show "function names" in hunk headers
range-diff: adjust the output of the commit pairs
range-diff: suppress the diff headers
range-diff: indent the diffs just like tbdiff
range-diff: right-trim commit messages
range-diff: also show the diff between patches
range-diff: improve the order of the shown commits
range-diff: first rudimentary implementation
Introduce `range-diff` to compare iterations of a topic branch
...
Improve built-in facility to catch broken &&-chain in the tests.
* es/chain-lint-more:
chainlint: add test of pathological case which triggered false positive
chainlint: recognize multi-line quoted strings more robustly
chainlint: let here-doc and multi-line string commence on same line
chainlint: recognize multi-line $(...) when command cuddled with "$("
chainlint: match 'quoted' here-doc tags
chainlint: match arbitrary here-docs tags rather than hard-coded names
The API to iterate over all objects learned to optionally list
objects in the order they appear in packfiles, which helps locality
of access if the caller accesses these objects while as objects are
enumerated.
* jk/for-each-object-iteration:
for_each_*_object: move declarations to object-store.h
cat-file: use a single strbuf for all output
cat-file: split batch "buf" into two variables
cat-file: use oidset check-and-insert
cat-file: support "unordered" output for --batch-all-objects
cat-file: rename batch_{loose,packed}_object callbacks
t1006: test cat-file --batch-all-objects with duplicates
for_each_packed_object: support iterating in pack-order
for_each_*_object: give more comprehensive docstrings
for_each_*_object: take flag arguments as enum
for_each_*_object: store flag definitions in a single location
Test fixes.
* en/t7406-fixes:
t7406: avoid using test_must_fail for commands other than git
t7406: prefer test_* helper functions to test -[feds]
t7406: avoid having git commands upstream of a pipe
t7406: simplify by using diff --name-only instead of diff --raw
t7406: fix call that was failing for the wrong reason
The "--exec" option to "git rebase --rebase-merges" placed the exec
commands at wrong places, which has been corrected.
* js/rebase-merges-exec-fix:
rebase --exec: make it work with --rebase-merges
t3430: demonstrate what -r, --autosquash & --exec should do
"git pull --rebase=interactive" learned "i" as a short-hand for
"interactive".
* js/pull-rebase-type-shorthand:
pull --rebase=<type>: allow single-letter abbreviations for the type
"git merge --abort" etc. did not clean things up properly when
there were conflicted entries in the index in certain order that
are involved in D/F conflicts. This has been corrected.
* en/abort-df-conflict-fixes:
read-cache: fix directory/file conflict handling in read_index_unmerged()
t1015: demonstrate directory/file conflict recovery failures
The http-backend (used for smart-http transport) used to slurp the
whole input until EOF, without paying attention to CONTENT_LENGTH
that is supplied in the environment and instead expecting the Web
server to close the input stream. This has been fixed.
* mk/http-backend-content-length:
t5562: avoid non-portable "export FOO=bar" construct
http-backend: respect CONTENT_LENGTH for receive-pack
http-backend: respect CONTENT_LENGTH as specified by rfc3875
http-backend: cleanup writing to child process
Update to a few other topics around 'git fetch'.
* ab/fetch-nego:
fetch doc: cross-link two new negotiation options
negotiator: unknown fetch.negotiationAlgorithm should error out
"git fetch $there refs/heads/s" ought to fetch the tip of the
branch 's', but when "refs/heads/refs/heads/s", i.e. a branch whose
name is "refs/heads/s" exists at the same time, fetched that one
instead by mistake. This has been corrected to honor the usual
disambiguation rules for abbreviated refnames.
* jt/refspec-dwim-precedence-fix:
remote: make refspec follow the same disambiguation rule as local refs
The automatic tree-matching in "git merge -s subtree" was broken 5
years ago and nobody has noticed since then, which is now fixed.
* jk/merge-subtree-heuristics:
score_trees(): fix iteration over trees with missing entries
The test performed at the receiving end of "git push" to prevent
bad objects from entering repository can be customized via
receive.fsck.* configuration variables; we now have gained a
counterpart to do the same on the "git fetch" side, with
fetch.fsck.* configuration variables.
* ab/fsck-transfer-updates:
fsck: test and document unknown fsck.<msg-id> values
fsck: add stress tests for fsck.skipList
fsck: test & document {fetch,receive}.fsck.* config fallback
fetch: implement fetch.fsck.*
transfer.fsckObjects tests: untangle confusing setup
config doc: elaborate on fetch.fsckObjects security
config doc: elaborate on what transfer.fsckObjects does
config doc: unify the description of fsck.* and receive.fsck.*
config doc: don't describe *.fetchObjects twice
receive.fsck.<msg-id> tests: remove dead code
"git fetch" sometimes failed to update the remote-tracking refs,
which has been corrected.
* jt/connectivity-check-after-unshallow:
fetch-pack: unify ref in and out param
"git p4 submit" learns to ask its own pre-submit hook if it should
continue with submitting.
* cb/p4-pre-submit-hook:
git-p4: add the `p4-pre-submit` hook
When the sparse checkout feature is in use, "git cherry-pick" and
other mergy operations lost the skip_worktree bit when a path that
is excluded from checkout requires content level merge, which is
resolved as the same as the HEAD version, without materializing the
merge result in the working tree, which made the path appear as
deleted. This has been corrected by preserving the skip_worktree
bit (and not materializing the file in the working tree).
* en/merge-recursive-skip-fix:
merge-recursive: preserve skip_worktree bit when necessary
t3507: add a testcase showing failure with sparse checkout
The wire-protocol v2 relies on the client to send "ref prefixes" to
limit the bandwidth spent on the initial ref advertisement. "git
fetch $remote branch:branch" that asks tags that point into the
history leading to the "branch" automatically followed sent to
narrow prefix and broke the tag following, which has been fixed.
* jt/tag-following-with-proto-v2-fix:
fetch: send "refs/tags/" prefix upon CLI refspecs
t5702: test fetch with multiple refspecs at a time
Many more strings are prepared for l10n.
* nd/i18n: (23 commits)
transport-helper.c: mark more strings for translation
transport.c: mark more strings for translation
sha1-file.c: mark more strings for translation
sequencer.c: mark more strings for translation
replace-object.c: mark more strings for translation
refspec.c: mark more strings for translation
refs.c: mark more strings for translation
pkt-line.c: mark more strings for translation
object.c: mark more strings for translation
exec-cmd.c: mark more strings for translation
environment.c: mark more strings for translation
dir.c: mark more strings for translation
convert.c: mark more strings for translation
connect.c: mark more strings for translation
config.c: mark more strings for translation
commit-graph.c: mark more strings for translation
builtin/replace.c: mark more strings for translation
builtin/pack-objects.c: mark more strings for translation
builtin/grep.c: mark strings for translation
builtin/config.c: mark more strings for translation
...
Teach "git tag -s" etc. a few configuration variables (gpg.format
that can be set to "openpgp" or "x509", and gpg.<format>.program
that is used to specify what program to use to deal with the format)
to allow x.509 certs with CMS via "gpgsm" to be used instead of
openpgp via "gnupg".
* hs/gpgsm:
gpg-interface t: extend the existing GPG tests with GPGSM
gpg-interface: introduce new signature format "x509" using gpgsm
gpg-interface: introduce new config to select per gpg format program
gpg-interface: do not hardcode the key string len anymore
gpg-interface: introduce an abstraction for multiple gpg formats
t/t7510: check the validation of the new config gpg.format
gpg-interface: add new config to select how to sign a commit
The wire-protocol v2 relies on the client to send "ref prefixes" to
limit the bandwidth spent on the initial ref advertisement. "git
clone" when learned to speak v2 forgot to do so, which has been
corrected.
* bw/clone-ref-prefixes:
clone: send ref-prefixes when using protocol v2
A new configuration variable core.usereplacerefs has been added,
primarily to help server installations that want to ignore the
replace mechanism altogether.
* jk/core-use-replace-refs:
add core.usereplacerefs config option
check_replace_refs: rename to read_replace_refs
check_replace_refs: fix outdated comment
One of the "diff --color-moved" mode "dimmed_zebra" that was named
in an unusual way has been deprecated and replaced by
"dimmed-zebra".
* es/diff-color-moved-fix:
diff: --color-moved: rename "dimmed_zebra" to "dimmed-zebra"
The `chainlint` target compares actual output to expected output, where
the actual output is generated from files that are specifically checked
out with LF-only line endings. So the expected output needs to be
checked out with LF-only line endings, too.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Acked-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The test 'pack-objects to file can use bitmap' added in 645c432d61
(pack-objects: use reachability bitmap index when generating
non-stdout pack, 2016-09-10) is silently buggy and doesn't check what
it's supposed to.
In 't5310-pack-bitmaps.sh', the 'list_packed_objects' helper function
does what its name implies by running:
git show-index <"$1" | cut -d' ' -f2
The test in question invokes this function like this:
list_packed_objects <packa-$packasha1.idx >packa.objects &&
list_packed_objects <packb-$packbsha1.idx >packb.objects &&
test_cmp packa.objects packb.objects
Note how these two callsites don't specify the name of the pack index
file as the function's parameter, but redirect the function's standard
input from it. This triggers an error message from the shell, as it
has no filename to redirect from in the function, but this error is
ignored, because it happens upstream of a pipe. Consequently, both
invocations produce empty 'pack{a,b}.objects' files, and the
subsequent 'test_cmp' happily finds those two empty files identical.
Fix these two 'list_packed_objects' invocations by specifying the pack
index files as parameters. Furthermore, eliminate the pipe in that
function by replacing it with an &&-chained pair of commands using an
intermediate file, so a failure of 'git show-index' or the shell
redirection will fail the test.
Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If you're going to access the contents of every object in a
packfile, it's generally much more efficient to do so in
pack order, rather than in hash order. That increases the
locality of access within the packfile, which in turn is
friendlier to the delta base cache, since the packfile puts
related deltas next to each other. By contrast, hash order
is effectively random, since the sha1 has no discernible
relationship to the content.
This patch introduces an "--unordered" option to cat-file
which iterates over packs in pack-order under the hood. You
can see the results when dumping all of the file content:
$ time ./git cat-file --batch-all-objects --buffer --batch | wc -c
6883195596
real 0m44.491s
user 0m42.902s
sys 0m5.230s
$ time ./git cat-file --unordered \
--batch-all-objects --buffer --batch | wc -c
6883195596
real 0m6.075s
user 0m4.774s
sys 0m3.548s
Same output, different order, way faster. The same speed-up
applies even if you end up accessing the object content in a
different process, like:
git cat-file --batch-all-objects --buffer --batch-check |
grep blob |
git cat-file --batch='%(objectname) %(rest)' |
wc -c
Adding "--unordered" to the first command drops the runtime
in git.git from 24s to 3.5s.
Side note: there are actually further speedups available
for doing it all in-process now. Since we are outputting
the object content during the actual pack iteration, we
know where to find the object and could skip the extra
lookup done by oid_object_info(). This patch stops short
of that optimization since the underlying API isn't ready
for us to make those sorts of direct requests.
So if --unordered is so much better, why not make it the
default? Two reasons:
1. We've promised in the documentation that --batch-all-objects
outputs in hash order. Since cat-file is plumbing,
people may be relying on that default, and we can't
change it.
2. It's actually _slower_ for some cases. We have to
compute the pack revindex to walk in pack order. And
our de-duplication step uses an oidset, rather than a
sort-and-dedup, which can end up being more expensive.
If we're just accessing the type and size of each
object, for example, like:
git cat-file --batch-all-objects --buffer --batch-check
my best-of-five warm cache timings go from 900ms to
1100ms using --unordered. Though it's possible in a
cold-cache or under memory pressure that we could do
better, since we'd have better locality within the
packfile.
And one final question: why is it "--unordered" and not
"--pack-order"? The answer is again two-fold:
1. "pack order" isn't a well-defined thing across the
whole set of objects. We're hitting loose objects, as
well as objects in multiple packs, and the only
ordering we're promising is _within_ a single pack. The
rest is apparently random.
2. The point here is optimization. So we don't want to
promise any particular ordering, but only to say that
we will choose an ordering which is likely to be
efficient for accessing the object content. That leaves
the door open for further changes in the future without
having to add another compatibility option.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The test for --batch-all-objects in t1006 covers a variety
of object storage situations, but one thing it doesn't cover
is that we avoid mentioning duplicate objects. We won't have
any because running "git repack -ad" will have packed them
all and deleted the loose ones.
This does work (because we sort and de-dup the output list),
but it's good to include it in our test. And doubly so for
when we add an unordered mode which has to de-dup in a
different way.
Note that we cannot just re-create one of the objects, as
Git will omit the write of an object that is already
present. However, we can create a new pack with one of the
objects, which forces the duplication.
One alternative would be to just use "git repack -a" instead
of "-ad". But then _every_ object would be duplicated as
loose and packed, and we might miss a bug that omits packed
objects (because we'd show their loose counterparts).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Correct a comment referring to the removal of just the branch to also
refer to the tag. This should have been changed in my
ca3065e7e7 ("fetch tests: add a tag to be deleted to the pruning
tests", 2018-02-09) when the tag deletion was added, but I missed it
at the time.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This extract from contrib/subtree/t7900 triggered a false positive due
to three chainlint limitations:
* recognizing only a "blessed" set of here-doc tag names in a subshell
("EOF", "EOT", "INPUT_END"), of which "TXT" is not a member
* inability to recognize multi-line $(...) when the first statement of
the body is cuddled with the opening "$("
* inability to recognize multiple constructs on a single line, such as
opening a multi-line $(...) and starting a here-doc
Now that all of these shortcomings have been addressed, turn this rather
pathological bit of shell coding into a chainlint test case.
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
chainlint.sed recognizes multi-line quoted strings within subshells:
echo "abc
def" >out &&
so it can avoid incorrectly classifying lines internal to the string as
breaking the &&-chain. To identify the first line of a multi-line
string, it checks if the line contains a single quote. However, this is
fragile and can be easily fooled by a line containing multiple strings:
echo "xyz" "abc
def" >out &&
Make detection more robust by checking for an odd number of quotes
rather than only a single one.
(Escaped quotes are not handled, but support may be added later.)
The original multi-line string recognizer rather cavalierly threw away
all but the final quote, whereas the new one is careful to retain all
quotes, so the "expected" output of a couple existing chainlint tests is
updated to account for this new behavior.
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
After swallowing a here-doc, chainlint.sed assumes that no other
processing needs to be done on the line aside from checking for &&-chain
breakage; likewise, after folding a multi-line quoted string. However,
it's conceivable (even if unlikely in practice) that both a here-doc and
a multi-line quoted string might commence on the same line:
cat <<\EOF && echo "foo
bar"
data
EOF
Support this case by sending the line (after swallowing and folding)
through the normal processing sequence rather than jumping directly to
the check for broken &&-chain.
This change also allows other somewhat pathological cases to be handled,
such as closing a subshell on the same line starting a here-doc:
(
cat <<-\INPUT)
data
INPUT
or, for instance, opening a multi-line $(...) expression on the same
line starting a here-doc:
x=$(cat <<-\END &&
data
END
echo "x")
among others.
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
For multi-line $(...) expressions nested within subshells, chainlint.sed
only recognizes:
x=$(
echo foo &&
...
but it is not unlikely that test authors may also cuddle the command
with the opening "$(", so support that style, as well:
x=$(echo foo &&
...
The closing ")" is already correctly recognized when cuddled or not.
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A here-doc tag can be quoted ('EOF') or escaped (\EOF) to suppress
interpolation within the body. Although, chainlint recognizes escaped
tags, it does not know about quoted tags. For completeness, teach it to
recognize quoted tags, as well.
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
chainlint.sed swallows top-level here-docs to avoid being fooled by
content which might look like start-of-subshell. It likewise swallows
here-docs in subshells to avoid marking content lines as breaking the
&&-chain, and to avoid being fooled by content which might look like
end-of-subshell, start-of-nested-subshell, or other specially-recognized
constructs.
At the time of implementation, it was believed that it was not possible
to support arbitrary here-doc tag names since 'sed' provides no way to
stash the opening tag name in a variable for later comparison against a
line signaling end-of-here-doc. Consequently, tag names are hard-coded,
with "EOF" being the only tag recognized at the top-level, and only
"EOF", "EOT", and "INPUT_END" being recognized within subshells. Also,
special care was taken to avoid being confused by here-docs nested
within other here-docs.
In practice, this limited number of hard-coded tag names has been "good
enough" for the 13000+ existing Git test, despite many of those tests
using tags other than the recognized ones, since the bodies of those
here-docs do not contain content which would fool the linter.
Nevertheless, the situation is not ideal since someone writing new
tests, and choosing a name not in the "blessed" set could potentially
trigger a false-positive.
To address this shortcoming, upgrade chainlint.sed to handle arbitrary
here-doc tag names, both at the top-level and within subshells.
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Two tests added in dade47c06c (commit-graph: add repo arg to graph
readers, 2018-07-11) prepare the contents of 'expect' files by
'echo'ing the results of command substitutions. That's unncessary,
avoid them by directly saving the output of the commands executed in
those command substitutions.
Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The commit-graph files are binary files, so they should not be
compared with 'test_cmp', because that might cause issues like
crashing[1] or infinite loop[2] on Windows, where 'test_cmp' is a
shell function to deal with random LF-CRLF conversions[3].
Use 'test_cmp_bin' instead.
1 - b93e6e3663 (t5000, t5003: do not use test_cmp to compare binary
files, 2014-06-04)
2 - f9f3851b4d (t9300: use test_cmp_bin instead of test_cmp to compare
binary files, 2014-09-12)
3 - 4d715ac05c (Windows: a test_cmp that is agnostic to random LF <>
CRLF conversions, 2013-10-26)
Reviewed-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>