The graph_git_two_modes() helper added in 177722b344 (commit:
integrate commit graph with commit parsing, 2018-04-10) didn't
&&-chain its "git commit-graph" invocations, which as can be seen with
SANITIZE=leak will happily mark tests as passing if both of these
commands die, since test_cmp() will be comparing two empty files.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Try to re-chmod the trash directory on startup if we fail to "rm -rf"
it. This fixes problems where the test leaves the trash directory
behind in a bad permission state for whatever reason.
This fixes an interaction between [1] where t0004-unwritable.sh was
made to use "test_when_finished" for cleanup, and [2] which added the
"--immediate" mode. If a test in this file failed when running with
"--immediate" we wouldn't run the "test_when_finished" block, which
re-chmods the ".git/objects" directory (see [1]).
This can be demonstrated as e.g. (output snipped for less verbosity):
$ ./t0004-unwritable.sh --run=3 --immediate
ok 1 # skip setup (--run)
ok 2 # skip write-tree should notice unwritable repository (--run)
not ok 3 - commit should notice unwritable repository
[...]
$ ./t0004-unwritable.sh --run=3 --immediate
rm: cannot remove '[...]/trash directory.t0004-unwritable/.git/objects/info': Permission denied
FATAL: Cannot prepare test area
[...]
Instead of some version of reverting [1] let's make the test-lib.sh
resilient to this edge-case, it will happen due to [1], but also
e.g. if the relevant "test-lib.sh" process is kill -9'd during the
test run. We should try harder to recover in this case. If we fail to
remove the test directory let's retry after (re-)chmod-ing it.
This doesn't need to be guarded by something that's equivalent to
"POSIXPERM" since if we don't support "chmod" we were about to fail
anyway.
Let's also discard any error output from (a possibly nonexisting)
"chmod", we'll fail on the subsequent "rm -rf" anyway, likewise for
the first "rm -rf" invocation, we don't want to get the "cannot
remove" output if we can get around it with the "chmod", but we do
want any error output from the second "rm -rf", in case that doesn't
fix the issue.
The lack of &&-chaining between the "chmod" and "rm -rf" is
intentional, if we fail the first "rm -rf", can't chmod, but then
succeed the second time around that's what we were hoping for. We just
want to nuke the directory, not carry forward every possible error
code or error message.
1. dbda967684 (t0004 (unwritable files): simplify error handling,
2010-09-06)
2. b586744a86 (test: skip clean-up when running under --immediate
mode, 2011-06-27)
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Improve the "GIT_TEST_PASSING_SANITIZE_LEAK=true" test mode added in
956d2e4639 (tests: add a test mode for SANITIZE=leak, run it in CI,
2021-09-23) to use a TAP "Bail out!" message when exiting. This will
cause the test run to exit immediately under a TAP consumer like
"prove(1)".
See 614fe01521 (test-lib: bail out when "-v" used under "prove",
2016-10-22) for the initial introduction of "Bail out!" to the
--verbose being amended here.
Before this compiling with "SANITIZE=" and running the tests with
"prove(1)" would cause all the tests to be run to the end (output
trimmed for fewer columns):
$ GIT_TEST_PASSING_SANITIZE_LEAK=true make
rm -f -r 'test-results'
*** prove ***
t0000-basic.sh ......... Dubious, test returned 1 (wstat 256, 0x100)
No subtests run
t0001-init.sh .......... Dubious, test returned 1 (wstat 256, 0x100)
No subtests run
[...output where we list every single t[0-9]*.sh file as failing snipped]
Whereas now we'll fail early, like this ("->" line wrapping added):
$ GIT_TEST_PASSING_SANITIZE_LEAK=true make
[...]
t0000-basic.sh ..................................... Bailout called. Further testing stopped:
-> GIT_TEST_PASSING_SANITIZE_LEAK=true has no effect except when compiled with SANITIZE=leak
FAILED--Further testing stopped: GIT_TEST_PASSING_SANITIZE_LEAK=true has no effect except
-> when compiled with SANITIZE=leak
make: *** [Makefile:53: prove] Error 1
This change also adds a red color to the "Bailout called" line, as
we're now using "say_color error". That improves existing output in
the case of e.g.:
$ prove -j8 t[0-9]*.sh :: -v
Bailout called. Further testing stopped: verbose mode forbidden under TAP harness; try --verbose-log
FAILED--Further testing stopped: verbose mode forbidden under TAP harness; try --verbose-log
We don't need to have a "Bail out! " prefix when we're not running
under a TAP consumer (i.e. if test -n "$HARNESS_ACTIVE"), but let's
not make the output conditional on that. Showing it under e.g.:
$ GIT_TEST_PASSING_SANITIZE_LEAK=true ./t0095-bloom.sh
Bail out! GIT_TEST_PASSING_SANITIZE_LEAK=true has no effect except when compiled with SANITIZE=leak
Doesn't harm anything, and I don't think the (small) complexity of
only adding this if we're under "$HARNESS_ACTIVE" is worth it.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
De-duplicate the "finalize_junit_xml; GIT_EXIT_OK=t; exit 1" code
shared between the "error()" and "--immediate on failure" code paths,
in preparation for adding a third user in a subsequent commit.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When "git cmd -h" shows more than one line of usage text (e.g.
the cmd subcommand may take sub-sub-command), parse-options API
learned to align these lines, even across i18n/l10n.
* ab/align-parse-options-help:
parse-options: properly align continued usage output
git rev-parse --parseopt tests: add more usagestr tests
send-pack: properly use parse_options() API for usage string
parse-options API users: align usage output in C-strings
Teach "git help -c" into helping the command line completion of
configuration variables.
* ab/help-config-vars:
help: move column config discovery to help.c library
help / completion: make "git help" do the hard work
help tests: test --config-for-completion option & output
help: simplify by moving to OPT_CMDMODE()
help: correct logic error in combining --all and --guides
help: correct logic error in combining --all and --config
help tests: add test for --config output
help: correct usage & behavior of "git help --guides"
help: correct the usage string in -h and documentation
Built-in fsmonitor (part 1).
* jh/builtin-fsmonitor-part1:
t/helper/simple-ipc: convert test-simple-ipc to use start_bg_command
run-command: create start_bg_command
simple-ipc/ipc-win32: add Windows ACL to named pipe
simple-ipc/ipc-win32: add trace2 debugging
simple-ipc: move definition of ipc_active_state outside of ifdef
simple-ipc: preparations for supporting binary messages.
trace2: add trace2_child_ready() to report on background children
Updates to the tests in t0000 to test the test framework.
* ab/lib-subtest:
test-lib tests: get rid of copy/pasted mock test code
test-lib tests: assert 1 exit code, not non-zero
test-lib tests: refactor common part of check_sub_test_lib_test*()
test-lib tests: avoid subshell for "test_cmp" for readability
test-lib tests: don't provide a description for the sub-tests
test-lib tests: split up "write and run" into two functions
test-lib tests: move "run_sub_test" to a new lib-subtest.sh
Various fixes in code paths that move untracked files away to make room.
* en/removing-untracked-fixes:
Documentation: call out commands that nuke untracked files/directories
Comment important codepaths regarding nuking untracked files/dirs
unpack-trees: avoid nuking untracked dir in way of locally deleted file
unpack-trees: avoid nuking untracked dir in way of unmerged file
Change unpack_trees' 'reset' flag into an enum
Remove ignored files by default when they are in the way
unpack-trees: make dir an internal-only struct
unpack-trees: introduce preserve_ignored to unpack_trees_options
read-tree, merge-recursive: overwrite ignored files by default
checkout, read-tree: fix leak of unpack_trees_options.dir
t2500: add various tests for nuking untracked files
"git grep --recurse-submodules" takes trees and blobs from the
submodule repository, but the textconv settings when processing a
blob from the submodule is not taken from the submodule repository.
A test is added to demonstrate the issue, without fixing it.
* mt/grep-submodule-textconv:
grep: demonstrate bug with textconv attributes and submodules
"git add", "git mv", and "git rm" have been adjusted to avoid
updating paths outside of the sparse-checkout definition unless
the user specifies a "--sparse" option.
* ds/add-rm-with-sparse-index:
advice: update message to suggest '--sparse'
mv: refuse to move sparse paths
rm: skip sparse paths with missing SKIP_WORKTREE
rm: add --sparse option
add: update --renormalize to skip sparse paths
add: update --chmod to skip sparse paths
add: implement the --sparse option
add: skip tracked paths outside sparse-checkout cone
add: fail when adding an untracked sparse file
dir: fix pattern matching on dirs
dir: select directories correctly
t1092: behavior for adding sparse files
t3705: test that 'sparse_entry' is unstaged
When parsing a URL to normalize it, we allow hostnames to contain only
dot (".") or dash ("-"), plus brackets and colons for IPv6 literals.
This matches the old URL standard in RFC 1738, which says:
host = hostname | hostnumber
hostname = *[ domainlabel "." ] toplabel
domainlabel = alphadigit | alphadigit *[ alphadigit | "-" ] alphadigit
But this was later updated by RFC 3986, which is more liberal:
host = IP-literal / IPv4address / reg-name
reg-name = *( unreserved / pct-encoded / sub-delims )
unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
While names with underscore in them are not common and possibly violate
some DNS rules, they do work in practice, and we will happily contact
them over http://, git://, or ssh://. It seems odd to ignore them for
purposes of URL matching, especially when the URL RFC seems to allow
them.
There shouldn't be any downside here. It's not a syntactically
significant character in a URL, so we won't be confused about parsing;
we'd have simply rejected such a URL previously (the test here checks
the url code directly, but the obvious user-visible effect would be
failing to match credential.http://foo_bar.example.com.helper, or
similar config in http.<url>.*).
Arguably we'd want to allow tilde ("~") here, too. There's likewise
probably no downside, but I didn't add it simply because it seems like
an even less likely character to appear in a hostname.
Reported-by: Alex Waite <alex@waite.eu>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Mark some tests that match "*{mktree,commit,diff,grep,rm,merge,hunk}*"
as passing when git is compiled with SANITIZE=leak. They'll now be
listed as running under the "GIT_TEST_PASSING_SANITIZE_LEAK=true" test
mode (the "linux-leaks" CI target).
These were picked because we still have a lot of failures in adjacent
areas, and we didn't have much if any coverage of e.g. grep and diff
before this change, we could still whitelist a lot more tests, but
let's stop for now.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Mark various "generic" tests as passing when git is compiled with
SANITIZE=leak. These tests were subjectively picked from the lists of
passing tests since they're all small, and test some generic feature
such as wildmatch(), commonly used environment variables, ident
parsing etc.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Mark some tests that match "*read-tree*" as passing when git is
compiled with SANITIZE=leak. They'll now be listed as running under
the "GIT_TEST_PASSING_SANITIZE_LEAK=true" test mode (the "linux-leaks"
CI target). We still have around half the tests that match
"*read-tree*" failing, but let's whitelist those that don't.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Mark some tests that match "*ls-files*" as passing when git is
compiled with SANITIZE=leak. They'll now be listed as running under
the "GIT_TEST_PASSING_SANITIZE_LEAK=true" test mode (the "linux-leaks"
CI target). We still have others that match '*ls-files*" that fail
under SANITIZE=leak.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Mark some tests that match "*{checkout,switch}*" as passing when git
is compiled with SANITIZE=leak. They'll now be listed as running under
the "GIT_TEST_PASSING_SANITIZE_LEAK=true" test mode (the "linux-leaks"
CI target).
Unfortunately almost all of those tests fail when compiled with
SANITIZE=leak, these only pass because they run "checkout-index", not
the main "checkout" command.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Mark all tests that match "*trace2*" as passing when git is compiled
with SANITIZE=leak. They'll now be listed as running under the
"GIT_TEST_PASSING_SANITIZE_LEAK=true" test mode (the "linux-leaks" CI
target).
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Mark those tests that match "*ls-tree*" as passing when git is
compiled with SANITIZE=leak. They'll now be listed as running under
the "GIT_TEST_PASSING_SANITIZE_LEAK=true" test mode (the "linux-leaks"
CI target).
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Mark various existing tests in t00*.sh that invoke a "test-tool" with
as passing when git is compiled with SANITIZE=leak.
They'll now be listed as running under the
"GIT_TEST_PASSING_SANITIZE_LEAK=true" test mode (the "linux-leaks" CI
target).
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Mark various existing tests in t00*.sh that invoke git built-ins with
TEST_PASSES_SANITIZE_LEAK=true as passing when git is compiled with
SANITIZE=leak.
They'll now be listed as running under the
"GIT_TEST_PASSING_SANITIZE_LEAK=true" test mode (the "linux-leaks" CI
target).
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Protocol v0 clients can get stuck parsing a malformed feature line.
* ah/connect-parse-feature-v0-fix:
connect: also update offset for features without values
Sensitive data in the HTTP trace were supposed to be redacted, but
we failed to do so in HTTP/2 requests.
* jk/http-redact-fix:
http: match headers case-insensitively when redacting
"git cvsserver" had a long-standing bug in its authentication code,
which has finally been corrected (it is unclear and is a separate
question if anybody is seriously using it, though).
* cb/cvsserver:
Documentation: cleanup git-cvsserver
git-cvsserver: protect against NULL in crypt(3)
git-cvsserver: use crypt correctly to compare password hashes
"git clone" from a repository whose HEAD is unborn into a bare
repository didn't follow the branch name the other side used, which
is corrected.
* jk/clone-unborn-head-in-bare:
clone: handle unborn branch in bare repos
"git stash", where the tentative change involves changing a
directory to a file (or vice versa), was confused, which has been
corrected.
* en/stash-df-fix:
stash: restore untracked files AFTER restoring tracked files
stash: avoid feeding directories to update-index
t3903: document a pair of directory/file bugs
When "git am --abort" fails to abort correctly, it still exited
with exit status of 0, which has been corrected.
* en/am-abort-fix:
am: fix incorrect exit status on am fail to abort
t4151: add a few am --abort tests
git-am.txt: clarify --abort behavior
"git update-ref --stdin" failed to flush its output as needed,
which potentially led the conversation to a deadlock.
* ps/update-ref-batch-flush:
t1400: avoid SIGPIPE race condition on fifo
update-ref: fix streaming of status updates
The "git apply -3" code path learned not to bother the lower level
merge machinery when the three-way merge can be trivially resolved
without the content level merge.
* jc/trivial-threeway-binary-merge:
apply: resolve trivial merge without hitting ll-merge with "--3way"
Doc update plus improved error reporting.
* jk/log-warn-on-bogus-encoding:
docs: use "character encoding" to refer to commit-object encoding
logmsg_reencode(): warn when iconv() fails
The output from "git fast-export", when its anonymization feature
is in use, showed an annotated tag incorrectly.
* tk/fast-export-anonymized-tag-fix:
fast-export: fix anonymized tag using original length
Even when running "git send-email" without its own threaded
discussion support, a threading related header in one message is
carried over to the subsequent message to result in an unwanted
threading, which has been corrected.
* mh/send-email-reset-in-reply-to:
send-email: avoid incorrect header propagation
Buggy tests could damage repositories outside the throw-away test
area we created. We now by default export GIT_CEILING_DIRECTORIES
to limit the damage from such a stray test.
* sg/set-ceiling-during-tests:
test-lib: set GIT_CEILING_DIRECTORIES to protect the surrounding repository
"git upload-pack" which runs on the other side of "git fetch"
forgot to take the ref namespaces into account when handling
want-ref requests.
* ka/want-ref-in-namespace:
docs: clarify the interaction of transfer.hideRefs and namespaces
upload-pack.c: treat want-ref relative to namespace
t5730: introduce fetch command helper
"git branch -D <branch>" used to refuse to remove a broken branch
ref that points at a missing commit, which has been corrected.
* rs/branch-allow-deleting-dangling:
branch: allow deleting dangling branches with --force
The delayed checkout code path in "git checkout" etc. were chatty
even when --quiet and/or --no-progress options were given.
* mt/quiet-with-delayed-checkout:
checkout: make delayed checkout respect --quiet and --no-progress
"git diff --relative" segfaulted and/or produced incorrect result
when there are unmerged paths.
* dd/diff-files-unmerged-fix:
diff-lib: ignore paths that are outside $cwd if --relative asked
Various bugs in "git rebase -r" have been fixed.
* pw/rebase-r-fixes:
rebase -r: fix merge -c with a merge strategy
rebase -r: don't write .git/MERGE_MSG when fast-forwarding
rebase -i: add another reword test
rebase -r: make 'merge -c' behave like reword
Checking out all the paths from HEAD during the last conflicted
step in "git rebase" and continuing would cause the step to be
skipped (which is expected), but leaves MERGE_MSG file behind in
$GIT_DIR and confuses the next "git commit", which has been
corrected.
* pw/rebase-skip-final-fix:
rebase --continue: remove .git/MERGE_MSG
rebase --apply: restore some tests
t3403: fix commit authorship
"git commit --fixup" now works with "--edit" again, after it was
broken in v2.32.
* jk/commit-edit-fixup-fix:
commit: restore --edit when combined with --fixup
"git apply" miscounted the bytes and failed to read to the end of
binary hunks.
* jk/apply-binary-hunk-parsing-fix:
apply: keep buffer/size pair in sync when parsing binary hunks
"git pull" had various corner cases that were not well thought out
around its --rebase backend, e.g. "git pull --ff-only" did not stop
but went ahead and rebased when the history on other side is not a
descendant of our history. The series tries to fix them up.
* en/pull-conflicting-options:
pull: fix handling of multiple heads
pull: update docs & code for option compatibility with rebasing
pull: abort by default when fast-forwarding is not possible
pull: make --rebase and --no-rebase override pull.ff=only
pull: since --ff-only overrides, handle it first
pull: abort if --ff-only is given and fast-forwarding is impossible
t7601: add tests of interactions with multiple merge heads and config
t7601: test interaction of merge/rebase/fast-forward flags and options
Bugfix for common ancestor negotiation recently introduced in "git
push" codepath.
* jt/push-negotiation-fixes:
fetch: die on invalid --negotiation-tip hash
send-pack: fix push nego. when remote has refs
send-pack: fix push.negotiate with remote helper
Input validation of "git pack-objects --stdin-packs" has been
corrected.
* ab/pack-stdin-packs-fix:
pack-objects: fix segfault in --stdin-packs option
pack-objects tests: cover blindspots in stdin handling
"git maintenance" scheduler fix for macOS.
* js/maintenance-launchctl-fix:
maintenance: skip bootout/bootstrap when plist is registered
maintenance: create `launchctl` configuration using a lock file
Command line completion updates.
* fc/completion-updates:
completion: bash: add correct suffix in variables
completion: bash: fix for multiple dash commands
completion: bash: fix for suboptions with value
completion: bash: fix prefix detection in branch.*
When the option --dry-run/-n is given, "git add" doesn't change the
index, but still writes out new object files. Only hash the latter
without writing instead to make the run as dry as possible.
Use this opportunity to also make the hash_flags variable unsigned,
to match the index_path() parameter it is used as.
Reported-by: git.mexon@spamgourmet.com
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Fix a regression in the error output emitted when .git/objects can't
be written to. Before 9c4d6c0297 (cache-tree: Write updated
cache-tree after commit, 2014-07-13) we'd emit only one "insufficient
permission" error, now we'll do so again.
The cause is rather straightforward, we've got WRITE_TREE_SILENT for
the use-case of wanting to prepare an index silently, quieting any
permission etc. error output. Then when we attempt to update to
that (possibly broken) index we'll run into the same errors again.
But with 9c4d6c0297 the gap between the cache-tree API and the object
store wasn't closed in terms of asking write_object_file() to be
silent. I.e. post-9c4d6c0297b the first call is to prepare_index(),
and after that we'll call prepare_to_commit(). We only want verbose
error output from the latter.
So let's add and use that facility with a corresponding HASH_SILENT
flag, its only user is cache-tree.c's update_one(), which will set it
if its "WRITE_TREE_SILENT" flag is set.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In preparation for fixing a regression where we started emitting some
of these error messages twice, let's assert what the output from "git
commit" and friends is now in the case of permission errors.
As noted in [1] using test_expect_failure to mark up a TODO test has
some unexpected edge cases, e.g. we don't want to break --run=3 by
skipping the "test_lazy_prereq" here. This pattern allows us to test
just the test_cmp (and the "cat", which shouldn't fail) with the added
"test_expect_failure", we'll flip that to a "test_expect_success" in
the next commit.
1. https://lore.kernel.org/git/87tuhmk19c.fsf@evledraar.gmail.com/T/#u
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When merging a signed tag fmt-merge-msg was unable to verify its
validity missing the necessary ssh allowedSignersFile config.
Adds gpg config parsing to fmt-merge-msg.
Adds tests for ssh signed tags to fmt-merge-msg tests.
Signed-off-by: Fabian Stelzer <fs@gigacodes.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* fs/ssh-signing:
ssh signing: test that gpg fails for unknown keys
ssh signing: tests for logs, tags & push certs
ssh signing: duplicate t7510 tests for commits
ssh signing: verify signatures using ssh-keygen
ssh signing: provide a textual signing_key_id
ssh signing: retrieve a default key from ssh-agent
ssh signing: add ssh key format and signing code
ssh signing: add test prereqs
ssh signing: preliminary refactoring and clean-up
Turn off automatic background maintenance for perf tests by default to
avoid interference with performance measurements. Do that by using the
new file t/perf/config and using it as the system config file for perf
tests. Future tests intended to measure gc performance can override
the setting locally or call "git gc" explicitly.
This fixes a breakage in p2000 caused by gc automatically emptying the
reflog due its fake dates from 2005 being older than 90 days.
Helped-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Code clean-up in "git difftool".
* da/difftool:
difftool: add a missing space to the run_dir_diff() comments
difftool: remove an unnecessary call to strbuf_release()
difftool: refactor dir-diff to write files using helper functions
difftool: create a tmpdir path without repeated slashes
CI learns to run the leak sanitizer builds.
* ab/sanitize-leak-ci:
tests: add a test mode for SANITIZE=leak, run it in CI
Makefile: add SANITIZE=leak flag to GIT-BUILD-OPTIONS
The ref iteration code used to optionally allow dangling refs to be
shown, which has been tightened up.
* jk/ref-paranoia:
refs: drop "broken" flag from for_each_fullref_in()
ref-filter: drop broken-ref code entirely
ref-filter: stop setting FILTER_REFS_INCLUDE_BROKEN
repack, prune: drop GIT_REF_PARANOIA settings
refs: turn on GIT_REF_PARANOIA by default
refs: omit dangling symrefs when using GIT_REF_PARANOIA
refs: add DO_FOR_EACH_OMIT_DANGLING_SYMREFS flag
refs-internal.h: reorganize DO_FOR_EACH_* flag documentation
refs-internal.h: move DO_FOR_EACH_* flags next to each other
t5312: be more assertive about command failure
t5312: test non-destructive repack
t5312: create bogus ref as necessary
t5312: drop "verbose" helper
t5600: provide detached HEAD for corruption failures
t5516: don't use HEAD ref for invalid ref-deletion tests
t7900: clean up some more broken refs
Test updates.
* sg/test-split-index-fix:
read-cache: fix GIT_TEST_SPLIT_INDEX
tests: disable GIT_TEST_SPLIT_INDEX for sparse index tests
read-cache: look for shared index files next to the index, too
t1600-index: disable GIT_TEST_SPLIT_INDEX
t1600-index: don't run git commands upstream of a pipe
t1600-index: remove unnecessary redirection
"git multi-pack-index write --bitmap" learns to propagate the
hashcache from original bitmap to resulting bitmap.
* tb/midx-write-propagate-namehash:
t5326: test propagating hashcache values
p5326: generate pack bitmaps before writing the MIDX bitmap
p5326: don't set core.multiPackIndex unnecessarily
p5326: create missing 'perf-tag' tag
midx.c: respect 'pack.writeBitmapHashcache' when writing bitmaps
pack-bitmap.c: propagate namehash values from existing bitmaps
t/helper/test-bitmap.c: add 'dump-hashes' mode
Since C++20, the language has a generalized comparison operator <=>.
Teach the cpp driver not to separate it into <= and > tokens.
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since C++17, the single-quote can be used as digit separator:
3.141'592'654
1'000'000
0xdead'beaf
Make it known to the word regex of the cpp driver, so that numbers are
not split into separate tokens at the single-quotes.
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We are going to add support for C++'s digit-separating single-quote and
the spaceship operator. By adding the test cases in this separate
commit, the effect on the word highlighting will become more obvious
as the features are implemented and the file cpp/expect is updated.
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When we're enumerating all objects in the object database, it doesn't
make sense to respect refs/replace. The point of this option is to
enumerate all of the objects in the database at a low level. By
definition we'd already show the replacement object's contents (under
its real oid), and showing those contents under another oid is almost
certainly working against what the user is trying to do.
Note that you could make the same argument for something like:
git show-index <foo.idx |
awk '{print $2}' |
git cat-file --batch
but there we can't know in cat-file exactly what the user intended,
because we don't know the source of the input. They could be trying to
do low-level debugging, or they could be doing something more high-level
(e.g., imagine a porcelain built around cat-file for its object
accesses). So in those cases, we'll have to rely on the user specifying
"git --no-replace-objects" to tell us what to do.
One _could_ make an argument that "cat-file --batch" is sufficiently
low-level plumbing that it should not respect replace-objects at all
(and the caller should do any replacement if they want it). But we have
been doing so for some time. The history is a little tangled:
- looking back as far as v1.6.6, we would not respect replace refs for
--batch-check, but would for --batch (because the former used
sha1_object_info(), and the replace mechanism only affected actual
object reads)
- this discrepancy was made even weirder by 98e2092b50 (cat-file:
teach --batch to stream blob objects, 2013-07-10), where we always
output the header using the --batch-check code, and then printed the
object separately. This could lead to "cat-file --batch" dying (when
it notices the size or type changed for a non-blob object) or even
producing bogus output (in streaming mode, we didn't notice that we
wrote the wrong number of bytes).
- that persisted until 1f7117ef7a (sha1_file: perform object
replacement in sha1_object_info_extended(), 2013-12-11), which then
respected replace refs for both forms.
So it has worked reliably this way for over 7 years, and we should make
sure it continues to do so. That could also be an argument that
--batch-all-objects should not change behavior (which this patch is
doing), but I really consider the current behavior to be an unintended
bug. It's a side effect of how the code is implemented (feeding the oids
back into oid_object_info() rather than looking at what we found while
reading the loose and packed object storage).
The implementation is straight-forward: we just disable the global
read_replace_refs flag when we're in --batch-all-objects mode. It would
perhaps be a little cleaner to change the flag we pass to
oid_object_info_extended(), but that's not enough. We also read objects
via read_object_file() and stream_blob_to_fd(). The former could switch
to its _extended() form, but the streaming code has no mechanism for
disabling replace refs. Setting the global flag works, and as a bonus,
it's impossible to have any "oops, we're sometimes replacing the object
and sometimes not" bugs in the output (like the ones caused by
98e2092b50 above).
The tests here cover the regular-input and --batch-all-objects cases,
for both --batch-check and --batch. There is a test in t6050 that covers
the regular-input case with --batch already, but this new one goes much
further in actually verifying the output (plus covering --batch-check
explicitly). This is perhaps a little overkill and the tests would be
simpler just covering --batch-check, but I wanted to make sure we're
checking that --batch output is consistent between the header and the
content. The global-flag technique used here makes that easy to get
right, but this is future-proofing us against regressions.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A few of the tests create intentionally broken objects with broken
types. Let's clean them up after we're done with them, so that later
tests don't get confused (we hadn't noticed because this only affects
tests which use --batch-all-objects, but I'm about to add more).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Submodule ODBs are never added as alternates during the execution of the
test suite, but there may be a rare interaction that the test suite does
not have coverage of. Add a trace message when this happens, so that
users who trace their commands can notice such occurrences.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Pass the repo explicitly when calling check_has_commit() to avoid
relying on add_submodule_odb(). With this commit and the parent commit,
the last remaining tests no longer rely on add_submodule_odb(), so mark
these tests accordingly.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
After the parent commit and some of its ancestors, the only place
commits are being accessed through alternates is in the user-facing
message formatting code. Fix those, and remove the add_submodule_odb()
calls.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "git log" command limits its output to the commits that contain strings
matched by a pattern when the "--grep=<pattern>" option is used, but unlike
output from "git grep -e <pattern>", the matches are not highlighted,
making them harder to spot.
Teach the pretty-printer code to highlight matches from the
"--grep=<pattern>", "--author=<pattern>" and "--committer=<pattern>"
options (to view the last one, you may have to ask for --pretty=fuller).
Also, it must be noted that we are effectively greping the content twice
(because it would be a hassle to rework the existing matching code to do
a /g match and then pass it all down to the coloring code), however it only
slows down "git log --author=^H" on this repository by around 1-2%
(compared to v2.33.0), so it should be a small enough slow down to justify
the addition of the feature.
Signed-off-by: Hamza Mahfooz <someguy@effective-light.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There were no tests for checking the specific output that we'll
generate in optname(), let's add some. That output was added back in
4a59fd1312 (Add a simple option parser., 2007-10-15).
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Generally, word regex can be written such that they match tokens
liberally and need not model the actual syntax because it can be assumed
that the regex will only be applied to syntactically correct text.
The regex for cpp (C/C++) is too liberal, though. It regards these
sequences as single tokens:
1+2
1.5-e+2+f
and the following amalgams as one token:
.l as in str.length
.f as in str.find
.e as in str.erase
Tighten the regex in the following way:
- Accept + and - only in one position in the exponent. + and - are no
longer regarded as the sign of a number and are treated by the
catcher-all that is not visible in the driver's regex.
- Accept a leading decimal point only when it is followed by a digit.
For readability, factor hex- and binary numbers into an own term.
As a drive-by, this fixes that floating point numbers such as 12E5
(with upper-case E) were split into two tokens.
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The word regex is too loose and matches long streaks of characters
that should actually be separate tokens. Add these problematic test
cases. Separate the lines with text that will remain identical in the
pre- and post-image so that the diff algorithm will not lump removals
and additions of consecutive lines together. This makes the expected
output easier to read.
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
8d96e7288f (t4034: bulk verify builtin word regex sanity, 2010-12-18)
added many tests with the intent to verify that operators consisting of
more than one symbol are kept together. These are tested by probing a
transition from, e.g., a!=b to x!=y, which results in the word-diff
[-a-]{+x+}!=[-b-]{+y+}
But that proves only that the letters and operators are separate tokens.
To prove that != is an unseparable token, we have to probe a transition
from, e.g., a=b to a!=b having a word-diff
a[-=-]{+!=+}b
that proves that the ! is not separate from the =.
In the post-image, add to or remove from operators a character that
turns it into another valid operator.
Change the identifiers used around operators such that the diff
algorithm does not have an incentive to match, e.g., a<b in one spot
in the pre-image with a<b elsewhere in the post-image.
Adjust the expected output to match the new differences. Notice that
there are some undesirable tokenizations around e, ., and -. This will
be addressed in a later change.
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This command dumps individual tables or a stack of of tables.
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The packed/loose format has restrictions on refnames: a and a/b cannot
coexist. This limitation does not apply to reftable per se, but must be
maintained for interoperability. This code adds validation routines to
abort transactions that are trying to add invalid names.
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This adds an abstract, read-only interface to the ref database.
This primitive is used to construct the read view of the ref database
(the read view is constructed by merging several *.ref files). It also
provides the mechanism to provide a unified view of the refs in the main
repository and the per-worktree refs.
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This is needed to create a merged view multiple reftables
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
With support for reading and writing files in place, we can construct files (in
memory) and attempt to read them back.
Because some sections of the format are optional (eg. indices, log entries), we
have to exercise this code using multiple sizes of input data
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The reftable format includes support for an (OID => ref) map. This map can speed
up visibility and reachability checks. In particular, various operations along
the fetch/push path within Gerrit have ben sped up by using this structure.
The map is constructed with help of a binary tree. Object IDs are hashes, so
they are uniformly distributed. Hence, the tree does not attempt forced
rebalancing.
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The reftable format is structured as a sequence of block. Within a block,
records are prefix compressed, with an index of offsets for fully expand keys to
enable binary search within blocks.
This commit provides the logic to read and write these blocks.
Helped-by: Carlo Marcelo Arenas Belón <carenas@gmail.com>
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The reftable format is structured as a sequence of blocks, and each block
contains a sequence of prefix-compressed key-value records. There are 4 types of
records, and they have similarities in how they must be handled. This is
achieved by introducing a polymorphic 'record' type that encapsulates ref, log,
index and object records.
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This commit provides basic utility classes for the reftable library.
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Helped-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Use MINSTD to generate pseudo-random numbers consistently instead of
using rand(3), whose output can vary from system to system, and reset
its seed before filling in the test values. This gives repeatable
results across versions and systems, which simplifies sharing and
comparing of results between developers.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
6e773527b6 (sparse-index: convert from full to sparse, 2021-03-30) made
verify_path() accept trailing directory separators for directories,
which is necessary for sparse directory entries. This clemency causes
"git stash" to stumble over sub-repositories, though, and there may be
more unintended side-effects.
Avoid them by restoring the old verify_path() behavior and accepting
trailing directory separators only in places that are supposed to handle
sparse directory entries.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git stash" used to ignore sub-repositories until 6e773527b6
(sparse-index: convert from full to sparse, 2021-03-30). Add a test
that demonstrates this regression.
Reported-by: Robert Leftwich <robert@gitpod.io>
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We strbuf_reset() this "struct strbuf" in a loop earlier, but never
freed it. Plugs a memory leak that's been here ever since this code
got introduced in 1c7b76be7d (Build in merge, 2008-07-07).
This takes us from 68 failed tests in "t7600-merge.sh" to 59 under
SANITIZE=leak, and makes "t7604-merge-custom-message.sh" pass!
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Fix a memory leak that's been here ever since 72aeb18772 (clean.c,
ls-files.c: respect encapsulation of exclude_list_groups, 2013-01-16),
we dup'd the argument in option_parse_exclude(), but never freed the
string_list.
This makes almost all of t3001-ls-files-others-exclude.sh pass (it had
a lot of failures before). Let's mark it as passing with
TEST_PASSES_SANITIZE_LEAK=true, and then exclude the tests that still
failed with a !SANITIZE_LEAK prerequisite check until we fix those
leaks. We can still see the failed tests under
GIT_TEST_FAIL_PREREQS=true.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Fix an edge case that was missed when the dir_clear() call was added
in eceba53214 (dir: fix problematic API to avoid memory leaks,
2020-08-18), we need to also clean up when we're about to exit with
non-zero.
That commit says, on the topic of the dir_clear() API and UNLEAK():
[...]two of them clearly thought about leaks since they had an
UNLEAK(dir) directive, which to me suggests that the method to
free the data was too unclear.
I think that 0e5bba53af (add UNLEAK annotation for reducing leak
false positives, 2017-09-08) which added the UNLEAK() makes it clear
that that wasn't the case, rather it was the desire to avoid the
complexity of freeing the memory at the end of the program.
This does add a bit of complexity, but I think it's worth it to just
fix these leaks when it's easy in built-ins. It allows them to serve
as canaries for underlying APIs that shouldn't be leaking, it
encourages us to make those freeing APIs nicer for all their users,
and it prevents other leaking regressions by being able to mark the
entire test as TEST_PASSES_SANITIZE_LEAK=true.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Fix a trivial memory leak present ever since 38d905bf58 (sha1-array:
add test-sha1-array and basic tests, 2014-10-01), now that that's
fixed we can test this under GIT_TEST_PASSING_SANITIZE_LEAK=true.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Fix a memory leak in t/helper/test-oidtree.c, we were not freeing the
"struct strbuf" we used for the stdin input we parsed. This leak has
been here ever since 92d8ed8ac1 (oidtree: a crit-bit tree for
odb_loose_cache, 2021-07-07).
Now that it's fixed we can declare that t0069-oidtree.sh will pass
under GIT_TEST_PASSING_SANITIZE_LEAK=true.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Fix a memory leak in t/helper/test-parse-options.c, we were not
freeing the allocated "struct string_list" or its items. Let's move
the declaration of the "list" variable into the cmd__parse_options()
and release it at the end.
In c8ba163916 (parse-options: add OPT_STRING_LIST helper, 2011-06-09)
the "list" variable was added, and later on in
c8ba163916 (parse-options: add OPT_STRING_LIST helper, 2011-06-09)
the "expect" was added.
The "list" variable was last touched in 2721ce21e4 (use string_list
initializer consistently, 2016-06-13), but it was still left at the
static scope, it's better to move it to the function for consistency.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Fix a memory leak in t/helper/test-prio-queue.c, the lack of freeing
the memory with clear_prio_queue() has been there ever since this code
was originally added in b4b594a315 (prio-queue: priority queue of
pointers to structs, 2013-06-06).
By fixing this leak we can cleanly run t0009-prio-queue.sh under
SANITIZE=leak, so annotate it as such with
TEST_PASSES_SANITIZE_LEAK=true.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Fix two different but related memory leaks in
verify_clean_subdirectory(). We leaked both the "pathbuf" if
read_directory() returned non-zero, and we never cleaned up our own
"struct dir_struct" either.
* "pathbuf": When the read_directory() call followed by the
free(pathbuf) was added in c81935348b (Fix switching to a branch
with D/F when current branch has file D., 2007-03-15) we didn't
bother to free() before we called die().
But when this code was later libified in 203a2fe117 (Allow callers
of unpack_trees() to handle failure, 2008-02-07) we started to leak
as we returned data to the caller. This fixes that memory leak,
which can be observed under SANITIZE=leak with e.g. the
"t1001-read-tree-m-2way.sh" test.
* "struct dir_struct": We've leaked the dir_struct ever since this
code was added back in c81935348b.
When that commit was written there wasn't an equivalent of
dir_clear(). Since it was added in 270be81604 (dir.c: provide
clear_directory() for reclaiming dir_struct memory, 2013-01-06)
we've omitted freeing the memory allocated here.
This memory leak could also be observed under SANITIZE=leak and the
"t1001-read-tree-m-2way.sh" test.
This makes all the test in "t1001-read-tree-m-2way.sh" pass under
"GIT_TEST_PASSING_SANITIZE_LEAK=true", we'd previously die in tests
25, 26 & 28.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In a sparse index it is possible for the tree that is being verified
to be freed while it is being verified. This happens when the index is
sparse but the cache tree is not and index_name_pos() looks up a path
from the cache tree that is a descendant of a sparse index entry. That
triggers a call to ensure_full_index() which frees the cache tree that
is being verified. Carrying on trying to verify the tree after this
results in a use-after-free bug. Instead restart the verification if a
sparse index is converted to a full index. This bug is triggered by a
call to reset_head() in "git rebase --apply". Thanks to René Scharfe
and Derrick Stolee for their help analyzing the problem.
==74345==ERROR: AddressSanitizer: heap-use-after-free on address 0x606000001b20 at pc 0x557cbe82d3a2 bp 0x7ffdfee08090 sp 0x7ffdfee08080
READ of size 4 at 0x606000001b20 thread T0
#0 0x557cbe82d3a1 in verify_one /home/phil/src/git/cache-tree.c:863
#1 0x557cbe82ca9d in verify_one /home/phil/src/git/cache-tree.c:840
#2 0x557cbe82ca9d in verify_one /home/phil/src/git/cache-tree.c:840
#3 0x557cbe82ca9d in verify_one /home/phil/src/git/cache-tree.c:840
#4 0x557cbe830a2b in cache_tree_verify /home/phil/src/git/cache-tree.c:910
#5 0x557cbea53741 in write_locked_index /home/phil/src/git/read-cache.c:3250
#6 0x557cbeab7fdd in reset_head /home/phil/src/git/reset.c:87
#7 0x557cbe72147f in cmd_rebase builtin/rebase.c:2074
#8 0x557cbe5bd151 in run_builtin /home/phil/src/git/git.c:461
#9 0x557cbe5bd151 in handle_builtin /home/phil/src/git/git.c:714
#10 0x557cbe5c0503 in run_argv /home/phil/src/git/git.c:781
#11 0x557cbe5c0503 in cmd_main /home/phil/src/git/git.c:912
#12 0x557cbe5bad28 in main /home/phil/src/git/common-main.c:52
#13 0x7fdd4b82eb24 in __libc_start_main (/usr/lib/libc.so.6+0x27b24)
#14 0x557cbe5bcb8d in _start (/home/phil/src/git/git+0x1b9b8d)
0x606000001b20 is located 0 bytes inside of 56-byte region [0x606000001b20,0x606000001b58)
freed by thread T0 here:
#0 0x7fdd4bacff19 in __interceptor_free /build/gcc/src/gcc/libsanitizer/asan/asan_malloc_linux.cpp:127
#1 0x557cbe82af60 in cache_tree_free /home/phil/src/git/cache-tree.c:35
#2 0x557cbe82aee5 in cache_tree_free /home/phil/src/git/cache-tree.c:31
#3 0x557cbe82aee5 in cache_tree_free /home/phil/src/git/cache-tree.c:31
#4 0x557cbe82aee5 in cache_tree_free /home/phil/src/git/cache-tree.c:31
#5 0x557cbeb2557a in ensure_full_index /home/phil/src/git/sparse-index.c:310
#6 0x557cbea45c4a in index_name_stage_pos /home/phil/src/git/read-cache.c:588
#7 0x557cbe82ce37 in verify_one /home/phil/src/git/cache-tree.c:850
#8 0x557cbe82ca9d in verify_one /home/phil/src/git/cache-tree.c:840
#9 0x557cbe82ca9d in verify_one /home/phil/src/git/cache-tree.c:840
#10 0x557cbe82ca9d in verify_one /home/phil/src/git/cache-tree.c:840
#11 0x557cbe830a2b in cache_tree_verify /home/phil/src/git/cache-tree.c:910
#12 0x557cbea53741 in write_locked_index /home/phil/src/git/read-cache.c:3250
#13 0x557cbeab7fdd in reset_head /home/phil/src/git/reset.c:87
#14 0x557cbe72147f in cmd_rebase builtin/rebase.c:2074
#15 0x557cbe5bd151 in run_builtin /home/phil/src/git/git.c:461
#16 0x557cbe5bd151 in handle_builtin /home/phil/src/git/git.c:714
#17 0x557cbe5c0503 in run_argv /home/phil/src/git/git.c:781
#18 0x557cbe5c0503 in cmd_main /home/phil/src/git/git.c:912
#19 0x557cbe5bad28 in main /home/phil/src/git/common-main.c:52
#20 0x7fdd4b82eb24 in __libc_start_main (/usr/lib/libc.so.6+0x27b24)
previously allocated by thread T0 here:
#0 0x7fdd4bad0459 in __interceptor_calloc /build/gcc/src/gcc/libsanitizer/asan/asan_malloc_linux.cpp:154
#1 0x557cbebc1807 in xcalloc /home/phil/src/git/wrapper.c:140
#2 0x557cbe82b7d8 in cache_tree /home/phil/src/git/cache-tree.c:17
#3 0x557cbe82b7d8 in prime_cache_tree_rec /home/phil/src/git/cache-tree.c:763
#4 0x557cbe82b837 in prime_cache_tree_rec /home/phil/src/git/cache-tree.c:764
#5 0x557cbe82b837 in prime_cache_tree_rec /home/phil/src/git/cache-tree.c:764
#6 0x557cbe8304e1 in prime_cache_tree /home/phil/src/git/cache-tree.c:779
#7 0x557cbeab7fa7 in reset_head /home/phil/src/git/reset.c:85
#8 0x557cbe72147f in cmd_rebase builtin/rebase.c:2074
#9 0x557cbe5bd151 in run_builtin /home/phil/src/git/git.c:461
#10 0x557cbe5bd151 in handle_builtin /home/phil/src/git/git.c:714
#11 0x557cbe5c0503 in run_argv /home/phil/src/git/git.c:781
#12 0x557cbe5c0503 in cmd_main /home/phil/src/git/git.c:912
#13 0x557cbe5bad28 in main /home/phil/src/git/common-main.c:52
#14 0x7fdd4b82eb24 in __libc_start_main (/usr/lib/libc.so.6+0x27b24)
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In read_midx_preferred_pack(), we open the bitmap index but never free
it. This isn't a big deal since this is just a test helper, and we exit
immediately after, but since we're trying to keep our leak-checking tidy
now, it's worth fixing.
Signed-off-by: Jeff King <peff@peff.net>
Acked-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Code cleanup.
* ab/repo-settings-cleanup:
repository.h: don't use a mix of int and bitfields
repo-settings.c: simplify the setup
read-cache & fetch-negotiator: check "enum" values in switch()
environment.c: remove test-specific "ignore_untracked..." variable
wrapper.c: add x{un,}setenv(), and use xsetenv() in environment.c
Regression in "git commit-graph" command line parsing has been
corrected.
* tb/commit-graph-usage-fix:
builtin/multi-pack-index.c: disable top-level --[no-]progress
builtin/commit-graph.c: don't accept common --[no-]progress
"git rebase <upstream> <tag>" failed when aborted in the middle, as
it mistakenly tried to write the tag object instead of peeling it
to HEAD.
* pw/rebase-of-a-tag-fix:
rebase: dereference tags
rebase: use lookup_commit_reference_by_name()
rebase: use our standard error return value
t3407: rework rebase --quit tests
t3407: strengthen rebase --abort tests
t3407: use test_path_is_missing
t3407: rename a variable
t3407: use test_cmp_rev
t3407: use test_commit
t3407: run tests in $TEST_DIRECTORY
More code paths that use the hack to add submodule's object
database to the set of alternate object store have been cleaned up.
* jt/add-submodule-odb-clean-up:
revision: remove "submodule" from opt struct
repository: support unabsorbed in repo_submodule_init
submodule: remove unnecessary unabsorbed fallback
Teach test_perf_() to remove the temporary test_times.* files
at the end of each test.
test_perf_() runs a particular GIT_PERF_REPEAT_COUNT times and creates
./test_times.[123...]. It then uses a perl script to find the minimum
over "./test_times.*" (note the wildcard) and writes that time to
"test-results/<testname>.<testnumber>.result".
If the repeat count is changed during the pXXXX test script, stale
test_times.* files (from previous steps) may be included in the min()
computation. For example:
...
GIT_PERF_REPEAT_COUNT=3 \
test_perf "status" "
git status
"
GIT_PERF_REPEAT_COUNT=1 \
test_perf "checkout other" "
git checkout other
"
...
The time reported in the summary for "XXXX.2 checkout other" would
be "min( checkout[1], status[2], status[3] )".
We prevent that error by removing the test_times.* files at the end of
each test.
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When using `test_size` with `wc -c`, users on certain platforms can run
into issues when `wc` emits leading space characters in its output,
which confuses get_times.
Callers could switch to use test_file_size instead of `wc -c` (the
former never prints leading space characters, so will always work with
test_size regardless of platform), but this is an easy enough spot to
miss that we should teach get_times to be more tolerant of the input it
accepts.
Teach get_times to do just that by stripping any leading space
characters.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
b3dfeebb92 (rebase: avoid computing unnecessary patch IDs, 2016-07-29)
added a perf test that calls tac(1) from GNU core utilities. Support
systems without it by reversing the generated list using sort -nr
instead. sort(1) with options -n and -r is already used in other tests.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Protocol v0 clients can get stuck parsing a malformed feature line.
* ah/connect-parse-feature-v0-fix:
connect: also update offset for features without values
Sensitive data in the HTTP trace were supposed to be redacted, but
we failed to do so in HTTP/2 requests.
* jk/http-redact-fix:
http: match headers case-insensitively when redacting
"git cvsserver" had a long-standing bug in its authentication code,
which has finally been corrected (it is unclear and is a separate
question if anybody is seriously using it, though).
* cb/cvsserver:
Documentation: cleanup git-cvsserver
git-cvsserver: protect against NULL in crypt(3)
git-cvsserver: use crypt correctly to compare password hashes
"git clone" from a repository whose HEAD is unborn into a bare
repository didn't follow the branch name the other side used, which
is corrected.
* jk/clone-unborn-head-in-bare:
clone: handle unborn branch in bare repos
"git stash", where the tentative change involves changing a
directory to a file (or vice versa), was confused, which has been
corrected.
* en/stash-df-fix:
stash: restore untracked files AFTER restoring tracked files
stash: avoid feeding directories to update-index
t3903: document a pair of directory/file bugs
To prevent the race described in an earlier patch, generate and pass a
reference snapshot to the multi-pack bitmap code, if we are writing one
from `git repack`.
This patch is mostly limited to creating a temporary file, and then
calling for_each_ref(). Except we try to minimize duplicates, since
doing so can drastically reduce the size in network-of-forks style
repositories. In the kernel's fork network (the repository containing
all objects from the kernel and all its forks), deduplicating the
references drops the snapshot size from 934 MB to just 12 MB.
But since we're handling duplicates in this way, we have to make sure
that we preferred references (those listed in pack.preferBitmapTips)
before non-preferred ones (to avoid recording an object which is pointed
at by a preferred tip as non-preferred).
We accomplish this by doing separate passes over the references: first
visiting each prefix in pack.preferBitmapTips, and then over the rest of
the references.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Improve the error that's emitted in cases where we find a loose object
we parse, but which isn't at the location we expect it to be.
Before this change we'd prefix the error with a not-a-OID derived from
the path at which the object was found, due to an emergent behavior in
how we'd end up with an "OID" in these codepaths.
Now we'll instead say what object we hashed, and what path it was
found at. Before this patch series e.g.:
$ git hash-object --stdin -w -t blob </dev/null
e69de29bb2
$ mv objects/e6/ objects/e7
Would emit ("[...]" used to abbreviate the OIDs):
git fsck
error: hash mismatch for ./objects/e7/9d[...] (expected e79d[...])
error: e79d[...]: object corrupt or missing: ./objects/e7/9d[...]
Now we'll instead emit:
error: e69d[...]: hash-path mismatch, found at: ./objects/e7/9d[...]
Furthermore, we'll do the right thing when the object type and its
location are bad. I.e. this case:
$ git hash-object --stdin -w -t garbage --literally </dev/null
8315a83d2acc4c174aed59430f9a9c4ed926440f
$ mv objects/83 objects/84
As noted in an earlier commits we'd simply die early in those cases,
until preceding commits fixed the hard die on invalid object type:
$ git fsck
fatal: invalid object type
Now we'll instead emit sensible error messages:
$ git fsck
error: 8315[...]: hash-path mismatch, found at: ./objects/84/15[...]
error: 8315[...]: object is of unknown type 'garbage': ./objects/84/15[...]
In both fsck.c and object-file.c we're using null_oid as a sentinel
value for checking whether we got far enough to be certain that the
issue was indeed this OID mismatch.
We need to add the "object corrupt or missing" special-case to deal
with cases where read_loose_object() will return an error before
completing check_object_signature(), e.g. if we have an error in
unpack_loose_rest() because we find garbage after the valid gzip
content:
$ git hash-object --stdin -w -t blob </dev/null
e69de29bb2
$ chmod 755 objects/e6/9de29bb2d1d6434b8b29ae775ad8c2e48c5391
$ echo garbage >>objects/e6/9de29bb2d1d6434b8b29ae775ad8c2e48c5391
$ git fsck
error: garbage at end of loose object 'e69d[...]'
error: unable to unpack contents of ./objects/e6/9d[...]
error: e69d[...]: object corrupt or missing: ./objects/e6/9d[...]
There is currently some weird messaging in the edge case when the two
are combined, i.e. because we're not explicitly passing along an error
state about this specific scenario from check_stream_oid() via
read_loose_object() we'll end up printing the null OID if an object is
of an unknown type *and* it can't be unpacked by zlib, e.g.:
$ git hash-object --stdin -w -t garbage --literally </dev/null
8315a83d2acc4c174aed59430f9a9c4ed926440f
$ chmod 755 objects/83/15a83d2acc4c174aed59430f9a9c4ed926440f
$ echo garbage >>objects/83/15a83d2acc4c174aed59430f9a9c4ed926440f
$ /usr/bin/git fsck
fatal: invalid object type
$ ~/g/git/git fsck
error: garbage at end of loose object '8315a83d2acc4c174aed59430f9a9c4ed926440f'
error: unable to unpack contents of ./objects/83/15a83d2acc4c174aed59430f9a9c4ed926440f
error: 8315a83d2acc4c174aed59430f9a9c4ed926440f: object corrupt or missing: ./objects/83/15a83d2acc4c174aed59430f9a9c4ed926440f
error: 0000000000000000000000000000000000000000: object is of unknown type 'garbage': ./objects/83/15a83d2acc4c174aed59430f9a9c4ed926440f
[...]
I think it's OK to leave that for future improvements, which would
involve enum-ifying more error state as we've done with "enum
unpack_loose_header_result" in preceding commits. In these
increasingly more obscure cases the worst that can happen is that
we'll get slightly nonsensical or inapplicable error messages.
There's other such potential edge cases, all of which might produce
some confusing messaging, but still be handled correctly as far as
passing along errors goes. E.g. if check_object_signature() returns
and oideq(real_oid, null_oid()) is true, which could happen if it
returns -1 due to the read_istream() call having failed.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Change the error fsck emits on invalid object types, such as:
$ git hash-object --stdin -w -t garbage --literally </dev/null
<OID>
From the very ungraceful error of:
$ git fsck
fatal: invalid object type
$
To:
$ git fsck
error: <OID>: object is of unknown type 'garbage': <OID_PATH>
[ other fsck output ]
We'll still exit with non-zero, but now we'll finish the rest of the
traversal. The tests that's being added here asserts that we'll still
complain about other fsck issues (e.g. an unrelated dangling blob).
To do this we need to pass down the "OBJECT_INFO_ALLOW_UNKNOWN_TYPE"
flag from read_loose_object() through to parse_loose_header(). Since
the read_loose_object() function is only used in builtin/fsck.c we can
simply change it to accept a "struct object_info" (which contains the
OBJECT_INFO_ALLOW_UNKNOWN_TYPE in its flags). See
f6371f9210 (sha1_file: add read_loose_object() function, 2017-01-13)
for the introduction of read_loose_object().
Since we'll need a "struct strbuf" to hold the "type_name" let's pass
it to the for_each_loose_file_in_objdir() callback to avoid allocating
a new one for each loose object in the iteration. It also makes the
memory management simpler than sticking it in fsck_loose() itself, as
we'll only need to strbuf_reset() it, with no need to do a
strbuf_release() before each "return".
Before this commit we'd never check the "type" if read_loose_object()
failed, but now we do. We therefore need to initialize it to OBJ_NONE
to be able to tell the difference between e.g. its
unpack_loose_header() having failed, and us getting past that and into
parse_loose_header().
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Split up the return code for "header too long" from the generic
negative return value unpack_loose_header() returns, and report via
error() if we exceed MAX_HEADER_LEN.
As a test added earlier in this series in t1006-cat-file.sh shows
we'll correctly emit zlib errors from zlib.c already in this case, so
we have no need to carry those return codes further down the
stack. Let's instead just return ULHR_TOO_LONG saying we ran into the
MAX_HEADER_LEN limit, or other negative values for "unable to unpack
<OID> header".
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add more tests for the current --allow-unknown-type behavior. As noted
in [1] I don't think much of this makes sense, but let's test for it
as-is so we can see if the behavior changes in the future.
1. https://lore.kernel.org/git/87r1i4qf4h.fsf@evledraar.gmail.com/
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Fix a blindspot in the tests for "cat-file" (and by proxy, the guts of
object-file.c) by testing that when we can't decode a loose object
with zlib we'll emit an error from zlib.c.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When we look up a missing object with cat_one_file() what error we
print out currently depends on whether we'll error out early in
get_oid_with_context(), or if we'll get an error later from
oid_object_info_extended().
The --allow-unknown-type flag then changes whether we pass the
"OBJECT_INFO_ALLOW_UNKNOWN_TYPE" flag to get_oid_with_context() or
not.
The "-p" flag is yet another special-case in printing the same output
on the deadbeef OID as we'd emit on the deadbeef_short OID for the
"-s" and "-t" options, it also doesn't support the
"--allow-unknown-type" flag at all.
Let's test the combination of the two sets of [-t, -s, -p] and
[--{no-}allow-unknown-type] (the --no-allow-unknown-type is implicit
in not supplying it), as well as a [missing,bogus] object pair.
This extends tests added in 3e370f9faf (t1006: add tests for git
cat-file --allow-unknown-type, 2015-05-03).
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Change the short/long bogus bogus object type variables into a form
where the two sets can be used concurrently. This'll be used by
subsequently added tests.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There wasn't any output tests for this scenario, let's ensure that we
don't regress on it in the changes that come after this.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If fsck we move an object around between .git/objects/?? directories
to simulate a hash mismatch "git fsck" will currently hard die() in
object-file.c. This behavior will be fixed in subsequent commits, but
let's test for it as-is for now.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Refactor one of the fsck tests to use a throwaway repository. It's a
pervasive pattern in t1450-fsck.sh to spend a lot of effort on the
teardown of a tests so we're not leaving corrupt content for the next
test.
We can instead use the pattern of creating a named sub-repository,
then we don't have to worry about cleaning up after ourselves, nobody
will care what state the broken "hash-mismatch" repository is after
this test runs.
See [1] for related discussion on various "modern" test patterns that
can be used to avoid verbosity and increase reliability.
1. https://lore.kernel.org/git/87y27veeyj.fsf@evledraar.gmail.com/
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Fix a blindspot in the fsck tests by checking what we do when we
encounter an unknown "garbage" type produced with hash-object's
--literally option.
This behavior needs to be improved, which'll be done in subsequent
patches, but for now let's test for the current behavior.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Check if sorting takes advantage of already sorted or reversed content,
or if that corner case actually decreases performance, like it would for
a simplistic quicksort implementation.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a mode that turns a sorted list into adversarial input for a
bottom-up mergesort implementation that doubles the length of sorted
sublists at each level -- like our llist_mergesort().
While unriffle mode splits the list in half at each recursion step,
unriffle_skewed splits it into 2^l items and the rest, with 2^l being
the highest power of two smaller than the number of items and thus
2^l >= rest. The rest is unriffled with the tail of the first half to
require a merge to compare the maximum number of elements.
It complements the unriffle mode, which targets balanced merges. If
the number of elements is a power of two then both actually produce the
same result, as 2^l == rest == n/2 at each recursion step in that case.
Here are the results:
$ t/helper/test-tool mergesort test | awk '
$7 > max[$3] {max[$3] = $7; line[$3] = $0}
END {for (n in line) print line[n]}
'
distribut mode n m get_next set_next compare verdict
sawtooth unriffle_skewed 100 128 1184 700 589 OK
sawtooth unriffle_skewed 1023 1024 16373 10230 9207 OK
sawtooth unriffle 1024 1024 16384 10240 9217 OK
sawtooth unriffle_skewed 1025 2048 18454 11275 10241 OK
The sawtooth distribution with m>=n produces a sorted list and
unriffle_skewed mode turns it into adversarial input for unbalanced
merges, which it wins in all cases except for n=1024 -- the resulting
list is the same, but unriffle is tested before unriffle_skewed, so its
result is selected by the AWK script.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a mode that turns sorted items into adversarial input for mergesort.
Do that by running mergesort in reverse and rearranging the items in
such a way that each merge needs the maximum number of operations to
undo it.
To riffle is a card shuffling technique and involves splitting a deck
into two and then to interleave them. A perfect riffle takes one card
from each half in turn. That's similar to the most expensive merge,
which has to take one item from each sublist in turn, which requires the
maximum number of comparisons (n-1).
So unriffle does that in reverse, i.e. it generates the first sublist
out of the items at even indexes and the second sublist out of the items
at odd indexes, without changing their order in any other way. Done
recursively until we reach the trivial sublist length of one, this
twists the list into an order that requires the maximum effort for
mergesort to untangle.
As a baseline, here are the rand distributions with the highest number
of comparisons from "test-tool mergesort test":
$ t/helper/test-tool mergesort test | awk '
NR > 1 && $1 != "rand" {next}
$7 > max[$3] {max[$3] = $7; line[$3] = $0}
END {for (n in line) print line[n]}
'
distribut mode n m get_next set_next compare verdict
rand copy 100 32 1184 700 569 OK
rand reverse_1st_half 1023 256 16373 10230 8976 OK
rand reverse_1st_half 1024 512 16384 10240 8993 OK
rand dither 1025 64 18454 11275 9970 OK
And here are the most expensive ones overall:
$ t/helper/test-tool mergesort test | awk '
$7 > max[$3] {max[$3] = $7; line[$3] = $0}
END {for (n in line) print line[n]}
'
distribut mode n m get_next set_next compare verdict
stagger reverse 100 64 1184 700 580 OK
sawtooth unriffle 1023 1024 16373 10230 9179 OK
sawtooth unriffle 1024 1024 16384 10240 9217 OK
stagger unriffle 1025 2048 18454 11275 10241 OK
The sawtooth distribution with m>=n generates a sorted list. The
unriffle mode is designed to turn that into adversarial input for
mergesort, and that checks out for n=1023 and n=1024, where it produces
the list that requires the most comparisons.
Item counts that are not powers of two have other winners, and that's
because unriffle recursively splits lists into equal-sized halves, while
llist_mergesort() splits them into the biggest power of two smaller than
n and the rest, e.g. for n=1025 it sorts the first 1024 separately and
finally merges them to the last item.
So unriffle mode works as designed for the intended use case, but to
consistently generate adversarial input for unbalanced merges we need
something else.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a subcommand for printing test data. It can be used to generate
special test cases and feed them into the sort subcommand or sort(1) for
performance measurements. It may also be useful to illustrate the
effect of distributions, modes and their parameters.
It generates n integers with the specified distribution and its
distribution-specific parameter m. E.g. m is the maximum value for
the plateau distribution and the length and height of individual teeth
of the sawtooth distribution.
The generated values are printed as zero-padded eight-digit hexadecimal
numbers to make sure alphabetic and numeric order are the same.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Adapt the qsort certification program from "Engineering a Sort Function"
by Bentley and McIlroy for testing our linked list sort function. It
generates several lists with various distribution patterns and counts
the number of operations llist_mergesort() needs to order them. It
compares the result to the output of a trusted sort function (qsort(1))
and also checks if the sort is stable.
Also add a test script that makes use of the new subcommand.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Give the code for sorting a text file its own sub-command. This allows
extending the helper, which we'll do in the following patches.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Strip line ending characters to make sure empty lines are sorted like
sort(1) does.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The paths generated by difftool are passed to user-facing diff tools.
Using paths with repeated slashes in them is a cosmetic blemish that
is exposed to users and can be avoided.
Use a strbuf to create the buffer used for the dir-diff tmpdir.
Strip trailing slashes from the value read from TMPDIR to avoid
repeated slashes in the generated paths.
Adjust the error handling to avoid leaking strbufs and to avoid
returning -1 to cmd_main().
Signed-off-by: David Aguilar <davvid@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In some circumstances, "git grep --textconv --recurse-submodules"
ignores the textconv attributes from the submodules and erroneously
applies the attributes defined in the superproject on the submodules'
files. The textconv cache is also saved on the superproject, even for
submodule objects.
A fix for these problems will probably require at least three changes:
- Some textconv and attributes functions (as well as their callees) will
have to be adjusted to work with arbitrary repositories. Note that
"fill_textconv()", for example, already receives a "struct repository"
but it writes the textconv cache using "write_loose_object()", which
implicitly works on "the_repository".
- grep.c functions will have to call textconv/userdiff routines passing
the "repo" field from "struct grep_source" instead of the one from
"struct grep_opt". The latter always points to "the_repository" on
"git grep" executions (see its initialization in builtin/grep.c), but
the former points to the correct repository that each source (an
object, file, or buffer) comes from.
- "userdiff_find_by_path()" might need to use a different attributes
stack for each repository it works on or reset its internal static
stack when the repository is changed throughout the calls.
For now, let's add some tests to demonstrate these problems, and also
update a NEEDSWORK comment in grep.h that mentions this bug to reference
the added tests.
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When repacking into a geometric series and writing a multi-pack bitmap,
it is beneficial to have the largest resulting pack be the preferred
object source in the bitmap's MIDX, since selecting the large packs can
lead to fewer broken delta chains and better compression.
Teach 'git repack' to identify this pack and pass it to the MIDX write
machinery in order to mark it as preferred.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Teach `git repack` a new `--write-midx` option for callers that wish to
persist a multi-pack index in their repository while repacking.
There are two existing alternatives to this new flag, but they don't
cover our particular use-case. These alternatives are:
- Call 'git multi-pack-index write' after running 'git repack', or
- Set 'GIT_TEST_MULTI_PACK_INDEX=1' in your environment when running
'git repack'.
The former works, but introduces a gap in bitmap coverage between
repacking and writing a new MIDX (since the repack may have deleted a
pack included in the existing MIDX, invalidating it altogether).
Setting the 'GIT_TEST_' environment variable is obviously unsupported.
In fact, even if it were supported officially, it still wouldn't work,
because it generates the MIDX *after* redundant packs have been dropped,
leading to the same issue as above.
Introduce a new option which eliminates this race by teaching `git
repack` to generate the MIDX at the critical point: after the new packs
have been written and moved into place, but before the redundant packs
have been removed.
This option is compatible with `git repack`'s '--bitmap' option (it
changes the interpretation to be: "write a bitmap corresponding to the
MIDX after one has been generated").
There is a little bit of additional noise in the patch below to avoid
repeating ourselves when selecting which packs to delete. Instead of a
single loop as before (where we iterate over 'existing_packs', decide if
a pack is worth deleting, and if so, delete it), we have two loops (the
first where we decide which ones are worth deleting, and the second
where we actually do the deleting). This makes it so we have a single
check we can make consistently when (1) telling the MIDX which packs we
want to exclude, and (2) actually unlinking the redundant packs.
There is also a tiny change to short-circuit the body of
write_midx_included_packs() when no packs remain in the case of an empty
repository. The MIDX code does not handle this, so avoid trying to
generate a MIDX covering zero packs in the first place.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
To figure out which commits we can write a bitmap for, the multi-pack
index/bitmap code does a reachability traversal, marking any commit
which can be found in the MIDX as eligible to receive a bitmap.
This approach will cause a problem when multi-pack bitmaps are able to
be generated from `git repack`, since the reference tips can change
during the repack. Even though we ignore commits that don't exist in
the MIDX (when doing a scan of the ref tips), it's possible that a
commit in the MIDX reaches something that isn't.
This can happen when a multi-pack index contains some pack which refers
to loose objects (e.g., if a pack was pushed after starting the repack
but before generating the MIDX which depends on an object which is
stored as loose in the repository, and by definition isn't included in
the multi-pack index).
By taking a snapshot of the references before we start repacking, we can
close that race window. In the above scenario (where we have a packed
object pointing at a loose one), we'll either (a) take a snapshot of the
references before seeing the packed one, or (b) take it after, at which
point we can guarantee that the loose object will be packed and included
in the MIDX.
This patch does just that. It writes a temporary "reference snapshot",
which is a list of OIDs that are at the ref tips before writing a
multi-pack bitmap. References that are "preferred" (i.e,. are a suffix
of at least one value of the 'pack.preferBitmapTips' configuration) are
marked with a special '+'.
The format is simple: one line per commit at each tip, with an optional
'+' at the beginning (for preferred references, as described above).
When provided, the reference snapshot is used to drive bitmap selection
instead of the MIDX code doing its own traversal. When it isn't
provided, the usual traversal takes place instead.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>