The mergesort implementation used to sort linked list has been
optimized.
* rs/mergesort:
test-mergesort: use repeatable random numbers
mergesort: use ranks stack
p0071: test performance of llist_mergesort()
p0071: measure sorting of already sorted and reversed files
test-mergesort: add unriffle_skewed mode
test-mergesort: add unriffle mode
test-mergesort: add generate subcommand
test-mergesort: add test subcommand
test-mergesort: add sort subcommand
test-mergesort: use strbuf_getline()
When "git cmd -h" shows more than one line of usage text (e.g.
the cmd subcommand may take sub-sub-command), parse-options API
learned to align these lines, even across i18n/l10n.
* ab/align-parse-options-help:
parse-options: properly align continued usage output
git rev-parse --parseopt tests: add more usagestr tests
send-pack: properly use parse_options() API for usage string
parse-options API users: align usage output in C-strings
Teach "git help -c" into helping the command line completion of
configuration variables.
* ab/help-config-vars:
help: move column config discovery to help.c library
help / completion: make "git help" do the hard work
help tests: test --config-for-completion option & output
help: simplify by moving to OPT_CMDMODE()
help: correct logic error in combining --all and --guides
help: correct logic error in combining --all and --config
help tests: add test for --config output
help: correct usage & behavior of "git help --guides"
help: correct the usage string in -h and documentation
Built-in fsmonitor (part 1).
* jh/builtin-fsmonitor-part1:
t/helper/simple-ipc: convert test-simple-ipc to use start_bg_command
run-command: create start_bg_command
simple-ipc/ipc-win32: add Windows ACL to named pipe
simple-ipc/ipc-win32: add trace2 debugging
simple-ipc: move definition of ipc_active_state outside of ifdef
simple-ipc: preparations for supporting binary messages.
trace2: add trace2_child_ready() to report on background children
Updates to the tests in t0000 to test the test framework.
* ab/lib-subtest:
test-lib tests: get rid of copy/pasted mock test code
test-lib tests: assert 1 exit code, not non-zero
test-lib tests: refactor common part of check_sub_test_lib_test*()
test-lib tests: avoid subshell for "test_cmp" for readability
test-lib tests: don't provide a description for the sub-tests
test-lib tests: split up "write and run" into two functions
test-lib tests: move "run_sub_test" to a new lib-subtest.sh
Various fixes in code paths that move untracked files away to make room.
* en/removing-untracked-fixes:
Documentation: call out commands that nuke untracked files/directories
Comment important codepaths regarding nuking untracked files/dirs
unpack-trees: avoid nuking untracked dir in way of locally deleted file
unpack-trees: avoid nuking untracked dir in way of unmerged file
Change unpack_trees' 'reset' flag into an enum
Remove ignored files by default when they are in the way
unpack-trees: make dir an internal-only struct
unpack-trees: introduce preserve_ignored to unpack_trees_options
read-tree, merge-recursive: overwrite ignored files by default
checkout, read-tree: fix leak of unpack_trees_options.dir
t2500: add various tests for nuking untracked files
"git grep --recurse-submodules" takes trees and blobs from the
submodule repository, but the textconv settings when processing a
blob from the submodule is not taken from the submodule repository.
A test is added to demonstrate the issue, without fixing it.
* mt/grep-submodule-textconv:
grep: demonstrate bug with textconv attributes and submodules
"git add", "git mv", and "git rm" have been adjusted to avoid
updating paths outside of the sparse-checkout definition unless
the user specifies a "--sparse" option.
* ds/add-rm-with-sparse-index:
advice: update message to suggest '--sparse'
mv: refuse to move sparse paths
rm: skip sparse paths with missing SKIP_WORKTREE
rm: add --sparse option
add: update --renormalize to skip sparse paths
add: update --chmod to skip sparse paths
add: implement the --sparse option
add: skip tracked paths outside sparse-checkout cone
add: fail when adding an untracked sparse file
dir: fix pattern matching on dirs
dir: select directories correctly
t1092: behavior for adding sparse files
t3705: test that 'sparse_entry' is unstaged
Code clean-up in "git difftool".
* da/difftool:
difftool: add a missing space to the run_dir_diff() comments
difftool: remove an unnecessary call to strbuf_release()
difftool: refactor dir-diff to write files using helper functions
difftool: create a tmpdir path without repeated slashes
CI learns to run the leak sanitizer builds.
* ab/sanitize-leak-ci:
tests: add a test mode for SANITIZE=leak, run it in CI
Makefile: add SANITIZE=leak flag to GIT-BUILD-OPTIONS
The ref iteration code used to optionally allow dangling refs to be
shown, which has been tightened up.
* jk/ref-paranoia:
refs: drop "broken" flag from for_each_fullref_in()
ref-filter: drop broken-ref code entirely
ref-filter: stop setting FILTER_REFS_INCLUDE_BROKEN
repack, prune: drop GIT_REF_PARANOIA settings
refs: turn on GIT_REF_PARANOIA by default
refs: omit dangling symrefs when using GIT_REF_PARANOIA
refs: add DO_FOR_EACH_OMIT_DANGLING_SYMREFS flag
refs-internal.h: reorganize DO_FOR_EACH_* flag documentation
refs-internal.h: move DO_FOR_EACH_* flags next to each other
t5312: be more assertive about command failure
t5312: test non-destructive repack
t5312: create bogus ref as necessary
t5312: drop "verbose" helper
t5600: provide detached HEAD for corruption failures
t5516: don't use HEAD ref for invalid ref-deletion tests
t7900: clean up some more broken refs
Test updates.
* sg/test-split-index-fix:
read-cache: fix GIT_TEST_SPLIT_INDEX
tests: disable GIT_TEST_SPLIT_INDEX for sparse index tests
read-cache: look for shared index files next to the index, too
t1600-index: disable GIT_TEST_SPLIT_INDEX
t1600-index: don't run git commands upstream of a pipe
t1600-index: remove unnecessary redirection
"git multi-pack-index write --bitmap" learns to propagate the
hashcache from original bitmap to resulting bitmap.
* tb/midx-write-propagate-namehash:
t5326: test propagating hashcache values
p5326: generate pack bitmaps before writing the MIDX bitmap
p5326: don't set core.multiPackIndex unnecessarily
p5326: create missing 'perf-tag' tag
midx.c: respect 'pack.writeBitmapHashcache' when writing bitmaps
pack-bitmap.c: propagate namehash values from existing bitmaps
t/helper/test-bitmap.c: add 'dump-hashes' mode
Use MINSTD to generate pseudo-random numbers consistently instead of
using rand(3), whose output can vary from system to system, and reset
its seed before filling in the test values. This gives repeatable
results across versions and systems, which simplifies sharing and
comparing of results between developers.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Code cleanup.
* ab/repo-settings-cleanup:
repository.h: don't use a mix of int and bitfields
repo-settings.c: simplify the setup
read-cache & fetch-negotiator: check "enum" values in switch()
environment.c: remove test-specific "ignore_untracked..." variable
wrapper.c: add x{un,}setenv(), and use xsetenv() in environment.c
Regression in "git commit-graph" command line parsing has been
corrected.
* tb/commit-graph-usage-fix:
builtin/multi-pack-index.c: disable top-level --[no-]progress
builtin/commit-graph.c: don't accept common --[no-]progress
"git rebase <upstream> <tag>" failed when aborted in the middle, as
it mistakenly tried to write the tag object instead of peeling it
to HEAD.
* pw/rebase-of-a-tag-fix:
rebase: dereference tags
rebase: use lookup_commit_reference_by_name()
rebase: use our standard error return value
t3407: rework rebase --quit tests
t3407: strengthen rebase --abort tests
t3407: use test_path_is_missing
t3407: rename a variable
t3407: use test_cmp_rev
t3407: use test_commit
t3407: run tests in $TEST_DIRECTORY
More code paths that use the hack to add submodule's object
database to the set of alternate object store have been cleaned up.
* jt/add-submodule-odb-clean-up:
revision: remove "submodule" from opt struct
repository: support unabsorbed in repo_submodule_init
submodule: remove unnecessary unabsorbed fallback
When using `test_size` with `wc -c`, users on certain platforms can run
into issues when `wc` emits leading space characters in its output,
which confuses get_times.
Callers could switch to use test_file_size instead of `wc -c` (the
former never prints leading space characters, so will always work with
test_size regardless of platform), but this is an easy enough spot to
miss that we should teach get_times to be more tolerant of the input it
accepts.
Teach get_times to do just that by stripping any leading space
characters.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
b3dfeebb92 (rebase: avoid computing unnecessary patch IDs, 2016-07-29)
added a perf test that calls tac(1) from GNU core utilities. Support
systems without it by reversing the generated list using sort -nr
instead. sort(1) with options -n and -r is already used in other tests.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Protocol v0 clients can get stuck parsing a malformed feature line.
* ah/connect-parse-feature-v0-fix:
connect: also update offset for features without values
Sensitive data in the HTTP trace were supposed to be redacted, but
we failed to do so in HTTP/2 requests.
* jk/http-redact-fix:
http: match headers case-insensitively when redacting
"git cvsserver" had a long-standing bug in its authentication code,
which has finally been corrected (it is unclear and is a separate
question if anybody is seriously using it, though).
* cb/cvsserver:
Documentation: cleanup git-cvsserver
git-cvsserver: protect against NULL in crypt(3)
git-cvsserver: use crypt correctly to compare password hashes
"git clone" from a repository whose HEAD is unborn into a bare
repository didn't follow the branch name the other side used, which
is corrected.
* jk/clone-unborn-head-in-bare:
clone: handle unborn branch in bare repos
"git stash", where the tentative change involves changing a
directory to a file (or vice versa), was confused, which has been
corrected.
* en/stash-df-fix:
stash: restore untracked files AFTER restoring tracked files
stash: avoid feeding directories to update-index
t3903: document a pair of directory/file bugs
Check if sorting takes advantage of already sorted or reversed content,
or if that corner case actually decreases performance, like it would for
a simplistic quicksort implementation.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a mode that turns a sorted list into adversarial input for a
bottom-up mergesort implementation that doubles the length of sorted
sublists at each level -- like our llist_mergesort().
While unriffle mode splits the list in half at each recursion step,
unriffle_skewed splits it into 2^l items and the rest, with 2^l being
the highest power of two smaller than the number of items and thus
2^l >= rest. The rest is unriffled with the tail of the first half to
require a merge to compare the maximum number of elements.
It complements the unriffle mode, which targets balanced merges. If
the number of elements is a power of two then both actually produce the
same result, as 2^l == rest == n/2 at each recursion step in that case.
Here are the results:
$ t/helper/test-tool mergesort test | awk '
$7 > max[$3] {max[$3] = $7; line[$3] = $0}
END {for (n in line) print line[n]}
'
distribut mode n m get_next set_next compare verdict
sawtooth unriffle_skewed 100 128 1184 700 589 OK
sawtooth unriffle_skewed 1023 1024 16373 10230 9207 OK
sawtooth unriffle 1024 1024 16384 10240 9217 OK
sawtooth unriffle_skewed 1025 2048 18454 11275 10241 OK
The sawtooth distribution with m>=n produces a sorted list and
unriffle_skewed mode turns it into adversarial input for unbalanced
merges, which it wins in all cases except for n=1024 -- the resulting
list is the same, but unriffle is tested before unriffle_skewed, so its
result is selected by the AWK script.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a mode that turns sorted items into adversarial input for mergesort.
Do that by running mergesort in reverse and rearranging the items in
such a way that each merge needs the maximum number of operations to
undo it.
To riffle is a card shuffling technique and involves splitting a deck
into two and then to interleave them. A perfect riffle takes one card
from each half in turn. That's similar to the most expensive merge,
which has to take one item from each sublist in turn, which requires the
maximum number of comparisons (n-1).
So unriffle does that in reverse, i.e. it generates the first sublist
out of the items at even indexes and the second sublist out of the items
at odd indexes, without changing their order in any other way. Done
recursively until we reach the trivial sublist length of one, this
twists the list into an order that requires the maximum effort for
mergesort to untangle.
As a baseline, here are the rand distributions with the highest number
of comparisons from "test-tool mergesort test":
$ t/helper/test-tool mergesort test | awk '
NR > 1 && $1 != "rand" {next}
$7 > max[$3] {max[$3] = $7; line[$3] = $0}
END {for (n in line) print line[n]}
'
distribut mode n m get_next set_next compare verdict
rand copy 100 32 1184 700 569 OK
rand reverse_1st_half 1023 256 16373 10230 8976 OK
rand reverse_1st_half 1024 512 16384 10240 8993 OK
rand dither 1025 64 18454 11275 9970 OK
And here are the most expensive ones overall:
$ t/helper/test-tool mergesort test | awk '
$7 > max[$3] {max[$3] = $7; line[$3] = $0}
END {for (n in line) print line[n]}
'
distribut mode n m get_next set_next compare verdict
stagger reverse 100 64 1184 700 580 OK
sawtooth unriffle 1023 1024 16373 10230 9179 OK
sawtooth unriffle 1024 1024 16384 10240 9217 OK
stagger unriffle 1025 2048 18454 11275 10241 OK
The sawtooth distribution with m>=n generates a sorted list. The
unriffle mode is designed to turn that into adversarial input for
mergesort, and that checks out for n=1023 and n=1024, where it produces
the list that requires the most comparisons.
Item counts that are not powers of two have other winners, and that's
because unriffle recursively splits lists into equal-sized halves, while
llist_mergesort() splits them into the biggest power of two smaller than
n and the rest, e.g. for n=1025 it sorts the first 1024 separately and
finally merges them to the last item.
So unriffle mode works as designed for the intended use case, but to
consistently generate adversarial input for unbalanced merges we need
something else.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a subcommand for printing test data. It can be used to generate
special test cases and feed them into the sort subcommand or sort(1) for
performance measurements. It may also be useful to illustrate the
effect of distributions, modes and their parameters.
It generates n integers with the specified distribution and its
distribution-specific parameter m. E.g. m is the maximum value for
the plateau distribution and the length and height of individual teeth
of the sawtooth distribution.
The generated values are printed as zero-padded eight-digit hexadecimal
numbers to make sure alphabetic and numeric order are the same.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Adapt the qsort certification program from "Engineering a Sort Function"
by Bentley and McIlroy for testing our linked list sort function. It
generates several lists with various distribution patterns and counts
the number of operations llist_mergesort() needs to order them. It
compares the result to the output of a trusted sort function (qsort(1))
and also checks if the sort is stable.
Also add a test script that makes use of the new subcommand.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Give the code for sorting a text file its own sub-command. This allows
extending the helper, which we'll do in the following patches.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Strip line ending characters to make sure empty lines are sorted like
sort(1) does.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The paths generated by difftool are passed to user-facing diff tools.
Using paths with repeated slashes in them is a cosmetic blemish that
is exposed to users and can be avoided.
Use a strbuf to create the buffer used for the dir-diff tmpdir.
Strip trailing slashes from the value read from TMPDIR to avoid
repeated slashes in the generated paths.
Adjust the error handling to avoid leaking strbufs and to avoid
returning -1 to cmd_main().
Signed-off-by: David Aguilar <davvid@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In some circumstances, "git grep --textconv --recurse-submodules"
ignores the textconv attributes from the submodules and erroneously
applies the attributes defined in the superproject on the submodules'
files. The textconv cache is also saved on the superproject, even for
submodule objects.
A fix for these problems will probably require at least three changes:
- Some textconv and attributes functions (as well as their callees) will
have to be adjusted to work with arbitrary repositories. Note that
"fill_textconv()", for example, already receives a "struct repository"
but it writes the textconv cache using "write_loose_object()", which
implicitly works on "the_repository".
- grep.c functions will have to call textconv/userdiff routines passing
the "repo" field from "struct grep_source" instead of the one from
"struct grep_opt". The latter always points to "the_repository" on
"git grep" executions (see its initialization in builtin/grep.c), but
the former points to the correct repository that each source (an
object, file, or buffer) comes from.
- "userdiff_find_by_path()" might need to use a different attributes
stack for each repository it works on or reset its internal static
stack when the repository is changed throughout the calls.
For now, let's add some tests to demonstrate these problems, and also
update a NEEDSWORK comment in grep.h that mentions this bug to reference
the added tests.
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Code cleanup to limit memory consumption and tighten protocol
message parsing.
* jk/reduce-malloc-in-v2-servers:
ls-refs: reject unknown arguments
serve: reject commands used as capabilities
serve: reject bogus v2 "command=ls-refs=foo"
docs/protocol-v2: clarify some ls-refs ref-prefix details
ls-refs: ignore very long ref-prefix counts
serve: drop "keys" strvec
serve: provide "receive" function for session-id capability
serve: provide "receive" function for object-format capability
serve: add "receive" method for v2 capabilities table
serve: return capability "value" from get_capability()
serve: rename is_command() to parse_command()
The previous changes modified the behavior of 'git add', 'git rm', and
'git mv' to not adjust paths outside the sparse-checkout cone, even if
they exist in the working tree and their cache entries lack the
SKIP_WORKTREE bit. The intention is to warn users that they are doing
something potentially dangerous. The '--sparse' option was added to each
command to allow careful users the same ability they had before.
To improve the discoverability of this new functionality, add a message
to advice.updateSparsePath that mentions the existence of the option.
The previous set of changes also modified the purpose of this message to
include possibly a list of paths instead of only a list of pathspecs.
Make the warning message more clear about this new behavior.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since cmd_mv() does not operate on cache entries and instead directly
checks the filesystem, we can only use path_in_sparse_checkout() as a
mechanism for seeing if a path is sparse or not. Be sure to skip
returning a failure if '-k' is specified.
To ensure that the advice around sparse paths is the only reason a move
failed, be sure to check this as the very last thing before inserting
into the src_for_dst list.
The tests cover a variety of cases such as whether the target is tracked
or untracked, and whether the source or destination are in or outside of
the sparse-checkout definition.
Helped-by: Matheus Tavares Bernardino <matheus.bernardino@usp.br>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If a path does not match the sparse-checkout cone but is somehow missing
the SKIP_WORKTREE bit, then 'git rm' currently succeeds in removing the
file. One reason a user might be in this situation is a merge conflict
outside of the sparse-checkout cone. Removing such a file might be
problematic for users who are not sure what they are doing.
Add a check to path_in_sparse_checkout() when 'git rm' is checking if a
path should be considered for deletion. Of course, this check is ignored
if the '--sparse' option is specified, allowing users who accept the
risks to continue with the removal.
This also removes a confusing behavior where a user asks for a directory
to be removed, but only the entries that are within the sparse-checkout
definition are removed. Now, 'git rm <dir>' will fail without '--sparse'
and will succeed in removing all contained paths with '--sparse'.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As we did previously in 'git add', add a '--sparse' option to 'git rm'
that allows modifying paths outside of the sparse-checkout definition.
The existing checks in 'git rm' are restricted to tracked files that
have the SKIP_WORKTREE bit in the current index. Future changes will
cause 'git rm' to reject removing paths outside of the sparse-checkout
definition, even if they are untracked or do not have the SKIP_WORKTREE
bit.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We added checks for path_in_sparse_checkout() to portions of 'git add'
that add warnings and prevent stagins a modification, but we skipped the
--renormalize mode. Update renormalize_tracked_files() to ignore cache
entries whose path is outside of the sparse-checkout cone (unless
--sparse is provided). Add a test in t3705.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We added checks for path_in_sparse_checkout() to portions of 'git add'
that add warnings and prevent staging a modification, but we skipped the
--chmod mode. Update chmod_pathspec() to ignore cache entries whose path
is outside of the sparse-checkout cone (unless --sparse is provided).
Add a test in t3705.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We previously modified 'git add' to refuse updating index entries
outside of the sparse-checkout cone. This is justified to prevent users
from accidentally getting into a confusing state when Git removes those
files from the working tree at some later point.
Unfortunately, this caused some workflows that were previously possible
to become impossible, especially around merge conflicts outside of the
sparse-checkout cone. These were documented in tests within t1092.
We now re-enable these workflows using a new '--sparse' option to 'git
add'. This allows users to signal "Yes, I do know what I'm doing with
these files," and accept the consequences of the files leaving the
worktree later.
We delay updating the advice message until implementing a similar option
in 'git rm' and 'git mv'.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When 'git add' adds a tracked file that is outside of the
sparse-checkout cone, it checks the SKIP_WORKTREE bit to see if the file
exists outside of the sparse-checkout cone. This is usually correct,
except in the case of a merge conflict outside of the cone.
Modify add_pathspec_matched_against_index() to be more careful about
paths by checking the sparse-checkout patterns in addition to the
SKIP_WORKTREE bit. This causes 'git add' to no longer allow files
outside of the cone that removed the SKIP_WORKTREE bit due to a merge
conflict.
With only this change, users will only be able to add the file after
adding the file to the sparse-checkout cone. A later change will allow
users to force adding even though the file is outside of the
sparse-checkout cone.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>