git-commit-vandalism

Author	SHA1	Message	Date
Junio C Hamano	d89db6f4c3	Merge branch 'sm/branch-sort-config' "git branch --list" learned to take the default sort order from the 'branch.sort' configuration variable, just like "git tag --list" pays attention to 'tag.sort'. * sm/branch-sort-config: branch: support configuring --sort via .gitconfig	2018-08-27 14:33:42 -07:00
Junio C Hamano	9b73732577	Merge branch 'nd/config-core-checkstat-doc' The meaning of the possible values the "core.checkStat" configuration variable can take were not adequately documented, which has been fixed. * nd/config-core-checkstat-doc: config.txt: clarify core.checkStat	2018-08-27 14:33:42 -07:00
Ævar Arnfjörð Bjarmason	4a3ed63802	tests: fix and add lint for non-portable grep --file The --file option to grep isn't in POSIX[1], but -f is[1]. Let's check for that in the future, and fix the portability regression in `f237c8b6fe` ("commit-graph: implement git-commit-graph write", 2018-04-02) that broke e.g. AIX. 1. http://pubs.opengroup.org/onlinepubs/009695399/utilities/grep.html Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-08-27 14:07:32 -07:00
Ævar Arnfjörð Bjarmason	8c97e38731	tests: fix version-specific portability issue in Perl JSON The test guarded by PERLJSON added in `75459410ed` ("json_writer: new routines to create JSON data", 2018-07-13) assumed that a JSON boolean value like "true" or "false" would be represented as "1" or "0" in Perl. This behavior can't be relied upon, e.g. with JSON.pm 2.50 and JSON::PP. A JSON::PP::Boolean object will be represented as "true" or "false". To work around this let's check if we have any refs left after we check for hashes and arrays, assume those are JSON objects, and coerce them to a known boolean value. The behavior of this test still looks odd to me. Why implement our own ad-hoc encoder just for some one-off test, as opposed to say Perl's own Data::Dumper with Sortkeys et al? But with this change it works, so let's leave it be. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-08-27 14:07:32 -07:00
Ævar Arnfjörð Bjarmason	a3c4c8841c	tests: use shorter labels in chainlint.sed for AIX sed Improve the portability of chainlint by using shorter labels. On AIX sed will complain about: sed: 0602-417 The label :hereslurp is greater than eight characters This, in combination with the previous fix to this file makes GIT_TEST_CHAIN_LINT=1 (which is the default) working again on AIX without issues, and the "gmake check-chainlint" test also passes. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Acked-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-08-27 14:07:18 -07:00
Kyle Meyer	72f47be2db	range-diff: update stale summary of --no-dual-color `275267937b` (range-diff: make dual-color the default mode, 2018-08-13) replaced --dual-color with --no-dual-color but left the option's summary untouched. Rewrite the summary to describe --no-dual-color rather than dual-color. Helped-by: Jonathan Nieder <jrnieder@gmail.com> Helped-by: Johannes Schindelin <Johannes.Schindelin@gmx.de> Signed-off-by: Kyle Meyer <kyle@kyleam.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-08-27 13:13:59 -07:00
Nguyễn Thái Ngọc Duy	5f4436a721	Document update for nd/unpack-trees-with-cache-tree Fix an incorrect comment in the new code added in `b4da37380b` (unpack-trees: optimize walking same trees with cache-tree - 2018-08-18) and document about the new test variable that is enabled by default in test-lib.sh in `4592e6080f` (cache-tree: verify valid cache-tree in the test suite - 2018-08-18) Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-08-27 12:26:17 -07:00
Ævar Arnfjörð Bjarmason	2d9ded8acc	tests: fix comment syntax in chainlint.sed for AIX sed Change a comment in chainlint.sed to appease AIX sed, which would previously print this error: sed: # stash for later printing is not a recognized function 1. https://public-inbox.org/git/CAPig+cTTbU5HFMKgNyrxTp3+kcK46-Fn=4ZH6zDt1oQChAc3KA@mail.gmail.com/ Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Acked-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-08-27 11:31:43 -07:00
Ævar Arnfjörð Bjarmason	b2fa7a2372	tests: fix and add lint for non-portable seq The seq command is not in POSIX, and doesn't exist on e.g. OpenBSD. We've had the test_seq wrapper since `d17cf5f3a3` ("tests: Introduce test_seq", 2012-08-04), but use of it keeps coming back, e.g. in the recently added "fetch negotiator" tests being added here. So let's also add a check to "make test-lint". The regex is aiming to capture the likes of $(seq ..) and "seq" as a stand-alone command, without capturing some existing cases where we e.g. have files called "seq", as \bseq\b would do. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-08-27 11:31:18 -07:00
Ævar Arnfjörð Bjarmason	2a59a6ef20	tests: fix and add lint for non-portable head -c N The "head -c BYTES" option is non-portable (not in POSIX[1]). Change such invocations to use the test_copy_bytes wrapper added in `48860819e8` ("t9300: factor out portable "head -c" replacement", 2016-06-30). This fixes a test added in `9d2e330b17` ("ewah_read_mmap: bounds-check mmap reads", 2018-06-14), which has been breaking t5310-pack-bitmaps.sh on OpenBSD since 2.18.0. The OpenBSD ports already have a similar workaround after their upgrade to 2.18.0[2]. I have not tested this on IRIX, but according to `4de0bbd898` ("t9300: use perl "head -c" clone in place of "dd bs=1 count=16000" kluge", 2010-12-13) this invocation would have broken things there too. Also, change a valgrind-specific codepath in test-lib.sh to use this wrapper. Given where valgrind runs I don't think this would ever become a portability issue in practice, but it's easier to just use the wrapper than introduce some exception for the "make test-lint" check being added here. 1. http://pubs.opengroup.org/onlinepubs/9699919799/utilities/head.html 2. `08d5d82eae (diff-f7d3c4fabeed1691620d608f1534f5e5)` Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-08-27 11:31:18 -07:00
Jean-Noël Avila	27c929edd6	i18n: fix mistakes in translated strings Fix typos and convert a question which does not expect to be replied to a simple advice. Signed-off-by: Jean-Noël Avila <jn.avila@free.fr> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-08-23 14:29:12 -07:00
Derrick Stolee	d915114a3b	config: fix commit-graph related config docs The core.commitGraph config setting was accidentally removed from the config documentation. In that same patch, the config setting that writes a commit-graph during garbage collection was incorrectly written to the doc as "gc.commitGraph" instead of "gc.writeCommitGraph". Reported-by: Szeder Gábor <szeder.dev@gmail.com> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-08-23 10:20:54 -07:00
Jeff King	66e83d9b41	append_signoff: use size_t for string offsets The append_signoff() function takes an "int" to specify the number of bytes to ignore. Most callers just pass 0, and the remainder use ignore_non_trailer() to skip over cruft. That function also returns an int, and uses them internally. On systems where size_t is larger than an int (i.e., most 64-bit systems), dealing with a ridiculously large commit message could end up overflowing an int, producing surprising results (e.g., returning a negative offset, which would cause us to look outside the original string). Let's consistently use size_t for these offsets through this whole stack. As a bonus, this makes the meaning of "ignore_footer" as an offset (and not a boolean) more clear. But while we're here, let's also document the interface. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-08-23 10:08:51 -07:00
Jeff King	ffce7f590f	sequencer: ignore "---" divider when parsing trailers When the sequencer code appends a signoff or cherry-pick origin, it uses the default trailer-parsing options, which treat "---" as the end of the commit message. As a result, it may be fooled by a commit message that contains that string and fail to find the existing trailer block. Even more confusing, the actual append code does not know about "---", and always appends to the end of the string. This can lead to bizarre results. E.g., appending a signoff to a commit message like this: subject body --- these dashes confuse the parser! Signed-off-by: A results in output with a final block like: Signed-off-by: A Signed-off-by: A The parser thinks the final line of the message is "body", and ignores everything else, claiming there are no trailers. So we output an extra newline separator (wrong) and add a duplicate signoff (also wrong). Since we know we are feeding a pure commit message, we can simply tell the parser to ignore the "---" divider. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-08-23 10:08:51 -07:00
Jeff King	e5fba5d558	pretty, ref-filter: format %(trailers) with no_divider option In both of these cases we know that we are feeding the trailer-parsing code a pure commit message. We should tell it so, which avoids false positives for a commit message that contains a "---" line. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-08-23 10:08:51 -07:00
Jeff King	1688c9a489	interpret-trailers: allow suppressing "---" divider Even with the newly-tightened "---" parser, it's still possible for a commit message to trigger a false positive if it contains something like "--- foo". If the caller knows that it has only a single commit message, it can now tell us with the "--no-divider" option, eliminating any false positives. If we were designing this from scratch, I'd probably make this the default. But we've advertised the "---" behavior in the documentation since interpret-trailers has existed. Since it's meant to be scripted, breaking that would be a bad idea. Note that the logic is in the underlying trailer.c code, which is used elsewhere. The default there will keep the current behavior, but many callers will benefit from setting this new option. That's left for future patches. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-08-23 10:08:51 -07:00
Jeff King	c188668e38	interpret-trailers: tighten check for "---" patch boundary The interpret-trailers command accepts not only raw commit messages, but it also can manipulate trailers in format-patch output. That means it must find the "---" boundary separating the commit message from the patch. However, it does so by looking for any line starting with "---", regardless of whether there is further content. This is overly lax compared to the parsing done in mailinfo.c's patchbreak(), and may cause false positives (e.g., t/perf output tables uses dashes; if you cut and paste them into your commit message, it fools the parser). We could try to reuse patchbreak() here, but it actually has several heuristics that are not of interest to us (e.g., matching "diff -" without a three-dash separator or even a CVS "Index:" line). We're not interested in taking in whatever random cruft people may send, but rather handling git-formatted patches. Note that the existing documentation was written in a loose way, so technically we are changing the behavior from what it said. But this should implement the original intent in a more accurate way. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-08-23 10:08:51 -07:00
Jeff King	00a21f5cbd	trailer: pass process_trailer_opts to trailer_info_get() Most of the trailer code has an "opts" struct which is filled in by the caller. We don't pass it down to trailer_info_get(), which does the initial parsing, because there hasn't yet been a need to do so. Let's start passing it down in preparation for adding new options. Note that there's a single caller which doesn't otherwise have such an options struct. Since it's just one caller (that we'd have to modify anyway), let's not bother with any special treatment like accepting a NULL options struct, and just have it allocate one with the defaults. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-08-23 10:08:51 -07:00
Jeff King	a3b636e215	trailer: use size_t for iterating trailer list We store the length of the trailers list in a size_t. So on a 64-bit system with a 32-bit int, in the unlikely case that we manage to actually allocate a list with 2^31 entries, we'd loop forever trying to iterate over it (our "int" would wrap to negative before exceeding info->trailer_nr). This probably doesn't matter in practice. Each entry is at least a pointer plus a non-empty string, so even without malloc overhead or the memory to hold the original string we're parsing from, you'd need to allocate tens of gigabytes. But it's easy enough to do it right. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-08-23 10:08:51 -07:00
Jeff King	0d2db00e24	trailer: use size_t for string offsets Many of the string-parsing functions inside trailer.c return integer offsets into the string (e.g., to point to the end of the trailer block). Several of these use an "int" to return or store the offsets. On a system where "size_t" is much larger than "int" (e.g., most 64-bit ones), it's easy to feed a gigantic commit message that results in a negative offset. This can result in us reading memory before the string (if the int is used as an index) or far after (if it's implicitly cast to a size_t by passing to a strbuf function). Let's fix this by using size_t for all string offsets. Note that several of the functions need ssize_t, since they use "-1" as a sentinel value. The interactions here can be pretty subtle. E.g., end_of_title in find_trailer_start() does not itself need to be signed, but it is compared to the result of last_line(), which is. That promotes the latter to unsigned, and the ">=" does not behave as you might expect. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-08-23 10:08:51 -07:00
SZEDER Gábor	7afb0d6777	t/lib-rebase.sh: support explicit 'pick' commands in 'fake_editor.sh' The verbose output of the test 'reword without issues functions as intended' in 't3423-rebase-reword.sh', added in `a9279c6785` (sequencer: do not squash 'reword' commits when we hit conflicts, 2018-06-19), contains the following error output: sed: -e expression #1, char 2: extra characters after command This error comes from within the 'fake-editor.sh' script created by 'lib-rebase.sh's set_fake_editor() function, and the root cause is the FAKE_LINES="pick 1 reword 2" variable in the test in question, in particular the "pick" word. 'fake-editor.sh' assumes 'pick' to be the default rebase command and doesn't support an explicit 'pick' command in FAKE_LINES. As a result, 'pick' will be used instead of a line number when assembling the following 'sed' script: sed -n picks/^pick/pick/p which triggers the aforementioned error. Luckily, this didn't affect the test's correctness: the erroring 'sed' command doesn't write anything to the todo script, and processing the rest of FAKE_LINES generates the desired todo script, as if that 'pick' command were not there at all. The minimal fix would be to remove the 'pick' word from FAKE_LINES, but that would leave us susceptible to similar issues in the future. Instead, teach the fake-editor script to recognize an explicit 'pick' command, which is still a fairly trivial change. In the future we might want to consider reinforcing this fake editor script with an &&-chain and stricter parsing of the FAKE_LINES variable (e.g. to error out when encountering unknown rebase commands or commands and line numbers in the wrong order). Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-08-23 09:23:34 -07:00
Jeff King	183a638b7d	hashcmp: assert constant hash size Prior to `509f6f62a4` (cache: update object ID functions for the_hash_algo, 2018-07-16), hashcmp() called memcmp() with a constant size of 20 bytes. Some compilers were able to turn that into a few quad-word comparisons, which is faster than actually calling memcmp(). In `509f6f62a4`, we started using the_hash_algo->rawsz instead. Even though this will always be 20, the compiler doesn't know that while inlining hashcmp() and ends up just generating a call to memcmp(). Eventually we'll have to deal with multiple hash sizes, but for the upcoming v2.19, we can restore some of the original performance by asserting on the size. That gives the compiler enough information to know that the memcmp will always be called with a length of 20, and it performs the same optimization. Here are numbers for p0001.2 run against linux.git on a few versions. This is using -O2 with gcc 8.2.0. Test v2.18.0 v2.19.0-rc0 HEAD ------------------------------------------------------------------------------ 0001.2: 34.24(33.81+0.43) 34.83(34.42+0.40) +1.7% 33.90(33.47+0.42) -1.0% You can see that v2.19 is a little slower than v2.18. This commit ended up slightly faster than v2.18, but there's a fair bit of run-to-run noise (the generated code in the two cases is basically the same). This patch does seem to be consistently 1-2% faster than v2.19. I tried changing hashcpy(), which was also touched by `509f6f62a4`, in the same way, but couldn't measure any speedup. Which makes sense, at least for this workload. A traversal of the whole commit graph requires looking up every entry of every tree via lookup_object(). That's many multiples of the numbers of objects in the repository (most of the lookups just return "yes, we already saw that object"). [jn: verified using "make object.s" that the memcmp call goes away.] Reported-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Jeff King <peff@peff.net Reviewed-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-08-23 06:20:58 -07:00
Jeff King	a12cbe23ef	rev-list: make empty --stdin not an error When we originally did the series that contains `7ba826290a` (revision: add rev_input_given flag, 2017-08-02) the intent was that "git rev-list --stdin </dev/null" would similarly become a successful noop. However, an attempt at the time to do that did not work[1]. The problem is that rev_input_given serves two roles: - it tells rev-list.c that it should not error out - it tells revision.c that it should not have the "default" ref kick (e.g., "HEAD" in "git log") We want to trigger the former, but not the latter. This is technically possible with a single flag, if we set the flag only after revision.c's revs->def check. But this introduces a rather subtle ordering dependency. Instead, let's keep two flags: one to denote when we got actual input (which triggers both roles) and one for when we read stdin (which triggers only the first). This does mean a caller interested in the first role has to check both flags, but there's only one such caller. And any future callers might want to make the distinction anyway (e.g., if they care less about erroring out, and more about whether revision.c soaked up our stdin). In fact, we already keep such a flag internally in revision.c for this purpose, so this is really just exposing that to the caller (and the old function-local flag can go away in favor of our new one). [1] https://public-inbox.org/git/20170802223416.gwiezhbuxbdmbjzx@sigill.intra.peff.net/ Helped-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-08-22 14:44:50 -07:00
SZEDER Gábor	2745817028	t3420-rebase-autostash: don't try to grep non-existing files Several tests in 't3420-rebase-autostash.sh' start various rebase processes that are expected to fail because of merge conflicts. These tests then run '! grep' to ensure that the autostash feature did its job, and the dirty contents of a file is gone. However, due to the test repo's history and the choice of upstream branch that file shouldn't exist in the conflicted state at all. Consequently, this 'grep' doesn't fail as expected, because it can't find the dirty content, but it fails because it can't open the file. Tighten this check by using 'test_path_is_missing' instead, thereby avoiding unexpected errors from 'grep' as well. Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-08-22 11:52:51 -07:00
SZEDER Gábor	79b04f9b60	t3903-stash: don't try to grep non-existing file The test 'store updates stash ref and reflog' in 't3903-stash.sh' creates a stash from a new file, runs 'git reset --hard' to throw away any modifications to the work tree, and then runs '! grep' to ensure that the staged contents are gone. Since the file didn't exist before, it shouldn't exist after 'git reset' either. Consequently, this 'grep' doesn't fail as expected, because it can't find the staged content, but it fails because it can't open the file. Tighten this check by using 'test_path_is_missing' instead, thereby avoiding an unexpected error from 'grep' as well. Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-08-22 11:52:35 -07:00
Junio C Hamano	29d9e3e2c4	Merge branch 'nd/pack-deltify-regression-fix' In a recent update in 2.18 era, "git pack-objects" started producing a larger than necessary packfiles by missing opportunities to use large deltas. * nd/pack-deltify-regression-fix: pack-objects: fix performance issues on packing large deltas	2018-08-22 11:17:05 -07:00
SZEDER Gábor	b89b4a660c	t6018-rev-list-glob: fix 'empty stdin' test Prior to `d3c6751b18` (tests: make use of the test_must_be_empty function, 2018-07-27), in the test 'rev-list should succeed with empty output on empty stdin' in 't6018-rev-list-glob' the empty 'expect' file served dual purpose: besides specifying the expected output, as usual, it also served as empty input for 'git rev-list --stdin'. Then `d3c6751b18` came along, and, as part of the conversion to 'test_must_be_empty', removed this empty 'expect' file, not realizing its secondary purpose. Redirecting stdin from the now non-existing file failed the test, but since this test expects failure in the first place, this issue went unnoticed. Redirect 'git rev-list's stdin explicitly from /dev/null to provide empty input. (Strictly speaking we don't need this redirection, because the test script's stdin is already redirected from /dev/null anyway, but I think it's better to be explicit about it.) Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-08-22 11:02:07 -07:00
SZEDER Gábor	c8b35b95e1	t4051-diff-function-context: read the right file The test ' context does not include preceding empty lines' in the block of tests 'change with long common tail and no context' in 't4051-diff-function-context.sh' tries to read the file 'long_common_tail.diff.diff', but that file doesn't exist as its name contains one more '.diff' suffixes than necessary. Despite this error the test still succeeded without checking what it's supposed to, because this erroneous read is done on the line: test "$(first_context_line <long_common_tail.diff.diff)" != " " which means that: - the command substitution hides the error, so it won't fail the test, and - the result of the command substitution is the empty string, which is, of course, not equal to a single space character, so the condition is fulfilled, and the test succeeds. As a minimal fix, fix the name of the file to be read. In the future we might want to reorganize this test script (1) to use 'test_cmp' instead of 'test's and command substitutions to catch failing commands and to provide helpful error messages, and (2) to specify what the expected result actually _is_ instead of what it isn't. Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-08-22 09:14:24 -07:00
SZEDER Gábor	30612cb670	t0020-crlf: check the right file In the test 'checkout with autocrlf=input' in 't0020-crlf.sh', one of the 'has_cr' checks looks at the non-existing file 'two' instead of 'dir/two'. The test still succeeds, without actually checking what it was supposed to, because this check is expected to fail anyway. As a minimal fix, fix the name of the file to be checked. Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-08-22 09:08:08 -07:00
SZEDER Gábor	15da753709	t7501-commit: drop silly command substitution The test '--dry-run with conflicts fixed from a merge' in 't7501-commit.sh', added in `8dc874b2ee` (wt-status.c: set commitable bit if there is a meaningful merge., 2016-02-15), runs the following unnecessary and downright bogus command substitution: ! $(git merge --no-commit commit-1) && I.e. after 'git merge ...' is executed and expectedly fails, the test attempts to execute its output: Merging: 80f2ea2 commit 2 virtual commit-1 found 1 common ancestor: e60d113 Initial commit Auto-merging test-file CONFLICT (content): Merge conflict in test-file Automatic merge failed; fix conflicts and then commit the result. as a command, which most likely fails, because there is no such command as "Merging:". Then '!' negates the failed exit status, the test continues, and eventually succeeds. Remove this command substitution and use 'test_must_fail' to ensure that 'git merge' fails. Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-08-22 08:39:31 -07:00
Derrick Stolee	1820703045	commit: use timestamp_t for author_date_slab The author_date_slab is used to store the author date of a commit when walking with the --author-date flag in rev-list or log. This was added as an 'unsigned long' in `81c6b38b` "log: --author-date-order" Since 'unsigned long' is ambiguous in its bit-ness across platforms (64-bit in Linux, 32-bit in Windows, for example), most references to the author dates in commit.c were converted to timestamp_t in `dddbad72` "timestamp_t: a new data type for timestamps" However, the slab definition was missed, leading to a mismatch in the data types in Windows. This would not reveal itself as a bug unless someone authors a commit after February 2106, but commits can store anything as their author date. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-08-21 14:08:18 -07:00
Jeff King	6a1e32d532	pack-objects: reuse on-disk deltas for thin "have" objects When we serve a fetch, we pass the "wants" and "haves" from the fetch negotiation to pack-objects. That tells us not only which objects we need to send, but we also use the boundary commits as "preferred bases": their trees and blobs are candidates for delta bases, both for reusing on-disk deltas and for finding new ones. However, this misses some opportunities. Modulo some special cases like shallow or partial clones, we know that every object reachable from the "haves" could be a preferred base. We don't use all of them for two reasons: 1. It's expensive to traverse the whole history and enumerate all of the objects the other side has. 2. The delta search is expensive, so we want to keep the number of candidate bases sane. The boundary commits are the most likely to work. When we have reachability bitmaps, though, reason 1 no longer applies. We can efficiently compute the set of reachable objects on the other side (and in fact already did so as part of the bitmap set-difference to get the list of interesting objects). And using this set conveniently covers the shallow and partial cases, since we have to disable the use of bitmaps for those anyway. The second reason argues against using these bases in the search for new deltas. But there's one case where we can use this information for free: when we have an existing on-disk delta that we're considering reusing, we can do so if we know the other side has the base object. This in fact saves time during the delta search, because it's one less delta we have to compute. And that's exactly what this patch does: when we're considering whether to reuse an on-disk delta, if bitmaps tell us the other side has the object (and we're making a thin-pack), then we reuse it. Here are the results on p5311 using linux.git, which simulates a client fetching after `N` days since their last fetch: Test origin HEAD -------------------------------------------------------------------------- 5311.3: server (1 days) 0.27(0.27+0.04) 0.12(0.09+0.03) -55.6% 5311.4: size (1 days) 0.9M 237.0K -73.7% 5311.5: client (1 days) 0.04(0.05+0.00) 0.10(0.10+0.00) +150.0% 5311.7: server (2 days) 0.34(0.42+0.04) 0.13(0.10+0.03) -61.8% 5311.8: size (2 days) 1.5M 347.7K -76.5% 5311.9: client (2 days) 0.07(0.08+0.00) 0.16(0.15+0.01) +128.6% 5311.11: server (4 days) 0.56(0.77+0.08) 0.13(0.10+0.02) -76.8% 5311.12: size (4 days) 2.8M 566.6K -79.8% 5311.13: client (4 days) 0.13(0.15+0.00) 0.34(0.31+0.02) +161.5% 5311.15: server (8 days) 0.97(1.39+0.11) 0.30(0.25+0.05) -69.1% 5311.16: size (8 days) 4.3M 1.0M -76.0% 5311.17: client (8 days) 0.20(0.22+0.01) 0.53(0.52+0.01) +165.0% 5311.19: server (16 days) 1.52(2.51+0.12) 0.30(0.26+0.03) -80.3% 5311.20: size (16 days) 8.0M 2.0M -74.5% 5311.21: client (16 days) 0.40(0.47+0.03) 1.01(0.98+0.04) +152.5% 5311.23: server (32 days) 2.40(4.44+0.20) 0.31(0.26+0.04) -87.1% 5311.24: size (32 days) 14.1M 4.1M -70.9% 5311.25: client (32 days) 0.70(0.90+0.03) 1.81(1.75+0.06) +158.6% 5311.27: server (64 days) 11.76(26.57+0.29) 0.55(0.50+0.08) -95.3% 5311.28: size (64 days) 89.4M 47.4M -47.0% 5311.29: client (64 days) 5.71(9.31+0.27) 15.20(15.20+0.32) +166.2% 5311.31: server (128 days) 16.15(36.87+0.40) 0.91(0.82+0.14) -94.4% 5311.32: size (128 days) 134.8M 100.4M -25.5% 5311.33: client (128 days) 9.42(16.86+0.49) 25.34(25.80+0.46) +169.0% In all cases we save CPU time on the server (sometimes significant) and the resulting pack is smaller. We do spend more CPU time on the client side, because it has to reconstruct more deltas. But that's the right tradeoff to make, since clients tend to outnumber servers. It just means the thin pack mechanism is doing its job. From the user's perspective, the end-to-end time of the operation will generally be faster. E.g., in the 128-day case, we saved 15s on the server at a cost of 16s on the client. Since the resulting pack is 34MB smaller, this is a net win if the network speed is less than 270Mbit/s. And that's actually the worst case. The 64-day case saves just over 11s at a cost of just under 11s. So it's a slight win at any network speed, and the 40MB saved is pure bonus. That trend continues for the smaller fetches. The implementation itself is mostly straightforward, with the new logic going into check_object(). But there are two tricky bits. The first is that check_object() needs access to the relevant information (the thin flag and bitmap result). We can do this by pushing these into program-lifetime globals. The second is that the rest of the code assumes that any reused delta will point to another "struct object_entry" as its base. But of course the case we are interested in here is the one where don't have such an entry! I looked at a number of options that didn't quite work: - we could use a flag to signal a reused delta, but it's not a single bit. We have to actually store the oid of the base, which is normally done by pointing to the existing object_entry. And we'd have to modify all the code which looks at deltas. - we could add the reused bases to the end of the existing object_entry array. While this does create some extra work as later stages consider the extra entries, it's actually not too bad (we're not sending them, so they don't cost much in the delta search, and at most we'd have 2*N of them). But there's a more subtle problem. Adding to the existing array means we might need to grow it with realloc, which could move the earlier entries around. While many of the references to other entries are done by integer index, some (including ones on the stack) use pointers, which would become invalidated. This isn't insurmountable, but it would require quite a bit of refactoring (and it's hard to know that you've got it all, since it may work _most_ of the time and then fail subtly based on memory allocation patterns). - we could allocate a new one-off entry for the base. In fact, this is what an earlier version of this patch did. However, since the refactoring brought in by `ad635e82d6` (Merge branch 'nd/pack-objects-pack-struct', 2018-05-23), the delta_idx code requires that both entries be in the main packing list. So taking all of those options into account, what I ended up with is a separate list of "external bases" that are not part of the main packing list. Each delta entry that points to an external base has a single-bit flag to do so; we have a little breathing room in the bitfield section of object_entry. This lets us limit the change primarily to the oe_delta() and oe_set_delta_ext() functions. And as a bonus, most of the rest of the code does not consider these dummy entries at all, saving both runtime CPU and code complexity. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-08-21 12:45:49 -07:00
Jeff King	30cdc33fba	pack-bitmap: save "have" bitmap from walk When we do a bitmap walk, we save the result, which represents (WANTs & ~HAVEs); i.e., every object we care about visiting in our walk. However, we throw away the haves bitmap, which can sometimes be useful, too. Save it and provide an access function so code which has performed a walk can query it. A few notes on the accessor interface: - the bitmap code calls these "haves" because it grew out of the want/have negotiation for fetches. But really, these are simply the objects that would be flagged UNINTERESTING in a regular traversal. Let's use that more universal nomenclature for the external module interface. We may want to change the internal naming inside the bitmap code, but that's outside the scope of this patch. - it still uses a bare "sha1" rather than "oid". That's true of all of the bitmap code. And in this particular instance, our caller in pack-objects is dealing with the bare sha1 that comes from a packed REF_DELTA (we're pointing directly to the mmap'd pack on disk). That's something we'll have to deal with as we transition to a new hash, but we can wait and see how the caller ends up being fixed and adjust this interface accordingly. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-08-21 12:33:39 -07:00
Jeff King	69d846f053	test-tool.h: include git-compat-util.h The test-tool programs include "test-tool.h" as their first include, which breaks our CodingGuideline of "the first include must be git-compat-util.h or an equivalent". Rather than change them all, let's instead make test-tool.h one of those equivalents, just like we do for builtin.h (which many of the actual git builtins include first). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-08-21 12:11:40 -07:00
SZEDER Gábor	1c5e94f459	tests: use 'test_must_be_empty' instead of 'test_cmp <empty> <out>' Using 'test_must_be_empty' is shorter and more idiomatic than >empty && test_cmp empty out as it saves the creation of an empty file. Furthermore, sometimes the expected empty file doesn't have such a descriptive name like 'empty', and its creation is far away from the place where it's finally used for comparison (e.g. in 't7600-merge.sh', where two expected empty files are created in the 'setup' test, but are used only about 500 lines later). These cases were found by instrumenting 'test_cmp' to error out the test script when it's used to compare empty files, and then converted manually. Note that even after this patch there still remain a lot of cases where we use 'test_cmp' to check empty files: - Sometimes the expected output is not hard-coded in the test, but 'test_cmp' is used to ensure that two similar git commands produce the same output, and that output happens to be empty, e.g. the test 'submodule update --merge - ignores --merge for new submodules' in 't7406-submodule-update.sh'. - Repetitive common tasks, including preparing the expected results and running 'test_cmp', are often extracted into a helper function, and some of this helper's callsites expect no output. - For the same reason as above, the whole 'test_expect_success' block is within a helper function, e.g. in 't3070-wildmatch.sh'. - Or 'test_cmp' is invoked in a loop, e.g. the test 'cvs update (-p)' in 't9400-git-cvsserver-server.sh'. Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-08-21 11:48:36 -07:00
SZEDER Gábor	ec21ac8c18	tests: use 'test_must_be_empty' instead of 'test_cmp /dev/null <out>' Using 'test_must_be_empty' is more idiomatic than 'test_cmp /dev/null out', and its message on error is perhaps a bit more to the point. This patch was basically created by running: sed -i -e 's%test_cmp /dev/null%test_must_be_empty%' t[0-9]*.sh with the exception of the change in 'should not fail in an empty repo' in 't7401-submodule-summary.sh', where it was 'test_cmp output /dev/null'. Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-08-21 11:48:34 -07:00
SZEDER Gábor	f0dc593a95	tests: use 'test_must_be_empty' instead of 'test ! -s' Using 'test_must_be_empty' is preferable to 'test ! -s', because it gives a helpful error message if the given file is unexpectedly no empty, while the latter remains completely silent. Furthermore, it also catches cases when the given file unexpectedly does not exist at all. This patch was created by: sed -i -e 's/test ! -s/test_must_be_empty/' t[0-9]*.sh Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-08-21 11:48:31 -07:00
SZEDER Gábor	ec10b018e7	tests: use 'test_must_be_empty' instead of '! test -s' Using 'test_must_be_empty' is preferable to '! test -s', because it gives a helpful error message if the given file is unexpectedly not empty, while the latter remains completely silent. Furthermore, it also catches cases when the given file unexpectedly does not exist at all. This patch was basically created by: sed -i -e 's/! test -s/test_must_be_empty/' t[0-9]*.sh with the following notable exceptions: - The '! test -s' check in '.gitmodules ignore=dirty suppresses submodules with untracked content' in 't7508-status.sh' is left as-is, because it's bogus and, therefore, it's subject of a dedicated patch. - The '! test -s' checks in 't9131-git-svn-empty-symlink.sh' and 't9135-git-svn-moved-branch-empty-file.sh' are immediately preceeded by a 'test -f' to ensure that the files exist in the first place. 'test_must_be_empty' ensures that as well, so those 'test -f' commands are removed as well. Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-08-21 11:48:29 -07:00
René Scharfe	bbc072f5d8	parseopt: group literal string alternatives in argument help This formally clarifies that the "--option=" part is the same for all alternatives. Signed-off-by: Rene Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-08-21 11:35:54 -07:00
René Scharfe	446e63ccf5	remote: improve argument help for add --mirror Group the possible values using a pair of parentheses and don't mark them for translation, as they are literal strings that have to be used as-is in any locale. Signed-off-by: Rene Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-08-21 11:33:21 -07:00
René Scharfe	168f32eb10	checkout-index: improve argument help for --stage Spell out all alternatives and avoid using a numerical range operator, as it is not mentioned in CodingGuidelines and the resulting string is still concise. Wrap them in parentheses to document clearly that the "--stage=" part is common among them. Signed-off-by: Rene Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-08-21 11:32:44 -07:00
Nguyễn Thái Ngọc Duy	eb90ea79c5	generate-cmdlist.sh: collect config from all config.txt files This script uses Documentation/config.txt as input for "git help --config" and "git config" completion but it misses the fact that config.txt includes other txt files. Include all config.txt as input when scanning for config keys. This could produce false positives, but as long as we stick to the blah-config.txt naming convention, we should be ok. While at there, move diff. from config.txt to diff-config.txt where all other diff config keys are. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-08-21 11:28:11 -07:00
Jiang Xin	dba9f13c6a	l10n: git.pot: v2.19.0 round 1 (382 new, 30 removed) Generate po/git.pot from v2.19.0-rc0 for git v2.19.0 l10n round 1. Signed-off-by: Jiang Xin <worldhello.net@gmail.com>	2018-08-21 08:28:47 +08:00
Jiang Xin	6b7719ee3f	Merge branch 'maint' of git://github.com/git-l10n/git-po * 'maint' of git://github.com/git-l10n/git-po: l10n: de.po: translate 108 new messages l10n: zh_CN: review for git 2.18.0 l10n: sv.po: Update Swedish translation(3608t0f0u)	2018-08-21 08:22:04 +08:00
Derrick Stolee	6a22d52126	pack-objects: consider packs in multi-pack-index When running 'git pack-objects --local', we want to avoid packing objects that are in an alternate. Currently, we check for these objects using the packed_git_mru list, which excludes the pack-files covered by a multi-pack-index. Add a new iteration over the multi-pack-indexes to find these copies and mark them as unwanted. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-08-20 15:31:40 -07:00
Derrick Stolee	e9ab2ed7de	midx: test a few commands that use get_all_packs The new get_all_packs() method exposed the packfiles coverede by a multi-pack-index. Before, the 'git cat-file --batch' and 'git count-objects' commands would skip objects in an environment with a multi-pack-index. Further, a reachability bitmap would be ignored if its pack-file was covered by a multi-pack-index. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-08-20 15:31:40 -07:00
Derrick Stolee	454ea2e4d7	treewide: use get_all_packs There are many places in the codebase that want to iterate over all packfiles known to Git. The purposes are wide-ranging, and those that can take advantage of the multi-pack-index already do. So, use get_all_packs() instead of get_packed_git() to be sure we are iterating over all packfiles. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-08-20 15:31:40 -07:00
Derrick Stolee	0bff5269d3	packfile: add all_packs list If a repo contains a multi-pack-index, then the packed_git list does not contain the packfiles that are covered by the multi-pack-index. This is important for doing object lookups, abbreviations, and approximating object count. However, there are many operations that really want to iterate over all packfiles. Create a new 'all_packs' linked list that contains this list, starting with the packfiles in the multi-pack-index and then continuing along the packed_git linked list. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-08-20 15:31:40 -07:00
Derrick Stolee	29e2016b8f	midx: fix bug that skips midx with alternates The logic for constructing the linked list of multi-pack-indexes in the object store is incorrect. If the local object store has a multi-pack-index, but an alternate does not, then the list is dropped. Add tests that would have revealed this bug. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-08-20 15:31:39 -07:00
Derrick Stolee	fe86c3beb5	midx: stop reporting garbage When prepare_packed_git is called with the report_garbage method initialized, we report unexpected files in the objects directory as garbage. Stop reporting the multi-pack-index and the pack-files it covers as garbage. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-08-20 15:31:39 -07:00

1 2 3 4 5 ...

52977 Commits