git-commit-vandalism

Author	SHA1	Message	Date
Ævar Arnfjörð Bjarmason	15c9649730	grep/log: remove hidden --debug and --grep-debug options Remove the hidden "grep --debug" and "log --grep-debug" options added in `17bf35a3c7` (grep: teach --debug option to dump the parse tree, 2012-09-13). At the time these options seem to have been intended to go along with a documentation discussion and to help the author of relevant tests to perform ad-hoc debugging on them[1]. Reasons to want this gone: 1. They were never documented, and the only (rather trivial) use of them in our own codebase for testing is something I removed back in `e01b4dab01` (grep: change non-ASCII -i test to stop using --debug, 2017-05-20). 2. Googling around doesn't show any in-the-wild uses I could dig up, and on the Git ML the only mentions after the original discussion seem to have been when they came up in unrelated diff contexts, or that test commit of mine. 3. An exception to that is `c581e4a749` (grep: under --debug, show whether PCRE JIT is enabled, 2019-08-18) where we added the ability to dump out when PCREv2 has the JIT in effect. The combination of that and my earlier `b65abcafc7` (grep: use PCRE v2 for optimized fixed-string search, 2019-07-01) means Git prints this out in its most common in-the-wild configuration: $ git log --grep-debug --grep=foo --grep=bar --grep=baz --all-match pcre2_jit_on=1 pcre2_jit_on=1 pcre2_jit_on=1 [all-match] (or pattern_body<body>foo (or pattern_body<body>bar pattern_body<body>baz ) ) $ git grep --debug $ -e foo --and -e bar $ --or -e baz pcre2_jit_on=1 pcre2_jit_on=1 pcre2_jit_on=1 (or (and patternfoo patternbar ) patternbaz ) I.e. for each pattern we're considering for the and/or/--all-match etc. debugging we'll now diligently spew out another identical line saying whether the PCREv2 JIT is on or not. I think that nobody's complained about that rather glaringly obviously bad output says something about how much this is used, i.e. it's not. 
The need for this debugging aid for the composed grep/log patterns seems to have passed, and the desire to dump the JIT config seems to have been another one-off around the time we had JIT-related issues on the PCREv2 codepath. That the original author of this debugging facility seemingly hasn't noticed the bad output since then[2] is probably some indicator. 1. https://lore.kernel.org/git/cover.1347615361.git.git@drmicha.warpmail.net/ 2. https://lore.kernel.org/git/xmqqk1b8x0ac.fsf@gitster-ct.c.googlers.com/ Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-26 11:36:20 -08:00
Junio C Hamano	e6362826a0	The fourth batch Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-25 14:19:20 -08:00
Junio C Hamano	b7bb322cba	Merge branch 'ab/mailmap-fixup' Follow-up fixes and improvements to ab/mailmap topic. * ab/mailmap-fixup: t4203: make blame output massaging more robust mailmap doc: use correct environment variable 'GIT_WORK_TREE' t4203: stop losing return codes of git commands test-lib-functions.sh: fix usage for test_commit()	2021-01-25 14:19:20 -08:00
Junio C Hamano	bcaaf972e6	Merge branch 'tb/pack-revindex-api' Abstract accesses to in-core revindex that allows enumerating objects stored in a packfile in the order they appear in the pack, in preparation for introducing an on-disk precomputed revindex. * tb/pack-revindex-api: (21 commits) for_each_object_in_pack(): clarify pack vs index ordering pack-revindex.c: avoid direct revindex access in 'offset_to_pack_pos()' pack-revindex: hide the definition of 'revindex_entry' pack-revindex: remove unused 'find_revindex_position()' pack-revindex: remove unused 'find_pack_revindex()' builtin/gc.c: guess the size of the revindex for_each_object_in_pack(): convert to new revindex API unpack_entry(): convert to new revindex API packed_object_info(): convert to new revindex API retry_bad_packed_offset(): convert to new revindex API get_delta_base_oid(): convert to new revindex API rebuild_existing_bitmaps(): convert to new revindex API try_partial_reuse(): convert to new revindex API get_size_by_pos(): convert to new revindex API show_objects_for_type(): convert to new revindex API bitmap_position_packfile(): convert to new revindex API check_object(): convert to new revindex API write_reused_pack_verbatim(): convert to new revindex API write_reused_pack_one(): convert to new revindex API write_reuse_object(): convert to new revindex API ...	2021-01-25 14:19:20 -08:00
Junio C Hamano	381dac2349	Merge branch 'ab/coc-update-to-2.0' Update the Code-of-conduct to version 2.0 from the upstream (we've been using version 1.4). * ab/coc-update-to-2.0: CoC: update to version 2.0 + local changes CoC: explicitly take any whitespace breakage CoC: Update word-wrapping to match upstream	2021-01-25 14:19:19 -08:00
Junio C Hamano	294e949fa2	Merge branch 'ps/config-env-pairs' Introduce two new ways to feed configuration variable-value pairs via environment variables, and tweak the way GIT_CONFIG_PARAMETERS encodes variable/value pairs to make it more robust. * ps/config-env-pairs: config: allow specifying config entries via envvar pairs environment: make `getenv_safe()` a public function config: store "git -c" variables using more robust format config: parse more robust format in GIT_CONFIG_PARAMETERS config: extract function to parse config pairs quote: make sq_dequote_step() a public function config: add new way to pass config via `--config-env` git: add `--super-prefix` to usage string	2021-01-25 14:19:19 -08:00
Junio C Hamano	7eefa1349b	Merge branch 'cc/write-promisor-file' A bit of code refactoring. * cc/write-promisor-file: pack-write: die on error in write_promisor_file() fetch-pack: refactor writing promisor file fetch-pack: rename helper to create_promisor_file()	2021-01-25 14:19:19 -08:00
Junio C Hamano	8b48981987	Merge branch 'jx/bundle' "git bundle" learns "--stdin" option to read its refs from the standard input. Also, it now does not lose refs when they point at the same object. * jx/bundle: bundle: arguments can be read from stdin bundle: lost objects when removing duplicate pendings test: add helper functions for git-bundle	2021-01-25 14:19:19 -08:00
Junio C Hamano	42342b3ee6	Merge branch 'ab/mailmap' Clean-up docs, codepaths and tests around mailmap. * ab/mailmap: (22 commits) shortlog: remove unused(?) "repo-abbrev" feature mailmap doc + tests: document and test for case-insensitivity mailmap tests: add tests for empty "<>" syntax mailmap tests: add tests for whitespace syntax mailmap tests: add a test for comment syntax mailmap doc + tests: add better examples & test them tests: refactor a few tests to use "test_commit --append" test-lib functions: add an --append option to test_commit test-lib functions: add --author support to test_commit test-lib functions: document arguments to test_commit test-lib functions: expand "test_commit" comment template mailmap: test for silent exiting on missing file/blob mailmap tests: get rid of overly complex blame fuzzing mailmap tests: add a test for "not a blob" error mailmap tests: remove redundant entry in test mailmap tests: improve --stdin tests mailmap tests: modernize syntax & test idioms mailmap tests: use our preferred whitespace syntax mailmap doc: start by mentioning the comment syntax check-mailmap doc: note config options ...	2021-01-25 14:19:19 -08:00
Junio C Hamano	60ecad090d	Merge branch 'ps/fetch-atomic' "git fetch" learns to treat ref updates atomically in all-or-none fashion, just like "git push" does, with the new "--atomic" option. * ps/fetch-atomic: fetch: implement support for atomic reference updates fetch: allow passing a transaction to `s_update_ref()` fetch: refactor `s_update_ref` to use common exit path fetch: use strbuf to format FETCH_HEAD updates fetch: extract writing to FETCH_HEAD	2021-01-25 14:19:19 -08:00
Junio C Hamano	b69bed22c5	Merge branch 'jk/log-cherry-pick-duplicate-patches' When more than one commit with the same patch ID appears on one side, "git log --cherry-pick A...B" did not exclude them all when a commit with the same patch ID appears on the other side. Now it does. * jk/log-cherry-pick-duplicate-patches: patch-ids: handle duplicate hashmap entries	2021-01-25 14:19:19 -08:00
Junio C Hamano	27d7c8599b	Merge branch 'js/default-branch-name-tests-final-stretch' Prepare tests not to be affected by the name of the default branch "git init" creates. * js/default-branch-name-tests-final-stretch: (28 commits) tests: drop prereq `PREPARE_FOR_MAIN_BRANCH` where no longer needed t99: adjust the references to the default branch name "main" tests(git-p4): transition to the default branch name `main` t9[5-7]: adjust the references to the default branch name "main" t9[0-4]: adjust the references to the default branch name "main" t8: adjust the references to the default branch name "main" t7[5-9]: adjust the references to the default branch name "main" t7[0-4]: adjust the references to the default branch name "main" t6[4-9]: adjust the references to the default branch name "main" t64: preemptively adjust alignment to prepare for `master` -> `main` t6[0-3]: adjust the references to the default branch name "main" t5[6-9]: adjust the references to the default branch name "main" t55[4-9]: adjust the references to the default branch name "main" t55[23]: adjust the references to the default branch name "main" t551: adjust the references to the default branch name "main" t550: adjust the references to the default branch name "main" t5503: prepare aligned comment for replacing `master` with `main` t5[0-4]: adjust the references to the default branch name "main" t5323: prepare centered comment for `master` -> `main` t4: adjust the references to the default branch name "main" ...	2021-01-25 14:19:18 -08:00
Junio C Hamano	440acfbe0c	Merge branch 'dl/reflog-with-single-entry' After expiring a reflog and making a single commit, the reflog for the branch would record a single entry that knows both @{0} and @{1}, but we failed to answer "what commit were we on?", i.e. @{1} * dl/reflog-with-single-entry: refs: allow @{n} to work with n-sized reflog refs: factor out set_read_ref_cutoffs()	2021-01-25 14:19:18 -08:00
Junio C Hamano	0806279428	Merge branch 'sj/untracked-files-in-submodule-directory-is-not-dirty' "git diff" showed a submodule working tree with untracked cruft as "Submodule commit <objectname>-dirty", but a natural expectation is that the "-dirty" indicator would align with "git describe --dirty", which does not consider having untracked files in the working tree as source of dirtiness. The inconsistency has been fixed. * sj/untracked-files-in-submodule-directory-is-not-dirty: diff: do not show submodule with untracked files as "-dirty"	2021-01-25 14:19:18 -08:00
Junio C Hamano	dfcd905069	Merge branch 'jc/deprecate-pack-redundant' Warn loudly when the "pack-redundant" command, which has been left stale with almost unusable performance issues, gets used, as we no longer want to recommend its use (just use "repack -d" instead). * jc/deprecate-pack-redundant: pack-redundant: gauge the usage before proposing its removal	2021-01-25 14:19:18 -08:00
Junio C Hamano	c7b1aaf6d6	Merge branch 'jk/forbid-lf-in-git-url' Newline characters in the host and path part of git:// URL are now forbidden. * jk/forbid-lf-in-git-url: fsck: reject .gitmodules git:// urls with newlines git_connect_git(): forbid newlines in host and path	2021-01-25 14:19:17 -08:00
Junio C Hamano	9e409d7e07	Merge branch 'ab/branch-sort' The implementation of "git branch --sort" wrt the detached HEAD display has always been hacky, which has been cleaned up. * ab/branch-sort: branch: show "HEAD detached" first under reverse sort branch: sort detached HEAD based on a flag ref-filter: move ref_sorting flags to a bitfield ref-filter: move "cmp_fn" assignment into "else if" arm ref-filter: add braces to if/else if/else chain branch tests: add to --sort tests branch: change "--local" to "--list" in comment	2021-01-25 14:19:17 -08:00
Junio C Hamano	a5ac31b5b1	Merge branch 'en/diffcore-rename' File-level rename detection updates. * en/diffcore-rename: diffcore-rename: remove unnecessary duplicate entry checks diffcore-rename: accelerate rename_dst setup diffcore-rename: simplify and accelerate register_rename_src() t4058: explore duplicate tree entry handling in a bit more detail t4058: add more tests and documentation for duplicate tree entry handling diffcore-rename: reduce jumpiness in progress counters diffcore-rename: simplify limit check diffcore-rename: avoid usage of global in too_many_rename_candidates() diffcore-rename: rename num_create to num_destinations	2021-01-25 14:19:17 -08:00
Junio C Hamano	58e2ce9112	Merge branch 'ma/more-opaque-lock-file' Code clean-up. * ma/more-opaque-lock-file: read-cache: try not to peek into `struct {lock_,temp}file` refs/files-backend: don't peek into `struct lock_file` midx: don't peek into `struct lock_file` commit-graph: don't peek into `struct lock_file` builtin/gc: don't peek into `struct lock_file`	2021-01-25 14:19:17 -08:00
Junio C Hamano	2856089e36	Merge branch 'en/merge-ort-3' Rename detection is added to the "ORT" merge strategy. * en/merge-ort-3: merge-ort: add implementation of type-changed rename handling merge-ort: add implementation of normal rename handling merge-ort: add implementation of rename collisions merge-ort: add implementation of rename/delete conflicts merge-ort: add implementation of both sides renaming differently merge-ort: add implementation of both sides renaming identically merge-ort: add basic outline for process_renames() merge-ort: implement compare_pairs() and collect_renames() merge-ort: implement detect_regular_renames() merge-ort: add initial outline for basic rename detection merge-ort: add basic data structures for handling renames	2021-01-25 14:19:17 -08:00
Junio C Hamano	c7d6d419b0	Merge branch 'ab/mktag' "git mktag" validates its input using its own rules before writing a tag object---it has been updated to share the logic with "git fsck". * ab/mktag: (23 commits) mktag: add a --[no-]strict option mktag: mark strings for translation mktag: convert to parse-options mktag: allow omitting the header/body \n separator mktag: allow turning off fsck.extraHeaderEntry fsck: make fsck_config() re-usable mktag: use fsck instead of custom verify_tag() mktag: use puts(str) instead of printf("%s\n", str) mktag: remove redundant braces in one-line body "if" mktag: use default strbuf_read() hint mktag tests: test verify_object() with replaced objects mktag tests: improve verify_object() test coverage mktag tests: test "hash-object" compatibility mktag tests: stress test whitespace handling mktag tests: run "fsck" after creating "mytag" mktag tests: don't create "mytag" twice mktag tests: don't redirect stderr to a file needlessly mktag tests: remove needless SHA-1 hardcoding mktag tests: use "test_commit" helper mktag tests: don't needlessly use a subshell ...	2021-01-25 14:19:17 -08:00
Ævar Arnfjörð Bjarmason	95ca1f987e	grep/pcre2: better support invalid UTF-8 haystacks Improve the support for invalid UTF-8 haystacks given a non-ASCII needle when using the PCREv2 backend. This is a more complete fix for a bug I started to fix in `870eea8166` (grep: do not enter PCRE2_UTF mode on fixed matching, 2019-07-26), now that PCREv2 has the PCRE2_MATCH_INVALID_UTF mode we can make use of it. This fixes the sort of case described in `8a5999838e` (grep: stess test PCRE v2 on invalid UTF-8 data, 2019-07-26), i.e.: - The subject string is non-ASCII (e.g. "ævar") - We're under a is_utf8_locale(), e.g. "en_US.UTF-8", not "C" - We are using --ignore-case, or we're a non-fixed pattern If those conditions were satisfied and we found non-valid UTF-8 data, PCREv2 might bark on it; in practice this only happened under the JIT backend (turned on by default on most platforms). Ultimately this fixes a "regression" in `b65abcafc7` ("grep: use PCRE v2 for optimized fixed-string search", 2019-07-01), I'm putting that in scare-quotes because before then we wouldn't properly support these complex case-folding, locale etc. cases either, it just broke in different ways. There was a related bug, fixed in PCREv2 10.36, that can be worked around by setting the PCRE2_NO_START_OPTIMIZE flag. Let's do that in those cases, and add tests for the bug. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-24 16:09:17 -08:00
Ævar Arnfjörð Bjarmason	a4fea08b6e	grep/pcre2 tests: don't rely on invalid UTF-8 data test As noted in [1] when I originally added this test in [2] the test was completely broken as it lacked a redirect[3]. I now think this whole thing is overly fragile. Let's only test if we have a segfault here. Before this the first test's "test_cmp" was pretty meaningless. We were only testing if PCREv2 was so broken that it would spew out something completely unrelated on stdout, which isn't very plausible. In the second test we're relying on PCREv2 forever holding to the current behavior of the PCRE_UTF8 flag, as opposed to learning some optimistic graceful fallback to PCRE2_MATCH_INVALID_UTF in the future. If that happens having this test broken under bisecting would suck. A follow-up commit will actually test this case in a meaningful way under the PCRE2_MATCH_INVALID_UTF flag. Let's run this one unconditionally, and just make sure we don't segfault. 1. `e714b898c6` (t7812: expect failure for grep -i with invalid UTF-8 data, 2019-11-29) 2. `8a5999838e` (grep: stess test PCRE v2 on invalid UTF-8 data, 2019-07-26) 3. `c74b3cbb83` (t7812: add missing redirects, 2019-11-26) Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-24 16:09:15 -08:00
Elijah Newren	557ac0350d	merge-ort: begin performance work; instrument with trace2_region_* calls Add some timing instrumentation for both merge-ort and diffcore-rename; I used these to measure and optimize performance in both, and several future patch series will build on these to reduce the timings of some select testcases. === Setup === The primary testcase I used involved rebasing a random topic in the linux kernel (consisting of 35 patches) against an older version. I added two variants, one where I rename a toplevel directory, and another where I only rebase one patch instead of the whole topic. The setup is as follows: $ git clone git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git $ git branch hwmon-updates fd8bdb23b91876ac1e624337bb88dc1dcc21d67e $ git branch hwmon-just-one fd8bdb23b91876ac1e624337bb88dc1dcc21d67e~34 $ git branch base 4703d9119972bf586d2cca76ec6438f819ffa30e $ git switch -c 5.4-renames v5.4 $ git mv drivers pilots # Introduce over 26,000 renames $ git commit -m "Rename drivers/ to pilots/" $ git config merge.renameLimit 30000 $ git config merge.directoryRenames true === Testcases === Now with REBASE standing for either "git rebase [--merge]" (using merge-recursive) or "test-tool fast-rebase" (using merge-ort), the testcases are: Testcase #1: no-renames $ git checkout v5.4^0 $ REBASE --onto HEAD base hwmon-updates Note: technically the name is misleading; there are some renames, but very few. Rename detection only takes about half the overall time. 
Testcase #2: mega-renames $ git checkout 5.4-renames^0 $ REBASE --onto HEAD base hwmon-updates Testcase #3: just-one-mega $ git checkout 5.4-renames^0 $ REBASE --onto HEAD base hwmon-just-one === Timing results === Overall timings, using hyperfine (1 warmup run, 3 runs for mega-renames, 10 runs for the other two cases): merge-recursive merge-ort no-renames: 18.912 s ± 0.174 s 14.263 s ± 0.053 s mega-renames: 5964.031 s ± 10.459 s 5504.231 s ± 5.150 s just-one-mega: 149.583 s ± 0.751 s 158.534 s ± 0.498 s A single re-run of each with some breakdowns: --- no-renames --- merge-recursive merge-ort overall runtime: 19.302 s 14.257 s inexact rename detection: 7.603 s 7.906 s everything else: 11.699 s 6.351 s --- mega-renames --- merge-recursive merge-ort overall runtime: 5950.195 s 5499.672 s inexact rename detection: 5746.309 s 5487.120 s everything else: 203.886 s 17.552 s --- just-one-mega --- merge-recursive merge-ort overall runtime: 151.001 s 158.582 s inexact rename detection: 143.448 s 157.835 s everything else: 7.553 s 0.747 s === Timing observations === 0) Maximum speedup The "everything else" row represents the maximum speedup we could achieve if we were to somehow infinitely parallelize inexact rename detection, but leave everything else alone. The fact that this is so much smaller than the real runtime (even in the case with virtually no renames) makes it clear just how overwhelmingly large the time spent on rename detection can be. 1) no-renames 1a) merge-ort is faster than merge-recursive, which is nice. However, this still should not be considered good enough. Although the "merge" backend to rebase (merge-recursive) is sometimes faster than the "apply" backend, this is one of those cases where it is not. In fact, even merge-ort is slower. The "apply" backend can complete this testcase in 6.940 s ± 0.485 s which is about 2x faster than merge-ort and 3x faster than merge-recursive. 
One goal of the merge-ort performance work will be to make it faster than git-am on this (and similar) testcases. 2) mega-renames 2a) Obviously rename detection is a huge cost; it's where most of the time is spent. We need to cut that down. If we could somehow infinitely parallelize it and drive its time to 0, the merge-recursive time would drop to about 204s, and the merge-ort time would drop to about 17s. I think this particular stat shows I've subtly baked a couple performance improvements into merge-ort and into fast-rebase already. 3) just-one-mega 3a) not much to say here, it just gives some flavor for how rebasing only one patch compares to rebasing 35. === Goals === This patch is obviously just the beginning. Here are some of my goals that this measurement will help us achieve: * Drive the cost of rename detection down considerably for merges * After the above has been achieved, see if there are other slowness factors (which would have previously been overshadowed by rename detection costs) which we can then focus on and also optimize. * Ensure our rebase testcase that requires little rename detection is noticeably faster with merge-ort than with apply-based rebase. Signed-off-by: Elijah Newren <newren@gmail.com> Acked-by: Taylor Blau <ttaylorr@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-23 23:30:06 -08:00
Elijah Newren	5ced7c3da0	merge-ort: ignore the directory rename split conflict for now get_provisional_directory_renames() has code to detect directories being evenly split between different locations. However, as noted previously, if there are no new files added to that directory that was split evenly, our inability to determine where the directory was renamed to doesn't matter since there are no new files to try to move into the new location. Unfortunately, that code is unaware of whether there are new files under the directory in question and we just ignore that, causing us to fail t6423 test 2b but pass test 2a; turn off the error for now, swapping which tests pass and fail. The motivating reason for switching this off as a temporary measure is that as we add optimizations, we'll start looking at only subsets of renames, and subsets of renames can start switching the result we get when this error is (wrongly) on. Once we get enough optimizations, however, we can prevent that code from even running when there are no new files added to the relevant directory, at which point we can revert this commit and then both testcases 2a and 2b will pass simultaneously. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-23 23:30:06 -08:00
Elijah Newren	cf8937acde	merge-ort: fix massive leak When a series of merges was performed (such as for a rebase or series of cherry-picks), only the data structures allocated by the final merge operation were being freed. The problem was that while picking out pieces of merge-ort to upstream, I previously misread a certain section of merge_start() and assumed it was associated with a later optimization. Include that section now, which ensures that if there was a previous merge operation, that we clear out result->priv and then re-use it for opt->priv, and otherwise we allocate opt->priv. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-23 23:30:06 -08:00
Ævar Arnfjörð Bjarmason	7599730b7e	Remove support for v1 of the PCRE library Remove support for using version 1 of the PCRE library. Its use has been discouraged by upstream for a long time, and it's in a bugfix-only state. Anyone who was relying on v1 in particular got a nudge to move to v2 in `e6c531b808` (Makefile: make USE_LIBPCRE=YesPlease mean v2, not v1, 2018-03-11), which was first released as part of v2.18.0. With this the LIBPCRE2 test prerequisite is redundant to PCRE. But I'm keeping it for self-documentation purposes, and to avoid conflict with other in-flight PCRE patches. I'm also not changing all of our own "pcre2" names to "pcre", i.e. the inverse of `6d4b5747f0` (grep: change internal pcre variable & function names to be pcre1, 2017-05-25). I don't see the point, and it makes the history/blame harder to read. Maybe if there's ever a PCRE v3... Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-23 21:15:43 -08:00
Ævar Arnfjörð Bjarmason	0205bb13d0	config.mak.uname: remove redundant NO_LIBPCRE1_JIT flag Remove a flag added in my `fb95e2e38d` (grep: un-break building with PCRE >= 8.32 without --enable-jit, 2017-06-01). It's set just below USE_LIBPCRE=YesPlease, so it's been redundant since `e6c531b808` (Makefile: make USE_LIBPCRE=YesPlease mean v2, not v1, 2018-03-11). Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-23 21:15:12 -08:00
Derrick Stolee	19a0acc83e	t1092: test interesting sparse-checkout scenarios These also document some behaviors that differ from a full checkout, and possibly in a way that is not intended. The test is designed to be run with "--run=1,X" where 'X' is an interesting test case. Each test uses 'init_repos' to reset the full and sparse copies of the initial-repo that is created by the first test case. This also makes it possible to have test cases leave the working directory or index in unusual states without disturbing later cases. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-23 17:14:20 -08:00
Derrick Stolee	3b14436364	test-lib: test_region looks for trace2 regions From ff15d509b89edd4830d85d53cea3079a6b0c1c08 Mon Sep 17 00:00:00 2001 From: Derrick Stolee <dstolee@microsoft.com> Date: Mon, 11 Jan 2021 08:53:09 -0500 Subject: [PATCH 8/9] test-lib: test_region looks for trace2 regions Most test cases can verify Git's behavior using input/output expectations or changes to the .git directory. However, sometimes we want to check that Git did or did not run a certain section of code. This is particularly important for performance-only features that we want to ensure have been enabled in certain cases. Add a new 'test_region' function that checks if a trace2 region was entered and left in a given trace2 event log. There is one existing test (t0500-progress-display.sh) that performs this check already, so use the helper function instead. Note that this changes the expectations slightly. The old test (incorrectly) used two patterns for the 'grep' invocation, but this performs an OR of the patterns, not an AND. This means that as long as one region_enter event was logged, the test would succeed, even if it was not due to the progress category. More uses will be added in a later change. t6423-merge-rename-directories.sh also greps for region_enter lines, but it verifies the number of such lines, which is not the same as an existence check. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-23 17:14:18 -08:00
Derrick Stolee	dd23022acb	sparse-checkout: load sparse-checkout patterns A future feature will want to load the sparse-checkout patterns into a pattern_list, but the current mechanism to do so is a bit complicated. This is made difficult due to needing to find the sparse-checkout file in different ways throughout the codebase. The logic implemented in the new get_sparse_checkout_patterns() was duplicated in populate_from_existing_patterns() in unpack-trees.c. Use the new method instead, keeping the logic around handling the struct unpack_trees_options. The callers to get_sparse_checkout_filename() in builtin/sparse-checkout.c manipulate the sparse-checkout file directly, so it is not appropriate to replace logic in that file with get_sparse_checkout_patterns(). Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-23 17:14:07 -08:00
Derrick Stolee	6a9372f4ef	name-hash: use trace2 regions for init The lazy_init_name_hash() populates a hashset with all filenames and another with all directories represented in the index. This is run only if we need to use the hashsets to check for existence or case-folding renames. Place trace2 regions where there is already a performance trace. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-23 17:14:07 -08:00
Derrick Stolee	1fd9ae517c	repository: add repo reference to index_state It will be helpful to add behavior to index operations that might trigger an object lookup. Since each index belongs to a specific repository, add a 'repo' pointer to struct index_state that allows access to this repository. Add a BUG() statement if the repo already has an index, and the index already has a repo, but somehow the index points to a different repo. This will prevent future changes from needing to pass an additional 'struct repository repo' parameter and instead rely only on the 'struct index_state istate' parameter. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-23 17:14:07 -08:00
Derrick Stolee	cae70acf24	fsmonitor: de-duplicate BUG()s around dirty bits The index has an fsmonitor_dirty bitmap that records which index entries are "dirty" based on the response from the FSMonitor. If this bitmap ever grows larger than the index, then there was an error in how it was constructed, and it was probably a developer's bug. There are several BUG() statements that are very similar, so replace these uses with a simpler assert_index_minimum(). Since there is one caller that uses a custom 'pos' value instead of the bit_size member, we cannot simplify it too much. However, the error string is identical in each, so this simplifies things. Be sure to add one when checking if a position is valid, since the minimum is a bound on the expected size. The end result is that the code is simpler to read while also preserving these assertions for developers in the FSMonitor space. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-23 17:14:07 -08:00
Derrick Stolee	c80dd3967f	cache-tree: extract subtree_pos() This method will be helpful to use outside of cache-tree.c in a later feature. The implementation is subtle due to subtree_name_cmp() sorting by length and then lexicographically. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-23 17:14:07 -08:00
Derrick Stolee	8d87e338e1	cache-tree: simplify verify_cache() prototype The verify_cache() method takes an array of cache entries and a count, but these are always provided directly from a struct index_state. Use a pointer to the full structure instead. There is a subtle point when istate->cache_nr is zero that subtracting one will underflow. This triggers a failure in t0000-basic.sh, among others. Use "i + 1 < istate->cache_nr" to avoid these strange comparisons. Convert i to be unsigned as well, which also removes the potential signed overflow in the unlikely case that cache_nr is over 2.1 billion entries. The 'funny' variable has a maximum value of 11, so making it unsigned does not change anything of importance. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-23 17:14:07 -08:00
Derrick Stolee	fb0882648e	cache-tree: clean up cache_tree_update() Make the method safer by allocating a cache_tree member for the given index_state if it is not already present. This is preferable to a BUG() statement or returning with an error because future callers will want to populate an empty cache-tree using this method. Callers can also remove their conditional allocations of cache_tree. Also drop local variables that can be found directly from the 'istate' parameter. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-23 17:14:07 -08:00
ZheNing Hu	93a7d9835f	ls-files.c: add --deduplicate option During a merge conflict, the name of a file may appear multiple times in "git ls-files" output, once for each stage. If you use both `--deleted` and `--modified` at the same time, the output may mention a deleted file twice. When none of the '-t', '-u', or '-s' options is in use, these duplicate entries do not add much value to the output. Introduce a new '--deduplicate' option to suppress them. Signed-off-by: ZheNing Hu <adlternative@gmail.com> [jc: extended doc and rewritten commit log] Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-23 11:48:20 -08:00
ZheNing Hu	ed644d1666	ls_files.c: consolidate two for loops into one This will make it easier to show only one entry per filename in the next step. Signed-off-by: ZheNing Hu <adlternative@gmail.com> [jc: corrected the log message] Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-23 11:48:20 -08:00
ZheNing Hu	f1c462ea41	ls_files.c: bugfix for --deleted and --modified This situation may occur in the original code: lstat() failed but we use `&st` to feed ie_modified() later. Therefore, we can directly execute show_ce without the judgment of ie_modified() when lstat() has failed. Signed-off-by: ZheNing Hu <adlternative@gmail.com> [jc: fixed misindented code] Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-23 11:48:11 -08:00
Taylor Blau	b3970c702c	ls-refs.c: traverse prefixes of disjoint "ref-prefix" sets ls-refs performs a single revision walk over the whole ref namespace, and sends ones that match with one of the given ref prefixes down to the user. This can be expensive if there are many refs overall, but the portion of them covered by the given prefixes is small by comparison. To attempt to reduce the difference between the number of refs traversed, and the number of refs sent, only traverse references which are in the longest common prefix of the given prefixes. This is very reminiscent of the approach taken in `b31e2680c4` (ref-filter.c: find disjoint pattern prefixes, 2019-06-26) which does an analogous thing for multi-patterned 'git for-each-ref' invocations. The callback 'send_ref' is resilient to ignore extra patterns by discarding any arguments which do not begin with at least one of the specified prefixes. Similarly, the code introduced in `b31e2680c4` is resilient to stop early at metacharacters, but we only pass strict prefixes here. At worst we would return too many results, but the double checking done by send_ref will throw away anything that doesn't start with something in the prefix list. Finally, if no prefixes were provided, then implicitly add the empty string (which will match all references) since this matches the existing behavior (see the "no restrictions" comment in "ls-refs.c:ref_match()"). Original-patch-by: Jacob Vosmaer <jacob@gitlab.com> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-22 18:57:27 -08:00
Jacob Vosmaer	83befd3724	ls-refs.c: initialize 'prefixes' before using it Correctly initialize the "prefixes" strvec using strvec_init() instead of simply zeroing it via the earlier memset(). There's no way to trigger a crash, since the first 'ref-prefix' command will initialize the strvec via the 'ALLOC_GROW' in 'strvec_push_nodup()' (the alloc and nr variables are already zero'd, so the call to ALLOC_GROW is valid). If no "ref-prefix" command was given, then the call to 'ls-refs.c:ref_match()' will abort early after it reads the zero in 'prefixes->nr'. Likewise, strvec_clear() will only call free() on the array, which is NULL, so we're safe there, too. But, all of this is dangerous and requires more reasoning than it would if we simply called 'strvec_init()', so do that. Signed-off-by: Jacob Vosmaer <jacob@gitlab.com> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-22 18:57:27 -08:00
Taylor Blau	16b1985be5	refs: expose 'for_each_fullref_in_prefixes' This function was used in the ref-filter.c code to find the longest common prefix among a set of refspecs, and then to iterate all of the references that descend from that prefix. A future patch will want to use that same code from ls-refs.c, so prepare by exposing and moving it to refs.c. Since there is nothing specific to the ref-filter code here (other than that it was previously the only caller of this function), this really belongs in the more generic refs.h header. The code moved in this patch is identical before and after, with the one exception of renaming some arguments to be consistent with other functions exposed in refs.h. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-22 18:57:27 -08:00
Jacob Vosmaer	be18153b97	builtin/pack-objects.c: avoid iterating all refs In git-pack-objects, we iterate over all the tags if the --include-tag option is passed on the command line. For some reason this uses for_each_ref which is expensive if the repo has many refs. We should use for_each_tag_ref instead. Because the add_ref_tag callback will now only visit tags we simplified it a bit. The motivation for this change is that we observed performance issues with a repository on gitlab.com that has 500,000 refs but only 2,000 tags. The fetch traffic on that repo is dominated by CI, and when we changed CI to fetch with 'git fetch --no-tags' we saw a dramatic change in the CPU profile of git-pack-objects. This led us to this particular ref walk. More details in: https://gitlab.com/gitlab-com/gl-infra/scalability/-/issues/746#note_483546598 Signed-off-by: Jacob Vosmaer <jacob@gitlab.com> Reviewed-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-22 17:27:42 -08:00
Jeff King	ee4e22554f	run-command: document use_shell option It's unclear how run-command's use_shell option should impact the arguments fed to a command. Plausibly it could mean that we glue all of the arguments together into a string to pass to the shell, in which case that opens the question of whether the caller needs to quote them. But in fact we don't implement it that way (and even if we did, we'd probably auto-quote the arguments as part of the glue step). And we must not receive quoted arguments, because we might actually optimize out the shell entirely (i.e., the caller does not even know if a shell will be involved in the end or not). Since this ambiguity may have been the cause of a recent bug, let's document the option a bit. Signed-off-by: Jeff King <peff@peff.net> Reviewed-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-22 14:21:32 -08:00
Phil Hord	8198907795	use delete_refs when deleting tags or branches 'git tag -d' accepts one or more tag refs to delete, but each deletion is done by calling `delete_ref` on each argv. This is very slow when removing from packed refs. Use delete_refs instead so all the removals can be done inside a single transaction with a single update. Do the same for 'git branch -d'. Since delete_refs performs all the packed-refs delete operations inside a single transaction, if any of the deletes fail then all of them will be skipped. In practice, none of them should fail since we verify the hash of each one before calling delete_refs, but some network error or odd permissions problem could have different results after this change. Also, since the file-backed deletions are not performed in the same transaction, those could succeed even when the packed-refs transaction fails. After deleting branches, remove the branch config only if the branch ref was removed and was not subsequently added back in. A manual test deleting 24,000 tags took about 30 minutes using delete_ref. It takes about 5 seconds using delete_refs. Acked-by: Elijah Newren <newren@gmail.com> Signed-off-by: Phil Hord <phil.hord@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-21 16:05:05 -08:00
Jeff King	36a317929b	refs: switch peel_ref() to peel_iterated_oid() The peel_ref() interface is confusing and error-prone: - it's typically used by ref iteration callbacks that have both a refname and oid. But since they pass only the refname, we may load the ref value from the filesystem again. This is inefficient, but also means we are open to a race if somebody simultaneously updates the ref. E.g., this: int some_ref_cb(const char *refname, const struct object_id *oid, ...) { if (!peel_ref(refname, &peeled)) printf("%s peels to %s", oid_to_hex(oid), oid_to_hex(&peeled)); } could print nonsense. It is correct to say "refname peels to..." (you may see the "before" value or the "after" value, either of which is consistent), but mentioning both oids may be mixing before/after values. Worse, whether this is possible depends on whether the optimization to read from the current iterator value kicks in. So it is actually not possible with: for_each_ref(some_ref_cb); but it _is_ possible with: head_ref(some_ref_cb); which does not use the iterator mechanism (though in practice, HEAD should never peel to anything, so this may not be triggerable). - it must take a fully-qualified refname for the read_ref_full() code path to work. Yet we routinely pass it partial refnames from callbacks to for_each_tag_ref(), etc. This happens to work when iterating because there we do not call read_ref_full() at all, and only use the passed refname to check if it is the same as the iterator. But the requirements for the function parameters are quite unclear. Instead of taking a refname, let's instead take an oid. That fixes both problems. It's a little funny for a "ref" function not to involve refs at all. The key thing is that it's optimizing under the hood based on having access to the ref iterator. So let's change the name to make it clear why you'd want this function versus just peel_object(). 
There are two other directions I considered but rejected: - we could pass the peel information into the each_ref_fn callback. However, we don't know if the caller actually wants it or not. For packed-refs, providing it is essentially free. But for loose refs, we actually have to peel the object, which would be wasteful in most cases. We could likewise pass in a flag to the callback indicating whether the peeled information is known, but that complicates those callbacks, as they then have to decide whether to manually peel themselves. Plus it requires changing the interface of every callback, whether they care about peeling or not, and there are many of them. - we could make a function to return the peeled value of the current iterated ref (computing it if necessary), and BUG() otherwise. I.e.: int peel_current_iterated_ref(struct object_id *out); Each of the current callers is an each_ref_fn callback, so they'd mostly be happy. But: - we use those callbacks with functions like head_ref(), which do not use the iteration code. So we'd need to handle the fallback case there, anyway. - it's possible that a caller would want to call into generic code that sometimes is used during iteration and sometimes not. This encapsulates the logic to do the fast thing when possible, and fallback when necessary. The implementation is mostly obvious, but I want to call out a few things in the patch: - the test-tool coverage for peel_ref() is now meaningless, as it all collapses to a single peel_object() call (arguably they were pretty uninteresting before; the tricky part of that function is the fast-path we see during iteration, but these calls didn't trigger that). I've just dropped it entirely, though note that some other tests relied on the tags we created; I've moved that creation to the tests where it matters. - we no longer need to take a ref_store parameter, since we'd never look up a ref now. 
We do still rely on a global "current iterator" variable which _could_ be kept per-ref-store. But in practice this is only useful if there are multiple recursive iterations, at which point the more appropriate solution is probably a stack of iterators. No caller used the actual ref-store parameter anyway (they all call the wrapper that passes the_repository). - the original only kicked in the optimization when the "refname" pointer matched (i.e., not string comparison). We do likewise with the "oid" parameter here, but fall back to doing an actual oideq() call. This in theory lets us kick in the optimization more often, though in practice no current caller cares. It should never be wrong, though (peeling is a property of an object, so two refs pointing to the same object would peel identically). - the original took care not to touch the peeled out-parameter unless we found something to put in it. But no caller cares about this, and anyway, it is enforced by peel_object() itself (and even in the optimized iterator case, that's where we eventually end up). We can shorten the code and avoid an extra copy by just passing the out-parameter through the stack. Signed-off-by: Jeff King <peff@peff.net> Reviewed-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-21 15:51:31 -08:00
Ævar Arnfjörð Bjarmason	73c01d25fe	tests: remove uses of GIT_TEST_GETTEXT_POISON=false As noted in previous commits we are removing the use of GIT_TEST_GETTEXT_POISON=false. These tests all relied on the facility being off, it always is off after an earlier change, but we hadn't removed the redundant assignments to "false" in the tests. I'm preserving the deletion of "error" lines in `38b9197a76` (t5411: add basic test cases for proc-receive hook, 2020-08-27), it turns out that's useful even without GIT_TEST_GETTEXT_POISON=true in play. Update a comment added in that commit to note that. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-21 15:50:03 -08:00
Ævar Arnfjörð Bjarmason	d162b25f95	tests: remove support for GIT_TEST_GETTEXT_POISON This removes the ability to inject "poison" gettext() messages via the GIT_TEST_GETTEXT_POISON special test setup. I initially added this as a compile-time option in `bb946bba76` (i18n: add GETTEXT_POISON to simulate unfriendly translator, 2011-02-22), and most recently modified to be toggleable at runtime in `6cdccfce1e` (i18n: make GETTEXT_POISON a runtime option, 2018-11-08). The reason for its removal is that the trade-off of maintaining it vs. what it's getting us has long since flipped. When gettext was integrated in `5e9637c629` (i18n: add infrastructure for translating Git with gettext, 2011-11-18) there was understandable concern on the Git ML that in marking messages for translation en-masse we'd inadvertently mark plumbing messages. The GETTEXT_POISON facility was a way to smoke those out via our test suite. Nowadays however we're done (or almost entirely done) with any marking of messages for translation. New messages are usually marked by their authors, who'll know whether it makes sense to translate them or not. If not, any errors in marking the messages are much more likely to be spotted in review than in the initial deluge of i18n patches in the 2011-2012 era. So let's just remove this. This leaves the test suite in a state where we still have a lot of test_i18n, C_LOCALE_OUTPUT etc. uses. Subsequent commits will remove those too. The change to t/lib-rebase.sh is a selective revert of the relevant part of `f2d17068fd` (i18n: rebase-interactive: mark comments of squash for translation, 2016-06-17), and the comment in t/t3406-rebase-message.sh is from `c7108bf9ed` (i18n: rebase: mark messages for translation, 2012-07-25). Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-21 15:50:01 -08:00
Ævar Arnfjörð Bjarmason	6c280b4142	ci: remove GETTEXT_POISON jobs A subsequent commit will remove GETTEXT_POISON entirely, let's start by removing the CI jobs that enable the option. We cannot just remove the job because the CI is implicitly depending on the "poison" job being a sort of "default" job in the sense that it's the job that was otherwise run with the default compiler, no other GIT_TEST_* options etc. So let's keep it under the name "linux-gcc-default". This means we can remove the initial "make test" from the "linux-gcc" job (it does another one after setting a bunch of GIT_TEST_* variables). I'm not doing that because it would conflict with the in-flight `334afbc76f` (tests: mark tests relying on the current default for `init.defaultBranch`, 2020-11-18) (currently on the "seen" branch, so the SHA-1 will almost definitely change). It's going to use that "make test" again for different reasons, so let's preserve it for now. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-21 15:50:00 -08:00

62019 Commits