git-commit-vandalism

Author	SHA1	Message	Date
Junio C Hamano	daab8a564f	The fifth batch Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-07-16 17:42:53 -07:00
Junio C Hamano	8e62a85352	Merge branch 'ds/gender-neutral-doc' Update the documentation not to assume users are of a certain gender, and add guidelines to that effect. * ds/gender-neutral-doc: *: fix typos comments: avoid using the gender of our users doc: avoid using the gender of other people	2021-07-16 17:42:53 -07:00
Junio C Hamano	8721e2eaed	Merge branch 'jt/partial-clone-submodule-1' Prepare the internals for lazily fetching objects in submodules from their promisor remotes. * jt/partial-clone-submodule-1: promisor-remote: teach lazy-fetch in any repo run-command: refactor subprocess env preparation submodule: refrain from filtering GIT_CONFIG_COUNT promisor-remote: support per-repository config repository: move global r_f_p_c to repo struct	2021-07-16 17:42:53 -07:00
Junio C Hamano	bd4232fac3	Merge branch 'ab/struct-init' Code cleanup around struct_type_init() functions. * ab/struct-init: string-list.h users: change to use _{nodup,dup}() string-list.[ch]: add a string_list_init_{nodup,dup}() dir.[ch]: replace dir_init() with DIR_INIT .c _init(): define in terms of corresponding _INIT macro .h: move some _INIT to designated initializers	2021-07-16 17:42:53 -07:00
Junio C Hamano	832a239b72	Merge branch 'dd/test-stdout-count-lines' Tiny test clean-up. * dd/test-stdout-count-lines: t6402: preserve git exit status code t6400: preserve git ls-files exit status code test-lib-functions: introduce test_stdout_line_count	2021-07-16 17:42:52 -07:00
Junio C Hamano	c4670b8a8d	Merge branch 'hn/refs-test-cleanup' Test clean-up. * hn/refs-test-cleanup: t7509: avoid direct file access for writing CHERRY_PICK_HEAD t1415: avoid direct filesystem access for writing refs	2021-07-16 17:42:52 -07:00
Junio C Hamano	a91e0bb833	Merge branch 'rs/khash-alloc-cleanup' Code clean-up. * rs/khash-alloc-cleanup: khash: clarify that allocations never fail	2021-07-16 17:42:52 -07:00
Junio C Hamano	8eb90d385c	Merge branch 'ar/help-micro-cleanup' Tiny code clean-up. * ar/help-micro-cleanup: help: convert git_cmd to page in one place	2021-07-16 17:42:51 -07:00
Junio C Hamano	f90efd9981	Merge branch 'ar/submodule-helper-include-cleanup' Code clean-up. * ar/submodule-helper-include-cleanup: submodule--helper: remove redundant include	2021-07-16 17:42:51 -07:00
Junio C Hamano	cdeabf513a	Merge branch 'ab/bundle-updates' Code clean-up and leak plugging in "git bundle". * ab/bundle-updates: bundle: remove "ref_list" in favor of string-list.c API bundle.c: use a temporary variable for OIDs and names bundle cmd: stop leaking memory from parse_options_cmd_bundle()	2021-07-16 17:42:49 -07:00
Junio C Hamano	f0ade787ac	Merge branch 'hn/refs-iterator-peel-returns-boolean' Tiny API tweak. * hn/refs-iterator-peel-returns-boolean: refs: make explicit that ref_iterator_peel returns boolean	2021-07-16 17:42:49 -07:00
Junio C Hamano	3cc43bff9c	Merge branch 'ab/mktag-tests' Fill test gaps. * ab/mktag-tests: mktag tests: test fast-export mktag tests: test for-each-ref mktag tests: test update-ref and reachable fsck mktag tests: test hash-object --literally and unreachable fsck mktag tests: invert --no-strict test mktag tests: parse out options in helper	2021-07-16 17:42:48 -07:00
Junio C Hamano	1fb3445658	Merge branch 'ab/show-branch-tests' Fill test gaps. * ab/show-branch-tests: show-branch tests: add missing tests show-branch: don't <COLOR></RESET> for space characters show-branch tests: modernize test code show-branch tests: rename the one "show-branch" test file	2021-07-16 17:42:48 -07:00
Junio C Hamano	b2fc822629	Merge branch 'ab/fetch-negotiate-segv-fix' Code recently added to support common ancestry negotiation during "git push" did not sanity check its arguments carefully enough. * ab/fetch-negotiate-segv-fix: fetch: fix segfault in --negotiate-only without --negotiation-tip=* fetch: document the --negotiate-only option send-pack.c: move "no refs in common" abort earlier	2021-07-16 17:42:48 -07:00
Junio C Hamano	368cab75c1	Merge branch 'ab/make-delete-on-error' Use ".DELETE_ON_ERROR" pseudo target to simplify our Makefile. * ab/make-delete-on-error: Makefile: add and use the ".DELETE_ON_ERROR" flag	2021-07-16 17:42:47 -07:00
Junio C Hamano	a93c6fd677	Merge branch 'ew/mmap-failures' Error message update. * ew/mmap-failures: xmmap: inform Linux users of tuning knobs on ENOMEM	2021-07-16 17:42:47 -07:00
Junio C Hamano	fba551379e	Merge branch 'js/config-mak-windows-pcre-fix' Whitespace fix. * js/config-mak-windows-pcre-fix: config.mak.uname: PCRE1 cleanup	2021-07-16 17:42:47 -07:00
Junio C Hamano	bc34e5227b	Merge branch 'js/gfw-system-config-loc-fix' Update the location of system-side configuration file on Windows. * js/gfw-system-config-loc-fix: config: normalize the path of the system gitconfig cmake(windows): set correct path to the system Git config mingw: move Git for Windows' system config where users expect it	2021-07-16 17:42:46 -07:00
Junio C Hamano	508416d95c	Merge branch 'ks/submodule-cleanup' Code cleanup. * ks/submodule-cleanup: submodule: remove unnecessary `prefix` based option logic	2021-07-16 17:42:46 -07:00
Junio C Hamano	3b57e72c0c	Merge branch 'tb/midx-use-checksum' When rebuilding the multi-pack index file reusing an existing one, we used to blindly trust the existing file and ended up carrying corrupted data into the updated file, which has been corrected. * tb/midx-use-checksum: midx: report checksum mismatches during 'verify' midx: don't reuse corrupt MIDXs when writing commit-graph: rewrite to use checksum_valid() csum-file: introduce checksum_valid()	2021-07-16 17:42:46 -07:00
Junio C Hamano	d3b88be1b4	Merge branch 'en/merge-dir-rename-corner-case-fix' The merge code had funny interactions between content based rename detection and directory rename detection. * en/merge-dir-rename-corner-case-fix: merge-recursive: handle rename-to-self case merge-ort: ensure we consult df_conflict and path_conflicts t6423: test directory renames causing rename-to-self	2021-07-16 17:42:45 -07:00
Junio C Hamano	fdbcdfcf61	Merge branch 'en/ort-perf-batch-13' Performance tweaks of "git merge -sort" around lazy fetching of objects. * en/ort-perf-batch-13: merge-ort: add prefetching for content merges diffcore-rename: use a different prefetch for basename comparisons diffcore-rename: allow different missing_object_cb functions t6421: add tests checking for excessive object downloads during merge promisor-remote: output trace2 statistics for number of objects fetched	2021-07-16 17:42:45 -07:00
Junio C Hamano	89efac81c7	Merge branch 'en/ort-perf-batch-12' More fix-ups and optimization to "merge -sort". * en/ort-perf-batch-12: merge-ort: miscellaneous touch-ups Fix various issues found in comments diffcore-rename: avoid unnecessary strdup'ing in break_idx merge-ort: replace string_list_df_name_compare with faster alternative	2021-07-16 17:42:45 -07:00
Junio C Hamano	5b1cd37e44	CodingGuidelines: recommend gender-neutral description Technical writing seeks to convey information with minimal friction. One way that a reader can experience friction is if they encounter a description of "a user" that is later simplified using a gendered pronoun. If the reader does not consider that pronoun to apply to them, then they can experience cognitive dissonance that removes focus from the information. Give some basic tips to guide us avoid unnecessary uses of gendered description. Using a gendered pronoun is appropriate when referring to a specific person. There are acceptable existing uses of gendered pronouns within the Git codebase, such as: * References to real people (e.g. Linus Torvalds, "the Git maintainer"). Do not misgender real people. If there is any doubt to the gender of a person, then avoid using pronouns. * References to fictional people with clear genders (e.g. Alice and Bob). * Sample text used in test cases (e.g t3702, t6432). * The official text of the GPL license contains uses of "he or she", but using singular "they" (or modifying the text in some other way) is not within the scope of the Git project. * Literal email messages in Documentation/howto/ should not be edited for grammatical concerns such as this, unless we update the entire document to fit the standard documentation format. If such an effort is taken on, then the authorship would change and no longer refer to the exact mail message. * External projects consumed in contrib/ should not deviate solely for style reasons. Recommended edits should be contributed to those projects directly. Other cases within the Git project were cleaned up by the previous changes. Co-authored-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-07-16 11:35:46 -07:00
Philippe Blain	ca2d62b787	parse-options: don't complete option aliases by default Since 'OPT_ALIAS' was created in `5c387428f1` (parse-options: don't emit "ambiguous option" for aliases, 2019-04-29), 'git clone --git-completion-helper', which is used by the Bash completion script to list options accepted by clone (via '__gitcomp_builtin'), lists both '--recurse-submodules' and its alias '--recursive', which was not the case before since '--recursive' had the PARSE_OPT_HIDDEN flag set, and options with this flag are skipped by 'parse-options.c::show_gitcomp', which implements 'git <cmd> --git-completion-helper'. This means that typing 'git clone --recurs<TAB>' will yield both '--recurse-submodules' and '--recursive', which is not ideal since both do the same thing, and so the completion should directly complete the canonical option. At the point where 'show_gitcomp' is called in 'parse_options_step', 'preprocess_options' was already called in 'parse_options', so any aliases are now copies of the original options with a modified help text indicating they are aliases. Helpfully, since `64cc539fd2` (parse-options: don't leak alias help messages, 2021-03-21) these copies have the PARSE_OPT_FROM_ALIAS flag set, so check that flag early in 'show_gitcomp' and do not print them, unless the user explicitly requested that all completion be shown (by setting 'GIT_COMPLETION_SHOW_ALL'). After all, if we want to encourage the use of '--recurse-submodules' over '--recursive', we'd better just suggest the former. The only other options alias is 'log' and friends' '--mailmap', which is an alias for '--use-mailmap', but the Bash completion helpers for these commands do not use '__gitcomp_builtin', and thus are unaffected by this change. Test the new behaviour in t9902-completion.sh. As a side effect, this also tests the correct behaviour of GIT_COMPLETION_SHOW_ALL, which was not tested before. 
Note that since '__gitcomp_builtin' caches the options it shows, we need to re-source the completion script to clear that cache for the second test. Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-07-16 11:31:44 -07:00
Elijah Newren	94b82d5686	rename: bump limit defaults yet again These were last bumped in commit `92c57e5c1d` (bump rename limit defaults (again), 2011-02-19), and were bumped both because processors had gotten faster, and because people were getting ugly merges that caused problems and reporting it to the mailing list (suggesting that folks were willing to spend more time waiting). Since that time: * Linus has continued recommending kernel folks to set diff.renameLimit=0 (maps to 32767, currently) * Folks with repositories with lots of renames were happy to set merge.renameLimit above 32767, once the code supported that, to get correct cherry-picks * Processors have gotten faster * It has been discovered that the timing methodology used last time probably used too large example files. The last point is probably worth explaining a bit more: * The "average" file size used appears to have been average blob size in the linux kernel history at the time (probably v2.6.25 or something close to it). * Since bigger files are modified more frequently, such a computation weights towards larger files. * Larger files may be more likely to be modified over time, but are not more likely to be renamed -- the mean and median blob size within a tree are a bit higher than the mean and median of blob sizes in the history leading up to that version for the linux kernel. * The mean blob size in v2.6.25 was half the average blob size in history leading to that point * The median blob size in v2.6.25 was about 40% of the mean blob size in v2.6.25. * Since the mean blob size is more than double the median blob size, any file as big as the mean will not be compared to any files of median size or less (because they'd be more than 50% dissimilar). * Since it is the number of files compared that provides the O(n^2) behavior, median-sized files should matter more than mean-sized ones. The combined effect of the above is that the file size used in past calculations was likely about 5x too large. 
Combine that with a CPU performance improvement of ~30%, and we can increase the limits by a factor of sqrt(5/(1-.3)) = 2.67, while keeping the original stated time limits. Keeping the same approximate time limit probably makes sense for diff.renameLimit (there is no progress feedback in e.g. git log -p), but the experience above suggests merge.renameLimit could be extended significantly. In fact, it probably would make sense to have an unlimited default setting for merge.renameLimit, but that would likely need to be coupled with changes to how progress is displayed. (See https://lore.kernel.org/git/YOx+Ok%2FEYvLqRMzJ@coredump.intra.peff.net/ for details in that area.) For now, let's just bump the approximate time limit from 10s to 1m. (Note: We do not want to use actual time limits, because getting results that depend on how loaded your system is that day feels bad, and because we don't discover that we won't get all the renames until after we've put in a lot of work rather than just upfront telling the user there are too many files involved.) Using the original time limit of 2s for diff.renameLimit, and bumping merge.renameLimit from 10s to 60s, I found the following timings using the simple script at the end of this commit message (on an AWS c5.xlarge which reports as "Intel(R) Xeon(R) Platinum 8124M CPU @ 3.00GHz"): N Timing 1300 1.995s 7100 59.973s So let's round down to nice even numbers and bump the limits from 400->1000, and from 1000->7000. Here is the measure_rename_perf script (adapted from https://lore.kernel.org/git/20080211113516.GB6344@coredump.intra.peff.net/ in particular to avoid triggering the linear handling from basename-guided rename detection): #!/bin/bash n=$1; shift rm -rf repo mkdir repo && cd repo git init -q -b main mkdata() { mkdir $1 for i in `seq 1 $2`; do (sed "s/^/$i /" <../sample echo tag: $1 ) >$1/$i done } mkdata initial $n git add . git commit -q -m initial mkdata new $n git add . 
cd new for i in *; do git mv $i $i.renamed; done cd .. git rm -q -rf initial git commit -q -m new time git diff-tree -M -l0 --summary HEAD^ HEAD Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-07-15 16:54:34 -07:00
Elijah Newren	9dd29dbef0	diffcore-rename: treat a rename_limit of 0 as unlimited In commit `89973554b5` (diffcore-rename: make diff-tree -l0 mean -l<large>, 2017-11-29), -l0 was given a special magical "large" value, but one which was not large enough for some uses (as can be seen from commit `9f7e4bfa3b` (diff: remove silent clamp of renameLimit, 2017-11-13)). Make 0 (or a negative value) be treated as unlimited instead and update the documentation to mention this. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-07-15 16:54:24 -07:00
Elijah Newren	6623a528e0	doc: clarify documentation for rename/copy limits A few places in the docs implied that rename/copy detection is always quadratic or that all (unpaired) files were involved in the quadratic portion of rename/copy detection. The following two commits each introduced an exception to this: `9027f53cb5` (Do linear-time/space rename logic for exact renames, 2007-10-25) `bd24aa2f97` (diffcore-rename: guide inexact rename detection based on basenames, 2021-02-14) (As a side note, for copy detection, the basename guided inexact rename detection is turned off and the exact renames will only result in sources (without the dests) being removed from the set of files used in quadratic detection. So, for copy detection, the documentation was closer to correct.) Avoid implying that all files involved in rename/copy detection are subject to the full quadratic algorithm. While at it, also note the default values for all these settings. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-07-15 16:54:24 -07:00
Elijah Newren	05d2c61c67	diff: correct warning message when renameLimit exceeded The warning when quadratic rename detection was skipped referred to "inexact rename detection". For years, the only linear portion of rename detection was looking for exact renames, so "inexact rename detection" was an accurate way to refer to the quadratic portion of rename detection. However, that changed with commit `bd24aa2f97` (diffcore-rename: guide inexact rename detection based on basenames, 2021-02-14). Let's instead use the term "exhaustive rename detection" to refer to the quadratic portion. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-07-15 16:54:24 -07:00
Stephen Manz	0db4961c49	worktree: teach `add` to accept --reason <string> with --lock The default reason stored in the lock file, "added with --lock", is unlikely to be what the user would have given in a separate `git worktree lock` command. Allowing `--reason` to be specified along with `--lock` when adding a working tree gives the user control over the reason for locking without needing a second command. Signed-off-by: Stephen Manz <smanz@alum.mit.edu> Reviewed-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-07-15 13:30:59 -07:00
Johannes Schindelin	a066a90db6	ci(check-whitespace): restrict to the intended commits During a run of the `check-whitespace` we want to verify that the commits introduced in the Pull Request have no whitespace issues. We only want to look at those commits, not the upstream commits (because the contributor cannot do anything about the latter). However, by using the `-<count>` form in `git log --check`, we run the risk of looking at the wrong commits. The reason is that the `actions/checkout` step does _not_ check out the tip commit of the Pull Request's branch: Instead, it checks out a merge commit that merges that branch into the target branch. For that reason, we already adjust the commit count by incrementing it, but that is not enough: if the upstream branch has newer commits, they are traversed _first_. And obviously we will then miss some of the commits that we _actually_ wanted to look at. Therefore, let's be careful to stop assuming a linear, up to date commit topology in the contributed commits, and instead specify the correct commit range. Unfortunately, this means that we no longer can rely on a shallow clone: There is no way of knowing just how many commits the upstream branch advanced after the commit from which the PR branch branched off. So let's just go with a full clone instead, and be safe rather than sorry (if we have "too shallow" a situation, a commit range `@{u}..` may very well include a shallow commit itself, and the output of `git show --check <shallow>` is _not_ pretty). Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-07-14 15:38:01 -07:00
Johannes Schindelin	cc00362125	ci(check-whitespace): stop requiring a read/write token As part of some recent security tightening, GitHub introduced the ability to configure GitHub workflows to be run with a read-only token. This is much more secure, in particular when working in a public repository: While the regular read/write token might be restricted to writing to the current branch, it is not necessarily restricted to access only the current Pull Request. However, the `check-whitespace` workflow threw a wrench into this plan: it _requires_ write access (because it wants to add a PR comment in case of a whitespace issue). Let's just skip that PR comment. The user can always click through to the actual error, even if it is slightly less convenient. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-07-14 15:37:59 -07:00
Derrick Stolee	1ba5f45132	checkout: stop expanding sparse indexes Previous changes did the necessary improvements to unpack-trees.c and diff-lib.c in order to modify a sparse index based on its comparison with a tree. The only remaining work is to remove some ensure_full_index() calls and add tests that verify that the index is not expanded in our interesting cases. Include 'switch' and 'restore' in these tests, as they share a base implementation with 'checkout'. Here are the relevant performance results from p2000-sparse-operations.sh: Test HEAD~1 HEAD -------------------------------------------------------------------------------- 2000.18: git checkout -f - (full-v3) 0.49(0.43+0.03) 0.47(0.39+0.05) -4.1% 2000.19: git checkout -f - (full-v4) 0.45(0.37+0.06) 0.42(0.37+0.05) -6.7% 2000.20: git checkout -f - (sparse-v3) 0.76(0.71+0.07) 0.04(0.03+0.04) -94.7% 2000.21: git checkout -f - (sparse-v4) 0.75(0.72+0.04) 0.05(0.06+0.04) -93.3% It is important to compare the full index case to the sparse index case, as the previous results for the sparse index were inflated by the index expansion. For index v4, this is an 88% improvement. On an internal repository with over two million paths at HEAD and a sparse-checkout definition containing ~60,000 of those paths, 'git checkout' went from 3.5s to 297ms with this change. The theoretical optimum where only those ~60,000 paths exist was 275ms, so the extra sparse directory entries contribute a 22ms overhead. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-07-14 15:05:53 -07:00
Derrick Stolee	f934f1b47f	sparse-index: recompute cache-tree When some commands run with command_requires_full_index=1, then the index can get in a state where the in-memory cache tree is actually equal to the sparse index's cache tree instead of the full one. This results in incorrect entry_count values. By clearing the cache tree before converting to sparse, we avoid this issue. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-07-14 15:05:53 -07:00
Derrick Stolee	daa1acefc5	commit: integrate with sparse-index Update 'git commit' to allow using the sparse-index in memory without expanding to a full one. The only place that had an ensure_full_index() call was in cache_tree_update(). The recursive algorithm for update_one() was already updated in `2de37c536` (cache-tree: integrate with sparse directory entries, 2021-03-03) to handle sparse directory entries in the index. Most of this change involves testing different command-line options that allow specifying which on-disk changes should be included in the commit. This includes no options (only take currently-staged changes), -a (take all tracked changes), and --include (take a list of specific changes). To simplify testing that these options do not expand the index, update the test that previously verified that 'git status' does not expand the index with a helper method, ensure_not_expanded(). This allows 'git commit' to operate much faster when the sparse-checkout cone is much smaller than the full list of files at HEAD. Here are the relevant lines from p2000-sparse-operations.sh: Test HEAD~1 HEAD ---------------------------------------------------------------------------------- 2000.14: git commit -a -m A (full-v3) 0.35(0.26+0.06) 0.36(0.28+0.07) +2.9% 2000.15: git commit -a -m A (full-v4) 0.32(0.26+0.05) 0.34(0.28+0.06) +6.3% 2000.16: git commit -a -m A (sparse-v3) 0.63(0.59+0.06) 0.04(0.05+0.05) -93.7% 2000.17: git commit -a -m A (sparse-v4) 0.64(0.59+0.08) 0.04(0.04+0.04) -93.8% It is important to compare the full-index case to the sparse-index case, so the improvement for index version v4 is actually an 88% improvement in this synthetic example. In a real repository with over two million files at HEAD and 60,000 files in the sparse-checkout definition, the time for 'git commit -a' went from 2.61 seconds to 134ms. 
I compared this to the result if the index only contained the paths in the sparse-checkout definition and found the theoretical optimum to be 120ms, so the out-of-cone paths only add a 12% overhead. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-07-14 15:05:53 -07:00
Derrick Stolee	11042ab914	p2000: compress repo names By using shorter names for the test repos, we will get a slightly more compressed performance summary without compromising clarity. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-07-14 15:05:53 -07:00
Derrick Stolee	0d53d19946	p2000: add 'git checkout -' test and decrease depth As we increase our list of commands to test in p2000-sparse-operations.sh, we will want to have a slightly smaller test repository. Reduce the size by a factor of four by reducing the depth of the step that creates a big index around a moderately-sized repository. Also add a step to run 'git checkout -' on repeat. This requires having a previous location in the reflog, so add that to the initialization steps. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-07-14 15:05:53 -07:00
Derrick Stolee	e5ca291076	t1092: document bad sparse-checkout behavior There are several situations where a repository with sparse-checkout enabled will act differently than a normal repository, and in ways that are not intentional. The test t1092-sparse-checkout-compatibility.sh documents some of these deviations, but a casual reader might think these are intentional behavior changes. Add comments on these tests that make it clear that these behaviors should be updated. Using 'NEEDSWORK' helps contributors find that these are potential areas for improvement. Helped-by: Elijah Newren <newren@gmail.com> Reviewed-by: Elijah Newren <newren@gmail.com> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-07-14 13:42:49 -07:00
Derrick Stolee	f8fe49e539	fsmonitor: integrate with sparse index If we need to expand a sparse-index into a full one, then the FS Monitor bitmap is going to be incorrect. Ensure that we start fresh at such an event. While this is currently a performance drawback, the eventual hope of the sparse-index feature is that these expansions will be rare and hence we will be able to keep the FS Monitor data accurate across multiple Git commands. These tests are added to demonstrate that the behavior is the same across a full index and a sparse index, but also that file modifications to a tracked directory outside of the sparse cone will trigger ensure_full_index(). Reviewed-by: Elijah Newren <newren@gmail.com> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-07-14 13:42:49 -07:00
Derrick Stolee	fe0d576153	wt-status: expand added sparse directory entries It is difficult, but possible, to get into a state where we intend to add a directory that is outside of the sparse-checkout definition. Add a test to t1092-sparse-checkout-compatibility.sh that demonstrates this using a combination of 'git reset --mixed' and 'git checkout --orphan'. This test failed before because the output of 'git status --porcelain=v2' would not match on the lines for folder1/: * The sparse-checkout repo (with a full index) would output each path name that is intended to be added. * The sparse-index repo would only output that "folder1/" is staged for addition. The status should report the full list of files to be added, and so this sparse-directory entry should be expanded to a full list when reaching it inside the wt_status_collect_changes_initial() method. Use read_tree_at() to assist. Somehow, this loop over the cache entries was not guarded by ensure_full_index() as intended. Reviewed-by: Elijah Newren <newren@gmail.com> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-07-14 13:42:49 -07:00
Derrick Stolee	d76723ee53	status: use sparse-index throughout By testing 'git -c core.fsmonitor= status -uno', we can check for the simplest index operations that can be made sparse-aware. The necessary implementation details are already integrated with sparse-checkout, so modify command_requires_full_index to be zero for cmd_status(). In refresh_index(), we loop through the index entries to refresh their stat() information. However, sparse directories have no stat() information to populate. Ignore these entries. This allows 'git status' to no longer expand a sparse index to a full one. This is further tested by dropping the "-uno" option and adding an untracked file into the worktree. The performance test p2000-sparse-checkout-operations.sh demonstrates these improvements: Test HEAD~1 HEAD ----------------------------------------------------------------------------- 2000.2: git status (full-index-v3) 0.31(0.30+0.05) 0.31(0.29+0.06) +0.0% 2000.3: git status (full-index-v4) 0.31(0.29+0.07) 0.34(0.30+0.08) +9.7% 2000.4: git status (sparse-index-v3) 2.35(2.28+0.10) 0.04(0.04+0.05) -98.3% 2000.5: git status (sparse-index-v4) 2.35(2.24+0.15) 0.05(0.04+0.06) -97.9% Note that since HEAD~1 was expanding the sparse index by parsing trees, it was artificially slower than the full index case. Thus, the 98% improvement is misleading, and instead we should celebrate the 0.34s to 0.05s improvement of 85%. This is more indicative of the performance gains we are expecting by using a sparse index. Note: we are dropping the assignment of core.fsmonitor here. This is not necessary for the test script as we are not altering the config any other way. Correct integration with FS Monitor will be validated in later changes. Reviewed-by: Elijah Newren <newren@gmail.com> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-07-14 13:42:49 -07:00
Derrick Stolee	bf48e5acdb	status: skip sparse-checkout percentage with sparse-index 'git status' began reporting a percentage of populated paths when sparse-checkout is enabled in `051df3cf` (wt-status: show sparse checkout status as well, 2020-07-18). This percentage is incorrect when the index has sparse directories. It would also be expensive to calculate as we would need to parse trees to count the total number of possible paths. Avoid the expensive computation by simplifying the output to only report that a sparse checkout exists, without the percentage. This change is the reason we use 'git status --porcelain=v2' in t1092-sparse-checkout-compatibility.sh. We don't want to ensure that this message is equal across both modes, but instead just the important information about staged, modified, and untracked files are compared. Reviewed-by: Elijah Newren <newren@gmail.com> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-07-14 13:42:49 -07:00
Derrick Stolee	9eb00af562	diff-lib: handle index diffs with sparse dirs While comparing an index to a tree, we may see a sparse directory entry. In this case, we should compare that portion of the tree to the tree represented by that entry. This could include a new tree which needs to be expanded to a full list of added files. It could also include an existing tree, in which case all of the changes inside are important to describe, including the modifications, additions, and deletions. Note that the case where the tree has a path and the index does not remains identical to before: the lack of a cache entry is the same with a sparse index. Use diff_tree_oid() appropriately to compute the diff. Reviewed-by: Elijah Newren <newren@gmail.com> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-07-14 13:42:49 -07:00
Derrick Stolee	69bdbdb0ee	dir.c: accept a directory as part of cone-mode patterns When we have sparse directory entries in the index, we want to compare that directory against sparse-checkout patterns. Those pattern matching algorithms are built expecting a file path, not a directory path. This is especially important in the "cone mode" patterns which will match files that exist within the "parent directories" as well as the recursive directory matches. If path_matches_pattern_list() is given a directory, we can add a fake filename ("-") to the directory and get the same results as before, assuming we are in cone mode. Since sparse index requires cone mode patterns, this is an acceptable assumption. Reviewed-by: Elijah Newren <newren@gmail.com> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-07-14 13:42:49 -07:00
Derrick Stolee	523506df51	unpack-trees: unpack sparse directory entries During unpack_callback(), index entries are compared against tree entries. These are matched according to names and types. One goal is to decide if we should recurse into subtrees or simply operate on one index entry. In the case of a sparse-directory entry, we do not want to recurse into that subtree and instead simply compare the trees. In some cases, we might want to perform a merge operation on the entry, such as during 'git checkout <commit>' which wants to replace a sparse tree entry with the tree for that path at the target commit. We extend the logic within unpack_single_entry() to create a sparse-directory entry in this case, and then that is sent to call_unpack_fn(). There are some subtleties in this process. For instance, we need to update find_cache_entry() to allow finding a sparse-directory entry that exactly matches a given path. Use the new helper method sparse_dir_matches_path() for this. We also need to ignore conflict markers in the case that the entries correspond to directories and we already have a sparse directory entry. Reviewed-by: Elijah Newren <newren@gmail.com> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-07-14 13:42:49 -07:00
Derrick Stolee	bd6a3fd7f1	unpack-trees: rename unpack_nondirectories() In the next change, we will use this method to unpack a sparse directory entry, so change the name to unpack_single_entry() so these entries apply. The new name reflects that we will not recurse into trees in order to resolve the conflicts. Reviewed-by: Elijah Newren <newren@gmail.com> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-07-14 13:42:49 -07:00
Derrick Stolee	cd807a5cda	unpack-trees: compare sparse directories correctly As we further integrate the sparse-index into unpack-trees, we need to ensure that we compare sparse directory entries correctly with other entries. This affects searching for an exact path as well as sorting index entries. Sparse directory entries contain the trailing directory separator. This is important for the sorting, in particular. Thus, within do_compare_entry() we stop using S_IFREG in all cases, since sparse directories should use S_IFDIR to indicate that the comparison should treat the entry name as a directory. Within compare_entry(), it first calls do_compare_entry() to check the leading portion of the name. When the input path is a directory name, we could match exactly already. Thus, we should return 0 if we have an exact string match on a sparse directory entry. The final check is a length comparison between the strings. Reviewed-by: Elijah Newren <newren@gmail.com> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-07-14 13:42:48 -07:00
Derrick Stolee	17a1bb570b	unpack-trees: preserve cache_bottom The cache_bottom member of 'struct unpack_trees_options' is used to track the range of index entries corresponding to a node of the cache tree. While recursing with traverse_by_cache_tree(), this value is preserved on the call stack using a local and then restored as that method returns. The mark_ce_used() method normally modifies the cache_bottom member when it refers to the marked cache entry. However, sparse directory entries are stored as nodes in the cache-tree data structure as of `2de37c53` (cache-tree: integrate with sparse directory entries, 2021-03-30). Thus, the cache_bottom will be modified as the cache-tree walk advances. Do not update it as well within mark_ce_used(). Reviewed-by: Elijah Newren <newren@gmail.com> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-07-14 13:42:48 -07:00
Derrick Stolee	bf26c06f12	t1092: add tests for status/add and sparse files Before moving to update 'git status' and 'git add' to work with sparse indexes, add an explicit test that ensures the sparse-index works the same as a normal sparse-checkout when the worktree contains directories and files outside of the sparse cone. Specifically, 'folder1/a' is a file in our test repo, but 'folder1' is not in the sparse cone. When 'folder1/a' is modified, the file is not shown as modified and adding it will fail. This is new behavior as of `a20f704` (add: warn when asked to update SKIP_WORKTREE entries, 2021-04-08). Before that change, these adds would be silently ignored. Untracked files are fine: adding new files both with 'git add .' and 'git add folder1/' works just as in a full checkout. This may not be entirely desirable, but we are not intending to change behavior at the moment, only document it. A future change could alter the behavior to be more sensible, and this test could be modified to satisfy the new expected behavior. Reviewed-by: Elijah Newren <newren@gmail.com> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-07-14 13:42:48 -07:00
Derrick Stolee	e669ffb2b8	t1092: expand repository data shape As more features integrate with the sparse-index feature, more and more special cases arise that require different data shapes within the tree structure of the repository in order to demonstrate those cases. Add several interesting special cases all at once instead of sprinkling them across several commits. The interesting cases being added here are: * Add sparse-directory entries on both sides of directories within the sparse-checkout definition. * Add directories outside the sparse-checkout definition who have only one entry and are the first entry of a directory with multiple entries. * Add filenames adjacent to a sparse directory entry that sort before and after the trailing slash. Later tests will take advantage of these shapes, but they also deepen the tests that already exist. Reviewed-by: Elijah Newren <newren@gmail.com> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-07-14 13:42:48 -07:00
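
The percentages quoted in d76723ee53 (status: use sparse-index throughout) can be re-derived from the elapsed times in its p2000 table. The following is a minimal sketch of that arithmetic only; `percent_change` and `report` are hypothetical helper names chosen for illustration and are not part of git or its performance test suite.

```c
#include <stdio.h>

/*
 * Relative change from `before` to `after`, in percent.
 * A negative result means `after` is smaller (faster) than `before`.
 * Hypothetical helper for illustration; not taken from git.
 */
double percent_change(double before, double after)
{
	return (after - before) / before * 100.0;
}

/*
 * Print one comparison with an explicit sign and one decimal place,
 * similar in spirit to the last column of the table in the message.
 */
void report(const char *label, double before, double after)
{
	printf("%-45s %+.1f%%\n", label, percent_change(before, after));
}
```

For example, `percent_change(2.35, 0.04)` gives roughly -98.3 (sparse-index-v3, HEAD~1 versus HEAD), while `percent_change(0.34, 0.05)` gives roughly -85.3, which corresponds to the 85% figure the message prefers: full-index-v4 versus sparse-index-v4, both measured at HEAD.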

63710 Commits