git-commit-vandalism

Author	SHA1	Message	Date
Taylor Blau	528290f8c6	Merge branch 'tb/config-copy-or-rename-in-file-injection' Avoids issues with renaming or deleting sections with long lines, where configuration values may be interpreted as sections, leading to configuration injection. Addresses CVE-2023-29007. * tb/config-copy-or-rename-in-file-injection: config.c: disallow overly-long lines in `copy_or_rename_section_in_file()` config.c: avoid integer truncation in `copy_or_rename_section_in_file()` config: avoid fixed-sized buffer when renaming/deleting a section t1300: demonstrate failure when renaming sections with long lines Signed-off-by: Taylor Blau <me@ttaylorr.com>	2023-04-17 21:15:42 +02:00
Taylor Blau	3bb3d6bac5	config.c: disallow overly-long lines in `copy_or_rename_section_in_file()` As a defense-in-depth measure to guard against any potentially-unknown buffer overflows in `copy_or_rename_section_in_file()`, refuse to work with overly-long lines in a gitconfig. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>	2023-04-17 21:15:40 +02:00
Taylor Blau	a5bb10fd5e	config: avoid fixed-sized buffer when renaming/deleting a section When renaming (or deleting) a section of configuration, Git uses the function `git_config_copy_or_rename_section_in_file()` to rewrite the configuration file after applying the rename or deletion to the given section. To do this, Git repeatedly calls `fgets()` to read the existing configuration data into a fixed size buffer. When the configuration value under `old_name` exceeds the size of the buffer, we will call `fgets()` an additional time even if there is no newline in the configuration file, since our read length is capped at `sizeof(buf)`. If the first character of the buffer (after zero or more characters satisfying `isspace()`) is a '[', Git will incorrectly treat it as beginning a new section when the original section is being removed. In other words, a configuration value satisfying these criteria can incorrectly be considered as a new section instead of a variable in the original section. Avoid this issue by using a variable-width buffer in the form of a strbuf rather than a fixed-width region on the stack. A couple of small points worth noting: - Using a strbuf will cause us to allocate arbitrary sizes to match the length of each line. In practice, we don't expect any reasonable configuration files to have lines that long, and a bandaid will be introduced in a later patch to ensure that this is the case. - We are using strbuf_getwholeline() here instead of strbuf_getline() in order to match `fgets()`'s behavior of leaving the trailing LF character on the buffer (as well as a trailing NUL). This could be changed later, but using strbuf_getwholeline() changes the least about this function's implementation, so it is picked as the safest path. - It is tempting to want to replace the loop to skip over characters matching isspace() at the beginning of the buffer with a convenience function like `strbuf_ltrim()`. 
But this is the wrong approach for a couple of reasons: First, it involves a potentially large and expensive `memmove()` which we would like to avoid. Second, and more importantly, we also do want to preserve those spaces to avoid changing the output of other sections. In all, this patch is a minimal replacement of the fixed-width buffer in `git_config_copy_or_rename_section_in_file()` to instead use a `struct strbuf`. Reported-by: André Baptista <andre@ethiack.com> Reported-by: Vítor Pinho <vitor@ethiack.com> Helped-by: Patrick Steinhardt <ps@pks.im> Co-authored-by: Johannes Schindelin <Johannes.Schindelin@gmx.de> Signed-off-by: Johannes Schindelin <Johannes.Schindelin@gmx.de> Signed-off-by: Taylor Blau <me@ttaylorr.com>	2023-04-17 21:15:40 +02:00
Taylor Blau	29198213c9	t1300: demonstrate failure when renaming sections with long lines When renaming a configuration section which has an entry whose length exceeds the size of our buffer in config.c's implementation of `git_config_copy_or_rename_section_in_file()`, Git will incorrectly form a new configuration section with part of the data in the section being removed. In this instance, our first configuration file looks something like: [b] c = d <spaces> [a] e = f [a] g = h Here, we have two configuration values, "b.c", and "a.g". The value "[a] e = f" belongs to the configuration value "b.c", and does not form its own section. However, when renaming the section 'a' to 'xyz', Git will write back "[xyz]\ne = f", but "[xyz]" is still attached to the value of "b.c", which is why "e = f" on its own line becomes a new entry called "b.e". A slightly different example embeds the section being renamed within another section. Demonstrate this failure in a test in t1300, which we will fix in the following commit. Co-authored-by: Johannes Schindelin <Johannes.Schindelin@gmx.de> Helped-by: Jeff King <peff@peff.net> Signed-off-by: Johannes Schindelin <Johannes.Schindelin@gmx.de> Signed-off-by: Taylor Blau <me@ttaylorr.com>	2023-04-17 21:15:39 +02:00
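The long-line failure described in the two config entries above can be modeled outside of Git. The following is only a Python sketch of the idea, not Git's C code: `readline(size)` stands in for `fgets()` on a fixed-size buffer, and `CONFIG`, `section_headers_seen()`, and the buffer size of 16 are hypothetical names and values chosen for illustration.

```python
import io

# Hypothetical config modeled on the t1300 entry's example: the value of
# "b.c" is padded with spaces and ends in text that looks like a header.
CONFIG = "[b]\n\tc = d" + " " * 12 + "[a] e = f\n[a]\n\tg = h\n"


def section_headers_seen(config_text, bufsize=None):
    """Return the chunks a naive scanner would treat as section headers.

    With bufsize=None, whole lines are read (the variable-width approach).
    With a bufsize, reads are capped the way fgets() caps them, so a long
    line is delivered in several chunks.
    """
    stream = io.StringIO(config_text)
    headers = []
    while True:
        if bufsize is None:
            chunk = stream.readline()
        else:
            chunk = stream.readline(bufsize)
        if not chunk:
            break
        # Same test the entry describes: skip leading whitespace, then
        # check whether the first remaining character is '['.
        if chunk.lstrip().startswith("["):
            headers.append(chunk.strip())
    return headers


whole_lines = section_headers_seen(CONFIG)
capped_reads = section_headers_seen(CONFIG, bufsize=16)
```

With whole-line reads only the two real headers are reported. With capped reads, the tail of the long `b.c` value arrives as its own chunk and is mistaken for a header, which is the confusion that makes the injection possible.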
Johannes Schindelin	9db05711c9	apply --reject: overwrite existing `.rej` symlink if it exists The `git apply --reject` is expected to write out `.rej` files in case one or more hunks fail to apply cleanly. Historically, the command overwrites any existing `.rej` files. The idea being that apply/reject/edit cycles are relatively common, and the generated `.rej` files are not considered precious. But the command does not overwrite existing `.rej` symbolic links, and instead follows them. This is unsafe because the same patch could potentially create such a symbolic link and point at arbitrary paths outside the current worktree, and `git apply` would write the contents of the `.rej` file into that location. Therefore, let's make sure that any existing `.rej` file or symbolic link is removed before writing it. Reported-by: RyotaK <ryotak.mail@gmail.com> Helped-by: Taylor Blau <me@ttaylorr.com> Helped-by: Junio C Hamano <gitster@pobox.com> Helped-by: Linus Torvalds <torvalds@linuxfoundation.org> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>	2023-04-17 21:15:38 +02:00
Jeff King	c4716236f2	t5512: test "ls-remote --heads --symref" filtering with v0 and v2 We have two overlapping tests for checking the behavior of "ls-remote --symref" when filtering output. The first test checks that using "--heads" will omit the symref for HEAD (since we don't print anything about HEAD at all), but still prints other symrefs. This has been marked as expecting failure since it was added in `99c08d4eb2` (ls-remote: add support for showing symrefs, 2016-01-19). That's because back then, we only had the v0 protocol, and it only reported on the HEAD symref, not others. But these days we have v2, which does exactly what the test wants. It would even have started unexpectedly passing when we switched to v2 by default, except that `b2f73b70b2` (t5512: compensate for v0 only sending HEAD symrefs, 2019-02-25) over-zealously marked it to run only in v0 mode. So let's run it with both protocol versions, and adjust the expected output for each. It passes in v2 without modification. In v0 mode, we'll drop the extra symref, but this is still testing something useful: it ensures that we do omit HEAD. The test after this checks "--heads" again, this time using the expected v0 output. That's now redundant. It also checks that limiting with a pattern like "refs/heads/*" works similarly, but that's redundant with a test earlier in the script which limits by HEAD (again, back then the "HEAD" test was less interesting because there were no other symrefs to omit, but in a modern v2 world, there are). So we can just delete that second test entirely. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-04-14 15:08:13 -07:00
Jeff King	d6747adfa8	t5512: allow any protocol version for filtered symref test We have a test that checks that ls-remote, when asked only about HEAD, will report the HEAD symref, and not others. This was marked to always run with the v0 protocol by `b2f73b70b2` (t5512: compensate for v0 only sending HEAD symrefs, 2019-02-25). But in v0 this test is doing nothing! For v0, upload-pack only reports the HEAD symref anyway, so we'd never have any other symref to report. For v2, it is useful; we learn about all symrefs (and the test repo has multiple), so this demonstrates that we correctly avoid showing them. We could perhaps mark this to test explicitly with v2, but since that is the default these days, it's sufficient to just run ls-remote without any protocol specification. It still passes if somebody does an explicit GIT_TEST_PROTOCOL_VERSION=0; it's just testing less. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-04-14 15:08:12 -07:00
Jeff King	20272ee8cf	t5512: add v2 support for "ls-remote --symref" test Commit `b2f73b70b2` (t5512: compensate for v0 only sending HEAD symrefs, 2019-02-25) configured this test to always run with protocol v0, since the output is different for v2. But that means we are not getting any test coverage of the feature with v2 at all. We could obviously switch to using and expecting v2, but then that leaves v0 behind (and while we don't use it by default, it's still important for testing interoperability with older servers). Likewise, we could switch the expected output based on $GIT_TEST_PROTOCOL_VERSION, but hardly anybody runs the tests for v0 these days. Instead, let's explicitly run it for both protocol versions to make sure they're well behaved. This matches other similar tests added later in `6a139cdd74` (ls-remote: pass heads/tags prefixes to transport, 2018-10-31), etc. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-04-14 15:08:12 -07:00
Jeff King	13e67aa39b	v0 protocol: fix sha1/sha256 confusion for capabilities^{} Commit `eb398797cd` (connect: advertized capability is not a ref, 2016-09-09) added support for an upload-pack server responding with: 0000000000000000000000000000000000000000 capabilities^{} followed by a NUL and the actual capabilities. We correctly parse the oid using the packet_reader's hash_algo field, but then we compare it to null_oid(), which will instead use our current repo's default algorithm. If we're defaulting to sha256 locally but the other side is sha1, they won't match and we'll fail to parse the line (and thus die()). This can cause a test failure when the suite is run with GIT_TEST_DEFAULT_HASH=sha256, and we even do so regularly via the linux-sha256 CI job. But since the test requires JGit to run, it's usually just skipped, and nobody noticed the problem. The reason the original patch used JGit is that Git itself does not ever produce such a line via upload-pack; the feature was added to fix a real-world problem when interacting with JGit. That was good for verifying that the incompatibility was fixed, but it's not a good regression test: - hardly anybody runs it, because you have to have jgit installed; hence this bug going unnoticed - we're depending on jgit's behavior for the test to do anything useful. In particular, this behavior is only relevant to the v0 protocol, but these days we ask for the v2 protocol by default. So for modern jgit, this is probably testing nothing. - it's complicated and slow. We had to do some fifo trickery to handle races, and this one test makes up 40% of the runtime of the total script. Instead, let's just hard-code the response that's of interest to us. That will test exactly what we want for every run, and reveals the bug when run in sha256 mode. And of course we'll fix the actual bug by using the correct hash_algo struct. 
Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-04-14 15:08:12 -07:00
Jeff King	e6c4309748	t5512: stop referring to "v1" protocol There really isn't a "v1" Git protocol. It's just v0 with an extra probe which we used to test compatibility in preparation for v2. Any tests that are looking for before/after behavior for v2 really care about "v0". Mentioning "v1" in these tests is just making things more confusing, because we don't care about that probe; we're really testing v0. So let's say so. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-04-14 15:08:12 -07:00
Jeff King	aa962fef27	v0 protocol: fix infinite loop when parsing multi-valued capabilities If Git's client-side parsing of an upload-pack response (so git-fetch or ls-remote) sees multiple instances of a single capability, it can enter an infinite loop due to a bug in advancing the "offset" parameter in the parser. This bug can't happen between a client and server of the same Git version. The client bug is in parse_feature_value() when the caller passes in an offset parameter. And that only happens when the v0 protocol is parsing "symref" and "object-format" capabilities, via next_server_feature_value(). But Git has never produced multiple object-format capabilities, and it stopped producing multiple symref values in `d007dbf7d6` (Revert "upload-pack: send non-HEAD symbolic refs", 2013-11-18). However, upload-pack did produce multiple symref entries for a while, and they are valid. Plus other implementations, such as Dulwich will still do so. So we should handle them. And even if we do not expect it, it is obviously a bug for the parser to enter an infinite loop. The bug itself is pretty simple. Commit `2c6a403d96` (connect: add function to parse multiple v1 capability values, 2020-05-25) added the "offset" parameter, which is used as both an in- and out-parameter. When parsing the first "symref" capability, offset will be 0 on input, and after parsing the capability, we set offset to an index just past the value by taking a pointer difference "(value + end) - feature_list". But on the second call, now offset is set to that larger index, which lets us skip past the first "symref" capability. However, we do so by incrementing feature_list. That means our pointer difference is now too small; it is counting from where we resumed parsing, not from the start of the original feature_list pointer. And because we incremented feature_list only inside our function, and not the caller, that increment is lost next time the function is called. 
One solution would be to account for those skipped bytes by incrementing offset, rather than assigning to it. But wait, there's more! We also increment feature_list if we have a near-miss. Say we are looking for "symref" and find "almost-symref". In that case we'll point feature_list to the "y" in "almost-symref" and restart our search. But that again means our offset won't be correct, as it won't account for the bytes between the start of the string and that "y". So instead, let's just record the beginning of the feature_list string in a separate pointer that we never touch. That offset we take in and return is meant to be using that point as a base, and now we'll do so consistently. Since the bug can't be reproduced using the current version of git-upload-pack, we'll instead hard-code an input which triggers the problem. Before this patch it loops forever re-parsing the second symref entry. Now we check both that it finishes, and that it parses both entries correctly (a case we could not test at all before). We don't need to worry about testing v2 here; it communicates the capabilities in a completely different way, and doesn't use this code at all. There are tests earlier in t5512 that are meant to cover this (they don't, but we'll address that in a future patch). Reported-by: Jonas Haag <jonas@lophus.org> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-04-14 15:08:12 -07:00
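The fix described in the entry above (measure every offset from one untouched base) can be sketched as follows. This is a Python model under stated assumptions, not the C function in connect.c: `parse_feature_value()` here is a simplified stand-in that shares only the name, `all_feature_values()` is a hypothetical caller, and the capability string is made up for illustration.

```python
def parse_feature_value(feature_list, feature, offset=0):
    """Find `feature` at or after `offset` in a space-separated list.

    Returns (value, next_offset); value is None when nothing is found.
    All positions are indexes into the original `feature_list`, which is
    never advanced, so the returned offset has a consistent base.
    """
    pos = offset
    while True:
        found = feature_list.find(feature, pos)
        if found < 0:
            return None, offset
        end_name = found + len(feature)
        at_word_start = found == 0 or feature_list[found - 1] == " "
        if at_word_start:
            if end_name == len(feature_list) or feature_list[end_name] == " ":
                return "", end_name  # capability present without a value
            if feature_list[end_name] == "=":
                value_end = feature_list.find(" ", end_name + 1)
                if value_end < 0:
                    value_end = len(feature_list)
                return feature_list[end_name + 1:value_end], value_end
        # Near miss such as "almost-symref": resume just past the match
        # start, still counting from the beginning of the original string.
        pos = found + 1


def all_feature_values(feature_list, feature):
    """Collect every value of a capability that may appear repeatedly."""
    values, offset = [], 0
    while True:
        value, offset = parse_feature_value(feature_list, feature, offset)
        if value is None:
            return values
        values.append(value)


# Hypothetical capability line with a near miss and two symref entries.
caps = ("multi_ack almost-symref=x symref=HEAD:refs/heads/main "
        "symref=refs/remotes/origin/HEAD:refs/remotes/origin/main agent=git/2.40")
symrefs = all_feature_values(caps, "symref")
```

Because each successful call returns an offset strictly past the match it just consumed, the collecting loop always makes progress and ends once no further entry exists.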
Robin Jarry	3c8d3adeae	send-email: export patch counters in validate environment When sending patch series (with a cover-letter or not) sendemail-validate is called with every email/patch file independently from the others. When one of the patches depends on a previous one, it may not be possible to use this hook in a meaningful way. A hook that wants to check some property of the whole series needs to know which patch is the final one. Expose the current and total number of patches to the hook via the GIT_SENDEMAIL_PATCH_COUNTER and GIT_SENDEMAIL_PATCH_TOTAL environment variables so that both incremental and global validation is possible. Sharing any other state between successive invocations of the validate hook must be done via external means. For example, by storing it in a git config sendemail.validateWorktree entry. Add a sample script with placeholder validations and update tests to check that the counters are properly exported. Suggested-by: Phillip Wood <phillip.wood123@gmail.com> Signed-off-by: Robin Jarry <robin@jarry.cc> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-04-14 10:41:15 -07:00
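A validate hook could use the counters from the entry above roughly like this. It is a minimal sketch with assumptions: the variable names are copied verbatim from the entry and may differ from what a given Git release exports (check the githooks documentation), the counter is assumed to be 1-based, and `patch_position()` and `is_final_patch()` are hypothetical helpers.

```python
import os

# Names as written in the entry above; treat them as assumptions.
COUNTER_VAR = "GIT_SENDEMAIL_PATCH_COUNTER"
TOTAL_VAR = "GIT_SENDEMAIL_PATCH_TOTAL"


def patch_position(environ=None):
    """Return (counter, total) as integers read from the environment."""
    env = os.environ if environ is None else environ
    return int(env[COUNTER_VAR]), int(env[TOTAL_VAR])


def is_final_patch(environ=None):
    """True when the hook is running for the last file of the series.

    Assumes a 1-based counter, so the final file has counter == total.
    """
    counter, total = patch_position(environ)
    return counter == total
```

A hook built this way can run cheap per-patch checks on every invocation and defer whole-series checks until `is_final_patch()` returns true.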
Patrick Steinhardt	d85cd18777	repack: disable writing bitmaps when doing a local repack In order to write a bitmap, we need to have full coverage of all objects that are about to be packed. In the traditional non-multi-pack-index world this meant we need to do a full repack of all objects into a single packfile. But in the new multi-pack-index world we can get away with writing bitmaps when we have multiple packfiles as long as the multi-pack-index covers all objects. This is not always the case though. When asked to perform a repack of local objects, only, then we cannot guarantee to have full coverage of all objects regardless of whether we do a full repack or a repack with a multi-pack-index. The end result is that writing the bitmap will fail in both worlds: $ git multi-pack-index write --stdin-packs --bitmap <packfiles warning: Failed to write bitmap index. Packfile doesn't have full closure (object 1529341d78cf45377407369acb0f4ff2b5cdae42 is missing) error: could not write multi-pack bitmap Now there are two different ways to fix this. The first one would be to amend git-multi-pack-index(1) to disable writing bitmaps when we notice that we don't have full object coverage. - We don't have enough information in git-multi-pack-index(1) in order to tell whether the local repository _should_ have full coverage. Because even when connected to an alternate object directory, it may be the case that we still have all objects around in the main object database. - git-multi-pack-index(1) is quite a low-level tool. Automatically disabling functionality that it was asked to provide does not feel like the right thing to do. We can easily fix it at a higher level in git-repack(1) though. When asked to only include local objects via `-l` and when connected to an alternate object directory then we will override the user's ask and disable writing bitmaps with a warning. 
This is similar to what we do in git-pack-objects(1), where we also disable writing bitmaps in case we omit an object from the pack. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-04-14 10:27:52 -07:00
Patrick Steinhardt	932c16c04b	repack: honor `-l` when calculating pack geometry When the user passes `-l` to git-repack(1), then they essentially ask us to only repack objects part of the local object database while ignoring any packfiles part of an alternate object database. And we in fact honor this bit when doing a geometric repack as the resulting packfile will only ever contain local objects. What we're missing though is that we don't take locality of packfiles into account when computing whether the geometric sequence is intact or not. So even though we would only ever roll up local packfiles anyway, we could end up trying to repack because of non-local packfiles. This does not make much sense, and in the worst case it can cause us to try and do the geometric repack over and over again because we're never able to restore the geometric sequence. Fix this bug by honoring whether the user has passed `-l`. If so, we skip adding any non-local packfiles to the pack geometry. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-04-14 10:27:52 -07:00
Patrick Steinhardt	19a3a7bde9	t/helper: allow chmtime to print verbosely without modifying mtime The `test-tool chmtime` helper allows us to both read and modify the modification time of files. But while it is possible to only read the mtimes of a file via `--get`, it is not possible to read the mtimes and report them together with their respective file paths via the `--verbose` flag without also modifying the mtime at the same time. Fix this so that it is possible to call `test-tool chmtime --verbose <files>...` without modifying any mtimes. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-04-14 10:27:52 -07:00
Patrick Steinhardt	f3028418c3	pack-objects: extend test coverage of `--stdin-packs` with alternates We don't have any tests that verify that git-pack-objects(1) works with `--stdin-packs` when combined with alternate object directories. Add some to make sure that the basic functionality works as expected. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-04-14 10:27:52 -07:00
Patrick Steinhardt	752b465c3c	pack-objects: fix error when same packfile is included and excluded When passing the same packfile both as included and excluded via the `--stdin-packs` option, then we will return an error because the excluded packfile cannot be found. This is because we will only set the `util` pointer for the included packfile list if it was found, so that we later die when we notice that it's in fact not set for the excluded packfile list. Fix this bug by always setting the `util` pointer for both the included and excluded list entries. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-04-14 10:27:51 -07:00
Patrick Steinhardt	732194b5f2	pack-objects: fix error when packing same pack twice When passed the same packfile twice via `--stdin-packs` we return an error that the packfile supposedly was not found. This is because when reading packs into the list of included or excluded packfiles, we will happily re-add packfiles even if they are part of the lists already. And while the list can now contain duplicates, we will only set the `util` pointer of the first list entry to the `packed_git` structure. We notice that at a later point when checking that all list entries have their `util` pointer set and die with an error. While this is kind of a nonsensical request, this scenario can be hit when doing geometric repacks. When a repository is connected to an alternate object directory and both have the exact same packfile then both would get added to the geometric sequence. And when we then decide to perform the repack, we will invoke git-pack-objects(1) with the same packfile twice. Fix this bug by removing any duplicates from both the included and excluded packs. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-04-14 10:27:51 -07:00
Patrick Steinhardt	b7b8f048f5	pack-objects: split out `--stdin-packs` tests into separate file The test suite for git-pack-objects(1) is quite huge, and we're about to add more tests that relate to the `--stdin-packs` option. Split out all tests related to this option into a standalone file so that it becomes easier to test the feature in isolation. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-04-14 10:27:51 -07:00
Patrick Steinhardt	51861340f8	repack: fix generating multi-pack-index with only non-local packs When writing the multi-pack-index with geometric repacking we will add all packfiles to the index that are part of the geometric sequence. This can potentially also include packfiles borrowed from an alternate object directory. But given that a multi-pack-index can only ever include packs that are part of the main object database this does not make much sense whatsoever. In the edge case where all packfiles are contained in the alternate object database and the local repository has none itself this bug can cause us to invoke git-multi-pack-index(1) with only non-local packfiles that it ultimately cannot find. This causes it to return an error and thus causes the geometric repack to fail. Fix the code to skip non-local packfiles. Co-authored-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-04-14 10:27:51 -07:00
Patrick Steinhardt	3d74a2337c	repack: fix trying to use preferred pack in alternates When doing a geometric repack with multi-pack-indices, then we ask git-multi-pack-index(1) to use the largest packfile as the preferred pack. It can happen though that the largest packfile is not part of the main object database, but instead part of an alternate object database. The result is that git-multi-pack-index(1) will not be able to find the preferred pack and print a warning. It then falls back to use the first packfile that the multi-pack-index shall reference. Fix this bug by only considering packfiles as preferred pack that are local. This is the right thing to do given that a multi-pack-index should never reference packfiles borrowed from an alternate. While at it, rename the function `get_largest_active_packfile()` to `get_preferred_pack()` to better document its intent. Helped-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-04-14 10:27:51 -07:00
Patrick Steinhardt	ceb96a160b	midx: fix segfault with no packs and invalid preferred pack When asked to write a multi-pack-index the user can specify a preferred pack that is used as a tie breaker when multiple packs contain the same objects. When this packfile cannot be found, we just pick the first pack that is getting tracked by the newly written multi-pack-index as a fallback. Picking the fallback can fail in the case where we're asked to write a multi-pack-index with no packfiles at all: picking the fallback value will cause a segfault as we blindly index into the array of packfiles, which would be empty. Fix this bug by resetting the preferred packfile index to `-1` before searching for the preferred pack. This fixes the segfault as we already check for whether the index is `> -1`. If it is not, we simply don't pick a preferred packfile at all. Helped-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-04-14 10:27:51 -07:00
Øystein Walle	aabfdc9514	branch, for-each-ref, tag: add option to omit empty lines If the given format string expands to the empty string, a newline is still printed. This makes using the output linewise more tedious. For example, git update-ref --stdin does not accept empty lines. Add options to "git branch", "git for-each-ref", and "git tag" to not print these empty lines. The default behavior remains the same. Signed-off-by: Øystein Walle <oystwa@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-04-13 08:07:45 -07:00
Taylor Blau	9f7f10a282	t: invert `GIT_TEST_WRITE_REV_INDEX` Back in `e8c58f894b` (t: support GIT_TEST_WRITE_REV_INDEX, 2021-01-25), we added a test knob to conditionally enable writing a ".rev" file when indexing a pack. At the time, this was used to ensure that the test suite worked even when ".rev" files were written, which served as a stress-test for the on-disk reverse index implementation. Now that reading from on-disk ".rev" files is enabled by default, the test knob `GIT_TEST_WRITE_REV_INDEX` no longer has any meaning. We could get rid of the option entirely, but there would be no convenient way to test Git when ".rev" files aren't in place. Instead of getting rid of the option, invert its meaning to instead disable writing ".rev" files, thereby running the test suite in a mode where the reverse index is generated from scratch. This ensures that, when GIT_TEST_NO_WRITE_REV_INDEX is set to some spelling of "true", we are still running and exercising Git's behavior when forced to generate reverse indexes from scratch. Do so by setting it in the linux-TEST-vars CI run to ensure that we are maintaining good coverage of this now-legacy code. Signed-off-by: Taylor Blau <me@ttaylorr.com> Acked-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-04-13 07:55:46 -07:00
Taylor Blau	a8dd7e05b1	config: enable `pack.writeReverseIndex` by default Back in `e37d0b8730` (builtin/index-pack.c: write reverse indexes, 2021-01-25), Git learned how to read and write a pack's reverse index from a file instead of in-memory. A pack's reverse index is a mapping from pack position (that is, the order that objects appear together in a ".pack") to their position in lexical order (that is, the order that objects are listed in an ".idx" file). Reverse indexes are consulted often during pack-objects, as well as during auxiliary operations that require mapping between pack offsets, pack order, and index index. They are useful in GitHub's infrastructure, where we have seen a dramatic increase in performance when writing ".rev" files[1]. In particular: - an ~80% reduction in the time it takes to serve fetches on a popular repository, Homebrew/homebrew-core. - a ~60% reduction in the peak memory usage to serve fetches on that same repository. - a collective savings of ~35% in CPU time across all pack-objects invocations serving fetches across all repositories in a single datacenter. Reverse indexes are also beneficial to end-users as well as forges. 
For example, the time it takes to generate a pack containing the objects for the 10 most recent commits in linux.git (representing a typical push) is significantly faster when on-disk reverse indexes are available: $ { git rev-parse HEAD && printf '^' && git rev-parse HEAD~10 } >in $ hyperfine -L v false,true 'git.compile -c pack.readReverseIndex={v} pack-objects --delta-base-offset --revs --stdout <in >/dev/null' Benchmark 1: git.compile -c pack.readReverseIndex=false pack-objects --delta-base-offset --revs --stdout <in >/dev/null Time (mean ± σ): 543.0 ms ± 20.3 ms [User: 616.2 ms, System: 58.8 ms] Range (min … max): 521.0 ms … 577.9 ms 10 runs Benchmark 2: git.compile -c pack.readReverseIndex=true pack-objects --delta-base-offset --revs --stdout <in >/dev/null Time (mean ± σ): 245.0 ms ± 11.4 ms [User: 335.6 ms, System: 31.3 ms] Range (min … max): 226.0 ms … 259.6 ms 13 runs Summary 'git.compile -c pack.readReverseIndex=true pack-objects --delta-base-offset --revs --stdout <in >/dev/null' ran 2.22 ± 0.13 times faster than 'git.compile -c pack.readReverseIndex=false pack-objects --delta-base-offset --revs --stdout <in >/dev/null' The same is true of writing a pack containing the objects for the 30 most-recent commits: $ { git rev-parse HEAD && printf '^' && git rev-parse HEAD~30 } >in $ hyperfine -L v false,true 'git.compile -c pack.readReverseIndex={v} pack-objects --delta-base-offset --revs --stdout <in >/dev/null' Benchmark 1: git.compile -c pack.readReverseIndex=false pack-objects --delta-base-offset --revs --stdout <in >/dev/null Time (mean ± σ): 866.5 ms ± 16.2 ms [User: 1414.5 ms, System: 97.0 ms] Range (min … max): 839.3 ms … 886.9 ms 10 runs Benchmark 2: git.compile -c pack.readReverseIndex=true pack-objects --delta-base-offset --revs --stdout <in >/dev/null Time (mean ± σ): 581.6 ms ± 10.2 ms [User: 1181.7 ms, System: 62.6 ms] Range (min … max): 567.5 ms … 599.3 ms 10 runs Summary 'git.compile -c pack.readReverseIndex=true pack-objects 
--delta-base-offset --revs --stdout <in >/dev/null' ran 1.49 ± 0.04 times faster than 'git.compile -c pack.readReverseIndex=false pack-objects --delta-base-offset --revs --stdout <in >/dev/null' ...and savings on trivial operations like computing the on-disk size of a single (packed) object are even more dramatic: $ git rev-parse HEAD >in $ hyperfine -L v false,true 'git.compile -c pack.readReverseIndex={v} cat-file --batch-check="%(objectsize:disk)" <in' Benchmark 1: git.compile -c pack.readReverseIndex=false cat-file --batch-check="%(objectsize:disk)" <in Time (mean ± σ): 305.8 ms ± 11.4 ms [User: 264.2 ms, System: 41.4 ms] Range (min … max): 290.3 ms … 331.1 ms 10 runs Benchmark 2: git.compile -c pack.readReverseIndex=true cat-file --batch-check="%(objectsize:disk)" <in Time (mean ± σ): 4.0 ms ± 0.3 ms [User: 1.7 ms, System: 2.3 ms] Range (min … max): 1.6 ms … 4.6 ms 1155 runs Summary 'git.compile -c pack.readReverseIndex=true cat-file --batch-check="%(objectsize:disk)" <in' ran 76.96 ± 6.25 times faster than 'git.compile -c pack.readReverseIndex=false cat-file --batch-check="%(objectsize:disk)" <in' In the more than two years since `e37d0b8730` was merged, Git's implementation of on-disk reverse indexes has been thoroughly tested, both from users enabling `pack.writeReverseIndex`, and from GitHub's deployment of the feature. The latter has been running without incident for more than two years. This patch changes Git's behavior to write on-disk reverse indexes by default when indexing a pack, which should make the above operations faster for everybody's Git installation after a repack. (The previous commit explains some potential drawbacks of using on-disk reverse indexes in certain limited circumstances, that essentially boil down to a trade-off between time to generate, and time to access. For those limited cases, the `pack.readReverseIndex` escape hatch can be used). 
[1]: https://github.blog/2021-04-29-scaling-monorepo-maintenance/#reverse-indexes Signed-off-by: Taylor Blau <me@ttaylorr.com> Acked-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-04-13 07:55:46 -07:00
Taylor Blau	dbcf611617	pack-revindex: introduce `pack.readReverseIndex` Since `1615c567b8` (Documentation/config/pack.txt: advertise 'pack.writeReverseIndex', 2021-01-25), we have had the `pack.writeReverseIndex` configuration option, which tells Git whether or not it is allowed to write a ".rev" file when indexing a pack. Introduce a complementary configuration knob, `pack.readReverseIndex`, to control whether or not Git will read any ".rev" file(s) that may be available on disk. This option is useful for debugging, as well as disabling the effect of ".rev" files in certain instances. This is useful because of the trade-off[^1] between the time it takes to generate a reverse index (slow from scratch, fast when reading an existing ".rev" file), and the time it takes to access a record (the opposite). For example, even though it is faster to use the on-disk reverse index when computing the on-disk size of a packed object, it is slower to enumerate the same value for all objects. Here are a couple of examples from linux.git. 
When computing the above for a single object, using the on-disk reverse index is significantly faster: $ git rev-parse HEAD >in $ hyperfine -L v false,true 'git.compile -c pack.readReverseIndex={v} cat-file --batch-check="%(objectsize:disk)" <in' Benchmark 1: git.compile -c pack.readReverseIndex=false cat-file --batch-check="%(objectsize:disk)" <in Time (mean ± σ): 302.5 ms ± 12.5 ms [User: 258.7 ms, System: 43.6 ms] Range (min … max): 291.1 ms … 328.1 ms 10 runs Benchmark 2: git.compile -c pack.readReverseIndex=true cat-file --batch-check="%(objectsize:disk)" <in Time (mean ± σ): 3.9 ms ± 0.3 ms [User: 1.6 ms, System: 2.4 ms] Range (min … max): 2.0 ms … 4.4 ms 801 runs Summary 'git.compile -c pack.readReverseIndex=true cat-file --batch-check="%(objectsize:disk)" <in' ran 77.29 ± 7.14 times faster than 'git.compile -c pack.readReverseIndex=false cat-file --batch-check="%(objectsize:disk)" <in' , but when instead trying to compute the on-disk object size for all objects in the repository, using the ".rev" file is a disadvantage over creating the reverse index from scratch: $ hyperfine -L v false,true 'git.compile -c pack.readReverseIndex={v} cat-file --batch-check="%(objectsize:disk)" --batch-all-objects' Benchmark 1: git.compile -c pack.readReverseIndex=false cat-file --batch-check="%(objectsize:disk)" --batch-all-objects Time (mean ± σ): 8.258 s ± 0.035 s [User: 7.949 s, System: 0.308 s] Range (min … max): 8.199 s … 8.293 s 10 runs Benchmark 2: git.compile -c pack.readReverseIndex=true cat-file --batch-check="%(objectsize:disk)" --batch-all-objects Time (mean ± σ): 16.976 s ± 0.107 s [User: 16.706 s, System: 0.268 s] Range (min … max): 16.839 s … 17.105 s 10 runs Summary 'git.compile -c pack.readReverseIndex=false cat-file --batch-check="%(objectsize:disk)" --batch-all-objects' ran 2.06 ± 0.02 times faster than 'git.compile -c pack.readReverseIndex=true cat-file --batch-check="%(objectsize:disk)" --batch-all-objects' Luckily, the results when running `git 
cat-file` with `--unordered` are closer together: $ hyperfine -L v false,true 'git.compile -c pack.readReverseIndex={v} cat-file --unordered --batch-check="%(objectsize:disk)" --batch-all-objects' Benchmark 1: git.compile -c pack.readReverseIndex=false cat-file --unordered --batch-check="%(objectsize:disk)" --batch-all-objects Time (mean ± σ): 5.066 s ± 0.105 s [User: 4.792 s, System: 0.274 s] Range (min … max): 4.943 s … 5.220 s 10 runs Benchmark 2: git.compile -c pack.readReverseIndex=true cat-file --unordered --batch-check="%(objectsize:disk)" --batch-all-objects Time (mean ± σ): 6.193 s ± 0.069 s [User: 5.937 s, System: 0.255 s] Range (min … max): 6.145 s … 6.356 s 10 runs Summary 'git.compile -c pack.readReverseIndex=false cat-file --unordered --batch-check="%(objectsize:disk)" --batch-all-objects' ran 1.22 ± 0.03 times faster than 'git.compile -c pack.readReverseIndex=true cat-file --unordered --batch-check="%(objectsize:disk)" --batch-all-objects' Because the equilibrium point between these two is highly machine- and repository-dependent, allow users to configure whether or not they will read any ".rev" file(s) with this configuration knob. [^1]: Generating a reverse index in memory takes O(N) time (where N is the number of objects in the repository), since we use a radix sort. Reading an entry from an on-disk ".rev" file is slower since each operation is bound by disk I/O instead of memory I/O. In order to compute the on-disk size of a packed object, we need to find the offset of our object, and the adjacent object (the on-disk size difference of these two). Finding the first offset requires a binary search. Finding the latter involves a single .rev lookup. Signed-off-by: Taylor Blau <me@ttaylorr.com> Acked-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-04-13 07:55:46 -07:00
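As a rough illustration of footnote [^1] in the `pack.readReverseIndex` commit above, the following toy model computes an object's on-disk size from two offsets: its own and that of the next object in pack order. This is a minimal sketch, not Git's implementation. The names `build_revindex`, `on_disk_size`, and the `pack_end` parameter are hypothetical, and the "pack" is just a pair of Python lists.

```python
from bisect import bisect_left


def build_revindex(offsets):
    """Index positions ordered by pack offset (a toy in-memory reverse index)."""
    return sorted(range(len(offsets)), key=lambda i: offsets[i])


def on_disk_size(oid, oids, offsets, revindex, pack_end):
    """On-disk size of `oid` in a toy pack.

    oids:     object IDs in sorted (index) order
    offsets:  offsets[i] is the pack offset of oids[i]
    revindex: result of build_revindex(offsets)
    pack_end: offset one past the last object (hypothetical parameter)
    """
    # Binary search the sorted index to find our object's offset.
    i = bisect_left(oids, oid)
    if i == len(oids) or oids[i] != oid:
        raise KeyError(oid)
    start = offsets[i]

    # Binary search the offsets in pack order to find our pack position.
    # (Rebuilt on every call here for brevity; a real reader would not do this.)
    pack_order_offsets = [offsets[j] for j in revindex]
    pack_pos = bisect_left(pack_order_offsets, start)

    # One reverse-index lookup yields the adjacent object's offset.
    if pack_pos + 1 < len(revindex):
        end = offsets[revindex[pack_pos + 1]]
    else:
        end = pack_end
    return end - start
```

For a single object, almost all of the cost in this model is `build_revindex`, which is why reading a precomputed reverse index helps most for one-off lookups.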
Taylor Blau	b77919ed6e	t5325: mark as leak-free This test is leak-free as of the previous commit, so let's mark it as such to ensure we don't regress and introduce a leak in the future. Signed-off-by: Taylor Blau <me@ttaylorr.com> Acked-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-04-13 07:55:45 -07:00
Junio C Hamano	063cd850f2	Merge branch 'jk/use-perl-path-consistently' Tests had a few places where we ignored PERL_PATH and blindly used /usr/bin/perl, which have been corrected. * jk/use-perl-path-consistently: t/lib-httpd: pass PERL_PATH to CGI scripts	2023-04-11 13:49:13 -07:00
Junio C Hamano	96f4113ac0	Merge branch 'jc/clone-object-format-from-void' "git clone" from an empty repository learned to propagate the choice of the hash algorithm from the source repository to the newly created repository. * jc/clone-object-format-from-void: clone: propagate object-format when cloning from void	2023-04-11 13:49:13 -07:00
Junio C Hamano	30e04bcfa8	Merge branch 'ar/adjust-tests-for-the-index-fallout' Comment updates. * ar/adjust-tests-for-the-index-fallout: t2107: fix mention of the_index.cache_changed t3060: fix mention of function prune_index	2023-04-11 13:49:12 -07:00
Junio C Hamano	647a2bb3ff	Merge branch 'jc/spell-id-in-both-caps-in-message-id' Consistently spell "Message-ID" as such, not "Message-Id". * jc/spell-id-in-both-caps-in-message-id: e-mail workflow: Message-ID is spelled with ID in both capital letters	2023-04-11 13:49:12 -07:00
Junio C Hamano	d02343b599	Merge branch 'ws/sparse-check-rules' "git sparse-checkout" command learns a debugging aid for the sparse rule definitions. * ws/sparse-check-rules: builtin/sparse-checkout: add check-rules command builtin/sparse-checkout: remove NEED_WORK_TREE flag	2023-04-11 13:49:12 -07:00
Elijah Newren	e93fc5d721	treewide: remove cache.h inclusion due to object-name.h changes Signed-off-by: Elijah Newren <newren@gmail.com> Acked-by: Calvin Wan <calvinwan@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-04-11 08:52:09 -07:00
Elijah Newren	dabab1d6e6	object-name.h: move declarations for object-name.c functions from cache.h Signed-off-by: Elijah Newren <newren@gmail.com> Acked-by: Calvin Wan <calvinwan@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-04-11 08:52:09 -07:00
Elijah Newren	5579f44d2f	treewide: remove unnecessary cache.h inclusion Several files were including cache.h solely to get other headers, such as trace.h and trace2.h. Since the last few commits have modified files to make these dependencies more explicit, the inclusion of cache.h is no longer needed in several cases. Remove it. Signed-off-by: Elijah Newren <newren@gmail.com> Acked-by: Calvin Wan <calvinwan@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-04-11 08:52:09 -07:00
Elijah Newren	5bc07225e5	treewide: be explicit about dependence on mem-pool.h Signed-off-by: Elijah Newren <newren@gmail.com> Acked-by: Calvin Wan <calvinwan@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-04-11 08:52:09 -07:00
Elijah Newren	74ea5c9574	treewide: be explicit about dependence on trace.h & trace2.h Dozens of files made use of trace and trace2 functions, without explicitly including trace.h or trace2.h. This made it more difficult to find which files could remove a dependence on cache.h. Make C files explicitly include trace.h or trace2.h if they are using them. Signed-off-by: Elijah Newren <newren@gmail.com> Acked-by: Calvin Wan <calvinwan@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-04-11 08:52:08 -07:00
Glen Choo	4e33535ea9	clone: error specifically with --local and symlinked objects `6f054f9fb3` (builtin/clone.c: disallow --local clones with symlinks, 2022-07-28) gives a good error message when "git clone --local" fails when the repo to clone has symlinks in "$GIT_DIR/objects". In `bffc762f87` (dir-iterator: prevent top-level symlinks without FOLLOW_SYMLINKS, 2023-01-24), we later extended this restriction to the case where "$GIT_DIR/objects" is itself a symlink, but we didn't update the error message then - bffc762f87's tests show that we print a generic "failed to start iterator over" message. This is exacerbated by the fact that Documentation/git-clone.txt mentions neither restriction, so users are left wondering if this is intentional behavior or not. Fix this by adding a check to builtin/clone.c: when doing a local clone, perform an extra check to see if "$GIT_DIR/objects" is a symlink, and if so, assume that that was the reason for the failure and report the relevant information. Ideally, dir_iterator_begin() would tell us that the real failure reason is the presence of the symlink, but (as far as I can tell) there isn't an appropriate errno value for that. Also, update Documentation/git-clone.txt to reflect that this restriction exists. Signed-off-by: Glen Choo <chooglen@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-04-11 08:46:09 -07:00
Andrei Rybak	fd72637423	t2024: fix loose/strict local base branch DWIM test Test 'loosely defined local base branch is reported correctly' in t2024-checkout-dwim.sh, which was introduced in [1], compares the output of two invocations of "git checkout", invoked with two different branches named "strict" and "loose". As per the description in [1], the test validates the output of tracking information for these two branches. This tracking information is printed to standard output: Your branch is behind 'main' by 1 commit, and can be fast-forwarded. (use "git pull" to update your local branch) The test assumes that the names of the two branches (strict and loose) are in that output, and pipes the output through sed to replace names of the branches with "BRANCHNAME". Command "git checkout", however, outputs the branch name to standard error, not standard output -- see message "Switched to branch '%s'\n" in function "update_refs_for_switch" in "builtin/checkout.c". This means that the two invocations of sed do nothing. Redirect both the standard output and the standard error of "git checkout" for these assertions. Ensure that compared files have the string "BRANCHNAME". In a series of piped commands, only the return code of the last command is used. Thus, all other commands will have their return codes masked. Avoid piping of output of git directly into sed to preserve the exit status code of "git checkout", while we're here. [1] `05e73682cd` (checkout: report upstream correctly even with loosely defined branch.*.merge, 2014-10-14) Signed-off-by: Andrei Rybak <rybak.a.v@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-04-10 10:11:23 -07:00
Phillip Wood	05106aa198	rebase: remove a couple of redundant strategy tests Remove a test in t3402 that has been redundant ever since `80ff47957b` (rebase: remember strategy and strategy options, 2011-02-06). That commit added a new test, the first part of which (as noted in the old commit message) duplicated an existing test. Also remove a test t3418 that has been redundant since the merge backend was removed in `68aa495b59` (rebase: implement --merge via the interactive machinery, 2018-12-11), since it now tests the same code paths as the preceding test. Helped-by: Elijah Newren <newren@gmail.com> Reviewed-by: Elijah Newren <newren@gmail.com> Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-04-10 09:53:19 -07:00
Phillip Wood	4960e5c7bd	rebase -m: fix serialization of strategy options To store the strategy options rebase prepends " --" to each one and writes them to a file. To load them it reads the file and passes the contents to split_cmdline(). This roughly mimics the behavior of the scripted rebase but has a couple of limitations: (1) options containing whitespace are not properly preserved (this is true of the scripted rebase as well) and (2) options containing '"' or '\' are incorrectly parsed and may cause the parser to return an error. Fix these limitations by quoting each option when they are stored so that they can be parsed correctly. Now that "--preserve-merges" no longer exists, this change also stops prepending "--" to the options when they are stored, as that was an artifact of the scripted rebase. These changes are backwards compatible so the files written by an older version of git can still be read. They are also forwards compatible: the file can still be parsed by recent versions of git as they treat the "--" prefix as optional. Reviewed-by: Elijah Newren <newren@gmail.com> Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-04-10 09:53:19 -07:00
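The serialization problem described in the strategy-options commit above can be reproduced in miniature with Python's `shlex` module, which splits text using shell-like quoting rules. This is only an analogy under stated assumptions: `shlex.split()` stands in for Git's `split_cmdline()`, and the helper names `store_naive`, `store_quoted`, and `load` are hypothetical.

```python
import shlex


def store_naive(opts):
    # Mimics the old format: " --" prepended to each option, no quoting.
    return "".join(" --" + o for o in opts)


def store_quoted(opts):
    # Quote each option so whitespace, '"' and '\' survive a round trip.
    return " ".join(shlex.quote(o) for o in opts)


def load(text):
    # Split like a command line, then drop an optional "--" prefix.
    return [w[2:] if w.startswith("--") else w for w in shlex.split(text)]
```

With the unquoted format, an option containing a space is split in two, and an option containing a lone `"` makes the splitter raise an error. The quoted format round-trips both, and `load()` still accepts the older "--"-prefixed text.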
Phillip Wood	4a8bc9860a	rebase -m: cleanup --strategy-option handling When handling "--strategy-option" rebase collects the commands into a struct string_list, then concatenates them into a string, prepending "--" to each one before splitting the string and removing the "--" prefix. This is an artifact of the scripted rebase and the need to support "rebase --preserve-merges". Now that "--preserve-merges" no longer exists we can clean up the way the argument is handled. The tests for a bad strategy option are adjusted now that parse_strategy_opts() is no longer called when starting a rebase. The fact that it only errors out when running "git rebase --continue" is a mixed blessing but the next commit will fix the root cause of the parsing problem so let's not worry about that here. Reviewed-by: Elijah Newren <newren@gmail.com> Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-04-10 09:53:19 -07:00
René Scharfe	8a7f0b666f	date: remove approxidate_relative() When `29f4332e66` (Quit passing 'now' to date code, 2019-09-11) removed its timeval parameter, approxidate_relative() became equivalent to approxidate(). Convert its last two call sites and remove the redundant function. Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-04-10 08:46:40 -07:00
René Scharfe	be39144954	userdiff: support regexec(3) with multi-byte support Since `1819ad327b` (grep: fix multibyte regex handling under macOS, 2022-08-26) we use the system library for all regular expression matching on macOS, not just for git grep. It supports multi-byte strings and rejects invalid multi-byte characters. This broke all built-in userdiff word regexes in UTF-8 locales because they all include such invalid bytes in expressions that are intended to match multi-byte characters without explicit support for that from the regex engine. "\|[^[:space:]]\|[\xc0-\xff][\x80-\xbf]+" is added to all built-in word regexes to match a single non-space or multi-byte character. The \xNN characters are invalid if interpreted as UTF-8 because they have their high bit set, which indicates they are part of a multi-byte character, but they are surrounded by single-byte characters. Replace that expression with "\|[^[:space:]]" if the regex engine supports multi-byte matching, as there is no need to have an explicit range for multi-byte characters then. Check for that capability at runtime, because it depends on the locale and thus on environment variables. Construct the full replacement expression at build time and just switch it in if necessary to avoid string manipulation and allocations at runtime. Additionally the word regex for tex contains the expression "[a-zA-Z0-9\x80-\xff]+" with a similarly invalid range. The best replacement with only valid characters that I can come up with is "([a-zA-Z0-9]\|[^\x01-\x7f])+". Unlike the original it matches NUL characters, though. Assuming that tex files usually don't contain NUL this should be acceptable. Reported-by: D. Ben Knoble <ben.knoble@gmail.com> Reported-by: Eric Sunshine <sunshine@sunshineco.com> Helped-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-04-07 07:38:09 -07:00
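To see what the bytewise fallback expression in the userdiff commit above is for, here is a small sketch using Python's `re` module on raw bytes. It is not Git's code and does not model the runtime capability check. One assumption matters: Python alternation is ordered (first match wins) rather than POSIX leftmost-longest, so the multi-byte alternative is listed first here. The names `WORD` and `tokens` are hypothetical.

```python
import re

# Ordered alternation: try a lead byte plus continuation bytes first,
# then fall back to any single non-space byte.
WORD = re.compile(rb"[\xc0-\xff][\x80-\xbf]+|[^\s]")


def tokens(text):
    """Split UTF-8 encoded text into per-character byte tokens, skipping spaces."""
    return [m.group() for m in WORD.finditer(text.encode("utf-8"))]
```

An engine that does not understand multi-byte characters can keep the bytes of "é" or "€" together this way. An engine that does understand them rejects the `\xNN` ranges as invalid, which is why the commit switches to the plain `[^[:space:]]` form in that case.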
Junio C Hamano	89833fc249	Merge branch 'ds/fetch-bundle-uri-with-all' "git fetch --all" does not have to download and handle the same bundleURI over and over, which has been corrected. * ds/fetch-bundle-uri-with-all: fetch: download bundles once, even with --all	2023-04-06 13:38:32 -07:00
Junio C Hamano	0b94009649	Merge branch 'jk/chainlint-fixes' Test framework fix. * jk/chainlint-fixes: tests: skip test_eval_ in internal chain-lint tests: drop here-doc check from internal chain-linter tests: diagnose unclosed here-doc in chainlint.pl tests: replace chainlint subshell with a function tests: run internal chain-linter under "make test"	2023-04-06 13:38:31 -07:00
Junio C Hamano	6047b28eb7	Merge branch 'en/header-split-cleanup' Split key function and data structure definitions out of cache.h to new header files and adjust the users. * en/header-split-cleanup: csum-file.h: remove unnecessary inclusion of cache.h write-or-die.h: move declarations for write-or-die.c functions from cache.h treewide: remove cache.h inclusion due to setup.h changes setup.h: move declarations for setup.c functions from cache.h treewide: remove cache.h inclusion due to environment.h changes environment.h: move declarations for environment.c functions from cache.h treewide: remove unnecessary includes of cache.h wrapper.h: move declarations for wrapper.c functions from cache.h path.h: move function declarations for path.c functions from cache.h cache.h: remove expand_user_path() abspath.h: move absolute path functions from cache.h environment: move comment_line_char from cache.h treewide: remove unnecessary cache.h inclusion from several sources treewide: remove unnecessary inclusion of gettext.h treewide: be explicit about dependence on gettext.h treewide: remove unnecessary cache.h inclusion from a few headers	2023-04-06 13:38:31 -07:00
Junio C Hamano	72871b198f	Merge branch 'ab/remove-implicit-use-of-the-repository' Code clean-up around the use of the_repository. * ab/remove-implicit-use-of-the-repository: libs: use "struct repository *" argument, not "the_repository" post-cocci: adjust comments for recent repo_ migration cocci: apply the "revision.h" part of "the_repository.pending" cocci: apply the "rerere.h" part of "the_repository.pending" cocci: apply the "refs.h" part of "the_repository.pending" cocci: apply the "promisor-remote.h" part of "the_repository.pending" cocci: apply the "packfile.h" part of "the_repository.pending" cocci: apply the "pretty.h" part of "the_repository.pending" cocci: apply the "object-store.h" part of "the_repository.pending" cocci: apply the "diff.h" part of "the_repository.pending" cocci: apply the "commit.h" part of "the_repository.pending" cocci: apply the "commit-reach.h" part of "the_repository.pending" cocci: apply the "cache.h" part of "the_repository.pending" cocci: add missing "the_repository" macros to "pending" cocci: sort "the_repository" rules by header cocci: fix incorrect & verbose "the_repository" rules cocci: remove dead rule from "the_repository.pending.cocci"	2023-04-06 13:38:30 -07:00
Junio C Hamano	06e9e726d4	Merge branch 'gc/config-parsing-cleanup' Config API clean-up to reduce its dependence on static variables * gc/config-parsing-cleanup: config.c: rename "struct config_source cf" config: report cached filenames in die_bad_number() config.c: remove current_parsing_scope config.c: remove current_config_kvi config.c: plumb the_reader through callbacks config.c: create config_reader and the_reader config.c: don't assign to "cf_global" directly config.c: plumb config_source through static fns	2023-04-06 13:38:29 -07:00
Junio C Hamano	87daf40750	Merge branch 'ab/config-multi-and-nonbool' Assorted config API updates. * ab/config-multi-and-nonbool: for-each-repo: with bad config, don't conflate <path> and <cmd> config API: add "string" version of _value_multi(), fix segfaults config API users: test for _get_value_multi() segfaults for-each-repo: error on bad --config config API: have _multi() return an "int" and take a "dest" versioncmp.c: refactor config reading next commit config API: add and use a "git_config_get()" family of functions config tests: add "NULL" tests for _get_value_multi() config tests: cover blind spots in git_die_config() tests	2023-04-06 13:38:29 -07:00
Junio C Hamano	955abf5f72	Merge branch 'jk/unused-post-2.40-part2' Code clean-up for "-Wunused-parameter" build. * jk/unused-post-2.40-part2: parse-options: drop parse_opt_unknown_cb() t/helper: mark unused argv/argc arguments mark "argv" as unused when we check argc builtins: mark unused prefix parameters builtins: annotate always-empty prefix parameters builtins: always pass prefix to parse_options() fast-import: fix file access when run from subdir	2023-04-06 13:38:28 -07:00
Junio C Hamano	119e82a515	Merge branch 'ps/ahead-behind-truncation-fix' Fix unnecessary truncation of generation numbers used in-core. * ps/ahead-behind-truncation-fix: commit-graph: fix truncated generation numbers	2023-04-06 13:38:27 -07:00
Junio C Hamano	7727da99df	Merge branch 'ds/ahead-behind' "git for-each-ref" learns '%(ahead-behind:<base>)' that computes the distances from a single reference point in the history with a bunch of commits in bulk. * ds/ahead-behind: commit-reach: add tips_reachable_from_bases() for-each-ref: add ahead-behind format atom commit-reach: implement ahead_behind() logic commit-graph: introduce `ensure_generations_valid()` commit-graph: return generation from memory commit-graph: simplify compute_generation_numbers() commit-graph: refactor compute_topological_levels() for-each-ref: explicitly test no matches for-each-ref: add --stdin option	2023-04-06 13:38:21 -07:00
Jeff King	c1917156a0	t/lib-httpd: pass PERL_PATH to CGI scripts As discussed in t/README, tests should aim to use PERL_PATH rather than straight "perl". We usually do this automatically with a "perl" function in test-lib.sh, but a few cases need to be handled specially. One such case is the apply-one-time-perl.sh CGI, which invokes plain "perl". It should be using $PERL_PATH, but to make that work, we must also instruct Apache to pass through the variable. Prior to this patch, doing: mv /usr/bin/perl /usr/bin/my-perl make PERL_PATH=/usr/bin/my-perl test would fail t5702, t5703, and t5616. After this it passes. This is a pretty extreme case, as even if you install perl elsewhere, you'd likely still have it in your $PATH. A more realistic case is that you don't want to use the perl in your $PATH (because it's older, broken, etc) and expect PERL_PATH to consistently override that (since that's what it's documented to do). Removing it completely is just a convenient way of completely breaking it for testing purposes. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-04-06 09:29:43 -07:00
Tao Klerks	42943b950e	mergetool: new config guiDefault supports auto-toggling gui by DISPLAY When no merge.tool or diff.tool is configured or manually selected, the selection of a default tool is sensitive to the DISPLAY variable; in a GUI session a gui-specific tool will be proposed if found, and otherwise a terminal-based one. This "GUI-optimizing" behavior is important because a GUI can make a huge difference to a user's ability to understand and correctly complete a non-trivial conflicting merge. Some time ago the merge.guitool and diff.guitool config options were introduced to enable users to configure both a GUI tool, and a non-GUI tool (with fallback if no GUI tool configured), in the same environment. Unfortunately, the --gui argument introduced to support the selection of the guitool is still explicit. When using configured tools, there is no equivalent of the no-tool-configured "propose a GUI tool if we are in a GUI environment" behavior. As proposed in <xmqqmtb8jsej.fsf@gitster.g>, introduce new configuration options, difftool.guiDefault and mergetool.guiDefault, supporting a special value "auto" which causes the corresponding tool or guitool to be selected depending on the presence of a non-empty DISPLAY value. Also support "true" to say "default to the guitool (unless --no-gui is passed on the commandline)", and "false" as the previous default behavior when these new configuration options are not specified. Signed-off-by: Tao Klerks <tao@klerks.biz> Acked-by: David Aguilar <davvid@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-04-05 21:03:29 -07:00
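The selection rule described in the `guiDefault` commit above can be summarized as a small decision function. This is a sketch of the behavior as the commit message words it, not the mergetool shell code. The function name `use_gui_tool` and its parameters are hypothetical, and it assumes that an explicit `--gui` or `--no-gui` on the command line always overrides the configured value.

```python
def use_gui_tool(gui_default="false", cli_gui=None, display=""):
    """Decide whether the guitool should be preferred.

    gui_default: value of difftool.guiDefault / mergetool.guiDefault
    cli_gui:     True for --gui, False for --no-gui, None if neither was given
    display:     value of the DISPLAY environment variable ("" if unset)
    """
    # Assumption: an explicit command-line flag overrides the configuration.
    if cli_gui is not None:
        return cli_gui
    if gui_default == "auto":
        # "auto": prefer the guitool only when DISPLAY is non-empty.
        return bool(display)
    if gui_default == "true":
        return True
    if gui_default == "false":
        # Previous default behavior: guitool only with an explicit --gui.
        return False
    raise ValueError(f"unsupported guiDefault value: {gui_default!r}")
```

For example, `use_gui_tool("auto", display=":0")` prefers the guitool, while the same call with an empty `display` falls back to the regular tool.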
Junio C Hamano	8b214c2e9d	clone: propagate object-format when cloning from void A user could prepare an empty repository and set it to use SHA256 as the object format. The new repository created by "git clone" from such a repository however would not record that it is expecting objects in the same SHA256 format. This works as expected if the source repository is not empty. Just like we started copying the name of the primary branch from the remote repository even if it is unborn in `3d8314f8` (clone: propagate empty remote HEAD even with other branches, 2022-07-07), lift the code that records the object format out of the block executed only when cloning from an instantiated repository, so that it works also when cloning from an empty repository. Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-04-05 14:17:00 -07:00
Junio C Hamano	45602dd029	Merge branch 'ar/test-cleanup-unused-file-creation' Test clean-up. * ar/test-cleanup-unused-file-creation: t1507: assert output of rev-parse t1404: don't create unused file t1400: assert output of update-ref t1302: don't create unused file t1010: don't create unused files t1006: assert error output of cat-file t1005: assert output of ls-files	2023-04-04 14:28:29 -07:00
Junio C Hamano	62df03c277	Merge branch 'jk/blame-contents-with-arbitrary-commit' "git blame --contents=<file> <rev> -- <path>" used to be forbidden, but now it finds the origins of lines starting at <file> contents through the history that leads to <rev>. * jk/blame-contents-with-arbitrary-commit: blame: allow --contents to work with non-HEAD commit	2023-04-04 14:28:28 -07:00
Junio C Hamano	6dd9d96129	Merge branch 'rs/archive-mtime' Test update. * rs/archive-mtime: t5000: use check_mtime()	2023-04-04 14:28:28 -07:00
Junio C Hamano	9142fce9b0	Merge branch 'ah/rebase-merges-config' Streamline --rebase-merges command line option handling and introduce rebase.merges configuration variable. * ah/rebase-merges-config: rebase: add a config option for --rebase-merges rebase: deprecate --rebase-merges="" rebase: add documentation and test for --no-rebase-merges	2023-04-04 14:28:28 -07:00
Junio C Hamano	7e13d654c2	Merge branch 'jk/fast-export-cleanup' Code clean-up. * jk/fast-export-cleanup: fast-export: drop unused parameter from anonymize_commit_message() fast-export: drop data parameter from anonymous generators fast-export: de-obfuscate --anonymize-map handling fast-export: factor out anonymized_entry creation fast-export: simplify initialization of anonymized hashmaps fast-export: drop const when storing anonymized values	2023-04-04 14:28:27 -07:00
Junio C Hamano	f315a8b609	Merge branch 'js/split-index-fixes' The index files can become corrupt under certain conditions when the split-index feature is in use, especially together with fsmonitor, which have been corrected. * js/split-index-fixes: unpack-trees: take care to propagate the split-index flag fsmonitor: avoid overriding `cache_changed` bits split-index; stop abusing the `base_oid` to strip the "link" extension split-index & fsmonitor: demonstrate a bug	2023-04-04 14:28:27 -07:00
Junio C Hamano	f834089925	Merge branch 'pw/wildmatch-fixes' The wildmatch library code unlearns exponential behaviour it acquired some time ago since it was borrowed from rsync. * pw/wildmatch-fixes: t3070: make chain lint tester happy wildmatch: hide internal return values wildmatch: avoid undefined behavior wildmatch: fix exponential behavior	2023-04-04 14:28:27 -07:00
Shuqi Liang	1a65b41b38	write-tree: integrate with sparse index Update 'git write-tree' to allow using the sparse-index in memory without expanding to a full one. The recursive algorithm for update_one() was already updated in `2de37c5` (cache-tree: integrate with sparse directory entries, 2021-03-03) to handle sparse directory entries in the index. Hence we can just set the requires-full-index to false for "write-tree". The `p2000` tests demonstrate a ~96% execution time reduction for 'git write-tree' using a sparse index: Test before after ----------------------------------------------------------------- 2000.78: git write-tree (full-v3) 0.34 0.33 -2.9% 2000.79: git write-tree (full-v4) 0.32 0.30 -6.3% 2000.80: git write-tree (sparse-v3) 0.47 0.02 -95.8% 2000.81: git write-tree (sparse-v4) 0.45 0.02 -95.6% Signed-off-by: Shuqi Liang <cheskaqiqi@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-04-04 12:50:54 -07:00
Junio C Hamano	e7dca80692	Merge branch 'ab/remove-implicit-use-of-the-repository' into en/header-split-cache-h * ab/remove-implicit-use-of-the-repository: libs: use "struct repository *" argument, not "the_repository" post-cocci: adjust comments for recent repo_ migration cocci: apply the "revision.h" part of "the_repository.pending" cocci: apply the "rerere.h" part of "the_repository.pending" cocci: apply the "refs.h" part of "the_repository.pending" cocci: apply the "promisor-remote.h" part of "the_repository.pending" cocci: apply the "packfile.h" part of "the_repository.pending" cocci: apply the "pretty.h" part of "the_repository.pending" cocci: apply the "object-store.h" part of "the_repository.pending" cocci: apply the "diff.h" part of "the_repository.pending" cocci: apply the "commit.h" part of "the_repository.pending" cocci: apply the "commit-reach.h" part of "the_repository.pending" cocci: apply the "cache.h" part of "the_repository.pending" cocci: add missing "the_repository" macros to "pending" cocci: sort "the_repository" rules by header cocci: fix incorrect & verbose "the_repository" rules cocci: remove dead rule from "the_repository.pending.cocci"	2023-04-04 08:25:52 -07:00
Raghul Nanth A	748b8d669a	describe: enable sparse index for describe git describe compares the index with the working tree when (and only when) it is run with the "--dirty" flag. This is done by the run_diff_index() function. The function has been made aware of the sparse-index in the series that led to `8d2c3732` (Merge branch 'ld/sparse-diff-blame', 2021-12-21). Hence we can just set the requires-full-index to false for "describe". Performance metrics Test HEAD~1 HEAD ------------------------------------------------------------------------------------------------- 2000.2: git describe --dirty (full-v3) 0.08(0.09+0.01) 0.08(0.06+0.03) +0.0% 2000.3: git describe --dirty (full-v4) 0.09(0.07+0.03) 0.08(0.05+0.04) -11.1% 2000.4: git describe --dirty (sparse-v3) 0.88(0.82+0.06) 0.02(0.01+0.05) -97.7% 2000.5: git describe --dirty (sparse-v4) 0.68(0.60+0.08) 0.02(0.02+0.04) -97.1% 2000.6: echo >>new && git describe --dirty (full-v3) 0.08(0.04+0.05) 0.08(0.05+0.04) +0.0% 2000.7: echo >>new && git describe --dirty (full-v4) 0.08(0.07+0.03) 0.08(0.05+0.04) +0.0% 2000.8: echo >>new && git describe --dirty (sparse-v3) 0.75(0.69+0.07) 0.02(0.03+0.03) -97.3% 2000.9: echo >>new && git describe --dirty (sparse-v4) 0.81(0.73+0.09) 0.02(0.01+0.05) -97.5% Signed-off-by: Raghul Nanth A <nanth.raghul@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-04-03 11:30:23 -07:00
Alex Henrie	f024913164	format-patch: correct documentation of --thread without an argument In Git, almost all command line flags unconditionally override the corresponding config option.[1] Add a test to confirm that this is the case for `git format-patch --thread`. [1] https://lore.kernel.org/git/CAMMLpeS3+NUQa2oqpHKVo3yWQNVMgkEXrs4U5_ggvk31yQbezQ@mail.gmail.com/ Signed-off-by: Alex Henrie <alexhenrie24@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-04-03 09:59:20 -07:00
Junio C Hamano	ba4324c4e1	e-mail workflow: Message-ID is spelled with ID in both capital letters We used to write "Message-Id:" and "Message-ID:" pretty much interchangeably, and the header name is defined to be case insensitive by the RFCs, but the canonical form "Message-ID:" is used throughout the RFC documents, so let's imitate it ourselves. Signed-off-by: Junio C Hamano <gitster@pobox.com> Reviewed-by: Elijah Newren <newren@gmail.com>	2023-04-03 08:55:43 -07:00
Junio C Hamano	290a973bb9	Merge branch 'ds/p2000-fix-grep-sparse' Fix perf test. * ds/p2000-fix-grep-sparse: p2000: remove stray '--sparse' flag from test	2023-03-31 17:50:23 -07:00
Junio C Hamano	0d865049f7	Merge branch 'ab/retire-scripted-add-p' Test fix. * ab/retire-scripted-add-p: t3701: we don't need no Perl for `add -i` anymore	2023-03-31 17:50:23 -07:00
Junio C Hamano	dd88a1af1a	Merge branch 'js/t5563-portability-fix' Test portability fix. * js/t5563-portability-fix: t5563: prevent "ambiguous redirect"	2023-03-31 17:50:23 -07:00
Andrei Rybak	1ec40a83a5	t2107: fix mention of the_index.cache_changed Commit [1] added a test to t2107-update-index-basic.sh with a comment that mentions macro "active_cache_changed". Later in [2], the macro was removed and its usage in function cmd_update_index in file builtin/update-index.c was replaced with "the_index.cache_changed". Fix the outdated comment in file t2107-update-index-basic.sh. [1] `fa137f67a4` (lockfile.c: store absolute path, 2014-11-02) [2] `dc594180d9` (cocci & cache.h: apply variable section of "pending" index-compatibility, 2022-11-19) Signed-off-by: Andrei Rybak <rybak.a.v@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-03-31 16:57:04 -07:00
Andrei Rybak	993d7085be	t3060: fix mention of function prune_index Commit [1] added tests which trigger function prune_cache. The comments in these tests, however, incorrectly call it "prune_path". Since then, function "prune_cache" has been renamed to "prune_index" in commit [2]. Later still in commit [3], the_index singleton, which is also mentioned in a comment, stopped being used directly with function "prune_index". Fix mentions of function "prune_index" and the struct it changes in comments in file "t3060-ls-files-with-tree.sh". [1] `54e1abce90` (Add test case for ls-files --with-tree, 2007-10-03) [2] `6510ae173a` (ls-files: convert prune_cache to take an index, 2017-06-12) [3] `188dce131f` (ls-files: use repository object, 2017-06-22) Signed-off-by: Andrei Rybak <rybak.a.v@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-03-31 16:57:03 -07:00
Derrick Stolee	25bccb4b79	fetch: download bundles once, even with --all When fetch.bundleURI is set, 'git fetch' downloads bundles from the given bundle URI before fetching from the specified remote. However, when using non-file remotes, 'git fetch --all' will launch 'git fetch' subprocesses which then read fetch.bundleURI and fetch the bundle list again. We do not expect the bundle list to have new information during these multiple runs, so avoid these extra calls by un-setting fetch.bundleURI in the subprocess arguments. Be careful to skip fetching bundles for the empty bundle string. Fetching bundles from the empty list presents some interesting test failures. Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-03-31 10:07:33 -07:00
Johannes Schindelin	92c7b3d473	t5563: prevent "ambiguous redirect" When I ran this test using `TEST_SHELL_PATH=/bin/bash` in my Ubuntu setup (where Bash is at version 5.0.17(1)-release), I was greeted with this error message: ./test-lib.sh: line 1072: $CHALLENGE: ambiguous redirect This commit fixes that error by quoting the `CHALLENGE` variable (which has as value a path containing spaces), and by avoiding to cuddle the empty string parameter in the `printf` call with the redirect character (in fact, the `printf ''>$CHALLENGE` is removed because the next line overwrites the file anyway because it _also_ uses a single `>` to redirect the output). Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-03-31 08:50:30 -07:00
Jeff King	cc48ddd937	tests: skip test_eval_ in internal chain-lint To check for broken &&-chains, we run "fail_117 && $1" as a test snippet, and check the exit code. We use test_eval_ to do so, because that's the way we run the actual test. But we don't need any of its niceties, like "set -x" tracing. In fact, they hinder us, because we have to explicitly disable them. So let's skip that and use "eval" more directly, which is simpler. I had hoped it would also be faster, but it doesn't seem to produce a measurable improvement (probably because it's just running internal shell commands, with no subshells or forks). Note that there is one gotcha: even though we don't intend to run any of the commands if the &&-chain is intact, an error like this: test_expect_success 'broken' ' # this next line breaks the &&-chain true # and then this one is executed even by the linter return 1 ' means we'll "return 1" from the eval, and thus from test_run_(). We actually do notice this in test_expect_success, but only by saying "hey, this test didn't say it was OK, so it must have failed", which is not right (it should say "broken &&-chain"). We can handle this by calling test_eval_inner_() instead, which is our trick for wrapping "return" in a test snippet. But to do that, we have to push the trace code out of that inner function and into test_eval_(). This is arguably where it belonged in the first place, but it never mattered because the "inner_" function had only one caller. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-03-30 13:07:29 -07:00
Jeff King	750b260411	tests: drop here-doc check from internal chain-linter Commit `99a64e4b73` (tests: lint for run-away here-doc, 2017-03-22) tweaked the chain-lint test to catch unclosed here-docs. It works by adding an extra "echo" command after the test snippet, and checking that it is run (if it gets swallowed by a here-doc, naturally it is not run). The downside here is that we introduced an extra $() substitution, which happens in a subshell. This has a measurable performance impact when run for many tests. The tradeoff in safety was undoubtedly worth it when `99a64e4b73` was written. But since the external chainlint.pl learned to find these recently, we can just rely on it. By switching back to a simpler chain-lint, hyperfine reports a measurable speedup on t3070 (which has 1800 tests): 'HEAD' ran 1.12 ± 0.01 times faster than 'HEAD~1' Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-03-30 13:07:29 -07:00
Eric Sunshine	2b61c8dc88	tests: diagnose unclosed here-doc in chainlint.pl An unclosed here-doc in a test is a problem, because it silently gobbles up any remaining commands. Since `99a64e4b73` (tests: lint for run-away here-doc, 2017-03-22) we detect this by piggy-backing on the internal chainlint checker in test-lib.sh. However, it would be nice to detect it in chainlint.pl, for a few reasons: - the output from chainlint.pl is much nicer; it can show the exact spot of the error, rather than a vague "somewhere in this test you broke the &&-chain or had a bad here-doc" message. - the implementation in test-lib.sh runs for each test snippet. And since it requires a subshell, the extra cost is small but not zero. If chainlint.pl can reliably find the problem, we can optimize the test-lib.sh code. The chainlint.pl code never intended to find here-doc problems. But since it has to parse them anyway (to avoid reporting problems inside here-docs), most of what we need is already there. We can detect the problem when we fail to find the missing end-tag in swallow_heredocs(). The extra change in scan_heredoc_tag() stores the location of the start of the here-doc, which lets us mark it as the source of the error in the output (see the new tests for examples). [jk: added commit message and tests] Signed-off-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-03-30 13:07:29 -07:00
Jeff King	1686de55fa	tests: replace chainlint subshell with a function To test that we don't break the &&-chain, test-lib.sh does something like: (exit 117) && $test_commands and checks that the result is exit code 117. We don't care what that initial command is, as long as it exits with a unique code. Using "exit" works and is simple, but is a bit expensive since it requires a subshell (to avoid exiting the whole script!). This isn't usually very noticeable, but it can add up for scripts which have a large number of tests. Using "return" naively won't work here, because we'd return from the function eval-ing the snippet (and it wouldn't find &&-chain breakages). But if we further push that into its own function, it does exactly what we want, without extra subshell overhead. According to hyperfine, this produces a measurable improvement when running t3070 (which has 1800 tests, all of them quite short): 'HEAD' ran 1.09 ± 0.01 times faster than 'HEAD~1' Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-03-30 13:07:29 -07:00
Jeff King	7b6555ab8d	tests: run internal chain-linter under "make test" Since `69b9924b87` (t/Makefile: teach `make test` and `make prove` to run chainlint.pl, 2022-09-01), we run a single chainlint.pl process for all scripts, and then instruct each individual script to run with the equivalent of --no-chain-lint, which tells them not to redundantly run the chainlint script themselves. However, this also disables the internal linter run within the shell by eval-ing "(exit 117) && $1" and confirming we get code 117. In theory the external linter produces a superset of complaints, and we don't need the internal one anymore. However, we know there is at least one case where they differ. A test like: test_expect_success 'should fail linter' ' false && sleep 2 & pid=$! && kill $pid ' is buggy (it ignores the failure from "false", because it is backgrounded along with the sleep). The internal linter catches this, but the external one doesn't (and teaching it to do so is complicated[1]). So not only does "make test" miss this problem, but it's doubly confusing because running the script standalone does complain. Let's teach the suppression in the Makefile to only turn off the external linter (which we know is redundant, as it was already run) and leave the internal one intact. I've used a new environment variable to do this here, and intentionally did not add a "--no-ext-chain-lint" option. This is an internal optimization used by the Makefile, and not something that ordinary users would need to tweak. [1] For discussion of chainlint.pl and this case, see: https://lore.kernel.org/git/CAPig+cQtLFX4PgXyyK_AAkCvg4Aw2RAC5MmLbib-aHHgTBcDuw@mail.gmail.com/ Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-03-30 13:07:29 -07:00
Jeff King	126e3b3d2a	t/helper: mark unused argv/argc arguments Many test helper programs do not bother to look at argc or argv, because they don't take any options. In a user-facing program, it's a good idea to check for unexpected arguments and complain. But for a test helper, it's not worth the trouble to enforce this. But we do want to tell the compiler we're OK with ignoring them, to silence -Wunused-parameter (and obviously we can't get rid of them, since we have to conform to the usual cmd__foo() interface). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-03-28 14:11:24 -07:00
Jeff King	9dc607f1c2	fast-import: fix file access when run from subdir In cmd_fast_import(), we ignore the "prefix" argument entirely, even though it tells us how we may have changed directory to the root of the repository earlier in the process. Which means that if you run it from a subdir and point to paths in the filesystem, like: cd subdir git fast-import --import-marks=foo <dump then it will look for "foo" in the root of the repository, not the current directory ("subdir/") which the user would have expected. We can fix this by recording the prefix and using it as appropriate whenever we open a file for reading or writing. I found each of these by looking for cases where we call fopen() within fast-import.c, so this should cover all cases. The new test triggers each one, as well as making sure we don't accidentally apply the prefix when --relative-marks is in use (since that option interprets some paths as relative to a specific directory). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-03-28 14:11:24 -07:00
Derrick Stolee	d52fcf493b	p2000: remove stray '--sparse' flag from test This argument was added in `7cae7627c4` (builtin/grep.c: integrate with sparse index, 2022-09-22), but it was a carry-over from an earlier version where the --sparse flag was added to the 'git grep' builtin. This argument does not exist, so currently the p2000-sparse-operations.sh performance test script fails when reaching this step. With this fix, the script works with these numbers for my copy of the Git source code repository: Test HEAD ------------------------------------------------------------ 2000.30: git grep --cached ... (full-v3) 0.34(1.20+0.14) 2000.31: git grep --cached ... (full-v4) 0.31(1.15+0.13) 2000.32: git grep --cached ... (sparse-v3) 0.26(1.13+0.12) 2000.33: git grep --cached ... (sparse-v4) 0.27(1.13+0.12) Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-03-28 13:25:52 -07:00
Glen Choo	e2016508e7	config: report cached filenames in die_bad_number() If, when parsing numbers from config, die_bad_number() is called, it reports the filename and config source type if we were parsing a config file, but not if we were iterating a config_set (it defaults to a less specific error message). Most call sites don't parse config files because config is typically read once and cached, so we only report filename and config source type in "git config --type" (since "git config" always parses config files). This could have been fixed when we taught the current_config_* functions to respect config_set values (`0d44a2dacc` (config: return configset value for current_config_ functions, 2016-05-26), but it was hard to spot then and we might have just missed it (I didn't find mention of die_bad_number() in the original ML discussion [1].) Fix this by refactoring the current_config_* functions into variants that don't BUG() when we aren't reading config, and using the resulting functions in die_bad_number(). "git config --get[-regexp] --type=int" cannot use the non-refactored version because it parses the int value _after_ parsing the config file, which would run into the BUG(). Since the refactored functions aren't public, they use "struct config_reader". 1. https://lore.kernel.org/git/20160518223712.GA18317@sigill.intra.peff.net/ Signed-off-by: Glen Choo <chooglen@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-03-28 13:03:27 -07:00
Junio C Hamano	f879501ad0	Merge branch 'jk/fix-proto-downgrade-to-v0' Transports that do not support protocol v2 did not correctly fall back to protocol v0 under certain conditions, which has been corrected. * jk/fix-proto-downgrade-to-v0: git_connect(): fix corner cases in downgrading v2 to v0	2023-03-28 10:51:52 -07:00
Junio C Hamano	8069aa01cd	Merge branch 'fc/oid-quietly-parse-upstream' "git rev-parse --quiet foo@{u}", or anything that asks @{u} to be parsed with GET_OID_QUIETLY option, did not quietly fail, which has been corrected. * fc/oid-quietly-parse-upstream: object-name: fix quiet @{u} parsing	2023-03-28 10:51:52 -07:00
Junio C Hamano	6041a13ec2	Merge branch 'fc/completion-colors-do-not-need-prompt-command' Lift the limitation that colored prompts can only be used with PROMPT_COMMAND mode. * fc/completion-colors-do-not-need-prompt-command: completion: prompt: use generic colors	2023-03-28 10:51:52 -07:00
Ævar Arnfjörð Bjarmason	3611f7467f	for-each-repo: with bad config, don't conflate <path> and <cmd> Fix a logic error in `4950b2a2b5` (for-each-repo: run subcommands on configured repos, 2020-09-11). Due to assuming that elements returned from the repo_config_get_value_multi() call wouldn't be "NULL" we'd conflate the <path> and <command> part of the argument list when running commands. As noted in the preceding commit the fix is to move to a safer "_string_multi()" version of the _multi() API. This change is separated from the rest because those all segfaulted. In this change we ended up with different behavior. When using the "--config=<config>" form we take each element of the list as a path to a repository. E.g. with a configuration like: [repo] list = /some/repo We would, with this command: git for-each-repo --config=repo.list status builtin Run a "git status" in /some/repo, as: git -C /some/repo status builtin I.e. ask "status" to report on the "builtin" directory. But since a configuration such as this would result in a "struct string_list *" with one element, whose "string" member is "NULL": [repo] list We would, when constructing our command-line in "builtin/for-each-repo.c"... strvec_pushl(&child.args, "-C", path, NULL); for (i = 0; i < argc; i++) strvec_push(&child.args, argv[i]); ...have that "path" be "NULL", and as strvec_pushl() stops when it sees NULL we'd end with the first "argv" element as the argument to the "-C" option, e.g.: git -C status builtin I.e. we'd run the command "builtin" in the "status" directory. In another context this might be an interesting security vulnerability, but I think that this amounts to a nothingburger on that front. A hypothetical attacker would need to be able to write config for the victim to run, if they're able to do that there's more interesting attack vectors. See the "safe.directory" facility added in `8d1a744820` (setup.c: create `safe.bareRepository`, 2022-07-14). 
An even more unlikely possibility would be an attacker able to generate the config used for "for-each-repo --config=<key>", but nothing else (e.g. an automated system producing that list). Even in that case the attack vector is limited to the user running commands whose name matches a directory that's interesting to the attacker (e.g. a "log" directory in a repository). The second argument (if any) of the command is likely to make git die without doing anything interesting (e.g. "-p" to "log", there being no "-p" built-in command to run). Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-03-28 07:37:53 -07:00
Ævar Arnfjörð Bjarmason	9e2d884d0f	config API: add "string" version of _value_multi(), fix segfaults Fix numerous and mostly long-standing segfaults in consumers of the _config_value_multi() API. As discussed in the preceding commit an empty key in the config syntax yields a "NULL" string, which these users would give to strcmp() (or similar), resulting in segfaults. As this change shows, most users of the _config_value_multi() API didn't really want such an unsafe and low-level API, let's give them something with the safety of git_config_get_string() instead. This fix is similar to what the _string() functions and others acquired in [1] and [2]. Namely introducing and using a safer "_get_string_multi()" variant of the low-level "_value_multi()" function. This fixes segfaults in code introduced in: - `d811c8e17c` (versionsort: support reorder prerelease suffixes, 2015-02-26) - `c026557a37` (versioncmp: generalize version sort suffix reordering, 2016-12-08) - `a086f921a7` (submodule: decouple url and submodule interest, 2017-03-17) - `a6be5e6764` (log: add log.excludeDecoration config option, 2020-04-16) - `92156291ca` (log: add default decoration filter, 2022-08-05) - `50a044f1e4` (gc: replace config subprocesses with API calls, 2022-09-27) There are now two users of the low-level API: - One in "builtin/for-each-repo.c", which we'll convert in a subsequent commit. - The "t/helper/test-config.c" code added in [3]. As seen in the preceding commit we need to give the "t/helper/test-config.c" caller these "NULL" entries. We could also alter the underlying git_configset_get_value_multi() function to be "string safe", but doing so would leave no room for other variants of "*_get_value_multi()" that coerce to other types. Such coercion can't be built on the string version, since as we've established "NULL" is a true value in the boolean context, but if we coerced it to "" for use in a list of strings it'll be subsequently coerced to "false" as a boolean. 
The callback pattern being used here will make it easy to introduce e.g. a "multi" variant which coerces its values to "bool", "int", "path" etc. 1. `40ea4ed903` (Add config_error_nonbool() helper function, 2008-02-11) 2. `6c47d0e8f3` (config.c: guard config parser from value=NULL, 2008-02-11). 3. `4c715ebb96` (test-config: add tests for the config_set API, 2014-07-28) Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-03-28 07:37:53 -07:00
Ævar Arnfjörð Bjarmason	1c7e239bd0	config API users: test for _get_value_multi() segfaults As we'll discuss in the subsequent commit these tests all show _get_value_multi() API users unable to handle there being a value-less key in the config, which is represented with a "NULL" for that entry in the "string" member of the returned "struct string_list", causing a segfault. These added tests exhaustively test for that issue, as we'll see in a subsequent commit we'll need to change all of the API users of *_get_value_multi(). These cases were discovered by triggering each one individually, and then adding these tests. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-03-28 07:37:53 -07:00
Ævar Arnfjörð Bjarmason	f7b2ff9516	for-each-repo: error on bad --config As noted in `6c62f01552` (for-each-repo: do nothing on empty config, 2021-01-08) this command wants to ignore a non-existing config key, but let's not conflate that with bad config. Before this, all these added tests would pass with an exit code of 0. We could preserve the comment added in `6c62f01552`, but now that we're directly using the documented repo_config_get_value_multi() value it's just narrating something that should be obvious from the API use, so let's drop it. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-03-28 07:37:53 -07:00
Ævar Arnfjörð Bjarmason	a428619309	config API: have _multi() return an "int" and take a "dest" Have the "git_configset_get_value_multi()" function and its siblings return an "int" and populate a "dest" parameter like every other git_configset_get_()" in the API. As we'll take advantage of in subsequent commits, this fixes a blind spot in the API where it wasn't possible to tell whether a list was empty from whether a config key existed. For now we don't make use of those new return values, but faithfully convert existing API users. Most of this is straightforward, commentary on cases that stand out: - To ensure that we'll properly use the return values of this function in the future we're using the "RESULT_MUST_BE_USED" macro introduced in [1]. As git_die_config() now has to handle this return value let's have it BUG() if it can't find the config entry. As tested for in a preceding commit we can rely on getting the config list in git_die_config(). - The loops after getting the "list" value in "builtin/gc.c" could also make use of "unsorted_string_list_has_string()" instead of using that loop, but let's leave that for now. - In "versioncmp.c" we now use the return value of the functions, instead of checking if the lists are still non-NULL. 1. `1e8697b5c4` (submodule--helper: check repo{_submodule,}_init() return values, 2022-09-01), Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-03-28 07:37:53 -07:00
Ævar Arnfjörð Bjarmason	b83efcecaf	config API: add and use a "git_config_get()" family of functions We already have the basic "git_config_get_value()" function and its "repo_" and "configset" siblings to get a given "key" and assign the last key found to a provided "value". But some callers don't care about that value, but just want to use the return value of the "get_value()" function to check whether the key exists (or another non-zero return value). The immediate motivation for this is that a subsequent commit will need to change all callers of the "_get_value_multi()" family of functions. In two cases here we (ab)used it to check whether we had any values for the given key, but didn't care about the return value. The rest of the callers here used various other config API functions to do the same, all of which resolved to the same underlying functions to provide the answer. Some of these were using either git_config_get_string() or git_config_get_string_tmp(), see `fe4c750fb1` (submodule--helper: fix a configure_added_submodule() leak, 2022-09-01) for a recent example. We can now use a helper function that doesn't require a throwaway variable. We could have changed git_configset_get_value_multi() (and then git_config_get_value() etc.) to accept a "NULL" as a "dest" for all callers, but let's avoid changing the behavior of existing API users. Having an "unused" value that we throw away internal to config.c is cheap. A "NULL as optional dest" pattern is also more fragile, as the intent of the caller might be misinterpreted if he were to accidentally pass "NULL", e.g. when "dest" is passed in from another function. Another name for this function could have been "_config_key_exists()", as suggested in [1]. That would work for all of these callers, and would currently be equivalent to this function, as the git_configset_get_value() API normalizes all non-zero return values to a "1". But adding that API would set us up to lose information, as e.g. 
if git_config_parse_key() in the underlying configset_find_element() fails we'd like to return -1, not 1. Let's change the underlying configset_find_element() function to support this use-case, we'll make further use of it in a subsequent commit where the git_configset_get_value_multi() function itself will expose this new return value. This still leaves various inconsistencies and clobbering or ignoring of the return value in place. E.g. here we're modifying configset_add_value(), but ever since it was added in [2] we've been ignoring its "int" return value, but as we're changing the configset_find_element() it uses, let's have it faithfully ferry that "ret" along. Let's also use the "RESULT_MUST_BE_USED" macro introduced in [3] to assert that we're checking the return value of configset_find_element(). We're leaving the same change to configset_add_value() for some future series. Once we start paying attention to its return value we'd need to ferry it up as deep as do_config_from(), and would need to make at least read_{,very_}early_config() and git_protected_config() return an "int" instead of "void". Let's leave that for now, and focus on the _get_*() functions. 1. `3c8687a73e` (add `config_set` API for caching config-like files, 2014-07-28) 2. https://lore.kernel.org/git/xmqqczadkq9f.fsf@gitster.g/ 3. `1e8697b5c4` (submodule--helper: check repo{_submodule,}_init() return values, 2022-09-01), Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-03-28 07:37:52 -07:00
Ævar Arnfjörð Bjarmason	e7587a8f53	config tests: add "NULL" tests for *_get_value_multi() A less well known edge case in the config format is that keys can be value-less, a shorthand syntax for "true" boolean keys. I.e. these two are equivalent as far as "--type=bool" is concerned: [a]key [a]key = true But as far as our parser is concerned the values for these two are NULL, and "true". I.e. for a sequence like: [a]key=x [a]key [a]key=y We get a "struct string_list" with "string" members with ".string" values of: { "x", NULL, "y" } This behavior goes back to the initial implementation of git_config_bool() in `17712991a5` (Add ".git/config" file parser, 2005-10-10). When parts of the config_set API were tested for in [1] they didn't add coverage for 3/4 of the "(NULL)" cases handled in "t/helper/test-config.c". We'd test that case for "get_value", but not "get_value_multi", "configset_get_value" and "configset_get_value_multi". We now cover all of those cases, which in turn expose the details of how this part of the config API works. 1. `4c715ebb96` (test-config: add tests for the config_set API, 2014-07-28) Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-03-28 07:37:52 -07:00
Ævar Arnfjörð Bjarmason	258902ce07	config tests: cover blind spots in git_die_config() tests There were no tests checking for the output of the git_die_config() function in the config API, added in `5a80e97c82` (config: add `git_die_config()` to the config-set API, 2014-08-07). We only tested "test_must_fail", but didn't assert the output. We need tests for this because a subsequent commit will alter the return value of git_config_get_value_multi(), which is used to get the config values in the git_die_config() function. This test coverage helps to build confidence in that subsequent change. These tests cover different interactions with git_die_config(): - The "notes.mergeStrategy" test in "t/t3309-notes-merge-auto-resolve.sh" is a case where a function outside of config.c (git_config_get_notes_strategy()) calls git_die_config(). - The "gc.pruneExpire" test in "t5304-prune.sh" is a case where git_config_get_expiry() calls git_die_config(), covering a different "type" than the "string" test for "notes.mergeStrategy". - The "fetch.negotiationAlgorithm" test in "t/t5552-skipping-fetch-negotiator.sh" is a case where git_config_get_string*() calls git_die_config(). We also cover both the "from command-line config" and "in file..at line" cases here. The clobbering of existing ".git/config" files here is so that we're not implicitly testing the line count of the default config. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-03-28 07:37:52 -07:00
Ævar Arnfjörð Bjarmason	bab821646a	cocci: apply the "pretty.h" part of "the_repository.pending" Apply the part of "the_repository.pending.cocci" pertaining to "pretty.h". Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-03-28 07:36:45 -07:00
Ævar Arnfjörð Bjarmason	ecb5091fd4	cocci: apply the "commit.h" part of "the_repository.pending" Apply the part of "the_repository.pending.cocci" pertaining to "commit.h". Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-03-28 07:36:45 -07:00
Ævar Arnfjörð Bjarmason	cb338c23d6	cocci: apply the "commit-reach.h" part of "the_repository.pending" Apply the part of "the_repository.pending.cocci" pertaining to "commit-reach.h". Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-03-28 07:36:36 -07:00
Ævar Arnfjörð Bjarmason	d850b7a545	cocci: apply the "cache.h" part of "the_repository.pending" Apply the part of "the_repository.pending.cocci" pertaining to "cache.h". Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-03-28 07:36:36 -07:00
Michael J Gruber	3dc0b7f0dc	t3070: make chain lint tester happy `1f2e05f0b7` ("wildmatch: fix exponential behavior", 2023-03-20) introduced a new test with a background process. Backgrounding necessarily gives a result of 0, so that a seemingly broken && chain is not really broken. Adjust t3070 slightly so that our chain lint test recognizes the construct for what it is and does not raise a false positive. Signed-off-by: Michael J Gruber <git@grubix.eu> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-03-27 17:02:38 -07:00

20884 Commits
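
The internal chain-lint check described in 7b6555ab8d and 1686de55fa can be sketched in a few lines of shell. This is only an illustration of the older "(exit 117) && $test_commands" form quoted in those messages, not the code in test-lib.sh; the helper name check_chain is hypothetical, and the real implementation now uses a function instead of a subshell for the initial failing command.

```shell
# Hypothetical helper (not from test-lib.sh): succeeds when the snippet's
# &&-chain is intact, fails when a command's failure would be ignored.
check_chain () {
	# Prefix the snippet with a command that fails with a distinctive
	# code. With an intact chain nothing after it runs, so the snippet
	# as a whole exits with 117. The outer subshell keeps the snippet
	# from affecting the caller.
	( eval "(exit 117) && $1" ) >/dev/null 2>&1
	test $? -eq 117
}

# Intact chain: every command is joined with &&.
check_chain 'true && echo done' && echo "intact"

# Broken chain: the second line runs regardless of the first.
check_chain 'true
echo done' || echo "broken"
```

Because an intact chain short-circuits on the first command, none of the snippet's commands actually execute during the check; that is why the messages above call 117 a "unique code" rather than caring what the initial command is.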