Explicitly clone over the "file://" protocol in t1092 in preparation
for merging a security release which will change the default value of
`protocol.file.allow` to "user".
Signed-off-by: Taylor Blau <me@ttaylorr.com>
The split_cmdline() function improperly uses an int to represent the number of entries
in the resulting argument array. This allows a malicious actor to
intentionally overflow the return value, leading to arbitrary heap
writes.
Because the resulting argv array is typically passed to execv(), it may
be possible to leverage this attack to gain remote code execution on a
victim machine. This was almost certainly the case for certain
configurations of git-shell until the previous commit limited the size
of input it would accept. Other calls to split_cmdline() are typically
limited by the size of argv the OS is willing to hand us, so are
similarly protected.
So this is not strictly fixing a known vulnerability, but is a hardening
of the function that is worth doing to protect against possible unknown
vulnerabilities.
One approach to fixing this would be modifying the signature of
`split_cmdline()` to look something like:
int split_cmdline(char *cmdline, const char ***argv, size_t *argc);
Where the return value of `split_cmdline()` is negative for errors, and
zero otherwise. If non-NULL, the `*argc` pointer is modified to contain
the size of the `**argv` array.
But this implies an absurdly large `argv` array, more than likely
larger than the system's argument limit. So even if split_cmdline()
allowed this, it would fail immediately afterwards when we called
execv(). So instead of converting all of `split_cmdline()`'s callers
to work with `size_t` types in this patch, pursue the minimal fix
here: prevent ever returning an array with more than INT_MAX entries
in it.
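A minimal sketch of that guard (hypothetical code, not Git's actual
parser, which also handles quoting): count entries with a size_t
internally, and refuse to return an array whose length would not fit
in the int return type.

    #include <ctype.h>
    #include <limits.h>
    #include <stdlib.h>

    /* Whitespace-only splitter; the point is the INT_MAX guard. */
    static int split_cmdline_sketch(char *cmdline, const char ***argv)
    {
        size_t count = 0, alloc = 0;
        const char **args = NULL;
        char *p = cmdline;

        while (*p) {
            while (*p && isspace((unsigned char)*p))
                *p++ = '\0';
            if (!*p)
                break;
            if (count >= INT_MAX) {
                free(args);
                return -1; /* too many entries to represent as int */
            }
            if (count == alloc) {
                const char **bigger;

                alloc = alloc ? 2 * alloc : 16;
                bigger = realloc(args, alloc * sizeof(*args));
                if (!bigger) {
                    free(args);
                    return -1;
                }
                args = bigger;
            }
            args[count++] = p;
            while (*p && !isspace((unsigned char)*p))
                p++;
        }
        *argv = args;
        return (int)count; /* guaranteed to fit by the guard above */
    }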
Signed-off-by: Kevin Backhouse <kevinbackhouse@github.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
When git-shell is run in interactive mode (which must be enabled by
creating $HOME/git-shell-commands), it reads commands from stdin, one
per line, and executes them.
We read the commands with git_read_line_interactively(), which uses a
strbuf under the hood. That means we'll accept an input of arbitrary
size (limited only by how much heap we can allocate). That creates two
problems:
- the rest of the code is not prepared to handle large inputs. The
most serious issue here is that split_cmdline() uses "int" for most
of its types, which can lead to integer overflow and out-of-bounds
array reads and writes. But even with that fixed, we assume that we
can feed the command name to snprintf() (via xstrfmt()), which is
stuck for historical reasons using "int", and causes it to fail (and
even trigger a BUG() call).
- since the point of git-shell is to take input from untrusted or
semi-trusted clients, it's a mild denial-of-service. We'll allocate
as many bytes as the client sends us (actually twice as many, since
we immediately duplicate the buffer).
We can fix both by just limiting the amount of per-command input we're
willing to receive.
We should also fix split_cmdline(), of course, which is an accident
waiting to happen, but that can come on top. Most calls to
split_cmdline(), including the other one in git-shell, are OK because
they are reading from an OS-provided argv, which is limited in practice.
This patch should eliminate the immediate vulnerabilities.
I picked 4MB as an arbitrary limit. It's big enough that nobody should
ever run into it in practice (since the point is to run the commands via
exec, we're subject to OS limits which are typically much lower). But
it's small enough that allocating it isn't that big a deal.
The code is mostly just swapping out the strbuf call for fgets(), but we
have to add a few niceties like flushing and trimming line endings. We
could simplify things further by putting the buffer on the stack, but
4MB is probably a bit much there. Note that we'll _always_ allocate 4MB,
which for normal, non-malicious requests is more than we would before
this patch. But on the other hand, other git programs are happy to use
96MB for a delta cache. And since we'd never touch most of those pages,
on a lazy-allocating OS like Linux they won't even get allocated to
actual RAM.
The ideal would be a version of strbuf_getline() that accepted a maximum
value. But for a minimal vulnerability fix, let's keep things localized
and simple. We can always refactor further on top.
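A self-contained approximation of the approach (hypothetical code; the
actual patch differs in detail): fgets() into a fixed-size buffer, trim
the newline, and treat a line that fills the buffer without finding a
newline as over the limit.

    #include <stdio.h>
    #include <string.h>

    #define MAX_CMDLINE (4 * 1024 * 1024) /* arbitrary cap, per above */

    /*
     * Read one interactive command of at most MAX_CMDLINE bytes into
     * buf (a 4MB heap allocation made once at startup). Returns NULL
     * on EOF or oversized input. git-shell's real loop also prints a
     * prompt and dispatches the command.
     */
    static char *read_command(char *buf, FILE *in)
    {
        size_t len;

        fflush(stdout); /* flush any prompt before blocking on stdin */
        if (!fgets(buf, MAX_CMDLINE, in))
            return NULL;
        len = strlen(buf);
        if (len && buf[len - 1] == '\n')
            buf[len - 1] = '\0';    /* trim the line ending */
        else if (len == MAX_CMDLINE - 1)
            return NULL;            /* line exceeded the limit */
        return buf;
    }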
The included test fails in an obvious way with ASan or UBSan (which
notice the integer overflow and out-of-bounds reads). Without them, it
fails in a less obvious way: we may segfault, or we may try to xstrfmt()
a long string, leading to a BUG(). Either way, it fails reliably before
this patch, and passes with it. Note that we don't need an EXPENSIVE
prereq on it. It does take 10-15s to fail before this patch, but with
the new limit, we fail almost immediately (and the perl process
generating 2GB of data exits via SIGPIPE).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
We have no tests of even basic functionality of git-shell. Let's add a
couple of obvious ones. This will serve as a framework for adding tests
for new things we fix, as well as making sure we don't screw anything up
too badly while doing so.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
An earlier patch discussed and fixed a scenario where Git could be used
as a vector to exfiltrate sensitive data through a Docker container when
a potential victim clones a suspicious repository with local submodules
that contain symlinks.
That security hole has since been plugged, but a similar one still
exists. Instead of convincing a would-be victim to clone an embedded
submodule via the "file" protocol, an attacker could convince an
individual to clone a repository that has a submodule pointing to a
valid path on the victim's filesystem.
For example, if an individual (with username "foo") has their home
directory ("/home/foo") stored as a Git repository, then an attacker
could exfiltrate data by convincing a victim to clone a malicious
repository containing a submodule pointing at "/home/foo/.git" with
`--recurse-submodules`. Doing so would expose any sensitive contents
stored in "/home/foo" and tracked in Git.
For systems (such as Docker) that consider everything outside of the
immediate top-level working directory containing a Dockerfile as
inaccessible to the container (with the exception of volume mounts, and
so on), this is a violation of trust by exposing unexpected contents in
the working copy.
To mitigate the likelihood of this kind of attack, adjust the "file://"
protocol's default policy to be "user" to prevent commands that execute
without user input (including recursive submodule initialization) from
taking place by default.
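In sketch form, the policy decision looks something like this (the
names below are hypothetical illustrations, not Git's actual
protocol-allow code): an unconfigured "file" transport now falls back
to user-only rather than always-allowed.

    #include <string.h>

    enum protocol_allow {
        PROTOCOL_ALLOW_NEVER,
        PROTOCOL_ALLOW_USER_ONLY, /* only direct, user-initiated commands */
        PROTOCOL_ALLOW_ALWAYS,
    };

    /* hypothetical: "configured" is protocol.<name>.allow, if set */
    static enum protocol_allow protocol_policy(const char *name,
                                               const int *configured)
    {
        if (configured)
            return (enum protocol_allow)*configured;
        if (!strcmp(name, "file"))
            return PROTOCOL_ALLOW_USER_ONLY; /* the hardened default */
        return PROTOCOL_ALLOW_ALWAYS; /* other known-safe transports */
    }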
Suggested-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
To prepare for the default value of `protocol.file.allow` to change to
"user", ensure tests that rely on local submodules can initialize them
over the file protocol.
Tests that interact with submodules a handful of times use
`test_config_global`.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
To prepare for the default value of `protocol.file.allow` to change to
"user", ensure tests that rely on local submodules can initialize them
over the file protocol.
Tests that only need to interact with submodules in a limited capacity
have individual Git commands annotated with the appropriate
configuration via `-c`. Tests that interact with submodules a handful of
times use `test_config_global` instead. Test scripts that rely on
submodules throughout use a `git config --global` during a setup test
towards the beginning of the script.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
To prepare for the default value of `protocol.file.allow` to change to
"user", ensure tests that rely on local submodules can initialize them
over the file protocol.
Tests that only need to interact with submodules in a limited capacity
have individual Git commands annotated with the appropriate
configuration via `-c`.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
To prepare for the default value of `protocol.file.allow` to change to
"user", ensure tests that rely on local submodules can initialize them
over the file protocol.
Tests that only need to interact with submodules in a limited capacity
have individual Git commands annotated with the appropriate
configuration via `-c`. Tests that interact with submodules a handful of
times use `test_config_global` instead.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
To prepare for changing the default value of `protocol.file.allow` to
"user", update the `prolog()` function in lib-submodule-update to allow
submodules to be cloned over the file protocol.
This is used by a handful of submodule-related test scripts, which
themselves will have to tweak the value of `protocol.file.allow` in
certain locations. Those will be done in subsequent commits.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
When cloning a repository with `--local`, Git relies on either making a
hardlink or copy to every file in the "objects" directory of the source
repository. This is done through the call path `cmd_clone()` ->
`clone_local()` -> `copy_or_link_directory()`.
The way this optimization works is by enumerating every file and
directory recursively in the source repository's `$GIT_DIR/objects`
directory, and then either making a copy or hardlink of each file. The
only exception to this rule is when copying the "alternates" file, in
which case paths are rewritten to be absolute before writing a new
"alternates" file in the destination repo.
One quirk of this implementation is that it dereferences symlinks when
cloning. This behavior was most recently modified in 36596fd2df (clone:
better handle symlinked files at .git/objects/, 2019-07-10), which
attempted to support `--local` clones of repositories with symlinks in
their objects directory in a platform-independent way.
Unfortunately, this behavior of dereferencing symlinks (that is,
creating a hardlink or copy of the source's link target in the
destination repository) can be used as a component in attacking a
victim by inadvertently exposing the contents of files stored outside of
the repository.
Take, for example, a repository that stores a Dockerfile and is used to
build Docker images. When building an image, Docker copies the directory
contents into the VM, and then instructs the VM to execute the
Dockerfile at the root of the copied directory. This protects against
directory traversal attacks by copying symbolic links as-is without
dereferencing them.
That is, if a user has a symlink pointing at their private key material
(where the symlink is present in the same directory as the Dockerfile,
but the key itself is present outside of that directory), the key is
unreadable to a Docker image, since the link will appear broken from the
container's point of view.
This behavior enables an attack whereby a victim is convinced to clone a
repository containing an embedded submodule (with a URL like
"file:///proc/self/cwd/path/to/submodule") which has a symlink pointing
at a path containing sensitive information on the victim's machine. If a
user is tricked into doing this, the contents at the destination of
those symbolic links are exposed to the Docker image at runtime.
One approach to preventing this behavior is to recreate symlinks in the
destination repository. But this is problematic, since symlinks in the
objects directory are not well supported. (One potential problem is that
when sharing, e.g. a "pack" directory via symlinks, different writers
performing garbage collection may consider different sets of objects to
be reachable, enabling a situation whereby garbage collecting one
repository may remove reachable objects in another repository).
Instead, prohibit the local clone optimization when any symlinks are
present in the `$GIT_DIR/objects` directory of the source repository.
Users may clone the repository again by prepending the "file://" scheme
to their clone URL, or by adding the `--no-local` option to their `git
clone` invocation.
The directory iterator used by `copy_or_link_directory()` must no longer
dereference symlinks (i.e., it *must* call `lstat()` instead of `stat()`
in order to discover whether or not there are symlinks present). This
has no bearing on the overall behavior, since we will immediately
`die()` upon encountering a symlink.
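The shape of that check, as a hedged sketch over a plain POSIX
directory walk rather than Git's dir-iterator API: lstat() reports a
symlink as a symlink instead of silently following it to its target.

    #include <dirent.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <sys/stat.h>

    static void check_no_symlinks(const char *dir)
    {
        DIR *d = opendir(dir);
        struct dirent *e;

        if (!d)
            return;
        while ((e = readdir(d)) != NULL) {
            char path[4096];
            struct stat st;

            if (!strcmp(e->d_name, ".") || !strcmp(e->d_name, ".."))
                continue;
            snprintf(path, sizeof(path), "%s/%s", dir, e->d_name);
            if (lstat(path, &st) < 0) /* lstat(), not stat(): don't follow */
                continue;
            if (S_ISLNK(st.st_mode)) {
                fprintf(stderr, "fatal: '%s' is a symlink; "
                        "clone with --no-local or a file:// URL\n", path);
                exit(128);
            }
            if (S_ISDIR(st.st_mode))
                check_no_symlinks(path); /* recurse into subdirectories */
        }
        closedir(d);
    }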
Note that t5604.33 suggests that we do support local clones with
symbolic links in the source repository's objects directory, but this
was likely unintentional, or at least did not take into consideration
the problem with sharing parts of the objects directory with symbolic
links at the time. Update this test to reflect which options are and
aren't supported.
Helped-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
As the 'master' front will soon tag a preview and then release
candidates for 2.38, it is unknown if we are going to issue another
maintenance release on the 2.37.x track, but as we have accumulated
enough material there, let's prepare a draft for it.
Even if we end up not tagging 2.37.4, it would help motivated distro
packagers to maintain their slightly older and "more stable" versions.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The auto-stashed local changes created by "git merge --autostash"
were mixed into a conflicted state left in the working tree, which
has been corrected.
* en/merge-unstash-only-on-clean-merge:
merge: only apply autostash when appropriate
Update the version of Ubuntu used for GitHub Actions CI from 18.04
to 22.04.
* ds/github-actions-use-newer-ubuntu:
ci: update 'static-analysis' to Ubuntu 22.04
The preload-index codepath made copies of the pathspec to give to
multiple threads, which were then leaked.
* ad/preload-plug-memleak:
preload-index: fix memleak
xcalloc(), imitating calloc(), takes "number of elements of the
array", and "size of a single element", in this order. A call that
does not follow this ordering has been corrected.
* sg/xcalloc-cocci-fix:
promisor-remote: fix xcalloc() argument order
Fix deadlocks between the main Git process and a subprocess spawned
via the pipe_command() API, which could hang "git add -p" (recently
reimplemented in C).
* jk/pipe-command-nonblock:
pipe_command(): mark stdin descriptor as non-blocking
pipe_command(): handle ENOSPC when writing to a pipe
pipe_command(): avoid xwrite() for writing to pipe
git-compat-util: make MAX_IO_SIZE define globally available
nonblock: support Windows
compat: add function to enable nonblocking pipes
An earlier optimization discarded a tree-object buffer that is
still in use, which has been corrected.
* jk/is-promisor-object-keep-tree-in-use:
is_promisor_object(): fix use-after-free of tree buffer
Documentation for "git add --renormalize" has been improved.
source: <20220810144450.470-2-philipoakley@iee.email>
* po/doc-add-renormalize:
doc add: renormalize is not idempotent for CRCRLF
Fixes to sparse index compatibility work for "reset" and "checkout"
commands.
source: <pull.1312.v3.git.1659985672.gitgitgadget@gmail.com>
* vd/sparse-reset-checkout-fixes:
unpack-trees: unpack new trees as sparse directories
cache.h: create 'index_name_pos_sparse()'
oneway_diff: handle removed sparse directories
checkout: fix nested sparse directory diff in sparse index
"git fsck" reads mode from tree objects but canonicalizes the mode
before passing it to the logic to check object sanity, which has
hid broken tree objects from the checking logic. This has been
corrected, but to help exiting projects with broken tree objects
that they cannot fix retroactively, the severity of anomalies this
code detects has been demoted to "info" for now.
source: <YvQcNpizy9uOZiAz@coredump.intra.peff.net>
* jk/fsck-tree-mode-bits-fix:
fsck: downgrade tree badFilemode to "info"
fsck: actually detect bad file modes in trees
tree-walk: add a mechanism for getting non-canonicalized modes
Platform-specific code that determines if a directory is OK to use
as a repository has been taught to report more details, especially
on Windows.
source: <pull.1286.v2.git.1659965270.gitgitgadget@gmail.com>
* js/safe-directory-plus:
mingw: handle a file owned by the Administrators group correctly
mingw: be more informative when ownership check fails on FAT32
mingw: provide details about unsafe directories' ownership
setup: prepare for more detailed "dubious ownership" messages
setup: fix some formatting
Avoid repeatedly running getconf to ask for the libc version in the
test suite, and instead just ask it once per script.
source: <pull.1311.git.1659620305757.gitgitgadget@gmail.com>
* pw/use-glibc-tunable-for-malloc-optim:
tests: cache glibc version check
A follow-up fix to a fix for a regression in 2.36.
source: <patch-1.1-2450e3e65cf-20220805T141402Z-avarab@gmail.com>
* ab/hooks-regression-fix:
hook API: don't segfault on strbuf_addf() to NULL "out"
Plug memory leaks in the failure code path in the "merge-ort" merge
strategy backend.
source: <pull.1307.v2.git.1659114727.gitgitgadget@gmail.com>
* js/ort-clean-up-after-failed-merge:
merge-ort: do leave trace2 region even if checkout fails
merge-ort: clean up after failed merge
Older gcc with -Wall complains about the universal zero initializer
"struct s = { 0 };" idiom, which makes developers' lives
inconvenient (as -Werror is enabled by DEVELOPER=YesPlease). The
build procedure has been tweaked to help these compilers.
source: <YuQ60ZUPBHAVETD7@coredump.intra.peff.net>
* jk/struct-zero-init-with-older-gcc:
config.mak.dev: squelch -Wno-missing-braces for older gcc
Conditionally allow building Python interpreter on Windows
source: <pull.1306.v2.git.1659109272.gitgitgadget@gmail.com>
* js/mingw-with-python:
mingw: remove unneeded `NO_CURL` directive
mingw: remove unneeded `NO_GETTEXT` directive
windows: include the Python bits when building Git for Windows
Fix build procedure for Windows that uses CMake so that it can pick
up the shell interpreter from local installation location.
source: <pull.1304.git.1658912756815.gitgitgadget@gmail.com>
* ca/unignore-local-installation-on-windows:
cmake: support local installations of git
GitHub Actions scheduled a brownout of Ubuntu 18.04, which canceled all
runs of the 'static-analysis' job in our CI runs. Update to 22.04 to
avoid this as the brownout later turns into a complete deprecation.
The use of 18.04 was set in d051ed77ee (.github/workflows/main.yml: run
static-analysis on bionic, 2021-02-08) due to the lack of Coccinelle
being available on 20.04 (which continues today).
Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If a merge failed and we are leaving conflicts in the working directory
for the user to resolve, we should not attempt to apply any autostash.
Further, if we fail to apply the autostash (because either the merge
failed, or the user requested --no-commit), then we should instruct the
user how to apply it later.
Add a testcase verifying we have corrected this behavior.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Pass the number of elements first and their size second, as expected
by xcalloc().
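For illustration, here is the convention with plain calloc(), which
Git's xcalloc() wraps with the same argument order:

    #include <stdlib.h>

    int main(void)
    {
        int *a;

        a = calloc(16, sizeof(*a)); /* correct: count, then element size */
        /*
         * calloc(sizeof(*a), 16) allocates the same number of bytes,
         * but breaks the convention that lets readers and tools reason
         * about "count" and "element size" separately.
         */
        free(a);
        return 0;
    }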
Patch generated with:
make SPATCH_FLAGS=--recursive-includes contrib/coccinelle/xcalloc.cocci.patch
Our default SPATCH_FLAGS ('--all-includes') doesn't catch this
transformation by default, unless used in combination with a large-ish
SPATCH_BATCH_SIZE which happens to put 'promisor-remote.c' with a file
that includes 'repository.h' directly in the same batch.
Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Fix a memory leak occurring when the pathspec is copied in
preload_index().
Direct leak of 8 byte(s) in 8 object(s) allocated from:
#0 0x7f0a353ead47 in __interceptor_malloc (/usr/lib/gcc/x86_64-pc-linux-gnu/11.3.0/libasan.so.6+0xb5d47)
#1 0x55750995e840 in do_xmalloc /home/anthony/src/c/git/wrapper.c:51
#2 0x55750995e840 in xmalloc /home/anthony/src/c/git/wrapper.c:72
#3 0x55750970f824 in copy_pathspec /home/anthony/src/c/git/pathspec.c:684
#4 0x557509717278 in preload_index /home/anthony/src/c/git/preload-index.c:135
#5 0x55750975f21e in refresh_index /home/anthony/src/c/git/read-cache.c:1633
#6 0x55750915b926 in cmd_status builtin/commit.c:1547
#7 0x5575090e1680 in run_builtin /home/anthony/src/c/git/git.c:466
#8 0x5575090e1680 in handle_builtin /home/anthony/src/c/git/git.c:720
#9 0x5575090e284a in run_argv /home/anthony/src/c/git/git.c:787
#10 0x5575090e284a in cmd_main /home/anthony/src/c/git/git.c:920
#11 0x5575090dbf82 in main /home/anthony/src/c/git/common-main.c:56
#12 0x7f0a348230ab (/lib64/libc.so.6+0x290ab)
Signed-off-by: Anthony Delannoy <anthony.2lannoy@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Our pipe_command() helper lets you both write to and read from a child
process on its stdin/stdout. It's supposed to work without deadlocks
because we use poll() to check when descriptors are ready for reading or
writing. But there's a bug: if both the data to be written and the data
to be read back exceed the pipe buffer, we'll deadlock.
The issue is that the code assumes that if you have, say, a 2MB buffer
to write and poll() tells you that the pipe descriptor is ready for
writing, that calling:
write(cmd->in, buf, 2*1024*1024);
will do a partial write, filling the pipe buffer and then returning what
it did write. And that is what it would do on a socket, but not for a
pipe. When writing to a pipe, at least on Linux, it will block waiting
for the child process to read() more. And now we have a potential
deadlock, because the child may be writing back to us, waiting for us to
read() ourselves.
An easy way to trigger this is:
git -c add.interactive.useBuiltin=true \
-c interactive.diffFilter=cat \
checkout -p HEAD~200
The diff against HEAD~200 will be big, and the filter wants to write all
of it back to us (obviously this is a dummy filter, but in the real
world something like diff-highlight would similarly stream back a big
output).
If you set add.interactive.useBuiltin to false, the problem goes away,
because now we're not using pipe_command() anymore (instead, that part
happens in perl). But this isn't a bug in the interactive code at all.
It's the underlying pipe_command() code which is broken, and has been
all along.
We presumably didn't notice because most calls only do input _or_
output, not both. And the few that do both, like gpg calls, may have
large inputs or outputs, but never both at the same time (e.g., consider
signing, which has a large payload but a small signature comes back).
The obvious fix is to put the descriptor into non-blocking mode, and
indeed, that makes the problem go away. Callers shouldn't need to
care, because they never see the descriptor (they hand us a buffer to
feed into it).
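A sketch of the resulting pattern (a hypothetical helper, not
pipe_command() verbatim, assuming the descriptor feeding the child's
stdin has already been made non-blocking): a full pipe now yields a
short write or EAGAIN instead of blocking, and control returns to
poll(), where we can keep draining the child's output.

    #include <errno.h>
    #include <poll.h>
    #include <unistd.h>

    /* assumes "out" is large enough to hold the child's full output */
    static int pump(int in_fd, const char *buf, size_t len,
                    int out_fd, char *out, size_t outcap, size_t *outlen)
    {
        size_t written = 0;

        *outlen = 0;
        for (;;) {
            struct pollfd pfd[2];
            nfds_t n = 0;
            int in_idx = -1, out_idx;

            if (in_fd >= 0) {
                pfd[n].fd = in_fd;
                pfd[n].events = POLLOUT;
                in_idx = (int)n++;
            }
            pfd[n].fd = out_fd;
            pfd[n].events = POLLIN;
            out_idx = (int)n++;

            if (poll(pfd, n, -1) < 0)
                return -1;

            if (in_idx >= 0 && (pfd[in_idx].revents & (POLLOUT | POLLERR))) {
                ssize_t w = write(in_fd, buf + written, len - written);

                if (w < 0 && errno != EAGAIN && errno != EWOULDBLOCK)
                    return -1;  /* a real error, not just a full pipe */
                if (w > 0)
                    written += w;
                if (written == len) {
                    close(in_fd);   /* EOF lets the child finish up */
                    in_fd = -1;
                }
            }
            if (pfd[out_idx].revents & (POLLIN | POLLHUP)) {
                ssize_t r = read(out_fd, out + *outlen, outcap - *outlen);

                if (r < 0)
                    return -1;
                if (!r)
                    return 0;   /* child closed its end; we're done */
                *outlen += r;
            }
        }
    }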
The included test fails reliably on Linux without this patch. Curiously,
it doesn't fail in our Windows CI environment, but has been reported to
do so for individual developers. It should pass in any environment after
this patch (courtesy of the compat/ layers added in the last few
commits).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When write() to a non-blocking pipe fails because the buffer is full,
POSIX says we should see EAGAIN. But our mingw_write() compat layer on
Windows actually returns ENOSPC for this case. This is probably
something we want to correct, but given that we don't plan to use
non-blocking descriptors in a lot of places, we can work around it by
just catching ENOSPC alongside EAGAIN. If we ever do fix mingw_write(),
then this patch can be reverted.
We don't actually use a non-blocking pipe yet, so this is still just
preparation.
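The workaround amounts to something like this sketch (a hypothetical
wrapper; the real patch folds the check into pipe_command()'s write
path):

    #include <errno.h>
    #include <unistd.h>

    /* treat the Windows ENOSPC quirk the same as a full pipe */
    static ssize_t write_nonblock(int fd, const void *buf, size_t len)
    {
        ssize_t ret = write(fd, buf, len);

        if (ret < 0 && errno == ENOSPC)
            errno = EAGAIN; /* mingw_write() reports ENOSPC here */
        return ret;
    }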
Helped-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If xwrite() sees an EAGAIN response, it will loop forever until the
write succeeds (or encounters a real error). This is due to ef1cf0167a
(xwrite: poll on non-blocking FDs, 2016-06-26), with the idea that we
won't be surprised by a descriptor unexpectedly set as non-blocking.
But that will make things awkward when we do want a non-blocking
descriptor, and a future patch will switch pipe_command() to using one.
In that case, looping on EAGAIN is bad, because the process on the other
end of the pipe may be waiting on us before doing another read() on the
pipe, which would mean we deadlock.
In practice we're not supposed to ever see EAGAIN here, since poll()
will have just told us the descriptor is ready for writing. But our
Windows emulation of poll() will always return "ready" for writing to a
pipe descriptor! This is due to 94f4d01932 (mingw: workaround for hangs
when sending STDIN, 2020-02-17).
Our best bet in that case is to keep handling other descriptors, as any
read() we do may allow the child command to make forward progress (i.e.,
its write() finishes, and then it read()s from its stdin, freeing up
space in the pipe buffer). This means we might busy-loop between poll()
and write() on Windows if the child command is slow to read our input,
but it's much better than the alternative of deadlocking.
In practice, this busy-looping should be rare:
- for small inputs, we'll just write the whole thing in a single
write() anyway, non-blocking or not
- for larger inputs where the child reads input and then processes it
before writing (e.g., gpg verifying a signature), we may make a few
extra write() calls that get EAGAIN during the initial write, but
once it has taken in the whole input, we'll correctly block waiting
to read back the data.
- for larger inputs where the child process is streaming output back
(like a diff filter), we'll likewise see some extra EAGAINs, but
most of them will be followed immediately by a read(), which will
let the child command make forward progress.
Of course it won't happen at all for now, since we don't yet use a
non-blocking pipe. This is just preparation for when we do.
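The difference, in sketch form (hypothetical helpers; the real
xwrite() also poll()s before retrying rather than spinning):

    #include <errno.h>
    #include <unistd.h>

    /* xwrite()-style: EAGAIN is retried, which can deadlock against a
     * child that is itself blocked writing back to us */
    static ssize_t write_retry(int fd, const void *buf, size_t len)
    {
        for (;;) {
            ssize_t ret = write(fd, buf, len);

            if (ret < 0 && (errno == EINTR || errno == EAGAIN))
                continue;
            return ret;
        }
    }

    /* what pipe_command() wants: retry only EINTR, and surface EAGAIN
     * so the caller can return to poll() and read in the meantime */
    static ssize_t write_once(int fd, const void *buf, size_t len)
    {
        ssize_t ret;

        do {
            ret = write(fd, buf, len);
        } while (ret < 0 && errno == EINTR);
        return ret;
    }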
Helped-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We define MAX_IO_SIZE within wrapper.c, but it's useful for any code
that wants to do a raw write() for whatever reason (say, because they
want different EAGAIN handling). Let's make it available everywhere.
The alternative would be adding xwrite_foo() variants to give callers
more options. But there's really no reason MAX_IO_SIZE needs to be
abstracted away, so this gives callers the most flexibility.
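A sketch of the kind of caller this enables (the 8MB fallback below is
illustrative; the real constant is derived from platform limits):

    #include <unistd.h>

    #ifndef MAX_IO_SIZE
    #define MAX_IO_SIZE (8 * 1024 * 1024) /* illustrative fallback */
    #endif

    /* raw write with the size clamp, leaving EAGAIN to the caller */
    static ssize_t write_clamped(int fd, const void *buf, size_t len)
    {
        if (len > MAX_IO_SIZE)
            len = MAX_IO_SIZE;
        return write(fd, buf, len);
    }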
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Implement enable_pipe_nonblock() using the Windows API. This works only
for pipes, but that is sufficient for this limited interface. Despite
the API calls used, it handles both "named" and anonymous pipes from our
pipe() emulation.
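In rough strokes (a sketch against the Win32 API; the real compat code
may differ, e.g. by preserving the handle's existing mode flags):

    #include <windows.h>
    #include <io.h>

    /* PIPE_NOWAIT puts the pipe handle into non-blocking mode */
    static int enable_pipe_nonblock_sketch(int fd)
    {
        HANDLE h = (HANDLE)_get_osfhandle(fd);
        DWORD mode = PIPE_NOWAIT;

        if (h == INVALID_HANDLE_VALUE)
            return -1;
        return SetNamedPipeHandleState(h, &mode, NULL, NULL) ? 0 : -1;
    }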
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We'd like to be able to make some of our pipes nonblocking so that
poll() can be used effectively, but O_NONBLOCK isn't portable. Let's
introduce a compat wrapper so this can be abstracted for each platform.
The interface is as narrow as possible to let platforms do what's
natural there (rather than having to implement fcntl() and a fake
O_NONBLOCK for example, or having to handle other types of descriptors).
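On POSIX platforms the wrapper can be as small as this sketch:

    #include <fcntl.h>

    static int enable_pipe_nonblock_posix(int fd)
    {
        int flags = fcntl(fd, F_GETFL);

        if (flags < 0)
            return -1;
        return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
    }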
The next commit will add Windows support, at which point we should be
covering all platforms in practice. But if we do find some other
platform without O_NONBLOCK, we'll return ENOSYS. Arguably we could just
trigger a build-time #error in this case, which would catch the problem
earlier. But since we're not planning to use this compat wrapper in many
code paths, a seldom-seen runtime error may be friendlier for such a
platform than blocking compilation completely. Our test suite would
still notice it.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since commit fcc07e980b (is_promisor_object(): free tree buffer after
parsing, 2021-04-13), we'll always free the buffers attached to a
"struct tree" after searching them for promisor links. But there's an
important case where we don't want to do so: if somebody else is already
using the tree!
This can happen during a "rev-list --missing=allow-promisor" traversal
in a partial clone that is missing one or more trees or blobs. The
backtrace for the free looks like this:
#1 free_tree_buffer tree.c:147
#2 add_promisor_object packfile.c:2250
#3 for_each_object_in_pack packfile.c:2190
#4 for_each_packed_object packfile.c:2215
#5 is_promisor_object packfile.c:2272
#6 finish_object__ma builtin/rev-list.c:245
#7 finish_object builtin/rev-list.c:261
#8 show_object builtin/rev-list.c:274
#9 process_blob list-objects.c:63
#10 process_tree_contents list-objects.c:145
#11 process_tree list-objects.c:201
#12 traverse_trees_and_blobs list-objects.c:344
[...]
We're in the middle of walking through the entries of a tree object via
process_tree_contents(). We see a blob (or it could even be another tree
entry) that we don't have, so we call is_promisor_object() to check it.
That function loops over all of the objects in the promisor packfile,
including the tree we're currently walking. When we're done with it
there, we free the tree buffer. But as we return to the walk in
process_tree_contents(), it's still holding on to a pointer to that
buffer, via its tree_desc iterator, and it accesses the freed memory.
Even a trivial use of "--missing=allow-promisor" triggers this problem,
as the included test demonstrates (it's just a vanilla
"--filter=blob:none" clone).
We can detect this case by only freeing the tree buffer if it was
allocated on our behalf. This is a little tricky since that happens
inside parse_object(), and it doesn't tell us whether the object was
already parsed, or whether it allocated the buffer itself. But by
checking for an already-parsed tree beforehand, we can distinguish the
two cases.
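In sketch form (with hypothetical stand-ins for Git's tree-parsing
helpers), the idea is simply "free only what we loaded ourselves":

    #include <stdlib.h>
    #include <string.h>

    struct tree {
        char *buffer; /* NULL until parsed */
    };

    /* stand-in for parse_tree(): loads the buffer if needed */
    static int parse_tree_sketch(struct tree *t)
    {
        if (!t->buffer)
            t->buffer = strdup("tree contents go here");
        return t->buffer ? 0 : -1;
    }

    static void scan_for_promisor_links(struct tree *t)
    {
        /* ... record the entries of t->buffer in an oidset ... */
    }

    static void check_tree(struct tree *t)
    {
        int was_parsed = !!t->buffer; /* someone else may be using it */

        if (parse_tree_sketch(t) < 0)
            return;
        scan_for_promisor_links(t);
        if (!was_parsed) { /* free only a buffer we loaded ourselves */
            free(t->buffer);
            t->buffer = NULL;
        }
    }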
That feels a little hacky, and does incur an extra lookup in the
object-hash table. But that cost is fairly minimal compared to actually
loading objects (and since we're iterating the whole pack here, we're
likely to be loading most objects, rather than reusing cached results).
It may also be a good direction for this function in general, as there
are other possible optimizations that rely on doing some analysis before
parsing:
- we could detect blobs and avoid reading their contents; they can't
link to other objects, but parse_object() doesn't know that we don't
care about checking their hashes.
- we could avoid allocating object structs entirely for most objects
(since we really only need them in the oidset), which would save
some memory.
- promisor commits could use the commit-graph rather than loading the
object from disk
This commit doesn't do any of those optimizations, but I think it argues
that this direction is reasonable, rather than relying on
parse_object() and trying to teach it to give us more information
about whether it did the parsing itself.
The included test fails reliably under SANITIZE=address just when
running "rev-list --missing=allow-promisor". Checking the output isn't
strictly necessary to detect the bug, but it seems like a reasonable
addition given the general lack of coverage for "allow-promisor" in the
test suite.
Reported-by: Andrew Olsen <andrew.olsen@koordinates.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>