git-commit-vandalism/t/t5544-pack-objects-hook.sh

#!/bin/sh

test_description='test custom script in place of pack-objects'

TEST_PASSES_SANITIZE_LEAK=true
. ./test-lib.sh

test_expect_success 'create some history to fetch' '
	test_commit one &&
	test_commit two
'

test_expect_success 'create debugging hook script' '
	write_script .git/hook <<-\EOF
		echo >&2 "hook running"
		echo "$*" >hook.args
		cat >hook.stdin
		"$@" <hook.stdin >hook.stdout
		cat hook.stdout
	EOF
'

clear_hook_results () {
	rm -rf .git/hook.* dst.git
}

test_expect_success 'hook runs via global config' '
	clear_hook_results &&
	test_config_global uploadpack.packObjectsHook ./hook &&
	git clone --no-local . dst.git 2>stderr &&
	grep "hook running" stderr
'

test_expect_success 'hook outputs are sane' '
	# check that we recorded a usable pack
	git index-pack --stdin <.git/hook.stdout &&

	# check that we recorded args and stdin. We do not check
	# the full argument list or the exact pack contents, as it would make
	# the test brittle. So just sanity check that we could replay
	# the packing procedure.
	grep "^git" .git/hook.args &&
	$(cat .git/hook.args) <.git/hook.stdin >replay
'

test_expect_success 'hook runs from -c config' '
	clear_hook_results &&
	git clone --no-local \
	  -u "git -c uploadpack.packObjectsHook=./hook upload-pack" \
	  . dst.git 2>stderr &&
	grep "hook running" stderr
'

test_expect_success 'hook does not run from repo config' '
	clear_hook_results &&
	test_config uploadpack.packObjectsHook "./hook" &&
	git clone --no-local . dst.git 2>stderr &&
	! grep "hook running" stderr &&
	test_path_is_missing .git/hook.args &&
	test_path_is_missing .git/hook.stdin &&
	test_path_is_missing .git/hook.stdout &&

	# check that global config is used instead
	test_config_global uploadpack.packObjectsHook ./hook &&
	git clone --no-local . dst2.git 2>stderr &&
	grep "hook running" stderr
'

test_expect_success 'hook works with partial clone' '
	clear_hook_results &&
	test_config_global uploadpack.packObjectsHook ./hook &&
	test_config_global uploadpack.allowFilter true &&
	git clone --bare --no-local --filter=blob:none . dst.git &&
	git -C dst.git rev-list --objects --missing=allow-any --no-object-names --all >objects &&
	git -C dst.git cat-file --batch-check="%(objecttype)" <objects >types &&
	! grep blob types
'

test_done
upload-pack: provide a hook for running pack-objects When upload-pack serves a client request, it turns to pack-objects to do the heavy lifting of creating a packfile. There's no easy way to intercept the call to pack-objects, but there are a few good reasons to want to do so: 1. If you're debugging a client or server issue with fetching, you may want to store a copy of the generated packfile. 2. If you're gathering data from real-world fetches for performance analysis or debugging, storing a copy of the arguments and stdin lets you replay the pack generation at your leisure. 3. You may want to insert a caching layer around pack-objects; it is the most CPU- and memory-intensive part of serving a fetch, and its output is a pure function[1] of its input, making it an ideal place to consolidate identical requests. This patch adds a simple "hook" interface to intercept calls to pack-objects. The new test demonstrates how it can be used for debugging (using it for caching is a straightforward extension; the tricky part is writing the actual caching layer). This hook is unlike the normal hook scripts found in the "hooks/" directory of a repository. Because we promise that upload-pack is safe to run in an untrusted repository, we cannot execute arbitrary code or commands found in the repository (neither in hooks/, nor in the config). So instead, this hook is triggered from a config variable that is explicitly ignored in the per-repo config. The config variable holds the actual shell command to run as the hook. Another approach would be to simply treat it as a boolean: "should I respect the upload-pack hooks in this repo?", and then run the script from "hooks/" as we usually do. However, that isn't as flexible; there's no way to run a hook approved by the site administrator (e.g., in "/etc/gitconfig") on a repository whose contents are not trusted. The approach taken by this patch is more fine-grained, if a little less conventional for git hooks (it does behave similar to other configured commands like diff.external, etc). [1] Pack-objects isn't _actually_ a pure function. Its output depends on the exact packing of the object database, and if multi-threading is used for delta compression, can even differ racily. But for the purposes of caching, that's OK; of the many possible outputs for a given input, it is sufficient only that we output one of them. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2016-05-19 00:45:37 +02:00			`#!/bin/sh`

			`test_description='test custom script in place of pack-objects'`
connected.c: free the "struct packed_git" The "new_pack" we allocate in check_connected() wasn't being free'd. Let's do that before we return from the function. This has leaked ever since "new_pack" was added to this function in c6807a40dcd (clone: open a shortcut for connectivity check, 2013-05-26). Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Taylor Blau <me@ttaylorr.com> 2022-11-08 19:17:47 +01:00
			`TEST_PASSES_SANITIZE_LEAK=true`
upload-pack: provide a hook for running pack-objects When upload-pack serves a client request, it turns to pack-objects to do the heavy lifting of creating a packfile. There's no easy way to intercept the call to pack-objects, but there are a few good reasons to want to do so: 1. If you're debugging a client or server issue with fetching, you may want to store a copy of the generated packfile. 2. If you're gathering data from real-world fetches for performance analysis or debugging, storing a copy of the arguments and stdin lets you replay the pack generation at your leisure. 3. You may want to insert a caching layer around pack-objects; it is the most CPU- and memory-intensive part of serving a fetch, and its output is a pure function[1] of its input, making it an ideal place to consolidate identical requests. This patch adds a simple "hook" interface to intercept calls to pack-objects. The new test demonstrates how it can be used for debugging (using it for caching is a straightforward extension; the tricky part is writing the actual caching layer). This hook is unlike the normal hook scripts found in the "hooks/" directory of a repository. Because we promise that upload-pack is safe to run in an untrusted repository, we cannot execute arbitrary code or commands found in the repository (neither in hooks/, nor in the config). So instead, this hook is triggered from a config variable that is explicitly ignored in the per-repo config. The config variable holds the actual shell command to run as the hook. Another approach would be to simply treat it as a boolean: "should I respect the upload-pack hooks in this repo?", and then run the script from "hooks/" as we usually do. However, that isn't as flexible; there's no way to run a hook approved by the site administrator (e.g., in "/etc/gitconfig") on a repository whose contents are not trusted. The approach taken by this patch is more fine-grained, if a little less conventional for git hooks (it does behave similar to other configured commands like diff.external, etc). [1] Pack-objects isn't _actually_ a pure function. Its output depends on the exact packing of the object database, and if multi-threading is used for delta compression, can even differ racily. But for the purposes of caching, that's OK; of the many possible outputs for a given input, it is sufficient only that we output one of them. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2016-05-19 00:45:37 +02:00			`. ./test-lib.sh`

			`test_expect_success 'create some history to fetch' '`
			`test_commit one &&`
			`test_commit two`
			`'`

			`test_expect_success 'create debugging hook script' '`
			`write_script .git/hook <<-\EOF`
			`echo >&2 "hook running"`
			`echo "$*" >hook.args`
			`cat >hook.stdin`
			`"$@" <hook.stdin >hook.stdout`
			`cat hook.stdout`
			`EOF`
			`'`

			`clear_hook_results () {`
			`rm -rf .git/hook.* dst.git`
			`}`

			`test_expect_success 'hook runs via global config' '`
			`clear_hook_results &&`
			`test_config_global uploadpack.packObjectsHook ./hook &&`
			`git clone --no-local . dst.git 2>stderr &&`
			`grep "hook running" stderr`
			`'`

			`test_expect_success 'hook outputs are sane' '`
			`# check that we recorded a usable pack`
			`git index-pack --stdin <.git/hook.stdout &&`

			`# check that we recorded args and stdin. We do not check`
			`# the full argument list or the exact pack contents, as it would make`
			`# the test brittle. So just sanity check that we could replay`
			`# the packing procedure.`
			`grep "^git" .git/hook.args &&`
			`$(cat .git/hook.args) <.git/hook.stdin >replay`
			`'`

			`test_expect_success 'hook runs from -c config' '`
			`clear_hook_results &&`
			`git clone --no-local \`
			`-u "git -c uploadpack.packObjectsHook=./hook upload-pack" \`
			`. dst.git 2>stderr &&`
			`grep "hook running" stderr`
			`'`

			`test_expect_success 'hook does not run from repo config' '`
			`clear_hook_results &&`
			`test_config uploadpack.packObjectsHook "./hook" &&`
			`git clone --no-local . dst.git 2>stderr &&`
			`! grep "hook running" stderr &&`
			`test_path_is_missing .git/hook.args &&`
			`test_path_is_missing .git/hook.stdin &&`
config: learn `git_protected_config()` `uploadpack.packObjectsHook` is the only 'protected configuration only' variable today, but we've noted that `safe.directory` and the upcoming `safe.bareRepository` should also be 'protected configuration only'. So, for consistency, we'd like to have a single implementation for protected configuration. The primary constraints are: 1. Reading from protected configuration should be fast. Nearly all "git" commands inside a bare repository will read both `safe.directory` and `safe.bareRepository`, so we cannot afford to be slow. 2. Protected configuration must be readable when the gitdir is not known. `safe.directory` and `safe.bareRepository` both affect repository discovery and the gitdir is not known at that point [1]. The chosen implementation in this commit is to read protected configuration and cache the values in a global configset. This is similar to the caching behavior we get with the_repository->config. Introduce git_protected_config(), which reads protected configuration and caches them in the global configset protected_config. Then, refactor `uploadpack.packObjectsHook` to use git_protected_config(). The protected configuration functions are named similarly to their non-protected counterparts, e.g. git_protected_config_check_init() vs git_config_check_init(). In light of constraint 1, this implementation can still be improved. git_protected_config() iterates through every variable in protected_config, which is wasteful, but it makes the conversion simple because it matches existing patterns. We will likely implement constant time lookup functions for protected configuration in a future series (such functions already exist for non-protected configuration, i.e. repo_config_get_*()). An alternative that avoids introducing another configset is to continue to read all config using git_config(), but only accept values that have the correct config scope [2]. This technically fulfills constraint 2, because git_config() simply ignores the local and worktree config when the gitdir is not known. However, this would read incomplete config into the_repository->config, which would need to be reset when the gitdir is known and git_config() needs to read the local and worktree config. Resetting the_repository->config might be reasonable while we only have these 'protected configuration only' variables, but it's not clear whether this extends well to future variables. [1] In this case, we do have a candidate gitdir though, so with a little refactoring, it might be possible to provide a gitdir. [2] This is how `uploadpack.packObjectsHook` was implemented prior to this commit. Signed-off-by: Glen Choo <chooglen@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2022-07-14 23:27:59 +02:00			`test_path_is_missing .git/hook.stdout &&`

			`# check that global config is used instead`
			`test_config_global uploadpack.packObjectsHook ./hook &&`
			`git clone --no-local . dst2.git 2>stderr &&`
			`grep "hook running" stderr`
upload-pack: provide a hook for running pack-objects When upload-pack serves a client request, it turns to pack-objects to do the heavy lifting of creating a packfile. There's no easy way to intercept the call to pack-objects, but there are a few good reasons to want to do so: 1. If you're debugging a client or server issue with fetching, you may want to store a copy of the generated packfile. 2. If you're gathering data from real-world fetches for performance analysis or debugging, storing a copy of the arguments and stdin lets you replay the pack generation at your leisure. 3. You may want to insert a caching layer around pack-objects; it is the most CPU- and memory-intensive part of serving a fetch, and its output is a pure function[1] of its input, making it an ideal place to consolidate identical requests. This patch adds a simple "hook" interface to intercept calls to pack-objects. The new test demonstrates how it can be used for debugging (using it for caching is a straightforward extension; the tricky part is writing the actual caching layer). This hook is unlike the normal hook scripts found in the "hooks/" directory of a repository. Because we promise that upload-pack is safe to run in an untrusted repository, we cannot execute arbitrary code or commands found in the repository (neither in hooks/, nor in the config). So instead, this hook is triggered from a config variable that is explicitly ignored in the per-repo config. The config variable holds the actual shell command to run as the hook. Another approach would be to simply treat it as a boolean: "should I respect the upload-pack hooks in this repo?", and then run the script from "hooks/" as we usually do. However, that isn't as flexible; there's no way to run a hook approved by the site administrator (e.g., in "/etc/gitconfig") on a repository whose contents are not trusted. The approach taken by this patch is more fine-grained, if a little less conventional for git hooks (it does behave similar to other configured commands like diff.external, etc). [1] Pack-objects isn't _actually_ a pure function. Its output depends on the exact packing of the object database, and if multi-threading is used for delta compression, can even differ racily. But for the purposes of caching, that's OK; of the many possible outputs for a given input, it is sufficient only that we output one of them. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2016-05-19 00:45:37 +02:00			`'`

upload-pack.c: fix filter spec quoting bug Fix a bug in upload-pack.c that occurs when you combine partial clone and uploadpack.packObjectsHook. You can reproduce it as follows: git clone -u 'git -c uploadpack.allowfilter '\ '-c uploadpack.packobjectshook=env '\ 'upload-pack' --filter=blob:none --no-local \ src.git dst.git Be careful with the line endings because this has a long quoted string as the -u argument. The error I get when I run this is: Cloning into '/tmp/broken'... remote: fatal: invalid filter-spec ''blob:none'' error: git upload-pack: git-pack-objects died with error. fatal: git upload-pack: aborting due to possible repository corruption on the remote side. remote: aborting due to possible repository corruption on the remote side. fatal: early EOF fatal: index-pack failed The problem is caused by unneeded quoting. This bug was already present in 10ac85c785 (upload-pack: add object filtering for partial clone, 2017-12-08) when the server side filter support was introduced. In fact, in 10ac85c785 this was broken regardless of uploadpack.packObjectsHook. Then in 0b6069fe0a (fetch-pack: test support excluding large blobs, 2017-12-08) the quoting was removed but only behind a conditional that depends on whether uploadpack.packObjectsHook is set. Because uploadpack.packObjectsHook is apparently rarely used, nobody noticed the problematic quoting could still happen. Remove the conditional quoting and add a test for partial clone in t5544-pack-objects-hook. Signed-off-by: Jacob Vosmaer <jacob@gitlab.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2021-01-28 17:04:53 +01:00			`test_expect_success 'hook works with partial clone' '`
			`clear_hook_results &&`
			`test_config_global uploadpack.packObjectsHook ./hook &&`
			`test_config_global uploadpack.allowFilter true &&`
			`git clone --bare --no-local --filter=blob:none . dst.git &&`
t5544: clarify 'hook works with partial clone' test Apply a few leftover improvements from the review of ad5df6b782 (upload-pack.c: fix filter spec quoting bug). 1. Instead of enumerating objects reachable from HEAD, enumerate all reachable objects, because HEAD has not special significance in this test. 2. Instead of relying on the knowledge that "? in rev-list output means partial clone", explicitly verify that there are no blobs with cat-file. Signed-off-by: Jacob Vosmaer <jacob@gitlab.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2021-02-02 20:24:17 +01:00			`git -C dst.git rev-list --objects --missing=allow-any --no-object-names --all >objects &&`
			`git -C dst.git cat-file --batch-check="%(objecttype)" <objects >types &&`
			`! grep blob types`
upload-pack.c: fix filter spec quoting bug Fix a bug in upload-pack.c that occurs when you combine partial clone and uploadpack.packObjectsHook. You can reproduce it as follows: git clone -u 'git -c uploadpack.allowfilter '\ '-c uploadpack.packobjectshook=env '\ 'upload-pack' --filter=blob:none --no-local \ src.git dst.git Be careful with the line endings because this has a long quoted string as the -u argument. The error I get when I run this is: Cloning into '/tmp/broken'... remote: fatal: invalid filter-spec ''blob:none'' error: git upload-pack: git-pack-objects died with error. fatal: git upload-pack: aborting due to possible repository corruption on the remote side. remote: aborting due to possible repository corruption on the remote side. fatal: early EOF fatal: index-pack failed The problem is caused by unneeded quoting. This bug was already present in 10ac85c785 (upload-pack: add object filtering for partial clone, 2017-12-08) when the server side filter support was introduced. In fact, in 10ac85c785 this was broken regardless of uploadpack.packObjectsHook. Then in 0b6069fe0a (fetch-pack: test support excluding large blobs, 2017-12-08) the quoting was removed but only behind a conditional that depends on whether uploadpack.packObjectsHook is set. Because uploadpack.packObjectsHook is apparently rarely used, nobody noticed the problematic quoting could still happen. Remove the conditional quoting and add a test for partial clone in t5544-pack-objects-hook. Signed-off-by: Jacob Vosmaer <jacob@gitlab.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2021-01-28 17:04:53 +01:00			`'`

upload-pack: provide a hook for running pack-objects When upload-pack serves a client request, it turns to pack-objects to do the heavy lifting of creating a packfile. There's no easy way to intercept the call to pack-objects, but there are a few good reasons to want to do so: 1. If you're debugging a client or server issue with fetching, you may want to store a copy of the generated packfile. 2. If you're gathering data from real-world fetches for performance analysis or debugging, storing a copy of the arguments and stdin lets you replay the pack generation at your leisure. 3. You may want to insert a caching layer around pack-objects; it is the most CPU- and memory-intensive part of serving a fetch, and its output is a pure function[1] of its input, making it an ideal place to consolidate identical requests. This patch adds a simple "hook" interface to intercept calls to pack-objects. The new test demonstrates how it can be used for debugging (using it for caching is a straightforward extension; the tricky part is writing the actual caching layer). This hook is unlike the normal hook scripts found in the "hooks/" directory of a repository. Because we promise that upload-pack is safe to run in an untrusted repository, we cannot execute arbitrary code or commands found in the repository (neither in hooks/, nor in the config). So instead, this hook is triggered from a config variable that is explicitly ignored in the per-repo config. The config variable holds the actual shell command to run as the hook. Another approach would be to simply treat it as a boolean: "should I respect the upload-pack hooks in this repo?", and then run the script from "hooks/" as we usually do. However, that isn't as flexible; there's no way to run a hook approved by the site administrator (e.g., in "/etc/gitconfig") on a repository whose contents are not trusted. The approach taken by this patch is more fine-grained, if a little less conventional for git hooks (it does behave similar to other configured commands like diff.external, etc). [1] Pack-objects isn't _actually_ a pure function. Its output depends on the exact packing of the object database, and if multi-threading is used for delta compression, can even differ racily. But for the purposes of caching, that's OK; of the many possible outputs for a given input, it is sufficient only that we output one of them. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> 2016-05-19 00:45:37 +02:00			`test_done`