git-commit-vandalism/builtin
Taylor Blau 6f054f9fb3 builtin/clone.c: disallow --local clones with symlinks
When cloning a repository with `--local`, Git relies on either making a
hardlink or copy to every file in the "objects" directory of the source
repository. This is done through the callpath `cmd_clone()` ->
`clone_local()` -> `copy_or_link_directory()`.

The way this optimization works is by enumerating every file and
directory recursively in the source repository's `$GIT_DIR/objects`
directory, and then either making a copy or hardlink of each file. The
only exception to this rule is when copying the "alternates" file, in
which case paths are rewritten to be absolute before writing a new
"alternates" file in the destination repo.

One quirk of this implementation is that it dereferences symlinks when
cloning. This behavior was most recently modified in 36596fd2df (clone:
better handle symlinked files at .git/objects/, 2019-07-10), which
attempted to support `--local` clones of repositories with symlinks in
their objects directory in a platform-independent way.

Unfortunately, this behavior of dereferencing symlinks (that is,
creating a hardlink or copy of the source's link target in the
destination repository) can be used as a component in attacking a
victim by inadvertently exposing the contents of file stored outside of
the repository.

Take, for example, a repository that stores a Dockerfile and is used to
build Docker images. When building an image, Docker copies the directory
contents into the VM, and then instructs the VM to execute the
Dockerfile at the root of the copied directory. This protects against
directory traversal attacks by copying symbolic links as-is without
dereferencing them.

That is, if a user has a symlink pointing at their private key material
(where the symlink is present in the same directory as the Dockerfile,
but the key itself is present outside of that directory), the key is
unreadable to a Docker image, since the link will appear broken from the
container's point of view.

This behavior enables an attack whereby a victim is convinced to clone a
repository containing an embedded submodule (with a URL like
"file:///proc/self/cwd/path/to/submodule") which has a symlink pointing
at a path containing sensitive information on the victim's machine. If a
user is tricked into doing this, the contents at the destination of
those symbolic links are exposed to the Docker image at runtime.

One approach to preventing this behavior is to recreate symlinks in the
destination repository. But this is problematic, since symlinking the
objects directory are not well-supported. (One potential problem is that
when sharing, e.g. a "pack" directory via symlinks, different writers
performing garbage collection may consider different sets of objects to
be reachable, enabling a situation whereby garbage collecting one
repository may remove reachable objects in another repository).

Instead, prohibit the local clone optimization when any symlinks are
present in the `$GIT_DIR/objects` directory of the source repository.
Users may clone the repository again by prepending the "file://" scheme
to their clone URL, or by adding the `--no-local` option to their `git
clone` invocation.

The directory iterator used by `copy_or_link_directory()` must no longer
dereference symlinks (i.e., it *must* call `lstat()` instead of `stat()`
in order to discover whether or not there are symlinks present). This has
no bearing on the overall behavior, since we will immediately `die()` on
encounter a symlink.

Note that t5604.33 suggests that we do support local clones with
symbolic links in the source repository's objects directory, but this
was likely unintentional, or at least did not take into consideration
the problem with sharing parts of the objects directory with symbolic
links at the time. Update this test to reflect which options are and
aren't supported.

Helped-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
2022-10-01 00:23:38 -04:00
..
add.c drop unused argc parameters 2020-09-30 12:53:47 -07:00
am.c builtin/*: update usage format 2021-01-06 15:10:49 -08:00
annotate.c strvec: rename struct fields 2020-07-30 19:18:06 -07:00
apply.c
archive.c
bisect--helper.c Merge branch 'mr/bisect-in-c-3' 2020-11-09 14:06:25 -08:00
blame.c builtin/*: update usage format 2021-01-06 15:10:49 -08:00
branch.c branch: sort detached HEAD based on a flag 2021-01-07 15:13:21 -08:00
bugreport.c builtin/bugreport.c: use thread-safe localtime_r() 2020-12-01 13:05:37 -08:00
bundle.c Merge branch 'bc/sha-256-part-3' 2020-08-11 18:04:11 -07:00
cat-file.c Merge branch 'cc/cat-file-usage-update' into master 2020-07-09 14:00:41 -07:00
check-attr.c
check-ignore.c dir: fix problematic API to avoid memory leaks 2020-08-18 17:17:31 -07:00
check-mailmap.c
check-ref-format.c
checkout-index.c checkout-index: propagate errors to exit code 2020-10-27 12:41:56 -07:00
checkout.c Merge branch 'dl/checkout-p-merge-base' 2020-12-23 13:59:46 -08:00
clean.c quote_path: give flags parameter to quote_path() 2020-09-10 10:49:19 -07:00
clone.c builtin/clone.c: disallow --local clones with symlinks 2022-10-01 00:23:38 -04:00
column.c
commit-graph.c builtin/*: update usage format 2021-01-06 15:10:49 -08:00
commit-tree.c Use OPT_CALLBACK and OPT_CALLBACK_F 2020-04-28 10:47:10 -07:00
commit.c Documentation: stylistically normalize references to Signed-off-by: 2020-10-20 11:57:40 -07:00
config.c config: implement --fixed-value with --get* 2020-11-25 14:43:48 -08:00
count-objects.c
credential-cache--daemon.c make credential helpers builtins 2020-08-13 11:02:08 -07:00
credential-cache.c Merge branch 'jc/undash-in-tree-git-callers' 2020-09-03 12:37:03 -07:00
credential-store.c crendential-store: use timeout when locking file 2020-11-25 12:30:18 -08:00
credential.c credential: load default config 2020-10-16 12:30:45 -07:00
describe.c strvec: rename struct fields 2020-07-30 19:18:06 -07:00
diff-files.c diff-files: treat "i-t-a" files as "not-in-index" 2020-06-22 10:46:45 -07:00
diff-index.c builtin/diff-index: learn --merge-base 2020-09-20 21:30:26 -07:00
diff-tree.c builtin/diff-tree: learn --merge-base 2020-09-21 13:37:03 -07:00
diff.c Merge branch 'dl/diff-merge-base' 2020-11-02 13:17:39 -08:00
difftool.c Use new HASHMAP_INIT macro to simplify hashmap initialization 2020-11-11 12:55:27 -08:00
env--helper.c assert PARSE_OPT_NONEG in parse-options callbacks 2020-09-30 12:53:47 -07:00
fast-export.c builtin/*: update usage format 2021-01-06 15:10:49 -08:00
fast-import.c Merge branch 'jk/fast-import-marks-cleanup' 2020-11-02 13:17:40 -08:00
fetch-pack.c fetch-pack: remove no_dependents code 2020-08-18 16:46:53 -07:00
fetch.c hashmap: provide deallocation function names 2020-11-02 12:15:50 -08:00
fmt-merge-msg.c Lib-ify fmt-merge-msg 2020-03-24 15:04:43 -07:00
for-each-ref.c ref-filter: move ref_sorting flags to a bitfield 2021-01-07 15:13:21 -08:00
for-each-repo.c for-each-repo: do nothing on empty config 2021-01-07 19:12:02 -08:00
fsck.c fsck: do not lazy fetch known non-promisor object 2020-08-06 13:01:03 -07:00
gc.c builtin/gc: don't peek into struct lock_file 2021-01-06 13:53:32 -08:00
get-tar-commit-id.c
grep.c grep: use designated initializers for grep_defaults 2020-11-21 14:50:33 -08:00
hash-object.c
help.c help: drop usage of 'common' and 'useful' for guides 2020-08-04 18:34:01 -07:00
index-pack.c compute pack .idx byte offsets using size_t 2020-11-16 13:41:35 -08:00
init-db.c get_default_branch_name(): prepare for showing some advice 2020-12-13 15:53:50 -08:00
interpret-trailers.c Use OPT_CALLBACK and OPT_CALLBACK_F 2020-04-28 10:47:10 -07:00
log.c builtin/*: update usage format 2021-01-06 15:10:49 -08:00
ls-files.c dir: fix problematic API to avoid memory leaks 2020-08-18 17:17:31 -07:00
ls-remote.c Merge branch 'jk/unleak-fixes' 2020-08-24 14:54:30 -07:00
ls-tree.c
mailinfo.c
mailsplit.c
merge-base.c rebase: --fork-point regression fix 2020-02-11 09:59:39 -08:00
merge-file.c
merge-index.c
merge-ours.c
merge-recursive.c
merge-tree.c merge-base, xdiff: zero out xpparam_t structures 2020-10-20 12:53:26 -07:00
merge.c Merge branch 'en/merge-ort-api-null-impl' 2020-11-18 13:32:53 -08:00
mktag.c sha1-file: allow check_object_signature() to handle any repo 2020-01-31 10:45:39 -08:00
mktree.c
multi-pack-index.c
mv.c git-mv: improve error message for conflicted file 2020-07-20 14:35:43 -07:00
name-rev.c messages: avoid SHA-1 in end-user facing messages 2020-08-14 09:33:37 -07:00
notes.c Use OPT_CALLBACK and OPT_CALLBACK_F 2020-04-28 10:47:10 -07:00
pack-objects.c Merge branch 'jc/object-names-are-not-sha-1' 2020-08-19 16:14:52 -07:00
pack-redundant.c Merge branch 'jx/pack-redundant-on-single-pack' 2020-12-23 13:59:46 -08:00
pack-refs.c
patch-id.c patch-id: use oid_to_hex() to print multiple object IDs 2019-12-09 12:26:40 -08:00
prune-packed.c Lib-ify prune-packed 2020-03-24 15:04:44 -07:00
prune.c Merge branch 'tb/shallow-cleanup' 2020-05-13 12:19:18 -07:00
pull.c Merge branch 'pb/pull-rebase-recurse-submodules' 2020-12-03 00:18:06 -08:00
push.c Merge branch 'sk/force-if-includes' 2020-10-27 15:09:49 -07:00
range-diff.c strvec: convert builtin/ callers away from argv_array name 2020-07-28 15:02:18 -07:00
read-tree.c Use OPT_CALLBACK and OPT_CALLBACK_F 2020-04-28 10:47:10 -07:00
rebase.c Merge branch 'rs/rebase-commit-validation' into maint 2021-02-05 16:31:22 -08:00
receive-pack.c Merge branch 'js/trace2-session-id' 2020-12-08 15:11:20 -08:00
reflog.c Merge branch 'es/get-worktrees-unsort' 2020-07-06 22:09:15 -07:00
remote-ext.c strvec: convert builtin/ callers away from argv_array name 2020-07-28 15:02:18 -07:00
remote-fd.c
remote.c config: convert multi_replace to flags 2020-11-25 14:43:47 -08:00
repack.c builtin/repack.c: don't move existing packs out of the way 2020-11-17 13:31:55 -08:00
replace.c strvec: rename struct fields 2020-07-30 19:18:06 -07:00
rerere.c
reset.c wt-status: tolerate dangling marks 2020-09-02 14:39:25 -07:00
rev-list.c bisect: combine args passed to find_bisection() 2020-08-07 15:13:03 -07:00
rev-parse.c rev-parse: handle --end-of-options 2020-11-10 13:46:27 -08:00
revert.c Merge branch 'en/merge-ort-api-null-impl' 2020-11-18 13:32:53 -08:00
rm.c rm: support the --pathspec-from-file option 2020-02-19 10:56:49 -08:00
send-pack.c push: parse and set flag for "--force-if-includes" 2020-10-03 09:59:19 -07:00
shortlog.c builtin/*: update usage format 2021-01-06 15:10:49 -08:00
show-branch.c Merge branch 'jt/interpret-branch-name-fallback' 2020-09-09 13:53:09 -07:00
show-index.c builtin/show-index: provide options to determine hash algo 2020-05-27 10:07:07 -07:00
show-ref.c Use OPT_CALLBACK and OPT_CALLBACK_F 2020-04-28 10:47:10 -07:00
sparse-checkout.c sparse-checkout: fill in some options boilerplate 2020-09-30 12:53:48 -07:00
stash.c Merge branch 'en/stash-apply-sparse-checkout' into maint 2021-02-05 16:31:22 -08:00
stripspace.c
submodule--helper.c builtin/*: update usage format 2021-01-06 15:10:49 -08:00
symbolic-ref.c
tag.c ref-filter: move ref_sorting flags to a bitfield 2021-01-07 15:13:21 -08:00
unpack-file.c
unpack-objects.c sha1-file: pass git_hash_algo to hash_object_file() 2020-01-31 10:45:39 -08:00
update-index.c Use OPT_CALLBACK and OPT_CALLBACK_F 2020-04-28 10:47:10 -07:00
update-ref.c update-ref: disallow "start" for ongoing transactions 2020-11-16 13:44:01 -08:00
update-server-info.c
upload-archive.c strvec: rename struct fields 2020-07-30 19:18:06 -07:00
upload-pack.c
var.c
verify-commit.c
verify-pack.c Merge branch 'bc/sha-256-part-3' 2020-08-11 18:04:11 -07:00
verify-tag.c
worktree.c Merge branch 'mt/worktree-error-message-fix' 2020-11-30 14:49:43 -08:00
write-tree.c