When cloning a repo with a --filter and with --recurse-submodules
enabled, the partial clone filter only applies to the top-level repo.
This can lead to unexpected bandwidth and disk usage for projects which
include large submodules. For example, a user might wish to make a
partial clone of Gerrit and would run:
`git clone --recurse-submodules --filter=blob:5k https://gerrit.googlesource.com/gerrit`.
However, only the superproject would be a partial clone; all the
submodules would have all blobs downloaded regardless of their size.
With this change, the same filter can also be applied to submodules,
meaning the expected bandwidth and disk savings apply consistently.
To avoid changing default behavior, add a new clone flag,
`--also-filter-submodules`. When this is set along with `--filter` and
`--recurse-submodules`, the filter spec is passed along to git-submodule
and git-submodule--helper, such that submodule clones also have the
filter applied.
This applies the same filter to the superproject and all submodules.
Users who need to customize the filter per-submodule would need to clone
with `--no-recurse-submodules` and then manually initialize each
submodule with the proper filter.
Applying filters to submodules should be safe thanks to Jonathan Tan's
recent work [1, 2, 3] eliminating the use of alternates as a method of
accessing submodule objects, so any submodule object access now triggers
a lazy fetch from the submodule's promisor remote if the accessed object
is missing. This patch is a reworked version of [4], which was created
prior to Jonathan Tan's work.
[1]: 8721e2e (Merge branch 'jt/partial-clone-submodule-1', 2021-07-16)
[2]: 11e5d0a (Merge branch 'jt/grep-wo-submodule-odb-as-alternate',
2021-09-20)
[3]: 162a13b (Merge branch 'jt/no-abuse-alternate-odb-for-submodules',
2021-10-25)
[4]: https://lore.kernel.org/git/52bf9d45b8e2b72ff32aa773f2415bf7b2b86da2.1563322192.git.steadmon@google.com/
Signed-off-by: Josh Steadmon <steadmon@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The is_zero_oid() function in git-submodule.sh has not been used since
e83e3333b5 (submodule: port submodule subcommand 'summary' from shell
to C, 2020-08-13), so we can remove it.
This was the last user of the sane_egrep() function in
git-sh-setup.sh. I'm not removing it in case some out-of-tree user
relied on it. Per the discussion that can be found upthread of [1].
1. https://lore.kernel.org/git/87tuiwjfvi.fsf@evledraar.gmail.com/
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a new submodule--helper subcommand `run-update-procedure` that runs
the update procedure if the SHA1 of the submodule does not match what
the superproject expects.
This is an intermediate change that works towards total conversion of
`submodule update` from shell to C.
Specific error codes are returned so that the shell script calling the
subcommand can take a decision on the control flow, and preserve the
error messages across subsequent recursive calls of `cmd_update`.
This change is more focused on doing a faithful conversion, so for now we
are not too concerned with trying to reduce subprocess spawns.
Mentored-by: Christian Couder <christian.couder@gmail.com>
Mentored-by: Shourya Shukla <periperidip@gmail.com>
Signed-off-by: Atharva Raykar <raykar.ath@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Introduce the 'add' subcommand to `submodule--helper.c` that does all
the work 'submodule add' past the parsing of flags.
We also remove the constness of the sm_path field of the `add_data`
struct. This is needed so that it can be modified by
normalize_path_copy().
As with the previous conversions, this is meant to be a faithful
conversion with no modification to the behaviour of `submodule add`.
Signed-off-by: Atharva Raykar <raykar.ath@gmail.com>
Mentored-by: Christian Couder <christian.couder@gmail.com>
Helped-by: Kaartic Sivaraam <kaartic.sivaraam@gmail.com>
Mentored-by: Shourya Shukla <periperidip@gmail.com>
Based-on-patch-by: Shourya Shukla <periperidip@gmail.com>
Based-on-patch-by: Prathamesh Chavan <pc44800@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a new "add-config" subcommand to `git submodule--helper` with the
goal of converting part of the shell code in git-submodule.sh related to
`git submodule add` into C code. This new subcommand sets the
configuration variables of a newly added submodule, by registering the
url in local git config, as well as the submodule name and path in the
.gitmodules file. It also sets 'submodule.<name>.active' to "true" if
the submodule path has not already been covered by any pathspec
specified in 'submodule.active'.
This is meant to be a faithful conversion from shell to C, although we
add comments to areas that could be improved in future patches, after
the conversion has settled.
Signed-off-by: Atharva Raykar <raykar.ath@gmail.com>
Mentored-by: Christian Couder <christian.couder@gmail.com>
Mentored-by: Shourya Shukla <periperidip@gmail.com>
Based-on-patch-by: Shourya Shukla <periperidip@gmail.com>
Based-on-patch-by: Prathamesh Chavan <pc44800@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Rewrite of "git submodule" in C continues.
* ar/submodule-add:
submodule: drop unused sm_name parameter from show_fetch_remotes()
submodule--helper: introduce add-clone subcommand
submodule--helper: refactor module_clone()
submodule: prefix die messages with 'fatal'
t7400: test failure to add submodule in tracked path
Let's add a new "add-clone" subcommand to `git submodule--helper` with
the goal of converting part of the shell code in git-submodule.sh
related to `git submodule add` into C code. This new subcommand clones
the repository that is to be added, and checks out to the appropriate
branch.
This is meant to be a faithful conversion that leaves the behaviour of
'cmd_add()' script unchanged.
Signed-off-by: Atharva Raykar <raykar.ath@gmail.com>
Mentored-by: Christian Couder <christian.couder@gmail.com>
Mentored-by: Shourya Shukla <periperidip@gmail.com>
Based-on-patch-by: Shourya Shukla <periperidip@gmail.com>
Based-on-patch-by: Prathamesh Chavan <pc44800@gmail.com>
Helped-by: Đoàn Trần Công Danh <congdanhqx@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The standard `die()` function that is used in C code prefixes all the
messages passed to it with 'fatal: '. This does not happen with the
`die` used in 'git-submodule.sh'.
Let's prefix each of the shell die messages with 'fatal: ' so that when
they are converted to C code, the error messages stay the same as before
the conversion.
Note that the shell version of `die` exits with error code 1, while the
C version exits with error code 128. In practice, this does not change
any behaviour, as no functionality in 'submodule add' and 'submodule
update' relies on the value of the exit code.
Signed-off-by: Atharva Raykar <raykar.ath@gmail.com>
Mentored-by: Christian Couder <christian.couder@gmail.com>
Mentored-by: Shourya Shukla <periperidip@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Over time when parts of submodule have been ported from shell to
builtin, many instances of the submodule helper have been added.
Also added with them are some unnecessary option passing
logic that are based on the `prefix` shell variable which never
gets set in their code flows.
On analysis, the only shell functions which have a valid usage
for the `prefix` shell variable are:
- cmd_update: which is the only function which sets the variable
and thus uses it properly
- cmd_init: which uses the variable via a call from cmd_update
So, remove the unnecessary option parsing logic based on the `prefix`
shell variable.
Signed-off-by: Kaartic Sivaraam <kaartic.sivaraam@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commands such as
$ git submodule update --quiet --init --depth=1
involving shallow clones, call the shell function fetch_in_submodule, which
in turn invokes git fetch. Pass the --quiet option onward there.
Signed-off-by: Nicholas Clark <nick@ccl4.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commit 1c1518071c (submodule: use "fetch" logic instead of custom remote
discovery, 2020-11-14) rewrote the logic in fetch_in_submodule to do:
elif test "$2" -ne ""
But this is nonsense in shell: -ne is for numeric comparisons. This
should be "=" or more idiomatically:
elif test -n "$2"
But once we fix that, many tests start failing. Because that commit
introduced another problem. The caller that passes 3 arguments looks
like this:
fetch_in_submodule "$sm_path" $depth "$sha1"
Note the unquoted $depth parameter. When it isn't set, the function will
see only 2 arguments, and the function has no idea if what it sees in $2
is an option to go on the command line, or a refspec to pass on stdin.
In the old code before that commit:
fetch_in_submodule () (
sanitize_submodule_env &&
cd "$1" &&
- case "$2" in
- '')
- git fetch ;;
- *)
- shift
- git fetch $(get_default_remote) "$@" ;;
- esac
we treated those the same, so it didn't matter. But in the new logic
(with my fix above):
+ if test $# -eq 3
+ then
+ echo "$3" | git fetch --stdin "$2"
+ elif test -n "$n"
+ then
+ git fetch "$2"
+ else
+ git fetch
+ fi
we use the number of parameters to distinguish the two. Let's insist
that the caller pass an empty string for positional parameter two if
they want to have a third parameter after it.
But that still leaves one problem. In the --stdin block, we
unconditionally pass "$2" to git-fetch, even if it's the empty string.
Rather than add another conditional, we can use :+ parameter expansion
to include it only if it's non-empty. In fact, we can do the same for
the elif, too, simplifying it further. Technically this is overkill,
since we know the --depth parameter will not have whitespace (and
indeed, most callers do not bother quoting it), but it doesn't hurt for
the function to be careful.
It's somewhat amazing that no tests were failing. I think what happened
is that:
- the 3-arg form rarely triggered; any call with a non-empty $depth
and a $sha1 would work, but one with an empty $depth would only have
2 arguments
- because of the wrong arguments to "test", the shell would complain
and exit non-zero. So we never ran the middle conditional at all
- that left every call running "git fetch" with no arguments. A
well-written test could have detected the distinction here, but in
practice omitting --depth just means fetching more commits, and
fetching everything (rather than a single sha1) works as long as the
commit in question is reachable
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The previous two commits removed the last use of a function in this
library, but most of it had been dead code for a while[1][2]. Only the
"get_default_remote" function was still being used.
Even though we had a manual page for this library it was never
intended (or I expect, actually) used outside of git.git. Let's just
remove it, if anyone still cares about a function here they can pull
them into their own project[3].
1. Last use of error_on_missing_default_upstream():
d03ebd411c ("rebase: remove the rebase.useBuiltin setting",
2019-03-18)
2. Last use of get_remote_merge_branch(): 49eb8d39c7 ("Remove
contrib/examples/*", 2018-03-25)
3. https://lore.kernel.org/git/87a6vmhdka.fsf@evledraar.gmail.com/
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Remove the now-redundant "get_default_remote" function by converting
its last user to the "print-default-remote" helper.
As can be seen in 13424764db ("submodule: port submodule subcommand
'sync' from shell to C", 2018-01-15) this helper is already used
internally by the C code for submodule remote name discovery.
The "get_default_remote" function in "git-parse-remote.sh" will be
removed in a follow-up change.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Replace a use of the get_default_remote() function with an invocation
of "git fetch"
The "fetch" command already has logic to discover the remote for the
current branch. However, before it learned to accept a custom
refspec *and* use its idea of the default remote, it wasn't possible
to get rid of some equivalent of the "get_default_remote" invocation
here.
As it turns out the recently added "--stdin" option to fetch[1] gives
us a way to do that. Let's use it instead.
While I'm at it simplify the "fetch_in_submodule" function. It wasn't
necessary to pass "$@" to "fetch" since we'd only ever provide one
SHA-1 as an argument in the previous "*" codepath (in addition to
"--depth=N"). Rewrite the function to more narrowly reflect its
use-case.
1. https://lore.kernel.org/git/87eekwf87n.fsf@evledraar.gmail.com/
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commands such as
$ git pull --rebase --recurse-submodules --quiet
produce non-quiet output from the merge or rebase. Pass the --quiet
option down when invoking "rebase" and "merge".
Also fix the parsing of git submodule update -v.
When e84c3cf3 (git-submodule.sh: accept verbose flag in cmd_update
to be non-quiet, 2018-08-14) taught "git submodule update" to take
"--quiet", it apparently did not know how ${GIT_QUIET:+--quiet}
works, and reviewers seem to have missed that setting the variable
to "0", rather than unsetting it, still results in "--quiet" being
passed to underlying commands.
Signed-off-by: Theodore Dubois <tbodt@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Convert submodule subcommand 'summary' to a builtin and call it via
'git-submodule.sh'.
The shell version had to call $diff_cmd twice, once to find the modified
modules cared by the user and then again, with that list of modules
to do various operations for computing the summary of those modules.
On the other hand, the C version does not need a second call to
$diff_cmd since it reuses the module list from the first call to do the
aforementioned tasks.
In the C version, we use the combination of setting a child process'
working directory to the submodule path and then calling
'prepare_submodule_repo_env()' which also sets the 'GIT_DIR' to '.git',
so that we can be certain that those spawned processes will not access
the superproject's ODB by mistake.
A behavioural difference between the C and the shell version is that the
shell version outputs two line feeds after the 'git log' output when run
outside of the tests while the C version outputs one line feed in any
case. The reason for this is that the shell version calls log with
'--pretty=format:<fmt>' whose output is followed by two echo
calls; 'format' does not have "terminator" semantics like its 'tformat'
counterpart. So, the log output is terminated by a newline only when
invoked by the user and not when invoked from the scripts. This results
in the one & two line feed differences in the shell version.
On the other hand, the C version calls log with '--pretty=<fmt>'
which is equivalent to '--pretty:tformat:<fmt>' which is then
followed by a 'printf("\n")'. Due to its "terminator" semantics the
log output is always terminated by newline and hence one line feed in
any case.
Also, when we try to pass an option-like argument after a non-option
argument, for instance:
git submodule summary HEAD --foo-bar
(or)
git submodule summary HEAD --cached
That argument would be treated like a path to the submodule for which
the user is requesting a summary. So, the option ends up having no
effect. Though, passing '--quiet' is an exception to this:
git submodule summary HEAD --quiet
While 'summary' doesn't support '--quiet', we don't get an output for
the above command as '--quiet' is treated as a path which means we get
an output only if a submodule whose path is '--quiet' exists.
The error message in case of computing a summary for non-existent
submodules in the C version is different from that of the shell version.
Since the new error message is not marked for translation, change the
'test_i18ngrep' in t7421.4 to 'grep'.
Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Mentored-by: Stefan Beller <stefanbeller@gmail.com>
Mentored-by: Kaartic Sivaraam <kaartic.sivaraam@gmail.com>
Helped-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Signed-off-by: Prathamesh Chavan <pc44800@gmail.com>
Signed-off-by: Shourya Shukla <shouryashukla.oo@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Convert submodule subcommand 'set-branch' to a builtin and call it via
'git-submodule.sh'.
Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Mentored-by: Kaartic Sivaraam <kaartic.sivaraam@gmail.com>
Helped-by: Denton Liu <liu.denton@gmail.com>
Helped-by: Eric Sunshine <sunshine@sunshineco.com>
Helped-by: Đoàn Trần Công Danh <congdanhqx@gmail.com>
Signed-off-by: Shourya Shukla <shouryashukla.oo@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Convert submodule subcommand 'set-url' to a builtin. Port 'set-url' to
'submodule--helper.c' and call the latter via 'git-submodule.sh'.
Signed-off-by: Shourya Shukla <shouryashukla.oo@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "git submodule" command did not initialize a few variables it
internally uses and was affected by variable settings leaked from
the environment.
* lx/submodule-clear-variables:
git-submodule.sh: setup uninitialized variables
We have an environment variable `jobs=16` defined in our CI system, and
this environment makes our build job failed with the following message:
error: pathspec '16' did not match any file(s) known to git
The pathspec '16' for Git command is from the environment variable
"jobs".
This is because "git-submodule" command is implemented in shell script,
and environment variables may change its behavior. Set values for
uninitialized variables, such as "jobs" and "recommend_shallow" will
fix this issue.
Helped-by: Jiang Xin <worldhello.net@gmail.com>
Signed-off-by: Li Xuejiang <xuejiang@alibaba-inc.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git clone --recurse-submodules --single-branch" now uses the same
single-branch option when cloning the submodules.
* es/recursive-single-branch-clone:
clone: pass --single-branch during --recurse-submodules
submodule--helper: use C99 named initializer
Previously, performing "git clone --recurse-submodules --single-branch"
resulted in submodules cloning all branches even though the superproject
cloned only one branch. Pipe --single-branch through the submodule
helper framework to make it to 'clone' later on.
Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
Acked-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Unless --force is specified, 'submodule add' checks if the destination
path is ignored by calling 'git add --dry-run --ignore-missing', and,
if that call fails, aborts with a custom "path is ignored" message (a
slight variant of what 'git add' shows). Aborting early rather than
letting the downstream 'git add' call fail is done so that the command
exits before cloning into the destination path. However, in rare
cases where the dry-run call fails for a reason other than the path
being ignored---for example, due to a preexisting index.lock
file---displaying the "ignored path" error message hides the real
source of the failure.
Instead of displaying the tailored "ignored path" message, let's
report the standard error from the dry run to give the caller more
accurate information about failures that are not due to an ignored
path.
For the ignored path case, this leads to the following change in the
error message:
The following [-path is-]{+paths are+} ignored by one of your .gitignore files:
<destination path>
Use -f if you really want to add [-it.-]{+them.+}
The new phrasing is a bit awkward, because 'submodule add' is only
dealing with one destination path. Alternatively, we could continue
to use the tailored message when the exit code is 1 (the expected
status for a failure due to an ignored path) and relay the standard
error for all other non-zero exits. That, however, risks hiding the
message of unrelated failures that share an exit code of 1, so it
doesn't seem worth doing just to avoid a clunkier, but still clear,
error message.
Signed-off-by: Kyle Meyer <kyle@kyleam.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* maint-2.21: (42 commits)
Git 2.21.1
mingw: sh arguments need quoting in more circumstances
mingw: fix quoting of empty arguments for `sh`
mingw: use MSYS2 quoting even when spawning shell scripts
mingw: detect when MSYS2's sh is to be spawned more robustly
t7415: drop v2.20.x-specific work-around
Git 2.20.2
t7415: adjust test for dubiously-nested submodule gitdirs for v2.20.x
Git 2.19.3
Git 2.18.2
Git 2.17.3
Git 2.16.6
test-drop-caches: use `has_dos_drive_prefix()`
Git 2.15.4
Git 2.14.6
mingw: handle `subst`-ed "DOS drives"
mingw: refuse to access paths with trailing spaces or periods
mingw: refuse to access paths with illegal characters
unpack-trees: let merged_entry() pass through do_add_entry()'s errors
quote-stress-test: offer to test quoting arguments for MSYS2 sh
...
* maint-2.20: (36 commits)
Git 2.20.2
t7415: adjust test for dubiously-nested submodule gitdirs for v2.20.x
Git 2.19.3
Git 2.18.2
Git 2.17.3
Git 2.16.6
test-drop-caches: use `has_dos_drive_prefix()`
Git 2.15.4
Git 2.14.6
mingw: handle `subst`-ed "DOS drives"
mingw: refuse to access paths with trailing spaces or periods
mingw: refuse to access paths with illegal characters
unpack-trees: let merged_entry() pass through do_add_entry()'s errors
quote-stress-test: offer to test quoting arguments for MSYS2 sh
t6130/t9350: prepare for stringent Win32 path validation
quote-stress-test: allow skipping some trials
quote-stress-test: accept arguments to test via the command-line
tests: add a helper to stress test argument quoting
mingw: fix quoting of arguments
Disallow dubiously-nested submodule git directories
...
* maint-2.19: (34 commits)
Git 2.19.3
Git 2.18.2
Git 2.17.3
Git 2.16.6
test-drop-caches: use `has_dos_drive_prefix()`
Git 2.15.4
Git 2.14.6
mingw: handle `subst`-ed "DOS drives"
mingw: refuse to access paths with trailing spaces or periods
mingw: refuse to access paths with illegal characters
unpack-trees: let merged_entry() pass through do_add_entry()'s errors
quote-stress-test: offer to test quoting arguments for MSYS2 sh
t6130/t9350: prepare for stringent Win32 path validation
quote-stress-test: allow skipping some trials
quote-stress-test: accept arguments to test via the command-line
tests: add a helper to stress test argument quoting
mingw: fix quoting of arguments
Disallow dubiously-nested submodule git directories
protect_ntfs: turn on NTFS protection by default
path: also guard `.gitmodules` against NTFS Alternate Data Streams
...
* maint-2.18: (33 commits)
Git 2.18.2
Git 2.17.3
Git 2.16.6
test-drop-caches: use `has_dos_drive_prefix()`
Git 2.15.4
Git 2.14.6
mingw: handle `subst`-ed "DOS drives"
mingw: refuse to access paths with trailing spaces or periods
mingw: refuse to access paths with illegal characters
unpack-trees: let merged_entry() pass through do_add_entry()'s errors
quote-stress-test: offer to test quoting arguments for MSYS2 sh
t6130/t9350: prepare for stringent Win32 path validation
quote-stress-test: allow skipping some trials
quote-stress-test: accept arguments to test via the command-line
tests: add a helper to stress test argument quoting
mingw: fix quoting of arguments
Disallow dubiously-nested submodule git directories
protect_ntfs: turn on NTFS protection by default
path: also guard `.gitmodules` against NTFS Alternate Data Streams
is_ntfs_dotgit(): speed it up
...
* maint-2.17: (32 commits)
Git 2.17.3
Git 2.16.6
test-drop-caches: use `has_dos_drive_prefix()`
Git 2.15.4
Git 2.14.6
mingw: handle `subst`-ed "DOS drives"
mingw: refuse to access paths with trailing spaces or periods
mingw: refuse to access paths with illegal characters
unpack-trees: let merged_entry() pass through do_add_entry()'s errors
quote-stress-test: offer to test quoting arguments for MSYS2 sh
t6130/t9350: prepare for stringent Win32 path validation
quote-stress-test: allow skipping some trials
quote-stress-test: accept arguments to test via the command-line
tests: add a helper to stress test argument quoting
mingw: fix quoting of arguments
Disallow dubiously-nested submodule git directories
protect_ntfs: turn on NTFS protection by default
path: also guard `.gitmodules` against NTFS Alternate Data Streams
is_ntfs_dotgit(): speed it up
mingw: disallow backslash characters in tree objects' file names
...
* maint-2.16: (31 commits)
Git 2.16.6
test-drop-caches: use `has_dos_drive_prefix()`
Git 2.15.4
Git 2.14.6
mingw: handle `subst`-ed "DOS drives"
mingw: refuse to access paths with trailing spaces or periods
mingw: refuse to access paths with illegal characters
unpack-trees: let merged_entry() pass through do_add_entry()'s errors
quote-stress-test: offer to test quoting arguments for MSYS2 sh
t6130/t9350: prepare for stringent Win32 path validation
quote-stress-test: allow skipping some trials
quote-stress-test: accept arguments to test via the command-line
tests: add a helper to stress test argument quoting
mingw: fix quoting of arguments
Disallow dubiously-nested submodule git directories
protect_ntfs: turn on NTFS protection by default
path: also guard `.gitmodules` against NTFS Alternate Data Streams
is_ntfs_dotgit(): speed it up
mingw: disallow backslash characters in tree objects' file names
path: safeguard `.git` against NTFS Alternate Streams Accesses
...
* maint-2.15: (29 commits)
Git 2.15.4
Git 2.14.6
mingw: handle `subst`-ed "DOS drives"
mingw: refuse to access paths with trailing spaces or periods
mingw: refuse to access paths with illegal characters
unpack-trees: let merged_entry() pass through do_add_entry()'s errors
quote-stress-test: offer to test quoting arguments for MSYS2 sh
t6130/t9350: prepare for stringent Win32 path validation
quote-stress-test: allow skipping some trials
quote-stress-test: accept arguments to test via the command-line
tests: add a helper to stress test argument quoting
mingw: fix quoting of arguments
Disallow dubiously-nested submodule git directories
protect_ntfs: turn on NTFS protection by default
path: also guard `.gitmodules` against NTFS Alternate Data Streams
is_ntfs_dotgit(): speed it up
mingw: disallow backslash characters in tree objects' file names
path: safeguard `.git` against NTFS Alternate Streams Accesses
clone --recurse-submodules: prevent name squatting on Windows
is_ntfs_dotgit(): only verify the leading segment
...
In addition to preventing `.git` from being tracked by Git, on Windows
we also have to prevent `git~1` from being tracked, as the default NTFS
short name (also known as the "8.3 filename") for the file name `.git`
is `git~1`, otherwise it would be possible for malicious repositories to
write directly into the `.git/` directory, e.g. a `post-checkout` hook
that would then be executed _during_ a recursive clone.
When we implemented appropriate protections in 2b4c6efc82 (read-cache:
optionally disallow NTFS .git variants, 2014-12-16), we had analyzed
carefully that the `.git` directory or file would be guaranteed to be
the first directory entry to be written. Otherwise it would be possible
e.g. for a file named `..git` to be assigned the short name `git~1` and
subsequently, the short name generated for `.git` would be `git~2`. Or
`git~3`. Or even `~9999999` (for a detailed explanation of the lengths
we have to go to protect `.gitmodules`, see the commit message of
e7cb0b4455 (is_ntfs_dotgit: match other .git files, 2018-05-11)).
However, by exploiting two issues (that will be addressed in a related
patch series close by), it is currently possible to clone a submodule
into a non-empty directory:
- On Windows, file names cannot end in a space or a period (for
historical reasons: the period separating the base name from the file
extension was not actually written to disk, and the base name/file
extension was space-padded to the full 8/3 characters, respectively).
Helpfully, when creating a directory under the name, say, `sub.`, that
trailing period is trimmed automatically and the actual name on disk
is `sub`.
This means that while Git thinks that the submodule names `sub` and
`sub.` are different, they both access `.git/modules/sub/`.
- While the backslash character is a valid file name character on Linux,
it is not so on Windows. As Git tries to be cross-platform, it
therefore allows backslash characters in the file names stored in tree
objects.
Which means that it is totally possible that a submodule `c` sits next
to a file `c\..git`, and on Windows, during recursive clone a file
called `..git` will be written into `c/`, of course _before_ the
submodule is cloned.
Note that the actual exploit is not quite as simple as having a
submodule `c` next to a file `c\..git`, as we have to make sure that the
directory `.git/modules/b` already exists when the submodule is checked
out, otherwise a different code path is taken in `module_clone()` that
does _not_ allow a non-empty submodule directory to exist already.
Even if we will address both issues nearby (the next commit will
disallow backslash characters in tree entries' file names on Windows,
and another patch will disallow creating directories/files with trailing
spaces or periods), it is a wise idea to defend in depth against this
sort of attack vector: when submodules are cloned recursively, we now
_require_ the directory to be empty, addressing CVE-2019-1349.
Note: the code path we patch is shared with the code path of `git
submodule update --init`, which must not expect, in general, that the
directory is empty. Hence we have to introduce the new option
`--force-init` and hand it all the way down from `git submodule` to the
actual `git submodule--helper` process that performs the initial clone.
Reported-by: Nicolas Joly <Nicolas.Joly@microsoft.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Currently, in the event that a submodule's upstream URL changes, users
have to manually alter the URL in the .gitmodules file then run
`git submodule sync`. Let's make that process easier.
Teach submodule the set-url subcommand which will automatically change
the `submodule.$name.url` property in the .gitmodules file and then run
`git submodule sync` to complete the process.
Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Running "git add" on a repository created inside the current
repository is an explicit indication that the user wants to add it
as a submodule, but when the HEAD of the inner repository is on an
unborn branch, it cannot be added as a submodule. Worse, the files
in its working tree can be added as if they are a part of the outer
repository, which is not what the user wants. These problems are
being addressed.
* km/empty-repo-is-still-a-repo:
add: error appropriately on repository with no commits
dir: do not traverse repositories with no commits
submodule: refuse to add repository with no commits
"git submodule foreach <command> --quiet" did not pass the option
down correctly, which has been corrected.
* nd/submodule-foreach-quiet:
submodule foreach: fix "<command> --quiet" not being respected
Robin reported that
git submodule foreach --quiet git pull --quiet origin
is not really quiet anymore [1]. "git pull" behaves as if --quiet is not
given.
This happens because parseopt in submodule--helper will try to parse
both --quiet options as if they are foreach's options, not git-pull's.
The parsed options are removed from the command line. So when we do
pull later, we execute just this
git pull origin
When calling submodule helper, adding "--" in front of "git pull" will
stop parseopt for parsing options that do not really belong to
submodule--helper foreach.
PARSE_OPT_KEEP_UNKNOWN is removed as a safety measure. parseopt should
never see unknown options or something has gone wrong. There are also
a couple usage string update while I'm looking at them.
While at it, I also add "--" to other subcommands that pass "$@" to
submodule--helper. "$@" in these cases are paths and less likely to be
--something-like-this. But the point still stands, git-submodule has
parsed and classified what are options, what are paths. submodule--helper
should never consider paths passed by git-submodule to be options even
if they look like one.
The test case is also contributed by Robin.
[1] it should be quiet before fc1b9243cd (submodule: port submodule
subcommand 'foreach' from shell to C, 2018-05-10) because parseopt
can't accidentally eat options then.
Reported-by: Robin H. Johnson <robbat2@gentoo.org>
Tested-by: Robin H. Johnson <robbat2@gentoo.org>
Signed-off-by: Robin H. Johnson <robbat2@gentoo.org>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When the path given to 'git submodule add' is an existing repository
that is not in the index, the repository is passed to 'git add'. If
this repository doesn't have a commit checked out, we don't get a
useful result: there is no subproject OID to track, and any untracked
files in the sub-repository are added as blobs in the top-level
repository.
To avoid getting into this state, abort if the path is a repository
that doesn't have a commit checked out. Note that this check must
come before the 'git add --dry-run' check because the next commit will
make 'git add' fail when given a repository that doesn't have a commit
checked out.
Signed-off-by: Kyle Meyer <kyle@kyleam.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This teaches git-submodule the set-branch subcommand which allows the
branch of a submodule to be set through a porcelain command without
having to manually manipulate the .gitmodules file.
Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When cloning with --recurse-submodules a superproject with at least one
submodule with HEAD pointing to an unborn branch, the clone goes
something like this:
Cloning into 'test'...
<messages about cloning of superproject>
Submodule '<name>' (<uri>) registered for path '<submodule path>'
Cloning into '<submodule path>'...
fatal: Couldn't find remote ref HEAD
Unable to fetch in submodule path '<submodule path>'
<messages about fetching with SHA-1>
From <uri>
* branch <hash> -> FETCH_HEAD
Submodule path '<submodule path>': checked out '<hash>'
In other words, first, a fetch is done with no hash arguments (that is,
a fetch of HEAD) resulting in a "Couldn't find remote ref HEAD" error;
then, a fetch is done given a hash, which succeeds.
The fetch given a hash was added in fb43e31f2b ("submodule: try harder
to fetch needed sha1 by direct fetching sha1", 2016-02-24), and the
"Unable to fetch..." message was downgraded from a fatal error to a
notice in e30d833671 ("git-submodule.sh: try harder to fetch a
submodule", 2018-05-16).
This commit improves the notice to be clearer that we are retrying the
fetch, and that the previous messages (in particular, the fatal errors
from fetch) do not necessarily indicate that the whole command fails. In
other words:
- If the HEAD-fetch succeeds and we then have the commit we want,
git-submodule prints no explanation.
- If the HEAD-fetch succeeds and we do not have the commit we want, but
the hash-fetch succeeds, git-submodule prints no explanation.
- If the HEAD-fetch succeeds and we do not have the commit we want, but
the hash-fetch fails, git-submodule prints a fatal error.
- If the HEAD-fetch fails, fetch prints a fatal error, and
git-submodule informs the user that it will retry by fetching
specific commits by hash.
- If the hash-fetch then succeeds, git-submodule prints no
explanation (besides the ones already printed).
- If the HEAD-fetch then fails, git-submodule prints a fatal error.
It could be said that we should just eliminate the HEAD-fetch
altogether, but that changes some behavior (in particular, some refs
that were opportunistically updated would no longer be), so I have left
that alone for now.
There is an analogous situation with the fetching code in fetch_finish()
and surrounding functions. For now, I have added a NEEDSWORK.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
submodule's default behavior wasn't documented in both git-submodule.txt
and in the usage text of git-submodule. Document the default behavior
similar to how git-remote does it.
Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "git submodule summary" subcommand showed shortened commit
object names by mechanically truncating them at 7-hexdigit, which
has been improved to let "rev-parse --short" scale the length of
the abbreviation with the size of the repository.
* sh/submodule-summary-abbrev-fix:
git-submodule.sh: shorten submodule SHA-1s using rev-parse
Until now, `git submodule summary` was always emitting 7-character
SHA-1s that have a higher chance of being ambiguous for larger
repositories. Use `git rev-parse --short` instead, which will
determine suitable short SHA-1 lengths.
When a submodule hasn't been initialized with "submodule init" or
not cloned, `git rev-parse` would not work in it yet; as a fallback,
use the original method of cutting at 7 hexdigits.
Signed-off-by: Sven van Haastregt <svenvh@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
74d4731da1 (submodule--helper: replace connect-gitdir-workingtree by
ensure-core-worktree, 2018-08-13) forgot to exit the submodule operation
if the helper could not ensure that core.worktree is set correctly.
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The submodule support has been updated to read from the blob at
HEAD:.gitmodules when the .gitmodules file is missing from the
working tree.
* ao/submodule-wo-gitmodules-checked-out:
t/helper: add test-submodule-nested-repo-config
submodule: support reading .gitmodules when it's not in the working tree
submodule: add a helper to check if it is safe to write to .gitmodules
t7506: clean up .gitmodules properly before setting up new scenario
submodule: use the 'submodule--helper config' command
submodule--helper: add a new 'config' subcommand
t7411: be nicer to future tests and really clean things up
t7411: merge tests 5 and 6
submodule: factor out a config_set_in_gitmodules_file_gently function
submodule: add a print_config_from_gitmodules() helper