apply_parse_options() passes the array of argument strings to
parse_options(), which removes recognized options. The removed strings
are not freed, though.
Make a copy of the strvec to pass to the function to retain the pointers
of its strings, so we release them all at the end.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git branch --edit-description" on an unborh branch misleadingly
said that no such branch exists, which has been corrected.
* rj/branch-edit-desc-unborn:
branch: description for non-existent branch errors
Code clean-up.
* jk/cleanup-callback-parameters:
attr: drop DEBUG_ATTR code
commit: avoid writing to global in option callback
multi-pack-index: avoid writing to global in option callback
test-submodule: inline resolve_relative_url() function
"GIT_EDITOR=: git branch --edit-description" resulted in failure,
which has been corrected.
* jc/branch-description-unset:
branch: do not fail a no-op --edit-desc
"git fsck" failed to release contents of tree objects already used
from the memory, which has been fixed.
* jk/fsck-on-diet:
parse_object_buffer(): respect save_commit_buffer
fsck: turn off save_commit_buffer
fsck: free tree buffers after walking unreachable objects
"git clone" did not like to see the "--bare" and the "--origin"
options used together without a good reason.
* jk/clone-allow-bare-and-o-together:
clone: allow "--bare" with "-o"
"git remote rename" failed to rename a remote without fetch
refspec, which has been corrected.
* jk/remote-rename-without-fetch-refspec:
remote: handle rename of remote without fetch refspec
When the repository does not yet have commits, some errors describe that
there is no branch:
$ git init -b first
$ git branch --edit-description first
error: No branch named 'first'.
$ git branch --set-upstream-to=upstream
fatal: branch 'first' does not exist
$ git branch -c second
error: refname refs/heads/first not found
fatal: Branch copy failed
That "first" branch is unborn but to say it doesn't exists is confusing.
Options "-c" (copy) and "-m" (rename) show the same error when the
origin branch doesn't exists:
$ git branch -c non-existent-branch second
error: refname refs/heads/non-existent-branch not found
fatal: Branch copy failed
$ git branch -m non-existent-branch second
error: refname refs/heads/non-existent-branch not found
fatal: Branch rename failed
Note that "--edit-description" without an explicit argument is already
considering the _empty repository_ circumstance in its error. Also note
that "-m" on the initial branch it is an allowed operation.
Make the error descriptions for those branch operations with unborn or
non-existent branches, more informative.
This is the result of the change:
$ git init -b first
$ git branch --edit-description first
error: No commit on branch 'first' yet.
$ git branch --set-upstream-to=upstream
fatal: No commit on branch 'first' yet.
$ git branch -c second
fatal: No commit on branch 'first' yet.
$ git branch [-c/-m] non-existent-branch second
fatal: No branch named 'non-existent-branch'.
Signed-off-by: Rubén Justo <rjusto@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The strvec "argv" is used to build a command for run_command_v_opt(),
but never freed. Use a constant string array instead, which doesn't
require any cleanup.
Suggested-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The callback function for --trailer writes directly to the global
trailer_args and ignores opt->value completely. This is OK, since that's
where we expect to find the value. But it does mean the option
declaration isn't as clear. E.g., we have:
OPT_BOOL(0, "reset-author", &renew_authorship, ...),
OPT_CALLBACK_F(0, "trailer", NULL, ..., opt_pass_trailer)
In the first one we can see where the result will be stored, but in the
second, we get only NULL, and you have to go read the callback.
Let's pass &trailer_args, and use it in the callback. As a bonus, this
silences a -Wunused-parameter warning.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We declare the --object-dir option like:
OPT_CALLBACK(0, "object-dir", &opts.object_dir, ...);
but the pointer to opts.object_dir is completely unused. Instead, the
callback writes directly to a global. Which fortunately happens to be
opts.object_dir. So everything works as expected, but it's unnecessarily
confusing.
Instead, let's have the callback write to the option value pointer that
has been passed in. This also quiets a -Wunused-parameter warning (since
we don't otherwise look at "opt").
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Pass a constant string array directly to run_command_v_opt() instead of
copying it into a strvec first. This shortens the code and avoids heap
allocations.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When cloning a repository with `--local`, Git relies on either making a
hardlink or copy to every file in the "objects" directory of the source
repository. This is done through the callpath `cmd_clone()` ->
`clone_local()` -> `copy_or_link_directory()`.
The way this optimization works is by enumerating every file and
directory recursively in the source repository's `$GIT_DIR/objects`
directory, and then either making a copy or hardlink of each file. The
only exception to this rule is when copying the "alternates" file, in
which case paths are rewritten to be absolute before writing a new
"alternates" file in the destination repo.
One quirk of this implementation is that it dereferences symlinks when
cloning. This behavior was most recently modified in 36596fd2df (clone:
better handle symlinked files at .git/objects/, 2019-07-10), which
attempted to support `--local` clones of repositories with symlinks in
their objects directory in a platform-independent way.
Unfortunately, this behavior of dereferencing symlinks (that is,
creating a hardlink or copy of the source's link target in the
destination repository) can be used as a component in attacking a
victim by inadvertently exposing the contents of file stored outside of
the repository.
Take, for example, a repository that stores a Dockerfile and is used to
build Docker images. When building an image, Docker copies the directory
contents into the VM, and then instructs the VM to execute the
Dockerfile at the root of the copied directory. This protects against
directory traversal attacks by copying symbolic links as-is without
dereferencing them.
That is, if a user has a symlink pointing at their private key material
(where the symlink is present in the same directory as the Dockerfile,
but the key itself is present outside of that directory), the key is
unreadable to a Docker image, since the link will appear broken from the
container's point of view.
This behavior enables an attack whereby a victim is convinced to clone a
repository containing an embedded submodule (with a URL like
"file:///proc/self/cwd/path/to/submodule") which has a symlink pointing
at a path containing sensitive information on the victim's machine. If a
user is tricked into doing this, the contents at the destination of
those symbolic links are exposed to the Docker image at runtime.
One approach to preventing this behavior is to recreate symlinks in the
destination repository. But this is problematic, since symlinking the
objects directory are not well-supported. (One potential problem is that
when sharing, e.g. a "pack" directory via symlinks, different writers
performing garbage collection may consider different sets of objects to
be reachable, enabling a situation whereby garbage collecting one
repository may remove reachable objects in another repository).
Instead, prohibit the local clone optimization when any symlinks are
present in the `$GIT_DIR/objects` directory of the source repository.
Users may clone the repository again by prepending the "file://" scheme
to their clone URL, or by adding the `--no-local` option to their `git
clone` invocation.
The directory iterator used by `copy_or_link_directory()` must no longer
dereference symlinks (i.e., it *must* call `lstat()` instead of `stat()`
in order to discover whether or not there are symlinks present). This has
no bearing on the overall behavior, since we will immediately `die()` on
encounter a symlink.
Note that t5604.33 suggests that we do support local clones with
symbolic links in the source repository's objects directory, but this
was likely unintentional, or at least did not take into consideration
the problem with sharing parts of the objects directory with symbolic
links at the time. Update this test to reflect which options are and
aren't supported.
Helped-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Imagine running "git branch --edit-description" while on a branch
without the branch description, and then exit the editor after
emptying the edit buffer, which is the way to tell the command that
you changed your mind and you do not want the description after all.
The command should just happily oblige, adding no branch description
for the current branch, and exit successfully. But it fails to do
so:
$ git init -b main
$ git commit --allow-empty -m commit
$ GIT_EDITOR=: git branch --edit-description
fatal: could not unset 'branch.main.description'
The end result is OK in that the configuration variable does not
exist in the resulting repository, but we should do better. If we
know we didn't have a description, and if we are asked not to have a
description by the editor, we can just return doing nothing.
This of course introduces TOCTOU. If you add a branch description
to the same branch from another window, while you had the editor
open to edit the description, and then exit the editor without
writing anything there, we'd end up not removing the description you
added in the other window. But you are fooling yourself in your own
repository at that point, and if it hurts, you'd be better off not
doing so ;-).
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We return an error when trying to rename a remote that has no fetch
refspec:
$ git config --unset-all remote.origin.fetch
$ git remote rename origin foo
fatal: could not unset 'remote.foo.fetch'
To make things even more confusing, we actually _do_ complete the config
modification, via git_config_rename_section(). After that we try to
rewrite the fetch refspec (to say refs/remotes/foo instead of origin).
But our call to git_config_set_multivar() to remove the existing entries
fails, since there aren't any, and it calls die().
We could fix this by using the "gently" form of the config call, and
checking the error code. But there is an even simpler fix: if we know
that there are no refspecs to rewrite, then we can skip that part
entirely.
Reported-by: John A. Leuenhagen <john@zlima12.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We explicitly forbid the combination of "--bare" with "-o", but there
doesn't seem to be any good reason to do so. The original logic came as
part of e6489a1bdf (clone: do not accept more than one -o option.,
2006-01-22), but that commit does not give any reason.
Furthermore, the equivalent combination via config is allowed:
git -c clone.defaultRemoteName=foo clone ...
and works as expected. It may be that this combination was considered
useless, because a bare clone does not set remote.origin.fetch (and
hence there is no refs/remotes/origin hierarchy). But it does set
remote.origin.url, and that name is visible to the user via "git fetch
origin", etc.
Let's allow the options to be used together, and switch the "forbid"
test in t5606 to check that we use the requested name. That test came
much later in 349cff76de (clone: add tests for --template and some
disallowed option pairs, 2020-09-29), and does not offer any logic
beyond "let's test what the code currently does".
Reported-by: John A. Leuenhagen <john@zlima12.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If the global variable "save_commit_buffer" is set to 0, then
parse_commit() will throw away the commit object data after parsing it,
rather than sticking it into a commit slab. This goes all the way back
to 60ab26de99 ([PATCH] Avoid wasting memory in git-rev-list,
2005-09-15).
But there's another code path which may similarly stash the buffer:
parse_object_buffer(). This is where we end up if we parse a commit via
parse_object(), and it's used directly in a few other code paths like
git-fsck.
The original goal of 60ab26de99 was avoiding extra memory usage for
rev-list. And there it's not all that important to catch parse_object().
We use that function only for looking at the tips of the traversal, and
the majority of the commits are parsed by following parent links, where
we use parse_commit() directly. So we were wasting some memory, but only
a small portion.
It's much easier to see the effect with fsck. Since we now turn off
save_commit_buffer by default there, we _should_ be able to drop the
freeing of the commit buffer in fsck_obj(). But if we do so (taking the
first hunk of this patch without the rest), then the peak heap of "git
fsck" in a clone of git.git goes from 136MB to 194MB. Teaching
parse_object_buffer() to respect save_commit_buffer brings that down to
134.5MB (it's hard to tell from massif's output, but I suspect the
savings comes from avoiding the overhead of the mostly-empty commit
slab).
Other programs should see a small improvement. Both "rev-list --all" and
"fsck --connectivity-only" improve by a few hundred kilobytes, as they'd
avoid loading the tip objects of their traversals.
Most importantly, no code should be hurt by doing this. Any program that
turns off save_commit_buffer is already making the assumption that any
commit it sees may need to have its object data loaded on demand, as it
doesn't know which ones were parsed by parse_commit() versus
parse_object(). Not to mention that anything parsed by the commit graph
may be in the same boat, even if save_commit_buffer was not disabled.
This should be the only spot that needs to be fixed. Grepping for
set_commit_buffer() shows that this and parse_commit() are the only
relevant calls.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When parsing a commit, the default behavior is to stuff the original
buffer into a commit_slab (which takes ownership of it). But for a tool
like fsck, this isn't useful. While we may look at the buffer further as
part of fsck_commit(), we'll always do so through a separate pointer;
attaching the buffer to the slab doesn't help.
Worse, it means we have to remember to free the commit buffer in all
call paths. We do so in fsck_obj(), which covers a regular "git fsck".
But with "--connectivity-only", we forget to do so in both
traverse_one_object(), which covers reachable objects, and
mark_unreachable_referents(), which covers unreachable ones. As a
result, that mode ends up storing an uncompressed copy of every commit
on the heap at once.
We could teach the code paths for --connectivity-only to also free
commit buffers. But there's an even easier fix: we can just turn off the
save_commit_buffer flag, and then we won't attach them to the commits in
the first place.
This reduces the peak heap of running "git fsck --connectivity-only" in
a clone of linux.git from ~2GB to ~1GB. According to massif, the
remaining memory goes where you'd expect: the object structs themselves,
the obj_hash containing them, and the delta base cache.
Note that we'll leave the call to free commit buffers in fsck_obj() for
now; it's not quite redundant because of a related bug that we'll fix in
a subsequent commit.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
After calling fsck_walk(), a tree object struct may be left in the
parsed state, with the full tree contents available via tree->buffer.
It's the responsibility of the caller to free these when it's done with
the object to avoid having many trees allocated at once.
In a regular "git fsck", we hit fsck_walk() only from fsck_obj(), which
does call free_tree_buffer(). Likewise for "--connectivity-only", we see
most objects via traverse_one_object(), which makes a similar call.
The exception is in mark_unreachable_referents(). When using both
"--connectivity-only" and "--dangling" (the latter of which is the
default), we walk all of the unreachable objects, and there we forget to
free. Most cases would not notice this, because they don't have a lot of
unreachable objects, but you can make a pathological case like this:
git clone --bare /path/to/linux.git repo.git
cd repo.git
rm packed-refs ;# now everything is unreachable!
git fsck --connectivity-only
That ends up with peak heap usage ~18GB, which is (not coincidentally)
close to the size of all uncompressed trees in the repository. After
this patch, the peak heap is only ~2GB.
A few things to note:
- it might seem like fsck_walk(), if it is parsing the trees, should
be responsible for freeing them. But the situation is quite tricky.
In the non-connectivity mode, after we call fsck_walk() we then
proceed with fsck_object() which actually does the type-specific
sanity checks on the object contents. We do pass our own separate
buffer to fsck_object(), but there's a catch: our earlier call to
parse_object_buffer() may have attached that buffer to the object
struct! So by freeing it, we leave the rest of the code with a
dangling pointer.
Likewise, the call to fsck_walk() in index-pack is subtle. It
attaches a buffer to the tree object that must not be freed! And
so rather than calling free_tree_buffer(), it actually detaches it
by setting tree->buffer to NULL.
These cases would _probably_ be fixable by having fsck_walk() free
the tree buffer only when it was the one who allocated it via
parse_tree(). But that would still leave the callers responsible for
freeing other cases, so they wouldn't be simplified. While the
current semantics for fsck_walk() make it easy to accidentally leak
in new callers, at least they are simple to explain, and it's not a
function that's likely to get a lot of new call-sites.
And in any case, it's probably sensible to fix the leak first with
this simple patch, and try any more complicated refactoring
separately.
- a careful reader may notice that fsck_obj() also frees commit
buffers, but neither the call in traverse_one_object() nor the one
touched in this patch does so. And indeed, this is another problem
for --connectivity-only (and accounts for most of the 2GB heap after
this patch), but it's one we'll fix in a separate commit.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
These commands have no placeholders to be translated.
Signed-off-by: Alex Henrie <alexhenrie24@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
These strings are not translatable in the diagnose_options array in
diagnose.c. Don't translate them in builtin/diagnose.c either.
Signed-off-by: Alex Henrie <alexhenrie24@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The command you type is still "git maintenance" even in other languages.
Signed-off-by: Alex Henrie <alexhenrie24@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Hoist the remainder of "scalar" out of contrib/ to the main part of
the codebase.
* vd/scalar-to-main:
Documentation/technical: include Scalar technical doc
t/perf: add 'GIT_PERF_USE_SCALAR' run option
t/perf: add Scalar performance tests
scalar-clone: add test coverage
scalar: add to 'git help -a' command list
scalar: implement the `help` subcommand
git help: special-case `scalar`
scalar: include in standard Git build & installation
scalar: fix command documentation section header
A couple of bugfixes with code clean-up.
* jk/list-objects-filter-cleanup:
list-objects-filter: convert filter_spec to a strbuf
list-objects-filter: add and use initializers
list-objects-filter: handle null default filter spec
list-objects-filter: don't memset after releasing filter struct
"git mv A B" in a sparsely populated working tree can be asked to
move a path from a directory that is "in cone" to another directory
that is "out of cone". Handling of such a case has been improved.
* sy/mv-out-of-cone:
builtin/mv.c: fix possible segfault in add_slash()
mv: check overwrite for in-to-out move
advice.h: add advise_on_moving_dirty_path()
mv: cleanup empty WORKING_DIRECTORY
mv: from in-cone to out-of-cone
mv: remove BOTH from enum update_mode
mv: check if <destination> is a SKIP_WORKTREE_DIR
mv: free the with_slash in check_dir_in_index()
mv: rename check_dir_in_index() to empty_dir_has_sparse_contents()
t7002: add tests for moving from in-cone to out-of-cone
"git fetch" over protocol v2 sent an incorrect ref prefix request
to the server and made "git pull" with configured fetch refspec
that does not cover the remote branch to merge with fail, which has
been corrected.
* jk/proto-v2-ref-prefix-fix:
fetch: add branch.*.merge to default ref-prefix extension
fetch: stop checking for NULL transport->remote in do_fetch()
Plugging leaks in submodule--helper.
* ab/submodule-helper-leakfix:
submodule--helper: fix a configure_added_submodule() leak
submodule--helper: free rest of "displaypath" in "struct update_data"
submodule--helper: free some "displaypath" in "struct update_data"
submodule--helper: fix a memory leak in print_status()
submodule--helper: fix a leak in module_add()
submodule--helper: fix obscure leak in module_add()
submodule--helper: fix "reference" leak
submodule--helper: fix a memory leak in get_default_remote_submodule()
submodule--helper: fix a leak with repo_clear()
submodule--helper: fix "sm_path" and other "module_cb_list" leaks
submodule--helper: fix "errmsg_str" memory leak
submodule--helper: add and use *_release() functions
submodule--helper: don't leak {run,capture}_command() cp.dir argument
submodule--helper: "struct pathspec" memory leak in module_update()
submodule--helper: fix most "struct pathspec" memory leaks
submodule--helper: fix trivial get_default_remote_submodule() leak
submodule--helper: fix a leak in "clone_submodule"
Undoes 'jk/unused-annotation' topic and redoes it to work around
Coccinelle rules misfiring false positives in unrelated codepaths.
* ab/unused-annotation:
git-compat-util.h: use "deprecated" for UNUSED variables
git-compat-util.h: use "UNUSED", not "UNUSED(var)"
Annotate function parameters that are not used (but cannot be
removed for structural reasons), to prepare us to later compile
with -Wunused warning turned on.
* jk/unused-annotation:
is_path_owned_by_current_uid(): mark "report" parameter as unused
run-command: mark unused async callback parameters
mark unused read_tree_recursive() callback parameters
hashmap: mark unused callback parameters
config: mark unused callback parameters
streaming: mark unused virtual method parameters
transport: mark bundle transport_options as unused
refs: mark unused virtual method parameters
refs: mark unused reflog callback parameters
refs: mark unused each_ref_fn parameters
git-compat-util: add UNUSED macro
The auto-stashed local changes created by "git merge --autostash"
was mixed into a conflicted state left in the working tree, which
has been corrected.
* en/merge-unstash-only-on-clean-merge:
merge: only apply autostash when appropriate
The parser in the script interface to parse-options in "git
rev-parse" has been updated to diagnose a bogus input correctly.
* ow/rev-parse-parseopt-fix:
rev-parse --parseopt: detect missing opt-spec
The codepath for the OPT_SUBCOMMAND facility has been cleaned up.
* sg/parse-options-subcommand:
notes, remote: show unknown subcommands between `'
notes: simplify default operation mode arguments check
test-parse-options.c: fix style of comparison with zero
test-parse-options.c: don't use for loop initial declaration
t0040-parse-options: remove leftover debugging
ce74de9(ls-files: introduce "--format" option) miss
a space between two words incorrectly, it leads to
wrong i10n messages. So fix it by adding a space at
the end of the error message.
Signed-off-by: ZheNing Hu <adlternative@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In 7e2619d8ff (list_objects_filter_options: plug leak of filter_spec
strings, 2022-09-08), we noted that the filter_spec string_list was
inconsistent in how it handled memory ownership of strings stored in the
list. The fix there was a bit of a band-aid to set the "strdup_strings"
variable right before adding anything.
That works OK, and it lets the users of the API continue to
zero-initialize the struct. But it makes the code a bit hard to follow
and accident-prone, as any other spots appending the filter_spec need to
think about whether to set the strdup_strings value, too (there's one
such spot in partial_clone_get_default_filter_spec(), which is probably
a possible memory leak).
So let's do that full cleanup now. We'll introduce a
LIST_OBJECTS_FILTER_INIT macro and matching function, and use them as
appropriate (though it is for the "_options" struct, this matches the
corresponding list_objects_filter_release() function).
This is harder than it seems! Many other structs, like
git_transport_data, embed the filter struct. So they need to initialize
it themselves even if the rest of the enclosing struct is OK with
zero-initialization. I found all of the relevant spots by grepping
manually for declarations of list_objects_filter_options. And then doing
so recursively for structs which embed it, and ones which embed those,
and so on.
I'm pretty sure I got everything, but there's no change that would alert
the compiler if any topics in flight added new declarations. To catch
this case, we now double-check in the parsing function that things were
initialized as expected and BUG() if appropriate.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A possible segfault was introduced in c08830de41 (mv: check if
<destination> is a SKIP_WORKTREE_DIR, 2022-08-09).
When running t7001 with SANITIZE=address, problem appears when running:
git mv path1/path2/ .
or
git mv directory ../
or
any <destination> that makes dest_path[0] an empty string.
The add_slash() call could segfault when path argument to it is an empty
string, because it makes an out-of-bounds read to decide if an extra
slash '/' needs to be appended to it.
As add_slash() is used to make sure that a valid pathname to a file in
the given directory can be made by appending a filename after the value
returned from it, if path is an empty string, we want to return it
as-is. The path to a file "F" in the top-level of the working tree
(i.e. path=="") is formed by appending "F" after "" (i.e. path) without
any slash in between.
So, just like the case where a non-empty path already ends with a slash,
return an empty path as-is.
Reported-by: Jeff King <peff@peff.net>
Helped-by: Jeff King <peff@peff.net>
Helped-by: Junio C Hamano <gitster@pobox.com>
Helped-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Shaoxuan Yuan <shaoxuan.yuan02@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>