Deduplicate the code for reporting and starting the bisect run command
by moving it to a short helper function. Use a string array instead of
a strvec to prepare the arguments, for simplicity.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Build the string array av during initialization, without any magic
numbers or heap allocations. Not duplicating the result of oid_to_hex()
is safe because run_command_v_opt() duplicates all arguments already.
(It would even be safe if it didn't, but that's a different story.)
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Simplify the code that builds the arguments for the "read-tree"
invocation in reset_hard() and read_empty() to remove the "verbose"
parameter.
Before 172b6428d0 (do not overwrite untracked during merge from
unborn branch, 2010-11-14) there was a "reset_hard()" function that
would be called in two places, one of those passed a "verbose=1", the
other a "verbose=0".
After 172b6428d0 when read_empty() was split off from reset_hard()
both of these functions only had one caller. The "verbose" in
read_empty() would always be false, and the one in reset_hard() would
always be true.
There was never a good reason for the code to act this way, it
happened because the read_empty() function was a copy/pasted and
adjusted version of reset_hard().
Since we're no longer conditionally adding the "-v" parameter
here (and we'd only add it for "reset_hard()" we'll be able to move to
a simpler and safer run-command API in the subsequent commit.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
When creating a multi-pack bitmap, remove per-pack bitmap files
unconditionally as they will never be consulted.
* tb/remove-unused-pack-bitmap:
builtin/repack.c: remove redundant pack-based bitmaps
The short-help text shown by "git cmd -h" and the synopsis text
shown at the beginning of "git help cmd" have been made more
consistent.
* ab/doc-synopsis-and-cmd-usage: (34 commits)
tests: assert consistent whitespace in -h output
tests: start asserting that *.txt SYNOPSIS matches -h output
doc txt & -h consistency: make "worktree" consistent
worktree: define subcommand -h in terms of command -h
reflog doc: list real subcommands up-front
doc txt & -h consistency: make "commit" consistent
doc txt & -h consistency: make "diff-tree" consistent
doc txt & -h consistency: use "[<label>...]" for "zero or more"
doc txt & -h consistency: make "annotate" consistent
doc txt & -h consistency: make "stash" consistent
doc txt & -h consistency: add missing options
doc txt & -h consistency: use "git foo" form, not "git-foo"
doc txt & -h consistency: make "bundle" consistent
doc txt & -h consistency: make "read-tree" consistent
doc txt & -h consistency: make "rerere" consistent
doc txt & -h consistency: add missing options and labels
doc txt & -h consistency: make output order consistent
doc txt & -h consistency: add or fix optional "--" syntax
doc txt & -h consistency: fix mismatching labels
doc SYNOPSIS & -h: use "-" to separate words in labels, not "_"
...
"git branch --edit-description" on an unborh branch misleadingly
said that no such branch exists, which has been corrected.
* rj/branch-edit-desc-unborn:
branch: description for non-existent branch errors
Move a global variable added as a hack during regression fixes to
its proper place in the API.
* ab/run-hook-api-cleanup:
run-command.c: remove "max_processes", add "const" to signal() handler
run-command.c: pass "opts" further down, and use "opts->processes"
run-command.c: use "opts->processes", not "pp->max_processes"
run-command.c: don't copy "data" to "struct parallel_processes"
run-command.c: don't copy "ungroup" to "struct parallel_processes"
run-command.c: don't copy *_fn to "struct parallel_processes"
run-command.c: make "struct parallel_processes" const if possible
run-command API: move *_tr2() users to "run_processes_parallel()"
run-command API: have run_process_parallel() take an "opts" struct
run-command.c: use designated init for pp_init(), add "const"
run-command API: don't fall back on online_cpus()
run-command API: make "n" parameter a "size_t"
run-command tests: use "return", not "exit"
run-command API: have "run_processes_parallel{,_tr2}()" return void
run-command test helper: use "else if" pattern
When geometric repacking feature is in use together with the
--pack-kept-objects option, we lost packs marked with .keep files.
* tb/save-keep-pack-during-geometric-repack:
repack: don't remove .keep packs with `--pack-kept-objects`
More UNUSED annotation to help using -Wunused option with the
compiler.
* jk/unused-anno-more:
ll-merge: mark unused parameters in callbacks
diffcore-pickaxe: mark unused parameters in pickaxe functions
convert: mark unused parameter in null stream filter
apply: mark unused parameters in noop error/warning routine
apply: mark unused parameters in handlers
date: mark unused parameters in handler functions
string-list: mark unused callback parameters
object-file: mark unused parameters in hash_unknown functions
mark unused parameters in trivial compat functions
update-index: drop unused argc from do_reupdate()
submodule--helper: drop unused argc from module_list_compute()
diffstat_consume(): assert non-zero length
We are interested in exploring whether gc.cruftPacks=true should become
the default value.
To determine whether it is safe to do so, let's encourage more users to
try it out.
Users who have set feature.experimental=true have already volunteered to
try new and possibly-breaking config changes, so let's try this new
default with that set of users.
Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since c2d17ba3db (branch --edit-description: protect against mistyped
branch name, 2012-02-05) we return -1 on error editing the branch
description.
Let's change to 1, which follows the established convention and it is
better for portability reasons.
Signed-off-by: Rubén Justo <rjusto@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In c847f53712 (Detached HEAD (experimental), 2007-01-01) an error
condition was introduced in rename_branch() to prevent renaming, later
also copying, a detached HEAD.
The condition used was checking for NULL in oldname, the source branch
to rename/copy. That condition cannot be satisfied because if no source
branch is specified, HEAD is going to be used in the call.
The error issued instead is:
fatal: Invalid branch name: 'HEAD'
Let's remove the condition in copy_or_rename_branch() (the current
function name) and check for HEAD before calling it, dying with the
original intended error if we're in a detached HEAD.
Signed-off-by: Rubén Justo <rjusto@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git diff rev^!" did not show combined diff to go to the rev from
its parents.
* rs/diff-caret-bang-with-parents:
diff: support ^! for merges
revisions.txt: unspecify order of resolved parts of ^!
revision: use strtol_i() for exclude_parent
Code clean-up.
* jk/cleanup-callback-parameters:
attr: drop DEBUG_ATTR code
commit: avoid writing to global in option callback
multi-pack-index: avoid writing to global in option callback
test-submodule: inline resolve_relative_url() function
"GIT_EDITOR=: git branch --edit-description" resulted in failure,
which has been corrected.
* jc/branch-description-unset:
branch: do not fail a no-op --edit-desc
"git fsck" failed to release contents of tree objects already used
from the memory, which has been fixed.
* jk/fsck-on-diet:
parse_object_buffer(): respect save_commit_buffer
fsck: turn off save_commit_buffer
fsck: free tree buffers after walking unreachable objects
"git clone" did not like to see the "--bare" and the "--origin"
options used together without a good reason.
* jk/clone-allow-bare-and-o-together:
clone: allow "--bare" with "-o"
"git remote rename" failed to rename a remote without fetch
refspec, which has been corrected.
* jk/remote-rename-without-fetch-refspec:
remote: handle rename of remote without fetch refspec
The last git version that had "diff-tree" in the header text
of "git diff-tree" output was v1.3.0 from 2006. The header text
was changed from "diff-tree" to "commit" in 91539833
("Log message printout cleanups").
Given how long ago this change was made, it is highly unlikely that
anyone is still feeding in outputs from that git version.
Remove the handling of the "diff-tree" prefix and document the
source of the other prefixes so that the overall functionality
is more clear.
Signed-off-by: Jerry Zhang <Jerry@skydio.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There are situations where the user might not want the default
setting where patch-id strips all whitespace. They might be working
in a language where white space is syntactically important, or they
might have CI testing that enforces strict whitespace linting. In
these cases, a whitespace change would result in the patch
fundamentally changing, and thus deserving of a different id.
Add a new mode that is exclusive of --stable and --unstable called
--verbatim. It also corresponds to the config
patchid.verbatim = true. In this mode, the stable algorithm is
used and whitespace is not stripped from the patch text.
Users of --unstable mainly care about compatibility with old git
versions, which unstripping the whitespace would break. Thus there
isn't a usecase for the combination of --verbatim and --unstable,
and we don't expose this so as to not add maintainence burden.
Signed-off-by: Jerry Zhang <jerry@skydio.com>
fixes https://github.com/Skydio/revup/issues/2
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git patch-id" currently doesn't produce correct output if the
incoming diff has any binary files. Add logic to get_one_patchid
to handle the different possible styles of binary diff. This
attempts to keep resulting patch-ids identical to what would be
produced by the counterpart logic in diff.c, that is it produces
the id by hashing the a and b oids in succession.
In general we handle binary diffs by first caching the object ids from
the "index" line and using those if we then find an indication
that the diff is binary.
The input could contain patches generated with "git diff --binary". This
currently breaks the parse logic and results in multiple patch-ids
output for a single commit. Here we have to skip the contents of the
patch itself since those do not go into the patch id. --binary
implies --full-index so the object ids are always available.
When the diff is generated with --full-index there is no patch content
to skip over.
When a diff is generated without --full-index or --binary, it will
contain abbreviated object ids. This will still result in a sufficiently
unique patch-id when hashed, but does not match internal patch id
output. We'll call this ok for now as we already need specialized
arguments to diff in order to match internal patch id (namely -U3).
Signed-off-by: Jerry Zhang <Jerry@skydio.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Git doesn't persist patch-ids during the rebase process, so there is
no need to specifically invoke the unstable variant. Use the stable
logic for all internal patch-id calculations to minimize the number of
code paths and improve test coverage.
Signed-off-by: Jerry Zhang <jerry@skydio.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In the same spirit as the previous commit, reimplement
`--group=committer` as a special case of `--group=<format>`, too.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Instead of handling SHORTLOG_GROUP_AUTHOR separately, reimplement it as
a special case of the new `--group=<format>` mode, where the author mode
is a shorthand for `--group='%aN <%aE>'.
Note that we still need to keep the SHORTLOG_GROUP_AUTHOR enum since it
has a different meaning in `read_from_stdin()`, where it is still used
for a different purpose.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Extract a function which finishes setting up the shortlog struct for
use. The caller in `make_cover_letter()` does not care about trailer
sorting, so it isn't strictly necessary to add a call there in this
patch.
But the next patch will add additional functionality to the new
`shortlog_finish_setup()` function, which the caller in
`make_cover_letter()` will care about.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In addition to generating a shortlog based on committer, author, or the
identity in one or more specified trailers, it can be useful to generate
a shortlog based on an arbitrary commit format.
This can be used, for example, to generate a distribution of commit
activity over time, like so:
$ git shortlog --group='%cd' --date='format:%Y-%m' -s v2.37.0..
117 2022-06
274 2022-07
324 2022-08
263 2022-09
7 2022-10
Arbitrary commit formats can be used. In fact, `git shortlog`'s default
behavior (to count by commit authors) can be emulated as follows:
$ git shortlog --group='%aN <%aE>' ...
and future patches will make the default behavior (as well as
`--committer`, and `--group=trailer:<trailer>`) special cases of the
more flexible `--group` option.
Note also that the SHORTLOG_GROUP_FORMAT enum value is used only to
designate that `--group:<format>` is in use when in stdin mode to
declare that the combination is invalid.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The subsequent commit will add another unhandled case in
`read_from_stdin()` which will want to use the same message as with
`--group=trailer`.
Extract the "--group=trailer" part from this message so the same
translation key can be used for both cases.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When there are no trailers to insert, it is natural that
insert_records_from_trailers() should return without having done any
work.
But instead we guard this call unnecessarily by first checking whether
`log->groups` has the `SHORTLOG_GROUP_TRAILER` bit set.
Prepare to match a similar pattern in the future where a function which
inserts records of a certain type does no work when no specifiers
matching that type are given.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Prepare for a future patch which will introduce arbitrary pretty formats
via the `--group` argument.
To allow additional customizability (for example, to support something
like `git shortlog -s --group='%aD' --date='format:%Y-%m' ...` (which
groups commits by the datestring 'YYYY-mm' according to author date), we
must store off the `--date` parsed from calling `parse_revision_opt()`.
Note that this also affects custom output `--format` strings in `git
shortlog`. Though this is a behavior change, this is arguably fixing a
long-standing bug (ie., that `--format` strings are not affected by
`--date` specifiers as they should be).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When pruning objects with `--cruft`, `git repack` offers some
flexibility when selecting the set of which objects are pruned via the
`--cruft-expiration` option.
This is useful for expiring objects which are older than the grace
period, making races where to-be-pruned objects become reachable and
then ancestors of freshly pushed objects, leaving the repository in a
corrupt state after pruning substantially less likely [1].
But in practice, such races are impossible to avoid entirely, no matter
how long the grace period is. To prevent this race, it is often
advisable to temporarily put a repository into a read-only state. But in
practice, this is not always practical, and so some middle ground would
be nice.
This patch introduces a new option, `--expire-to`, which teaches `git
repack` to write an additional cruft pack containing just the objects
which were pruned from the repository. The caller can specify a
directory outside of the current repository as the destination for this
second cruft pack.
This makes it possible to prune objects from a repository, while still
holding onto a supplemental copy of them outside of the original
repository. Having this copy on-disk makes it substantially easier to
recover objects when the aforementioned race is encountered.
`--expire-to` is implemented in a somewhat convoluted manner, which is
to take advantage of the fact that the first time `write_cruft_pack()`
is called, it adds the name of the cruft pack to the `names` string
list. That means the second time we call `write_cruft_pack()`, objects
in the previously-written cruft pack will be excluded.
As long as the caller ensures that no objects are expired during the
second pass, this is sufficient to generate a cruft pack containing all
objects which don't appear in any of the new packs written by `git
repack`, including the cruft pack. In other words, all of the objects
which are about to be pruned from the repository.
It is important to note that the destination in `--expire-to` does not
necessarily need to be a Git repository (though it can be) Notably, the
expired packs do not contain all ancestors of expired objects. So if the
source repository contains something like:
<unreachable>
/
C1 --- C2
\
refs/heads/master
where C2 is unreachable, but has a parent (C1) which is reachable, and
C2 would be pruned, then the expiry pack will contain only C2, not C1.
[1]: https://lore.kernel.org/git/20190319001829.GL29661@sigill.intra.peff.net/
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In the following commit, a new write_cruft_pack() caller will be added
which wants to write a cruft pack to an arbitrary location. Prepare for
this by adding a parameter which controls the destination of the cruft
pack.
For now, provide "packtmp" so that this commit does not change any
behavior.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
`builtin/repack.c`'s `write_cruft_pack()` is used to generate the cruft
pack when `--cruft` is supplied. It uses a static variable
"cruft_expiration" which is filled in by option parsing.
A future patch will add an `--expire-to` option which allows `git
repack` to write a cruft pack containing the pruned objects out to a
separate repository. In order to implement this functionality, some
callers will have to pass a value for `cruft_expiration` different than
the one filled out by option parsing.
Prepare for this by teaching `write_cruft_pack` to take a
"cruft_expiration" parameter, instead of reading a single static
variable.
The (sole) existing caller of `write_cruft_pack()` will pass the value
for "cruft_expiration" filled in by option parsing, retaining existing
behavior. This means that we can make the variable local to
`cmd_repack()`, and eliminate the static declaration.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
`builtin/repack.c`'s `prepare_pack_objects()` is used to prepare a set
of arguments to a `pack-objects` process which will generate a desired
pack.
A future patch will add an `--expire-to` option which allows `git
repack` to write a cruft pack containing the pruned objects out to a
separate repository. Prepare for this by teaching that function to write
packs to an arbitrary location specified by the caller.
All existing callers of `prepare_pack_objects()` will pass `packtmp` for
`out`, retaining the existing behavior.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add an option, --stdin, to merge-tree which will accept lines of input
with two branches to merge per line, and which will perform all the
merges and give output for each in turn. This option implies -z, and
modifies the output to also include a merge status since the exit code
of the program can no longer convey that information now that multiple
merges are involved.
This could be useful, for example, by Git hosting providers. When one
branch is updated, one may want to check whether all code reviews
targetting that branch can still cleanly merge. Avoiding the overhead
of starting up a separate process for each of those code reviews might
provide significant savings in a repository with many code reviews.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
After we've successfully finished the repack, we call
remove_temporary_files(), which looks for and removes any files matching
".tmp-$$-pack-*", where $$ is the pid of the current process. But this
is pointless. If we make it this far in the process, we've already
renamed these tempfiles into place, and there is nothing left to delete.
Nor is there a point in trying to call it to clean up when we _aren't_
successful. It's not safe for using in a signal handler, and the
previous commit already handed that job over to the tempfile API.
It might seem like it would be useful to clean up stray .tmp files left
by other invocations of git-repack. But it won't clean those files; it
only matches ones with its pid, and leaves the rest. Fortunately, those
are cleaned up naturally by successive calls to git-repack; we'll
consider .tmp-*.pack the same as normal packfiles, so "repack -ad", etc,
will roll up their contents and eventually delete them.
The one case that could matter is if pack-objects generates an extension
we don't know about, like ".tmp-pack-$$-$hash.some-new-ext". The current
code will quietly delete such a file, while after this patch we'd leave
it in place. In practice this doesn't happen, and would be indicative of
a bug. Leaving the file as cruft is arguably a better behavior, as it
means somebody is more likely to eventually notice and fix the bug. If
we really wanted to be paranoid, we could scan for and warn about such
files, but that seems like overkill.
There's nothing to test with regard to the removal of this function. It
was doing nothing, so the behavior should be the same. However, we can
verify (and protect) our assumption that "repack -ad" will eventually
remove stray files by adding a test for that.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When git-repack exits due to a signal, it tries to clean up by calling
its remove_temporary_files() function, which walks through the packs dir
looking for ".tmp-$$-pack-*" files to delete (where "$$" is the pid of
the current process).
The biggest problem here is that remove_temporary_files() is not safe to
call in a signal handler. It uses opendir(), which isn't on the POSIX
async-signal-safe list. The details will be platform-specific, but a
likely issue is that it needs to allocate memory; if we receive a signal
while inside malloc(), etc, we'll conflict on the allocator lock and
deadlock with ourselves.
We can fix this by just cleaning up the files directly, without walking
the directory. We already know the complete list of .tmp-* files that
were generated, because we recorded them via populate_pack_exts(). When
we find files there, we can use register_tempfile() to record the
filenames. If we receive a signal, then the tempfile API will clean them
up for us, and it's async-safe and pretty battle-tested.
Note that this is slightly racier than the existing scheme. We don't
record the filenames until pack-objects tells us the hash over stdout.
So during the period between it generating the file and reporting the
hash, we'd fail to clean up. However, that period is very small. During
most of the pack generation process pack-objects is using its own
internal tempfiles. It's only at the very end that it moves them into
the names git-repack expects, and then it immediately reports the name
to us. Given that cleanup like this is best effort (after all, we may
get SIGKILL), this level of race is acceptable.
When we register the tempfiles, we'll record them locally and use the
result to call rename_tempfile(), rather than renaming by hand. This
isn't strictly necessary, as once we've renamed the files they're gone,
and the tempfile API's cleanup unlink() would simply become a pointless
noop. But managing the lifetimes of the tempfile objects is the cleanest
thing to do, and the tempfile pointers naturally fill the same role as
the old booleans.
This patch also fixes another small problem. We only hook signals, and
don't set up an atexit handler. So if we see an error that causes us to
die(), we'll leave the .tmp-* files in place. But since the tempfile API
handles this for us, this is now fixed for free. The new test covers
this by stimulating a failure of pack-objects when generating a cruft
pack. Before this patch, the .tmp-* file for the main pack would have
been left, but now we correctly clean it up.
Two small subtleties on the implementation:
- in the renaming loop, we can stop re-constructing fname_old; we only
use it when we have a tempfile to rename, so we can just ask the
tempfile for its path (which, barring bugs, should be identical)
- when renaming fails, our error message mentions fname_old. But since
a failed rename_tempfile() invalidates the tempfile struct, we'll
lose access to that string. Instead, let's mention the destination
filename, which is what most other callers do.
Reported-by: Jan Pokorný <poki@fnusa.cz>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If pack-objects tells us it generated pack $hash, we expect to find
.tmp-$$-pack-$hash.pack, .idx, .rev, and so on. Some of these files are
optional, but others are not. For the required ones, we'll bail with an
error if any of them is missing.
The error message is just "missing required file", which is a bit vague.
We should be more clear that it is not the user's fault, but rather that
the sub-pgoram we called is not operating as expected. In practice,
nobody should ever see this message, as it would generally only be
caused by a bug in Git.
It probably doesn't make sense to convert this to a BUG(), though, as
there are other (unlikely) possibilities, such as somebody else racily
deleting the files, filesystem errors causing stat() to fail, and so on.
A nice side effect here is that we stop relying on fname_old in this
code path, which will let us deal with it only in the first part of the
conditional.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
After generating the main pack and then any additional cruft packs, we
iterate over the "names" list (which contains hashes of packs generated
by pack-objects), and call populate_pack_exts() for each.
There's one small problem with this. In repack_promisor_objects(), we
may add entries to "names" and call populate_pack_exts() for them.
Calling it again is mostly just wasteful, as we'll stat() the filename
with each possible extension, get the same result, and just overwrite
our bits.
So we could drop the call there, and leave the final loop to populate
all of the bits. But instead, this patch does the reverse: drops the
final loop, and teaches the other two sites to populate the bits as they
add entries.
This makes the code easier to reason about, as you never have to worry
about when the util field is valid; it is always valid for each entry.
It also serves my ulterior purpose: recording the generated filenames as
soon as possible will make it easier for a future patch to use them for
cleaning up from a failed operation.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We keep a string_list "names" containing the hashes of packs generated
on our behalf by pack-objects. The util field of each item is treated as
a bitfield that tells us which extensions (.pack, .idx, .rev, etc) are
present for each name.
Let's switch this to allocating a real array. That will give us room in
a future patch to store more data than just a single bit per extension.
And it makes the code a little easier to read, as we avoid casting back
and forth between uintptr_t and a void pointer.
Since the only thing we're storing is an array, we could just allocate
it directly. But instead I've put it into a named struct here. That
further increases readability around the casts, and in particular helps
differentiate us from other string_lists in the same file which use
their util field differently. E.g., the existing_*_packs lists still do
bit-twiddling, but their bits have different meaning than the ones in
"names". This makes it hard to grep around the code to see how the util
fields are used; now you can look for "generated_pack_data".
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git branch --edit-description @{-1}" is now a way to edit branch
description of the branch you were on before switching to the
current branch.
* rj/branch-edit-description-with-nth-checkout:
branch: support for shortcuts like @{-1}, completed
After checking out a "branch" that is a symbolic-ref that points at
another branch, "git symbolic-ref HEAD" reports the underlying
branch, not the symbolic-ref the user gave checkout as argument.
The command learned the "--no-recurse" option to stop after
dereferencing a symbolic-ref only once.
* jc/symbolic-ref-no-recurse:
symbolic-ref: teach "--[no-]recurse" option
`git repack` supports a `--pack-kept-objects` flag which more or less
translates to whether or not we pass `--honor-pack-keep` down to `git
pack-objects` when assembling a new pack.
This behavior has existed since ee34a2bead (repack: add
`repack.packKeptObjects` config var, 2014-03-03). In that commit, the
documentation was extended to say:
[...] Note that we still do not delete `.keep` packs after
`pack-objects` finishes.
Unfortunately, this is not the case when `--pack-kept-objects` is
combined with a `--geometric` repack. When doing a geometric repack, we
include `.keep` packs when enumerating available packs only when
`pack_kept_objects` is set.
So this all works fine when `--no-pack-kept-objects` (or similar) is
given. Kept packs are excluded from the geometric roll-up, so when we go
to delete redundant packs (with `-d`), no `.keep` packs appear "below
the split" in our geometric progression.
But when `--pack-kept-objects` is given, things can go awry. Namely,
when a kept pack is included in the list of packs tracked by the
`pack_geometry` struct *and* part of the pack roll-up, we will delete
the `.keep` pack when we shouldn't.
Note that this *doesn't* result in object corruption, since the `.keep`
pack's objects are still present in the new pack. But the `.keep` pack
itself is removed, which violates our promise from back in ee34a2bead.
But there's more. Because `repack` computes the geometric roll-up
independently from selecting which packs belong in a MIDX (with
`--write-midx`), this can lead to odd behavior. Consider when a `.keep`
pack appears below the geometric split (ie., its objects will be part of
the new pack we generate).
We'll write a MIDX containing the new pack along with the existing
`.keep` pack. But because the `.keep` pack appears below the geometric
split line, we'll (incorrectly) try to remove it. While this doesn't
corrupt the repository, it does cause us to remove the MIDX we just
wrote, since removing that pack would invalidate the new MIDX.
Funny enough, this behavior became far less noticeable after e4d0c11c04
(repack: respect kept objects with '--write-midx -b', 2021-12-20), which
made `pack_kept_objects` be enabled by default only when we were writing
a non-MIDX bitmap.
But e4d0c11c04 didn't resolve this bug, it just made it harder to notice
unless callers explicitly passed `--pack-kept-objects`.
The solution is to avoid trying to remove `.keep` packs during
`--geometric` repacks, even when they appear below the geometric split
line, which is the approach this patch implements.
Co-authored-by: Victoria Dye <vdye@github.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When we write a MIDX bitmap after repacking, it is possible that the
repository would be left in a state with both pack- and multi-pack
reachability bitmaps.
This can occur, for instance, if a pack that was kept (either by having
a .keep file, or during a geometric repack in which it is not rolled up)
has a bitmap file, and the repack wrote a multi-pack index and bitmap.
When loading a reachability bitmap for the repository, the multi-pack
one is always preferred, so the pack-based one is redundant. Let's
remove it unconditionally, even if '-d' isn't passed, since there is no
practical reason to keep both around. The patch below does just that.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
String-lists may be used with callbacks for clearing or iteration. These
callbacks need to conform to a particular interface, even though not
every callback needs all of its parameters. Mark the unused ones to make
-Wunused-parameter happy.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The parse-options callback for --again soaks up all remaining options by
manipulating the parse_opt_ctx's argc and argv fields. Even though it
has to look at both, the actual parsing happens via the do_reupdate()
helper, which only looks at the argv half (by passing it along to
parse_pathspec). So that helper doesn't need to see argc at all.
Note that the helper does look at "argv + 1" without confirming that
argc is greater than 0. We know this is correct because it is skipping
past the actual "--again" string, which will always be present. However,
to make what's going on more obvious, let's move that "+1" into the
caller, which has the matching "-1" when fixing up the ctx's argc/argv.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The module_list_compute() function takes an argc/argv pair, but never
looks at argc. This is OK, as the NULL terminator in argv is sufficient
for our purposes (we feed it to parse_pathspec(), which takes only the
array, not a count).
Note that one of the callers _looks_ like it would be buggy, but isn't:
we pass 0/NULL for argc/argv from module_foreach(), so finding the
terminating NULL in that argv naively would segfault. However,
parse_pathspec() is smart enough to interpret a bare NULL as an empty
argv.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git branch --edit-description" on an unborh branch misleadingly
said that no such branch exists, which has been corrected.
* rj/branch-edit-desc-unborn:
branch: description for non-existent branch errors
"GIT_EDITOR=: git branch --edit-description" resulted in failure,
which has been corrected.
* jc/branch-description-unset:
branch: do not fail a no-op --edit-desc
Code clean-up.
* jk/cleanup-callback-parameters:
attr: drop DEBUG_ATTR code
commit: avoid writing to global in option callback
multi-pack-index: avoid writing to global in option callback
test-submodule: inline resolve_relative_url() function
By default, use of fsmonitor on a repository on networked
filesystem is disabled. Add knobs to make it workable on macOS.
* ed/fsmonitor-on-networked-macos:
fsmonitor: fix leak of warning message
fsmonitor: add documentation for allowRemote and socketDir options
fsmonitor: check for compatability before communicating with fsmonitor
fsmonitor: deal with synthetic firmlinks on macOS
fsmonitor: avoid socket location check if using hook
fsmonitor: relocate socket file if .git directory is remote
fsmonitor: refactor filesystem checks to common interface
Treating the action as a string is a hang over from the scripted
rebase. The last commit removed the only remaining use of the action
that required a string so lets convert the other action users to use
the existing action enum instead. If we ever need the action name as a
string in the future the action_names array exists exactly for that
purpose.
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When aborting a rebase the reflog message looks like
rebase (abort): updating HEAD
which is not very informative. Improve the message by mentioning the
branch that we are returning to as we do at the end of a successful
rebase so it looks like.
rebase (abort): returning to refs/heads/topic
If GIT_REFLOG_ACTION is set in the environment we no longer omit
"(abort)" from the reflog message. We don't omit "(start)" and
"(finish)" when starting and finishing a rebase in that case so we
shouldn't omit "(abort)".
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The apply backend creates slightly different reflog messages to the
merge backend when starting or finishing a rebase and when picking
commits. These differences make it harder than it needs to be to parse
the reflog (I have a script that reads the finishing messages from
rebase and it is a pain to have to accommodate two different message
formats). While it is possible to determine the backend used for a
rebase from the reflog messages, the differences are not designed for
that purpose. c2417d3af7 (rebase: drop '-i' from the reflog for
interactive-based rebases, 2020-02-15) removed the clear distinction
between the reflog messages of the two backends without complaint.
As the merge backend is the default it is likely to be the format most
common in existing reflogs. For that reason the apply backend is changed
to format its reflog messages to match the merge backend as closely as
possible. Note that there is still a difference as when committing a
conflict resolution the apply backend will use "(pick)" rather than
"(continue)" because it is not currently possible to change the message
for a single commit.
In addition to c2417d3af7 we also changed the reflog messages in
68aa495b59 (rebase: implement --merge via the interactive machinery,
2018-12-11) and 2ac0d6273f (rebase: change the default backend from "am"
to "merge", 2020-02-15). This commit makes the same change to "git
rebase --apply" that 2ac0d6273f made to "git rebase" without any backend
specific options. As the messages are changed to use an existing format
any scripts that can parse the reflog messages of the default rebase
backend should be unaffected by this change.
There are existing tests for the messages from both backends which are
adjusted to ensure that they do not get out of sync in the future.
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The reflog messages when finishing a rebase hard code "rebase" rather
than using GIT_REFLOG_ACTION.
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The reflog message for every pick after running "rebase --skip" looks
like
rebase (skip) (pick): commit subject line
Fix this by not appending " (skip)" to the reflog action.
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The reflog message for a conflict resolution committed by "rebase
--continue" looks like
rebase (continue): commit subject line
Unfortunately the reflog message each subsequent pick look like
rebase (continue) (pick): commit subject line
Fix this by setting the reflog message for "rebase --continue" in
sequencer_continue() so it does not affect subsequent commits. This
introduces a memory leak similar to the one leaking GIT_REFLOG_ACTION
in pick_commits(). Both of these will be fixed in a future series that
stops the sequencer calling setenv().
If we fail to commit the staged changes then we error out so
GIT_REFLOG_ACTION does not need to be reset in that case.
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Use move_to_original_branch() when reattaching HEAD after a fast-forward
rather than open coding a copy of that code. move_to_original_branch()
does not call reset_head() if head_name is NULL but there should be no
user visible changes even though we currently call reset_head() in that
case. The reason for this is that the reset_head() call does not add a
message to the reflog because we're not changing the commit that HEAD
points to and so lock_ref_for_update() elides the update. When head_name
is not NULL then reset_head() behaves like "git symbolic-ref" and so the
reflog is updated.
Note that the removal of "strbuf_release(&msg)" is safe as there is an
identical call just above this hunk which can be seen by viewing the
diff with -U6.
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* pw/rebase-keep-base-fixes:
rebase --keep-base: imply --no-fork-point
rebase --keep-base: imply --reapply-cherry-picks
rebase: factor out branch_base calculation
rebase: rename merge_base to branch_base
rebase: store orig_head as a commit
rebase: be stricter when reading state files containing oids
t3416: set $EDITOR in subshell
t3416: tighten two tests
Given the name of the option it is confusing if --keep-base actually
changes the base of the branch without --fork-point being explicitly
given on the command line.
The combination of --keep-base with an explicit --fork-point is still
supported even though --fork-point means we do not keep the same base
if the upstream branch has been rewound. We do this in case anyone is
relying on this behavior which is tested in t3431[1]
[1] https://lore.kernel.org/git/20200715032014.GA10818@generichostname/
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As --keep-base does not rebase the branch it is confusing if it
removes commits that have been cherry-picked to the upstream branch.
As --reapply-cherry-picks is not supported by the "apply" backend this
commit ensures that cherry-picks are reapplied by forcing the upstream
commit to match the onto commit unless --no-reapply-cherry-picks is
given.
Reported-by: Philippe Blain <levraiphilippeblain@gmail.com>
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Separate out calculating the merge base between 'onto' and 'HEAD' from
the check for whether we can fast-forward or not. This means we can skip
the fast-forward checks when the rebase is forced and avoid calculating
the merge-base between 'HEAD' and 'onto' when --keep-base is given.
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
merge_base is not a very descriptive name, the variable always holds
the merge-base of 'branch' and 'onto' which is commit at the base of
the branch being rebased so rename it to branch_base.
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Using a struct commit rather than a struct oid to hold orig_head means
that we error out straight away if the branch being rebased does not
point to a commit. It also simplifies the code that handles finding
the merge base and fork point as it no longer has to convert from an
oid to a commit.
To avoid changing the behavior of "git rebase <upstream> <branch>" we
keep the existing call to read_ref() and use lookup_commit_object()
on the oid returned by that rather than calling
lookup_commit_reference_by_name() which applies the ref dwim rules to
its argument.
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The state files for 'onto' and 'orig_head' should contain a full hex
oid, change the reading functions from get_oid() to get_oid_hex() to
reflect this. They should also name commits and not tags so add and use
a function that looks up a commit from an oid like
lookup_commit_reference() but without dereferencing tags.
Suggested-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Make the "worktree" -h output consistent with the *.txt version.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Avoid repeating the "-h" output for the "git worktree" command, and
instead define the usage of each subcommand with macros, so that the
"-h" output for the command itself can re-use those definitions. See
[1], [2] and [3] for prior art using the same pattern.
1. b25b727494 (builtin/multi-pack-index.c: define common usage with a
macro, 2021-03-30)
2. 8757b35d44 (commit-graph: define common usage with a macro,
2021-08-23)
3. 1e91d3faf6 (reflog: move "usage" variables and use macros,
2022-03-17)
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Make the "-h" output of "git commit" consistent with the *.txt version
by exhaustively listing the options that it takes.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Make the "diff-tree -h" output consistent with the *.txt version.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Correct uses of "<label>..." where we really meant to say
"[<label>...]", i.e. the command in question taken an optional set of
"<label>". As the CodingGuidelines notes "[o]ptional parts [should be]
enclosed in square brackets".
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The cmd_blame() already detected whether it was processing "blame" or
"annotate", but it didn't adjust its usage output accordingly. Let's
do that.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Amend both the -h output and *.txt to match one another. In this case
the *.txt didn't list the "save" subcommand, and the "-h" was
similarly missing some commands.
Let's also convert the *.c code to use a macro definition, similar to
that used in preceding commits. This avoids duplication.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Change those built-in commands that were attempting to exhaustively
list the options in the "-h" output to actually do so, and always
have *.txt documentation know about the exhaustive list of options.
Let's also fix the documentation and -h output for those built-in
commands where the *.txt and -h output was a mismatch of missing
options on both sides.
In the case of "interpret-trailers" fixing the missing options reveals
that the *.txt version was implicitly claiming that the command had
two operating modes, which a look at the -h version (and studying the
documentation) will show is not the case.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Use the "git cmd" form instead of "git-cmd" for both "git
receive-pack" and "git credential-cache--daemon".
For "git-receive-pack" we do have a binary with that name, even when
installed with SKIP_DASHED_BUILT_INS=YesPlease, but for the purposes
of the SYNOPSIS let's use the "git cmd" form like everywhere else. It
can be invoked like that (and our tests do so), the parts of our
documentation that explain when you need to use the dashed form do so,
and use it.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Amend the -h output to match that of the *.txt output, the differences
were fairly small. In the case of "[<options>]" we only have a few of
them, so let's exhaustively list them as in the *.txt.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The C version was right to use "()" in place of "[]" around the option
listing, let's update the *.txt version accordingly, and furthermore
list the *.c options in the same order as the *.txt.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
For "rerere" say "pathspec" consistently, and list the subcommands in
the order that they're discussed in the "COMMANDS" section of the
documentation.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Fix various issues of SYNOPSIS and -h output syntax where:
* Options such as --force were missing entirely
* ...or the short option, such as -f
* We said "opts" or "options", but could instead enumerate
the (small) set of supported options
* Options that were missing entirely (ls-remote's --sort=<key>)
As we can specify "--sort" multiple times (it's backed by a
string-list" it should really be "[(--sort=<key>)...]", which is
what "git for-each-ref" lists it as, but let's leave that issue for
a subsequent cleanup, and stop at making these consistent. Other
"ref-filter.h" users share the same issue, e.g. "git-branch.txt".
* For "verify-tag" and "verify-commit" we were missing the "--raw"
option.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Fix cases where the SYNOPSIS and -h output was presented in a
different order.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add the "[--]" for those cases where the *.txt and -h were
inconsistent, or where we incorrectly stated in one but not the other
that the "--" was mandatory.
In the case of "rev-list" both sides were wrong, as we we don't
require one or more paths if "--" is used, e.g. this is OK:
git rev-list HEAD --
That part of this change is not a "doc txt & -h consistency" change,
as we're changing both versions, doing so here makes both sides
consistent.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Fix various inconsistencies between command SYNOPSIS and the
corresponding -h output where our translatable labels didn't match
up.
In some cases we need to adjust the prose that follows the SYNOPSIS
accordingly, as it refers back to the changed label.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Change "builtin/credential-cache--daemon.c" to use "<socket-path>" not
"<socket_path>" in a placeholder label, almost all of our
documentation uses this form.
This is now consistent with the "If a placeholder has multiple words,
they are separated by dashes" guideline added in
9c9b4f2f8b (standardize usage info string format, 2015-01-13), let's
add a now-passing test to assert that that's the case.
To do this we need to introduce a very sed-powered parser to extract
the SYNOPSIS from the *.txt, and handle not all commands with "-h"
having a corresponding *.txt (e.g. "bisect--helper"). We'll still want
to handle syntax edge cases in the *.txt in subsequent commits for
other checks, but let's do that then.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It's arguably more correct to say "[<option>...]" than either of these
forms, but the vast majority of our documentation uses the
"[<options>]" form to indicate an arbitrary number of options, let's
do the same in these cases, which were the odd ones out.
In the case of "mv" and "sparse-checkout" let's add the missing "[]"
to indicate that these are optional.
In the case of "t/helper/test-proc-receive.c" there is no *.txt
version, making it the only hunk in this commit that's not a "doc txt
& -h consistency" change.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The whitespace padding of alternatives should be of the form "[-f |
--force]" not "[-f|--force]". Likewise we should not have padding
before the first option, so "(--all | <pack-filename>...)" is correct,
not "( --all | <pack-filename>... )".
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The whitespace padding of alternatives should be of the form "[-f |
--force]" not "[-f|--force]". Likewise we should not have padding
before the first option, so "(--all | <pack-filename>...)" is correct,
not "( --all | <pack-filename>... )".
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Fix a "-h" output syntax issue introduced when "--diagnose" was added
in aac0e8ffee (builtin/bugreport.c: create '--diagnose' option,
2022-08-12): We need to close the "[" we opened. The
corresponding *.txt change did not have the same issue.
The "help -h" output then had one "]" too many, which is an issue
introduced in b40845293b (help: correct the usage string in -h and
documentation, 2021-09-10).
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Fix a bug in db9d67f2e9 (builtin/cat-file.c: support NUL-delimited
input with `-z`, 2022-07-22), before that change the SYNOPSIS and "-h"
output were the same, but not afterwards.
That change followed a similar earlier divergence in
473fa2df08 (Documentation: add --batch-command to cat-file synopsis,
2022-04-07). Subsequent commits will fix this sort of thing more
systematically, but let's fix this one as a one-off.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Fix the incorrect "[-o | --option <argument>]" syntax, which should be
"[(-o | --option) <argument>]", we were previously claiming that only
the long option accepted the "<argument>", which isn't what we meant.
This syntax issue for "bugreport" originated in
238b439d69 (bugreport: add tool to generate debugging info,
2020-04-16), and for "diagnose" in 6783fd3cef (builtin/diagnose.c:
create 'git diagnose' builtin, 2022-08-12), which copied and adjusted
"bugreport" documentation and code.
In the case of "Documentation/git-stash.txt" and "builtin/stash.c"
this is not a "doc txt & -h consistency" change, as we're changing
both versions, doing so here makes a subsequent change smaller.
In that case fix the incorrect "[-o | --option <argument>]" syntax,
which should be "[(-o | --option) <argument>]", we were previously
claiming that only the long option accepted the "<argument>", which
isn't what we meant.
The "stash" issue has been with us in both the "-h" and *.txt versions
since bd514cada4 (stash: introduce 'git stash store', 2013-06-15).
We could claim that this isn't a syntax issue if a "vertical bar binds
tighter than option and its argument", but such a rule would change
e.g. this "cat-file" SYNOPSIS example to mean something we don't:
... [<rev>:<path|tree-ish> | --path=<path|tree-ish> <rev>]
We have various other examples where the post-image here is already
used, e.g. for "format-patch" ("-o"), "grep" ("-m"),
"submodule" ("set-branch -b") etc.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Change the documentation and -h output for those built-in commands
where both the -h output and *.txt were lacking in word-wrapping.
There are many more built-ins that could use this treatment, this
change is narrowed to those where this whitespace change is needed to
make the -h and *.txt consistent in the end.
In the case of "Documentation/git-hash-object.txt" and
"builtin/hash-object.c" this is not a "doc txt & -h consistency"
change, as we're changing both versions, doing so here makes a
subsequent change smaller.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Change commands in the "diff" family and "rev-list" to separate the
usage information and option listing with an empty line.
In the case of "git diff -h" we did this already (but let's use a
consistent "\n" pattern there), for the rest these are now consistent
with how the parse_options() API would emit usage.
As we'll see in a subsequent commit this also helps to make the "git
<cmd> -h" output more easily machine-readable, as we can assume that
the usage information is separated from the options by an empty line.
Note that "COMMON_DIFF_OPTIONS_HELP" starts with a "\n", so the
seeming omission of a "\n" here is correct, the second one is provided
by the macro.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Avoid repeating the "-h" output for the "git bundle" command, and
instead define the usage of each subcommand with macros, so that the
"-h" output for the command itself can re-use those definitions. See
[1], [2] and [3] for prior art using the same pattern.
1. b25b727494 (builtin/multi-pack-index.c: define common usage with a
macro, 2021-03-30)
2. 8757b35d44 (commit-graph: define common usage with a macro,
2021-08-23)
3. 1e91d3faf6 (reflog: move "usage" variables and use macros,
2022-03-17)
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Fix indentation issues introduced with 73c3253d75 (bundle: framework
for options before bundle file, 2019-11-10), and carried forward in
some subsequent commits.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Have the users of the "run_processes_parallel_tr2()" function use
"run_processes_parallel()" instead. In preceding commits the latter
was refactored to take a "struct run_process_parallel_opts" argument,
since the only reason for "run_processes_parallel_tr2()" to exist was
to take arguments that are now a part of that struct we can do away
with it.
See ee4512ed48 (trace2: create new combined trace facility,
2019-02-22) for the addition of the "*_tr2()" variant of the function,
it was used by every caller except "t/helper/test-run-command.c"..
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When a "jobs = 0" is passed let's BUG() out rather than fall back on
online_cpus(). The default behavior was added when this API was
implemented in c553c72eed (run-command: add an asynchronous parallel
child processor, 2015-12-15).
Most of our code in-tree that scales up to "online_cpus()" by default
calls that function by itself. Keeping this default behavior just for
the sake of two callers means that we'd need to maintain this one spot
where we're second-guessing the config passed down into pp_init().
The preceding commit has an overview of the API callers that passed
"jobs = 0". There were only two of them (actually three, but they
resolved to these two config parsing codepaths).
The "fetch.parallel" caller already had a test for the
"fetch.parallel=0" case added in 0353c68818 (fetch: do not run a
redundant fetch from submodule, 2022-05-16), but there was no such
test for "submodule.fetchJobs". Let's add one here.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Change the "run_processes_parallel{,_tr2}()" functions to return void,
instead of int. Ever since c553c72eed (run-command: add an
asynchronous parallel child processor, 2015-12-15) they have
unconditionally returned 0.
To get a "real" return value out of this function the caller needs to
get it via the "task_finished_fn" callback, see the example in hook.c
added in 96e7225b31 (hook: add 'run' subcommand, 2021-12-22).
So the "result = " and "if (!result)" code added to "builtin/fetch.c"
d54dea77db (fetch: let --jobs=<n> parallelize --multiple, too,
2019-10-05) has always been redundant, we always took that "if"
path. Likewise the "ret =" in "t/helper/test-run-command.c" added in
be5d88e112 (test-tool run-command: learn to run (parts of) the
testsuite, 2019-10-04) wasn't used, instead we got the return value
from the "if (suite.failed.nr > 0)" block seen in the context.
Subsequent commits will alter this API interface, getting rid of this
always-zero return value makes it easier to understand those changes.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When downloading a list of bundles in "all" mode, Git has no
understanding of the dependencies between the bundles. Git attempts to
unbundle the bundles in some order, but some may not pass the
verify_bundle() step because of missing prerequisites. This is passed as
error messages to the user, even when they eventually succeed in later
attempts after their dependent bundles are unbundled.
Add a new VERIFY_BUNDLE_QUIET flag to verify_bundle() that avoids the
error messages from the missing prerequisite commits. The method still
returns the number of missing prerequisit commits, allowing callers to
unbundle() to notice that the bundle failed to apply.
Use this flag in bundle-uri.c and test that the messages go away for
'git clone --bundle-uri' commands.
Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The verify_bundle() method has a 'verbose' option, but we will want to
extend this method to have more granular control over its output. First,
replace this 'verbose' option with a new 'flags' option with a single
possible value: VERIFY_BUNDLE_VERBOSE.
Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
branch command with options "edit-description", "set-upstream-to" and
"unset-upstream" expects a branch name. Since ae5a6c3684 (checkout:
implement "@{-N}" shortcut name for N-th last branch, 2009-01-17) a
branch can be specified using shortcuts like @{-1}. Those shortcuts
need to be resolved when considering the arguments.
We can modify the description of the previously checked out branch with:
$ git branch --edit--description @{-1}
We can modify the upstream of the previously checked out branch with:
$ git branch --set-upstream-to upstream @{-1}
$ git branch --unset-upstream @{-1}
Signed-off-by: Rubén Justo <rjusto@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git grep" learned to expand the sparse-index more lazily and on
demand in a sparse checkout.
* sy/sparse-grep:
builtin/grep.c: integrate with sparse index
"scalar unregister" in a repository that is already been
unregistered reported an error.
* ds/scalar-unregister-idempotent:
string-list: document iterator behavior on NULL input
gc: replace config subprocesses with API calls
scalar: make 'unregister' idempotent
maintenance: add 'unregister --force'
"git remote rename" failed to rename a remote without fetch
refspec, which has been corrected.
* jk/remote-rename-without-fetch-refspec:
remote: handle rename of remote without fetch refspec
"git clone" did not like to see the "--bare" and the "--origin"
options used together without a good reason.
* jk/clone-allow-bare-and-o-together:
clone: allow "--bare" with "-o"
"git fsck" failed to release contents of tree objects already used
from the memory, which has been fixed.
* jk/fsck-on-diet:
parse_object_buffer(): respect save_commit_buffer
fsck: turn off save_commit_buffer
fsck: free tree buffers after walking unreachable objects
Suppose you are managing many maintenance tracks in your project,
and some of the more recent ones are maint-2.36 and maint-2.37.
Further imagine that your project recently tagged the official 2.38
release, which means you would need to start maint-2.38 track soon,
by doing:
$ git checkout -b maint-2.38 v2.38.0^0
$ git branch --list 'maint-2.3[6-9]'
* maint-2.38
maint-2.36
maint-2.37
So far, so good. But it also is reasonable to want not to have to
worry about which maintenance track is the latest, by pointing a
more generic-sounding 'maint' branch at it, by doing:
$ git symbolic-ref refs/heads/maint refs/heads/maint-2.38
which would allow you to say "whichever it is, check out the latest
maintenance track", by doing:
$ git checkout maint
$ git branch --show-current
maint-2.38
It is arguably better to say that we are on 'maint-2.38' rather than
on 'maint', and "git merge/pull" would record "into maint-2.38" and
not "into maint", so I think what we have is a good behaviour.
One thing that is slightly irritating, however, is that I do not
think there is a good way (other than "cat .git/HEAD") to learn that
you checked out 'maint' to get into that state. Just like the output
of "git branch --show-current" shows above, "git symbolic-ref HEAD"
would report 'refs/heads/maint-2.38', bypassing the intermediate
symbolic ref at 'refs/heads/maint' that is pointed at by HEAD.
The internal resolve_ref() API already has the necessary support for
stopping after resolving a single level of a symbolic-ref, and we
can expose it by adding a "--[no-]recurse" option to the command.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When the repository does not yet have commits, some errors describe that
there is no branch:
$ git init -b first
$ git branch --edit-description first
error: No branch named 'first'.
$ git branch --set-upstream-to=upstream
fatal: branch 'first' does not exist
$ git branch -c second
error: refname refs/heads/first not found
fatal: Branch copy failed
That "first" branch is unborn but to say it doesn't exists is confusing.
Options "-c" (copy) and "-m" (rename) show the same error when the
origin branch doesn't exists:
$ git branch -c non-existent-branch second
error: refname refs/heads/non-existent-branch not found
fatal: Branch copy failed
$ git branch -m non-existent-branch second
error: refname refs/heads/non-existent-branch not found
fatal: Branch rename failed
Note that "--edit-description" without an explicit argument is already
considering the _empty repository_ circumstance in its error. Also note
that "-m" on the initial branch it is an allowed operation.
Make the error descriptions for those branch operations with unborn or
non-existent branches, more informative.
This is the result of the change:
$ git init -b first
$ git branch --edit-description first
error: No commit on branch 'first' yet.
$ git branch --set-upstream-to=upstream
fatal: No commit on branch 'first' yet.
$ git branch -c second
fatal: No commit on branch 'first' yet.
$ git branch [-c/-m] non-existent-branch second
fatal: No branch named 'non-existent-branch'.
Signed-off-by: Rubén Justo <rjusto@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The strvec "argv" is used to build a command for run_command_v_opt(),
but never freed. Use a constant string array instead, which doesn't
require any cleanup.
Suggested-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The callback function for --trailer writes directly to the global
trailer_args and ignores opt->value completely. This is OK, since that's
where we expect to find the value. But it does mean the option
declaration isn't as clear. E.g., we have:
OPT_BOOL(0, "reset-author", &renew_authorship, ...),
OPT_CALLBACK_F(0, "trailer", NULL, ..., opt_pass_trailer)
In the first one we can see where the result will be stored, but in the
second, we get only NULL, and you have to go read the callback.
Let's pass &trailer_args, and use it in the callback. As a bonus, this
silences a -Wunused-parameter warning.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We declare the --object-dir option like:
OPT_CALLBACK(0, "object-dir", &opts.object_dir, ...);
but the pointer to opts.object_dir is completely unused. Instead, the
callback writes directly to a global. Which fortunately happens to be
opts.object_dir. So everything works as expected, but it's unnecessarily
confusing.
Instead, let's have the callback write to the option value pointer that
has been passed in. This also quiets a -Wunused-parameter warning (since
we don't otherwise look at "opt").
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Pass a constant string array directly to run_command_v_opt() instead of
copying it into a strvec first. This shortens the code and avoids heap
allocations.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Starting with macOS 10.15 (Catalina), Apple introduced a new feature
called 'firmlinks' in order to separate the boot volume into two
volumes, one read-only and one writable but still present them to the
user as a single volume. Along with this change, Apple removed the
ability to create symlinks in the root directory and replaced them with
'synthetic firmlinks'. See 'man synthetic.conf'
When FSEevents reports the path of changed files, if the path involves
a synthetic firmlink, the path is reported from the point of the
synthetic firmlink and not the real path. For example:
Real path:
/System/Volumes/Data/network/working/directory/foo.txt
Synthetic firmlink:
/network -> /System/Volumes/Data/network
FSEvents path:
/network/working/directory/foo.txt
This causes the FSEvents path to not match against the worktree
directory.
There are several ways in which synthetic firmlinks can be created:
they can be defined in /etc/synthetic.conf, the automounter can create
them, and there may be other means. Simply reading /etc/synthetic.conf
is insufficient. No matter what process creates synthetic firmlinks,
they all get created in the root directory.
Therefore, in order to deal with synthetic firmlinks, the root directory
is scanned and the first possible synthetic firmink that, when resolved,
is a prefix of the worktree is used to map FSEvents paths to worktree
paths.
Signed-off-by: Eric DeCosta <edecosta@mathworks.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If the .git directory is on a remote filesystem, create the socket
file in 'fsmonitor.socketDir' if it is defined, else create it in $HOME.
Signed-off-by: Eric DeCosta <edecosta@mathworks.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
revision.c::handle_revision_arg_1() resolves <rev>^! by first adding the
negated parents and then <rev> itself. builtin_diff_combined() expects
the first tree to be the merge and the remaining ones to be the parents,
though. This mismatch results in bogus diff output.
Remember the first tree that doesn't belong to a parent and use it
instead of blindly picking the first one. This makes "git diff <rev>^!"
consistent with "git show <rev>^!".
Reported-by: Tim Jaacks <tim.jaacks@garz-fricke.com>
Suggested-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When cloning a repository with `--local`, Git relies on either making a
hardlink or copy to every file in the "objects" directory of the source
repository. This is done through the callpath `cmd_clone()` ->
`clone_local()` -> `copy_or_link_directory()`.
The way this optimization works is by enumerating every file and
directory recursively in the source repository's `$GIT_DIR/objects`
directory, and then either making a copy or hardlink of each file. The
only exception to this rule is when copying the "alternates" file, in
which case paths are rewritten to be absolute before writing a new
"alternates" file in the destination repo.
One quirk of this implementation is that it dereferences symlinks when
cloning. This behavior was most recently modified in 36596fd2df (clone:
better handle symlinked files at .git/objects/, 2019-07-10), which
attempted to support `--local` clones of repositories with symlinks in
their objects directory in a platform-independent way.
Unfortunately, this behavior of dereferencing symlinks (that is,
creating a hardlink or copy of the source's link target in the
destination repository) can be used as a component in attacking a
victim by inadvertently exposing the contents of file stored outside of
the repository.
Take, for example, a repository that stores a Dockerfile and is used to
build Docker images. When building an image, Docker copies the directory
contents into the VM, and then instructs the VM to execute the
Dockerfile at the root of the copied directory. This protects against
directory traversal attacks by copying symbolic links as-is without
dereferencing them.
That is, if a user has a symlink pointing at their private key material
(where the symlink is present in the same directory as the Dockerfile,
but the key itself is present outside of that directory), the key is
unreadable to a Docker image, since the link will appear broken from the
container's point of view.
This behavior enables an attack whereby a victim is convinced to clone a
repository containing an embedded submodule (with a URL like
"file:///proc/self/cwd/path/to/submodule") which has a symlink pointing
at a path containing sensitive information on the victim's machine. If a
user is tricked into doing this, the contents at the destination of
those symbolic links are exposed to the Docker image at runtime.
One approach to preventing this behavior is to recreate symlinks in the
destination repository. But this is problematic, since symlinking the
objects directory are not well-supported. (One potential problem is that
when sharing, e.g. a "pack" directory via symlinks, different writers
performing garbage collection may consider different sets of objects to
be reachable, enabling a situation whereby garbage collecting one
repository may remove reachable objects in another repository).
Instead, prohibit the local clone optimization when any symlinks are
present in the `$GIT_DIR/objects` directory of the source repository.
Users may clone the repository again by prepending the "file://" scheme
to their clone URL, or by adding the `--no-local` option to their `git
clone` invocation.
The directory iterator used by `copy_or_link_directory()` must no longer
dereference symlinks (i.e., it *must* call `lstat()` instead of `stat()`
in order to discover whether or not there are symlinks present). This has
no bearing on the overall behavior, since we will immediately `die()` on
encounter a symlink.
Note that t5604.33 suggests that we do support local clones with
symbolic links in the source repository's objects directory, but this
was likely unintentional, or at least did not take into consideration
the problem with sharing parts of the objects directory with symbolic
links at the time. Update this test to reflect which options are and
aren't supported.
Helped-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Imagine running "git branch --edit-description" while on a branch
without the branch description, and then exit the editor after
emptying the edit buffer, which is the way to tell the command that
you changed your mind and you do not want the description after all.
The command should just happily oblige, adding no branch description
for the current branch, and exit successfully. But it fails to do
so:
$ git init -b main
$ git commit --allow-empty -m commit
$ GIT_EDITOR=: git branch --edit-description
fatal: could not unset 'branch.main.description'
The end result is OK in that the configuration variable does not
exist in the resulting repository, but we should do better. If we
know we didn't have a description, and if we are asked not to have a
description by the editor, we can just return doing nothing.
This of course introduces TOCTOU. If you add a branch description
to the same branch from another window, while you had the editor
open to edit the description, and then exit the editor without
writing anything there, we'd end up not removing the description you
added in the other window. But you are fooling yourself in your own
repository at that point, and if it hurts, you'd be better off not
doing so ;-).
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"upstream branches" is plural but "name" and "local branch" are
singular. Make them all singular. And because we're talking about a
hypothetical branch that doesn't exist yet, use the future tense.
Signed-off-by: Alex Henrie <alexhenrie24@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The 'git maintenance [un]register' commands set or unset the multi-
valued maintenance.repo config key with the absolute path of the current
repository. These are set in the global config file.
Instead of calling a subcommand and creating a new process, create the
proper API calls to git_config_set_multivar_in_file_gently(). It
requires loading the filename for the global config file (and erroring
out if now $HOME value is set). We also need to be careful about using
CONFIG_REGEX_NONE when adding the value and using
CONFIG_FLAGS_FIXED_VALUE when removing the value. In both cases, we
check that the value already exists (this check already existed for
'unregister').
Also, remove the transparent translation of the error code from the
config API to the exit code of 'git maintenance'. Instead, use die() to
recover from failures at that level. In the case of 'unregister
--force', allow the CONFIG_NOTHING_SET error code to be a success. This
allows a possible race where another process removes the config value.
The end result is that the config value is not set anymore, so we can
treat this as a success.
Reported-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The 'git maintenance unregister' subcommand has a step that removes the
current repository from the multi-valued maitenance.repo config key.
This fails if the repository is not listed in that key. This makes
running 'git maintenance unregister' twice result in a failure in the
second instance.
This failure exit code is helpful, but its message is not. Add a new
die() message that explicitly calls out the failure due to the
repository not being registered.
In some cases, users may want to run 'git maintenance unregister' just
to make sure that background jobs will not start on this repository, but
they do not want to check to see if it is registered first. Add a new
'--force' option that will siltently succeed if the repository is not
already registered.
Also add an extra test of 'git maintenance unregister' at a point where
there are no registered repositories. This should fail without --force.
Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Turn on sparse index and remove ensure_full_index().
Before this patch, `git-grep` utilizes the ensure_full_index() method to
expand the index and search all the entries. Because this method
requires walking all the trees and constructing the index, it is the
slow part within the whole command.
To achieve better performance, this patch uses grep_tree() to search the
sparse directory entries and get rid of the ensure_full_index() method.
Why grep_tree() is a better choice over ensure_full_index()?
1) grep_tree() is as correct as ensure_full_index(). grep_tree() looks
into every sparse-directory entry (represented by a tree) recursively
when looping over the index, and the result of doing so matches the
result of expanding the index.
2) grep_tree() utilizes pathspecs to limit the scope of searching.
ensure_full_index() always expands the index, which means it will
always walk all the trees and blobs in the repo without caring if
the user only wants a subset of the content, i.e. using a pathspec.
On the other hand, grep_tree() will only search the contents that
match the pathspec, and thus possibly walking fewer trees.
3) grep_tree() does not construct and copy back a new index, while
ensure_full_index() does. This also saves some time.
----------------
Performance test
- Summary:
p2000 tests demonstrate a ~71% execution time reduction for
`git grep --cached bogus -- "f2/f1/f1/*"` using tree-walking logic.
However, notice that this result varies depending on the pathspec
given. See below "Command used for testing" for more details.
Test HEAD~ HEAD
-------------------------------------------------------
2000.78: git grep ... (full-v3) 0.35 0.39 (≈)
2000.79: git grep ... (full-v4) 0.36 0.30 (≈)
2000.80: git grep ... (sparse-v3) 0.88 0.23 (-73.8%)
2000.81: git grep ... (sparse-v4) 0.83 0.26 (-68.6%)
- Command used for testing:
git grep --cached bogus -- "f2/f1/f1/*"
The reason for specifying a pathspec is that, if we don't specify a
pathspec, then grep_tree() will walk all the trees and blobs to find the
pattern, and the time consumed doing so is not too different from using
the original ensure_full_index() method, which also spends most of the
time walking trees. However, when a pathspec is specified, this latest
logic will only walk the area of trees enclosed by the pathspec, and the
time consumed is reasonably a lot less.
Generally speaking, because the performance gain is acheived by walking
less trees, which are specified by the pathspec, the HEAD time v.s.
HEAD~ time in sparse-v[3|4], should be proportional to
"pathspec enclosed area" v.s. "all area", respectively. Namely, the
wider the <pathspec> is encompassing, the less the performance
difference between HEAD~ and HEAD, and vice versa.
That is, if we don't specify a pathspec, the performance difference [1]
is indistinguishable: both methods walk all the trees and take generally
same amount of time (even with the index construction time included for
ensure_full_index()).
[1] Performance test result without pathspec (hence walking all trees):
Command used:
git grep --cached bogus
Test HEAD~ HEAD
---------------------------------------------------
2000.78: git grep ... (full-v3) 6.17 5.19 (≈)
2000.79: git grep ... (full-v4) 6.19 5.46 (≈)
2000.80: git grep ... (sparse-v3) 6.57 6.44 (≈)
2000.81: git grep ... (sparse-v4) 6.65 6.28 (≈)
--------------------------
NEEDSWORK about submodules
There are a few NEEDSWORKs that belong to improvements beyond this
topic. See the NEEDSWORK in builtin/grep.c::grep_submodule() for
more context. The other two NEEDSWORKs in t1092 are also relative.
Suggested-by: Derrick Stolee <derrickstolee@github.com>
Helped-by: Derrick Stolee <derrickstolee@github.com>
Helped-by: Victoria Dye <vdye@github.com>
Helped-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Shaoxuan Yuan <shaoxuan.yuan02@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We return an error when trying to rename a remote that has no fetch
refspec:
$ git config --unset-all remote.origin.fetch
$ git remote rename origin foo
fatal: could not unset 'remote.foo.fetch'
To make things even more confusing, we actually _do_ complete the config
modification, via git_config_rename_section(). After that we try to
rewrite the fetch refspec (to say refs/remotes/foo instead of origin).
But our call to git_config_set_multivar() to remove the existing entries
fails, since there aren't any, and it calls die().
We could fix this by using the "gently" form of the config call, and
checking the error code. But there is an even simpler fix: if we know
that there are no refspecs to rewrite, then we can skip that part
entirely.
Reported-by: John A. Leuenhagen <john@zlima12.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We explicitly forbid the combination of "--bare" with "-o", but there
doesn't seem to be any good reason to do so. The original logic came as
part of e6489a1bdf (clone: do not accept more than one -o option.,
2006-01-22), but that commit does not give any reason.
Furthermore, the equivalent combination via config is allowed:
git -c clone.defaultRemoteName=foo clone ...
and works as expected. It may be that this combination was considered
useless, because a bare clone does not set remote.origin.fetch (and
hence there is no refs/remotes/origin hierarchy). But it does set
remote.origin.url, and that name is visible to the user via "git fetch
origin", etc.
Let's allow the options to be used together, and switch the "forbid"
test in t5606 to check that we use the requested name. That test came
much later in 349cff76de (clone: add tests for --template and some
disallowed option pairs, 2020-09-29), and does not offer any logic
beyond "let's test what the code currently does".
Reported-by: John A. Leuenhagen <john@zlima12.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If the global variable "save_commit_buffer" is set to 0, then
parse_commit() will throw away the commit object data after parsing it,
rather than sticking it into a commit slab. This goes all the way back
to 60ab26de99 ([PATCH] Avoid wasting memory in git-rev-list,
2005-09-15).
But there's another code path which may similarly stash the buffer:
parse_object_buffer(). This is where we end up if we parse a commit via
parse_object(), and it's used directly in a few other code paths like
git-fsck.
The original goal of 60ab26de99 was avoiding extra memory usage for
rev-list. And there it's not all that important to catch parse_object().
We use that function only for looking at the tips of the traversal, and
the majority of the commits are parsed by following parent links, where
we use parse_commit() directly. So we were wasting some memory, but only
a small portion.
It's much easier to see the effect with fsck. Since we now turn off
save_commit_buffer by default there, we _should_ be able to drop the
freeing of the commit buffer in fsck_obj(). But if we do so (taking the
first hunk of this patch without the rest), then the peak heap of "git
fsck" in a clone of git.git goes from 136MB to 194MB. Teaching
parse_object_buffer() to respect save_commit_buffer brings that down to
134.5MB (it's hard to tell from massif's output, but I suspect the
savings comes from avoiding the overhead of the mostly-empty commit
slab).
Other programs should see a small improvement. Both "rev-list --all" and
"fsck --connectivity-only" improve by a few hundred kilobytes, as they'd
avoid loading the tip objects of their traversals.
Most importantly, no code should be hurt by doing this. Any program that
turns off save_commit_buffer is already making the assumption that any
commit it sees may need to have its object data loaded on demand, as it
doesn't know which ones were parsed by parse_commit() versus
parse_object(). Not to mention that anything parsed by the commit graph
may be in the same boat, even if save_commit_buffer was not disabled.
This should be the only spot that needs to be fixed. Grepping for
set_commit_buffer() shows that this and parse_commit() are the only
relevant calls.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When parsing a commit, the default behavior is to stuff the original
buffer into a commit_slab (which takes ownership of it). But for a tool
like fsck, this isn't useful. While we may look at the buffer further as
part of fsck_commit(), we'll always do so through a separate pointer;
attaching the buffer to the slab doesn't help.
Worse, it means we have to remember to free the commit buffer in all
call paths. We do so in fsck_obj(), which covers a regular "git fsck".
But with "--connectivity-only", we forget to do so in both
traverse_one_object(), which covers reachable objects, and
mark_unreachable_referents(), which covers unreachable ones. As a
result, that mode ends up storing an uncompressed copy of every commit
on the heap at once.
We could teach the code paths for --connectivity-only to also free
commit buffers. But there's an even easier fix: we can just turn off the
save_commit_buffer flag, and then we won't attach them to the commits in
the first place.
This reduces the peak heap of running "git fsck --connectivity-only" in
a clone of linux.git from ~2GB to ~1GB. According to massif, the
remaining memory goes where you'd expect: the object structs themselves,
the obj_hash containing them, and the delta base cache.
Note that we'll leave the call to free commit buffers in fsck_obj() for
now; it's not quite redundant because of a related bug that we'll fix in
a subsequent commit.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
After calling fsck_walk(), a tree object struct may be left in the
parsed state, with the full tree contents available via tree->buffer.
It's the responsibility of the caller to free these when it's done with
the object to avoid having many trees allocated at once.
In a regular "git fsck", we hit fsck_walk() only from fsck_obj(), which
does call free_tree_buffer(). Likewise for "--connectivity-only", we see
most objects via traverse_one_object(), which makes a similar call.
The exception is in mark_unreachable_referents(). When using both
"--connectivity-only" and "--dangling" (the latter of which is the
default), we walk all of the unreachable objects, and there we forget to
free. Most cases would not notice this, because they don't have a lot of
unreachable objects, but you can make a pathological case like this:
git clone --bare /path/to/linux.git repo.git
cd repo.git
rm packed-refs ;# now everything is unreachable!
git fsck --connectivity-only
That ends up with peak heap usage ~18GB, which is (not coincidentally)
close to the size of all uncompressed trees in the repository. After
this patch, the peak heap is only ~2GB.
A few things to note:
- it might seem like fsck_walk(), if it is parsing the trees, should
be responsible for freeing them. But the situation is quite tricky.
In the non-connectivity mode, after we call fsck_walk() we then
proceed with fsck_object() which actually does the type-specific
sanity checks on the object contents. We do pass our own separate
buffer to fsck_object(), but there's a catch: our earlier call to
parse_object_buffer() may have attached that buffer to the object
struct! So by freeing it, we leave the rest of the code with a
dangling pointer.
Likewise, the call to fsck_walk() in index-pack is subtle. It
attaches a buffer to the tree object that must not be freed! And
so rather than calling free_tree_buffer(), it actually detaches it
by setting tree->buffer to NULL.
These cases would _probably_ be fixable by having fsck_walk() free
the tree buffer only when it was the one who allocated it via
parse_tree(). But that would still leave the callers responsible for
freeing other cases, so they wouldn't be simplified. While the
current semantics for fsck_walk() make it easy to accidentally leak
in new callers, at least they are simple to explain, and it's not a
function that's likely to get a lot of new call-sites.
And in any case, it's probably sensible to fix the leak first with
this simple patch, and try any more complicated refactoring
separately.
- a careful reader may notice that fsck_obj() also frees commit
buffers, but neither the call in traverse_one_object() nor the one
touched in this patch does so. And indeed, this is another problem
for --connectivity-only (and accounts for most of the 2GB heap after
this patch), but it's one we'll fix in a separate commit.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
These commands have no placeholders to be translated.
Signed-off-by: Alex Henrie <alexhenrie24@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
These strings are not translatable in the diagnose_options array in
diagnose.c. Don't translate them in builtin/diagnose.c either.
Signed-off-by: Alex Henrie <alexhenrie24@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The command you type is still "git maintenance" even in other languages.
Signed-off-by: Alex Henrie <alexhenrie24@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Hoist the remainder of "scalar" out of contrib/ to the main part of
the codebase.
* vd/scalar-to-main:
Documentation/technical: include Scalar technical doc
t/perf: add 'GIT_PERF_USE_SCALAR' run option
t/perf: add Scalar performance tests
scalar-clone: add test coverage
scalar: add to 'git help -a' command list
scalar: implement the `help` subcommand
git help: special-case `scalar`
scalar: include in standard Git build & installation
scalar: fix command documentation section header
A couple of bugfixes with code clean-up.
* jk/list-objects-filter-cleanup:
list-objects-filter: convert filter_spec to a strbuf
list-objects-filter: add and use initializers
list-objects-filter: handle null default filter spec
list-objects-filter: don't memset after releasing filter struct
"git mv A B" in a sparsely populated working tree can be asked to
move a path from a directory that is "in cone" to another directory
that is "out of cone". Handling of such a case has been improved.
* sy/mv-out-of-cone:
builtin/mv.c: fix possible segfault in add_slash()
mv: check overwrite for in-to-out move
advice.h: add advise_on_moving_dirty_path()
mv: cleanup empty WORKING_DIRECTORY
mv: from in-cone to out-of-cone
mv: remove BOTH from enum update_mode
mv: check if <destination> is a SKIP_WORKTREE_DIR
mv: free the with_slash in check_dir_in_index()
mv: rename check_dir_in_index() to empty_dir_has_sparse_contents()
t7002: add tests for moving from in-cone to out-of-cone
The logic to handle worktree refs (worktrees/NAME/REF and
main-worktree/REF) existed in two places:
* ref_type() in refs.c
* parse_worktree_ref() in worktree.c
Collapse this logic together in one function parse_worktree_ref():
this avoids having to cross-check the result of parse_worktree_ref()
and ref_type().
Introduce enum ref_worktree_type, which is slightly different from
enum ref_type. The latter is a misleading name (one would think that
'ref_type' would have the symref option).
Instead, enum ref_worktree_type only makes explicit how a refname
relates to a worktree. From this point of view, HEAD and
refs/bisect/abc are the same: they specify the current worktree
implicitly.
The files-backend must avoid packing refs/bisect/* and friends into
packed-refs, so expose is_per_worktree_ref() separately.
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git fetch" over protocol v2 sent an incorrect ref prefix request
to the server and made "git pull" with configured fetch refspec
that does not cover the remote branch to merge with fail, which has
been corrected.
* jk/proto-v2-ref-prefix-fix:
fetch: add branch.*.merge to default ref-prefix extension
fetch: stop checking for NULL transport->remote in do_fetch()
Plugging leaks in submodule--helper.
* ab/submodule-helper-leakfix:
submodule--helper: fix a configure_added_submodule() leak
submodule--helper: free rest of "displaypath" in "struct update_data"
submodule--helper: free some "displaypath" in "struct update_data"
submodule--helper: fix a memory leak in print_status()
submodule--helper: fix a leak in module_add()
submodule--helper: fix obscure leak in module_add()
submodule--helper: fix "reference" leak
submodule--helper: fix a memory leak in get_default_remote_submodule()
submodule--helper: fix a leak with repo_clear()
submodule--helper: fix "sm_path" and other "module_cb_list" leaks
submodule--helper: fix "errmsg_str" memory leak
submodule--helper: add and use *_release() functions
submodule--helper: don't leak {run,capture}_command() cp.dir argument
submodule--helper: "struct pathspec" memory leak in module_update()
submodule--helper: fix most "struct pathspec" memory leaks
submodule--helper: fix trivial get_default_remote_submodule() leak
submodule--helper: fix a leak in "clone_submodule"
Undoes 'jk/unused-annotation' topic and redoes it to work around
Coccinelle rules misfiring false positives in unrelated codepaths.
* ab/unused-annotation:
git-compat-util.h: use "deprecated" for UNUSED variables
git-compat-util.h: use "UNUSED", not "UNUSED(var)"
Annotate function parameters that are not used (but cannot be
removed for structural reasons), to prepare us to later compile
with -Wunused warning turned on.
* jk/unused-annotation:
is_path_owned_by_current_uid(): mark "report" parameter as unused
run-command: mark unused async callback parameters
mark unused read_tree_recursive() callback parameters
hashmap: mark unused callback parameters
config: mark unused callback parameters
streaming: mark unused virtual method parameters
transport: mark bundle transport_options as unused
refs: mark unused virtual method parameters
refs: mark unused reflog callback parameters
refs: mark unused each_ref_fn parameters
git-compat-util: add UNUSED macro
The auto-stashed local changes created by "git merge --autostash"
was mixed into a conflicted state left in the working tree, which
has been corrected.
* en/merge-unstash-only-on-clean-merge:
merge: only apply autostash when appropriate
The parser in the script interface to parse-options in "git
rev-parse" has been updated to diagnose a bogus input correctly.
* ow/rev-parse-parseopt-fix:
rev-parse --parseopt: detect missing opt-spec
The codepath for the OPT_SUBCOMMAND facility has been cleaned up.
* sg/parse-options-subcommand:
notes, remote: show unknown subcommands between `'
notes: simplify default operation mode arguments check
test-parse-options.c: fix style of comparison with zero
test-parse-options.c: don't use for loop initial declaration
t0040-parse-options: remove leftover debugging
ce74de9(ls-files: introduce "--format" option) miss
a space between two words incorrectly, it leads to
wrong i10n messages. So fix it by adding a space at
the end of the error message.
Signed-off-by: ZheNing Hu <adlternative@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In 7e2619d8ff (list_objects_filter_options: plug leak of filter_spec
strings, 2022-09-08), we noted that the filter_spec string_list was
inconsistent in how it handled memory ownership of strings stored in the
list. The fix there was a bit of a band-aid to set the "strdup_strings"
variable right before adding anything.
That works OK, and it lets the users of the API continue to
zero-initialize the struct. But it makes the code a bit hard to follow
and accident-prone, as any other spots appending the filter_spec need to
think about whether to set the strdup_strings value, too (there's one
such spot in partial_clone_get_default_filter_spec(), which is probably
a possible memory leak).
So let's do that full cleanup now. We'll introduce a
LIST_OBJECTS_FILTER_INIT macro and matching function, and use them as
appropriate (though it is for the "_options" struct, this matches the
corresponding list_objects_filter_release() function).
This is harder than it seems! Many other structs, like
git_transport_data, embed the filter struct. So they need to initialize
it themselves even if the rest of the enclosing struct is OK with
zero-initialization. I found all of the relevant spots by grepping
manually for declarations of list_objects_filter_options. And then doing
so recursively for structs which embed it, and ones which embed those,
and so on.
I'm pretty sure I got everything, but there's no change that would alert
the compiler if any topics in flight added new declarations. To catch
this case, we now double-check in the parsing function that things were
initialized as expected and BUG() if appropriate.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A possible segfault was introduced in c08830de41 (mv: check if
<destination> is a SKIP_WORKTREE_DIR, 2022-08-09).
When running t7001 with SANITIZE=address, problem appears when running:
git mv path1/path2/ .
or
git mv directory ../
or
any <destination> that makes dest_path[0] an empty string.
The add_slash() call could segfault when path argument to it is an empty
string, because it makes an out-of-bounds read to decide if an extra
slash '/' needs to be appended to it.
As add_slash() is used to make sure that a valid pathname to a file in
the given directory can be made by appending a filename after the value
returned from it, if path is an empty string, we want to return it
as-is. The path to a file "F" in the top-level of the working tree
(i.e. path=="") is formed by appending "F" after "" (i.e. path) without
any slash in between.
So, just like the case where a non-empty path already ends with a slash,
return an empty path as-is.
Reported-by: Jeff King <peff@peff.net>
Helped-by: Jeff King <peff@peff.net>
Helped-by: Junio C Hamano <gitster@pobox.com>
Helped-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Shaoxuan Yuan <shaoxuan.yuan02@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git format-patch --from=<ident>" can be told to add an in-body
"From:" line even for commits that are authored by the given
<ident> with "--force-in-body-from"option.
* jc/format-patch-force-in-body-from:
format-patch: learn format.forceInBodyFrom configuration variable
format-patch: allow forcing the use of in-body From: header
pretty: separate out the logic to decide the use of in-body from
After 2d893dff4c (rev-parse --parseopt: allow [*=?!] in argument hints,
2015-07-14) updated the parser, a line in parseopts's input can start
with one of the flag characters and be erroneously parsed as a opt-spec
where the short name of the option is the flag character itself and the
long name is after the end of the string. This makes Git want to
allocate SIZE_MAX bytes of memory at this line:
o->long_name = xmemdupz(sb.buf + 2, s - sb.buf - 2);
Since s and sb.buf are equal the second argument is -2 (except unsigned)
and xmemdupz allocates len + 1 bytes, ie. -1 meaning SIZE_MAX.
Avoid this by checking whether a flag character was found in the zeroth
position.
Reported-by: Ingy dot Net <ingy@ingy.net>
Signed-off-by: Øystein Walle <oystwa@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When running "git pull" with no arguments, we'll do a default "git
fetch" and then try to merge the branch specified by the branch.*.merge
config. There's code in get_ref_map() to treat that "merge" branch as
something we want to fetch, even if it is not otherwise covered by the
default refspec.
This works fine with the v0 protocol, as the server tells us about all
of the refs, and get_ref_map() is the ultimate decider of what we fetch.
But in the v2 protocol, we send the ref-prefix extension to the server,
asking it to limit the ref advertisement. And we only tell it about the
default refspec for the remote; we don't mention the branch.*.merge
config at all.
This usually doesn't matter, because the default refspec matches
"refs/heads/*", which covers all branches. But if you explicitly use a
narrow refspec, then "git pull" on some branches may fail. The server
doesn't advertise the branch, so we don't fetch it, and "git pull"
thinks that it went away upstream.
We can fix this by including any branch.*.merge entries for the current
branch in the list of ref-prefixes we pass to the server. This only
needs to happen when using the default configured refspec (since
command-line refspecs are already added, and take precedence in deciding
what we fetch). We don't otherwise need to replicate any of the "what to
fetch" logic in get_ref_map(). These ref-prefixes are an optimization,
so it's OK if we tell the server to advertise the branch.*.merge ref,
even if we're not going to pull it. We'll just choose not to fetch it.
The test here is based on one constructed by Johannes. I modified the
branch names to trigger the ref-prefix issue (and be more descriptive),
and to confirm that "git pull" actually updated the local ref, which
should be more robust than just checking stderr.
Reported-by: Lana Deere <lana.deere@gmail.com>
Helped-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This field will never be NULL; if it were, we'd segfault earlier in the
function when we unconditionally check transport->remote->fetch_tags.
Likewise, many other functions dereference it unconditionally.
This is a small simplification, but it will make things easier as we
extend this conditional in the next patch.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Update the "unknown subcommand" error message in 'git notes' and 'git
remote' to wrap the offending argument between `', to make it
consistent with the "unknown switch/option/subcommand" error messages
in parse-options.
Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
'git notes' has a default operation mode, but when invoked without a
subcommand it doesn't accept any arguments (although the 'list'
subcommand implementing the default operation mode does accept
arguments). The condition checking this ended up a bit awkward, so
let's make it clearer.
Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The pack bitmap file gained a bitmap-lookup table to speed up
locating the necessary bitmap for a given commit.
* ac/bitmap-lookup-table:
pack-bitmap-write: drop unused pack_idx_entry parameters
bitmap-lookup-table: add performance tests for lookup table
pack-bitmap: prepare to read lookup table extension
pack-bitmap-write: learn pack.writeBitmapLookupTable and add tests
pack-bitmap-write.c: write lookup table extension
bitmap: move `get commit positions` code to `bitmap_writer_finish`
Documentation/technical: describe bitmap lookup table extension
With this commit, `git help scalar` will open the appropriate manual
or HTML page (instead of looking for `gitscalar`).
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Victoria Dye <vdye@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Fix config API a memory leak added in a452128a36 (submodule--helper:
introduce add-config subcommand, 2021-08-06) by using the *_tmp()
variant of git_config_get_string().
In this case we're only checking whether
the (repo|git)_config_get_string() call is telling us that the
"submodule.active" key exists.
As with the preceding commit we'll find many other such patterns in
the codebase if we go fishing. E.g. "git gc" leaks in the code added
in 61f7a383d3 (maintenance: use 'incremental' strategy by default,
2020-10-15). Similar code in "git gc" added in
b08ff1fee0 (maintenance: add --schedule option and config,
2020-09-11) doesn't leak, but we could avoid the malloc() & free() in
that case.
A coccinelle rule to find those would find and fix some leaks, and
cases where we're doing needless malloc() + free()'s but only care
about the key existence, or are copying
the (repo|git)_config_get_string() return value right away.
But as with the preceding commit let's punt on all of that for now,
and just narrowly fix this specific case in submodule--helper.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Reviewed-by: Glen Choo <chooglen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Fix a leak in code added in c51f8f94e5 (submodule--helper: run update
procedures from C, 2021-08-24), we clobber the "displaypath" member of
the passed-in "struct update_data" both so that die() messages in this
update_submodule() function itself can use it, and for the
run_update_procedure() called within this function.
Fix a leak in code added in 51f8f94e5b (submodule--helper: run update
procedures from C, 2021-08-24). We'd always clobber the old
"displaypath" member of the previously passed-in "struct update_data".
A better fix for this would be to remove the "displaypath" member from
the "struct update_data" entirely. Along with "oid", "suboid",
"just_cloned" and "sm_path" it's managing members that mainly need to
be passed between 1-3 stack frames of functions adjacent to this
code. But doing so would be a much larger change (I have it locally,
and fully untangling that in an incremental way is a 10 patch
journey).
So let's go for this much more isolated fix suggested by Glen. We
FREE_AND_NULL() the "update_data->displaypath", the "AND_NULL()" part
of that is needed due to the later "free(ud->displaypath)" in
"update_data_release()" introduced in the preceding commit
Moving ensure_core_worktree() out of update_submodule() may not be
strictly required, but in doing so we are left with the exact same
ordering as before, making this a smaller functional change.
Helped-by: Glen Choo <chooglen@google.com>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Reviewed-by: Glen Choo <chooglen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Make the update_data_release() function free "displaypath" member when
appropriate. The "displaypath" member is always ours, the "const" on
the "char *" was wrong to begin with.
This leaves a leak of "displaypath" in update_submodule(), which as
we'll see in subsequent commits is harder to deal with than this
trivial fix.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Reviewed-by: Glen Choo <chooglen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Fix a leak in print_status(), the compute_rev_name() function
implemented in this file will return a strbuf_detach()'d value, or
NULL.
This leak has existed since this code was added in
a9f8a37584 (submodule: port submodule subcommand 'status' from shell
to C, 2017-10-06), but in 0b5e2ea7cf (submodule--helper: don't print
null in 'submodule status', 2018-04-18) we added a "const"
intermediate variable for the return value, that "const" should be
removed.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Reviewed-by: Glen Choo <chooglen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Fix a leak in module_path(), since a6226fd772 (submodule--helper:
convert the bulk of cmd_add() to C, 2021-08-10), we've been freeing
add_data.sm_path, but in this case we clobbered it, and didn't free
the value we clobbered.
This makes test 28 of "t/t7400-submodule-basic.sh" ("submodule add in
subdirectory") pass when we're compiled with SANITIZE=leak..
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Reviewed-by: Glen Choo <chooglen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Fix an obscure leak in module_add(), if the "git add" command we were
piping to failed we'd fail to strbuf_release(&sb). This fixes a leak
introduced in a6226fd772 (submodule--helper: convert the bulk of
cmd_add() to C, 2021-08-10).
In fixing it move to a "goto cleanup" pattern, and since we need to
introduce a "ret" variable to do that let's also get rid of the
intermediate "exit_code" variable. The initialization to "-1" in
a6226fd772 has always been redundant, we'd only use the "exit_code"
value after assigning the return value of pipe_command() to it.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Reviewed-by: Glen Choo <chooglen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Fix leaks in the "reference" variable declared in add_submodule() and
module_clone().
In preceding commits this variable was refactored out of the "struct
module_clone_data", but the leak has been with us since
31224cbdc7 (clone: recursive and reference option triggers submodule
alternates, 2016-08-17) and 8c8195e9c3 (submodule--helper: introduce
add-clone subcommand, 2021-07-10).
Those commits added an xstrdup()'d member of the
STRING_LIST_INIT_NODUP'd "struct string_list". We need to free()
those, but not the ones we get from argv, let's make use of the "util"
member, if it has a pointer it's the pointer we'll need to free,
otherwise it'll be NULL (i.e. from argv).
Note that the free() of the "util" member is needed in both
module_clone() and add_submodule(). The module_clone() function itself
doesn't populate the "util" pointer as add_submodule() does, but
module_clone() is upstream of the
add_possible_reference_from_superproject() caller we're modifying
here, which does do that.
This does preclude the use of the "util" pointer for any other reasons
for now, but that's OK. If we ever need to use it for something else
we could turn it into a small "struct" with an optional "to_free"
member, and switch to using string_list_clear_func().
Alternatively we could have another "struct string_list to_free" which
would keep a copy of the strings we've dup'd to free(). But for now
this is perfectly adequate.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Reviewed-by: Glen Choo <chooglen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Fix a memory leak in the get_default_remote_submodule() function added
in a77c3fcb5e (submodule--helper: get remote names from any
repository, 2022-03-04), we need to repo_clear() the submodule we
initialize.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Reviewed-by: Glen Choo <chooglen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Call repo_clear() in ensure_core_worktree() to free the "struct
repository". Fixes a leak that's been here since
74d4731da1 (submodule--helper: replace connect-gitdir-workingtree by
ensure-core-worktree, 2018-08-13).
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Reviewed-by: Glen Choo <chooglen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Fix leaks in "struct module_cb_list" and the "struct module_cb" which
it contains, these fix leaks in e83e3333b5 (submodule: port submodule
subcommand 'summary' from shell to C, 2020-08-13).
The "sm_path" should always have been a "char *", not a "const
char *", we always create it with xstrdup().
We can't mark any tests passing passing with SANITIZE=leak using
"TEST_PASSES_SANITIZE_LEAK=true" as a result of this change, but
"t7401-submodule-summary.sh" gets closer to passing as a result of
this change.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Reviewed-by: Glen Choo <chooglen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Fix a memory leak introduced in e83e3333b5 (submodule: port submodule
subcommand 'summary' from shell to C, 2020-08-13), we sometimes append
to the "errmsg", and need to free the "struct strbuf".
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Reviewed-by: Glen Choo <chooglen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Fix a memory leak in c51f8f94e5 (submodule--helper: run update
procedures from C, 2021-08-24) and 3c3558f095 (submodule--helper: run
update using child process struct, 2022-03-15) by not allocating
memory in the first place.
The "dir" member of "struct child_process" will not be modified by
that API, and it's declared to be "const char *". So let's not
needlessly duplicate these strings.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Reviewed-by: Glen Choo <chooglen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The module_update() function calls module_list_compute() twice, which
in turn will reset the "struct pathspec" passed to it. Let's instead
track two of them, and clear them both.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Reviewed-by: Glen Choo <chooglen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Call clear_pathspec() at the end of various functions that work with
and allocate a "struct pathspec".
In some cases the zero-initialization here isn't strictly needed, but
as we're moving to a "goto cleanup" pattern let's make sure that it's
safe to call clear_pathspec(), we don't want the data to be
uninitialized.
E.g. for module_foreach() we can see from looking at
module_list_compute() that if it returns non-zero that the "pathspec"
will always have been initialized. But relying on that both assumes
knowledge about parse_pathspec(), and would set up a fragile pattern
going forward.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Reviewed-by: Glen Choo <chooglen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Fix a leak in code added in 1012a5cbc3 (submodule--helper
run-update-procedure: learn --remote, 2022-03-04), we need to free()
the xstrdup()'d string. This gets e.g. t/t7419-submodule-set-branch.sh
closer to passing under SANITIZE=leak.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Reviewed-by: Glen Choo <chooglen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Fix a memory leak of the "clone_data_path" variable that we copy or
derive from the "struct module_clone_data" in clone_submodule(). This
code was refactored in preceding commits, but the leak has been with
us since f8eaa0ba98 (submodule--helper, module_clone: always operate
on absolute paths, 2016-03-31).
For the "else" case we don't need to xstrdup() the "clone_data->path",
and we don't need to free our own "clone_data_path". We can therefore
assign the "clone_data->path" to our own "clone_data_path" right away,
and only override it (and remember to free it!) if we need to
xstrfmt() a replacement.
In the case of the module_clone() caller it's from "argv", and doesn't
need to be free'd, and in the case of the add_submodule() caller we
get a pointer to "sm_path", which doesn't need to be directly free'd
either.
Fixing this leak makes several tests pass, so let's mark them as
passing with TEST_PASSES_SANITIZE_LEAK=true.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Reviewed-by: Glen Choo <chooglen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Fix bad config API usage added in a452128a36 (submodule--helper:
introduce add-config subcommand, 2021-08-06). After
git_config_get_string() returns successfully we know the "char **dest"
will be non-NULL.
A coccinelle patch that transforms this turns up a couple of other
such issues, one in fetch-pack.c, and another in upload-pack.c:
@@
identifier F =~ "^(repo|git)_config_get_string(_tmp)?$";
identifier V;
@@
!F(..., &V)
- && (V)
But let's focus narrowly on submodule--helper for now, we can fix
those some other time.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Reviewed-by: Glen Choo <chooglen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As noted in a preceding commit the get_default_remote_submodule() and
remote_submodule_branch() functions would invoke die(), and thus leave
update_submodule() only partially lib-ified. We've addressed the
former of those in a preceding commit, let's now address the latter.
In addition to lib-ifying the function this fixes a potential (but
obscure) segfault introduced by a logic error in
1012a5cbc3 (submodule--helper run-update-procedure: learn --remote,
2022-03-04):
We were assuming that remote_submodule_branch() would always return
non-NULL, but if the submodule_from_path() call in that function fails
we'll return NULL. See its introduction in
92bbe7ccf1 (submodule--helper: add remote-branch helper,
2016-08-03). I.e. we'd previously have segfaulted in the xstrfmt()
call in update_submodule() seen in the context.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Reviewed-by: Glen Choo <chooglen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As noted in a preceding commit the get_default_remote_submodule() and
remote_submodule_branch() functions would invoke die(), and thus leave
update_submodule() only partially lib-ified. Let's address the former
of those cases.
Change the functions to return an int exit code (non-zero on failure),
while leaving the get_default_remote() function for the callers that
still want the die() semantics.
This change addresses 1/2 of the "die" issue in these two lines in
update_submodule():
char *remote_name = get_default_remote_submodule(update_data->sm_path);
const char *branch = remote_submodule_branch(update_data->sm_path);
We can safely remove the "!default_remote" case from sync_submodule(),
because our get_default_remote_submodule() function now returns a
die_message() on failure, so we can have it and other callers check if
the exit code should be non-zero instead.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Reviewed-by: Glen Choo <chooglen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Fix code added in ce125d431a (submodule: extract path to submodule
gitdir func, 2021-09-15) and a77c3fcb5e (submodule--helper: get
remote names from any repository, 2022-03-04) which failed to check
the return values of repo_init() and repo_submodule_init(). If we
failed to initialize the repository or submodule we could segfault
when trying to access the invalid repository structs.
Let's also check that these were the only such logic errors in the
codebase by making use of the "warn_unused_result" attribute. This is
valid as of GCC 3.4.0 (and clang will catch it via its faking of
__GNUC__ ).
As the comment being added to git-compat-util.h we're piggy-backing on
the LAST_ARG_MUST_BE_NULL version check out of lazyness. See
9fe3edc47f (Add the LAST_ARG_MUST_BE_NULL macro, 2013-07-18) for its
addition. The marginal benefit of covering gcc 3.4.0..4.0.0 is
near-zero (or zero) at this point. It mostly matters that we catch
this somewhere.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Reviewed-by: Glen Choo <chooglen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Continue the libification of codepaths that previously relied on
"must_die_on_failure". In these cases we've always been early aborting
by calling die(), but as we know that these codepaths will properly
handle return codes of 128 to mean an early abort let's have them use
die_message() instead.
This still isn't a complete migration away from die() for these
codepaths, in particular this code in update_submodule() will still call die() in some cases:
char *remote_name = get_default_remote_submodule(update_data->sm_path);
const char *branch = remote_submodule_branch(update_data->sm_path);
But as that code is used by other callers than the "update" code let's
leave converting it for a subsequent commit.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Reviewed-by: Glen Choo <chooglen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When "git submodule update" runs it might call "checkout", "merge",
"rebase", or a custom command. Ever since run_update_command() was
added in c51f8f94e5 (submodule--helper: run update procedures from C,
2021-08-24) we'd either exit immediately if the
"submodule.<name>.update" method failed, or in the case of "checkout"
continue trying to update other submodules.
This code used to use the magical "2" return code, but in
55b3f12cb5 (submodule update: use die_message(), 2022-03-15) it was
made to exit(128), which in preceding commits has been changed to
return that 128 code to the top-level.
Let's "libify" this code even more by not having it arbitrarily
override the return code. In practice this doesn't change anything as
the code "git checkout" would return on any normal failure is "1", but
we'll now in principle properly abort the operation if "git checkout"
were to exit with 128.
It would make sense to follow-up this change with a change to allow
the "submodule.<name>.update = !..." (SM_UPDATE_COMMAND) method the
same liberties as "checkout", and perhaps to do the same with a failed
"merge" or "rebase". But let's leave that for now.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Reviewed-by: Glen Choo <chooglen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In preceding commits the codepaths around update_submodules() were
changed from using exit() or die() to ferrying up a
"must_die_on_failure" in the cases where we'd exit(), and in most
cases where we'd die().
We needed to do this this to ensure that we'd early exit or otherwise
abort the update_submodules() processing before it was completed.
Now that those preceding changes have shown that we've converted those
paths, we can remove the remaining "ret == 128" special-cases, leaving
the only such special-case in update_submodules(). I.e. we now know
after having gone through the various codepaths that we were only
returning 128 if we meant to early abort.
In update_submodules() we'll for now set any non-zero non-128 exit
codes to "1", but will start ferrying up the exit code as-is in a
subsequent commit.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Reviewed-by: Glen Choo <chooglen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Libify the determine_submodule_update_strategy() by having it invoke
die_message() rather than die(), and returning the code die_message()
returns on failure.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Reviewed-by: Glen Choo <chooglen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Change code downstream of module_update() to short-circuit and return
to the top-level on failure, rather than calling exit().
To do so we need to diligently check whether we "must_die_on_failure",
which is a pattern started in c51f8f94e5 (submodule--helper: run
update procedures from C, 2021-08-24), but which hadn't been completed
to the point where we could avoid calling exit() here.
This introduces no functional changes, but makes it easier to both
call these routines as a library in the future, and to eventually
avoid leaking memory.
This and similar control flow in submodule--helper.c could be made
simpler by properly "libifying" it, i.e. to have it consistently
return -1 on failures, and to early return on any non-success.
But let's leave that larger project for now, and (mostly) emulate what
were doing with the "exit(128)" before this change.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Reviewed-by: Glen Choo <chooglen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Apply some DRY principles in run_update_command() and don't have two
"switch" statements over "ud->update_strategy.type" determine the same
thing.
First we were setting "must_die_on_failure = 1" in all cases except
"SM_UPDATE_CHECKOUT" (and we'd BUG(...) out on the rest). This code
was added in c51f8f94e5 (submodule--helper: run update procedures
from C, 2021-08-24).
Then we'd duplicate same "switch" logic when we were using the
"must_die_on_failure" variable.
Let's instead have the "case" branches in that inner "switch"
determine whether or not the "update must continue" by picking an exit
code.
This also mostly avoids hardcoding the "128" exit code, instead we can
make use of the return value of the die_message() function, which
we've been calling here since 55b3f12cb5 (submodule update: use
die_message(), 2022-03-15). We're still hardcoding it to determine if
we "exit()", but subsequent commit(s) will address that.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Reviewed-by: Glen Choo <chooglen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Change the submodule_strategy_to_string() function added in
3604242f08 (submodule: port init from shell to C, 2016-04-15) to
really return a "const char *". In the "SM_UPDATE_COMMAND" case it
would return a strbuf_detach().
Furthermore, this function would return NULL on SM_UPDATE_UNSPECIFIED,
so it wasn't safe to xstrdup() its return value in the general case,
or to use it in a sprintf() format as the code removed in the
preceding commit did.
But its callers would never call it with either SM_UPDATE_UNSPECIFIED
or SM_UPDATE_COMMAND. Let's have its behavior reflect how its only
user expects it to behave, and BUG() out on the rest.
By doing this we can also stop needlessly xstrdup()-ing and free()-ing
the memory for the config we're setting. We can instead always use
constant strings. We can also use the *_tmp() variant of
git_config_get_string().
Let's also rename this submodule_strategy_to_string() function to
submodule_update_type_to_string(). Now that it's only tasked with
returning a string version of the "enum submodule_update_type type".
Before it would look at the "command" field in "struct
submodule_update_strategy".
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Reviewed-by: Glen Choo <chooglen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Don't call submodule_strategy_to_string() in a BUG() message. These
calls added in c51f8f94e5 (submodule--helper: run update procedures
from C, 2021-08-24) don't need the extra information
submodule_strategy_to_string() gives us, as we'll never reach the
SM_UPDATE_COMMAND case here.
That case is the only one where we'd get any information beyond the
straightforward number-to-string mapping.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Reviewed-by: Glen Choo <chooglen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add missing braces to an "else" arm in init_submodule(), this
stylistic change makes this code conform to the CodingGuidelines, and
makes a subsequent commit smaller.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Reviewed-by: Glen Choo <chooglen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>