Various subcommands of "git config" that takes value_regex
learn the "--literal-value" option to take the value_regex option
as a literal string.
* ds/config-literal-value:
config doc: value-pattern is not necessarily a regexp
config: implement --fixed-value with --get*
config: plumb --fixed-value into config API
config: add --fixed-value option, un-implemented
t1300: add test for --replace-all with value-pattern
t1300: test "set all" mode with value-pattern
config: replace 'value_regex' with 'value_pattern'
config: convert multi_replace to flags
"git update-ref --stdin" learns to take multiple transactions in a
single session.
* ps/update-ref-multi-transaction:
update-ref: disallow "start" for ongoing transactions
p1400: use `git-update-ref --stdin` to test multiple transactions
update-ref: allow creation of multiple transactions
t1400: avoid touching refs on filesystem
The on-disk bitmap format has a flag to mark a bitmap to be "reused".
This is a rather curious feature, and works like this:
- a run of pack-objects would decide to mark the last 80% of the
bitmaps it generates with the reuse flag
- the next time we generate bitmaps, we'd see those reuse flags from
the last run, and mark those commits as special:
- we'd be more likely to select those commits to get bitmaps in
the new output
- when generating the bitmap for a selected commit, we'd reuse the
old bitmap as-is (rearranging the bits to match the new pack, of
course)
However, neither of these behaviors particularly makes sense.
Just because a commit happened to be bitmapped last time does not make
it a good candidate for having a bitmap this time. In particular, we may
choose bitmaps based on how recent they are in history, or whether a ref
tip points to them, and those things will change. We're better off
re-considering fresh which commits are good candidates.
Reusing the existing bitmap _is_ a reasonable thing to do to save
computation. But only reusing exact bitmaps is a weak form of this. If
we have an old bitmap for A and now want a new bitmap for its child, we
should be able to compute that only by looking at trees and that are new
to the child. But this code would consider only exact reuse (which is
perhaps why it was eager to select those commits in the first place).
Furthermore, the recent switch to the reverse-edge algorithm for
generating bitmaps dropped this optimization entirely (and yet still
performs better).
So let's do a few cleanups:
- drop the whole "reusing bitmaps" phase of generating bitmaps. It's
not helping anything, and is mostly unused code (or worse, code that
is using CPU but not doing anything useful)
- drop the use of the on-disk reuse flag to select commits to bitmap
- stop setting the on-disk reuse flag in bitmaps we generate (since
nothing respects it anymore)
We will keep a few innards of the reuse code, which will help us
implement a more capable version of the "reuse" optimization:
- simplify rebuild_existing_bitmaps() into a function that only builds
the mapping of bits between the old and new orders, but doesn't
actually convert any bitmaps
- make rebuild_bitmap() public; we'll call it lazily to convert bitmaps
as we traverse (using the mapping created above)
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If 'git clone' couldn't execute 'transport_fetch_refs()' (e.g., because
of an error on the remote's side in 'git upload-pack'), then it will
silently ignore it.
Even though this has been the case at least since clone was ported to C
(way back in 8434c2f1af (Build in clone, 2008-04-27)), 'git fetch'
doesn't ignore these and reports any failures it sees.
That suggests that ignoring the return value in 'git clone' is simply an
oversight that should be corrected. That's exactly what this patch does.
(Noticing and fixing this is no coincidence, we'll want it in the next
patch in order to demonstrate a regression in 'git upload-pack' via a
'git clone'.)
There's no additional logging here, but that matches how 'git fetch'
handles the same case. An assumption there is that whichever part of
transport_fetch_refs() fails will complain loudly, so any additional
logging here is redundant.
Co-authored-by: Jeff King <peff@peff.net>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Simplify the logic to deal with a repack operation that ended up
creating the same packfile.
* tb/repack-simplify:
builtin/repack.c: don't move existing packs out of the way
builtin/repack.c: keep track of what pack-objects wrote
repack: make "exts" array available outside cmd_repack()
"git pull --rebase --recurse-submodules" checked for local changes
in a wrong range and failed to run correctly when it should.
* pb/pull-rebase-recurse-submodules:
pull: check for local submodule modifications with the right range
t5572: describe '--rebase' tests a little more
t5572: add notes on a peculiar test
pull --rebase: compute rebase arguments in separate function
sparse-checkouts are built on the patterns in the
$GIT_DIR/info/sparse-checkout file, where commands have modified
behavior for paths that do not match those patterns. The differences in
behavior, as far as the bugs concerned here, fall into three different
categories (with git subcommands that fall into each category listed):
* commands that only look at files matching the patterns:
* status
* diff
* clean
* update-index
* commands that remove files from the working tree that do not match
the patterns, and restore files that do match them:
* read-tree
* switch
* checkout
* reset (--hard)
* commands that omit writing files to the working tree that do not
match the patterns, unless those files are not clean:
* merge
* rebase
* cherry-pick
* revert
There are some caveats above, e.g. a plain `git diff` ignores files
outside the sparsity patterns but will show diffs for paths outside the
sparsity patterns when revision arguments are passed. (Technically,
diff is treating the sparse paths as matching HEAD.) So, there is some
internal inconsistency among these commands. There are also additional
commands that should behave differently in the face of sparse-checkouts,
as the sparse-checkout documentation alludes to, but the above is
sufficient for me to explain how `git stash` is affected.
What is relevant here is that logically 'stash' should behave like a
merge; it three-way merges the changes the user had in progress at stash
creation time, the HEAD at the time the stash was created, and the
current HEAD, in order to get the stashed changes applied to the current
branch. However, this simplistic view doesn't quite work in practice,
because stash tweaks it a bit due to two factors: (1) flags like
--keep-index and --include-untracked (why we used two different verbs,
'keep' and 'include', is a rant for another day) modify what should be
staged at the end and include more things that should be quasi-merged,
(2) stash generally wants changes to NOT be staged. It only provides
exceptions when (a) some of the changes had conflicts and thus we want
to use stages to denote the clean merges and higher order stages to
mark the conflicts, or (b) if there is a brand new file we don't want
it to become untracked.
stash has traditionally gotten this special behavior by first doing a
merge, and then when it's clean, applying a pipeline of commands to
modify the result. This series of commands for
unstaging-non-newly-added-files came from the following commands:
git diff-index --cached --name-only --diff-filter=A $CTREE >"$a"
git read-tree --reset $CTREE
git update-index --add --stdin <"$a"
rm -f "$a"
Looking back at the different types of special sparsity handling listed
at the beginning of this message, you may note that we have at least one
of each type covered here: merge, diff-index, and read-tree. The weird
mix-and-match led to 3 different bugs:
(1) If a path merged cleanly and it didn't match the sparsity patterns,
the merge backend would know to avoid writing it to the working tree and
keep the SKIP_WORKTREE bit, simply only updating it in the index.
Unfortunately, the subsequent commands would essentially undo the
changes in the index and thus simply toss the changes altogether since
there was nothing left in the working tree. This means the stash is
only partially applied.
(2) If a path existed in the worktree before `git stash apply` despite
having the SKIP_WORKTREE bit set, then the `git read-tree --reset` would
print an error message of the form
error: Entry 'modified' not uptodate. Cannot merge.
and cause stash to abort early.
(3) If there was a brand new file added by the stash, then the
diff-index command would save that pathname to the temporary file, the
read-tree --reset would remove it from the index, and the update-index
command would barf due to no such file being present in the working
copy; it would print a message of the form:
error: NEWFILE: does not exist and --remove not passed
fatal: Unable to process path NEWFILE
and then cause stash to abort early.
Basically, the whole idea of unstage-unless-brand-new requires special
care when you are dealing with a sparse-checkout. Fix these problems
by applying the following simple rule:
When we unstage files, if they have the SKIP_WORKTREE bit set,
clear that bit and write the file out to the working directory.
(*) If there's already a file present in the way, rename it first.
This fixes all three problems in t7012.13 and allows us to mark it as
passing.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When stash was converted from shell to a builtin, it merely
transliterated the forking of various git commands from shell to a C
program that would fork the same commands. Some of those were converted
over to actual library calls, but much of the pipeline-of-commands
design still remains. Fix some of this by replacing the portion
corresponding to
git diff-index --cached --name-only --diff-filter=A $CTREE >"$a"
git read-tree --reset $CTREE
git update-index --add --stdin <"$a"
rm -f "$a"
into a library function that does the same thing. (The read-tree
--reset was already partially converted over to a library call, but as
an independent piece.) Note here that this came after a merge operation
was performed. The merge machinery always stages anything that cleanly
merges, and the above code only runs if there are no conflicts. Its
purpose is to make it so that when there are no conflicts, all the
changes from the stash are unstaged. However, that causes brand new
files from the stash to become untracked, so the code above first saves
those files off and then re-adds them afterwards.
We replace the whole series of commands with a simple function that will
unstage files that are not newly added. This doesn't fix any bugs in
the usage of these commands, it simply matches the existing behavior but
makes it into a single atomic operation that we can then operate on as a
whole. A subsequent commit will take advantage of this to fix issues
with these commands in sparse-checkouts.
This conversion incidentally fixes t3906.1, because the separate
update-index process would die with the following error messages:
error: uninitialized_sub: is a directory - add files inside instead
fatal: Unable to process path uninitialized_sub
The unstaging of the directory as a submodule meant it was no longer
tracked, and thus as an uninitialized directory it could not be added
back using `git update-index --add`, thus resulting in this error and
early abort. Most of the submodule tests in 3906 continue to fail after
this change, this change was just enough to push the first of those
tests to success.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
To generate its filename, the 'git bugreport' builtin asks the system
for the current time with 'localtime()'. Since this uses a shared
buffer, it is not thread-safe.
Even though 'git bugreport' is not multi-threaded, using localtime() can
trigger some static analysis tools to complain, and a quick
$ git grep -oh 'localtime\(_.\)\?' -- **/*.c | sort | uniq -c
shows that the only usage of the thread-unsafe 'localtime' is in a piece
of documentation.
So, convert this instance to use the thread-safe version for
consistency, and to appease some analysis tools.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Multiple "credential-store" backends can race to lock the same
file, causing everybody else but one to fail---reattempt locking
with some timeout to reduce the rate of the failure.
* sa/credential-store-timeout:
crendential-store: use timeout when locking file
Fix formulation of an error message with two placeholders in "git
worktree add" subcommand.
* mt/worktree-error-message-fix:
worktree: fix order of arguments in error message
Fix an option name in "gc" documentation.
* ab/gc-keep-base-option:
gc: rename keep_base_pack variable for --keep-largest-pack
gc docs: change --keep-base-pack to --keep-largest-pack
The "git maintenance run" and "git maintenance start/stop" commands
holds a file-based lock at the .git/maintenance.lock and
.git/schedule.lock respectively. These locks are used to ensure only
one maintenance process is executed at the time as both operations
involves writing data into the git repository.
The path to the lock file is built using
"the_repository->objects->odb->path" that results in SEGFAULT when we
have no repository available as "the_repository->objects->odb" is
set to NULL.
Let's teach maintenance command to use RUN_SETUP option that will
provide the validation and fail when running outside of a repository.
Hence fixing the SEGFAULT for all three operations and making the
behaviour consistent across all subcommands.
Setting the RUN_SETUP also provides the same protection for all
subcommands given that the "register" and "unregister" also requires to
be executed inside a repository.
Furthermore let's remove the local validation implemented by the
"register" and "unregister" as this will not be required anymore with
the new option.
Signed-off-by: Rafael Silva <rafaeloliveira.cs@gmail.com>
Reviewed-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The code was not prepared to deal with pack .idx file that is
larger than 4GB.
* jk/4gb-idx:
packfile: detect overflow in .idx file size checks
block-sha1: take a size_t length parameter
fsck: correctly compute checksums on idx files larger than 4GB
use size_t to store pack .idx byte offsets
compute pack .idx byte offsets using size_t
The exchange between receive-pack and proc-receive hook did not
carefully check for errors.
* jx/t5411-flake-fix:
receive-pack: use default version 0 for proc-receive
receive-pack: gently write messages to proc-receive
t5411: new helper filter_out_user_friendly_and_stable_output
When a repository's leading directories contain regex metacharacters,
the config calls for 'git maintenance register' and 'git maintenance
unregister' are not careful enough. Use the new --fixed-value option
to direct the config machinery to use exact string matches. This is a
more robust option than escaping these arguments in a piecemeal fashion.
For the test, require that we are not running on Windows since the '+'
and '*' characters are not allowed on that filesystem.
Reported-by: Emily Shaffer <emilyshaffer@google.com>
Reported-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The config builtin does its own regex matching of values for the --get,
--get-all, and --get-regexp modes. Plumb the existing 'flags' parameter
to the get_value() method so we can initialize the value-pattern argument
as a fixed string instead of a regex pattern.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The git_config_set_multivar_in_file_gently() and related methods now
take a 'flags' bitfield, so add a new bit representing the --fixed-value
option from 'git config'. This alters the purpose of the value_pattern
parameter to be an exact string match. This requires some initialization
changes in git_config_set_multivar_in_file_gently() and a new strcmp()
call in the matches() method.
The new CONFIG_FLAGS_FIXED_VALUE flag is initialized in builtin/config.c
based on the --fixed-value option, and that needs to be updated in
several callers.
This patch only affects some of the modes of 'git config', and the rest
will be completed in the next change.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The 'git config' builtin takes a 'value-pattern' parameter for several
actions. This can cause confusion when expecting exact value matches
instead of regex matches, especially when the input string contains
metacharacters. While callers can escape the patterns themselves, it
would be more friendly to allow an argument to disable the pattern
matching in favor of an exact string match.
Add a new '--fixed-value' option that does not currently change the
behavior. The implementation will be filled in by later changes for
each appropriate action. For now, check and test that --fixed-value
will abort the command when included with an incompatible action or
without a 'value-pattern' argument.
The name '--fixed-value' was chosen over something simpler like
'--fixed' because some commands allow regular expressions on the
key in addition to the value.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The 'value_regex' argument in the 'git config' builtin is poorly named,
especially related to an upcoming change that allows exact string
matches instead of ERE pattern matches.
Perform a mostly mechanical change of every instance of 'value_regex' to
'value_pattern' in the codebase. This is only critical for documentation
and error messages, but it is best to be consistent inside the codebase,
too.
For documentation, use 'value-pattern' which is better punctuation. This
affects Documentation/git-config.txt and the usage in builtin/config.c,
which was already mixed between 'value_regex' and 'value-regex'.
I gave some thought to leaving the value_regex variables inside config.c
that are regex_t pointers. However, it is probably best to keep the name
consistent with the rest of the variables.
This does not update the translations inside the po/ directory, as that
creates conflicts with ongoing work. The input strings should
automatically update through automation, and a few of the output strings
currently use "[value_regex]" directly.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We will extend the flexibility of the config API. Before doing so, let's
take an existing 'int multi_replace' parameter and replace it with a new
'unsigned flags' parameter that can take multiple options as a bit field.
Update all callers that specified multi_replace to now specify the
CONFIG_FLAGS_MULTI_REPLACE flag. To add more clarity, extend the
documentation of git_config_set_multivar_in_file() including a clear
labeling of its arguments. Other config API methods in config.h require
only a change of the final parameter from 'int' to 'unsigned'.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When holding the lock for rewriting the credential file, use a timeout
to avoid race conditions when the credentials file needs to be updated
in parallel.
An example would be doing `fetch --all` on a repository with several
remotes that need credentials, using parallel fetching.
The timeout can be configured using "credentialStore.lockTimeoutMS",
defaulting to 1 second.
Signed-off-by: Simão Afonso <simao.afonso@powertools-tech.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The existing schedule mechanism using 'cron' is supported by POSIX
platforms, but not Windows. It also works slightly differently on
macOS to significant detriment of the user experience. To allow for
new implementations on these platforms, extract a method that
performs the platform-specific scheduling mechanism. This will be
swapped at compile time with new implementations on specialized
platforms.
As we add this generality, rename GIT_TEST_CRONTAB to
GIT_TEST_MAINT_SCHEDULER. Further, this variable is now parsed as
"<scheduler>:<command>" so we can test platform-specific scheduling
logic even when not on the correct platform. By specifying the
<scheduler> in this string, we will be able to test all three sets of
Git logic from a Linux machine.
Co-authored-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Restore a space that was lost in 8a0fc8d19d (stash: convert apply to
builtin, 2019-02-25).
Signed-off-by: Kyle Meyer <kyle@kyleam.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A specialization of hashmap that uses a string as key has been
introduced. Hopefully it will see wider use over time.
* en/strmap:
shortlog: use strset from strmap.h
Use new HASHMAP_INIT macro to simplify hashmap initialization
strmap: take advantage of FLEXPTR_ALLOC_STR when relevant
strmap: enable allocations to come from a mem_pool
strmap: add a strset sub-type
strmap: split create_entry() out of strmap_put()
strmap: add functions facilitating use as a string->int map
strmap: enable faster clearing and reusing of strmaps
strmap: add more utility functions
strmap: new utility functions
hashmap: provide deallocation function names
hashmap: introduce a new hashmap_partial_clear()
hashmap: allow re-use after hashmap_free()
hashmap: adjust spacing to fix argument alignment
hashmap: add usage documentation explaining hashmap_free[_entries]()
"git rev-parse" learned the "--end-of-options" to help scripts to
safely take a parameter that is supposed to be a revision, e.g.
"git rev-parse --verify -q --end-of-options $rev".
* jk/rev-parse-end-of-options:
rev-parse: handle --end-of-options
rev-parse: put all options under the "-" check
rev-parse: don't accept options after dashdash
The maximum length of output filenames "git format-patch" creates
has become configurable (used to be capped at 64).
* jc/format-patch-name-max:
format-patch: make output filename configurable
In 15fabd1bbd ("builtin/grep.c: make configuration callback more
reusable", 2012-10-09), we learned to fill a `static struct grep_opt
grep_defaults` which we can use as a blueprint for other such structs.
At the time, we didn't consider designated initializers to be widely
useable, but these days, we do. (See, e.g., cbc0f81d96 ("strbuf: use
designated initializers in STRBUF_INIT", 2017-07-10).)
Use designated initializers to let the compiler set up the struct and so
that we don't need to remember to call `init_grep_defaults()`.
Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
`init_grep_defaults()` fills a `static struct grep_opt grep_defaults`.
This struct is then used by `grep_init()` as a blueprint for other such
structs. Notably, `grep_init()` takes a `struct repo *` and assigns it
into the target struct.
As a result, it is unnecessary for us to take a `struct repo *` in
`init_grep_defaults()` as well. We assign it into the default struct and
never look at it again. And in light of how we return early if we have
already set up the default struct, it's not just unnecessary, but is
also a bit confusing: If we are called twice and with different repos,
is it a bug or a feature that we ignore the second repo?
Drop the repo parameter for `init_grep_defaults()`.
Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
`git worktree add` (without --force) errors out when given a path
that is already registered as a worktree and the path is missing on
disk. But the `cmd` and `path` strings are switched on the error
message. Let's fix that.
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
Reviewed-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As noted in an earlier change the keep_base_pack variable name is a
relic from an earlier on-list version of ae4e89e549 ("gc: add
--keep-largest-pack option", 2018-04-15) before it was renamed to
--keep-largest-pack.
Let's change the variable name to avoid that confusion, it's easier to
read the code if there's a 1=1 mapping between the variable name and
option name.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In d18c950a69 (pull: warn if the user didn't say whether to rebase or
to merge, 2020-03-09), a new hint was introduced to encourage users to
make a conscious decision about whether they want their pull to merge or
to rebase by configuring the `pull.rebase` setting.
This warning was clearly intended to advise users, but as pointed out in
https://lore.kernel.org/git/87ima2rdsm.fsf%40evledraar.gmail.com, it
uses `warning()` instead of `advise()`.
One consequence is that the advice is not colorized in the same manner
as other, similar messages. So let's use `advise()` instead.
Pointed-out-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
compare_tasks_by_selection() is used with QSORT and gets passed pointers
to the elements of "static struct maintenance_task tasks[]". It casts
the *addresses* of these passed pointers to element pointers, though,
and thus effectively compares some unrelated values from the stack. Fix
the casts to actually compare array elements.
Detected by USan (make SANITIZE=undefined test).
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git blame -L :funcname -- path" did not work well for a path for
which a userdiff driver is defined.
* pb/blame-funcname-range-userdiff:
blame: simplify 'setup_blame_bloom_data' interface
blame: simplify 'setup_scoreboard' interface
blame: enable funcname blaming with userdiff driver
line-log: mention both modes in 'blame' and 'log' short help
doc: add more pointers to gitattributes(5) for userdiff
blame-options.txt: also mention 'funcname' in '-L' description
doc: line-range: improve formatting
doc: log, gitk: move '-L' description to 'line-range-options.txt'
Preparation for a new merge strategy.
* en/merge-ort-api-null-impl:
merge,rebase,revert: select ort or recursive by config or environment
fast-rebase: demonstrate merge-ort's API via new test-tool command
merge-ort-wrappers: new convience wrappers to mimic the old merge API
merge-ort: barebones API of new merge strategy with empty implementation
Parts of "git maintenance" to ease writing crontab entries (and
other scheduling system configuration) for it.
* ds/maintenance-part-3:
maintenance: add troubleshooting guide to docs
maintenance: use 'incremental' strategy by default
maintenance: create maintenance.strategy config
maintenance: add start/stop subcommands
maintenance: add [un]register subcommands
for-each-repo: run subcommands on configured repos
maintenance: add --schedule option and config
maintenance: optionally skip --auto process
"git rebase -i" did not store ORIG_HEAD correctly.
* pw/rebase-i-orig-head:
rebase -i: simplify get_revision_ranges()
rebase -i: use struct object_id when writing state
rebase -i: use struct object_id rather than looking up commit
rebase -i: stop overwriting ORIG_HEAD buffer
"git format-patch --output=there" did not work as expected and
instead crashed. The option is now supported.
* jk/format-patch-output:
format-patch: support --output option
format-patch: tie file-opening logic to output_directory
format-patch: refactor output selection
"git log -L<range>:<path>" is documented to take no pathspec, but
this was not enforced by the command line option parser, which has
been corrected.
* jc/line-log-takes-no-pathspec:
log: diagnose -L used with pathspec as an error
The code to see if "git stash drop" can safely remove refs/stash
has been made more carerful.
* rs/empty-reflog-check-fix:
stash: simplify reflog emptiness check
When 'git repack' creates a pack with the same name as any existing
pack, it moves the existing one to 'old-pack-xxx.{pack,idx,...}' and
then renames the new one into place.
Eventually, it would be nice to have 'git repack' allow for writing a
multi-pack index at the critical time (after the new packs have been
written / moved into place, but before the old ones have been deleted).
Guessing that this option might be called '--write-midx', this makes the
following situation (where repacks are issued back-to-back without any
new objects) impossible:
$ git repack -adb
$ git repack -adb --write-midx
In the second repack, the existing packs are overwritten verbatim with
the same rename-to-old sequence. At that point, the current MIDX is
invalidated, since it refers to now-missing packs. So that code wants to
be run after the MIDX is re-written. But (prior to this patch) the new
MIDX can't be written until the new packs are moved into place. So, we
have a circular dependency.
This is all hypothetical, since no code currently exists to write a MIDX
safely during a 'git repack' (the 'GIT_TEST_MULTI_PACK_INDEX' does so
unsafely). Putting hypothetical aside, though: why do we need to rename
existing packs to be prefixed with 'old-' anyway?
This behavior dates all the way back to 2ad47d6 (git-repack: Be
careful when updating the same pack as an existing one., 2006-06-25).
2ad47d6 is mainly concerned about a case where a newly written pack
would have a different structure than its index. This used to be
possible when the pack name was a hash of the set of objects. Under this
naming scheme, two packs that store the same set of objects could differ
in delta selection, object positioning, or both. If this happened, then
any such packs would be unreadable in the instant between copying the
new pack and new index (i.e., either the index or pack will be stale
depending on the order that they were copied).
But since 1190a1a (pack-objects: name pack files after trailer hash,
2013-12-05), this is no longer possible, since pack files are named not
after their logical contents (i.e., the set of objects), but by the
actual checksum of their contents. So, this old- behavior can safely go,
which allows us to avoid our circular dependency above.
In addition to avoiding the circular dependency, this patch also makes
'git repack' a lot simpler, since we don't have to deal with failures
encountered when renaming existing packs to be prefixed with 'old-'.
This patch is mostly limited to removing code paths that deal with the
'old' prefixing, with the exception of files that include the pack's
name in their own filename, like .idx, .bitmap, and related files. The
exception is that we want to continue to trust what pack-objects wrote.
That is, it is not the case that we pretend as if pack-objects didn't
write files identical to ones that already exist, but rather that we
respect what pack-objects wrote as the source of truth. That cuts two
ways:
- If pack-objects produced an identical pack to one that already
exists with a bitmap, but did not produce a bitmap, we remove the
bitmap that already exists. (This behavior is codified in t7700.14).
- If pack-objects produced an identical pack to one that already
exists, we trust the just-written version of the coresponding .idx,
.promisor, and other files over the ones that already exist. This
ensures that we use the most up-to-date versions of this files,
which is safe even in the face of format changes in, say, the .idx
file (which would not be reflected in the .idx file's name).
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Ever since 'git pull' learned '--recurse-submodules' in a6d7eb2c7a
(pull: optionally rebase submodules (remote submodule changes only),
2017-06-23), we check if there are local submodule modifications by
checking the revision range 'curr_head --not rebase_fork_point'.
The goal of this check is to abort the pull if there are submodule
modifications in the local commits being rebased, since this scenario is
not supported.
However, the actual range of commits being rebased is not
'rebase_fork_point..curr_head', as the logic in
'get_rebase_newbase_and_upstream' reveals, it is 'upstream..curr_head'.
If the 'git merge-base --fork-point' invocation in
'get_rebase_fork_point' fails to find a fork point between the current
branch and the remote-tracking branch we are pulling from,
'rebase_fork_point' is null and since 4d36f88be7 (submodule: do not pass
null OID to setup_revisions, 2018-05-24), 'submodule_touches_in_range'
checks 'curr_head' and all its ancestors for submodule modifications.
Since it is highly likely that there are submodule modifications in this
range (which is in effect the whole history of the current branch), this
prevents 'git pull --rebase --recurse-submodules' from succeeding if no
fork point exists between the current branch and the remote-tracking
branch being pulled. This can happen, for example, when the current
branch was forked from a commit which was never recorded in the reflog
of the remote-tracking branch we are pulling, as the last two paragraphs
of the "Discussion on fork-point mode" section in git-merge-base(1)
explain.
Fix this bug by passing 'upstream' instead of 'rebase_fork_point' as the
'excl_oid' argument to 'submodule_touches_in_range'.
Reported-by: Brice Goglin <bgoglin@free.fr>
Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The function 'run_rebase' is responsible for constructing the
command line to be passed to 'git rebase'. This includes both forwarding
pass-through options given to 'git pull' as well computing the <newbase>
and <upstream> arguments to 'git rebase'.
A following commit will need to access the <upstream> argument in
'cmd_pull' to fix a bug with 'git pull --rebase --recurse-submodules'.
In order to do so, refactor the code so that the <newbase> and
<upstream> commits are computed in a new, separate function,
'get_rebase_newbase_and_upstream'.
Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In the subsequent commit, it will become useful to keep track of which
metadata files were written by pack-objects. We already do this to an
extent with the 'exts' array, which only is used in the context of
existing packs.
Co-authored-by: Jeff King <peff@peff.net>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We'll use it in a helper function soon.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It is currently possible to write multiple "start" commands into
git-update-ref(1) for a single session, but none of them except for the
first one actually have any effect.
Using such nested "start"s may eventually have a sensible effect. One
may imagine that it restarts the current transaction, effectively
emptying it and creating a new one. It may also allow for creation of
nested transactions. But currently, none of these are implemented.
Silently ignoring this misuse is making it hard to iterate in the future
if "start" is ever going to have meaningful semantics in such a context.
This commit thus makes sure to error out in case we see such use.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Reviewed-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
While git-update-ref has recently grown commands which allow interactive
control of transactions in e48cf33b61 (update-ref: implement interactive
transaction handling, 2020-04-02), it is not yet possible to create
multiple transactions in a single session. To do so, one currently still
needs to invoke the executable multiple times.
This commit addresses this shortcoming by allowing the "start" command
to create a new transaction if the current transaction has already been
either committed or aborted.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Reviewed-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We sometimes store the offset into a pack .idx file as an "unsigned
long", but the mmap'd size of a pack .idx file can exceed 4GB. This is
sufficient on LP64 systems like Linux, but will be too small on LLP64
systems like Windows, where "unsigned long" is still only 32 bits. Let's
use size_t, which is a better type for an offset into a memory buffer.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A pack and its matching .idx file are limited to 2^32 objects, because
the pack format contains a 32-bit field to store the number of objects.
Hence we use uint32_t in the code.
But the byte count of even a .idx file can be much larger than that,
because it stores at least a hash and an offset for each object. So
using SHA-1, a v2 .idx file will cross the 4GB boundary at 153,391,650
objects. This confuses load_idx(), which computes the minimum size like
this:
unsigned long min_size = 8 + 4*256 + nr*(hashsz + 4 + 4) + hashsz + hashsz;
Even though min_size will be big enough on most 64-bit platforms, the
actual arithmetic is done as a uint32_t, resulting in a truncation. We
actually exceed that min_size, but then we do:
unsigned long max_size = min_size;
if (nr)
max_size += (nr - 1)*8;
to account for the variable-sized table. That computation doesn't
overflow quite so low, but with the truncation for min_size, we end up
with a max_size that is much smaller than our actual size. So we
complain that the idx is invalid, and can't find any of its objects.
We can fix this case by casting "nr" to a size_t, which will do the
multiplication in 64-bits (assuming you're on a 64-bit platform; this
will never work on a 32-bit system since we couldn't map the whole .idx
anyway). Likewise, we don't have to worry about further additions,
because adding a smaller number to a size_t will convert the other side
to a size_t.
A few notes:
- obviously we could just declare "nr" as a size_t in the first place
(and likewise, packed_git.num_objects). But it's conceptually a
uint32_t because of the on-disk format, and we correctly treat it
that way in other contexts that don't need to compute byte offsets
(e.g., iterating over the set of objects should and generally does
use a uint32_t). Switching to size_t would make all of those other
cases look wrong.
- it could be argued that the proper type is off_t to represent the
file offset. But in practice the .idx file must fit within memory,
because we mmap the whole thing. And the rest of the code (including
the idx_size variable we're comparing against) uses size_t.
- we'll add the same cast to the max_size arithmetic line. Even though
we're adding to a larger type, which will convert our result, the
multiplication is still done as a 32-bit value and can itself
overflow. I didn't check this with my test case, since it would need
an even larger pack (~530M objects), but looking at compiler output
shows that it works this way. The standard should agree, but I
couldn't find anything explicit in 6.3.1.8 ("usual arithmetic
conversions").
The case in load_idx() was the most immediate one that I was able to
trigger. After fixing it, looking up actual objects (including the very
last one in sha1 order) works in a test repo with 153,725,110 objects.
That's because bsearch_hash() works with uint32_t entry indices, and the
actual byte access:
int cmp = hashcmp(table + mi * stride, sha1);
is done with "stride" as a size_t, causing the uint32_t "mi" to be
promoted to a size_t. This is the way most code will access the index
data.
However, I audited all of the other byte-wise accesses of
packed_git.index_data, and many of the others are suspect (they are
similar to the max_size one, where we are adding to a properly sized
offset or directly to a pointer, but the multiplication in the
sub-expression can overflow). I didn't trigger any of these in practice,
but I believe they're potential problems, and certainly adding in the
cast is not going to hurt anything here.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When receive-pack receives a session-id capability from the client, log
the received session ID via a trace2 data event.
Signed-off-by: Josh Steadmon <steadmon@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When transfer.advertiseSID is true, advertise receive-pack's session ID
via the new session-id capability.
Signed-off-by: Josh Steadmon <steadmon@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Now that hashamp has lazy initialization and a HASHMAP_INIT macro,
hashmaps allocated on the stack can be initialized without a call to
hashmap_init() and in some cases makes the code a bit shorter. Convert
some callsites over to take advantage of this.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In the verison negotiation phase between "receive-pack" and
"proc-receive", "proc-receive" can send an empty flush-pkt to end the
negotiation and use default version 0. Capabilities (such as
"push-options") are not supported in version 0.
Signed-off-by: Jiang Xin <zhiyou.jx@alibaba-inc.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Johannes found a flaky hang in `t5411/test-0013-bad-protocol.sh` in the
osx-clang job of the CI/PR builds, and ran into an issue when using
the `--stress` option with the following error messages:
fatal: unable to write flush packet: Broken pipe
send-pack: unexpected disconnect while reading sideband packet
fatal: the remote end hung up unexpectedly
In this test case, the "proc-receive" hook sends an error message and
dies earlier. While "receive-pack" on the other side of the pipe
should forward the error message of the "proc-receive" hook to the
client side, but it fails to do so. This is because "receive-pack"
uses `packet_write_fmt()` and `packet_flush()` to write pkt-line
message to "proc-receive" hook, and these functions die immediately
when pipe is broken. Using "gently" forms for these functions will get
more predicable output.
Add more "--die-*" options to test helper to test different stages of
the protocol between "receive-pack" and "proc-receive" hook.
Reported-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Suggested-by: Jeff King <peff@peff.net>
Signed-off-by: Jiang Xin <zhiyou.jx@alibaba-inc.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We taught rev-list a new way to separate options from revisions in
19e8789b23 (revision: allow --end-of-options to end option parsing,
2019-08-06), but rev-parse uses its own parser. It should know about
--end-of-options not only for consistency, but because it may be
presented with similarly ambiguous cases. E.g., if a caller does:
git rev-parse "$rev" -- "$path"
to parse an untrusted input, then it will get confused if $rev contains
an option-like string like "--local-env-vars". Or even "--not-real",
which we'd keep as an option to pass along to rev-list.
Or even more importantly:
git rev-parse --verify "$rev"
can be confused by options, even though its purpose is safely parsing
untrusted input. On the plus side, it will always fail the --verify
part, as it will not have parsed a revision, so the caller will
generally "fail closed" rather than continue to use the untrusted
string. But it will still trigger whatever option was in "$rev"; this
should be mostly harmless, since rev-parse options are all read-only,
but I didn't carefully audit all paths.
This patch lets callers write:
git rev-parse --end-of-options "$rev" -- "$path"
and:
git rev-parse --verify --end-of-options "$rev"
which will both treat "$rev" always as a revision parameter. The latter
is a bit clunky. It would be nicer if we had defined "--verify" to
require that its next argument be the revision. But we have not
historically done so, and:
git rev-parse --verify -q "$rev"
does currently work. I added a test here to confirm that we didn't break
that.
A few implementation notes:
- We don't document --end-of-options explicitly in commands, but rather
in gitcli(7). So I didn't give it its own section in git-rev-parse(1).
But I did call it out specifically in the --verify section, and
include it in the examples, which should show best practices.
- We don't have to re-indent the main option-parsing block, because we
can combine our "did we see end of options" check with "does it start
with a dash". The exception is the pre-setup options, which need
their own block.
- We do however have to pull the "--" parsing out of the "does it start
with dash" block, because we want to parse it even if we've seen
--end-of-options.
- We'll leave "--end-of-options" in the output. This is probably not
technically necessary, as a careful caller will do:
git rev-parse --end-of-options $revs -- $paths
and anything in $revs will be resolved to an object id. However, it
does help a slightly less careful caller like:
git rev-parse --end-of-options $revs_or_paths
where a path "--foo" will remain in the output as long as it also
exists on disk. In that case, it's helpful to retain --end-of-options
to get passed along to rev-list, s it would otherwise see just
"--foo".
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The option-parsing loop of rev-parse checks whether the first character
of an arg is "-". If so, then it enters a series of conditionals
checking for individual options. But some options are inexplicably
outside of that outer conditional.
This doesn't produce the wrong behavior; the conditional is actually
redundant with the individual option checks, and it's really only its
fallback "continue" that we care about. But we should at least be
consistent.
One obvious alternative is that we could get rid of the conditional
entirely. But we'll be using the extra block it provides in the next
patch.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Because of the order in which we check options in rev-parse, there are a
few options we accept even after a "--". This is wrong, because the
whole point of "--" is to say "everything after here is a path". Let's
move the "did we see a dashdash" check (it's called "as_is" in the code)
to the top of the parsing loop.
Note there is one subtlety here. The options are ordered so that some
are checked before we even see if we're in a repository (they continue
the loop, and if we get past a certain point, then we do the repository
setup). By moving the as_is check higher, it's also in that "before
setup" section, even though it might look at the repository via
verify_filename(). However, this works out: we'd never set as_is until
we parse "--", and we don't parse that until after doing the setup.
An alternative here to avoid the subtlety is to put the as_is check at
the top of the post-setup options. But then every pre-setup option would
have to remember to check "if (!as_is && !strcmp(...))". So while this
is a bit magical, it's harder for future code to get wrong.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
For the past 15 years, we've used the hardcoded 64 as the length
limit of the filename of the output from the "git format-patch"
command. Since the value is shorter than the 80-column terminal, it
could grow without line wrapping a bit. At the same time, since the
value is longer than half of the 80-column terminal, we could fit
two or more of them in "ls" output on such a terminal if we allowed
to lower it.
Introduce a new command line option --filename-max-length=<n> and a
new configuration variable format.filenameMaxLength to override the
hardcoded default.
While we are at it, remove a check that the name of output directory
does not exceed PATH_MAX---this check is pointless in that by the
time control reaches the function, the caller would already have
done an equivalent of "mkdir -p", so if the system does not like an
overly long directory name, the control wouldn't have reached here,
and otherwise, we know that the system allowed the output directory
to exist. In the worst case, we will get an error when we try to
open the output file and handle the error correctly anyway.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Exit codes from "git remote add" etc. were not usable by scripted
callers.
* ab/git-remote-exit-code:
remote: add meaningful exit code on missing/existing
"git checkout-index" did not consistently signal an error with its
exit status.
* jk/checkout-index-errors:
checkout-index: propagate errors to exit code
checkout-index: drop error message from empty --stage=all
Now that all the external users of head_hash have been converted to
use a opts->orig_head instead we can stop returning head_hash from
get_revision_ranges().
Because we want to pass the full object names back to the caller in
`revisions` the find_unique_abbrev_r() call that was used to initialize
`head_hash` is replaced with oid_to_hex().
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Rather than passing a string around pass the struct object_id that the
string was created from call oid_hex() when we write the file.
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We already have a struct object_id containing the oid that we want to
set ORIG_HEAD to so use that rather than converting it to a string and
then calling get_oid() on that string.
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
After rebasing, ORIG_HEAD is supposed to point to the old HEAD of the
rebased branch. The code used find_unique_abbrev() to obtain the
object name of the old HEAD and wrote to both
.git/rebase-merge/orig-head (used by `rebase --abort` to go back to
the previous state) and to ORIG_HEAD. The buffer find_unique_abbrev()
gives back is volatile, unfortunately, and was overwritten after the
former file is written but before ORIG_FILE is written, leaving an
incorrect object name in it.
Avoid relying on the volatile buffer of find_unique_abbrev(), and
instead supply our own buffer to keep the object name.
I think that all of the users of head_hash should actually be using
opts->orig_head instead as passing a string rather than a struct
object_id around is a hang over from the scripted implementation. This
patch just fixes the immediate bug and adds a regression test based on
Caspar's reproduction example[1]. The users will be converted to use
struct object_id and head_hash removed in the next few commits.
[1] https://lore.kernel.org/git/CAFzd1+7PDg2PZgKw7U0kdepdYuoML9wSN4kofmB_-8NHrbbrHg@mail.gmail.com
Reported-by: Caspar Duregger <herr.kaste@gmail.com>
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We've never intended to support diff's --output option in format-patch.
And until baa4adc66a (parse-options: disable option abbreviation with
PARSE_OPT_KEEP_UNKNOWN, 2019-01-27), it was impossible to trigger. We
first parse the format-patch options before handing the remainder off to
setup_revisions(). Before that commit, we'd accept "--output=foo" as an
abbreviation for "--output-directory=foo". But afterwards, we don't
check abbreviations, and --output gets passed to the diff code.
This results in nonsense behavior and bugs. The diff code will have
opened a filehandle at rev.diffopt.file, but we'll overwrite that with
our own handles that we open for each individual patch file. So the
--output file will always just be empty. But worse, the diff code also
sets rev.diffopt.close_file, so log_tree_commit() will close the
filehandle itself. And then the main loop in cmd_format_patch() will try
to close it again, resulting in a double-free.
The simplest solution would be to just disallow --output with
format-patch, as nobody ever intended it to work. However, we have
accidentally documented it (because format-patch includes diff-options).
And it does work with "git log", which writes the whole output to the
specified file. It's easy enough to make that work for format-patch,
too: it's really the same as --stdout, but pointed at a specific file.
We can detect the use of the --output option by the "close_file" flag
(note that we can't use rev.diffopt.file, since the diff setup will
otherwise set it to stdout). So we just need to unset that flag, but
don't have to do anything else. Our situation is otherwise exactly like
--stdout (note that we don't fclose() the file, but nor does the stdout
case; exiting the program takes care of that for us).
Reported-by: Johannes Postler <johannes.postler@txture.io>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In format-patch we're either outputting to stdout or to individual files
in an output directory (which may be just "./"). Our logic for whether
to open a new file for each patch is checked with "!use_stdout", but it
is equally correct to check for a non-NULL output_directory.
The distinction will matter when we add a new single-stream output in a
future patch, when only one of the three methods will want individual
files. Let's swap the logic here in preparation.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The --stdout and --output-directory options are mutually exclusive, but
it's hard to tell from reading the code. We have three separate
conditionals that check for use_stdout, and it's only after we've set up
the output_directory fully that we check whether the user also specified
--stdout.
Instead, let's check the exclusion explicitly first, then have a single
conditional that handles stdout versus an output directory. This is
slightly easier to follow now, and also will keep things sane when we
add another output mode in a future patch.
We'll add a few tests as well, covering the mutual exclusion and the
fact that we are not confused by a configured output directory.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The -L option is documented to accept no pathspec, but the
command line option parser has allowed the combination without
checking so far. Ensure that there is no pathspec when the -L
option is in effect to fix this.
Incidentally, this change fixes another bug in the command line
option parser, which has allowed the -L option used together
with the --follow option. Because the latter requires exactly
one path given, but the former takes no pathspec, they become
mutually incompatible automatically. Because the -L option
follows renames on its own, there is no reason to give --follow
at the same time.
The new tests say they may fail with "-L and --follow being
incompatible" instead of "-L and pathspec being incompatible".
Currently the expected failure can come only from the latter, but
this is to futureproof them, in case we decide to add code to
explicititly die on -L and --follow used together.
Heled-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Allow the testsuite to run where it treats requests for "recursive" or
the default merge algorithm via consulting the environment variable
GIT_TEST_MERGE_ALGORITHM which is expected to either be "recursive" (the
old traditional algorithm) or "ort" (the new algorithm).
Also, allow folks to pick the new algorithm via config setting. It
turns out builtin/merge.c already had a way to allow users to specify a
different default merge algorithm: pull.twohead. Rather odd
configuration name (especially to be in the 'pull' namespace rather than
'merge') but it's there. Add that same configuration to rebase,
cherry-pick, and revert.
This required updating the various callsites that called merge_trees()
or merge_recursive() to conditionally call the new API, so this serves
as another demonstration of what the new API looks and feels like.
There are almost certainly some callsites that have not yet been
modified to work with the new merge algorithm, but this represents the
ones that I have been testing with thus far.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git diff" family of commands learned the "-I<regex>" option to
ignore hunks whose changed lines all match the given pattern.
* mk/diff-ignore-regex:
diff: add -I<regex> that ignores matching changes
merge-base, xdiff: zero out xpparam_t structures
"git credential' didn't honor the core.askPass configuration
variable (among other things), which has been corrected.
* tk/credential-config:
credential: load default config
Document that the meaning of a Signed-off-by trailer can vary from
project to project in the end-user documentation, and clarify what
it means to this project.
* bk/sob-dco:
Documentation: stylistically normalize references to Signed-off-by:
SubmittingPatches: clarify DCO is our --signoff rule
Documentation: clarify and expand description of --signoff
doc: preparatory clean-up of description on the sign-off option
Test-coverage enhancement of running commit-graph task "git
maintenance" as needed led to discovery and fix of a bug.
* ds/maintenance-commit-graph-auto-fix:
maintenance: core.commitGraph=false prevents writes
maintenance: test commit-graph auto condition
"git fast-import" wasted a lot of memory when many marks were in use.
* jk/fast-import-marks-alloc-fix:
fast-import: fix over-allocation of marks storage
hashmap_free(), hashmap_free_entries(), and hashmap_free_() have existed
for a while, but aren't necessarily the clearest names, especially with
hashmap_partial_clear() being added to the mix and lazy-initialization
now being supported. Peff suggested we adopt the following names[1]:
- hashmap_clear() - remove all entries and de-allocate any
hashmap-specific data, but be ready for reuse
- hashmap_clear_and_free() - ditto, but free the entries themselves
- hashmap_partial_clear() - remove all entries but don't deallocate
table
- hashmap_partial_clear_and_free() - ditto, but free the entries
This patch provides the new names and converts all existing callers over
to the new naming scheme.
[1] https://lore.kernel.org/git/20201030125059.GA3277724@coredump.intra.peff.net/
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The penultimate commit moved the initialization of 'sb.path' in
'builtin/blame.c::cmd_blame' before the call to
'blame.c::setup_blame_bloom_data'. Since 'cmd_blame' is the only caller
of 'setup_blame_bloom_data', it is now unnecessary for
'setup_blame_bloom_data' to receive 'path' as a separate argument, as
'sb.path' is already initialized.
Remove this argument from setup_blame_bloom_data's interface and use the
'path' field of the 'sb' 'struct blame_scoreboard' instead.
Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The previous commit moved the initialization of 'sb.path' in
'builtin/blame.c::cmd_blame' before the call to
'blame.c::setup_scoreboard'. Since 'cmd_blame' is the only caller of
'setup_scoreboard', it is now unnecessary for 'setup_scoreboard' to
receive 'path' as a separate argument, as 'sb.path' is already
initialized.
Remove this argument from setup_scoreboard's interface and use the
'path' field of the 'sb' 'struct blame_scoreboard' instead.
Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In blame.c::cmd_blame, we send the 'path' field of the 'sb' 'struct
blame_scoreboard' as the 'path' argument to
'line-range.c::parse_range_arg', but 'sb.path' is not set yet; it's set
to the local variable 'path' a few lines later at line 1137.
This 'path' argument is only used in 'parse_range_arg' if we are blaming
a funcname, i.e. `git blame -L :<funcname> <path>`, and in that case it
is sent to 'parse_range_funcname', where it is used to determine if a
userdiff driver should be used for said <path> to match the given
funcname.
Since 'path' is yet unset, the userdiff driver is never used, so we fall
back to the default funcname regex, which is usually not appropriate for
paths that are set to use a specific userdiff driver, and thus either we
match some unrelated lines, or we die with
fatal: -L parameter '<funcname>' starting at line 1: no match
This has been the case ever since `git blame` learned to blame a
funcname in 13b8f68c1f (log -L: :pattern:file syntax to find by
funcname, 2013-03-28).
Enable funcname blaming for paths using specific userdiff drivers by
initializing 'sb.path' earlier in 'cmd_blame', when some of its other
fields are initialized, so that it is set when passed to
'parse_range_arg'.
Add a regression test in 'annotate-tests.sh', which is sourced in
t8001-annotate.sh and t8002-blame.sh, leveraging an existing file used
to test the userdiff patterns in t4018-diff-funcname.
Also, use 'sb.path' instead of 'path' when constructing the error
message at line 1114, for consistency.
Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
'git blame -h' and 'git log -h' both show '-L <n,m>' and describe this
option as "Process only line range n,m, counting from 1". No hint is
given that a function name regex can also be used.
Use <range> instead, and expand the description of the option to mention
both modes. Remove "counting from 1" as it's uneeded; it's uncommon to
refer to the first line of a file as "line 0".
Also, for 'git log', improve the wording to better reflect the long help.
Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Calling rev-parse to check if the drop subcommand removed the last stash
and treating its failure as confirmation is fragile, as the command can
fail for other reasons, e.g. because the system is out of memory.
Directly check if the reflog is empty instead, which is more robust.
Reported-by: Marek Mrva <mrva@eof-studios.com>
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Allow callers to specify the repository to use. Rename the function to
repo_clear_commit_marks to document its new scope. No functional change
intended.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In 2.29, "--committer-date-is-author-date" option of "rebase" and
"am" subcommands lost the e-mail address by mistake, which has been
corrected.
* jk/committer-date-is-author-date-fix:
rebase: fix broken email with --committer-date-is-author-date
am: fix broken email with --committer-date-is-author-date
t3436: check --committer-date-is-author-date result more carefully
"git checkout" learned to use checkout.guess configuration variable
and enable/disable its "--[no-]guess" option accordingly.
* dl/checkout-guess:
checkout: learn to respect checkout.guess
Documentation/config/checkout: replace sq with backticks
"git checkout -p A...B [-- <path>]" did not work, even though the
same command without "-p" correctly used the merge-base between
commits A and B.
* dl/checkout-p-merge-base:
t2016: add a NEEDSWORK about the PERL prerequisite
add-patch: add NEEDSWORK about comparing commits
Doc: document "A...B" form for <tree-ish> in checkout and switch
builtin/checkout: fix `git checkout -p HEAD...` bug
"git clone" learned clone.defaultremotename configuration variable
to customize what nickname to use to call the remote the repository
was cloned from.
* sb/clone-origin:
clone: allow configurable default for `-o`/`--origin`
clone: read new remote name from remote_name instead of option_origin
clone: validate --origin option before use
refs: consolidate remote name validation
remote: add tests for add and rename with invalid names
clone: use more conventional config/option layering
clone: add tests for --template and some disallowed option pairs
"git push --force-with-lease[=<ref>]" can easily be misused to lose
commits unless the user takes good care of their own "git fetch".
A new option "--force-if-includes" attempts to ensure that what is
being force-pushed was created after examining the commit at the
tip of the remote ref that is about to be force-replaced.
* sk/force-if-includes:
t, doc: update tests, reference for "--force-if-includes"
push: parse and set flag for "--force-if-includes"
push: add reflog check for "--force-if-includes"
"git worktree list" now shows if each worktree is locked. This
possibly may open us to show other kinds of states in the future.
* rs/worktree-list-show-locked:
worktree: teach `list` to annotate locked worktree
If we encounter an error while checking out an explicit path, we print a
message to stderr but do not actually exit with a non-zero code. While
this is a plumbing command and the behavior goes all the way back to
33db5f4d90 (Add a "checkout-cache" command which does what the name
suggests., 2005-04-09), this is almost certainly an oversight:
- we _do_ return an exit code from checkout_file(); the caller just
never reads it
- errors while checking out all paths (with "-a") do result in a
non-zero exit code.
- it would be quite unusual not to use the exit code for an error,
as otherwise the caller has no idea the command failed except by
scraping stderr
To keep our tests simple and portable, we can use the most obvious
error: asking to checkout a path which is not in the index at all.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If checkout-index is given --stage=all for a specific path, it will try
to write stages 1-3 (if present) for that path to temporary files.
However, if the file is present only at stage 0, it writes nothing but
gives a confusing message:
$ git checkout-index --stage=all -- Makefile
git checkout-index: Makefile does not exist at stage 4
This is nonsense. There is no stage 4 (it's just an internal enum value
we use for "all"), and the documentation clearly states:
Paths which only have a stage 0 entry will always be omitted from the
output.
Here it's talking about the list of tempfiles written to stdout, but it
seems clear that this case was not meant to be an error. We even have a
test which covers it, but it only checks that the command reports an
exit code of 0, not its stderr. And it reports 0 only because of another
bug which fails to propagate errors (which will be fixed in a subsequent
patch).
So let's make the test more thorough. We'll also cover the case that we
found _no_ entry, not even a stage zero, which should still be an error.
However, because of the other bug, we'll have to mark this as expecting
failure for the moment.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Change the exit code for the likes of "git remote add/rename" to exit
with 2 if the remote in question doesn't exist, and 3 if it
does. Before we'd just die() and exit with the general 128 exit code.
This changes the output message from e.g.:
fatal: remote origin already exists.
To:
error: remote origin already exists.
Which I believe is a feature, since we generally use "fatal" for the
generic errors, and "error" for the more specific ones with a custom
exit code, but this part of the change may break code that already
relies on stderr parsing (not that we ever supported that...).
The motivation for this is a discussion around some code in GitLab's
gitaly which wanted to check this, and had to parse stderr to do so:
https://gitlab.com/gitlab-org/gitaly/-/merge_requests/2695
It's worth noting as an aside that a method of checking this that
doesn't rely on that is to check with "git config" whether the value
in question does or doesn't exist. That introduces a TOCTOU race
condition, but on the other hand this code (e.g. "git remote add")
already has a TOCTOU race.
We go through the config.lock for the actual setting of the config,
but the pseudocode logic is:
read_config();
check_config_and_arg_sanity();
save_config();
So e.g. if a sleep() is added right after the remote_is_configured()
check in add() we'll clobber remote.NAME.url, and add another (usually
duplicate) remote.NAME.fetch entry (and other values, depending on
invocation).
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In 2.29, "--committer-date-is-author-date" option of "rebase" and
"am" subcommands lost the e-mail address by mistake, which has been
corrected.
* jk/committer-date-is-author-date-fix:
rebase: fix broken email with --committer-date-is-author-date
am: fix broken email with --committer-date-is-author-date
t3436: check --committer-date-is-author-date result more carefully
For the --committer-date-is-author-date option of git-am and git-rebase,
we format the committer ident, then re-parse it to find the name and
email, and then feed those back to fmt_ident().
We can simplify this by handling it all at the time of the fmt_ident()
call. We pass in the appropriate getenv() results, and if they're not
present, then our WANT_COMMITTER_IDENT flag tells fmt_ident() to fill in
the appropriate value from the config. Which is exactly what
git_committer_ident() was doing under the hood.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commit e8cbe2118a (am: stop exporting GIT_COMMITTER_DATE, 2020-08-17)
rewrote the code for setting the committer date to use fmt_ident(),
rather than setting an environment variable and letting commit_tree()
handle it. But it introduced two bugs:
- we use the author email string instead of the committer email
- when parsing the committer ident, we used the wrong variable to
compute the length of the email, resulting in it always being a
zero-length string
This commit fixes both, which causes our test of this option via the
rebase "apply" backend to now succeed.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
xpparam_t structures are usually zero-initialized before their specific
fields are assigned to, but there are three locations in the tree where
that does not happen. Add the missing memset() calls in order to make
initialization of xpparam_t structures consistent tree-wide and to
prevent stack garbage from being used as field values.
Signed-off-by: Michał Kępień <michal@isc.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Ted reported an old typo in the git-commit.txt and merge-options.txt.
Namely, the phrase "Signed-off-by line" was used without either a
definite nor indefinite article.
Upon examination, it seems that the documentation (including items in
Documentation/, but also option help strings) have been quite
inconsistent on usage when referring to `Signed-off-by`.
First, very few places used a definite or indefinite article with the
phrase "Signed-off-by line", but that was the initial typo that led
to this investigation. So, normalize using either an indefinite or
definite article consistently.
The original phrasing, in Commit 3f971fc425 (Documentation updates,
2005-08-14), is "Add Signed-off-by line". Commit 6f855371a5 (Add
--signoff, --check, and long option-names. 2005-12-09) switched to
using "Add `Signed-off-by:` line", but didn't normalize the former
commit to match. Later commits seem to have cut and pasted from one
or the other, which is likely how the usage became so inconsistent.
Junio stated on the git mailing list in
<xmqqy2k1dfoh.fsf@gitster.c.googlers.com> a preference to leave off
the colon. Thus, prefer `Signed-off-by` (with backticks) for the
documentation files and Signed-off-by (without backticks) for option
help strings.
Additionally, Junio argued that "trailer" is now the standard term to
refer to `Signed-off-by`, saying that "becomes plenty clear that we
are not talking about any random line in the log message". As such,
prefer "trailer" over "line" anywhere the former word fits.
However, leave alone those few places in documentation that use
Signed-off-by to refer to the process (rather than the specific
trailer), or in places where mail headers are generally discussed in
comparison with Signed-off-by.
Reported-by: "Theodore Y. Ts'o" <tytso@mit.edu>
Signed-off-by: Bradley M. Kuhn <bkuhn@sfconservancy.org>
Acked-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Make `git credential fill` honour the core.askPass variable.
Signed-off-by: Thomas Koutcher <thomas.koutcher@online.fr>
[jk: added test]
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `--bisect-autostart` subcommand is no longer used from the
git-bisect.sh shell script. Instead the function
`bisect_autostart()` is directly called from the C implementation.
Mentored-by: Lars Schneider <larsxschneider@gmail.com>
Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Mentored-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Signed-off-by: Pranit Bauva <pranit.bauva@gmail.com>
Signed-off-by: Tanushree Tumane <tanushreetumane@gmail.com>
Signed-off-by: Miriam Rubio <mirucam@gmail.com>
Reviewed-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `--write-terms` subcommand is no longer used from the
git-bisect.sh shell script. Instead the function `write_terms()`
is called from the C implementation of `set_terms()` and
`bisect_start()`.
Mentored-by: Lars Schneider <larsxschneider@gmail.com>
Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Mentored-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Signed-off-by: Pranit Bauva <pranit.bauva@gmail.com>
Signed-off-by: Tanushree Tumane <tanushreetumane@gmail.com>
Signed-off-by: Miriam Rubio <mirucam@gmail.com>
Reviewed-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `--check-expected-revs` subcommand is no longer used from the
git-bisect.sh shell script. Functions `check_expected_revs` and
`is_expected_revs` are also deleted.
Mentored-by: Lars Schneider <larsxschneider@gmail.com>
Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Mentored-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Signed-off-by: Pranit Bauva <pranit.bauva@gmail.com>
Signed-off-by: Tanushree Tumane <tanushreetumane@gmail.com>
Signed-off-by: Miriam Rubio <mirucam@gmail.com>
Reviewed-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Reimplement the `bisect_state()` shell functions in C and also add a
subcommand `--bisect-state` to `git-bisect--helper` to call them from
git-bisect.sh .
Using `--bisect-state` subcommand is a temporary measure to port shell
function to C so as to use the existing test suite. As more functions
are ported, this subcommand will be retired and will be called by some
other methods.
`bisect_head()` is only called from `bisect_state()`, thus it is not
required to introduce another subcommand.
Note that the `eval` in the changed line of `git-bisect.sh` cannot be
dropped: it is necessary because the `rev` and the `tail`
variables may contain multiple, quoted arguments that need to be
passed to `bisect--helper` (without the quotes, naturally).
Mentored-by: Lars Schneider <larsxschneider@gmail.com>
Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Mentored-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Signed-off-by: Pranit Bauva <pranit.bauva@gmail.com>
Signed-off-by: Tanushree Tumane <tanushreetumane@gmail.com>
Signed-off-by: Miriam Rubio <mirucam@gmail.com>
Reviewed-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `--next-all` subcommand is no longer used from the git-bisect.sh
shell script. Instead the function `bisect_next_all()` is called from
the C implementation of `bisect_next()`.
Mentored-by: Lars Schneider <larsxschneider@gmail.com>
Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Pranit Bauva <pranit.bauva@gmail.com>
Signed-off-by: Tanushree Tumane <tanushreetumane@gmail.com>
Signed-off-by: Miriam Rubio <mirucam@gmail.com>
Reviewed-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `--bisect-clean-state` subcommand is no longer used from the
git-bisect.sh shell script. Instead the function
`bisect_clean_state()` is directly called from the C
implementation.
Mentored-by: Lars Schneider <larsxschneider@gmail.com>
Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Pranit Bauva <pranit.bauva@gmail.com>
Signed-off-by: Tanushree Tumane <tanushreetumane@gmail.com>
Signed-off-by: Miriam Rubio <mirucam@gmail.com>
Reviewed-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add the subcommand to `git bisect--helper` and call it from
git-bisect.sh.
With the conversion of `bisect_auto_next()` from shell to C in a
previous commit, `bisect_start()` can now be fully ported to C.
So let's complete the `--bisect-start` subcommand of
`git bisect--helper` so that it fully implements `bisect_start()`,
and let's use this subcommand in `git-bisect.sh` instead of
`bisect_start()`.
Note that the `eval` in the changed line of `git-bisect.sh` cannot be
dropped: it is necessary because the `rev` and the `tail`
variables may contain multiple, quoted arguments that need to be
passed to `bisect--helper` (without the quotes, naturally).
Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Mentored-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Signed-off-by: Pranit Bauva <pranit.bauva@gmail.com>
Signed-off-by: Tanushree Tumane <tanushreetumane@gmail.com>
Signed-off-by: Miriam Rubio <mirucam@gmail.com>
Reviewed-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commit 1bdca81641 (fast-import: add options for rewriting submodules,
2020-02-22) accidentally added two lines parsing the option
"rewrite-submodules-from". This didn't do anything in practice, because
they're in an if/else chain and so the second one can never trigger.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The 'git maintenance (register|start)' subcommands add the current
repository to the global Git config so maintenance will operate on that
repository. It does not specify what maintenance should occur or how
often.
To make it simple for users to start background maintenance with a
recommended schedlue, update the 'maintenance.strategy' config option in
both the 'register' and 'start' subcommands. This allows users to
customize beyond the defaults using individual
'maintenance.<task>.schedule' options, but also the user can opt-out of
this strategy using 'maintenance.strategy=none'.
Helped-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
To provide an on-ramp for users to use background maintenance without
several 'git config' commands, create a 'maintenance.strategy' config
option. Currently, the only important value is 'incremental' which
assigns the following schedule:
* gc: never
* prefetch: hourly
* commit-graph: hourly
* loose-objects: daily
* incremental-repack: daily
These tasks are chosen to minimize disruptions to foreground Git
commands and use few compute resources.
The 'maintenance.strategy' is intended as a baseline that can be
customzied further by manually assigning 'maintenance.<task>.enabled'
and 'maintenance.<task>.schedule' config options, which will override
any recommendation from 'maintenance.strategy'. This operates similarly
to config options like 'feature.experimental' which operate as "meta"
config options that change default config values.
This presents a way forward for updating the 'incremental' strategy in
the future or adding new strategies. For example, a potential strategy
could be to include a 'full' strategy that runs the 'gc' task weekly
and no other tasks by default.
Helped-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Fast-import stores its marks in a trie-like structure made of mark_set
structs. Each struct has a fixed size (1024). If our id number is too
large to fit in the struct, then we allocate a new struct which shifts
the id number by 10 bits. Our original struct becomes a child node
of this new layer, and the new struct becomes the top level of the trie.
This scheme was broken by ddddf8d7e2 (fast-import: permit reading
multiple marks files, 2020-02-22). Before then, we had a top-level
"marks" pointer, and the push-down worked by assigning the new top-level
struct to "marks". But after that commit, insert_mark() takes a pointer
to the mark_set, rather than using the global "marks". It continued to
assign to the global "marks" variable during the push down, which was
wrong for two reasons:
- we added a call in option_rewrite_submodules() which uses a separate
mark set; pushing down on "marks" is outright wrong here. We'd
corrupt the "marks" set, and we'd fail to correctly store any
submodule mappings with an id over 1024.
- the other callers passed "marks", but the push-down was still wrong.
In read_mark_file(), we take the pointer to the mark_set as a
parameter. So even though insert_mark() was updating the global
"marks", the local pointer we had in read_mark_file() was not
updated. As a result, we'd add a new level when needed, but then the
next call to insert_mark() wouldn't see it! It would then allocate a
new layer, which would also not be seen, and so on. Lookups for the
lost layers obviously wouldn't work, but before we even hit any
lookup stage, we'd generally run out of memory and die.
Our tests didn't notice either of these cases because they didn't have
enough marks to trigger the push-down behavior. The new tests in t9304
cover both cases (and fail without this patch).
We can solve the problem by having insert_mark() take a pointer-to-pointer
of the top-level of the set. Then our push down can assign to it in a
way that the caller actually sees. Note the subtle reordering in
option_rewrite_submodules(). Our call to read_mark_file() may modify our
top-level set pointer, so we have to wait until after it returns to
assign its value into the string_list.
Reported-by: Sergey Brester <serg.brester@sebres.de>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
deref_tag() can return NULL. Exit gracefully in that case instead
of blindly dereferencing the return value.
.name shouldn't ever be NULL, but grep_object() handles that case
explicitly, so let's be defensive here as well and show the broken
object's ID if it happens to lack a name after all.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "git worktree list" shows the absolute path to the working tree,
the commit that is checked out and the name of the branch. It is not
immediately obvious which of the worktrees, if any, are locked.
"git worktree remove" refuses to remove a locked worktree with
an error message. If "git worktree list" told which worktrees
are locked in its output, the user would not even attempt to
remove such a worktree, or would realize that
"git worktree remove -f -f <path>" is required.
Teach "git worktree list" to append "locked" to its output.
The output from the command becomes like so:
$ git worktree list
/path/to/main abc123 [master]
/path/to/worktree 456def (detached HEAD)
/path/to/locked-worktree 123abc (detached HEAD) locked
Helped-by: Junio C Hamano <gitster@pobox.com>
Helped-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Rafael Silva <rafaeloliveira.cs@gmail.com>
Reviewed-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Recently, a user had an issue due to combining
fetch.writeCommitGraph=true with core.commitGraph=false. The root bug
has been resolved by preventing commit-graph writes when
core.commitGraph is disabled. This happens inside the 'git commit-graph
write' command, but we can be more aware of this situation and prevent
that process from ever starting in the 'commit-graph' maintenance task.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Doc fixes.
* ja/misc-doc-fixes:
doc: fix the bnf like style of some commands
doc: git-remote fix ups
doc: use linkgit macro where needed.
git-bisect-lk2009: make continuation of list indented
Hotfix and clean-up for the jt/threaded-index-pack topic that has
graduated to v2.29-rc0.
* jk/index-pack-hotfixes:
index-pack: make get_base_data() comment clearer
index-pack: drop type_cas mutex
index-pack: restore "resolving deltas" progress meter
In command line options, variables are entered between < and >
Signed-off-by: Jean-Noël Avila <jn.avila@free.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The auto condition for the commit-graph maintenance task walks refs
looking for commits that are not in the commit-graph file. This was
added in 4ddc79b2 (maintenance: add auto condition for commit-graph
task, 2020-09-17) but was left untested.
The initial goal of this change was to demonstrate the feature works
properly by adding tests. However, there was an off-by-one error that
caused the basic tests around maintenance.commit-graph.auto=1 to fail
when it should work.
The subtlety is that if a ref tip is not in the commit-graph, then we
were not adding that to the total count. In the test, we see that we
have only added one commit since our last commit-graph write, so the
auto condition would say there is nothing to do.
The fix is simple: add the check for the commit-graph position to see
that the tip is not in the commit-graph file before starting our walk.
Since this happens before adding to the DFS stack, we do not need to
clear our (currently empty) commit list.
This does add some extra complexity for the test, because we also want
to verify that the walk along the parents actually does some work. This
means we need to add at least two commits in a row without writing the
commit-graph. However, we also need to make sure no additional refs are
pointing to the middle of this list or else the for_each_ref() in
should_write_commit_graph() might visit these commits as tips instead of
doing a DFS walk. Hence, the last two commits are added with "git
commit" instead of "test_commit".
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The current behavior of git checkout/switch is that --guess is currently
enabled by default. However, some users may not wish for this to happen
automatically. Instead of forcing users to specify --no-guess manually
each time, teach these commands the checkout.guess configuration
variable that gives users the option to set a default behavior.
Teach the completion script to recognize the new config variable and
disable DWIM logic if it is set to false.
Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A comment mentions that we may free cached delta bases via
find_unresolved_deltas(), but that function went away in f08cbf60fe
(index-pack: make quantum of work smaller, 2020-09-08). Since we need to
rewrite that comment anyway, make the entire comment clearer.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The type_cas lock lost all of its callers in f08cbf60fe (index-pack:
make quantum of work smaller, 2020-09-08), so we can safely delete it.
The compiler didn't alert us that the variable became unused, because we
still call pthread_mutex_init() and pthread_mutex_destroy() on it.
It's worth considering also whether that commit was in error to remove
the use of the lock. Why don't we need it now, if we did before, as
described in ab791dd138 (index-pack: fix race condition with duplicate
bases, 2014-08-29)? I think the answer is that we now look at and assign
the child_obj->real_type field in the main thread while holding the
work_lock(). So we don't have to worry about racing with the worker
threads.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commit f08cbf60fe (index-pack: make quantum of work smaller, 2020-09-08)
refactored the main loop in threaded_second_pass(), but also deleted the
call to display_progress() at the top of the loop. This means that users
typically see no progress at all during the delta resolution phase (and
for large repositories, Git appears to hang).
This looks like an accident that was unrelated to the intended change of
that commit, since we continue to update nr_resolved_deltas in
resolve_delta(). Let's restore the call to get that progress back.
We'll also add a test that confirms we generate the expected progress.
This isn't perfect, as it wouldn't catch a bug where progress was
delayed to the end. That was probably possible to trigger when receiving
a thin pack, because we'd eventually call display_progress() from
fix_unresolved_deltas(), but only once after doing all the work.
However, since our test case generates a complete pack, it reliably
demonstrates this particular bug and its fix. And we can't do better
without making the test racy.
Signed-off-by: Jeff King <peff@peff.net>
Acked-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Running `git checkout -p` with a merge-base rev results in an error:
$ git checkout -p HEAD...
usage: git diff-index [-m] [--cached] [<common-diff-options>] <tree-ish> [<path>...]
common diff options:
-z output diff-raw with lines terminated with NUL.
-p output patch format.
-u synonym for -p.
--patch-with-raw
output both a patch and the diff-raw format.
--stat show diffstat instead of patch.
--numstat show numeric diffstat instead of patch.
--patch-with-stat
output a patch and prepend its diffstat.
--name-only show only names of changed files.
--name-status show names and status of changed files.
--full-index show full object name on index lines.
--abbrev=<n> abbreviate object names in diff-tree header and diff-raw.
-R swap input file pairs.
-B detect complete rewrites.
-M detect renames.
-C detect copies.
--find-copies-harder
try unchanged files as candidate for copy detection.
-l<n> limit rename attempts up to <n> paths.
-O<file> reorder diffs according to the <file>.
-S<string> find filepair whose only one side contains the string.
--pickaxe-all
show all files diff when -S is used and hit is found.
-a --text treat all files as text.
Cannot close git diff-index --cached --numstat --summary HEAD... -- () at <redacted>/libexec/git-core/git-add--interactive line 183.
This happens because checkout passes the literal argument (in the
example, `HEAD...`) to diff-index which does not recognise merge-base
revs.
Fix this by using the hex of the found commit instead of the given name.
Note that "HEAD" is handled specially in run_add_interactive() so it's
explicitly not changed.
Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git format-patch" learns to take "whenAble" as a possible value
for the format.useAutoBase configuration variable to become no-op
when the automatically computed base does not make sense.
* jk/format-auto-base-when-able:
format-patch: teach format.useAutoBase "whenAble" option
The lazy fetching done internally to make missing objects available
in a partial clone incorrectly made permanent damage to the partial
clone filter in the repository, which has been corrected.
* jt/keep-partial-clone-filter-upon-lazy-fetch:
fetch: do not override partial clone filter
promisor-remote: remove unused variable
Code cleanup.
* jk/unused:
dir.c: drop unused "untracked" from treat_path_fast()
sequencer: handle ignore_footer when parsing trailers
test-advise: check argument count with argc instead of argv
sparse-checkout: fill in some options boilerplate
sequencer: drop repository argument from run_git_commit()
push: drop unused repo argument to do_push()
assert PARSE_OPT_NONEG in parse-options callbacks
env--helper: write to opt->value in parseopt helper
drop unused argc parameters
convert: drop unused crlf_action from check_global_conv_flags_eol()
Update the tests to drop word 'master' from them.
* js/default-branch-name-part-2:
t9902: avoid using the branch name `master`
tests: avoid variations of the `master` branch name
t3200: avoid variations of the `master` branch name
fast-export: avoid using unnecessary language in a code comment
t/test-terminal: avoid non-inclusive language
"git shortlog" has been taught to group commits by the contents of
the trailer lines, like "Reviewed-by:", "Coauthored-by:", etc.
* jk/shortlog-group-by-trailer:
shortlog: allow multiple groups to be specified
shortlog: parse trailer idents
shortlog: rename parse_stdin_ident()
shortlog: de-duplicate trailer values
shortlog: match commit trailers with --group
trailer: add interface for iterating over commit trailers
shortlog: add grouping option
shortlog: change "author" variables to "ident"
Rewrite of the "git bisect" script in C continues.
* mr/bisect-in-c-2:
bisect--helper: reimplement `bisect_next` and `bisect_auto_next` shell functions in C
bisect: call 'clear_commit_marks_all()' in 'bisect_next_all()'
bisect--helper: reimplement `bisect_autostart` shell function in C
bisect--helper: introduce new `write_in_file()` function
bisect--helper: use '-res' in 'cmd_bisect__helper' return
bisect--helper: BUG() in cmd_*() on invalid subcommand
"git bisect start X Y", when X and Y are not valid committish
object names, should take X and Y as pathspec, but didn't.
* cc/bisect-start-fix:
bisect: don't use invalid oid as rev when starting
"git blame --ignore-rev/--ignore-revs-file" failed to validate
their input are valid revision, and failed to take into account
that the user may want to give an annotated tag instead of a
commit, which has been corrected.
* jc/blame-ignore-fix:
blame: validate and peel the object names on the ignore list
t8013: minimum preparatory clean-up
Compilation fix around type punning.
* jk/drop-unaligned-loads:
Revert "fast-export: use local array to store anonymized oid"
bswap.h: drop unaligned loads
The previous commit added the necessary machinery to implement the
"--force-if-includes" protection, when "--force-with-lease" is used
without giving exact object the remote still ought to have. Surface
the feature by adding a command line option and a configuration
variable to enable it.
- Add a flag: "TRANSPORT_PUSH_FORCE_IF_INCLUDES" to indicate that the
new option was passed from the command line of via configuration
settings; update command line and configuration parsers to set the
new flag accordingly.
- Introduce a new configuration option "push.useForceIfIncludes", which
is equivalent to setting "--force-if-includes" in the command line.
- Update "remote-curl" to recognize and pass this option to "send-pack"
when enabled.
- Update "advise" to catch the reject reason "REJECT_REF_NEEDS_UPDATE",
set when the ref status is "REF_STATUS_REJECT_REMOTE_UPDATED" and
(optionally) print a help message when the push fails.
- The new option is a "no-op" in the following scenarios:
* When used without "--force-with-lease".
* When used with "--force-with-lease", and if the expected commit
on the remote side is specified as an argument.
Signed-off-by: Srinidhi Kaushik <shrinidhi.kaushik@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a check to verify if the remote-tracking ref of the local branch
is reachable from one of its "reflog" entries.
The check iterates through the local ref's reflog to see if there
is an entry for the remote-tracking ref and collecting any commits
that are seen, into a list; the iteration stops if an entry in the
reflog matches the remote ref or if the entry timestamp is older
the latest entry of the remote ref's "reflog". If there wasn't an
entry found for the remote ref, "in_merge_bases_many()" is called
to check if it is reachable from the list of collected commits.
When a local branch that is based on a remote ref, has been rewound
and is to be force pushed on the remote, "--force-if-includes" runs
a check that ensures any updates to the remote-tracking ref that may
have happened (by push from another repository) in-between the time
of the last update to the local branch (via "git-pull", for instance)
and right before the time of push, have been integrated locally
before allowing a forced update.
If the new option is passed without specifying "--force-with-lease",
or specified along with "--force-with-lease=<refname>:<expect>" it
is a "no-op".
Signed-off-by: Srinidhi Kaushik <shrinidhi.kaushik@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The format.useAutoBase configuration option exists to allow users to
enable '--base=auto' for format-patch by default.
This can sometimes lead to poor workflow, due to unexpected failures
when attempting to format an ancient patch:
$ git format-patch -1 <an old commit>
fatal: base commit shouldn't be in revision list
This can be very confusing, as it is not necessarily immediately obvious
that the user requested a --base (since this was in the configuration,
not on the command line).
We do want --base=auto to fail when it cannot provide a suitable base,
as it would be equally confusing if a formatted patch did not include
the base information when it was requested.
Teach format.useAutoBase a new mode, "whenAble". This mode will cause
format-patch to attempt to include a base commit when it can. However,
if no valid base commit can be found, then format-patch will continue
formatting the patch without a base commit.
In order to avoid making yet another branch name unusable with --base,
do not teach --base=whenAble or --base=whenable.
Instead, refactor the base_commit option to use a callback, and rely on
the global configuration variable auto_base.
This does mean that a user cannot request this optional base commit
generation from the command line. However, this is likely not too
valuable. If the user requests base information manually, they will be
immediately informed of the failure to acquire a suitable base commit.
This allows the user to make an informed choice about whether to
continue the format.
Add tests to cover the new mode of operation for --base.
Signed-off-by: Jacob Keller <jacob.keller@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
While the default remote name of "origin" can be changed at clone-time
with `git clone`'s `--origin` option, it was previously not possible
to specify a default value for the name of that remote. Add support for
a new `clone.defaultRemoteName` config, with the newly-created remote
name resolved in priority order:
1. (Highest priority) A remote name passed directly to `git clone -o`
2. A `clone.defaultRemoteName=new_name` in config `git clone -c`
3. A `clone.defaultRemoteName` value set in `/path/to/template/config`,
where `--template=/path/to/template` is provided
4. A `clone.defaultRemoteName` value set in a non-template config file
5. The default value of `origin`
Helped-by: Junio C Hamano <gitster@pobox.com>
Helped-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Helped-by: Derrick Stolee <stolee@gmail.com>
Helped-by: Andrei Rybak <rybak.a.v@gmail.com>
Signed-off-by: Sean Barag <sean@barag.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In a future patch, the name of the remote created by `git clone` may
come from multiple sources. To avoid confusion, convert most uses of
option_origin to remote_name, leaving option_origin to exclusively
represent the -o/--origin option.
Helped-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Sean Barag <sean@barag.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Providing a bad origin name to `git clone` currently reports an
'invalid refspec' error instead of a more explicit message explaining
that the `--origin` option was malformed. This behavior dates back to
since 8434c2f1 (Build in clone, 2008-04-27). Reintroduce
validation for the provided `--origin` option, but notably _don't_
include a multi-level check (e.g. "foo/bar") that was present in the
original `git-clone.sh`. `git remote` allows multi-level remote names
since at least 46220ca100 (remote.c: Fix overtight refspec validation,
2008-03-20), so that appears to be the desired behavior.
Helped-by: Junio C Hamano <gitster@pobox.com>
Helped-by: Derrick Stolee <stolee@gmail.com>
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Sean Barag <sean@barag.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In preparation for a future patch, extract from remote.c a function that
validates possible remote names so that its rules can be used
consistently in other places.
Helped-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Sean Barag <sean@barag.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Parsing command-line options before reading from config required careful
handling to ensure CLI options were treated with higher priority. Read
config first to let parsed CLI naively overwrite matching config values.
Helped-by: Junio C Hamano <gitster@pobox.com>
Helped-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Sean Barag <sean@barag.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Both fetch and push support pattern refspecs which allow fetching or
pushing references that match a specific pattern. Because these patterns
are globs, they have somewhat limited ability to express more complex
situations.
For example, suppose you wish to fetch all branches from a remote except
for a specific one. To allow this, you must setup a set of refspecs
which match only the branches you want. Because refspecs are either
explicit name matches, or simple globs, many patterns cannot be
expressed.
Add support for a new type of refspec, referred to as "negative"
refspecs. These are prefixed with a '^' and mean "exclude any ref
matching this refspec". They can only have one "side" which always
refers to the source. During a fetch, this refers to the name of the ref
on the remote. During a push, this refers to the name of the ref on the
local side.
With negative refspecs, users can express more complex patterns. For
example:
git fetch origin refs/heads/*:refs/remotes/origin/* ^refs/heads/dontwant
will fetch all branches on origin into remotes/origin, but will exclude
fetching the branch named dontwant.
Refspecs today are commutative, meaning that order doesn't expressly
matter. Rather than forcing an implied order, negative refspecs will
always be applied last. That is, in order to match, a ref must match at
least one positive refspec, and match none of the negative refspecs.
This is similar to how negative pathspecs work.
Signed-off-by: Jacob Keller <jacob.keller@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The sparse-checkout passes along argv and argc to its sub-command helper
functions. Many of these sub-commands do not yet take any command-line
options, and ignore those parameters.
Let's instead add empty option lists and make sure we call
parse_options(). That will give a useful error message for something
like:
git sparse-checkout list --nonsense
which currently just silently ignores the unknown option.
As a bonus, it also silences some -Wunused-parameter warnings.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We stopped using the "repo" argument in 8e4c8af058 (push: disallow --all
and refspecs when remote.<name>.mirror is set, 2019-09-02), which moved
the pushremote handling to its caller.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In the spirit of 517fe807d6 (assert NOARG/NONEG behavior of
parse-options callbacks, 2018-11-05), let's cover some parse-options
callbacks which expect to be used with PARSE_OPT_NONEG but don't
explicitly assert that this is the case. These callbacks are all used
correctly in the current code, but this will help document their
expectations and future-proof the code.
As a bonus, it also silences -Wunused-parameters (these were added since
the initial sweep of 517fe807d6, and we can't yet turn on
-Wunused-parameters to remind people because it has too many existing
false positives).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We use OPT_CALLBACK_F() to call the option_parse_type() callback,
passing it the address of "cmdmode" as the value to write to. But the
callback doesn't look at opt->value at all, and instead writes to a
global variable.
This works out because that's the same global variable we happen to pass
in, but it's rather confusing. Let's use the passed-in value instead.
We'll also make "cmdmode" a local variable of the main function,
ensuring we can't make the same mistake again.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Many functions take an argv/argc pair, but never actually look at argc.
This makes it useless at best (we use the NULL sentinel in argv to find
the end of the array), and misleading at worst (what happens if the argc
count does not match the argv NULL?).
In each of these instances, the argv NULL does match the argc count, so
there are no bugs here. But let's tighten the interfaces to make it
harder to get wrong (and to reduce some -Wunused-parameter complaints).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Earlier we taught "git pull" to warn when the user does not say the
histories need to be merged, rebased or accepts only fast-
forwarding, but the warning triggered for those who have set the
pull.ff configuration variable.
* ah/pull:
pull: don't warn if pull.ff has been set
"git clone" that clones from SHA-1 repository, while
GIT_DEFAULT_HASH set to use SHA-256 already, resulted in an
unusable repository that half-claims to be SHA-256 repository
with SHA-1 objects and refs. This has been corrected.
* bc/clone-with-git-default-hash-fix:
builtin/clone: avoid failure with GIT_DEFAULT_HASH
"git commit-graph write" learned to limit the number of bloom
filters that are computed from scratch with the --max-new-filters
option.
* tb/bloom-improvements:
commit-graph: introduce 'commitGraph.maxNewFilters'
builtin/commit-graph.c: introduce '--max-new-filters=<n>'
commit-graph: rename 'split_commit_graph_opts'
bloom: encode out-of-bounds filters as non-empty
bloom/diff: properly short-circuit on max_changes
bloom: use provided 'struct bloom_filter_settings'
bloom: split 'get_bloom_filter()' in two
commit-graph.c: store maximum changed paths
commit-graph: respect 'commitGraph.readChangedPaths'
t/helper/test-read-graph.c: prepare repo settings
commit-graph: pass a 'struct repository *' in more places
t4216: use an '&&'-chain
commit-graph: introduce 'get_bloom_filter_settings()'
Get rid of 'dense' argument that is redundant for every function that has
'struct rev_info *rev' argument as well, as the value of 'dense' passed is
always taken from 'rev->dense_combined_merges' field.
The only place where this was not the case is in 'submodule.c' where
'diff_tree_combined_merge()' was called with '1' for 'dense' argument. However,
at that call the 'revs' instance used is local to the function, and we now just
set 'revs->dense_combined_merges' to 1 in this local instance.
Signed-off-by: Sergey Organov <sorganov@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When a fetch with the --filter argument is made, the configured default
filter is set even if one already exists. This change was made in
5e46139376 ("builtin/fetch: remove unique promisor remote limitation",
2019-06-25) - in particular, changing from:
* If this is the FIRST partial-fetch request, we enable partial
* on this repo and remember the given filter-spec as the default
* for subsequent fetches to this remote.
to:
* If this is a partial-fetch request, we enable partial on
* this repo if not already enabled and remember the given
* filter-spec as the default for subsequent fetches to this
* remote.
(The given filter-spec is "remembered" even if there is already an
existing one.)
This is problematic whenever a lazy fetch is made, because lazy fetches
are made using "git fetch --filter=blob:none", but this will also happen
if the user invokes "git fetch --filter=<filter>" manually. Therefore,
restore the behavior prior to 5e46139376, which writes a filter-spec
only if the current fetch request is the first partial-fetch one (for
that remote).
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Now that shortlog supports reading from trailers, it can be useful to
combine counts from multiple trailers, or between trailers and authors.
This can be done manually by post-processing the output from multiple
runs, but it's non-trivial to make sure that each name/commit pair is
counted only once.
This patch teaches shortlog to accept multiple --group options on the
command line, and pull data from all of them. That makes it possible to
run:
git shortlog -ns --group=author --group=trailer:co-authored-by
to get a shortlog that counts authors and co-authors equally.
The implementation is mostly straightforward. The "group" enum becomes a
bitfield, and the trailer key becomes a list. I didn't bother
implementing the multi-group semantics for reading from stdin. It would
be possible to do, but the existing matching code makes it awkward, and
I doubt anybody cares.
The duplicate suppression we used for trailers now covers authors and
committers as well (though in non-trailer single-group mode we can skip
the hash insertion and lookup, since we only see one value per commit).
There is one subtlety: we now care about the case when no group bit is
set (in which case we default to showing the author). The caller in
builtin/log.c needs to be adapted to ask explicitly for authors, rather
than relying on shortlog_init(). It would be possible with some
gymnastics to make this keep working as-is, but it's not worth it for a
single caller.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Trailers don't necessarily contain name/email identity values, so
shortlog has so far treated them as opaque strings. However, since many
trailers do contain identities, it's useful to treat them as such when
they can be parsed. That lets "-e" work as usual, as well as mailmap.
When they can't be parsed, we'll continue with the old behavior of
treating them as a single string (there's no new test for that here,
since the existing tests cover a trailer like this).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This function is actually useful for parsing any identity, whether from
stdin or not. We'll need it for handling trailers.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The current documentation is vague about what happens with
--group=trailer:signed-off-by when we see a commit with:
Signed-off-by: One
Signed-off-by: Two
Signed-off-by: One
We clearly should credit both "One" and "Two", but should "One" get
credited twice? The current code does so, but mostly because that was
the easiest thing to do. It's probably more useful to count each commit
at most once. This will become especially important when we allow
values from multiple sources in a future patch.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If a project uses commit trailers, this patch lets you use
shortlog to see who is performing each action. For example,
running:
git shortlog -ns --group=trailer:reviewed-by
in git.git shows who has reviewed. You can even use a custom
format to see things like who has helped whom:
git shortlog --format="...helped %an (%ad)" \
--group=trailer:helped-by
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In preparation for adding more grouping types, let's refactor the
committer/author grouping code and add a user-facing option that binds
them together. In particular:
- the main option is now "--group", to make it clear
that the various group types are mutually exclusive. The
"--committer" option is an alias for "--group=committer".
- we keep an enum rather than a binary flag, to prepare
for more values
- we prefer switch statements to ternary assignment, since
other group types will need more custom code
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git receive-pack" that accepts requests by "git push" learned to
outsource most of the ref updates to the new "proc-receive" hook.
* jx/proc-receive-hook:
doc: add documentation for the proc-receive hook
transport: parse report options for tracking refs
t5411: test updates of remote-tracking branches
receive-pack: new config receive.procReceiveRefs
doc: add document for capability report-status-v2
New capability "report-status-v2" for git-push
receive-pack: feed report options to post-receive
receive-pack: add new proc-receive hook
t5411: add basic test cases for proc-receive hook
transport: not report a non-head push as a branch
A "git gc"'s big brother has been introduced to take care of more
repository maintenance tasks, not limited to the object database
cleaning.
* ds/maintenance-part-1:
maintenance: add trace2 regions for task execution
maintenance: add auto condition for commit-graph task
maintenance: use pointers to check --auto
maintenance: create maintenance.<task>.enabled config
maintenance: take a lock on the objects directory
maintenance: add --task option
maintenance: add commit-graph task
maintenance: initialize task array
maintenance: replace run_auto_gc()
maintenance: add --quiet option
maintenance: create basic maintenance runner
Add new subcommands to 'git maintenance' that start or stop background
maintenance using 'cron', when available. This integration is as simple
as I could make it, barring some implementation complications.
The schedule is laid out as follows:
0 1-23 * * * $cmd maintenance run --schedule=hourly
0 0 * * 1-6 $cmd maintenance run --schedule=daily
0 0 * * 0 $cmd maintenance run --schedule=weekly
where $cmd is a properly-qualified 'git for-each-repo' execution:
$cmd=$path/git --exec-path=$path for-each-repo --config=maintenance.repo
where $path points to the location of the Git executable running 'git
maintenance start'. This is critical for systems with multiple versions
of Git. Specifically, macOS has a system version at '/usr/bin/git' while
the version that users can install resides at '/usr/local/bin/git'
(symlinked to '/usr/local/libexec/git-core/git'). This will also use
your locally-built version if you build and run this in your development
environment without installing first.
This conditional schedule avoids having cron launch multiple 'git
for-each-repo' commands in parallel. Such parallel commands would likely
lead to the 'hourly' and 'daily' tasks competing over the object
database lock. This could lead to to some tasks never being run! Since
the --schedule=<frequency> argument will run all tasks with _at least_
the given frequency, the daily runs will also run the hourly tasks.
Similarly, the weekly runs will also run the daily and hourly tasks.
The GIT_TEST_CRONTAB environment variable is not intended for users to
edit, but instead as a way to mock the 'crontab [-l]' command. This
variable is set in test-lib.sh to avoid a future test from accidentally
running anything with the cron integration from modifying the user's
schedule. We use GIT_TEST_CRONTAB='test-tool crontab <file>' in our
tests to check how the schedule is modified in 'git maintenance
(start|stop)' commands.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In preparation for launching background maintenance from the 'git
maintenance' builtin, create register/unregister subcommands. These
commands update the new 'maintenance.repos' config option in the global
config so the background maintenance job knows which repositories to
maintain.
These commands allow users to add a repository to the background
maintenance list without disrupting the actual maintenance mechanism.
For example, a user can run 'git maintenance register' when no
background maintenance is running and it will not start the background
maintenance. A later update to start running background maintenance will
then pick up this repository automatically.
The opposite example is that a user can run 'git maintenance unregister'
to remove the current repository from background maintenance without
halting maintenance for other repositories.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It can be helpful to store a list of repositories in global or system
config and then iterate Git commands on that list. Create a new builtin
that makes this process simple for experts. We will use this builtin to
run scheduled maintenance on all configured repositories in a future
change.
The test is very simple, but does highlight that the "--" argument is
optional.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Maintenance currently triggers when certain data-size thresholds are
met, such as number of pack-files or loose objects. Users may want to
run certain maintenance tasks based on frequency instead. For example,
a user may want to perform a 'prefetch' task every hour, or 'gc' task
every day. To help these users, update the 'git maintenance run' command
to include a '--schedule=<frequency>' option. The allowed frequencies
are 'hourly', 'daily', and 'weekly'. These values are also allowed in a
new config value 'maintenance.<task>.schedule'.
The 'git maintenance run --schedule=<frequency>' checks the '*.schedule'
config value for each enabled task to see if the configured frequency is
at least as frequent as the frequency from the '--schedule' argument. We
use the following order, for full clarity:
'hourly' > 'daily' > 'weekly'
Use new 'enum schedule_priority' to track these values numerically.
The following cron table would run the scheduled tasks with the correct
frequencies:
0 1-23 * * * git -C <repo> maintenance run --schedule=hourly
0 0 * * 1-6 git -C <repo> maintenance run --schedule=daily
0 0 * * 0 git -C <repo> maintenance run --schedule=weekly
This cron schedule will run --schedule=hourly every hour except at
midnight. This avoids a concurrent run with the --schedule=daily that
runs at midnight every day except the first day of the week. This avoids
a concurrent run with the --schedule=weekly that runs at midnight on
the first day of the week. Since --schedule=daily also runs the
'hourly' tasks and --schedule=weekly runs the 'hourly' and 'daily'
tasks, we will still see all tasks run with the proper frequencies.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The incremental-repack task updates the multi-pack-index by deleting pack-
files that have been replaced with new packs, then repacking a batch of
small pack-files into a larger pack-file. This incremental repack is faster
than rewriting all object data, but is slower than some other
maintenance activities.
The 'maintenance.incremental-repack.auto' config option specifies how many
pack-files should exist outside of the multi-pack-index before running
the step. These pack-files could be created by 'git fetch' commands or
by the loose-objects task. The default value is 10.
Setting the option to zero disables the task with the '--auto' option,
and a negative value makes the task run every time.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When repacking during the 'incremental-repack' task, we use the
--batch-size option in 'git multi-pack-index repack'. The initial setting
used --batch-size=0 to repack everything into a single pack-file. This is
not sustainable for a large repository. The amount of work required is
also likely to use too many system resources for a background job.
Update the 'incremental-repack' task by dynamically computing a
--batch-size option based on the current pack-file structure.
The dynamic default size is computed with this idea in mind for a client
repository that was cloned from a very large remote: there is likely one
"big" pack-file that was created at clone time. Thus, do not try
repacking it as it is likely packed efficiently by the server.
Instead, we select the second-largest pack-file, and create a batch size
that is one larger than that pack-file. If there are three or more
pack-files, then this guarantees that at least two will be combined into
a new pack-file.
Of course, this means that the second-largest pack-file size is likely
to grow over time and may eventually surpass the initially-cloned
pack-file. Recall that the pack-file batch is selected in a greedy
manner: the packs are considered from oldest to newest and are selected
if they have size smaller than the batch size until the total selected
size is larger than the batch size. Thus, that oldest "clone" pack will
be first to repack after the new data creates a pack larger than that.
We also want to place some limits on how large these pack-files become,
in order to bound the amount of time spent repacking. A maximum
batch-size of two gigabytes means that large repositories will never be
packed into a single pack-file using this job, but also that repack is
rather expensive. This is a trade-off that is valuable to have if the
maintenance is being run automatically or in the background. Users who
truly want to optimize for space and performance (and are willing to pay
the upfront cost of a full repack) can use the 'gc' task to do so.
Create a test for this two gigabyte limit by creating an EXPENSIVE test
that generates two pack-files of roughly 2.5 gigabytes in size, then
performs an incremental repack. Check that the --batch-size argument in
the subcommand uses the hard-coded maximum.
Helped-by: Chris Torek <chris.torek@gmail.com>
Reported-by: Son Luong Ngoc <sluongng@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The previous change cleaned up loose objects using the
'loose-objects' that can be run safely in the background. Add a
similar job that performs similar cleanups for pack-files.
One issue with running 'git repack' is that it is designed to
repack all pack-files into a single pack-file. While this is the
most space-efficient way to store object data, it is not time or
memory efficient. This becomes extremely important if the repo is
so large that a user struggles to store two copies of the pack on
their disk.
Instead, perform an "incremental" repack by collecting a few small
pack-files into a new pack-file. The multi-pack-index facilitates
this process ever since 'git multi-pack-index expire' was added in
19575c7 (multi-pack-index: implement 'expire' subcommand,
2019-06-10) and 'git multi-pack-index repack' was added in ce1e4a1
(midx: implement midx_repack(), 2019-06-10).
The 'incremental-repack' task runs the following steps:
1. 'git multi-pack-index write' creates a multi-pack-index file if
one did not exist, and otherwise will update the multi-pack-index
with any new pack-files that appeared since the last write. This
is particularly relevant with the background fetch job.
When the multi-pack-index sees two copies of the same object, it
stores the offset data into the newer pack-file. This means that
some old pack-files could become "unreferenced" which I will use
to mean "a pack-file that is in the pack-file list of the
multi-pack-index but none of the objects in the multi-pack-index
reference a location inside that pack-file."
2. 'git multi-pack-index expire' deletes any unreferenced pack-files
and updaes the multi-pack-index to drop those pack-files from the
list. This is safe to do as concurrent Git processes will see the
multi-pack-index and not open those packs when looking for object
contents. (Similar to the 'loose-objects' job, there are some Git
commands that open pack-files regardless of the multi-pack-index,
but they are rarely used. Further, a user that self-selects to
use background operations would likely refrain from using those
commands.)
3. 'git multi-pack-index repack --bacth-size=<size>' collects a set
of pack-files that are listed in the multi-pack-index and creates
a new pack-file containing the objects whose offsets are listed
by the multi-pack-index to be in those objects. The set of pack-
files is selected greedily by sorting the pack-files by modified
time and adding a pack-file to the set if its "expected size" is
smaller than the batch size until the total expected size of the
selected pack-files is at least the batch size. The "expected
size" is calculated by taking the size of the pack-file divided
by the number of objects in the pack-file and multiplied by the
number of objects from the multi-pack-index with offset in that
pack-file. The expected size approximates how much data from that
pack-file will contribute to the resulting pack-file size. The
intention is that the resulting pack-file will be close in size
to the provided batch size.
The next run of the incremental-repack task will delete these
repacked pack-files during the 'expire' step.
In this version, the batch size is set to "0" which ignores the
size restrictions when selecting the pack-files. It instead
selects all pack-files and repacks all packed objects into a
single pack-file. This will be updated in the next change, but
it requires doing some calculations that are better isolated to
a separate change.
These steps are based on a similar background maintenance step in
Scalar (and VFS for Git) [1]. This was incredibly effective for
users of the Windows OS repository. After using the same VFS for Git
repository for over a year, some users had _thousands_ of pack-files
that combined to up to 250 GB of data. We noticed a few users were
running into the open file descriptor limits (due in part to a bug
in the multi-pack-index fixed by af96fe3 (midx: add packs to
packed_git linked list, 2019-04-29).
These pack-files were mostly small since they contained the commits
and trees that were pushed to the origin in a given hour. The GVFS
protocol includes a "prefetch" step that asks for pre-computed pack-
files containing commits and trees by timestamp. These pack-files
were grouped into "daily" pack-files once a day for up to 30 days.
If a user did not request prefetch packs for over 30 days, then they
would get the entire history of commits and trees in a new, large
pack-file. This led to a large number of pack-files that had poor
delta compression.
By running this pack-file maintenance step once per day, these repos
with thousands of packs spanning 200+ GB dropped to dozens of pack-
files spanning 30-50 GB. This was done all without removing objects
from the system and using a constant batch size of two gigabytes.
Once the work was done to reduce the pack-files to small sizes, the
batch size of two gigabytes means that not every run triggers a
repack operation, so the following run will not expire a pack-file.
This has kept these repos in a "clean" state.
[1] https://github.com/microsoft/scalar/blob/master/Scalar.Common/Maintenance/PackfileMaintenanceStep.cs
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The loose-objects task deletes loose objects that already exist in a
pack-file, then place the remaining loose objects into a new pack-file.
If this step runs all the time, then we risk creating pack-files with
very few objects with every 'git commit' process. To prevent
overwhelming the packs directory with small pack-files, place a minimum
number of objects to justify the task.
The 'maintenance.loose-objects.auto' config option specifies a minimum
number of loose objects to justify the task to run under the '--auto'
option. This defaults to 100 loose objects. Setting the value to zero
will prevent the step from running under '--auto' while a negative value
will force it to run every time.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
One goal of background maintenance jobs is to allow a user to
disable auto-gc (gc.auto=0) but keep their repository in a clean
state. Without any cleanup, loose objects will clutter the object
database and slow operations. In addition, the loose objects will
take up extra space because they are not stored with deltas against
similar objects.
Create a 'loose-objects' task for the 'git maintenance run' command.
This helps clean up loose objects without disrupting concurrent Git
commands using the following sequence of events:
1. Run 'git prune-packed' to delete any loose objects that exist
in a pack-file. Concurrent commands will prefer the packed
version of the object to the loose version. (Of course, there
are exceptions for commands that specifically care about the
location of an object. These are rare for a user to run on
purpose, and we hope a user that has selected background
maintenance will not be trying to do foreground maintenance.)
2. Run 'git pack-objects' on a batch of loose objects. These
objects are grouped by scanning the loose object directories in
lexicographic order until listing all loose objects -or-
reaching 50,000 objects. This is more than enough if the loose
objects are created only by a user doing normal development.
We noticed users with _millions_ of loose objects because VFS
for Git downloads blobs on-demand when a file read operation
requires populating a virtual file.
This step is based on a similar step in Scalar [1] and VFS for Git.
[1] https://github.com/microsoft/scalar/blob/master/Scalar.Common/Maintenance/LooseObjectsStep.cs
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When working with very large repositories, an incremental 'git fetch'
command can download a large amount of data. If there are many other
users pushing to a common repo, then this data can rival the initial
pack-file size of a 'git clone' of a medium-size repo.
Users may want to keep the data on their local repos as close as
possible to the data on the remote repos by fetching periodically in
the background. This can break up a large daily fetch into several
smaller hourly fetches.
The task is called "prefetch" because it is work done in advance
of a foreground fetch to make that 'git fetch' command much faster.
However, if we simply ran 'git fetch <remote>' in the background,
then the user running a foreground 'git fetch <remote>' would lose
some important feedback when a new branch appears or an existing
branch updates. This is especially true if a remote branch is
force-updated and this isn't noticed by the user because it occurred
in the background. Further, the functionality of 'git push
--force-with-lease' becomes suspect.
When running 'git fetch <remote> <options>' in the background, use
the following options for careful updating:
1. --no-tags prevents getting a new tag when a user wants to see
the new tags appear in their foreground fetches.
2. --refmap= removes the configured refspec which usually updates
refs/remotes/<remote>/* with the refs advertised by the remote.
While this looks confusing, this was documented and tested by
b40a50264a (fetch: document and test --refmap="", 2020-01-21),
including this sentence in the documentation:
Providing an empty `<refspec>` to the `--refmap` option
causes Git to ignore the configured refspecs and rely
entirely on the refspecs supplied as command-line arguments.
3. By adding a new refspec "+refs/heads/*:refs/prefetch/<remote>/*"
we can ensure that we actually load the new values somewhere in
our refspace while not updating refs/heads or refs/remotes. By
storing these refs here, the commit-graph job will update the
commit-graph with the commits from these hidden refs.
4. --prune will delete the refs/prefetch/<remote> refs that no
longer appear on the remote.
5. --no-write-fetch-head prevents updating FETCH_HEAD.
We've been using this step as a critical background job in Scalar
[1] (and VFS for Git). This solved a pain point that was showing up
in user reports: fetching was a pain! Users do not like waiting to
download the data that was created while they were away from their
machines. After implementing background fetch, the foreground fetch
commands sped up significantly because they mostly just update refs
and download a small amount of new data. The effect is especially
dramatic when paried with --no-show-forced-udpates (through
fetch.showForcedUpdates=false).
[1] https://github.com/microsoft/scalar/blob/master/Scalar.Common/Maintenance/FetchStep.cs
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We already match "committer", and we're about to start
matching more things. Let's use a more neutral variable to
avoid confusion.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In 06f5608c14 (bisect--helper: `bisect_start` shell function
partially in C, 2019-01-02), we changed the following shell
code:
- rev=$(git rev-parse -q --verify "$arg^{commit}") || {
- test $has_double_dash -eq 1 &&
- die "$(eval_gettext "'\$arg' does not appear to be a valid revision")"
- break
- }
- revs="$revs $rev"
into:
+ char *commit_id = xstrfmt("%s^{commit}", arg);
+ if (get_oid(commit_id, &oid) && has_double_dash)
+ die(_("'%s' does not appear to be a valid "
+ "revision"), arg);
+
+ string_list_append(&revs, oid_to_hex(&oid));
+ free(commit_id);
In case of an invalid "arg" when "has_double_dash" is false, the old
code would "break" out of the argument loop.
In the new C code though, `oid_to_hex(&oid)` is unconditonally
appended to "revs". This is wrong first because "oid" is junk as
`get_oid(commit_id, &oid)` failed and second because it doesn't break
out of the argument loop.
Not breaking out of the argument loop means that "arg" is then not
treated as a path restriction (which is wrong).
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A user who understands enough to set pull.ff does not need additional
instructions.
Signed-off-by: Alex Henrie <alexhenrie24@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The command reads list of object names to place on the ignore list
either from the command line or from a file, but they are not
checked with their object type (those read from the file are not
even checked for object existence).
Extend the oidset_parse_file() API and allow it to take a callback
that can be used to die (e.g. when an inappropriate input is read)
or modify the object name read (e.g. when a tag pointing at a commit
is read, and the caller wants a commit object name), and use it in
the code that handles ignore list.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This reverts commit f39ad38410.
That commit was trying to silence a type-punning warning on older
versions of gcc. However, its analysis was all wrong. I didn't notice
that we _were_ in fact type-punning because there are two versions of
put_be32(): one that uses casts and unaligned loads, and another that
uses bitshifts. I looked at the latter, but on my platform we were
defaulting to the former.
However, as of the previous commit, we'll always use the bitshift
version. So we can drop this hackery to avoid the warning, making the
code slightly cleaner.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Reimplement the `bisect_next()` and the `bisect_auto_next()` shell functions
in C and add the subcommands to `git bisect--helper` to call them from
git-bisect.sh .
bisect_auto_next() function returns an enum bisect_error type as whole
`git bisect` can exit with an error code when bisect_next() does.
Return an error when `bisect_next()` fails, that fix a bug on shell script
version.
Using `--bisect-next` and `--bisect-auto-next` subcommands is a
temporary measure to port shell function to C so as to use the existing
test suite. As more functions are ported, `--bisect-auto-next`
subcommand will be retired and will be called by some other methods.
Mentored-by: Lars Schneider <larsxschneider@gmail.com>
Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Mentored-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Signed-off-by: Pranit Bauva <pranit.bauva@gmail.com>
Signed-off-by: Tanushree Tumane <tanushreetumane@gmail.com>
Signed-off-by: Miriam Rubio <mirucam@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Reimplement the `bisect_autostart()` shell function in C and add the
C implementation from `bisect_next()` which was previously left
uncovered.
Add `--bisect-autostart` subcommand to be called from git-bisect.sh.
Using `--bisect-autostart` subcommand is a temporary measure to port
the shell function to C so as to use the existing test suite. As more
functions are ported, this subcommand will be retired and
bisect_autostart() will be called directly by `bisect_state()`.
Change behavior of shell script that returned success when user aborted
the bisection.
Mentored-by: Lars Schneider <larsxschneider@gmail.com>
Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Mentored-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Pranit Bauva <pranit.bauva@gmail.com>
Signed-off-by: Tanushree Tumane <tanushreetumane@gmail.com>
Signed-off-by: Miriam Rubio <mirucam@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git fetch --all --ipv4/--ipv6" forgot to pass the protocol options
to instances of the "git fetch" that talk to individual remotes,
which has been corrected.
* ar/fetch-ipversion-in-all:
fetch: pass --ipv4 and --ipv6 options to sub-fetches
"git remote set-head" that failed still said something that hints
the operation went through, which was misleading.
* cs/don-t-pretend-a-failed-remote-set-head-succeeded:
remote: don't show success message when set-head fails
"git for-each-ref" and friends that list refs used to allow only
one --merged or --no-merged to filter them; they learned to take
combination of both kind of filtering.
* al/ref-filter-merged-and-no-merged:
Doc: prefer more specific file name
ref-filter: make internal reachable-filter API more precise
ref-filter: allow merged and no-merged filters
Doc: cover multiple contains/no-contains filters
t3201: test multiple branch filter combinations
The 'meld' backend of the "git mergetool" learned to give the
underlying 'meld' the '--auto-merge' option, which would help
reduce the amount of text that requires manual merging.
* ls/mergetool-meld-auto-merge:
mergetool: allow auto-merge for meld to follow the vim-diff behavior
"git index-pack" learned to resolve deltified objects with greater
parallelism.
* jt/threaded-index-pack:
index-pack: make quantum of work smaller
index-pack: make resolve_delta() assume base data
index-pack: calculate {ref,ofs}_{first,last} early
index-pack: remove redundant child field
index-pack: unify threaded and unthreaded code
index-pack: remove redundant parameter
Documentation: deltaBaseCacheLimit is per-thread
"format-patch --range-diff=<prev> <origin>..HEAD" has been taught
not to ignore <origin> when <prev> is a single version.
* es/format-patch-interdiff-cleanup:
format-patch: use 'origin' as start of current-series-range when known
diff-lib: tighten show_interdiff()'s interface
diff: move show_interdiff() from its own file to diff-lib
If a user is cloning a SHA-1 repository with GIT_DEFAULT_HASH set to
"sha256", then we can end up with a repository where the repository
format version is 0 but the extensions.objectformat key is set to
"sha256". This is both wrong (the user has a SHA-1 repository) and
nonfunctional (because the extension cannot be used in a v0 repository).
This happens because in a clone, we initially set up the repository, and
then change its algorithm based on what the remote side tells us it's
using. We've initially set up the repository as SHA-256 in this case,
and then later on reset the repository version without clearing the
extension.
We could just always set the extension in this case, but that would mean
that our SHA-1 repositories weren't compatible with older Git versions,
even though there's no reason why they shouldn't be. And we also don't
want to initialize the repository as SHA-1 initially, since that means
if we're cloning an empty repository, we'll have failed to honor the
GIT_DEFAULT_HASH variable and will end up with a SHA-1 repository, not a
SHA-256 repository.
Neither of those are appealing, so let's tell the repository
initialization code if we're doing a reinit like this, and if so, to
clear the extension if we're using SHA-1. This makes sure we produce a
valid and functional repository and doesn't break any of our other use
cases.
Reported-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In an ongoing effort to avoid non-inclusive language, let's avoid using
the branch name "master" in a code comment.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The previous commit introduced ---merge-base a way to take the diff
between the working tree or index and the merge base between an arbitrary
commit and HEAD. It makes sense to extend this option to support the
case where two commits are given too and behave in a manner identical to
`git diff A...B`.
Introduce the --merge-base flag as an alternative to triple-dot
notation. Thus, we would be able to write the above as
`git diff --merge-base A B`.
Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There is currently no easy way to take the diff between the working tree
or index and the merge base between an arbitrary commit and HEAD. Even
diff's `...` notation doesn't allow this because it only works between
commits. However, the ability to do this would be desirable to a user
who would like to see all the changes they've made on a branch plus
uncommitted changes without taking into account changes made in the
upstream branch.
Teach diff-index and diff (with one commit) the --merge-base option
which allows a user to use the merge base of a commit and HEAD as the
"before" side.
Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In a future commit, we will teach run_diff_index() to accept more
options via flag bits. For now, change `cached` into a flag in the
`option` bitfield. The behaviour should remain exactly the same.
Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Unlike "git config --local", "git config --worktree" did not fail
early and cleanly when started outside a git repository.
* mt/config-fail-nongit-early:
config: complain about --worktree outside of a git repo
"git status --short" quoted a path with SP in it when tracked, but
not those that are untracked, ignored or unmerged. They are all
shown quoted consistently.
* jc/quote-path-cleanup:
quote: turn 'nodq' parameter into a set of flags
quote: rename misnamed sq_lookup[] to cq_lookup[]
wt-status: consistently quote paths in "status --short" output
quote_path: code clarification
quote_path: optionally allow quoting a path with SP in it
quote_path: give flags parameter to quote_path()
quote_path: rename quote_path_relative() to quote_path()
"git worktree add" learns that the "-d" is a synonym to "--detach"
option to create a new worktree without being on a branch.
* es/wt-add-detach:
git-worktree.txt: discuss branch-based vs. throwaway worktrees
worktree: teach `add` to recognize -d as shorthand for --detach
git-checkout.txt: document -d short option for --detach
The "add -i/-p" machinery has been written in C but it is not used
by default yet. It is made default to those who are participating
in feature.experimental experiment.
* jc/add-i-use-builtin-experimental:
add -i: use the built-in version when feature.experimental is set
Misc cleanups.
* rs/misc-cleanups:
pack-bitmap-write: use hashwrite_be32() in write_hash_cache()
midx: use hashwrite_u8() in write_midx_header()
fast-import: use write_pack_header()
Introduce a configuration variable to specify a default value for the
recently-introduce '--max-new-filters' option of 'git commit-graph
write'.
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Introduce a command-line flag to specify the maximum number of new Bloom
filters that a 'git commit-graph write' is willing to compute from
scratch.
Prior to this patch, a commit-graph write with '--changed-paths' would
compute Bloom filters for all selected commits which haven't already
been computed (i.e., by a previous commit-graph write with '--split'
such that a roll-up or replacement is performed).
This behavior can cause prohibitively-long commit-graph writes for a
variety of reasons:
* There may be lots of filters whose diffs take a long time to
generate (for example, they have close to the maximum number of
changes, diffing itself takes a long time, etc).
* Old-style commit-graphs (which encode filters with too many entries
as not having been computed at all) cause us to waste time
recomputing filters that appear to have not been computed only to
discover that they are too-large.
This can make the upper-bound of the time it takes for 'git commit-graph
write --changed-paths' to be rather unpredictable.
To make this command behave more predictably, introduce
'--max-new-filters=<n>' to allow computing at most '<n>' Bloom filters
from scratch. This lets "computing" already-known filters proceed
quickly, while bounding the number of slow tasks that Git is willing to
do.
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In the subsequent commit, additional options will be added to the
commit-graph API which have nothing to do with splitting.
Rename the 'split_commit_graph_opts' structure to the more-generic
'commit_graph_opts' to encompass both. Likewise, rename the 'flags'
member to instead be 'split_flags' to clarify that it only has to do
with the behavior implied by '--split'.
Suggested-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Suppress the message 'origin/HEAD set to master' in case of an error.
$ git remote set-head origin -a
error: Not a valid ref: refs/remotes/origin/master
origin/HEAD set to master
Signed-off-by: Christian Schlack <christian@backhub.co>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Instead of writing a new commit-graph in every 'git maintenance run
--auto' process (when maintenance.commit-graph.enalbed is configured to
be true), only write when there are "enough" commits not in a
commit-graph file.
This count is controlled by the maintenance.commit-graph.auto config
option.
To compute the count, use a depth-first search starting at each ref, and
leaving markers using the SEEN flag. If this count reaches the limit,
then terminate early and start the task. Otherwise, this operation will
peel every ref and parse the commit it points to. If these are all in
the commit-graph, then this is typically a very fast operation. Users
with many refs might feel a slow-down, and hence could consider updating
their limit to be very small. A negative value will force the step to
run every time.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The 'git maintenance run' command has an '--auto' option. This is used
by other Git commands such as 'git commit' or 'git fetch' to check if
maintenance should be run after adding data to the repository.
Previously, this --auto option was only used to add the argument to the
'git gc' command as part of the 'gc' task. We will be expanding the
other tasks to perform a check to see if they should do work as part of
the --auto flag, when they are enabled by config.
First, update the 'gc' task to perform the auto check inside the
maintenance process. This prevents running an extra 'git gc --auto'
command when not needed. It also shows a model for other tasks.
Second, use the 'auto_condition' function pointer as a signal for
whether we enable the maintenance task under '--auto'. For instance, we
do not want to enable the 'fetch' task in '--auto' mode, so that
function pointer will remain NULL.
Now that we are not automatically calling 'git gc', a test in
t5514-fetch-multiple.sh must be changed to watch for 'git maintenance'
instead.
We continue to pass the '--auto' option to the 'git gc' command when
necessary, because of the gc.autoDetach config option changes behavior.
Likely, we will want to absorb the daemonizing behavior implied by
gc.autoDetach as a maintenance.autoDetach config option.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Currently, a normal run of "git maintenance run" will only run the 'gc'
task, as it is the only one enabled. This is mostly for backwards-
compatible reasons since "git maintenance run --auto" commands replaced
previous "git gc --auto" commands after some Git processes. Users could
manually run specific maintenance tasks by calling "git maintenance run
--task=<task>" directly.
Allow users to customize which steps are run automatically using config.
The 'maintenance.<task>.enabled' option then can turn on these other
tasks (or turn off the 'gc' task).
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Performing maintenance on a Git repository involves writing data to the
.git directory, which is not safe to do with multiple writers attempting
the same operation. Ensure that only one 'git maintenance' process is
running at a time by holding a file-based lock. Simply the presence of
the .git/maintenance.lock file will prevent future maintenance. This
lock is never committed, since it does not represent meaningful data.
Instead, it is only a placeholder.
If the lock file already exists, then no maintenance tasks are
attempted. This will become very important later when we implement the
'prefetch' task, as this is our stop-gap from creating a recursive process
loop between 'git fetch' and 'git maintenance run --auto'.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A user may want to only run certain maintenance tasks in a certain
order. Add the --task=<task> option, which allows a user to specify an
ordered list of tasks to run. These cannot be run multiple times,
however.
Here is where our array of maintenance_task pointers becomes critical.
We can sort the array of pointers based on the task order, but we do not
want to move the struct data itself in order to preserve the hashmap
references. We use the hashmap to match the --task=<task> arguments into
the task struct data.
Keep in mind that the 'enabled' member of the maintenance_task struct is
a placeholder for a future 'maintenance.<task>.enabled' config option.
Thus, we use the 'enabled' member to specify which tasks are run when
the user does not specify any --task=<task> arguments. The 'enabled'
member should be ignored if --task=<task> appears.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The first new task in the 'git maintenance' builtin is the
'commit-graph' task. This updates the commit-graph file
incrementally with the command
git commit-graph write --reachable --split
By writing an incremental commit-graph file using the "--split"
option we minimize the disruption from this operation. The default
behavior is to merge layers until the new "top" layer is less than
half the size of the layer below. This provides quick writes most
of the time, with the longer writes following a power law
distribution.
Most importantly, concurrent Git processes only look at the
commit-graph-chain file for a very short amount of time, so they
will verly likely not be holding a handle to the file when we try
to replace it. (This only matters on Windows.)
If a concurrent process reads the old commit-graph-chain file, but
our job expires some of the .graph files before they can be read,
then those processes will see a warning message (but not fail).
This could be avoided by a future update to use the --expire-time
argument when writing the commit-graph.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In anticipation of implementing multiple maintenance tasks inside the
'maintenance' builtin, use a list of structs to describe the work to be
done.
The struct maintenance_task stores the name of the task (as given by a
future command-line argument) along with a function pointer to its
implementation and a boolean for whether the step is enabled.
A list these structs are initialized with the full list of implemented
tasks along with a default order. For now, this list only contains the
"gc" task. This task is also the only task enabled by default.
The run subcommand will return a nonzero exit code if any task fails.
However, it will attempt all tasks in its loop before returning with the
failure. Also each failed task will print an error message.
Helped-by: Taylor Blau <me@ttaylorr.com>
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The run_auto_gc() method is used in several places to trigger a check
for repo maintenance after some Git commands, such as 'git commit' or
'git fetch'.
To allow for extra customization of this maintenance activity, replace
the 'git gc --auto [--quiet]' call with one to 'git maintenance run
--auto [--quiet]'. As we extend the maintenance builtin with other
steps, users will be able to select different maintenance activities.
Rename run_auto_gc() to run_auto_maintenance() to be clearer what is
happening on this call, and to expose all callers in the current diff.
Rewrite the method to use a struct child_process to simplify the calls
slightly.
Since 'git fetch' already allows disabling the 'git gc --auto'
subprocess, add an equivalent option with a different name to be more
descriptive of the new behavior: '--[no-]maintenance'. Update the
documentation to include these options at the same time.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Maintenance activities are commonly used as steps in larger scripts.
Providing a '--quiet' option allows those scripts to be less noisy when
run on a terminal window. Turn this mode on by default when stderr is
not a terminal.
Pipe the option to the 'git gc' child process.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The 'gc' builtin is our current entrypoint for automatically maintaining
a repository. This one tool does many operations, such as repacking the
repository, packing refs, and rewriting the commit-graph file. The name
implies it performs "garbage collection" which means several different
things, and some users may not want to use this operation that rewrites
the entire object database.
Create a new 'maintenance' builtin that will become a more general-
purpose command. To start, it will only support the 'run' subcommand,
but will later expand to add subcommands for scheduling maintenance in
the background.
For now, the 'maintenance' builtin is a thin shim over the 'gc' builtin.
In fact, the only option is the '--auto' toggle, which is handed
directly to the 'gc' builtin. The current change is isolated to this
simple operation to prevent more interesting logic from being lost in
all of the boilerplate of adding a new builtin.
Use existing builtin/gc.c file because we want to share code between the
two builtins. It is possible that we will have 'maintenance' replace the
'gc' builtin entirely at some point, leaving 'git gc' as an alias for
some specific arguments to 'git maintenance run'.
Create a new test_subcommand helper that allows us to test if a certain
subcommand was run. It requires storing the GIT_TRACE2_EVENT logs in a
file. A negation mode is available that will be used in later tests.
Helped-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Make the mergetool used with "meld" backend behave similarly to "vimdiff" by
telling it to auto-merge non-conflicting parts and highlight the conflicting
parts when `mergetool.meld.useAutoMerge` is configured with `true`, or `auto`
for detecting the `--auto-merge` option automatically.
Helped-by: Đoàn Trần Công Danh <congdanhqx@gmail.com>
Helped-by: David Aguilar <davvid@gmail.com>
Signed-off-by: Lin Sun <lin.sun@zoom.us>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Enable ref-filter to process multiple merged and no-merged filters, and
extend functionality to git branch, git tag and git for-each-ref. This
provides an easy way to check for branches that are "graduation
candidates:"
$ git branch --no-merged master --merged next
If passed more than one merged (or more than one no-merged) filter, refs
must be reachable from any one of the merged commits, and reachable from
none of the no-merged commits.
Signed-off-by: Aaron Lipman <alipman88@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The options indicate user intent for the whole fetch operation, and
ignoring them in sub-fetches (i.e. "--all" and recursive fetching of
submodules) is quite unexpected when, for instance, it is intended
to limit all of the communication to a specific transport protocol
for some reason.
Signed-off-by: Alex Riesen <alexander.riesen@cetitec.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The quote_path() function computes a path (relative to its base
directory) and c-quotes the result if necessary. Teach it to take a
flags parameter to allow its behaviour to be enriched later.
No behaviour change intended.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There is no quote_path_absolute() or anything that causes confusion,
and one of the two large consumers already rename the long name
locally with a preprocessor macro.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git status" has trouble showing where it came from by interpreting
reflog entries that recordcertain events, e.g. "checkout @{u}", and
gives a hard/fatal error. Even though it inherently is impossible
to give a correct answer because the reflog entries lose some
information (e.g. "@{u}" does not record what branch the user was
on hence which branch 'the upstream' needs to be computed, and even
if the record were available, the relationship between branches may
have changed), at least hide the error to allow "status" show its
output.
* jt/interpret-branch-name-fallback:
wt-status: tolerate dangling marks
refs: move dwim_ref() to header file
sha1-name: replace unsigned int with option struct
Fixups to a topic in 'next'.
* ss/submodule-summary-in-c-fixes:
t7421: eliminate 'grep' check in t7421.4 for mingw compatibility
submodule: fix style in function definition
submodule: eliminate unused parameters from print_submodule_summary()
"git worktree" gained a "repair" subcommand to help users recover
after moving the worktrees or repository manually without telling
Git. Also, "git init --separate-git-dir" no longer corrupts
administrative data related to linked worktrees.
* es/worktree-repair:
init: make --separate-git-dir work from within linked worktree
init: teach --separate-git-dir to repair linked worktrees
worktree: teach "repair" to fix outgoing links to worktrees
worktree: teach "repair" to fix worktree back-links to main worktree
worktree: add skeleton "repair" command
When a packfile is removed by "git repack", multi-pack-index gets
cleared; the code was taught to do so less aggressively by first
checking if the midx actually refers to a pack that no longer
exists.
* tb/repack-clearing-midx:
midx: traverse the local MIDX first
builtin/repack.c: invalidate MIDX only when necessary
Yet another subcommand of "git submodule" is getting rewritten in C.
* ss/submodule-summary-in-c:
submodule: port submodule subcommand 'summary' from shell to C
t7421: introduce a test script for verifying 'summary' output
submodule: rename helper functions to avoid ambiguity
submodule: remove extra line feeds between callback struct and macro
In a future commit, some commit-graph internals will want access to
'r->settings', but we only have the 'struct object_directory *'
corresponding to that repository.
Add an additional parameter to pass the repository around in more
places.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Running `git config --worktree` outside of a git repository hits a BUG()
when trying to enumerate the worktrees. Let's catch this error earlier
and die() with a friendlier message.
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Currently, when index-pack resolves deltas, it does not split up delta
trees into threads: each delta base root (an object that is not a
REF_DELTA or OFS_DELTA) can go into its own thread, but all deltas on
that root (direct or indirect) are processed in the same thread.
This is a problem when a repository contains a large text file (thus,
delta-able) that is modified many times - delta resolution time during
fetching is dominated by processing the deltas corresponding to that
text file.
This patch contains a solution to that. When cloning using
git -c core.deltabasecachelimit=1g clone \
https://fuchsia.googlesource.com/third_party/vulkan-cts
on my laptop, clone time improved from 3m2s to 2m5s (using 3 threads,
which is the default).
The solution is to have a global work stack. This stack contains delta
bases (objects, whether appearing directly in the packfile or generated
by delta resolution, that themselves have delta children) that need to
be processed; whenever a thread needs work, it peeks at the top of the
stack and processes its next unprocessed child. If a thread finds the
stack empty, it will look for more delta base roots to push on the stack
instead.
The main weakness of having a global work stack is that more time is
spent in the mutex, but profiling has shown that most time is spent in
the resolution of the deltas themselves, so this shouldn't be an issue
in practice. In any case, experimentation (as described in the clone
command above) shows that this patch is a net improvement.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When formatting a patch series over `origin..HEAD`, one would expect
that range to be used as the current-series-range when computing a
range-diff between the previous and current versions of a patch series.
However, infer_range_diff_ranges() ignores `origin..HEAD` when
--range-diff=<prev> specifies a single revision rather than a range, and
instead unexpectedly computes the current-series-range based upon
<prev>. Address this anomaly by unconditionally using `origin..HEAD` as
the current-series-range regardless of <prev> as long as `origin` is
known, and only fall back to basing current-series-range on <prev> when
`origin` is not known.
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
To compute and show an interdiff, show_interdiff() needs only the two
OID's to compare and a diffopts, yet it expects callers to supply an
entire rev_info. The demand for rev_info is not only overkill, but also
places unnecessary burden on potential future callers which might not
otherwise have a rev_info at hand. Address this by tightening its
signature to require only the items it needs instead of a full rev_info.
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
show_interdiff() is a relatively small function and not likely to grow
larger or more complicated. Rather than dedicating an entire source file
to it, relocate it to diff-lib.c which houses other "take two things and
compare them" functions meant to be re-used but not so low-level as to
reside in the core diff implementation.
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We have had parallel implementations of "add -i/-p" since 2.25 and
have been using them from various codepaths since 2.26 days, but
never made the built-in version the default.
We have found and fixed a handful of corner case bugs in the
built-in version, and it may be a good time to start switching over
the user base from the scripted version to the built-in version.
Let's enable the built-in version for those who opt into the
feature.experimental guinea-pig program to give wider exposure.
Acked-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Like `git switch` and `git checkout`, `git worktree add` can check out a
branch or set up a detached HEAD. However, unlike those other commands,
`git worktree add` does not understand -d as shorthand for --detach,
which may confound users accustomed to using -d for this purpose.
Address this shortcoming by teaching `add` to recognize -d for --detach,
thus bringing it in line with the other commands.
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Call write_pack_header() to hash and write a pack header instead of
open-coding this function. This gets rid of duplicate code and of the
magic version number 2 -- which has been used here since c90be46abd
(Changed fast-import's pack header creation to use pack.h, 2006-08-16)
and in pack.h (again) since 29f049a0c2 (Revert "move pack creation to
version 3", 2006-10-14).
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a function for building a refspec using printf-style formatting. It
frees callers from managing their own buffer. Use it throughout the
tree to shorten and simplify its callers.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
map_refspec() either returns the passed in ref string or a detached
strbuf. This makes it hard for callers to release the possibly
allocated memory, and set_refspecs() consequently leaks it.
Let map_refspec() append any refspecs directly and release its own
strbufs after use. Rename it to refspec_append_mapped() and don't
return anything to reflect its increased responsibility.
set_refspecs() also leaks its strbufs. Do the same here and directly
call refspec_append() in each if branch instead of holding onto a
detached strbuf, then dispose of the allocated memory after use. We
need to add an else branch for the final call because all the other
conditional branches already add their formatted refspec now.
setup_push_upstream() and setup_push_current() forgot to release their
strbufs as well; plug these leaks, too, while at it.
None of these leaks were likely to impact users, because the number
and sizes of refspecs are usually small and the allocations are only
done once per program run. Clean them up nevertheless, as another
step on the long road towards zero memory leaks.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When fetching recursively with submodules, for each ref in the
superproject, we call check_for_new_submodule_commits() which collects all
the objects that have to be checked for submodule changes on
calculate_changed_submodule_paths(). On the first call, it also collects all
the existing refs for excluding them from the scan.
calculate_changed_submodule_paths() creates an argument array with all the
collected new objects, followed by --not and all the old objects. This argv
is passed to setup_revisions, which parses each argument, converts it back
to an oid and resolves the object. The parsing itself also does redundant
work, because it is treated like user input, while in fact it is a full
oid. So it needlessly attempts to look it up as ref (checks if it has ^, ~
etc.), checks if it is a file name etc.
For a repository with many refs, all of this is expensive. But if the fetch
in the superproject did not update the ref (i.e. the objects that are
required to exist in the submodule did not change), there is no need to
include it in the list.
Before commit be76c212 (fetch: ensure submodule objects fetched,
2018-12-06), submodule reference changes were only detected for refs that
were changed, but not for new refs. This commit covered also this case, but
what it did was to just include every ref.
This change should reduce the number of scanned refs by about half (except
the case of a no-op fetch, which will not scan any ref), because all the
existing refs will still be listed after --not.
The regression was reported here:
https://public-inbox.org/git/CAGHpTBKSUJzFSWc=uznSu2zB33qCSmKXM-
iAjxRCpqNK5bnhRg@mail.gmail.com/
Signed-off-by: Orgad Shaneh <orgads@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Updates to on-demand fetching code in lazily cloned repositories.
* jt/lazy-fetch:
fetch: no FETCH_HEAD display if --no-write-fetch-head
fetch-pack: remove no_dependents code
promisor-remote: lazy-fetch objects in subprocess
fetch-pack: do not lazy-fetch during ref iteration
fetch: only populate existing_refs if needed
fetch: avoid reading submodule config until needed
fetch: allow refspecs specified through stdin
negotiator/noop: add noop fetch negotiator
A handful of places in in-tree code still relied on being able to
execute the git subcommands, especially built-ins, in "git-foo"
form, which have been corrected.
* jc/undash-in-tree-git-callers:
credential-cache: use child_process.args
cvsexportcommit: do not run git programs in dashed form
transport-helper: do not run git-remote-ext etc. in dashed form
Trim an unused binary and turn a bunch of commands into built-in.
* jk/slimmed-down:
drop vcs-svn experiment
make git-fast-import a builtin
make git-bugreport a builtin
make credential helpers builtins
Makefile: drop builtins from MSVC pdb list
"git rebase -i" learns a bit more options.
* pw/rebase-i-more-options:
t3436: do not run git-merge-recursive in dashed form
rebase: add --reset-author-date
rebase -i: support --ignore-date
rebase -i: support --committer-date-is-author-date
am: stop exporting GIT_COMMITTER_DATE
rebase -i: add --ignore-whitespace flag
When a user checks out the upstream branch of HEAD, the upstream branch
not being a local branch, and then runs "git status", like this:
git clone $URL client
cd client
git checkout @{u}
git status
no status is printed, but instead an error message:
fatal: HEAD does not point to a branch
(This error message when running "git branch" persists even after
checking out other things - it only stops after checking out a branch.)
This is because "git status" reads the reflog when determining the "HEAD
detached" message, and thus attempts to DWIM "@{u}", but that doesn't
work because HEAD no longer points to a branch.
Therefore, when calculating the status of a worktree, tolerate dangling
marks. This is done by adding an additional parameter to
dwim_ref() and repo_dwim_ref().
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
887952b8c6 ("fetch: optionally allow disabling FETCH_HEAD update",
2020-08-18) introduced the ability to disable writing to FETCH_HEAD
during fetch, but did not suppress the "<source> -> FETCH_HEAD" message
when this ability is used. This message is misleading in this case,
because FETCH_HEAD is not written. Also, because "fetch" is used to
lazy-fetch missing objects in a partial clone, this significantly
clutters up the output in that case since the objects to be fetched are
potentially numerous.
Therefore, suppress this message when --no-write-fetch-head is passed
(but not when --dry-run is set).
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git restore/checkout --no-overlay" with wildcarded pathspec
mistakenly removed matching paths in subdirectories, which has been
corrected.
* rs/checkout-no-overlay-pathspec-fix:
checkout, restore: make pathspec recursive
Accesses to two pseudorefs have been updated to properly use ref
API.
* hn/refs-pseudorefs:
sequencer: treat REVERT_HEAD as a pseudo ref
builtin/commit: suggest update-ref for pseudoref removal
sequencer: treat CHERRY_PICK_HEAD as a pseudo ref
refs: make refs_ref_exists public
Long ago, we decided to use 3 threads by default when running the
index-pack task in parallel, which has been adjusted a bit upwards.
* jk/index-pack-w-more-threads:
index-pack: adjust default threading cap
p5302: count up to online-cpus for thread tests
p5302: disable thread-count parameter tests by default
The intention of `git init --separate-work-dir=<path>` is to move the
.git/ directory to a location outside of the main worktree. When used
within a linked worktree, however, rather than moving the .git/
directory as intended, it instead incorrectly moves the worktree's
.git/worktrees/<id> directory to <path>, thus disconnecting the linked
worktree from its parent repository and breaking the worktree in the
process since its local .git file no longer points at a location at
which it can find the object database. Fix this broken behavior.
An intentional side-effect of this change is that it also closes a
loophole not caught by ccf236a23a (init: disallow --separate-git-dir
with bare repository, 2020-08-09) in which the check to prevent
--separate-git-dir being used in conjunction with a bare repository was
unable to detect the invalid combination when invoked from within a
linked worktree. Therefore, add a test to verify that this loophole is
closed, as well.
Reported-by: Henré Botha <henrebotha@gmail.com>
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A linked worktree's .git file is a "gitfile" pointing at the
.git/worktrees/<id> directory within the repository. When `git init
--separate-git-dir=<path>` is used on an existing repository to relocate
the repository's .git/ directory to a different location, it neglects to
update the .git files of linked worktrees, thus breaking the worktrees
by making it impossible for them to locate the repository. Fix this by
teaching --separate-git-dir to repair the .git file of each linked
worktree to point at the new repository location.
Reported-by: Henré Botha <henrebotha@gmail.com>
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The .git/worktrees/<id>/gitdir file points at the location of a linked
worktree's .git file. Its content must be of the form
/path/to/worktree/.git (from which the location of the worktree itself
can be derived by stripping the "/.git" suffix). If the gitdir file is
deleted or becomes corrupted or outdated, then Git will be unable to
find the linked worktree. An easy way for the gitdir file to become
outdated is for the user to move the worktree manually (without using
"git worktree move"). Although it is possible to manually update the
gitdir file to reflect the new linked worktree location, doing so
requires a level of knowledge about worktree internals beyond what a
user should be expected to know offhand.
Therefore, teach "git worktree repair" how to repair broken or outdated
.git/worktrees/<id>/gitdir files automatically. (For this to work, the
command must either be invoked from within the worktree whose gitdir
file requires repair, or from within the main or any linked worktree by
providing the path of the broken worktree as an argument to "git
worktree repair".)
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The .git file in a linked worktree is a "gitfile" which points back to
the .git/worktrees/<id> entry in the main worktree or bare repository.
If a worktree's .git file is deleted or becomes corrupted or outdated,
then the linked worktree won't know how to find the repository or any of
its own administrative files (such as 'index', 'HEAD', etc.). An easy
way for the .git file to become outdated is for the user to move the
main worktree or bare repository. Although it is possible to manually
update each linked worktree's .git file to reflect the new repository
location, doing so requires a level of knowledge about worktree
internals beyond what a user should be expected to know offhand.
Therefore, teach "git worktree repair" how to repair broken or outdated
worktree .git files automatically. (For this to work, the command must
be invoked from within the main worktree or bare repository, or from
within a worktree which has not become disconnected from the repository
-- such as one which was created after the repository was moved.)
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Let's refactor code adding a new `write_in_file()` function
that opens a file for writing a message and closes it and a
wrapper for writing mode.
This helper will be used in later steps and makes the code
simpler and easier to understand.
Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Mentored-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Signed-off-by: Miriam Rubio <mirucam@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Following 'enum bisect_error' vocabulary, return variable 'res' is
always non-positive.
Let's use '-res' instead of 'abs(res)' to make the code clearer.
Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Miriam Rubio <mirucam@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In cmd_bisect__helper() function, if an invalid or no
subcommand is passed there is a BUG.
BUG() out instead of returning an error.
Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Mentored-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Signed-off-by: Miriam Rubio <mirucam@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When a repository has an alternate object directory configured, callers
can traverse through each alternate's MIDX by walking the '->next'
pointer.
But, when 'prepare_multi_pack_index_one()' loads multiple MIDXs, it
places the new ones at the front of this pointer chain, not at the end.
This can be confusing for callers such as 'git repack -ad', causing test
failures like in t7700.6 with 'GIT_TEST_MULTI_PACK_INDEX=1'.
The occurs when dropping a pack known to the local MIDX with alternates
configured that have their own MIDX. Since the alternate's MIDX is
returned via 'get_multi_pack_index()', 'midx_contains_pack()' returns
true (which is correct, since it traverses through the '->next' pointer
to find the MIDX in the chain that does contain the requested object).
But, we call 'clear_midx_file()' on 'the_repository', which drops the
MIDX at the path of the first MIDX in the chain, which (in the case of
t7700.6 is the one in the alternate).
This patch addresses that by:
- placing the local MIDX first in the chain when calling
'prepare_multi_pack_index_one()', and
- introducing a new 'get_local_multi_pack_index()', which explicitly
returns the repository-local MIDX, if any.
Don't impose an additional order on the MIDX's '->next' pointer beyond
that the first item in the chain must be local if one exists so that we
avoid a quadratic insertion.
Likewise, use 'get_local_multi_pack_index()' in
'remove_redundant_pack()' to fix the formerly broken t7700.6 when run
with 'GIT_TEST_MULTI_PACK_INDEX=1'.
Finally, note that the MIDX ordering invariant is only preserved by the
insertion order in 'prepare_packed_git()', which traverses through the
ODB's '->next' pointer, meaning we visit the local object store first.
This fragility makes this an undesirable long-term solution if more
callers are added, but it is acceptable for now since this is the only
caller.
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The positional arguments are specified in this order: "bad" then "good".
To avoid confusion, the options above the positional arguments
are now specified in the same order. They can still be specified in any
order since they're options, not positional arguments.
Signed-off-by: Hugo Locurcio <hugo.locurcio@hugo.pro>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Code clean-up.
* jk/leakfix:
submodule--helper: fix leak of core.worktree value
config: fix leak in git_config_get_expiry_in_days()
config: drop git_config_get_string_const()
config: fix leaks from git_config_get_string_const()
checkout: fix leak of non-existent branch names
submodule--helper: use strbuf_release() to free strbufs
clear_pattern_list(): clear embedded hashmaps
Add a new multi-valued config variable "receive.procReceiveRefs"
for `receive-pack` command, like the follows:
git config --system --add receive.procReceiveRefs refs/for
git config --system --add receive.procReceiveRefs refs/drafts
If the specific prefix strings given by the config variables match the
reference names of the commands which are sent from git client to
`receive-pack`, these commands will be executed by an external hook
(named "proc-receive"), instead of the internal `execute_commands`
function.
For example, if it is set to "refs/for", pushing to a reference such as
"refs/for/master" will not create or update reference "refs/for/master",
but may create or update a pull request directly by running the hook
"proc-receive".
Optional modifiers can be provided in the beginning of the value to
filter commands for specific actions: create (a), modify (m),
delete (d). A `!` can be included in the modifiers to negate the
reference prefix entry. E.g.:
git config --system --add receive.procReceiveRefs ad:refs/heads
git config --system --add receive.procReceiveRefs !:refs/heads
Signed-off-by: Jiang Xin <zhiyou.jx@alibaba-inc.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The new introduced "proc-receive" hook may handle a command for a
pseudo-reference with a zero-old as its old-oid, while the hook may
create or update a reference with different name, different new-oid,
and different old-oid (the reference may exist already with a non-zero
old-oid). Current "report-status" protocol cannot report the status for
such reference rewrite.
Add new capability "report-status-v2" and new report protocol which is
not backward compatible for report of git-push.
If a user pushes to a pseudo-reference "refs/for/master/topic", and
"receive-pack" creates two new references "refs/changes/23/123/1" and
"refs/changes/24/124/1", for client without the knowledge of
"report-status-v2", "receive-pack" will only send "ok/ng" directives in
the report, such as:
ok ref/for/master/topic
But for client which has the knowledge of "report-status-v2",
"receive-pack" will use "option" directives to report more attributes
for the reference given by the above "ok/ng" directive.
ok refs/for/master/topic
option refname refs/changes/23/123/1
option new-oid <new-oid>
ok refs/for/master/topic
option refname refs/changes/24/124/1
option new-oid <new-oid>
The client will report two new created references to the end user.
Suggested-by: Junio C Hamano <gitster@pobox.com>
Suggested-by: Jeff King <peff@peff.net>
Signed-off-by: Jiang Xin <zhiyou.jx@alibaba-inc.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When commands are fed to the "post-receive" hook, report options will
be parsed and the real old-oid, new-oid, reference name will feed to
the "post-receive" hook.
Signed-off-by: Jiang Xin <zhiyou.jx@alibaba-inc.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Git calls an internal `execute_commands` function to handle commands
sent from client to `git-receive-pack`. Regardless of what references
the user pushes, git creates or updates the corresponding references if
the user has write-permission. A contributor who has no
write-permission, cannot push to the repository directly. So, the
contributor has to write commits to an alternate location, and sends
pull request by emails or by other ways. We call this workflow as a
distributed workflow.
It would be more convenient to work in a centralized workflow like what
Gerrit provided for some cases. For example, a read-only user who
cannot push to a branch directly can run the following `git push`
command to push commits to a pseudo reference (has a prefix "refs/for/",
not "refs/heads/") to create a code review.
git push origin \
HEAD:refs/for/<branch-name>/<session>
The `<branch-name>` in the above example can be as simple as "master",
or a more complicated branch name like "foo/bar". The `<session>` in
the above example command can be the local branch name of the client
side, such as "my/topic".
We cannot implement a centralized workflow elegantly by using
"pre-receive" + "post-receive", because Git will call the internal
function "execute_commands" to create references (even the special
pseudo reference) between these two hooks. Even though we can delete
the temporarily created pseudo reference via the "post-receive" hook,
having a temporary reference is not safe for concurrent pushes.
So, add a filter and a new handler to support this kind of workflow.
The filter will check the prefix of the reference name, and if the
command has a special reference name, the filter will turn a specific
field (`run_proc_receive`) on for the command. Commands with this filed
turned on will be executed by a new handler (a hook named
"proc-receive") instead of the internal `execute_commands` function.
We can use this "proc-receive" command to create pull requests or send
emails for code review.
Suggested by Junio, this "proc-receive" hook reads the commands,
push-options (optional), and send result using a protocol in pkt-line
format. In the following example, the letter "S" stands for
"receive-pack" and letter "H" stands for the hook.
# Version and features negotiation.
S: PKT-LINE(version=1\0push-options atomic...)
S: flush-pkt
H: PKT-LINE(version=1\0push-options...)
H: flush-pkt
# Send commands from server to the hook.
S: PKT-LINE(<old-oid> <new-oid> <ref>)
S: ... ...
S: flush-pkt
# Send push-options only if the 'push-options' feature is enabled.
S: PKT-LINE(push-option)
S: ... ...
S: flush-pkt
# Receive result from the hook.
# OK, run this command successfully.
H: PKT-LINE(ok <ref>)
# NO, I reject it.
H: PKT-LINE(ng <ref> <reason>)
# Fall through, let 'receive-pack' to execute it.
H: PKT-LINE(ok <ref>)
H: PKT-LINE(option fall-through)
# OK, but has an alternate reference. The alternate reference name
# and other status can be given in options
H: PKT-LINE(ok <ref>)
H: PKT-LINE(option refname <refname>)
H: PKT-LINE(option old-oid <old-oid>)
H: PKT-LINE(option new-oid <new-oid>)
H: PKT-LINE(option forced-update)
H: ... ...
H: flush-pkt
After receiving a command, the hook will execute the command, and may
create/update different reference. For example, a command for a pseudo
reference "refs/for/master/topic" may create/update different reference
such as "refs/pull/123/head". The alternate reference name and other
status are given in option lines.
The list of commands returned from "proc-receive" will replace the
relevant commands that are sent from user to "receive-pack", and
"receive-pack" will continue to run the "execute_commands" function and
other routines. Finally, the result of the execution of these commands
will be reported to end user.
The reporting function from "receive-pack" to "send-pack" will be
extended in latter commit just like what the "proc-receive" hook reports
to "receive-pack".
Signed-off-by: Jiang Xin <zhiyou.jx@alibaba-inc.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The 'grep' check in test 4 of t7421 resulted in the failure of t7421 on
Windows due to a different error message
error: cannot spawn git: No such file or directory
instead of
fatal: exec 'rev-parse': cd to 'my-subm' failed: No such file or directory
Tighten up the check to compute 'src_abbrev' by guarding the
'verify_submodule_committish()' call using `p->status !='D'`, so that
the former isn't called in case of non-existent submodule directory,
consequently, there is no such error message on any execution
environment. The same need not be implemented for 'dst_abbrev' and is
rather redundant since the conditional 'if (S_ISGITLINK(p->mod_dst))'
already guards the 'verify_submodule_committish()' when we have a
status of 'D'.
Therefore, eliminate the 'grep' check in t7421. Instead, verify the
absence of an error message by doing a 'test_must_be_empty' on the
file containing the error.
Reported-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Helped-by: Kaartic Sivaraam <kaartic.sivaraam@gmail.com>
Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Mentored-by: Kaartic Sivaraam <kaartic.sivaraam@gmail.com>
Signed-off-by: Shourya Shukla <shouryashukla.oo@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Worktree administrative files can become corrupted or outdated due to
external factors. Although, it is often possible to recover from such
situations by hand-tweaking these files, doing so requires intimate
knowledge of worktree internals. While information necessary to make
such repairs manually can be obtained from git-worktree.txt and
gitrepository-layout.txt, we can assist users more directly by teaching
git-worktree how to repair its administrative files itself (at least to
some extent). Therefore, add a "git worktree repair" command which
attempts to correct common problems which may arise due to factors
beyond Git's control.
At this stage, the "repair" command is a mere skeleton; subsequent
commits will flesh out the functionality.
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We allocate a child_env strvec but never free its memory. Instead, let's
just use the strvec that our child_process struct provides, which is
cleaned up automatically when we run the command.
And while we're moving the initialization of the child_process around,
let's switch it to use the official init function (zero-initializing it
works OK, since strvec is happy enough with that, but it sets a bad
example).
Signed-off-by: Jeff King <peff@peff.net>
Reviewed-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In 525e18c04b (midx: clear midx on repack, 2018-07-12), 'git repack'
learned to remove a multi-pack-index file if it added or removed a pack
from the object store.
This mechanism is a little over-eager, since it is only necessary to
drop a MIDX if 'git repack' removes a pack that the MIDX references.
Adding a pack outside of the MIDX does not require invalidating the
MIDX, and likewise for removing a pack the MIDX does not know about.
Teach 'git repack' to check for this by loading the MIDX, and checking
whether the to-be-removed pack is known to the MIDX. This requires a
slightly odd alternation to a test in t5319, which is explained with a
comment. A new test is added to show that the MIDX is left alone when
both packs known to it are marked as .keep, but two packs unknown to it
are removed and combined into one new pack.
Helped-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The definitions of 'verify_submodule_committish()' and
'print_submodule_summary()' had wrong styling in terms of the asterisk
placement. Amend them.
Also, the warning printed in case of an unexpected file mode printed the
mode in decimal. Print it in octal for enhanced readability.
Reported-by: Kaartic Sivaraam <kaartic.sivaraam@gmail.com>
Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Mentored-by: Kaartic Sivaraam <kaartic.sivaraam@gmail.com>
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Shourya Shukla <shouryashukla.oo@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Eliminate the parameters 'missing_{src,dst}' from the
'print_submodule_summary()' function call since they are not used
anywhere in the function.
Reported-by: Jeff King <peff@peff.net>
Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Mentored-by: Kaartic Sivaraam <kaartic.sivaraam@gmail.com>
Signed-off-by: Shourya Shukla <shouryashukla.oo@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The purpose of "git init --separate-git-dir" is to initialize a
new project with the repository separate from the working tree,
or, in the case of an existing project, to move the repository
(the .git/ directory) out of the working tree. It does not make
sense to use --separate-git-dir with a bare repository for which
there is no working tree, so disallow its use with bare
repositories.
* es/init-no-separate-git-dir-in-bare:
init: disallow --separate-git-dir with bare repository
A subsequent commit will make the quantum of work smaller, necessitating
more locking. This commit allows resolve_delta() to be called outside
the lock.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This is refactoring 2 of 2 to simplify struct base_data.
Whenever we make a struct base_data, immediately calculate its delta
children. This eliminates confusion as to when the
{ref,ofs}_{first,last} fields are initialized.
Before this patch, the delta children were calculated at the last
possible moment. This allowed the members of struct base_data to be
populated in any order, superficially useful when we have the object
contents before the struct object_entry. But this makes reasoning about
the state of struct base_data more complicated, hence this patch.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This is refactoring 1 of 2 to simplify struct base_data.
In index-pack, each thread maintains a doubly-linked list of the delta
chain that it is currently processing (the "base" and "child" pointers
in struct base_data). When a thread exceeds the delta base cache limit
and needs to reclaim memory, it uses the "child" pointers to traverse
the lineage, reclaiming the memory of the eldest delta bases first.
A subsequent patch will perform memory reclaiming in a different way and
will thus no longer need the "child" pointer. Because the "child"
pointer is redundant even now, remove it so that the aforementioned
subsequent patch will be clearer. In the meantime, reclaim memory in the
reverse order of the "base" pointers.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
find_{ref,ofs}_delta_{,children} take an enum object_type parameter, but
the object type is already present in the name of the function. Remove
that parameter from these functions.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The pathspec given to git checkout and git restore is used with both
tree_entry_interesting (via read_tree_recursive) and match_pathspec
(via ce_path_match). The latter effectively only supports recursive
matching regardless of the value of the pathspec flag "recursive",
which is unset here.
That causes different match results for pathspecs with wildcards, and
can lead checkout and restore in no-overlay mode to remove entries
instead of modifying them. Enable recursive matching for both checkout
and restore to make matching consistent.
Setting the flag in checkout_main() technically also affects git switch,
but since that command doesn't accept pathspecs at all this has no
actual consequence.
Reported-by: Sergii Shkarnikov <sergii.shkarnikov@globallogic.com>
Initial-test-by: Sergii Shkarnikov <sergii.shkarnikov@globallogic.com>
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commit b8a2486f15 (index-pack: support multithreaded delta resolving,
2012-05-06) describes an experiment that shows that setting the number
of threads for index-pack higher than 3 does not help.
I repeated that experiment using a more modern version of Git and a more
modern CPU and got different results.
Here are timings for p5302 against linux.git run on my laptop, a Core
i9-9880H with 8 cores plus hyperthreading (so online-cpus returns 16):
5302.3: index-pack 0 threads 256.28(253.41+2.79)
5302.4: index-pack 1 threads 257.03(254.03+2.91)
5302.5: index-pack 2 threads 149.39(268.34+3.06)
5302.6: index-pack 4 threads 94.96(294.10+3.23)
5302.7: index-pack 8 threads 68.12(339.26+3.89)
5302.8: index-pack 16 threads 70.90(655.03+7.21)
5302.9: index-pack default number of threads 116.91(290.05+3.21)
You can see that wall-clock times continue to improve dramatically up to
the number of cores, but bumping beyond that (into hyperthreading
territory) does not help (and in fact hurts a little).
Here's the same experiment on a machine with dual Xeon 6230's, totaling
40 cores (80 with hyperthreading):
5302.3: index-pack 0 threads 310.04(302.73+6.90)
5302.4: index-pack 1 threads 310.55(302.68+7.40)
5302.5: index-pack 2 threads 178.17(304.89+8.20)
5302.6: index-pack 5 threads 99.53(315.54+9.56)
5302.7: index-pack 10 threads 72.80(327.37+12.79)
5302.8: index-pack 20 threads 60.68(357.74+21.66)
5302.9: index-pack 40 threads 58.07(454.44+67.96)
5302.10: index-pack 80 threads 59.81(720.45+334.52)
5302.11: index-pack default number of threads 134.18(309.32+7.98)
The results are similar; things stop improving at 40 threads. Curiously,
going from 20 to 40 really doesn't help much, either (and increases CPU
time considerably). So that may represent an actual barrier to
parallelism, where we lose out due to context-switching and loss of
cache locality, but don't reap the wall-clock benefits due to contention
of our coarse-grained locks.
So what's a good default value? It's clear that the current cap of 3 is
too low; our default values are 42% and 57% slower than the best times
on each machine. The results on the 40-core machine imply that 20
threads is an actual barrier regardless of the number of cores, so we'll
take that as a maximum. We get the best results on these machines at
half of the online-cpus value. That's presumably a result of the
hyperthreading. That's common on multi-core Intel processors, but not
necessarily elsewhere. But if we take it as an assumption, we can
perform optimally on hyperthreaded machines and still do much better
than the status quo on other machines, as long as we never half below
the current value of 3.
So that's what this patch does.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When pseudorefs move to a different ref storage mechanism, pseudorefs no longer
can be removed with 'rm'. Instead, suggest a "update-ref -d" command, which will
work regardless of ref storage backend.
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Check for existence and delete CHERRY_PICK_HEAD through ref functions.
This will help cherry-pick work with alternate ref storage backends.
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A few end-user facing messages have been updated to be
hash-algorithm agnostic.
* jc/object-names-are-not-sha-1:
messages: avoid SHA-1 in end-user facing messages
The previous commit introduced --ignore-date flag to rebase -i, but the
name is rather vague as it does not say whether the author date or the
committer date is ignored. Add an alias to convey the precise purpose.
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Rohit Ashiwal <rohit.ashiwal265@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Rebase is implemented with two different backends - 'apply' and
'merge' each of which support a different set of options. In
particular the apply backend supports a number of options implemented
by 'git am' that are not implemented in the merge backend. This means
that the available options are different depending on which backend is
used which is confusing. This patch adds support for the --ignore-date
option to the merge backend. This option uses the current time as the
author date rather than reusing the original author date when
rewriting commits. We take care to handle the combination of
--ignore-date and --committer-date-is-author-date in the same way as
the apply backend.
Original-patch-by: Rohit Ashiwal <rohit.ashiwal265@gmail.com>
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The dir structure seemed to have a number of leaks and problems around
it. First I noticed that parent_hashmap and recursive_hashmap were
being leaked (though Peff noticed and submitted fixes before me). Then
I noticed in the previous commit that clear_directory() was only taking
responsibility for a subset of fields within dir_struct, despite the
fact that entries[] and ignored[] we allocated internally to dir.c.
That, of course, resulted in many callers either leaking or haphazardly
trying to free these arrays and their contents.
Digging further, I found that despite the pretty clear documentation
near the top of dir.h that folks were supposed to call clear_directory()
when the user no longer needed the dir_struct, there were four callers
that didn't bother doing that at all. However, two of them clearly
thought about leaks since they had an UNLEAK(dir) directive, which to me
suggests that the method to free the data was too unclear. I suspect
the non-obviousness of the API and its holes led folks to avoid it,
which then snowballed into further problems with the entries[],
ignored[], parent_hashmap, and recursive_hashmap problems.
Rename clear_directory() to dir_clear() to be more in line with other
data structures in git, and introduce a dir_init() to handle the
suggested memsetting of dir_struct to all zeroes. I hope that a name
like "dir_clear()" is more clear, and that the presence of dir_init()
will provide a hint to those looking at the code that they need to look
for either a dir_clear() or a dir_free() and lead them to find
dir_clear().
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The calling convention for the dir API is supposed to end with a call to
clear_directory() to free up no longer needed memory. However,
clear_directory() didn't free dir->entries or dir->ignored. I believe
this was an oversight, but a number of callers noticed memory leaks and
started free'ing these. Unfortunately, they did so somewhat haphazardly
(sometimes freeing the entries in the arrays, and sometimes only
free'ing the arrays themselves). This suggests the callers weren't
trying to make sure any possible memory used might be free'd, but just
the memory they noticed their usecase definitely had allocated.
Fix this mess by moving all the duplicated free'ing logic into
clear_directory(). End by resetting dir to a pristine state so it could
be reused if desired.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Now that Git has switched to using a subprocess to lazy-fetch missing
objects, remove the no_dependents code as it is no longer used.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In "fetch", get_ref_map() iterates over all refs to populate
"existing_refs" in order to populate peer_ref->old_oid in the returned
refmap, even if the refmap has no peer_ref set - which is the case when
only literal hashes (i.e. no refs by name) are fetched.
Iterating over refs causes the targets of those refs to be checked for
existence. Avoiding this is especially important when we use "git fetch"
to perform lazy fetches in a partial clone because a target of such a
ref may need to be itself lazy-fetched (and otherwise causing an
infinite loop).
Therefore, avoid populating "existing_refs" until necessary. With this
patch, because Git lazy-fetches objects by literal hashes (to be done in
a subsequent commit), it will then be able to guarantee avoiding reading
targets of refs.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In "fetch", there are two parameters submodule_fetch_jobs_config and
recurse_submodules that can be set in a variety of ways: through
.gitmodules, through .git/config, and through the command line.
Currently "fetch" handles this by first reading .gitmodules, then
reading .git/config (allowing it to overwrite existing values), then
reading the command line (allowing it to overwrite existing values).
Notice that we can avoid reading .gitmodules if .git/config and/or the
command line already provides us with what we need. In addition, if
recurse_submodules is found to be "no", we do not need the value of
submodule_fetch_jobs_config.
Avoiding reading .gitmodules is especially important when we use "git
fetch" to perform lazy fetches in a partial clone because the
.gitmodules file itself might need to be lazy fetched (and otherwise
causing an infinite loop).
In light of all this, avoid reading .gitmodules until necessary. When
reading it, we may only need one of the two parameters it provides, so
teach fetch_config_from_gitmodules() to support NULL arguments. With
this patch, users (including Git itself when invoking "git fetch" to
lazy-fetch) will be able to guarantee avoiding reading .gitmodules by
passing --recurse-submodules=no.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In a subsequent patch, partial clones will be taught to fetch missing
objects using a "git fetch" subprocess. Because the number of objects
fetched may be too numerous to fit on the command line, teach "fetch" to
accept refspecs passed through stdin.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If you run fetch but record the result in remote-tracking branches,
and either if you do nothing with the fetched refs (e.g. you are
merely mirroring) or if you always work from the remote-tracking
refs (e.g. you fetch and then merge origin/branchname separately),
you can get away with having no FETCH_HEAD at all.
Teach "git fetch" a command line option "--[no-]write-fetch-head".
The default is to write FETCH_HEAD, and the option is primarily
meant to be used with the "--no-" prefix to override this default,
because there is no matching fetch.writeFetchHEAD configuration
variable to flip the default to off (in which case, the positive
form may become necessary to defeat it).
Note that under "--dry-run" mode, FETCH_HEAD is never written;
otherwise you'd see list of objects in the file that you do not
actually have. Passing `--write-fetch-head` does not force `git
fetch` to write the file.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git log --first-parent -p" showed patches only for single-parent
commits on the first-parent chain; the "--first-parent" option has
been made to imply "-m". Use "--no-diff-merges" to restore the
previous behaviour to omit patches for merge commits.
* jk/log-fp-implies-m:
doc/git-log: clarify handling of merge commit diffs
doc/git-log: move "-t" into diff-options list
doc/git-log: drop "-r" diff option
doc/git-log: move "Diff Formatting" from rev-list-options
log: enable "-m" automatically with "--first-parent"
revision: add "--no-diff-merges" option to counteract "-m"
log: drop "--cc implies -m" logic
"git bisect" learns the "--first-parent" option to find the first
breakage along the first-parent chain.
* al/bisect-first-parent:
bisect: combine args passed to find_bisection()
bisect: introduce first-parent flag
cmd_bisect__helper: defer parsing no-checkout flag
rev-list: allow bisect and first-parent flags
t6030: modernize "git bisect run" tests
In the ensure_core_worktree() function, we load the core.worktree value
of the submodule repository using repo_config_get_string(). This
function copies the string, but we never free it, leaking the memory.
We can instead use the "tmp" version of that function to avoid the
allocation at all. We don't have to worry about lifetime issues, since
we never even look at the value (we just want to know if it's set).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Rebase is implemented with two different backends - 'apply' and
'merge' each of which support a different set of options. In
particular the apply backend supports a number of options implemented
by 'git am' that are not implemented in the merge backend. This means
that the available options are different depending on which backend is
used which is confusing. This patch adds support for the
--committer-date-is-author-date option to the merge backend. This
option uses the author date of the commit that is being rewritten as
the committer date when the new commit is created.
Original-patch-by: Rohit Ashiwal <rohit.ashiwal265@gmail.com>
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The implementation of --committer-date-is-author-date exports
GIT_COMMITTER_DATE to override the default committer date but does not
reset GIT_COMMITTER_DATE in the environment after creating the commit
so it is set in the environment of any hooks that get run. We're about
to add the same functionality to the sequencer and do not want to have
GIT_COMMITTER_DATE set when running hooks or exec commands so lets
update commit_tree_extended() to take an explicit committer so we
override the default date without setting GIT_COMMITTER_DATE in the
environment.
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A couple of functions that used struct refspec_item did not zero out the
structure memory. This can result in unexpected behavior, especially if
additional parameters are ever added to refspec_item in the future. Use
memset to ensure that unset structure members are zero.
It may make sense to convert most of these uses of struct refspec_item
to use either struct initializers or refspec_item_init_or_die. However,
other similar code uses memset. Converting all of these uses has been
left as a future exercise.
Signed-off-by: Jacob Keller <jacob.keller@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There are two functions to get a single config string:
- git_config_get_string()
- git_config_get_string_const()
One might naively think that the first one allocates a new string and
the second one just points us to the internal configset storage. But
in fact they both allocate a new copy; the second one exists only to
avoid having to cast when using it with a const global which we never
intend to free.
The documentation for the function explains that clearly, but it seems
I'm not alone in being surprised by this. Of 17 calls to the function,
13 of them leak the resulting value.
We could obviously fix these by adding the appropriate free(). But it
would be simpler still if we actually had a non-allocating way to get
the string. There's git_config_get_value() but that doesn't quite do
what we want. If the config key is present but is a boolean with no
value (e.g., "[foo]bar" in the file), then we'll get NULL (whereas the
string versions will print an error and die).
So let's introduce a new variant, git_config_get_string_tmp(), that
behaves as these callers expect. We need a new name because we have new
semantics but the same function signature (so even if we converted the
four remaining callers, topics in flight might be surprised). The "tmp"
is because this value should only be held onto for a short time. In
practice it's rare for us to clear and refresh the configset,
invalidating the pointer, but hopefully the "tmp" makes callers think
about the lifetime. In each of the converted cases here the value only
needs to last within the local function or its immediate caller.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We unconditionally write a branch name into a newly allocated buffer in
new_branch_info->path, via setup_branch_path(). We then check to see if
the branch exists; if not, we set that field to NULL, leaking the
memory. We should take care to free() it when doing so.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The prepare_to_clone_next_submodule() function has a few local-variable
strbufs. We use strbuf_reset() throughout the function to reuse the
buffers over and over. But at the end of the function we also use
strbuf_reset() as they go out of scope, which means we end up leaking
their heap buffers. This should be strbuf_release() instead.
These were introduced by 48308681b0 (git submodule update: have a
dedicated helper for cloning, 2016-02-29), but it doesn't seem to have
the same mistake elsewhere. Likewise, I looked for other instances of
the pattern in the submodule--helper file but couldn't find any.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There are still a handful mentions of SHA-1 when we meant the
(hexadecimal) object names in end-user facing messages. Rewrite
them.
I was hoping that this can mostly be s/SHA-1/object name/, but
a few messages needed rephrasing to keep the result readable.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A new helper function has_object() has been introduced to make it
easier to mark object existence checks that do and don't want to
trigger lazy fetches, and a few such checks are converted using it.
* jt/has_object:
fsck: do not lazy fetch known non-promisor object
pack-objects: no fetch when allow-{any,promisor}
apply: do not lazy fetch when applying binary
sha1-file: introduce no-lazy-fetch has_object()
We UNLEAK() the "sorting" list created by parsing command-line options
(which is essentially used until the program exits). But we do so right
before leaving the cmd_ls_remote() function, which means we have to hit
all of the exits. But the point of UNLEAK() is that it's an annotation
which doesn't impact the variable itself. We can mark it as soon as
we're done writing its value, and then we only have to do so once.
This gives us a minor code reduction, and serves as a better example of
how UNLEAK() can be used.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There's no reason that git-fast-import benefits from being a separate
binary. And as it links against libgit.a, it has a non-trivial disk
footprint. Let's make it a builtin, which reduces the size of a stripped
installation from 22MB to 21MB.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There's no reason that bugreport has to be a separate binary. And since
it links against libgit.a, it has a rather large disk footprint. Let's
make it a builtin, which reduces the size of a stripped installation
from 24MB to 22MB.
This also simplifies our Makefile a bit. And we can take advantage of
builtin niceties like RUN_SETUP_GENTLY.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There's no real reason for credential helpers to be separate binaries. I
did them this way originally under the notion that helper don't _need_
to be part of Git, and so can be built totally separately (and indeed,
the ones in contrib/credential are). But the ones in our main Makefile
build on libgit.a, and the resulting binaries are reasonably large.
We can slim down our total disk footprint by just making them builtins.
This reduces the size of:
make strip install
from 29MB to 24MB on my Debian system.
Note that credential-cache can't operate without support for Unix
sockets. Currently we just don't build it at all when NO_UNIX_SOCKETS is
set. We could continue that with conditionals in the Makefile and our
list of builtins. But instead, let's build a dummy implementation that
dies with an informative message. That has two advantages:
- it's simpler, because the conditional bits are all kept inside
the credential-cache source
- a user who is expecting it to exist will be told _why_ they can't
use it, rather than getting the "credential-cache is not a git
command" error which makes it look like the Git install is broken.
Note that our dummy implementation does still respond to "-h" in order
to appease t0012 (and this may be a little friendlier for users, as
well).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Convert submodule subcommand 'summary' to a builtin and call it via
'git-submodule.sh'.
The shell version had to call $diff_cmd twice, once to find the modified
modules cared by the user and then again, with that list of modules
to do various operations for computing the summary of those modules.
On the other hand, the C version does not need a second call to
$diff_cmd since it reuses the module list from the first call to do the
aforementioned tasks.
In the C version, we use the combination of setting a child process'
working directory to the submodule path and then calling
'prepare_submodule_repo_env()' which also sets the 'GIT_DIR' to '.git',
so that we can be certain that those spawned processes will not access
the superproject's ODB by mistake.
A behavioural difference between the C and the shell version is that the
shell version outputs two line feeds after the 'git log' output when run
outside of the tests while the C version outputs one line feed in any
case. The reason for this is that the shell version calls log with
'--pretty=format:<fmt>' whose output is followed by two echo
calls; 'format' does not have "terminator" semantics like its 'tformat'
counterpart. So, the log output is terminated by a newline only when
invoked by the user and not when invoked from the scripts. This results
in the one & two line feed differences in the shell version.
On the other hand, the C version calls log with '--pretty=<fmt>'
which is equivalent to '--pretty:tformat:<fmt>' which is then
followed by a 'printf("\n")'. Due to its "terminator" semantics the
log output is always terminated by newline and hence one line feed in
any case.
Also, when we try to pass an option-like argument after a non-option
argument, for instance:
git submodule summary HEAD --foo-bar
(or)
git submodule summary HEAD --cached
That argument would be treated like a path to the submodule for which
the user is requesting a summary. So, the option ends up having no
effect. Though, passing '--quiet' is an exception to this:
git submodule summary HEAD --quiet
While 'summary' doesn't support '--quiet', we don't get an output for
the above command as '--quiet' is treated as a path which means we get
an output only if a submodule whose path is '--quiet' exists.
The error message in case of computing a summary for non-existent
submodules in the C version is different from that of the shell version.
Since the new error message is not marked for translation, change the
'test_i18ngrep' in t7421.4 to 'grep'.
Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Mentored-by: Stefan Beller <stefanbeller@gmail.com>
Mentored-by: Kaartic Sivaraam <kaartic.sivaraam@gmail.com>
Helped-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Signed-off-by: Prathamesh Chavan <pc44800@gmail.com>
Signed-off-by: Shourya Shukla <shouryashukla.oo@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Many `submodule--helper` subcommands follow the convention that a struct
defines their callback data, and the declaration of that struct is
followed immediately by a macro to use in static initializers, without
any separating empty line.
Let's align the `init`, `status` and `sync` subcommands with that convention.
Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Mentored-by: Kaartic Sivaraam <kaartic.sivaraam@gmail.com>
Helped-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Helped-by: Philip Oakley <philipoakley@iee.email>
Signed-off-by: Shourya Shukla <shouryashukla.oo@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The final leg of SHA-256 transition.
* bc/sha-256-part-3: (39 commits)
t: remove test_oid_init in tests
docs: add documentation for extensions.objectFormat
ci: run tests with SHA-256
t: make SHA1 prerequisite depend on default hash
t: allow testing different hash algorithms via environment
t: add test_oid option to select hash algorithm
repository: enable SHA-256 support by default
setup: add support for reading extensions.objectformat
bundle: add new version for use with SHA-256
builtin/verify-pack: implement an --object-format option
http-fetch: set up git directory before parsing pack hashes
t0410: mark test with SHA1 prerequisite
t5308: make test work with SHA-256
t9700: make hash size independent
t9500: ensure that algorithm info is preserved in config
t9350: make hash size independent
t9301: make hash size independent
t9300: use $ZERO_OID instead of hard-coded object ID
t9300: abstract away SHA-1-specific constants
t8011: make hash size independent
...
Update "git help guides" documentation organization.
* pb/guide-docs:
git.txt: add list of guides
Documentation: don't hardcode command categories twice
help: drop usage of 'common' and 'useful' for guides
command-list.txt: add missing 'gitcredentials' and 'gitremote-helpers'
All "mergy" operations that internally use the merge-recursive
machinery should honor the merge.renormalize configuration, but
many of them didn't.
* en/eol-attrs-gotchas:
checkout: support renormalization with checkout -m <paths>
merge: make merge.renormalize work for all uses of merge machinery
t6038: remove problematic test
t6038: make tests fail for the right reason
The argv_array API is useful for not just managing argv but any
"vector" (NULL-terminated array) of strings, and has seen adoption
to a certain degree. It has been renamed to "strvec" to reduce the
barrier to adoption.
* jk/strvec:
strvec: rename struct fields
strvec: drop argv_array compatibility layer
strvec: update documention to avoid argv_array
strvec: fix indentation in renamed calls
strvec: convert remaining callers away from argv_array name
strvec: convert more callers away from argv_array name
strvec: convert builtin/ callers away from argv_array name
quote: rename sq_dequote_to_argv_array to mention strvec
strvec: rename files from argv-array to strvec
argv-array: rename to strvec
argv-array: use size_t for count and alloc
The purpose of "git init --separate-git-dir" is to separate the
repository from the worktree. This is true even when --separate-git-dir
is used on an existing worktree, in which case, it moves the .git/
subdirectory to a new location outside the worktree.
However, an outright bare repository (such as one created by "git init
--bare"), has no worktree, so using --separate-git-dir to separate it
from its non-existent worktree is nonsensical. Therefore, make it an
error to use --separate-git-dir on a bare repository.
Implementation note: "git init" considers a repository bare if told so
explicitly via --bare or if it guesses it to be so based upon
heuristics. In the explicit --bare case, a conflict with
--separate-git-dir is easy to detect early. In the guessed case,
however, the conflict can only be detected once "bareness" is guessed,
which happens after "git init" has begun creating the repository.
Technically, we can get by with a single late check which would cover
both cases, however, erroring out early, when possible, without leaving
detritus provides a better user experience.
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Now that find_bisection() accepts multiple boolean arguments, these may
be combined into a single unsigned integer in order to declutter some of
the code in bisect.c
Also, rename the existing "flags" bitfield to "commit_flags", to
explicitly differentiate it from the new "bisect_flags" bitfield.
Based-on-patch-by: Harald Nordgren <haraldnordgren@gmail.com>
Signed-off-by: Aaron Lipman <alipman88@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Upon seeing a merge commit when bisecting, this option may be used to
follow only the first parent.
In detecting regressions introduced through the merging of a branch, the
merge commit will be identified as introduction of the bug and its
ancestors will be ignored.
This option is particularly useful in avoiding false positives when a
merged branch contained broken or non-buildable commits, but the merge
itself was OK.
Signed-off-by: Aaron Lipman <alipman88@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
cmd_bisect__helper() is intended as a temporary shim layer serving as an
interface for git-bisect.sh. This function and git-bisect.sh should
eventually be replaced by a C implementation, cmd_bisect(), serving as
an entrypoint for all "git bisect ..." shell commands: cmd_bisect() will
only parse the first token following "git bisect", and dispatch the
remaining args to the appropriate function ["bisect_start()",
"bisect_next()", etc.].
Thus, cmd_bisect__helper() should not be responsible for parsing flags
like --no-checkout. Instead, let the --no-checkout flag remain in the
argv array, so it may be evaluated alongside the other options already
parsed by bisect_start().
Signed-off-by: Aaron Lipman <alipman88@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add first_parent_only parameter to find_bisection(), removing the
barrier that prevented combining the --bisect and --first-parent flags
when using git rev-list
Based-on-patch-by: Tiago Botelho <tiagonbotelho@hotmail.com>
Signed-off-by: Aaron Lipman <alipman88@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There is a call to has_object_file(), which lazily fetches missing
objects in a partial clone, when the object is known to not be
a promisor object. Change that call to has_object(), which does not do
any lazy fetching.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The options --missing=allow-{any,promisor} were introduced in caf3827e2f
("rev-list: add list-objects filtering support", 2017-11-22) with the
following note in the commit message:
This patch introduces handling of missing objects to help
debugging and development of the "partial clone" mechanism,
and once the mechanism is implemented, for a power user to
perform operations that are missing-object aware without
incurring the cost of checking if a missing link is expected.
The idea that these options are missing-object aware (and thus do not
need to lazily fetch objects, unlike unaware commands that assume that
all objects are present) are assumed in later commits such as 07ef3c6604
("fetch test: use more robust test for filtered objects", 2020-01-15).
However, the current implementations of these options use
has_object_file(), which indeed lazily fetches missing objects. Teach
these implementations not to do so. Also, update the documentation of
these options to be clearer.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since 1b81d8cb19 (help: use command-list.txt for the source of guides,
2018-05-20), all man5/man7 guides listed in command-list.txt appear in
the output of 'git help -g'.
However, 'git help -g' still prefixes this list with "The common Git
guides are:", which makes one wonder if there are others!
In the same spirit, the man page for 'git help' describes the '--guides'
option as listing 'useful' guides, which is not false per se but can
also be taken to mean that there are other guides that exist but are not
useful.
Instead of 'common' and 'useful', use 'Git concept guides' in both
places. To keep the code in line with this change, rename
help.c::list_common_guides_help to list_guides_help.
Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
While packing many objects in a repository with a promissor remote,
lazily fetching missing objects from the promissor remote one by
one may be inefficient---the code now attempts to fetch all the
missing objects in batch (obviously this won't work for a lazy
clone that lazily fetches tree objects as you cannot even enumerate
what blobs are missing until you learn which trees are missing).
* jt/pack-objects-prefetch-in-batch:
pack-objects: prefetch objects to be packed
pack-objects: refactor to oid_object_info_extended
The 'merge' command is not the only one that does merges; other commands
like checkout -m or rebase do as well. Unfortunately, the only area of
the code that checked for the "merge.renormalize" config setting was in
builtin/merge.c, meaning it could only affect merges performed by the
"merge" command. Move the handling of this config setting to
merge_recursive_config() so that other commands can benefit from it as
well. Fixes a few tests in t6038.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "argc" and "argv" names made sense when the struct was argv_array,
but now they're just confusing. Let's rename them to "nr" (which we use
for counts elsewhere) and "v" (which is rather terse, but reads well
when combined with typical variable names like "args.v").
Note that we have to update all of the callers immediately. Playing
tricks with the preprocessor is hard here, because we wouldn't want to
rewrite unrelated tokens.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git mv src dst", when src is an unmerged path, errored out
correctly but with an incorrect error message to claim that src is
not tracked, which has been clarified.
* ct/mv-unmerged-path-error:
git-mv: improve error message for conflicted file
Preliminary clean-up of the refs API in preparation for adding a
new refs backend "reftable".
* hn/reftable:
reflog: cleanse messages in the refs.c layer
bisect: treat BISECT_HEAD as a pseudo ref
t3432: use git-reflog to inspect the reflog for HEAD
lib-t6000.sh: write tag using git-update-ref
"git clone --separate-git-dir=$elsewhere" used to stomp on the
contents of the existing directory $elsewhere, which has been
taught to fail when $elsewhere is not an empty directory.
* bw/fail-cloning-into-non-empty:
git clone: don't clone into non-empty directory
Updates to the changed-paths bloom filter.
* ds/commit-graph-bloom-updates:
commit-graph: check all leading directories in changed path Bloom filters
revision: empty pathspecs should not use Bloom filters
revision.c: fix whitespace
commit-graph: check chunk sizes after writing
commit-graph: simplify chunk writes into loop
commit-graph: unify the signatures of all write_graph_chunk_*() functions
commit-graph: persist existence of changed-paths
bloom: fix logic in get_bloom_filter()
commit-graph: change test to die on parse, not load
commit-graph: place bloom_settings in context
Now that we have a complete SHA-256 implementation in Git, let's enable
it so people can use it. Remove the ENABLE_SHA256 define constant
everywhere it's used. Add tests for initializing a repository with
SHA-256.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Reviewed-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Currently we detect the hash algorithm in use by the length of the
object ID. This is inelegant and prevents us from using a different
hash algorithm that is also 256 bits in length.
Since we cannot extend the v2 format in a backward-compatible way, let's
add a v3 format, which is identical, except for the addition of
capabilities, which are prefixed by an at sign. We add "object-format"
as the only capability and reject unknown capabilities, since we do not
have a network connection and therefore cannot negotiate with the other
side.
For compatibility, default to the v2 format for SHA-1 and require v3
for SHA-256.
In t5510, always use format v3 so we can be sure we produce consistent
results across hash algorithms. Since head -n N lists the top N lines
instead of the Nth line, let's run our output through sed to normalize
it and compare it against a fixed value, which will make sure we get
exactly what we're expecting.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Reviewed-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A recently added test in t5702 started using git verify-pack outside of
a repository. While this poses no problems with SHA-1, with SHA-256 we
implicitly rely on the setup of the repository to initialize our hash
algorithm settings.
Since we're not in a repository here, we need to provide git verify-pack
help to set things up properly. git index-pack already knows an
--object-format option, so let's accept one as well and pass it down to
our git index-pack invocation. Since we're now dynamically adjusting
the elements in argv, let's switch to using struct argv_array to manage
them. Finally, let's make t5702 pass the proper argument on down to its
git verify-pack caller.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Reviewed-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When using "--first-parent" to consider history as a single line of
commits, git-log still defaults to treating merges specially, even
though they could be considered as single commits in the linearized
history (that just introduce all of the changes from the second and
higher parents).
Let's instead have "--first-parent" imply "-m", which makes something
like:
git log --first-parent -p
do what you'd expect. Likewise:
git log --first-parent -Sfoo
will find "foo" in merge commits.
No new test is needed; we'll tweak the output of the existing
"--first-parent -p" test, which now matches the "-m --first-parent -p"
test. The unchanged existing test for "--no-diff-merges" confirms that
the user can get the old behavior if they want.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "-m" option sets revs->ignore_merges to "0", but there's no way to
undo it. This probably isn't something anybody overly cares about, since
"1" is already the default, but it will serve as an escape hatch when we
flip the default for ignore_merges to "0" in more situations.
We'll also add a few extra niceties:
- initialize the value to "-1" to indicate "not set", and then resolve
it to the normal 0/1 bool in setup_revisions(). This lets any tweak
functions, as well as setup_revisions() itself, avoid clobbering the
user's preference (which until now they couldn't actually express).
- since we now have --no-diff-merges, let's add the matching
--diff-merges, which is just a synonym for "-m". Then we don't even
need to document --no-diff-merges separately; it countermands the
long form of "-m" in the usual way.
The new test shows that this behaves just the same as the current
behavior without "-m".
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This was added by 82dee4160c (log: show merge commit when --cc is given,
2015-08-20), which explains why we need it. But that commit failed to
notice that setup_revisions() already does the same thing, since
cd2bdc5309 (Common option parsing for "git log --diff" and friends,
2006-04-14).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
parse_object_or_die() is passed an object ID and a name to show if the
object cannot be parsed. If the name is NULL then it shows the
hexadecimal object ID. Use that feature instead of preparing and
passing the hexadecimal representation to the function proactively.
That's shorter and a bit more efficient.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Code which split an argv_array call across multiple lines, like:
argv_array_pushl(&args, "one argument",
"another argument", "and more",
NULL);
was recently mechanically renamed to use strvec, which results in
mis-matched indentation like:
strvec_pushl(&args, "one argument",
"another argument", "and more",
NULL);
Let's fix these up to align the arguments with the opening paren. I did
this manually by sifting through the results of:
git jump grep 'strvec_.*,$'
and liberally applying my editor's auto-format. Most of the changes are
of the form shown above, though I also normalized a few that had
originally used a single-tab indentation (rather than our usual style of
aligning with the open paren). I also rewrapped a couple of obvious
cases (e.g., where previously too-long lines became short enough to fit
on one), but I wasn't aggressive about it. In cases broken to three or
more lines, the grouping of arguments is sometimes meaningful, and it
wasn't worth my time or reviewer time to ponder each case individually.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We eventually want to drop the argv_array name and just use strvec
consistently. There's no particular reason we have to do it all at once,
or care about interactions between converted and unconverted bits.
Because of our preprocessor compat layer, the names are interchangeable
to the compiler (so even a definition and declaration using different
names is OK).
This patch converts all of the files in builtin/ to keep the diff to a
manageable size.
The conversion was done purely mechanically with:
git ls-files '*.c' '*.h' |
xargs perl -i -pe '
s/ARGV_ARRAY/STRVEC/g;
s/argv_array/strvec/g;
'
and then selectively staging files with "git add builtin/". We'll deal
with any indentation/style fallouts separately.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We want to eventually drop the use of the "argv_array" name in favor of
"strvec." Unlike most other uses of the name, this one is embedded in a
function name, so the definition and all of the callers need to be
updated at the same time.
We don't technically need to update the parameter types here (our
preprocessor compat macros make the two names interchangeable), but
let's do so to keep the site consistent for now.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This requires updating #include lines across the code-base, but that's
all fairly mechanical, and was done with:
git ls-files '*.c' '*.h' |
xargs perl -i -pe 's/argv-array.h/strvec.h/'
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When an object to be packed is noticed to be missing, prefetch all
to-be-packed objects in one batch.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Use oid_object_info_extended() instead of oid_object_info() because a
subsequent commit needs to specify an additional flag here.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
'git mv' has always complained about renaming a conflicted
file, as it cannot handle multiple index entries for one file.
However, the error message it uses has been the same as the
one for an untracked file:
fatal: not under version control, src=...
which is patently wrong. Distinguish the two cases and
add a test to make sure we produce the correct message.
Signed-off-by: Chris Torek <chris.torek@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Fix to the code to produce progress bar, which is new in the
upcoming release.
* tb/commit-graph-no-check-oids:
commit-graph: fix "Collecting commits from input" progress line
To display a progress line while reading commits from standard input
and looking them up, 5b6653e523 (builtin/commit-graph.c: dereference
tags in builtin, 2020-05-13) should have added a pair of
start_delayed_progress() and stop_progress() calls around the loop
reading stdin. Alas, the stop_progress() call ended up at the wrong
place, after write_commit_graph(), which does all the commit-graph
computation and writing, and has several progress lines of its own.
Consequently, that new
Collecting commits from input: 1234
progress line is overwritten by the first progress line shown by
write_commit_graph(), and its final "done" line is shown last, after
everything is finished:
$ { sleep 3 ; git rev-list -3 HEAD ; sleep 1 ; } | ~/src/git/git commit-graph write --stdin-commits
Expanding reachable commits in commit graph: 873402, done.
Writing out commit graph in 4 passes: 100% (3493608/3493608), done.
Collecting commits from input: 3, done.
Furthermore, that stop_progress() call was added after the 'cleanup'
label, where that loop reading stdin jumps in case of an error. In
case of invalid input this then results in the "done" line shown after
the error message:
$ { sleep 3 ; git rev-list -3 HEAD ; echo junk ; } | ~/src/git/git commit-graph write --stdin-commits
error: unexpected non-hex object ID: junk
Collecting commits from input: 3, done.
Move that stop_progress() call to the right place.
While at it, drop the unnecessary 'if (progress)' condition protecting
the stop_progress() call, because that function is prepared to handle
a NULL progress struct.
Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Reviewed-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Rebase is implemented with two different backends - 'apply' and
'merge' each of which support a different set of options. In
particular the apply backend supports a number of options implemented
by 'git am' that are not implemented in the merge backend. This means
that the available options are different depending on which backend is
used which is confusing. This patch adds support for the
--ignore-whitespace option to the merge backend. This option treats
lines with only whitespace changes as unchanged and is implemented in
the merge backend by translating it to -Xignore-space-change.
Signed-off-by: Rohit Ashiwal <rohit.ashiwal265@gmail.com>
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Both the git-bisect.sh as bisect--helper inspected the file system
directly.
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When using git clone with --separate-git-dir realgitdir and
realgitdir already exists, it's content is destroyed.
So, make sure we don't clone into an existing non-empty directory.
When d45420c1 (clone: do not clean up directories we didn't create,
2018-01-02) tightened the clean-up procedure after a failed cloning
into an empty directory, it assumed that the existing directory
given is an empty one so it is OK to keep that directory, while
running the clean-up procedure that is designed to remove everything
in it (since there won't be any, anyway). Check and make sure that
the $GIT_DIR is empty even cloning into an existing repository.
Signed-off-by: Ben Wijen <ben@wijen.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Recent update to "git diff" meant as a code clean-up introduced a
bug in its error handling code, which has been corrected.
* ct/diff-with-merge-base-clarification:
diff: check for merge bases before assigning sym->base
In symdiff_prepare(), we iterate over the set of parsed objects to pick
out any symmetric differences, including the left, right, and base
elements. We assign the results into pointers in a "struct symdiff", and
then complain if we didn't find a base, like so:
sym->left = rev->pending.objects[lpos].name;
sym->right = rev->pending.objects[rpos].name;
sym->base = rev->pending.objects[basepos].name;
if (basecount == 0)
die(_("%s...%s: no merge base"), sym->left, sym->right);
But the least lines are backwards. If basecount is 0, then basepos will
be -1, and we will access memory outside of the pending array. This
isn't usually that big a deal, since we don't do anything besides a
single pointer-sized read before exiting anyway, but it does violate the
C standard, and of course memory-checking tools like ASan complain.
Let's put the basecount check first. Note that we haveto split it from
the other assignments, since the die() relies on sym->left and
sym->right having been assigned (this isn't strictly necessary, but is
easier to read than dereferencing the pending array again).
Reported-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git fast-export --anonymize" learned to take customized mapping to
allow its users to tweak its output more usable for debugging.
* jk/fast-export-anonym-alt:
fast-export: use local array to store anonymized oid
fast-export: anonymize "master" refname
fast-export: allow seeding the anonymized mapping
fast-export: add a "data" callback parameter to anonymize_str()
fast-export: move global "idents" anonymize hashmap into function
fast-export: use a flex array to store anonymized entries
fast-export: stop storing lengths in anonymized hashmaps
fast-export: tighten anonymize_mem() interface to handle only strings
fast-export: store anonymized oids as hex strings
fast-export: use xmemdupz() for anonymizing oids
t9351: derive anonymized tree checks from original repo
The name of the primary branch in existing repositories, and the
default name used for the first branch in newly created
repositories, is made configurable, so that we can eventually wean
ourselves off of the hardcoded 'master'.
* js/default-branch-name:
contrib: subtree: adjust test to change in fmt-merge-msg
testsvn: respect `init.defaultBranch`
remote: use the configured default branch name when appropriate
clone: use configured default branch name when appropriate
init: allow setting the default for the initial branch name via the config
init: allow specifying the initial branch name for the new repository
docs: add missing diamond brackets
submodule: fall back to remote's HEAD for missing remote.<name>.branch
send-pack/transport-helper: avoid mentioning a particular branch
fmt-merge-msg: stop treating `master` specially
API cleanup for get_worktrees()
* es/get-worktrees-unsort:
worktree: drop get_worktrees() unused 'flags' argument
worktree: drop get_worktrees() special-purpose sorting option
A few fields in "struct commit" that do not have to always be
present have been moved to commit slabs.
* ak/commit-graph-to-slab:
commit-graph: minimize commit_graph_data_slab access
commit: move members graph_pos, generation to a slab
commit-graph: introduce commit_graph_data_slab
object: drop parsed_object_pool->commit_count
SHA-256 migration work continues.
* bc/sha-256-part-2: (44 commits)
remote-testgit: adapt for object-format
bundle: detect hash algorithm when reading refs
t5300: pass --object-format to git index-pack
t5704: send object-format capability with SHA-256
t5703: use object-format serve option
t5702: offer an object-format capability in the test
t/helper: initialize the repository for test-sha1-array
remote-curl: avoid truncating refs with ls-remote
t1050: pass algorithm to index-pack when outside repo
builtin/index-pack: add option to specify hash algorithm
remote-curl: detect algorithm for dumb HTTP by size
builtin/ls-remote: initialize repository based on fetch
t5500: make hash independent
serve: advertise object-format capability for protocol v2
connect: parse v2 refs with correct hash algorithm
connect: pass full packet reader when parsing v2 refs
Documentation/technical: document object-format for protocol v2
t1302: expect repo format version 1 for SHA-256
builtin/show-index: provide options to determine hash algo
t5302: modernize test formatting
...
When displaying cat-file usage, the fact that a <format> can
be specified is only visible when lookling at the --batch and
--batch-check options which are shown like this:
--batch[=<format>] show info and content of objects fed from the standard input
--batch-check[=<format>]
show info about objects fed from the standard input
It seems more coherent and improves discovery to also show it
on the usage line.
In the documentation the DESCRIPTION tells us that "The output
format can be overridden using the optional <format> argument",
but we can't see the <format> argument in the SYNOPSIS above
the description which is confusing.
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Acked-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The changed-path Bloom filters were released in v2.27.0, but have a
significant drawback. A user can opt-in to writing the changed-path
filters using the "--changed-paths" option to "git commit-graph write"
but the next write will drop the filters unless that option is
specified.
This becomes even more important when considering the interaction with
gc.writeCommitGraph (on by default) or fetch.writeCommitGraph (part of
features.experimental). These config options trigger commit-graph writes
that the user did not signal, and hence there is no --changed-paths
option available.
Allow a user that opts-in to the changed-path filters to persist the
property of "my commit-graph has changed-path filters" automatically. A
user can drop filters using the --no-changed-paths option.
In the process, we need to be extremely careful to match the Bloom
filter settings as specified by the commit-graph. This will allow future
versions of Git to customize these settings, and the version with this
change will persist those settings as commit-graphs are rewritten on
top.
Use the trace2 API to signal the settings used during the write, and
check that output in a test after manually adjusting the correct bytes
in the commit-graph file.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git diff-files" has been taught to say paths that are marked as
intent-to-add are new files, not modified from an empty blob.
* sk/diff-files-show-i-t-a-as-new:
diff-files: treat "i-t-a" files as "not-in-index"
An in-code comment in "git diff" has been updated.
* dl/diff-usage-comment-update:
builtin/diff: fix botched update of usage comment
builtin/diff: update usage comment
Allow runtime upgrade of the repository format version, which needs
to be done carefully.
There is a rather unpleasant backward compatibility worry with the
last step of this series, but it is the right thing to do in the
longer term.
* xl/upgrade-repo-format:
check_repository_format_gently(): refuse extensions for old repositories
sparse-checkout: upgrade repository to version 1 when enabling extension
fetch: allow adding a filter after initial clone
repository: add a helper function to perform repository format upgrade
Some older versions of gcc complain about this line:
builtin/fast-export.c:412:2: error: dereferencing type-punned pointer
will break strict-aliasing rules [-Werror=strict-aliasing]
put_be32(oid.hash + hashsz - 4, counter++);
^
This seems to be a false positive, as there's no type-punning at all
here. oid.hash is an array of unsigned char; when we pass it to a
function it decays to a pointer to unsigned char. We do take a void
pointer in put_be32(), but it's immediately aliased with another pointer
to unsigned char (and clearly the compiler is looking inside the inlined
put_be32(), since the warning doesn't happen with -O0).
This happens on gcc 4.8 and 4.9, but not later versions (I tested gcc 6,
7, 8, and 9).
We can work around it by using a local array instead of an object_id
struct. This is a little more intimate with the details of object_id,
but for whatever reason doesn't seem to trigger the compiler warning.
We can revert this patch once we decide that those gcc versions are too
old to care about for a warning like this (gcc 4.8 is the default
compiler for Ubuntu Trusty, which is out-of-support but not fully
end-of-life'd until April 2022).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Running "fast-export --anonymize" will leave "refs/heads/master"
untouched in the output, for two reasons:
- it helped to have some known reference point between the original
and anonymized repository
- since it's historically the default branch name, it doesn't leak any
information
Now that we can ask fast-export to retain particular tokens, we have a
much better tool for the first one (because it works for any ref, not
just master).
For the second, the notion of "default branch name" is likely to become
configurable soon, at which point the name _does_ leak information.
Let's drop this special case in preparation.
Note that we have to adjust the test a bit, since it relied on using the
name "master" in the anonymized repos. We could just use
--anonymize-map=master to keep the same output, but then we wouldn't
know if it works because of our hard-coded master or because of the
explicit map.
So let's flip the test a bit, and confirm that we anonymize "master",
but keep "other" in the output.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
After you anonymize a repository, it can be hard to find which commits
correspond between the original and the result, and thus hard to
reproduce commands that triggered bugs in the original.
Let's make it possible to seed the anonymization map. This lets users
either:
- mark names to be retained as-is, if they don't consider them secret
(in which case their original commands would just work)
- map names to new values, which lets them adapt the reproduction
recipe to the new names without revealing the originals
The implementation is fairly straight-forward. We already store each
anonymized token in a hashmap (so that the same token appearing twice is
converted to the same result). We can just introduce a new "seed"
hashmap which is consulted first.
This does make a few more promises to the user about how we'll anonymize
things (e.g., token-splitting pathnames). But it's unlikely that we'd
want to change those rules, even if the actual anonymization of a single
token changes. And it makes things much easier for the user, who can
unblind only a directory name without having to specify each path within
it.
One alternative to this approach would be to anonymize as we see fit,
and then dump the whole refname and pathname mappings to a file. This
does work, but it's a bit awkward to use (you have to manually dig the
items you care about out of the mapping).
Helped-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "fetch/clone" protocol has been updated to allow the server to
instruct the clients to grab pre-packaged packfile(s) in addition
to the packed object data coming over the wire.
* jt/cdn-offload:
upload-pack: fix a sparse '0 as NULL pointer' warning
upload-pack: send part of packfile response as uri
fetch-pack: support more than one pack lockfile
upload-pack: refactor reading of pack-objects out
Documentation: add Packfile URIs design doc
Documentation: order protocol v2 sections
http-fetch: support fetching packfiles by URL
http-fetch: refactor into function
http: refactor finish_http_pack_request()
http: use --stdin when indexing dumb HTTP pack
Rewrite of parts of the scripted "git submodule" Porcelain command
continues; this time it is "git submodule set-branch" subcommand's
turn.
* ss/submodule-set-branch-in-c:
submodule: port subcommand 'set-branch' from shell to C
Code clean-up around "git branch" with a minor bugfix.
* dl/branch-cleanup:
branch: don't mix --edit-description
t3200: test for specific errors
t3200: rename "expected" to "expect"
"git diff" used to take arguments in random and nonsense range
notation, e.g. "git diff A..B C", "git diff A..B C...D", etc.,
which has been cleaned up.
* ct/diff-with-merge-base-clarification:
Documentation: usage for diff combined commits
git diff: improve range handling
t/t3430: avoid undefined git diff behavior
Code clean-up of "git clean" resulted in a fix of recent
performance regression.
* en/clean-cleanups:
clean: optimize and document cases where we recurse into subdirectories
clean: consolidate handling of ignored parameters
dir, clean: avoid disallowed behavior
dir: fix a few confusing comments
When cloning a repository without any branches, Git chooses a default
branch name for the as-yet unborn branch.
As part of the implicit initialization of the local repository, Git just
learned to respect `init.defaultBranch` to choose a different initial
branch name. We now really want that branch name to be used as a
fall-back.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We just introduced the command-line option
`--initial-branch=<branch-name>` to allow initializing a new repository
with a different initial branch than the hard-coded one.
To allow users to override the initial branch name more permanently
(i.e. without having to specify the name manually for each and every
`git init` invocation), let's introduce the `init.defaultBranch` config
setting.
Helped-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Helped-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Don Goodman-Wilson <don@goodman-wilson.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There is a growing number of projects and companies desiring to change
the main branch name of their repositories (see e.g.
https://twitter.com/mislav/status/1270388510684598272 for background on
this).
To change that branch name for new repositories, currently the only way
to do that automatically is by copying all of Git's template directory,
then hard-coding the desired default branch name into the `.git/HEAD`
file, and then configuring `init.templateDir` to point to those copied
template files.
To make this process much less cumbersome, let's introduce a new option:
`--initial-branch=<branch-name>`.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When `remote.<name>.branch` is not configured, `git submodule update`
currently falls back to using the branch name `master`. A much better
idea, however, is to use the remote `HEAD`: on all Git servers running
reasonably recent Git versions, the symref `HEAD` points to the main
branch.
Note: t7419 demonstrates that there _might_ be use cases out there that
_expect_ `git submodule update --remote` to update submodules to the
remote `master` branch even if the remote `HEAD` points to another
branch. Arguably, this patch makes the behavior more intuitive, but
there is a slight possibility that this might cause regressions in
obscure setups.
Even so, it should be okay to fix this behavior without anything like a
longer transition period:
- The `git submodule update --remote` command is not really common.
- Current Git's behavior when running this command is outright
confusing, unless the remote repository's current branch _is_ `master`
(in which case the proposed behavior matches the old behavior).
- If a user encounters a regression due to the changed behavior, the fix
is actually trivial: setting `submodule.<name>.branch` to `master`
will reinstate the old behavior.
Helped-by: Philippe Blain <levraiphilippeblain@gmail.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The anonymize_str() function takes a generator callback, but there's no
way to pass extra context to it. Let's add the usual "void *data"
parameter to the generator interface and pass it along.
This is mildly annoying for existing callers, all of which pass NULL,
but is necessary to avoid extra globals in some cases we'll add in a
subsequent patch.
While we're touching each of these callbacks, we can further observe
that none of them use the existing orig/len parameters at all. This
makes sense, since the point is for their output to have no discernable
basis in the original (my original version had some notion that we might
use a one-way function to obfuscate the names, but it was never
implemented). So let's drop those extra parameters. If a caller really
wants to do something with them, it can pass a struct through the new
data parameter.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
All of the other anonymization functions keep their static mappings
inside the function to avoid polluting the global namespace. Let's do
the same for "idents", as nobody needs it outside of
anonymize_ident_line().
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Now that we're using a separate keydata struct for hash lookups, we have
more flexibility in how we allocate anonymized_entry structs. Let's push
the "orig" key into a flex member within the struct. That should save us
a few bytes of memory per entry (a pointer plus any malloc overhead),
and may make lookups a little faster (since it's one less pointer to
chase in the comparison function).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Now that the anonymize_str() interface is restricted to NUL-terminated
strings, there's no need for us to keep track of the length of each
entry in the hashmap. This simplifies the code and saves a bit of
memory.
Note that we do still need to compare the stored results to partial
strings passed in by the callers. We can do that by using hashmap's
keydata feature to get the ptr/len pair into the comparison function,
and then using strncmp().
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
While the anonymize_mem() interface _can_ store arbitrary byte
sequences, none of the callers uses this feature (as of the previous
commit). We'd like to keep it that way, as we'll be exposing the
string-like nature of the anonymization routines to the user. So let's
tighten up the interface a bit:
- don't treat "len" as an out-parameter from anonymize_mem(); this
ensures callers treat the pointer result as a NUL-terminated string
- likewise, don't treat "len" as an out-parameter from generator
functions
- swap out "void *" for "char *" as appropriate to signal that we
don't handle arbitrary memory
- rename the function to anonymize_str()
This will also open up some optimization opportunities in a future
patch.
Note that we can't drop the "len" parameter entirely. Some callers do
pass in partial strings (e.g., "foo/bar", len=3) to avoid copying, and
we need to handle those still.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When fast-export stores anonymized oids, it does so as binary strings.
And while the anonymous mapping storage is binary-clean (at least as of
the previous commit), this will become awkward when we start exposing
more of it to the user. In particular, if we allow a method for
retaining token "foo", then users may want to specify a hex oid as such
a token.
Let's just switch to storing the hex strings. The difference in memory
usage is negligible (especially considering how infrequently we'd
generally store an oid compared to, say, path components).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Our anonymize_mem() function is careful to take a ptr/len pair to allow
storing binary tokens like object ids, as well as partial strings (e.g.,
just "foo" of "foo/bar"). But it duplicates the hash key using
xstrdup()! That means that:
- for a partial string, we'd store all bytes up to the NUL, even
though we'd never look at anything past "len". This didn't produce
wrong behavior, but was wasteful.
- for a binary oid that doesn't contain a zero byte, we'd copy garbage
bytes off the end of the array (though as long as nothing complained
about reading uninitialized bytes, further reads would be limited by
"len", and we'd produce the correct results)
- for a binary oid that does contain a zero byte, we'd copy _fewer_
bytes than intended into the hashmap struct. When we later try to
look up a value, we'd access uninitialized memory and potentially
falsely claim that a particular oid is not present.
The most common reason to store an oid is an anonymized gitlink, but our
test case doesn't have any gitlinks at all. So let's add one whose oid
contains a NUL and is present at two different paths. ASan catches the
memory error, but even without it we can detect the bug because the oid
is not anonymized the same way for both paths.
And of course the fix is to copy the correct number of bytes. We don't
technically need the appended NUL from xmemdupz(), but it doesn't hurt
as an extra protection against anybody treating it like a string (plus a
future patch will push us more in that direction).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In the previous commit, an attempt was made to correct the "N=1, M=0"
case. However, the fix was botched and it introduced two half-correct
sections by mistake. Combine these half-correct sections into one fully
correct section.
Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
d91d6fbf26 (commit-reach: create repo_is_descendant_of(), 2020-06-17)
adds a repository aware version of is_descendant_of() and a backward
compatibility shim that is barely used.
Update all callers to directly use the new repo_is_descendant_of()
function instead; making the codebase simpler and pushing more
the_repository references higher up the stack.
Helped-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com>
Reviewed-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The same worktree directory must be registered only once, but
"git worktree move" allowed this invariant to be violated, which
has been corrected.
* es/worktree-duplicate-paths:
worktree: make "move" refuse to move atop missing registered worktree
worktree: generalize candidate worktree path validation
worktree: prune linked worktree referencing main worktree path
worktree: prune duplicate entries referencing same worktree path
worktree: make high-level pruning re-usable
worktree: give "should be pruned?" function more meaningful name
worktree: factor out repeated string literal
The `diff-files' command and related commands which call the function
`cmd_diff_files()', consider the "intent-to-add" files as a part of the
index when comparing the work-tree against it. This was previously
addressed in commits [1] and [2] by turning the option
`--ita-invisible-in-index' (introduced in [3]) on by default.
For `diff-files' (and `add -p' as a consequence) to show the i-t-a
files as as new, `ita_invisible_in_index' will be enabled by default
here as well.
[1] 0231ae71d3 (diff: turn --ita-invisible-in-index on by default,
2018-05-26)
[2] 425a28e0a4 (diff-lib: allow ita entries treated as "not yet exist
in index", 2016-10-24)
[3] b42b451919 (diff: add --ita-[in]visible-in-index, 2016-10-24)
Signed-off-by: Srinidhi Kaushik <shrinidhi.kaushik@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
get_worktrees() accepts a 'flags' argument, however, there are no
existing flags (the lone flag GWT_SORT_LINKED was recently retired) and
no behavior which can be tweaked. Therefore, drop the 'flags' argument.
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Of all the clients of get_worktrees(), only "git worktree list" wants
the list sorted in a very specific way; other clients simply don't care
about the order. Rather than imbuing get_worktrees() with special
knowledge about how various clients -- now and in the future -- may want
the list sorted, drop the sorting capability altogether and make it the
client's responsibility to sort the list if needed.
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
git index-pack is usually run in a repository, but need not be. Since
packs don't contains information on the algorithm in use, instead
relying on context, add an option to index-pack to tell it which one
we're using in case someone runs it outside of a repository. Since
using --stdin necessarily implies a repository, don't allow specifying
an object format if it's provided to prevent users from passing an
option that won't work. Add documentation for this option.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
cmd_pull() builds a commit_list to pass a single potential ancestor to
is_descendant_of(). The latter leaves the list intact. Release the
allocated memory after the call.
Leaking in cmd_*() isn't a big deal, but sets a bad example for other
users of is_descendant_of().
Signed-off-by: René Scharfe <l.s.r@web.de>
Acked-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A comment in cmd_diff() states that if one tree-ish and no blobs are
provided, (the "N=1, M=0" case), it will provide a diff between the tree
and the cache. This is incorrect because a diff happens between the
tree-ish and the working tree. Remove the `--cached` in the comment so
that the correct behavior is shown. Add a new section describing the
"N=1, M=0, --cached" behavior.
Next, describe the "N=0, M=0, --cached" case, similar to the above since
it is undocumented.
Finally, fix some spacing issues. Add spaces between each section for
consistency and readability. Also, change tabs within the comment into
spaces.
Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The behaviour of "sparse-checkout" in the state "git clone
--no-checkout" left was changed accidentally in 2.27, which has
been corrected.
* en/sparse-checkout:
sparse-checkout: avoid staging deletions of all files
The reflog entries for "git clone" and "git fetch" did not
anonymize the URL they operated on.
* js/reflog-anonymize-for-clone-and-fetch:
clone/fetch: anonymize URLs in the reflog
14ba97f8 (alloc: allow arbitrary repositories for alloc functions,
2018-05-15) introduced parsed_object_pool->commit_count to keep count of
commits per repository and was used to assign commit->index.
However, commit-slab code requires commit->index values to be unique
and a global count would be correct, rather than a per-repo count.
Let's introduce a static counter variable, `parsed_commits_count` to
keep track of parsed commits so far.
As commit_count has no use anymore, let's also drop it from the struct.
Signed-off-by: Abhishek Kumar <abhishekkumar8222@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
`git branch` accepts `--edit-description` in conjunction with other
arguments. However, `--edit-description` is its own mode, similar to
`--set-upstream-to`, which is also made mutually exclusive with other
modes. Prevent `--edit-description` from being mixed with other modes.
Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commit 6b1db43109 ("clean: teach clean -d to preserve ignored paths",
2017-05-23) added the following code block (among others) to git-clean:
if (remove_directories)
dir.flags |= DIR_SHOW_IGNORED_TOO | DIR_KEEP_UNTRACKED_CONTENTS;
The reason for these flags is well documented in the commit message, but
isn't obvious just from looking at the code. Add some explanations to
the code to make it clearer.
Further, it appears git-2.26 did not correctly handle this combination
of flags from git-clean. With both these flags and without
DIR_SHOW_IGNORED_TOO_MODE_MATCHING set, git is supposed to recurse into
all untracked AND ignored directories. git-2.26.0 clearly was not doing
that. I don't know the full reasons for that or whether git < 2.27.0
had additional unknown bugs because of that misbehavior, because I don't
feel it's worth digging into. As per the huge changes and craziness
documented in commit 8d92fb2927 ("dir: replace exponential algorithm
with a linear one", 2020-04-01), the old algorithm was a mess and was
thrown out. What I can say is that git-2.27.0 correctly recurses into
untracked AND ignored directories with that combination.
However, in clean's case we don't need to recurse into ignored
directories; that is just a waste of time. Thus, when git-2.27.0
started correctly handling those flags, we got a performance regression
report. Rather than relying on other bugs in fill_directory()'s former
logic to provide the behavior of skipping ignored directories, make use
of the DIR_SHOW_IGNORED_TOO_MODE_MATCHING value specifically added in
commit eec0f7f2b7 ("status: add option to show ignored files
differently", 2017-10-30) for this purpose.
Reported-by: Brian Malehorn <bmalehorn@gmail.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
I spent a long time trying to figure out how and whether the code worked
with different values of ignore, ignore_only, and remove_directories.
After lots of time setting up lots of testcases, sifting through lots of
print statements, and walking through the debugger, I finally realized
that one piece of code related to how it was all setup was found in
clean.c rather than dir.c. Make a change that would have made it easier
for me to do the extra testing by putting this handling in one spot.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
dir.h documented quite clearly that DIR_SHOW_IGNORED and
DIR_SHOW_IGNORED_TOO are mutually exclusive, with a big comment to this
effect by the definition of both enum values. However, a command like
git clean -fx $DIR
would set both values for dir.flags. I _think_ it happened to work
because:
* As dir.h points out, DIR_KEEP_UNTRACKED_CONTENTS only takes effect
if DIR_SHOW_IGNORED_TOO is set.
* As coded, I believe DIR_SHOW_IGNORED would just happen to take
precedence over DIR_SHOW_IGNORED_TOO in the code as currently
constructed.
Which is a long way of saying "we just got lucky".
Fix clean.c to avoid setting these mutually exclusive values at the same
time, and add a check to dir.c that will throw a BUG() to prevent anyone
else from making this mistake.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Document the usage for producing combined commits with "git diff".
This includes updating the synopsis section.
While here, add the three-dot notation to the synopsis.
Make "git diff -h" print the same usage summary as the manual
page synopsis, minus the "A..B" form, which is now discouraged.
Signed-off-by: Chris Torek <chris.torek@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When git diff is given a symmetric difference A...B, it chooses
some merge base from the two specified commits (as documented).
This fails, however, if there is *no* merge base: instead, you
see the differences between A and B, which is certainly not what
is expected.
Moreover, if additional revisions are specified on the command
line ("git diff A...B C"), the results get a bit weird:
* If there is a symmetric difference merge base, this is used
as the left side of the diff. The last final ref is used as
the right side.
* If there is no merge base, the symmetric status is completely
lost. We will produce a combined diff instead.
Similar weirdness occurs if you use, e.g., "git diff C A...B D".
Likewise, using multiple two-dot ranges, or tossing extra
revision specifiers into the command line with two-dot ranges,
or mixing two and three dot ranges, all produce nonsense.
To avoid all this, add a routine to catch the range cases and
verify that that the arguments make sense. As a side effect,
produce a warning showing *which* merge base is being used when
there are multiple choices; die if there is no merge base.
Signed-off-by: Chris Torek <chris.torek@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Teach upload-pack to send part of its packfile response as URIs.
An administrator may configure a repository with one or more
"uploadpack.blobpackfileuri" lines, each line containing an OID, a pack
hash, and a URI. A client may configure fetch.uriprotocols to be a
comma-separated list of protocols that it is willing to use to fetch
additional packfiles - this list will be sent to the server. Whenever an
object with one of those OIDs would appear in the packfile transmitted
by upload-pack, the server may exclude that object, and instead send the
URI. The client will then download the packs referred to by those URIs
before performing the connectivity check.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Whenever a fetch results in a packfile being downloaded, a .keep file is
generated, so that the packfile can be preserved (from, say, a running
"git repack") until refs are written referring to the contents of the
packfile.
In a subsequent patch, a successful fetch using protocol v2 may result
in more than one .keep file being generated. Therefore, teach
fetch_pack() and the transport mechanism to support multiple .keep
files.
Implementation notes:
- builtin/fetch-pack.c normally does not generate .keep files, and thus
is unaffected by this or future changes. However, it has an
undocumented "--lock-pack" feature, used by remote-curl.c when
implementing the "fetch" remote helper command. In keeping with the
remote helper protocol, only one "lock" line will ever be written;
the rest will result in warnings to stderr. However, in practice,
warnings will never be written because the remote-curl.c "fetch" is
only used for protocol v0/v1 (which will not generate multiple .keep
files). (Protocol v2 uses the "stateless-connect" command, not the
"fetch" command.)
- connected.c has an optimization in that connectivity checks on a ref
need not be done if the target object is in a pack known to be
self-contained and connected. If there are multiple packfiles, this
optimization can no longer be done.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git worktree add" takes special care to avoid creating a new worktree
at a location already registered to an existing worktree even if that
worktree is missing (which can happen, for instance, if the worktree
resides on removable media). "git worktree move", however, is not so
careful when validating the destination location and will happily move
the source worktree atop the location of a missing worktree. This leads
to the anomalous situation of multiple worktrees being associated with
the same path, which is expressly forbidden by design. For example:
$ git clone foo.git
$ cd foo
$ git worktree add ../bar
$ git worktree add ../baz
$ rm -rf ../bar
$ git worktree move ../baz ../bar
$ git worktree list
.../foo beefd00f [master]
.../bar beefd00f [bar]
.../bar beefd00f [baz]
$ git worktree remove ../bar
fatal: validation failed, cannot remove working tree:
'.../bar' does not point back to '.git/worktrees/bar'
Fix this shortcoming by enhancing "git worktree move" to perform the
same additional validation of the destination directory as done by "git
worktree add".
While at it, add a test to verify that "git worktree move" won't move a
worktree atop an existing (non-worktree) path -- a restriction which has
always been in place but was never tested.
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git worktree add" checks that the specified path is a valid location
for a new worktree by ensuring that the path does not already exist and
is not already registered to another worktree (a path can be registered
but missing, for instance, if it resides on removable media). Since "git
worktree add" is not the only command which should perform such
validation ("git worktree move" ought to also), generalize the the
validation function for use by other callers, as well.
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git worktree prune" detects when multiple entries are associated with
the same path and prunes the duplicates, however, it does not detect
when a linked worktree points at the path of the main worktree.
Although "git worktree add" disallows creating a new worktree with the
same path as the main worktree, such a case can arise outside the
control of Git even without the user mucking with .git/worktree/<id>/
administrative files. For instance:
$ git clone foo.git
$ git -C foo worktree add ../bar
$ rm -rf bar
$ mv foo bar
$ git -C bar worktree list
.../bar deadfeeb [master]
.../bar deadfeeb [bar]
Help the user recover from such corruption by extending "git worktree
prune" to also detect when a linked worktree is associated with the path
of the main worktree.
Reported-by: Jonathan Müller <jonathanmueller.dev@gmail.com>
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A fundamental restriction of linked working trees is that there must
only ever be a single worktree associated with a particular path, thus
"git worktree add" explicitly disallows creation of a new worktree at
the same location as an existing registered worktree. Nevertheless,
users can still "shoot themselves in the foot" by mucking with
administrative files in .git/worktree/<id>/. Worse, "git worktree move"
is careless[1] and allows a worktree to be moved atop a registered but
missing worktree (which can happen, for instance, if the worktree is on
removable media). For instance:
$ git clone foo.git
$ cd foo
$ git worktree add ../bar
$ git worktree add ../baz
$ rm -rf ../bar
$ git worktree move ../baz ../bar
$ git worktree list
.../foo beefd00f [master]
.../bar beefd00f [bar]
.../bar beefd00f [baz]
Help users recover from this form of corruption by teaching "git
worktree prune" to detect when multiple worktrees are associated with
the same path.
[1]: A subsequent commit will fix "git worktree move" validation to be
more strict.
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The low-level logic for removing a worktree is well encapsulated in
delete_git_dir(). However, high-level details related to pruning a
worktree -- such as dealing with verbosity and dry-run mode -- are not
encapsulated. Factor out this high-level logic into its own function so
it can be re-used as new worktree corruption detectors are added.
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Readers of the name prune_worktree() are likely to expect the function
to actually prune a worktree, however, it only answers the question
"should this worktree be pruned?". Give it a name more reflective of its
true purpose to avoid such confusion.
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The code to parse "git bisect start" command line was lax in
validating the arguments.
* cb/bisect-helper-parser-fix:
bisect--helper: avoid segfault with bad syntax in `start --term-*`
On-the-wire protocol v2 easily falls into a deadlock between the
remote-curl helper and the fetch-pack process when the server side
prematurely throws an error and disconnects. The communication has
been updated to make it more robust.
* dl/remote-curl-deadlock-fix:
stateless-connect: send response end packet
pkt-line: define PACKET_READ_RESPONSE_END
remote-curl: error on incomplete packet
pkt-line: extern packet_length()
transport: extract common fetch_pack() call
remote-curl: remove label indentation
remote-curl: fix typo
Code simplification and test coverage enhancement.
* bc/filter-process:
t2060: add a test for switch with --orphan and --discard-changes
builtin/checkout: simplify metadata initialization
For each worktree removed by "git worktree prune", it reports the reason
for the removal. All reasons share the common prefix "Removing
worktrees/%s:". As new removal reasons are added, this prefix needs to
be duplicated, which is error-prone and potentially cumbersome.
Therefore, factor out the common prefix.
Although this change seems to increase the "sentence lego quotient", it
should be reasonably safe, as the reason for removal is a distinct
clause, not strictly related to the prefix. Moreover, the "worktrees" in
"Removing worktrees/%s:" is a path literal which ought not be localized,
so by factoring it out, we can more easily avoid exposing that path
fragment to translators.
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The 'extensions' configuration variable gets special meaning in the new
repository version, so when enabling the extension we should upgrade the
repository to version 1.
Signed-off-by: Xin Li <delphij@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Retroactively adding a filter can be useful for existing shallow clones as
they allow users to see earlier change histories without downloading all
git objects in a regular --unshallow fetch.
Without this patch, users can make a clone partial by editing the
repository configuration to convert the remote into a promisor, like:
git config core.repositoryFormatVersion 1
git config extensions.partialClone origin
git fetch --unshallow --filter=blob:none origin
Since the hard part of making this work is already in place and such
edits can be error-prone, teach Git to perform the required configuration
change automatically instead.
Note that this change does not modify the existing git behavior which
recognizes setting extensions.partialClone without changing
repositoryFormatVersion.
Signed-off-by: Xin Li <delphij@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
sparse-checkout's purpose is to update the working tree to have it
reflect a subset of the tracked files. As such, it shouldn't be
switching branches, making commits, downloading or uploading data, or
staging or unstaging changes. Other than updating the worktree, the
only thing sparse-checkout should touch is the SKIP_WORKTREE bit of the
index. In particular, this sets up a nice invariant: running
sparse-checkout will never change the status of any file in `git status`
(reflecting the fact that we only set the SKIP_WORKTREE bit if the file
is safe to delete, i.e. if the file is unmodified).
Traditionally, we did a _really_ bad job with this goal. The
predecessor to sparse-checkout involved manual editing of
.git/info/sparse-checkout and running `git read-tree -mu HEAD`. That
command would stage and unstage changes and overwrite dirty changes in
the working tree.
The initial implementation of the sparse-checkout command was no better;
it simply invoked `git read-tree -mu HEAD` as a subprocess and had the
same caveats, though this issue came up repeatedly in review comments
and workarounds for the problems were put in place before the feature
was merged[1, 2, 3, 4, 5, 6; especially see 4 & 6].
[1] https://lore.kernel.org/git/CABPp-BFT9A5n=_bx5LsjCvbogqwSjiwgr5amcjgbU1iAk4KLJg@mail.gmail.com/
[2] https://lore.kernel.org/git/CABPp-BEmwSwg4tgJg6nVG8a3Hpn_g-=ZjApZF4EiJO+qVgu4uw@mail.gmail.com/
[3] https://lore.kernel.org/git/CABPp-BFV7TA0qwZCQpHCqx9N+JifyRyuBQ-pZ_oGfe-NOgyh7A@mail.gmail.com/
[4] https://lore.kernel.org/git/CABPp-BHYCCD+Vx5fq35jH82eHc1-P53Lz_aGNpHJNcx9kg2K-A@mail.gmail.com/
[5] https://lore.kernel.org/git/CABPp-BF+JWYZfDqp2Tn4AEKVp4b0YMA=Mbz4Nz62D-gGgiduYQ@mail.gmail.com/
[6] https://lore.kernel.org/git/20191121163706.GV23183@szeder.dev/
However, these workarounds, in addition to disabling the feature in a
number of important cases, also missed one special case. I'll get back
to it later.
In the 2.27.0 cycle, the disabling of the feature was lifted by finally
replacing the internal equivalent of `git read-tree -mu HEAD` with
something that did what we wanted: the new update_sparsity() function in
unpack-trees.c that only ever updates SKIP_WORKTREE bits in the index
and updates the working tree to match. This new function handles all
the cases that were problematic for the old implementation, except that
it breaks the same special case that avoided the workarounds of the old
implementation, but broke it in a different way.
So...that brings us to the special case: a git clone performed with
--no-checkout. As per the meaning of the flag, --no-checkout does not
check out any branch, with the implication that you aren't on one and
need to switch to one after the clone. Implementationally, HEAD is
still set (so in some sense you are partially on a branch), but
* the index is "unborn" (non-existent)
* there are no files in the working tree (other than .git/)
* the next time git switch (or git checkout) is run it will run
unpack_trees with `initial_checkout` flag set to true.
It is not until you run, e.g. `git switch <somebranch>` that the index
will be written and files in the working tree populated.
With this special --no-checkout case, the traditional `read-tree -mu
HEAD` behavior would have done the equivalent of acting like checkout --
switch to the default branch (HEAD), write out an index that matches
HEAD, and update the working tree to match. This special case slipped
through the avoid-making-changes checks in the original sparse-checkout
command and thus continued there.
After update_sparsity() was introduced and used (see commit f56f31af03
("sparse-checkout: use new update_sparsity() function", 2020-03-27)),
the behavior for the --no-checkout case changed: Due to git's
auto-vivification of an empty in-memory index (see do_read_index() and
note that `must_exist` is false), and due to sparse-checkout's
update_working_directory() code to always write out the index after it
was done, we got a new bug. That made it so that sparse-checkout would
switch the repository from a clone with an "unborn" index (i.e. still
needing an initial_checkout), to one that had a recorded index with no
entries. Thus, instead of all the files appearing deleted in `git
status` being known to git as a special artifact of not yet being on a
branch, our recording of an empty index made it suddenly look to git as
though it was definitely on a branch with ALL files staged for deletion!
A subsequent checkout or switch then had to contend with the fact that
it wasn't on an initial_checkout but had a bunch of staged deletions.
Make sure that sparse-checkout changes nothing in the index other than
the SKIP_WORKTREE bit; in particular, when the index is unborn we do not
have any branch checked out so there is no sparsification or
de-sparsification work to do. Simply return from
update_working_directory() early.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Even if we strongly discourage putting credentials into the URLs passed
via the command-line, there _is_ support for that, and users _do_ do
that.
Let's scrub them before writing them to the reflog.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The error message from "git checkout -b foo -t bar baz" was
confusing.
* rs/checkout-b-track-error:
checkout: improve error messages for -b with extra argument
checkout: add tests for -b and --track
Convert submodule subcommand 'set-branch' to a builtin and call it via
'git-submodule.sh'.
Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Mentored-by: Kaartic Sivaraam <kaartic.sivaraam@gmail.com>
Helped-by: Denton Liu <liu.denton@gmail.com>
Helped-by: Eric Sunshine <sunshine@sunshineco.com>
Helped-by: Đoàn Trần Công Danh <congdanhqx@gmail.com>
Signed-off-by: Shourya Shukla <shouryashukla.oo@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
ls-remote may or may not operate within a repository, and as such will
not have been initialized with the repository's hash algorithm. Even if
it were, the remote side could be using a different algorithm and we
would still want to display those refs properly. Find the hash
algorithm used by the remote side by querying the transport object and
set our hash algorithm accordingly.
Without this change, if the remote side is using SHA-256, we truncate
the refs to 40 hex characters, since that's the length of the default
hash algorithm (SHA-1).
Note that technically this is not a correct setting of the repository
hash algorithm since, if we are in a repository, it might be one of a
different hash algorithm from the remote side. However, our current
code paths don't handle multiple algorithms and won't for some time, so
this is the best we can do. We rely on the fact that ls-remote never
modifies the current repository, which is a reasonable assumption to
make.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
show-index is capable of reading any possible index file whether or not
the index is inside a repository. However, because our index files lack
metadata about the hash algorithm in use, it's not possible to
autodetect the algorithm that a particular index file is using.
In order to allow us to read index files of any algorithm, let's set up
the .git directory gently so that we default to the algorithm for the
current repository, and add an --object-format option to allow users to
override this setting and continue to run show-index outside of a
repository altogether. Let's also document this new option so that
people can find it and use it.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Both v2 pack index files and the v3 format specified as part of the
NewHash work have similar data starting at the CRC table. Much of the
existing code wants to read either this table or the offset entries
following it, and in doing so computes the offset each time.
In order to share as much code between v2 and v3, compute the offset of
the CRC table and store it when the pack is opened. Use this value to
compute offsets to not only the CRC table, but to the offset entries
beyond it.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When performing a clone, we don't know what hash algorithm the other end
will support. Currently, we don't support fetching data belonging to a
different algorithm, so we must know what algorithm the remote side is
using in order to properly initialize the repository. We can know that
only after fetching the refs, so if the remote side has any references,
use that information to reinitialize the repository with the correct
hash algorithm information.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Detect when the server doesn't support our hash algorithm and abort.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Advertise the current hash algorithm in use by using the object-format
capability as part of the ref advertisement.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Currently, remote-curl acts as a proxy and blindly forwards packets
between an HTTP server and fetch-pack. In the case of a stateless RPC
connection where the connection is terminated before the transaction is
complete, remote-curl will blindly forward the packets before waiting on
more input from fetch-pack. Meanwhile, fetch-pack will read the
transaction and continue reading, expecting more input to continue the
transaction. This results in a deadlock between the two processes.
This can be seen in the following command which does not terminate:
$ git -c protocol.version=2 clone https://github.com/git/git.git --shallow-since=20151012
Cloning into 'git'...
whereas the v1 version does terminate as expected:
$ git -c protocol.version=1 clone https://github.com/git/git.git --shallow-since=20151012
Cloning into 'git'...
fatal: the remote end hung up unexpectedly
Instead of blindly forwarding packets, make remote-curl insert a
response end packet after proxying the responses from the remote server
when using stateless_connect(). On the RPC client side, ensure that each
response ends as described.
A separate control packet is chosen because we need to be able to
differentiate between what the remote server sends and remote-curl's
control packets. By ensuring in the remote-curl code that a server
cannot send response end packets, we prevent a malicious server from
being able to perform a denial of service attack in which they spoof a
response end packet and cause the described deadlock to happen.
Reported-by: Force Charlie <charlieio@outlook.com>
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When we try to create a branch "foo" based on "origin/master" and give
git commit -b an extra unsupported argument "bar", it confusingly
reports:
$ git checkout -b foo origin/master bar
fatal: 'bar' is not a commit and a branch 'foo' cannot be created from it
$ git checkout --track -b foo origin/master bar
fatal: 'bar' is not a commit and a branch 'foo' cannot be created from it
That's wrong, because it very well understands that "origin/master" is
supposed to be the start point for the new branch and not "bar". Check
if we got a commit and show more fitting messages in that case instead:
$ git checkout -b foo origin/master bar
fatal: Cannot update paths and switch to branch 'foo' at the same time.
$ git checkout --track -b foo origin/master bar
fatal: '--track' cannot be used with updating paths
Original-patch-by: Jeff King <peff@peff.net>
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
06f5608c14 (bisect--helper: `bisect_start` shell function partially in C,
2019-01-02) adds a lax parser for `git bisect start` which could result
in a segfault under a bad syntax call for start with custom terms.
Detect if there are enough arguments left in the command line to use for
--term-{old,good,new,bad} and abort with the same syntax error the original
implementation will show if not.
While at it, remove an unnecessary (and incomplete) check for unknown
arguments and make sure to add a test to avoid regressions.
Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com>
Acked-by: Christian Couder <christian.couder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When we call init_checkout_metadata in reset_tree, we want to pass the
object ID of the commit in question so that it can be passed to filters,
or if there is no commit, the tree. We anticipated this latter case,
which can occur elsewhere in the checkout code, but it cannot occur
here. The only case in which we do not have a commit object is when
invoking git switch with --orphan. Moreover, we can only hit this code
path without a commit object additionally with either --force or
--discard-changes.
In such a case, there is no point initializing the checkout metadata
with a commit or tree because (a) there is no commit, only the empty
tree, and (b) we will never use the data, since no files will be smudged
when checking out a branch with no files. Pass the all-zeros object ID
in this case, since we just need some value which is a valid pointer.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Reviewed-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The multi-pack-index was added to the data verified by git-fsck in
ea5ae6c3 "fsck: verify multi-pack-index". This implementation was
based on the implementation for verifying the commit-graph, and a
copy-paste error kept the ERROR_COMMIT_GRAPH flag as the bit set
when an error appears in the multi-pack-index.
Add a new flag, ERROR_MULTI_PACK_INDEX, and use that instead.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
For a merge with a single strategy, the result of evaluate_result() is
effectively not used and therefore is not needed, so avoid altogether.
On Windows, this optimization can halve the time required to perform a
recursive merge of a single commit with the LLVM repo.
Signed-off-by: Andrew Ng <andrew.ng@sony.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since 7c5c9b9c57 (commit-graph: error out on invalid commit oids in
'write --stdin-commits', 2019-08-05), the commit-graph builtin dies on
receiving non-commit OIDs as input to '--stdin-commits'.
This behavior can be cumbersome to work around in, say, the case of
piping 'git for-each-ref' to 'git commit-graph write --stdin-commits' if
the caller does not want to cull out non-commits themselves. In this
situation, it would be ideal if 'git commit-graph write' wrote the graph
containing the inputs that did pertain to commits, and silently ignored
the remainder of the input.
Some options have been proposed to the effect of '--[no-]check-oids'
which would allow callers to have the commit-graph builtin do just that.
After some discussion, it is difficult to imagine a caller who wouldn't
want to pass '--no-check-oids', suggesting that we should get rid of the
behavior of complaining about non-commit inputs altogether.
If callers do wish to retain this behavior, they can easily work around
this change by doing the following:
git for-each-ref --format='%(objectname) %(objecttype) %(*objecttype)' |
awk '
!/commit/ { print "not-a-commit:"$1 }
/commit/ { print $1 }
' |
git commit-graph write --stdin-commits
To make it so that valid OIDs that refer to non-existent objects are
indeed an error after loosening the error handling, perform an extra
lookup to make sure that object indeed exists before sending it to the
commit-graph internals.
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When given a list of commits, the commit-graph machinery calls
'lookup_commit_reference_gently()' on each element in the set and treats
the resulting set of OIDs as the base over which to close for
reachability.
In an earlier collection of commits, the 'git commit-graph write
--reachable' case made the inner-most call to
'lookup_commit_reference_gently()' by peeling references before they
were passed over to the commit-graph internals.
Do the analog for 'git commit-graph write --stdin-commits' by calling
'lookup_commit_reference_gently()' outside of the commit-graph
machinery, making the inner-most call a noop.
Since this may incur additional processing time, surround
'read_one_commit' with a progress meter to provide output to the caller.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
With either '--stdin-commits' or '--stdin-packs', the commit-graph
builtin will read line-delimited input, and interpret it either as a
series of commit OIDs, or pack names.
In a subsequent commit, we will begin handling '--stdin-commits'
differently by processing each line as it comes in, instead of in one
shot at the end. To make adequate room for this additional logic, split
the '--stdin-commits' case from '--stdin-packs' by only storing the
input when '--stdin-packs' is given.
In the case of '--stdin-commits', feed each line to a new
'read_one_commit' helper, which (for now) will merely call
'parse_oid_hex'.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Teach "am", "commit", "merge" and "rebase", when they are run with
the "--quiet" option, to pass "--quiet" down to "gc --auto".
* jc/auto-gc-quiet:
auto-gc: pass --quiet down from am, commit, merge and rebase
auto-gc: extract a reusable helper from "git fetch"
The <stdlib.h> header on NetBSD brings in its own definition of
hmac() function (eek), which conflicts with our own and unrelated
function with the same name. Our function has been renamed to work
around the issue.
* cb/avoid-colliding-with-netbsd-hmac:
builtin/receive-pack: avoid generic function name hmac()
"git restore --staged --worktree" now defaults to take the contents
out of "HEAD", instead of erring out.
* es/restore-staged-from-head-by-default:
restore: default to HEAD when combining --staged and --worktree
"git branch" and other "for-each-ref" variants accepted multiple
--sort=<key> options in the increasing order of precedence, but it
had a few breakages around "--ignore-case" handling, and tie-breaking
with the refname, which have been fixed.
* jk/for-each-ref-multi-key-sort-fix:
ref-filter: apply fallback refname sort only after all user sorts
ref-filter: apply --ignore-case to all sorting keys
In error messages that "git switch" mentions its option to create a
new branch, "-b/-B" options were shown, where "-c/-C" options
should be, which has been corrected.
* dl/switch-c-option-in-error-message:
switch: fix errors and comments related to -c and -C
Convert submodule subcommand 'set-url' to a builtin. Port 'set-url' to
'submodule--helper.c' and call the latter via 'git-submodule.sh'.
Signed-off-by: Shourya Shukla <shouryashukla.oo@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
These commands take the --quiet option for their own operation, but
they forget to pass the option down when they invoke "git gc --auto"
internally.
Teach them to do so using the run_auto_gc() helper we added in the
previous step.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Reviewed-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Back in 1991006c (fetch: convert argv_gc_auto to struct argv_array,
2014-08-16), we taught "git fetch --quiet" to pass the "--quiet"
option down to "gc --auto". This issue, however, is not limited to
"fetch":
$ git grep -e 'gc.*--auto' \*.c
finds hits in "am", "commit", "merge", and "rebase" and these
commands do not pass "--quiet" down to "gc --auto" when they
themselves are told to be quiet.
As a preparatory step, let's introduce a helper function
run_auto_gc(), that the caller can pass a boolean "quiet",
and redo the fix to "git fetch" using the helper.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Reviewed-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
By default, files are restored from the index for --worktree, and from
HEAD for --staged. When --worktree and --staged are combined, --source
must be specified to disambiguate the restore source[1], thus making it
cumbersome to restore a file in both the worktree and the index.
However, HEAD is also a reasonable default for --worktree when combined
with --staged, so make it the default anytime --staged is used (whether
combined with --worktree or not).
[1]: Due to an oversight, the --source requirement, though documented,
is not actually enforced.
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Reviewed-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
fabec2c5c3 (builtin/receive-pack: switch to use the_hash_algo, 2019-08-18)
renames hmac_sha1 to hmac, as it was updated to use the hash function used
by git (which won't be sha1 in the future).
hmac() is provided by NetBSD >= 8 libc and therefore conflicts as shown by :
builtin/receive-pack.c:421:13: error: conflicting types for 'hmac'
static void hmac(unsigned char *out,
^~~~
In file included from ./git-compat-util.h:172:0,
from ./builtin.h:4,
from builtin/receive-pack.c:1:
/usr/include/stdlib.h:305:10: note: previous declaration of 'hmac' was here
ssize_t hmac(const char *, const void *, size_t, const void *, size_t, void *,
^~~~
Rename it again to hmac_hash to reflect it will use the git's defined hash
function and avoid the conflict, while at it update a comment to better
describe the HMAC function that was used.
Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
All of the ref-filter users (for-each-ref, branch, and tag) take an
--ignore-case option which makes filtering and sorting case-insensitive.
However, this option was applied only to the first element of the
ref_sorting list. So:
git for-each-ref --ignore-case --sort=refname
would do what you expect, but:
git for-each-ref --ignore-case --sort=refname --sort=taggername
would sort the primary key (taggername) case-insensitively, but sort the
refname case-sensitively. We have two options here:
- teach callers to set ignore_case on the whole list
- replace the ref_sorting list with a struct that contains both the
list of sorting keys, as well as options that apply to _all_
keys
I went with the first one here, as it gives more flexibility if we later
want to let the users set the flag per-key (presumably through some
special syntax when defining the key; for now it's all or nothing
through --ignore-case).
The new test covers this by sorting on both tagger and subject
case-insensitively, which should compare "a" and "A" identically, but
still sort them before "b" and "B". We'll break ties by sorting on the
refname to give ourselves a stable output (this is actually supposed to
be done automatically, but there's another bug which will be fixed in
the next commit).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Error and verbose trace messages from "git push" did not redact
credential material embedded in URLs.
* js/anonymise-push-url-in-errors:
push: anonymize URLs in error messages and warnings
The "bugreport" tool.
* es/bugreport:
bugreport: drop extraneous includes
bugreport: add compiler info
bugreport: add uname info
bugreport: gather git version and build info
bugreport: add tool to generate debugging info
help: move list_config_help to builtin/help
Incompatible options "--root" and "--fork-point" of "git rebase"
have been marked and documented as being incompatible.
* en/rebase-root-and-fork-point-are-incompatible:
rebase: display an error if --root and --fork-point are both provided
"git blame" learns to take advantage of the "changed-paths" Bloom
filter stored in the commit-graph file.
* ds/blame-on-bloom:
test-bloom: check that we have expected arguments
test-bloom: fix some whitespace issues
blame: drop unused parameter from maybe_changed_path
blame: use changed-path Bloom filters
tests: write commit-graph with Bloom filters
revision: complicated pathspecs disable filters
Introduce an extension to the commit-graph to make it efficient to
check for the paths that were modified at each commit using Bloom
filters.
* gs/commit-graph-path-filter:
bloom: ignore renames when computing changed paths
commit-graph: add GIT_TEST_COMMIT_GRAPH_CHANGED_PATHS test flag
t4216: add end to end tests for git log with Bloom filters
revision.c: add trace2 stats around Bloom filter usage
revision.c: use Bloom filters to speed up path based revision walks
commit-graph: add --changed-paths option to write subcommand
commit-graph: reuse existing Bloom filters during write
commit-graph: write Bloom filters to commit graph file
commit-graph: examine commits by generation number
commit-graph: examine changed-path objects in pack order
commit-graph: compute Bloom filters for changed paths
diff: halt tree-diff early after max_changes
bloom.c: core Bloom filter implementation for changed paths.
bloom.c: introduce core Bloom filter constructs
bloom.c: add the murmur3 hash implementation
commit-graph: define and use MAX_NUM_CHUNKS
Fix in-core inconsistency after fetching into a shallow repository
that broke the code to write out commit-graph.
* tb/reset-shallow:
shallow.c: use '{commit,rollback}_shallow_file'
t5537: use test_write_lines and indented heredocs for readability
In previous patches, the functions 'commit_shallow_file' and
'rollback_shallow_file' were introduced to reset the shallowness
validity checks on a repository after potentially modifying
'.git/shallow'.
These functions can be made safer by wrapping the 'struct lockfile *' in
a new type, 'shallow_lock', so that they cannot be called with a raw
lock (and potentially misused by other code that happens to possess a
lockfile, but has nothing to do with shallowness).
This patch introduces that type as a thin wrapper around 'struct
lockfile', and updates the two aforementioned functions and their
callers to use it.
Suggested-by: Junio C Hamano <gitster@pobox.com>
Helped-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There are many functions in commit.h that are more related to shallow
repositories than they are to any sort of generic commit machinery.
Likely this began when there were only a few shallow-related functions,
and commit.h seemed a reasonable enough place to put them.
But, now there are a good number of shallow-related functions, and
placing them all in 'commit.h' doesn't make sense.
This patch extracts a 'shallow.h', which takes all of the declarations
from 'commit.h' for functions which already exist in 'shallow.c'. We
will bring the remaining shallow-related functions defined in 'commit.c'
in a subsequent patch.
For now, move only the ones that already are implemented in 'shallow.c',
and update the necessary includes.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In d787d311db (checkout: split part of it to new command 'switch',
2019-03-29), the `git switch` command was created by extracting the
common functionality of cmd_checkout() in checkout_main(). However, in
b7b5fce270 (switch: better names for -b and -B, 2019-03-29), the branch
creation and force creation options for 'switch' were changed to -c and
-C, respectively. As a result of this, error messages and comments that
previously referred to `-b` and `-B` became invalid for `git switch`.
For error messages that refer to `-b` and `-B`, use a format string
instead so that `-c` and `-C` can be printed when `git switch` is
invoked.
Reported-by: Robert Simpson
Signed-off-by: Denton Liu <liu.denton@gmail.com>
Reviewed-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git update-ref --stdin" learned a handful of new verbs to let the
user control ref update transactions more explicitly, which helps
as an ingredient to implement two-phase commit-style atomic
ref-updates across multiple repositories.
* ps/transactional-update-ref-stdin:
update-ref: implement interactive transaction handling
update-ref: read commands in a line-wise fashion
update-ref: move transaction handling into `update_refs_stdin()`
update-ref: pass end pointer instead of strbuf
update-ref: drop unused argument for `parse_refname`
update-ref: organize commands in an array
strbuf: provide function to append whole lines
git-update-ref.txt: add missing word
refs: fix segfault when aborting empty transaction
The directory traversal code had redundant recursive calls which
made its performance characteristics exponential with respect to
the depth of the tree, which was corrected.
* en/fill-directory-exponential:
completion: fix 'git add' on paths under an untracked directory
Fix error-prone fill_directory() API; make it only return matches
dir: replace double pathspec matching with single in treat_directory()
dir: include DIR_KEEP_UNTRACKED_CONTENTS handling in treat_directory()
dir: replace exponential algorithm with a linear one
dir: refactor treat_directory to clarify control flow
dir: fix confusion based on variable tense
dir: fix broken comment
dir: consolidate treat_path() and treat_one_path()
dir: fix simple typo in comment
t3000: add more testcases testing a variety of ls-files issues
t7063: more thorough status checking
"sparse-checkout" UI improvements.
* en/sparse-checkout:
sparse-checkout: provide a new reapply subcommand
unpack-trees: failure to set SKIP_WORKTREE bits always just a warning
unpack-trees: provide warnings on sparse updates for unmerged paths too
unpack-trees: make sparse path messages sound like warnings
unpack-trees: split display_error_msgs() into two
unpack-trees: rename ERROR_* fields meant for warnings to WARNING_*
unpack-trees: move ERROR_WOULD_LOSE_SUBMODULE earlier
sparse-checkout: use improved unpack_trees porcelain messages
sparse-checkout: use new update_sparsity() function
unpack-trees: add a new update_sparsity() function
unpack-trees: pull sparse-checkout pattern reading into a new function
unpack-trees: do not mark a dirty path with SKIP_WORKTREE
unpack-trees: allow check_updates() to work on a different index
t1091: make some tests a little more defensive against failures
unpack-trees: simplify pattern_list freeing
unpack-trees: simplify verify_absent_sparse()
unpack-trees: remove unused error type
unpack-trees: fix minor typo in comment
The stash entry created by "git rebase --autosquash" to keep the
initial dirty state were discarded by mistake upon "git rebase
--quit", which has been corrected.
* dl/merge-autostash-rebase-quit-fix:
rebase: save autostash entry into stash reflog on --quit
"git grep" did not quote a path with unusual character like other
commands (like "git diff", "git status") do, but did quote when run
from a subdirectory, both of which has been corrected.
* mt/grep-cquote-path:
grep: follow conventions for printing paths w/ unusual chars
The "--decorate-refs" and "--decorate-refs-exclude" options "git
log" takes have learned a companion configuration variable
log.excludeDecoration that sits at the lowest priority in the
family.
* ds/log-exclude-decoration-config:
log: add log.excludeDecoration config option
log-tree: make ref_filter_match() a helper method
"git diff-tree --pretty --notes" used to hit an assertion failure,
as it forgot to initialize the notes subsystem.
* tb/diff-tree-with-notes:
diff-tree.c: load notes machinery when required
Allowing the user to split a patch hunk while "git stash -p" does
not work well; a band-aid has been added to make this (partially)
work better.
* js/stash-p-fix:
stash -p: (partially) fix bug concerning split hunks
t3904: fix incorrect demonstration of a bug
Code in builtin/*, i.e. those can only be called from within
built-in subcommands, that implements bulk of a couple of
subcommands have been moved to libgit.a so that they could be used
by others.
* dl/libify-a-few:
Lib-ify prune-packed
Lib-ify fmt-merge-msg
"git diff" in a partial clone learned to avoid lazy loading blob
objects in more casese when they are not needed.
* jt/avoid-prefetch-when-able-in-diff:
diff: restrict when prefetching occurs
diff: refactor object read
diff: make diff_populate_filespec_options struct
promisor-remote: accept 0 as oid_nr in function
"git commit-graph write --expire-time=<timestamp>" did not use the
given timestamp correctly, which has been corrected.
* ds/commit-graph-expiry-fix:
commit-graph: fix buggy --expire-time option
"git log" learns "--[no-]mailmap" as a synonym to "--[no-]use-mailmap"
* jc/log-no-mailmap:
log: give --[no-]use-mailmap a more sensible synonym --[no-]mailmap
clone: reorder --recursive/--recurse-submodules
parse-options: teach "git cmd -h" to show alias as alias
The config API made mixed uses of int and size_t types to represent
length of various pieces of text it parsed, which has been updated
to use the correct type (i.e. size_t) throughout.
* jk/config-use-size-t:
config: reject parsing of files over INT_MAX
config: use size_t to store parsed variable baselen
git_config_parse_key(): return baselen as size_t
config: drop useless length variable in write_pair()
parse_config_key(): return subsection len as size_t
remote: drop auto-strlen behavior of make_branch() and make_rewrite()
Validation of push certificate has been made more robust against
timing attacks.
* bc/constant-memequal:
receive-pack: compilation fix
builtin/receive-pack: use constant-time comparison for HMAC value
Just like 47abd85ba0 (fetch: Strip usernames from url's before storing
them, 2009-04-17) and later 882d49ca5c (push: anonymize URL in status
output, 2016-07-13), and even later c1284b21f2 (curl: anonymize URLs
in error messages and warnings, 2019-03-04) this change anonymizes URLs
(read: strips them of user names and especially passwords) in
user-facing error messages and warnings.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Reviewed-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In a03b55530a (merge: teach --autostash option, 2020-04-07), the
--autostash option was introduced for `git merge`. Notably, when
`git merge --quit` is run with an autostash entry present, it is saved
into the stash reflog. This is contrasted with the current behaviour of
`git rebase --quit` where the autostash entry is simply just dropped out
of existence.
Adopt the behaviour of `git merge --quit` in `git rebase --quit` and
save the autostash entry into the stash reflog instead of just deleting
it.
Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When the usage for `git push` is shown, it includes the following
lines
--recurse-submodules[=(check|on-demand|no)]
control recursive pushing of submodules
which seem to indicate that the argument for --recurse-submodules is
optional. However, we cannot actually run that optiion without an
argument:
$ git push --recurse-submodules
fatal: recurse-submodules missing parameter
Unset PARSE_OPT_OPTARG so that it is clear that this option requires an
argument. Since the parse-options machinery guarantees that an argument
is present now, assume that `arg` is set in the else of
option_parse_recurse_submodules().
Reported-by: Andrew White <andrew.white@audinate.com>
Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In the codebase, there are many options which use OPTION_CALLBACK in a
plain ol' struct definition. However, we have the OPT_CALLBACK and
OPT_CALLBACK_F macros which are meant to abstract these plain struct
definitions away. These macros are useful as they semantically signal to
developers that these are just normal callback option with nothing fancy
happening.
Replace plain struct definitions of OPTION_CALLBACK with OPT_CALLBACK or
OPT_CALLBACK_F where applicable. The heavy lifting was done using the
following (disgusting) shell script:
#!/bin/sh
do_replacement () {
tr '\n' '\r' |
sed -e 's/{\s*OPTION_CALLBACK,\s*\([^,]*\),\([^,]*\),\([^,]*\),\([^,]*\),\([^,]*\),\s*0,\(\s*[^[:space:]}]*\)\s*}/OPT_CALLBACK(\1,\2,\3,\4,\5,\6)/g' |
sed -e 's/{\s*OPTION_CALLBACK,\s*\([^,]*\),\([^,]*\),\([^,]*\),\([^,]*\),\([^,]*\),\([^,]*\),\(\s*[^[:space:]}]*\)\s*}/OPT_CALLBACK_F(\1,\2,\3,\4,\5,\6,\7)/g' |
tr '\r' '\n'
}
for f in $(git ls-files \*.c)
do
do_replacement <"$f" >"$f.tmp"
mv "$f.tmp" "$f"
done
The result was manually inspected and then reformatted to match the
style of the surrounding code. Finally, using
`git grep OPTION_CALLBACK \*.c`, leftover results which were not handled
by the script were manually transformed.
Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
--root implies we want to rebase all commits since the beginning of
history. --fork-point means we want to use the reflog of the specified
upstream to find the best common ancestor between <upstream> and
<branch> and only rebase commits since that common ancestor. These
options are clearly contradictory, so throw an error (instead of
segfaulting on a NULL pointer) if both are specified.
Reported-by: Alexander Berg <alexander.berg@atos.net>
Documentation-by: Alban Gruin <alban.gruin@gmail.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In bd0b42aed3 (fetch-pack: do not take shallow lock unnecessarily,
2019-01-10), the author noted that 'is_repository_shallow' produces
visible side-effect(s) by setting 'is_shallow' and 'shallow_stat'.
This is a problem for e.g., fetching with '--update-shallow' in a
shallow repository with 'fetch.writeCommitGraph' enabled, since the
update to '.git/shallow' will cause Git to think that the repository
isn't shallow when it is, thereby circumventing the commit-graph
compatibility check.
This causes problems in shallow repositories with at least shallow refs
that have at least one ancestor (since the client won't have those
objects, and therefore can't take the reachability closure over commits
when writing a commit-graph).
Address this by introducing thin wrappers over 'commit_lock_file' and
'rollback_lock_file' for use specifically when the lock is held over
'.git/shallow'. These wrappers (appropriately called
'commit_shallow_file' and 'rollback_shallow_file') call into their
respective functions in 'lockfile.h', but additionally reset validity
checks used by the shallow machinery.
Replace each instance of 'commit_lock_file' and 'rollback_lock_file'
with 'commit_shallow_file' and 'rollback_shallow_file' when the lock
being held is over the '.git/shallow' file.
As a result, 'prune_shallow' can now only be called once (since
'check_shallow_file_for_update' will die after calling
'reset_repository_shallow'). But, this is OK since we only call
'prune_shallow' at most once per process.
Helped-by: Jonathan Tan <jonathantanmy@google.com>
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Reviewed-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Allow "git rebase" to reapply all local commits, even if the may be
already in the upstream, without checking first.
* jt/rebase-allow-duplicate:
rebase --merge: optionally skip upstreamed commits
"git rebase" (again) learns to honor "--no-keep-empty", which lets
the user to discard commits that are empty from the beginning (as
opposed to the ones that become empty because of rebasing). The
interactive rebase also marks commits that are empty in the todo.
* en/rebase-no-keep-empty:
rebase: fix an incompatible-options error message
rebase: reinstate --no-keep-empty
rebase -i: mark commits that begin empty in todo editor
The interactive input from various codepaths are consolidated and
any prompt possibly issued earlier are fflush()ed before we read.
* js/flush-prompt-before-interative-input:
interactive: explicitly `fflush` stdout before expecting input
interactive: refactor code asking the user for interactive input
The output from "git format-patch" uses RFC 2047 encoding for
non-ASCII letters on From: and Subject: headers, so that it can
directly be fed to e-mail programs. A new option has been added
to produce these headers in raw.
* eb/format-patch-no-encode-headers:
format-patch: teach --no-encode-email-headers
"git rebase" learned the "--no-gpg-sign" option to countermand
commit.gpgSign the user may have.
* dd/no-gpg-sign:
Documentation: document merge option --no-gpg-sign
Documentation: merge commit-tree --[no-]gpg-sign
Documentation: reword commit --no-gpg-sign
Documentation: document am --no-gpg-sign
cherry-pick/revert: honour --no-gpg-sign in all case
rebase.c: honour --no-gpg-sign
The logic to auto-follow tags by "git clone --single-branch" was
not careful to avoid lazy-fetching unnecessary tags, which has been
corrected.
* jk/use-quick-lookup-in-clone-for-tag-following:
clone: use "quick" lookup while following tags
Code cleanup.
* jk/oid-array-cleanups:
oidset: stop referring to sha1-array
ref-filter: stop referring to "sha1 array"
bisect: stop referring to sha1_array
test-tool: rename sha1-array to oid-array
oid_array: rename source file from sha1-array
oid_array: use size_t for iteration
oid_array: use size_t for count and allocation
"git pull --rebase" tried to run a rebase even after noticing that
the pull results in a fast-forward and no rebase is needed nor
sensible, for the past few years due to a mistake nobody noticed.
* en/pull-do-not-rebase-after-fast-forwarding:
pull: avoid running both merge and rebase
"git pull" shares many options with underlying "git fetch", but
some of them were not documented and some of those that would make
sense to pass down were not passed down.
* rs/pull-options-sync-code-and-doc:
pull: pass documented fetch options on
pull: remove --update-head-ok from documentation
Simplify the commit ancestry connectedness check in a partial clone
repository in which "promised" objects are assumed to be obtainable
lazily on-demand from promisor remote repositories.
* jt/connectivity-check-optim-in-partial-clone:
connected: always use partial clone optimization
We do not use C99 "for loop initial declaration" in our codebase
(yet), but one snuck in.
Reported-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since its introduction in 7249e91 (revision.c: support --notes
command-line option, 2011-03-29), combining '--notes' with any option
that causes us to format notes (e.g., '--pretty', '--format="%N"', etc)
results in a failed assertion at runtime.
$ git rev-list HEAD | git diff-tree --stdin --pretty=medium --notes
commit 8f3d9f354286745c751374f5f1fcafee6b3f3136
git: notes.c:1308: format_display_notes: Assertion `display_notes_trees' failed.
Aborted
This failure is due to diff-tree not calling 'load_display_notes' to
initialize the notes machinery.
Ordinarily, this failure isn't triggered, because it requires passing
both '--notes' and another of the above mentioned options. In the case
of '--pretty', for example, we set 'opt->verbose_header', causing
'show_log()' to eventually call 'format_display_notes()', which expects
a non-NULL 'display_note_trees'.
Without initializing the notes machinery, 'display_note_trees' remains
NULL, and thus triggers an assertion failure.
Fix this by initializing the notes machinery after parsing our options,
and harden this behavior against regression with a test in t4013. (Note
that the added ref in this test requires updating two unrelated tests
which use 'log --all', and thus need to learn about the new refs).
Reported-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Acked-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
grep does not follow the conventions used by other Git commands when
printing paths that contain unusual characters (as double-quotes or
newlines). Commands such as ls-files, commit, status and diff will:
- Quote and escape unusual pathnames, by default.
- Print names verbatim and unquoted when "-z" is used.
But grep *never* quotes/escapes absolute paths with unusual chars and
*always* quotes/escapes relative ones, even with "-z". Besides being
inconsistent in its own output, the deviation from other Git commands
can be confusing. So let's make it follow the two rules above and add
some tests for this new behavior. Note that, making grep quote/escape
all unusual paths by default, also make it fully compliant with the
core.quotePath configuration, which is currently ignored for absolute
paths.
Reported-by: Greg Hurrell <greg@hurrell.net>
Helped-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The changed-path Bloom filters help reduce the amount of tree
parsing required during history queries. Before calculating a
diff, we can ask the filter if a path changed between a commit
and its first parent. If the filter says "no" then we can move
on without parsing trees. If the filter says "maybe" then we
parse trees to discover if the answer is actually "yes" or "no".
When computing a blame, there is a section in find_origin() that
computes a diff between a commit and one of its parents. When this
is the first parent, we can check the Bloom filters before calling
diff_tree_oid().
In order to make this work with the blame machinery, we need to
initialize a struct bloom_key with the initial path. But also, we
need to add more keys to a list if a rename is detected. We then
check to see if _any_ of these keys answer "maybe" in the diff.
During development, I purposefully left out this "add a new key
when a rename is detected" to see if the test suite would catch
my error. That is how I discovered the issues with
GIT_TEST_COMMIT_GRAPH_CHANGED_PATHS from the previous change.
With that change, we can feel some confidence in the coverage of
this change.
If a user requests copy detection using "git blame -C", then there
are more places where the set of "important" files can expand. I
do not know enough about how this happens in the blame machinery.
Thus, the Bloom filter integration is explicitly disabled in this
mode. A later change could expand the bloom_key data with an
appropriate call (or calls) to add_bloom_key().
If we did not disable this mode, then the following tests would
fail:
t8003-blame-corner-cases.sh
t8011-blame-split-file.sh
Generally, this is a performance enhancement and should not
change the behavior of 'git blame' in any way. If a repo has a
commit-graph file with computed changed-path Bloom filters, then
they should notice improved performance for their 'git blame'
commands.
Here are some example timings that I found by blaming some paths
in the Linux kernel repository:
git blame arch/x86/kernel/topology.c >/dev/null
Before: 0.83s
After: 0.24s
git blame kernel/time/time.c >/dev/null
Before: 0.72s
After: 0.24s
git blame tools/perf/ui/stdio/hist.c >/dev/null
Before: 0.27s
After: 0.11s
I specifically looked for "deep" paths that were also edited many
times. As a counterpoint, the MAINTAINERS file was edited many
times but is located in the root tree. This means that the cost of
computing a diff relative to the pathspec is very small. Here are
the timings for that command:
git blame MAINTAINERS >/dev/null
Before: 20.1s
After: 18.0s
These timings are the best of five. The worst-case runs were on the
order of 2.5 minutes for both cases. Note that the MAINTAINERS file
has 18,740 lines across 17,000+ commits. This happens to be one of
the cases where this change provides the least improvement.
The lack of improvement for the MAINTAINERS file and the relatively
modest improvement for the other examples can be easily explained.
The blame machinery needs to compute line-level diffs to determine
which lines were changed by each commit. That makes up a large
proportion of the computation time, and this change does not
attempt to improve on that section of the algorithm. The
MAINTAINERS file is large and changed often, so it takes time to
determine which lines were updated by which commit. In contrast,
the code files are much smaller, and it takes longer to comute
the line-by-line diff for a single patch on the Linux mailing
lists.
Outside of the "-C" integration, I believe there is little more to
gain from the changed-path Bloom filters for 'git blame' after this
patch.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The GIT_TEST_COMMIT_GRAPH environment variable updates the commit-
graph file whenever "git commit" is run, ensuring that we always
have an updated commit-graph throughout the test suite. The
GIT_TEST_COMMIT_GRAPH_CHANGED_PATHS environment variable was
introduced to write the changed-path Bloom filters whenever "git
commit-graph write" is run. However, the GIT_TEST_COMMIT_GRAPH
trick doesn't launch a separate process and instead writes it
directly.
To expand the number of tests that have commits in the commit-graph
file, add a helper method that computes the commit-graph and place
that helper inside "git commit" and "git merge".
In the helper method, check GIT_TEST_COMMIT_GRAPH_CHANGED_PATHS
to ensure we are writing changed-path Bloom filters whenever
possible.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Starting in 3ac68a93fd, help.o began to depend on builtin/branch.o,
builtin/clean.o, and builtin/config.o. This meant that help.o was
unusable outside of the context of the main Git executable.
To make help.o usable by other commands again, move list_config_help()
into builtin/help.c (where it makes sense to assume other builtin libraries
are present).
When command-list.h is included but a member is not used, we start to
hear a compiler warning. Since the config list is generated in a fairly
different way than the command list, and since commands and config
options are semantically different, move the config list into its own
header and move the generator into its own script and build rule.
For reasons explained in 976aaedc (msvc: add a Makefile target to
pre-generate the Visual Studio solution, 2019-07-29), some build
artifacts we consider non-source files cannot be generated in the
Visual Studio environment, and we already have some Makefile tweaks
to help Visual Studio to use generated command-list.h header file.
Do the same to a new generated file, config-list.h, introduced by
this change.
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
In 'git log', the --decorate-refs-exclude option appends a pattern
to a string_list. This list is used to prevent showing some refs
in the decoration output, or even by --simplify-by-decoration.
Users may want to use their refs space to store utility refs that
should not appear in the decoration output. For example, Scalar [1]
runs a background fetch but places the "new" refs inside the
refs/scalar/hidden/<remote>/* refspace instead of refs/<remote>/*
to avoid updating remote refs when the user is not looking. However,
these "hidden" refs appear during regular 'git log' queries.
A similar idea to use "hidden" refs is under consideration for core
Git [2].
Add the 'log.excludeDecoration' config option so users can exclude
some refs from decorations by default instead of needing to use
--decorate-refs-exclude manually. The config value is multi-valued
much like the command-line option. The documentation is careful to
point out that the config value can be overridden by the
--decorate-refs option, even though --decorate-refs-exclude would
always "win" over --decorate-refs.
Since the 'log.excludeDecoration' takes lower precedence to
--decorate-refs, and --decorate-refs-exclude takes higher
precedence, the struct decoration_filter needed another field.
This led also to new logic in load_ref_decorations() and
ref_filter_match().
There are several tests in t4202-log.sh that test the
--decorate-refs-(include|exclude) options, so these are extended.
Since the expected output is already stored as a file, most tests
could simply replace a "--decorate-refs-exclude" option with an
in-line config setting. Other tests involve the precedence of
the config option compared to command-line options and needed more
modification.
[1] https://github.com/microsoft/scalar
[2] https://lore.kernel.org/git/77b1da5d3063a2404cd750adfe3bb8be9b6c497d.1585946894.git.gitgitgadget@gmail.com/
Helped-by: Junio C Hamano <gister@pobox.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When operating on a stream of commit OIDs on stdin, 'git commit-graph
write' checks that each OID refers to an object that is indeed a commit.
This is convenient to make sure that the given input is well-formed, but
can sometimes be undesirable.
For example, server operators may wish to feed the refnames that were
updated during a push to 'git commit-graph write --input=stdin-commits',
and silently discard refs that don't point at commits. This can be done
by combing the output of 'git for-each-ref' with '--format
%(*objecttype)', but this requires opening up a potentially large number
of objects. Instead, it is more convenient to feed the updated refs to
the commit-graph machinery, and let it throw out refs that don't point
to commits.
Introduce '--[no-]check-oids' to make such a behavior possible. With
'--check-oids' (the default behavior to retain backwards compatibility),
'git commit-graph write' will barf on a non-commit line in its input.
With 'no-check-oids', such lines will be silently ignored, making the
above possible by specifying this option.
No matter which is supplied, 'git commit-graph write' retains the
behavior from the previous commit of rejecting non-OID inputs like
"HEAD" and "refs/heads/foo" as before.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The 'write_commit_graph()' function takes in either a string list of
pack indices, or a string list of hexadecimal commit OIDs. These
correspond to the '--stdin-packs' and '--stdin-commits' mode(s) from
'git commit-graph write'.
Using a string_list of hexadecimal commit IDs is not the most efficient
use of memory, since we can instead use the 'struct oidset', which is
more well-suited for this case.
This has another benefit which will become apparent in the following
commit. This is that we are about to disambiguate the kinds of errors we
produce with '--stdin-commits' into "non-hex input" and "hex-input, but
referring to a non-commit object". By having 'write_commit_graph' take
in a 'struct oidset *' of commits, we place the burden on the caller (in
this case, the builtin) to handle the first case, and the commit-graph
machinery can handle the second case.
Suggested-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When using split commit-graphs, it is sometimes useful to completely
replace the commit-graph chain with a new base.
For example, consider a scenario in which a repository builds a new
commit-graph incremental for each push. Occasionally (say, after some
fixed number of pushes), they may wish to rebuild the commit-graph chain
with all reachable commits.
They can do so with
$ git commit-graph write --reachable
but this removes the chain entirely and replaces it with a single
commit-graph in 'objects/info/commit-graph'. Unfortunately, this means
that the next push will have to move this commit-graph into the first
layer of a new chain, and then write its new commits on top.
Avoid such copying entirely by allowing the caller to specify that they
wish to replace the entirety of their commit-graph chain, while also
specifying that the new commit-graph should become the basis of a fresh,
length-one chain.
This addresses the above situation by making it possible for the caller
to instead write:
$ git commit-graph write --reachable --split=replace
which writes a new length-one chain to 'objects/info/commit-graphs',
making the commit-graph incremental generated by the subsequent push
relatively cheap by avoiding the aforementioned copy.
In order to do this, remove an assumption in 'write_commit_graph_file'
that chains are always at least two incrementals long.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In the previous commit, we laid the groundwork for supporting different
splitting strategies. In this commit, we introduce the first splitting
strategy: 'no-merge'.
Passing '--split=no-merge' is useful for callers which wish to write a
new incremental commit-graph, but do not want to spend effort condensing
the incremental chain [1]. Previously, this was possible by passing
'--size-multiple=0', but this no longer the case following 63020f175f
(commit-graph: prefer default size_mult when given zero, 2020-01-02).
When '--split=no-merge' is given, the commit-graph machinery will never
condense an existing chain, and it will always write a new incremental.
[1]: This might occur when, for example, a server administrator running
some program after each push may want to ensure that each job runs
proportional in time to the size of the push, and does not "jump" when
the commit-graph machinery decides to trigger a merge.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
With '--split', the commit-graph machinery writes new commits in another
incremental commit-graph which is part of the existing chain, and
optionally decides to condense the chain into a single commit-graph.
This is done to ensure that the asymptotic behavior of looking up a
commit in an incremental chain is not dominated by the number of
incrementals in that chain. It can be controlled by the '--max-commits'
and '--size-multiple' options.
In the next two commits, we will introduce additional splitting
strategies that can exert additional control over:
- when a split commit-graph is and isn't written, and
- when the existing commit-graph chain is discarded completely and
replaced with another graph
To prepare for this, make '--split' take an optional strategy (as in
'--split[=<strategy>]'), and add a new enum to describe which strategy
is being used. For now, no strategies are given, and the only enumerated
value is 'COMMIT_GRAPH_SPLIT_UNSPECIFIED', indicating the absence of a
strategy.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Instead of using `starts_with()`, the magic number 7, `strlen()` and a
fair number of additions to verify the three parts of the config key
"branch.<branch>.mergeoptions", use `skip_prefix()` to jump through them
more explicitly.
We need to introduce a new variable for this (we certainly can't modify
`k` just because we see "branch."!). With `skip_prefix()` we often use
quite bland names like `p` or `str`. Let's do the same. If and when this
function needs to do more prefix-skipping, we'll have a generic variable
ready for this.
Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When rebasing against an upstream that has had many commits since the
original branch was created:
O -- O -- ... -- O -- O (upstream)
\
-- O (my-dev-branch)
it must read the contents of every novel upstream commit, in addition to
the tip of the upstream and the merge base, because "git rebase"
attempts to exclude commits that are duplicates of upstream ones. This
can be a significant performance hit, especially in a partial clone,
wherein a read of an object may end up being a fetch.
Add a flag to "git rebase" to allow suppression of this feature. This
flag only works when using the "merge" backend.
This flag changes the behavior of sequencer_make_script(), called from
do_interactive_rebase() <- run_rebase_interactive() <-
run_specific_rebase() <- cmd_rebase(). With this flag, limit_list()
(indirectly called from sequencer_make_script() through
prepare_revision_walk()) will no longer call cherry_pick_list(), and
thus PATCHSAME is no longer set. Refraining from setting PATCHSAME both
means that the intermediate commits in upstream are no longer read (as
shown by the test) and means that no PATCHSAME-caused skipping of
commits is done by sequencer_make_script(), either directly or through
make_script_with_merges().
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When the user specifies the apply backend with options that only work
with the merge backend, such as
git rebase --apply --exec /bin/true HEAD~3
the error message has always been
fatal: --exec requires an interactive rebase
This error message is misleading and was one of the reasons we renamed
the interactive backend to the merge backend. Update the error message
to state that these options merely require use of the merge backend.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commit d48e5e21da ("rebase (interactive-backend): make --keep-empty the
default", 2020-02-15) turned --keep-empty (for keeping commits which
start empty) into the default. The logic underpinning that commit was:
1) 'git commit' errors out on the creation of empty commits without an
override flag
2) Once someone determines that the override is worthwhile, it's
annoying and/or harmful to required them to take extra steps in
order to keep such commits around (and to repeat such steps with
every rebase).
While the logic on which the decision was made is sound, the result was
a bit of an overcorrection. Instead of jumping to having --keep-empty
being the default, it jumped to making --keep-empty the only available
behavior. There was a simple workaround, though, which was thought to
be good enough at the time. People could still drop commits which
started empty the same way the could drop any commits: by firing up an
interactive rebase and picking out the commits they didn't want from the
list. However, there are cases where external tools might create enough
empty commits that picking all of them out is painful. As such, having
a flag to automatically remove start-empty commits may be beneficial.
Provide users a way to drop commits which start empty using a flag that
existed for years: --no-keep-empty. Interpret --keep-empty as
countermanding any previous --no-keep-empty, but otherwise leaving
--keep-empty as the default.
This might lead to some slight weirdness since commands like
git rebase --empty=drop --keep-empty
git rebase --empty=keep --no-keep-empty
look really weird despite making perfect sense (the first will drop
commits which become empty, but keep commits that started empty; the
second will keep commits which become empty, but drop commits which
started empty). However, --no-keep-empty was named years ago and we are
predominantly keeping it for backward compatibility; also we suspect it
will only be used rarely since folks already have a simple way to drop
commits they don't want with an interactive rebase.
Reported-by: Bryan Turner <bturner@atlassian.com>
Reported-by: Sami Boukortt <sami@boukortt.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We return the length to a subset of a string using an "int *"
out-parameter. This is fine most of the time, as we'd expect config keys
to be relatively short, but it could behave oddly if we had a gigantic
config key. A more appropriate type is size_t.
Let's switch over, which lets our callers use size_t as appropriate
(they are bound by our type because they must pass the out-parameter as
a pointer). This is mostly just a cleanup to make it clear this code
handles long strings correctly. In practice, our config parser already
chokes on long key names (because of a similar int/size_t mixup!).
When doing an int/size_t conversion, we have to be careful that nobody
was trying to assign a negative value to the variable. I manually
confirmed that for each case here. They tend to just feed the result to
xmemdupz() or similar; in a few cases I adjusted the parameter types for
helper functions to make sure the size_t is preserved.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There are quite a few code locations (e.g. `git clean --interactive`)
where Git asks the user for an answer. In preparation for fixing a bug
shared by all of them, and also to DRY up the code, let's refactor it.
Please note that most of these callers trimmed white-space both at the
beginning and at the end of the answer, instead of trimming only the
end (as the caller in `add-patch.c` does).
Therefore, technically speaking, we change behavior in this patch. At
the same time, it can be argued that this is actually a bug fix.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Before, `--autostash` only worked with `git pull --rebase`. However, in
the last patch, merge learned `--autostash` as well so there's no reason
why we should have this restriction anymore. Teach pull to pass
`--autostash` to merge, just like it did for rebase.
Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In rebase, one can pass the `--autostash` option to cause the worktree
to be automatically stashed before continuing with the rebase. This
option is missing in merge, however.
Implement the `--autostash` option and corresponding `merge.autoStash`
option in merge which stashes before merging and then pops after.
This option is useful when a developer has some local changes on a topic
branch but they realize that their work depends on another branch.
Previously, they had to run something like
git fetch ...
git stash push
git merge FETCH_HEAD
git stash pop
but now, that is reduced to
git fetch ...
git merge --autostash FETCH_HEAD
When an autostash is generated, it is automatically reapplied to the
worktree only in three explicit situations:
1. An incomplete merge is commit using `git commit`.
2. A merge completes successfully.
3. A merge is aborted using `git merge --abort`.
In all other situations where the merge state is removed using
remove_merge_branch_state() such as aborting a merge via
`git reset --hard`, the autostash is saved into the stash reflog
instead keeping the worktree clean.
Helped-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Suggested-by: Alban Gruin <alban.gruin@gmail.com>
Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Lib-ify the autostash code by extracting perform_autostash() from rebase
into sequencer. In a future commit, this will be used to implement
`--autostash` in other builtins.
This patch is best viewed with `--color-moved`.
Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In the future, we plan on lib-ifying create_autostash() so we need it to
be more generic. Make it more generic by making it accept a
`struct repository` argument instead of implicitly using the non-repo
functions and `the_repository`. Also, make it accept a `path` argument
so that we no longer rely have to rely on `struct rebase_options`.
Finally, make it accept a `default_reflog_action` argument so we no
longer have to rely on `DEFAULT_REFLOG_ACTION`.
Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In a future commit, we will lib-ify this code. In preparation for
this, extract the code into the create_autostash() function so that it
can be cleaned up before it is finally lib-ified.
This patch is best viewed with `--color-moved` and
`--color-moved-ws=allow-indentation-change`.
Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Continue the process of lib-ifying the autostash code. In a future
commit, this will be used to implement `--autostash` in other builtins.
This patch is best viewed with `--color-moved`.
Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In the future, we plan on lib-ifying reset_head() so we need it to
be more generic. Make it more generic by making it accept a
`struct repository` argument instead of implicitly using the non-repo
functions. Also, make it accept a `const char *default_reflog_action`
argument so that the default action of "rebase" isn't hardcoded in.
Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The apply_autostash() function in builtin/rebase.c is similar enough to
the apply_autostash() function in sequencer.c that they are almost
interchangeable, except for the type of arg they accept. Make the
sequencer.c version extern and use it in rebase.
The rebase version was introduced in 6defce2b02 (builtin rebase: support
`--autostash` option, 2018-09-04) as part of the shell to C conversion.
It opted to duplicate the function because, at the time, there was
another in-progress project converting interactive rebase from shell to
C as well and they did not want to clash with them by refactoring
sequencer.c version of apply_autostash(). Since both efforts are long
done, we can freely combine them together now.
Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since in sequencer.c, read_one() basically duplicates the functionality
of read_oneliner(), reduce code duplication by replacing read_one() with
read_oneliner().
This was done with the following Coccinelle script
@@
expression a, b;
@@
- read_one(a, b)
+ !read_oneliner(b, a, READ_ONELINER_WARN_MISSING)
and long lines were manually broken up.
Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When we're comparing a push cert nonce, we currently do so using strcmp.
Most implementations of strcmp short-circuit and exit as soon as they
know whether two values are equal. This, however, is a problem when
we're comparing the output of HMAC, as it leaks information in the time
taken about how much of the two values match if they do indeed differ.
In our case, the nonce is used to prevent replay attacks against our
server via the embedded timestamp and replay attacks using requests from
a different server via the HMAC. Push certs, which contain the nonces,
are signed, so an attacker cannot tamper with the nonces without
breaking validation of the signature. They can, of course, create their
own signatures with invalid nonces, but they can also create their own
signatures with valid nonces, so there's nothing to be gained. Thus,
there is no security problem.
Even though it doesn't appear that there are any negative consequences
from the current technique, for safety and to encourage good practices,
let's use a constant time comparison function for nonce verification.
POSIX does not provide one, but they are easy to write.
The technique we use here is also used in NaCl and the Go standard
library and relies on the fact that bitwise or and xor are constant time
on all known architectures.
We need not be concerned about exiting early if the actual and expected
lengths differ, since the standard cryptographic assumption is that
everyone, including an attacker, knows the format of and algorithm used
in our nonces (and in any event, they have the source code and can
determine it easily). As a result, we assume everyone knows how long
our nonces should be. This philosophy is also taken by the Go standard
library and other cryptographic libraries when performing constant time
comparisons on HMAC values.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When trying to stash part of the worktree changes by splitting a hunk
and then only partially accepting the split bits and pieces, the user
is presented with a rather cryptic error:
error: patch failed: <file>:<line>
error: test: patch does not apply
Cannot remove worktree changes
and the command would fail to stash the desired parts of the worktree
changes (even if the `stash` ref was actually updated correctly).
We even have a test case demonstrating that failure, carrying it for
four years already.
The explanation: when splitting a hunk, the changed lines are no longer
separated by more than 3 lines (which is the amount of context lines
Git's diffs use by default), but less than that. So when staging only
part of the diff hunk for stashing, the resulting diff that we want to
apply to the worktree in reverse will contain those changes to be
dropped surrounded by three context lines, but since the diff is
relative to HEAD rather than to the worktree, these context lines will
not match.
Example time. Let's assume that the file README contains these lines:
We
the
people
and the worktree added some lines so that it contains these lines
instead:
We
are
the
kind
people
and the user tries to stash the line containing "are", then the command
will internally stage this line to a temporary index file and try to
revert the diff between HEAD and that index file. The diff hunk that
`git stash` tries to revert will look somewhat like this:
@@ -1776,3 +1776,4
We
+are
the
people
It is obvious, now, that the trailing context lines overlap with the
part of the original diff hunk that the user did *not* want to stash.
Keeping in mind that context lines in diffs serve the primary purpose of
finding the exact location when the diff does not apply precisely (but
when the exact line number in the file to be patched differs from the
line number indicated in the diff), we work around this by reducing the
amount of context lines: the diff was just generated.
Note: this is not a *full* fix for the issue. Just as demonstrated in
t3701's 'add -p works with pathological context lines' test case, there
are ambiguities in the diff format. It is very rare in practice, of
course, to encounter such repeated lines.
The full solution for such cases would be to replace the approach of
generating a diff from the stash and then applying it in reverse by
emulating `git revert` (i.e. doing a 3-way merge). However, in `git
stash -p` it would not apply to `HEAD` but instead to the worktree,
which makes this non-trivial to implement as long as we also maintain a
scripted version of `add -i`.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When commit subjects or authors have non-ASCII characters, git
format-patch Q-encodes them so they can be safely sent over email.
However, if the patch transfer method is something other than email (web
review tools, sneakernet), this only serves to make the patch metadata
harder to read without first applying it (unless you can decode RFC 2047
in your head). git am as well as some email software supports
non-Q-encoded mail as described in RFC 6531.
Add --[no-]encode-email-headers and format.encodeEmailHeaders to let the
user control this behavior.
Signed-off-by: Emma Brooks <me@pluvano.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add GIT_TEST_COMMIT_GRAPH_CHANGED_PATHS test flag to the test setup suite
in order to toggle writing Bloom filters when running any of the git tests.
If set to true, we will compute and write Bloom filters every time a test
calls `git commit-graph write`, as if the `--changed-paths` option was
passed in.
The test suite passes when GIT_TEST_COMMIT_GRAPH and
GIT_TEST_COMMIT_GRAPH_CHANGED_PATHS are enabled.
Helped-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Garima Singh <garima.singh@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add --changed-paths option to git commit-graph write. This option will
allow users to compute information about the paths that have changed
between a commit and its first parent, and write it into the commit graph
file. If the option is passed to the write subcommand we set the
COMMIT_GRAPH_WRITE_BLOOM_FILTERS flag and pass it down to the
commit-graph logic.
Helped-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Garima Singh <garima.singh@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There are 3 callers to promisor_remote_get_direct() that first check if
the number of objects to be fetched is equal to 0. Fold that check into
promisor_remote_get_direct(), and in doing so, be explicit as to what
promisor_remote_get_direct() does if oid_nr is 0 (it returns 0, success,
immediately).
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The git-update-ref(1) command can only handle queueing transactions
right now via its "--stdin" parameter, but there is no way for users to
handle the transaction itself in a more explicit way. E.g. in a
replicated scenario, one may imagine a coordinator that spawns
git-update-ref(1) for multiple repositories and only if all agree that
an update is possible will the coordinator send a commit. Such a
transactional session could look like
> start
< start: ok
> update refs/heads/master $OLD $NEW
> prepare
< prepare: ok
# All nodes have returned "ok"
> commit
< commit: ok
or
> start
< start: ok
> create refs/heads/master $OLD $NEW
> prepare
< fatal: cannot lock ref 'refs/heads/master': reference already exists
# On all other nodes:
> abort
< abort: ok
In order to allow for such transactional sessions, this commit
introduces four new commands for git-update-ref(1), which matches those
we have internally already with the exception of "start":
- start: start a new transaction
- prepare: prepare the transaction, that is try to lock all
references and verify their current value matches the
expected one
- commit: explicitly commit a session, that is update references to
match their new expected state
- abort: abort a session and roll back all changes
By design, git-update-ref(1) will commit as soon as standard input is
being closed. While fine in a non-transactional world, it is definitely
unexpected in a transactional world. Because of this, as soon as any of
the new transactional commands is used, the default will change to
aborting without an explicit "commit". To avoid a race between queueing
updates and the first "prepare" that starts a transaction, the "start"
command has been added to start an explicit transaction.
Add some tests to exercise this new functionality.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The git-update-ref(1) supports a `--stdin` mode that allows it to read
all reference updates from standard input. This is mainly used to allow
for atomic reference updates that are all or nothing, so that either all
references will get updated or none.
Currently, git-update-ref(1) reads all commands as a single block of up
to 1000 characters and only starts processing after stdin gets closed.
This is less flexible than one might wish for, as it doesn't really
allow for longer-lived transactions and doesn't allow any verification
without committing everything. E.g. one may imagine the following
exchange:
> start
< start: ok
> update refs/heads/master $NEWOID1 $OLDOID1
> update refs/heads/branch $NEWOID2 $OLDOID2
> prepare
< prepare: ok
> commit
< commit: ok
When reading all input as a whole block, the above interactive protocol
is obviously impossible to achieve. But by converting the command to
read commands linewise, we can make it more interactive than before.
Obviously, the linewise interface is only a first step in making
git-update-ref(1) work in a more transaction-oriented way. Missing is
most importantly support for transactional commands that manage the
current transaction.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
While the actual logic to update the transaction is handled in
`update_refs_stdin()`, the transaction itself is started and committed
in `cmd_update_ref()` itself. This makes it hard to handle transaction
abortion and commits as part of `update_refs_stdin()` itself, which is
required in order to introduce transaction handling features to `git
update-refs --stdin`.
Refactor the code to move all transaction handling into
`update_refs_stdin()` to prepare for transaction handling features.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We currently pass both an `strbuf` containing the current command line
as well as the `next` pointer pointing to the first argument to
commands. This is both confusing and makes code more intertwined.
Convert this to use a simple pointer as well as a pointer pointing to
the end of the input as a preparatory step to line-wise reading of
stdin.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `parse_refname` function accepts a `struct strbuf *input` argument
that isn't used at all. As we're about to convert commands to not use a
strbuf anymore but instead an end pointer, let's drop this argument now
to make the converting commit easier to review.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We currently manually wire up all commands known to `git-update-ref
--stdin`, making it harder than necessary to preprocess arguments after
the command is determined. To make this more extensible, let's refactor
the code to use an array of known commands instead. While this doesn't
add a lot of value now, it is a preparatory step to implement line-wise
reading of commands.
As we're going to introduce commands without trailing spaces, this
commit also moves whitespace parsing into the respective commands.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The commit-graph builtin has an --expire-time option that takes a
datetime using OPT_EXPIRY_DATE(). However, the implementation inside
expire_commit_graphs() was treating a non-zero value as a number of
seconds to subtract from "now".
Update t5323-split-commit-graph.sh to demonstrate the correct value
of the --expire-time option by actually creating a crud .graph file
with mtime earlier than the expire time. Instead of using a super-
early time (1980) we use an explicit, and recent, time. Using
test-tool chmtime to create two files on either end of an exact
second, we create a test that catches this failure no matter the
current time. Using a fixed date is more portable than trying to
format a relative date string into the --expiry-date input.
I noticed this when inspecting some Scalar repos that had an excess
number of commit-graph files. In Scalar, we were using this second
interpretation by using "--expire-time=3600" to mean "delete graphs
older than one hour ago" to avoid deleting a commit-graph that a
foreground process may be trying to load.
Also I noticed that the help text was copied from the --max-commits
option. Fix that help text.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Traditionally, the expected calling convention for the dir.c API was:
fill_directory(&dir, ..., pathspec)
foreach entry in dir->entries:
if (dir_path_match(entry, pathspec))
process_or_display(entry)
This may have made sense once upon a time, because the fill_directory() call
could use cheap checks to avoid doing full pathspec matching, and an external
caller may have wanted to do other post-processing of the results anyway.
However:
* this structure makes it easy for users of the API to get it wrong
* this structure actually makes it harder to understand
fill_directory() and the functions it uses internally. It has
tripped me up several times while trying to fix bugs and
restructure things.
* relying on post-filtering was already found to produce wrong
results; pathspec matching had to be added internally for multiple
cases in order to get the right results (see commits 404ebceda0
(dir: also check directories for matching pathspecs, 2019-09-17)
and 89a1f4aaf7 (dir: if our pathspec might match files under a
dir, recurse into it, 2019-09-17))
* it's bad for performance: fill_directory() already has to do lots
of checks and knows the subset of cases where it still needs to do
more checks. Forcing external callers to do full pathspec
matching means they must re-check _every_ path.
So, add the pathspec matching within the fill_directory() internals, and
remove it from external callers.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When cloning with --single-branch, we implement git-fetch's usual
tag-following behavior, grabbing any tag objects that point to objects
we have locally.
When we're a partial clone, though, our has_object_file() check will
actually lazy-fetch each tag. That not only defeats the purpose of
--single-branch, but it does it incredibly slowly, potentially kicking
off a new fetch for each tag. This is even worse for a shallow clone,
which implies --single-branch, because even tags which are supersets of
each other will be fetched individually.
We can fix this by passing OBJECT_INFO_SKIP_FETCH_OBJECT to the call,
which is what git-fetch does in this case.
Likewise, let's include OBJECT_INFO_QUICK, as that's what git-fetch
does. The rationale is discussed in 5827a03545 (fetch: use "quick"
has_sha1_file for tag following, 2016-10-13), but here the tradeoff
would apply even more so because clone is very unlikely to be racing
with another process repacking our newly-created repository.
This may provide a very small speedup even in the non-partial case case,
as we'd avoid calling reprepare_packed_git() for each tag (though in
practice, we'd only have a single packfile, so that reprepare should be
quite cheap).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We renamed the actual data structure in 910650d2f8 (Rename sha1_array to
oid_array, 2017-03-31), but the file is still called sha1-array. Besides
being slightly confusing, it makes it more annoying to grep for leftover
occurrences of "sha1" in various files, because the header is included
in so many places.
Let's complete the transition by renaming the source and header files
(and fixing up a few comment references).
I kept the "-" in the name, as that seems to be our style; cf.
fc1395f4a4 (sha1_file.c: rename to use dash in file name, 2018-04-10).
We also have oidmap.h and oidset.h without any punctuation, but those
are "struct oidmap" and "struct oidset" in the code. We _could_ make
this "oidarray" to match, but somehow it looks uglier to me because of
the length of "array" (plus it would be a very invasive patch for little
gain).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
With 50033772d5 ("connected: verify promisor-ness of partial clone",
2020-01-30), the fast path (checking promisor packs) in
check_connected() now passes a subset of the slow path (rev-list) - if
all objects to be checked are found in promisor packs, both the fast
path and the slow path will pass; otherwise, the fast path will
definitely not pass. This means that we can always attempt the fast path
whenever we need to do the slow path.
The fast path is currently guarded by a flag; therefore, remove that
flag. Also, make the fast path fallback to the slow path - if the fast
path fails, the failing OID and all remaining OIDs will be passed to
rev-list.
The main user-visible benefit is the performance of fetch from a partial
clone - specifically, the speedup of the connectivity check done before
the fetch. In particular, a no-op fetch into a partial clone on my
computer was sped up from 7 seconds to 0.01 seconds. This is a
complement to the work in 2df1aa239c ("fetch: forgo full
connectivity check if --filter", 2020-01-30), which is the child of the
aforementioned 50033772d5. In that commit, the connectivity check
*after* the fetch was sped up.
The addition of the fast path might cause performance reductions in
these cases:
- If a partial clone or a fetch into a partial clone fails, Git will
fruitlessly run rev-list (it is expected that everything fetched
would go into promisor packs, so if that didn't happen, it is most
likely that rev-list will fail too).
- Any connectivity checks done by receive-pack, in the (in my opinion,
unlikely) event that a partial clone serves receive-pack.
I think that these cases are rare enough, and the performance reduction
in this case minor enough (additional object DB access), that the
benefit of avoiding a flag outweighs these.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Reviewed-by: Josh Steadmon <steadmon@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The 'pack.useSparse' configuration variable now defaults to 'true',
enabling an optimization that has been experimental since Git 2.21.
* ds/default-pack-use-sparse-to-true:
pack-objects: flip the use of GIT_TEST_PACK_SPARSE
config: set pack.useSparse=true by default
The fetch options --deepen, --negotiation-tip, --server-option,
--shallow-exclude, and --shallow-since are documented for git pull as
well, but are not actually accepted by that command. Pass them on to
make the code match its documentation.
Reported-by: 天几 <muzimuzhi@gmail.com>
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When opt_rebase is true, we still first check if we can fast-forward.
If the branch is fast-forwardable, then we can avoid the rebase and just
use merge to do the fast-forward logic. However, when commit a6d7eb2c7a
("pull: optionally rebase submodules (remote submodule changes only)",
2017-06-23) added the ability to rebase submodules it accidentally
caused us to run BOTH a merge and a rebase. Add a flag to avoid doing
both.
This was found when a user had both pull.rebase and rebase.autosquash
set to true. In such a case, the running of both merge and rebase would
cause ORIG_HEAD to be updated twice (and match HEAD at the end instead
of the commit before the rebase started), against expectation.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If commands like merge or rebase materialize files as part of their work,
or a previous sparse-checkout command failed to update individual files
due to dirty changes, users may want a command to simply 'reapply' the
sparsity rules. Provide one.
Reviewed-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
setup_unpack_trees_porcelain() provides much improved error/warning
messages; instead of a message that assumes that there is only one path
with a given problem despite being used by code that intentionally is
grouping and showing errors together, it uses a message designed to be
used with groups of paths. For example, this transforms
error: Entry ' folder1/a
folder2/a
' not uptodate. Cannot update sparse checkout.
into
error: Cannot update sparse checkout: the following entries are not up to date:
folder1/a
folder2/a
In the past the suboptimal messages were never actually triggered
because we would error out if the working directory wasn't clean before
we even called unpack_trees(). The previous commit changed that,
though, so let's use the better error messages.
Reviewed-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Remove the equivalent of 'git read-tree -mu HEAD' in the sparse-checkout
codepaths for setting the SKIP_WORKTREE bits and instead use the new
update_sparsity() function.
Note that when an issue is hit, the error message splits 'error' and
'Cannot update sparse checkout' on separate lines. For now, we use two
greps to find both pieces of the error message but subsequent commits
will clean up the messages reported to the user.
Reviewed-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
commit e091228e17 ("sparse-checkout: update working directory
in-process", 2019-11-21) allowed passing a pre-defined set of patterns
to unpack_trees(). However, if o->pl was NULL, it would still read the
existing patterns and use those. If those patterns were read into a
data structure that was allocated, naturally they needed to be free'd.
However, despite the same function being responsible for knowing about
both the allocation and the free'ing, the logic for tracking whether to
free the pattern_list was hoisted to an outer function with an
additional flag in unpack_trees_options. Put the logic back in the
relevant function and discard the now unnecessary flag.
Reviewed-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git pull" learned to warn when no pull.rebase configuration
exists, and neither --[no-]rebase nor --ff-only is given (which
would result a merge).
* ah/force-pull-rebase-configuration:
pull: warn if the user didn't say whether to rebase or to merge
"git stash" has kept an escape hatch to use the scripted version
for a few releases, which got stale. It has been removed.
* tg/retire-scripted-stash:
stash: remove the stash.useBuiltin setting
stash: get git_stash_config at the top level
When "git describe C" finds an annotated tag with tagname A to be
the best name to explain commit C, and the tag is stored in a
"wrong" place in the refs/tags hierarchy, e.g. refs/tags/B, the
command gave a warning message but used A (not B) to describe C.
If C is exactly at the tag, the describe output would be "A", but
"git rev-parse A^0" would not be equal as "git rev-parse C^0". The
behavior of the command has been changed to use the "long" form
i.e. A-0-gOBJECTNAME, which is correctly interpreted by rev-parse.
* jc/describe-misnamed-annotated-tag:
describe: force long format for a name based on a mislocated tag
The "--fork-point" mode of "git rebase" regressed when the command
was rewritten in C back in 2.20 era, which has been corrected.
* at/rebase-fork-point-regression-fix:
rebase: --fork-point regression fix
Provide more information (e.g. the object of the tree-ish in which
the blob being converted appears, in addition to its path, which
has already been given) to smudge/clean conversion filters.
* bc/filter-process:
t0021: test filter metadata for additional cases
builtin/reset: compute checkout metadata for reset
builtin/rebase: compute checkout metadata for rebases
builtin/clone: compute checkout metadata for clones
builtin/checkout: compute checkout metadata for checkouts
convert: provide additional metadata to filters
convert: permit passing additional metadata to filter processes
builtin/checkout: pass branch info down to checkout_worktree
The code to interface with GnuPG has been refactored.
* hi/gpg-prefer-check-signature:
gpg-interface: prefer check_signature() for GPG verification
t: increase test coverage of signature verification output
SHA-256 transition continues.
* bc/sha-256-part-1-of-4: (22 commits)
fast-import: add options for rewriting submodules
fast-import: add a generic function to iterate over marks
fast-import: make find_marks work on any mark set
fast-import: add helper function for inserting mark object entries
fast-import: permit reading multiple marks files
commit: use expected signature header for SHA-256
worktree: allow repository version 1
init-db: move writing repo version into a function
builtin/init-db: add environment variable for new repo hash
builtin/init-db: allow specifying hash algorithm on command line
setup: allow check_repository_format to read repository format
t/helper: make repository tests hash independent
t/helper: initialize repository if necessary
t/helper/test-dump-split-index: initialize git repository
t6300: make hash algorithm independent
t6300: abstract away SHA-1-specific constants
t: use hash-specific lookup tables to define test constants
repository: require a build flag to use SHA-256
hex: add functions to parse hex object IDs in any algorithm
hex: introduce parsing variants taking hash algorithms
...
The mechanism to prevent "git commit" from making an empty commit
or amending during an interrupted cherry-pick was broken during the
rewrite of "git rebase" in C, which has been corrected.
* pw/advise-rebase-skip:
commit: give correct advice for empty commit during a rebase
commit: encapsulate determine_whence() for sequencer
commit: use enum value for multiple cherry-picks
sequencer: write CHERRY_PICK_HEAD for reword and edit
cherry-pick: check commit error messages
cherry-pick: add test for `--skip` advice in `git commit`
t3404: use test_cmp_rev
The real_path() convenience function can easily be misused; with a
bit of code refactoring in the callers' side, its use has been
eliminated.
* am/real-path-fix:
get_superproject_working_tree(): return strbuf
real_path_if_valid(): remove unsafe API
real_path: remove unsafe API
set_git_dir: fix crash when used with real_path()
Revamping of the advise API to allow more systematic enumeration of
advice knobs in the future.
* hw/advise-ng:
tag: use new advice API to check visibility
advice: revamp advise API
advice: change "setupStreamFailure" to "setUpstreamFailure"
advice: extract vadvise() from advise()
In builtin.h, there exists the distinctly lib-ish function
prune_packed_objects(). This function can currently only be called by
built-in commands but, unlike all of the other functions in the header,
it does not make sense to impose this restriction as the functionality
can be logically reused in libgit.
Extract this function into prune-packed.c so that related definitions
can exist clearly in their own header file.
While we're at it, clean up #includes that are unused.
This patch is best viewed with --color-moved.
Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In builtin.h, there exists the distinctly "lib-ish" function
fmt_merge_msg(). This function can currently only be called by built-in
commands but, unlike most of the other functions in the header, it does
not make sense to impose this restriction as the functionality can be
logically reused in libgit.
Extract this function into fmt-merge-msg.c so that related definitions
can exist clearly in their own header file.
While we're at it, clean up #includes that are unused.
This patch is best viewed with --color-moved.
Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The environment variable GIT_TEST_PACK_SPARSE was previously used
to allow testing the --sparse option for "git pack-objects" in
the test suite. This allowed interesting cases of "git push" to
also test this algorithm.
Since pack.useSparse is now true by default, we do not need this
variable to _enable_ the --sparse option, but instead to _disable_
it. This flips how we work with the variable a bit.
When checking for the variable, default to a value of -1 for
"unset". If unset, then take the default from the repo settings,
which is currently 1. Then, the --[no-]sparse command-line option
will override either of these settings.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In 'submodule--helper.c', the structures and macros for callbacks belonging
to any subcommand are named in the format: 'subcommand_cb' and 'SUBCOMMAND_CB_INIT'
respectively.
This was an exception for the subcommand 'foreach' of the command
'submodule'. Rename the aforementioned structures and macros:
'struct cb_foreach' to 'struct foreach_cb' and 'CB_FOREACH_INIT'
to 'FOREACH_CB_INIT'.
Signed-off-by: Shourya Shukla <shouryashukla.oo@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git merge signed-tag" while lacking the public key started to say
"No signature", which was utterly wrong. This regression has been
reverted.
* hi/gpg-use-check-signature:
Revert "gpg-interface: prefer check_signature() for GPG verification"
Fix for a bug revealed by a recent change to make the protocol v2
the default.
* ds/partial-clone-fixes:
partial-clone: avoid fetching when looking for objects
partial-clone: demonstrate bugs in partial fetch
"git check-ignore" did not work when the given path is explicitly
marked as not ignored with a negative entry in the .gitignore file.
* en/check-ignore:
check-ignore: fix documentation and implementation to match
The index-pack code now diagnoses a bad input packstream that
records the same object twice when it is used as delta base; the
code used to declare a software bug when encountering such an
input, but it is an input error.
* jk/index-pack-dupfix:
index-pack: downgrade twice-resolved REF_DELTA to die()
The option name "--use-mailmap" looks OK, but it becomes awkward
when you have to negate it, i.e. "--no-use-mailmap". I, perhaps
with many other users, always try "--no-mailmap" and become unhappy
to see it fail.
Add an alias "--[no-]mailmap" to remedy this.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The previous step made an option that is an alias to another option
identify itself as an alias to the latter. Because it is easier to
scan the list when a pointer goes backward to what a reader already
has seen, mention "recurse-submodules" first with its true short
help string, and then "recurse" with the statement that it is a
synonym to "recurse-submodules".
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Pass the commit, and if we have it, the ref to the filters when we
perform a checkout. This should only be the case when we invoke git
reset --hard; the metadata will be unused otherwise.
Signed-off-by: brian m. carlson <bk2204@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When checking out a commit, provide metadata to the filter process
including the ref we're using.
Signed-off-by: brian m. carlson <bk2204@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Provide commit metadata for checkout code paths that use unpack_trees
and friends. When we're checking out a commit, use the commit
information, but don't provide commit information if we're checking out
from the index, since there need not be any particular commit associated
with the index, and even if there is one, we can't know what it is.
Signed-off-by: brian m. carlson <bk2204@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Now that we have the codebase wired up to pass any additional metadata
to filters, let's collect the additional metadata that we'd like to
pass.
The two main places we pass this metadata are checkouts and archives.
In these two situations, reading HEAD isn't a valid option, since HEAD
isn't updated for checkouts until after the working tree is written and
archives can accept an arbitrary tree. In other situations, HEAD will
usually reflect the refname of the branch in current use.
We pass a smaller amount of data in other cases, such as git cat-file,
where we can really only logically know about the blob.
This commit updates only the parts of the checkout code where we don't
use unpack_trees. That function and callers of it will be handled in a
future commit.
In the archive code, we leak a small amount of memory, since nothing we
pass in the archiver argument structure is freed.
Signed-off-by: brian m. carlson <bk2204@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There are a variety of situations where a filter process can make use of
some additional metadata. For example, some people find the ident
filter too limiting and would like to include the commit or the branch
in their smudged files. This information isn't available during
checkout as HEAD hasn't been updated at that point, and it wouldn't be
available in archives either.
Let's add a way to pass this metadata down to the filter. We pass the
blob we're operating on, the treeish (preferring the commit over the
tree if one exists), and the ref we're operating on. Note that we won't
pass this information in all cases, such as when renormalizing or when
we're performing diffs, since it doesn't make sense in those cases.
The data we currently get from the filter process looks like the
following:
command=smudge
pathname=git.c
0000
With this change, we'll get data more like this:
command=smudge
pathname=git.c
refname=refs/tags/v2.25.1
treeish=c522f061d551c9bb8684a7c3859b2ece4499b56b
blob=7be7ad34bd053884ec48923706e70c81719a8660
0000
There are a couple things to note about this approach. For operations
like checkout, treeish will always be a commit, since we cannot check
out individual trees, but for other operations, like archive, we can end
up operating on only a particular tree, so we'll provide only a tree as
the treeish. Similar comments apply for refname, since there are a
variety of cases in which we won't have a ref.
This commit wires up the code to print this information, but doesn't
pass any of it at this point. In a future commit, we'll have various
code paths pass the actual useful data down.
Signed-off-by: brian m. carlson <bk2204@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This commit refactors the use of verify_signed_buffer() outside of
gpg-interface.c to use check_signature() instead. It also turns
verify_signed_buffer() into a file-local function since it's now only
invoked internally by check_signature().
There were previously two globally scoped functions used in different
parts of Git to perform GPG signature verification:
verify_signed_buffer() and check_signature(). Now only
check_signature() is used.
The verify_signed_buffer() function doesn't guard against duplicate
signatures as described by Michał Górny [1]. Instead it only ensures a
non-erroneous exit code from GPG and the presence of at least one
GOODSIG status field. This stands in contrast with check_signature()
that returns an error if more than one signature is encountered.
The lower degree of verification makes the use of verify_signed_buffer()
problematic if callers don't parse and validate the various parts of the
GPG status message themselves. And processing these messages seems like
a task that should be reserved to gpg-interface.c with the function
check_signature().
Furthermore, the use of verify_signed_buffer() makes it difficult to
introduce new functionality that relies on the content of the GPG status
lines.
Now all operations that does signature verification share a single entry
point to gpg-interface.c. This makes it easier to propagate changed or
additional functionality in GPG signature verification to all parts of
Git, without having odd edge-cases that don't perform the same degree of
verification.
[1] https://dev.gentoo.org/~mgorny/articles/attack-on-git-signature-verification.html
Signed-off-by: Hans Jerry Illikainen <hji@dyntopia.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Band-aid fixes for two fallouts from switching the default "rebase"
backend.
* en/rebase-backend:
git-rebase.txt: highlight backend differences with commit rewording
sequencer: clear state upon dropping a become-empty commit
i18n: unmark a message in rebase.c
In the future, we're going to want to use the branch info in
checkout_worktree, so let's pass the whole struct branch_info down, not
just the revision name. We hoist the definition of struct branch_info
so it's in scope.
Signed-off-by: brian m. carlson <bk2204@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commit v2.25.0-4-ge98c4269c8 (rebase (interactive-backend): fix handling
of commits that become empty, 2020-02-15) marked "{drop,keep,ask}" for
translation, but this message should not be changed.
Signed-off-by: Jiang Xin <worldhello.net@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Often novice Git users forget to say "pull --rebase" and end up with an
unnecessary merge from upstream. What they usually want is either "pull
--rebase" in the simpler cases, or "pull --ff-only" to update the copy
of main integration branches, and rebase their work separately. The
pull.rebase configuration variable exists to help them in the simpler
cases, but there is no mechanism to make these users aware of it.
Issue a warning message when no --[no-]rebase option from the command
line and no pull.rebase configuration variable is given. This will
inconvenience those who never want to "pull --rebase", who haven't had
to do anything special, but the cost of the inconvenience is paid only
once per user, which should be a reasonable cost to help a number of new
users.
Signed-off-by: Alex Henrie <alexhenrie24@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Together with the previous commits, this commit fully fixes the problem
of using shared buffer for `real_path()` in `get_superproject_working_tree()`.
Signed-off-by: Alexandr Miloslavskiy <alexandr.miloslavskiy@syntevo.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Returning a shared buffer invites very subtle bugs due to reentrancy or
multi-threading, as demonstrated by the previous patch.
There was an unfinished effort to abolish this [1].
Let's finally rid of `real_path()`, using `strbuf_realpath()` instead.
This patch uses a local `strbuf` for most places where `real_path()` was
previously called.
However, two places return the value of `real_path()` to the caller. For
them, a `static` local `strbuf` was added, effectively pushing the
problem one level higher:
read_gitfile_gently()
get_superproject_working_tree()
[1] https://lore.kernel.org/git/1480964316-99305-1-git-send-email-bmwill@google.com/
Signed-off-by: Alexandr Miloslavskiy <alexandr.miloslavskiy@syntevo.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git am --short-current-patch" is a way to show the piece of e-mail
for the stopped step, which is not suitable to directly feed "git
apply" (it is designed to be a good "git am" input). It learned a
new option to show only the patch part.
* pb/am-show-current-patch:
am: support --show-current-patch=diff to retrieve .git/rebase-apply/patch
am: support --show-current-patch=raw as a synonym for--show-current-patch
am: convert "resume" variable to a struct
parse-options: convert "command mode" to a flag
parse-options: add testcases for OPT_CMDMODE()
`real_path()` returns result from a shared buffer, inviting subtle
reentrance bugs. One of these bugs occur when invoked this way:
set_git_dir(real_path(git_dir))
In this case, `real_path()` has reentrance:
real_path
read_gitfile_gently
repo_set_gitdir
setup_git_env
set_git_dir_1
set_git_dir
Later, `set_git_dir()` uses its now-dead parameter:
!is_absolute_path(path)
Fix this by using a dedicated `strbuf` to hold `strbuf_realpath()`.
Signed-off-by: Alexandr Miloslavskiy <alexandr.miloslavskiy@syntevo.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Remove the stash.useBuiltin setting which was added as an escape hatch
to disable the builtin version of stash first released with Git 2.22.
Carrying the legacy version is a maintenance burden, and has in fact
become out of date failing a test since the 2.23 release, without
anyone noticing until now. So users would be getting a hint to fall
back to a potentially buggy version of the tool.
We used to shell out to git config to get the useBuiltin configuration
to avoid changing any global state before spawning legacy-stash.
However that is no longer necessary, so just use the 'git_config'
function to get the setting instead.
Similar to what we've done in d03ebd411c ("rebase: remove the
rebase.useBuiltin setting", 2019-03-18), where we remove the
corresponding setting for rebase, we leave the documentation in place,
so people can refer back to it when searching for it online, and so we
can refer to it in the commit message.
Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git merge signed-tag" while lacking the public key started to say
"No signature", which was utterly wrong. This regression has been
reverted.
* hi/gpg-use-check-signature:
Revert "gpg-interface: prefer check_signature() for GPG verification"
"git describe" in a repository with multiple root commits sometimes
gave up looking for the best tag to describe a given commit with
too early, which has been adjusted.
* be/describe-multiroot:
describe: don't abort too early when searching tags
"git clone --recurse-submodules --single-branch" now uses the same
single-branch option when cloning the submodules.
* es/recursive-single-branch-clone:
clone: pass --single-branch during --recurse-submodules
submodule--helper: use C99 named initializer
Code cleanup to use "struct object_id" more by replacing use of
"char *sha1"
* jk/nth-packed-object-id:
packfile: drop nth_packed_object_sha1()
packed_object_info(): use object_id internally for delta base
packed_object_info(): use object_id for returning delta base
pack-check: push oid lookup into loop
pack-check: convert "internal error" die to a BUG()
pack-bitmap: use object_id when loading on-disk bitmaps
pack-objects: use object_id struct in pack-reuse code
pack-objects: convert oe_set_delta_ext() to use object_id
pack-objects: read delta base oid into object_id struct
nth_packed_object_oid(): use customary integer return
"git rebase BASE BRANCH" rebased/updated the tip of BRANCH and
checked it out, even when the BRANCH is checked out in a different
worktree. This has been corrected.
* es/do-not-let-rebase-switch-to-protected-branch:
rebase: refuse to switch to branch already checked out elsewhere
t3400: make test clean up after itself
"git push" should stop from updating a branch that is checked out
when receive.denyCurrentBranch configuration is set, but it failed
to pay attention to checkouts in secondary worktrees. This has
been corrected.
* hv/receive-denycurrent-everywhere:
t2402: test worktree path when called in .git directory
receive.denyCurrentBranch: respect all worktrees
t5509: use a bare repository for test push target
get_main_worktree(): allow it to be called in the Git directory
In rare cases "git worktree add <path>" could think that <path>
was already a registered worktree even when it wasn't and refuse
to add the new worktree. This has been corrected.
* es/worktree-avoid-duplication-fix:
worktree: don't allow "add" validation to be fooled by suffix matching
worktree: add utility to find worktree by pathname
worktree: improve find_worktree() documentation
Underlying machinery of "git bisect--helper" is being refactored
into pieces that are more easily reused.
* mr/bisect-in-c-1:
bisect: libify `bisect_next_all`
bisect: libify `handle_bad_merge_base` and its dependents
bisect: libify `check_good_are_ancestors_of_bad` and its dependents
bisect: libify `check_merge_bases` and its dependents
bisect: libify `bisect_checkout`
bisect: libify `exit_if_skipped_commits` to `error_if_skipped*` and its dependents
bisect--helper: return error codes from `cmd_bisect__helper()`
bisect: add enum to represent bisect returning codes
bisect--helper: introduce new `decide_next()` function
bisect: use the standard 'if (!var)' way to check for 0
bisect--helper: change `retval` to `res`
bisect--helper: convert `vocab_*` char pointers to char arrays
"git sparse-checkout" learned a new "add" subcommand.
* ds/sparse-add:
sparse-checkout: allow one-character directories in cone mode
sparse-checkout: work with Windows paths
sparse-checkout: create 'add' subcommand
sparse-checkout: extract pattern update from 'set' subcommand
sparse-checkout: extract add_patterns_from_input()
change the advise call in tag library from advise() to
advise_if_enabled() to construct an example of the usage of
the new API.
Signed-off-by: Heba Waly <heba.waly@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In the next commit we're adding another config variable to be read
from 'git_stash_config', that is valid for the top level command
instead of just a subset. Move the 'git_config' invocation for
'git_stash_config' to the top-level to prepare for that.
Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Fix for a bug revealed by a recent change to make the protocol v2
the default.
* ds/partial-clone-fixes:
partial-clone: avoid fetching when looking for objects
partial-clone: demonstrate bugs in partial fetch
"git rebase" has learned to use the merge backend (i.e. the
machinery that drives "rebase -i") by default, while allowing
"--apply" option to use the "apply" backend (e.g. the moral
equivalent of "format-patch piped to am"). The rebase.backend
configuration variable can be set to customize.
* en/rebase-backend:
rebase: rename the two primary rebase backends
rebase: change the default backend from "am" to "merge"
rebase: make the backend configurable via config setting
rebase tests: repeat some tests using the merge backend instead of am
rebase tests: mark tests specific to the am-backend with --am
rebase: drop '-i' from the reflog for interactive-based rebases
git-prompt: change the prompt for interactive-based rebases
rebase: add an --am option
rebase: move incompatibility checks between backend options a bit earlier
git-rebase.txt: add more details about behavioral differences of backends
rebase: allow more types of rebases to fast-forward
t3432: make these tests work with either am or merge backends
rebase: fix handling of restrict_revision
rebase: make sure to pass along the quiet flag to the sequencer
rebase, sequencer: remove the broken GIT_QUIET handling
t3406: simplify an already simple test
rebase (interactive-backend): fix handling of commits that become empty
rebase (interactive-backend): make --keep-empty the default
t3404: directly test the behavior of interest
git-rebase.txt: update description of --allow-empty-message
"git check-ignore" did not work when the given path is explicitly
marked as not ignored with a negative entry in the .gitignore file.
* en/check-ignore:
check-ignore: fix documentation and implementation to match
The object reachability bitmap machinery and the partial cloning
machinery were not prepared to work well together, because some
object-filtering criteria that partial clones use inherently rely
on object traversal, but the bitmap machinery is an optimization
to bypass that object traversal. There however are some cases
where they can work together, and they were taught about them.
* jk/object-filter-with-bitmap:
rev-list --count: comment on the use of count_right++
pack-objects: support filters with bitmaps
pack-bitmap: implement BLOB_LIMIT filtering
pack-bitmap: implement BLOB_NONE filtering
bitmap: add bitmap_unset() function
rev-list: use bitmap filters for traversal
pack-bitmap: basic noop bitmap filter infrastructure
rev-list: allow commit-only bitmap traversals
t5310: factor out bitmap traversal comparison
rev-list: allow bitmaps when counting objects
rev-list: make --count work with --objects
rev-list: factor out bitmap-optimized routines
pack-bitmap: refuse to do a bitmap traversal with pathspecs
rev-list: fallback to non-bitmap traversal when filtering
pack-bitmap: fix leak of haves/wants object lists
pack-bitmap: factor out type iterator initialization
This reverts commit 72b006f4bf, which
breaks the end-user experience when merging a signed tag without
having the public key. We should report "can't check because we
have no public key", but the code with this change claimed that
there was no signature.
When searching the commit graph for tag candidates, `git-describe`
will stop as soon as there is only one active branch left and
it already found an annotated tag as a candidate.
This works well as long as all branches eventually connect back
to a common root, but if the tags are found across branches
with no common ancestor
B
o----.
\
o-----o---o----x
A
it can happen that the search on one branch terminates prematurely
because a tag was found on another, independent branch. This scenario
isn't quite as obscure as it sounds, since cloning with a limited
depth often introduces many independent "dead ends" into the commit
graph.
The help text of `git-describe` states pretty clearly that when
describing a commit D, the number appended to the emitted tag X should
correspond to the number of commits found by `git log X..D`.
Thus, this commit modifies the stopping condition to only abort
the search when only one branch is left to search *and* all current
best candidates are descendants from that branch.
For repositories with a single root, this condition is always
true: When the search is reduced to a single active branch, the
current commit must be an ancestor of *all* tag candidates. This
means that in the common case, this change will have no negative
performance impact since the same number of commits as before will
be traversed.
Signed-off-by: Benno Evers <benno@bmevers.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When `options.switch_to' is set, `options.orig_head' is populated right
after with the object name the ref/commit argument points at.
Therefore, there is no need to parse `switch_to' again.
Signed-off-by: Alban Gruin <alban.gruin@gmail.com>
Acked-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git remote rename X Y" needs to adjust configuration variables
(e.g. branch.<name>.remote) whose value used to be X to Y.
branch.<name>.pushRemote is now also updated.
* bw/remote-rename-update-config:
remote rename/remove: gently handle remote.pushDefault config
config: provide access to the current line number
remote rename/remove: handle branch.<name>.pushRemote config values
remote: clean-up config callback
remote: clean-up by returning early to avoid one indentation
pull --rebase/remote rename: document and honor single-letter abbreviations rebase types
Previously, performing "git clone --recurse-submodules --single-branch"
resulted in submodules cloning all branches even though the superproject
cloned only one branch. Pipe --single-branch through the submodule
helper framework to make it to 'clone' later on.
Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
Acked-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Start using a named initializer list for SUBMODULE_UPDATE_CLONE_INIT, as
the struct is becoming cumbersome for a typical struct initializer list.
Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
Acked-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git worktree add <path>" performs various checks before approving
<path> as a valid location for the new worktree. Aside from ensuring
that <path> does not already exist, one of the questions it asks is
whether <path> is already a registered worktree. To perform this check,
it queries find_worktree() and disallows the "add" operation if
find_worktree() finds a match for <path>. As a convenience, however,
find_worktree() casts an overly wide net to allow users to identify
worktrees by shorthand in order to keep typing to a minimum. For
instance, it performs suffix matching which, given subtrees "foo/bar"
and "foo/baz", can correctly select the latter when asked only for
"baz".
"add" validation knows the exact path it is interrogating, so this sort
of heuristic-based matching is, at best, questionable for this use-case
and, at worst, may may accidentally interpret <path> as matching an
existing worktree and incorrectly report it as already registered even
when it isn't. (In fact, validate_worktree_add() already contains a
special case to avoid accidentally matching against the main worktree,
precisely due to this problem.)
Avoid the problem of potential accidental matching against an existing
worktree by instead taking advantage of find_worktree_by_path() which
matches paths deterministically, without applying any sort of magic
shorthand matching performed by find_worktree().
Reported-by: Cameron Gunnin <cameron.gunnin@synopsys.com>
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If a caller sets the object_info.delta_base_sha1 to a non-NULL pointer,
we'll write the oid of the object's delta base to it. But we can
increase our type safety by switching this to a real object_id struct.
All of our callers are just pointing into the hash member of an
object_id anyway, so there's no inconvenience.
Note that we do still keep it as a pointer-to-struct, because the NULL
sentinel value tells us whether the caller is even interested in the
information.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When the pack-reuse code is dumping an OFS_DELTA entry to a client that
doesn't support it, we re-write it as a REF_DELTA. To do so, we use
nth_packed_object_sha1() to get the oid, but that function is soon going
away in favor of the more type-safe nth_packed_object_id(). Let's switch
now in preparation.
Note that this does incur an extra hash copy (from the pack idx mmap to
the object_id and then to the output, rather than straight from mmap to
the output). But this is not worth worrying about. It's probably not
measurable even when it triggers, and this is fallback code that we
expect to trigger very rarely (since everybody supports OFS_DELTA these
days anyway).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We already store an object_id internally, and now our sole caller also
has one. Let's stop passing around the internal hash array, which adds a
bit of type safety.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When we're considering reusing an on-disk delta, we get the oid of the
base as a pointer to unsigned char bytes of the hash, either into the
packfile itself (for REF_DELTA) or into the pack idx (using the revindex
to convert the offset into an index entry).
Instead, we'd prefer to use a more type-safe object_id as much as
possible. We can get the pack idx using nth_packed_object_id() instead.
For the packfile bytes, we can copy them out using oidread().
This doesn't even incur an extra copy overall, since the next thing we'd
always do with that pointer is pass it to can_reuse_delta(), which needs
an object_id anyway (and called oidread() itself). So this patch also
converts that function to take the object_id directly.
Note that we did previously use NULL as a sentinel value when the object
isn't a delta. We could probably get away with using the null oid for
this, but instead we'll use an explicit boolean flag, which should make
things more obvious for people reading the code later.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Our nth_packed_object_sha1() function returns NULL for error. So when we
wrapped it with nth_packed_object_oid(), we kept the same semantics. But
it's a bit funny, because the caller actually passes in an out
parameter, and the pointer we return is just that same struct they
passed to us (or NULL).
It's not too terrible, but it does make the interface a little
non-idiomatic. Let's switch to our usual "0 for success, negative for
error" return value. Most callers either don't check it, or are
trivially converted. The one that requires the biggest change is
actually improved, as we can ditch an extra aliased pointer variable.
Since we are changing the interface in a subtle way that the compiler
wouldn't catch, let's also change the name to catch any topics in
flight. We can drop the 'o' and make it nth_packed_object_id(). That's
slightly shorter, but also less redundant since the 'o' stands for
"object" already.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The invocation "git rebase <upstream> <branch>" switches to <branch>
before performing the rebase operation. However, unlike git-switch,
git-checkout, and git-worktree which all refuse to switch to a branch
that is already checked out in some other worktree, git-rebase switches
to <branch> unconditionally. Curb this careless behavior by making
git-rebase also refuse to switch to a branch checked out elsewhere.
Reported-by: Mike Hommey <mh@glandium.org>
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The receive.denyCurrentBranch config option controls what happens if
you push to a branch that is checked out into a non-bare repository.
By default, it rejects it. It can be disabled via `ignore` or `warn`.
Another yet trickier option is `updateInstead`.
However, this setting was forgotten when the git worktree command was
introduced: only the main worktree's current branch is respected.
With this change, all worktrees are respected.
That change also leads to revealing another bug,
i.e. `receive.denyCurrentBranch = true` was ignored when pushing into a
non-bare repository's unborn current branch using ref namespaces. As
`is_ref_checked_out()` returns 0 which means `receive-pack` does not get
into conditional statement to switch `deny_current_branch` accordingly
(ignore, warn, refuse, unconfigured, updateInstead).
receive.denyCurrentBranch uses the function `refs_resolve_ref_unsafe()`
(called via `resolve_refdup()`) to resolve the symbolic ref HEAD, but
that function fails when HEAD does not point at a valid commit.
As we replace the call to `refs_resolve_ref_unsafe()` with
`find_shared_symref()`, which has no problem finding the worktree for a
given branch even if it is unborn yet, this bug is fixed at the same
time: receive.denyCurrentBranch now also handles worktrees with unborn
branches as intended even while using ref namespaces.
Helped-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Hariom Verma <hariom18599@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The transition plan anticipates that we will allow signatures using
multiple algorithms in a single commit. In order to do so, we need to
use a different header per algorithm so that it will be obvious over
which data to compute the signature.
The transition plan specifies that we should use "gpgsig-sha256", so
wire up the commit code such that it can write and parse the current
algorithm, and it can remove the headers for any algorithm when creating
a new commit. Add tests to ensure that we write using the right header
and that git fsck doesn't reject these commits.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When we perform a clone, we won't know the remote side's hash algorithm
until we've read the heads. Consequently, we'll need to rewrite the
repository format version and hash algorithm once we know what the
remote side has. Move the code that does this into its own function so
that we can call it from clone in the future.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
For the foreseeable future, SHA-1 will be the default algorithm for Git.
However, when running the testsuite, we want to be able to test an
arbitrary algorithm. It would be quite burdensome and very untidy to
have to specify the algorithm we'd like to test every time we
initialized a new repository somewhere in the testsuite, so add an
environment variable to allow us to specify the default hash algorithm
for Git.
This has the benefit that we can set it once for the entire testsuite
and not have to think about it. In the future, users can also use it to
set the default for their repositories if they would like to do so.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Allow the user to specify the hash algorithm on the command line by
using the --object-format option to git init. Validate that the user is
not attempting to reinitialize a repository with a different hash
algorithm. Ensure that if we are writing a non-SHA-1 repository that we
set the repository version to 1 and write the objectFormat extension.
Restrict this option to work only when ENABLE_SHA256 is set until the
codebase is in a situation to fully support this.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In some cases, we will want to not only check the repository format, but
extract the information that we've gained. To do so, allow
check_repository_format to take a pointer to struct repository_format.
Allow passing NULL for this argument if we're not interested in the
information, and pass NULL for all existing callers. A future patch
will make use of this information.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Avoid hard-coding a hash size, instead preferring to use the_hash_algo.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We can check if certain characters are present in a string by calling
strchr(3) on each of them, or we can pass them all to a single
strpbrk(3) call. The latter is shorter, less repetitive and slightly
more efficient, so let's do that instead.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When using partial clone, find_non_local_tags() in builtin/fetch.c
checks each remote tag to see if its object also exists locally. There
is no expectation that the object exist locally, but this function
nevertheless triggers a lazy fetch if the object does not exist. This
can be extremely expensive when asking for a commit, as we are
completely removed from the context of the non-existent object and
thus supply no "haves" in the request.
6462d5eb9a (fetch: remove fetch_if_missing=0, 2019-11-05) removed a
global variable that prevented these fetches in favor of a bitflag.
However, some object existence checks were not updated to use this flag.
Update find_non_local_tags() to use OBJECT_INFO_SKIP_FETCH_OBJECT in
addition to OBJECT_INFO_QUICK. The _QUICK option only prevents
repreparing the pack-file structures. We need to be extremely careful
about supplying _SKIP_FETCH_OBJECT when we expect an object to not exist
due to updated refs.
This resolves a broken test in t5616-partial-clone.sh.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
An annotated tag has two names: where it sits in the refs/tags
hierarchy and the tagname recorded in the "tag" field in the object
itself. They usually should match.
Since 212945d4 ("Teach git-describe to verify annotated tag names
before output", 2008-02-28), a commit described using an annotated
tag bases its name on the tagname from the object. While this was a
deliberate design decision to make it easier to converse about tags
with others, even if the tags happen to be fetched to a different
name than it was given by its creator, it had one downside.
The output from "git describe", at least in the modern Git, should
be usable as an object name to name the exact commit given to the
"git describe" command. Using the tagname, when two names differ,
breaks this property, when describing a commit that is directly
pointed at by such a tag. An annotated tag Bob made as "v1.0" may
sit at "refs/tags/v1.0-bob" in the ref hierarchy, and output from
"git describe v1.0-bob^0" would say "v1.0", but there may not be
any tag at "refs/tags/v1.0" locally or there may be another tag that
points at a different object.
Note that this won't be a problem if a commit being described is not
directly pointed at by such a mislocated tag. In the example in the
previous paragraph, describing a commit whose parent is v1.0-bob
would result in "v1.0" (i.e. the tagname taken from the tag object)
followed by "-1-gXXXXX" where XXXXX is the abbreviated object name,
and a string that ends with "-g" followed by a hexadecimal string is
an object name for the object whose name begins with hexadecimal
string (as long as it is unique), so it does not matter if the
leading part is "v1.0" or "v1.0-bob".
Show the name in the long format, i.e. with "-0-gXXXXX" suffix, when
the name we give is based on a mislocated annotated tag to ensure
that the output can be used as the object name for the object
originally given to the command to fix the issue.
While at it, remove an overly cautious dead code to protect against
an annotated tag object without the tagname. Such a tag is filtered
out much earlier in the codeflow, and will not reach this part of
the code.
Helped-by: Matheus Tavares <matheus.bernardino@usp.br>
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When "git am --show-current-patch" was added in commit 984913a210 ("am:
add --show-current-patch", 2018-02-12), "git am" started recommending it
as a replacement for .git/rebase-merge/patch. Unfortunately the suggestion
is somewhat misguided; for example, the output of "git am --show-current-patch"
cannot be passed to "git apply" if it is encoded as quoted-printable
or base64. Add a new mode to "git am --show-current-patch" in order to
straighten the suggestion.
Reported-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When "git am --show-current-patch" was added in commit 984913a210 ("am:
add --show-current-patch", 2018-02-12), "git am" started recommending it
as a replacement for .git/rebase-merge/patch. Unfortunately the suggestion
is somewhat misguided; for example, the output "git am --show-current-patch"
cannot be passed to "git apply" if it is encoded as quoted-printable or
base64. To simplify worktree operations and to avoid that users poke into
.git, it would be better if "git am" also provided a mode that copies
.git/rebase-merge/patch to stdout.
One possibility could be to have completely separate options, introducing
for example --show-current-message (for .git/rebase-apply/NNNN)
and --show-current-diff (for .git/rebase-apply/patch), while possibly
deprecating --show-current-patch.
That would even remove the need for the first two patches in the series.
However, the long common prefix would have prevented using an abbreviated
option such as "--show". Therefore, I chose instead to add a string
argument to --show-current-patch. The new argument is optional, so that
"git am --show-current-patch"'s behavior remains backwards-compatible.
The next choice to make is how to handle multiple --show-current-patch
options. Right now, something like "git am --abort --show-current-patch"
is rejected, and the previous suggestion would likewise have naturally
rejected a command line like
git am --show-current-message --show-current-diff
Therefore, I decided to also reject for example
git am --show-current-patch=diff --show-current-patch=raw
In other words the whole of --show-current-patch=xxx (including the
optional argument) is treated as the command mode. I found this to be
more consistent and intuitive, even though it differs from the usual
"last one wins" semantics of the git command line.
Add the code to parse submodes based on the above design, where for now
"raw" is the only valid submode. "raw" prints the full e-mail message
just like "git am --show-current-patch".
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This will allow stashing the submode of --show-current-patch from a
callback function. Using a struct will allow accessing both fields from
outside cmd_am (through container_of).
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Decisions taken for simplicity:
1) For now, `--pathspec-from-file` is declared incompatible with
`--patch`, even when <file> is not `-`. Such use case is not
really expected.
2) It is not allowed to pass pathspec in both args and file.
Signed-off-by: Alexandr Miloslavskiy <alexandr.miloslavskiy@syntevo.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Eliminate crude option parsing and rely on real parsing instead, because
1) Crude parsing is crude, for example it's not capable of
handling things like `git stash -m Message`
2) Adding options in two places is inconvenient and prone to bugs
As a side result, the case of `git stash -m Message` gets fixed.
Also give a good error message instead of just throwing usage at user.
----
Some review of what's been happening to this code:
Before [1], `git-stash.sh` only verified that all args begin with `-` :
# The default command is "push" if nothing but options are given
seen_non_option=
for opt
do
case "$opt" in
--) break ;;
-*) ;;
*) seen_non_option=t; break ;;
esac
done
Later, [1] introduced the duplicate code I'm now removing, also making
the previous test more strict by white-listing options.
----
[1] Commit 40af1468 ("stash: convert `stash--helper.c` into `stash.c`" 2019-02-26)
Signed-off-by: Alexandr Miloslavskiy <alexandr.miloslavskiy@syntevo.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Decisions taken for simplicity:
1) It is not allowed to pass pathspec in both args and file.
Adjustments were needed for `if (!argc)` block:
This code actually means "pathspec is not present". Previously, pathspec
could only come from commandline arguments, so testing for `argc` was a
valid way of testing for the presence of pathspec. But this is no longer
true with `--pathspec-from-file`.
During the entire `--pathspec-from-file` story, I tried to keep its
behavior very close to giving pathspec on commandline, so that switching
from one to another doesn't involve any surprises.
However, throwing usage at user in the case of empty
`--pathspec-from-file` would puzzle because there's nothing wrong with
"usage" (that is, argc/argv array).
On the other hand, throwing usage in the old case also feels bad to me.
While it's less of a puzzle, I (as user) never liked the experience of
comparing my commandline to "usage", trying to spot a difference. Since
it's already known what the error is, it feels a lot better to give that
specific error to user.
Judging from [1] it doesn't seem that showing usage in this case was
important (the patch was to avoid segfault), and it doesn't fit into how
other commands react to empty pathspec (see for example `git add` with a
custom message).
Therefore, I decided to show new error text in both cases. In order to
continue testing for error early, I moved `parse_pathspec()` higher. Now
it happens before `read_cache()` / `hold_locked_index()` /
`setup_work_tree()`, which shouldn't cause any issues.
[1] Commit 7612a1ef ("git-rm: honor -n flag" 2006-06-09)
Signed-off-by: Alexandr Miloslavskiy <alexandr.miloslavskiy@syntevo.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since we want to get rid of git-bisect.sh, it would be necessary to
convert those exit() calls to return statements so that errors can be
reported.
Emulate try catch in C by converting `exit(<positive-value>)` to
`return <negative-value>`. Follow POSIX conventions to return
<negative-value> to indicate error.
Code that turns BISECT_INTERNAL_SUCCESS_MERGE_BASE (-11)
to BISECT_OK (0) from `check_good_are_ancestors_of_bad()` has been moved to
`cmd_bisect__helper()`.
Update all callers to handle the error returns.
Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Mentored by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Signed-off-by: Pranit Bauva <pranit.bauva@gmail.com>
Signed-off-by: Tanushree Tumane <tanushreetumane@gmail.com>
Signed-off-by: Miriam Rubio <mirucam@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since we want to get rid of git-bisect.sh, it would be necessary
to convert bisect.c exit() calls to return statements so
that errors can be reported. Let's prepare for that by making
it possible to return different error codes than just 0 or 1.
Different error codes might enable a bisecting script calling the
bisect command that uses this function to do different things
depending on the exit status of the bisect command.
Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Mentored-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Signed-off-by: Tanushree Tumane <tanushreetumane@gmail.com>
Signed-off-by: Miriam Rubio <mirucam@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Let's refactor code from bisect_next_check() into a new
decide_next() helper function.
This removes some goto statements and makes the code simpler,
clearer and easier to understand.
While at it `bad_ref` and `good_glob` are not const any more
to void casting them inside `free()`.
Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Tanushree Tumane <tanushreetumane@gmail.com>
Signed-off-by: Miriam Rubio <mirucam@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Let's rename variable retval to res, so that variable names
in bisect--helper.c are more consistent.
After this change, there are 110 occurrences of res in the file
and zero of retval, while there were 26 instances of retval before.
Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Tanushree Tumane <tanushreetumane@gmail.com>
Signed-off-by: Miriam Rubio <mirucam@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Instead of using a pointer that points at a constant string,
just give name directly to the constant string; this way, we
do not have to allocate a pointer variable in addition to
the string we want to use.
Let's convert `vocab_bad` and `vocab_good` char pointers to char arrays.
Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Miriam Rubio <mirucam@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
check-ignore has two different modes, and neither of these modes has an
implementation that matches the documentation. These modes differ in
whether they just print paths or whether they also print the final
pattern matched by the path. The fix is different for both modes, so
I'll discuss both separately.
=== First (default) mode ===
The first mode is documented as:
For each pathname given via the command-line or from a file via
--stdin, check whether the file is excluded by .gitignore (or other
input files to the exclude mechanism) and output the path if it is
excluded.
However, it fails to do this because it did not account for negated
patterns. Commands other than check-ignore verify exclusion rules via
calling
... -> treat_one_path() -> is_excluded() -> last_matching_pattern()
while check-ignore has a call path of the form:
... -> check_ignore() -> last_matching_pattern()
The fact that the latter does not include the call to is_excluded()
means that it is susceptible to to messing up negated patterns (since
that is the only significant thing is_excluded() adds over
last_matching_pattern()). Unfortunately, we can't make it just call
is_excluded(), because the same codepath is used by the verbose mode
which needs to know the matched pattern in question. This brings us
to...
=== Second (verbose) mode ===
The second mode, known as verbose mode, references the first in the
documentation and says:
Also output details about the matching pattern (if any) for each
given pathname. For precedence rules within and between exclude
sources, see gitignore(5).
The "Also" means it will print patterns that match the exclude rules as
noted for the first mode, and also print which pattern matches. Unless
more information is printed than just pathname and pattern (which is not
done), this definition is somewhat ill-defined and perhaps even
self-contradictory for negated patterns: A path which matches a negated
exclude pattern is NOT excluded and thus shouldn't be printed by the
former logic, while it certainly does match one of the explicit patterns
and thus should be printed by the latter logic.
=== Resolution ==
Since the second mode exists to find out which pattern matches given
paths, and showing the user a pattern that begins with a '!' is
sufficient for them to figure out whether the pattern is excluded, the
existing behavior is desirable -- we just need to update the
documentation to match the implementation (i.e. it is about printing
which pattern is matched by paths, not about showing which paths are
excluded).
For the first or default mode, users just want to know whether a pattern
is excluded. As such, the existing documentation is desirable; change
the implementation to match the documented behavior.
Finally, also adjust a few tests in t0008 that were caught up by this
discrepancy in how negated paths were handled.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git config" learned to show in which "scope", in addition to in
which file, each config setting comes from.
* mr/show-config-scope:
config: add '--show-scope' to print the scope of a config value
submodule-config: add subomdule config scope
config: teach git_config_source to remember its scope
config: preserve scope in do_git_config_sequence
config: clarify meaning of command line scoping
config: split repo scope to local and worktree
config: make scope_name non-static and rename it
t1300: create custom config file without special characters
t1300: fix over-indented HERE-DOCs
config: fix typo in variable name
Memory footprint and performance of "git name-rev" has been
improved.
* rs/name-rev-memsave:
name-rev: sort tip names before applying
name-rev: release unused name strings
name-rev: generate name strings only if they are better
name-rev: pre-size buffer in get_parent_name()
name-rev: factor out get_parent_name()
name-rev: put struct rev_name into commit slab
name-rev: don't _peek() in create_or_update_name()
name-rev: don't leak path copy in name_ref()
name-rev: respect const qualifier
name-rev: remove unused typedef
name-rev: rewrite create_or_update_name()
Two related changes, with separate rationale for each:
Rename the 'interactive' backend to 'merge' because:
* 'interactive' as a name caused confusion; this backend has been used
for many kinds of non-interactive rebases, and will probably be used
in the future for more non-interactive rebases than interactive ones
given that we are making it the default.
* 'interactive' is not the underlying strategy; merging is.
* the directory where state is stored is not called
.git/rebase-interactive but .git/rebase-merge.
Rename the 'am' backend to 'apply' because:
* Few users are familiar with git-am as a reference point.
* Related to the above, the name 'am' makes sentences in the
documentation harder for users to read and comprehend (they may read
it as the verb from "I am"); avoiding this difficult places a large
burden on anyone writing documentation about this backend to be very
careful with quoting and sentence structure and often forces
annoying redundancy to try to avoid such problems.
* Users stumble over pronunciation ("am" as in "I am a person not a
backend" or "am" as in "the first and thirteenth letters in the
alphabet in order are "A-M"); this may drive confusion when one user
tries to explain to another what they are doing.
* While "am" is the tool driving this backend, the tool driving git-am
is git-apply, and since we are driving towards lower-level tools
for the naming of the merge backend we may as well do so here too.
* The directory where state is stored has never been called
.git/rebase-am, it was always called .git/rebase-apply.
For all the reasons listed above:
* Modify the documentation to refer to the backends with the new names
* Provide a brief note in the documentation connecting the new names
to the old names in case users run across the old names anywhere
(e.g. in old release notes or older versions of the documentation)
* Change the (new) --am command line flag to --apply
* Rename some enums, variables, and functions to reinforce the new
backend names for us as well.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A large variety of rebase types are supported by the interactive
machinery, not just the explicitly interactive ones. These all share
the same code and write the same reflog messages, but the "-i" moniker
in those messages doesn't really have much meaning. It also becomes
somewhat distracting once we switch the default from the am-backend to
the interactive one. Just remove the "-i" from these messages.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Currently, this option doesn't do anything except error out if any
options requiring the interactive-backend are also passed. However,
when we make the default backend configurable later in this series, this
flag will provide a way to override the config setting.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In the past, we dis-allowed rebases using the interactive backend from
performing a fast-forward to short-circuit the rebase operation. This
made sense for explicitly interactive rebases and some implicitly
interactive rebases, but certainly became overly stringent when the
merge backend was re-implemented via the interactive backend.
Just as the am-based rebase has always had to disable the fast-forward
based on a variety of conditions or flags (e.g. --signoff, --whitespace,
etc.), we need to do the same but now with a few more options. However,
continuing to use REBASE_FORCE for tracking this is problematic because
the interactive backend used it for a different purpose. (When
REBASE_FORCE wasn't set, the interactive backend would not fast-forward
the whole series but would fast-forward individual "pick" commits at the
beginning of the todo list, and then a squash or something would cause
it to start generating new commits.) So, introduce a new
allow_preemptive_ff flag contained within cmd_rebase() and use it to
track whether we are going to allow a pre-emptive fast-forward that
short-circuits the whole rebase.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
restrict_revision in the original shell script was an excluded revision
range. It is also treated that way by the am-backend. In the
conversion from shell to C (see commit 6ab54d17be ("rebase -i:
implement the logic to initialize $revisions in C", 2018-08-28)), the
interactive-backend accidentally treated it as a positive revision
rather than a negated one.
This was missed as there were no tests in the testsuite that tested an
interactive rebase with fork-point behavior.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The GIT_QUIET environment variable was used to signal the non-am
backends that the rebase should perform quietly. The preserve-merges
backend does not make use of the quiet flag anywhere (other than to
write out its state whenever it writes state), and this mechanism was
broken in the conversion from shell to C. Since this environment
variable was specifically designed for scripts and the only backend that
would still use it is no longer a script, just gut this code.
A subsequent commit will fix --quiet for the interactive/merge backend
in a different way.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As established in the previous commit and commit b00bf1c9a8
(git-rebase: make --allow-empty-message the default, 2018-06-27), the
behavior for rebase with different backends in various edge or corner
cases is often more happenstance than design. This commit addresses
another such corner case: commits which "become empty".
A careful reader may note that there are two types of commits which would
become empty due to a rebase:
* [clean cherry-pick] Commits which are clean cherry-picks of upstream
commits, as determined by `git log --cherry-mark ...`. Re-applying
these commits would result in an empty set of changes and a
duplicative commit message; i.e. these are commits that have
"already been applied" upstream.
* [become empty] Commits which are not empty to start, are not clean
cherry-picks of upstream commits, but which still become empty after
being rebased. This happens e.g. when a commit has changes which
are a strict subset of the changes in an upstream commit, or when
the changes of a commit can be found spread across or among several
upstream commits.
Clearly, in both cases the changes in the commit in question are found
upstream already, but the commit message may not be in the latter case.
When cherry-mark can determine a commit is already upstream, then
because of how cherry-mark works this means the upstream commit message
was about the *exact* same set of changes. Thus, the commit messages
can be assumed to be fully interchangeable (and are in fact likely to be
completely identical). As such, the clean cherry-pick case represents a
case when there is no information to be gained by keeping the extra
commit around. All rebase types have always dropped these commits, and
no one to my knowledge has ever requested that we do otherwise.
For many of the become empty cases (and likely even most), we will also
be able to drop the commit without loss of information -- but this isn't
quite always the case. Since these commits represent cases that were
not clean cherry-picks, there is no upstream commit message explaining
the same set of changes. Projects with good commit message hygiene will
likely have the explanation from our commit message contained within or
spread among the relevant upstream commits, but not all projects run
that way. As such, the commit message of the commit being rebased may
have reasoning that suggests additional changes that should be made to
adapt to the new base, or it may have information that someone wants to
add as a note to another commit, or perhaps someone even wants to create
an empty commit with the commit message as-is.
Junio commented on the "become-empty" types of commits as follows[1]:
WRT a change that ends up being empty (as opposed to a change that
is empty from the beginning), I'd think that the current behaviour
is desireable one. "am" based rebase is solely to transplant an
existing history and want to stop much less than "interactive" one
whose purpose is to polish a series before making it publishable,
and asking for confirmation ("this has become empty--do you want to
drop it?") is more appropriate from the workflow point of view.
[1] https://lore.kernel.org/git/xmqqfu1fswdh.fsf@gitster-ct.c.googlers.com/
I would simply add that his arguments for "am"-based rebases actually
apply to all non-explicitly-interactive rebases. Also, since we are
stating that different cases should have different defaults, it may be
worth providing a flag to allow users to select which behavior they want
for these commits.
Introduce a new command line flag for selecting the desired behavior:
--empty={drop,keep,ask}
with the definitions:
drop: drop commits which become empty
keep: keep commits which become empty
ask: provide the user a chance to interact and pick what to do with
commits which become empty on a case-by-case basis
In line with Junio's suggestion, if the --empty flag is not specified,
pick defaults as follows:
explicitly interactive: ask
otherwise: drop
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Different rebase backends have different treatment for commits which
start empty (i.e. have no changes relative to their parent), and the
--keep-empty option was added at some point to allow adjusting behavior.
The handling of commits which start empty is actually quite similar to
commit b00bf1c9a8 (git-rebase: make --allow-empty-message the default,
2018-06-27), which pointed out that the behavior for various backends is
often more happenstance than design. The specific change made in that
commit is actually quite relevant as well and much of the logic there
directly applies here.
It makes a lot of sense in 'git commit' to error out on the creation of
empty commits, unless an override flag is provided. However, once
someone determines that there is a rare case that merits using the
manual override to create such a commit, it is somewhere between
annoying and harmful to have to take extra steps to keep such
intentional commits around. Granted, empty commits are quite rare,
which is why handling of them doesn't get considered much and folks tend
to defer to existing (accidental) behavior and assume there was a reason
for it, leading them to just add flags (--keep-empty in this case) that
allow them to override the bad defaults. Fix the interactive backend so
that --keep-empty is the default, much like we did with
--allow-empty-message. The am backend should also be fixed to have
--keep-empty semantics for commits that start empty, but that is not
included in this patch other than a testcase documenting the failure.
Note that there was one test in t3421 which appears to have been written
expecting --keep-empty to not be the default as correct behavior. This
test was introduced in commit 00b8be5a4d ("add tests for rebasing of
empty commits", 2013-06-06), which was part of a series focusing on
rebase topology and which had an interesting original cover letter at
https://lore.kernel.org/git/1347949878-12578-1-git-send-email-martinvonz@gmail.com/
which noted
Your input especially appreciated on whether you agree with the
intent of the test cases.
and then went into a long example about how one of the many tests added
had several questions about whether it was correct. As such, I believe
most the tests in that series were about testing rebase topology with as
many different flags as possible and were not trying to state in general
how those flags should behave otherwise.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The code to compute the commit-graph has been taught to use a more
robust way to tell if two object directories refer to the same
thing.
* tb/commit-graph-object-dir:
commit-graph.h: use odb in 'load_commit_graph_one_fd_st'
commit-graph.c: remove path normalization, comparison
commit-graph.h: store object directory in 'struct commit_graph'
commit-graph.h: store an odb in 'struct write_commit_graph_context'
t5318: don't pass non-object directory to '--object-dir'
The index-pack code now diagnoses a bad input packstream that
records the same object twice when it is used as delta base; the
code used to declare a software bug when encountering such an
input, but it is an input error.
* jk/index-pack-dupfix:
index-pack: downgrade twice-resolved REF_DELTA to die()
The way "git submodule status" reports an initialized but not yet
populated submodule has not been reimplemented correctly when a
part of the "git submodule" command was rewritten in C, which has
been corrected.
* pk/status-of-uncloned-submodule:
t7400: testcase for submodule status on unregistered inner git repos
submodule: fix status of initialized but not cloned submodules
t7400: add a testcase for submodule status on empty dirs
Some codepaths were given a repository instance as a parameter to
work in the repository, but passed the_repository instance to its
callees, which has been cleaned up (somewhat).
* mt/use-passed-repo-more-in-funcs:
sha1-file: allow check_object_signature() to handle any repo
sha1-file: pass git_hash_algo to hash_object_file()
sha1-file: pass git_hash_algo to write_object_file_prepare()
streaming: allow open_istream() to handle any repo
pack-check: use given repo's hash_algo at verify_packfile()
cache-tree: use given repo's hash_algo at verify_one()
diff: make diff_populate_filespec() honor its repo argument
Some rough edges in the sparse-checkout feature, especially around
the cone mode, have been cleaned up.
* ds/sparse-checkout-harden:
sparse-checkout: fix cone mode behavior mismatch
sparse-checkout: improve docs around 'set' in cone mode
sparse-checkout: escape all glob characters on write
sparse-checkout: use C-style quotes in 'list' subcommand
sparse-checkout: unquote C-style strings over --stdin
sparse-checkout: write escaped patterns in cone mode
sparse-checkout: properly match escaped characters
sparse-checkout: warn on globs in cone patterns
sparse-checkout: detect short patterns
sparse-checkout: cone mode does not recognize "**"
sparse-checkout: fix documentation typo for core.sparseCheckoutCone
clone: fix --sparse option with URLs
sparse-checkout: create leading directories
t1091: improve here-docs
t1091: use check_files to reduce boilerplate
Unneeded connectivity check is now disabled in a partial clone when
fetching into it.
* jt/connectivity-check-optim-in-partial-clone:
fetch: forgo full connectivity check if --filter
connected: verify promisor-ness of partial clone
"git rebase -i" (and friends) used to unnecessarily check out the
tip of the branch to be rebased, which has been corrected.
* ag/rebase-avoid-unneeded-checkout:
rebase -i: stop checking out the tip of the branch to rebase
Traditionally, we avoided threaded grep while searching in objects
(as opposed to files in the working tree) as accesses to the object
layer is not thread-safe. This limitation is getting lifted.
* mt/threaded-grep-in-object-store:
grep: use no. of cores as the default no. of threads
grep: move driver pre-load out of critical section
grep: re-enable threads in non-worktree case
grep: protect packed_git [re-]initialization
grep: allow submodule functions to run in parallel
submodule-config: add skip_if_read option to repo_read_gitmodules()
grep: replace grep_read_mutex by internal obj read lock
object-store: allow threaded access to object reading
replace-object: make replace operations thread-safe
grep: fix racy calls in grep_objects()
grep: fix race conditions at grep_submodule()
grep: fix race conditions on userdiff calls
Two help messages given when "git add" notices the user gave it
nothing to add have been updated to use advise() API.
* hw/advice-add-nothing:
add: change advice config variables used by the add API
add: use advise function to display hints
"git grep --no-index" should not get affected by the contents of
the .gitmodules file but when "--recurse-submodules" is given or
the "submodule.recurse" variable is set, it did. Now these
settings are ignored in the "--no-index" mode.
* pb/do-not-recurse-grep-no-index:
grep: ignore --recurse-submodules if --no-index is given
"git restore --staged" did not correctly update the cache-tree
structure, resulting in bogus trees to be written afterwards, which
has been corrected.
* nd/switch-and-restore:
restore: invalidate cache-tree when removing entries with --staged
"git commit" gives output similar to "git status" when there is
nothing to commit, but without honoring the advise.statusHints
configuration variable, which has been corrected.
* hw/commit-advise-while-rejecting:
commit: honor advice.statusHints when rejecting an empty commit
Just as rev-list recently learned to combine filters and bitmaps, let's
do the same for pack-objects. The infrastructure is all there; we just
need to pass along our filter options, and the pack-bitmap code will
decide to use bitmaps or not.
This unsurprisingly makes things faster for partial clones of large
repositories (here we're cloning linux.git):
Test HEAD^ HEAD
------------------------------------------------------------------------------
5310.11: simulated partial clone 38.94(37.28+5.87) 11.06(11.27+4.07) -71.6%
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This just passes the filter-options struct to prepare_bitmap_walk().
Since the bitmap code doesn't actually support any filters yet, it will
fallback to the non-bitmap code if any --filter is specified. But this
lets us exercise that rejection code path, as well as getting us ready
to test filters via rev-list when we _do_ support them.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Currently you can't use object filters with bitmaps, but we plan to
support at least some filters with bitmaps. Let's introduce some
infrastructure that will help us do that:
- prepare_bitmap_walk() now accepts a list_objects_filter_options
parameter (which can be NULL for no filtering; all the current
callers pass this)
- we'll bail early if the filter is incompatible with bitmaps (just as
we would if there were no bitmaps at all). Currently all filters are
incompatible.
- we'll filter the resulting bitmap; since there are no supported
filters yet, this is always a noop.
There should be no behavior change yet, but we'll support some actual
filters in a future patch.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Ever since we added reachability bitmap support, we've been able to use
it with rev-list to get the full list of objects, like:
git rev-list --objects --use-bitmap-index --all
But you can't do so without --objects, since we weren't ready to just
show the commits. However, the internals of the bitmap code are mostly
ready for this: they avoid opening up trees when walking to fill in the
bitmaps. We just need to actually pass in the rev_info to
traverse_bitmap_commit_list() so it knows which types to bother
triggering our callback for.
For completeness, the perf test now covers both the existing --objects
case, as well as the new commits-only behavior (the objects one got way
faster when we introduced bitmaps, but obviously isn't improved now).
Here are numbers for linux.git:
Test HEAD^ HEAD
------------------------------------------------------------------------
5310.7: rev-list (commits) 8.29(8.10+0.19) 1.76(1.72+0.04) -78.8%
5310.8: rev-list (objects) 8.06(7.94+0.12) 8.14(7.94+0.13) +1.0%
That run was cheating a little, as I didn't have any commit-graph in the
repository, and we'd built it by default these days when running git-gc.
Here are numbers with a commit-graph:
Test HEAD^ HEAD
------------------------------------------------------------------------
5310.7: rev-list (commits) 0.70(0.58+0.12) 0.51(0.46+0.04) -27.1%
5310.8: rev-list (objects) 6.20(6.09+0.10) 6.27(6.16+0.11) +1.1%
Still an improvement, but a lot less impressive.
We could have the perf script remove any commit-graph to show the
out-sized effect, but it probably makes sense to leave it in what would
be a more typical setup.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The prior commit taught "--count --objects" to work without bitmaps. We
should be able to get the same answer much more quickly with bitmaps.
Note that we punt on the max_count case here. This perhaps _could_ be
made to work if we find all of the boundary commits and treat them as
UNINTERESTING, subtracting them (and their reachable objects) from the
set we return. That implies an actual commit traversal, but we'd still
be faster due to avoiding opening up any trees. Given the complexity and
the fact that anyone is unlikely to want this, it makes sense to just
fall back to the non-bitmap case for now.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The current behavior from "rev-list --count --objects" is nonsensical:
we enumerate all of the objects except commits, but then give a count of
commits. This wasn't planned, and is just what the code happens to do.
Instead, let's give the answer the user almost certainly wanted: the
full count of objects.
Note that there are more complicated cases around cherry-marking, etc.
We'll punt on those for now, but let the user know that we can't produce
an answer (rather than giving them something useless).
We'll test both the new feature as well as a vanilla --count of commits,
since that surprisingly doesn't seem to be covered in the existing
tests.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There are a few operations in rev-list that are optimized for bitmaps.
Rather than having the code inline in cmd_rev_list(), let's move them
into helpers. This not only makes the flow of the main function simpler,
but it lets us replace the complex "can we do the optimization?"
conditionals with a series of early returns from the functions. That
also makes it easy to add comments explaining those conditions.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
rev-list has refused to use bitmaps with pathspec limiting since
c8a70d3509 (rev-list: disable --use-bitmap-index when pruning commits,
2015-07-01). But this is true not just for rev-list, but for anyone who
calls prepare_bitmap_walk(); the code isn't equipped to handle this
case. We never noticed because the only other callers would never pass
a pathspec limiter.
But let's push the check down into prepare_bitmap_walk() anyway. That's
a more logical place for it to live, as callers shouldn't need to know
the details (and must be prepared to fall back to a regular traversal
anyway, since there might not be bitmaps in the repository).
It would also prepare us for a day where this case _is_ handled, but
that's pretty unlikely. E.g., we could use bitmaps to generate the set
of commits, and then diff each commit to see if it matches the pathspec.
That would be slightly faster than a naive traversal that actually walks
the commits. But you'd probably do better still to make use of the newer
commit-graph feature to make walking the commits very cheap.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "--use-bitmap-index" option is usually aspirational: if we have
bitmaps and the request can be fulfilled more quickly using them we'll
do so, but otherwise fall back to a non-bitmap traversal.
The exception is object filtering, which explicitly dies if the two
options are combined. Let's convert this to the usual fallback behavior.
This is a minor convenience for now (since the caller can easily know
that --filter and --use-bitmap-index don't combine), but will become
much more useful as we start to support _some_ filters with bitmaps, but
not others.
The test infrastructure here is bigger than necessary for checking this
one small feature. But it will serve as the basis for more filtering
bitmap tests in future patches.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git grep --no-index" should not get affected by the contents of
the .gitmodules file but when "--recurse-submodules" is given or
the "submodule.recurse" variable is set, it did. Now these
settings are ignored in the "--no-index" mode.
* pb/do-not-recurse-grep-no-index:
grep: ignore --recurse-submodules if --no-index is given
"git rebase --fork-point master" used to work OK, as it internally
called "git merge-base --fork-point" that knew how to handle short
refname and dwim it to the full refname before calling the
underlying get_fork_point() function.
This is no longer true after the command was rewritten in C, as its
internall call made directly to get_fork_point() does not dwim a
short ref.
Move the "dwim the refname argument to the full refname" logic that
is used in "git merge-base" to the underlying get_fork_point()
function, so that the other caller of the function in the
implementation of "git rebase" behaves the same way to fix this
regression.
Signed-off-by: Alex Torok <alext9@gmail.com>
[jc: revamped the fix and used Alex's tests]
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When using Windows, a user may run 'git sparse-checkout set A\B\C'
to add the Unix-style path A/B/C to their sparse-checkout patterns.
Normalizing the input path converts the backslashes to slashes before we
add the string 'A/B/C' to the recursive hashset.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When using the sparse-checkout feature, a user may want to incrementally
grow their sparse-checkout pattern set. Allow adding patterns using a
new 'add' subcommand. This is not much different from the 'set'
subcommand, because we still want to allow the '--stdin' option and
interpret inputs as directories when in cone mode and patterns
otherwise.
When in cone mode, we are growing the cone. This may actually reduce the
set of patterns when adding directory A when A/B is already a directory
in the cone. Test the different cases: siblings, parents, ancestors.
When not in cone mode, we can only assume the patterns should be
appended to the sparse-checkout file.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In anticipation of adding "add" and "remove" subcommands to the
sparse-checkout builtin, extract a modify_pattern_list() method from the
sparse_checkout_set() method. This command will read input from the
command-line or stdin to construct a set of patterns, then modify the
existing sparse-checkout patterns after a successful update of the
working directory.
Currently, the only way to modify the patterns is to replace all of the
patterns. This will be extended in a later update.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In anticipation of extending the sparse-checkout builtin with "add"
and "remove" subcommands, extract the code that fills a pattern list
based on the input values. The input changes depending on the
presence of "--stdin" or the value of core.sparseCheckoutCone.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When renaming a remote with
git remote rename X Y
git remote remove X
Git already renames or removes any branch.<name>.remote and
branch.<name>.pushRemote configurations if their value is X.
However remote.pushDefault needs a more gentle approach, as this may be
set in a non-repo configuration file. In such a case only a warning is
printed, such as:
warning: The global configuration remote.pushDefault in:
$HOME/.gitconfig:35
now names the non-existent remote origin
It is changed to remote.pushDefault = Y or removed when set in a repo
configuration though.
Signed-off-by: Bert Wesarg <bert.wesarg@googlemail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When renaming or removing a remote with
git remote rename X Y
git remote remove X
Git already renames/removes any config values from
branch.<name>.remote = X
to
branch.<name>.remote = Y
As branch.<name>.pushRemote also names a remote, it now also renames
or removes these config values from
branch.<name>.pushRemote = X
to
branch.<name>.pushRemote = Y
Signed-off-by: Bert Wesarg <bert.wesarg@googlemail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Some minor clean-ups in function `config_read_branches`:
* remove hardcoded length in `key += 7`
* call `xmemdupz` only once
* use a switch to handle the configuration type and add a `BUG()`
Suggested-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Bert Wesarg <bert.wesarg@googlemail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When 46af44b07d (pull --rebase=<type>: allow single-letter abbreviations
for the type, 2018-08-04) landed in Git, it had the side effect that
not only 'pull --rebase=<type>' accepted the single-letter abbreviations
but also the 'pull.rebase' and 'branch.<name>.rebase' configurations.
However, 'git remote rename' did not honor these single-letter
abbreviations when reading the 'branch.*.rebase' configurations.
We now document the single-letter abbreviations and both code places
share a common function to parse the values of 'git pull --rebase=*',
'pull.rebase', and 'branches.*.rebase'.
The only functional change is the handling of the `branch_info::rebase`
value. Before it was an unsigned enum, thus the truth value could be
checked with `branch_info::rebase != 0`. But `enum rebase_type` is
signed, thus the truth value must now be checked with
`branch_info::rebase >= REBASE_TRUE`
Signed-off-by: Bert Wesarg <bert.wesarg@googlemail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When a user queries config values with --show-origin, often it's
difficult to determine what the actual "scope" (local, global, etc.) of
a given value is based on just the origin file.
Teach 'git config' the '--show-scope' option to print the scope of all
displayed config values. Note that we should never see anything of
"submodule" scope as that is only ever used by submodule-config.c when
parsing the '.gitmodules' file.
Signed-off-by: Matthew Rogers <mattr94@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There are many situations where the scope of a config command is known
beforehand, such as passing of '--local', '--file', etc. to an
invocation of git config. However, this information is lost when moving
from builtin/config.c to /config.c. This historically hasn't been a big
deal, but to prepare for the upcoming --show-scope option we teach
git_config_source to keep track of the source and the config machinery
to use that information to set current_parsing_scope appropriately.
Signed-off-by: Matthew Rogers <mattr94@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a function for inserting a C string into a strbuf. Use it
throughout the source to get rid of magic string length constants and
explicit strlen() calls.
Like strbuf_addstr(), implement it as an inline function to avoid the
implicit strlen() calls to cause runtime overhead.
Helped-by: Taylor Blau <me@ttaylorr.com>
Helped-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
advice.addNothing config variable is used to control the visibility of
two advice messages in the add library. This config variable is
replaced by two new variables, whose names are more clear and relevant
to the two cases.
Also add the two new variables to the documentation.
Signed-off-by: Heba Waly <heba.waly@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git checkout X" did not correctly fail when X is not a local
branch but could name more than one remote-tracking branches
(i.e. to be dwimmed as the starting point to create a corresponding
local branch), which has been corrected.
* am/checkout-file-and-ref-ref-ambiguity:
checkout: don't revert file on ambiguous tracking branches
parse_branchname_arg(): extract part as new function
The effort to move "git-add--interactive" to C continues.
* js/patch-mode-in-others-in-c:
commit --interactive: make it work with the built-in `add -i`
built-in add -p: implement the "worktree" patch modes
built-in add -p: implement the "checkout" patch modes
built-in stash: use the built-in `git add -p` if so configured
legacy stash -p: respect the add.interactive.usebuiltin setting
built-in add -p: implement the "stash" and "reset" patch modes
built-in add -p: prepare for patch modes other than "stage"
name_ref() is called for each ref and checks if its a better name for
the referenced commit. If that's the case it remembers it and checks if
a name based on it is better for its ancestors as well. This in done in
the the order for_each_ref() imposes on us.
That might not be optimal. If bad names happen to be encountered first
(as defined by is_better_name()), names derived from them may spread to
a lot of commits, only to be replaced by better names later. Setting
better names first can avoid that.
is_better_name() prefers tags, short distances and old references. The
distance is a measure that we need to calculate for each candidate
commit, but the other two properties are not dependent on the
relationships of commits. Sorting the refs by them should yield better
performance than the essentially random order we currently use.
And applying older references first should also help to reduce rework
due to the fact that older commits have less ancestors than newer ones.
So add all details of names to the tip table first, then sort them
to prefer tags and older references and then apply them in this order.
Here's the performance as measures by hyperfine for the Linux repo
before:
Benchmark #1: ./git -C ../linux/ name-rev --all
Time (mean ± σ): 851.1 ms ± 4.5 ms [User: 806.7 ms, System: 44.4 ms]
Range (min … max): 845.9 ms … 859.5 ms 10 runs
... and with this patch:
Benchmark #1: ./git -C ../linux/ name-rev --all
Time (mean ± σ): 736.2 ms ± 8.7 ms [User: 688.4 ms, System: 47.5 ms]
Range (min … max): 726.0 ms … 755.2 ms 10 runs
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
name_rev() assigns a name to a commit and its parents and grandparents
and so on. Commits share their name string with their first parent,
which in turn does the same, recursively to the root. That saves a lot
of allocations. When a better name is found, the old name is replaced,
but its memory is not released. That leakage can become significant.
Can we release these old strings exactly once even though they are
referenced multiple times? Yes, indeed -- we can make use of the fact
that name_rev() visits the ancestors of a commit after it set a new name
for it and tries to update their names as well.
Members of the first ancestral line have the same taggerdate and
from_tag values, but a higher distance value than their child commit at
generation 0. These are the only criteria used by is_better_name().
Lower distance values are considered better, so a name that is better
for a child will also be better for its parent and grandparent etc.
That means we can free(3) an inferior name at generation 0 and rely on
name_rev() to replace all references in ancestors as well.
If we do that then we need to stop using the string pointer alone to
distinguish new empty rev_name slots from initialized ones, though, as
it technically becomes invalid after the free(3) call -- even though its
value is still different from NULL.
We can check the generation value first, as empty slots will have it
initialized to 0, and for the actual generation 0 we'll set a new valid
name right after the create_or_update_name() call that releases the
string.
For the Chromium repo, releasing superceded names reduces the memory
footprint of name-rev --all significantly. Here's the output of GNU
time before:
0.98user 0.48system 0:01.46elapsed 99%CPU (0avgtext+0avgdata 2601812maxresident)k
0inputs+0outputs (0major+571470minor)pagefaults 0swaps
... and with this patch:
1.01user 0.26system 0:01.28elapsed 100%CPU (0avgtext+0avgdata 1559196maxresident)k
0inputs+0outputs (0major+314370minor)pagefaults 0swaps
It also gets faster; hyperfine before:
Benchmark #1: ./git -C ../chromium/src name-rev --all
Time (mean ± σ): 1.534 s ± 0.006 s [User: 1.039 s, System: 0.494 s]
Range (min … max): 1.522 s … 1.542 s 10 runs
... and with this patch:
Benchmark #1: ./git -C ../chromium/src name-rev --all
Time (mean ± σ): 1.338 s ± 0.006 s [User: 1.047 s, System: 0.291 s]
Range (min … max): 1.327 s … 1.346 s 10 runs
For the Linux repo it doesn't pay off; memory usage only gets down from:
0.76user 0.03system 0:00.80elapsed 99%CPU (0avgtext+0avgdata 292848maxresident)k
0inputs+0outputs (0major+44579minor)pagefaults 0swaps
... to:
0.78user 0.03system 0:00.81elapsed 100%CPU (0avgtext+0avgdata 284696maxresident)k
0inputs+0outputs (0major+44892minor)pagefaults 0swaps
The runtime actually increases slightly from:
Benchmark #1: ./git -C ../linux/ name-rev --all
Time (mean ± σ): 828.8 ms ± 5.0 ms [User: 797.2 ms, System: 31.6 ms]
Range (min … max): 824.1 ms … 838.9 ms 10 runs
... to:
Benchmark #1: ./git -C ../linux/ name-rev --all
Time (mean ± σ): 847.6 ms ± 3.4 ms [User: 807.9 ms, System: 39.6 ms]
Range (min … max): 843.4 ms … 854.3 ms 10 runs
Why is that? In the Chromium repo, ca. 44000 free(3) calls in
create_or_update_name() release almost 1GB, while in the Linux repo
240000+ calls release a bit more than 5MB, so the average discarded
name is ca. 1000x longer in the latter.
Overall I think it's the right tradeoff to make, as it helps curb the
memory usage in repositories with big discarded names, and the added
overhead is small.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Leave setting the tip_name member of struct rev_name to callers of
create_or_update_name(). This avoids allocations for names that are
rejected by that function. Here's how this affects the runtime when
working with a fresh clone of Git's own repository; performance numbers
by hyperfine before:
Benchmark #1: ./git -C ../git-pristine/ name-rev --all
Time (mean ± σ): 437.8 ms ± 4.0 ms [User: 422.5 ms, System: 15.2 ms]
Range (min … max): 432.8 ms … 446.3 ms 10 runs
... and with this patch:
Benchmark #1: ./git -C ../git-pristine/ name-rev --all
Time (mean ± σ): 408.5 ms ± 1.4 ms [User: 387.2 ms, System: 21.2 ms]
Range (min … max): 407.1 ms … 411.7 ms 10 runs
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We can calculate the size of new name easily and precisely. Open-code
the xstrfmt() calls and grow the buffers as needed before filling them.
This provides a surprisingly large benefit when working with the
Chromium repository; here are the numbers measured using hyperfine
before:
Benchmark #1: ./git -C ../chromium/src name-rev --all
Time (mean ± σ): 5.822 s ± 0.013 s [User: 5.304 s, System: 0.516 s]
Range (min … max): 5.803 s … 5.837 s 10 runs
... and with this patch:
Benchmark #1: ./git -C ../chromium/src name-rev --all
Time (mean ± σ): 1.527 s ± 0.003 s [User: 1.015 s, System: 0.511 s]
Range (min … max): 1.524 s … 1.535 s 10 runs
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Reduce nesting by moving code to come up with a name for the parent into
its own function.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The commit slab commit_rev_name contains a pointer to a struct rev_name,
and the actual struct is allocated separatly. Avoid that allocation and
pointer indirection by storing the full struct in the commit slab. Use
the tip_name member pointer to determine if the returned struct is
initialized.
Performance in the Linux repository measured with hyperfine before:
Benchmark #1: ./git -C ../linux/ name-rev --all
Time (mean ± σ): 953.5 ms ± 6.3 ms [User: 901.2 ms, System: 52.1 ms]
Range (min … max): 945.2 ms … 968.5 ms 10 runs
... and with this patch:
Benchmark #1: ./git -C ../linux/ name-rev --all
Time (mean ± σ): 851.0 ms ± 3.1 ms [User: 807.4 ms, System: 43.6 ms]
Range (min … max): 846.7 ms … 857.0 ms 10 runs
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Look up the commit slab slot for the commit once using
commit_rev_name_at() and populate it in case it is empty, instead of
checking for emptiness in a separate step using commit_rev_name_peek()
via get_commit_rev_name().
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
name_ref() duplicates the path string and passes it to name_rev(), which
either puts it into a commit slab or ignores it if there is already a
better name, leaking it. Move the duplication to name_rev() and release
the copy in the latter case.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Keep the const qualifier of the first parameter of get_rev_name() even
when casting the object pointer to a commit pointer, and further for the
parameter of get_commit_rev_name(), as all these uses are read-only.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The type alias became unused with bf43abc6e6 (name-rev: use sizeof(*ptr)
instead of sizeof(type) in allocation, 2019-11-12); remove it.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This code was moved straight out of name_rev(). As such, we inherited
the "goto" to jump from an if into an else-if. We also inherited the
fact that "nothing to do -- return NULL" is handled last.
Rewrite the function to first handle the "nothing to do" case. Then we
can handle the conditional allocation early before going on to populate
the struct. No need for goto-ing.
Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When we're resolving a REF_DELTA, we compare-and-swap its type from
REF_DELTA to whatever real type the base object has, as discussed in
ab791dd138 (index-pack: fix race condition with duplicate bases,
2014-08-29). If the old type wasn't a REF_DELTA, we consider that a
BUG(). But as discussed in that commit, we might see this case whenever
we try to resolve an object twice, which may happen because we have
multiple copies of the base object.
So this isn't a bug at all, but rather a sign that the input pack is
broken. And indeed, this case is triggered already in t5309.5 and
t5309.6, which create packs with delta cycles and duplicate bases. But
we never noticed because those tests are marked expect_failure.
Those tests were added by b2ef3d9ebb (test index-pack on packs with
recoverable delta cycles, 2013-08-23), which was leaving the door open
for cases that we theoretically _could_ handle. And when we see an
already-resolved object like this, in theory we could keep going after
confirming that the previously resolved child->real_type matches
base->obj->real_type. But:
- enforcing the "only resolve once" rule here saves us from an
infinite loop in other parts of the code. If we keep going, then the
delta cycle in t5309.5 causes us to loop infinitely, as
find_ref_delta_children() doesn't realize which objects have already
been resolved. So there would be more changes needed to make this
case work, and in the meantime we'd be worse off.
- any pack that triggers this is broken anyway. It either has a
duplicate base object, or it has a cycle which causes us to bring in
a duplicate via --fix-thin. In either case, we'd end up rejecting
the pack in write_idx_file(), which also detects duplicates.
So the tests have little value in documenting what we _could_ be doing
(and have been neglected for 6+ years). Let's switch them to confirming
that we handle this case cleanly (and switch out the BUG() for a more
informative die() so that we do so).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Apply a similar treatment as in the previous patch to pass a 'struct
object_directory *' through the 'load_commit_graph_one_fd_st'
initializer, too.
This prevents a potential bug where a pointer comparison is made to a
NULL 'g->odb', which would cause the commit-graph machinery to think
that a pair of commit-graphs belonged to different alternates when in
fact they do not (i.e., in the case of no '--object-dir').
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As of the previous patch, all calls to 'commit-graph.c' functions which
perform path normalization (for e.g., 'get_commit_graph_filename()') are
of the form 'ctx->odb->path', which is always in normalized form.
Now that there are no callers passing non-normalized paths to these
functions, ensure that future callers are bound by the same restrictions
by making these functions take a 'struct object_directory *' instead of
a 'const char *'. To match, replace all calls with arguments of the form
'ctx->odb->path' with 'ctx->odb' To recover the path, functions that
perform path manipulation simply use 'odb->path'.
Further, avoid string comparisons with arguments of the form
'odb->path', and instead prefer raw pointer comparisons, which
accomplish the same effect, but are far less brittle.
This has a pleasant side-effect of making these functions much more
robust to paths that cannot be normalized by 'normalize_path_copy()',
i.e., because they are outside of the current working directory.
For example, prior to this patch, Valgrind reports that the following
uninitialized memory read [1]:
$ ( cd t && GIT_DIR=../.git valgrind git rev-parse HEAD^ )
because 'normalize_path_copy()' can't normalize '../.git' (since it's
relative to but above of the current working directory) [2].
By using a 'struct object_directory *' directly,
'get_commit_graph_filename()' does not need to normalize, because all
paths are relative to the current working directory since they are
always read from the '->path' of an object directory.
[1]: https://lore.kernel.org/git/20191027042116.GA5801@sigill.intra.peff.net.
[2]: The bug here is that 'get_commit_graph_filename()' returns the
result of 'normalize_path_copy()' without checking the return
value.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In a previous patch, the 'char *object_dir' in 'struct commit_graph' was
replaced with a 'struct object_directory'. This patch applies the same
treatment to 'struct commit_graph', which is another intermediate step
towards getting rid of all path normalization in 'commit-graph.c'.
Instead of taking a 'char *object_dir', functions that construct a
'struct commit_graph' now take a 'struct object_directory *'. Any code
that needs an object directory path use '->path' instead.
This ensures that all calls to functions that perform path normalization
are given arguments which do not themselves require normalization. This
prepares those functions to drop their normalization entirely, which
will occur in the subsequent patch.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There are lots of places in 'commit-graph.h' where a function either has
(or almost has) a full 'struct object_directory *', accesses '->path',
and then throws away the rest of the struct.
This can cause headaches when comparing the locations of object
directories across alternates (e.g., in the case of deciding if two
commit-graph layers can be merged). These paths are normalized with
'normalize_path_copy()' which mitigates some comparison issues, but not
all [1].
Replace usage of 'char *object_dir' with 'odb->path' by storing a
'struct object_directory *' in the 'write_commit_graph_context'
structure. This is an intermediate step towards getting rid of all path
normalization in 'commit-graph.c'.
Resolving a user-provided '--object-dir' argument now requires that we
compare it to the known alternates for equality. Prior to this patch,
an unknown '--object-dir' argument would silently exit with status zero.
This can clearly lead to unintended behavior, such as verifying
commit-graphs that aren't in a repository's own object store (or one of
its alternates), or causing a typo to mask a legitimate commit-graph
verification failure. Make this error non-silent by 'die()'-ing when the
given '--object-dir' does not match any known alternate object store.
[1]: In my testing, for example, I can get one side of the commit-graph
code to fill object_dir with "./objects" and the other with just
"objects".
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The sparse-checkout patterns allow special globs according to
fnmatch(3). When writing cone-mode patterns for paths containing
these characters, they must be escaped.
Use is_glob_special() to check which characters must be escaped
this way, and add a path to the tests that contains all glob
characters at once. Note that ']' is not special, since the
initial bracket '[' is escaped.
Reported-by: Jeff King <peff@peff.net>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When in cone mode, the 'git sparse-checkout list' subcommand lists
the directories included in the sparse cone. When these directories
contain odd characters, such as a backslash, then we need to use
C-style quotes similar to 'git ls-tree'.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If a user somehow creates a directory with an asterisk (*) or backslash
(\), then the "git sparse-checkout set" command will struggle to provide
the correct pattern in the sparse-checkout file. When not in cone mode,
the provided pattern is written directly into the sparse-checkout file.
However, in cone mode we expect a list of paths to directories and then
we convert those into patterns.
Even more specifically, the goal is to always allow the following from
the root of a repo:
git ls-tree --name-only -d HEAD | git sparse-checkout set --stdin
The ls-tree command provides directory names with an unescaped asterisk.
It also quotes the directories that contain an escaped backslash. We
must remove these quotes, then keep the escaped backslashes.
Use unquote_c_style() when parsing lines from stdin. Command-line
arguments will be parsed as-is, assuming the user can do the correct
level of escaping from their environment to match the exact directory
names.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If a user somehow creates a directory with an asterisk (*) or backslash
(\), then the "git sparse-checkout set" command will struggle to provide
the correct pattern in the sparse-checkout file. When not in cone mode,
the provided pattern is written directly into the sparse-checkout file.
However, in cone mode we expect a list of paths to directories and then
we convert those into patterns.
However, there is some care needed for the timing of these escapes. The
in-memory pattern list is used to update the working directory before
writing the patterns to disk. Thus, we need the command to have the
unescaped names in the hashsets for the cone comparisons, then escape
the patterns later.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We often skip an optional prefix in a string with a hardcoded
constant, e.g.
if (starts_with(string, "prefix"))
string += 6;
which is less error prone when written
skip_prefix(string, "prefix", &string);
Note that this changes a few error messages from "git reflog expire
--expire=nonsense.timestamp", which used to complain by saying
'--expire=nonsense.timestamp' is not a valid timestamp
but with this change, we say
'nonsense.timestamp' is not a valid timestamp
which is more technically correct (the string with --expire= as
a prefix obviously cannot be a valid timestamp, but the error is
about the part of the input without that prefix).
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Some callers of check_object_signature() can work on arbitrary
repositories, but the repo does not get passed to this function.
Instead, the_repository is always used internally. To fix possible
inconsistencies, allow the function to receive a struct repository and
make those callers pass on the repo being handled.
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Allow hash_object_file() to work on arbitrary repos by introducing a
git_hash_algo parameter. Change callers which have a struct repository
pointer in their scope to pass on the git_hash_algo from the said repo.
For all other callers, pass on the_hash_algo, which was already being
used internally at hash_object_file(). This functionality will be used
in the following patch to make check_object_signature() be able to work
on arbitrary repos (which, in turn, will be used to fix an
inconsistency at object.c:parse_object()).
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Some callers of open_istream() at archive-tar.c and archive-zip.c are
capable of working on arbitrary repositories but the repo struct is not
passed down to open_istream(), which uses the_repository internally. For
now, that's not a problem since the said callers are only being called
with the_repository. But to be consistent and avoid future problems,
let's allow open_istream() to receive a struct repository and use that
instead of the_repository. This parameter addition will also be used in
a future patch to make sha1-file.c:check_object_signature() be able to
work on arbitrary repos.
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
gpg.minTrustLevel configuration variable has been introduced to
tell various signature verification codepaths the required minimum
trust level.
* hi/gpg-mintrustlevel:
gpg-interface: add minTrustLevel as a configuration option
If a filter is specified, we do not need a full connectivity check on
the contents of the packfile we just fetched; we only need to check that
the objects referenced are promisor objects.
This significantly speeds up fetches into repositories that have many
promisor objects, because during the connectivity check, all promisor
objects are enumerated (to mark them UNINTERESTING), and that takes a
significant amount of time.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commit dfa33a298d ("clone: do faster object check for partial clones",
2019-04-21) optimized the connectivity check done when cloning with
--filter to check only the existence of objects directly pointed to by
refs. But this is not sufficient: they also need to be promisor objects.
Make this check more robust by instead checking that these objects are
promisor objects, that is, they appear in a promisor pack.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since grep learned to recurse into submodules in 0281e487fd
(grep: optionally recurse into submodules, 2016-12-16),
using --recurse-submodules along with --no-index makes Git
die().
This is unfortunate because if submodule.recurse is set in a user's
~/.gitconfig, invoking `git grep --no-index` either inside or outside
a Git repository results in
fatal: option not supported with --recurse-submodules
Let's allow using these options together, so that setting submodule.recurse
globally does not prevent using `git grep --no-index`.
Using `--recurse-submodules` should not have any effect if `--no-index`
is used inside a repository, as Git will recurse into the checked out
submodule directories just like into regular directories.
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Original bash helper for "submodule status" was doing a check for
initialized but not cloned submodules and prefixed the status with
a minus sign in case no .git file or folder was found inside the
submodule directory.
This check was missed when the original port of the functionality
from bash to C was done.
Signed-off-by: Peter Kaestle <peter.kaestle@nokia.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The --sparse option was added to the clone builtin in d89f09c (clone:
add --sparse mode, 2019-11-21) and was tested with a local path clone
in t1091-sparse-checkout-builtin.sh. However, due to a difference in
how local paths are handled versus URLs, this mechanism does not work
with URLs.
Modify the test to use a "file://" URL, which would output this error
before the code change:
Cloning into 'clone'...
fatal: cannot change to 'file://.../repo': No such file or directory
error: failed to initialize sparse-checkout
These errors are due to using a "-C <path>" option to call 'git -C
<path> sparse-checkout init' but the URL is being given instead of
the target directory.
Update that target directory to evaluate this correctly. I have also
manually tested that https:// URLs are handled correctly as well.
Acked-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The 'git init' command creates the ".git/info" directory and fills it
with some default files. However, 'git worktree add' does not create
the info directory for that worktree. This causes a problem when running
"git sparse-checkout init" inside a worktree. While care was taken to
allow the sparse-checkout config to be specific to a worktree, this
initialization was untested.
Safely create the leading directories for the sparse-checkout file. This
is the safest thing to do even without worktrees, as a user could delete
their ".git/info" directory and expect Git to recover safely.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In git config use of the end_null variable to determine if we should be
null terminating our output. While it is correct to say a string is
"null terminated" the character is actually the "nul" character, so this
malapropism is being fixed.
Signed-off-by: Matthew Rogers <mattr94@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
One of the first things done when using a sequencer-based
rebase (ie. `rebase -i', `rebase -r', or `rebase -m') is to make a todo
list. This requires knowledge of the commit range to rebase. To get
the oid of the last commit of the range, the tip of the branch to rebase
is checked out with prepare_branch_to_be_rebased(), then the oid of the
head is read. After this, the tip of the branch is not even modified.
The `am' backend, on the other hand, does not check out the branch.
On big repositories, it's a performance penalty: with `rebase -i', the
user may have to wait before editing the todo list while git is
extracting the branch silently, and "quiet" rebases will be slower than
`am'.
Since we already have the oid of the tip of the branch in
`opts->orig_head', it's useless to switch to this commit.
This removes the call to prepare_branch_to_be_rebased() in
do_interactive_rebase(), and adds a `orig_head' parameter to
get_revision_ranges(). prepare_branch_to_be_rebased() is removed as it
is no longer used.
This introduces a visible change: as we do not switch on the tip of the
branch to rebase, no reflog entry is created at the beginning of the
rebase for it.
Unscientific performance measurements, performed on linux.git, are as
follow:
Before this patch:
$ time git rebase -m --onto v4.18 463fa44eec2fef50~ 463fa44eec2fef50
real 0m8,940s
user 0m6,830s
sys 0m2,121s
After this patch:
$ time git rebase -m --onto v4.18 463fa44eec2fef50~ 463fa44eec2fef50
real 0m1,834s
user 0m0,916s
sys 0m0,206s
Reported-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Alban Gruin <alban.gruin@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Additional checks are added in have_duplicate_entry() and
obj_is_packed() to avoid duplicate objects in the reuse
bitmap. It was probably buggy to not have such a check
before.
Git as a client would never both asks for a tag by sha1 and
specify "include-tag", but libgit2 will, so a libgit2 client
cloning from a Git server would trigger the bug.
If a client both asks for a tag by sha1 and specifies
"include-tag", we may end up including the tag in the reuse
bitmap (due to the first thing), and then later adding it to
the packlist (due to the second). This results in duplicate
objects in the pack, which git chokes on. We should notice
that we are already including it when doing the include-tag
portion, and avoid adding it to the packlist.
The simplest place to fix this is right in add_ref_tag(),
where we could avoid peeling the tag at all if we know that
we are already including it. However, this pushes the check
instead into have_duplicate_entry(). This fixes not only
this case, but also means that we cannot have any similar
problems lurking in other code.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The old code to reuse deltas from an existing packfile
just tried to dump a whole segment of the pack verbatim.
That's faster than the traditional way of actually adding
objects to the packing list, but it didn't kick in very
often. This new code is really going for a middle ground:
do _some_ per-object work, but way less than we'd
traditionally do.
The general strategy of the new code is to make a bitmap
of objects from the packfile we'll include, and then
iterate over it, writing out each object exactly as it is
in our on-disk pack, but _not_ adding it to our packlist
(which costs memory, and increases the search space for
deltas).
One complication is that if we're omitting some objects,
we can't set a delta against a base that we're not
sending. So we have to check each object in
try_partial_reuse() to make sure we have its delta.
About performance, in the worst case we might have
interleaved objects that we are sending or not sending,
and we'd have as many chunks as objects. But in practice
we send big chunks.
For instance, packing torvalds/linux on GitHub servers
now reused 6.5M objects, but only needed ~50k chunks.
Helped-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Let's refactor the way we check if an object is packed by
introducing obj_is_packed(). This function is now a simple
wrapper around packlist_find(), but it will evolve in a
following commit.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Let's make it possible to configure if we want pack reuse or not.
The main reason it might not be wanted is probably debugging and
performance testing, though pack reuse _might_ cause larger packs,
because we wouldn't consider the reused objects as bases for
finding new deltas.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git restore --staged" did not correctly update the cache-tree
structure, resulting in bogus trees to be written afterwards, which
has been corrected.
* nd/switch-and-restore:
restore: invalidate cache-tree when removing entries with --staged
"git commit" gives output similar to "git status" when there is
nothing to commit, but without honoring the advise.statusHints
configuration variable, which has been corrected.
* hw/commit-advise-while-rejecting:
commit: honor advice.statusHints when rejecting an empty commit
Commit b00bf1c9a8 ("git-rebase: make --allow-empty-message the
default", 2018-06-27) made --allow-empty-message the default and thus
turned --allow-empty-message into a no-op but did not update the
documentation to reflect this. Update the documentation now, and hide
the option from the normal -h output since it is not useful.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In builtin/grep.c:add_work() we pre-load the userdiff drivers before
adding the grep_source in the todo list. This operation is currently
being performed after acquiring the grep_mutex, but as it's already
thread-safe, we don't need to protect it here. So let's move it out of
the critical section which should avoid thread contention and improve
performance.
Running[1] `git grep --threads=8 abcd[02] HEAD` on chromium's
repository[2], I got the following mean times for 30 executions after 2
warmups:
Original | 6.2886s
-------------------------|-----------
Out of critical section | 5.7852s
[1]: Tests performed on an i7-7700HQ with 16GB of RAM and SSD, running
Manjaro Linux.
[2]: chromium’s repo at commit 03ae96f (“Add filters testing at DSF=2”,
04-06-2019), after a 'git gc' execution.
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
They were disabled at 53b8d93 ("grep: disable threading in non-worktree
case", 12-12-2011), due to observable performance drops (to the point
that using a single thread would be faster than multiple threads). But
now that zlib inflation can be performed in parallel we can regain the
speedup, so let's re-enable threads in non-worktree grep.
Grepping 'abcd[02]' ("Regex 1") and '(static|extern) (int|double) \*'
("Regex 2") at chromium's repository[1] I got:
Threads | Regex 1 | Regex 2
---------|------------|-----------
1 | 17.2920s | 20.9624s
2 | 9.6512s | 11.3184s
4 | 6.7723s | 7.6268s
8** | 6.2886s | 6.9843s
These are all means of 30 executions after 2 warmup runs. All tests were
executed on an i7-7700HQ (quad-core w/ hyper-threading), 16GB of RAM and
SSD, running Manjaro Linux. But to make sure the optimization also
performs well on HDD, the tests were repeated on another machine with an
i5-4210U (dual-core w/ hyper-threading), 8GB of RAM and HDD (SATA III,
5400 rpm), also running Manjaro Linux:
Threads | Regex 1 | Regex 2
---------|------------|-----------
1 | 18.4035s | 22.5368s
2 | 12.5063s | 14.6409s
4** | 10.9136s | 12.7106s
** Note that in these cases we relied on hyper-threading, and that's
probably why we don't see a big difference in time.
Unfortunately, multithreaded git-grep might be slow in the non-worktree
case when --textconv is used and there're too many text conversions.
Probably the reason for this is that the object read lock is used to
protect fill_textconv() and therefore there is a mutual exclusion
between textconv execution and object reading. Because both are
time-consuming operations, not being able to perform them in parallel
can cause performance drops. To inform the users about this (and other
threading details), let's also add a "NOTES ON THREADS" section to
Documentation/git-grep.txt.
[1]: chromium’s repo at commit 03ae96f (“Add filters testing at DSF=2”,
04-06-2019), after a 'git gc' execution.
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Some fields in struct raw_object_store are lazy initialized by the
thread-unsafe packfile.c:prepare_packed_git(). Although this function is
present in the call stack of git-grep threads, all paths to it are
currently protected by obj_read_lock() (and the main thread usually
indirectly calls it before firing the worker threads, anyway). However,
it's possible that future modifications add new unprotected paths to it,
introducing a race condition. Because errors derived from it wouldn't
happen often, it could be hard to detect. So to prevent future
headaches, let's force eager initialization of packed_git when setting
git-grep up. There'll be a small overhead in the cases where we didn't
really need to prepare packed_git during execution but this shouldn't be
very noticeable.
Also, packed_git may be re-initialized by
packfile.c:reprepare_packed_git(). Again, all paths to it in git-grep
are already protected by obj_read_lock() but it may suffer from the same
problem in the future. So let's also internally protect it with
obj_read_lock() (which is a recursive mutex).
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Now that object reading operations are internally protected, the
submodule initialization functions at builtin/grep.c:grep_submodule()
are very close to being thread-safe. Let's take a look at each call and
remove from the critical section what we can, for better performance:
- submodule_from_path() and is_submodule_active() cannot be called in
parallel yet only because they call repo_read_gitmodules() which
contains, in its call stack, operations that would otherwise be in
race condition with object reading (for example parse_object() and
is_promisor_remote()). However, they only call repo_read_gitmodules()
if it wasn't read before. So let's pre-read it before firing the
threads and allow these two functions to safely be called in
parallel.
- repo_submodule_init() is already thread-safe, so remove it from the
critical section without other necessary changes.
- The repo_read_gitmodules(&subrepo) call at grep_submodule() is safe as
no other thread is performing object reading operations in the subrepo
yet. However, threads might be working in the superproject, and this
function calls add_to_alternates_memory() internally, which is racy
with object readings in the superproject. So it must be kept
protected for now. Let's add a "NEEDSWORK" to it, informing why it
cannot be removed from the critical section yet.
- Finally, add_to_alternates_memory() must be kept protected for the
same reason as the item above.
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Currently, submodule-config.c doesn't have an externally accessible
function to read gitmodules only if it wasn't already read. But this
exact behavior is internally implemented by gitmodules_read_check(), to
perform a lazy load. Let's merge this function with
repo_read_gitmodules() adding a 'skip_if_read' which allows both
internal and external callers to access this functionality. This
simplifies a little the code. The added option will also be used in
the following patch.
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
git-grep uses 'grep_read_mutex' to protect its calls to object reading
operations. But these have their own internal lock now, which ensures a
better performance (allowing parallel access to more regions). So, let's
remove the former and, instead, activate the latter with
enable_obj_read_lock().
Sections that are currently protected by 'grep_read_mutex' but are not
internally protected by the object reading lock should be surrounded by
obj_read_lock() and obj_read_unlock(). These guarantee mutual exclusion
with object reading operations, keeping the current behavior and
avoiding race conditions. Namely, these places are:
In grep.c:
- fill_textconv() at fill_textconv_grep().
- userdiff_get_textconv() at grep_source_1().
In builtin/grep.c:
- parse_object_or_die() and the submodule functions at
grep_submodule().
- deref_tag() and gitmodules_config_oid() at grep_objects().
If these functions become thread-safe, in the future, we might remove
the locking and probably get some speedup.
Note that some of the submodule functions will already be thread-safe
(or close to being thread-safe) with the internal object reading lock.
However, as some of them will require additional modifications to be
removed from the critical section, this will be done in its own patch.
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
deref_tag() calls is_promisor_object() and parse_object(), both of which
perform lazy initializations and other thread-unsafe operations. If it
was only called by grep_objects() this wouldn't be a problem as the
latter is only executed by the main thread. However, deref_tag() is also
present in read_object_file()'s call stack. So calling deref_tag() in
grep_objects() without acquiring the grep_read_mutex may incur in a race
condition with object reading operations (such as the ones internally
performed by fill_textconv(), called at fill_textconv_grep()). The same
problem happens with the call to gitmodules_config_oid() which also has
parse_object() in its call stack. Fix that protecting both calls with
the said grep_read_mutex.
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There're currently two function calls in builtin/grep.c:grep_submodule()
which might result in race conditions:
- submodule_from_path(): it has config_with_options() in its call stack
which, in turn, may have read_object_file() in its own. Therefore,
calling the first function without acquiring grep_read_mutex may end
up causing a race condition with other object read operations
performed by worker threads (for example, at the fill_textconv()
call in grep.c:fill_textconv_grep()).
- parse_object_or_die(): it falls into the same problem, having
repo_has_object_file(the_repository, ...) in its call stack. Besides
that, parse_object(), which is also called by parse_object_or_die(),
is thread-unsafe and also called by object reading functions.
It's unlikely to really fall into a data race with these operations as
the volume of calls to them is usually very low. But we better protect
ourselves against this possibility, anyway. So, to solve these issues,
move both of these function calls into the critical section of
grep_read_mutex.
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Previously, signature verification for merge and pull operations checked
if the key had a trust-level of either TRUST_NEVER or TRUST_UNDEFINED in
verify_merge_signature(). If that was the case, the process die()d.
The other code paths that did signature verification relied entirely on
the return code from check_commit_signature(). And signatures made with
a good key, irregardless of its trust level, was considered valid by
check_commit_signature().
This difference in behavior might induce users to erroneously assume
that the trust level of a key in their keyring is always considered by
Git, even for operations where it is not (e.g. during a verify-commit or
verify-tag).
The way it worked was by gpg-interface.c storing the result from the
key/signature status *and* the lowest-two trust levels in the `result`
member of the signature_check structure (the last of these status lines
that were encountered got written to `result`). These are documented in
GPG under the subsection `General status codes` and `Key related`,
respectively [1].
The GPG documentation says the following on the TRUST_ status codes [1]:
"""
These are several similar status codes:
- TRUST_UNDEFINED <error_token>
- TRUST_NEVER <error_token>
- TRUST_MARGINAL [0 [<validation_model>]]
- TRUST_FULLY [0 [<validation_model>]]
- TRUST_ULTIMATE [0 [<validation_model>]]
For good signatures one of these status lines are emitted to
indicate the validity of the key used to create the signature.
The error token values are currently only emitted by gpgsm.
"""
My interpretation is that the trust level is conceptionally different
from the validity of the key and/or signature. That seems to also have
been the assumption of the old code in check_signature() where a result
of 'G' (as in GOODSIG) and 'U' (as in TRUST_NEVER or TRUST_UNDEFINED)
were both considered a success.
The two cases where a result of 'U' had special meaning were in
verify_merge_signature() (where this caused git to die()) and in
format_commit_one() (where it affected the output of the %G? format
specifier).
I think it makes sense to refactor the processing of TRUST_ status lines
such that users can configure a minimum trust level that is enforced
globally, rather than have individual parts of git (e.g. merge) do it
themselves (except for a grace period with backward compatibility).
I also think it makes sense to not store the trust level in the same
struct member as the key/signature status. While the presence of a
TRUST_ status code does imply that the signature is good (see the first
paragraph in the included snippet above), as far as I can tell, the
order of the status lines from GPG isn't well-defined; thus it would
seem plausible that the trust level could be overwritten with the
key/signature status if they were stored in the same member of the
signature_check structure.
This patch introduces a new configuration option: gpg.minTrustLevel. It
consolidates trust-level verification to gpg-interface.c and adds a new
`trust_level` member to the signature_check structure.
Backward-compatibility is maintained by introducing a special case in
verify_merge_signature() such that if no user-configurable
gpg.minTrustLevel is set, then the old behavior of rejecting
TRUST_UNDEFINED and TRUST_NEVER is enforced. If, on the other hand,
gpg.minTrustLevel is set, then that value overrides the old behavior.
Similarly, the %G? format specifier will continue show 'U' for
signatures made with a key that has a trust level of TRUST_UNDEFINED or
TRUST_NEVER, even though the 'U' character no longer exist in the
`result` member of the signature_check structure. A new format
specifier, %GT, is also introduced for users that want to show all
possible trust levels for a signature.
Another approach would have been to simply drop the trust-level
requirement in verify_merge_signature(). This would also have made the
behavior consistent with other parts of git that perform signature
verification. However, requiring a minimum trust level for signing keys
does seem to have a real-world use-case. For example, the build system
used by the Qubes OS project currently parses the raw output from
verify-tag in order to assert a minimum trust level for keys used to
sign git tags [2].
[1] https://git.gnupg.org/cgi-bin/gitweb.cgi?p=gnupg.git;a=blob;f=doc/doc/DETAILS;h=bd00006e933ac56719b1edd2478ecd79273eae72;hb=refs/heads/master
[2] 9674c1991d/scripts/verify-git-tag (L43)
Signed-off-by: Hans Jerry Illikainen <hji@dyntopia.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Use the advise function in advice.c to display hints to the users, as
it provides a neat and a standard format for hint messages, i.e: the
text is colored in yellow and the line starts by the word "hint:".
Also this will enable us to control the messages using advice.*
configuration variables.
Signed-off-by: Heba Waly <heba.waly@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This reverts commit 5d9324e0f4, reversing
changes made to c58ae96fc4.
The topic turns out to be too buggy for real use.
cf. <f2fe7437-8a48-3315-4d3f-8d51fe4bb8f1@gmail.com>
When "git restore --staged <path>" removes a path that's in the index,
it marks the entry with CE_REMOVE, but we don't do anything to
invalidate the cache-tree. In the non-staged case, we end up in
checkout_worktree(), which calls remove_marked_cache_entries(). That
actually drops the entries from the index, as well as invalidating the
cache-tree and untracked-cache.
But with --staged, we never call checkout_worktree(), and the CE_REMOVE
entries remain. Interestingly, they are dropped when we write out the
index, but that means the resulting index is inconsistent: its
cache-tree will not match the actual entries, and running "git commit"
immediately after will create the wrong tree.
We can solve this by calling remove_marked_cache_entries() ourselves
before writing out the index. Note that we can't just hoist it out of
checkout_worktree(); that function needs to iterate over the CE_REMOVE
entries (to drop their matching worktree files) before removing them.
One curiosity about the test: without this patch, it actually triggers a
BUG() when running git-restore:
BUG: cache-tree.c:810: new1 with flags 0x4420000 should not be in cache-tree
But in the original problem report, which used a similar recipe,
git-restore actually creates the bogus index (and the commit is created
with the wrong tree). I'm not sure why the test here behaves differently
than my out-of-suite reproduction, but what's here should catch either
symptom (and the fix corrects both cases).
Reported-by: Torsten Krah <krah.tm@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
For easier understanding, here are the existing good scenarios:
1) Have *no* file 'foo', *no* local branch 'foo' and a *single*
remote branch 'foo'
2) `git checkout foo` will create local branch foo, see [1]
and
1) Have *a* file 'foo', *no* local branch 'foo' and a *single*
remote branch 'foo'
2) `git checkout foo` will complain, see [3]
This patch prevents the following scenario:
1) Have *a* file 'foo', *no* local branch 'foo' and *multiple*
remote branches 'foo'
2) `git checkout foo` will successfully... revert contents of
file `foo`!
That is, adding another remote suddenly changes behavior significantly,
which is a surprise at best and could go unnoticed by user at worst.
Please see [3] which gives some real world complaints.
To my understanding, fix in [3] overlooked the case of multiple remotes,
and the whole behavior of falling back to reverting file was never
intended:
[1] introduces the unexpected behavior. Before, there was fallback
from not-a-ref to pathspec. This is reasonable fallback. After, there
is another fallback from ambiguous-remote to pathspec. I understand
that it was a copy&paste oversight.
[2] noticed the unexpected behavior but chose to semi-document it
instead of forbidding, because the goal of the patch series was
focused on something else.
[3] adds `die()` when there is ambiguity between branch and file. The
case of multiple tracking branches is seemingly overlooked.
The new behavior: if there is no local branch and multiple remote
candidates, just die() and don't try reverting file whether it
exists (prevents surprise) or not (improves error message).
[1] Commit 70c9ac2f ("DWIM "git checkout frotz" to "git checkout -b frotz origin/frotz"" 2009-10-18)
https://public-inbox.org/git/7vaazpxha4.fsf_-_@alter.siamese.dyndns.org/
[2] Commit ad8d5104 ("checkout: add advice for ambiguous "checkout <branch>"", 2018-06-05)
https://public-inbox.org/git/20180502105452.17583-1-avarab@gmail.com/
[3] Commit be4908f1 ("checkout: disambiguate dwim tracking branches and local files", 2018-11-13)
https://public-inbox.org/git/20181110120707.25846-1-pclouds@gmail.com/
Signed-off-by: Alexandr Miloslavskiy <alexandr.miloslavskiy@syntevo.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This is done for the next commit to avoid crazy 7x tab code padding.
Signed-off-by: Alexandr Miloslavskiy <alexandr.miloslavskiy@syntevo.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The code to write split commit-graph file(s) upon fetching computed
bogus value for the parameter used in splitting the resulting
files, which has been corrected.
* ds/commit-graph-set-size-mult:
commit-graph: prefer default size_mult when given zero
"git sparse-checkout list" subcommand learned to give its output in
a more concise form when the "cone" mode is in effect.
* ds/sparse-list-in-cone-mode:
sparse-checkout: document interactions with submodules
sparse-checkout: list directories in cone mode
In 50f26bd ("fetch: add fetch.writeCommitGraph config setting",
2019-09-02), the fetch builtin added the capability to write a
commit-graph using the "--split" feature. This feature creates
multiple commit-graph files, and those can merge based on a set
of "split options" including a size multiple. The default size
multiple is 2, which intends to provide a log_2 N depth of the
commit-graph chain where N is the number of commits.
However, I noticed during dogfooding that my commit-graph chains
were becoming quite large when left only to builds by 'git fetch'.
It turns out that in split_graph_merge_strategy(), we default the
size_mult variable to 2 except we override it with the context's
split_opts if they exist. In builtin/fetch.c, we create such a
split_opts, but do not populate it with values.
This problem is due to two failures:
1. It is unclear that we can add the flag COMMIT_GRAPH_WRITE_SPLIT
with a NULL split_opts.
2. If we have a non-NULL split_opts, then we override the default
values even if a zero value is given.
Correct both of these issues. First, do not override size_mult when
the options provide a zero value. Second, stop creating a split_opts
in the fetch builtin.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git rebase --signoff" stopped working when the command was written
in C, which has been corrected.
* en/rebase-signoff-fix:
rebase: fix saving of --signoff state for am-based rebases
When core.sparseCheckoutCone is enabled, the 'git sparse-checkout set'
command takes a list of directories as input, then creates an ordered
list of sparse-checkout patterns such that those directories are
recursively included and all sibling entries along the parent directories
are also included. Listing the patterns is less user-friendly than the
directories themselves.
In cone mode, and as long as the patterns match the expected cone-mode
pattern types, change the output of 'git sparse-checkout list' to only
show the directories that created the patterns.
With this change, the following piped commands would not change the
working directory:
git sparse-checkout list | git sparse-checkout set --stdin
The only time this would not work is if core.sparseCheckoutCone is
true, but the sparse-checkout file contains patterns that do not
match the expected pattern types for cone mode.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The effort to move "git-add--interactive" to C continues.
* js/add-p-in-c:
built-in add -p: show helpful hint when nothing can be staged
built-in add -p: only show the applicable parts of the help text
built-in add -p: implement the 'q' ("quit") command
built-in add -p: implement the '/' ("search regex") command
built-in add -p: implement the 'g' ("goto") command
built-in add -p: implement hunk editing
strbuf: add a helper function to call the editor "on an strbuf"
built-in add -p: coalesce hunks after splitting them
built-in add -p: implement the hunk splitting feature
built-in add -p: show different prompts for mode changes and deletions
built-in app -p: allow selecting a mode change as a "hunk"
built-in add -p: handle deleted empty files
built-in add -p: support multi-file diffs
built-in add -p: offer a helpful error message when hunk navigation failed
built-in add -p: color the prompt and the help text
built-in add -p: adjust hunk headers as needed
built-in add -p: show colored hunks by default
built-in add -i: wire up the new C code for the `patch` command
built-in add -i: start implementing the `patch` functionality in C
Code cleanup.
* rs/ref-read-cleanup:
remote: pass NULL to read_ref_full() because object ID is not needed
refs: pass NULL to refs_read_ref_full() because object ID is not needed
Management of sparsely checked-out working tree has gained a
dedicated "sparse-checkout" command.
* ds/sparse-cone: (21 commits)
sparse-checkout: improve OS ls compatibility
sparse-checkout: respect core.ignoreCase in cone mode
sparse-checkout: check for dirty status
sparse-checkout: update working directory in-process for 'init'
sparse-checkout: cone mode should not interact with .gitignore
sparse-checkout: write using lockfile
sparse-checkout: use in-process update for disable subcommand
sparse-checkout: update working directory in-process
sparse-checkout: sanitize for nested folders
unpack-trees: add progress to clear_ce_flags()
unpack-trees: hash less in cone mode
sparse-checkout: init and set in cone mode
sparse-checkout: use hashmaps for cone patterns
sparse-checkout: add 'cone' mode
trace2: add region in clear_ce_flags
sparse-checkout: create 'disable' subcommand
sparse-checkout: add '--stdin' option to set subcommand
sparse-checkout: 'set' subcommand
clone: add --sparse mode
sparse-checkout: create 'init' subcommand
...
Redo "git name-rev" to avoid recursive calls.
* sg/name-rev-wo-recursion:
name-rev: cleanup name_ref()
name-rev: eliminate recursion in name_rev()
name-rev: use 'name->tip_name' instead of 'tip_name'
name-rev: drop name_rev()'s 'generation' and 'distance' parameters
name-rev: restructure creating/updating 'struct rev_name' instances
name-rev: restructure parsing commits and applying date cutoff
name-rev: pull out deref handling from the recursion
name-rev: extract creating/updating a 'struct name_rev' into a helper
t6120: add a test to cover inner conditions in 'git name-rev's name_rev()
name-rev: use sizeof(*ptr) instead of sizeof(type) in allocation
name-rev: avoid unnecessary cast in name_ref()
name-rev: use strbuf_strip_suffix() in get_rev_name()
t6120-describe: modernize the 'check_describe' helper
t6120-describe: correct test repo history graph in comment
"git format-patch" can take a set of configured format.notes values
to specify which notes refs to use in the log message part of the
output. The behaviour of this was not consistent with multiple
--notes command line options, which has been corrected.
* dl/format-patch-notes-config-fixup:
notes.h: fix typos in comment
notes: break set_display_notes() into smaller functions
config/format.txt: clarify behavior of multiple format.notes
format-patch: move git_config() before repo_init_revisions()
format-patch: use --notes behavior for format.notes
notes: extract logic into set_display_notes()
notes: create init_display_notes() helper
notes: rename to load_display_notes()
A few more commands learned the "--pathspec-from-file" command line
option.
* am/pathspec-f-f-checkout:
checkout, restore: support the --pathspec-from-file option
doc: restore: synchronize <pathspec> description
doc: checkout: synchronize <pathspec> description
doc: checkout: fix broken text reference
doc: checkout: remove duplicate synopsis
add: support the --pathspec-from-file option
cmd_add: prepare for next patch
An earlier series to teach "--pathspec-from-file" to "git commit"
forgot to make the option incompatible with "--all", which has been
corrected.
* am/pathspec-from-file:
commit: forbid --pathspec-from-file --all
The built-in `git add -i` machinery obviously has its `the_repository`
structure initialized at the point where `cmd_commit()` calls it, and
therefore does not look at the environment variable `GIT_INDEX_FILE`.
But when being called from `commit --interactive`, it has to, because
the index was already locked in that case, and we want to ask the
interactive add machinery to work on the `index.lock` file instead of
the `index` file.
Technically, we could teach `run_add_i()`, or for that matter
`run_add_p()`, to look specifically at that environment variable, but
the entire idea of passing in a parameter of type `struct repository *`
is to allow working on multiple repositories (and their index files)
independently.
So let's instead override the `index_file` field of that structure
temporarily.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This is a straight-forward port of 2f0896ec3a (restore: support
--patch, 2019-04-25) which added support for `git restore -p`.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This patch teaches the built-in `git add -p` machinery all the tricks it
needs to know in order to act as the work horse for `git checkout -p`.
Apart from the minor changes (slightly reworded messages, different
`diff` and `apply --check` invocations), it requires a new function to
actually apply the changes, as `git checkout -p` is a bit special in
that respect: when the desired changes do not apply to the index, but
apply to the work tree, Git does not fail straight away, but asks the
user whether to apply the changes to the worktree at least.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The scripted version of `git stash` called directly into the Perl script
`git-add--interactive.perl`, and this was faithfully converted to C.
However, we have a much better way to do this now: call the internal API
directly, which will now incidentally also respect the
`add.interactive.useBuiltin` setting. Let's just do this.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As `git add` traditionally did not expose the `--patch=<mode>` modes via
command-line options, the scripted version of `git stash` had to call
`git add--interactive` directly.
But this prevents the built-in `add -p` from kicking in, as
`add--interactive` is the scripted version (which does not have a
"fall-back" to the built-in version).
So let's introduce support for internal switch for `git add` that the
scripted `git stash` can use to call the appropriate backend (scripted
or built-in, depending on `add.interactive.useBuiltin`).
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `git stash` and `git reset` commands support a `--patch` option, and
both simply hand off to `git add -p` to perform that work. Let's teach
the built-in version of that command to be able to perform that work, too.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The Perl script backing `git add -p` is used not only for that command,
but also for `git stash -p`, `git reset -p` and `git checkout -p`.
In preparation for teaching the C version of `git add -p` to support
also the latter commands, let's abstract away what is "stage" specific
into a dedicated data structure describing the differences between the
patch modes.
Finally, please note that the Perl version tries to make sure that the
diffs are only generated for the modified files. This is not actually
necessary, as the calls to Git's diff machinery already perform that
work, and perform it well. This makes it unnecessary to port the
`FILTER` field of the `%patch_modes` struct, as well as the
`get_diff_reference()` function.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This was an error introduced in the conversion from shell in commit
21853626ea ("built-in rebase: call `git am` directly", 2019-01-18),
which was noticed by a random browsing of the code.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In ea9882bfc4 (commit: disable status hints when writing to
COMMIT_EDITMSG, 2013-09-12) the intent was to disable status hints
when writing to COMMIT_EDITMSG, because giving the hints in the "git
status" like output in the commit message template are too late to
be useful (they say things like "'git add' to stage", but that is
only possible after aborting the current "git commit" session).
But there is one case that the hints can be useful: When the current
attempt to commit is rejected because no change is recorded in the
index. The message is given and "git commit" errors out, so the
hints can immediately be followed by the user. Teach the codepath
to honor the configuration variable.
Signed-off-by: Heba Waly <heba.waly@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
I forgot this in my previous patch `--pathspec-from-file` for
`git commit` [1]. When both `--pathspec-from-file` and `--all` were
specified, `--all` took precedence and `--pathspec-from-file` was
ignored. Before `--pathspec-from-file` was implemented, this case was
prevented by this check in `parse_and_validate_options()` :
die(_("paths '%s ...' with -a does not make sense"), argv[0]);
It is unfortunate that these two cases are disconnected. This came as
result of how the code was laid out before my patches, where `pathspec`
is parsed outside of `parse_and_validate_options()`. This branch is
already full of refactoring patches and I did not dare to go for another
one.
Fix by mirroring `die()` for `--pathspec-from-file` as well.
[1] Commit e440fc58 ("commit: support the --pathspec-from-file option" 2019-11-19)
Reported-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Alexandr Miloslavskiy <alexandr.miloslavskiy@syntevo.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Instead of using a pointer that points at a constant string,
just give name directly to the constant string; this way, we
do not have to allocate a pointer variable in addition to
the string we want to use.
Let's convert `need_bad_and_good_revision_warning` and
`need_bisect_start_warning` char pointers to char arrays.
Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Tanushree Tumane <tanushreetumane@gmail.com>
Signed-off-by: Miriam Rubio <mirucam@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Code clean-up.
* dl/range-diff-with-notes:
range-diff: clear `other_arg` at end of function
range-diff: mark pointers as const
t3206: fix incorrect test name
"git rebase" did not work well when format.useAutoBase
configuration variable is set, which has been corrected.
* dl/rebase-with-autobase:
rebase: fix format.useAutoBase breakage
format-patch: teach --no-base
t4014: use test_config()
format-patch: fix indentation
t3400: demonstrate failure with format.useAutoBase
Reduce unnecessary reading of state variables back from the disk
during sequencer operation.
* ag/sequencer-todo-updates:
sequencer: directly call pick_commits() from complete_action()
rebase: fill `squash_onto' in get_replay_opts()
sequencer: move the code writing total_nr on the disk to a new function
sequencer: update `done_nr' when skipping commands in a todo list
sequencer: update `total_nr' when adding an item to a todo list
In the previous steps, we re-implemented the main loop of `git add -i`
in C, and most of the commands.
Notably, we left out the actual functionality of `patch`, as the
relevant code makes up more than half of `git-add--interactive.perl`,
and is actually pretty independent of the rest of the commands.
With this commit, we start to tackle that `patch` part. For better
separation of concerns, we keep the code in a separate file,
`add-patch.c`. The new code is still guarded behind the
`add.interactive.useBuiltin` config setting, and for the moment,
it can only be called via `git add -p`.
The actual functionality follows the original implementation of
5cde71d64a (git-add --interactive, 2006-12-10), but not too closely
(for example, we use string offsets rather than copying strings around,
and after seeing whether the `k` and `j` commands are applicable, in the
C version we remember which previous/next hunk was undecided, and use it
rather than looking again when the user asked to jump).
As a further deviation from that commit, We also use a comma instead of
a slash to separate the available commands in the prompt, as the current
version of the Perl script does this, and we also add a line about the
question mark ("print help") to the help text.
While it is tempting to use this conversion of `git add -p` as an excuse
to work on `apply_all_patches()` so that it does _not_ want to read a
file from `stdin` or from a file, but accepts, say, an `strbuf` instead,
we will refrain from this particular rabbit hole at this stage.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When a user uses the sparse-checkout feature in cone mode, they
add patterns using "git sparse-checkout set <dir1> <dir2> ..."
or by using "--stdin" to provide the directories line-by-line over
stdin. This behaviour naturally looks a lot like the way a user
would type "git add <dir1> <dir2> ..."
If core.ignoreCase is enabled, then "git add" will match the input
using a case-insensitive match. Do the same for the sparse-checkout
feature.
Perform case-insensitive checks while updating the skip-worktree
bits during unpack_trees(). This is done by changing the hash
algorithm and hashmap comparison methods to optionally use case-
insensitive methods.
When this is enabled, there is a small performance cost in the
hashing algorithm. To tease out the worst possible case, the
following was run on a repo with a deep directory structure:
git ls-tree -d -r --name-only HEAD |
git sparse-checkout set --stdin
The 'set' command was timed with core.ignoreCase disabled or
enabled. For the repo with a deep history, the numbers were
core.ignoreCase=false: 62s
core.ignoreCase=true: 74s (+19.3%)
For reproducibility, the equivalent test on the Linux kernel
repository had these numbers:
core.ignoreCase=false: 3.1s
core.ignoreCase=true: 3.6s (+16%)
Now, this is not an entirely fair comparison, as most users
will define their sparse cone using more shallow directories,
and the performance improvement from eb42feca97 ("unpack-trees:
hash less in cone mode" 2019-11-21) can remove most of the
hash cost. For a more realistic test, drop the "-r" from the
ls-tree command to store only the first-level directories.
In that case, the Linux kernel repository takes 0.2-0.25s in
each case, and the deep repository takes one second, plus or
minus 0.05s, in each case.
Thus, we _can_ demonstrate a cost to this change, but it is
unlikely to matter to any reasonable sparse-checkout cone.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In 8164c961e1 (format-patch: use --notes behavior for format.notes,
2019-12-09), we introduced set_display_notes() which was a monolithic
function with three mutually exclusive branches. Break the function up
into three small and simple functions that each are only responsible for
one task.
This family of functions accepts an `int *show_notes` instead of
returning a value suitable for assignment to `show_notes`. This is for
two reasons. First of all, this guarantees that the external
`show_notes` variable changes in lockstep with the
`struct display_notes_opt`. Second, this prompts future developers to be
careful about doing something meaningful with this value. In fact, a
NULL check is intentionally omitted because causing a segfault here
would tell the future developer that they are meant to use the value for
something meaningful.
One alternative was making the family of functions accept a
`struct rev_info *` instead of the `struct display_notes_opt *`, since
the former contains the `show_notes` field as well. This does not work
because we have to call git_config() before repo_init_revisions().
However, if we had a `struct rev_info`, we'd need to initialize it before
it gets assigned values from git_config(). As a result, we break the
circular dependency by having standalone `int show_notes` and
`struct display_notes_opt notes_opt` variables which temporarily hold
values from git_config() until the values are copied over to `rev`.
To implement this change, we need to get a pointer to
`rev_info::show_notes`. Unfortunately, this is not possible with
bitfields and only direct-assignment is possible. Change
`rev_info::show_notes` to a non-bitfield int so that we can get its
address.
Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
read_ref_full() wraps refs_read_ref_full(), which in turn wraps
refs_resolve_ref_unsafe(), which handles a NULL oid pointer of callers
not interested in the resolved object ID. Make use of that feature to
document that mv() is such a caller.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In 5e82c3dd22 (bisect--helper: `bisect_reset` shell function in C,
2019-01-02), the `git bisect reset` subcommand was ported to C. When the
call to `git checkout` failed, an error message was reported to the
user.
However, this error message used the `strbuf` that had just been
released already. Let's switch that around: first use it, then release
it.
Mentored-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Tanushree Tumane <tanushreetumane@gmail.com>
Signed-off-by: Miriam Rubio <mirucam@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Hide lower-level verify_signed-buffer() API as a pure helper to
implement the public check_signature() function, in order to
encourage new callers to use the correct and more strict
validation.
* hi/gpg-use-check-signature:
gpg-interface: prefer check_signature() for GPG verification
The interaction between "git clone --recurse-submodules" and
alternate object store was ill-designed. The documentation and
code have been taught to make more clear recommendations when the
users see failures.
* jt/clone-recursesub-ref-advise:
submodule--helper: advise on fatal alternate error
Doc: explain submodule.alternateErrorStrategy
A few commands learned to take the pathspec from the
standard input or a named file, instead of taking it as the command
line arguments.
* am/pathspec-from-file:
commit: support the --pathspec-from-file option
doc: commit: synchronize <pathspec> description
reset: support the `--pathspec-from-file` option
doc: reset: synchronize <pathspec> description
pathspec: add new function to parse file
parse-options.h: add new options `--pathspec-from-file`, `--pathspec-file-nul`
"git rebase -i" learned a few options that are known by "git
rebase" proper.
* ra/rebase-i-more-options:
rebase -i: finishing touches to --reset-author-date
rebase: add --reset-author-date
rebase -i: support --ignore-date
sequencer: rename amend_author to author_to_rename
rebase -i: support --committer-date-is-author-date
sequencer: allow callers of read_author_script() to ignore fields
rebase -i: add --ignore-whitespace flag
In 13cdf78094 (format-patch: teach format.notes config option,
2019-05-16), the order in which git_config() and repo_init_revisions()
were swapped so that `rev.notes_opt` would be initialized before
git_config() was called. This is problematic, however, as git_config()
should generally be called before repo_init_revisions().
Break this circular dependency by creating `show_notes` and `notes_opt`
which git_config() reads into. Then, copy these values over to
`rev.show_notes` and `rev.notes_opt` after repo_init_revisions() is
called.
Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When we had multiple `format.notes` config values where we had `<ref1>`,
`false`, `<ref2>` (in that order), then we would print out the notes for
both `<ref1>` and `<ref2>`. This doesn't make sense, however, since we
parse the config in a top-down manner and a `false` should be able to
override previous configurations, just like how `--no-notes` will
override previous `--notes`.
Duplicate the logic that handles the `--[no-]notes[=]` option to
`format.notes` for consistency. As a result, when parsing the config
from top to bottom, `format.notes = true` will behave like `--notes`,
`format.notes = <ref>` will behave like `--notes=<ref>` and
`format.notes = false` will behave like `--no-notes`.
This change isn't strictly backwards compatible but since it is an edge
case where a sane user would not mix notes refs with `false` and this
feature is relatively new (released only in v2.23.0), this change should
be harmless.
Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
According to the function comment, init_display_notes() was supposed to
"Load the notes machinery for displaying several notes trees." Rename
this function to load_display_notes() so that its use is more accurately
represented.
This is done because, in a future commit, we will reuse the name
init_display_notes().
Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Earlier patches in this series moved a couple of conditions from the
recursive name_rev() function into its caller name_ref(), for no other
reason than to make eliminating the recursion a bit easier to follow.
Since the previous patch name_rev() is not recursive anymore, so let's
move all those conditions back into name_rev().
Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The name_rev() function calls itself recursively for each interesting
parent of the commit it got as parameter, and, consequently, it can
segfault when processing a deep history if it exhausts the available
stack space. E.g. running 'git name-rev --all' and 'git name-rev
HEAD~100000' in the gcc, gecko-dev, llvm, and WebKit repositories
results in segfaults on my machine ('ulimit -s' reports 8192kB of
stack size limit), and nowadays the former segfaults in the Linux repo
as well (it reached the necessasry depth sometime between v5.3-rc4 and
-rc5).
Eliminate the recursion by inserting the interesting parents into a
LIFO 'prio_queue' [1] and iterating until the queue becomes empty.
Note that the parent commits must be added in reverse order to the
LIFO 'prio_queue', so their relative order is preserved during
processing, i.e. the first parent should come out first from the
queue, because otherwise performance greatly suffers on mergy
histories [2].
The stacksize-limited test 'name-rev works in a deep repo' in
't6120-describe.sh' demonstrated this issue and expected failure. Now
the recursion is gone, so flip it to expect success. Also gone are
the dmesg entries logging the segfault of that segfaulting 'git
name-rev' process on every execution of the test suite.
Note that this slightly changes the order of lines in the output of
'git name-rev --all', usually swapping two lines every 35 lines in
git.git or every 150 lines in linux.git. This shouldn't matter in
practice, because the output has always been unordered anyway.
This patch is best viewed with '--ignore-all-space'.
[1] Early versions of this patch used a 'commit_list', resulting in
~15% performance penalty for 'git name-rev --all' in 'linux.git',
presumably because of the memory allocation and release for each
insertion and removal. Using a LIFO 'prio_queue' has basically no
effect on performance.
[2] We prefer shorter names, i.e. 'v0.1~234' is preferred over
'v0.1^2~5', meaning that usually following the first parent of a
merge results in the best name for its ancestors. So when later
we follow the remaining parent(s) of a merge, and reach an already
named commit, then we usually find that we can't give that commit
a better name, and thus we don't have to visit any of its
ancestors again.
OTOH, if we were to follow the Nth parent of the merge first, then
the name of all its ancestors would include a corresponding '^N'.
Those are not the best names for those commits, so when later we
reach an already named commit following the first parent of that
merge, then we would have to update the name of that commit and
the names of all of its ancestors as well. Consequently, we would
have to visit many commits several times, resulting in a
significant slowdown.
Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Following the previous patches in this series we can get the value of
'name_rev()'s 'tip_name' parameter from the 'struct rev_name'
associated with the commit as well.
So let's use 'name->tip_name' instead, which makes the patch
eliminating the recursion of name_rev() a bit easier to follow.
Note that at this point we could drop the 'tip_name' parameter as
well, but that parameter will be necessary later, after the recursion
is eliminated.
Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
flush_current_id() prints the hexadecimal representation of two object
IDs. When the code was added in f97672225b (Add "git-patch-id" program
to generate patch ID's., 2005-06-23), sha1_to_hex() had only a single
internal static buffer, so the result of one invocation had to be stored
in a local buffer.
Since dcb3450fd8 (sha1_to_hex() usage cleanup, 2006-05-03) it rotates
through four buffers, which allows to print up to four object IDs at the
same time. 1a876a69af (patch-id: convert to use struct object_id,
2015-03-13) replaced sha1_to_hex() with oid_to_hex(), which has the same
feature. Use it to simplify the code.
Signed-off-by: René Scharfe <l.s.r@web.de>
Acked-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Code cleanup.
* rs/use-skip-prefix-more:
name-rev: use skip_prefix() instead of starts_with()
push: use skip_prefix() instead of starts_with()
shell: use skip_prefix() instead of starts_with()
fmt-merge-msg: use skip_prefix() instead of starts_with()
fetch: use skip_prefix() instead of starts_with()
Following the previous patches in this series we can get the values of
name_rev()'s 'generation' and 'distance' parameters from the 'stuct
rev_name' associated with the commit as well.
Let's simplify the function's signature and remove these two
unnecessary parameters.
Note that at this point we could do the same with the 'tip_name',
'taggerdate' and 'from_tag' parameters as well, but those parameters
will be necessary later, after the recursion is eliminated.
Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
At the beginning of the recursive name_rev() function it creates a new
'struct rev_name' instance for each previously unvisited commit or, if
this visit results in better name for an already visited commit, then
updates the 'struct rev_name' instance attached to the commit, or
returns early.
Restructure this so it's caller creates or updates the 'struct
rev_name' instance associated with the commit to be passed as
parameter, i.e. both name_ref() before calling name_rev() and
name_rev() itself as it iterates over the parent commits.
This makes eliminating the recursion a bit easier to follow, and the
condition moved to name_ref() will be moved back to name_rev() after
the recursion is eliminated.
This change also plugs the memory leak that was temporarily unplugged
in the earlier "name-rev: pull out deref handling from the recursion"
patch in this series.
Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
At the beginning of the recursive name_rev() function it parses the
commit it got as parameter, and returns early if the commit is older
than a cutoff limit.
Restructure this so the caller parses the commit and checks its date,
and doesn't invoke name_rev() if the commit to be passed as parameter
is older than the cutoff, i.e. both name_ref() before calling
name_rev() and name_rev() itself as it iterates over the parent
commits.
This makes eliminating the recursion a bit easier to follow, and the
condition moved to name_ref() will be moved back to name_rev() after
the recursion is eliminated.
Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The 'if (deref) { ... }' condition near the beginning of the recursive
name_rev() function can only ever be true in the first invocation,
because the 'deref' parameter is always 0 in the subsequent recursive
invocations.
Extract this condition from the recursion into name_rev()'s caller and
drop the function's 'deref' parameter. This makes eliminating the
recursion a bit easier to follow, and it will be moved back into
name_rev() after the recursion is eliminated.
Furthermore, drop the condition that die()s when both 'deref' and
'generation' are non-null (which should have been a BUG() to begin
with).
Note that this change reintroduces the memory leak that was plugged in
in commit 5308224633 (name-rev: avoid leaking memory in the `deref`
case, 2017-05-04), but a later patch (name-rev: restructure
creating/updating 'struct rev_name' instances) in this series will
plug it in again.
Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In a later patch in this series we'll want to do this in two places.
Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Casting a 'struct object' to 'struct commit' is unnecessary there,
because it's already available in the local 'commit' variable.
Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
get_name_rev() basically open-codes strip_suffix() before adding a
string to a strbuf.
Let's use the strbuf right from the beginning, i.e. add the whole
string to the strbuf and then use strbuf_strip_suffix(), making the
code more idiomatic.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We were leaking memory by not clearing `other_arg` after we were done
using it. Clear it after we've finished using it.
Note that this isn't strictly necessary since the memory will be
reclaimed once the command exits. However, since we are releasing the
strbufs, we should also clear `other_arg` for consistency.
Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In dcb500dc16 (cherry-pick/revert: advise using --skip, 2019-07-02),
`git commit` learned to suggest to run `git cherry-pick --skip` when
trying to cherry-pick an empty patch.
However, it was overlooked that there are more conditions than just a
`git cherry-pick` when this advice is printed (which originally
suggested the neutral `git reset`): the same can happen during a rebase.
Let's suggest the correct command, even during a rebase.
While at it, we adjust more places in `builtin/commit.c` that
incorrectly assumed that the presence of a `CHERRY_PICK_HEAD` meant that
surely this must be a `cherry-pick` in progress.
Note: we take pains to handle the situation when a user runs a `git
cherry-pick` _during_ a rebase. This is quite valid (e.g. in an `exec`
line in an interactive rebase). On the other hand, it is not possible to
run a rebase during a cherry-pick, meaning: if both `rebase-merge/` and
`sequencer/` exist or CHERRY_PICK_HEAD and REBASE_HEAD point to the same
commit , we still want to advise to use `git cherry-pick --skip`.
Original-patch-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Working out which command wants to create a commit requires detailed
knowledge of the sequencer internals and that knowledge is going to
increase in subsequent commits. With that in mind lets encapsulate that
knowledge in sequencer.c rather than spreading it into builtin/commit.c.
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add FROM_CHERRY_PICK_MULTI for a sequence of cherry-picks rather than
using a separate variable.
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* maint-2.23: (44 commits)
Git 2.23.1
Git 2.22.2
Git 2.21.1
mingw: sh arguments need quoting in more circumstances
mingw: fix quoting of empty arguments for `sh`
mingw: use MSYS2 quoting even when spawning shell scripts
mingw: detect when MSYS2's sh is to be spawned more robustly
t7415: drop v2.20.x-specific work-around
Git 2.20.2
t7415: adjust test for dubiously-nested submodule gitdirs for v2.20.x
Git 2.19.3
Git 2.18.2
Git 2.17.3
Git 2.16.6
test-drop-caches: use `has_dos_drive_prefix()`
Git 2.15.4
Git 2.14.6
mingw: handle `subst`-ed "DOS drives"
mingw: refuse to access paths with trailing spaces or periods
mingw: refuse to access paths with illegal characters
...
* maint-2.22: (43 commits)
Git 2.22.2
Git 2.21.1
mingw: sh arguments need quoting in more circumstances
mingw: fix quoting of empty arguments for `sh`
mingw: use MSYS2 quoting even when spawning shell scripts
mingw: detect when MSYS2's sh is to be spawned more robustly
t7415: drop v2.20.x-specific work-around
Git 2.20.2
t7415: adjust test for dubiously-nested submodule gitdirs for v2.20.x
Git 2.19.3
Git 2.18.2
Git 2.17.3
Git 2.16.6
test-drop-caches: use `has_dos_drive_prefix()`
Git 2.15.4
Git 2.14.6
mingw: handle `subst`-ed "DOS drives"
mingw: refuse to access paths with trailing spaces or periods
mingw: refuse to access paths with illegal characters
unpack-trees: let merged_entry() pass through do_add_entry()'s errors
...
* maint-2.21: (42 commits)
Git 2.21.1
mingw: sh arguments need quoting in more circumstances
mingw: fix quoting of empty arguments for `sh`
mingw: use MSYS2 quoting even when spawning shell scripts
mingw: detect when MSYS2's sh is to be spawned more robustly
t7415: drop v2.20.x-specific work-around
Git 2.20.2
t7415: adjust test for dubiously-nested submodule gitdirs for v2.20.x
Git 2.19.3
Git 2.18.2
Git 2.17.3
Git 2.16.6
test-drop-caches: use `has_dos_drive_prefix()`
Git 2.15.4
Git 2.14.6
mingw: handle `subst`-ed "DOS drives"
mingw: refuse to access paths with trailing spaces or periods
mingw: refuse to access paths with illegal characters
unpack-trees: let merged_entry() pass through do_add_entry()'s errors
quote-stress-test: offer to test quoting arguments for MSYS2 sh
...
* maint-2.20: (36 commits)
Git 2.20.2
t7415: adjust test for dubiously-nested submodule gitdirs for v2.20.x
Git 2.19.3
Git 2.18.2
Git 2.17.3
Git 2.16.6
test-drop-caches: use `has_dos_drive_prefix()`
Git 2.15.4
Git 2.14.6
mingw: handle `subst`-ed "DOS drives"
mingw: refuse to access paths with trailing spaces or periods
mingw: refuse to access paths with illegal characters
unpack-trees: let merged_entry() pass through do_add_entry()'s errors
quote-stress-test: offer to test quoting arguments for MSYS2 sh
t6130/t9350: prepare for stringent Win32 path validation
quote-stress-test: allow skipping some trials
quote-stress-test: accept arguments to test via the command-line
tests: add a helper to stress test argument quoting
mingw: fix quoting of arguments
Disallow dubiously-nested submodule git directories
...
In v2.15.4, we started to reject `submodule.update` settings in
`.gitmodules`. Let's raise a BUG if it somehow still made it through
from anywhere but the Git config.
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
* maint-2.19: (34 commits)
Git 2.19.3
Git 2.18.2
Git 2.17.3
Git 2.16.6
test-drop-caches: use `has_dos_drive_prefix()`
Git 2.15.4
Git 2.14.6
mingw: handle `subst`-ed "DOS drives"
mingw: refuse to access paths with trailing spaces or periods
mingw: refuse to access paths with illegal characters
unpack-trees: let merged_entry() pass through do_add_entry()'s errors
quote-stress-test: offer to test quoting arguments for MSYS2 sh
t6130/t9350: prepare for stringent Win32 path validation
quote-stress-test: allow skipping some trials
quote-stress-test: accept arguments to test via the command-line
tests: add a helper to stress test argument quoting
mingw: fix quoting of arguments
Disallow dubiously-nested submodule git directories
protect_ntfs: turn on NTFS protection by default
path: also guard `.gitmodules` against NTFS Alternate Data Streams
...
* maint-2.18: (33 commits)
Git 2.18.2
Git 2.17.3
Git 2.16.6
test-drop-caches: use `has_dos_drive_prefix()`
Git 2.15.4
Git 2.14.6
mingw: handle `subst`-ed "DOS drives"
mingw: refuse to access paths with trailing spaces or periods
mingw: refuse to access paths with illegal characters
unpack-trees: let merged_entry() pass through do_add_entry()'s errors
quote-stress-test: offer to test quoting arguments for MSYS2 sh
t6130/t9350: prepare for stringent Win32 path validation
quote-stress-test: allow skipping some trials
quote-stress-test: accept arguments to test via the command-line
tests: add a helper to stress test argument quoting
mingw: fix quoting of arguments
Disallow dubiously-nested submodule git directories
protect_ntfs: turn on NTFS protection by default
path: also guard `.gitmodules` against NTFS Alternate Data Streams
is_ntfs_dotgit(): speed it up
...
* maint-2.17: (32 commits)
Git 2.17.3
Git 2.16.6
test-drop-caches: use `has_dos_drive_prefix()`
Git 2.15.4
Git 2.14.6
mingw: handle `subst`-ed "DOS drives"
mingw: refuse to access paths with trailing spaces or periods
mingw: refuse to access paths with illegal characters
unpack-trees: let merged_entry() pass through do_add_entry()'s errors
quote-stress-test: offer to test quoting arguments for MSYS2 sh
t6130/t9350: prepare for stringent Win32 path validation
quote-stress-test: allow skipping some trials
quote-stress-test: accept arguments to test via the command-line
tests: add a helper to stress test argument quoting
mingw: fix quoting of arguments
Disallow dubiously-nested submodule git directories
protect_ntfs: turn on NTFS protection by default
path: also guard `.gitmodules` against NTFS Alternate Data Streams
is_ntfs_dotgit(): speed it up
mingw: disallow backslash characters in tree objects' file names
...
* maint-2.16: (31 commits)
Git 2.16.6
test-drop-caches: use `has_dos_drive_prefix()`
Git 2.15.4
Git 2.14.6
mingw: handle `subst`-ed "DOS drives"
mingw: refuse to access paths with trailing spaces or periods
mingw: refuse to access paths with illegal characters
unpack-trees: let merged_entry() pass through do_add_entry()'s errors
quote-stress-test: offer to test quoting arguments for MSYS2 sh
t6130/t9350: prepare for stringent Win32 path validation
quote-stress-test: allow skipping some trials
quote-stress-test: accept arguments to test via the command-line
tests: add a helper to stress test argument quoting
mingw: fix quoting of arguments
Disallow dubiously-nested submodule git directories
protect_ntfs: turn on NTFS protection by default
path: also guard `.gitmodules` against NTFS Alternate Data Streams
is_ntfs_dotgit(): speed it up
mingw: disallow backslash characters in tree objects' file names
path: safeguard `.git` against NTFS Alternate Streams Accesses
...
* maint-2.15: (29 commits)
Git 2.15.4
Git 2.14.6
mingw: handle `subst`-ed "DOS drives"
mingw: refuse to access paths with trailing spaces or periods
mingw: refuse to access paths with illegal characters
unpack-trees: let merged_entry() pass through do_add_entry()'s errors
quote-stress-test: offer to test quoting arguments for MSYS2 sh
t6130/t9350: prepare for stringent Win32 path validation
quote-stress-test: allow skipping some trials
quote-stress-test: accept arguments to test via the command-line
tests: add a helper to stress test argument quoting
mingw: fix quoting of arguments
Disallow dubiously-nested submodule git directories
protect_ntfs: turn on NTFS protection by default
path: also guard `.gitmodules` against NTFS Alternate Data Streams
is_ntfs_dotgit(): speed it up
mingw: disallow backslash characters in tree objects' file names
path: safeguard `.git` against NTFS Alternate Streams Accesses
clone --recurse-submodules: prevent name squatting on Windows
is_ntfs_dotgit(): only verify the leading segment
...
* maint-2.14: (28 commits)
Git 2.14.6
mingw: handle `subst`-ed "DOS drives"
mingw: refuse to access paths with trailing spaces or periods
mingw: refuse to access paths with illegal characters
unpack-trees: let merged_entry() pass through do_add_entry()'s errors
quote-stress-test: offer to test quoting arguments for MSYS2 sh
t6130/t9350: prepare for stringent Win32 path validation
quote-stress-test: allow skipping some trials
quote-stress-test: accept arguments to test via the command-line
tests: add a helper to stress test argument quoting
mingw: fix quoting of arguments
Disallow dubiously-nested submodule git directories
protect_ntfs: turn on NTFS protection by default
path: also guard `.gitmodules` against NTFS Alternate Data Streams
is_ntfs_dotgit(): speed it up
mingw: disallow backslash characters in tree objects' file names
path: safeguard `.git` against NTFS Alternate Streams Accesses
clone --recurse-submodules: prevent name squatting on Windows
is_ntfs_dotgit(): only verify the leading segment
test-path-utils: offer to run a protectNTFS/protectHFS benchmark
...
"git submodule status" that is run from a subdirectory of the
superproject did not work well, which has been corrected.
* mg/submodule-status-from-a-subdirectory:
submodule: fix 'submodule status' when called from a subdirectory
"git reset --patch $object" without any pathspec should allow a
tree object to be given, but incorrectly required a committish,
which has been corrected.
* nl/reset-patch-takes-a-tree:
reset: parse rev as tree-ish in patch mode
"git rev-parse --show-toplevel" run outside of any working tree did
not error out, which has been corrected.
* jk/fail-show-toplevel-outside-working-tree:
rev-parse: make --show-toplevel without a worktree an error
"git unpack-objects" used to show progress based only on the number
of received and unpacked objects, which stalled when it has to
handle an unusually large object. It now shows the throughput as
well.
* sg/unpack-progress-throughput:
builtin/unpack-objects.c: show throughput progress
"git range-diff" learned to take the "--notes=<ref>" and the
"--no-notes" options to control the commit notes included in the
log message that gets compared.
* dl/range-diff-with-notes:
format-patch: pass notes configuration to range-diff
range-diff: pass through --notes to `git log`
range-diff: output `## Notes ##` header
t3206: range-diff compares logs with commit notes
t3206: s/expected/expect/
t3206: disable parameter substitution in heredoc
t3206: remove spaces after redirect operators
pretty-options.txt: --notes accepts a ref instead of treeish
rev-list-options.txt: remove reference to --show-notes
argv-array: add space after `while`
The beginning of rewriting "git add -i" in C.
* js/builtin-add-i:
built-in add -i: implement the `help` command
built-in add -i: use color in the main loop
built-in add -i: support `?` (prompt help)
built-in add -i: show unique prefixes of the commands
built-in add -i: implement the main loop
built-in add -i: color the header in the `status` command
built-in add -i: implement the `status` command
diff: export diffstat interface
Start to implement a built-in version of `git add --interactive`
Currently it is technically possible to let a submodule's git
directory point right into the git dir of a sibling submodule.
Example: the git directories of two submodules with the names `hippo`
and `hippo/hooks` would be `.git/modules/hippo/` and
`.git/modules/hippo/hooks/`, respectively, but the latter is already
intended to house the former's hooks.
In most cases, this is just confusing, but there is also a (quite
contrived) attack vector where Git can be fooled into mistaking remote
content for file contents it wrote itself during a recursive clone.
Let's plug this bug.
To do so, we introduce the new function `validate_submodule_git_dir()`
which simply verifies that no git dir exists for any leading directories
of the submodule name (if there are any).
Note: this patch specifically continues to allow sibling modules names
of the form `core/lib`, `core/doc`, etc, as long as `core` is not a
submodule name.
This fixes CVE-2019-1387.
Reported-by: Nicolas Joly <Nicolas.Joly@microsoft.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
With `format.useAutoBase = true`, running rebase resulted in an
error:
fatal: failed to get upstream, if you want to record base commit automatically,
please use git branch --set-upstream-to to track a remote branch.
Or you could specify base commit by --base=<base-commit-id> manually
error:
git encountered an error while preparing the patches to replay
these revisions:
ede2467cdedc63784887b587a61c36b7850ebfac..d8f581194799ae29bf5fa72a98cbae98a1198b12
As a result, git cannot rebase them.
Fix this by always passing `--no-base` to format-patch from rebase so
that the effect of `format.useAutoBase` is negated.
Reported-by: Christian Biesinger <cbiesinger@google.com>
Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If `format.useAutoBase = true`, there was no way to override this from
the command-line. Teach the `--no-base` option in format-patch to
override `format.useAutoBase`.
Helped-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Decisions taken for simplicity:
1) For now, `--pathspec-from-file` is declared incompatible with
`--patch`, even when <file> is not `stdin`. Such use case it not
really expected.
2) It is not allowed to pass pathspec in both args and file.
`you must specify path(s) to restore` block was moved down to be able to
test for `pathspec.nr` instead, because testing for `argc` is no longer
correct.
`git switch` does not support the new options because it doesn't expect
`<pathspec>` arguments.
Signed-off-by: Alexandr Miloslavskiy <alexandr.miloslavskiy@syntevo.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Decisions taken for simplicity:
1) For now, `--pathspec-from-file` is declared incompatible with
`--interactive/--patch/--edit`, even when <file> is not `stdin`.
Such use case it not really expected. Also, it would require changes
to `interactive_add()` and `edit_patch()`.
2) It is not allowed to pass pathspec in both args and file.
Signed-off-by: Alexandr Miloslavskiy <alexandr.miloslavskiy@syntevo.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Some code blocks were moved down to be able to test for `pathspec.nr`
in the next patch. Blocks are moved as is without any changes. This
is done as separate patch to reduce the amount of diffs in next patch.
Signed-off-by: Alexandr Miloslavskiy <alexandr.miloslavskiy@syntevo.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In addition to preventing `.git` from being tracked by Git, on Windows
we also have to prevent `git~1` from being tracked, as the default NTFS
short name (also known as the "8.3 filename") for the file name `.git`
is `git~1`, otherwise it would be possible for malicious repositories to
write directly into the `.git/` directory, e.g. a `post-checkout` hook
that would then be executed _during_ a recursive clone.
When we implemented appropriate protections in 2b4c6efc82 (read-cache:
optionally disallow NTFS .git variants, 2014-12-16), we had analyzed
carefully that the `.git` directory or file would be guaranteed to be
the first directory entry to be written. Otherwise it would be possible
e.g. for a file named `..git` to be assigned the short name `git~1` and
subsequently, the short name generated for `.git` would be `git~2`. Or
`git~3`. Or even `~9999999` (for a detailed explanation of the lengths
we have to go to protect `.gitmodules`, see the commit message of
e7cb0b4455 (is_ntfs_dotgit: match other .git files, 2018-05-11)).
However, by exploiting two issues (that will be addressed in a related
patch series close by), it is currently possible to clone a submodule
into a non-empty directory:
- On Windows, file names cannot end in a space or a period (for
historical reasons: the period separating the base name from the file
extension was not actually written to disk, and the base name/file
extension was space-padded to the full 8/3 characters, respectively).
Helpfully, when creating a directory under the name, say, `sub.`, that
trailing period is trimmed automatically and the actual name on disk
is `sub`.
This means that while Git thinks that the submodule names `sub` and
`sub.` are different, they both access `.git/modules/sub/`.
- While the backslash character is a valid file name character on Linux,
it is not so on Windows. As Git tries to be cross-platform, it
therefore allows backslash characters in the file names stored in tree
objects.
Which means that it is totally possible that a submodule `c` sits next
to a file `c\..git`, and on Windows, during recursive clone a file
called `..git` will be written into `c/`, of course _before_ the
submodule is cloned.
Note that the actual exploit is not quite as simple as having a
submodule `c` next to a file `c\..git`, as we have to make sure that the
directory `.git/modules/b` already exists when the submodule is checked
out, otherwise a different code path is taken in `module_clone()` that
does _not_ allow a non-empty submodule directory to exist already.
Even if we will address both issues nearby (the next commit will
disallow backslash characters in tree entries' file names on Windows,
and another patch will disallow creating directories/files with trailing
spaces or periods), it is a wise idea to defend in depth against this
sort of attack vector: when submodules are cloned recursively, we now
_require_ the directory to be empty, addressing CVE-2019-1349.
Note: the code path we patch is shared with the code path of `git
submodule update --init`, which must not expect, in general, that the
directory is empty. Hence we have to introduce the new option
`--force-init` and hand it all the way down from `git submodule` to the
actual `git submodule--helper` process that performs the initial clone.
Reported-by: Nicolas Joly <Nicolas.Joly@microsoft.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
When recursively cloning a superproject with some shallow modules
defined in its .gitmodules, then recloning with "--reference=<path>", an
error occurs. For example:
git clone --recurse-submodules --branch=master -j8 \
https://android.googlesource.com/platform/superproject \
master
git clone --recurse-submodules --branch=master -j8 \
https://android.googlesource.com/platform/superproject \
--reference master master2
fails with:
fatal: submodule '<snip>' cannot add alternate: reference repository
'<snip>' is shallow
When a alternate computed from the superproject's alternate cannot be
added, whether in this case or another, advise about configuring the
"submodule.alternateErrorStrategy" configuration option and using
"--reference-if-able" instead of "--reference" when cloning.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Acked-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git fetch" codepath had a big "do not lazily fetch missing objects
when I ask if something exists" switch. This has been corrected by
marking the "does this thing exist?" calls with "if not please do not
lazily fetch it" flag.
* jt/fetch-remove-lazy-fetch-plugging:
promisor-remote: remove fetch_if_missing=0
clone: remove fetch_if_missing=0
fetch: remove fetch_if_missing=0
Recent update to "git stash pop" made the command empty the index
when run with the "--quiet" option, which has been corrected.
* tg/stash-refresh-index:
stash: make sure we have a valid index before writing it
"git bundle" has been taught to use the parse options API. "git
bundle verify" learned "--quiet" and "git bundle create" learned
options to control the progress output.
* rj/bundle-ui-updates:
bundle-verify: add --quiet
bundle-create: progress output control
bundle: framework for options before bundle file
Docfix.
* en/doc-typofix:
Fix spelling errors in no-longer-updated-from-upstream modules
multimail: fix a few simple spelling errors
sha1dc: fix trivial comment spelling error
Fix spelling errors in test commands
Fix spelling errors in messages shown to users
Fix spelling errors in names of tests
Fix spelling errors in comments of testcases
Fix spelling errors in code comments
Fix spelling errors in documentation outside of Documentation/
Documentation: fix a bunch of typos, both old and new
Fetching from multiple remotes into the same repository in parallel
had a bad interaction with the recent change to (optionally) update
the commit-graph after a fetch job finishes, as these parallel
fetches compete with each other. Which has been corrected.
* js/fetch-multi-lockfix:
fetch: avoid locking issues between fetch.jobs/fetch.writeCommitGraph
fetch: add the command-line option `--write-commit-graph`
"git worktree add" internally calls "reset --hard" that should not
descend into submodules, even when submodule.recurse configuration
is set, but it was affected. This has been corrected.
* pb/no-recursive-reset-hard-in-worktree-add:
worktree: teach "add" to ignore submodule.recurse config