Streamline --rebase-merges command line option handling and
introduce rebase.rebaseMerges configuration variable.
* ah/rebase-merges-config:
rebase: add a config option for --rebase-merges
rebase: deprecate --rebase-merges=""
rebase: add documentation and test for --no-rebase-merges
Code clean-up.
* jk/fast-export-cleanup:
fast-export: drop unused parameter from anonymize_commit_message()
fast-export: drop data parameter from anonymous generators
fast-export: de-obfuscate --anonymize-map handling
fast-export: factor out anonymized_entry creation
fast-export: simplify initialization of anonymized hashmaps
fast-export: drop const when storing anonymized values
The index files could become corrupt under certain conditions when
the split-index feature was in use, especially together with
fsmonitor; this has been corrected.
* js/split-index-fixes:
unpack-trees: take care to propagate the split-index flag
fsmonitor: avoid overriding `cache_changed` bits
split-index; stop abusing the `base_oid` to strip the "link" extension
split-index & fsmonitor: demonstrate a bug
The wildmatch library code unlearns exponential behaviour it
acquired some time ago since it was borrowed from rsync.
* pw/wildmatch-fixes:
t3070: make chain lint tester happy
wildmatch: hide internal return values
wildmatch: avoid undefined behavior
wildmatch: fix exponential behavior
When I ran this test using `TEST_SHELL_PATH=/bin/bash` in my Ubuntu
setup (where Bash is at version 5.0.17(1)-release), I was greeted with
this error message:
./test-lib.sh: line 1072: $CHALLENGE: ambiguous redirect
This commit fixes that error by quoting the `CHALLENGE` variable (whose
value is a path containing spaces), and by no longer cuddling the empty
string parameter in the `printf` call with the redirect character. In
fact, the `printf ''>$CHALLENGE` is removed entirely: the next line
overwrites the file anyway, since it _also_ uses a single `>` to
redirect its output.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A recent-ish change to allow unicode character classes to be used
with "grep -P" triggered a JIT bug in older pcre2 libraries.
The problematic change in Git built with these older libraries has
been disabled to work around the bug.
* mk/workaround-pcre-jit-ucp-bug:
grep: work around UTF-8 related JIT bug in PCRE2 <= 10.34
Code clean-up to use designated initializers in parse-options API.
* sg/parse-options-h-initializers:
parse-options.h: use designated initializers in OPT_* macros
parse-options.h: rename _OPT_CONTAINS_OR_WITH()'s parameters
parse-options.h: use consistent name for the callback parameters
Code clean-up to include and/or uninclude parse-options.h file as
needed.
* sg/parse-options-h-users:
treewide: remove unnecessary inclusions of parse-options.h from headers
treewide: include parse-options.h in source files
This argument was added in 7cae7627c4 (builtin/grep.c: integrate with
sparse index, 2022-09-22), but it was a carry-over from an earlier
version where the --sparse flag was added to the 'git grep' builtin.
This argument does not exist, so currently the
p2000-sparse-operations.sh performance test script fails when reaching
this step.
With this fix, the script works with these numbers for my copy of the
Git source code repository:
Test                                         HEAD
------------------------------------------------------------
2000.30: git grep --cached ... (full-v3)     0.34(1.20+0.14)
2000.31: git grep --cached ... (full-v4)     0.31(1.15+0.13)
2000.32: git grep --cached ... (sparse-v3)   0.26(1.13+0.12)
2000.33: git grep --cached ... (sparse-v4)   0.27(1.13+0.12)
Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Remove workaround for ancient versions of DocBook to make it work
correctly with groff, which has not been necessary since docbook
1.76 from 2010.
* fc/docbook-remove-groff-workaround:
doc: remove GNU troff workaround
time(2) on glibc 2.31+, especially on Linux, goes out of sync with
higher resolution timers used for gettimeofday(2) and by the
filesystem. Replace all calls to it with a git_time() wrapper and
use gettimeofday(2) in its implementation.
* pe/time-use-gettimeofday:
git-compat-util: use gettimeofday(2) for time(2)
Transports that do not support protocol v2 did not correctly fall
back to protocol v0 under certain conditions, which has been
corrected.
* jk/fix-proto-downgrade-to-v0:
git_connect(): fix corner cases in downgrading v2 to v0
"git rev-parse --quiet foo@{u}", or anything that asks @{u} to be
parsed with GET_OID_QUIETLY option, did not quietly fail, which has
been corrected.
* fc/oid-quietly-parse-upstream:
object-name: fix quiet @{u} parsing
Lift the limitation that colored prompts can only be used with
PROMPT_COMMAND mode.
* fc/completion-colors-do-not-need-prompt-command:
completion: prompt: use generic colors
1f2e05f0b7 ("wildmatch: fix exponential behavior", 2023-03-20)
introduced a new test with a background process. Backgrounding
necessarily gives a result of 0, so that a seemingly broken && chain is
not really broken.
Adjust t3070 slightly so that our chain lint test recognizes the
construct for what it is and does not raise a false positive.
Signed-off-by: Michael J Gruber <git@grubix.eu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commit 50b54fd72a (config: be strict on core.commentChar, 2014-05-17)
notes that “multi-byte character encoding could also be misinterpreted”,
and indeed a multi-byte codepoint (non-ASCII) is not accepted as a valid
`core.commentChar`.
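To see why a non-ASCII value fails a single-byte check, the sketch
below counts UTF-8 bytes. It is a Python model rather than Git's C code,
and the helper name `fits_in_one_byte` is hypothetical.

```python
def fits_in_one_byte(value: str) -> bool:
    """Return True if value encodes to exactly one byte in UTF-8."""
    return len(value.encode("utf-8")) == 1


# "#" is ASCII and occupies a single byte.
print(fits_in_one_byte("#"))       # True
# U+2022 (bullet) is one codepoint but three bytes in UTF-8.
print(fits_in_one_byte("\u2022"))  # False
```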
Signed-off-by: Kristoffer Haugsbakk <code@khaugsbakk.name>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This should have been removed in `ab/retire-scripted-add-p` but wasn't.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When copying the `split_index` structure from one index structure to
another, we need to propagate the `SPLIT_INDEX_ORDERED` flag, too, if it
is set, otherwise Git might forget to write the shared index when that
is actually needed.
It just so _happens_ that in many instances when `unpack_trees()` is
called, the result causes the shared index to be written anyway, but
there are edge cases when that is not so.
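A minimal sketch of the propagation, written in Python rather than C.
The `Index` class, the `copy_split_index` helper, and the bit position
chosen for `SPLIT_INDEX_ORDERED` are assumptions made for illustration
and are not taken from Git's sources.

```python
from dataclasses import dataclass
from typing import Optional

SPLIT_INDEX_ORDERED = 1 << 6  # illustrative bit position (assumption)


@dataclass
class Index:
    split_index: Optional[object] = None
    cache_changed: int = 0


def copy_split_index(src: Index, dst: Index) -> None:
    """Share src's split_index with dst and carry the ordered flag along."""
    dst.split_index = src.split_index
    # Propagate only this one bit; dst's other bits stay untouched.
    dst.cache_changed |= src.cache_changed & SPLIT_INDEX_ORDERED
```

Without the last line, `dst` would reuse the `split_index` but lose the
hint that the shared index still needs to be written.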
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As of e636a7b4d0 (read-cache: be specific what part of the index has
changed, 2014-06-13), the paradigm `cache_changed = 1` fell out of
fashion and it became a bit field instead.
This is important because some bits have specific meaning and should not
be unset without care, e.g. `SPLIT_INDEX_ORDERED`.
However, b5a8169752 (mark_fsmonitor_valid(): mark the index as changed
if needed, 2019-05-24) did use the `cache_changed` attribute as if it
were a Boolean instead of a bit field.
That not only would override the `SPLIT_INDEX_ORDERED` bit when marking
index entries as valid via the FSMonitor, but worse: it would set the
`SOMETHING_CHANGED` bit (whose value is 1). This means that Git would
unnecessarily force a full index to be written out when a split index
was asked for.
Let's instead use the bit that is specifically intended to indicate
FSMonitor-triggered changes, allowing the split-index feature to work as
designed.
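The difference between the two treatments of `cache_changed` can be
sketched as follows. This is a Python model, not the C code. Only the
value 1 for the unclassified-change bit comes from the text above; the
other bit positions, the name `FSMONITOR_CHANGED` for the dedicated bit,
and the function names are assumptions for illustration.

```python
SOMETHING_CHANGED = 1 << 0    # the bit whose value is 1
SPLIT_INDEX_ORDERED = 1 << 6  # illustrative position (assumption)
FSMONITOR_CHANGED = 1 << 8    # illustrative position (assumption)


def mark_valid_as_boolean(cache_changed: int) -> int:
    """Old behavior: plain assignment of 1 discards every other bit."""
    return 1


def mark_valid_as_bit(cache_changed: int) -> int:
    """Fixed behavior: set only the FSMonitor-specific bit."""
    return cache_changed | FSMONITOR_CHANGED


before = SPLIT_INDEX_ORDERED
print(mark_valid_as_boolean(before) & SPLIT_INDEX_ORDERED)  # 0: flag lost
print(mark_valid_as_bit(before) & SPLIT_INDEX_ORDERED)      # 64: flag kept
```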
Noticed-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When a split-index is in effect, the `$GIT_DIR/index` file needs to
contain a "link" extension that contains all the information about the
split-index, including the information about the shared index.
However, in some cases Git needs to suppress writing that "link"
extension (i.e. to fall back to writing a full index) even if the
in-memory index structure _has_ a `split_index` configured. This is the
case e.g. when "too many not shared" index entries exist.
In such instances, the current code sets the `base_oid` field of said
`split_index` structure to all-zero to indicate that `do_write_index()`
should skip writing the "link" extension.
This can lead to problems later on, when the in-memory index is still
used to perform other operations and eventually wants to write a
split-index, detects the presence of the `split_index` and reuses that,
too (under the assumption that it has been initialized correctly and
still has a non-null `base_oid`).
Let's stop zeroing out the `base_oid` to indicate that the "link"
extension should not be written.
One might be tempted to simply call `discard_split_index()` instead,
under the assumption that Git decided to write a non-split index and
therefore the `split_index` structure might no longer be wanted.
However, that is not possible because that would release index entries
in `split_index->base` that are likely to still be in use. Therefore we
cannot do that.
The next best thing we _can_ do is to introduce a bit field to indicate
specifically which index extensions (not) to write. So that's what we do
here.
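A rough Python model of that idea is shown below. The flag names, the
`extensions_to_write` helper, and the second extension are hypothetical
and serve only to show the shape of the approach: the caller states
which extensions to write, and `base_oid` is never touched.

```python
from typing import List

# Hypothetical flag names, chosen for this sketch only.
WRITE_NO_EXTENSION = 0
WRITE_SPLIT_INDEX_EXTENSION = 1 << 0
WRITE_CACHE_TREE_EXTENSION = 1 << 1
WRITE_ALL_EXTENSIONS = WRITE_SPLIT_INDEX_EXTENSION | WRITE_CACHE_TREE_EXTENSION


def extensions_to_write(has_split_index: bool, flags: int) -> List[str]:
    """Pick the extensions to emit from the flags alone."""
    written = []
    if has_split_index and flags & WRITE_SPLIT_INDEX_EXTENSION:
        written.append("link")
    if flags & WRITE_CACHE_TREE_EXTENSION:
        written.append("TREE")
    return written


# Falling back to a full index: mask out the "link" bit for this write.
full_index_flags = WRITE_ALL_EXTENSIONS & ~WRITE_SPLIT_INDEX_EXTENSION
print(extensions_to_write(True, full_index_flags))      # ['TREE']
print(extensions_to_write(True, WRITE_ALL_EXTENSIONS))  # ['link', 'TREE']
```

Because the decision travels with the write call, a later write of the
same in-memory index can still emit the "link" extension.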
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This commit adds a new test case that demonstrates a bug in the
split-index code that is triggered under certain circumstances when the
FSMonitor is enabled, and its symptom manifests in the form of one of
the following error messages:
BUG: fsmonitor.c:20: fsmonitor_dirty has more entries than the index (2 > 1)
BUG: unpack-trees.c:776: pos <n> doesn't point to the first entry of <dir>/ in index
error: invalid path ''
error: The following untracked working tree files would be overwritten by reset:
initial.t
Which of these error messages appears depends on timing-dependent
conditions.
Technically the root cause lies with a bug in the split-index code that
has nothing to do with FSMonitor, but for the sake of this new test case
it was the easiest way to trigger the bug.
The bug is this: Under specific conditions, Git needs to skip writing
the "link" extension (which is the index extension containing the
information pertaining to the split-index). To do that, the `base_oid`
attribute of the `split_index` structure in the in-memory index is
zeroed out, and `do_write_index()` specifically checks for a "null"
`base_oid` to understand that the "link" extension should not be
written. This violates the consistency of the in-memory index
structure, but does not cause problems in most cases because the
process exits without using the in-memory index structure again.
But: _When_ the in-memory index is still used (which is the case e.g. in
`git rebase`), subsequent writes of `the_index` are at risk of writing
out a bogus index file, one that _should_ have a "link" extension but
does not. In many cases, the `SPLIT_INDEX_ORDERED` flag _happens_ to be
set for subsequent writes, forcing the shared index to be written, which
re-initializes `base_oid` to a non-bogus state, and all is good.
When it is _not_ set, however, all kinds of mayhem ensue, resulting in
the above-mentioned error messages, and often enough putting worktrees in a
totally broken state where the only recourse is to manually delete the
`index` and the `index.lock` files and then call `git reset` manually.
Not something to ask users to do.
The reason why it is comparatively easy to trigger the bug with
FSMonitor is that there is _another_ bug in the FSMonitor code:
`mark_fsmonitor_valid()` sets `cache_changed` to 1, i.e. treating that
variable as a Boolean. But it is a bit field, and 1 happens to be the
`SOMETHING_CHANGED` bit that forces the "link" extension to be skipped
when writing the index, among other things.
"Comparatively easy" is a relative term in this context, for sure. The
essence of how the new test case triggers the bug is as follows:
1. The `git rebase` invocation will first reset the worktree to
a commit that contains only the `one.t` file, and then execute a
rebase script that starts with the following commands (commit hashes
skipped):
    label onto
    reset initial
    pick two
    label two
    reset two
    pick three
    [...]
2. Before executing the `label` command, a split index is written, as
well as the shared index.
3. The `reset initial` command in the rebase script writes out a new
split index but skips writing the shared index, as intended.
4. The `pick two` command updates the worktree and refreshes the index,
marking the `two.t` entry as valid via the FSMonitor, which sets the
`SOMETHING_CHANGED` bit in `cache_changed`, which in turn causes the
`base_oid` attribute to be zeroed out and a full (non-split) index
to be written (making sure _not_ to write the "link" extension).
5. Now, the `reset two` command will leave the worktree alone, but
still write out a new split index, not writing the shared index
(because `base_oid` is still zeroed out, and there is no index entry
update requiring it to be written, either).
6. When it is time to run `pick three`, the index is read, but it is
too short: It only contains a single entry when there should be two,
because the "link" extension is missing from the written-out index
file.
There are three bugs at play, actually, which will be fixed over the
course of the next commits:
- The `base_oid` attribute should not be zeroed out to indicate when
the "link" extension should not be written, as it puts the in-memory
index structure into an inconsistent state.
- The FSMonitor should not overwrite bits in `cache_changed`.
- The `unpack_trees()` function tries to reuse the `split_index`
structure from the source index, if any, but does not propagate the
`SPLIT_INDEX_ORDERED` flag.
While a fix for the second bug would let this test case pass, there are
other conditions where the `SOMETHING_CHANGED` bit is set. Therefore,
the bug that most crucially needs to be fixed is the first one.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The purpose of the new option is to accommodate users who would like
--rebase-merges to be on by default and to facilitate turning on
--rebase-merges by default without configuration in a future version of
Git.
Name the new option rebase.rebaseMerges, even though it is a little
redundant, for consistency with the name of the command line option and
to be clear when scrolling through values in the [rebase] section of
.gitconfig.
Support setting rebase.rebaseMerges to the nonspecific value "true" for
users who don't need to or don't want to learn about the difference
between rebase-cousins and no-rebase-cousins.
Make --rebase-merges without an argument on the command line override
any value of rebase.rebaseMerges in the configuration, for consistency
with other command line flags with optional arguments that have an
associated config option.
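The precedence described above can be modeled roughly as follows. This
is a Python sketch, not the option parser itself. It assumes that "true"
and a bare --rebase-merges both select no-rebase-cousins and that a
missing, empty, or "false" config value leaves merges un-rebased; the
function names are hypothetical.

```python
from typing import Optional

DEFAULT_MODE = "no-rebase-cousins"  # assumed meaning of "true" / bare flag
MODES = ("no-rebase-cousins", "rebase-cousins")


def from_config(value: Optional[str]) -> Optional[str]:
    """Map a rebase.rebaseMerges value to a mode, or None for 'off'."""
    if value is None or value in ("", "false"):
        return None
    if value == "true":
        return DEFAULT_MODE
    if value in MODES:
        return value
    raise ValueError(f"bad rebase.rebaseMerges value: {value!r}")


def effective_mode(config: Optional[str], cli: Optional[str]) -> Optional[str]:
    """cli is None (flag absent), '--no-rebase-merges',
    '--rebase-merges', or '--rebase-merges=<mode>'."""
    if cli is None:
        return from_config(config)
    if cli == "--no-rebase-merges":
        return None
    if cli == "--rebase-merges":
        # The bare flag overrides whatever the configuration says.
        return DEFAULT_MODE
    if not cli.startswith("--rebase-merges="):
        raise ValueError(f"unexpected option: {cli!r}")
    mode = cli.split("=", 1)[1]
    if mode not in MODES:
        raise ValueError(f"bad --rebase-merges mode: {mode!r}")
    return mode


print(effective_mode("rebase-cousins", "--rebase-merges"))  # no-rebase-cousins
print(effective_mode("rebase-cousins", None))               # rebase-cousins
```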
Signed-off-by: Alex Henrie <alexhenrie24@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The unusual syntax --rebase-merges="" (that is, --rebase-merges with an
empty string argument) has been an undocumented synonym of
--rebase-merges without an argument. Deprecate that syntax to avoid
confusion when a rebase.rebaseMerges config option is introduced, where
rebase.rebaseMerges="" will be equivalent to --no-rebase-merges.
It is not likely that anyone is actually using this syntax, but just in
case, deprecate the empty string argument instead of dropping support
for it immediately.
Signed-off-by: Alex Henrie <alexhenrie24@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As far as I can tell, --no-rebase-merges has always worked, but has
never been documented. It is especially important to document it before
a rebase.rebaseMerges option is introduced so that users know how to
override the config option on the command line. It's also important to
clarify that --rebase-merges without an argument is not the same as
--no-rebase-merges and not passing --rebase-merges is not the same as
passing --rebase-merges=no-rebase-cousins.
A test case is necessary to make sure that --no-rebase-merges keeps
working after its code is refactored in the following patches of this
series. The test case is a little contrived: It's unlikely that a user
would type both --rebase-merges and --no-rebase-merges at the same time.
However, if an alias is defined which includes --rebase-merges, the user
might decide to add --no-rebase-merges to countermand that part of the
alias but leave alone other flags set by the alias.
Signed-off-by: Alex Henrie <alexhenrie24@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
fd2da4b1ea (archive: add --mtime, 2023-02-18) added a helper function
for checking the file modification time of an extracted entry. Use it
for the older mtime test as well to shorten the code and piggyback on
the archive extraction done to validate file contents.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Stephane is reporting[1] a regression introduced in git v2.40.0 that leads
to 'git grep' segfaulting in his CI pipeline. It turns out, he's using an
older version of libpcre2 that triggers a wild pointer dereference in
the generated JIT code that was fixed in PCRE2 10.35.
Instead of completely disabling the JIT compiler for the buggy version,
just mask out the Unicode property handling as we used to do prior to
commit acabd2048e ("grep: correctly identify utf-8 characters with
\{b,w} in -P").
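The masking idea, sketched in Python under stated assumptions: the flag
values below are placeholders rather than the real PCRE2 constants, the
helper name is hypothetical, and the version test is done at run time
here purely for illustration.

```python
from typing import Tuple

# Placeholder bit values for this sketch, not the real PCRE2 constants.
PCRE2_UTF = 1 << 0
PCRE2_UCP = 1 << 1

FIRST_FIXED_VERSION = (10, 35)


def compile_options(wanted: int, pcre2_version: Tuple[int, int]) -> int:
    """Drop the Unicode-property bit on libraries older than 10.35."""
    if pcre2_version < FIRST_FIXED_VERSION:
        return wanted & ~PCRE2_UCP
    return wanted
```

JIT compilation stays enabled in both cases; only the Unicode property
handling is turned off for the affected versions.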
[1] https://lore.kernel.org/git/7E83DAA1-F9A9-4151-8D07-D80EA6D59EEA@clumio.com/
Reported-by: Stephane Odul <stephane@clumio.com>
Signed-off-by: Mathias Krause <minipli@grsecurity.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As the comment above the function indicates, we do not bother actually
storing commit messages in our anonymization map. But we still take the
message as a parameter, and just ignore it. Let's stop doing that, which
will make -Wunused-parameter happier.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The anonymization code has a specific generator callback for each type
of data (e.g., one for paths, one for oids, and so on). These all take a
"data" parameter, but none of them use it for anything. Which is not
surprising, as the point is to generate a new name independent of any
input, and each function keeps its own static counter.
We added the extra pointer in d5bf91fde4 (fast-export: add a "data"
callback parameter to anonymize_str(), 2020-06-23) to handle
--anonymize-map parsing, but that turned out to be awkward itself, and
was recently dropped.
So let's get rid of this "data" parameter that nobody is using, both
from the generators and from anonymize_str() which plumbed it through.
This simplifies the code, and makes -Wunused-parameter happier.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When we handle an --anonymize-map option, we parse the orig/anon pair,
and then feed the "orig" string to anonymize_str(), along with a
generator function that duplicates the "anon" string to be cached in the
map.
This works, because anonymize_str() says "ah, there is no mapping yet
for orig; I'll add one from the generator". But there are some
downsides:
1. It's a bit too clever, as it's not obvious what the code is trying
to do or why it works.
2. It requires allowing generator functions to take an extra void
pointer, which is not something any of the normal callers of
anonymize_str() want.
3. It does the wrong thing if the same token is provided twice.
When there are conflicting options, like:
git fast-export --anonymize \
--anonymize-map=foo:one \
--anonymize-map=foo:two
we usually let the second one override the first. But by using
anonymize_str(), which has first-one-wins logic, we do the
opposite.
So instead of relying on anonymize_str(), let's directly add the entry
ourselves. We can tweak the tests to show that we handle overridden
options correctly now.
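The two behaviors can be contrasted with a small Python model in which
a dictionary stands in for the anonymization hashmap. The function names
are hypothetical and the option parsing is simplified to splitting on
the first colon.

```python
from typing import Dict, Iterable


def first_one_wins(pairs: Iterable[str]) -> Dict[str, str]:
    """Model of routing options through a lookup-or-generate helper."""
    mapping: Dict[str, str] = {}
    for pair in pairs:
        orig, anon = pair.split(":", 1)
        mapping.setdefault(orig, anon)  # an existing entry is kept
    return mapping


def last_one_wins(pairs: Iterable[str]) -> Dict[str, str]:
    """Model of adding the entry directly, replacing any earlier one."""
    mapping: Dict[str, str] = {}
    for pair in pairs:
        orig, anon = pair.split(":", 1)
        mapping[orig] = anon
    return mapping


options = ["foo:one", "foo:two"]
print(first_one_wins(options))  # {'foo': 'one'}
print(last_one_wins(options))   # {'foo': 'two'}
```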
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When anonymizing output, there's only one spot where we generate new
entries to add to our hashmap: when anonymize_str() doesn't find an
entry, we use the generate() callback to make one and add it. Let's pull
that into its own function in preparation for another caller.
Note that we'll add one extra feature. In anonymize_str(), we know that
we won't find an existing entry in the hashmap (since it will only try
to add after failing to find one). But other callers won't have the same
behavior, so we should catch this case and free the now-dangling entry.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We take pains to avoid doing a lookup on a hashmap which has not been
initialized with hashmap_init(). That was necessary back when this code
was written. But hashmap_get() became safer in b7879b0ba6 (hashmap:
allow re-use after hashmap_free(), 2020-11-02). Since then it's OK to
call functions on a zero-initialized table; it will just correctly
return NULL, since there is no match.
This simplifies the code a little, and also lets us keep the
initialization line closer to when we add an entry (which is when the
hashmap really does need to be totally initialized). That will help
later refactoring.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We store anonymized values as pointers to "const char *", since they are
conceptually const to callers who use them. But they are actually
allocated strings whose memory is owned by the struct.
The ownership mismatch hasn't been a big deal since we never free() them
(they are held until the program ends), but let's switch them to "char *"
in preparation for changing that.
Since most code only accesses them via anonymize_str(), it can continue
to narrow them to "const char *" in its return value.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git receive-pack" that responds to "git push" requests failed to
clean a stale lockfile when killed in the middle, which has been
corrected.
* ps/receive-pack-unlock-before-die:
receive-pack: fix stale packfile locks when dying
Fix for "ls-files --format=%(path)" that produced nonsense
output, which was a bug in 2.38.
* aj/ls-files-format-fix:
ls-files: fix "--format" output of relative paths
"git format-patch" honors the src/dst prefixes set to nonstandard
values with configuration variables like "diff.noprefix", causing
the receiving end of the patch that expects the standard -p1 format
to break. Teach "format-patch" to ignore end-user configuration and
always use the standard prefixes.
This is a backward compatibility breaking change.
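A rough Python model of the resulting behavior follows. It assumes the
standard prefixes are "a/" and "b/" and that the new format.noprefix
option is the only configuration that removes them for format-patch; the
function names are hypothetical and other prefix-related settings are
left out.

```python
from typing import Dict, Tuple

STANDARD = ("a/", "b/")  # the prefixes a -p1 consumer expects


def diff_prefixes(config: Dict[str, bool]) -> Tuple[str, str]:
    """Plain diff: diff.noprefix is honored."""
    return ("", "") if config.get("diff.noprefix") else STANDARD


def format_patch_prefixes(config: Dict[str, bool]) -> Tuple[str, str]:
    """format-patch after this change: diff.noprefix is not consulted."""
    return ("", "") if config.get("format.noprefix") else STANDARD


user_config = {"diff.noprefix": True}
print(diff_prefixes(user_config))          # ('', '')
print(format_patch_prefixes(user_config))  # ('a/', 'b/')
```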
* jk/format-patch-ignore-noprefix:
rebase: prefer --default-prefix to --{src,dst}-prefix for format-patch
format-patch: add format.noprefix option
format-patch: do not respect diff.noprefix
diff: add --default-prefix option
t4013: add tests for diff prefix options
diff: factor out src/dst prefix setup