df062010 (filter-branch: avoid passing commit message through sed)
introduced a regression when filtering commits with multi-line headers,
if the header contains a blank line. An example of this is a gpg-signed
commit:
$ git cat-file commit signed-commit
tree 3d4038e029712da9fc59a72afbfcc90418451630
parent 110eac945dc1713b27bdf49e74e5805db66971f0
author A U Thor <author@example.com> 1112912413 -0700
committer C O Mitter <committer@example.com> 1112912413 -0700
gpgsig -----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iEYEABECAAYFAlYXADwACgkQE7b1Hs3eQw23CACgldB/InRyDgQwyiFyMMm3zFpj
pUsAnA+f3aMUsd9mNroloSmlOgL6jIMO
=0Hgm
-----END PGP SIGNATURE-----
Adding gpg
As a consequence, "filter-branch --msg-filter cat" (which should leave the
commit message unchanged) spills the signature (after the internal blank
line) into the original commit message.
The reason is that although the signature is indented, making the line a
whitespace only line, the "read" call is splitting the line based on
the shell's IFS, which defaults to <space><tab><newline>. The leading
space is consumed and $header_line is empty, causing the "skip header
lines" loop to exit.
The rest of the commit object is then re-used as the rewritten commit
message, causing the new message to include the signature of the
original commit.
Set IFS to an empty string for the "read" call, thus disabling the word
splitting, which causes $header_line to be set to the non-empty value ' '.
This allows the loop to fully consume the header lines before
emitting the original, intact commit message.
[jc: this is literally based on MJG's suggestion]
Signed-off-by: Michael J Gruber <git@drmicha.warpmail.net>
Signed-off-by: James McCoy <vega.james@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
On some systems (like OS X), if sed encounters input without
a trailing newline, it will silently add it. As a result,
"git filter-branch" on such systems may silently rewrite
commit messages that omit a trailing newline. Even though
this is not something we generate ourselves with "git
commit", it's better for filter-branch to preserve the
original data as closely as possible.
We're using sed here only to strip the header fields from
the commit object. We can accomplish the same thing with a
shell loop. Since shell "read" calls are slow (usually one
syscall per byte), we use "cat" once we've skipped past the
header. Depending on the size of your commit messages, this
is probably faster (you pay the cost to fork, but then read
the data in saner-sized chunks). This idea is shamelessly
stolen from Junio.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Our is_hfs_dotgit function relies on the hackily-implemented
next_hfs_char to give us the next character that an HFS+
filename comparison would look at. It's hacky because it
doesn't implement the full case-folding table of HFS+; it
gives us just enough to see if the path matches ".git".
At the end of next_hfs_char, we use tolower() to convert our
32-bit code point to lowercase. Our tolower() implementation
only takes an 8-bit char, though; it throws away the upper
24 bits. This means we can't have any false negatives for
is_hfs_dotgit. We only care about matching 7-bit ASCII
characters in ".git", and we will correctly process 'G' or
'g'.
However, we _can_ have false positives. Because we throw
away the upper bits, code point \u{0147} (for example) will
look like 'G' and get downcased to 'g'. It's not known
whether a sequence of code points whose truncation ends up
as ".git" is meaningful in any language, but it does not
hurt to be more accurate here. We can just pass out the full
32-bit code point, and compare it manually to the upper and
lowercase characters we care about.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Now that the index can block pathnames that can be mistaken
to mean ".git" on NTFS and FAT32, it would be helpful for
fsck to notice such problematic paths. This lets servers
which use receive.fsckObjects block them before the damage
spreads.
Note that the fsck check is always on, even for systems
without core.protectNTFS set. This is technically more
restrictive than we need to be, as a set of users on ext4
could happily use these odd filenames without caring about
NTFS.
However, on balance, it's helpful for all servers to block
these (because the paths can be used for mischief, and
servers which bother to fsck would want to stop the spread
whether they are on NTFS themselves or not), and hardly
anybody will be affected (because the blocked names are
variants of .git or git~1, meaning mischief is almost
certainly what the tree author had in mind).
Ideally these would be controlled by a separate
"fsck.protectNTFS" flag. However, it would be much nicer to
be able to enable/disable _any_ fsck flag individually, and
any scheme we choose should match such a system. Given the
likelihood of anybody using such a path in practice, it is
not unreasonable to wait until such a system materializes.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The point of disallowing ".git" in the index is that we
would never want to accidentally overwrite files in the
repository directory. But this means we need to respect the
filesystem's idea of when two paths are equal. The prior
commit added a helper to make such a comparison for NTFS
and FAT32; let's use it in verify_path().
We make this check optional for two reasons:
1. It restricts the set of allowable filenames, which is
unnecessary for people who are not on NTFS nor FAT32.
In practice this probably doesn't matter, though, as
the restricted names are rather obscure and almost
certainly would never come up in practice.
2. It has a minor performance penalty for every path we
insert into the index.
This patch ties the check to the core.protectNTFS config
option. Though this is expected to be most useful on Windows,
we allow it to be set everywhere, as NTFS may be mounted on
other platforms. The variable does default to on for Windows,
though.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Now that the index can block pathnames that case-fold to
".git" on HFS+, it would be helpful for fsck to notice such
problematic paths. This lets servers which use
receive.fsckObjects block them before the damage spreads.
Note that the fsck check is always on, even for systems
without core.protectHFS set. This is technically more
restrictive than we need to be, as a set of users on ext4
could happily use these odd filenames without caring about
HFS+.
However, on balance, it's helpful for all servers to block
these (because the paths can be used for mischief, and
servers which bother to fsck would want to stop the spread
whether they are on HFS+ themselves or not), and hardly
anybody will be affected (because the blocked names are
variants of .git with invisible Unicode code-points mixed
in, meaning mischief is almost certainly what the tree
author had in mind).
Ideally these would be controlled by a separate
"fsck.protectHFS" flag. However, it would be much nicer to
be able to enable/disable _any_ fsck flag individually, and
any scheme we choose should match such a system. Given the
likelihood of anybody using such a path in practice, it is
not unreasonable to wait until such a system materializes.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The point of disallowing ".git" in the index is that we
would never want to accidentally overwrite files in the
repository directory. But this means we need to respect the
filesystem's idea of when two paths are equal. The prior
commit added a helper to make such a comparison for HFS+;
let's use it in verify_path.
We make this check optional for two reasons:
1. It restricts the set of allowable filenames, which is
unnecessary for people who are not on HFS+. In practice
this probably doesn't matter, though, as the restricted
names are rather obscure and almost certainly would
never come up in practice.
2. It has a minor performance penalty for every path we
insert into the index.
This patch ties the check to the core.protectHFS config
option. Though this is expected to be most useful on OS X,
we allow it to be set everywhere, as HFS+ may be mounted on
other platforms. The variable does default to on for OS X,
though.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We complain about ".git" in a tree because it cannot be
loaded into the index or checked out. Since we now also
reject ".GIT" case-insensitively, fsck should notice the
same, so that errors do not propagate.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We check that fsck notices and complains about confusing
paths in trees. However, there are a few shortcomings:
1. We check only for these paths as file entries, not as
intermediate paths (so ".git" and not ".git/foo").
2. We check "." and ".." together, so it is possible that
we notice only one and not the other.
3. We repeat a lot of boilerplate.
Let's use some loops to be more thorough in our testing, and
still end up with shorter code.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We do not allow ".git" to enter into the index as a path
component, because checking out the result to the working
tree may causes confusion for subsequent git commands.
However, on case-insensitive file systems, ".Git" or ".GIT"
is the same. We should catch and prevent those, too.
Note that technically we could allow this for repos on
case-sensitive filesystems. But there's not much point. It's
unlikely that anybody cares, and it creates a repository
that is unexpectedly non-portable to other systems.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We should prevent nonsense paths from entering the index in
the first place, as they can cause confusing results if they
are ever checked out into the working tree. We already do
so, but we never tested it.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The test case "--amend option copies authorship" specifies that the
git-commit option `--amend` uses the authorship of the replaced
commit for the new commit. Add the omitted check that this property
actually holds.
Signed-off-by: Fabian Ruch <bafain@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Once upon a time, git-log was just "rev-list | diff-tree",
and we did not bother to test it separately. These days git-log
is implemented internally, but we want to make sure that the
rev-list to diff-tree pipeline continues to function. Let's
add a basic sanity test.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"%G" (nothing after G) is an invalid pretty format specifier, but
the parser did not notice it as garbage.
* jk/pretty-G-format-fixes:
move "%G" format test from t7510 to t6006
pretty: avoid reading past end-of-string with "%G"
t7510: check %G* pretty-format output
t7510: test a commit signed by an unknown key
t7510: use consistent &&-chains in loop
t7510: stop referring to master in later tests
During "git rebase --merge", a conflicted patch could not be
skipped with "--skip" if the next one also conflicted.
* bc/fix-rebase-merge-skip:
rebase--merge: fix --skip with two conflicts in a row
* maint-1.8.5:
annotate: use argv_array
t7300: repair filesystem permissions with test_when_finished
enums: remove trailing ',' after last item in enum
* jk/repack-pack-keep-objects:
repack: s/write_bitmap/&s/ in code
repack: respect pack.writebitmaps
repack: do not accidentally pack kept objects by default
The git log --graph --show-signature command incorrectly indents the gpg
information about signed commits and merged signed tags. It does not
follow the level of indentation of the current commit.
Example of garbled output:
$ git log --show-signature --graph
* commit 258e0a237cb69aaa587b0a4fb528bb0316b1b776
|\ gpg: Signature made Mon, Jun 30, 2014 13:22:33 EDT using RSA key ID DA08
gpg: Good signature from "Jason Pyeron <jpye...@pdinc.us>"
Merge: 727c355 1ca13ed
| | Author: Jason Pyeron <jpye...@pdinc.us>
| | Date: Mon Jun 30 13:22:29 2014 -0400
| |
| | Merge of 1ca13ed2271d60ba9 branch - rebranding
| |
| * commit 1ca13ed2271d60ba93d40bcc8db17ced8545f172
| | gpg: Signature made Mon, Jun 23, 2014 9:45:47 EDT using RSA key ID DD37
gpg: Good signature from "Stephen Robert Guglielmo <s...@guglielmo.us>"
gpg: aka "Stephen Robert Guglielmo <srguglie...@gmail.com>"
Author: Stephen R Guglielmo <s...@guglielmo.us>
| | Date: Mon Jun 23 09:45:27 2014 -0400
| |
| | Minor URL updates
In log-tree.c modify show_sig_lines() function to call graph_show_oneline()
after each line of gpg information it has printed in order to preserve
the level of indentation for the next output line.
Reported-by: Jason Pyeron <jpyeron@pdinc.us>
Signed-off-by: Zoltan Klinger <zoltan.klinger@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We create a directory that cannot be removed, confirm that
it cannot be removed, and then fix it like:
chmod 0 foo &&
test_must_fail git clean -d -f &&
chmod 755 foo
If the middle step fails but leaves the directory (e.g., the
bug is that clean does not notice the failure), this
pollutes the test repo with an unremovable directory. Not
only does this cause further tests to fail, but it means
that "rm -rf" fails on the whole trash directory, and the
user has to intervene manually to even re-run the test script.
We can bump the "chmod 755" recovery to a test_when_finished
block to be sure that it always runs.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When multiple parents of a merge commit get mapped to the same
commit, filter-branch used to pass all instances of the parent
commit to the parent and commit filters and to "git commit-tree" or
"git_commit_non_empty_tree".
This can often happen when extracting a small project from a large
repository; merges can join history with no commits on any branch
which affect the paths being retained. Once the intermediate
commits have been filtered out, all the immediate parents of the
merge commit can end up being mapped to the same commit - either the
original merge-base or an ancestor of it.
"git commit-tree" would display an error but write the commit with
the normalized parents in any case. "git_commit_non_empty_tree"
would fail to notice that the commit being made was in fact a
non-merge commit and would retain it even if a further pass with
"--prune-empty" would discard the commit as empty.
Ensure that duplicate parents are pruned before the parent filter to
make "--prune-empty" idempotent, removing all empty non-merge
commits in a singe pass.
Signed-off-by: Charles Bailey <cbailey32@bloomberg.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The final test in t7510 checks that "--format" placeholders
that look similar to GPG placeholders (but that we don't
actually understand) are passed through. That test was
placed in t7510, since the other GPG placeholder tests are
there. However, it does not have a GPG prerequisite, because
it is not actually checking any signed commits.
This causes the test to erroneously fail when gpg is not
installed on a system, however. Not because we need signed
commits, but because we need _any_ commit to run "git log".
If we don't have gpg installed, t7510 doesn't create any
commits at all.
We can fix this by moving the test into t6006. This is
arguably a better place anyway, because it is where we test
most of the other placeholders (we do not test GPG
placeholders there because of the infrastructure needed to
make signed commits).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git status" (and "git commit") behaved as if changes in a modified
submodule are not there if submodule.*.ignore configuration is set,
which was misleading. The configuration is only to unclutter diff
output during the course of development, and should not to hide
changes in the "status" output to cause the users forget to commit
them.
* jl/status-added-submodule-is-never-ignored:
commit -m: commit staged submodules regardless of ignore config
status/commit: show staged submodules regardless of ignore config
"git show -s" (i.e. show log message only) used to incorrectly emit
an extra blank line after a merge commit.
* mk/show-s-no-extra-blank-line-for-merges:
git-show: fix 'git show -s' to not add extra terminator after merge commit
The autostash mode of "git rebase -i" did not restore the dirty
working tree state if the user aborted the interactive rebase by
emptying the insn sheet.
* rr/rebase-autostash-fix:
rebase -i: test "Nothing to do" case with autostash
rebase -i: handle "Nothing to do" case with autostash
"git log --exclude=<glob> --all | git shortlog" worked as expected,
but "git shortlog --exclude=<glob> --all", which is supposed to be
identical to the above pipeline, was not accepted at the command
line argument parser level.
* jc/shortlog-ref-exclude:
shortlog: allow --exclude=<glob> to be passed
On a case insensitive filesystem, merge-recursive incorrectly
deleted the file that is to be renamed to a name that is the same
except for case differences.
* dt/merge-recursive-case-insensitive:
mv: allow renaming to fix case on case insensitive filesystems
merge-recursive.c: fix case-changing merge bug
"git diff --find-copies-harder" sometimes pretended as if the mode
bits have changed for paths that are marked with assume-unchanged
bit.
* jk/diff-files-assume-unchanged:
run_diff_files: do not look at uninitialized stat data
"git commit --allow-empty-message -C $commit" did not work when the
commit did not have any log message.
* jk/commit-C-pick-empty:
commit: do not complain of empty messages from -C
"git blame" assigned the blame to the copy in the working-tree if
the repository is set to core.autocrlf=input and the file used CRLF
line endings.
* bc/blame-crlf-test:
blame: correctly handle files regardless of autocrlf
"--ignore-space-change" option of "git apply" ignored the spaces
at the beginning of line too aggressively, which is inconsistent
with the option of the same name "diff" and "git diff" have.
* jc/apply-ignore-whitespace:
apply --ignore-space-change: lines with and without leading whitespaces do not match
The "%<(10,trunc)%s" pretty format specifier in the log family of
commands is used to truncate the string to a given length (e.g. 10
in the example) with padding to column-align the output, but did
not take into account that number of bytes and number of display
columns are different.
* as/pretty-truncate:
pretty.c: format string with truncate respects logOutputEncoding
t4205, t6006: add tests that fail with i18n.logOutputEncoding set
t4205 (log-pretty-format): use `tformat` rather than `format`
t4041, t4205, t6006, t7102: don't hardcode tested encoding value
t4205 (log-pretty-formats): don't hardcode SHA-1 in expected outputs
If the user asks for --format=%G with nothing else, we
correctly realize that "%G" is not a valid placeholder (it
should be "%G?", "%GK", etc). But we still tell the
strbuf_expand code that we consumed 2 characters, causing it
to jump over the trailing NUL and output garbage.
This also fixes the case where "%GX" would be consumed (and
produce no output). In other cases, we pass unrecognized
placeholders through to the final string.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We do not check these along with the other pretty-format
placeholders in t6006, because we need signed commits to
make them interesting. t7510 has such commits, and can
easily exercise them in addition to the regular
--show-signature code path.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We tested both good and bad signatures, but not ones made
correctly but with a key for which we have no trust.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We check multiple commits in a loop. Because we want to
break out of the loop if any single iteration fails, we use
a subshell/exit like:
(
for i in $stuff
do
do-something $i || exit 1
done
)
However, we are inconsistent in our loop body. Some commands
get their own "|| exit 1", and others try to chain to the
next command with "&&", like:
X &&
Y || exit 1
Z || exit 1
This is a little hard to read and follow, because X and Y
are treated differently for no good reason. But much worse,
the second loop follows a similar pattern and gets it wrong.
"Y" is expected to fail, so we use "&& exit 1", giving us:
X &&
Y && exit 1
Z || exit 1
That gets the test for X wrong (we do not exit unless both X
fails and Y unexpectedly succeeds, but we would want to exit
if _either_ is wrong). We can write this clearly and
correctly by consistently using "&&", followed by a single
"|| exit 1", and negating Y with "!" (as we would in a
normal &&-chain). Like:
X &&
! Y &&
Z || exit 1
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>