Even though an earlier attempt (bafc478..41dd00bad) cleaned
up RFC 2047 encoding, pretty.c::add_rfc2047() still decides
where to split the output line by going through the input
one byte at a time, and potentially splits a character in
the middle. A subject line may end up showing like this:
".... fö?? bar". (instead of ".... föö bar".)
if split incorrectly.
RFC 2047, section 5 (3) explicitly forbids such beaviour
Each 'encoded-word' MUST represent an integral number of
characters. A multi-octet character may not be split across
adjacent 'encoded- word's.
that means that e.g. for
Subject: .... föö bar
encoding
Subject: =?UTF-8?q?....=20f=C3=B6=C3=B6?=
=?UTF-8?q?=20bar?=
is correct, and
Subject: =?UTF-8?q?....=20f=C3=B6=C3?= <-- NOTE ö is broken here
=?UTF-8?q?=B6=20bar?=
is not, because "ö" character UTF-8 encoding C3 B6 is split here across
adjacent encoded words.
To fix the problem, make the loop grab one _character_ at a time and
determine its output length to see where to break the output line. Note
that this version only knows about UTF-8, but the logic to grab one
character is abstracted out in mbs_chrlen() function to make it possible
to extend it to other encodings with the help of iconv in the future.
Signed-off-by: Kirill Smelkov <kirr@mns.spb.ru>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
These tests themselves are properly protected by the GPG
prerequisite, but one of the set-up steps outside the
test_expect_success block unconditionally assumed that there is a
gpghome/ directory, which is not true if GPG is not being used.
It may be a good idea to move the whole set-up steps in the test but
that is a follow-up topic.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git log -p -S<string>" now looks for the <string> after applying
the textconv filter (if defined); earlier it inspected the contents
of the blobs without filtering.
"git diff --stat" miscounted the total number of changed lines when
binary files were involved and hidden beyond --stat-count. It also
miscounted the total number of changed files when there were
unmerged paths.
* lt/diff-stat-show-0-lines:
t4049: refocus tests
diff --shortstat: do not count "unmerged" entries
diff --stat: do not count "unmerged" entries
diff --stat: move the "total count" logic to the last loop
diff --stat: use "file" temporary variable to refer to data->files[i]
diff --stat: status of unmodified pair in diff-q is not zero
test: add failing tests for "diff --stat" to t4049
Fix "git diff --stat" for interesting - but empty - file changes
The primary thing Linus's patch wanted to change was to make sure
that 0-line change appears for a mode-only change. Update the
first test to chmod a file that we can see in the output (limited
by --stat-count) to demonstrate it. Also make sure to use test_chmod
and compare the index and the tree, so that we can run this test
even on a filesystem without permission bits.
Later two tests are about fixes to separate issues that were
introduced and/or uncovered by Linus's patch as a side effect, but
the issues are not related to mode-only changes. Remove chmod from
the tests.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git p4" used to try expanding malformed "$keyword$" that spans
across multiple lines.
* pw/maint-p4-rcs-expansion-newline:
git p4: RCS expansion should not span newlines
* jh/update-ref-d-through-symref:
Fix failure to delete a packed ref through a symref
t1400-update-ref: Add test verifying bug with symrefs in delete_ref()
Even though we show a separate *UNMERGED* entry in the patch and
diffstat output (or in the --raw format, for that matter) in
addition to and separately from the diff against the specified stage
(defaulting to #2) for unmerged paths, they should not be counted in
the total number of files affected---that would lead to counting the
same path twice.
The separation done by the previous step makes this fix simple and
straightforward. Among the filepairs in diff_queue, paths that
weren't modified, and the extra "unmerged" entries do not count as
total number of files.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The diffstat generation logic, with --stat-count limit, is
implemented as three loops.
- The first counts the width necessary to show stats up to
specified number of entries, and notes up to how many entries in
the data we need to iterate to show the graph;
- The second iterates that many times to draw the graph, adjusts
the number of "total modified files", and counts the total
added/deleted lines for the part that was shown in the graph;
- The third iterates over the remainder and only does the part to
count "total added/deleted lines" and to adjust "total modified
files" without drawing anything.
Move the logic to count added/deleted lines and modified files from
the second loop to the third loop.
This incidentally fixes a bug. The third loop was not filtering
binary changes (counted in bytes) from the total added/deleted as it
should. The second loop implemented this correctly, so if a binary
change appeared earlier than the --stat-count cutoff, the code
counted number of added/deleted lines correctly, but if it appeared
beyond the cutoff, the number of lines would have mixed with the
byte count in the buggy third loop.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There are a few problems in diff.c around --stat area, partially
caused by the recent 74faaa1 (Fix "git diff --stat" for interesting
- but empty - file changes, 2012-10-17), and largely caused by the
earlier change that introduced when --stat-count was added.
Add a few test pieces to t4049 to expose the issues.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git diff -G<pattern>" did not honor textconv filter when looking
for changes.
* jk/maint-diff-grep-textconv:
diff_grep: use textconv buffers for add/deleted files
Various rfc2047 quoting issues around a non-ASCII name on the From:
line in the output from format-patch have been corrected.
* js/format-2047:
format-patch tests: check quoting/encoding in To: and Cc: headers
format-patch: fix rfc2047 address encoding with respect to rfc822 specials
format-patch: make rfc2047 encoding more strict
format-patch: introduce helper function last_line_length()
format-patch: do not wrap rfc2047 encoded headers too late
format-patch: do not wrap non-rfc2047 headers too early
utf8: fix off-by-one wrapping of text
A symbolic ref refs/heads/SYM was not correctly removed with "git
branch -d SYM"; the command removed the ref pointed by SYM instead.
* rs/branch-del-symref:
branch: show targets of deleted symrefs, not sha1s
branch: skip commit checks when deleting symref branches
branch: delete symref branch, not its target
branch: factor out delete_branch_config()
branch: factor out check_branch_commit()
"git grep -e pattern <tree>" asked the attribute system to read
"<tree>:.gitattributes" file in the working tree, which was
nonsense.
* nd/grep-true-path:
grep: stop looking at random places for .gitattributes
"git log -F -E --grep='<ere>'" failed to use the given <ere>
pattern as extended regular expression, and instead looked for the
string literally.
* 'jc/grep-pcre-loose-ends' (early part):
log --grep: use the same helper to set -E/-F options as "git grep"
revisions: initialize revs->grep_filter using grep_init()
grep: move pattern-type bits support to top-level grep.[ch]
grep: move the configuration parsing logic to grep.[ch]
builtin/grep.c: make configuration callback more reusable
The "say" function in the test scaffolding incorrectly allowed
"echo" to interpret "\a" as if it were a C-string asking for a BEL
output.
* jc/test-say-color-avoid-echo-escape:
test-lib: Fix say_color () not to interpret \a\b\c in the message
When given a variable without a value, such as '[section] var' and
asking git-config to treat it as a path, git_config_pathname returns
an error and doesn't modify its output parameter. show_config assumes
that the call is always successful and sets a variable to indicate
that vptr should be freed. In case of an error however, trying to do
this will cause the program to be killed, as it's pointing to memory
in the stack.
Detect the error and return immediately to avoid freeing or accessing
the uninitialed memory in the stack.
Signed-off-by: Carlos Martín Nieto <cmn@elego.de>
Acked-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The title of an RSS feed is generated from many components,
including the filename provided as a query parameter, but we
failed to quote it. Besides showing the wrong output, this
is a vector for XSS attacks.
Signed-off-by: Jeff King <peff@peff.net>
This bug was introduced in cb585a9 (git-p4: keyword
flattening fixes, 2011-10-16). The newline character
is indeed special, and $File$ expansions should not try
to match across multiple lines.
Based-on-patch-by: Chris Goard <cgoard@gmail.com>
Signed-off-by: Pete Wyckoff <pw@padd.com>
Signed-off-by: Jeff King <peff@peff.net>
We currently just look at raw blob data when using "-S" to
pickaxe. This is mostly historical, as pickaxe predates the
textconv feature. If the user has bothered to define a
textconv filter, it is more likely that their search string will be
on the textconv output, as that is what they will see in the
diff (and we do not even provide a mechanism for them to
search for binary needles that contain NUL characters).
This patch teaches "-S" to use textconv, just as we
already do for "-G".
Signed-off-by: Jeff King <peff@peff.net>
If you use "-G" to grep a diff, we will apply a configured
textconv filter to the data before generating the diff.
However, if the diff is an addition or deletion, we do not
bother running the diff at all, and just look for the token
in the added (or removed) content. This works because we
know that the diff must contain every line of content.
However, while we used the textconv-derived buffers in the
regular diff, we accidentally passed the original unmodified
buffers to regexec when checking the added or removed
content. This could lead to an incorrect answer.
Worse, in some cases we might have a textconv buffer but no
original buffer (e.g., if we pulled the textconv data from
cache, or if we reused a working tree file when generating
it). In that case, we could actually feed NULL to regexec
and segfault.
Reported-by: Peter Oberndorfer <kumbayo84@arcor.de>
Signed-off-by: Jeff King <peff@peff.net>
When deleting a ref through a symref (e.g. using 'git update-ref -d HEAD'
to delete refs/heads/master), we would remove the loose ref, but a packed
version of the same ref would remain, the end result being that instead of
deleting refs/heads/master we would appear to reset it to its state as of
the last repack.
This patch fixes the issue, by making sure we pass the correct ref name
when invoking repack_without_ref() from within delete_ref().
Signed-off-by: Johan Herland <johan@herland.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When deleting a ref through a symref (e.g. using 'git update-ref -d HEAD'
to delete refs/heads/master), we currently fail to remove the packed
version of that ref. This testcase demonstrates the bug.
Signed-off-by: Johan Herland <johan@herland.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
git branch reports the abbreviated hash of the head commit of
a deleted branch to make it easier for a user to undo the
operation. For symref branches this doesn't help. Print the
symref target instead for them.
Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Before a branch is deleted, we check that it points to a valid
commit. With -d we also check that the commit is a merged; this
check is not done with -D.
The reason for that is that commits pointed to by branches should
never go missing; if they do then something broke and it's better
to stop instead of adding to the mess. And a non-merged commit
may contain changes that are worth preserving, so we require the
stronger option -D instead of -d to get rid of them.
If a branch consists of a symref, these concerns don't apply.
Deleting such a branch can't make a commit become unreferenced,
so we don't need to check if it is merged, or even if it is
actually a valid commit. Skip them in that case. This allows
us to delete dangling symref branches.
Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If a branch that is to be deleted happens to be a symref to another
branch, the current code removes the targeted branch instead of the
one it was called for.
Change this surprising behaviour and delete the symref branch
instead.
Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
git-format-patch does currently not parse user supplied extra header
values (e. g., --cc, --add-header) and just replays them. That forces
users to add them RFC 2822/2047 conform in encoded form, e.g.
--cc '=?UTF-8?q?Jan=20H=2E=20Sch=C3=B6nherr?= <...>'
which is inconvenient. We would want to update git-format-patch to
accept human-readable input
--cc 'Jan H. Schönherr <...>'
and handle the encoding, wrapping and quoting internally in the future,
similar to what is already done in git-send-email. The necessary code
should mostly exist in the code paths that handle the From: and Subject:
headers.
Whether we want to do this only for the git-format-patch options
--to and --cc (and the corresponding config options) or also for
user supplied headers via --add-header, is open for discussion.
For now, add test_expect_failure tests for To: and Cc: headers as a
reminder and fix tests that would otherwise fail should this get
implemented.
Signed-off-by: Jan H. Schönherr <schnhrr@cs.tu-berlin.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
According to RFC 2047 and RFC 822, rfc2047 encoded words and and rfc822
quoted strings do not mix. Since add_rfc2047() no longer leaves RFC 822
specials behind, the quoting is also no longer necessary to create a
standard-conforming mail.
Remove the quoting, when RFC 2047 encoding takes place. This actually
requires to refactor add_rfc2047() a bit, so that the different cases
can be distinguished.
With this patch, my own name gets correctly decoded as Jan H. Schönherr
(without quotes) and not as "Jan H. Schönherr" (with quotes).
Signed-off-by: Jan H. Schönherr <schnhrr@cs.tu-berlin.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
RFC 2047 requires more characters to be encoded than it is currently done.
Especially, RFC 2047 distinguishes between allowed remaining characters
in encoded words in addresses (From, To, etc.) and other headers, such
as Subject.
Make add_rfc2047() and is_rfc2047_special() location dependent and include
all non-allowed characters to hopefully be RFC 2047 conformant.
This especially fixes a problem, where RFC 822 specials (e. g. ".") were
left unencoded in addresses, which was solved with a non-standard-conforming
workaround in the past (which is going to be removed in a follow-up patch).
Signed-off-by: Jan H. Schönherr <schnhrr@cs.tu-berlin.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Encoded characters add more than one character at once to an encoded
header. Include all characters that are about to be added in the length
calculation for wrapping.
Additionally, RFC 2047 imposes a maximum line length of 76 characters
if that line contains an rfc2047 encoded word.
Signed-off-by: Jan H. Schönherr <schnhrr@cs.tu-berlin.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Do not wrap the second and later lines of non-rfc2047-encoded headers
substantially before the 78 character limit.
Instead of passing the remaining length of the first line as wrapping
width, use the correct maximum length and tell strbuf_add_wrapped_bytes()
how many characters of the first line are already used.
Signed-off-by: Jan H. Schönherr <schnhrr@cs.tu-berlin.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The wrapping logic in strbuf_add_wrapped_text() does currently not allow
lines that entirely fill the allowed width, instead it wraps the line one
character too early.
For example, the text "This is the sixth commit." formatted via
"%w(11,1,2)" (wrap at 11 characters, 1 char indent of first line, 2 char
indent of following lines) results in four lines: " This is", " the",
" sixth", " commit." This is wrong, because " the sixth" is exactly
11 characters long, and thus allowed.
Fix this by allowing the (width+1) character of a line to be a valid
wrapping point if it is a whitespace character.
Signed-off-by: Jan H. Schönherr <schnhrr@cs.tu-berlin.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The behavior of "git diff --stat" is rather odd for files that have
zero lines of changes: it will discount them entirely unless they were
renames.
Which means that the stat output will simply not show files that only
had "other" changes: they were created or deleted, or their mode was
changed.
Now, those changes do show up in the summary, but so do renames, so
the diffstat logic is inconsistent. Why does it show renames with zero
lines changed, but not mode changes or added files with zero lines
changed?
So change the logic to not check for "is_renamed", but for
"is_interesting" instead, where "interesting" is judged to be any
action but a pure data change (because a pure data change with zero
data changed really isn't worth showing, if we ever get one in our
diffpairs).
So if you did
chmod +x Makefile
git diff --stat
before, it would show empty (" 0 files changed"), with this it shows
Makefile | 0
1 file changed, 0 insertions(+), 0 deletions(-)
which I think is a more correct diffstat (and then with "--summary" it
shows *what* the metadata change to Makefile was - this is completely
consistent with our handling of renamed files).
Side note: the old behavior was *really* odd. With no changes at all,
"git diff --stat" output was empty. With just a chmod, it said "0
files changed". No way is our legacy behavior sane.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* jc/ll-merge-binary-ours:
ll-merge: warn about inability to merge binary files only when we can't
attr: "binary" attribute should choose built-in "binary" merge driver
merge: teach -Xours/-Xtheirs to binary ll-merge driver
grep searches for .gitattributes using "name" field in struct
grep_source but that field is not real on-disk path name. For example,
"grep pattern rev" fills the field with "rev:path", and Git looks for
.gitattributes in the (non-existent but exploitable) path "rev:path"
instead of "path".
This patch passes real paths down to grep_source_load_driver() when:
- grep on work tree
- grep on the index
- grep a commit (or a tag if it points to a commit)
so that these cases look up .gitattributes at proper paths.
.gitattributes lookup is disabled in all other cases.
Initial-work-by: Jeff King <peff@peff.net>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When running with color disabled (e.g. under prove to produce TAP
output), say_color() helper function is defined to use echo to show
the message. With a message that ends with "\c", echo is allowed to
interpret it as "Do not end the line with LF".
Use printf "%s\n" to emit the message literally.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This test script uses "svn cp" to create a branch with an @-sign in
its name:
svn cp "pr ject/trunk" "pr ject/branches/not-a@{0}reflog"
That sets up for later tests that fetch the branch and check that git
svn mangles the refname appropriately.
Unfortunately, modern svn versions interpret path arguments with an
@-sign as an example of path@revision syntax (which pegs a path to a
particular revision) and truncate the path or error out with message
"svn: E205000: Syntax error parsing peg revision '{0}reflog'".
When using subversion 1.6.x, escaping the @ sign as %40 avoids trouble
(see 08fd28bb, 2010-07-08). Newer versions are stricter:
$ svn cp "$repo/pr ject/trunk" "$repo/pr ject/branches/not-a%40{reflog}"
svn: E205000: Syntax error parsing peg revision '%7B0%7Dreflog'
The recommended method for escaping a literal @ sign in a path passed
to subversion is to add an empty peg revision at the end of the path
("branches/not-a@{0}reflog@"). Do that.
Pre-1.6.12 versions of Subversion probably treat the trailing @ as
another literal @-sign (svn issue 3651). Luckily ever since
v1.8.0-rc0~155^2~7 (t9118: workaround inconsistency between SVN
versions, 2012-07-28) the test can survive that.
Tested with Debian Subversion 1.6.12dfsg-6 and 1.7.5-1 and r1395837
of Subversion trunk (1.8.x).
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Tested-by: Michael J Gruber <git@drmicha.warpmail.net>
Signed-off-by: Eric Wong <normalperson@yhbt.net>
The command line option parser for "git log -F -E --grep='<ere>'"
did not flip the "fixed" bit, violating the general "last option
wins" principle among conflicting options.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
These tests just want a bit-for-bit identical copy; they do not need
even -H (there is no symbolic link involved) nor -p (there is no
funny permission or ownership issues involved).
Just use "cp -R" instead.
Signed-off-by: Ben Walton <bdwalton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The fsck test assumed too much on what kind of error it will
detect. The only important thing is the inconsistency is detected
as an error.
* jc/maint-t1450-fsck-order-fix:
t1450: the order the objects are checked is undefined
"git receive-pack" (the counterpart to "git push") did not give
progress output while processing objects it received to the puser
when run over the smart-http protocol.
* jk/receive-pack-unpack-error-to-pusher:
receive-pack: drop "n/a" on unpacker errors
receive-pack: send pack-processing stderr over sideband
receive-pack: redirect unpack-objects stdout to /dev/null
A repository created with "git clone --single" had its fetch
refspecs set up just like a clone without "--single", leading the
subsequent "git fetch" to slurp all the other branches, defeating
the whole point of specifying "only this branch".
* rt/maint-clone-single:
clone --single: limit the fetch refspec to fetched branch
When using the {word,[...]} style of configuration for tags and branches,
it appears the intent is to only match whole path parts, since the words
in the {} pattern are meta-character quoted.
When the pattern word appears in the beginning or middle of the url,
it's matched completely, since the left side, pattern, and (non-empty)
right side are joined together with path separators.
However, when the pattern word appears at the end of the URL, the
right side is an empty pattern, and the resulting regex matches
more than just the specified pattern.
For example, if you specify something along the lines of
branches = branches/project/{release_1,release_2}
and your repository also contains "branches/project/release_1_2", you
will also get the release_1_2 branch. By restricting the match regex
with anchors, this is avoided.
Signed-off-by: Ammon Riley <ammon.riley@gmail.com>
Signed-off-by: Eric Wong <normalperson@yhbt.net>