currently for cases like
From: A U Thor <a.u.thor@example.com> (Comment)
mailinfo extracts the following 'Author:' field:
Author: A U Thor (Comment)
^^
which has two extra spaces left in there after removed email part.
I think this is wrong so here is a fix.
Signed-off-by: Kirill Smelkov <kirr@landau.phys.spbu.ru>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Also as suggested by Junio, in order to try to catch other MIME
problems, test cases from the "8. Examples" section of RFC2047 are added
to t5100 testsuite as well.
Signed-off-by: Kirill Smelkov <kirr@landau.phys.spbu.ru>
At present we do headers unfolding (see RFC822 3.1.1. LONG HEADER FIELDS) for
all fields except 'From' (always) and 'Subject' (when keep_subject is set)
Not unfolding 'From' is a bug -- see above-mentioned RFC link.
Signed-off-by: Kirill Smelkov <kirr@landau.phys.spbu.ru>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When native language (RU) is in use, subject header usually contains several
parts, e.g.
Subject: [Navy-patches] [PATCH]
=?utf-8?b?0JjQt9C80LXQvdGR0L0g0YHQv9C40YHQvtC6INC/0LA=?=
=?utf-8?b?0LrQtdGC0L7QsiDQvdC10L7QsdGF0L7QtNC40LzRi9GFINC00LvRjyA=?=
=?utf-8?b?0YHQsdC+0YDQutC4?=
This exposes several bugs in builtin-mailinfo.c:
1. decode_b_segment: do not append explicit NUL -- explicit NUL was preventing
correct header construction on parts concatenation via strbuf_addbuf in
decode_header_bq. Fixes:
-Subject: Изменён список пакетов необходимых для сборки
+Subject: Изменён список па
Then
2. Do not emit '\n' between "encoded-word" where RFC2046 says that linear
white space between them are ignored when displaying. Fixes:
-Subject: Изменён список пакетов необходимых для сборки
+Subject: Изменён список па кетов необходимых для сборки
Signed-off-by: Kirill Smelkov <kirr@mns.spb.ru>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In handle_from, we calculate the end boundary of a section
to remove from a strbuf using strcspn like this:
el = strcspn(buf, set_of_end_boundaries);
strbuf_remove(&sb, start, el + 1);
This works fine if "el" is the offset of the boundary
character, meaning we remove up to and including that
character. But if the end boundary didn't match (that is, we
hit the end of the string as the boundary instead) then we
want just "el". Asking for "el+1" caught an out-of-bounds
assertion in the strbuf library.
This manifested itself when we got a 'From' header that had
just an email address with nothing else in it (the end of
the string was the end of the address, rather than, e.g., a
trailing '>' character), causing git-mailinfo to barf.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Recent changes to is_multipart_boundary() caused git-mailinfo to segfault.
The reason was after handling the end of the boundary the code tried to look
for another boundary. Because the boundary list was empty, dereferencing
the pointer to the top of the boundary caused the program to go boom.
The fix is to check to see if the list is empty and if so go on its merry
way instead of looking for another boundary.
I also fixed a couple of increments and decrements that didn't look correct
relating to content_top.
The boundary test case was updated to catch future problems like this again.
Signed-off-by: Don Zickus <dzickus@redhat.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
After finding a MIME multi-part message boundary line, the handle_body()
function is supposed to first flush any accumulated contents from the
previous part to the output stream. However, the code mistakenly output
the boundary line it found.
The old code that used one global, fixed-length buffer line[] used an
alternate static buffer newline[] for keeping track of this accumulated
contents and flushed newline[] upon seeing the boundary; when 3b6121f
(git-mailinfo: use strbuf's instead of fixed buffers, 2008-07-13)
converted a fixed-length buffer in this program to use strbuf,these two
buffers were converted to "line" and "prev" (the latter of which now has a
much more sensible name) strbufs, but the code mistakenly flushed "line"
(which contains the boundary we have just found), instead of "prev".
This resulted in the first boundary to be output in front of the first
line of the message.
The rewritten implementation of handle_boundary() lost the terminating
newline; this would then result in the second line of the message to be
stuck with the first line.
The is_multipart_boundary() was designed to catch both the internal
boundary and the terminating one (the one with trailing "--"); this also
was broken with the rewrite, and the code in the handle_boundary() to
handle the terminating boundary was never triggered.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"Subject: " isn't in the static array "header", and thus
memcmp("Subject:", header[i], 7) will never match.
Even if it did so, hdr_data[] may not have been allocated if there weren't
a "Subject: " in-body when we process "[PATCH]" in the affected codepath.
Signed-off-by: Lukas Sandström <lukass@etek.chalmers.se>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There are broken filesystems that cannot have a file whose name is "nul"
anywhere on it. Rename the test file to make ourselves more portable.
Noticed by Mark Levedahl.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The function fgets() has a big problem with NUL characters: it reads
them, but nobody will know if the NUL comes from the file stream, or
was appended at the end of the line.
So implement a custom read_line_with_nul() function.
Noticed by Tommy Thorn.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The function is intended to be fed one logical line at a time to
inspect, but a QP encoded raw input line can have more than one
lines, just like BASE64 encoded one.
Quoting LF as =0A may be unusual but RFC2045 allows it.
The issue was noticed and fixed by Jay Soffian. JC added a test
to protect the fix from regressing later.
Signed-off-by: Jay Soffian <jaysoffian@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Don't translate the patch to UTF-8, instead preserve the data as
is. This also reverts a test case that was included in the
original patch series.
Also allow overwriting the authorship and title information we
gather from RFC2822 mail headers with additional in-body
headers, which was pointed out by Linus.
Signed-off-by: Don Zickus <dzickus@redhat.com>
Signed-off-by: Junio C Hamano <junkio@cox.net>
I am working on a project that required parsing through regular
mboxes that didn't necessarily have patches embedded in them. I
started by creating my own modified copy of git-am and working
from there. Very quickly, I noticed git-mailinfo wasn't able to
handle a big chunk of my email.
After hacking up numerous solutions and running into more
limitations, I decided it was just easier to rewrite a big chunk
of it. The following patch has a bunch of fixes and features
that I needed in order for me do what I wanted.
Note: I'm didn't follow any email rfc papers but I don't think
any of the changes I did required much knowledge (besides the
boundary stuff).
List of major changes/fixes:
- can't create empty patch files fix
- empty patch files don't fail, this failure will come inside git-am
- multipart boundaries are now handled
- only output inbody headers if a patch exists otherwise assume those
headers are part of the reply and instead output the original headers
- decode and filter base64 patches correctly
- various other accidental fixes
I believe I didn't break any existing functionality or
compatibility (other than what I describe above, which is really
only the empty patch file).
I tested this through various mailing list archives and
everything seemed to parse correctly (a couple thousand emails).
[jc: squashed in another patch from Don's five patch series to
fix the test case, as this patch exposes the bug in the test.]
Signed-off-by: Don Zickus <dzickus@redhat.com>
Signed-off-by: Junio C Hamano <junkio@cox.net>
It basically considers all the continuation lines to be lines of their
own, and if the total line is bigger than what we can fit in it, we just
truncate the result rather than stop in the middle and then get confused
when we try to parse the "next" line (which is just the remainder of the
first line).
[jc: added test, and tightened boundary a bit per list discussion.]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
Currently the test passes with 1.3.3 but not with the tip of
"master". This is to verify the fixes from Eric W Biedermann.
Signed-off-by: Junio C Hamano <junkio@cox.net>