20 Commits

Author SHA1 Message Date
Linus Torvalds
b59d398bea Do a better job at guessing unknown character sets
At least in the kernel development community, we're generally slowly
converting to UTF-8 everywhere, and the old default of Latin1 in emails is
being supplanted by UTF-8, and it doesn't necessarily show up as such in
the mail headers (because, quite frankly, when people send patches
around, they want the email client to do as little as humanly possible
about the patch).

Despite that, it's often the case that email addresses etc still have
Latin1, so I've seen emails where this is a mixed bag, with Signed-off
parts being copied from email (and containing Latin1 characters), and the
rest of the email being a patch in UTF-8.

So this suggests a very natural change: if the target character set is
utf-8 (the default), and if the source already looks like utf-8, just
assume that it doesn't need any conversion at all.

Only assume that it needs conversion if it isn't already valid utf-8, in
which case we (for historical reasons) will assume it's Latin1.

Basically no really _valid_ latin1 will ever look like utf-8, so while
this changes our historical behaviour, it doesn't do so in practice, and
makes the default behaviour saner for the case where the input was already
in proper format.
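
The guess described above can be sketched roughly as follows. This is an illustration only, not the code from this commit: looks_like_utf8() and guess_charset() are hypothetical names, and the check is simplified (it does not reject every overlong or surrogate encoding).

```c
#include <stddef.h>

/* Hypothetical helper: returns 1 if buf[0..len) is structurally valid UTF-8. */
static int looks_like_utf8(const unsigned char *buf, size_t len)
{
    size_t i = 0;

    while (i < len) {
        unsigned char c = buf[i++];
        int extra;

        if (c < 0x80)
            continue;               /* plain ASCII */
        else if (c >= 0xc2 && c <= 0xdf)
            extra = 1;              /* lead byte of a 2-byte sequence */
        else if (c >= 0xe0 && c <= 0xef)
            extra = 2;              /* lead byte of a 3-byte sequence */
        else if (c >= 0xf0 && c <= 0xf4)
            extra = 3;              /* lead byte of a 4-byte sequence */
        else
            return 0;               /* stray continuation or invalid lead byte */

        while (extra--) {
            if (i >= len || (buf[i] & 0xc0) != 0x80)
                return 0;           /* missing or malformed continuation byte */
            i++;
        }
    }
    return 1;
}

/* Hypothetical policy: charset to assume when the mail does not name one. */
static const char *guess_charset(const unsigned char *buf, size_t len)
{
    return looks_like_utf8(buf, len) ? "utf-8" : "latin1";
}
```

A Latin1 byte such as 0xf6 is never valid on its own in UTF-8, which is why real Latin1 text with accented characters almost never passes the check.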

We could do a more fancy guess, of course, but this correctly handled a
series of patches I just got from Andrew that had a mixture of Latin1 and
UTF-8 (in different emails, but without any character set indication).

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-07-18 17:01:10 -07:00
Junio C Hamano
fcd056a6d2 More missing static
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-06-08 02:37:19 -07:00
Don Zickus
86747c132b git-mailinfo fixes for patch munging
Don't translate the patch to UTF-8, instead preserve the data as
is.  This also reverts a test case that was included in the
original patch series.

Also allow overwriting the authorship and title information we
gather from RFC2822 mail headers with additional in-body
headers, which was pointed out by Linus.

Signed-off-by: Don Zickus <dzickus@redhat.com>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-03-31 00:59:19 -07:00
Don Zickus
f0658cf210 restrict the patch filtering
I have come across many emails that use long strings of '-'s as separators
for ideas.  This patch limits the separator to exactly 3 '-', with the
intent that long strings of '-'s will stay in the commit msg and not in the
patch file.
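
A minimal sketch of such a restricted check, assuming the separator is a line of exactly three dashes followed only by whitespace. is_patch_separator() is a hypothetical name, and the real patch-break logic also has to recognize diff headers, which this sketch leaves out.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical check: "---" followed by nothing but whitespace. */
static int is_patch_separator(const char *line)
{
    if (strncmp(line, "---", 3) != 0)
        return 0;
    for (line += 3; *line; line++) {
        if (!isspace((unsigned char)*line))
            return 0;   /* e.g. a fourth '-' means this is not the separator */
    }
    return 1;
}
```

With this rule a line such as "-----------" stays in the commit message instead of starting the patch.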

Signed-off-by: Don Zickus <dzickus@redhat.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-03-12 23:33:41 -07:00
Don Zickus
87ab799234 builtin-mailinfo.c infrastructure changes
I am working on a project that required parsing through regular
mboxes that didn't necessarily have patches embedded in them.  I
started by creating my own modified copy of git-am and working
from there.  Very quickly, I noticed git-mailinfo wasn't able to
handle a big chunk of my email.

After hacking up numerous solutions and running into more
limitations, I decided it was just easier to rewrite a big chunk
of it.  The following patch has a bunch of fixes and features
that I needed in order for me to do what I wanted.

Note: I didn't follow any email rfc papers but I don't think
any of the changes I did required much knowledge (besides the
boundary stuff).

List of major changes/fixes:
- can't create empty patch files fix
- empty patch files don't fail, this failure will come inside git-am
- multipart boundaries are now handled
- only output inbody headers if a patch exists otherwise assume those
headers are part of the reply and instead output the original headers
- decode and filter base64 patches correctly
- various other accidental fixes
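
As an illustration of the multipart item in the list above, boundary lines could be classified along these lines. This is a sketch under assumptions, not the code from this patch: classify_boundary() and the enum are hypothetical names. The only fact relied on is the MIME convention that a part delimiter is "--" followed by the boundary string, and that the final delimiter carries an extra trailing "--".

```c
#include <ctype.h>
#include <string.h>

enum boundary_kind { NOT_BOUNDARY, PART_BOUNDARY, FINAL_BOUNDARY };

/* Hypothetical classifier for one line of a multipart message body. */
static enum boundary_kind classify_boundary(const char *line, const char *boundary)
{
    size_t n = strlen(boundary);
    enum boundary_kind kind = PART_BOUNDARY;

    if (strncmp(line, "--", 2) != 0 || strncmp(line + 2, boundary, n) != 0)
        return NOT_BOUNDARY;
    line += 2 + n;
    if (strncmp(line, "--", 2) == 0) {
        kind = FINAL_BOUNDARY;      /* closing delimiter: "--boundary--" */
        line += 2;
    }
    while (*line) {
        if (!isspace((unsigned char)*line++))
            return NOT_BOUNDARY;    /* trailing junk: treat as ordinary text */
    }
    return kind;
}
```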

I believe I didn't break any existing functionality or
compatibility (other than what I describe above, which is really
only the empty patch file).

I tested this through various mailing list archives and
everything seemed to parse correctly (a couple thousand emails).

[jc: squashed in another patch from Don's five patch series to
 fix the test case, as this patch exposes the bug in the test.]

Signed-off-by: Don Zickus <dzickus@redhat.com>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-03-12 23:33:41 -07:00
Shawn O. Pearce
3a55602eec General const correctness fixes
We shouldn't attempt to assign constant strings into char*, as the
string is not writable at runtime.  Likewise we should always be
treating unsigned values as unsigned values, not as signed values.

Most of these are very straightforward.  The only exception is the
(unnecessary) xstrdup/free in builtin-branch.c for the detached
head case.  Since this is a user-level interactive type program
and that particular code path is executed no more than once, I feel
that the extra xstrdup call is well worth the easy elimination of
this warning.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-03-07 10:47:10 -08:00
Junio C Hamano
4e4b55dd0f Merge branch 'maint'
* maint:
  git-apply: do not fix whitespaces on context lines.
  diff --cc: integer overflow given a 2GB-or-larger file
  mailinfo: do not get confused with logical lines that are too long.
2007-02-27 01:33:52 -08:00
Linus Torvalds
34fc5cefa7 mailinfo: do not get confused with logical lines that are too long.
It basically considers all the continuation lines to be lines of their
own, and if the total line is bigger than what we can fit in it, we just
truncate the result rather than stop in the middle and then get confused
when we try to parse the "next" line (which is just the remainder of the
first line).
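
The truncate-but-keep-consuming idea can be sketched like this. It is a simplified illustration that reads from an in-memory string rather than a file, and read_logical_line() is a hypothetical name, not a function from this commit.

```c
#include <stddef.h>

/*
 * Hypothetical reader: copies one logical header line from *src into buf
 * (at most size - 1 bytes plus a NUL; size must be at least 1), folding
 * continuation lines that start with a space or tab.  Input that does not
 * fit is consumed and dropped, so *src always ends up at the start of the
 * next logical line.
 */
static size_t read_logical_line(const char **src, char *buf, size_t size)
{
    const char *p = *src;
    size_t len = 0;

    while (*p) {
        char c = *p++;

        if (c == '\n') {
            if (*p != ' ' && *p != '\t')
                break;          /* next physical line starts a new header */
            continue;           /* continuation: drop the newline, keep going */
        }
        if (len + 1 < size)
            buf[len++] = c;     /* room left: keep the byte */
        /* otherwise truncate silently, but keep consuming input */
    }
    buf[len] = '\0';
    *src = p;
    return len;
}
```

Because the overlong remainder is swallowed rather than left in the input, the caller never mistakes it for the "next" line.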

[jc: added test, and tightened boundary a bit per list discussion.]

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-02-27 01:02:32 -08:00
Junio C Hamano
cc44c7655f Mechanical conversion to use prefixcmp()
This mechanically converts strncmp() to use prefixcmp(), but only when
the parameters match specific patterns, so that they can be verified
easily.  Leftover from this will be fixed in a separate step, including
idiotic conversions like

    if (!strncmp("foo", arg, 3))

  =>

    if (!(-prefixcmp(arg, "foo")))

This was done by using this script in px.perl

   #!/usr/bin/perl -i.bak -p
   if (/strncmp\(([^,]+), "([^\\"]*)", (\d+)\)/ && (length($2) == $3)) {
           s|strncmp\(([^,]+), "([^\\"]*)", (\d+)\)|prefixcmp($1, "$2")|;
   }
   if (/strncmp\("([^\\"]*)", ([^,]+), (\d+)\)/ && (length($1) == $3)) {
           s|strncmp\("([^\\"]*)", ([^,]+), (\d+)\)|(-prefixcmp($2, "$1"))|;
   }

and running:

   $ git grep -l strncmp -- '*.c' | xargs perl px.perl
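
For readers without the helper at hand, the conversion relies on semantics that can be sketched as below. The definition is an assumption made for illustration (compare only the first strlen(prefix) bytes), not a quote of the actual header.

```c
#include <string.h>

/* Assumed definition: compare only the first strlen(prefix) bytes of str. */
static int prefixcmp(const char *str, const char *prefix)
{
    return strncmp(str, prefix, strlen(prefix));
}
```

Swapping the two arguments of strncmp() flips the sign of its result, which is why the script emits the unary minus for the "literal first" pattern. Under a logical not the minus changes nothing, hence the leftover cleanup mentioned above.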

Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-02-20 22:03:15 -08:00
Junio C Hamano
bb1091a475 -u is now default for 'git-mailinfo'.
Originally from David Woodhouse, but also adjusts the callers of
mailinfo to the new default.

Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-01-09 21:32:49 -08:00
Junio C Hamano
d2c11a38c4 UTF-8: introduce i18n.logoutputencoding.
It is plausible for somebody to want to view the commit log in a
different encoding from i18n.commitencoding -- the project's
policy may be UTF-8 and the user may be using a commit message
hook to run iconv to conform to that policy (and either not have
i18n.commitencoding to default to UTF-8 or have it explicitly
set to UTF-8).  Even then, Latin-1 may be more convenient for
the usual pager and the terminal the user uses.

The new variable i18n.logoutputencoding is used in preference to
i18n.commitencoding to decide what encoding to recode the log
output in when git-log and friends format the commit log message.
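
The order of preference can be sketched as follows. pick_log_encoding() and its parameters are hypothetical stand-ins for the two configuration values (NULL meaning "not set"), and the UTF-8 fallback is an assumption based on the default mentioned above.

```c
#include <stddef.h>

/* Hypothetical selection of the encoding used for log output. */
static const char *pick_log_encoding(const char *log_output_encoding,
                                     const char *commit_encoding)
{
    if (log_output_encoding)
        return log_output_encoding;     /* i18n.logoutputencoding wins */
    if (commit_encoding)
        return commit_encoding;         /* else fall back to i18n.commitencoding */
    return "utf-8";                     /* assumed default when neither is set */
}
```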

Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-12-27 16:41:33 -08:00
Junio C Hamano
b45974a655 Move encoding conversion routine out of mailinfo to utf8.c
This moves the body of convert_to_utf8() routine used in mailinfo
to the utf8.c i18n library.

Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-12-26 00:22:39 -08:00
Junio C Hamano
85023577a8 simplify inclusion of system header files.
This is a mechanical clean-up of the way *.c files include
system header files.

 (1) sources under compat/, platform sha-1 implementations, and
     xdelta code are exempt from the following rules;

 (2) the first #include must be "git-compat-util.h" or one of
     our own header files that includes it first (e.g. config.h,
     builtin.h, pkt-line.h);

 (3) system headers that are included in "git-compat-util.h"
     need not be included in individual C source files.

 (4) "git-compat-util.h" does not have to include subsystem
     specific header files (e.g. expat.h).

Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-12-20 09:51:35 -08:00
Junio C Hamano
e49521b56d Make hexval() available to others.
builtin-mailinfo.c has its own hexval implementation but it can
share the table-lookup one recently implemented in sha1_file.c.
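
A table-lookup hexval can be sketched as below. The names hex_table, init_hex_table() and hexval_lookup() are hypothetical and the lazy initialization is a choice made for brevity; the shared implementation may be laid out differently.

```c
/* Hypothetical table: value of each hex digit, -1 for everything else. */
static signed char hex_table[256];
static int hex_table_ready;

static void init_hex_table(void)
{
    int i;

    for (i = 0; i < 256; i++)
        hex_table[i] = -1;
    for (i = 0; i < 10; i++)
        hex_table['0' + i] = (signed char)i;
    for (i = 0; i < 6; i++) {
        hex_table['a' + i] = (signed char)(10 + i);
        hex_table['A' + i] = (signed char)(10 + i);
    }
    hex_table_ready = 1;
}

/* Returns 0..15 for a hex digit, or -1 if c is not one. */
static int hexval_lookup(unsigned char c)
{
    if (!hex_table_ready)
        init_hex_table();
    return hex_table[c];
}
```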

Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-09-20 16:08:14 -07:00
David Rientjes
96f1e58f52 remove unnecessary initializations
[jc: I needed to hand merge the changes to the updated codebase,
 so the result needs to be checked.]

Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-08-15 21:22:20 -07:00
Linus Torvalds
a633fca0c0 Call setup_git_directory() much earlier
This changes the calling convention of built-in commands and
passes the "prefix" (i.e. pathname of $PWD relative to the
project root level) down to them.

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-07-29 01:34:07 -07:00
Michael S. Tsirkin
c2c487cf3a mailinfo: accept >From in message header
Mail I get sometimes has multiple From lines, like this:

    From Majordomo@vger.kernel.org  Thu Jul 27 16:39:36 2006
    >From mtsirkin  Thu Jul 27 16:39:36 2006
    Received: from yok.mtl.com [10.0.8.11]
    ...

which confuses git-mailinfo since that does not recognize >From
as a valid header line.

This patch makes it recognize >From XXX as a valid header line.
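
The relaxed check amounts to something like the sketch below. is_from_line() is a hypothetical name used for illustration, not the function touched by this patch.

```c
#include <string.h>

/* Hypothetical check: accept "From " and the escaped form ">From ". */
static int is_from_line(const char *line)
{
    if (*line == '>')
        line++;                 /* skip the mbox escape character */
    return strncmp(line, "From ", 5) == 0;
}
```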

Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-07-27 19:33:06 -07:00
Junio C Hamano
b75bf2c3f0 mailinfo: assume input is latin-1 on the header as we do for the body
When the input mbox does not identify what encoding it is in,
and already has RFC2047 stripped away, we cannot tell what
encoding the header text is in.  For body text, when the message
does not say what charset it is in, we fall back to assuming
latin-1 input when converting to utf8.  This should be done
consistently for the header as well.
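
For illustration, the latin-1 fallback boils down to a conversion like the one below. latin1_to_utf8() is a hypothetical helper written for this sketch; the actual code may go through a general conversion routine instead.

```c
#include <stddef.h>

/*
 * Hypothetical converter: out must have room for 2 * len + 1 bytes.
 * Returns the number of bytes written, not counting the terminating NUL.
 */
static size_t latin1_to_utf8(const unsigned char *in, size_t len,
                             unsigned char *out)
{
    size_t i, n = 0;

    for (i = 0; i < len; i++) {
        unsigned char c = in[i];

        if (c < 0x80) {
            out[n++] = c;                               /* ASCII is unchanged */
        } else {
            out[n++] = (unsigned char)(0xc0 | (c >> 6));    /* lead byte */
            out[n++] = (unsigned char)(0x80 | (c & 0x3f));  /* continuation */
        }
    }
    out[n] = '\0';
    return n;
}
```

Every latin-1 byte maps directly to the Unicode code point of the same value, so each non-ASCII byte becomes exactly two UTF-8 bytes.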

Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-07-06 00:10:49 -07:00
Timo Hirvonen
554fe20d80 Make some strings const
Signed-off-by: Timo Hirvonen <tihirvon@gmail.com>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-06-28 03:24:37 -07:00
Lukas Sandström
34488e3c37 Make git-mailinfo a builtin
[jc: with a bit of constness tightening]

Signed-off-by: Lukas Sandström <lukass@etek.chalmers.se>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-06-18 22:10:28 -07:00