contrib: update stats/mailmap script

This version changes quite a few things:

  1. The original parsed the mailmap file itself, and it did
     it wrong (it did not understand entries with an extra
     email key).

     Instead, this version uses git's "%aE" and "%aN"
     formats to have git perform the mapping, meaning we do
     not have to read .mailmap at all, but still operate on
     the current state that git sees (and it also works
     properly from subdirs).

  2. The original would find multiple names for an email,
     but not the other way around.

     This version can do either or both. If we find multiple
     emails for a name, the resolution is less obvious than
     the other way around. However, it can still be a
     starting point for a human to investigate.

  3. The original would order only by count, not by recency.

     This version can do either. Combined with showing the
     counts, it can be easier to decide how to resolve.

  4. This version shows similar entries in a blank-delimited
     stanza, which makes it more clear which options you are
     picking from.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-12-12 12:41:41 +01:00
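As an illustration of points 2 to 4 above, here is a minimal sketch (not the script itself) that runs the same kind of grouping over a few hard-coded records instead of real "git log" output. The sample names and addresses, the @records array, and the group() helper are all made up for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample records in the "%at <%aE> %aN" layout;
# the names and addresses are invented for illustration.
my @records = (
	'1300000000 <alice@example.com> Alice Example',
	'1310000000 <alice@example.com> Alice Example',
	'1320000000 <alice@example.com> A. Example',
	'1330000000 <bob@example.com> Bob Example',
);

# group($order_by, @lines) returns one blank-delimited stanza for
# every email seen with more than one name. $order_by is 'count'
# (number of commits) or 'stamp' (most recent author timestamp).
sub group {
	my ($order_by, @lines) = @_;
	my %by_email;
	foreach my $line (@lines) {
		# Skip anything that does not look like a record.
		my ($t, $e, $n) = $line =~ /(\S+) <(\S+)> (.*)/ or next;
		my $entry = $by_email{$e}{$n} ||= { count => 0, stamp => 0 };
		$entry->{count}++;
		$entry->{stamp} = $t if $t > $entry->{stamp};
	}
	my $out = '';
	foreach my $e (sort keys %by_email) {
		my $names = $by_email{$e};
		# Only emails with several names are interesting.
		next unless keys(%$names) > 1;
		foreach my $n (sort {
			$names->{$b}{$order_by} <=> $names->{$a}{$order_by}
		} keys %$names) {
			$out .= "$n <$e> ($names->{$n}{$order_by})\n";
		}
		$out .= "\n";
	}
	return $out;
}

print group('count', @records);
print group('stamp', @records);
```

Ordered by count, the sketch lists "Alice Example" (2) ahead of "A. Example" (1); ordered by stamp, "A. Example" comes first because its single commit is the most recent. The bob@example.com record is not shown in either case, since that address has only one name.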
#!/usr/bin/perl

use warnings 'all';
use strict;
use Getopt::Long;

my $match_emails;
my $match_names;
my $order_by = 'count';
Getopt::Long::Configure(qw(bundling));
GetOptions(
	'emails|e!' => \$match_emails,
	'names|n!' => \$match_names,
	'count|c' => sub { $order_by = 'count' },
	'time|t' => sub { $order_by = 'stamp' },
) or exit 1;
# Default to matching by email when --names was not requested.
$match_emails = 1 unless $match_names;

my $email = {};
my $name = {};

open(my $fh, '-|', "git log --format='%at <%aE> %aN'")
	or die "unable to run git log: $!";
while(<$fh>) {
	# Skip any line that does not parse (e.g. an empty email).
	my ($t, $e, $n) = /(\S+) <(\S+)> (.*)/ or next;
	mark($email, $e, $n, $t);
	mark($name, $n, $e, $t);
}
close($fh);

if ($match_emails) {
	foreach my $e (dups($email)) {
		foreach my $n (vals($email->{$e})) {
			show($n, $e, $email->{$e}->{$n});
		}
		print "\n";
	}
}
if ($match_names) {
	foreach my $n (dups($name)) {
		foreach my $e (vals($name->{$n})) {
			show($n, $e, $name->{$n}->{$e});
		}
		print "\n";
	}
}
exit 0;

# Record one (key, value) sighting: bump its commit count and
# remember the most recent author timestamp.
sub mark {
	my ($h, $k, $v, $t) = @_;
	my $e = $h->{$k}->{$v} ||= { count => 0, stamp => 0 };
	$e->{count}++;
	$e->{stamp} = $t unless $t < $e->{stamp};
}

# Return the keys that map to more than one value.
sub dups {
	my $h = shift;
	return grep { keys(%{$h->{$_}}) > 1 } keys(%$h);
}

# Return the values for one key, ordered by descending count or
# timestamp depending on $order_by.
sub vals {
	my $h = shift;
	return sort {
		$h->{$b}->{$order_by} <=> $h->{$a}->{$order_by}
	} keys(%$h);
}

# Print one candidate as "Name <email> (count or timestamp)".
sub show {
	my ($n, $e, $h) = @_;
	print "$n <$e> ($h->{$order_by})\n";
}