pretty-printing body of the commit that is stored in non UTF-8
encoding did not work well. The early part of this series fixes
it. And then it adds %C(auto) specifier that turns the coloring on
when we are emitting to the terminal, and adds column-aligning
format directives.
* nd/pretty-formats:
pretty: support %>> that steal trailing spaces
pretty: support truncating in %>, %< and %><
pretty: support padding placeholders, %< %> and %><
pretty: add %C(auto) for auto-coloring
pretty: split color parsing into a separate function
pretty: two phase conversion for non utf-8 commits
utf8.c: add reencode_string_len() that can handle NULs in string
utf8.c: add utf8_strnwidth() with the ability to skip ansi sequences
utf8.c: move display_mode_esc_sequence_len() for use by other functions
pretty: share code between format_decoration and show_decorations
pretty-formats.txt: wrap long lines
pretty: get the correct encoding for --pretty:format=%e
pretty: save commit encoding from logmsg_reencode if the caller needs it
A fix to a long-standing issue in the command line parser for
revisions, which was triggered by mv/sequence-pick-error-diag topic.
* tr/copy-revisions-from-stdin:
read_revisions_from_stdin: make copies for handle_revision_arg
The commit encoding is parsed by logmsg_reencode, there's no need for
the caller to re-parse it again. The reencoded message now has the new
encoding, not the original one. The caller would need to read commit
object again before parsing.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
read_revisions_from_stdin() has passed pointers to its read buffer
down to handle_revision_arg() since its inception way back in 42cabc3
(Teach rev-list an option to read revs from the standard input.,
2006-09-05). Even back then, this was a bug: through
add_pending_object, the argument was recorded in the object_array's
'name' field.
Fix it by making a copy whenever read_revisions_from_stdin() passes an
argument down the callchain. The other caller runs handle_revision_arg()
on argv[], where it would be redundant to make a copy.
Signed-off-by: Thomas Rast <trast@inf.ethz.ch>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Kevin Bracey reports that the change regresses a case shown in the
user manual.
Trading one fix with another breakage is not worth it. Just keep
the test to document the existing breakage, and revert the change
for now.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Allow the revision "slop" code to look deeper while commits with
exactly the same timestamps come next to each other (which can
often happen after a large "am" and "rebase" session).
* kk/revwalk-slop-too-many-commit-within-a-second:
Fix revision walk for commits with the same dates
The --simplify-merges logic did not cull irrelevant parents from a
merge that is otherwise not interesting with respect to the paths
we are following.
This touches a fairly core part of the revision traversal
infrastructure; even though I think this change is correct, please
report immediately if you find any unintended side effect.
* jc/remove-treesame-parent-in-simplify-merges:
simplify-merges: drop merge from irrelevant side branch
Logic in still_interesting function allows to stop the commits
traversing if the oldest processed commit is not older then the
youngest commit on the list to process and the list contains only
commits marked as not interesting ones. It can be premature when dealing
with a set of coequal commits. For example git rev-list A^! --not B
provides wrong answer if all commits in the range A..B had the same
commit time and there are more then 7 of them.
To fix this problem the relevant part of the logic in still_interesting
is changed to: the walk can be stopped if the oldest processed commit is
younger then the youngest commit on the list to processed.
Signed-off-by: Kacper Kornet <draenog@pld-linux.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If you run "git log --grep=foo", we will run your regex on
the literal bytes of the commit message. This can provide
confusing results if the commit message is not in the same
encoding as your grep expression (or worse, you have commits
in multiple encodings, in which case your regex would need
to be written to match either encoding). On top of this, we
might also be grepping in the commit's notes, which are
already re-encoded, potentially leading to grepping in a
buffer with mixed encodings concatenated. This is insanity,
but most people never noticed, because their terminal and
their commit encodings all match.
Instead, let's massage the to-be-grepped commit into a
standardized encoding. There is not much point in adding a
flag for "this is the encoding I expect my grep pattern to
match"; the only sane choice is for it to use the log output
encoding. That is presumably what the user's terminal is
using, and it means that the patterns found by the grep will
match the output produced by git.
As a bonus, this fixes a potential segfault in commit_match
when commit->buffer is NULL, as we now build on logmsg_reencode,
which handles reading the commit buffer from disk if
necessary. The segfault can be triggered with:
git commit -m 'text1' --allow-empty
git commit -m 'text2' --allow-empty
git log --graph --no-walk --grep 'text2'
which arguably does not make any sense (--graph inherently
wants a connected history, and by --no-walk the command line
is telling us to show discrete points in history without
connectivity), and we probably should forbid the
combination, but that is a separate issue.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The merge simplification rule stated in 6546b59 (revision traversal:
show full history with merge simplification, 2008-07-31) still
treated merge commits too specially. Namely, in a history with this
shape:
---o---o---M
/
x---x---x
where three 'x' were on a history completely unrelated to the main
history 'o' and do not touch any of the paths we are following, we
still said that after simplifying all of the parents of M, 'x'
(which is the leftmost 'x' that rightmost 'x simplifies down to) and
'o' (which would be the last commit on the main history that touches
the paths we are following) are independent from each other, and
both need to be kept.
That is incorrect; when the side branch 'x' never touches the paths,
it should be removed to allow M to simplify down to the last commit
on the main history that touches the paths.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When we taught the commit_match() mechanism to pay attention to the
new --use-mailmap option, we started to unconditionally copy the
commit object to a temporary buffer, just in case we need the author
and committer lines updated via the mailmap mechanism, and rewrite
author and committer using the mailmap.
It turns out that this has a rather unpleasant performance
implications. In the linux kernel repository, running
$ git log --author='Junio C Hamano' --pretty=short >/dev/null
under /usr/bin/time, with and without --use-mailmap (the .mailmap
file is 118 entries long, the particular author does not appear in
it), cost (with warm cache):
[without --use-mailmap]
5.42user 0.26system 0:05.70elapsed 99%CPU (0avgtext+0avgdata 2005936maxresident)k
0inputs+0outputs (0major+137669minor)pagefaults 0swaps
[with --use-mailmap]
6.47user 0.30system 0:06.78elapsed 99%CPU (0avgtext+0avgdata 2006288maxresident)k
0inputs+0outputs (0major+137692minor)pagefaults 0swaps
which incurs about 20% overhead. The command is doing extra work,
so the extra cost may be justified.
But it is inexcusable to pay the cost when we do not need
author/committer match. In the same repository,
$ git log --grep='fix menuconfig on debian lenny' --pretty=short >/dev/null
shows very similar numbers as the above:
[without --use-mailmap]
5.32user 0.30system 0:05.63elapsed 99%CPU (0avgtext+0avgdata 2005984maxresident)k
0inputs+0outputs (0major+137672minor)pagefaults 0swaps
[with --use-mailmap]
6.64user 0.24system 0:06.89elapsed 99%CPU (0avgtext+0avgdata 2006320maxresident)k
0inputs+0outputs (0major+137694minor)pagefaults 0swaps
The latter case is an unnecessary performance regression. We may
want to _show_ the result with mailmap applied, but we do not have
to copy and rewrite the author/committer of all commits we try to
match if we do not query for these fields.
Trivially optimize this performace regression by limiting the
rewrites for only when we are matching with author/committer fields.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Currently you can use mailmap to display log authors and committers
but you can't use the mailmap to find commits with mapped values.
This commit allows you to run:
git log --use-mailmap --author mapped_name_or_email
git log --use-mailmap --committer mapped_name_or_email
Of course it only works if the --use-mailmap option is used.
The new name and email are copied only when necessary.
Signed-off-by: Antoine Pelisse <apelisse@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Emit the notes attached to the commit in "format-patch --notes"
output after three-dashes.
* jc/prettier-pretty-note:
format-patch: add a blank line between notes and diffstat
Doc User-Manual: Patch cover letter, three dashes, and --notes
Doc format-patch: clarify --notes use case
Doc notes: Include the format-patch --notes option
Doc SubmittingPatches: Mention --notes option after "cover letter"
Documentation: decribe format-patch --notes
format-patch --notes: show notes after three-dashes
format-patch: append --signature after notes
pretty_print_commit(): do not append notes message
pretty: prepare notes message at a centralized place
format_note(): simplify API
pretty: remove reencode_commit_message()
We either stuff the notes message without modification for %N
userformat, or format it for human consumption. Using two bits
is an overkill that does not benefit anybody.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When we added the "--perl-regexp" option (or "-P") to "git grep", we
should have done the same for the commands in the "git log" family,
but somehow we forgot to do so. This corrects it, but we will
reserve the short-and-sweet "-P" option for something else for now.
Also introduce the "--basic-regexp" option for completeness, so that
the "last one wins" principle can be used to defeat an earlier -E
option, e.g. "git log -E --basic-regexp --grep='<bre>'". Note that
it cannot have the short "-G" option as the option is to grep in the
patch text in the context of "log" family.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The command line option parser for "git log -F -E --grep='<ere>'"
did not flip the "fixed" bit, violating the general "last option
wins" principle among conflicting options.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Instead of using the hand-rolled initialization sequence,
use grep_init() to populate the necessary bits. This opens
the door to allow the calling commands to optionally read
grep.* configuration variables via git_config() if they
want to.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* jc/maint-log-grep-all-match-1:
grep.c: make two symbols really file-scope static this time
t7810-grep: test --all-match with multiple --grep and --author options
t7810-grep: test interaction of multiple --grep and --author options
t7810-grep: test multiple --author with --all-match
t7810-grep: test multiple --grep with and without --all-match
t7810-grep: bring log --grep tests in common form
grep.c: mark private file-scope symbols as static
log: document use of multiple commit limiting options
log --grep/--author: honor --all-match honored for multiple --grep patterns
grep: show --debug output only once
grep: teach --debug option to dump the parse tree
Notes are shown after commit body. From user perspective it looks
pretty much like commit body and they may assume --grep would search
in that part too.
Make it so.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Similar to --author/--committer which filters commits by author and
committer header fields. --grep-reflog adds a fake "reflog" header to
commit and a grep filter to search on that line.
All rules to --author/--committer apply except no timestamp stripping.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Fix a long-standing bug in "git log --grep" when multiple "--grep"
are used together with "--all-match" and "--author" or "--committer".
* jc/maint-log-grep-all-match:
t7810-grep: test --all-match with multiple --grep and --author options
t7810-grep: test interaction of multiple --grep and --author options
t7810-grep: test multiple --author with --all-match
t7810-grep: test multiple --grep with and without --all-match
t7810-grep: bring log --grep tests in common form
grep.c: mark private file-scope symbols as static
log: document use of multiple commit limiting options
log --grep/--author: honor --all-match honored for multiple --grep patterns
grep: show --debug output only once
grep: teach --debug option to dump the parse tree
* mz/cherry-pick-cmdline-order:
cherry-pick/revert: respect order of revisions to pick
demonstrate broken 'git cherry-pick three one two'
teach log --no-walk=unsorted, which avoids sorting
Our "grep" allows complex boolean expressions to be formed to match
each individual line with operators like --and, '(', ')' and --not.
Introduce the "--debug" option to show the parse tree to help people
who want to debug and enhance it.
Also "log" learns "--grep-debug" option to do the same. The command
line parser to the log family is a lot more limited than the general
"git grep" parser, but it has special handling for header matching
(e.g. "--author"), and a parse tree is valuable when working on it.
Note that "--all-match" is *not* any individual node in the parse
tree. It is an instruction to the evaluator to check all the nodes
in the top-level backbone have matched and reject a document as
non-matching otherwise.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git log .." errored out saying it is both rev range and a path when
there is no disambiguating "--" is on the command line. Update the
command line parser to interpret ".." as a path in such a case.
* jc/dotdot-is-parent-directory:
specifying ranges: we did not mean to make ".." an empty set
"git cherry-pick A C B" used to replay changes in A and then B and
then C if these three commits had committer timestamps in that
order, which is not what the user who said "A C B" naturally expects.
* mz/cherry-pick-cmdline-order:
cherry-pick/revert: respect order of revisions to pick
demonstrate broken 'git cherry-pick three one two'
teach log --no-walk=unsorted, which avoids sorting
* maint-1.7.11:
Almost 1.7.11.6
gitweb: URL-decode $my_url/$my_uri when stripping PATH_INFO
rebase -i: use full onto sha1 in reflog
sh-setup: protect from exported IFS
receive-pack: do not leak output from auto-gc to standard output
t/t5400: demonstrate breakage caused by informational message from prune
setup: clarify error messages for file/revisions ambiguity
send-email: improve RFC2047 quote parsing
fsck: detect null sha1 in tree entries
do not write null sha1s to on-disk index
diff: do not use null sha1 as a sentinel value
"git diff" had a confusion between taking data from a path in the
working tree and taking data from an object that happens to have
name 0{40} recorded in a tree.
* jk/maint-null-in-trees:
fsck: detect null sha1 in tree entries
do not write null sha1s to on-disk index
diff: do not use null sha1 as a sentinel value
"git log .." errored out saying it is both rev range and a path when
there is no disambiguating "--" is on the command line. Update the
command line parser to interpret ".." as a path in such a case.
* jc/dotdot-is-parent-directory:
specifying ranges: we did not mean to make ".." an empty set
When 'git log' is passed the --no-walk option, no revision walk takes
place, naturally. Perhaps somewhat surprisingly, however, the provided
revisions still get sorted by commit date. So e.g 'git log --no-walk
HEAD HEAD~1' and 'git log --no-walk HEAD~1 HEAD' give the same result
(unless the two revisions share the commit date, in which case they
will retain the order given on the command line). As the commit that
introduced --no-walk (8e64006 (Teach revision machinery about
--no-walk, 2007-07-24)) points out, the sorting is intentional, to
allow things like
git log --abbrev-commit --pretty=oneline --decorate --all --no-walk
to show all refs in order by commit date.
But there are also other cases where the sorting is not wanted, such
as
<command producing revisions in order> |
git log --oneline --no-walk --stdin
To accomodate both cases, leave the decision of whether or not to sort
up to the caller, by allowing --no-walk={sorted,unsorted}, defaulting
to 'sorted' for backward-compatibility reasons.
Signed-off-by: Martin von Zweigbergk <martinvonz@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We do not want a link to 0{40} object stored anywhere in our objects.
* jk/maint-null-in-trees:
fsck: detect null sha1 in tree entries
do not write null sha1s to on-disk index
diff: do not use null sha1 as a sentinel value
Either end of revision range operator can be omitted to default to HEAD,
as in "origin.." (what did I do since I forked) or "..origin" (what did
they do since I forked). But the current parser interprets ".." as an
empty range "HEAD..HEAD", and worse yet, because ".." does exist on the
filesystem, we get this annoying output:
$ cd Documentation/howto
$ git log .. ;# give me recent commits that touch Documentation/ area.
fatal: ambiguous argument '..': both revision and filename
Use '--' to separate filenames from revisions
Surely we could say "git log ../" or even "git log -- .." to disambiguate,
but we shouldn't have to.
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
diff_setup_done() has historically returned an error code, but lost
the last nonzero return in 943d5b7 (allow diff.renamelimit to be set
regardless of -M/-C, 2006-08-09). The callers were in a pretty
confused state: some actually checked for the return code, and some
did not.
Let it return void, and patch all callers to take this into account.
This conveniently also gets rid of a handful of different(!) error
messages that could never be triggered anyway.
Note that the function can still die().
Signed-off-by: Thomas Rast <trast@student.ethz.ch>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The diff code represents paths using the diff_filespec
struct. This struct has a sha1 to represent the sha1 of the
content at that path, as well as a sha1_valid member which
indicates whether its sha1 field is actually useful. If
sha1_valid is not true, then the filespec represents a
working tree file (e.g., for the no-index case, or for when
the index is not up-to-date).
The diff_filespec is only used internally, though. At the
interfaces to the diff subsystem, callers feed the sha1
directly, and we create a diff_filespec from it. It's at
that point that we look at the sha1 and decide whether it is
valid or not; callers may pass the null sha1 as a sentinel
value to indicate that it is not.
We should not typically see the null sha1 coming from any
other source (e.g., in the index itself, or from a tree).
However, a corrupt tree might have a null sha1, which would
cause "diff --patch" to accidentally diff the working tree
version of a file instead of treating it as a blob.
This patch extends the edges of the diff interface to accept
a "sha1_valid" flag whenever we accept a sha1, and to use
that flag when creating a filespec. In some cases, this
means passing the flag through several layers, making the
code change larger than would be desirable.
One alternative would be to simply die() upon seeing
corrupted trees with null sha1s. However, this fix more
directly addresses the problem (while bogus sha1s in a tree
are probably a bad thing, it is really the sentinel
confusion sending us down the wrong code path that is what
makes it devastating). And it means that git is more capable
of examining and debugging these corrupted trees. For
example, you can still "diff --raw" such a tree to find out
when the bogus entry was introduced; you just cannot do a
"--patch" diff (just as you could not with any other
corrupted tree, as we do not have any content to diff).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git log -n 1 -- rarely-touched-path" was spending unnecessary
cycles after showing the first change to find the next one, only to
discard it.
* jk/revision-walk-stop-at-max-count:
revision: avoid work after --max-count is reached
Teaches the object name parser things like a "git describe" output
is always a commit object, "A" in "git log A" must be a committish,
and "A" and "B" in "git log A...B" both must be committish, etc., to
prolong the lifetime of abbreviated object names.
* jc/sha1-name-more: (27 commits)
t1512: match the "other" object names
t1512: ignore whitespaces in wc -l output
rev-parse --disambiguate=<prefix>
rev-parse: A and B in "rev-parse A..B" refer to committish
reset: the command takes committish
commit-tree: the command wants a tree and commits
apply: --build-fake-ancestor expects blobs
sha1_name.c: add support for disambiguating other types
revision.c: the "log" family, except for "show", takes committish
revision.c: allow handle_revision_arg() to take other flags
sha1_name.c: introduce get_sha1_committish()
sha1_name.c: teach lookup context to get_sha1_with_context()
sha1_name.c: many short names can only be committish
sha1_name.c: get_sha1_1() takes lookup flags
sha1_name.c: get_describe_name() by definition groks only commits
sha1_name.c: teach get_short_sha1() a commit-only option
sha1_name.c: allow get_short_sha1() to take other flags
get_sha1(): fix error status regression
sha1_name.c: restructure disambiguation of short names
sha1_name.c: correct misnamed "canonical" and "res"
...
During a revision traversal in which --max-count has been
specified, we decrement a counter for each revision returned
by get_revision. When it hits 0, we typically return NULL
(the exception being if we still have boundary commits to
show).
However, before we check the counter, we call get_revision_1
to get the next commit. This might involve looking at a
large number of commits if we have restricted the traversal
(e.g., we might traverse until we find the next commit whose
diff actually matches a pathspec).
There's no need to make this get_revision_1 call when our
counter runs out. If we are not in --boundary mode, we will
just throw away the result and immediately return NULL. If
we are in --boundary mode, then we will still throw away the
result, and then start showing the boundary commits.
However, as git_revision_1 does not impact the boundary
list, it should not have an impact.
In most cases, avoiding this work will not be especially
noticeable. However, in some cases, it can make a big
difference:
[before]
$ time git rev-list -1 origin Documentation/RelNotes/1.7.11.2.txt
8d141a1d56
real 0m0.301s
user 0m0.280s
sys 0m0.016s
[after]
$ time git rev-list -1 origin Documentation/RelNotes/1.7.11.2.txt
8d141a1d56
real 0m0.010s
user 0m0.008s
sys 0m0.000s
Note that the output is produced almost instantaneously in
the first case, and then git uselessly spends a long time
looking for the next commit to touch that file (but there
isn't one, and we traverse all the way down to the roots).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When "git log" gets "--simplify-merges/by-decoration" together with
"--first-parent", the combination of these options makes the
simplification logic to use in-core commit objects that haven't been
examined for relevance, either producing incorrect result or taking
too long to produce any output. Teach the simplification logic to
ignore commits that the first-parent traversal logic ignored when
both are in effect to work around the issue.
* jc/rev-list-simplify-merges-first-parent:
revision: ignore side parents while running simplify-merges
revision: note the lack of free() in simplify_merges()
revision: "simplify" options imply topo-order sort
"git diff COPYING HEAD:COPYING" gave a nonsense error message that
claimed that the treeish HEAD did not have COPYING in it.
* mm/verify-filename-fix:
verify_filename(): ask the caller to chose the kind of diagnosis
sha1_name: do not trigger detailed diagnosis for file arguments
Add a field to setup_revision_opt structure and allow these callers
to tell the setup_revisions command parsing machinery that short SHA1
it encounters are meant to name committish.
This step does not go all the way to connect the setup_revisions()
to sha1_name.c yet.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The existing "cant_be_filename" that tells the function that the
caller knows the arg is not a path (hence it does not have to be
checked for absense of the file whose name matches it) is made into
a bit in the flag word.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Many callers know that the user meant to name a committish by
syntactical positions where the object name appears. Calling this
function allows the machinery to disambiguate shorter-than-unique
abbreviated object names between committish and others.
Note that this does NOT error out when the named object is not a
committish. It is merely to give a hint to the disambiguation
machinery.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The function takes user input string and returns the object name
(binary SHA-1) with mode bits and path when the object was looked
up in a tree.
Additionally give hints to help disambiguation of abbreviated object
names when the caller knows what it is looking for.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There are only two callers, and they will benefit from being able to
pass disambiguation hints to underlying get_sha1_with_context() API
once it happens.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When "--simplify-merges/by-decoration" is given together with
"--first-parent" to "git log", the combination of these options
makes the simplification logic to use in-core commit objects that
haven't been examined for relevance, either producing incorrect
result or taking too long to produce any output. Teach the
simplification logic to ignore commits that the first-parent
traversal logic ignored when both are in effect to work around the
issue.