2006-02-26 01:19:46 +01:00
|
|
|
#ifndef REVISION_H
|
|
|
|
#define REVISION_H
|
|
|
|
|
2018-08-15 19:54:05 +02:00
|
|
|
#include "commit.h"
|
2008-07-09 23:38:34 +02:00
|
|
|
#include "parse-options.h"
|
2008-08-25 08:15:05 +02:00
|
|
|
#include "grep.h"
|
2010-03-12 18:04:26 +01:00
|
|
|
#include "notes.h"
|
2017-12-12 09:55:35 +01:00
|
|
|
#include "pretty.h"
|
2013-10-31 10:25:36 +01:00
|
|
|
#include "diff.h"
|
2018-05-19 07:28:24 +02:00
|
|
|
#include "commit-slab-decl.h"
|
2008-07-09 23:38:34 +02:00
|
|
|
|
2019-11-17 22:04:48 +01:00
|
|
|
/**
|
|
|
|
* The revision walking API offers functions to build a list of revisions
|
|
|
|
* and then iterate over that list.
|
|
|
|
*
|
|
|
|
* Calling sequence
|
|
|
|
* ----------------
|
|
|
|
*
|
|
|
|
* The walking API has a given calling sequence: first you need to initialize
|
|
|
|
* a rev_info structure, then add revisions to control what kind of revision
|
|
|
|
* list do you want to get, finally you can iterate over the revision list.
|
|
|
|
*
|
|
|
|
*/
|
|
|
|
|
2014-03-25 14:23:26 +01:00
|
|
|
/* Remember to update object flag allocation in object.h */
|
2006-02-26 01:19:46 +01:00
|
|
|
#define SEEN (1u<<0)
|
|
|
|
#define UNINTERESTING (1u<<1)
|
2007-11-13 08:16:08 +01:00
|
|
|
#define TREESAME (1u<<2)
|
2006-03-01 00:07:20 +01:00
|
|
|
#define SHOWN (1u<<3)
|
2006-03-01 09:58:56 +01:00
|
|
|
#define TMP_MARK (1u<<4) /* for isolated cases; clean after use */
|
2006-03-28 09:58:34 +02:00
|
|
|
#define BOUNDARY (1u<<5)
|
2007-03-06 01:10:28 +01:00
|
|
|
#define CHILD_SHOWN (1u<<6)
|
2006-04-17 03:12:49 +02:00
|
|
|
#define ADDED (1u<<7) /* Parents already parsed and added? */
|
2006-10-23 02:32:47 +02:00
|
|
|
#define SYMMETRIC_LEFT (1u<<8)
|
2011-03-07 13:31:40 +01:00
|
|
|
#define PATCHSAME (1u<<9)
|
2013-05-16 17:32:38 +02:00
|
|
|
#define BOTTOM (1u<<10)
|
revision: --show-pulls adds helpful merges
The default file history simplification of "git log -- <path>" or
"git rev-list -- <path>" focuses on providing the smallest set of
commits that first contributed a change. The revision walk greatly
restricts the set of walked commits by visiting only the first
TREESAME parent of a merge commit, when one exists. This means
that portions of the commit-graph are not walked, which can be a
performance benefit, but can also "hide" commits that added changes
but were ignored by a merge resolution.
The --full-history option modifies this by walking all commits and
reporting a merge commit as "interesting" if it has _any_ parent
that is not TREESAME. This tends to be an over-representation of
important commits, especially in an environment where most merge
commits are created by pull request completion.
Suppose we have a commit A and we create a commit B on top that
changes our file. When we merge the pull request, we create a merge
commit M. If no one else changed the file in the first-parent
history between M and A, then M will not be TREESAME to its first
parent, but will be TREESAME to B. Thus, the simplified history
will be "B". However, M will appear in the --full-history mode.
However, suppose that a number of topics T1, T2, ..., Tn were
created based on commits C1, C2, ..., Cn between A and M as
follows:
A----C1----C2--- ... ---Cn----M------P1---P2--- ... ---Pn
\ \ \ \ / / / /
\ \__.. \ \/ ..__T1 / Tn
\ \__.. /\ ..__T2 /
\_____________________B \____________________/
If the commits T1, T2, ... Tn did not change the file, then all of
P1 through Pn will be TREESAME to their first parent, but not
TREESAME to their second. This means that all of those merge commits
appear in the --full-history view, with edges that immediately
collapse into the lower history without introducing interesting
single-parent commits.
The --simplify-merges option was introduced to remove these extra
merge commits. By noticing that the rewritten parents are reachable
from their first parents, those edges can be simplified away. Finally,
the commits now look like single-parent commits that are TREESAME to
their "only" parent. Thus, they are removed and this issue does not
cause issues anymore. However, this also ends up removing the commit
M from the history view! Even worse, the --simplify-merges option
requires walking the entire history before returning a single result.
Many Git users are using Git alongside a Git service that provides
code storage alongside a code review tool commonly called "Pull
Requests" or "Merge Requests" against a target branch. When these
requests are accepted and merged, they typically create a merge
commit whose first parent is the previous branch tip and the second
parent is the tip of the topic branch used for the request. This
presents a valuable order to the parents, but also makes that merge
commit slightly special. Users may want to see not only which
commits changed a file, but which pull requests merged those commits
into their branch. In the previous example, this would mean the
users want to see the merge commit "M" in addition to the single-
parent commit "C".
Users are even more likely to want these merge commits when they
use pull requests to merge into a feature branch before merging that
feature branch into their trunk.
In some sense, users are asking for the "first" merge commit to
bring in the change to their branch. As long as the parent order is
consistent, this can be handled with the following rule:
Include a merge commit if it is not TREESAME to its first
parent, but is TREESAME to a later parent.
These merges look like the merge commits that would result from
running "git pull <topic>" on a main branch. Thus, the option to
show these commits is called "--show-pulls". This has the added
benefit of showing the commits created by closing a pull request or
merge request on any of the Git hosting and code review platforms.
To test these options, extend the standard test example to include
a merge commit that is not TREESAME to its first parent. It is
surprising that that option was not already in the example, as it
is instructive.
In particular, this extension demonstrates a common issue with file
history simplification. When a user resolves a merge conflict using
"-Xours" or otherwise ignoring one side of the conflict, they create
a TREESAME edge that probably should not be TREESAME. This leads
users to become frustrated and complain that "my change disappeared!"
In my experience, showing them history with --full-history and
--simplify-merges quickly reveals the problematic merge. As mentioned,
this option is expensive to compute. The --show-pulls option
_might_ show the merge commit (usually titled "resolving conflicts")
more quickly. Of course, this depends on the user having the correct
parent order, which is backwards when using "git pull master" from a
topic branch.
There are some special considerations when combining the --show-pulls
option with --simplify-merges. This requires adding a new PULL_MERGE
object flag to store the information from the initial TREESAME
comparisons. This helps avoid dropping those commits in later filters.
This is covered by a test, including how the parents can be simplified.
Since "struct object" has already ruined its 32-bit alignment by using
33 bits across parsed, type, and flags member, let's not make it worse.
PULL_MERGE is used in revision.c with the same value (1u<<15) as
REACHABLE in commit-graph.c. The REACHABLE flag is only used when
writing a commit-graph file, and a revision walk using --show-pulls
does not happen in the same process. Care must be taken in the future
to ensure this remains the case.
Update Documentation/rev-list-options.txt with significant details
around this option. This requires updating the example in the
History Simplification section to demonstrate some of the problems
with TREESAME second parents.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-04-10 14:19:43 +02:00
|
|
|
|
|
|
|
/* WARNING: This is also used as REACHABLE in commit-graph.c. */
|
|
|
|
#define PULL_MERGE (1u<<15)
|
revision: reallocate TOPO_WALK object flags
The bit fields in struct object have an unfortunate layout. Here's what
pahole reports on x86_64 GNU/Linux:
struct object {
unsigned int parsed:1; /* 0: 0 4 */
unsigned int type:3; /* 0: 1 4 */
/* XXX 28 bits hole, try to pack */
/* Force alignment to the next boundary: */
unsigned int :0;
unsigned int flags:29; /* 4: 0 4 */
/* XXX 3 bits hole, try to pack */
struct object_id oid; /* 8 32 */
/* size: 40, cachelines: 1, members: 4 */
/* sum members: 32 */
/* sum bitfield members: 33 bits, bit holes: 2, sum bit holes: 31 bits */
/* last cacheline: 40 bytes */
};
Notice the 1+3+29=33 bits in bit fields and 28+3=31 bits in holes.
There are holes inside the flags bit field as well -- while some object
flags are used for more than one purpose, 22, 23 and 24 are still free.
Use 23 and 24 instead of 27 and 28 for TOPO_WALK_EXPLORED and
TOPO_WALK_INDEGREE. This allows us to reduce FLAG_BITS by one so that
all bitfields combined fit into a single 32-bit slot:
struct object {
unsigned int parsed:1; /* 0: 0 4 */
unsigned int type:3; /* 0: 1 4 */
unsigned int flags:28; /* 0: 4 4 */
struct object_id oid; /* 4 32 */
/* size: 36, cachelines: 1, members: 4 */
/* last cacheline: 36 bytes */
};
With this tight packing the size of struct object is reduced by 10%.
Other architectures probably benefit as well.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-24 15:05:38 +02:00
|
|
|
|
|
|
|
#define TOPO_WALK_EXPLORED (1u<<23)
|
|
|
|
#define TOPO_WALK_INDEGREE (1u<<24)
|
|
|
|
|
2018-10-05 23:31:24 +02:00
|
|
|
/*
|
|
|
|
* Indicates object was reached by traversal. i.e. not given by user on
|
|
|
|
* command-line or stdin.
|
|
|
|
* NEEDSWORK: NOT_USER_GIVEN doesn't apply to commits because we only support
|
|
|
|
* filtering trees and blobs, but it may be useful to support filtering commits
|
|
|
|
* in the future.
|
|
|
|
*/
|
|
|
|
#define NOT_USER_GIVEN (1u<<25)
|
2014-03-25 14:23:27 +01:00
|
|
|
#define TRACK_LINEAR (1u<<26)
|
revision: --show-pulls adds helpful merges
The default file history simplification of "git log -- <path>" or
"git rev-list -- <path>" focuses on providing the smallest set of
commits that first contributed a change. The revision walk greatly
restricts the set of walked commits by visiting only the first
TREESAME parent of a merge commit, when one exists. This means
that portions of the commit-graph are not walked, which can be a
performance benefit, but can also "hide" commits that added changes
but were ignored by a merge resolution.
The --full-history option modifies this by walking all commits and
reporting a merge commit as "interesting" if it has _any_ parent
that is not TREESAME. This tends to be an over-representation of
important commits, especially in an environment where most merge
commits are created by pull request completion.
Suppose we have a commit A and we create a commit B on top that
changes our file. When we merge the pull request, we create a merge
commit M. If no one else changed the file in the first-parent
history between M and A, then M will not be TREESAME to its first
parent, but will be TREESAME to B. Thus, the simplified history
will be "B". However, M will appear in the --full-history mode.
However, suppose that a number of topics T1, T2, ..., Tn were
created based on commits C1, C2, ..., Cn between A and M as
follows:
A----C1----C2--- ... ---Cn----M------P1---P2--- ... ---Pn
\ \ \ \ / / / /
\ \__.. \ \/ ..__T1 / Tn
\ \__.. /\ ..__T2 /
\_____________________B \____________________/
If the commits T1, T2, ... Tn did not change the file, then all of
P1 through Pn will be TREESAME to their first parent, but not
TREESAME to their second. This means that all of those merge commits
appear in the --full-history view, with edges that immediately
collapse into the lower history without introducing interesting
single-parent commits.
The --simplify-merges option was introduced to remove these extra
merge commits. By noticing that the rewritten parents are reachable
from their first parents, those edges can be simplified away. Finally,
the commits now look like single-parent commits that are TREESAME to
their "only" parent. Thus, they are removed and this issue does not
cause issues anymore. However, this also ends up removing the commit
M from the history view! Even worse, the --simplify-merges option
requires walking the entire history before returning a single result.
Many Git users are using Git alongside a Git service that provides
code storage alongside a code review tool commonly called "Pull
Requests" or "Merge Requests" against a target branch. When these
requests are accepted and merged, they typically create a merge
commit whose first parent is the previous branch tip and the second
parent is the tip of the topic branch used for the request. This
presents a valuable order to the parents, but also makes that merge
commit slightly special. Users may want to see not only which
commits changed a file, but which pull requests merged those commits
into their branch. In the previous example, this would mean the
users want to see the merge commit "M" in addition to the single-
parent commit "C".
Users are even more likely to want these merge commits when they
use pull requests to merge into a feature branch before merging that
feature branch into their trunk.
In some sense, users are asking for the "first" merge commit to
bring in the change to their branch. As long as the parent order is
consistent, this can be handled with the following rule:
Include a merge commit if it is not TREESAME to its first
parent, but is TREESAME to a later parent.
These merges look like the merge commits that would result from
running "git pull <topic>" on a main branch. Thus, the option to
show these commits is called "--show-pulls". This has the added
benefit of showing the commits created by closing a pull request or
merge request on any of the Git hosting and code review platforms.
To test these options, extend the standard test example to include
a merge commit that is not TREESAME to its first parent. It is
surprising that that option was not already in the example, as it
is instructive.
In particular, this extension demonstrates a common issue with file
history simplification. When a user resolves a merge conflict using
"-Xours" or otherwise ignoring one side of the conflict, they create
a TREESAME edge that probably should not be TREESAME. This leads
users to become frustrated and complain that "my change disappeared!"
In my experience, showing them history with --full-history and
--simplify-merges quickly reveals the problematic merge. As mentioned,
this option is expensive to compute. The --show-pulls option
_might_ show the merge commit (usually titled "resolving conflicts")
more quickly. Of course, this depends on the user having the correct
parent order, which is backwards when using "git pull master" from a
topic branch.
There are some special considerations when combining the --show-pulls
option with --simplify-merges. This requires adding a new PULL_MERGE
object flag to store the information from the initial TREESAME
comparisons. This helps avoid dropping those commits in later filters.
This is covered by a test, including how the parents can be simplified.
Since "struct object" has already ruined its 32-bit alignment by using
33 bits across parsed, type, and flags member, let's not make it worse.
PULL_MERGE is used in revision.c with the same value (1u<<15) as
REACHABLE in commit-graph.c. The REACHABLE flag is only used when
writing a commit-graph file, and a revision walk using --show-pulls
does not happen in the same process. Care must be taken in the future
to ensure this remains the case.
Update Documentation/rev-list-options.txt with significant details
around this option. This requires updating the example in the
History Simplification section to demonstrate some of the problems
with TREESAME second parents.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-04-10 14:19:43 +02:00
|
|
|
#define ALL_REV_FLAGS (((1u<<11)-1) | NOT_USER_GIVEN | TRACK_LINEAR | PULL_MERGE)
|
2006-02-26 01:19:46 +01:00
|
|
|
|
2009-08-15 16:23:12 +02:00
|
|
|
#define DECORATE_SHORT_REFS 1
|
|
|
|
#define DECORATE_FULL_REFS 2
|
|
|
|
|
Log message printout cleanups
On Sun, 16 Apr 2006, Junio C Hamano wrote:
>
> In the mid-term, I am hoping we can drop the generate_header()
> callchain _and_ the custom code that formats commit log in-core,
> found in cmd_log_wc().
Ok, this was nastier than expected, just because the dependencies between
the different log-printing stuff were absolutely _everywhere_, but here's
a patch that does exactly that.
The patch is not very easy to read, and the "--patch-with-stat" thing is
still broken (it does not call the "show_log()" thing properly for
merges). That's not a new bug. In the new world order it _should_ do
something like
if (rev->logopt)
show_log(rev, rev->logopt, "---\n");
but it doesn't. I haven't looked at the --with-stat logic, so I left it
alone.
That said, this patch removes more lines than it adds, and in particular,
the "cmd_log_wc()" loop is now a very clean:
while ((commit = get_revision(rev)) != NULL) {
log_tree_commit(rev, commit);
free(commit->buffer);
commit->buffer = NULL;
}
so it doesn't get much prettier than this. All the complexity is entirely
hidden in log-tree.c, and any code that needs to flush the log literally
just needs to do the "if (rev->logopt) show_log(...)" incantation.
I had to make the combined_diff() logic take a "struct rev_info" instead
of just a "struct diff_options", but that part is pretty clean.
This does change "git whatchanged" from using "diff-tree" as the commit
descriptor to "commit", and I changed one of the tests to reflect that new
reality. Otherwise everything still passes, and my other tests look fine
too.
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-04-17 20:59:32 +02:00
|
|
|
struct log_info;
|
2018-09-21 17:57:38 +02:00
|
|
|
struct repository;
|
|
|
|
struct rev_info;
|
2010-03-12 18:04:26 +01:00
|
|
|
struct string_list;
|
log: use true parents for diff even when rewriting
When using pathspec filtering in combination with diff-based log
output, parent simplification happens before the diff is computed.
The diff is therefore against the *simplified* parents.
This works okay, arguably by accident, in the normal case:
simplification reduces to one parent as long as the commit is TREESAME
to it. So the simplified parent of any given commit must have the
same tree contents on the filtered paths as its true (unfiltered)
parent.
However, --full-diff breaks this guarantee, and indeed gives pretty
spectacular results when comparing the output of
git log --graph --stat ...
git log --graph --full-diff --stat ...
(--graph internally kicks in parent simplification, much like
--parents).
To fix it, store a copy of the parent list before simplification (in a
slab) whenever --full-diff is in effect. Then use the stored parents
instead of the simplified ones in the commit display code paths. The
latter do not actually check for --full-diff to avoid duplicated code;
they just grab the original parents if save_parents() has not been
called for this revision walk.
For ordinary commits it should be obvious that this is the right thing
to do.
Merge commits are a bit subtle. Observe that with default
simplification, merge simplification is an all-or-nothing decision:
either the merge is TREESAME to one parent and disappears, or it is
different from all parents and the parent list remains intact.
Redundant parents are not pruned, so the existing code also shows them
as a merge.
So if we do show a merge commit, the parent list just consists of the
rewrite result on each parent. Running, e.g., --cc on this in
--full-diff mode is not very useful: if any commits were skipped, some
hunks will disagree with all sides of the merge (with one side,
because commits were skipped; with the others, because they didn't
have those changes in the first place). This triggers --cc showing
these hunks spuriously.
Therefore I believe that even for merge commits it is better to show
the diffs wrt. the original parents.
Reported-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Helped-by: Junio C Hamano <gitster@pobox.com>
Helped-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk>
Signed-off-by: Thomas Rast <trast@inf.ethz.ch>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-07-31 22:13:20 +02:00
|
|
|
struct saved_parents;
|
2020-04-06 18:59:52 +02:00
|
|
|
struct bloom_key;
|
|
|
|
struct bloom_filter_settings;
|
2018-05-19 07:28:24 +02:00
|
|
|
define_shared_commit_slab(revision_sources, char *);
|
2006-03-10 10:21:39 +01:00
|
|
|
|
2011-08-26 02:35:39 +02:00
|
|
|
struct rev_cmdline_info {
|
|
|
|
unsigned int nr;
|
|
|
|
unsigned int alloc;
|
|
|
|
struct rev_cmdline_entry {
|
|
|
|
struct object *item;
|
|
|
|
const char *name;
|
|
|
|
enum {
|
|
|
|
REV_CMD_REF,
|
|
|
|
REV_CMD_PARENTS_ONLY,
|
|
|
|
REV_CMD_LEFT,
|
|
|
|
REV_CMD_RIGHT,
|
2013-05-13 17:00:47 +02:00
|
|
|
REV_CMD_MERGE_BASE,
|
2011-08-26 02:35:39 +02:00
|
|
|
REV_CMD_REV
|
|
|
|
} whence;
|
|
|
|
unsigned flags;
|
|
|
|
} *rev;
|
|
|
|
};
|
|
|
|
|
teach log --no-walk=unsorted, which avoids sorting
When 'git log' is passed the --no-walk option, no revision walk takes
place, naturally. Perhaps somewhat surprisingly, however, the provided
revisions still get sorted by commit date. So e.g 'git log --no-walk
HEAD HEAD~1' and 'git log --no-walk HEAD~1 HEAD' give the same result
(unless the two revisions share the commit date, in which case they
will retain the order given on the command line). As the commit that
introduced --no-walk (8e64006 (Teach revision machinery about
--no-walk, 2007-07-24)) points out, the sorting is intentional, to
allow things like
git log --abbrev-commit --pretty=oneline --decorate --all --no-walk
to show all refs in order by commit date.
But there are also other cases where the sorting is not wanted, such
as
<command producing revisions in order> |
git log --oneline --no-walk --stdin
To accomodate both cases, leave the decision of whether or not to sort
up to the caller, by allowing --no-walk={sorted,unsorted}, defaulting
to 'sorted' for backward-compatibility reasons.
Signed-off-by: Martin von Zweigbergk <martinvonz@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-08-29 08:15:54 +02:00
|
|
|
#define REVISION_WALK_WALK 0
|
|
|
|
#define REVISION_WALK_NO_WALK_SORTED 1
|
|
|
|
#define REVISION_WALK_NO_WALK_UNSORTED 2
|
|
|
|
|
2019-01-16 19:25:58 +01:00
|
|
|
struct oidset;
|
revision.c: begin refactoring --topo-order logic
When running 'git rev-list --topo-order' and its kin, the topo_order
setting in struct rev_info implies the limited setting. This means
that the following things happen during prepare_revision_walk():
* revs->limited implies we run limit_list() to walk the entire
reachable set. There are some short-cuts here, such as if we
perform a range query like 'git rev-list COMPARE..HEAD' and we
can stop limit_list() when all queued commits are uninteresting.
* revs->topo_order implies we run sort_in_topological_order(). See
the implementation of that method in commit.c. It implies that
the full set of commits to order is in the given commit_list.
These two methods imply that a 'git rev-list --topo-order HEAD'
command must walk the entire reachable set of commits _twice_ before
returning a single result.
If we have a commit-graph file with generation numbers computed, then
there is a better way. This patch introduces some necessary logic
redirection when we are in this situation.
In v2.18.0, the commit-graph file contains zero-valued bytes in the
positions where the generation number is stored in v2.19.0 and later.
Thus, we use generation_numbers_enabled() to check if the commit-graph
is available and has non-zero generation numbers.
When setting revs->limited only because revs->topo_order is true,
only do so if generation numbers are not available. There is no
reason to use the new logic as it will behave similarly when all
generation numbers are INFINITY or ZERO.
In prepare_revision_walk(), if we have revs->topo_order but not
revs->limited, then we trigger the new logic. It breaks the logic
into three pieces, to fit with the existing framework:
1. init_topo_walk() fills a new struct topo_walk_info in the rev_info
struct. We use the presence of this struct as a signal to use the
new methods during our walk. In this patch, this method simply
calls limit_list() and sort_in_topological_order(). In the future,
this method will set up a new data structure to perform that logic
in-line.
2. next_topo_commit() provides get_revision_1() with the next topo-
ordered commit in the list. Currently, this simply pops the commit
from revs->commits.
3. expand_topo_walk() provides get_revision_1() with a way to signal
walking beyond the latest commit. Currently, this calls
add_parents_to_list() exactly like the old logic.
While this commit presents method redirection for performing the
exact same logic as before, it allows the next commit to focus only
on the new logic.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-11-01 14:46:20 +01:00
|
|
|
struct topo_walk_info;
|
|
|
|
|
2006-02-26 01:19:46 +01:00
|
|
|
struct rev_info {
|
|
|
|
/* Starting list */
|
|
|
|
struct commit_list *commits;
|
Add "named object array" concept
We've had this notion of a "object_list" for a long time, which eventually
grew a "name" member because some users (notably git-rev-list) wanted to
name each object as it is generated.
That object_list is great for some things, but it isn't all that wonderful
for others, and the "name" member is generally not used by everybody.
This patch splits the users of the object_list array up into two: the
traditional list users, who want the list-like format, and who don't
actually use or want the name. And another class of users that really used
the list as an extensible array, and generally wanted to name the objects.
The patch is fairly straightforward, but it's also biggish. Most of it
really just cleans things up: switching the revision parsing and listing
over to the array makes things like the builtin-diff usage much simpler
(we now see exactly how many members the array has, and we don't get the
objects reversed from the order they were on the command line).
One of the main reasons for doing this at all is that the malloc overhead
of the simple object list was actually pretty high, and the array is just
a lot denser. So this patch brings down memory usage by git-rev-list by
just under 3% (on top of all the other memory use optimizations) on the
mozilla archive.
It does add more lines than it removes, and more importantly, it adds a
whole new infrastructure for maintaining lists of objects, but on the
other hand, the new dynamic array code is pretty obvious. The change to
builtin-diff-tree.c shows a fairly good example of why an array interface
is sometimes more natural, and just much simpler for everybody.
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-06-20 02:42:35 +02:00
|
|
|
struct object_array pending;
|
2018-09-21 17:57:38 +02:00
|
|
|
struct repository *repo;
|
2006-02-26 01:19:46 +01:00
|
|
|
|
revision walker: Fix --boundary when limited
This cleans up the boundary processing in the commit walker. It
- rips out the boundary logic from the commit walker. Placing
"negative" commits in the revs->commits list was Ok if all we
cared about "boundary" was the UNINTERESTING limiting case,
but conceptually it was wrong.
- makes get_revision_1() function to walk the commits and return
the results as if there is no funny postprocessing flags such
as --reverse, --skip nor --max-count.
- makes get_revision() function the postprocessing phase:
If reverse is given, wait for get_revision_1() to give
everything that it would normally give, and then reverse it
before consuming.
If skip is given, skip that many before going further.
If max is given, stop when we gave out that many.
Now that we are about to return one positive commit, mark
the parents of that commit to be potential boundaries
before returning, iff we are doing the boundary processing.
Return the commit.
- After get_revision() finishes giving out all the positive
commits, if we are doing the boundary processing, we look at
the parents that we marked as potential boundaries earlier,
see if they are really boundaries, and give them out.
It loses more code than it adds, even when the new gc_boundary()
function, which is purely for early optimization, is counted.
Note that this patch is purely for eyeballing and discussion
only. It breaks git-bundle's verify logic because the logic
does not use BOUNDARY_SHOW flag for its internal computation
anymore. After we correct it not to attempt to affect the
boundary processing by setting the BOUNDARY_SHOW flag, we can
remove BOUNDARY_SHOW from revision.h and use that bit assignment
for the new CHILD_SHOWN flag.
Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-03-05 22:10:06 +01:00
|
|
|
/* Parents of shown commits */
|
|
|
|
struct object_array boundary_commits;
|
|
|
|
|
2011-08-26 02:35:39 +02:00
|
|
|
/* The end-points specified by the end user */
|
|
|
|
struct rev_cmdline_info cmdline;
|
|
|
|
|
revision: introduce --exclude=<glob> to tame wildcards
People often find "git log --branches" etc. that includes _all_
branches is cumbersome to use when they want to grab most but except
some. The same applies to --tags, --all and --glob.
Teach the revision machinery to remember patterns, and then upon the
next such a globbing option, exclude those that match the pattern.
With this, I can view only my integration branches (e.g. maint,
master, etc.) without topic branches, which are named after two
letters from primary authors' names, slash and topic name.
git rev-list --no-walk --exclude=??/* --branches |
git name-rev --refs refs/heads/* --stdin
This one shows things reachable from local and remote branches that
have not been merged to the integration branches.
git log --remotes --branches --not --exclude=??/* --branches
It may be a bit rough around the edges, in that the pattern to give
the exclude option depends on what globbing option follows. In
these examples, the pattern "??/*" is used, not "refs/heads/??/*",
because the globbing option that follows the -"-exclude=<pattern>"
is "--branches". As each use of globbing option resets previously
set "--exclude", this may not be such a bad thing, though.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-08-31 01:37:55 +02:00
|
|
|
/* excluding from --branches, --refs, etc. expansion */
|
|
|
|
struct string_list *ref_excludes;
|
|
|
|
|
2006-02-26 01:19:46 +01:00
|
|
|
/* Basic information */
|
|
|
|
const char *prefix;
|
2008-07-08 15:19:33 +02:00
|
|
|
const char *def;
|
2010-12-17 13:43:06 +01:00
|
|
|
struct pathspec prune_data;
|
toposort: rename "lifo" field
The primary invariant of sort_in_topological_order() is that a
parent commit is not emitted until all children of it are. When
traversing a forked history like this with "git log C E":
A----B----C
\
D----E
we ensure that A is emitted after all of B, C, D, and E are done, B
has to wait until C is done, and D has to wait until E is done.
In some applications, however, we would further want to control how
these child commits B, C, D and E on two parallel ancestry chains
are shown.
Most of the time, we would want to see C and B emitted together, and
then E and D, and finally A (i.e. the --topo-order output). The
"lifo" parameter of the sort_in_topological_order() function is used
to control this behaviour. We start the traversal by knowing two
commits, C and E. While keeping in mind that we also need to
inspect E later, we pick C first to inspect, and we notice and
record that B needs to be inspected. By structuring the "work to be
done" set as a LIFO stack, we ensure that B is inspected next,
before other in-flight commits we had known that we will need to
inspect, e.g. E.
When showing in --date-order, we would want to see commits ordered
by timestamps, i.e. show C, E, B and D in this order before showing
A, possibly mixing commits from two parallel histories together.
When "lifo" parameter is set to false, the function keeps the "work
to be done" set sorted in the date order to realize this semantics.
After inspecting C, we add B to the "work to be done" set, but the
next commit we inspect from the set is E which is newer than B.
The name "lifo", however, is too strongly tied to the way how the
function implements its behaviour, and does not describe what the
behaviour _means_.
Replace this field with an enum rev_sort_order, with two possible
values: REV_SORT_IN_GRAPH_ORDER and REV_SORT_BY_COMMIT_DATE, and
update the existing code. The mechanical replacement rule is:
"lifo == 0" is equivalent to "sort_order == REV_SORT_BY_COMMIT_DATE"
"lifo == 1" is equivalent to "sort_order == REV_SORT_IN_GRAPH_ORDER"
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-06-07 01:07:14 +02:00
|
|
|
|
2017-08-03 00:25:27 +02:00
|
|
|
/*
|
|
|
|
* Whether the arguments parsed by setup_revisions() included any
|
|
|
|
* "input" revisions that might still have yielded an empty pending
|
|
|
|
* list (e.g., patterns like "--all" or "--glob").
|
|
|
|
*/
|
|
|
|
int rev_input_given;
|
|
|
|
|
rev-list: make empty --stdin not an error
When we originally did the series that contains 7ba826290a
(revision: add rev_input_given flag, 2017-08-02) the intent
was that "git rev-list --stdin </dev/null" would similarly
become a successful noop. However, an attempt at the time to
do that did not work[1]. The problem is that rev_input_given
serves two roles:
- it tells rev-list.c that it should not error out
- it tells revision.c that it should not have the "default"
ref kick (e.g., "HEAD" in "git log")
We want to trigger the former, but not the latter. This is
technically possible with a single flag, if we set the flag
only after revision.c's revs->def check. But this introduces
a rather subtle ordering dependency.
Instead, let's keep two flags: one to denote when we got
actual input (which triggers both roles) and one for when we
read stdin (which triggers only the first).
This does mean a caller interested in the first role has to
check both flags, but there's only one such caller. And any
future callers might want to make the distinction anyway
(e.g., if they care less about erroring out, and more about
whether revision.c soaked up our stdin).
In fact, we already keep such a flag internally in
revision.c for this purpose, so this is really just exposing
that to the caller (and the old function-local flag can go
away in favor of our new one).
[1] https://public-inbox.org/git/20170802223416.gwiezhbuxbdmbjzx@sigill.intra.peff.net/
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-22 23:37:23 +02:00
|
|
|
/*
|
|
|
|
* Whether we read from stdin due to the --stdin option.
|
|
|
|
*/
|
|
|
|
int read_from_stdin;
|
|
|
|
|
toposort: rename "lifo" field
The primary invariant of sort_in_topological_order() is that a
parent commit is not emitted until all children of it are. When
traversing a forked history like this with "git log C E":
A----B----C
\
D----E
we ensure that A is emitted after all of B, C, D, and E are done, B
has to wait until C is done, and D has to wait until E is done.
In some applications, however, we would further want to control how
these child commits B, C, D and E on two parallel ancestry chains
are shown.
Most of the time, we would want to see C and B emitted together, and
then E and D, and finally A (i.e. the --topo-order output). The
"lifo" parameter of the sort_in_topological_order() function is used
to control this behaviour. We start the traversal by knowing two
commits, C and E. While keeping in mind that we also need to
inspect E later, we pick C first to inspect, and we notice and
record that B needs to be inspected. By structuring the "work to be
done" set as a LIFO stack, we ensure that B is inspected next,
before other in-flight commits we had known that we will need to
inspect, e.g. E.
When showing in --date-order, we would want to see commits ordered
by timestamps, i.e. show C, E, B and D in this order before showing
A, possibly mixing commits from two parallel histories together.
When "lifo" parameter is set to false, the function keeps the "work
to be done" set sorted in the date order to realize this semantics.
After inspecting C, we add B to the "work to be done" set, but the
next commit we inspect from the set is E which is newer than B.
The name "lifo", however, is too strongly tied to the way how the
function implements its behaviour, and does not describe what the
behaviour _means_.
Replace this field with an enum rev_sort_order, with two possible
values: REV_SORT_IN_GRAPH_ORDER and REV_SORT_BY_COMMIT_DATE, and
update the existing code. The mechanical replacement rule is:
"lifo == 0" is equivalent to "sort_order == REV_SORT_BY_COMMIT_DATE"
"lifo == 1" is equivalent to "sort_order == REV_SORT_IN_GRAPH_ORDER"
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-06-07 01:07:14 +02:00
|
|
|
/* topo-sort */
|
|
|
|
enum rev_sort_order sort_order;
|
|
|
|
|
2017-06-10 13:41:01 +02:00
|
|
|
unsigned int early_output;
|
|
|
|
|
|
|
|
unsigned int ignore_missing:1,
|
add `ignore_missing_links` mode to revwalk
When pack-objects is computing the reachability bitmap to
serve a fetch request, it can erroneously die() if some of
the UNINTERESTING objects are not present. Upload-pack
throws away HAVE lines from the client for objects we do not
have, but we may have a tip object without all of its
ancestors (e.g., if the tip is no longer reachable and was
new enough to survive a `git prune`, but some of its
reachable objects did get pruned).
In the non-bitmap case, we do a revision walk with the HAVE
objects marked as UNINTERESTING. The revision walker
explicitly ignores errors in accessing UNINTERESTING commits
to handle this case (and we do not bother looking at
UNINTERESTING trees or blobs at all).
When we have bitmaps, however, the process is quite
different. The bitmap index for a pack-objects run is
calculated in two separate steps:
First, we perform an extensive walk from all the HAVEs to
find the full set of objects reachable from them. This walk
is usually optimized away because we are expected to hit an
object with a bitmap during the traversal, which allows us
to terminate early.
Secondly, we perform an extensive walk from all the WANTs,
which usually also terminates early because we hit a commit
with an existing bitmap.
Once we have the resulting bitmaps from the two walks, we
AND-NOT them together to obtain the resulting set of objects
we need to pack.
When we are walking the HAVE objects, the revision walker
does not know that we are walking it only to mark the
results as uninteresting. We strip out the UNINTERESTING flag,
because those objects _are_ interesting to us during the
first walk. We want to keep going to get a complete set of
reachable objects if we can.
We need some way to tell the revision walker that it's OK to
silently truncate the HAVE walk, just like it does for the
UNINTERESTING case. This patch introduces a new
`ignore_missing_links` flag to the `rev_info` struct, which
we set only for the HAVE walk.
It also adds tests to cover UNINTERESTING objects missing
from several positions: a missing blob, a missing tree, and
a missing parent commit. The missing blob already worked (as
we do not care about its contents at all), but the other two
cases caused us to die().
Note that there are a few cases we do not need to test:
1. We do not need to test a missing tree, with the blob
still present. Without the tree that refers to it, we
would not know that the blob is relevant to our walk.
2. We do not need to test a tip commit that is missing.
Upload-pack omits these for us (and in fact, we
complain even in the non-bitmap case if it fails to do
so).
Reported-by: Siddharth Agarwal <sid0@fb.com>
Signed-off-by: Vicent Marti <tanoku@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-03-28 11:00:43 +01:00
|
|
|
ignore_missing_links:1;
|
2007-11-03 19:11:10 +01:00
|
|
|
|
2006-02-26 01:19:46 +01:00
|
|
|
/* Traversal flags */
|
|
|
|
unsigned int dense:1,
|
2007-11-05 22:22:34 +01:00
|
|
|
prune:1,
|
teach log --no-walk=unsorted, which avoids sorting
When 'git log' is passed the --no-walk option, no revision walk takes
place, naturally. Perhaps somewhat surprisingly, however, the provided
revisions still get sorted by commit date. So e.g 'git log --no-walk
HEAD HEAD~1' and 'git log --no-walk HEAD~1 HEAD' give the same result
(unless the two revisions share the commit date, in which case they
will retain the order given on the command line). As the commit that
introduced --no-walk (8e64006 (Teach revision machinery about
--no-walk, 2007-07-24)) points out, the sorting is intentional, to
allow things like
git log --abbrev-commit --pretty=oneline --decorate --all --no-walk
to show all refs in order by commit date.
But there are also other cases where the sorting is not wanted, such
as
<command producing revisions in order> |
git log --oneline --no-walk --stdin
To accomodate both cases, leave the decision of whether or not to sort
up to the caller, by allowing --no-walk={sorted,unsorted}, defaulting
to 'sorted' for backward-compatibility reasons.
Signed-off-by: Martin von Zweigbergk <martinvonz@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-08-29 08:15:54 +02:00
|
|
|
no_walk:2,
|
2006-02-26 01:19:46 +01:00
|
|
|
remove_empty_trees:1,
|
2006-06-11 19:57:35 +02:00
|
|
|
simplify_history:1,
|
revision: --show-pulls adds helpful merges
The default file history simplification of "git log -- <path>" or
"git rev-list -- <path>" focuses on providing the smallest set of
commits that first contributed a change. The revision walk greatly
restricts the set of walked commits by visiting only the first
TREESAME parent of a merge commit, when one exists. This means
that portions of the commit-graph are not walked, which can be a
performance benefit, but can also "hide" commits that added changes
but were ignored by a merge resolution.
The --full-history option modifies this by walking all commits and
reporting a merge commit as "interesting" if it has _any_ parent
that is not TREESAME. This tends to be an over-representation of
important commits, especially in an environment where most merge
commits are created by pull request completion.
Suppose we have a commit A and we create a commit B on top that
changes our file. When we merge the pull request, we create a merge
commit M. If no one else changed the file in the first-parent
history between M and A, then M will not be TREESAME to its first
parent, but will be TREESAME to B. Thus, the simplified history
will be "B". However, M will appear in the --full-history mode.
However, suppose that a number of topics T1, T2, ..., Tn were
created based on commits C1, C2, ..., Cn between A and M as
follows:
A----C1----C2--- ... ---Cn----M------P1---P2--- ... ---Pn
\ \ \ \ / / / /
\ \__.. \ \/ ..__T1 / Tn
\ \__.. /\ ..__T2 /
\_____________________B \____________________/
If the commits T1, T2, ... Tn did not change the file, then all of
P1 through Pn will be TREESAME to their first parent, but not
TREESAME to their second. This means that all of those merge commits
appear in the --full-history view, with edges that immediately
collapse into the lower history without introducing interesting
single-parent commits.
The --simplify-merges option was introduced to remove these extra
merge commits. By noticing that the rewritten parents are reachable
from their first parents, those edges can be simplified away. Finally,
the commits now look like single-parent commits that are TREESAME to
their "only" parent. Thus, they are removed and this issue does not
cause issues anymore. However, this also ends up removing the commit
M from the history view! Even worse, the --simplify-merges option
requires walking the entire history before returning a single result.
Many Git users are using Git alongside a Git service that provides
code storage alongside a code review tool commonly called "Pull
Requests" or "Merge Requests" against a target branch. When these
requests are accepted and merged, they typically create a merge
commit whose first parent is the previous branch tip and the second
parent is the tip of the topic branch used for the request. This
presents a valuable order to the parents, but also makes that merge
commit slightly special. Users may want to see not only which
commits changed a file, but which pull requests merged those commits
into their branch. In the previous example, this would mean the
users want to see the merge commit "M" in addition to the single-
parent commit "C".
Users are even more likely to want these merge commits when they
use pull requests to merge into a feature branch before merging that
feature branch into their trunk.
In some sense, users are asking for the "first" merge commit to
bring in the change to their branch. As long as the parent order is
consistent, this can be handled with the following rule:
Include a merge commit if it is not TREESAME to its first
parent, but is TREESAME to a later parent.
These merges look like the merge commits that would result from
running "git pull <topic>" on a main branch. Thus, the option to
show these commits is called "--show-pulls". This has the added
benefit of showing the commits created by closing a pull request or
merge request on any of the Git hosting and code review platforms.
To test these options, extend the standard test example to include
a merge commit that is not TREESAME to its first parent. It is
surprising that that option was not already in the example, as it
is instructive.
In particular, this extension demonstrates a common issue with file
history simplification. When a user resolves a merge conflict using
"-Xours" or otherwise ignoring one side of the conflict, they create
a TREESAME edge that probably should not be TREESAME. This leads
users to become frustrated and complain that "my change disappeared!"
In my experience, showing them history with --full-history and
--simplify-merges quickly reveals the problematic merge. As mentioned,
this option is expensive to compute. The --show-pulls option
_might_ show the merge commit (usually titled "resolving conflicts")
more quickly. Of course, this depends on the user having the correct
parent order, which is backwards when using "git pull master" from a
topic branch.
There are some special considerations when combining the --show-pulls
option with --simplify-merges. This requires adding a new PULL_MERGE
object flag to store the information from the initial TREESAME
comparisons. This helps avoid dropping those commits in later filters.
This is covered by a test, including how the parents can be simplified.
Since "struct object" has already ruined its 32-bit alignment by using
33 bits across parsed, type, and flags member, let's not make it worse.
PULL_MERGE is used in revision.c with the same value (1u<<15) as
REACHABLE in commit-graph.c. The REACHABLE flag is only used when
writing a commit-graph file, and a revision walk using --show-pulls
does not happen in the same process. Care must be taken in the future
to ensure this remains the case.
Update Documentation/rev-list-options.txt with significant details
around this option. This requires updating the example in the
History Simplification section to demonstrate some of the problems
with TREESAME second parents.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-04-10 14:19:43 +02:00
|
|
|
show_pulls:1,
|
2006-02-26 01:19:46 +01:00
|
|
|
topo_order:1,
|
revision traversal: show full history with merge simplification
The --full-history traversal keeps all merges in addition to non-merge
commits that touch paths in the given pathspec. This is useful to view
both sides of a merge in a topology like this:
A---M---o
/ /
---O---B
even when A and B makes identical change to the given paths. The revision
traversal without --full-history aims to come up with the simplest history
to explain the final state of the tree, and one of the side branches can
be pruned away.
The behaviour to keep all merges however is inconvenient if neither A nor
B touches the paths we are interested in. --full-history reduces the
topology to:
---O---M---o
in such a case, without removing M.
This adds a post processing phase on top of --full-history traversal to
remove needless merges from the resulting history.
The idea is to compute, for each commit in the "full history" result set,
the commit that should replace it in the simplified history. The commit
to replace it in the final history is determined as follows:
* In any case, we first figure out the replacement commits of parents of
the commit we are looking at. The commit we are looking at is
rewritten as if the replacement commits of its original parents are its
parents. While doing so, we reduce the redundant parents from the
rewritten parent list by not just removing the identical ones, but also
removing a parent that is an ancestor of another parent.
* After the above parent simplification, if the commit is a root commit,
an UNINTERESTING commit, a merge commit, or modifies the paths we are
interested in, then the replacement commit of the commit is itself. In
other words, such a commit is not dropped from the final result.
The first point above essentially means that the history is rewritten in
the bottom up direction. We can rewrite the parent list of a commit only
after we know how all of its parents are rewritten. This means that the
processing needs to happen on the full history (i.e. after limit_list()).
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-07-31 10:17:41 +02:00
|
|
|
simplify_merges:1,
|
2008-11-03 20:25:46 +01:00
|
|
|
simplify_by_decoration:1,
|
2017-08-23 14:36:49 +02:00
|
|
|
single_worktree:1,
|
2006-02-26 01:19:46 +01:00
|
|
|
tag_objects:1,
|
|
|
|
tree_objects:1,
|
|
|
|
blob_objects:1,
|
2011-09-02 00:43:34 +02:00
|
|
|
verify_objects:1,
|
2006-02-27 17:54:36 +01:00
|
|
|
edge_hint:1,
|
2014-12-25 00:05:39 +01:00
|
|
|
edge_hint_aggressive:1,
|
2006-02-27 17:54:36 +01:00
|
|
|
limited:1,
|
2009-02-28 09:00:21 +01:00
|
|
|
unpacked:1,
|
revision walker: Fix --boundary when limited
This cleans up the boundary processing in the commit walker. It
- rips out the boundary logic from the commit walker. Placing
"negative" commits in the revs->commits list was Ok if all we
cared about "boundary" was the UNINTERESTING limiting case,
but conceptually it was wrong.
- makes get_revision_1() function to walk the commits and return
the results as if there is no funny postprocessing flags such
as --reverse, --skip nor --max-count.
- makes get_revision() function the postprocessing phase:
If reverse is given, wait for get_revision_1() to give
everything that it would normally give, and then reverse it
before consuming.
If skip is given, skip that many before going further.
If max is given, stop when we gave out that many.
Now that we are about to return one positive commit, mark
the parents of that commit to be potential boundaries
before returning, iff we are doing the boundary processing.
Return the commit.
- After get_revision() finishes giving out all the positive
commits, if we are doing the boundary processing, we look at
the parents that we marked as potential boundaries earlier,
see if they are really boundaries, and give them out.
It loses more code than it adds, even when the new gc_boundary()
function, which is purely for early optimization, is counted.
Note that this patch is purely for eyeballing and discussion
only. It breaks git-bundle's verify logic because the logic
does not use BOUNDARY_SHOW flag for its internal computation
anymore. After we correct it not to attempt to affect the
boundary processing by setting the BOUNDARY_SHOW flag, we can
remove BOUNDARY_SHOW from revision.h and use that bit assignment
for the new CHILD_SHOWN flag.
Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-03-05 22:10:06 +01:00
|
|
|
boundary:2,
|
2010-06-10 13:47:23 +02:00
|
|
|
count:1,
|
2006-12-17 00:31:25 +01:00
|
|
|
left_right:1,
|
2011-02-21 17:09:11 +01:00
|
|
|
left_only:1,
|
|
|
|
right_only:1,
|
2008-05-04 12:36:52 +02:00
|
|
|
rewrite_parents:1,
|
|
|
|
print_parents:1,
|
2008-11-03 20:23:57 +01:00
|
|
|
show_decorations:1,
|
2007-03-13 09:57:22 +01:00
|
|
|
reverse:1,
|
2008-08-29 21:18:38 +02:00
|
|
|
reverse_output_stage:1,
|
2007-04-09 12:40:38 +02:00
|
|
|
cherry_pick:1,
|
2011-03-07 13:31:40 +01:00
|
|
|
cherry_mark:1,
|
2009-10-27 19:28:07 +01:00
|
|
|
bisect:1,
|
revision: --ancestry-path
"rev-list A..H" computes the set of commits that are ancestors of H, but
excludes the ones that are ancestors of A. This is useful to see what
happened to the history leading to H since A, in the sense that "what does
H have that did not exist in A" (e.g. when you have a choice to update to
H from A).
x---x---A---B---C <-- topic
/ \
x---x---x---o---o---o---o---M---D---E---F---G <-- dev
/ \
x---o---o---o---o---o---o---o---o---o---o---o---N---H <-- master
The result in the above example would be the commits marked with caps
letters (except for A itself, of course), and the ones marked with 'o'.
When you want to find out what commits in H are contaminated with the bug
introduced by A and need fixing, however, you might want to view only the
subset of "A..B" that are actually descendants of A, i.e. excluding the
ones marked with 'o'. Introduce a new option --ancestry-path to compute
this set with "rev-list --ancestry-path A..B".
Note that in practice, you would build a fix immediately on top of A and
"git branch --contains A" will give the names of branches that you would
need to merge the fix into (i.e. topic, dev and master), so this may not
be worth paying the extra cost of postprocessing.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-04-20 22:48:39 +02:00
|
|
|
ancestry_path:1,
|
Implement line-history search (git log -L)
This is a rewrite of much of Bo's work, mainly in an effort to split
it into smaller, easier to understand routines.
The algorithm is built around the struct range_set, which encodes a
series of line ranges as intervals [a,b). This is used in two
contexts:
* A set of lines we are tracking (which will change as we dig through
history).
* To encode diffs, as pairs of ranges.
The main routine is range_set_map_across_diff(). It processes the
diff between a commit C and some parent P. It determines which diff
hunks are relevant to the ranges tracked in C, and computes the new
ranges for P.
The algorithm is then simply to process history in topological order
from newest to oldest, computing ranges and (partial) diffs. At
branch points, we need to merge the ranges we are watching. We will
find that many commits do not affect the chosen ranges, and mark them
TREESAME (in addition to those already filtered by pathspec limiting).
Another pass of history simplification then gets rid of such commits.
This is wired as an extra filtering pass in the log machinery. This
currently only reduces code duplication, but should allow for other
simplifications and options to be used.
Finally, we hook a diff printer into the output chain. Ideally we
would wire directly into the diff logic, to optionally use features
like word diff. However, that will require some major reworking of
the diff chain, so we completely replace the output with our own diff
for now.
As this was a GSoC project, and has quite some history by now, many
people have helped. In no particular order, thanks go to
Jakub Narebski <jnareb@gmail.com>
Jens Lehmann <Jens.Lehmann@web.de>
Jonathan Nieder <jrnieder@gmail.com>
Junio C Hamano <gitster@pobox.com>
Ramsay Jones <ramsay@ramsay1.demon.co.uk>
Will Palmer <wmpalmer@gmail.com>
Apologies to everyone I forgot.
Signed-off-by: Bo Yang <struggleyb.nku@gmail.com>
Signed-off-by: Thomas Rast <trast@student.ethz.ch>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-03-28 17:47:32 +01:00
|
|
|
first_parent_only:1,
|
2017-11-16 03:00:35 +01:00
|
|
|
line_level_traverse:1,
|
2018-02-13 22:39:03 +01:00
|
|
|
tree_blobs_in_commit_order:1,
|
2017-12-08 16:27:15 +01:00
|
|
|
|
2018-10-05 23:31:23 +02:00
|
|
|
/*
|
|
|
|
* Blobs are shown without regard for their existence.
|
|
|
|
* But not so for trees: unless exclude_promisor_objects
|
|
|
|
* is set and the tree in question is a promisor object;
|
|
|
|
* OR ignore_missing_links is set, the revision walker
|
|
|
|
* dies with a "bad tree object HASH" message when
|
|
|
|
* encountering a missing tree. For callers that can
|
|
|
|
* handle missing trees and want them to be filterable
|
|
|
|
* and showable, set this to true. The revision walker
|
|
|
|
* will filter and show such a missing tree as usual,
|
|
|
|
* but will not attempt to recurse into this tree
|
|
|
|
* object.
|
|
|
|
*/
|
|
|
|
do_not_die_on_missing_tree:1,
|
|
|
|
|
2017-12-08 16:27:15 +01:00
|
|
|
/* for internal use only */
|
|
|
|
exclude_promisor_objects:1;
|
2006-02-26 01:19:46 +01:00
|
|
|
|
Common option parsing for "git log --diff" and friends
This basically does a few things that are sadly somewhat interdependent,
and nontrivial to split out
- get rid of "struct log_tree_opt"
The fields in "log_tree_opt" are moved into "struct rev_info", and all
users of log_tree_opt are changed to use the rev_info struct instead.
- add the parsing for the log_tree_opt arguments to "setup_revision()"
- make setup_revision set a flag (revs->diff) if the diff-related
arguments were used. This allows "git log" to decide whether it wants
to show diffs or not.
- make setup_revision() also initialize the diffopt part of rev_info
(which we had from before, but we just didn't initialize it)
- make setup_revision() do all the "finishing touches" on it all (it will
do the proper flag combination logic, and call "diff_setup_done()")
Now, that was the easy and straightforward part.
The slightly more involved part is that some of the programs that want to
use the new-and-improved rev_info parsing don't actually want _commits_,
they may want tree'ish arguments instead. That meant that I had to change
setup_revision() to parse the arguments not into the "revs->commits" list,
but into the "revs->pending_objects" list.
Then, when we do "prepare_revision_walk()", we walk that list, and create
the sorted commit list from there.
This actually cleaned some stuff up, but it's the less obvious part of the
patch, and re-organized the "revision.c" logic somewhat. It actually paves
the way for splitting argument parsing _entirely_ out of "revision.c",
since now the argument parsing really is totally independent of the commit
walking: that didn't use to be true, since there was lots of overlap with
get_commit_reference() handling etc, now the _only_ overlap is the shared
(and trivial) "add_pending_object()" thing.
However, I didn't do that file split, just because I wanted the diff
itself to be smaller, and show the actual changes more clearly. If this
gets accepted, I'll do further cleanups then - that includes the file
split, but also using the new infrastructure to do a nicer "git diff" etc.
Even in this form, it actually ends up removing more lines than it adds.
It's nice to note how simple and straightforward this makes the built-in
"git log" command, even though it continues to support all the diff flags
too. It doesn't get much simpler that this.
I think this is worth merging soonish, because it does allow for future
cleanup and even more sharing of code. However, it obviously touches
"revision.c", which is subtle. I've tested that it passes all the tests we
have, and it passes my "looks sane" detector, but somebody else should
also give it a good look-over.
[jc: squashed the original and three "oops this too" updates, with
another fix-up.]
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-04-15 01:52:13 +02:00
|
|
|
/* Diff flags */
|
|
|
|
unsigned int diff:1,
|
|
|
|
full_diff:1,
|
|
|
|
show_root_diff:1,
|
2020-08-31 22:14:22 +02:00
|
|
|
match_missing:1,
|
Common option parsing for "git log --diff" and friends
This basically does a few things that are sadly somewhat interdependent,
and nontrivial to split out
- get rid of "struct log_tree_opt"
The fields in "log_tree_opt" are moved into "struct rev_info", and all
users of log_tree_opt are changed to use the rev_info struct instead.
- add the parsing for the log_tree_opt arguments to "setup_revision()"
- make setup_revision set a flag (revs->diff) if the diff-related
arguments were used. This allows "git log" to decide whether it wants
to show diffs or not.
- make setup_revision() also initialize the diffopt part of rev_info
(which we had from before, but we just didn't initialize it)
- make setup_revision() do all the "finishing touches" on it all (it will
do the proper flag combination logic, and call "diff_setup_done()")
Now, that was the easy and straightforward part.
The slightly more involved part is that some of the programs that want to
use the new-and-improved rev_info parsing don't actually want _commits_,
they may want tree'ish arguments instead. That meant that I had to change
setup_revision() to parse the arguments not into the "revs->commits" list,
but into the "revs->pending_objects" list.
Then, when we do "prepare_revision_walk()", we walk that list, and create
the sorted commit list from there.
This actually cleaned some stuff up, but it's the less obvious part of the
patch, and re-organized the "revision.c" logic somewhat. It actually paves
the way for splitting argument parsing _entirely_ out of "revision.c",
since now the argument parsing really is totally independent of the commit
walking: that didn't use to be true, since there was lots of overlap with
get_commit_reference() handling etc, now the _only_ overlap is the shared
(and trivial) "add_pending_object()" thing.
However, I didn't do that file split, just because I wanted the diff
itself to be smaller, and show the actual changes more clearly. If this
gets accepted, I'll do further cleanups then - that includes the file
split, but also using the new infrastructure to do a nicer "git diff" etc.
Even in this form, it actually ends up removing more lines than it adds.
It's nice to note how simple and straightforward this makes the built-in
"git log" command, even though it continues to support all the diff flags
too. It doesn't get much simpler that this.
I think this is worth merging soonish, because it does allow for future
cleanup and even more sharing of code. However, it obviously touches
"revision.c", which is subtle. I've tested that it passes all the tests we
have, and it passes my "looks sane" detector, but somebody else should
also give it a good look-over.
[jc: squashed the original and three "oops this too" updates, with
another fix-up.]
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-04-15 01:52:13 +02:00
|
|
|
no_commit_id:1,
|
|
|
|
verbose_header:1,
|
|
|
|
combine_merges:1,
|
log,diff-tree: add --combined-all-paths option
The combined diff format for merges will only list one filename, even if
rename or copy detection is active. For example, with raw format one
might see:
::100644 100644 100644 fabadb8 cc95eb0 4866510 MM describe.c
::100755 100755 100755 52b7a2d 6d1ac04 d2ac7d7 RM bar.sh
::100644 100644 100644 e07d6c5 9042e82 ee91881 RR phooey.c
This doesn't let us know what the original name of bar.sh was in the
first parent, and doesn't let us know what either of the original names
of phooey.c were in either of the parents. In contrast, for non-merge
commits, raw format does provide original filenames (and a rename score
to boot). In order to also provide original filenames for merge
commits, add a --combined-all-paths option (which must be used with
either -c or --cc, and is likely only useful with rename or copy
detection active) so that we can print tab-separated filenames when
renames are involved. This transforms the above output to:
::100644 100644 100644 fabadb8 cc95eb0 4866510 MM desc.c desc.c desc.c
::100755 100755 100755 52b7a2d 6d1ac04 d2ac7d7 RM foo.sh bar.sh bar.sh
::100644 100644 100644 e07d6c5 9042e82 ee91881 RR fooey.c fuey.c phooey.c
Further, in patch format, this changes the from/to headers so that
instead of just having one "from" header, we get one for each parent.
For example, instead of having
--- a/phooey.c
+++ b/phooey.c
we would see
--- a/fooey.c
--- a/fuey.c
+++ b/phooey.c
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-02-08 02:12:46 +01:00
|
|
|
combined_all_paths:1,
|
Common option parsing for "git log --diff" and friends
This basically does a few things that are sadly somewhat interdependent,
and nontrivial to split out
- get rid of "struct log_tree_opt"
The fields in "log_tree_opt" are moved into "struct rev_info", and all
users of log_tree_opt are changed to use the rev_info struct instead.
- add the parsing for the log_tree_opt arguments to "setup_revision()"
- make setup_revision set a flag (revs->diff) if the diff-related
arguments were used. This allows "git log" to decide whether it wants
to show diffs or not.
- make setup_revision() also initialize the diffopt part of rev_info
(which we had from before, but we just didn't initialize it)
- make setup_revision() do all the "finishing touches" on it all (it will
do the proper flag combination logic, and call "diff_setup_done()")
Now, that was the easy and straightforward part.
The slightly more involved part is that some of the programs that want to
use the new-and-improved rev_info parsing don't actually want _commits_,
they may want tree'ish arguments instead. That meant that I had to change
setup_revision() to parse the arguments not into the "revs->commits" list,
but into the "revs->pending_objects" list.
Then, when we do "prepare_revision_walk()", we walk that list, and create
the sorted commit list from there.
This actually cleaned some stuff up, but it's the less obvious part of the
patch, and re-organized the "revision.c" logic somewhat. It actually paves
the way for splitting argument parsing _entirely_ out of "revision.c",
since now the argument parsing really is totally independent of the commit
walking: that didn't use to be true, since there was lots of overlap with
get_commit_reference() handling etc, now the _only_ overlap is the shared
(and trivial) "add_pending_object()" thing.
However, I didn't do that file split, just because I wanted the diff
itself to be smaller, and show the actual changes more clearly. If this
gets accepted, I'll do further cleanups then - that includes the file
split, but also using the new infrastructure to do a nicer "git diff" etc.
Even in this form, it actually ends up removing more lines than it adds.
It's nice to note how simple and straightforward this makes the built-in
"git log" command, even though it continues to support all the diff flags
too. It doesn't get much simpler that this.
I think this is worth merging soonish, because it does allow for future
cleanup and even more sharing of code. However, it obviously touches
"revision.c", which is subtle. I've tested that it passes all the tests we
have, and it passes my "looks sane" detector, but somebody else should
also give it a good look-over.
[jc: squashed the original and three "oops this too" updates, with
another fix-up.]
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-04-15 01:52:13 +02:00
|
|
|
dense_combined_merges:1,
|
|
|
|
always_show_header:1;
|
2020-07-29 22:10:20 +02:00
|
|
|
int ignore_merges:2;
|
Common option parsing for "git log --diff" and friends
This basically does a few things that are sadly somewhat interdependent,
and nontrivial to split out
- get rid of "struct log_tree_opt"
The fields in "log_tree_opt" are moved into "struct rev_info", and all
users of log_tree_opt are changed to use the rev_info struct instead.
- add the parsing for the log_tree_opt arguments to "setup_revision()"
- make setup_revision set a flag (revs->diff) if the diff-related
arguments were used. This allows "git log" to decide whether it wants
to show diffs or not.
- make setup_revision() also initialize the diffopt part of rev_info
(which we had from before, but we just didn't initialize it)
- make setup_revision() do all the "finishing touches" on it all (it will
do the proper flag combination logic, and call "diff_setup_done()")
Now, that was the easy and straightforward part.
The slightly more involved part is that some of the programs that want to
use the new-and-improved rev_info parsing don't actually want _commits_,
they may want tree'ish arguments instead. That meant that I had to change
setup_revision() to parse the arguments not into the "revs->commits" list,
but into the "revs->pending_objects" list.
Then, when we do "prepare_revision_walk()", we walk that list, and create
the sorted commit list from there.
This actually cleaned some stuff up, but it's the less obvious part of the
patch, and re-organized the "revision.c" logic somewhat. It actually paves
the way for splitting argument parsing _entirely_ out of "revision.c",
since now the argument parsing really is totally independent of the commit
walking: that didn't use to be true, since there was lots of overlap with
get_commit_reference() handling etc, now the _only_ overlap is the shared
(and trivial) "add_pending_object()" thing.
However, I didn't do that file split, just because I wanted the diff
itself to be smaller, and show the actual changes more clearly. If this
gets accepted, I'll do further cleanups then - that includes the file
split, but also using the new infrastructure to do a nicer "git diff" etc.
Even in this form, it actually ends up removing more lines than it adds.
It's nice to note how simple and straightforward this makes the built-in
"git log" command, even though it continues to support all the diff flags
too. It doesn't get much simpler that this.
I think this is worth merging soonish, because it does allow for future
cleanup and even more sharing of code. However, it obviously touches
"revision.c", which is subtle. I've tested that it passes all the tests we
have, and it passes my "looks sane" detector, but somebody else should
also give it a good look-over.
[jc: squashed the original and three "oops this too" updates, with
another fix-up.]
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-04-15 01:52:13 +02:00
|
|
|
|
|
|
|
/* Format info */
|
notes: break set_display_notes() into smaller functions
In 8164c961e1 (format-patch: use --notes behavior for format.notes,
2019-12-09), we introduced set_display_notes() which was a monolithic
function with three mutually exclusive branches. Break the function up
into three small and simple functions that each are only responsible for
one task.
This family of functions accepts an `int *show_notes` instead of
returning a value suitable for assignment to `show_notes`. This is for
two reasons. First of all, this guarantees that the external
`show_notes` variable changes in lockstep with the
`struct display_notes_opt`. Second, this prompts future developers to be
careful about doing something meaningful with this value. In fact, a
NULL check is intentionally omitted because causing a segfault here
would tell the future developer that they are meant to use the value for
something meaningful.
One alternative was making the family of functions accept a
`struct rev_info *` instead of the `struct display_notes_opt *`, since
the former contains the `show_notes` field as well. This does not work
because we have to call git_config() before repo_init_revisions().
However, if we had a `struct rev_info`, we'd need to initialize it before
it gets assigned values from git_config(). As a result, we break the
circular dependency by having standalone `int show_notes` and
`struct display_notes_opt notes_opt` variables which temporarily hold
values from git_config() until the values are copied over to `rev`.
To implement this change, we need to get a pointer to
`rev_info::show_notes`. Unfortunately, this is not possible with
bitfields and only direct-assignment is possible. Change
`rev_info::show_notes` to a non-bitfield int so that we can get its
address.
Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-12-12 01:49:50 +01:00
|
|
|
int show_notes;
|
Log message printout cleanups
On Sun, 16 Apr 2006, Junio C Hamano wrote:
>
> In the mid-term, I am hoping we can drop the generate_header()
> callchain _and_ the custom code that formats commit log in-core,
> found in cmd_log_wc().
Ok, this was nastier than expected, just because the dependencies between
the different log-printing stuff were absolutely _everywhere_, but here's
a patch that does exactly that.
The patch is not very easy to read, and the "--patch-with-stat" thing is
still broken (it does not call the "show_log()" thing properly for
merges). That's not a new bug. In the new world order it _should_ do
something like
if (rev->logopt)
show_log(rev, rev->logopt, "---\n");
but it doesn't. I haven't looked at the --with-stat logic, so I left it
alone.
That said, this patch removes more lines than it adds, and in particular,
the "cmd_log_wc()" loop is now a very clean:
while ((commit = get_revision(rev)) != NULL) {
log_tree_commit(rev, commit);
free(commit->buffer);
commit->buffer = NULL;
}
so it doesn't get much prettier than this. All the complexity is entirely
hidden in log-tree.c, and any code that needs to flush the log literally
just needs to do the "if (rev->logopt) show_log(...)" incantation.
I had to make the combined_diff() logic take a "struct rev_info" instead
of just a "struct diff_options", but that part is pretty clean.
This does change "git whatchanged" from using "diff-tree" as the commit
descriptor to "commit", and I changed one of the tests to reflect that new
reality. Otherwise everything still passes, and my other tests look fine
too.
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-04-17 20:59:32 +02:00
|
|
|
unsigned int shown_one:1,
|
2012-10-18 06:27:22 +02:00
|
|
|
shown_dashes:1,
|
2008-07-08 15:19:33 +02:00
|
|
|
show_merge:1,
|
2010-01-20 22:59:36 +01:00
|
|
|
show_notes_given:1,
|
2011-10-19 00:53:23 +02:00
|
|
|
show_signature:1,
|
2010-01-20 22:59:36 +01:00
|
|
|
pretty_given:1,
|
2008-04-08 02:11:34 +02:00
|
|
|
abbrev_commit:1,
|
2011-05-18 19:56:04 +02:00
|
|
|
abbrev_commit_given:1,
|
2015-12-15 02:52:04 +01:00
|
|
|
zero_commit:1,
|
2008-05-04 12:36:54 +02:00
|
|
|
use_terminator:1,
|
2009-09-24 10:28:15 +02:00
|
|
|
missing_newline:1,
|
2011-05-27 00:28:17 +02:00
|
|
|
date_mode_explicit:1,
|
2020-04-08 06:31:38 +02:00
|
|
|
preserve_subject:1,
|
|
|
|
encode_email_headers:1;
|
2009-11-03 15:59:18 +01:00
|
|
|
unsigned int disable_stdin:1;
|
2014-03-25 14:23:27 +01:00
|
|
|
/* --show-linear-break */
|
|
|
|
unsigned int track_linear:1,
|
|
|
|
track_first_time:1,
|
|
|
|
linear:1;
|
2009-11-03 15:59:18 +01:00
|
|
|
|
convert "enum date_mode" into a struct
In preparation for adding date modes that may carry extra
information beyond the mode itself, this patch converts the
date_mode enum into a struct.
Most of the conversion is fairly straightforward; we pass
the struct as a pointer and dereference the type field where
necessary. Locations that declare a date_mode can use a "{}"
constructor. However, the tricky case is where we use the
enum labels as constants, like:
show_date(t, tz, DATE_NORMAL);
Ideally we could say:
show_date(t, tz, &{ DATE_NORMAL });
but of course C does not allow that. Likewise, we cannot
cast the constant to a struct, because we need to pass an
actual address. Our options are basically:
1. Manually add a "struct date_mode d = { DATE_NORMAL }"
definition to each caller, and pass "&d". This makes
the callers uglier, because they sometimes do not even
have their own scope (e.g., they are inside a switch
statement).
2. Provide a pre-made global "date_normal" struct that can
be passed by address. We'd also need "date_rfc2822",
"date_iso8601", and so forth. But at least the ugliness
is defined in one place.
3. Provide a wrapper that generates the correct struct on
the fly. The big downside is that we end up pointing to
a single global, which makes our wrapper non-reentrant.
But show_date is already not reentrant, so it does not
matter.
This patch implements 3, along with a minor macro to keep
the size of the callers sane.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-06-25 18:55:02 +02:00
|
|
|
struct date_mode date_mode;
|
2016-03-30 00:49:24 +02:00
|
|
|
int expand_tabs_in_log; /* unset if negative */
|
|
|
|
int expand_tabs_in_log_default;
|
2006-09-06 11:12:09 +02:00
|
|
|
|
Common option parsing for "git log --diff" and friends
This basically does a few things that are sadly somewhat interdependent,
and nontrivial to split out
- get rid of "struct log_tree_opt"
The fields in "log_tree_opt" are moved into "struct rev_info", and all
users of log_tree_opt are changed to use the rev_info struct instead.
- add the parsing for the log_tree_opt arguments to "setup_revision()"
- make setup_revision set a flag (revs->diff) if the diff-related
arguments were used. This allows "git log" to decide whether it wants
to show diffs or not.
- make setup_revision() also initialize the diffopt part of rev_info
(which we had from before, but we just didn't initialize it)
- make setup_revision() do all the "finishing touches" on it all (it will
do the proper flag combination logic, and call "diff_setup_done()")
Now, that was the easy and straightforward part.
The slightly more involved part is that some of the programs that want to
use the new-and-improved rev_info parsing don't actually want _commits_,
they may want tree'ish arguments instead. That meant that I had to change
setup_revision() to parse the arguments not into the "revs->commits" list,
but into the "revs->pending_objects" list.
Then, when we do "prepare_revision_walk()", we walk that list, and create
the sorted commit list from there.
This actually cleaned some stuff up, but it's the less obvious part of the
patch, and re-organized the "revision.c" logic somewhat. It actually paves
the way for splitting argument parsing _entirely_ out of "revision.c",
since now the argument parsing really is totally independent of the commit
walking: that didn't use to be true, since there was lots of overlap with
get_commit_reference() handling etc, now the _only_ overlap is the shared
(and trivial) "add_pending_object()" thing.
However, I didn't do that file split, just because I wanted the diff
itself to be smaller, and show the actual changes more clearly. If this
gets accepted, I'll do further cleanups then - that includes the file
split, but also using the new infrastructure to do a nicer "git diff" etc.
Even in this form, it actually ends up removing more lines than it adds.
It's nice to note how simple and straightforward this makes the built-in
"git log" command, even though it continues to support all the diff flags
too. It doesn't get much simpler that this.
I think this is worth merging soonish, because it does allow for future
cleanup and even more sharing of code. However, it obviously touches
"revision.c", which is subtle. I've tested that it passes all the tests we
have, and it passes my "looks sane" detector, but somebody else should
also give it a good look-over.
[jc: squashed the original and three "oops this too" updates, with
another fix-up.]
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-04-15 01:52:13 +02:00
|
|
|
unsigned int abbrev;
|
|
|
|
enum cmit_fmt commit_format;
|
Log message printout cleanups
On Sun, 16 Apr 2006, Junio C Hamano wrote:
>
> In the mid-term, I am hoping we can drop the generate_header()
> callchain _and_ the custom code that formats commit log in-core,
> found in cmd_log_wc().
Ok, this was nastier than expected, just because the dependencies between
the different log-printing stuff were absolutely _everywhere_, but here's
a patch that does exactly that.
The patch is not very easy to read, and the "--patch-with-stat" thing is
still broken (it does not call the "show_log()" thing properly for
merges). That's not a new bug. In the new world order it _should_ do
something like
if (rev->logopt)
show_log(rev, rev->logopt, "---\n");
but it doesn't. I haven't looked at the --with-stat logic, so I left it
alone.
That said, this patch removes more lines than it adds, and in particular,
the "cmd_log_wc()" loop is now a very clean:
while ((commit = get_revision(rev)) != NULL) {
log_tree_commit(rev, commit);
free(commit->buffer);
commit->buffer = NULL;
}
so it doesn't get much prettier than this. All the complexity is entirely
hidden in log-tree.c, and any code that needs to flush the log literally
just needs to do the "if (rev->logopt) show_log(...)" incantation.
I had to make the combined_diff() logic take a "struct rev_info" instead
of just a "struct diff_options", but that part is pretty clean.
This does change "git whatchanged" from using "diff-tree" as the commit
descriptor to "commit", and I changed one of the tests to reflect that new
reality. Otherwise everything still passes, and my other tests look fine
too.
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-04-17 20:59:32 +02:00
|
|
|
struct log_info *loginfo;
|
2006-05-05 04:30:52 +02:00
|
|
|
int nr, total;
|
2006-05-20 15:40:29 +02:00
|
|
|
const char *mime_boundary;
|
2009-03-23 03:14:05 +01:00
|
|
|
const char *patch_suffix;
|
|
|
|
int numbered_files;
|
2012-12-22 09:21:23 +01:00
|
|
|
int reroll_count;
|
2008-02-19 04:56:06 +01:00
|
|
|
char *message_id;
|
teach format-patch to place other authors into in-body "From"
Format-patch generates emails with the "From" address set to the
author of each patch. If you are going to send the emails, however,
you would want to replace the author identity with yours (if they
are not the same), and bump the author identity to an in-body
header.
Normally this is handled by git-send-email, which does the
transformation before sending out the emails. However, some
workflows may not use send-email (e.g., imap-send, or a custom
script which feeds the mbox to a non-git MUA). They could each
implement this feature themselves, but getting it right is
non-trivial (one must canonicalize the identities by reversing any
RFC2047 encoding or RFC822 quoting of the headers, which has caused
many bugs in send-email over the years).
This patch takes a different approach: it teaches format-patch a
"--from" option which handles the ident check and in-body header
while it is writing out the email. It's much simpler to do at this
level (because we haven't done any quoting yet), and any workflow
based on format-patch can easily turn it on.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-07-03 09:08:22 +02:00
|
|
|
struct ident_split from_ident;
|
2009-02-19 22:26:31 +01:00
|
|
|
struct string_list *ref_message_ids;
|
2013-02-12 11:17:38 +01:00
|
|
|
int add_signoff;
|
2006-06-02 15:21:17 +02:00
|
|
|
const char *extra_headers;
|
2006-12-25 20:48:35 +01:00
|
|
|
const char *log_reencode;
|
2007-04-12 01:58:07 +02:00
|
|
|
const char *subject_prefix;
|
format-patch: make output filename configurable
For the past 15 years, we've used the hardcoded 64 as the length
limit of the filename of the output from the "git format-patch"
command. Since the value is shorter than the 80-column terminal, it
could grow without line wrapping a bit. At the same time, since the
value is longer than half of the 80-column terminal, we could fit
two or more of them in "ls" output on such a terminal if we allowed
to lower it.
Introduce a new command line option --filename-max-length=<n> and a
new configuration variable format.filenameMaxLength to override the
hardcoded default.
While we are at it, remove a check that the name of output directory
does not exceed PATH_MAX---this check is pointless in that by the
time control reaches the function, the caller would already have
done an equivalent of "mkdir -p", so if the system does not like an
overly long directory name, the control wouldn't have reached here,
and otherwise, we know that the system allowed the output directory
to exist. In the worst case, we will get an error when we try to
open the output file and handle the error correctly anyway.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-11-06 22:56:24 +01:00
|
|
|
int patch_name_max;
|
2007-03-04 00:12:06 +01:00
|
|
|
int no_inline;
|
2007-07-20 20:15:13 +02:00
|
|
|
int show_log_size;
|
2013-01-05 22:26:41 +01:00
|
|
|
struct string_list *mailmap;
|
Common option parsing for "git log --diff" and friends
This basically does a few things that are sadly somewhat interdependent,
and nontrivial to split out
- get rid of "struct log_tree_opt"
The fields in "log_tree_opt" are moved into "struct rev_info", and all
users of log_tree_opt are changed to use the rev_info struct instead.
- add the parsing for the log_tree_opt arguments to "setup_revision()"
- make setup_revision set a flag (revs->diff) if the diff-related
arguments were used. This allows "git log" to decide whether it wants
to show diffs or not.
- make setup_revision() also initialize the diffopt part of rev_info
(which we had from before, but we just didn't initialize it)
- make setup_revision() do all the "finishing touches" on it all (it will
do the proper flag combination logic, and call "diff_setup_done()")
Now, that was the easy and straightforward part.
The slightly more involved part is that some of the programs that want to
use the new-and-improved rev_info parsing don't actually want _commits_,
they may want tree'ish arguments instead. That meant that I had to change
setup_revision() to parse the arguments not into the "revs->commits" list,
but into the "revs->pending_objects" list.
Then, when we do "prepare_revision_walk()", we walk that list, and create
the sorted commit list from there.
This actually cleaned some stuff up, but it's the less obvious part of the
patch, and re-organized the "revision.c" logic somewhat. It actually paves
the way for splitting argument parsing _entirely_ out of "revision.c",
since now the argument parsing really is totally independent of the commit
walking: that didn't use to be true, since there was lots of overlap with
get_commit_reference() handling etc, now the _only_ overlap is the shared
(and trivial) "add_pending_object()" thing.
However, I didn't do that file split, just because I wanted the diff
itself to be smaller, and show the actual changes more clearly. If this
gets accepted, I'll do further cleanups then - that includes the file
split, but also using the new infrastructure to do a nicer "git diff" etc.
Even in this form, it actually ends up removing more lines than it adds.
It's nice to note how simple and straightforward this makes the built-in
"git log" command, even though it continues to support all the diff flags
too. It doesn't get much simpler that this.
I think this is worth merging soonish, because it does allow for future
cleanup and even more sharing of code. However, it obviously touches
"revision.c", which is subtle. I've tested that it passes all the tests we
have, and it passes my "looks sane" detector, but somebody else should
also give it a good look-over.
[jc: squashed the original and three "oops this too" updates, with
another fix-up.]
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-04-15 01:52:13 +02:00
|
|
|
|
2006-09-18 00:43:40 +02:00
|
|
|
/* Filter by commit log message */
|
2008-08-25 08:15:05 +02:00
|
|
|
struct grep_opt grep_filter;
|
log: teach --invert-grep option
"git log --grep=<string>" shows only commits with messages that
match the given string, but sometimes it is useful to be able to
show only commits that do *not* have certain messages (e.g. "show
me ones that are not FIXUP commits").
Originally, we had the invert-grep flag in grep_opt, but because
"git grep --invert-grep" does not make sense except in conjunction
with "--files-with-matches", which is already covered by
"--files-without-matches", it was moved it to revisions structure.
To have the flag there expresses the function to the feature better.
When the newly inserted two tests run, the history would have commits
with messages "initial", "second", "third", "fourth", "fifth", "sixth"
and "Second", committed in this order. The commits that does not match
either "th" or "Sec" is "second" and "initial". For the case insensitive
case only "initial" matches.
Signed-off-by: Christoph Junghans <ottxor@gentoo.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-01-13 02:33:32 +01:00
|
|
|
/* Negate the match of grep_filter */
|
|
|
|
int invert_grep;
|
2006-09-18 00:43:40 +02:00
|
|
|
|
2008-05-04 12:36:54 +02:00
|
|
|
/* Display history graph */
|
|
|
|
struct git_graph *graph;
|
|
|
|
|
2006-02-26 01:19:46 +01:00
|
|
|
/* special limits */
|
2006-12-20 03:25:32 +01:00
|
|
|
int skip_count;
|
2006-02-26 01:19:46 +01:00
|
|
|
int max_count;
|
2017-04-26 21:29:31 +02:00
|
|
|
timestamp_t max_age;
|
|
|
|
timestamp_t min_age;
|
2011-03-21 11:14:06 +01:00
|
|
|
int min_parents;
|
|
|
|
int max_parents;
|
2013-10-24 20:01:41 +02:00
|
|
|
int (*include_check)(struct commit *, void *);
|
|
|
|
void *include_check_data;
|
2006-03-10 10:21:39 +01:00
|
|
|
|
Common option parsing for "git log --diff" and friends
This basically does a few things that are sadly somewhat interdependent,
and nontrivial to split out
- get rid of "struct log_tree_opt"
The fields in "log_tree_opt" are moved into "struct rev_info", and all
users of log_tree_opt are changed to use the rev_info struct instead.
- add the parsing for the log_tree_opt arguments to "setup_revision()"
- make setup_revision set a flag (revs->diff) if the diff-related
arguments were used. This allows "git log" to decide whether it wants
to show diffs or not.
- make setup_revision() also initialize the diffopt part of rev_info
(which we had from before, but we just didn't initialize it)
- make setup_revision() do all the "finishing touches" on it all (it will
do the proper flag combination logic, and call "diff_setup_done()")
Now, that was the easy and straightforward part.
The slightly more involved part is that some of the programs that want to
use the new-and-improved rev_info parsing don't actually want _commits_,
they may want tree'ish arguments instead. That meant that I had to change
setup_revision() to parse the arguments not into the "revs->commits" list,
but into the "revs->pending_objects" list.
Then, when we do "prepare_revision_walk()", we walk that list, and create
the sorted commit list from there.
This actually cleaned some stuff up, but it's the less obvious part of the
patch, and re-organized the "revision.c" logic somewhat. It actually paves
the way for splitting argument parsing _entirely_ out of "revision.c",
since now the argument parsing really is totally independent of the commit
walking: that didn't use to be true, since there was lots of overlap with
get_commit_reference() handling etc, now the _only_ overlap is the shared
(and trivial) "add_pending_object()" thing.
However, I didn't do that file split, just because I wanted the diff
itself to be smaller, and show the actual changes more clearly. If this
gets accepted, I'll do further cleanups then - that includes the file
split, but also using the new infrastructure to do a nicer "git diff" etc.
Even in this form, it actually ends up removing more lines than it adds.
It's nice to note how simple and straightforward this makes the built-in
"git log" command, even though it continues to support all the diff flags
too. It doesn't get much simpler that this.
I think this is worth merging soonish, because it does allow for future
cleanup and even more sharing of code. However, it obviously touches
"revision.c", which is subtle. I've tested that it passes all the tests we
have, and it passes my "looks sane" detector, but somebody else should
also give it a good look-over.
[jc: squashed the original and three "oops this too" updates, with
another fix-up.]
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-04-15 01:52:13 +02:00
|
|
|
/* diff info for patches and for paths limiting */
|
2006-04-11 03:14:54 +02:00
|
|
|
struct diff_options diffopt;
|
Common option parsing for "git log --diff" and friends
This basically does a few things that are sadly somewhat interdependent,
and nontrivial to split out
- get rid of "struct log_tree_opt"
The fields in "log_tree_opt" are moved into "struct rev_info", and all
users of log_tree_opt are changed to use the rev_info struct instead.
- add the parsing for the log_tree_opt arguments to "setup_revision()"
- make setup_revision set a flag (revs->diff) if the diff-related
arguments were used. This allows "git log" to decide whether it wants
to show diffs or not.
- make setup_revision() also initialize the diffopt part of rev_info
(which we had from before, but we just didn't initialize it)
- make setup_revision() do all the "finishing touches" on it all (it will
do the proper flag combination logic, and call "diff_setup_done()")
Now, that was the easy and straightforward part.
The slightly more involved part is that some of the programs that want to
use the new-and-improved rev_info parsing don't actually want _commits_,
they may want tree'ish arguments instead. That meant that I had to change
setup_revision() to parse the arguments not into the "revs->commits" list,
but into the "revs->pending_objects" list.
Then, when we do "prepare_revision_walk()", we walk that list, and create
the sorted commit list from there.
This actually cleaned some stuff up, but it's the less obvious part of the
patch, and re-organized the "revision.c" logic somewhat. It actually paves
the way for splitting argument parsing _entirely_ out of "revision.c",
since now the argument parsing really is totally independent of the commit
walking: that didn't use to be true, since there was lots of overlap with
get_commit_reference() handling etc, now the _only_ overlap is the shared
(and trivial) "add_pending_object()" thing.
However, I didn't do that file split, just because I wanted the diff
itself to be smaller, and show the actual changes more clearly. If this
gets accepted, I'll do further cleanups then - that includes the file
split, but also using the new infrastructure to do a nicer "git diff" etc.
Even in this form, it actually ends up removing more lines than it adds.
It's nice to note how simple and straightforward this makes the built-in
"git log" command, even though it continues to support all the diff flags
too. It doesn't get much simpler that this.
I think this is worth merging soonish, because it does allow for future
cleanup and even more sharing of code. However, it obviously touches
"revision.c", which is subtle. I've tested that it passes all the tests we
have, and it passes my "looks sane" detector, but somebody else should
also give it a good look-over.
[jc: squashed the original and three "oops this too" updates, with
another fix-up.]
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-04-15 01:52:13 +02:00
|
|
|
struct diff_options pruning;
|
2006-04-11 03:14:54 +02:00
|
|
|
|
2007-01-11 11:47:48 +01:00
|
|
|
struct reflog_walk_info *reflog_info;
|
2008-04-03 11:12:06 +02:00
|
|
|
struct decoration children;
|
2008-08-14 19:59:44 +02:00
|
|
|
struct decoration merge_simplification;
|
revision.c: Make --full-history consider more merges
History simplification previously always treated merges as TREESAME
if they were TREESAME to any parent.
While this was consistent with the default behaviour, this could be
extremely unhelpful when searching detailed history, and could not be
overridden. For example, if a merge had ignored a change, as if by "-s
ours", then:
git log -m -p --full-history -Schange file
would successfully locate "change"'s addition but would not locate the
merge that resolved against it.
Futher, simplify_merges could drop the actual parent that a commit
was TREESAME to, leaving it as a normal commit marked TREESAME that
isn't actually TREESAME to its remaining parent.
Now redefine a commit's TREESAME flag to be true only if a commit is
TREESAME to _all_ of its parents. This doesn't affect either the default
simplify_history behaviour (because partially TREESAME merges are turned
into normal commits), or full-history with parent rewriting (because all
merges are output). But it does affect other modes. The clearest
difference is that --full-history will show more merges - sufficient to
ensure that -m -p --full-history log searches can really explain every
change to the file, including those changes' ultimate fate in merges.
Also modify simplify_merges to recalculate TREESAME after removing
a parent. This is achieved by storing per-parent TREESAME flags on the
initial scan, so the combined flag can be easily recomputed.
This fixes some t6111 failures, but creates a couple of new ones -
we are now showing some merges that don't need to be shown.
Signed-off-by: Kevin Bracey <kevin@bracey.fi>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-05-16 17:32:34 +02:00
|
|
|
struct decoration treesame;
|
2010-03-12 18:04:26 +01:00
|
|
|
|
|
|
|
/* notes-specific options: which refs to show */
|
|
|
|
struct display_notes_opt notes_opt;
|
2010-06-10 13:47:23 +02:00
|
|
|
|
2018-07-22 11:57:05 +02:00
|
|
|
/* interdiff */
|
|
|
|
const struct object_id *idiff_oid1;
|
|
|
|
const struct object_id *idiff_oid2;
|
2018-07-22 11:57:06 +02:00
|
|
|
const char *idiff_title;
|
2018-07-22 11:57:05 +02:00
|
|
|
|
2018-07-22 11:57:13 +02:00
|
|
|
/* range-diff */
|
|
|
|
const char *rdiff1;
|
|
|
|
const char *rdiff2;
|
|
|
|
int creation_factor;
|
2018-07-22 11:57:15 +02:00
|
|
|
const char *rdiff_title;
|
2018-07-22 11:57:13 +02:00
|
|
|
|
2010-06-10 13:47:23 +02:00
|
|
|
/* commit counts */
|
|
|
|
int count_left;
|
|
|
|
int count_right;
|
2011-04-26 10:24:29 +02:00
|
|
|
int count_same;
|
Implement line-history search (git log -L)
This is a rewrite of much of Bo's work, mainly in an effort to split
it into smaller, easier to understand routines.
The algorithm is built around the struct range_set, which encodes a
series of line ranges as intervals [a,b). This is used in two
contexts:
* A set of lines we are tracking (which will change as we dig through
history).
* To encode diffs, as pairs of ranges.
The main routine is range_set_map_across_diff(). It processes the
diff between a commit C and some parent P. It determines which diff
hunks are relevant to the ranges tracked in C, and computes the new
ranges for P.
The algorithm is then simply to process history in topological order
from newest to oldest, computing ranges and (partial) diffs. At
branch points, we need to merge the ranges we are watching. We will
find that many commits do not affect the chosen ranges, and mark them
TREESAME (in addition to those already filtered by pathspec limiting).
Another pass of history simplification then gets rid of such commits.
This is wired as an extra filtering pass in the log machinery. This
currently only reduces code duplication, but should allow for other
simplifications and options to be used.
Finally, we hook a diff printer into the output chain. Ideally we
would wire directly into the diff logic, to optionally use features
like word diff. However, that will require some major reworking of
the diff chain, so we completely replace the output with our own diff
for now.
As this was a GSoC project, and has quite some history by now, many
people have helped. In no particular order, thanks go to
Jakub Narebski <jnareb@gmail.com>
Jens Lehmann <Jens.Lehmann@web.de>
Jonathan Nieder <jrnieder@gmail.com>
Junio C Hamano <gitster@pobox.com>
Ramsay Jones <ramsay@ramsay1.demon.co.uk>
Will Palmer <wmpalmer@gmail.com>
Apologies to everyone I forgot.
Signed-off-by: Bo Yang <struggleyb.nku@gmail.com>
Signed-off-by: Thomas Rast <trast@student.ethz.ch>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-03-28 17:47:32 +01:00
|
|
|
|
|
|
|
/* line level range that we are chasing */
|
|
|
|
struct decoration line_log_data;
|
log: use true parents for diff even when rewriting
When using pathspec filtering in combination with diff-based log
output, parent simplification happens before the diff is computed.
The diff is therefore against the *simplified* parents.
This works okay, arguably by accident, in the normal case:
simplification reduces to one parent as long as the commit is TREESAME
to it. So the simplified parent of any given commit must have the
same tree contents on the filtered paths as its true (unfiltered)
parent.
However, --full-diff breaks this guarantee, and indeed gives pretty
spectacular results when comparing the output of
git log --graph --stat ...
git log --graph --full-diff --stat ...
(--graph internally kicks in parent simplification, much like
--parents).
To fix it, store a copy of the parent list before simplification (in a
slab) whenever --full-diff is in effect. Then use the stored parents
instead of the simplified ones in the commit display code paths. The
latter do not actually check for --full-diff to avoid duplicated code;
they just grab the original parents if save_parents() has not been
called for this revision walk.
For ordinary commits it should be obvious that this is the right thing
to do.
Merge commits are a bit subtle. Observe that with default
simplification, merge simplification is an all-or-nothing decision:
either the merge is TREESAME to one parent and disappears, or it is
different from all parents and the parent list remains intact.
Redundant parents are not pruned, so the existing code also shows them
as a merge.
So if we do show a merge commit, the parent list just consists of the
rewrite result on each parent. Running, e.g., --cc on this in
--full-diff mode is not very useful: if any commits were skipped, some
hunks will disagree with all sides of the merge (with one side,
because commits were skipped; with the others, because they didn't
have those changes in the first place). This triggers --cc showing
these hunks spuriously.
Therefore I believe that even for merge commits it is better to show
the diffs wrt. the original parents.
Reported-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Helped-by: Junio C Hamano <gitster@pobox.com>
Helped-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk>
Signed-off-by: Thomas Rast <trast@inf.ethz.ch>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-07-31 22:13:20 +02:00
|
|
|
|
|
|
|
/* copies of the parent lists, for --full-diff display */
|
|
|
|
struct saved_parents *saved_parents_slab;
|
2014-03-25 14:23:27 +01:00
|
|
|
|
|
|
|
struct commit_list *previous_parents;
|
|
|
|
const char *break_bar;
|
2018-05-19 07:28:24 +02:00
|
|
|
|
|
|
|
struct revision_sources *sources;
|
revision.c: begin refactoring --topo-order logic
When running 'git rev-list --topo-order' and its kin, the topo_order
setting in struct rev_info implies the limited setting. This means
that the following things happen during prepare_revision_walk():
* revs->limited implies we run limit_list() to walk the entire
reachable set. There are some short-cuts here, such as if we
perform a range query like 'git rev-list COMPARE..HEAD' and we
can stop limit_list() when all queued commits are uninteresting.
* revs->topo_order implies we run sort_in_topological_order(). See
the implementation of that method in commit.c. It implies that
the full set of commits to order is in the given commit_list.
These two methods imply that a 'git rev-list --topo-order HEAD'
command must walk the entire reachable set of commits _twice_ before
returning a single result.
If we have a commit-graph file with generation numbers computed, then
there is a better way. This patch introduces some necessary logic
redirection when we are in this situation.
In v2.18.0, the commit-graph file contains zero-valued bytes in the
positions where the generation number is stored in v2.19.0 and later.
Thus, we use generation_numbers_enabled() to check if the commit-graph
is available and has non-zero generation numbers.
When setting revs->limited only because revs->topo_order is true,
only do so if generation numbers are not available. There is no
reason to use the new logic as it will behave similarly when all
generation numbers are INFINITY or ZERO.
In prepare_revision_walk(), if we have revs->topo_order but not
revs->limited, then we trigger the new logic. It breaks the logic
into three pieces, to fit with the existing framework:
1. init_topo_walk() fills a new struct topo_walk_info in the rev_info
struct. We use the presence of this struct as a signal to use the
new methods during our walk. In this patch, this method simply
calls limit_list() and sort_in_topological_order(). In the future,
this method will set up a new data structure to perform that logic
in-line.
2. next_topo_commit() provides get_revision_1() with the next topo-
ordered commit in the list. Currently, this simply pops the commit
from revs->commits.
3. expand_topo_walk() provides get_revision_1() with a way to signal
walking beyond the latest commit. Currently, this calls
add_parents_to_list() exactly like the old logic.
While this commit presents method redirection for performing the
exact same logic as before, it allows the next commit to focus only
on the new logic.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-11-01 14:46:20 +01:00
|
|
|
|
|
|
|
struct topo_walk_info *topo_walk_info;
|
2020-04-06 18:59:52 +02:00
|
|
|
|
|
|
|
/* Commit graph bloom filter fields */
|
commit-graph: check all leading directories in changed path Bloom filters
The file 'dir/subdir/file' can only be modified if its leading
directories 'dir' and 'dir/subdir' are modified as well.
So when checking modified path Bloom filters looking for commits
modifying a path with multiple path components, then check not only
the full path in the Bloom filters, but all its leading directories as
well. Take care to check these paths in "deepest first" order,
because it's the full path that is least likely to be modified, and
the Bloom filter queries can short circuit sooner.
This can significantly reduce the average false positive rate, by
about an order of magnitude or three(!), and can further speed up
pathspec-limited revision walks. The table below compares the average
false positive rate and runtime of
git rev-list HEAD -- "$path"
before and after this change for 5000+ randomly* selected paths from
each repository:
Average false Average Average
positive rate runtime runtime
before after before after difference
------------------------------------------------------------------
git 3.220% 0.7853% 0.0558s 0.0387s -30.6%
linux 2.453% 0.0296% 0.1046s 0.0766s -26.8%
tensorflow 2.536% 0.6977% 0.0594s 0.0420s -29.2%
*Path selection was done with the following pipeline:
git ls-tree -r --name-only HEAD | sort -R | head -n 5000
The improvements in runtime are much smaller than the improvements in
average false positive rate, as we are clearly reaching diminishing
returns here. However, all these timings depend on that accessing
tree objects is reasonably fast (warm caches). If we had a partial
clone and the tree objects had to be fetched from a promisor remote,
e.g.:
$ git clone --filter=tree:0 --bare file://.../webkit.git webkit.notrees.git
$ git -C webkit.git -c core.modifiedPathBloomFilters=1 \
commit-graph write --reachable
$ cp webkit.git/objects/info/commit-graph webkit.notrees.git/objects/info/
$ git -C webkit.notrees.git -c core.modifiedPathBloomFilters=1 \
rev-list HEAD -- "$path"
then checking all leading path component can reduce the runtime from
over an hour to a few seconds (and this is with the clone and the
promisor on the same machine).
This adjusts the tracing values in t4216-log-bloom.sh, which provides a
concrete way to notice the improvement.
Helped-by: Taylor Blau <me@ttaylorr.com>
Helped-by: René Scharfe <l.s.r@web.de>
Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-07-01 15:27:30 +02:00
|
|
|
/* The bloom filter key(s) for the pathspec */
|
|
|
|
struct bloom_key *bloom_keys;
|
|
|
|
int bloom_keys_nr;
|
|
|
|
|
2020-04-06 18:59:52 +02:00
|
|
|
/*
|
|
|
|
* The bloom filter settings used to generate the key.
|
|
|
|
* This is loaded from the commit-graph being used.
|
|
|
|
*/
|
|
|
|
struct bloom_filter_settings *bloom_filter_settings;
|
2006-02-26 01:19:46 +01:00
|
|
|
};
|
|
|
|
|
2018-06-30 11:20:30 +02:00
|
|
|
int ref_excluded(struct string_list *, const char *path);
|
2013-11-01 20:02:45 +01:00
|
|
|
void clear_ref_exclusion(struct string_list **);
|
|
|
|
void add_ref_exclusion(struct string_list **, const char *exclude);
|
|
|
|
|
|
|
|
|
2006-03-10 10:21:39 +01:00
|
|
|
#define REV_TREE_SAME 0
|
2009-06-03 03:34:01 +02:00
|
|
|
#define REV_TREE_NEW 1 /* Only new files */
|
|
|
|
#define REV_TREE_OLD 2 /* Only files removed */
|
|
|
|
#define REV_TREE_DIFFERENT 3 /* Mixed changes */
|
2006-03-10 10:21:39 +01:00
|
|
|
|
2006-02-26 01:19:46 +01:00
|
|
|
/* revision.c */
|
2007-11-03 19:11:10 +01:00
|
|
|
typedef void (*show_early_output_fn_t)(struct rev_info *, struct commit_list *);
|
2008-08-21 02:34:30 +02:00
|
|
|
extern volatile show_early_output_fn_t show_early_output;
|
2006-03-10 10:21:39 +01:00
|
|
|
|
2010-03-09 07:58:09 +01:00
|
|
|
struct setup_revision_opt {
|
|
|
|
const char *def;
|
2010-03-09 08:27:25 +01:00
|
|
|
void (*tweak)(struct rev_info *, struct setup_revision_opt *);
|
2018-09-21 17:57:38 +02:00
|
|
|
const char *submodule; /* TODO: drop this and use rev_info->repo */
|
2018-12-03 23:10:19 +01:00
|
|
|
unsigned int assume_dashdash:1,
|
|
|
|
allow_exclude_promisor_objects:1;
|
2012-07-02 21:43:05 +02:00
|
|
|
unsigned revarg_opt;
|
2010-03-09 07:58:09 +01:00
|
|
|
};
|
|
|
|
|
2018-09-21 17:57:38 +02:00
|
|
|
#ifndef NO_THE_REPOSITORY_COMPATIBILITY_MACROS
|
|
|
|
#define init_revisions(revs, prefix) repo_init_revisions(the_repository, revs, prefix)
|
|
|
|
#endif
|
2019-11-17 22:04:48 +01:00
|
|
|
|
|
|
|
/**
|
|
|
|
* Initialize a rev_info structure with default values. The third parameter may
|
|
|
|
* be NULL or can be prefix path, and then the `.prefix` variable will be set
|
|
|
|
* to it. This is typically the first function you want to call when you want
|
|
|
|
* to deal with a revision list. After calling this function, you are free to
|
|
|
|
* customize options, like set `.ignore_merges` to 0 if you don't want to
|
|
|
|
* ignore merges, and so on.
|
|
|
|
*/
|
2018-09-21 17:57:38 +02:00
|
|
|
void repo_init_revisions(struct repository *r,
|
|
|
|
struct rev_info *revs,
|
|
|
|
const char *prefix);
|
2019-11-17 22:04:48 +01:00
|
|
|
|
|
|
|
/**
|
|
|
|
* Parse revision information, filling in the `rev_info` structure, and
|
|
|
|
* removing the used arguments from the argument list. Returns the number
|
|
|
|
* of arguments left that weren't recognized, which are also moved to the
|
|
|
|
* head of the argument list. The last parameter is used in case no
|
|
|
|
* parameter given by the first two arguments.
|
|
|
|
*/
|
2018-06-30 11:20:30 +02:00
|
|
|
int setup_revisions(int argc, const char **argv, struct rev_info *revs,
|
|
|
|
struct setup_revision_opt *);
|
2019-11-17 22:04:48 +01:00
|
|
|
|
2018-06-30 11:20:30 +02:00
|
|
|
void parse_revision_opt(struct rev_info *revs, struct parse_opt_ctx_t *ctx,
|
|
|
|
const struct option *options,
|
|
|
|
const char * const usagestr[]);
|
2012-07-02 21:33:52 +02:00
|
|
|
#define REVARG_CANNOT_BE_FILENAME 01
|
2012-07-02 21:43:05 +02:00
|
|
|
#define REVARG_COMMITTISH 02
|
2018-06-30 11:20:30 +02:00
|
|
|
int handle_revision_arg(const char *arg, struct rev_info *revs,
|
|
|
|
int flags, unsigned revarg_opt);
|
2006-09-06 06:28:36 +02:00
|
|
|
|
2019-11-17 22:04:48 +01:00
|
|
|
/**
|
|
|
|
* Reset the flags used by the revision walking api. You can use this to do
|
|
|
|
* multiple sequential revision walks.
|
|
|
|
*/
|
2018-06-30 11:20:30 +02:00
|
|
|
void reset_revision_walk(void);
|
2019-11-17 22:04:48 +01:00
|
|
|
|
|
|
|
/**
|
|
|
|
* Prepares the rev_info structure for a walk. You should check if it returns
|
|
|
|
* any error (non-zero return code) and if it does not, you can start using
|
|
|
|
* get_revision() to do the iteration.
|
|
|
|
*/
|
2018-06-30 11:20:30 +02:00
|
|
|
int prepare_revision_walk(struct rev_info *revs);
|
2019-11-17 22:04:48 +01:00
|
|
|
|
|
|
|
/**
|
|
|
|
* Takes a pointer to a `rev_info` structure and iterates over it, returning a
|
|
|
|
* `struct commit *` each time you call it. The end of the revision list is
|
|
|
|
* indicated by returning a NULL pointer.
|
|
|
|
*/
|
2018-06-30 11:20:30 +02:00
|
|
|
struct commit *get_revision(struct rev_info *revs);
|
2019-11-17 22:04:48 +01:00
|
|
|
|
2019-11-20 01:51:13 +01:00
|
|
|
const char *get_revision_mark(const struct rev_info *revs,
|
|
|
|
const struct commit *commit);
|
2018-06-30 11:20:30 +02:00
|
|
|
void put_revision_mark(const struct rev_info *revs,
|
|
|
|
const struct commit *commit);
|
2006-02-28 20:24:00 +01:00
|
|
|
|
2018-06-30 11:20:30 +02:00
|
|
|
void mark_parents_uninteresting(struct commit *commit);
|
2018-09-21 17:57:39 +02:00
|
|
|
void mark_tree_uninteresting(struct repository *r, struct tree *tree);
|
2019-01-16 19:25:58 +01:00
|
|
|
void mark_trees_uninteresting_sparse(struct repository *r, struct oidset *trees);
|
2006-02-26 01:19:46 +01:00
|
|
|
|
2018-06-30 11:20:30 +02:00
|
|
|
void show_object_with_name(FILE *, struct object *, const char *);
|
2011-08-17 23:30:34 +02:00
|
|
|
|
2019-11-17 22:04:48 +01:00
|
|
|
/**
|
|
|
|
* This function can be used if you want to add commit objects as revision
|
|
|
|
* information. You can use the `UNINTERESTING` object flag to indicate if
|
|
|
|
* you want to include or exclude the given commit (and commits reachable
|
|
|
|
* from the given commit) from the revision list.
|
|
|
|
*
|
|
|
|
* NOTE: If you have the commits as a string list then you probably want to
|
|
|
|
* use setup_revisions(), instead of parsing each string and using this
|
|
|
|
* function.
|
|
|
|
*/
|
2018-06-30 11:20:30 +02:00
|
|
|
void add_pending_object(struct rev_info *revs,
|
|
|
|
struct object *obj, const char *name);
|
2019-11-17 22:04:48 +01:00
|
|
|
|
2018-06-30 11:20:30 +02:00
|
|
|
void add_pending_oid(struct rev_info *revs,
|
|
|
|
const char *name, const struct object_id *oid,
|
|
|
|
unsigned int flags);
|
2006-02-26 01:19:46 +01:00
|
|
|
|
2018-06-30 11:20:30 +02:00
|
|
|
void add_head_to_pending(struct rev_info *);
|
|
|
|
void add_reflogs_to_pending(struct rev_info *, unsigned int flags);
|
|
|
|
void add_index_objects_to_pending(struct rev_info *, unsigned int flags);
|
2007-12-11 19:09:04 +01:00
|
|
|
|
2007-11-04 21:12:05 +01:00
|
|
|
enum commit_action {
|
|
|
|
commit_ignore,
|
|
|
|
commit_show,
|
|
|
|
commit_error
|
|
|
|
};
|
|
|
|
|
2018-06-30 11:20:30 +02:00
|
|
|
enum commit_action get_commit_action(struct rev_info *revs,
|
|
|
|
struct commit *commit);
|
|
|
|
enum commit_action simplify_commit(struct rev_info *revs,
|
|
|
|
struct commit *commit);
|
2007-11-04 21:12:05 +01:00
|
|
|
|
2013-03-28 17:47:31 +01:00
|
|
|
enum rewrite_result {
|
|
|
|
rewrite_one_ok,
|
|
|
|
rewrite_one_noparents,
|
|
|
|
rewrite_one_error
|
|
|
|
};
|
|
|
|
|
|
|
|
typedef enum rewrite_result (*rewrite_parent_fn_t)(struct rev_info *revs, struct commit **pp);
|
|
|
|
|
2018-06-30 11:20:30 +02:00
|
|
|
int rewrite_parents(struct rev_info *revs,
|
|
|
|
struct commit *commit,
|
|
|
|
rewrite_parent_fn_t rewrite_parent);
|
log: use true parents for diff even when rewriting
When using pathspec filtering in combination with diff-based log
output, parent simplification happens before the diff is computed.
The diff is therefore against the *simplified* parents.
This works okay, arguably by accident, in the normal case:
simplification reduces to one parent as long as the commit is TREESAME
to it. So the simplified parent of any given commit must have the
same tree contents on the filtered paths as its true (unfiltered)
parent.
However, --full-diff breaks this guarantee, and indeed gives pretty
spectacular results when comparing the output of
git log --graph --stat ...
git log --graph --full-diff --stat ...
(--graph internally kicks in parent simplification, much like
--parents).
To fix it, store a copy of the parent list before simplification (in a
slab) whenever --full-diff is in effect. Then use the stored parents
instead of the simplified ones in the commit display code paths. The
latter do not actually check for --full-diff to avoid duplicated code;
they just grab the original parents if save_parents() has not been
called for this revision walk.
For ordinary commits it should be obvious that this is the right thing
to do.
Merge commits are a bit subtle. Observe that with default
simplification, merge simplification is an all-or-nothing decision:
either the merge is TREESAME to one parent and disappears, or it is
different from all parents and the parent list remains intact.
Redundant parents are not pruned, so the existing code also shows them
as a merge.
So if we do show a merge commit, the parent list just consists of the
rewrite result on each parent. Running, e.g., --cc on this in
--full-diff mode is not very useful: if any commits were skipped, some
hunks will disagree with all sides of the merge (with one side,
because commits were skipped; with the others, because they didn't
have those changes in the first place). This triggers --cc showing
these hunks spuriously.
Therefore I believe that even for merge commits it is better to show
the diffs wrt. the original parents.
Reported-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Helped-by: Junio C Hamano <gitster@pobox.com>
Helped-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk>
Signed-off-by: Thomas Rast <trast@inf.ethz.ch>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-07-31 22:13:20 +02:00
|
|
|
|
|
|
|
/*
|
2015-01-14 23:49:24 +01:00
|
|
|
* The log machinery saves the original parent list so that
|
|
|
|
* get_saved_parents() can later tell what the real parents of the
|
|
|
|
* commits are, when commit->parents has been modified by history
|
|
|
|
* simpification.
|
log: use true parents for diff even when rewriting
When using pathspec filtering in combination with diff-based log
output, parent simplification happens before the diff is computed.
The diff is therefore against the *simplified* parents.
This works okay, arguably by accident, in the normal case:
simplification reduces to one parent as long as the commit is TREESAME
to it. So the simplified parent of any given commit must have the
same tree contents on the filtered paths as its true (unfiltered)
parent.
However, --full-diff breaks this guarantee, and indeed gives pretty
spectacular results when comparing the output of
git log --graph --stat ...
git log --graph --full-diff --stat ...
(--graph internally kicks in parent simplification, much like
--parents).
To fix it, store a copy of the parent list before simplification (in a
slab) whenever --full-diff is in effect. Then use the stored parents
instead of the simplified ones in the commit display code paths. The
latter do not actually check for --full-diff to avoid duplicated code;
they just grab the original parents if save_parents() has not been
called for this revision walk.
For ordinary commits it should be obvious that this is the right thing
to do.
Merge commits are a bit subtle. Observe that with default
simplification, merge simplification is an all-or-nothing decision:
either the merge is TREESAME to one parent and disappears, or it is
different from all parents and the parent list remains intact.
Redundant parents are not pruned, so the existing code also shows them
as a merge.
So if we do show a merge commit, the parent list just consists of the
rewrite result on each parent. Running, e.g., --cc on this in
--full-diff mode is not very useful: if any commits were skipped, some
hunks will disagree with all sides of the merge (with one side,
because commits were skipped; with the others, because they didn't
have those changes in the first place). This triggers --cc showing
these hunks spuriously.
Therefore I believe that even for merge commits it is better to show
the diffs wrt. the original parents.
Reported-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Helped-by: Junio C Hamano <gitster@pobox.com>
Helped-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk>
Signed-off-by: Thomas Rast <trast@inf.ethz.ch>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-07-31 22:13:20 +02:00
|
|
|
*
|
|
|
|
* get_saved_parents() will transparently return commit->parents if
|
|
|
|
* history simplification is off.
|
|
|
|
*/
|
2018-06-30 11:20:30 +02:00
|
|
|
struct commit_list *get_saved_parents(struct rev_info *revs, const struct commit *commit);
|
log: use true parents for diff even when rewriting
When using pathspec filtering in combination with diff-based log
output, parent simplification happens before the diff is computed.
The diff is therefore against the *simplified* parents.
This works okay, arguably by accident, in the normal case:
simplification reduces to one parent as long as the commit is TREESAME
to it. So the simplified parent of any given commit must have the
same tree contents on the filtered paths as its true (unfiltered)
parent.
However, --full-diff breaks this guarantee, and indeed gives pretty
spectacular results when comparing the output of
git log --graph --stat ...
git log --graph --full-diff --stat ...
(--graph internally kicks in parent simplification, much like
--parents).
To fix it, store a copy of the parent list before simplification (in a
slab) whenever --full-diff is in effect. Then use the stored parents
instead of the simplified ones in the commit display code paths. The
latter do not actually check for --full-diff to avoid duplicated code;
they just grab the original parents if save_parents() has not been
called for this revision walk.
For ordinary commits it should be obvious that this is the right thing
to do.
Merge commits are a bit subtle. Observe that with default
simplification, merge simplification is an all-or-nothing decision:
either the merge is TREESAME to one parent and disappears, or it is
different from all parents and the parent list remains intact.
Redundant parents are not pruned, so the existing code also shows them
as a merge.
So if we do show a merge commit, the parent list just consists of the
rewrite result on each parent. Running, e.g., --cc on this in
--full-diff mode is not very useful: if any commits were skipped, some
hunks will disagree with all sides of the merge (with one side,
because commits were skipped; with the others, because they didn't
have those changes in the first place). This triggers --cc showing
these hunks spuriously.
Therefore I believe that even for merge commits it is better to show
the diffs wrt. the original parents.
Reported-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Helped-by: Junio C Hamano <gitster@pobox.com>
Helped-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk>
Signed-off-by: Thomas Rast <trast@inf.ethz.ch>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-07-31 22:13:20 +02:00
|
|
|
|
2006-02-26 01:19:46 +01:00
|
|
|
#endif
|