2006-08-03 17:24:35 +02:00
|
|
|
#include "builtin.h"
|
2005-10-26 15:10:20 +02:00
|
|
|
#include "cache.h"
|
2018-06-29 03:21:51 +02:00
|
|
|
#include "repository.h"
|
2017-06-14 20:07:36 +02:00
|
|
|
#include "config.h"
|
2005-10-26 15:10:20 +02:00
|
|
|
#include "commit.h"
|
|
|
|
#include "tag.h"
|
|
|
|
#include "refs.h"
|
2007-10-15 22:57:59 +02:00
|
|
|
#include "parse-options.h"
|
name-rev: eliminate recursion in name_rev()
The name_rev() function calls itself recursively for each interesting
parent of the commit it got as parameter, and, consequently, it can
segfault when processing a deep history if it exhausts the available
stack space. E.g. running 'git name-rev --all' and 'git name-rev
HEAD~100000' in the gcc, gecko-dev, llvm, and WebKit repositories
results in segfaults on my machine ('ulimit -s' reports 8192kB of
stack size limit), and nowadays the former segfaults in the Linux repo
as well (it reached the necessary depth sometime between v5.3-rc4 and
-rc5).
Eliminate the recursion by inserting the interesting parents into a
LIFO 'prio_queue' [1] and iterating until the queue becomes empty.
Note that the parent commits must be added in reverse order to the
LIFO 'prio_queue', so their relative order is preserved during
processing, i.e. the first parent should come out first from the
queue, because otherwise performance greatly suffers on mergy
histories [2].
The stacksize-limited test 'name-rev works in a deep repo' in
't6120-describe.sh' demonstrated this issue and expected failure. Now
the recursion is gone, so flip it to expect success. Also gone are
the dmesg entries logging the segfault of that segfaulting 'git
name-rev' process on every execution of the test suite.
Note that this slightly changes the order of lines in the output of
'git name-rev --all', usually swapping two lines every 35 lines in
git.git or every 150 lines in linux.git. This shouldn't matter in
practice, because the output has always been unordered anyway.
This patch is best viewed with '--ignore-all-space'.
[1] Early versions of this patch used a 'commit_list', resulting in
~15% performance penalty for 'git name-rev --all' in 'linux.git',
presumably because of the memory allocation and release for each
insertion and removal. Using a LIFO 'prio_queue' has basically no
effect on performance.
[2] We prefer shorter names, i.e. 'v0.1~234' is preferred over
'v0.1^2~5', meaning that usually following the first parent of a
merge results in the best name for its ancestors. So when later
we follow the remaining parent(s) of a merge, and reach an already
named commit, then we usually find that we can't give that commit
a better name, and thus we don't have to visit any of its
ancestors again.
OTOH, if we were to follow the Nth parent of the merge first, then
the name of all its ancestors would include a corresponding '^N'.
Those are not the best names for those commits, so when later we
reach an already named commit following the first parent of that
merge, then we would have to update the name of that commit and
the names of all of its ancestors as well. Consequently, we would
have to visit many commits several times, resulting in a
significant slowdown.
Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
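The reverse-order push described in the message above can be illustrated with a minimal sketch. It uses a plain growable array as the LIFO stack rather than git's 'prio_queue', and the 'struct node', 'stack_push()', 'stack_pop()' and 'walk()' names are hypothetical stand-ins, not git's API. Unlike name_rev(), it visits each node only once instead of revisiting a node when a better name is found.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a commit; this is not git's struct commit. */
struct node {
	int id;
	size_t nr_parents;
	struct node **parents;
	int seen;
};

/* Minimal growable LIFO stack, standing in for a LIFO prio_queue. */
struct stack {
	struct node **items;
	size_t nr, alloc;
};

static void stack_push(struct stack *s, struct node *n)
{
	if (s->nr == s->alloc) {
		s->alloc = s->alloc ? 2 * s->alloc : 16;
		s->items = realloc(s->items, s->alloc * sizeof(*s->items));
		if (!s->items)
			abort();
	}
	s->items[s->nr++] = n;
}

static struct node *stack_pop(struct stack *s)
{
	return s->nr ? s->items[--s->nr] : NULL;
}

/*
 * Visit every node reachable from 'start' without recursion and record
 * the ids in visit order (at most 'cap' of them). Returns the number of
 * nodes visited.
 */
static size_t walk(struct node *start, int *order, size_t cap)
{
	struct stack s = { NULL, 0, 0 };
	struct node *n;
	size_t count = 0;

	start->seen = 1;
	stack_push(&s, start);
	while ((n = stack_pop(&s))) {
		size_t i;

		if (count < cap)
			order[count] = n->id;
		count++;

		/* Push in reverse so that parents[0] is popped first. */
		for (i = n->nr_parents; i > 0; i--) {
			struct node *p = n->parents[i - 1];
			if (!p->seen) {
				p->seen = 1;
				stack_push(&s, p);
			}
		}
	}
	free(s.items);
	return count;
}
```

Because the work list lives on the heap, the depth of the history no longer affects how much call stack is used.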
2019-12-09 12:52:57 +01:00
|
|
|
#include "prio-queue.h"
|
2020-12-31 12:56:23 +01:00
|
|
|
#include "hash-lookup.h"
|
2018-05-19 07:28:26 +02:00
|
|
|
#include "commit-slab.h"
|
2022-03-12 01:00:15 +01:00
|
|
|
#include "commit-graph.h"
|
2005-10-26 15:10:20 +02:00
|
|
|
|
name-rev: avoid cutoff timestamp underflow
When 'git name-rev' is invoked with commit-ish parameters, it tries to
save some work, and doesn't visit commits older than the committer
date of the oldest given commit minus one day's worth of slop. Since
our 'timestamp_t' is an unsigned type, this leads to a timestamp
underflow when the committer date of the oldest given commit is within
a day of the UNIX epoch. As a result the cutoff timestamp ends up
far-far in the future, and 'git name-rev' doesn't visit any commits,
and names each given commit as 'undefined'.
Check whether subtracting the slop from the oldest committer date
would lead to an underflow, and use no cutoff in that case. We don't
have a TIME_MIN constant, dddbad728c (timestamp_t: a new data type for
timestamps, 2017-04-26) didn't add one, so do it now.
Note that the type of the cutoff timestamp variable used to be signed
before 5589e87fd8 (name-rev: change a "long" variable to timestamp_t,
2017-05-20). The behavior was still the same even back then, but the
underflow didn't happen when subtracting the slop from the oldest
committer date, but when comparing the signed cutoff timestamp with
unsigned committer dates in name_rev(). IOW, this underflow bug is as
old as 'git name-rev' itself.
Helped-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
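The wraparound described above can be reproduced in isolation. The following is a minimal sketch, assuming 'timestamp_t' is an unsigned 'uintmax_t' and 'TIME_MIN' is 0; 'naive_cutoff()' and 'guarded_cutoff()' are hypothetical names used only for this illustration.

```c
#include <stdint.h>

typedef uintmax_t timestamp_t;	/* assumption: an unsigned type */
#define TIME_MIN 0		/* assumption: smallest timestamp */
#define CUTOFF_DATE_SLOP 86400	/* one day, in seconds */

/* Naive version: wraps around for dates within a day of the epoch. */
static timestamp_t naive_cutoff(timestamp_t oldest)
{
	return oldest - CUTOFF_DATE_SLOP;
}

/* Guarded version: clamp to TIME_MIN instead of wrapping around. */
static timestamp_t guarded_cutoff(timestamp_t oldest)
{
	if (oldest > TIME_MIN + CUTOFF_DATE_SLOP)
		return oldest - CUTOFF_DATE_SLOP;
	return TIME_MIN;
}
```

With a committer date of 100 seconds after the epoch, the naive version yields a cutoff far in the future, while the guarded version yields TIME_MIN, which excludes nothing.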
2019-09-24 09:32:13 +02:00
|
|
|
/*
|
|
|
|
* One day. See the 'name a rev shortly after epoch' test in t6120 when
|
|
|
|
* changing this value
|
|
|
|
*/
|
|
|
|
#define CUTOFF_DATE_SLOP 86400
|
2007-05-24 21:20:42 +02:00
|
|
|
|
2020-02-04 22:15:19 +01:00
|
|
|
struct rev_name {
|
name-rev: release unused name strings
name_rev() assigns a name to a commit and its parents and grandparents
and so on. Commits share their name string with their first parent,
which in turn does the same, recursively to the root. That saves a lot
of allocations. When a better name is found, the old name is replaced,
but its memory is not released. That leakage can become significant.
Can we release these old strings exactly once even though they are
referenced multiple times? Yes, indeed -- we can make use of the fact
that name_rev() visits the ancestors of a commit after it set a new name
for it and tries to update their names as well.
Members of the first ancestral line have the same taggerdate and
from_tag values, but a higher distance value than their child commit at
generation 0. These are the only criteria used by is_better_name().
Lower distance values are considered better, so a name that is better
for a child will also be better for its parent and grandparent etc.
That means we can free(3) an inferior name at generation 0 and rely on
name_rev() to replace all references in ancestors as well.
If we do that then we need to stop using the string pointer alone to
distinguish new empty rev_name slots from initialized ones, though, as
it technically becomes invalid after the free(3) call -- even though its
value is still different from NULL.
We can check the generation value first, as empty slots will have it
initialized to 0, and for the actual generation 0 we'll set a new valid
name right after the create_or_update_name() call that releases the
string.
For the Chromium repo, releasing superseded names reduces the memory
footprint of name-rev --all significantly. Here's the output of GNU
time before:
0.98user 0.48system 0:01.46elapsed 99%CPU (0avgtext+0avgdata 2601812maxresident)k
0inputs+0outputs (0major+571470minor)pagefaults 0swaps
... and with this patch:
1.01user 0.26system 0:01.28elapsed 100%CPU (0avgtext+0avgdata 1559196maxresident)k
0inputs+0outputs (0major+314370minor)pagefaults 0swaps
It also gets faster; hyperfine before:
Benchmark #1: ./git -C ../chromium/src name-rev --all
Time (mean ± σ): 1.534 s ± 0.006 s [User: 1.039 s, System: 0.494 s]
Range (min … max): 1.522 s … 1.542 s 10 runs
... and with this patch:
Benchmark #1: ./git -C ../chromium/src name-rev --all
Time (mean ± σ): 1.338 s ± 0.006 s [User: 1.047 s, System: 0.291 s]
Range (min … max): 1.327 s … 1.346 s 10 runs
For the Linux repo it doesn't pay off; memory usage only gets down from:
0.76user 0.03system 0:00.80elapsed 99%CPU (0avgtext+0avgdata 292848maxresident)k
0inputs+0outputs (0major+44579minor)pagefaults 0swaps
... to:
0.78user 0.03system 0:00.81elapsed 100%CPU (0avgtext+0avgdata 284696maxresident)k
0inputs+0outputs (0major+44892minor)pagefaults 0swaps
The runtime actually increases slightly from:
Benchmark #1: ./git -C ../linux/ name-rev --all
Time (mean ± σ): 828.8 ms ± 5.0 ms [User: 797.2 ms, System: 31.6 ms]
Range (min … max): 824.1 ms … 838.9 ms 10 runs
... to:
Benchmark #1: ./git -C ../linux/ name-rev --all
Time (mean ± σ): 847.6 ms ± 3.4 ms [User: 807.9 ms, System: 39.6 ms]
Range (min … max): 843.4 ms … 854.3 ms 10 runs
Why is that? In the Chromium repo, ca. 44000 free(3) calls in
create_or_update_name() release almost 1GB, while in the Linux repo
240000+ calls release a bit more than 5MB, so the average discarded
name is ca. 1000x longer in the former.
Overall I think it's the right tradeoff to make, as it helps curb the
memory usage in repositories with big discarded names, and the added
overhead is small.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
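The per-call averages behind the closing comparison can be checked with a quick calculation. This is only a sketch: "almost 1GB" is rounded to 10^9 bytes and "a bit more than 5MB" to 5 * 10^6 bytes, both of which are assumptions, and 'avg_bytes_per_call()' is a hypothetical helper.

```c
#include <stdint.h>

/* Average number of bytes released per free(3) call, rounded down. */
static uint64_t avg_bytes_per_call(uint64_t total_bytes, uint64_t calls)
{
	return calls ? total_bytes / calls : 0;
}
```

With the rounded figures, avg_bytes_per_call(1000000000, 44000) gives about 22,700 bytes per discarded name for Chromium, and avg_bytes_per_call(5000000, 240000) gives about 20 bytes for Linux, a factor of roughly 1000.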
2020-02-04 22:26:18 +01:00
|
|
|
char *tip_name;
|
2017-04-26 21:29:31 +02:00
|
|
|
timestamp_t taggerdate;
|
2005-10-26 15:10:20 +02:00
|
|
|
int generation;
|
2007-08-27 13:37:33 +02:00
|
|
|
int distance;
|
name-rev: favor describing with tags and use committer date to tiebreak
"git name-rev" assigned a phony "far in the future" date to tips of
refs that are not pointing at tag objects, and favored names based
on a ref with the oldest date. This made it almost impossible for
unannotated tags and branches to be counted as a viable base,
which was especially problematic when the command is run with the
"--tags" option. If an unannotated tag that points at an ancient
commit and an annotated tag that points at a much newer commit
both reach the commit that is being named, the old unannotated tag was
ignored.
Update the "taggerdate" field of the rev-name structure, which is
initialized from the tip of ref, to have the committer date if the
object at the tip of ref is a commit, not a tag, so that we can
optionally take it into account when doing "is this name better?"
comparison logic.
When "name-rev" is run without the "--tags" option, the general
expectation is still to name the commit based on a tag if possible,
but use non-tag refs as fallback, and tiebreak among these non-tag
refs by favoring names with shorter hops from the tip. The use of a
phony "far in the future" date in the original code was an effective
way to ensure this expectation is held: a non-tag tip gets the same
"far in the future" timestamp, giving precedence to tags, and among
non-tag tips, names with shorter hops are preferred over longer
hops, without taking the "taggerdate" into account. As we are
taking over the "taggerdate" field to store the committer date for
tips with commits:
(1) keep the original logic when comparing names based on two refs
both of which are from refs/tags/;
(2) favor a name based on a ref in refs/tags/ hierarchy over
a ref outside the hierarchy;
(3) between two names based on a ref both outside refs/tags/, give
precedence to a name with shorter hops and use "taggerdate"
only to tie-break.
A change to t4202 is a natural consequence. The test creates a
commit on a branch "side" and points at it with an unannotated tag
"refs/tags/side-2". The original code couldn't decide which one to
favor at all, and gave a name based on a branch (simply because
refs/heads/side sorts earlier than refs/tags/side-2). Because the
updated logic is taught to favor refs in refs/tags/ hierarchy, the
test is updated to expect to see tags/side-2 instead.
[mjg: open-coded the comparisons in is_better_name(), dropping a
helper macro used in the original]
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Michael J Gruber <git@grubix.eu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
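The three rules listed in the message above can be sketched as a standalone comparison. This is an illustration, not the real is_better_name(): 'struct candidate' and 'sketch_is_better()' are hypothetical, 'timestamp_t' is assumed to be unsigned, and the "original logic" for two tag-based names is assumed to prefer the older date and to fall back to the shorter distance on equal dates.

```c
#include <stdint.h>

typedef uintmax_t timestamp_t;	/* assumption: an unsigned type */

/* Hypothetical summary of a name: where it came from and how far away. */
struct candidate {
	timestamp_t taggerdate;	/* tagger date, or committer date for non-tags */
	int distance;		/* hops from the tip of the ref */
	int from_tag;		/* nonzero if the ref is under refs/tags/ */
};

/* Return 1 if 'cand' should replace the current name 'cur'. */
static int sketch_is_better(const struct candidate *cur,
			    const struct candidate *cand)
{
	/* (1) both from refs/tags/: older date wins, then shorter distance */
	if (cur->from_tag && cand->from_tag)
		return cur->taggerdate > cand->taggerdate ||
		       (cur->taggerdate == cand->taggerdate &&
			cur->distance > cand->distance);

	/* (2) exactly one is from refs/tags/: the tag-based name wins */
	if (cur->from_tag != cand->from_tag)
		return cand->from_tag;

	/* (3) both outside refs/tags/: shorter hops first ... */
	if (cur->distance != cand->distance)
		return cur->distance > cand->distance;

	/* ... and the date only as a tie-breaker */
	if (cur->taggerdate != cand->taggerdate)
		return cur->taggerdate > cand->taggerdate;

	/* keep the current name when the two are indistinguishable */
	return 0;
}
```

Note that under rule (2) a distant tag-based name still beats a nearby branch-based one, which matches the t4202 change described above.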
2017-03-29 16:39:16 +02:00
|
|
|
int from_tag;
|
2020-02-04 22:15:19 +01:00
|
|
|
};
|
2005-10-26 15:10:20 +02:00
|
|
|
|
2020-02-04 22:22:36 +01:00
|
|
|
define_commit_slab(commit_rev_name, struct rev_name);
|
2018-05-19 07:28:26 +02:00
|
|
|
|
2022-03-12 01:00:15 +01:00
|
|
|
static timestamp_t generation_cutoff = GENERATION_NUMBER_INFINITY;
|
2017-05-20 07:39:43 +02:00
|
|
|
static timestamp_t cutoff = TIME_MAX;
|
2018-05-19 07:28:26 +02:00
|
|
|
static struct commit_rev_name rev_names;
|
2005-10-26 15:10:20 +02:00
|
|
|
|
2022-03-12 01:00:15 +01:00
|
|
|
/* Disable the cutoff checks entirely */
|
|
|
|
static void disable_cutoff(void)
|
|
|
|
{
|
|
|
|
generation_cutoff = 0;
|
|
|
|
cutoff = 0;
|
|
|
|
}
|
|
|
|
|
|
|
|
/* Cutoff searching any commits older than this one */
|
|
|
|
static void set_commit_cutoff(struct commit *commit)
|
|
|
|
{
|
|
|
|
|
|
|
|
if (cutoff > commit->date)
|
|
|
|
cutoff = commit->date;
|
|
|
|
|
|
|
|
if (generation_cutoff) {
|
|
|
|
timestamp_t generation = commit_graph_generation(commit);
|
|
|
|
|
|
|
|
if (generation_cutoff > generation)
|
|
|
|
generation_cutoff = generation;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
/* adjust the commit date cutoff with a slop to allow for slightly incorrect
|
|
|
|
* commit timestamps in case of clock skew.
|
|
|
|
*/
|
|
|
|
static void adjust_cutoff_timestamp_for_slop(void)
|
|
|
|
{
|
|
|
|
if (cutoff) {
|
|
|
|
/* check for underflow */
|
|
|
|
if (cutoff > TIME_MIN + CUTOFF_DATE_SLOP)
|
|
|
|
cutoff = cutoff - CUTOFF_DATE_SLOP;
|
|
|
|
else
|
|
|
|
cutoff = TIME_MIN;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
/* Check if a commit is before the cutoff. Prioritize generation numbers
|
|
|
|
* first, but use the commit timestamp if we lack generation data.
|
|
|
|
*/
|
|
|
|
static int commit_is_before_cutoff(struct commit *commit)
|
|
|
|
{
|
|
|
|
if (generation_cutoff < GENERATION_NUMBER_INFINITY)
|
|
|
|
return generation_cutoff &&
|
|
|
|
commit_graph_generation(commit) < generation_cutoff;
|
|
|
|
|
|
|
|
return commit->date < cutoff;
|
|
|
|
}
|
|
|
|
|
2007-08-27 13:37:33 +02:00
|
|
|
/* How many generations are maximally preferred over _one_ merge traversal? */
|
|
|
|
#define MERGE_TRAVERSAL_WEIGHT 65535
|
|
|
|
|
2020-02-04 22:22:36 +01:00
|
|
|
static int is_valid_rev_name(const struct rev_name *name)
|
|
|
|
{
|
name-rev: release unused name strings
2020-02-04 22:26:18 +01:00
|
|
|
return name && (name->generation || name->tip_name);
|
2020-02-04 22:22:36 +01:00
|
|
|
}
|
|
|
|
|
2020-02-04 22:16:10 +01:00
|
|
|
static struct rev_name *get_commit_rev_name(const struct commit *commit)
|
2018-05-19 07:28:26 +02:00
|
|
|
{
|
2020-02-04 22:22:36 +01:00
|
|
|
struct rev_name *name = commit_rev_name_peek(&rev_names, commit);
|
2018-05-19 07:28:26 +02:00
|
|
|
|
2020-02-04 22:22:36 +01:00
|
|
|
return is_valid_rev_name(name) ? name : NULL;
|
2018-05-19 07:28:26 +02:00
|
|
|
}
|
|
|
|
|
name-rev: prefer shorter names over following merges
name-rev has a MERGE_TRAVERSAL_WEIGHT to say that traversing a second or
later parent of a merge should be 65535 times more expensive than a
first-parent traversal, as per ac076c29ae8d (name-rev: Fix non-shortest
description, 2007-08-27). The point of this weight is to prefer names
like
v2.32.0~1471^2
over names like
v2.32.0~43^2~15^2~11^2~20^2~31^2
which are two equally valid names in git.git for the same commit. Note
that the first follows 1472 parent traversals compared to a mere 125 for
the second. Weighting all traversals equally would clearly prefer the
second name since it has fewer parent traversals, but humans aren't
going to be traversing commits and they tend to have an easier time
digesting names with fewer segments. The fact that the former only has
two segments (~1471, ^2) makes it much simpler than the latter which has
ten segments (~43, ^2, ~15, etc.). Since name-rev is meant to "find
symbolic names suitable for human digestion", we prefer fewer segments.
However, the particular rule implemented in name-rev would actually
prefer
v2.33.0-rc0~11^2~1
over
v2.33.0-rc0~20^2
because both have precisely one second parent traversal, and it gives
the tie breaker to shortest number of total parent traversals. Fewer
segments is more important for human consumption than number of hops, so
we'd rather see the latter which has one fewer segment.
Include the generation in is_better_name() and use a new
effective_distance() calculation so that we prefer fewer segments in
the printed name over fewer total parent traversals performed to get the
answer.
== Side-note on tie-breakers ==
When there are the same number of segments for two different names, we
actually use the name of an ancestor commit as a tie-breaker as well.
For example, for the commit cbdca289fb in the git.git repository, we
prefer the name v2.33.0-rc0~112^2~1 over v2.33.0-rc0~57^2~5. This is
because:
* cbdca289fb is the parent of 25e65b6dd5, which implies the name for
cbdca289fb should be the first parent of the preferred name for
25e65b6dd5
* 25e65b6dd5 could be named either v2.33.0-rc0~112^2 or
v2.33.0-rc0~57^2~4, but the former is preferred over the latter due
to fewer segments
* combine the two previous facts, and the name we get for cbdca289fb
is "v2.33.0-rc0~112^2~1" rather than "v2.33.0-rc0~57^2~5".
Technically, we get this for free out of the implementation since we
only keep track of one name for each commit as we walk history (and
re-add parents to the queue if we find a better name for those parents),
but the first bullet point above ensures users get results that feel
more consistent.
== Alternative Ideas and Meanings Discussed ==
One suggestion that came up during review was that shortest
string-length might be easiest for users to consume. However, such a
scheme would be rather computationally expensive (we'd have to track all
names for each commit as we traversed the graph) and would additionally
come with the possibly perplexing result that on a linear segment of
history we could rapidly swap back and forth on names:
MYTAG~3^2 would be preferred over MYTAG~9998
MYTAG~3^2~1 would NOT be preferred over MYTAG~9999
MYTAG~3^2~2 might be preferred over MYTAG~10000
Another item that came up was possible auxiliary semantic meanings for
name-rev results either before or after this patch. The basic answer
was that the previous implementation had no known useful auxiliary
semantics, but that for many repositories (most in my experience), the
new scheme does. In particular, the new name-rev output can often be
used to answer the question, "How or when did this commit get merged?"
Since that usefulness depends on how merges happen within the repository
and thus isn't universally applicable, details are omitted here but you
can see them at [1].
[1] https://lore.kernel.org/git/CABPp-BEeUM+3NLKDVdak90_UUeNghYCx=Dgir6=8ixvYmvyq3Q@mail.gmail.com/
Finally, it was noted that the algorithm could be improved by just
explicitly tracking the number of segments and using both it and
distance in the comparison, instead of giving a magic number that tries
to blend the two (and which therefore might give suboptimal results in
repositories with really huge numbers of commits that periodically merge
older code). However, "[this patch] seems to give us a much better
results than the current code, so let's take it and leave further
futzing outside the scope."
Signed-off-by: Elijah Newren <newren@gmail.com>
Acked-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Acked-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-12-04 06:35:52 +01:00
|
|
|
static int effective_distance(int distance, int generation)
|
|
|
|
{
|
|
|
|
return distance + (generation > 0 ? MERGE_TRAVERSAL_WEIGHT : 0);
|
|
|
|
}
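/*
 * Worked example for the two names from the commit message,
 * v2.33.0-rc0~11^2~1 and v2.33.0-rc0~20^2. This is a sketch that
 * assumes a first-parent step adds 1 to both distance and generation,
 * while a step to any other parent adds MERGE_TRAVERSAL_WEIGHT to the
 * distance and resets the generation to 0. The 'struct walk_state'
 * type and the step helpers are hypothetical.
 */

```c
#define MERGE_TRAVERSAL_WEIGHT 65535

struct walk_state {
	int distance;
	int generation;
};

/* Follow the first parent 'n' times, i.e. append "~n" to the name. */
static struct walk_state first_parents(struct walk_state s, int n)
{
	s.distance += n;
	s.generation += n;
	return s;
}

/* Follow a second or later parent, i.e. append "^2" to the name. */
static struct walk_state other_parent(struct walk_state s)
{
	s.distance += MERGE_TRAVERSAL_WEIGHT;
	s.generation = 0;
	return s;
}

/* Same calculation as effective_distance() above. */
static int effective_distance(int distance, int generation)
{
	return distance + (generation > 0 ? MERGE_TRAVERSAL_WEIGHT : 0);
}
```

/*
 * Under these assumptions, ~11^2~1 ends at distance 65547 with
 * generation 1 and ~20^2 ends at distance 65555 with generation 0.
 * Comparing raw distances favors ~11^2~1, whereas the effective
 * distances are 131082 and 65555, which favors ~20^2.
 */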
|
|
|
|
|
2017-03-29 16:39:15 +02:00
|
|
|
static int is_better_name(struct rev_name *name,
|
2017-05-30 04:16:39 +02:00
|
|
|
timestamp_t taggerdate,
|
name-rev: prefer shorter names over following merges
2021-12-04 06:35:52 +01:00
|
|
|
int generation,
|
name-rev: favor describing with tags and use committer date to tiebreak
2017-03-29 16:39:16 +02:00
|
|
|
int distance,
|
|
|
|
int from_tag)
|
2017-03-29 16:39:15 +02:00
|
|
|
{
|
name-rev: prefer shorter names over following merges
name-rev has a MERGE_TRAVERSAL_WEIGHT to say that traversing a second or
later parent of a merge should be 65535 times more expensive than a
first-parent traversal, as per ac076c29ae8d (name-rev: Fix non-shortest
description, 2007-08-27). The point of this weight is to prefer names
like
v2.32.0~1471^2
over names like
v2.32.0~43^2~15^2~11^2~20^2~31^2
which are two equally valid names in git.git for the same commit. Note
that the first follows 1472 parent traversals compared to a mere 125 for
the second. Weighting all traversals equally would clearly prefer the
second name since it has fewer parent traversals, but humans aren't
going to be traversing commits and they tend to have an easier time
digesting names with fewer segments. The fact that the former only has
two segments (~1471, ^2) makes it much simpler than the latter which has
six segments (~43, ^2, ~15, etc.). Since name-rev is meant to "find
symbolic names suitable for human digestion", we prefer fewer segments.
However, the particular rule implemented in name-rev would actually
prefer
v2.33.0-rc0~11^2~1
over
v2.33.0-rc0~20^2
because both have precisely one second parent traversal, and the tie
is broken in favor of the fewest total parent traversals. Fewer
segments matter more for human consumption than the number of hops, so
we'd rather see the latter, which has one fewer segment.
Include the generation in is_better_name() and use a new
effective_distance() calculation so that we prefer fewer segments in
the printed name over fewer total parent traversals performed to get the
answer.
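The effect of the new calculation on the example above can be sketched as follows. The weight 65535 comes from this message; the way distance and generation are accumulated while walking (`follow_first_parents()`, `follow_merge_parent()`, `struct walk_cost`) is an assumption made for the illustration, and the body of `effective_distance()` is a guess at a rule consistent with the description, not a quotation of the patch.

```c
#include <assert.h>

/* Weight quoted in the commit message above. */
#define MERGE_TRAVERSAL_WEIGHT 65535

/* Hypothetical bookkeeping for one candidate name. */
struct walk_cost {
	int distance;	/* weighted number of parent traversals */
	int generation;	/* trailing "~N" count; 0 right after a "^N" step */
};

/* Assumed rule: N first-parent steps add N to both counters. */
static struct walk_cost follow_first_parents(struct walk_cost c, int n)
{
	assert(n >= 0);
	c.distance += n;
	c.generation += n;
	return c;
}

/* Assumed rule: a second-or-later parent costs the full weight. */
static struct walk_cost follow_merge_parent(struct walk_cost c)
{
	c.distance += MERGE_TRAVERSAL_WEIGHT;
	c.generation = 0;
	return c;
}

/*
 * A non-zero generation means the printed name ends in "~N", which
 * is one more segment, so charge it like an extra merge traversal.
 */
static int effective_distance(int distance, int generation)
{
	return distance + (generation > 0 ? MERGE_TRAVERSAL_WEIGHT : 0);
}
```

Under these assumptions, v2.33.0-rc0~11^2~1 has distance 11 + 65535 + 1 = 65547 and v2.33.0-rc0~20^2 has distance 20 + 65535 = 65555, so comparing raw distances picks the former. The effective distances are 131082 and 65555, so the two-segment name wins.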
== Side-note on tie-breakers ==
When there are the same number of segments for two different names, we
actually use the name of an ancestor commit as a tie-breaker as well.
For example, for the commit cbdca289fb in the git.git repository, we
prefer the name v2.33.0-rc0~112^2~1 over v2.33.0-rc0~57^2~5. This is
because:
* cbdca289fb is the parent of 25e65b6dd5, which implies the name for
cbdca289fb should be the first parent of the preferred name for
25e65b6dd5
* 25e65b6dd5 could be named either v2.33.0-rc0~112^2 or
v2.33.0-rc0~57^2~4, but the former is preferred over the latter due
to fewer segments
* combine the two previous facts, and the name we get for cbdca289fb
is "v2.33.0-rc0~112^2~1" rather than "v2.33.0-rc0~57^2~5".
Technically, we get this for free out of the implementation since we
only keep track of one name for each commit as we walk history (and
re-add parents to the queue if we find a better name for those parents),
but the first bullet point above ensures users get results that feel
more consistent.
== Alternative Ideas and Meanings Discussed ==
One suggestion that came up during review was that shortest
string-length might be easiest for users to consume. However, such a
scheme would be rather computationally expensive (we'd have to track all
names for each commit as we traversed the graph) and would additionally
come with the possibly perplexing result that on a linear segment of
history we could rapidly swap back and forth on names:
MYTAG~3^2 would be preferred over MYTAG~9998
MYTAG~3^2~1 would NOT be preferred over MYTAG~9999
MYTAG~3^2~2 might be preferred over MYTAG~10000
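The flip-flopping in those three lines follows directly from the string lengths, as this small sketch shows. `compare_by_length()` is a hypothetical helper for this illustration only; name-rev does not compare names this way.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical "shortest string wins" rule: returns -1 if a is
 * shorter than b, 1 if it is longer, and 0 on a tie.
 */
static int compare_by_length(const char *a, const char *b)
{
	size_t la, lb;

	assert(a && b);
	la = strlen(a);
	lb = strlen(b);
	if (la < lb)
		return -1;
	return la > lb;
}
```

"MYTAG~3^2" is 9 characters against 10 for "MYTAG~9998", so it wins; one commit further down, "MYTAG~3^2~1" is 11 characters against 10 and loses; one more step and both candidates are 11 characters long, so the outcome depends on how ties are broken.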
Another item that came up was possible auxiliary semantic meanings for
name-rev results either before or after this patch. The basic answer
was that the previous implementation had no known useful auxiliary
semantics, but that for many repositories (most in my experience), the
new scheme does. In particular, the new name-rev output can often be
used to answer the question, "How or when did this commit get merged?"
Since that usefulness depends on how merges happen within the repository
and thus isn't universally applicable, details are omitted here but you
can see them at [1].
[1] https://lore.kernel.org/git/CABPp-BEeUM+3NLKDVdak90_UUeNghYCx=Dgir6=8ixvYmvyq3Q@mail.gmail.com/
Finally, it was noted that the algorithm could be improved by just
explicitly tracking the number of segments and using both it and
distance in the comparison, instead of giving a magic number that tries
to blend the two (and which therefore might give suboptimal results in
repositories with really huge numbers of commits that periodically merge
older code). However, "[this patch] seems to give us a much better
results than the current code, so let's take it and leave further
futzing outside the scope."
Signed-off-by: Elijah Newren <newren@gmail.com>
Acked-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Acked-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-12-04 06:35:52 +01:00
	int name_distance = effective_distance(name->distance, name->generation);
	int new_distance = effective_distance(distance, generation);

name-rev: favor describing with tags and use committer date to tiebreak
"git name-rev" assigned a phony "far in the future" date to tips of
refs that are not pointing at tag objects, and favored names based
on a ref with the oldest date. This made it almost impossible for
unannotated tags and branches to be counted as a viable base,
which was especially problematic when the command is run with the
"--tags" option. If an unannotated tag that points at an ancient
commit and an annotated tag that points at a much newer commit
both reach the commit that is being named, the old unannotated tag
was ignored.
Update the "taggerdate" field of the rev-name structure, which is
initialized from the tip of ref, to have the committer date if the
object at the tip of ref is a commit, not a tag, so that we can
optionally take it into account when doing "is this name better?"
comparison logic.
When "name-rev" is run without the "--tags" option, the general
expectation is still to name the commit based on a tag if possible,
but use non-tag refs as fallback, and tiebreak among these non-tag
refs by favoring names with shorter hops from the tip. The use of a
phony "far in the future" date in the original code was an effective
way to ensure this expectation is held: a non-tag tip gets the same
"far in the future" timestamp, giving precedence to tags, and among
non-tag tips, names with shorter hops are preferred over longer
hops, without taking the "taggerdate" into account. As we are
taking over the "taggerdate" field to store the committer date for
tips with commits:
(1) keep the original logic when comparing names based on two refs
both of which are from refs/tags/;
(2) favor a name based on a ref in refs/tags/ hierarchy over
a ref outside the hierarchy;
(3) between two names based on a ref both outside refs/tags/, give
precedence to a name with shorter hops and use "taggerdate"
only to tie-break.
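The three rules can be read as one comparison function. The following is a self-contained sketch under stated assumptions: `struct candidate` and `prefer_candidate()` are hypothetical names for this illustration, the date is a plain `long`, and the plain hop count is used as the distance.

```c
#include <assert.h>

/* Hypothetical, simplified view of one candidate name. */
struct candidate {
	long taggerdate;	/* tagger date, or committer date for non-tags */
	int distance;		/* hops from the tip of the ref */
	int from_tag;		/* non-zero if the ref is under refs/tags/ */
};

/* Returns 1 if "cand" should replace "cur", 0 to keep "cur". */
static int prefer_candidate(const struct candidate *cur,
			    const struct candidate *cand)
{
	assert(cur && cand);

	/* (1) Both from refs/tags/: older date wins, then fewer hops. */
	if (cur->from_tag && cand->from_tag)
		return cur->taggerdate > cand->taggerdate ||
		       (cur->taggerdate == cand->taggerdate &&
			cur->distance > cand->distance);

	/* (2) Exactly one is from refs/tags/: that one wins. */
	if (!cur->from_tag != !cand->from_tag)
		return !!cand->from_tag;

	/* (3) Neither is: fewer hops wins, the date only breaks ties. */
	if (cur->distance != cand->distance)
		return cur->distance > cand->distance;
	if (cur->taggerdate != cand->taggerdate)
		return cur->taggerdate > cand->taggerdate;

	/* Undecidable: keep the current name. */
	return 0;
}
```

For instance, a branch only 2 hops away does not displace a tag 50 hops away, while between two branches the one with fewer hops wins regardless of its date.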
A change to t4202 is a natural consequence. The test creates a
commit on a branch "side" and points at it with an unannotated tag
"refs/tags/side-2". The original code couldn't decide which one to
favor at all, and gave a name based on a branch (simply because
refs/heads/side sorts earlier than refs/tags/side-2). Because the
updated logic is taught to favor refs in refs/tags/ hierarchy, the
test is updated to expect to see tags/side-2 instead.
[mjg: open-coded the comparisons in is_better_name(), dropping a
helper macro used in the original]
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Michael J Gruber <git@grubix.eu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-03-29 16:39:16 +02:00
	/*
	 * When comparing names based on tags, prefer names
	 * based on the older tag, even if it is farther away.
	 */
	if (from_tag && name->from_tag)
		return (name->taggerdate > taggerdate ||
			(name->taggerdate == taggerdate &&
2021-12-04 06:35:52 +01:00
			 name_distance > new_distance));
2017-03-29 16:39:16 +02:00

	/*
	 * We know that at least one of them is a non-tag at this point.
	 * favor a tag over a non-tag.
	 */
	if (name->from_tag != from_tag)
		return from_tag;

	/*
	 * We are now looking at two non-tags.  Tiebreak to favor
	 * shorter hops.
	 */
2021-12-04 06:35:52 +01:00
	if (name_distance != new_distance)
		return name_distance > new_distance;
2017-03-29 16:39:16 +02:00

	/* ... or tiebreak to favor older date */
	if (name->taggerdate != taggerdate)
		return name->taggerdate > taggerdate;

	/* keep the current one if we cannot decide */
	return 0;
2017-03-29 16:39:15 +02:00
}

2019-11-12 11:38:15 +01:00
static struct rev_name *create_or_update_name(struct commit *commit,
					      timestamp_t taggerdate,
					      int generation, int distance,
					      int from_tag)
2005-10-26 15:10:20 +02:00
{
2020-02-04 22:22:36 +01:00
	struct rev_name *name = commit_rev_name_at(&rev_names, commit);
2005-10-26 15:10:20 +02:00
name-rev: release unused name strings
name_rev() assigns a name to a commit and its parents and grandparents
and so on. Commits share their name string with their first parent,
which in turn does the same, recursively to the root. That saves a lot
of allocations. When a better name is found, the old name is replaced,
but its memory is not released. That leakage can become significant.
Can we release these old strings exactly once even though they are
referenced multiple times? Yes, indeed -- we can make use of the fact
that name_rev() visits the ancestors of a commit after it set a new name
for it and tries to update their names as well.
Members of the first ancestral line have the same taggerdate and
from_tag values, but a higher distance value than their child commit at
generation 0. These are the only criteria used by is_better_name().
Lower distance values are considered better, so a name that is better
for a child will also be better for its parent and grandparent etc.
That means we can free(3) an inferior name at generation 0 and rely on
name_rev() to replace all references in ancestors as well.
If we do that then we need to stop using the string pointer alone to
distinguish new empty rev_name slots from initialized ones, though, as
it technically becomes invalid after the free(3) call -- even though its
value is still different from NULL.
We can check the generation value first, as empty slots will have it
initialized to 0, and for the actual generation 0 we'll set a new valid
name right after the create_or_update_name() call that releases the
string.
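A minimal sketch of such a validity check, assuming a simplified slot type: `struct name_slot` and `slot_in_use()` are hypothetical names for this illustration, and the real helper may differ. The point is the order of the tests: the generation is examined first, so a slot with a non-zero generation is recognized as initialized without looking at its string pointer, which may have been released.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified name slot. */
struct name_slot {
	const char *tip_name;	/* may be stale after free(3) when generation > 0 */
	int generation;		/* 0 for empty slots and for true generation 0 */
};

/* Returns non-zero if the slot holds a name rather than being empty. */
static int slot_in_use(const struct name_slot *slot)
{
	/*
	 * Short-circuit evaluation: tip_name is only consulted when
	 * generation is 0, where it is either NULL (empty slot) or a
	 * valid string owned by this slot.
	 */
	return slot && (slot->generation || slot->tip_name);
}
```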
For the Chromium repo, releasing superseded names reduces the memory
footprint of name-rev --all significantly. Here's the output of GNU
time before:
0.98user 0.48system 0:01.46elapsed 99%CPU (0avgtext+0avgdata 2601812maxresident)k
0inputs+0outputs (0major+571470minor)pagefaults 0swaps
... and with this patch:
1.01user 0.26system 0:01.28elapsed 100%CPU (0avgtext+0avgdata 1559196maxresident)k
0inputs+0outputs (0major+314370minor)pagefaults 0swaps
It also gets faster; hyperfine before:
Benchmark #1: ./git -C ../chromium/src name-rev --all
Time (mean ± σ): 1.534 s ± 0.006 s [User: 1.039 s, System: 0.494 s]
Range (min … max): 1.522 s … 1.542 s 10 runs
... and with this patch:
Benchmark #1: ./git -C ../chromium/src name-rev --all
Time (mean ± σ): 1.338 s ± 0.006 s [User: 1.047 s, System: 0.291 s]
Range (min … max): 1.327 s … 1.346 s 10 runs
For the Linux repo it doesn't pay off; memory usage only gets down from:
0.76user 0.03system 0:00.80elapsed 99%CPU (0avgtext+0avgdata 292848maxresident)k
0inputs+0outputs (0major+44579minor)pagefaults 0swaps
... to:
0.78user 0.03system 0:00.81elapsed 100%CPU (0avgtext+0avgdata 284696maxresident)k
0inputs+0outputs (0major+44892minor)pagefaults 0swaps
The runtime actually increases slightly from:
Benchmark #1: ./git -C ../linux/ name-rev --all
Time (mean ± σ): 828.8 ms ± 5.0 ms [User: 797.2 ms, System: 31.6 ms]
Range (min … max): 824.1 ms … 838.9 ms 10 runs
... to:
Benchmark #1: ./git -C ../linux/ name-rev --all
Time (mean ± σ): 847.6 ms ± 3.4 ms [User: 807.9 ms, System: 39.6 ms]
Range (min … max): 843.4 ms … 854.3 ms 10 runs
Why is that? In the Chromium repo, ca. 44000 free(3) calls in
create_or_update_name() release almost 1GB, while in the Linux repo
240000+ calls release a bit more than 5MB, so the average discarded
name is ca. 1000x longer in the former (roughly 23kB versus roughly
21 bytes).
Overall I think it's the right tradeoff to make, as it helps curb the
memory usage in repositories with big discarded names, and the added
overhead is small.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-02-04 22:26:18 +01:00
	if (is_valid_rev_name(name)) {
2021-12-04 06:35:52 +01:00
		if (!is_better_name(name, taggerdate, generation, distance, from_tag))
name-rev: release unused name strings

name_rev() assigns a name to a commit and its parents and grandparents
and so on. Commits share their name string with their first parent,
which in turn does the same, recursively to the root. That saves a lot
of allocations. When a better name is found, the old name is replaced,
but its memory is not released. That leakage can become significant.

Can we release these old strings exactly once even though they are
referenced multiple times? Yes, indeed -- we can make use of the fact
that name_rev() visits the ancestors of a commit after it has set a new
name for it and tries to update their names as well.

Members of the first ancestral line have the same taggerdate and
from_tag values, but a higher distance value than their child commit at
generation 0. These are the only criteria used by is_better_name().
Lower distance values are considered better, so a name that is better
for a child will also be better for its parent and grandparent etc.
That means we can free(3) an inferior name at generation 0 and rely on
name_rev() to replace all references in ancestors as well.

If we do that then we need to stop using the string pointer alone to
distinguish new empty rev_name slots from initialized ones, though, as
it technically becomes invalid after the free(3) call -- even though its
value is still different from NULL.

We can check the generation value first, as empty slots will have it
initialized to 0, and for the actual generation 0 we'll set a new valid
name right after the create_or_update_name() call that releases the
string.

For the Chromium repo, releasing superseded names reduces the memory
footprint of name-rev --all significantly. Here's the output of GNU
time before:

    0.98user 0.48system 0:01.46elapsed 99%CPU (0avgtext+0avgdata 2601812maxresident)k
    0inputs+0outputs (0major+571470minor)pagefaults 0swaps

... and with this patch:

    1.01user 0.26system 0:01.28elapsed 100%CPU (0avgtext+0avgdata 1559196maxresident)k
    0inputs+0outputs (0major+314370minor)pagefaults 0swaps

It also gets faster; hyperfine before:

    Benchmark #1: ./git -C ../chromium/src name-rev --all
    Time (mean ± σ): 1.534 s ± 0.006 s [User: 1.039 s, System: 0.494 s]
    Range (min … max): 1.522 s … 1.542 s 10 runs

... and with this patch:

    Benchmark #1: ./git -C ../chromium/src name-rev --all
    Time (mean ± σ): 1.338 s ± 0.006 s [User: 1.047 s, System: 0.291 s]
    Range (min … max): 1.327 s … 1.346 s 10 runs

For the Linux repo it doesn't pay off; memory usage only gets down from:

    0.76user 0.03system 0:00.80elapsed 99%CPU (0avgtext+0avgdata 292848maxresident)k
    0inputs+0outputs (0major+44579minor)pagefaults 0swaps

... to:

    0.78user 0.03system 0:00.81elapsed 100%CPU (0avgtext+0avgdata 284696maxresident)k
    0inputs+0outputs (0major+44892minor)pagefaults 0swaps

The runtime actually increases slightly from:

    Benchmark #1: ./git -C ../linux/ name-rev --all
    Time (mean ± σ): 828.8 ms ± 5.0 ms [User: 797.2 ms, System: 31.6 ms]
    Range (min … max): 824.1 ms … 838.9 ms 10 runs

... to:

    Benchmark #1: ./git -C ../linux/ name-rev --all
    Time (mean ± σ): 847.6 ms ± 3.4 ms [User: 807.9 ms, System: 39.6 ms]
    Range (min … max): 843.4 ms … 854.3 ms 10 runs

Why is that? In the Chromium repo, ca. 44000 free(3) calls in
create_or_update_name() release almost 1GB, while in the Linux repo
240000+ calls release a bit more than 5MB, so the average discarded
name is ca. 1000x longer in the former.

Overall I think it's the right tradeoff to make, as it helps curb the
memory usage in repositories with big discarded names, and the added
overhead is small.

Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
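The per-name averages behind the Chromium/Linux comparison in the message above can be checked with a few lines of arithmetic. This is only a sketch: the totals are rounded from "almost 1GB" and "a bit more than 5MB", so the results are approximate, and the function names are hypothetical.

```c
#include <stdio.h>

/* Average size in bytes of one released name string. */
static double avg_released_bytes(double total_bytes, double free_calls)
{
	return total_bytes / free_calls;
}

/* Print both averages and return how many times larger the first one is. */
static double report_ratio(void)
{
	/* Rounded inputs taken from the commit message; not exact measurements. */
	double chromium = avg_released_bytes(1e9, 44000);
	double linux_repo = avg_released_bytes(5e6, 240000);

	printf("chromium: ~%.0f bytes per discarded name\n", chromium);
	printf("linux:    ~%.0f bytes per discarded name\n", linux_repo);
	printf("ratio:    ~%.0fx\n", chromium / linux_repo);
	return chromium / linux_repo;
}
```

With these rounded inputs the Chromium average comes out at roughly 23 kB per name and the Linux average at roughly 21 bytes, which is consistent with the "ca. 1000x" figure.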
2020-02-04 22:26:18 +01:00
|
|
|
return NULL;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* This string might still be shared with ancestors
|
|
|
|
* (generation > 0). We can release it here regardless,
|
|
|
|
* because the new name that has just won will be better
|
|
|
|
* for them as well, so name_rev() will replace these
|
|
|
|
* stale pointers when it processes the parents.
|
|
|
|
*/
|
|
|
|
if (!name->generation)
|
|
|
|
free(name->tip_name);
|
|
|
|
}
|
2020-02-05 18:19:22 +01:00
|
|
|
|
|
|
|
name->taggerdate = taggerdate;
|
|
|
|
name->generation = generation;
|
|
|
|
name->distance = distance;
|
|
|
|
name->from_tag = from_tag;
|
|
|
|
|
|
|
|
return name;
|
2019-11-12 11:38:15 +01:00
|
|
|
}
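The release-exactly-once argument can be illustrated with a toy model of a single first-parent chain. The sketch below is not the code from this file: the struct, the `valid` flag, and the distance-only comparison are simplifying assumptions made for the illustration, and all `toy_` names are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for a rev_name slot; every name here is hypothetical. */
struct toy_name {
	char *tip_name;  /* shared along a first-parent chain */
	int generation;  /* 0 for the commit where the string was created */
	int distance;    /* lower is better */
	int valid;       /* assumption: explicit flag instead of a pointer check */
};

static int toy_free_calls; /* counts releases, for demonstration only */

/* Return the slot if (generation, distance) wins, NULL otherwise. */
static struct toy_name *toy_create_or_update(struct toy_name *name,
					     int generation, int distance)
{
	if (name->valid) {
		if (name->distance <= distance)
			return NULL; /* the existing name is at least as good */
		if (!name->generation) {
			/* Only the generation-0 slot releases the shared string. */
			free(name->tip_name);
			toy_free_calls++;
		}
	}
	name->generation = generation;
	name->distance = distance;
	name->valid = 1;
	return name; /* the caller assigns tip_name next */
}

/* Name chain[0..n-1], a first-parent line, after the tip string `tip`. */
static void toy_name_chain(struct toy_name *chain, size_t n,
			   const char *tip, int start_distance)
{
	char *shared = NULL;
	size_t i;

	for (i = 0; i < n; i++) {
		struct toy_name *name = toy_create_or_update(&chain[i], (int)i,
							     start_distance + (int)i);
		if (!name)
			break; /* not better here, so not better further down */
		if (!i) {
			size_t len = strlen(tip) + 1;

			shared = malloc(len);
			if (!shared)
				abort();
			memcpy(shared, tip, len);
		}
		name->tip_name = shared; /* overwrites any stale pointer */
	}
}
```

Naming a three-commit chain twice, the second time with a smaller starting distance, should free the first string once and leave all three slots pointing at the new one. The stale pointers in the ancestors are overwritten without ever being dereferenced.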
|
|
|
|
|
2020-02-04 22:23:29 +01:00
|
|
|
static char *get_parent_name(const struct rev_name *name, int parent_number)
|
|
|
|
{
|
2020-02-04 22:24:24 +01:00
|
|
|
struct strbuf sb = STRBUF_INIT;
|
2020-02-04 22:23:29 +01:00
|
|
|
size_t len;
|
|
|
|
|
|
|
|
strip_suffix(name->tip_name, "^0", &len);
|
2020-02-04 22:24:24 +01:00
|
|
|
if (name->generation > 0) {
|
|
|
|
strbuf_grow(&sb, len +
|
|
|
|
1 + decimal_width(name->generation) +
|
|
|
|
1 + decimal_width(parent_number));
|
|
|
|
strbuf_addf(&sb, "%.*s~%d^%d", (int)len, name->tip_name,
|
|
|
|
name->generation, parent_number);
|
|
|
|
} else {
|
|
|
|
strbuf_grow(&sb, len +
|
|
|
|
1 + decimal_width(parent_number));
|
|
|
|
strbuf_addf(&sb, "%.*s^%d", (int)len, name->tip_name,
|
|
|
|
parent_number);
|
|
|
|
}
|
|
|
|
return strbuf_detach(&sb, NULL);
|
2020-02-04 22:23:29 +01:00
|
|
|
}
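To make the name format produced by get_parent_name() concrete, here is a standalone sketch that builds the same kind of string with only the C standard library. It assumes plain malloc() in place of the strbuf API, and `sketch_parent_name` is a hypothetical name, not a function from this file.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Build "<tip>~<generation>^<parent_number>", or "<tip>^<parent_number>"
 * when generation is 0, after dropping a trailing "^0" from tip.
 * Returns a malloc'd string the caller must free, or NULL on failure.
 */
static char *sketch_parent_name(const char *tip, int generation,
				int parent_number)
{
	size_t len = strlen(tip);
	char suffix[64]; /* large enough for two decimal ints plus "~" and "^" */
	char *out;

	if (len >= 2 && !strcmp(tip + len - 2, "^0"))
		len -= 2; /* "v1.0^0" and "v1.0" name the same commit */

	if (generation > 0)
		snprintf(suffix, sizeof(suffix), "~%d^%d", generation,
			 parent_number);
	else
		snprintf(suffix, sizeof(suffix), "^%d", parent_number);

	out = malloc(len + strlen(suffix) + 1);
	if (!out)
		return NULL;
	memcpy(out, tip, len);
	strcpy(out + len, suffix);
	return out;
}
```

For example, a tip of "v1.0^0" at generation 0 with parent number 2 yields "v1.0^2", and a tip of "v1.0" at generation 3 yields "v1.0~3^2".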
|
|
|
|
|
name-rev: eliminate recursion in name_rev()

The name_rev() function calls itself recursively for each interesting
parent of the commit it got as parameter, and, consequently, it can
segfault when processing a deep history if it exhausts the available
stack space. E.g. running 'git name-rev --all' and 'git name-rev
HEAD~100000' in the gcc, gecko-dev, llvm, and WebKit repositories
results in segfaults on my machine ('ulimit -s' reports 8192kB of
stack size limit), and nowadays the former segfaults in the Linux repo
as well (it reached the necessary depth sometime between v5.3-rc4 and
-rc5).

Eliminate the recursion by inserting the interesting parents into a
LIFO 'prio_queue' [1] and iterating until the queue becomes empty.
Note that the parent commits must be added in reverse order to the
LIFO 'prio_queue', so their relative order is preserved during
processing, i.e. the first parent should come out first from the
queue, because otherwise performance greatly suffers on mergy
histories [2].

The stacksize-limited test 'name-rev works in a deep repo' in
't6120-describe.sh' demonstrated this issue and expected failure. Now
the recursion is gone, so flip it to expect success. Also gone are
the dmesg entries logging the segfault of that segfaulting 'git
name-rev' process on every execution of the test suite.

Note that this slightly changes the order of lines in the output of
'git name-rev --all', usually swapping two lines every 35 lines in
git.git or every 150 lines in linux.git. This shouldn't matter in
practice, because the output has always been unordered anyway.

This patch is best viewed with '--ignore-all-space'.

[1] Early versions of this patch used a 'commit_list', resulting in
~15% performance penalty for 'git name-rev --all' in 'linux.git',
presumably because of the memory allocation and release for each
insertion and removal. Using a LIFO 'prio_queue' has basically no
effect on performance.

[2] We prefer shorter names, e.g. 'v0.1~234' is preferred over
'v0.1^2~5', meaning that usually following the first parent of a
merge results in the best name for its ancestors. So when later
we follow the remaining parent(s) of a merge, and reach an already
named commit, then we usually find that we can't give that commit
a better name, and thus we don't have to visit any of its
ancestors again.

OTOH, if we were to follow the Nth parent of the merge first, then
the name of all its ancestors would include a corresponding '^N'.
Those are not the best names for those commits, so when later we
reach an already named commit following the first parent of that
merge, then we would have to update the name of that commit and
the names of all of its ancestors as well. Consequently, we would
have to visit many commits several times, resulting in a
significant slowdown.

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-12-09 12:52:57 +01:00
|
|
|
static void name_rev(struct commit *start_commit,
|
2017-04-26 21:29:31 +02:00
|
|
|
const char *tip_name, timestamp_t taggerdate,
|
2019-12-09 12:52:58 +01:00
|
|
|
int from_tag, int deref)
|
2005-10-26 15:10:20 +02:00
|
|
|
{
|
2019-12-09 12:52:57 +01:00
|
|
|
struct prio_queue queue;
|
|
|
|
struct commit *commit;
|
|
|
|
struct commit **parents_to_queue = NULL;
|
|
|
|
size_t parents_to_queue_nr, parents_to_queue_alloc = 0;
|
2020-02-04 22:25:34 +01:00
|
|
|
struct rev_name *start_name;
|
2019-12-09 12:52:58 +01:00
|
|
|
|
|
|
|
parse_commit(start_commit);
|
2022-03-12 01:00:15 +01:00
|
|
|
if (commit_is_before_cutoff(start_commit))
|
2019-12-09 12:52:58 +01:00
|
|
|
return;
|
|
|
|
|
2020-02-04 22:25:34 +01:00
|
|
|
start_name = create_or_update_name(start_commit, taggerdate, 0, 0,
|
|
|
|
from_tag);
|
|
|
|
if (!start_name)
|
|
|
|
return;
|
2019-12-09 12:52:58 +01:00
|
|
|
if (deref)
|
2020-02-04 22:25:34 +01:00
|
|
|
start_name->tip_name = xstrfmt("%s^0", tip_name);
|
2020-02-04 22:17:02 +01:00
|
|
|
else
|
2020-02-04 22:25:34 +01:00
|
|
|
start_name->tip_name = xstrdup(tip_name);
|
2005-10-26 15:10:20 +02:00
|
|
|
|
2019-12-09 12:52:57 +01:00
|
|
|
memset(&queue, 0, sizeof(queue)); /* Use the prio_queue as LIFO */
|
|
|
|
prio_queue_put(&queue, start_commit);
|
|
|
|
|
|
|
|
while ((commit = prio_queue_get(&queue))) {
|
|
|
|
struct rev_name *name = get_commit_rev_name(commit);
|
|
|
|
struct commit_list *parents;
|
|
|
|
int parent_number = 1;
|
|
|
|
|
|
|
|
parents_to_queue_nr = 0;
|
|
|
|
|
|
|
|
for (parents = commit->parents;
|
|
|
|
parents;
|
|
|
|
parents = parents->next, parent_number++) {
|
|
|
|
struct commit *parent = parents->item;
|
2020-02-04 22:25:34 +01:00
|
|
|
struct rev_name *parent_name;
|
2019-12-09 12:52:57 +01:00
|
|
|
int generation, distance;
|
|
|
|
|
|
|
|
parse_commit(parent);
|
2022-03-12 01:00:15 +01:00
|
|
|
if (commit_is_before_cutoff(parent))
|
2019-12-09 12:52:57 +01:00
|
|
|
continue;
|
|
|
|
|
|
|
|
if (parent_number > 1) {
|
|
|
|
generation = 0;
|
|
|
|
distance = name->distance + MERGE_TRAVERSAL_WEIGHT;
|
|
|
|
} else {
|
|
|
|
generation = name->generation + 1;
|
|
|
|
distance = name->distance + 1;
|
|
|
|
}
|
|
|
|
|
2020-02-04 22:25:34 +01:00
|
|
|
parent_name = create_or_update_name(parent, taggerdate,
|
|
|
|
generation,
|
|
|
|
distance, from_tag);
|
|
|
|
if (parent_name) {
|
|
|
|
if (parent_number > 1)
|
|
|
|
parent_name->tip_name =
|
|
|
|
get_parent_name(name,
|
|
|
|
parent_number);
|
|
|
|
else
|
|
|
|
parent_name->tip_name = name->tip_name;
|
2019-12-09 12:52:57 +01:00
|
|
|
ALLOC_GROW(parents_to_queue,
|
|
|
|
parents_to_queue_nr + 1,
|
|
|
|
parents_to_queue_alloc);
|
|
|
|
parents_to_queue[parents_to_queue_nr] = parent;
|
|
|
|
parents_to_queue_nr++;
|
|
|
|
}
|
2005-10-26 15:10:20 +02:00
|
|
|
}
|
2019-11-12 11:38:18 +01:00
|
|
|
|
2019-12-09 12:52:57 +01:00
|
|
|
/* The first parent must come out first from the prio_queue */
|
|
|
|
while (parents_to_queue_nr)
|
|
|
|
prio_queue_put(&queue,
|
|
|
|
parents_to_queue[--parents_to_queue_nr]);
|
2005-10-26 15:10:20 +02:00
|
|
|
}
|
2019-12-09 12:52:57 +01:00
|
|
|
|
|
|
|
clear_prio_queue(&queue);
|
|
|
|
free(parents_to_queue);
|
2005-10-26 15:10:20 +02:00
|
|
|
}
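The effect of queuing parents in reverse order onto a LIFO can be seen in a small self-contained model. The sketch below is an illustration under stated assumptions, not the code above: it uses a fixed-size array as the stack instead of a prio_queue, treats a parent as "interesting" only if it has not been seen before, and all `toy_` names and limits are hypothetical.

```c
#include <stddef.h>

#define TOY_MAX_NODES 16
#define TOY_MAX_PARENTS 2

/* A commit in a toy graph; parents are node indices, -1 means "none". */
struct toy_commit {
	int parents[TOY_MAX_PARENTS];
};

/*
 * Walk the graph from `start` without recursion and record the visit
 * order in order[] (which must have room for n entries).
 * Returns the number of commits visited.
 */
static size_t toy_walk(const struct toy_commit *graph, size_t n, int start,
		       int *order)
{
	int stack[TOY_MAX_NODES];     /* explicit LIFO replaces the call stack */
	int seen[TOY_MAX_NODES] = { 0 };
	size_t sp = 0, visited = 0;

	if (n > TOY_MAX_NODES || start < 0 || (size_t)start >= n)
		return 0;

	seen[start] = 1;
	stack[sp++] = start;

	while (sp) {
		int commit = stack[--sp];
		int to_queue[TOY_MAX_PARENTS];
		size_t nr = 0, i;

		order[visited++] = commit;

		/* Collect interesting parents in their natural order... */
		for (i = 0; i < TOY_MAX_PARENTS; i++) {
			int parent = graph[commit].parents[i];

			if (parent < 0 || (size_t)parent >= n || seen[parent])
				continue;
			seen[parent] = 1;
			to_queue[nr++] = parent;
		}

		/* ...then push them reversed, so the first parent pops first. */
		while (nr)
			stack[sp++] = to_queue[--nr];
	}
	return visited;
}
```

For a merge commit 0 with parents 1 and 2 that both descend from commit 3, the walk visits 0, 1, 3, 2: the whole first-parent line is handled before the second parent, which is the order the commit message argues for.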
|
|
|
|
|
2013-06-18 14:35:31 +02:00
|
|
|
static int subpath_matches(const char *path, const char *filter)
|
|
|
|
{
|
|
|
|
const char *subpath = path;
|
|
|
|
|
|
|
|
while (subpath) {
|
2017-06-22 23:38:08 +02:00
|
|
|
if (!wildmatch(filter, subpath, 0))
|
2013-06-18 14:35:31 +02:00
|
|
|
return subpath - path;
|
|
|
|
subpath = strchr(subpath, '/');
|
|
|
|
if (subpath)
|
|
|
|
subpath++;
|
|
|
|
}
|
|
|
|
return -1;
|
|
|
|
}
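The return value of subpath_matches() is the offset of the first path suffix, starting at a component boundary, that matches the filter. The following sketch shows the same loop with POSIX fnmatch() swapped in for wildmatch(); that substitution is an assumption made so the example builds outside of Git, and the two matchers are not guaranteed to agree on every pattern. `sketch_subpath_matches` is a hypothetical name.

```c
#include <fnmatch.h>
#include <string.h>

/*
 * Try the filter against the whole path, then against the path with
 * leading components removed one at a time. Returns the offset of the
 * first matching suffix, or -1 if nothing matches.
 */
static int sketch_subpath_matches(const char *path, const char *filter)
{
	const char *subpath = path;

	while (subpath) {
		/* fnmatch() returns 0 on a match, like wildmatch() in the original. */
		if (!fnmatch(filter, subpath, 0))
			return (int)(subpath - path);
		subpath = strchr(subpath, '/');
		if (subpath)
			subpath++; /* step past the '/' to the next component */
	}
	return -1;
}
```

With the path "refs/tags/v1.0", the filter "v1.*" matches at offset 10, "tags/*" at offset 5, and "refs/*" at offset 0. A filter that matches no suffix gives -1.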
|
|
|
|
|
2013-07-07 23:13:41 +02:00
|
|
|
static const char *name_ref_abbrev(const char *refname, int shorten_unambiguous)
|
|
|
|
{
|
|
|
|
if (shorten_unambiguous)
|
|
|
|
refname = shorten_unambiguous_ref(refname, 0);
|
2019-11-26 16:23:31 +01:00
|
|
|
else if (skip_prefix(refname, "refs/heads/", &refname))
|
|
|
|
; /* refname already advanced */
|
|
|
|
else
|
|
|
|
skip_prefix(refname, "refs/", &refname);
|
2013-07-07 23:13:41 +02:00
|
|
|
return refname;
|
|
|
|
}
|
|
|
|
|
2007-02-17 19:22:35 +01:00
|
|
|
struct name_ref_data {
|
|
|
|
int tags_only;
|
2007-05-21 09:20:25 +02:00
|
|
|
int name_only;
|
2017-01-19 00:06:05 +01:00
|
|
|
struct string_list ref_filters;
|
2017-01-19 00:06:06 +01:00
|
|
|
struct string_list exclude_filters;
|
2007-02-17 19:22:35 +01:00
|
|
|
};
|
|
|
|
|
2013-07-07 23:14:22 +02:00
|
|
|
static struct tip_table {
|
|
|
|
struct tip_table_entry {
|
2017-05-01 04:28:57 +02:00
|
|
|
struct object_id oid;
|
2013-07-07 23:14:22 +02:00
|
|
|
const char *refname;
|
2020-02-05 18:50:23 +01:00
|
|
|
struct commit *commit;
|
|
|
|
timestamp_t taggerdate;
|
|
|
|
unsigned int from_tag:1;
|
|
|
|
unsigned int deref:1;
|
2013-07-07 23:14:22 +02:00
|
|
|
} *table;
|
|
|
|
int nr;
|
|
|
|
int alloc;
|
|
|
|
int sorted;
|
|
|
|
} tip_table;
|
|
|
|
|
2017-05-01 04:28:57 +02:00
|
|
|
static void add_to_tip_table(const struct object_id *oid, const char *refname,
|
2020-02-05 18:50:23 +01:00
|
|
|
int shorten_unambiguous, struct commit *commit,
|
|
|
|
timestamp_t taggerdate, int from_tag, int deref)
|
2013-07-07 23:14:22 +02:00
|
|
|
{
|
|
|
|
refname = name_ref_abbrev(refname, shorten_unambiguous);
|
|
|
|
|
|
|
|
ALLOC_GROW(tip_table.table, tip_table.nr + 1, tip_table.alloc);
|
2017-05-01 04:28:57 +02:00
|
|
|
oidcpy(&tip_table.table[tip_table.nr].oid, oid);
|
2013-07-07 23:14:22 +02:00
|
|
|
tip_table.table[tip_table.nr].refname = xstrdup(refname);
|
2020-02-05 18:50:23 +01:00
|
|
|
tip_table.table[tip_table.nr].commit = commit;
|
|
|
|
tip_table.table[tip_table.nr].taggerdate = taggerdate;
|
|
|
|
tip_table.table[tip_table.nr].from_tag = from_tag;
|
|
|
|
tip_table.table[tip_table.nr].deref = deref;
|
2013-07-07 23:14:22 +02:00
|
|
|
tip_table.nr++;
|
|
|
|
tip_table.sorted = 0;
|
|
|
|
}
|
|
|
|
|
|
|
|
static int tipcmp(const void *a_, const void *b_)
|
|
|
|
{
|
|
|
|
const struct tip_table_entry *a = a_, *b = b_;
|
2017-05-01 04:28:57 +02:00
|
|
|
return oidcmp(&a->oid, &b->oid);
|
2013-07-07 23:14:22 +02:00
|
|
|
}
|
|
|
|
|
2020-02-05 18:50:23 +01:00
|
|
|
static int cmp_by_tag_and_age(const void *a_, const void *b_)
|
|
|
|
{
|
|
|
|
const struct tip_table_entry *a = a_, *b = b_;
|
|
|
|
int cmp;
|
|
|
|
|
|
|
|
/* Prefer tags. */
|
|
|
|
cmp = b->from_tag - a->from_tag;
|
|
|
|
if (cmp)
|
|
|
|
return cmp;
|
|
|
|
|
|
|
|
/* Older is better. */
|
|
|
|
if (a->taggerdate < b->taggerdate)
|
|
|
|
return -1;
|
|
|
|
return a->taggerdate != b->taggerdate;
|
|
|
|
}
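The ordering that cmp_by_tag_and_age() imposes (tags before other refs, then older dates first) can be seen on a small array. This is a sketch under assumptions: struct tip_sketch is a hypothetical cut-down version of tip_table_entry, uint64_t stands in for timestamp_t, and plain qsort() stands in for git's QSORT macro.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical reduced form of struct tip_table_entry. */
struct tip_sketch {
	const char *refname;
	uint64_t taggerdate;      /* stand-in for timestamp_t */
	unsigned int from_tag:1;
};

static int cmp_by_tag_and_age_sketch(const void *a_, const void *b_)
{
	const struct tip_sketch *a = a_, *b = b_;
	int cmp;

	/* Prefer tags: an entry with from_tag set sorts first. */
	cmp = b->from_tag - a->from_tag;
	if (cmp)
		return cmp;

	/* Older is better: the smaller date sorts first. */
	if (a->taggerdate < b->taggerdate)
		return -1;
	return a->taggerdate != b->taggerdate;
}

/* Sort tips so that the preferred names come first. */
static void sort_tips_sketch(struct tip_sketch *tips, size_t nr)
{
	qsort(tips, nr, sizeof(*tips), cmp_by_tag_and_age_sketch);
}
```

Sorting a branch tip dated 100 together with two tags dated 300 and 200 puts the tag dated 200 first, then the tag dated 300, then the branch, even though the branch has the oldest date.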
|
|
|
|
|
2015-05-25 20:38:37 +02:00
|
|
|
static int name_ref(const char *path, const struct object_id *oid, int flags, void *cb_data)
|
2005-10-26 15:10:20 +02:00
|
|
|
{
|
2018-06-29 03:21:51 +02:00
|
|
|
struct object *o = parse_object(the_repository, oid);
|
2007-02-17 19:22:35 +01:00
|
|
|
struct name_ref_data *data = cb_data;
|
2013-06-18 14:35:31 +02:00
|
|
|
int can_abbreviate_output = data->tags_only && data->name_only;
|
2005-10-26 15:10:20 +02:00
|
|
|
int deref = 0;
|
2020-02-05 18:50:23 +01:00
|
|
|
int from_tag = 0;
|
|
|
|
struct commit *commit = NULL;
|
2017-04-26 21:29:31 +02:00
|
|
|
timestamp_t taggerdate = TIME_MAX;
|
2005-10-26 15:10:20 +02:00
|
|
|
|
2013-11-30 21:55:40 +01:00
|
|
|
if (data->tags_only && !starts_with(path, "refs/tags/"))
|
2007-02-17 19:22:35 +01:00
|
|
|
return 0;
|
|
|
|
|
2017-01-19 00:06:06 +01:00
|
|
|
if (data->exclude_filters.nr) {
|
|
|
|
struct string_list_item *item;
|
|
|
|
|
|
|
|
for_each_string_list_item(item, &data->exclude_filters) {
|
|
|
|
if (subpath_matches(path, item->string) >= 0)
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2017-01-19 00:06:05 +01:00
|
|
|
if (data->ref_filters.nr) {
|
|
|
|
struct string_list_item *item;
|
|
|
|
int matched = 0;
|
|
|
|
|
|
|
|
/* See if any of the patterns match. */
|
|
|
|
for_each_string_list_item(item, &data->ref_filters) {
|
|
|
|
/*
|
|
|
|
* Check all patterns even after finding a match, so
|
|
|
|
* that we can see if a match with a subpath exists.
|
|
|
|
* When a user asked for 'refs/tags/v*' and 'v1.*',
|
|
|
|
* both of which match, the user is showing her
|
|
|
|
* willingness to accept a shortened output by having
|
|
|
|
* the 'v1.*' in the acceptable refnames, so we
|
|
|
|
* shouldn't stop when seeing 'refs/tags/v1.4' matches
|
|
|
|
* 'refs/tags/v*'. We should show it as 'v1.4'.
|
|
|
|
*/
|
|
|
|
switch (subpath_matches(path, item->string)) {
|
|
|
|
case -1: /* did not match */
|
|
|
|
break;
|
|
|
|
case 0: /* matched fully */
|
|
|
|
matched = 1;
|
|
|
|
break;
|
|
|
|
default: /* matched subpath */
|
|
|
|
matched = 1;
|
|
|
|
can_abbreviate_output = 1;
|
|
|
|
break;
|
|
|
|
}
|
2013-06-18 14:35:31 +02:00
|
|
|
}
|
2017-01-19 00:06:05 +01:00
|
|
|
|
|
|
|
/* If none of the patterns matched, stop now */
|
|
|
|
if (!matched)
|
|
|
|
return 0;
|
2013-06-18 14:35:31 +02:00
|
|
|
}
|
2005-10-26 15:10:20 +02:00
|
|
|
|
2006-07-12 05:45:31 +02:00
|
|
|
while (o && o->type == OBJ_TAG) {
|
2005-10-26 15:10:20 +02:00
|
|
|
struct tag *t = (struct tag *) o;
|
|
|
|
if (!t->tagged)
|
|
|
|
break; /* broken repository */
|
2018-06-29 03:21:51 +02:00
|
|
|
o = parse_object(the_repository, &t->tagged->oid);
|
2005-10-26 15:10:20 +02:00
|
|
|
deref = 1;
|
2016-04-22 15:39:01 +02:00
|
|
|
taggerdate = t->date;
|
2005-10-26 15:10:20 +02:00
|
|
|
}
|
2006-07-12 05:45:31 +02:00
|
|
|
if (o && o->type == OBJ_COMMIT) {
|
2020-02-05 18:50:23 +01:00
|
|
|
commit = (struct commit *)o;
|
|
|
|
from_tag = starts_with(path, "refs/tags/");
|
2017-08-30 11:46:06 +02:00
|
|
|
if (taggerdate == TIME_MAX)
|
2019-11-12 11:38:12 +01:00
|
|
|
taggerdate = commit->date;
|
2005-10-26 15:10:20 +02:00
|
|
|
}
|
2020-02-05 18:50:23 +01:00
|
|
|
|
|
|
|
add_to_tip_table(oid, path, can_abbreviate_output, commit, taggerdate,
|
|
|
|
from_tag, deref);
|
2005-10-26 15:10:20 +02:00
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2020-02-05 18:50:23 +01:00
|
|
|
static void name_tips(void)
|
|
|
|
{
|
|
|
|
int i;
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Try to set better names first, so that worse ones spread
|
|
|
|
* less.
|
|
|
|
*/
|
|
|
|
QSORT(tip_table.table, tip_table.nr, cmp_by_tag_and_age);
|
|
|
|
for (i = 0; i < tip_table.nr; i++) {
|
|
|
|
struct tip_table_entry *e = &tip_table.table[i];
|
|
|
|
if (e->commit) {
|
|
|
|
name_rev(e->commit, e->refname, e->taggerdate,
|
|
|
|
e->from_tag, e->deref);
|
|
|
|
}
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2021-01-28 07:20:23 +01:00
|
|
|
static const struct object_id *nth_tip_table_ent(size_t ix, const void *table_)
|
2013-07-07 23:14:22 +02:00
|
|
|
{
|
2021-01-28 07:20:23 +01:00
|
|
|
const struct tip_table_entry *table = table_;
|
2021-01-28 07:19:42 +01:00
|
|
|
return &table[ix].oid;
|
2013-07-07 23:14:22 +02:00
|
|
|
}
|
|
|
|
|
|
|
|
static const char *get_exact_ref_match(const struct object *o)
|
|
|
|
{
|
|
|
|
int found;
|
|
|
|
|
|
|
|
if (!tip_table.table || !tip_table.nr)
|
|
|
|
return NULL;
|
|
|
|
|
|
|
|
if (!tip_table.sorted) {
|
2016-09-29 17:27:31 +02:00
|
|
|
QSORT(tip_table.table, tip_table.nr, tipcmp);
|
2013-07-07 23:14:22 +02:00
|
|
|
tip_table.sorted = 1;
|
|
|
|
}
|
|
|
|
|
2021-01-28 07:19:42 +01:00
|
|
|
found = oid_pos(&o->oid, tip_table.table, tip_table.nr,
|
|
|
|
nth_tip_table_ent);
|
2013-07-07 23:14:22 +02:00
|
|
|
if (0 <= found)
|
|
|
|
return tip_table.table[found].refname;
|
|
|
|
return NULL;
|
|
|
|
}
|
|
|
|
|
2017-03-28 21:46:44 +02:00
|
|
|
/* may return a constant string or use "buf" as scratch space */
|
|
|
|
static const char *get_rev_name(const struct object *o, struct strbuf *buf)
|
2005-10-26 15:10:20 +02:00
|
|
|
{
|
2006-06-18 03:26:18 +02:00
|
|
|
struct rev_name *n;
|
2020-02-04 22:16:10 +01:00
|
|
|
const struct commit *c;
|
2006-06-18 03:26:18 +02:00
|
|
|
|
2006-07-12 05:45:31 +02:00
|
|
|
if (o->type != OBJ_COMMIT)
|
2013-07-07 23:14:22 +02:00
|
|
|
return get_exact_ref_match(o);
|
2020-02-04 22:16:10 +01:00
|
|
|
c = (const struct commit *) o;
|
2018-05-19 07:28:26 +02:00
|
|
|
n = get_commit_rev_name(c);
|
2005-10-26 15:10:20 +02:00
|
|
|
if (!n)
|
2007-12-24 12:18:22 +01:00
|
|
|
return NULL;
|
2005-10-26 15:10:20 +02:00
|
|
|
|
|
|
|
if (!n->generation)
|
|
|
|
return n->tip_name;
|
2007-02-20 01:08:48 +01:00
|
|
|
else {
|
2017-03-28 21:46:44 +02:00
|
|
|
strbuf_reset(buf);
|
2019-11-12 11:38:11 +01:00
|
|
|
strbuf_addstr(buf, n->tip_name);
|
|
|
|
strbuf_strip_suffix(buf, "^0");
|
|
|
|
strbuf_addf(buf, "~%d", n->generation);
|
2017-03-28 21:46:44 +02:00
|
|
|
return buf->buf;
|
2007-02-20 01:08:48 +01:00
|
|
|
}
|
2005-10-26 15:10:20 +02:00
|
|
|
}
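The name formatting done at the end of get_rev_name() can be sketched without strbuf. The helper below is hypothetical and uses snprintf() on a caller-supplied buffer instead of git's strbuf API. It mirrors the logic above: a generation of zero returns the tip name unchanged, otherwise a trailing "^0" is stripped and "~<generation>" is appended.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: may return tip_name itself or use buf as
 * scratch space, like get_rev_name() does with its strbuf.
 */
static const char *format_rev_name_sketch(const char *tip_name, int generation,
					  char *buf, size_t buflen)
{
	size_t len;

	if (!generation)
		return tip_name;

	len = strlen(tip_name);
	/* drop a trailing "^0" before appending "~N" */
	if (len >= 2 && !strcmp(tip_name + len - 2, "^0"))
		len -= 2;
	snprintf(buf, buflen, "%.*s~%d", (int)len, tip_name, generation);
	return buf;
}
```

For a tip named "tags/v1.0^0" and generation 3 this yields "tags/v1.0~3". With generation 0 the name is returned as-is, including the "^0" suffix.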
|
Add "named object array" concept
We've had this notion of a "object_list" for a long time, which eventually
grew a "name" member because some users (notably git-rev-list) wanted to
name each object as it is generated.
That object_list is great for some things, but it isn't all that wonderful
for others, and the "name" member is generally not used by everybody.
This patch splits the users of the object_list array up into two: the
traditional list users, who want the list-like format, and who don't
actually use or want the name. And another class of users that really used
the list as an extensible array, and generally wanted to name the objects.
The patch is fairly straightforward, but it's also biggish. Most of it
really just cleans things up: switching the revision parsing and listing
over to the array makes things like the builtin-diff usage much simpler
(we now see exactly how many members the array has, and we don't get the
objects reversed from the order they were on the command line).
One of the main reasons for doing this at all is that the malloc overhead
of the simple object list was actually pretty high, and the array is just
a lot denser. So this patch brings down memory usage by git-rev-list by
just under 3% (on top of all the other memory use optimizations) on the
mozilla archive.
It does add more lines than it removes, and more importantly, it adds a
whole new infrastructure for maintaining lists of objects, but on the
other hand, the new dynamic array code is pretty obvious. The change to
builtin-diff-tree.c shows a fairly good example of why an array interface
is sometimes more natural, and just much simpler for everybody.
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-06-20 02:42:35 +02:00
|
|
|
|
2008-03-02 17:51:57 +01:00
|
|
|
static void show_name(const struct object *obj,
|
|
|
|
const char *caller_name,
|
|
|
|
int always, int allow_undefined, int name_only)
|
|
|
|
{
|
|
|
|
const char *name;
|
2015-11-10 03:22:28 +01:00
|
|
|
const struct object_id *oid = &obj->oid;
|
2017-03-28 21:46:44 +02:00
|
|
|
struct strbuf buf = STRBUF_INIT;
|
2008-03-02 17:51:57 +01:00
|
|
|
|
|
|
|
if (!name_only)
|
2015-11-10 03:22:28 +01:00
|
|
|
printf("%s ", caller_name ? caller_name : oid_to_hex(oid));
|
2017-03-28 21:46:44 +02:00
|
|
|
name = get_rev_name(obj, &buf);
|
2008-03-02 17:51:57 +01:00
|
|
|
if (name)
|
|
|
|
printf("%s\n", name);
|
|
|
|
else if (allow_undefined)
|
|
|
|
printf("undefined\n");
|
|
|
|
else if (always)
|
2018-03-12 03:27:30 +01:00
|
|
|
printf("%s\n", find_unique_abbrev(oid, DEFAULT_ABBREV));
|
2008-03-02 17:51:57 +01:00
|
|
|
else
|
2015-11-10 03:22:28 +01:00
|
|
|
die("cannot describe '%s'", oid_to_hex(oid));
|
2017-03-28 21:46:44 +02:00
|
|
|
strbuf_release(&buf);
|
2008-03-02 17:51:57 +01:00
|
|
|
}
|
|
|
|
|
2007-10-15 22:57:59 +02:00
|
|
|
static char const * const name_rev_usage[] = {
|
2015-01-13 08:44:47 +01:00
|
|
|
N_("git name-rev [<options>] <commit>..."),
|
|
|
|
N_("git name-rev [<options>] --all"),
|
2022-02-15 21:52:07 +01:00
|
|
|
N_("git name-rev [<options>] --annotate-stdin"),
|
2007-10-15 22:57:59 +02:00
|
|
|
NULL
|
|
|
|
};
|
|
|
|
|
2008-08-02 20:04:22 +02:00
|
|
|
static void name_rev_line(char *p, struct name_ref_data *data)
|
|
|
|
{
|
2017-03-28 21:46:44 +02:00
|
|
|
struct strbuf buf = STRBUF_INIT;
|
2019-02-19 01:05:04 +01:00
|
|
|
int counter = 0;
|
2008-08-02 20:04:22 +02:00
|
|
|
char *p_start;
|
2019-02-19 01:05:04 +01:00
|
|
|
const unsigned hexsz = the_hash_algo->hexsz;
|
|
|
|
|
2008-08-02 20:04:22 +02:00
|
|
|
for (p_start = p; *p; p++) {
|
|
|
|
#define ishex(x) (isdigit((x)) || ((x) >= 'a' && (x) <= 'f'))
|
|
|
|
if (!ishex(*p))
|
2019-02-19 01:05:04 +01:00
|
|
|
counter = 0;
|
|
|
|
else if (++counter == hexsz &&
|
2008-08-02 20:04:22 +02:00
|
|
|
!ishex(*(p+1))) {
|
2017-05-01 04:28:57 +02:00
|
|
|
struct object_id oid;
|
2008-08-02 20:04:22 +02:00
|
|
|
const char *name = NULL;
|
|
|
|
char c = *(p+1);
|
2008-08-03 15:44:33 +02:00
|
|
|
int p_len = p - p_start + 1;
|
2008-08-02 20:04:22 +02:00
|
|
|
|
2019-02-19 01:05:04 +01:00
|
|
|
counter = 0;
|
2008-08-02 20:04:22 +02:00
|
|
|
|
|
|
|
*(p+1) = 0;
|
2019-02-19 01:05:04 +01:00
|
|
|
if (!get_oid(p - (hexsz - 1), &oid)) {
|
2008-08-02 20:04:22 +02:00
|
|
|
struct object *o =
|
2019-06-20 09:41:14 +02:00
|
|
|
lookup_object(the_repository, &oid);
|
2008-08-02 20:04:22 +02:00
|
|
|
if (o)
|
2017-03-28 21:46:44 +02:00
|
|
|
name = get_rev_name(o, &buf);
|
2008-08-02 20:04:22 +02:00
|
|
|
}
|
|
|
|
*(p+1) = c;
|
|
|
|
|
|
|
|
if (!name)
|
|
|
|
continue;
|
|
|
|
|
2008-08-03 15:44:33 +02:00
|
|
|
if (data->name_only)
|
2019-02-19 01:05:04 +01:00
|
|
|
printf("%.*s%s", p_len - hexsz, p_start, name);
|
2008-08-03 15:44:33 +02:00
|
|
|
else
|
|
|
|
printf("%.*s (%s)", p_len, p_start, name);
|
2008-08-02 20:04:22 +02:00
|
|
|
p_start = p + 1;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
/* flush */
|
|
|
|
if (p_start != p)
|
|
|
|
fwrite(p_start, p - p_start, 1, stdout);
|
2017-03-28 21:46:44 +02:00
|
|
|
|
|
|
|
strbuf_release(&buf);
|
2008-08-02 20:04:22 +02:00
|
|
|
}
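The scanning rule in name_rev_line() is easier to see with the object lookup removed. The sketch below is an illustration with hypothetical names: it only counts the places where the loop above would attempt a lookup, that is, runs of exactly hexsz lowercase hex digits that are not followed by another hex digit. Longer runs never match, because the counter passes hexsz without being reset.

```c
#include <ctype.h>
#include <stddef.h>

/* Lowercase-only hex test, mirroring the ishex() macro above. */
static int is_lower_hex(char c)
{
	return isdigit((unsigned char)c) || (c >= 'a' && c <= 'f');
}

/*
 * Hypothetical helper: count runs of exactly hexsz hex digits in
 * line. The real code calls get_oid() at each such position.
 */
static size_t count_full_hex_runs(const char *line, size_t hexsz)
{
	size_t counter = 0, found = 0;
	const char *p;

	for (p = line; *p; p++) {
		if (!is_lower_hex(*p)) {
			counter = 0;
		} else if (++counter == hexsz && !is_lower_hex(p[1])) {
			found++;
			counter = 0;
		}
	}
	return found;
}
```

In git, hexsz comes from the_hash_algo->hexsz (40 for SHA-1). A small value such as 4 is enough to experiment with the rule: in "abcd efgh 12345 dead" only "abcd" and "dead" qualify.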
|
|
|
|
|
2006-08-03 17:24:35 +02:00
|
|
|
int cmd_name_rev(int argc, const char **argv, const char *prefix)
|
2005-10-26 15:10:20 +02:00
|
|
|
{
|
2010-08-29 04:04:17 +02:00
|
|
|
struct object_array revs = OBJECT_ARRAY_INIT;
|
2022-01-06 00:29:31 +01:00
|
|
|
int all = 0, annotate_stdin = 0, transform_stdin = 0, allow_undefined = 1, always = 0, peel_tag = 0;
|
2017-01-19 00:06:06 +01:00
|
|
|
struct name_ref_data data = { 0, 0, STRING_LIST_INIT_NODUP, STRING_LIST_INIT_NODUP };
|
2007-10-15 22:57:59 +02:00
|
|
|
struct option opts[] = {
|
2020-08-14 03:07:12 +02:00
|
|
|
OPT_BOOL(0, "name-only", &data.name_only, N_("print only ref-based names (no object names)")),
|
2013-08-03 13:51:19 +02:00
|
|
|
OPT_BOOL(0, "tags", &data.tags_only, N_("only use tags to name the commits")),
|
2017-01-19 00:06:05 +01:00
|
|
|
OPT_STRING_LIST(0, "refs", &data.ref_filters, N_("pattern"),
|
2012-08-20 14:32:27 +02:00
|
|
|
N_("only use refs matching <pattern>")),
|
2017-01-19 00:06:06 +01:00
|
|
|
OPT_STRING_LIST(0, "exclude", &data.exclude_filters, N_("pattern"),
|
|
|
|
N_("ignore refs matching <pattern>")),
|
2007-10-15 22:57:59 +02:00
|
|
|
OPT_GROUP(""),
|
2013-08-03 13:51:19 +02:00
|
|
|
OPT_BOOL(0, "all", &all, N_("list all commits reachable from all refs")),
|
2022-01-06 00:29:31 +01:00
|
|
|
OPT_BOOL(0, "stdin", &transform_stdin, N_("deprecated: use annotate-stdin instead")),
|
|
|
|
OPT_BOOL(0, "annotate-stdin", &annotate_stdin, N_("annotate text from stdin")),
|
2013-08-03 13:51:19 +02:00
|
|
|
OPT_BOOL(0, "undefined", &allow_undefined, N_("allow to print `undefined` names (default)")),
|
|
|
|
OPT_BOOL(0, "always", &always,
|
2012-08-20 14:32:27 +02:00
|
|
|
N_("show abbreviated commit object as fallback")),
|
2013-07-18 23:46:51 +02:00
|
|
|
{
|
|
|
|
/* A Hidden OPT_BOOL */
|
|
|
|
OPTION_SET_INT, 0, "peel-tag", &peel_tag, NULL,
|
|
|
|
N_("dereference tags in the input (internal use)"),
|
|
|
|
PARSE_OPT_NOARG | PARSE_OPT_HIDDEN, NULL, 1,
|
|
|
|
},
|
2007-10-15 22:57:59 +02:00
|
|
|
OPT_END(),
|
|
|
|
};
|
2005-10-26 15:10:20 +02:00
|
|
|
|
2018-05-19 07:28:26 +02:00
|
|
|
init_commit_rev_name(&rev_names);
|
2008-05-14 19:46:53 +02:00
|
|
|
git_config(git_default_config, NULL);
|
2009-05-23 20:53:12 +02:00
|
|
|
argc = parse_options(argc, argv, prefix, opts, name_rev_usage, 0);
|
2022-01-06 00:29:31 +01:00
|
|
|
|
|
|
|
if (transform_stdin) {
|
|
|
|
warning("--stdin is deprecated. Please use --annotate-stdin instead, "
|
|
|
|
"which is functionally equivalent.\n"
|
|
|
|
"This option will be removed in a future release.");
|
|
|
|
annotate_stdin = 1;
|
|
|
|
}
|
|
|
|
|
|
|
|
if (all + annotate_stdin + !!argc > 1) {
|
2007-10-15 22:57:59 +02:00
|
|
|
error("Specify either a list, or --all, not both!");
|
|
|
|
usage_with_options(name_rev_usage, opts);
|
|
|
|
}
|
2022-01-06 00:29:31 +01:00
|
|
|
if (all || annotate_stdin)
|
2022-03-12 01:00:15 +01:00
|
|
|
disable_cutoff();
|
2005-10-26 15:10:20 +02:00
|
|
|
|
2007-10-15 22:57:59 +02:00
|
|
|
for (; argc; argc--, argv++) {
|
2017-05-01 04:28:57 +02:00
|
|
|
struct object_id oid;
|
2013-07-18 23:11:35 +02:00
|
|
|
struct object *object;
|
2005-10-26 15:10:20 +02:00
|
|
|
struct commit *commit;
|
|
|
|
|
2017-05-01 04:28:57 +02:00
|
|
|
if (get_oid(*argv, &oid)) {
|
2005-10-26 15:10:20 +02:00
|
|
|
fprintf(stderr, "Could not get sha1 for %s. Skipping.\n",
|
|
|
|
*argv);
|
|
|
|
continue;
|
|
|
|
}
|
|
|
|
|
2013-07-18 23:11:35 +02:00
|
|
|
commit = NULL;
|
2018-06-29 03:21:51 +02:00
|
|
|
object = parse_object(the_repository, &oid);
|
2013-07-18 23:11:35 +02:00
|
|
|
if (object) {
|
2018-06-29 03:22:05 +02:00
|
|
|
struct object *peeled = deref_tag(the_repository,
|
|
|
|
object, *argv, 0);
|
2013-07-18 23:11:35 +02:00
|
|
|
if (peeled && peeled->type == OBJ_COMMIT)
|
|
|
|
commit = (struct commit *)peeled;
|
|
|
|
}
|
|
|
|
|
|
|
|
if (!object) {
|
|
|
|
fprintf(stderr, "Could not get object for %s. Skipping.\n",
|
2005-10-26 15:10:20 +02:00
|
|
|
*argv);
|
|
|
|
continue;
|
|
|
|
}
|
|
|
|
|
2022-03-12 01:00:15 +01:00
|
|
|
if (commit)
|
|
|
|
set_commit_cutoff(commit);
|
2013-07-18 23:46:51 +02:00
|
|
|
|
|
|
|
if (peel_tag) {
|
|
|
|
if (!commit) {
|
|
|
|
fprintf(stderr, "Could not get commit for %s. Skipping.\n",
|
|
|
|
*argv);
|
|
|
|
continue;
|
|
|
|
}
|
|
|
|
object = (struct object *)commit;
|
|
|
|
}
|
2013-07-18 23:11:35 +02:00
|
|
|
add_object_array(object, *argv, &revs);
|
2005-10-26 15:10:20 +02:00
|
|
|
}
|
|
|
|
|
2022-03-12 01:00:15 +01:00
|
|
|
adjust_cutoff_timestamp_for_slop();
|
|
|
|
|
2015-05-25 20:38:37 +02:00
|
|
|
for_each_ref(name_ref, &data);
|
2020-02-05 18:50:23 +01:00
|
|
|
name_tips();
|
2005-10-26 15:10:20 +02:00
|
|
|
|
2022-01-06 00:29:31 +01:00
|
|
|
if (annotate_stdin) {
|
2022-01-06 00:29:32 +01:00
|
|
|
struct strbuf sb = STRBUF_INIT;
|
2005-10-26 15:10:20 +02:00
|
|
|
|
2022-01-06 00:29:32 +01:00
|
|
|
while (strbuf_getline(&sb, stdin) != EOF) {
|
|
|
|
strbuf_addch(&sb, '\n');
|
|
|
|
name_rev_line(sb.buf, &data);
|
2005-10-26 15:10:20 +02:00
|
|
|
}
|
2022-01-06 00:29:32 +01:00
|
|
|
strbuf_release(&sb);
|
2005-10-26 15:10:20 +02:00
|
|
|
} else if (all) {
|
2006-06-30 06:38:55 +02:00
|
|
|
int i, max;
|
2005-10-26 15:10:20 +02:00
|
|
|
|
2006-06-30 06:38:55 +02:00
|
|
|
max = get_max_object_index();
|
2008-06-06 01:31:55 +02:00
|
|
|
for (i = 0; i < max; i++) {
|
|
|
|
struct object *obj = get_indexed_object(i);
|
2011-11-16 00:51:05 +01:00
|
|
|
if (!obj || obj->type != OBJ_COMMIT)
|
2008-06-06 01:31:55 +02:00
|
|
|
continue;
|
|
|
|
show_name(obj, NULL,
|
2008-03-02 17:51:57 +01:00
|
|
|
always, allow_undefined, data.name_only);
|
2008-06-06 01:31:55 +02:00
|
|
|
}
|
2006-06-20 02:42:35 +02:00
|
|
|
} else {
|
|
|
|
int i;
|
2008-03-02 17:51:57 +01:00
|
|
|
for (i = 0; i < revs.nr; i++)
|
|
|
|
show_name(revs.objects[i].item, revs.objects[i].name,
|
|
|
|
always, allow_undefined, data.name_only);
|
2006-06-20 02:42:35 +02:00
|
|
|
}
|
2005-10-26 15:10:20 +02:00
|
|
|
|
2017-10-01 19:42:08 +02:00
|
|
|
UNLEAK(revs);
|
2005-10-26 15:10:20 +02:00
|
|
|
return 0;
|
|
|
|
}
|