2005-04-18 20:39:48 +02:00
|
|
|
#ifndef COMMIT_H
|
|
|
|
#define COMMIT_H
|
|
|
|
|
|
|
|
#include "object.h"
|
|
|
|
#include "tree.h"
|
2007-09-10 12:35:06 +02:00
|
|
|
#include "strbuf.h"
|
2007-04-17 01:05:10 +02:00
|
|
|
#include "decorate.h"
|
2013-03-31 18:00:14 +02:00
|
|
|
#include "gpg-interface.h"
|
teach format-patch to place other authors into in-body "From"
Format-patch generates emails with the "From" address set to the
author of each patch. If you are going to send the emails, however,
you would want to replace the author identity with yours (if they
are not the same), and bump the author identity to an in-body
header.
Normally this is handled by git-send-email, which does the
transformation before sending out the emails. However, some
workflows may not use send-email (e.g., imap-send, or a custom
script which feeds the mbox to a non-git MUA). They could each
implement this feature themselves, but getting it right is
non-trivial (one must canonicalize the identities by reversing any
RFC2047 encoding or RFC822 quoting of the headers, which has caused
many bugs in send-email over the years).
This patch takes a different approach: it teaches format-patch a
"--from" option which handles the ident check and in-body header
while it is writing out the email. It's much simpler to do at this
level (because we haven't done any quoting yet), and any workflow
based on format-patch can easily turn it on.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-07-03 09:08:22 +02:00
|
|
|
#include "string-list.h"
|
2017-12-12 09:55:35 +01:00
|
|
|
#include "pretty.h"
|
2018-11-01 14:46:21 +01:00
|
|
|
#include "commit-slab.h"
|
2005-04-18 20:39:48 +02:00
|
|
|
|
2018-04-10 14:56:05 +02:00
|
|
|
#define COMMIT_NOT_FROM_GRAPH 0xFFFFFFFF
|
2021-01-16 19:11:13 +01:00
|
|
|
#define GENERATION_NUMBER_INFINITY ((1ULL << 63) - 1)
|
|
|
|
#define GENERATION_NUMBER_V1_MAX 0x3FFFFFFF
|
2018-04-25 16:37:55 +02:00
|
|
|
#define GENERATION_NUMBER_ZERO 0
|
commit-graph: implement generation data chunk
As discovered by Ævar, we cannot increment graph version to
distinguish between generation numbers v1 and v2 [1]. Thus, one of
pre-requistes before implementing generation number v2 was to
distinguish between graph versions in a backwards compatible manner.
We are going to introduce a new chunk called Generation DATa chunk (or
GDAT). GDAT will store corrected committer date offsets whereas CDAT
will still store topological level.
Old Git does not understand GDAT chunk and would ignore it, reading
topological levels from CDAT. New Git can parse GDAT and take advantage
of newer generation numbers, falling back to topological levels when
GDAT chunk is missing (as it would happen with a commit-graph written
by old Git).
We introduce a test environment variable 'GIT_TEST_COMMIT_GRAPH_NO_GDAT'
which forces commit-graph file to be written without generation data
chunk to emulate a commit-graph file written by old Git.
To minimize the space required to store corrrected commit date, Git
stores corrected commit date offsets into the commit-graph file, instea
of corrected commit dates. This saves us 4 bytes per commit, decreasing
the GDAT chunk size by half, but it's possible for the offset to
overflow the 4-bytes allocated for storage. As such overflows are and
should be exceedingly rare, we use the following overflow management
scheme:
We introduce a new commit-graph chunk, Generation Data OVerflow ('GDOV')
to store corrected commit dates for commits with offsets greater than
GENERATION_NUMBER_V2_OFFSET_MAX.
If the offset is greater than GENERATION_NUMBER_V2_OFFSET_MAX, we set
the MSB of the offset and the other bits store the position of corrected
commit date in GDOV chunk, similar to how Extra Edge List is maintained.
We test the overflow-related code with the following repo history:
F - N - U
/ \
U - N - U N
\ /
N - F - N
Where the commits denoted by U have committer date of zero seconds
since Unix epoch, the commits denoted by N have committer date of
1112354055 (default committer date for the test suite) seconds since
Unix epoch and the commits denoted by F have committer date of
(2 ^ 31 - 2) seconds since Unix epoch.
The largest offset observed is 2 ^ 31, just large enough to overflow.
[1]: https://lore.kernel.org/git/87a7gdspo4.fsf@evledraar.gmail.com/
Signed-off-by: Abhishek Kumar <abhishekkumar8222@gmail.com>
Reviewed-by: Taylor Blau <me@ttaylorr.com>
Reviewed-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-01-16 19:11:15 +01:00
|
|
|
#define GENERATION_NUMBER_V2_OFFSET_MAX ((1ULL << 31) - 1)
|
2018-04-10 14:56:05 +02:00
|
|
|
|
2005-04-18 20:39:48 +02:00
|
|
|
struct commit_list {
|
|
|
|
struct commit *item;
|
|
|
|
struct commit_list *next;
|
|
|
|
};
|
|
|
|
|
2018-05-19 07:28:31 +02:00
|
|
|
/*
|
|
|
|
* The size of this struct matters in full repo walk operations like
|
|
|
|
* 'git clone' or 'git gc'. Consider using commit-slab to attach data
|
|
|
|
* to a commit instead of adding new fields here.
|
|
|
|
*/
|
2005-04-18 20:39:48 +02:00
|
|
|
struct commit {
|
|
|
|
struct object object;
|
2017-04-26 21:29:31 +02:00
|
|
|
timestamp_t date;
|
2005-04-18 20:39:48 +02:00
|
|
|
struct commit_list *parents;
|
2018-04-06 21:09:46 +02:00
|
|
|
|
|
|
|
/*
|
|
|
|
* If the commit is loaded from the commit-graph file, then this
|
2019-04-16 11:33:19 +02:00
|
|
|
* member may be NULL. Only access it through repo_get_commit_tree()
|
2018-04-06 21:09:46 +02:00
|
|
|
* or get_commit_tree_oid().
|
|
|
|
*/
|
2018-04-06 21:09:32 +02:00
|
|
|
struct tree *maybe_tree;
|
2018-05-11 19:20:54 +02:00
|
|
|
unsigned int index;
|
2005-04-18 20:39:48 +02:00
|
|
|
};
|
|
|
|
|
[PATCH] Avoid wasting memory in git-rev-list
As pointed out on the list, git-rev-list can use a lot of memory.
One low-hanging fruit is to free the commit buffer for commits that we
parse. By default, parse_commit() will save away the buffer, since a lot
of cases do want it, and re-reading it continually would be unnecessary.
However, in many cases the buffer isn't actually necessary and saving it
just wastes memory.
We could just free the buffer ourselves, but especially in git-rev-list,
we actually end up using the helper functions that automatically add
parent commits to the commit lists, so we don't actually control the
commit parsing directly.
Instead, just make this behaviour of "parse_commit()" a global flag.
Maybe this is a bit tasteless, but it's very simple, and it makes a
noticable difference in memory usage.
Before the change:
[torvalds@g5 linux]$ /usr/bin/time git-rev-list v2.6.12..HEAD > /dev/null
0.26user 0.02system 0:00.28elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+3714minor)pagefaults 0swaps
after the change:
[torvalds@g5 linux]$ /usr/bin/time git-rev-list v2.6.12..HEAD > /dev/null
0.26user 0.00system 0:00.27elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+2433minor)pagefaults 0swaps
note how the minor faults have decreased from 3714 pages to 2433 pages.
That's all due to the fewer anonymous pages allocated to hold the comment
buffers and their metadata.
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2005-09-15 23:43:17 +02:00
|
|
|
extern int save_commit_buffer;
|
2005-04-18 20:39:48 +02:00
|
|
|
extern const char *commit_type;
|
|
|
|
|
2007-04-17 01:05:10 +02:00
|
|
|
/* While we can decorate any object with a name, it's only used for commits.. */
|
|
|
|
struct name_decoration {
|
|
|
|
struct name_decoration *next;
|
2010-06-19 03:37:33 +02:00
|
|
|
int type;
|
2014-08-26 12:24:20 +02:00
|
|
|
char name[FLEX_ARRAY];
|
2007-04-17 01:05:10 +02:00
|
|
|
};
|
|
|
|
|
2014-08-26 12:23:36 +02:00
|
|
|
enum decoration_type {
|
|
|
|
DECORATION_NONE = 0,
|
|
|
|
DECORATION_REF_LOCAL,
|
|
|
|
DECORATION_REF_REMOTE,
|
|
|
|
DECORATION_REF_TAG,
|
|
|
|
DECORATION_REF_STASH,
|
|
|
|
DECORATION_REF_HEAD,
|
|
|
|
DECORATION_GRAFTED,
|
|
|
|
};
|
|
|
|
|
|
|
|
void add_name_decoration(enum decoration_type type, const char *name, struct object *obj);
|
2014-08-26 12:23:54 +02:00
|
|
|
const struct name_decoration *get_name_decoration(const struct object *obj);
|
2014-08-26 12:23:36 +02:00
|
|
|
|
2018-06-29 03:22:10 +02:00
|
|
|
struct commit *lookup_commit(struct repository *r, const struct object_id *oid);
|
2018-06-29 03:22:22 +02:00
|
|
|
struct commit *lookup_commit_reference(struct repository *r,
|
|
|
|
const struct object_id *oid);
|
2018-06-29 03:22:21 +02:00
|
|
|
struct commit *lookup_commit_reference_gently(struct repository *r,
|
2018-06-29 03:21:57 +02:00
|
|
|
const struct object_id *oid,
|
2005-08-21 11:51:10 +02:00
|
|
|
int quiet);
|
2010-11-02 20:59:07 +01:00
|
|
|
struct commit *lookup_commit_reference_by_name(const char *name);
|
2005-04-18 20:39:48 +02:00
|
|
|
|
2011-09-17 13:57:45 +02:00
|
|
|
/*
|
Convert lookup_commit* to struct object_id
Convert lookup_commit, lookup_commit_or_die,
lookup_commit_reference, and lookup_commit_reference_gently to take
struct object_id arguments.
Introduce a temporary in parse_object buffer in order to convert this
function. This is required since in order to convert parse_object and
parse_object_buffer, lookup_commit_reference_gently and
lookup_commit_or_die would need to be converted. Not introducing a
temporary would therefore require that lookup_commit_or_die take a
struct object_id *, but lookup_commit would take unsigned char *,
leaving a confusing and hard-to-use interface.
parse_object_buffer will lose this temporary in a later patch.
This commit was created with manual changes to commit.c, commit.h, and
object.c, plus the following semantic patch:
@@
expression E1, E2;
@@
- lookup_commit_reference_gently(E1.hash, E2)
+ lookup_commit_reference_gently(&E1, E2)
@@
expression E1, E2;
@@
- lookup_commit_reference_gently(E1->hash, E2)
+ lookup_commit_reference_gently(E1, E2)
@@
expression E1;
@@
- lookup_commit_reference(E1.hash)
+ lookup_commit_reference(&E1)
@@
expression E1;
@@
- lookup_commit_reference(E1->hash)
+ lookup_commit_reference(E1)
@@
expression E1;
@@
- lookup_commit(E1.hash)
+ lookup_commit(&E1)
@@
expression E1;
@@
- lookup_commit(E1->hash)
+ lookup_commit(E1)
@@
expression E1, E2;
@@
- lookup_commit_or_die(E1.hash, E2)
+ lookup_commit_or_die(&E1, E2)
@@
expression E1, E2;
@@
- lookup_commit_or_die(E1->hash, E2)
+ lookup_commit_or_die(E1, E2)
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-05-07 00:10:10 +02:00
|
|
|
* Look up object named by "oid", dereference tag as necessary,
|
|
|
|
* get a commit and return it. If "oid" does not dereference to
|
2011-09-17 13:57:45 +02:00
|
|
|
* a commit, use ref_name to report an error and die.
|
|
|
|
*/
|
Convert lookup_commit* to struct object_id
Convert lookup_commit, lookup_commit_or_die,
lookup_commit_reference, and lookup_commit_reference_gently to take
struct object_id arguments.
Introduce a temporary in parse_object buffer in order to convert this
function. This is required since in order to convert parse_object and
parse_object_buffer, lookup_commit_reference_gently and
lookup_commit_or_die would need to be converted. Not introducing a
temporary would therefore require that lookup_commit_or_die take a
struct object_id *, but lookup_commit would take unsigned char *,
leaving a confusing and hard-to-use interface.
parse_object_buffer will lose this temporary in a later patch.
This commit was created with manual changes to commit.c, commit.h, and
object.c, plus the following semantic patch:
@@
expression E1, E2;
@@
- lookup_commit_reference_gently(E1.hash, E2)
+ lookup_commit_reference_gently(&E1, E2)
@@
expression E1, E2;
@@
- lookup_commit_reference_gently(E1->hash, E2)
+ lookup_commit_reference_gently(E1, E2)
@@
expression E1;
@@
- lookup_commit_reference(E1.hash)
+ lookup_commit_reference(&E1)
@@
expression E1;
@@
- lookup_commit_reference(E1->hash)
+ lookup_commit_reference(E1)
@@
expression E1;
@@
- lookup_commit(E1.hash)
+ lookup_commit(&E1)
@@
expression E1;
@@
- lookup_commit(E1->hash)
+ lookup_commit(E1)
@@
expression E1, E2;
@@
- lookup_commit_or_die(E1.hash, E2)
+ lookup_commit_or_die(&E1, E2)
@@
expression E1, E2;
@@
- lookup_commit_or_die(E1->hash, E2)
+ lookup_commit_or_die(E1, E2)
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-05-07 00:10:10 +02:00
|
|
|
struct commit *lookup_commit_or_die(const struct object_id *oid, const char *ref_name);
|
2011-09-17 13:57:45 +02:00
|
|
|
|
2018-06-29 03:22:13 +02:00
|
|
|
int parse_commit_buffer(struct repository *r, struct commit *item, const void *buffer, unsigned long size, int check_graph);
|
2018-11-14 01:12:50 +01:00
|
|
|
int repo_parse_commit_internal(struct repository *r, struct commit *item,
|
|
|
|
int quiet_on_missing, int use_commit_graph);
|
|
|
|
int repo_parse_commit_gently(struct repository *r,
|
|
|
|
struct commit *item,
|
|
|
|
int quiet_on_missing);
|
|
|
|
static inline int repo_parse_commit(struct repository *r, struct commit *item)
|
add quieter versions of parse_{tree,commit}
When we call parse_commit, it will complain to stderr if the
object does not exist or cannot be read. This means that we
may produce useless error messages if this situation is
expected (e.g., because the object is marked UNINTERESTING,
or because revs->ignore_missing_links is set).
We can fix this by adding a new "parse_X_gently" form that
takes a flag to suppress the messages. The existing
"parse_X" form is already gentle in the sense that it
returns an error rather than dying, and we could in theory
just add a "quiet" flag to it (with existing callers passing
"0"). But doing it this way means we do not have to disturb
existing callers.
Note also that the new flag is "quiet_on_missing", and not
just "quiet". We could add a flag to suppress _all_ errors,
but besides being a more invasive change (we would have to
pass the flag down to sub-functions, too), there is a good
reason not to: we would never want to use it. Missing a
linked object is expected in some circumstances, but it is
never expected to have a malformed commit, or to get a tree
when we wanted a commit. We should always complain about
these corruptions.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-06-01 11:56:26 +02:00
|
|
|
{
|
2018-11-14 01:12:50 +01:00
|
|
|
return repo_parse_commit_gently(r, item, 0);
|
add quieter versions of parse_{tree,commit}
When we call parse_commit, it will complain to stderr if the
object does not exist or cannot be read. This means that we
may produce useless error messages if this situation is
expected (e.g., because the object is marked UNINTERESTING,
or because revs->ignore_missing_links is set).
We can fix this by adding a new "parse_X_gently" form that
takes a flag to suppress the messages. The existing
"parse_X" form is already gentle in the sense that it
returns an error rather than dying, and we could in theory
just add a "quiet" flag to it (with existing callers passing
"0"). But doing it this way means we do not have to disturb
existing callers.
Note also that the new flag is "quiet_on_missing", and not
just "quiet". We could add a flag to suppress _all_ errors,
but besides being a more invasive change (we would have to
pass the flag down to sub-functions, too), there is a good
reason not to: we would never want to use it. Missing a
linked object is expected in some circumstances, but it is
never expected to have a malformed commit, or to get a tree
when we wanted a commit. We should always complain about
these corruptions.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-06-01 11:56:26 +02:00
|
|
|
}
|
commit-graph write: don't die if the existing graph is corrupt
When the commit-graph is written we end up calling
parse_commit(). This will in turn invoke code that'll consult the
existing commit-graph about the commit, if the graph is corrupted we
die.
We thus get into a state where a failing "commit-graph verify" can't
be followed-up with a "commit-graph write" if core.commitGraph=true is
set, the graph either needs to be manually removed to proceed, or
core.commitGraph needs to be set to "false".
Change the "commit-graph write" codepath to use a new
parse_commit_no_graph() helper instead of parse_commit() to avoid
this. The latter will call repo_parse_commit_internal() with
use_commit_graph=1 as seen in 177722b344 ("commit: integrate commit
graph with commit parsing", 2018-04-10).
Not using the old graph at all slows down the writing of the new graph
by some small amount, but is a sensible way to prevent an error in the
existing commit-graph from spreading.
Just fixing the current issue would be likely to result in code that's
inadvertently broken in the future. New code might use the
commit-graph at a distance. To detect such cases introduce a
"GIT_TEST_COMMIT_GRAPH_DIE_ON_LOAD" setting used when we do our
corruption tests, and test that a "write/verify" combo works after
every one of our current test cases where we now detect commit-graph
corruption.
Some of the code changes here might be strictly unnecessary, e.g. I
was unable to find cases where the parse_commit() called from
write_graph_chunk_data() didn't exit early due to
"item->object.parsed" being true in
repo_parse_commit_internal() (before the use_commit_graph=1 has any
effect). But let's also convert those cases for good measure, we do
not have exhaustive tests for all possible types of commit-graph
corruption.
This might need to be re-visited if we learn to write the commit-graph
incrementally, but probably not. Hopefully we'll just start by finding
out what commits we have in total, then read the old graph(s) to see
what they cover, and finally write a new graph file with everything
that's missing. In that case the new graph writing code just needs to
continue to use e.g. a parse_commit() that doesn't consult the
existing commit-graphs.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-03-25 13:08:33 +01:00
|
|
|
|
2021-02-01 18:15:03 +01:00
|
|
|
static inline int repo_parse_commit_no_graph(struct repository *r,
|
|
|
|
struct commit *commit)
|
commit-graph write: don't die if the existing graph is corrupt
When the commit-graph is written we end up calling
parse_commit(). This will in turn invoke code that'll consult the
existing commit-graph about the commit, if the graph is corrupted we
die.
We thus get into a state where a failing "commit-graph verify" can't
be followed-up with a "commit-graph write" if core.commitGraph=true is
set, the graph either needs to be manually removed to proceed, or
core.commitGraph needs to be set to "false".
Change the "commit-graph write" codepath to use a new
parse_commit_no_graph() helper instead of parse_commit() to avoid
this. The latter will call repo_parse_commit_internal() with
use_commit_graph=1 as seen in 177722b344 ("commit: integrate commit
graph with commit parsing", 2018-04-10).
Not using the old graph at all slows down the writing of the new graph
by some small amount, but is a sensible way to prevent an error in the
existing commit-graph from spreading.
Just fixing the current issue would be likely to result in code that's
inadvertently broken in the future. New code might use the
commit-graph at a distance. To detect such cases introduce a
"GIT_TEST_COMMIT_GRAPH_DIE_ON_LOAD" setting used when we do our
corruption tests, and test that a "write/verify" combo works after
every one of our current test cases where we now detect commit-graph
corruption.
Some of the code changes here might be strictly unnecessary, e.g. I
was unable to find cases where the parse_commit() called from
write_graph_chunk_data() didn't exit early due to
"item->object.parsed" being true in
repo_parse_commit_internal() (before the use_commit_graph=1 has any
effect). But let's also convert those cases for good measure, we do
not have exhaustive tests for all possible types of commit-graph
corruption.
This might need to be re-visited if we learn to write the commit-graph
incrementally, but probably not. Hopefully we'll just start by finding
out what commits we have in total, then read the old graph(s) to see
what they cover, and finally write a new graph file with everything
that's missing. In that case the new graph writing code just needs to
continue to use e.g. a parse_commit() that doesn't consult the
existing commit-graphs.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-03-25 13:08:33 +01:00
|
|
|
{
|
2021-02-01 18:15:03 +01:00
|
|
|
return repo_parse_commit_internal(r, commit, 0, 0);
|
commit-graph write: don't die if the existing graph is corrupt
When the commit-graph is written we end up calling
parse_commit(). This will in turn invoke code that'll consult the
existing commit-graph about the commit, if the graph is corrupted we
die.
We thus get into a state where a failing "commit-graph verify" can't
be followed-up with a "commit-graph write" if core.commitGraph=true is
set, the graph either needs to be manually removed to proceed, or
core.commitGraph needs to be set to "false".
Change the "commit-graph write" codepath to use a new
parse_commit_no_graph() helper instead of parse_commit() to avoid
this. The latter will call repo_parse_commit_internal() with
use_commit_graph=1 as seen in 177722b344 ("commit: integrate commit
graph with commit parsing", 2018-04-10).
Not using the old graph at all slows down the writing of the new graph
by some small amount, but is a sensible way to prevent an error in the
existing commit-graph from spreading.
Just fixing the current issue would be likely to result in code that's
inadvertently broken in the future. New code might use the
commit-graph at a distance. To detect such cases introduce a
"GIT_TEST_COMMIT_GRAPH_DIE_ON_LOAD" setting used when we do our
corruption tests, and test that a "write/verify" combo works after
every one of our current test cases where we now detect commit-graph
corruption.
Some of the code changes here might be strictly unnecessary, e.g. I
was unable to find cases where the parse_commit() called from
write_graph_chunk_data() didn't exit early due to
"item->object.parsed" being true in
repo_parse_commit_internal() (before the use_commit_graph=1 has any
effect). But let's also convert those cases for good measure, we do
not have exhaustive tests for all possible types of commit-graph
corruption.
This might need to be re-visited if we learn to write the commit-graph
incrementally, but probably not. Hopefully we'll just start by finding
out what commits we have in total, then read the old graph(s) to see
what they cover, and finally write a new graph file with everything
that's missing. In that case the new graph writing code just needs to
continue to use e.g. a parse_commit() that doesn't consult the
existing commit-graphs.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-03-25 13:08:33 +01:00
|
|
|
}
|
|
|
|
|
2018-11-14 01:12:50 +01:00
|
|
|
#ifndef NO_THE_REPOSITORY_COMPATIBILITY_MACROS
|
|
|
|
#define parse_commit_internal(item, quiet, use) repo_parse_commit_internal(the_repository, item, quiet, use)
|
|
|
|
#define parse_commit(item) repo_parse_commit(the_repository, item)
|
|
|
|
#endif
|
|
|
|
|
2013-10-24 10:52:36 +02:00
|
|
|
void parse_commit_or_die(struct commit *item);
|
2005-04-18 20:39:48 +02:00
|
|
|
|
2018-06-29 03:22:15 +02:00
|
|
|
struct buffer_slab;
|
|
|
|
struct buffer_slab *allocate_commit_buffer_slab(void);
|
|
|
|
void free_commit_buffer_slab(struct buffer_slab *bs);
|
|
|
|
|
2014-06-10 23:40:14 +02:00
|
|
|
/*
|
|
|
|
* Associate an object buffer with the commit. The ownership of the
|
|
|
|
* memory is handed over to the commit, and must be free()-able.
|
|
|
|
*/
|
2018-06-29 03:22:16 +02:00
|
|
|
void set_commit_buffer(struct repository *r, struct commit *, void *buffer, unsigned long size);
|
2014-06-10 23:40:14 +02:00
|
|
|
|
2014-06-10 23:40:39 +02:00
|
|
|
/*
|
|
|
|
* Get any cached object buffer associated with the commit. Returns NULL
|
|
|
|
* if none. The resulting memory should not be freed.
|
|
|
|
*/
|
2018-06-29 03:22:17 +02:00
|
|
|
const void *get_cached_commit_buffer(struct repository *, const struct commit *, unsigned long *size);
|
2014-06-10 23:40:39 +02:00
|
|
|
|
|
|
|
/*
|
|
|
|
* Get the commit's object contents, either from cache or by reading the object
|
|
|
|
* from disk. The resulting memory should not be modified, and must be given
|
|
|
|
* to unuse_commit_buffer when the caller is done.
|
|
|
|
*/
|
2018-11-14 01:12:57 +01:00
|
|
|
const void *repo_get_commit_buffer(struct repository *r,
|
|
|
|
const struct commit *,
|
|
|
|
unsigned long *size);
|
|
|
|
#ifndef NO_THE_REPOSITORY_COMPATIBILITY_MACROS
|
|
|
|
#define get_commit_buffer(c, s) repo_get_commit_buffer(the_repository, c, s)
|
|
|
|
#endif
|
2014-06-10 23:40:39 +02:00
|
|
|
|
|
|
|
/*
|
2019-11-05 18:07:23 +01:00
|
|
|
* Tell the commit subsystem that we are done with a particular commit buffer.
|
2014-06-10 23:40:39 +02:00
|
|
|
* The commit and buffer should be the input and return value, respectively,
|
|
|
|
* from an earlier call to get_commit_buffer. The buffer may or may not be
|
|
|
|
* freed by this call; callers should not access the memory afterwards.
|
|
|
|
*/
|
2018-11-14 01:12:58 +01:00
|
|
|
void repo_unuse_commit_buffer(struct repository *r,
|
|
|
|
const struct commit *,
|
|
|
|
const void *buffer);
|
|
|
|
#ifndef NO_THE_REPOSITORY_COMPATIBILITY_MACROS
|
|
|
|
#define unuse_commit_buffer(c, b) repo_unuse_commit_buffer(the_repository, c, b)
|
|
|
|
#endif
|
2014-06-10 23:40:39 +02:00
|
|
|
|
provide a helper to free commit buffer
This converts two lines into one at each caller. But more
importantly, it abstracts the concept of freeing the buffer,
which will make it easier to change later.
Note that we also need to provide a "detach" mechanism for a
tricky case in index-pack. We are passed a buffer for the
object generated by processing the incoming pack. If we are
not using --strict, we just calculate the sha1 on that
buffer and return, leaving the caller to free it. But if we
are using --strict, we actually attach that buffer to an
object, pass the object to the fsck functions, and then
detach the buffer from the object again (so that the caller
can free it as usual). In this case, we don't want to free
the buffer ourselves, but just make sure it is no longer
associated with the commit.
Note that we are making the assumption here that the
attach/detach process does not impact the buffer at all
(e.g., it is never reallocated or modified). That holds true
now, and we have no plans to change that. However, as we
abstract the commit_buffer code, this dependency becomes
less obvious. So when we detach, let's also make sure that
we get back the same buffer that we gave to the
commit_buffer code.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-06-13 00:05:37 +02:00
|
|
|
/*
|
|
|
|
* Free any cached object buffer associated with the commit.
|
|
|
|
*/
|
2018-12-15 01:09:40 +01:00
|
|
|
void free_commit_buffer(struct parsed_object_pool *pool, struct commit *);
|
provide a helper to free commit buffer
This converts two lines into one at each caller. But more
importantly, it abstracts the concept of freeing the buffer,
which will make it easier to change later.
Note that we also need to provide a "detach" mechanism for a
tricky case in index-pack. We are passed a buffer for the
object generated by processing the incoming pack. If we are
not using --strict, we just calculate the sha1 on that
buffer and return, leaving the caller to free it. But if we
are using --strict, we actually attach that buffer to an
object, pass the object to the fsck functions, and then
detach the buffer from the object again (so that the caller
can free it as usual). In this case, we don't want to free
the buffer ourselves, but just make sure it is no longer
associated with the commit.
Note that we are making the assumption here that the
attach/detach process does not impact the buffer at all
(e.g., it is never reallocated or modified). That holds true
now, and we have no plans to change that. However, as we
abstract the commit_buffer code, this dependency becomes
less obvious. So when we detach, let's also make sure that
we get back the same buffer that we gave to the
commit_buffer code.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-06-13 00:05:37 +02:00
|
|
|
|
2019-04-16 11:33:19 +02:00
|
|
|
struct tree *repo_get_commit_tree(struct repository *, const struct commit *);
|
|
|
|
#define get_commit_tree(c) repo_get_commit_tree(the_repository, c)
|
2018-04-06 21:09:34 +02:00
|
|
|
struct object_id *get_commit_tree_oid(const struct commit *);
|
|
|
|
|
2018-05-15 23:48:42 +02:00
|
|
|
/*
|
|
|
|
* Release memory related to a commit, including the parent list and
|
|
|
|
* any cached object buffer.
|
|
|
|
*/
|
2018-12-15 01:09:40 +01:00
|
|
|
void release_commit_memory(struct parsed_object_pool *pool, struct commit *c);
|
2018-05-15 23:48:42 +02:00
|
|
|
|
provide a helper to free commit buffer
This converts two lines into one at each caller. But more
importantly, it abstracts the concept of freeing the buffer,
which will make it easier to change later.
Note that we also need to provide a "detach" mechanism for a
tricky case in index-pack. We are passed a buffer for the
object generated by processing the incoming pack. If we are
not using --strict, we just calculate the sha1 on that
buffer and return, leaving the caller to free it. But if we
are using --strict, we actually attach that buffer to an
object, pass the object to the fsck functions, and then
detach the buffer from the object again (so that the caller
can free it as usual). In this case, we don't want to free
the buffer ourselves, but just make sure it is no longer
associated with the commit.
Note that we are making the assumption here that the
attach/detach process does not impact the buffer at all
(e.g., it is never reallocated or modified). That holds true
now, and we have no plans to change that. However, as we
abstract the commit_buffer code, this dependency becomes
less obvious. So when we detach, let's also make sure that
we get back the same buffer that we gave to the
commit_buffer code.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-06-13 00:05:37 +02:00
|
|
|
/*
|
|
|
|
* Disassociate any cached object buffer from the commit, but do not free it.
|
|
|
|
* The buffer (or NULL, if none) is returned.
|
|
|
|
*/
|
2014-06-10 23:44:13 +02:00
|
|
|
const void *detach_commit_buffer(struct commit *, unsigned long *sizep);
|
provide a helper to free commit buffer
This converts two lines into one at each caller. But more
importantly, it abstracts the concept of freeing the buffer,
which will make it easier to change later.
Note that we also need to provide a "detach" mechanism for a
tricky case in index-pack. We are passed a buffer for the
object generated by processing the incoming pack. If we are
not using --strict, we just calculate the sha1 on that
buffer and return, leaving the caller to free it. But if we
are using --strict, we actually attach that buffer to an
object, pass the object to the fsck functions, and then
detach the buffer from the object again (so that the caller
can free it as usual). In this case, we don't want to free
the buffer ourselves, but just make sure it is no longer
associated with the commit.
Note that we are making the assumption here that the
attach/detach process does not impact the buffer at all
(e.g., it is never reallocated or modified). That holds true
now, and we have no plans to change that. However, as we
abstract the commit_buffer code, this dependency becomes
less obvious. So when we detach, let's also make sure that
we get back the same buffer that we gave to the
commit_buffer code.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-06-13 00:05:37 +02:00
|
|
|
|
2010-07-22 15:18:30 +02:00
|
|
|
/* Find beginning and length of commit subject. */
|
|
|
|
int find_commit_subject(const char *commit_buffer, const char **subject);
|
|
|
|
|
2010-11-27 02:58:14 +01:00
|
|
|
struct commit_list *commit_list_insert(struct commit *item,
|
|
|
|
struct commit_list **list);
|
2012-04-25 22:35:27 +02:00
|
|
|
struct commit_list **commit_list_append(struct commit *commit,
|
|
|
|
struct commit_list **next);
|
2008-06-27 18:21:55 +02:00
|
|
|
unsigned commit_list_count(const struct commit_list *l);
|
2010-11-27 02:58:14 +01:00
|
|
|
struct commit_list *commit_list_insert_by_date(struct commit *item,
|
|
|
|
struct commit_list **list);
|
|
|
|
void commit_list_sort_by_date(struct commit_list **list);
|
2005-04-24 03:47:23 +02:00
|
|
|
|
log: use true parents for diff even when rewriting
When using pathspec filtering in combination with diff-based log
output, parent simplification happens before the diff is computed.
The diff is therefore against the *simplified* parents.
This works okay, arguably by accident, in the normal case:
simplification reduces to one parent as long as the commit is TREESAME
to it. So the simplified parent of any given commit must have the
same tree contents on the filtered paths as its true (unfiltered)
parent.
However, --full-diff breaks this guarantee, and indeed gives pretty
spectacular results when comparing the output of
git log --graph --stat ...
git log --graph --full-diff --stat ...
(--graph internally kicks in parent simplification, much like
--parents).
To fix it, store a copy of the parent list before simplification (in a
slab) whenever --full-diff is in effect. Then use the stored parents
instead of the simplified ones in the commit display code paths. The
latter do not actually check for --full-diff to avoid duplicated code;
they just grab the original parents if save_parents() has not been
called for this revision walk.
For ordinary commits it should be obvious that this is the right thing
to do.
Merge commits are a bit subtle. Observe that with default
simplification, merge simplification is an all-or-nothing decision:
either the merge is TREESAME to one parent and disappears, or it is
different from all parents and the parent list remains intact.
Redundant parents are not pruned, so the existing code also shows them
as a merge.
So if we do show a merge commit, the parent list just consists of the
rewrite result on each parent. Running, e.g., --cc on this in
--full-diff mode is not very useful: if any commits were skipped, some
hunks will disagree with all sides of the merge (with one side,
because commits were skipped; with the others, because they didn't
have those changes in the first place). This triggers --cc showing
these hunks spuriously.
Therefore I believe that even for merge commits it is better to show
the diffs wrt. the original parents.
Reported-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Helped-by: Junio C Hamano <gitster@pobox.com>
Helped-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk>
Signed-off-by: Thomas Rast <trast@inf.ethz.ch>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-07-31 22:13:20 +02:00
|
|
|
/* Shallow copy of the input list */
|
|
|
|
struct commit_list *copy_commit_list(struct commit_list *list);
|
|
|
|
|
2005-04-18 20:39:48 +02:00
|
|
|
void free_commit_list(struct commit_list *list);
|
|
|
|
|
2017-03-01 12:37:07 +01:00
|
|
|
struct rev_info; /* in revision.h, it circularly uses enum cmit_fmt */
|
|
|
|
|
2019-04-29 10:28:14 +02:00
|
|
|
int has_non_ascii(const char *text);
|
|
|
|
const char *logmsg_reencode(const struct commit *commit,
|
2019-04-29 10:28:23 +02:00
|
|
|
char **commit_encoding,
|
|
|
|
const char *output_encoding);
|
2018-11-14 01:12:59 +01:00
|
|
|
const char *repo_logmsg_reencode(struct repository *r,
|
|
|
|
const struct commit *commit,
|
|
|
|
char **commit_encoding,
|
|
|
|
const char *output_encoding);
|
|
|
|
#ifndef NO_THE_REPOSITORY_COMPATIBILITY_MACROS
|
|
|
|
#define logmsg_reencode(c, enc, out) repo_logmsg_reencode(the_repository, c, enc, out)
|
|
|
|
#endif
|
|
|
|
|
2019-04-29 10:28:14 +02:00
|
|
|
const char *skip_blank_lines(const char *msg);
|
2005-06-01 17:34:23 +02:00
|
|
|
|
2005-04-24 03:47:23 +02:00
|
|
|
/** Removes the first commit from a list sorted by date, and adds all
|
|
|
|
* of its parents.
|
|
|
|
**/
|
2007-06-07 09:04:01 +02:00
|
|
|
struct commit *pop_most_recent_commit(struct commit_list **list,
|
2005-04-24 05:29:22 +02:00
|
|
|
unsigned int mark);
|
2005-04-24 03:47:23 +02:00
|
|
|
|
2005-06-06 17:39:40 +02:00
|
|
|
struct commit *pop_commit(struct commit_list **stack);
|
|
|
|
|
2006-01-08 03:52:42 +01:00
|
|
|
void clear_commit_marks(struct commit *commit, unsigned int mark);
|
2013-03-05 20:42:20 +01:00
|
|
|
void clear_commit_marks_many(int nr, struct commit **commit, unsigned int mark);
|
2006-01-08 03:52:42 +01:00
|
|
|
|
toposort: rename "lifo" field
The primary invariant of sort_in_topological_order() is that a
parent commit is not emitted until all children of it are. When
traversing a forked history like this with "git log C E":
A----B----C
\
D----E
we ensure that A is emitted after all of B, C, D, and E are done, B
has to wait until C is done, and D has to wait until E is done.
In some applications, however, we would further want to control how
these child commits B, C, D and E on two parallel ancestry chains
are shown.
Most of the time, we would want to see C and B emitted together, and
then E and D, and finally A (i.e. the --topo-order output). The
"lifo" parameter of the sort_in_topological_order() function is used
to control this behaviour. We start the traversal by knowing two
commits, C and E. While keeping in mind that we also need to
inspect E later, we pick C first to inspect, and we notice and
record that B needs to be inspected. By structuring the "work to be
done" set as a LIFO stack, we ensure that B is inspected next,
before other in-flight commits we had known that we will need to
inspect, e.g. E.
When showing in --date-order, we would want to see commits ordered
by timestamps, i.e. show C, E, B and D in this order before showing
A, possibly mixing commits from two parallel histories together.
When "lifo" parameter is set to false, the function keeps the "work
to be done" set sorted in the date order to realize this semantics.
After inspecting C, we add B to the "work to be done" set, but the
next commit we inspect from the set is E which is newer than B.
The name "lifo", however, is too strongly tied to the way how the
function implements its behaviour, and does not describe what the
behaviour _means_.
Replace this field with an enum rev_sort_order, with two possible
values: REV_SORT_IN_GRAPH_ORDER and REV_SORT_BY_COMMIT_DATE, and
update the existing code. The mechanical replacement rule is:
"lifo == 0" is equivalent to "sort_order == REV_SORT_BY_COMMIT_DATE"
"lifo == 1" is equivalent to "sort_order == REV_SORT_IN_GRAPH_ORDER"
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-06-07 01:07:14 +02:00
|
|
|
|
|
|
|
enum rev_sort_order {
|
|
|
|
REV_SORT_IN_GRAPH_ORDER = 0,
|
2013-06-07 19:35:54 +02:00
|
|
|
REV_SORT_BY_COMMIT_DATE,
|
|
|
|
REV_SORT_BY_AUTHOR_DATE
|
toposort: rename "lifo" field
The primary invariant of sort_in_topological_order() is that a
parent commit is not emitted until all children of it are. When
traversing a forked history like this with "git log C E":
A----B----C
\
D----E
we ensure that A is emitted after all of B, C, D, and E are done, B
has to wait until C is done, and D has to wait until E is done.
In some applications, however, we would further want to control how
these child commits B, C, D and E on two parallel ancestry chains
are shown.
Most of the time, we would want to see C and B emitted together, and
then E and D, and finally A (i.e. the --topo-order output). The
"lifo" parameter of the sort_in_topological_order() function is used
to control this behaviour. We start the traversal by knowing two
commits, C and E. While keeping in mind that we also need to
inspect E later, we pick C first to inspect, and we notice and
record that B needs to be inspected. By structuring the "work to be
done" set as a LIFO stack, we ensure that B is inspected next,
before other in-flight commits we had known that we will need to
inspect, e.g. E.
When showing in --date-order, we would want to see commits ordered
by timestamps, i.e. show C, E, B and D in this order before showing
A, possibly mixing commits from two parallel histories together.
When "lifo" parameter is set to false, the function keeps the "work
to be done" set sorted in the date order to realize this semantics.
After inspecting C, we add B to the "work to be done" set, but the
next commit we inspect from the set is E which is newer than B.
The name "lifo", however, is too strongly tied to the way how the
function implements its behaviour, and does not describe what the
behaviour _means_.
Replace this field with an enum rev_sort_order, with two possible
values: REV_SORT_IN_GRAPH_ORDER and REV_SORT_BY_COMMIT_DATE, and
update the existing code. The mechanical replacement rule is:
"lifo == 0" is equivalent to "sort_order == REV_SORT_BY_COMMIT_DATE"
"lifo == 1" is equivalent to "sort_order == REV_SORT_IN_GRAPH_ORDER"
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-06-07 01:07:14 +02:00
|
|
|
};
|
|
|
|
|
2005-07-06 18:39:34 +02:00
|
|
|
/*
|
|
|
|
* Performs an in-place topological sort of list supplied.
|
|
|
|
*
|
|
|
|
* invariant of resulting list is:
|
|
|
|
* a reachable from b => ord(b) < ord(a)
|
toposort: rename "lifo" field
The primary invariant of sort_in_topological_order() is that a
parent commit is not emitted until all children of it are. When
traversing a forked history like this with "git log C E":
A----B----C
\
D----E
we ensure that A is emitted after all of B, C, D, and E are done, B
has to wait until C is done, and D has to wait until E is done.
In some applications, however, we would further want to control how
these child commits B, C, D and E on two parallel ancestry chains
are shown.
Most of the time, we would want to see C and B emitted together, and
then E and D, and finally A (i.e. the --topo-order output). The
"lifo" parameter of the sort_in_topological_order() function is used
to control this behaviour. We start the traversal by knowing two
commits, C and E. While keeping in mind that we also need to
inspect E later, we pick C first to inspect, and we notice and
record that B needs to be inspected. By structuring the "work to be
done" set as a LIFO stack, we ensure that B is inspected next,
before other in-flight commits we had known that we will need to
inspect, e.g. E.
When showing in --date-order, we would want to see commits ordered
by timestamps, i.e. show C, E, B and D in this order before showing
A, possibly mixing commits from two parallel histories together.
When "lifo" parameter is set to false, the function keeps the "work
to be done" set sorted in the date order to realize this semantics.
After inspecting C, we add B to the "work to be done" set, but the
next commit we inspect from the set is E which is newer than B.
The name "lifo", however, is too strongly tied to the way how the
function implements its behaviour, and does not describe what the
behaviour _means_.
Replace this field with an enum rev_sort_order, with two possible
values: REV_SORT_IN_GRAPH_ORDER and REV_SORT_BY_COMMIT_DATE, and
update the existing code. The mechanical replacement rule is:
"lifo == 0" is equivalent to "sort_order == REV_SORT_BY_COMMIT_DATE"
"lifo == 1" is equivalent to "sort_order == REV_SORT_IN_GRAPH_ORDER"
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-06-07 01:07:14 +02:00
|
|
|
* sort_order further specifies:
|
|
|
|
* REV_SORT_IN_GRAPH_ORDER: try to show a commit on a single-parent
|
|
|
|
* chain together.
|
|
|
|
* REV_SORT_BY_COMMIT_DATE: show eligible commits in committer-date order.
|
2005-07-06 18:39:34 +02:00
|
|
|
*/
|
toposort: rename "lifo" field
The primary invariant of sort_in_topological_order() is that a
parent commit is not emitted until all children of it are. When
traversing a forked history like this with "git log C E":
A----B----C
\
D----E
we ensure that A is emitted after all of B, C, D, and E are done, B
has to wait until C is done, and D has to wait until E is done.
In some applications, however, we would further want to control how
these child commits B, C, D and E on two parallel ancestry chains
are shown.
Most of the time, we would want to see C and B emitted together, and
then E and D, and finally A (i.e. the --topo-order output). The
"lifo" parameter of the sort_in_topological_order() function is used
to control this behaviour. We start the traversal by knowing two
commits, C and E. While keeping in mind that we also need to
inspect E later, we pick C first to inspect, and we notice and
record that B needs to be inspected. By structuring the "work to be
done" set as a LIFO stack, we ensure that B is inspected next,
before other in-flight commits we had known that we will need to
inspect, e.g. E.
When showing in --date-order, we would want to see commits ordered
by timestamps, i.e. show C, E, B and D in this order before showing
A, possibly mixing commits from two parallel histories together.
When "lifo" parameter is set to false, the function keeps the "work
to be done" set sorted in the date order to realize this semantics.
After inspecting C, we add B to the "work to be done" set, but the
next commit we inspect from the set is E which is newer than B.
The name "lifo", however, is too strongly tied to the way how the
function implements its behaviour, and does not describe what the
behaviour _means_.
Replace this field with an enum rev_sort_order, with two possible
values: REV_SORT_IN_GRAPH_ORDER and REV_SORT_BY_COMMIT_DATE, and
update the existing code. The mechanical replacement rule is:
"lifo == 0" is equivalent to "sort_order == REV_SORT_BY_COMMIT_DATE"
"lifo == 1" is equivalent to "sort_order == REV_SORT_IN_GRAPH_ORDER"
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-06-07 01:07:14 +02:00
|
|
|
void sort_in_topological_order(struct commit_list **, enum rev_sort_order);
|
2006-04-07 08:58:51 +02:00
|
|
|
|
|
|
|
struct commit_graft {
|
2015-03-14 00:39:34 +01:00
|
|
|
struct object_id oid;
|
2006-10-30 20:09:06 +01:00
|
|
|
int nr_parent; /* < 0 if shallow commit */
|
2015-03-14 00:39:34 +01:00
|
|
|
struct object_id parent[FLEX_ARRAY]; /* more */
|
2006-04-07 08:58:51 +02:00
|
|
|
};
|
2011-08-18 14:29:35 +02:00
|
|
|
typedef int (*each_commit_graft_fn)(const struct commit_graft *, void *);
|
2006-04-07 08:58:51 +02:00
|
|
|
|
2017-08-18 20:33:12 +02:00
|
|
|
struct commit_graft *read_graft_line(struct strbuf *line);
|
2020-04-30 23:11:28 +02:00
|
|
|
/* commit_graft_pos returns an index into r->parsed_objects->grafts. */
|
|
|
|
int commit_graft_pos(struct repository *r, const unsigned char *sha1);
|
2018-05-18 00:51:48 +02:00
|
|
|
int register_commit_graft(struct repository *r, struct commit_graft *, int);
|
2018-08-20 20:24:30 +02:00
|
|
|
void prepare_commit_graft(struct repository *r);
|
2018-05-18 00:51:54 +02:00
|
|
|
struct commit_graft *lookup_commit_graft(struct repository *r, const struct object_id *oid);
|
2006-04-07 08:58:51 +02:00
|
|
|
|
2018-09-05 00:00:08 +02:00
|
|
|
struct commit *get_fork_point(const char *refname, struct commit *commit);
|
|
|
|
|
2013-04-12 00:36:10 +02:00
|
|
|
/* largest positive number a signed 32-bit integer can contain */
|
2013-01-11 10:05:46 +01:00
|
|
|
#define INFINITE_DEPTH 0x7fffffff
|
|
|
|
|
2017-03-31 03:40:00 +02:00
|
|
|
struct oid_array;
|
shallow.c: the 8 steps to select new commits for .git/shallow
Suppose a fetch or push is requested between two shallow repositories
(with no history deepening or shortening). A pack that contains
necessary objects is transferred over together with .git/shallow of
the sender. The receiver has to determine whether it needs to update
.git/shallow if new refs needs new shallow comits.
The rule here is avoid updating .git/shallow by default. But we don't
want to waste the received pack. If the pack contains two refs, one
needs new shallow commits installed in .git/shallow and one does not,
we keep the latter and reject/warn about the former.
Even if .git/shallow update is allowed, we only add shallow commits
strictly necessary for the former ref (remember the sender can send
more shallow commits than necessary) and pay attention not to
accidentally cut the receiver history short (no history shortening is
asked for)
So the steps to figure out what ref need what new shallow commits are:
1. Split the sender shallow commit list into "ours" and "theirs" list
by has_sha1_file. Those that exist in current repo in "ours", the
remaining in "theirs".
2. Check the receiver .git/shallow, remove from "ours" the ones that
also exist in .git/shallow.
3. Fetch the new pack. Either install or unpack it.
4. Do has_sha1_file on "theirs" list again. Drop the ones that fail
has_sha1_file. Obviously the new pack does not need them.
5. If the pack is kept, remove from "ours" the ones that do not exist
in the new pack.
6. Walk the new refs to answer the question "what shallow commits,
both ours and theirs, are required in .git/shallow in order to add
this ref?". Shallow commits not associated to any refs are removed
from their respective list.
7. (*) Check reachability (from the current refs) of all remaining
commits in "ours". Those reachable are removed. We do not want to
cut any part of our (reachable) history. We only check up
commits. True reachability test is done by
check_everything_connected() at the end as usual.
8. Combine the final "ours" and "theirs" and add them all to
.git/shallow. Install new refs. The case where some hook rejects
some refs on a push is explained in more detail in the push
patches.
Of these steps, #6 and #7 are expensive. Both require walking through
some commits, or in the worst case all commits. And we rather avoid
them in at least common case, where the transferred pack does not
contain any shallow commits that the sender advertises. Let's look at
each scenario:
1) the sender has longer history than the receiver
All shallow commits from the sender will be put into "theirs" list
at step 1 because none of them exists in current repo. In the
common case, "theirs" becomes empty at step 4 and exit early.
2) the sender has shorter history than the receiver
All shallow commits from the sender are likely in "ours" list at
step 1. In the common case, if the new pack is kept, we could empty
"ours" and exit early at step 5.
If the pack is not kept, we hit the expensive step 6 then exit
after "ours" is emptied. There'll be only a handful of objects to
walk in fast-forward case. If it's forced update, we may need to
walk to the bottom.
3) the sender has same .git/shallow as the receiver
This is similar to case 2 except that "ours" should be emptied at
step 2 and exit early.
A fetch after "clone --depth=X" is case 1. A fetch after "clone" (from
a shallow repo) is case 3. Luckily they're cheap for the common case.
A push from "clone --depth=X" falls into case 2, which is expensive.
Some more work may be done at the sender/client side to avoid more
work on the server side: if the transferred pack does not contain any
shallow commits, send-pack should not send any shallow commits to the
receive-pack, effectively turning it into a normal push and avoid all
steps.
This patch implements all steps except #3, already handled by
fetch-pack and receive-pack, #6 and #7, which has their own patch due
to their size.
(*) in previous versions step 7 was put before step 3. I reorder it so
that the common case that keeps the pack does not need to walk
commits at all. In future if we implement faster commit
reachability check (maybe with the help of pack bitmaps or commit
cache), step 7 could become cheap and be moved up before 6 again.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-05 14:02:35 +01:00
|
|
|
struct ref;
|
2019-04-29 10:28:14 +02:00
|
|
|
int for_each_commit_graft(each_commit_graft_fn, void *);
|
2006-10-30 20:09:06 +01:00
|
|
|
|
2020-09-30 14:28:18 +02:00
|
|
|
int interactive_add(const char **argv, const char *prefix, int patch);
|
2019-04-29 10:28:14 +02:00
|
|
|
int run_add_interactive(const char *revision, const char *patch_mode,
|
2019-04-29 10:28:23 +02:00
|
|
|
const struct pathspec *pathspec);
|
2007-09-18 02:06:44 +02:00
|
|
|
|
2011-11-08 01:21:32 +01:00
|
|
|
struct commit_extra_header {
|
|
|
|
struct commit_extra_header *next;
|
|
|
|
char *key;
|
|
|
|
char *value;
|
|
|
|
size_t len;
|
|
|
|
};
|
|
|
|
|
2019-04-29 10:28:14 +02:00
|
|
|
void append_merge_tag_headers(struct commit_list *parents,
|
2019-04-29 10:28:23 +02:00
|
|
|
struct commit_extra_header ***tail);
|
2011-11-08 01:21:32 +01:00
|
|
|
|
2019-04-29 10:28:14 +02:00
|
|
|
int commit_tree(const char *msg, size_t msg_len,
|
2019-04-29 10:28:23 +02:00
|
|
|
const struct object_id *tree,
|
|
|
|
struct commit_list *parents, struct object_id *ret,
|
|
|
|
const char *author, const char *sign_commit);
|
2011-11-08 01:21:32 +01:00
|
|
|
|
2019-04-29 10:28:14 +02:00
|
|
|
int commit_tree_extended(const char *msg, size_t msg_len,
|
2019-04-29 10:28:23 +02:00
|
|
|
const struct object_id *tree,
|
2020-08-17 19:40:01 +02:00
|
|
|
struct commit_list *parents, struct object_id *ret,
|
|
|
|
const char *author, const char *committer,
|
|
|
|
const char *sign_commit, struct commit_extra_header *);
|
2011-11-08 01:21:32 +01:00
|
|
|
|
2019-04-29 10:28:14 +02:00
|
|
|
struct commit_extra_header *read_commit_extra_headers(struct commit *, const char **);
|
2011-11-09 00:38:07 +01:00
|
|
|
|
2019-04-29 10:28:14 +02:00
|
|
|
void free_commit_extra_headers(struct commit_extra_header *extra);
|
make commit_tree a library function
Until now, this has been part of the commit-tree builtin.
However, it is already used by other builtins (like commit,
merge, and notes), and it would be useful to access it from
library code.
The check_valid helper has to come along, too, but is given
a more library-ish name of "assert_sha1_type".
Otherwise, the code is unchanged. There are still a few
rough edges for a library function, like printing the utf8
warning to stderr, but we can address those if and when they
come up as inappropriate.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-04-02 02:05:23 +02:00
|
|
|
|
commit: provide a function to find a header in a buffer
Usually when we parse a commit, we read it line by line and
handle each individual line (e.g., parse_commit and
parse_commit_header). Sometimes, however, we only care
about extracting a single header. Code in this situation is
stuck doing an ad-hoc parse of the commit buffer.
Let's provide a reusable function to locate a header within
the commit. The code is modeled after pretty.c's
get_header, which is used to extract the encoding.
Since some callers may not have the "struct commit" to go
along with the buffer, we drop that parameter. The only
thing lost is a warning for truncated commits, but that's
OK. This shouldn't happen in practice, and even if it does,
there's no particular reason that this function needs to
complain about it. It either finds the header it was asked
for, or it doesn't (and in the latter case, the caller will
typically complain).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-08-27 09:56:01 +02:00
|
|
|
/*
|
|
|
|
* Search the commit object contents given by "msg" for the header "key".
|
|
|
|
* Returns a pointer to the start of the header contents, or NULL. The length
|
|
|
|
* of the header, up to the first newline, is returned via out_len.
|
|
|
|
*
|
|
|
|
* Note that some headers (like mergetag) may be multi-line. It is the caller's
|
|
|
|
* responsibility to parse further in this case!
|
|
|
|
*/
|
2019-04-29 10:28:14 +02:00
|
|
|
const char *find_commit_header(const char *msg, const char *key,
|
2019-04-29 10:28:23 +02:00
|
|
|
size_t *out_len);
|
commit: provide a function to find a header in a buffer
Usually when we parse a commit, we read it line by line and
handle each individual line (e.g., parse_commit and
parse_commit_header). Sometimes, however, we only care
about extracting a single header. Code in this situation is
stuck doing an ad-hoc parse of the commit buffer.
Let's provide a reusable function to locate a header within
the commit. The code is modeled after pretty.c's
get_header, which is used to extract the encoding.
Since some callers may not have the "struct commit" to go
along with the buffer, we drop that parameter. The only
thing lost is a warning for truncated commits, but that's
OK. This shouldn't happen in practice, and even if it does,
there's no particular reason that this function needs to
complain about it. It either finds the header it was asked
for, or it doesn't (and in the latter case, the caller will
typically complain).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-08-27 09:56:01 +02:00
|
|
|
|
2014-11-09 10:23:41 +01:00
|
|
|
/* Find the end of the log message, the right place for a new trailer. */
|
2019-04-29 10:28:14 +02:00
|
|
|
size_t ignore_non_trailer(const char *buf, size_t len);
|
2014-11-09 10:23:41 +01:00
|
|
|
|
2018-04-25 11:54:04 +02:00
|
|
|
typedef int (*each_mergetag_fn)(struct commit *commit, struct commit_extra_header *extra,
|
2019-04-29 10:28:23 +02:00
|
|
|
void *cb_data);
|
2014-07-07 08:35:37 +02:00
|
|
|
|
2019-04-29 10:28:14 +02:00
|
|
|
int for_each_mergetag(each_mergetag_fn fn, struct commit *commit, void *data);
|
2014-07-07 08:35:37 +02:00
|
|
|
|
2011-11-07 22:26:22 +01:00
|
|
|
struct merge_remote_desc {
|
|
|
|
struct object *obj; /* the named object, could be a tag */
|
2016-08-13 14:21:27 +02:00
|
|
|
char name[FLEX_ARRAY];
|
2011-11-07 22:26:22 +01:00
|
|
|
};
|
2019-04-29 10:28:14 +02:00
|
|
|
struct merge_remote_desc *merge_remote_util(struct commit *);
|
|
|
|
void set_merge_remote_desc(struct commit *commit,
|
2019-04-29 10:28:23 +02:00
|
|
|
const char *name, struct object *obj);
|
2011-11-07 22:26:22 +01:00
|
|
|
|
|
|
|
/*
|
|
|
|
* Given "name" from the command line to merge, find the commit object
|
|
|
|
* and return it, while storing merge_remote_desc in its ->util field,
|
|
|
|
* to allow callers to tell if we are told to merge a tag.
|
|
|
|
*/
|
|
|
|
struct commit *get_merge_parent(const char *name);
|
|
|
|
|
2019-04-29 10:28:14 +02:00
|
|
|
int parse_signed_commit(const struct commit *commit,
|
2019-04-29 10:28:23 +02:00
|
|
|
struct strbuf *message, struct strbuf *signature);
|
2019-04-29 10:28:14 +02:00
|
|
|
int remove_signature(struct strbuf *buf);
|
2014-07-19 17:01:12 +02:00
|
|
|
|
2013-03-31 18:00:14 +02:00
|
|
|
/*
|
2013-03-31 18:02:46 +02:00
|
|
|
* Check the signature of the given commit. The result of the check is stored
|
|
|
|
* in sig->check_result, 'G' for a good signature, 'U' for a good signature
|
|
|
|
* from an untrusted signer, 'B' for a bad signature and 'N' for no signature
|
|
|
|
* at all. This may allocate memory for sig->gpg_output, sig->gpg_status,
|
|
|
|
* sig->signer and sig->key.
|
2013-03-31 18:00:14 +02:00
|
|
|
*/
|
2019-04-29 10:28:14 +02:00
|
|
|
int check_commit_signature(const struct commit *commit, struct signature_check *sigc);
|
2013-03-31 18:00:14 +02:00
|
|
|
|
2018-11-01 14:46:21 +01:00
|
|
|
/* record author-date for each commit object */
|
|
|
|
struct author_date_slab;
|
|
|
|
void record_author_date(struct author_date_slab *author_date,
|
|
|
|
struct commit *commit);
|
|
|
|
|
|
|
|
int compare_commits_by_author_date(const void *a_, const void *b_, void *unused);
|
2018-11-18 10:23:54 +01:00
|
|
|
|
2018-11-06 08:50:17 +01:00
|
|
|
/*
|
|
|
|
* Verify a single commit with check_commit_signature() and die() if it is not
|
|
|
|
* a good signature. This isn't really suitable for general use, but is a
|
|
|
|
* helper to implement consistent logic for pull/merge --verify-signatures.
|
gpg-interface: add minTrustLevel as a configuration option
Previously, signature verification for merge and pull operations checked
if the key had a trust-level of either TRUST_NEVER or TRUST_UNDEFINED in
verify_merge_signature(). If that was the case, the process die()d.
The other code paths that did signature verification relied entirely on
the return code from check_commit_signature(). And signatures made with
a good key, irregardless of its trust level, was considered valid by
check_commit_signature().
This difference in behavior might induce users to erroneously assume
that the trust level of a key in their keyring is always considered by
Git, even for operations where it is not (e.g. during a verify-commit or
verify-tag).
The way it worked was by gpg-interface.c storing the result from the
key/signature status *and* the lowest-two trust levels in the `result`
member of the signature_check structure (the last of these status lines
that were encountered got written to `result`). These are documented in
GPG under the subsection `General status codes` and `Key related`,
respectively [1].
The GPG documentation says the following on the TRUST_ status codes [1]:
"""
These are several similar status codes:
- TRUST_UNDEFINED <error_token>
- TRUST_NEVER <error_token>
- TRUST_MARGINAL [0 [<validation_model>]]
- TRUST_FULLY [0 [<validation_model>]]
- TRUST_ULTIMATE [0 [<validation_model>]]
For good signatures one of these status lines are emitted to
indicate the validity of the key used to create the signature.
The error token values are currently only emitted by gpgsm.
"""
My interpretation is that the trust level is conceptionally different
from the validity of the key and/or signature. That seems to also have
been the assumption of the old code in check_signature() where a result
of 'G' (as in GOODSIG) and 'U' (as in TRUST_NEVER or TRUST_UNDEFINED)
were both considered a success.
The two cases where a result of 'U' had special meaning were in
verify_merge_signature() (where this caused git to die()) and in
format_commit_one() (where it affected the output of the %G? format
specifier).
I think it makes sense to refactor the processing of TRUST_ status lines
such that users can configure a minimum trust level that is enforced
globally, rather than have individual parts of git (e.g. merge) do it
themselves (except for a grace period with backward compatibility).
I also think it makes sense to not store the trust level in the same
struct member as the key/signature status. While the presence of a
TRUST_ status code does imply that the signature is good (see the first
paragraph in the included snippet above), as far as I can tell, the
order of the status lines from GPG isn't well-defined; thus it would
seem plausible that the trust level could be overwritten with the
key/signature status if they were stored in the same member of the
signature_check structure.
This patch introduces a new configuration option: gpg.minTrustLevel. It
consolidates trust-level verification to gpg-interface.c and adds a new
`trust_level` member to the signature_check structure.
Backward-compatibility is maintained by introducing a special case in
verify_merge_signature() such that if no user-configurable
gpg.minTrustLevel is set, then the old behavior of rejecting
TRUST_UNDEFINED and TRUST_NEVER is enforced. If, on the other hand,
gpg.minTrustLevel is set, then that value overrides the old behavior.
Similarly, the %G? format specifier will continue show 'U' for
signatures made with a key that has a trust level of TRUST_UNDEFINED or
TRUST_NEVER, even though the 'U' character no longer exist in the
`result` member of the signature_check structure. A new format
specifier, %GT, is also introduced for users that want to show all
possible trust levels for a signature.
Another approach would have been to simply drop the trust-level
requirement in verify_merge_signature(). This would also have made the
behavior consistent with other parts of git that perform signature
verification. However, requiring a minimum trust level for signing keys
does seem to have a real-world use-case. For example, the build system
used by the Qubes OS project currently parses the raw output from
verify-tag in order to assert a minimum trust level for keys used to
sign git tags [2].
[1] https://git.gnupg.org/cgi-bin/gitweb.cgi?p=gnupg.git;a=blob;f=doc/doc/DETAILS;h=bd00006e933ac56719b1edd2478ecd79273eae72;hb=refs/heads/master
[2] https://github.com/QubesOS/qubes-builder/blob/9674c1991deef45b1a1b1c71fddfab14ba50dccf/scripts/verify-git-tag#L43
Signed-off-by: Hans Jerry Illikainen <hji@dyntopia.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-12-27 14:55:57 +01:00
|
|
|
*
|
|
|
|
* The check_trust parameter is meant for backward-compatibility. The GPG
|
|
|
|
* interface verifies key trust with a default trust level that is below the
|
|
|
|
* default trust level for merge operations. Its value should be non-zero if
|
|
|
|
* the user hasn't set a minimum trust level explicitly in their configuration.
|
|
|
|
*
|
|
|
|
* If the user has set a minimum trust level, then that value should be obeyed
|
|
|
|
* and check_trust should be zero, even if the configured trust level is below
|
|
|
|
* the default trust level for merges.
|
2018-11-06 08:50:17 +01:00
|
|
|
*/
|
gpg-interface: add minTrustLevel as a configuration option
Previously, signature verification for merge and pull operations checked
if the key had a trust-level of either TRUST_NEVER or TRUST_UNDEFINED in
verify_merge_signature(). If that was the case, the process die()d.
The other code paths that did signature verification relied entirely on
the return code from check_commit_signature(). And signatures made with
a good key, irregardless of its trust level, was considered valid by
check_commit_signature().
This difference in behavior might induce users to erroneously assume
that the trust level of a key in their keyring is always considered by
Git, even for operations where it is not (e.g. during a verify-commit or
verify-tag).
The way it worked was by gpg-interface.c storing the result from the
key/signature status *and* the lowest-two trust levels in the `result`
member of the signature_check structure (the last of these status lines
that were encountered got written to `result`). These are documented in
GPG under the subsection `General status codes` and `Key related`,
respectively [1].
The GPG documentation says the following on the TRUST_ status codes [1]:
"""
These are several similar status codes:
- TRUST_UNDEFINED <error_token>
- TRUST_NEVER <error_token>
- TRUST_MARGINAL [0 [<validation_model>]]
- TRUST_FULLY [0 [<validation_model>]]
- TRUST_ULTIMATE [0 [<validation_model>]]
For good signatures one of these status lines are emitted to
indicate the validity of the key used to create the signature.
The error token values are currently only emitted by gpgsm.
"""
My interpretation is that the trust level is conceptionally different
from the validity of the key and/or signature. That seems to also have
been the assumption of the old code in check_signature() where a result
of 'G' (as in GOODSIG) and 'U' (as in TRUST_NEVER or TRUST_UNDEFINED)
were both considered a success.
The two cases where a result of 'U' had special meaning were in
verify_merge_signature() (where this caused git to die()) and in
format_commit_one() (where it affected the output of the %G? format
specifier).
I think it makes sense to refactor the processing of TRUST_ status lines
such that users can configure a minimum trust level that is enforced
globally, rather than have individual parts of git (e.g. merge) do it
themselves (except for a grace period with backward compatibility).
I also think it makes sense to not store the trust level in the same
struct member as the key/signature status. While the presence of a
TRUST_ status code does imply that the signature is good (see the first
paragraph in the included snippet above), as far as I can tell, the
order of the status lines from GPG isn't well-defined; thus it would
seem plausible that the trust level could be overwritten with the
key/signature status if they were stored in the same member of the
signature_check structure.
This patch introduces a new configuration option: gpg.minTrustLevel. It
consolidates trust-level verification to gpg-interface.c and adds a new
`trust_level` member to the signature_check structure.
Backward-compatibility is maintained by introducing a special case in
verify_merge_signature() such that if no user-configurable
gpg.minTrustLevel is set, then the old behavior of rejecting
TRUST_UNDEFINED and TRUST_NEVER is enforced. If, on the other hand,
gpg.minTrustLevel is set, then that value overrides the old behavior.
Similarly, the %G? format specifier will continue show 'U' for
signatures made with a key that has a trust level of TRUST_UNDEFINED or
TRUST_NEVER, even though the 'U' character no longer exist in the
`result` member of the signature_check structure. A new format
specifier, %GT, is also introduced for users that want to show all
possible trust levels for a signature.
Another approach would have been to simply drop the trust-level
requirement in verify_merge_signature(). This would also have made the
behavior consistent with other parts of git that perform signature
verification. However, requiring a minimum trust level for signing keys
does seem to have a real-world use-case. For example, the build system
used by the Qubes OS project currently parses the raw output from
verify-tag in order to assert a minimum trust level for keys used to
sign git tags [2].
[1] https://git.gnupg.org/cgi-bin/gitweb.cgi?p=gnupg.git;a=blob;f=doc/doc/DETAILS;h=bd00006e933ac56719b1edd2478ecd79273eae72;hb=refs/heads/master
[2] https://github.com/QubesOS/qubes-builder/blob/9674c1991deef45b1a1b1c71fddfab14ba50dccf/scripts/verify-git-tag#L43
Signed-off-by: Hans Jerry Illikainen <hji@dyntopia.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-12-27 14:55:57 +01:00
|
|
|
void verify_merge_signature(struct commit *commit, int verbose,
|
|
|
|
int check_trust);
|
2018-11-06 08:50:17 +01:00
|
|
|
|
2013-07-02 08:21:48 +02:00
|
|
|
int compare_commits_by_commit_date(const void *a_, const void *b_, void *unused);
|
2018-05-01 14:47:11 +02:00
|
|
|
int compare_commits_by_gen_then_commit_date(const void *a_, const void *b_, void *unused);
|
2013-07-02 08:21:48 +02:00
|
|
|
|
2014-03-18 11:00:53 +01:00
|
|
|
LAST_ARG_MUST_BE_NULL
|
2019-04-29 10:28:20 +02:00
|
|
|
int run_commit_hook(int editor_is_used, const char *index_file, const char *name, ...);
|
2014-03-18 11:00:53 +01:00
|
|
|
|
2005-04-18 20:39:48 +02:00
|
|
|
#endif /* COMMIT_H */
|