Code clean-up to avoid using a variable string that compilers may
feel untrustable as printf-style format given to write_file()
helper function.
* jk/printf-format:
commit.c: remove print_commit_list()
avoid using sha1_to_hex output as printf format
walker: let walker_say take arbitrary formats
"git commit --amend --allow-empty-message -S" for a commit without
any message body could have misidentified where the header of the
commit object ends.
* js/sign-empty-commit-fix:
commit -S: avoid invalid pointer with empty message
A helper function that takes the contents of a commit object and
finds its subject line did not ignore leading blank lines, as is
commonly done by other codepaths. Make it ignore leading blank
lines to match.
* js/find-commit-subject-ignore-leading-blanks:
reset --hard: skip blank lines when reporting the commit subject
sequencer: use skip_blank_lines() to find the commit subject
commit -C: skip blank lines at the beginning of the message
commit.c: make find_commit_subject() more robust
pretty: make the skip_blank_lines() function public
The helper function tries to offer a way to conveniently show the
last one differently from others, presumably to allow you to say
something like
A, B, and C.
while iterating over a list that has these three elements.
However, there is only one caller, and it passes the same format
string "%s\n" for both the last one and the other ones. Retire the
helper function and update the caller with a simplified version.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
While it is not recommended, fsck.c says:
Not having a body is not a crime [...]
... which means that we cannot assume that the commit buffer
contains an empty line to separate header from body. A commit
object with only a header without any body, not even without
a blank line after the header, is valid.
So let's tread carefully here. strstr("\n\n") may find nothing
and return NULL.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Just like the pretty printing machinery, we should simply ignore
blank lines at the beginning of the commit messages.
This discrepancy was noticed when an early version of the
rebase--helper produced commit objects with more than one empty line
between the header and the commit message.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If our size computation overflows size_t, we may allocate a
much smaller buffer than we expected and overflow it. It's
probably impossible to trigger an overflow in most of these
sites in practice, but it is easy enough convert their
additions and multiplications into overflow-checking
variants. This may be fixing real bugs, and it makes
auditing the code easier.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Each of these cases can be converted to use ALLOC_ARRAY or
REALLOC_ARRAY, which has two advantages:
1. It automatically checks the array-size multiplication
for overflow.
2. It always uses sizeof(*array) for the element-size,
so that it can never go out of sync with the declared
type of the array.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Convert all instances of get_object_hash to use an appropriate reference
to the hash member of the oid member of struct object. This provides no
functional change, as it is essentially a macro substitution.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Jeff King <peff@peff.net>
struct object is one of the major data structures dealing with object
IDs. Convert it to use struct object_id instead of an unsigned char
array. Convert get_object_hash to refer to the new member as well.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Jeff King <peff@peff.net>
Convert most instances where the sha1 member of struct object is
dereferenced to use get_object_hash. Most instances that are passed to
functions that have versions taking struct object_id, such as
get_sha1_hex/get_oid_hex, or instances that can be trivially converted
to use struct object_id instead, are not converted.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Jeff King <peff@peff.net>
Instead of open-coding the function pop_commit() just call it. This
makes the intent clearer and reduces code size.
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Memory use reduction when commit-slab facility is used to annotate
sparsely (which is not recommended in the first place).
* jc/commit-slab:
commit-slab: introduce slabname##_peek() function
"git verify-tag" and "git verify-commit" have been taught to share
more code, and then learned to optionally show the verification
message from the underlying GPG implementation.
* bc/gpg-verify-raw:
verify-tag: add option to print raw gpg status information
verify-commit: add option to print raw gpg status information
gpg: centralize printing signature buffers
gpg: centralize signature check
verify-commit: add test for exit status on untrusted signature
verify-tag: share code with verify-commit
verify-tag: add tests
Recent "git prune" traverses young unreachable objects to safekeep
old objects in the reachability chain from them, which sometimes
caused error messages that are unnecessarily alarming.
* jk/squelch-missing-link-warning-for-unreachable:
suppress errors on missing UNINTERESTING links
silence broken link warnings with revs->ignore_missing_links
add quieter versions of parse_{tree,commit}
verify-commit and verify-tag both share a central codepath for verifying
commits: check_signature. However, verify-tag exited successfully for
untrusted signature, while verify-commit exited unsuccessfully.
Centralize this signature check and make verify-commit adopt the older
verify-tag behavior. This behavior is more logical anyway, as the
signature is in fact valid, whether or not there's a path of trust to
the author.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
verify-tag was executing an entirely different codepath than
verify-commit, except for the underlying verify_signed_buffer. Move
much of the code from check_commit_signature to a generic
check_signature function and adjust both codepaths to call it.
Update verify-tag to explicitly output the signature text, as we now
call verify_signed_buffer with strbufs to catch the output, which
prevents it from being printed automatically.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Recent "git prune" traverses young unreachable objects to safekeep
old objects in the reachability chain from them, which sometimes
caused error messages that are unnecessarily alarming.
* jk/squelch-missing-link-warning-for-unreachable:
suppress errors on missing UNINTERESTING links
silence broken link warnings with revs->ignore_missing_links
add quieter versions of parse_{tree,commit}
When we call parse_commit, it will complain to stderr if the
object does not exist or cannot be read. This means that we
may produce useless error messages if this situation is
expected (e.g., because the object is marked UNINTERESTING,
or because revs->ignore_missing_links is set).
We can fix this by adding a new "parse_X_gently" form that
takes a flag to suppress the messages. The existing
"parse_X" form is already gentle in the sense that it
returns an error rather than dying, and we could in theory
just add a "quiet" flag to it (with existing callers passing
"0"). But doing it this way means we do not have to disturb
existing callers.
Note also that the new flag is "quiet_on_missing", and not
just "quiet". We could add a flag to suppress _all_ errors,
but besides being a more invasive change (we would have to
pass the flag down to sub-functions, too), there is a good
reason not to: we would never want to use it. Missing a
linked object is expected in some circumstances, but it is
never expected to have a malformed commit, or to get a tree
when we wanted a commit. We should always complain about
these corruptions.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There is no API to ask "Does this commit have associated data in
slab?". If an application wants to (1) parse just a few commits at
the beginning of a process, (2) store data for only these commits,
and then (3) start processing many commits, taking into account the
data stored (for a few of them) in the slab, the application would
use slabname##_at() to allocate a space to store data in (2), but
there is no API other than slabname##_at() to use in step (3). This
allocates and wastes new space for these commits the caller is only
interested in checking if they have data stored in step (2).
Introduce slabname##_peek(), which is similar to slabname##_at() but
returns NULL when there is no data already associated to it in such
a use case.
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Convert struct commit_graft and necessary local parts of commit.c.
Also, convert several constants based on the hex length of an SHA-1 to
use GIT_SHA1_HEXSZ, and move several magic constants into variables for
readability.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The get_merge_bases*() API was easy to misuse by careless
copy&paste coders, leaving object flags tainted in the commits that
needed to be traversed.
* jc/merge-bases:
get_merge_bases(): always clean-up object flags
bisect: clean flags after checking merge bases
"git interpret-trailers" learned to properly handle the
"Conflicts:" block at the end.
* cc/interpret-trailers-more:
trailer: add test with an old style conflict block
trailer: reuse ignore_non_trailer() to ignore conflict lines
commit: make ignore_non_trailer() non static
merge & sequencer: turn "Conflicts:" hint into a comment
builtin/commit.c: extract ignore_non_trailer() helper function
merge & sequencer: unify codepaths that write "Conflicts:" hint
builtin/merge.c: drop a parameter that is never used
The callers of get_merge_bases() can choose to leave object flags
used during the merge-base traversal by passing cleanup=0 as a
parameter, but in practice a very few callers can afford to do so
(namely, "git merge-base"), as they need to compute merge base in
preparation for other processing of their own and they need to see
the object without contaminate flags.
Change the function signature of get_merge_bases_many() and
get_merge_bases() to drop the cleanup parameter, so that the
majority of the callers do not have to say ", 1" at the end.
Give a new get_merge_bases_many_dirty() API to support only a few
callers that know they do not need to spend cycles cleaning up the
object flags.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Allow "git push" request to be signed, so that it can be verified and
audited, using the GPG signature of the person who pushed, that the
tips of branches at a public repository really point the commits
the pusher wanted to, without having to "trust" the server.
* jc/push-cert: (24 commits)
receive-pack::hmac_sha1(): copy the entire SHA-1 hash out
signed push: allow stale nonce in stateless mode
signed push: teach smart-HTTP to pass "git push --signed" around
signed push: fortify against replay attacks
signed push: add "pushee" header to push certificate
signed push: remove duplicated protocol info
send-pack: send feature request on push-cert packet
receive-pack: GPG-validate push certificates
push: the beginning of "git push --signed"
pack-protocol doc: typofix for PKT-LINE
gpg-interface: move parse_signature() to where it should be
gpg-interface: move parse_gpg_output() to where it should be
send-pack: clarify that cmds_sent is a boolean
send-pack: refactor inspecting and resetting status and sending commands
send-pack: rename "new_refs" to "need_pack_data"
receive-pack: factor out capability string generation
send-pack: factor out capability string generation
send-pack: always send capabilities
send-pack: refactor decision to send update per ref
send-pack: move REF_STATUS_REJECT_NODELETE logic a bit higher
...
Earlier, ffb6d7d5 (Move commit GPG signature verification to
commit.c, 2013-03-31) moved this helper that used to be in pretty.c
(i.e. the output code path) to commit.c for better reusability.
It was a good first step in the right direction, but still suffers
from a myopic view that commits will be the only thing we would ever
want to sign---we would actually want to be able to reuse it even
wider.
The function interprets what GPG said; gpg-interface is obviously a
better place. Move it there.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This saves us some manual parsing and makes the code more
readable.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If we hit the end-of-header without finding an "author"
line, we just return from the function. We should jump to
the fail_exit path to clean up the buffer that we may have
allocated.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Usually when we parse a commit, we read it line by line and
handle each individual line (e.g., parse_commit and
parse_commit_header). Sometimes, however, we only care
about extracting a single header. Code in this situation is
stuck doing an ad-hoc parse of the commit buffer.
Let's provide a reusable function to locate a header within
the commit. The code is modeled after pretty.c's
get_header, which is used to extract the encoding.
Since some callers may not have the "struct commit" to go
along with the buffer, we drop that parameter. The only
thing lost is a warning for truncated commits, but that's
OK. This shouldn't happen in practice, and even if it does,
there's no particular reason that this function needs to
complain about it. It either finds the header it was asked
for, or it doesn't (and in the latter case, the caller will
typically complain).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When we call lookup_commit, lookup_tree, etc, the logic goes
something like:
1. Look for an existing object struct. If we don't have
one, allocate and return a new one.
2. Double check that any object we have is the expected
type (and complain and return NULL otherwise).
3. Convert an object with type OBJ_NONE (from a prior
call to lookup_unknown_object) to the expected type.
We can encapsulate steps 2 and 3 in a helper function which
checks whether we have the expected object type, converts
OBJ_NONE as appropriate, and returns the object.
Not only does this shorten the code, but it also provides
one central location for converting OBJ_NONE objects into
objects of other types. Future patches will use that to
enforce type-specific invariants.
Since this is a refactoring, we would want it to behave
exactly as the current code. It takes a little reasoning to
see that this is the case:
- for lookup_{commit,tree,etc} functions, we are just
pulling steps 2 and 3 into a function that does the same
thing.
- for the call in peel_object, we currently only do step 3
(but we want to consolidate it with the others, as
mentioned above). However, step 2 is a noop here, as the
surrounding conditional makes sure we have OBJ_NONE
(which we want to keep to avoid an extraneous call to
sha1_object_info).
- for the call in lookup_commit_reference_gently, we are
currently doing step 2 but not step 3. However, step 3
is a noop here. The object we got will have just come
from deref_tag, which must have figured out the type for
each object in order to know when to stop peeling.
Therefore the type will never be OBJ_NONE.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "struct object" type implements basic object
polymorphism. Individual instances are allocated as
concrete types (or as a union type that can store any
object), and a "struct object *" can be cast into its real
type after examining its "type" enum. This means it is
dangerous to have a type field that does not match the
allocation (e.g., setting the type field of a "struct blob"
to "OBJ_COMMIT" would mean that a reader might read past the
allocated memory).
In most of the current code this is not a problem; the first
thing we do after allocating an object is usually to set its
type field by passing it to create_object. However, the
virtual commits we create in merge-recursive.c do not ever
get their type set. This does not seem to have caused
problems in practice, though (presumably because we always
pass around a "struct commit" pointer and never even look at
the type).
We can fix this oversight and also make it harder for future
code to get it wrong by setting the type directly in the
object allocation functions.
This will also make it easier to fix problems with commit
index allocation, as we know that any object allocated by
alloc_commit_node will meet the invariant that an object
with an OBJ_COMMIT type field will have a unique index
number.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git replace" learned a "--graft" option to rewrite parents of a
commit.
* cc/replace-graft:
replace: add test for --graft with a mergetag
replace: check mergetags when using --graft
replace: add test for --graft with signed commit
replace: remove signature when using --graft
contrib: add convert-grafts-to-replace-refs.sh
Documentation: replace: add --graft option
replace: add test for --graft
replace: add --graft option
replace: cleanup redirection style in tests
* jk/stable-prio-queue:
t5539: update a flaky test
paint_down_to_common: use prio_queue
prio-queue: make output stable with respect to insertion
prio-queue: factor out compare and swap operations
* rs/code-cleaning:
remote-testsvn: use internal argv_array of struct child_process in cmd_import()
bundle: use internal argv_array of struct child_process in create_bundle()
fast-import: use hashcmp() for SHA1 hash comparison
transport: simplify fetch_objs_via_rsync() using argv_array
run-command: use internal argv_array of struct child_process in run_hook_ve()
use commit_list_count() to count the members of commit_lists
strbuf: use strbuf_addstr() for adding C strings
Make sure all in-core commit objects are assigned a unique number
so that they can be annotated using the commit-slab API.
* jk/alloc-commit-id:
diff-tree: avoid lookup_unknown_object
object_as_type: set commit index
alloc: factor out commit index
add object_as_type helper for casting objects
parse_object_buffer: do not set object type
move setting of object->type to alloc_* functions
alloc: write out allocator definitions
alloc.c: remove the alloc_raw_commit_node() function
It could be misleading to keep a signature in a
replacement commit, so let's remove it.
Note that there should probably be a way to sign
the replacement commit created when using --graft,
but this can be dealt with in another commit or
patch series.
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Call commit_list_count() instead of open-coding it repeatedly.
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* rs/code-cleaning:
fsck: simplify fsck_commit_buffer() by using commit_list_count()
commit: use commit_list_append() instead of duplicating its code
merge: simplify merge_trivial() by using commit_list_append()
use strbuf_addch for adding single characters
use strbuf_addbuf for adding strbufs
When we are traversing to find merge bases, we keep our
usual commit_list of commits to process, sorted by their
commit timestamp. As we add each parent to the list, we have
to spend "O(width of history)" to do the insertion, where
the width of history is the number of simultaneous lines of
development.
If we instead use a heap-based priority queue, we can do
these insertions in "O(log width)" time. This provides minor
speedups to merge-base calculations (timings in linux.git,
warm cache, best-of-five):
[before]
$ git merge-base HEAD v2.6.12
real 0m3.251s
user 0m3.148s
sys 0m0.104s
[after]
$ git merge-base HEAD v2.6.12
real 0m3.234s
user 0m3.108s
sys 0m0.128s
That's only an 0.5% speedup, but it does help protect us
against pathological cases.
While we are munging the "interesting" function, we also
take the opportunity to give it a more descriptive name, and
convert the return value to an int (we returned the first
interesting commit, but nobody ever looked at it).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When we call lookup_commit, lookup_tree, etc, the logic goes
something like:
1. Look for an existing object struct. If we don't have
one, allocate and return a new one.
2. Double check that any object we have is the expected
type (and complain and return NULL otherwise).
3. Convert an object with type OBJ_NONE (from a prior
call to lookup_unknown_object) to the expected type.
We can encapsulate steps 2 and 3 in a helper function which
checks whether we have the expected object type, converts
OBJ_NONE as appropriate, and returns the object.
Not only does this shorten the code, but it also provides
one central location for converting OBJ_NONE objects into
objects of other types. Future patches will use that to
enforce type-specific invariants.
Since this is a refactoring, we would want it to behave
exactly as the current code. It takes a little reasoning to
see that this is the case:
- for lookup_{commit,tree,etc} functions, we are just
pulling steps 2 and 3 into a function that does the same
thing.
- for the call in peel_object, we currently only do step 3
(but we want to consolidate it with the others, as
mentioned above). However, step 2 is a noop here, as the
surrounding conditional makes sure we have OBJ_NONE
(which we want to keep to avoid an extraneous call to
sha1_object_info).
- for the call in lookup_commit_reference_gently, we are
currently doing step 2 but not step 3. However, step 3
is a noop here. The object we got will have just come
from deref_tag, which must have figured out the type for
each object in order to know when to stop peeling.
Therefore the type will never be OBJ_NONE.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "struct object" type implements basic object
polymorphism. Individual instances are allocated as
concrete types (or as a union type that can store any
object), and a "struct object *" can be cast into its real
type after examining its "type" enum. This means it is
dangerous to have a type field that does not match the
allocation (e.g., setting the type field of a "struct blob"
to "OBJ_COMMIT" would mean that a reader might read past the
allocated memory).
In most of the current code this is not a problem; the first
thing we do after allocating an object is usually to set its
type field by passing it to create_object. However, the
virtual commits we create in merge-recursive.c do not ever
get their type set. This does not seem to have caused
problems in practice, though (presumably because we always
pass around a "struct commit" pointer and never even look at
the type).
We can fix this oversight and also make it harder for future
code to get it wrong by setting the type directly in the
object allocation functions.
This will also make it easier to fix problems with commit
index allocation, as we know that any object allocated by
alloc_commit_node will meet the invariant that an object
with an OBJ_COMMIT type field will have a unique index
number.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add 'verify-commit' to be used in a way similar to 'verify-tag' is
used. Further work on verifying the mergetags might be needed.
* mg/verify-commit:
t7510: test verify-commit
t7510: exit for loop with test result
verify-commit: scriptable commit signature verification
gpg-interface: provide access to the payload
gpg-interface: provide clear helper for struct signature_check
* jk/skip-prefix:
http-push: refactor parsing of remote object names
imap-send: use skip_prefix instead of using magic numbers
use skip_prefix to avoid repeated calculations
git: avoid magic number with skip_prefix
fetch-pack: refactor parsing in get_ack
fast-import: refactor parsing of spaces
stat_opt: check extra strlen call
daemon: use skip_prefix to avoid magic numbers
fast-import: use skip_prefix for parsing input
use skip_prefix to avoid repeating strings
use skip_prefix to avoid magic numbers
transport-helper: avoid reading past end-of-string
fast-import: fix read of uninitialized argv memory
apply: use skip_prefix instead of raw addition
refactor skip_prefix to return a boolean
avoid using skip_prefix as a boolean
daemon: mark some strings as const
parse_diff_color_slot: drop ofs parameter
In the same way as there is for_each_ref() to iterate on refs,
for_each_mergetag() allows the caller to iterate on the mergetags of
a given commit. Use it to rewrite show_mergetag() used in "git log".
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Move "commit->buffer" out of the in-core commit object and keep
track of their lengths. Use this to optimize the code paths to
validate GPG signatures in commit objects.
* jk/commit-buffer-length:
reuse cached commit buffer when parsing signatures
commit: record buffer length in cache
commit: convert commit->buffer to a slab
commit-slab: provide a static initializer
use get_commit_buffer everywhere
convert logmsg_reencode to get_commit_buffer
use get_commit_buffer to avoid duplicate code
use get_cached_commit_buffer where appropriate
provide helpers to access the commit buffer
provide a helper to set the commit buffer
provide a helper to free commit buffer
sequencer: use logmsg_reencode in get_message
logmsg_reencode: return const buffer
do not create "struct commit" with xcalloc
commit: push commit_index update into alloc_commit_node
alloc: include any-object allocations in alloc_report
replace dangerous uses of strbuf_attach
commit_tree: take a pointer/len pair rather than a const strbuf
In contrast to tag signatures, commit signatures are put into the
header, that is between the other header parts and commit messages.
Provide access to the commit content sans the signature, which is the
payload that is actually signed. Commit signature verification does the
parsing anyways, and callers may wish to act on or display the commit
object sans the signature.
Signed-off-by: Michael J Gruber <git@drmicha.warpmail.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The skip_prefix() function returns a pointer to the content
past the prefix, or NULL if the prefix was not found. While
this is nice and simple, in practice it makes it hard to use
for two reasons:
1. When you want to conditionally skip or keep the string
as-is, you have to introduce a temporary variable.
For example:
tmp = skip_prefix(buf, "foo");
if (tmp)
buf = tmp;
2. It is verbose to check the outcome in a conditional, as
you need extra parentheses to silence compiler
warnings. For example:
if ((cp = skip_prefix(buf, "foo"))
/* do something with cp */
Both of these make it harder to use for long if-chains, and
we tend to use starts_with() instead. However, the first line
of "do something" is often to then skip forward in buf past
the prefix, either using a magic constant or with an extra
strlen(3) (which is generally computed at compile time, but
means we are repeating ourselves).
This patch refactors skip_prefix() to return a simple boolean,
and to provide the pointer value as an out-parameter. If the
prefix is not found, the out-parameter is untouched. This
lets you write:
if (skip_prefix(arg, "foo ", &arg))
do_foo(arg);
else if (skip_prefix(arg, "bar ", &arg))
do_bar(arg);
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When we call show_signature or show_mergetag, we read the
commit object fresh via read_sha1_file and reparse its
headers. However, in most cases we already have the object
data available, attached to the "struct commit". This is
partially laziness in dealing with the memory allocation
issues, but partially defensive programming, in that we
would always want to verify a clean version of the buffer
(not one that might have been munged by other users of the
commit).
However, we do not currently ever munge the commit buffer,
and not using the already-available buffer carries a fairly
big performance penalty when we are looking at a large
number of commits. Here are timings on linux.git:
[baseline, no signatures]
$ time git log >/dev/null
real 0m4.902s
user 0m4.784s
sys 0m0.120s
[before]
$ time git log --show-signature >/dev/null
real 0m14.735s
user 0m9.964s
sys 0m0.944s
[after]
$ time git log --show-signature >/dev/null
real 0m9.981s
user 0m5.260s
sys 0m0.936s
Note that our user CPU time drops almost in half, close to
the non-signature case, but we do still spend more
wall-clock and system time, presumably from dealing with
gpg.
An alternative to this is to note that most commits do not
have signatures (less than 1% in this repo), yet we pay the
re-parsing cost for every commit just to find out if it has
a mergetag or signature. If we checked that when parsing the
commit initially, we could avoid re-examining most commits
later on. Even if we did pursue that direction, however,
this would still speed up the cases where we _do_ have
signatures. So it's probably worth doing either way.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Most callsites which use the commit buffer try to use the
cached version attached to the commit, rather than
re-reading from disk. Unfortunately, that interface provides
only a pointer to the NUL-terminated buffer, with no
indication of the original length.
For the most part, this doesn't matter. People do not put
NULs in their commit messages, and the log code is happy to
treat it all as a NUL-terminated string. However, some code
paths do care. For example, when checking signatures, we
want to be very careful that we verify all the bytes to
avoid malicious trickery.
This patch just adds an optional "size" out-pointer to
get_commit_buffer and friends. The existing callers all pass
NULL (there did not seem to be any obvious sites where we
could avoid an immediate strlen() call, though perhaps with
some further refactoring we could).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This will make it easier to manage the buffer cache
independently of the "struct commit" objects. It also
shrinks "struct commit" by one pointer, which may be
helpful.
Unfortunately it does not reduce the max memory size of
something like "rev-list", because rev-list uses
get_cached_commit_buffer() to decide not to show each
commit's output (and due to the design of slab_at, accessing
the slab requires us to extend it, allocating exactly the
same number of buffer pointers we dropped from the commit
structs).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
For both of these sites, we already do the "fallback to
read_sha1_file" trick. But we can shorten the code by just
using get_commit_buffer.
Note that the error cases are slightly different when
read_sha1_file fails. get_commit_buffer will die() if the
object cannot be loaded, or is a non-commit.
For get_sha1_oneline, this will almost certainly never
happen, as we will have just called parse_object (and if it
does, it's probably worth complaining about).
For record_author_date, the new behavior is probably better;
we notify the user of the error instead of silently ignoring
it. And because it's used only for sorting by author-date,
somebody examining a corrupt repo can fallback to the
regular traversal order.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Many sites look at commit->buffer to get more detailed
information than what is in the parsed commit struct.
However, we sometimes drop commit->buffer to save memory,
in which case the caller would need to read the object
afresh. Some callers do this (leading to duplicated code),
and others do not (which opens the possibility of a segfault
if somebody else frees the buffer).
Let's provide a pair of helpers, "get" and "unuse", that let
callers easily get the buffer. They will use the cached
buffer when possible, and otherwise load from disk using
read_sha1_file.
Note that we also need to add a "get_cached" variant which
returns NULL when we do not have a cached buffer. At first
glance this seems to defeat the purpose of "get", which is
to always provide a return value. However, some log code
paths actually use the NULL-ness of commit->buffer as a
boolean flag to decide whether to try printing the
commit. At least for now, we want to continue supporting
that use.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Right now this is just a one-liner, but abstracting it will
make it easier to change later.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This converts two lines into one at each caller. But more
importantly, it abstracts the concept of freeing the buffer,
which will make it easier to change later.
Note that we also need to provide a "detach" mechanism for a
tricky case in index-pack. We are passed a buffer for the
object generated by processing the incoming pack. If we are
not using --strict, we just calculate the sha1 on that
buffer and return, leaving the caller to free it. But if we
are using --strict, we actually attach that buffer to an
object, pass the object to the fsck functions, and then
detach the buffer from the object again (so that the caller
can free it as usual). In this case, we don't want to free
the buffer ourselves, but just make sure it is no longer
associated with the commit.
Note that we are making the assumption here that the
attach/detach process does not impact the buffer at all
(e.g., it is never reallocated or modified). That holds true
now, and we have no plans to change that. However, as we
abstract the commit_buffer code, this dependency becomes
less obvious. So when we detach, let's also make sure that
we get back the same buffer that we gave to the
commit_buffer code.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Whenever we create a commit object via lookup_commit, we
give it a unique index to be used with the commit-slab API.
The theory is that any "struct commit" we create would
follow this code path, so any such struct would get an
index. However, callers could use alloc_commit_node()
directly (and get multiple commits with index 0).
Let's push the indexing into alloc_commit_node so that it's
hard for callers to get it wrong.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
While strbufs are pretty common throughout our code, it is
more flexible for functions to take a pointer/len pair than
a strbuf. It's easy to turn a strbuf into such a pair (by
dereferencing its members), but less easy to go the other
way (you can strbuf_attach, but that has implications about
memory ownership).
This patch teaches commit_tree (and its associated callers
and sub-functions) to take such a pair for the commit
message rather than a strbuf. This makes passing the buffer
around slightly more verbose, but means we can get rid of
some dangerous strbuf_attach calls in the next patch.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
xcalloc() takes two arguments: the number of elements and their size.
reduce_heads() passes the arguments in reverse order, passing the
size of a commit*, followed by the number of commit* to be allocated.
Rearrange them so they are in the correct order.
Signed-off-by: Brian Gesiak <modocache@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Attempts to show where a single-strand-of-pearls break in "git log"
output.
* nd/log-show-linear-break:
log: add --show-linear-break to help see non-linear history
object.h: centralize object flag allocation
While the field "flags" is mainly used by the revision walker, it is
also used in many other places. Centralize the whole flag allocation to
one place for a better overview (and easier to move flags if we have
too).
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Replace open-coded reallocation with ALLOC_GROW() macro.
* dd/use-alloc-grow:
sha1_file.c: use ALLOC_GROW() in pretend_sha1_file()
read-cache.c: use ALLOC_GROW() in add_index_entry()
builtin/mktree.c: use ALLOC_GROW() in append_to_tree()
attr.c: use ALLOC_GROW() in handle_attr_line()
dir.c: use ALLOC_GROW() in create_simplify()
reflog-walk.c: use ALLOC_GROW()
replace_object.c: use ALLOC_GROW() in register_replace_object()
patch-ids.c: use ALLOC_GROW() in add_commit()
diffcore-rename.c: use ALLOC_GROW()
diff.c: use ALLOC_GROW()
commit.c: use ALLOC_GROW() in register_commit_graft()
cache-tree.c: use ALLOC_GROW() in find_subtree()
bundle.c: use ALLOC_GROW() in add_to_ref_list()
builtin/pack-objects.c: use ALLOC_GROW() in check_pbase_path()
Replace a hand-rolled binary search with a call to our generic
binary search helper function.
* dd/find-graft-with-sha1-pos:
commit.c: use the generic "sha1_pos" function for lookup
In record_author_date() & parse_gpg_output(), the callers of
starts_with() not just want to know if the string starts with the
prefix, but also can benefit from knowing the string that follows
the prefix.
By using skip_prefix(), we can do both at the same time.
Helped-by: Max Horn <max@quendi.de>
Helped-by: Junio C Hamano <gitster@pobox.com>
Helped-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Tanay Abhra <tanayabh@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Refactor binary search in "commit_graft_pos" function: use
generic "sha1_pos" function.
Signed-off-by: Dmitry S. Dolzhenko <dmitrys.dolzhenko@yandex.ru>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There is no reason to have a hardcoded upper limit of the number of
parents for an octopus merge, created via the graft mechanism.
* js/lift-parent-count-limit:
Remove the line length limit for graft files
pptr is needless. Some related code got cleaned as well.
Signed-off-by: Vasily Makarov <einmalfel@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Support for grafts predates Git's strbuf, and hence it is understandable
that there was a hard-coded line length limit of 1023 characters (which
was chosen a bit awkwardly, given that it is *exactly* one byte short of
aligning with the 41 bytes occupied by a commit name and the following
space or new-line character).
While regular commit histories hardly win comprehensibility in general
if they merge more than twenty-two branches in one go, it is not Git's
business to limit grafts in such a way.
In this particular developer's case, the use case that requires
substantially longer graft lines to be supported is the visualization of
the commits' order implied by their changes: commits are considered to
have an implicit relationship iff exchanging them in an interactive
rebase would result in merge conflicts.
Thusly implied branches tend to be very shallow in general, and the
resulting thicket of implied branches is usually very wide; It is
actually quite common that *most* of the commits in a topic branch have
not even one implied parent, so that a final merge commit has about as
many implied parents as there are commits in said branch.
[jc: squashed in tests by Jonathan]
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Remove a few duplicate implementations of prefix/suffix comparison
functions, and rename them to starts_with and ends_with.
* cc/starts-n-ends-with:
replace {pre,suf}fixcmp() with {starts,ends}_with()
strbuf: introduce starts_with() and ends_with()
builtin/remote: remove postfixcmp() and use suffixcmp() instead
environment: normalize use of prefixcmp() by removing " != 0"
Leaving only the function definitions and declarations so that any
new topic in flight can still make use of the old functions, replace
existing uses of the prefixcmp() and suffixcmp() with new API
functions.
The change can be recreated by mechanically applying this:
$ git grep -l -e prefixcmp -e suffixcmp -- \*.c |
grep -v strbuf\\.c |
xargs perl -pi -e '
s|!prefixcmp\(|starts_with\(|g;
s|prefixcmp\(|!starts_with\(|g;
s|!suffixcmp\(|ends_with\(|g;
s|suffixcmp\(|!ends_with\(|g;
'
on the result of preparatory changes in this series.
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* jk/robustify-parse-commit:
checkout: do not die when leaving broken detached HEAD
use parse_commit_or_die instead of custom message
use parse_commit_or_die instead of segfaulting
assume parse_commit checks for NULL commit
assume parse_commit checks commit->object.parsed
log_tree_diff: die when we fail to parse a commit
The parse_commit function will check whether it was passed a
NULL commit pointer, and if so, return an error. There is no
need for callers to check this separately.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We currently call parse_commit and then assume we can
dereference the resulting "tree" struct field. If parsing
failed, however, that field is NULL and we end up
segfaulting.
Instead of a segfault, let's print an error message and die
a little more gracefully.
Note that this should never happen in practice, but may
happen in a corrupt repository.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Output from "git log --full-diff -- <pathspec>" looked strange,
because comparison was done with the previous ancestor that touched
the specified <pathspec>, causing the patches for paths outside the
pathspec to show more than the single commit has changed.
Tweak "git reflog -p" for the same reason using the same mechanism.
* tr/log-full-diff-keep-true-parents:
log: use true parents for diff when walking reflogs
log: use true parents for diff even when rewriting
We wanted to catch all codepoints that ends with FFFE and FFFF,
not with 0FFFE and 0FFFF.
Noticed and corrected by Peter Krefting.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When using pathspec filtering in combination with diff-based log
output, parent simplification happens before the diff is computed.
The diff is therefore against the *simplified* parents.
This works okay, arguably by accident, in the normal case:
simplification reduces to one parent as long as the commit is TREESAME
to it. So the simplified parent of any given commit must have the
same tree contents on the filtered paths as its true (unfiltered)
parent.
However, --full-diff breaks this guarantee, and indeed gives pretty
spectacular results when comparing the output of
git log --graph --stat ...
git log --graph --full-diff --stat ...
(--graph internally kicks in parent simplification, much like
--parents).
To fix it, store a copy of the parent list before simplification (in a
slab) whenever --full-diff is in effect. Then use the stored parents
instead of the simplified ones in the commit display code paths. The
latter do not actually check for --full-diff to avoid duplicated code;
they just grab the original parents if save_parents() has not been
called for this revision walk.
For ordinary commits it should be obvious that this is the right thing
to do.
Merge commits are a bit subtle. Observe that with default
simplification, merge simplification is an all-or-nothing decision:
either the merge is TREESAME to one parent and disappears, or it is
different from all parents and the parent list remains intact.
Redundant parents are not pruned, so the existing code also shows them
as a merge.
So if we do show a merge commit, the parent list just consists of the
rewrite result on each parent. Running, e.g., --cc on this in
--full-diff mode is not very useful: if any commits were skipped, some
hunks will disagree with all sides of the merge (with one side,
because commits were skipped; with the others, because they didn't
have those changes in the first place). This triggers --cc showing
these hunks spuriously.
Therefore I believe that even for merge commits it is better to show
the diffs wrt. the original parents.
Reported-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Helped-by: Junio C Hamano <gitster@pobox.com>
Helped-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk>
Signed-off-by: Thomas Rast <trast@inf.ethz.ch>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Logic to auto-detect character encodings in the commit log message
did not reject overlong and invalid UTF-8 characters.
* bc/commit-invalid-utf8:
commit: reject non-characters
commit: reject overlong UTF-8 sequences
commit: reject invalid UTF-8 codepoints
Unicode clause D14 defines all characters U+nFFFE and U+nFFFF (where
0 <= n <= 10h) as well as the range U+FDD0..U+FDEF as non-characters,
reserved for internal use only. Disallow these characters in commit
messages as they are normally not recommended for interchange.
Signed-off-by: Peter Krefting <peter@softwolves.pp.se>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The commit code accepts pseudo-UTF-8 sequences that encode a character with more
bytes than necessary. Reject such sequences, since they are not valid UTF-8.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The commit code already contains code for validating UTF-8, but it does not
check for invalid values, such as guaranteed non-characters and surrogates. Fix
this by explicitly checking for and rejecting such characters.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This helper function was introduced as a prio_queue
comparator to help topological sorting. However, other users
of prio_queue who want to replace commit_list_insert_by_date
will want to use it, too. So let's make it public.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git log" learned the "--author-date-order" option, with which the
output is topologically sorted and commits in parallel histories
are shown intermixed together based on the author timestamp.
* jc/topo-author-date-sort:
t6003: add --author-date-order test
topology tests: teach a helper to set author dates as well
t6003: add --date-order test
topology tests: teach a helper to take abbreviated timestamps
t/lib-t6000: style fixes
log: --author-date-order
sort-in-topological-order: use prio-queue
prio-queue: priority queue of pointers to structs
toposort: rename "lifo" field
Allow adding custom information to commit objects in order to
represent unbound number of flag bits etc.
* jk/commit-info-slab:
commit-slab: introduce a macro to define a slab for new type
commit-slab: avoid large realloc
commit: allow associating auxiliary info on-demand
Sometimes people would want to view the commits in parallel
histories in the order of author dates, not committer dates.
Teach "topo-order" sort machinery to do so, using a commit-info slab
to record the author dates of each commit, and prio-queue to sort
them.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Use the prio-queue data structure to implement a priority queue of
commits sorted by committer date, when handling --date-order. The
structure can also be used as a simple LIFO stack, which is a good
match for --topo-order processing.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The primary invariant of sort_in_topological_order() is that a
parent commit is not emitted until all children of it are. When
traversing a forked history like this with "git log C E":
A----B----C
\
D----E
we ensure that A is emitted after all of B, C, D, and E are done, B
has to wait until C is done, and D has to wait until E is done.
In some applications, however, we would further want to control how
these child commits B, C, D and E on two parallel ancestry chains
are shown.
Most of the time, we would want to see C and B emitted together, and
then E and D, and finally A (i.e. the --topo-order output). The
"lifo" parameter of the sort_in_topological_order() function is used
to control this behaviour. We start the traversal by knowing two
commits, C and E. While keeping in mind that we also need to
inspect E later, we pick C first to inspect, and we notice and
record that B needs to be inspected. By structuring the "work to be
done" set as a LIFO stack, we ensure that B is inspected next,
before other in-flight commits we had known that we will need to
inspect, e.g. E.
When showing in --date-order, we would want to see commits ordered
by timestamps, i.e. show C, E, B and D in this order before showing
A, possibly mixing commits from two parallel histories together.
When "lifo" parameter is set to false, the function keeps the "work
to be done" set sorted in the date order to realize this semantics.
After inspecting C, we add B to the "work to be done" set, but the
next commit we inspect from the set is E which is newer than B.
The name "lifo", however, is too strongly tied to the way how the
function implements its behaviour, and does not describe what the
behaviour _means_.
Replace this field with an enum rev_sort_order, with two possible
values: REV_SORT_IN_GRAPH_ORDER and REV_SORT_BY_COMMIT_DATE, and
update the existing code. The mechanical replacement rule is:
"lifo == 0" is equivalent to "sort_order == REV_SORT_BY_COMMIT_DATE"
"lifo == 1" is equivalent to "sort_order == REV_SORT_IN_GRAPH_ORDER"
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Introduce a header file to define a macro that can define the struct
type, initializer, accessor and cleanup functions to manage a commit
slab. Update the "indegree" topological sort facility using it.
To associate 32 flag bits with each commit, you can write:
define_commit_slab(flag32, uint32);
to declare "struct flag32" type, define an instance of it with
struct flag32 flags;
and initialize it by calling
init_flag32(&flags);
After that, a call to flag32_at() function
uint32 *fp = flag32_at(&flags, commit);
will return a pointer pointing at a uint32 for that commit. Once
you are done with these flags, clean them up with
clear_flag32(&flags);
Callers that cannot hard-code how wide the data to be associated
with the commit be at compile time can use the "_with_stride"
variant to initialize the slab.
Suppose you want to give one bit per existing ref, and paint commits
down to find which refs are descendants of each commit. Saying
typedef uint32 bits320[5];
define_commit_slab(flagbits, bits320);
at compile time will still limit your code with hard-coded limit,
because you may find that you have more than 320 refs at runtime.
The code can declare a commit slab "struct flagbits" like this
instead:
define_commit_slab(flagbits, unsigned char);
struct flagbits flags;
and initialize it by:
nrefs = ... count number of refs ...
init_flagbits_with_stride(&flags, (nrefs + 7) / 8);
so that
unsigned char *fp = flagbits_at(&flags, commit);
will return a pointer pointing at an array of 40 "unsigned char"s
associated with the commit, once you figure out nrefs is 320 at
runtime.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Instead of using a single "slab" and keep reallocating it as we find
that we need to deal with commits with larger values of commit->index,
make a "slab" an array of many "slab_piece"s. Each access may need
two levels of indirections, but we only need to reallocate the first
level array of pointers when we have to grow the table this way.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "indegree" field in the commit object is only used while sorting
a list of commits in topological order, and wasting memory otherwise.
We would prefer to shrink the size of individual commit objects,
which we may have to hold thousands of in-core. We could eject
"indegree" field out from the commit object and represent it as a
dynamic table based on the decoration infrastructure, but the
decoration is meant for sparse annotation and is not a good match.
Instead, let's try a different approach.
- Assign an integer (commit->index) to each commit we keep in-core
(reuse the space of "indegree" field for it);
- When running the topological sort, allocate an array of integers
in bulk (called "slab"), use the commit->index as an index into
this array, and store the "indegree" information there.
This does _not_ reduce the memory footprint of a commit object, but
the commit->index can be used as the index to dynamically associate
commits with other kinds of information as needed.
Signed-off-by: Junio C Hamano <gitster@pobox.com>