We wanted to catch all codepoints that ends with FFFE and FFFF,
not with 0FFFE and 0FFFF.
Noticed and corrected by Peter Krefting.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Logic to auto-detect character encodings in the commit log message
did not reject overlong and invalid UTF-8 characters.
* bc/commit-invalid-utf8:
commit: reject non-characters
commit: reject overlong UTF-8 sequences
commit: reject invalid UTF-8 codepoints
Unicode clause D14 defines all characters U+nFFFE and U+nFFFF (where
0 <= n <= 10h) as well as the range U+FDD0..U+FDEF as non-characters,
reserved for internal use only. Disallow these characters in commit
messages as they are normally not recommended for interchange.
Signed-off-by: Peter Krefting <peter@softwolves.pp.se>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The commit code accepts pseudo-UTF-8 sequences that encode a character with more
bytes than necessary. Reject such sequences, since they are not valid UTF-8.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The commit code already contains code for validating UTF-8, but it does not
check for invalid values, such as guaranteed non-characters and surrogates. Fix
this by explicitly checking for and rejecting such characters.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This helper function was introduced as a prio_queue
comparator to help topological sorting. However, other users
of prio_queue who want to replace commit_list_insert_by_date
will want to use it, too. So let's make it public.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git log" learned the "--author-date-order" option, with which the
output is topologically sorted and commits in parallel histories
are shown intermixed together based on the author timestamp.
* jc/topo-author-date-sort:
t6003: add --author-date-order test
topology tests: teach a helper to set author dates as well
t6003: add --date-order test
topology tests: teach a helper to take abbreviated timestamps
t/lib-t6000: style fixes
log: --author-date-order
sort-in-topological-order: use prio-queue
prio-queue: priority queue of pointers to structs
toposort: rename "lifo" field
Allow adding custom information to commit objects in order to
represent unbound number of flag bits etc.
* jk/commit-info-slab:
commit-slab: introduce a macro to define a slab for new type
commit-slab: avoid large realloc
commit: allow associating auxiliary info on-demand
Sometimes people would want to view the commits in parallel
histories in the order of author dates, not committer dates.
Teach "topo-order" sort machinery to do so, using a commit-info slab
to record the author dates of each commit, and prio-queue to sort
them.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Use the prio-queue data structure to implement a priority queue of
commits sorted by committer date, when handling --date-order. The
structure can also be used as a simple LIFO stack, which is a good
match for --topo-order processing.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The primary invariant of sort_in_topological_order() is that a
parent commit is not emitted until all children of it are. When
traversing a forked history like this with "git log C E":
A----B----C
\
D----E
we ensure that A is emitted after all of B, C, D, and E are done, B
has to wait until C is done, and D has to wait until E is done.
In some applications, however, we would further want to control how
these child commits B, C, D and E on two parallel ancestry chains
are shown.
Most of the time, we would want to see C and B emitted together, and
then E and D, and finally A (i.e. the --topo-order output). The
"lifo" parameter of the sort_in_topological_order() function is used
to control this behaviour. We start the traversal by knowing two
commits, C and E. While keeping in mind that we also need to
inspect E later, we pick C first to inspect, and we notice and
record that B needs to be inspected. By structuring the "work to be
done" set as a LIFO stack, we ensure that B is inspected next,
before other in-flight commits we had known that we will need to
inspect, e.g. E.
When showing in --date-order, we would want to see commits ordered
by timestamps, i.e. show C, E, B and D in this order before showing
A, possibly mixing commits from two parallel histories together.
When "lifo" parameter is set to false, the function keeps the "work
to be done" set sorted in the date order to realize this semantics.
After inspecting C, we add B to the "work to be done" set, but the
next commit we inspect from the set is E which is newer than B.
The name "lifo", however, is too strongly tied to the way how the
function implements its behaviour, and does not describe what the
behaviour _means_.
Replace this field with an enum rev_sort_order, with two possible
values: REV_SORT_IN_GRAPH_ORDER and REV_SORT_BY_COMMIT_DATE, and
update the existing code. The mechanical replacement rule is:
"lifo == 0" is equivalent to "sort_order == REV_SORT_BY_COMMIT_DATE"
"lifo == 1" is equivalent to "sort_order == REV_SORT_IN_GRAPH_ORDER"
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Introduce a header file to define a macro that can define the struct
type, initializer, accessor and cleanup functions to manage a commit
slab. Update the "indegree" topological sort facility using it.
To associate 32 flag bits with each commit, you can write:
define_commit_slab(flag32, uint32);
to declare "struct flag32" type, define an instance of it with
struct flag32 flags;
and initialize it by calling
init_flag32(&flags);
After that, a call to flag32_at() function
uint32 *fp = flag32_at(&flags, commit);
will return a pointer pointing at a uint32 for that commit. Once
you are done with these flags, clean them up with
clear_flag32(&flags);
Callers that cannot hard-code how wide the data to be associated
with the commit be at compile time can use the "_with_stride"
variant to initialize the slab.
Suppose you want to give one bit per existing ref, and paint commits
down to find which refs are descendants of each commit. Saying
typedef uint32 bits320[5];
define_commit_slab(flagbits, bits320);
at compile time will still limit your code with hard-coded limit,
because you may find that you have more than 320 refs at runtime.
The code can declare a commit slab "struct flagbits" like this
instead:
define_commit_slab(flagbits, unsigned char);
struct flagbits flags;
and initialize it by:
nrefs = ... count number of refs ...
init_flagbits_with_stride(&flags, (nrefs + 7) / 8);
so that
unsigned char *fp = flagbits_at(&flags, commit);
will return a pointer pointing at an array of 40 "unsigned char"s
associated with the commit, once you figure out nrefs is 320 at
runtime.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Instead of using a single "slab" and keep reallocating it as we find
that we need to deal with commits with larger values of commit->index,
make a "slab" an array of many "slab_piece"s. Each access may need
two levels of indirections, but we only need to reallocate the first
level array of pointers when we have to grow the table this way.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "indegree" field in the commit object is only used while sorting
a list of commits in topological order, and wasting memory otherwise.
We would prefer to shrink the size of individual commit objects,
which we may have to hold thousands of in-core. We could eject
"indegree" field out from the commit object and represent it as a
dynamic table based on the decoration infrastructure, but the
decoration is meant for sparse annotation and is not a good match.
Instead, let's try a different approach.
- Assign an integer (commit->index) to each commit we keep in-core
(reuse the space of "indegree" field for it);
- When running the topological sort, allocate an array of integers
in bulk (called "slab"), use the commit->index as an index into
this array, and store the "indegree" information there.
This does _not_ reduce the memory footprint of a commit object, but
the commit->index can be used as the index to dynamically associate
commits with other kinds of information as needed.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* maint:
Correct common spelling mistakes in comments and tests
kwset: fix spelling in comments
precompose-utf8: fix spelling of "want" in error message
compat/nedmalloc: fix spelling in comments
compat/regex: fix spelling and grammar in comments
obstack: fix spelling of similar
contrib/subtree: fix spelling of accidentally
git-remote-mediawiki: spelling fixes
doc: various spelling fixes
fast-export: fix argument name in error messages
Documentation: distinguish between ref and offset deltas in pack-format
i18n: make the translation of -u advice in one go
Most of these were found using Lucas De Marchi's codespell tool.
Signed-off-by: Stefano Lattarini <stefano.lattarini@gmail.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Acked-by: Matthieu Moy <Matthieu.Moy@imag.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When --verify-signatures is specified, abort the merge in case a good
GPG signature from an untrusted key is encountered.
Signed-off-by: Sebastian Götte <jaseg@physik-pool.tu-berlin.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Similar to in_merge_bases(commit, other) that returns true when
commit is an ancestor (i.e. in the merge bases between the two) of
the other commit, in_merge_bases_many(commit, n_other, other[])
checks if commit is an ancestor of any of the other[] commits.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
clear_commit_marks(struct commit *, unsigned) only can clear flag
bits starting from a single commit; introduce an API to allow
feeding an array of commits, so that flag bits can be cleared from
commits reachable from any of them with a single traversal.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This is used by bisect.c, part of libgit.a while it stays in
builtin/rev-list.c. Move it to commit.c so that we won't get undefined
reference if a program that uses libgit.a happens to pull it in.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
"git fmt-merge-msg" (an internal helper reduce_heads() it uses) had
a severe performance regression; an empty "git pull" took forever to
finish as the result.
* jc/merge-bases-paint-fix:
paint_down_to_common(): parse commit before relying on its timestamp
When refactoring the merge-base computation to reduce the pairwise
O(n*(n-1)) traversals to parallel O(n) traversals, the code forgot
that timestamp based heuristics needs each commit to have been
parsed. This caused an empty "git pull" to spend cycles, traversing
the history all the way down to 0 (because an unparsed commit object
has 0 timestamp, and any other commit object with positive timestamp
will be processed for its parents, all getting parsed), only to come
up with a merge message to be used.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Optimise the "merge-base" computation a bit, and also update its
users that do not need the full merge-base information to call a
cheaper subset.
* jc/merge-bases:
reduce_heads(): reimplement on top of remove_redundant()
merge-base: "--is-ancestor A B"
get_merge_bases_many(): walk from many tips in parallel
in_merge_bases(): use paint_down_to_common()
merge_bases_many(): split out the logic to paint history
in_merge_bases(): omit unnecessary redundant common ancestor reduction
http-push: use in_merge_bases() for fast-forward check
receive-pack: use in_merge_bases() for fast-forward check
in_merge_bases(): support only one "other" commit
Teach "git commit" and "git commit-tree" the "we are told to use
utf-8 in log message, but this does not look like utf-8---attempt to
pass it through convert-from-latin1-to-utf8 and see if it makes
sense" heuristics "git mailinfo" already uses.
* lt/commit-tree-guess-utf-8:
commit/commit-tree: correct latin1 to utf-8
This is used by "git merge" and "git merge-base --independent" but
used to use a similar N*(N-1) traversals to reject commits that are
ancestors of other commits.
Reimplement it on top of remove_redundant(). Note that the callers
of this function are allowed to pass the same commit more than once,
but remove_redundant() is designed to be fed each commit only once.
The function removes duplicates before calling remove_redundant().
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The get_merge_bases_many() function reduces the result returned by
the merge_bases_many() function, which is a set of possible merge
bases, by excluding commits that can be reached from other commits.
We used to do N*(N-1) traversals for this, but we can check if one
commit reaches which other (N-1) commits by a single traversal, and
repeat it for all the candidates to find the answer.
Introduce remove_redundant() helper function to do this painting; we
should be able to use it to reimplement reduce_heads() as well.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
With paint_down_to_common(), we can tell if "commit" is reachable
from "reference" by simply looking at its object flag, instead of
iterating over the merge bases.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Introduce a new helper function paint_down_to_common() that takes
the same parameters as merge_bases_many(), but without the first
optimization of not painting anything when "one" is one of the
"twos" (or vice versa), and the last clean-up of removing the common
ancestor that is known to be an ancestor of another common one.
This way, the caller of the new function could tell if "one" is
reachable from any of the "twos" by simply looking at the flag bits
of "one". If (and only if) it is painted in PARENT2, it is
reachable from one of the "twos".
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The function get_merge_bases() needs to postprocess the result from
merge_bases_many() in order to make sure none of the commit is a
true ancestor of another commit, which is expensive. However, when
checking if a commit is an ancestor of another commit, we only need
to see if the commit is a common ancestor between the two, and do
not have to care if other common ancestors merge_bases_many() finds
are true merge bases or an ancestor of another merge base.
Signed-off-by: Thomas Rast <trast@student.ethz.ch>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In early days of its life, I planned to make it possible to compute
"is a commit contained in all of these other commits?" with this
function, but it turned out that no caller needed it.
Just make it take two commit objects and add a comment to say what
these two functions do.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When a line in the message is not a valid utf-8, "git mailinfo"
attempts to convert it to utf-8 assuming the input is latin1 (and
punt if it does not convert cleanly). Using the same heuristics in
"git commit" and "git commit-tree" lets the editor output be in
latin1 to make the overall system more consistent.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Teaches the object name parser things like a "git describe" output
is always a commit object, "A" in "git log A" must be a committish,
and "A" and "B" in "git log A...B" both must be committish, etc., to
prolong the lifetime of abbreviated object names.
* jc/sha1-name-more: (27 commits)
t1512: match the "other" object names
t1512: ignore whitespaces in wc -l output
rev-parse --disambiguate=<prefix>
rev-parse: A and B in "rev-parse A..B" refer to committish
reset: the command takes committish
commit-tree: the command wants a tree and commits
apply: --build-fake-ancestor expects blobs
sha1_name.c: add support for disambiguating other types
revision.c: the "log" family, except for "show", takes committish
revision.c: allow handle_revision_arg() to take other flags
sha1_name.c: introduce get_sha1_committish()
sha1_name.c: teach lookup context to get_sha1_with_context()
sha1_name.c: many short names can only be committish
sha1_name.c: get_sha1_1() takes lookup flags
sha1_name.c: get_describe_name() by definition groks only commits
sha1_name.c: teach get_short_sha1() a commit-only option
sha1_name.c: allow get_short_sha1() to take other flags
get_sha1(): fix error status regression
sha1_name.c: restructure disambiguation of short names
sha1_name.c: correct misnamed "canonical" and "res"
...
Many callers know that the user meant to name a committish by
syntactical positions where the object name appears. Calling this
function allows the machinery to disambiguate shorter-than-unique
abbreviated object names between committish and others.
Note that this does NOT error out when the named object is not a
committish. It is merely to give a hint to the disambiguation
machinery.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Callers who ask for ERROR_ON_NO_NAME are not so much
concerned that the name will be blank (because, after all,
we will fall back to using the username), but rather it is a
check to make sure that low-quality identities do not end up
in things like commit messages or emails (whereas it is OK
for them to end up in things like reflogs).
When future commits add more quality checks on the identity,
each of these callers would want to use those checks, too.
Rather than modify each of them later to add a new flag,
let's refactor the flag.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The function commit_list_reverse() is not used anymore; delete it.
Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This function can be used in other parts of git. Give it a new home
in commit.c.
Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Setting up a revision traversal with many starting points was inefficient
as these were placed in a date-order priority queue one-by-one.
By René Scharfe (3) and Junio C Hamano (1)
* rs/commit-list-sort-in-batch:
mergesort: rename it to llist_mergesort()
revision: insert unsorted, then sort in prepare_revision_walk()
commit: use mergesort() in commit_list_sort_by_date()
add mergesort() for linked lists
Even though the function is generic enough, <anything>sort() inherits
connotations from the standard function qsort() that sorts an array.
Rename it to llist_mergesort() and describe the external interface in
its header file.
This incidentally avoids name clashes with mergesort() some platforms
declare in, and contaminate user namespace with, their <stdlib.h>.
Reported-by: Brian Gernhardt
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Speed up prepare_revision_walk() by adding commits without sorting
to the commit_list and at the end sort the list in one go. Thanks
to mergesort() working behind the scenes, this is a lot faster for
large numbers of commits than the current insert sort.
Also introduce and use commit_list_reverse(), to keep the ordering
of commits sharing the same commit date unchanged. That's because
commit_list_insert_by_date() sorts commits with descending date,
but adds later entries with the same date entries last, while
commit_list_insert() always inserts entries at the top. The
following commit_list_sort_by_date() keeps the order of entries
sharing the same date.
Jeff's test case, in a repo with lots of refs, was to run:
# make a new commit on top of HEAD, but not yet referenced
sha1=`git commit-tree HEAD^{tree} -p HEAD </dev/null`
# now do the same "connected" test that receive-pack would do
git rev-list --objects $sha1 --not --all
With a git.git with a ref for each revision, master needs (best of
five):
real 0m2.210s
user 0m2.188s
sys 0m0.016s
And with this patch:
real 0m0.480s
user 0m0.456s
sys 0m0.020s
Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Replace the insertion sort in commit_list_sort_by_date() with a
call to the generic mergesort function. This sets the stage for
using commit_list_sort_by_date() for larger lists, as shown in
the next patch.
Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* nd/index-pack-no-recurse:
index-pack: eliminate unlimited recursion in get_base_data()
index-pack: eliminate recursion in find_unresolved_deltas
Eliminate recursion in setting/clearing marks in commit list
Recursion in a DAG is generally a bad idea because it could be very
deep. Be defensive and avoid recursion in mark_parents_uninteresting()
and clear_commit_marks().
mark_parents_uninteresting() learns a trick from clear_commit_marks()
to avoid malloc() in (dominant) single-parent case.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* jc/show-sig:
log --show-signature: reword the common two-head merge case
log-tree: show mergetag in log --show-signature output
log-tree.c: small refactor in show_signature()
commit --amend -S: strip existing gpgsig headers
verify_signed_buffer: fix stale comment
gpg-interface: allow use of a custom GPG binary
pretty: %G[?GS] placeholders
test "commit -S" and "log --show-signature"
log: --show-signature
commit: teach --gpg-sign option
Conflicts:
builtin/commit-tree.c
builtin/commit.c
builtin/merge.c
notes-cache.c
pretty.c
Any existing commit signature was made against the contents of the old
commit, including its committer date that is about to change, and will
become invalid by amending it.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* nd/war-on-nul-in-commit:
commit_tree(): refuse commit messages that contain NULs
Convert commit_tree() to take strbuf as message
merge: abort if fails to commit
Conflicts:
builtin/commit.c
commit.c
commit.h