After a note is removed, note_tree_consolidate is called to eliminate
some useless nodes. The typical case is that if you had an int_node
with 2 PTR_TYPE_NOTEs in it, and remove one of them, then the
PTR_TYPE_INTERNAL pointer in the parent tree can be replaced with the
remaining PTR_TYPE_NOTE.
This works fine when PTR_TYPE_NOTEs are involved, but falls flat when
other types are involved.
To put things in more practical terms, let's say we start from an empty
notes tree, and add 3 notes:
- one for a sha1 that starts with 424
- one for a sha1 that starts with 428
- one for a sha1 that starts with 4c
To keep track of this, note_tree.root will have a PTR_TYPE_INTERNAL at
a[4], pointing to an int_node*.
In turn, that int_node* will have a PTR_TYPE_NOTE at a[0xc], pointing to
the leaf_node* with the key and value, and a PTR_TYPE_INTERNAL at a[2],
pointing to another int_node*.
That other int_node* will have 2 PTR_TYPE_NOTE, one at a[4] and the
other at a[8].
When looking for the note for the sha1 starting with 428, get_note() will
recurse through (simplified) root.a[4].a[2].a[8].
Now, if we remove the note for the sha1 that starts with 4c, we're left
with a int_node* with only one PTR_TYPE_INTERNAL entry in it. After
note_tree_consolidate runs, root.a[4] now points to what used to be
pointed at by root.a[4].a[2].
Which means looking up for the note for the sha1 starting with 428 now
fails because there is nothing at root.a[4].a[2] anymore: there is only
root.a[4].a[4] and root.a[4].a[8], which don't match the expected
structure for the lookup.
So if all there is left in an int_node* is a PTR_TYPE_INTERNAL pointer,
we can't safely remove it. I think the same applies for PTR_TYPE_SUBTREE
pointers. IOW, only PTR_TYPE_NOTEs are safe to be moved to the parent
int_node*.
This doesn't have a practical effect on git because all that happens
after a remove_note is a write_notes_tree, which just iterates the entire
note tree, but this affects anything using libgit.a that would try to do
lookups after removing notes.
Signed-off-by: Mike Hommey <mh@glandium.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There are two types of string_lists: those that own the
string memory, and those that don't. You can tell the
difference by the strdup_strings flag, and one should use
either STRING_LIST_INIT_DUP, or STRING_LIST_INIT_NODUP as an
initializer.
Historically, the normal all-zeros initialization has
corresponded to the NODUP case. Many sites use no
initializer at all, and that works as a shorthand for that
case. But for a reader of the code, it can be hard to
remember which is which. Let's be more explicit and actually
have each site declare which type it means to use.
This is a fairly mechanical conversion; I assumed each site
was correct as-is, and just switched them all to NODUP.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Update various codepaths to avoid manually-counted malloc().
* jk/tighten-alloc: (22 commits)
ewah: convert to REALLOC_ARRAY, etc
convert ewah/bitmap code to use xmalloc
diff_populate_gitlink: use a strbuf
transport_anonymize_url: use xstrfmt
git-compat-util: drop mempcpy compat code
sequencer: simplify memory allocation of get_message
test-path-utils: fix normalize_path_copy output buffer size
fetch-pack: simplify add_sought_entry
fast-import: simplify allocation in start_packfile
write_untracked_extension: use FLEX_ALLOC helper
prepare_{git,shell}_cmd: use argv_array
use st_add and st_mult for allocation size computation
convert trivial cases to FLEX_ARRAY macros
use xmallocz to avoid size arithmetic
convert trivial cases to ALLOC_ARRAY
convert manual allocations to argv_array
argv-array: add detach function
add helpers for allocating flex-array structs
harden REALLOC_ARRAY and xcalloc against size_t overflow
tree-diff: catch integer overflow in combine_diff_path allocation
...
Each of these cases can be converted to use ALLOC_ARRAY or
REALLOC_ARRAY, which has two advantages:
1. It automatically checks the array-size multiplication
for overflow.
2. It always uses sizeof(*array) for the element-size,
so that it can never go out of sync with the declared
type of the array.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git notes merge" used to limit the source of the merged notes tree
to somewhere under refs/notes/ hierarchy, which was too limiting
when inventing a workflow to exchange notes with remote
repositories using remote-tracking notes trees (located in e.g.
refs/remote-notes/ or somesuch).
* jk/notes-merge-from-anywhere:
notes: allow merging from arbitrary references
Create a new expansion function, expand_loose_notes_ref which will first
check whether the ref can be found using get_sha1. If it can't be found
then it will fallback to using expand_notes_ref. The content of the
strbuf will not be changed if the notes ref can be located using
get_sha1. Otherwise, it may be updated as done by expand_notes_ref.
Since we now support merging from non-notes refs, remove the test case
associated with that behavior. Add a test case for merging from a
non-notes ref.
Signed-off-by: Jacob Keller <jacob.keller@gmail.com>
Reviewed-by: Johan Herland <johan@herland.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
init_notes() is the main point of entry to the notes API. It ensures
that the input can be used as ref, because it needs a ref to update to
store notes tree after modifying it.
There however are many use cases where notes tree is only read, e.g.
"git log --notes=...". Any notes-shaped treeish could be used for such
purpose, but it is not allowed due to existing restriction.
Allow treeish expressions to be used in the case the notes tree is going
to be used without write "permissions". Add a flag to distinguish
whether the notes tree is intended to be used read-only, or will be
updated.
With this change, operations that use notes read-only can be fed any
notes-shaped tree-ish can be used, e.g. git log --notes=notes@{1}.
Signed-off-by: Mike Hommey <mh@glandium.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We know that a fanned-out sha1 in a notes tree cannot be
more than "aa/bb/cc/...", and we have an assert() to confirm
that. But let's factor out that length into a constant so we
can be sure it is used consistently. And even though we
assert() earlier, let's replace a strcpy with xsnprintf, so
it is clear to a reader that all cases are covered.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When we are loading a notes tree into our internal hash
table, we also collect any files that are clearly non-notes.
We format the name of the file into a PATH_MAX buffer, but
unlike true notes (which cannot be larger than a fanned-out
sha1 hash), these tree entries can be arbitrarily long,
overflowing our buffer.
We can fix this by switching to a strbuf. It doesn't even
cost us an extra allocation, as we can simply hand ownership
of the buffer over to the non-note struct.
This is of moderate security interest, as you might fetch
notes trees from an untrusted remote. However, we do not do
so by default, so you would have to manually fetch into the
notes namespace.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Change typedef each_ref_fn to take a "const struct object_id *oid"
parameter instead of "const unsigned char *sha1".
To aid this transition, implement an adapter that can be used to wrap
old-style functions matching the old typedef, which is now called
"each_ref_sha1_fn"), and make such functions callable via the new
interface. This requires the old function and its cb_data to be
wrapped in a "struct each_ref_fn_sha1_adapter", and that object to be
used as the cb_data for an adapter function, each_ref_fn_adapter().
This is an enormous diff, but most of it consists of simple,
mechanical changes to the sites that call any of the "for_each_ref"
family of functions. Subsequent to this change, the call sites can be
rewritten one by one to use the new interface.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git blame HEAD -- missing" failed to correctly say "HEAD" when it
tried to say "No such path 'missing' in HEAD".
* jk/blame-commit-label:
blame.c: fix garbled error message
use xstrdup_or_null to replace ternary conditionals
builtin/commit.c: use xstrdup_or_null instead of envdup
builtin/apply.c: use xstrdup_or_null instead of null_strdup
git-compat-util: add xstrdup_or_null helper
"git blame HEAD -- missing" failed to correctly say "HEAD" when it
tried to say "No such path 'missing' in HEAD".
* jk/blame-commit-label:
blame.c: fix garbled error message
use xstrdup_or_null to replace ternary conditionals
builtin/commit.c: use xstrdup_or_null instead of envdup
builtin/apply.c: use xstrdup_or_null instead of null_strdup
git-compat-util: add xstrdup_or_null helper
This replaces "x ? xstrdup(x) : NULL" with xstrdup_or_null(x).
The change is fairly mechanical, with the exception of
resolve_refdup, which can eliminate a temporary variable.
There are still a few hits grepping for "?.*xstrdup", but
these are of slightly different forms and cannot be
converted (e.g., "x ? xstrdup(x->foo) : NULL").
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git remote update --prune" to drop many refs has been optimized.
* mh/simplify-repack-without-refs:
sort_string_list(): rename to string_list_sort()
prune_remote(): iterate using for_each_string_list_item()
prune_remote(): rename local variable
repack_without_refs(): make the refnames argument a string_list
prune_remote(): sort delete_refs_list references en masse
prune_remote(): initialize both delete_refs lists in a single loop
prune_remote(): exit early if there are no stale references
The new name is more consistent with the names of other
string_list-related functions.
Suggested-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If the user has gone through the trouble of explicitly adding an empty
note, then "git log" should not silently skip it (as if it didn't exist).
Signed-off-by: Johan Herland <johan@herland.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
xcalloc() takes two arguments: the number of elements and their size.
notes.c includes several calls to xcalloc() that pass the arguments in
reverse order.
Rearrange them so they are in the correct order.
Signed-off-by: Brian Gesiak <modocache@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Leaving only the function definitions and declarations so that any
new topic in flight can still make use of the old functions, replace
existing uses of the prefixcmp() and suffixcmp() with new API
functions.
The change can be recreated by mechanically applying this:
$ git grep -l -e prefixcmp -e suffixcmp -- \*.c |
grep -v strbuf\\.c |
xargs perl -pi -e '
s|!prefixcmp\(|starts_with\(|g;
s|prefixcmp\(|!starts_with\(|g;
s|!suffixcmp\(|ends_with\(|g;
s|suffixcmp\(|!ends_with\(|g;
'
on the result of preparatory changes in this series.
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since string_list_add_one_ref() adds refname to the string list, but
the lifetime of refname is limited, it is important that the
string_list passed to string_list_add_one_ref() has strdup_strings
set. Document this fact.
All current callers do the right thing.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Emit the notes attached to the commit in "format-patch --notes"
output after three-dashes.
* jc/prettier-pretty-note:
format-patch: add a blank line between notes and diffstat
Doc User-Manual: Patch cover letter, three dashes, and --notes
Doc format-patch: clarify --notes use case
Doc notes: Include the format-patch --notes option
Doc SubmittingPatches: Mention --notes option after "cover letter"
Documentation: decribe format-patch --notes
format-patch --notes: show notes after three-dashes
format-patch: append --signature after notes
pretty_print_commit(): do not append notes message
pretty: prepare notes message at a centralized place
format_note(): simplify API
pretty: remove reencode_commit_message()
Improve the asymptotic performance of the cat_sort_uniq notes merge
strategy.
* mh/notes-string-list:
string_list_add_refs_from_colon_sep(): use string_list_split()
notes: fix handling of colon-separated values
combine_notes_cat_sort_uniq(): sort and dedup lines all at once
Initialize sort_uniq_list using named constant
string_list: add a function string_list_remove_empty_items()
Various codepaths checked if two encoding names are the same using
ad-hoc code and some of them ended up asking iconv() to convert
between "utf8" and "UTF-8". The former is not a valid way to spell
the encoding name, but often people use it by mistake, and we
equated them in some but not all codepaths. Introduce a new helper
function to make these codepaths consistent.
* jc/same-encoding:
reencode_string(): introduce and use same_encoding()
Conflicts:
builtin/mailinfo.c
It makes for simpler code than strbuf_split().
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Acked-by: Johan Herland <johan@herland.net>
Signed-off-by: Jeff King <peff@peff.net>
The substrings output by strbuf_split() include the ':' delimiters.
When processing GIT_NOTES_DISPLAY_REF and GIT_NOTES_REWRITE_REF, strip
off the delimiter character *before* checking whether the substring is
empty rather than after, so that empty strings within the list are
also skipped.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Acked-by: Johan Herland <johan@herland.net>
Signed-off-by: Jeff King <peff@peff.net>
Instead of reading lines one by one and insertion-sorting them into a
string_list, read all of the lines, sort them, then remove duplicates.
Aside from being less code, this reduces the complexity from O(N^2) to
O(N lg N) in the total number of lines.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Acked-by: Johan Herland <johan@herland.net>
Signed-off-by: Jeff King <peff@peff.net>
Callers of reencode_string() that re-encodes a string from one
encoding to another all used ad-hoc way to bypass the case where the
input and the output encodings are the same. Some did strcmp(),
some did strcasecmp(), yet some others when converting to UTF-8 used
is_encoding_utf8().
Introduce same_encoding() helper function to make these callers use
the same logic. Notably, is_encoding_utf8() has a work-around for
common misconfiguration to use "utf8" to name UTF-8 encoding, which
does not match "UTF-8" hence strcasecmp() would not consider the
same. Make use of it in this helper function.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We either stuff the notes message without modification for %N
userformat, or format it for human consumption. Using two bits
is an overkill that does not benefit anybody.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Detected by "gcc -std=iso9899:1990 ...". This patch applies against
"maint".
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This is in preparation for more notes-related revision
command-line options.
The "suppress_default_notes" option is renamed to
"use_default_notes", and is now a tri-state with values less
than one indicating "not set". If the value is "not set",
then we show default refs if and only if no other refs were
given.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There's no need to use an extra pointer, which just ends up
leaking memory. The fact that the list is empty tells us the
same thing.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This function is useful for other commands besides "git
notes" which want to let users refer to notes by their
shorthand name.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* jh/notes-merge: (23 commits)
Provide 'git merge --abort' as a synonym to 'git reset --merge'
cmd_merge(): Parse options before checking MERGE_HEAD
Provide 'git notes get-ref' to easily retrieve current notes ref
git notes merge: Add testcases for merging notes trees at different fanouts
git notes merge: Add another auto-resolving strategy: "cat_sort_uniq"
git notes merge: --commit should fail if underlying notes ref has moved
git notes merge: List conflicting notes in notes merge commit message
git notes merge: Manual conflict resolution, part 2/2
git notes merge: Manual conflict resolution, part 1/2
Documentation: Preliminary docs on 'git notes merge'
git notes merge: Add automatic conflict resolvers (ours, theirs, union)
git notes merge: Handle real, non-conflicting notes merges
builtin/notes.c: Refactor creation of notes commits.
git notes merge: Initial implementation handling trivial merges only
builtin/notes.c: Split notes ref DWIMmery into a separate function
notes.c: Use two newlines (instead of one) when concatenating notes
(trivial) t3303: Indent with tabs instead of spaces for consistency
notes.h/c: Propagate combine_notes_fn return value to add_note() and beyond
notes.h/c: Allow combine_notes functions to remove notes
notes.c: Reorder functions in preparation for next commit
...
Conflicts:
builtin.h
This new strategy is similar to "concatenate", but in addition to
concatenating the two note candidates, this strategy sorts the resulting
lines, and removes duplicate lines from the result. This is equivalent to
applying the "cat | sort | uniq" shell pipeline to the two note candidates.
This strategy is useful if the notes follow a line-based format where one
wants to avoid duplicate lines in the merge result.
Note that if either of the note candidates contain duplicate lines _prior_
to the merge, these will also be removed by this merge strategy.
The patch also contains tests and documentation for the new strategy.
Signed-off-by: Johan Herland <johan@herland.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When using combine_notes_concatenate() to concatenate notes, it currently
ensures exactly one newline character between the given notes. However,
when using builtin/notes.c:create_note() to concatenate notes (e.g. by
'git notes append'), it adds a newline character to the trailing newline
of the preceding notes object, thus resulting in _two_ newlines (aka. a
blank line) separating contents of the two notes.
This patch brings combine_notes_concatenate() into consistency with
builtin/notes.c:create_note(), by ensuring exactly _two_ newline characters
between concatenated notes.
The patch also changes a few notes-related selftests accordingly.
Signed-off-by: Johan Herland <johan@herland.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The combine_notes_fn functions uses a non-zero return value to indicate
failure. However, this return value was converted to a call to die()
in note_tree_insert().
Instead, propagate this return value out to add_note(), and return it
from there to enable the caller to handle errors appropriately.
Existing add_note() callers are updated to die() upon failure, thus
preserving the current behaviour. The only exceptions are copy_note()
and notes_cache_put() where we are able to propagate the add_note()
return value instead.
This patch has been improved by the following contributions:
- Jonathan Nieder: Future-proof by always checking add_note() return value
- Jonathan Nieder: Improve clarity of final if-condition in note_tree_insert()
Thanks-to: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Johan Herland <johan@herland.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Allow combine_notes functions to request that a note be removed, by setting
the resulting note SHA1 to null_sha1 (0000000...).
For consistency, also teach note_tree_insert() to skip insertion of an empty
note (a note with entry->val_sha1 equal to null_sha1) when there is no note
to combine it with.
In general, an empty note (null_sha1) is treated identically to no note at
all, but when adding an empty note where there already exists a non-empty
note, we allow the combine_notes function to potentially record a new/changed
note. Document this behaviour, and clearly specify how combine_notes functions
are expected to handle null_sha1 in input.
Before this patch, storing null_sha1s in the notes tree were silently allowed,
causing an invalid notes tree (referring to blobs with null_sha1) to be
produced by write_notes_tree().
Signed-off-by: Johan Herland <johan@herland.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This patch introduces no functional change. It consists solely of reordering
functions in notes.c to avoid use-before-declaration errors after applying
the next commit in this series.
Signed-off-by: Johan Herland <johan@herland.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Extend remove_note() in the notes API to return whether or not a note was
actually removed. Use this in 'git notes remove' to skip the creation of
a notes commit when no notes were actually removed.
Also add a test illustrating the change in behavior.
Signed-off-by: Johan Herland <johan@herland.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The rule for selecting the candidates for conversion is: if the callback
function returns only 0 (the condition for for_each_string_list to exit
early), than it can be safely converted to the macro.
A notable exception are the callers in builtin/remote.c. If converted, the
readability in the file will suffer greately. Besides, the code is not very
performance critical (at the moment, at least): it does output formatting of
the list of remotes.
Signed-off-by: Alex Riesen <raa.lkml@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* jp/string-list-api-cleanup:
string_list: Fix argument order for string_list_append
string_list: Fix argument order for string_list_lookup
string_list: Fix argument order for string_list_insert_at_index
string_list: Fix argument order for string_list_insert
string_list: Fix argument order for for_each_string_list
string_list: Fix argument order for print_string_list
gcc version 3.4.4 thinks that the 'cmp' variable could be used
while uninitialised and complains thus:
notes.c: In function `write_each_non_note_until':
notes.c:719: warning: 'cmp' might be used uninitialized in \
this function
Note that gcc versions 4.1.2 and 4.4.0 do not issue this warning.
Signed-off-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Update the definition and callers of string_list_append to use the
string_list as the first argument. This helps make the string_list
API easier to use by being more consistent.
Signed-off-by: Julian Phillips <julian@quantumfyre.co.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Update the definition and callers of for_each_string_list to use the
string_list as the first argument. This helps make the string_list
API easier to use by being more consistent.
Signed-off-by: Julian Phillips <julian@quantumfyre.co.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Introduce -n and -v options for "git notes prune" in complete analogy to
"git prune" so that one can check for dangling notes easily.
The output is a list of names of objects whose notes would be resp.
are removed so that one can check the object ("git show sha1") as well as
the note ("git notes show sha1").
Signed-off-by: Michael J Gruber <git@drmicha.warpmail.net>
Acked-by: Johan Herland <johan@herland.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* tr/notes-display:
git-notes(1): add a section about the meaning of history
notes: track whether notes_trees were changed at all
notes: add shorthand --ref to override GIT_NOTES_REF
commit --amend: copy notes to the new commit
rebase: support automatic notes copying
notes: implement helpers needed for note copying during rewrite
notes: implement 'git notes copy --stdin'
rebase -i: invoke post-rewrite hook
rebase: invoke post-rewrite hook
commit --amend: invoke post-rewrite hook
Documentation: document post-rewrite hook
Support showing notes from more than one notes tree
test-lib: unset GIT_NOTES_REF to stop it from influencing tests
Conflicts:
git-am.sh
refs.c
Currently, the notes copying is a bit wasteful since it always creates
new trees, even if no notes were copied at all.
Teach add_note() and remove_note() to flag the affected notes tree as
changed ('dirty'). Then teach builtin/notes.c to use this knowledge
and avoid committing trees that weren't changed.
Signed-off-by: Thomas Rast <trast@student.ethz.ch>
Acked-by: Johan Herland <johan@herland.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This implements a mass-copy command that takes a sequence of lines in
the format
<from-sha1> SP <to-sha1> [ SP <rest> ] LF
on stdin, and copies each <from-sha1>'s notes to the <to-sha1>. The
<rest> is ignored. The intent, of course, is that this can read the
same input that the 'post-rewrite' hook gets.
The copy_note() function is exposed for everyone's and in particular
the next commit's use.
Signed-off-by: Thomas Rast <trast@student.ethz.ch>
Acked-by: Johan Herland <johan@herland.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
With this patch, you can set notes.displayRef to a glob that points at
your favourite notes refs, e.g.,
[notes]
displayRef = refs/notes/*
Then git-log and friends will show notes from all trees.
Thanks to Junio C Hamano for lots of feedback, which greatly
influenced the design of the entire series and this commit in
particular.
Signed-off-by: Thomas Rast <trast@student.ethz.ch>
Acked-by: Johan Herland <johan@herland.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The mode bits for entries in a tree object should be an octal number
with minimum number of digits. Do not pad it with 0 to the left.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When an object is made unreachable by Git, any notes that annotate that object
are not automagically made unreachable, since all notes are always trivially
reachable from a notes ref. In order to remove notes for non-existing objects,
we therefore need to add functionality for traversing the notes tree and
explicitly removing references to notes that annotate non-reachable objects.
Thus the notes objects themselves also become unreachable, and are removed
by a later garbage collect.
prune_notes() performs this traversal (by using for_each_note() internally),
and removes the notes in question from the notes tree.
Note that the effect of prune_notes() is not persistent unless a subsequent
call to write_notes_tree() is made.
Signed-off-by: Johan Herland <johan@herland.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The note tree structure allows for non-note entries to coexist with note
entries in a notes tree. Although we certainly expect there to be very
few non-notes in a notes tree, we should still support them to a certain
degree.
This patch teaches the notes code to preserve non-notes when updating the
notes tree with write_notes_tree(). Non-notes are not affected by fanout
restructuring.
For non-notes to be handled correctly, we can no longer allow subtree
entries that do not match the fanout structure produced by the notes code
itself. This means that fanouts like 4/36, 6/34, 8/32, 4/4/32, etc. are
no longer recognized as note subtrees; only 2-based fanouts are allowed
(2/38, 2/2/36, 2/2/2/34, etc.). Since the notes code has never at any point
_produced_ non-2-based fanouts, it is highly unlikely that this change will
cause problems for anyone.
The patch also adds some tests verifying the correct handling of non-notes
in a notes tree.
Signed-off-by: Johan Herland <johan@herland.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When adding a note to an object that already has an existing note, the
current solution is to concatenate the contents of the two notes. However,
the caller may instead wish to _overwrite_ the existing note with the new
note, or maybe even _ignore_ the new note, and keep the existing one. There
might also be other ways of combining notes that are only known to the
caller.
Therefore, instead of unconditionally concatenating notes, we let the caller
specify how to combine notes, by passing in a pointer to a function for
combining notes. The caller may choose to implement its own function for
notes combining, but normally one of the following three conveniently
supplied notes combination functions will be sufficient:
- combine_notes_concatenate() combines the two notes by appending the
contents of the new note to the contents of the existing note.
- combine_notes_overwrite() replaces the existing note with the new note.
- combine_notes_ignore() keeps the existing note, and ignores the new note.
A combine_notes function can be passed to init_notes() to choose a default
combine_notes function for that notes tree. If NULL is given, the notes tree
falls back to combine_notes_concatenate() as the ultimate default.
A combine_notes function can also be passed directly to add_note(), to
control the notes combining behaviour for a note addition in particular.
If NULL is passed, the combine_notes function registered for the given
notes tree is used.
Signed-off-by: Johan Herland <johan@herland.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The new struct notes_tree encapsulates access to a specific notes tree.
It is provided to allow callers to make use of several different notes trees
simultaneously.
A struct notes_tree * parameter is added to every function in the notes API.
In all cases, NULL can be passed, in which case the fallback "default" notes
tree (default_notes_tree) is used.
Signed-off-by: Johan Herland <johan@herland.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Uses for_each_note() to traverse the notes tree, and produces tree
objects on the fly representing the "on-disk" version of the notes
tree with appropriate fanout.
Signed-off-by: Johan Herland <johan@herland.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This includes a first attempt at creating an optimal fanout scheme (which
is calculated on-the-fly, while traversing).
Signed-off-by: Johan Herland <johan@herland.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Created by a simple cleanup and rename of lookup_notes().
Signed-off-by: Johan Herland <johan@herland.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This includes adding internal functions for maintaining a healthy notes tree
structure after removing individual notes.
Signed-off-by: Johan Herland <johan@herland.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Created by a simple refactoring of initialize_notes().
Also add a new 'flags' parameter, which is a bitwise combination of notes
initialization flags. For now, there is only one flag - NOTES_INIT_EMPTY -
which indicates that the notes tree should not auto-load the contents of
the given (or default) notes ref, but rather should leave the notes tree
initialized to an empty state. This will become useful in the future when
manipulating the notes tree through the notes API.
Signed-off-by: Johan Herland <johan@herland.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There is really no reason why only commit objects can be annotated. By
changing the struct commit parameter to get_commit_notes() into a sha1 we
gain the ability to annotate any object type. To reflect this in the function
naming as well, we rename get_commit_notes() to format_note().
This patch also fixes comments and variable names throughout notes.c as a
consequence of the removal of the unnecessary 'commit' restriction.
Signed-off-by: Johan Herland <johan@herland.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When loading a notes tree, the code primarily looks for SHA1-like paths
whose total length (discounting directory separators) are 40 chars
(interpreted as valid note entries) or less (interpreted as subtree
entries that may in turn contain note entries when unpacked).
However, there is an additional condition that must hold for valid
subtree entries: They must be _tree_ objects (duh).
This patch adds an appropriate test for this condition, thereby fixing
the crash that occured when passing a non-tree object to the tree-walk
API.
The patch also adds another selftest verifying correct behaviour of
non-notes in note trees.
Signed-off-by: Johan Herland <johan@herland.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Currently, having multiple notes referring to the same commit from various
locations in the notes tree is strongly discouraged, since only one of those
notes will be parsed and shown.
This patch teaches the notes code to _concatenate_ multiple notes that
annotate the same commit. Notes are concatenated by creating a new blob
object containing the concatenation of the notes in question, and
replacing them with the concatenated note in the internal notes tree
structure.
Getting the concatenation right requires being more proactive in unpacking
subtree entries in the internal notes tree structure, so that we don't return
a note prematurely (i.e. before having found all other notes that annotate
the same object). As such, this patch may incur a small performance penalty.
Suggested-by: Sam Vilain <sam@vilain.net>
Re-suggested-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Johan Herland <johan@herland.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The semantics used when parsing notes trees (with regards to fanout subtrees)
follow Dscho's proposal fairly closely:
- No concatenation/merging of notes is performed. If there are several notes
objects referencing a given commit, only one of those objects are used.
- If a notes object for a given commit is present in the "root" notes tree,
no subtrees are consulted; the object in the root tree is used directly.
- If there are more than one subtree that prefix-matches the given commit,
only the subtree with the longest matching prefix is consulted. This
means that if the given commit is e.g. "deadbeef", and the notes tree have
subtrees "de" and "dead", then the following paths in the notes tree are
searched: "deadbeef", "dead/beef". Note that "de/adbeef" is NOT searched.
- Fanout directories (subtrees) must references a whole number of bytes
from the SHA1 sum they subdivide. E.g. subtrees "dead" and "de" are
acceptable; "d" and "dea" are not.
- Multiple levels of fanout are allowed. All the above rules apply
recursively. E.g. "de/adbeef" is preferred over "de/adbe/ef", etc.
This patch changes the in-memory datastructure for holding parsed notes:
Instead of holding all note (and subtree) entries in a hash table, a
simple 16-tree structure is used instead. The tree structure consists of
16-arrays as internal nodes, and note/subtree entries as leaf nodes. The
tree is traversed by indexing subsequent nibbles of the search key until
a leaf node is encountered. If a subtree entry is encountered while
searching for a note, the subtree is unpacked into the 16-tree structure,
and the search continues into that subtree.
The new algorithm performs significantly better in the cases where only
a fraction of the notes need to be looked up (this is assumed to be the
common case for notes lookup). The new code even performs marginally
better in the worst case (where _all_ the notes are looked up).
In addition to this, comes the massive performance win associated with
organizing the notes tree according to some fanout scheme. Even a simple
2/38 fanout scheme is dramatically quicker to traverse (going from tens of
seconds to sub-second runtimes).
As for memory usage, the new code is marginally better than the old code in
the worst case, but in the case of looking up only some notes from a notes
tree with proper fanout, the new code uses only a small fraction of the
memory needed to hold the entire notes tree.
However, there is one casualty of this patch. The old notes lookup code was
able to parse notes that were associated with non-SHA1s (e.g. refs). The new
code requires the referenced object to be named by a SHA1 sum. Still, this
is not considered a major setback, since the notes infrastructure was not
originally intended to annotate objects outside the Git object database.
Cc: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Johan Herland <johan@herland.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There's no need to be rude to memory-concious callers...
This patch has been improved by the following contributions:
- Junio C Hamano: avoid old-style declaration
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Johan Herland <johan@herland.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This patch adds the following flags to get_commit_notes() for adjusting the
format of the produced note string:
- NOTES_SHOW_HEADER: Print "Notes:" line before the notes contents
- NOTES_INDENT: Indent notes contents by 4 spaces
Suggested-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Johan Herland <johan@herland.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
To avoid looking up each and every commit in the notes ref's tree
object, which is very expensive, speed things up by slurping the tree
object's contents into a hash_map.
The idea for the hashmap singleton is from David Reiss, initial
benchmarking by Jeff King.
Note: the implementation allows for arbitrary entries in the notes
tree object, ignoring those that do not reference a valid object. This
allows you to annotate arbitrary branches, or objects.
This patch has been improved by the following contributions:
- Junio C Hamano: fixed an obvious error in initialize_hash_map()
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Johan Herland <johan@herland.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commit notes are blobs which are shown together with the commit
message. These blobs are taken from the notes ref, which you can
configure by the config variable core.notesRef, which in turn can
be overridden by the environment variable GIT_NOTES_REF.
The notes ref is a branch which contains "files" whose names are
the names of the corresponding commits (i.e. the SHA-1).
The rationale for putting this information into a ref is this: we
want to be able to fetch and possibly union-merge the notes,
maybe even look at the date when a note was introduced, and we
want to store them efficiently together with the other objects.
This patch has been improved by the following contributions:
- Thomas Rast: fix core.notesRef documentation
- Tor Arne Vestbø: fix printing of multi-line notes
- Alex Riesen: Using char array instead of char pointer costs less BSS
- Johan Herland: Plug leak when msg is good, but msglen or type causes return
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Thomas Rast <trast@student.ethz.ch>
Signed-off-by: Tor Arne Vestbø <tavestbo@trolltech.com>
Signed-off-by: Johan Herland <johan@herland.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
get_commit_notes(): Plug memory leak when 'if' triggers, but not because of read_sha1_file() failure
The line length was read from the same position every time,
causing mangled output when printing notes with multiple lines.
Also, adding new-line manually for each line ensures that we
get a new-line between commits, matching git-log for commits
without notes.
Signed-off-by: Tor Arne Vestbø <tavestbo@trolltech.com>
Acked-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
To avoid looking up each and every commit in the notes ref's tree
object, which is very expensive, speed things up by slurping the tree
object's contents into a hash_map.
The idea fo the hashmap singleton is from David Reiss, initial
benchmarking by Jeff King.
Note: the implementation allows for arbitrary entries in the notes
tree object, ignoring those that do not reference a valid object. This
allows you to annotate arbitrary branches, or objects.
[jc: fixed an obvious error in initialize_hash_map()]
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commit notes are blobs which are shown together with the commit
message. These blobs are taken from the notes ref, which you can
configure by the config variable core.notesRef, which in turn can
be overridden by the environment variable GIT_NOTES_REF.
The notes ref is a branch which contains "files" whose names are
the names of the corresponding commits (i.e. the SHA-1).
The rationale for putting this information into a ref is this: we
want to be able to fetch and possibly union-merge the notes,
maybe even look at the date when a note was introduced, and we
want to store them efficiently together with the other objects.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>