A few implementation fixes in the notes API.
* mh/notes-duplicate-entries:
notes: avoid potential use-after-free during insertion
notes: avoid leaking duplicate entries
The note_tree_insert() function may free the leaf_node struct we pass in
(e.g., if it's a duplicate, or if it needs to be combined with an
existing note).
Most callers are happy with this, as they assume that ownership of the
struct is handed off. But in load_subtree(), if we see an error we'll
use the handed-off struct's key_oid to generate the die() message,
potentially accessing freed memory.
We can easily fix this by instead using the original oid that we copied
into the leaf_node struct.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When add_note is called multiple times with the same key/value pair, the
leaf_node it creates is leaked by notes_tree_insert.
Signed-off-by: Mike Hommey <mh@glandium.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
While at there, clean up the_repo usage in builtin/merge-tree.c a tiny
bit.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git merge-recursive" backend recently learned a new heuristics to
infer file movement based on how other files in the same directory
moved. As this is inherently less robust heuristics than the one
based on the content similarity of the file itself (rather than
based on what its neighbours are doing), it sometimes gives an
outcome unexpected by the end users. This has been toned down to
leave the renamed paths in higher/conflicted stages in the index so
that the user can examine and confirm the result.
* en/merge-directory-renames:
merge-recursive: switch directory rename detection default
merge-recursive: give callers of handle_content_merge() access to contents
merge-recursive: track information associated with directory renames
t6043: fix copied test description to match its purpose
merge-recursive: switch from (oid,mode) pairs to a diff_filespec
merge-recursive: cleanup handle_rename_* function signatures
merge-recursive: track branch where rename occurred in rename struct
merge-recursive: remove ren[12]_other fields from rename_conflict_info
merge-recursive: shrink rename_conflict_info
merge-recursive: move some struct declarations together
merge-recursive: use 'ci' for rename_conflict_info variable name
merge-recursive: rename locals 'o' and 'a' to 'obuf' and 'abuf'
merge-recursive: rename diff_filespec 'one' to 'o'
merge-recursive: rename merge_options argument from 'o' to 'opt'
Use 'unsigned short' for mode, like diff_filespec does
struct diff_filespec defines mode to be an 'unsigned short'. Several
other places in the API which we'd like to interact with using a
diff_filespec used a plain unsigned (or unsigned int). This caused
problems when taking addresses, so switch to unsigned short.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Replace the uses of sha1_to_hex in this function with hash_to_hex to
allow the use of SHA-256 as well. Rename some variables since this code
is no longer limited to SHA-1.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Switch out various uses of the GIT_SHA1_* constants with GIT_MAX_*
constants for allocations and the_hash_algo for general parsing. Update
a comment to no longer be SHA-1 specific.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When parsing a tree, we read the object ID directly out of the tree
buffer. This is normally fine, but such an object ID cannot be used with
oidcpy, which copies GIT_MAX_RAWSZ bytes, because if we are using SHA-1,
there may not be that many bytes to copy.
Instead, store the object ID in a separate struct member. Since we can
no longer efficiently compute the path length, store that information as
well in struct name_entry. Ensure we only copy the object ID into the
new buffer if the path length is nonzero, as some callers will pass us
an empty path with no object ID following it, and we will not want to
read past the end of the buffer.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This is the flip side of the previous two patches: checking
for a non-zero oidcmp() can be more strictly expressed as
inequality. Like those patches, we write "!= 0" in the
coccinelle transformation, which covers by isomorphism the
more common:
if (oidcmp(E1, E2))
As with the previous two patches, this patch can be achieved
almost entirely by running "make coccicheck"; the only
differences are manual line-wrap fixes to match the original
code.
There is one thing to note for anybody replicating this,
though: coccinelle 1.0.4 seems to miss the case in
builtin/tag.c, even though it's basically the same as all
the others. Running with 1.0.7 does catch this, so
presumably it's just a coccinelle bug that was fixed in the
interim.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This is the partner patch to the previous one, but covering
the "hash" variants instead of "oid". Note that our
coccinelle rule is slightly more complex to avoid triggering
the call in hasheq().
I didn't bother to add a new rule to convert:
- hasheq(E1->hash, E2->hash)
+ oideq(E1, E2)
Since these are new functions, there won't be any such
existing callers. And since most of the code is already
using oideq, we're not likely to introduce new ones.
We might still see "!hashcmp(E1->hash, E2->hash)" from topics
in flight. But because our new rule comes after the existing
ones, that should first get converted to "!oidcmp(E1, E2)"
and then to "oideq(E1, E2)".
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Using the more restrictive oideq() should, in the long run,
give the compiler more opportunities to optimize these
callsites. For now, this conversion should be a complete
noop with respect to the generated code.
The result is also perhaps a little more readable, as it
avoids the "zero is equal" idiom. Since it's so prevalent in
C, I think seasoned programmers tend not to even notice it
anymore, but it can sometimes make for awkward double
negations (e.g., we can drop a few !!oidcmp() instances
here).
This patch was generated almost entirely by the included
coccinelle patch. This mechanical conversion should be
completely safe, because we check explicitly for cases where
oidcmp() is compared to 0, which is what oideq() is doing
under the hood. Note that we don't have to catch "!oidcmp()"
separately; coccinelle's standard isomorphisms make sure the
two are treated equivalently.
I say "almost" because I did hand-edit the coccinelle output
to fix up a few style violations (it mostly keeps the
original formatting, but sometimes unwraps long lines).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This should make these functions easier to find and cache.h less
overwhelming to read.
In particular, this moves:
- read_object_file
- oid_object_info
- write_object_file
As a result, most of the codebase needs to #include object-store.h.
In this patch the #include is only added to files that would fail to
compile otherwise. It would be better to #include wherever
identifiers from the header are used. That can happen later
when we have better tooling for it.
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Convert read_sha1_file to take a pointer to struct object_id and rename
it read_object_file. Do the same for read_sha1_file_extended.
Convert one use in grep.c to use the new function without any other code
change, since the pointer being passed is a void pointer that is already
initialized with a pointer to struct object_id. Update the declaration
and definitions of the modified functions, and apply the following
semantic patch to convert the remaining callers:
@@
expression E1, E2, E3;
@@
- read_sha1_file(E1.hash, E2, E3)
+ read_object_file(&E1, E2, E3)
@@
expression E1, E2, E3;
@@
- read_sha1_file(E1->hash, E2, E3)
+ read_object_file(E1, E2, E3)
@@
expression E1, E2, E3, E4;
@@
- read_sha1_file_extended(E1.hash, E2, E3, E4)
+ read_object_file_extended(&E1, E2, E3, E4)
@@
expression E1, E2, E3, E4;
@@
- read_sha1_file_extended(E1->hash, E2, E3, E4)
+ read_object_file_extended(E1, E2, E3, E4)
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Convert get_tree_entry and find_tree_entry to take pointers to struct
object_id.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Convert the definition and declaration of write_sha1_file to
struct object_id and adjust usage of this function.
This commit also converts static function write_sha1_file_prepare, as it
is closely related.
Rename these functions to write_object_file and
write_object_file_prepare respectively.
Replace sha1_to_hex, hashcpy and hashclr with their oid equivalents
wherever possible.
Signed-off-by: Patryk Obara <patryk.obara@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Convert the definition and declaration of write_notes_tree to
struct object_id and adjust usage of this function.
Additionally, improve style of small part of this function, as old
formatting made it hard to understand at glance what this part of
code is doing.
Signed-off-by: Patryk Obara <patryk.obara@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Convert the definition and declarations of combine_notes_* functions
to struct object_id and adjust usage of these functions.
Signed-off-by: Patryk Obara <patryk.obara@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Make the function for converting pairs of hexadecimal digits to binary
available to other call sites.
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
All but two of the call sites already have parameters using the hash
parameter of struct object_id, so convert them to take a pointer to the
struct directly. Also convert refs_read_refs_full, the underlying
implementation.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Code clean-up.
* mh/notes-cleanup:
load_subtree(): check that `prefix_len` is in the expected range
load_subtree(): declare some variables to be `size_t`
hex_to_bytes(): simpler replacement for `get_oid_hex_segment()`
get_oid_hex_segment(): don't pad the rest of `oid`
load_subtree(): combine some common code
get_oid_hex_segment(): return 0 on success
load_subtree(): only consider blobs to be potential notes
load_subtree(): check earlier whether an internal node is a tree entry
load_subtree(): separate logic for internal vs. terminal entries
load_subtree(): fix incorrect comment
load_subtree(): reduce the scope of some local variables
load_subtree(): remove unnecessary conditional
notes: make GET_NIBBLE macro more robust
This value, which is stashed in the last byte of an object_id hash,
gets handed around a lot. So add a sanity check before using it in
`load_subtree()`.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* `prefix_len`
* `path_len`
* `i`
It's good hygiene.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Now that `get_oid_hex_segment()` does less, it makes sense to rename
it and simplify its semantics:
* Instead of a `hex_len` parameter, which was the number of hex
characters (and had to be even), use a `len` parameter, which is the
number of resulting bytes. This removes then need for the check that
`hex_len` is even and to divide it by two to determine the number of
bytes. For good hygiene, declare the `len` parameter to be `size_t`
instead of `unsigned int`.
* Change the order of the arguments to the more traditional (dst,
src, len).
* Rename the function to `hex_to_bytes()`.
* Remove a loop variable: just count `len` down instead.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Remove the feature of `get_oid_hex_segment()` that it pads the rest of
the `oid` argument with zeros. Instead, do this at the caller who
needs it.
This makes the functionality of this function more coherent and
removes the need for its `oid_len` argument.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Write the length into `object_oid` (before copying) rather than
`l->key_oid` (after copying). Then combine some code from the two `if`
blocks.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Nobody cares about the return value of get_oid_hex_segment() except to
check whether it failed. So just return 0 on success.
And while we're updating its docstring, update it for some argument
renaming that happened a while ago.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The old code converted any entry whose path constituted a full SHA-1
as a leaf node, without regard for the type of the entry. But only
blobs can be notes. So treat entries whose paths *look like* notes
paths but that are not blobs as non-notes.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If an entry is not a tree entry, then it cannot possibly be an
internal node. But the old code checked this condition only after
allocating a leaf_node object and therefore leaked that memory.
Instead, check before even entering this branch of the code.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There are only two legitimate notes path components:
* A hexadecimal string that fills the rest of the SHA-1
* A two-digit hexadecimal string that constitutes another internal
node.
So handle those two cases at the top level, and reject others as
non-notes without trying to parse them. The logic separation also
simplifies upcoming changes.
This prevents us from leaking memory for a leaf_node in the case of
wrong-sized paths. There are still memory leaks in this code; they will
be fixed in upcoming commits.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This comment was added in 851c2b3791 (Teach notes code to properly
preserve non-notes in the notes tree, 2010-02-13) when the
corresponding code was added. But I believe it was incorrect even
then. The condition `path_len != 2` a dozen lines up prevents a path
like "dead/beef" from being converted to "de/ad/beef", and indeed the
test added in commit 851c2b3 verifies that this case works correctly.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Declare the variables inside the loop, to make it more obvious that
their values are not carried across loop iterations.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
At this point in the code, len is *always* <= 20.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Put parentheses around sha1. Otherwise it could fail for something
like
GET_NIBBLE(n, (unsigned char *)data);
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
All callers of fill_tree_descriptor() have been converted to object_id
already, so convert that function as well. As a nice side-effect we get
rid of NULL checks in tree-diff.c, as fill_tree_descriptor() already
does them for us.
Helped-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Reviewed-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Fix configuration codepath to pay proper attention to commondir
that is used in multi-worktree situation, and isolate config API
into its own header file.
* bw/config-h:
config: don't implicitly use gitdir or commondir
config: respect commondir
setup: teach discover_git_directory to respect the commondir
config: don't include config.h by default
config: remove git_config_iter
config: create config.h
Stop including config.h by default in cache.h. Instead only include
config.h in those files which require use of the config system.
Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Convert add_note, get_note, and copy_note to take struct object_id.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Make get_note return a pointer to a const struct object_id. Add a
defensive check to ensure we don't accidentally dereference a NULL
pointer.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Convert for_each_note and each of the callbacks to use struct object_id.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Convert several portions of the internals of the code to struct
object_id. Introduce two macros to denote the different constants in
the code: KEY_INDEX for the last byte of the object ID, and
FANOUT_PATH_SEPARATORS for the number of possible path separators (on
Unix, "/"). While these constants are both 19 (one less than the number
of bytes in the hash), distinguish them to make the code more
understandable, and define them logically based on their intended
purpose.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
After a note is removed, note_tree_consolidate is called to eliminate
some useless nodes. The typical case is that if you had an int_node
with 2 PTR_TYPE_NOTEs in it, and remove one of them, then the
PTR_TYPE_INTERNAL pointer in the parent tree can be replaced with the
remaining PTR_TYPE_NOTE.
This works fine when PTR_TYPE_NOTEs are involved, but falls flat when
other types are involved.
To put things in more practical terms, let's say we start from an empty
notes tree, and add 3 notes:
- one for a sha1 that starts with 424
- one for a sha1 that starts with 428
- one for a sha1 that starts with 4c
To keep track of this, note_tree.root will have a PTR_TYPE_INTERNAL at
a[4], pointing to an int_node*.
In turn, that int_node* will have a PTR_TYPE_NOTE at a[0xc], pointing to
the leaf_node* with the key and value, and a PTR_TYPE_INTERNAL at a[2],
pointing to another int_node*.
That other int_node* will have 2 PTR_TYPE_NOTE, one at a[4] and the
other at a[8].
When looking for the note for the sha1 starting with 428, get_note() will
recurse through (simplified) root.a[4].a[2].a[8].
Now, if we remove the note for the sha1 that starts with 4c, we're left
with a int_node* with only one PTR_TYPE_INTERNAL entry in it. After
note_tree_consolidate runs, root.a[4] now points to what used to be
pointed at by root.a[4].a[2].
Which means looking up for the note for the sha1 starting with 428 now
fails because there is nothing at root.a[4].a[2] anymore: there is only
root.a[4].a[4] and root.a[4].a[8], which don't match the expected
structure for the lookup.
So if all there is left in an int_node* is a PTR_TYPE_INTERNAL pointer,
we can't safely remove it. I think the same applies for PTR_TYPE_SUBTREE
pointers. IOW, only PTR_TYPE_NOTEs are safe to be moved to the parent
int_node*.
This doesn't have a practical effect on git because all that happens
after a remove_note is a write_notes_tree, which just iterates the entire
note tree, but this affects anything using libgit.a that would try to do
lookups after removing notes.
Signed-off-by: Mike Hommey <mh@glandium.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There are two types of string_lists: those that own the
string memory, and those that don't. You can tell the
difference by the strdup_strings flag, and one should use
either STRING_LIST_INIT_DUP, or STRING_LIST_INIT_NODUP as an
initializer.
Historically, the normal all-zeros initialization has
corresponded to the NODUP case. Many sites use no
initializer at all, and that works as a shorthand for that
case. But for a reader of the code, it can be hard to
remember which is which. Let's be more explicit and actually
have each site declare which type it means to use.
This is a fairly mechanical conversion; I assumed each site
was correct as-is, and just switched them all to NODUP.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>