oid-array API
==============

The oid-array API provides storage and manipulation of sets of object
identifiers. The emphasis is on storage and processing efficiency,
which makes these arrays suitable for large lists. Note that some
operations do not preserve the ordering of items.

Data Structures
---------------

`struct oid_array`::

	A single array of object IDs. This should be initialized by
	assignment from `OID_ARRAY_INIT`.  The `oid` member contains
	the actual data. The `nr` member contains the number of items in
	the set.  The `alloc` and `sorted` members are used internally,
	and should not be needed by API callers.
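
The layout described above can be pictured with a small, self-contained
sketch. This is not Git's actual definition: the `sketch_` names, the
20-byte hash size, and the growth policy are assumptions made only for
illustration. It shows how `nr` and `alloc` relate, and why an append
has to reset `sorted`.

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-in for Git's struct object_id; 20 raw bytes is an assumption. */
#define SKETCH_RAWSZ 20
struct sketch_oid {
	unsigned char hash[SKETCH_RAWSZ];
};

/* Mirrors the members described above; not Git's real struct. */
struct sketch_oid_array {
	struct sketch_oid *oid; /* the actual data */
	size_t nr;              /* number of items in use */
	size_t alloc;           /* allocated capacity (internal) */
	int sorted;             /* nonzero once sorted (internal) */
};

/* Hypothetical counterpart of OID_ARRAY_INIT. */
#define SKETCH_OID_ARRAY_INIT { NULL, 0, 0, 0 }

/* Append at the end, doubling the buffer when it is full. */
static int sketch_append(struct sketch_oid_array *a,
			 const struct sketch_oid *oid)
{
	assert(a && oid);
	if (a->nr == a->alloc) {
		size_t n = a->alloc ? a->alloc * 2 : 16;
		struct sketch_oid *p = realloc(a->oid, n * sizeof(*p));

		if (!p)
			return -1; /* out of memory; array is unchanged */
		a->oid = p;
		a->alloc = n;
	}
	a->oid[a->nr++] = *oid;
	a->sorted = 0; /* the new tail item may break the sort order */
	return 0;
}

/* Free the buffer and return to the initial, empty state. */
static void sketch_clear(struct sketch_oid_array *a)
{
	free(a->oid);
	a->oid = NULL;
	a->nr = 0;
	a->alloc = 0;
	a->sorted = 0;
}
```

Callers of the sketch only ever read `oid` and `nr`; `alloc` and
`sorted` are bookkeeping, which matches the advice above to leave them
alone.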

Functions
---------

`oid_array_append`::
	Add an item to the set. The object ID will be placed at the end of
	the array (but note that some operations below may lose this
	ordering).

`oid_array_lookup`::
	Perform a binary search of the array for a specific object ID.
	If found, returns the offset (in number of elements) of the
	object ID. If not found, returns a negative integer. If the array
	is not sorted, this function has the side effect of sorting it.

`oid_array_clear`::
	Free all memory associated with the array and return it to the
	initial, empty state.

`oid_array_for_each`::
	Iterate over each element of the list, executing the callback
	function for each one. Does not sort the list, so any custom
	hash order is retained. If the callback returns a non-zero
	value, the iteration ends immediately and the callback's
	return is propagated; otherwise, 0 is returned.

`oid_array_for_each_unique`::
	Iterate over each unique element of the list in sorted order,
	but otherwise behave like `oid_array_for_each`. If the array
	is not sorted, this function has the side effect of sorting
	it.
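
The sorting side effect shared by `oid_array_lookup` and
`oid_array_for_each_unique` can be illustrated with a simplified model.
The code below is a sketch under stated assumptions, not Git's
implementation: all `sketch_` names are hypothetical, object IDs are
modeled as 20 raw bytes, and a failed lookup simply returns -1 (the
text above only promises a negative integer).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_RAWSZ 20 /* assumed hash size for this sketch */
struct sketch_oid {
	unsigned char hash[SKETCH_RAWSZ];
};

struct sketch_oid_array {
	struct sketch_oid *oid;
	size_t nr;
	int sorted;
};

typedef int (*sketch_each_fn)(const struct sketch_oid *oid, void *data);

static int sketch_cmp(const void *a, const void *b)
{
	const struct sketch_oid *x = a, *y = b;

	return memcmp(x->hash, y->hash, SKETCH_RAWSZ);
}

/* Sort lazily: only when the array is not already known to be sorted. */
static void sketch_sort(struct sketch_oid_array *a)
{
	if (a->sorted)
		return;
	if (a->nr)
		qsort(a->oid, a->nr, sizeof(*a->oid), sketch_cmp);
	a->sorted = 1;
}

/* Binary search; returns the index if found, -1 otherwise. */
static int sketch_lookup(struct sketch_oid_array *a,
			 const struct sketch_oid *key)
{
	size_t lo = 0, hi = a->nr;

	assert(a && key);
	sketch_sort(a); /* side effect: insertion order is lost */
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;
		int cmp = sketch_cmp(key, &a->oid[mid]);

		if (!cmp)
			return (int)mid;
		if (cmp < 0)
			hi = mid;
		else
			lo = mid + 1;
	}
	return -1;
}

/* Sort once, then skip every entry equal to its predecessor. */
static int sketch_for_each_unique(struct sketch_oid_array *a,
				  sketch_each_fn fn, void *data)
{
	size_t i;

	sketch_sort(a);
	for (i = 0; i < a->nr; i++) {
		int ret;

		if (i > 0 && !sketch_cmp(&a->oid[i], &a->oid[i - 1]))
			continue;
		ret = fn(&a->oid[i], data);
		if (ret)
			return ret; /* propagate and stop early */
	}
	return 0;
}

/* Example callback: counts visited entries through the data pointer. */
static int sketch_count(const struct sketch_oid *oid, void *data)
{
	(void)oid;
	++*(int *)data;
	return 0;
}
```

In this model duplicates are adjacent after the sort, so skipping them
costs one comparison per entry. An index obtained before a lookup may no
longer refer to the same object ID afterwards.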

Examples
--------

-----------------------------------------
int print_callback(const struct object_id *oid,
		    void *data)
{
	printf("%s\n", oid_to_hex(oid));
	return 0; /* always continue */
}

void some_func(void)
{
	struct oid_array hashes = OID_ARRAY_INIT;
	struct object_id oid;

	/* Read objects into our set */
	while (read_object_from_stdin(oid.hash))
		oid_array_append(&hashes, &oid);

	/* Check if some objects are in our set */
	while (read_object_from_stdin(oid.hash)) {
		if (oid_array_lookup(&hashes, &oid) >= 0)
			printf("it's in there!\n");
	}

	/*
	 * Print the unique set of objects. We could also have
	 * avoided adding duplicate objects in the first place,
	 * but we would end up re-sorting the array repeatedly.
	 * Instead, this will sort once and then skip duplicates
	 * in linear time.
	 */
	oid_array_for_each_unique(&hashes, print_callback, NULL);

	/* Release the memory held by the array */
	oid_array_clear(&hashes);
}
-----------------------------------------
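
A callback can also stop the iteration early by returning a non-zero
value, with the `data` pointer carrying its context. The following
self-contained sketch models that contract over a plain C array. The
names `sketch_for_each`, `find_ctx` and `stop_at_match` are hypothetical
and are not part of the oid-array API; the 20-byte hash size is likewise
an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_RAWSZ 20 /* assumed hash size for this sketch */
struct sketch_oid {
	unsigned char hash[SKETCH_RAWSZ];
};

typedef int (*sketch_each_fn)(const struct sketch_oid *oid, void *data);

/* Hypothetical stand-in for oid_array_for_each over a plain array. */
static int sketch_for_each(const struct sketch_oid *oids, size_t nr,
			   sketch_each_fn fn, void *data)
{
	size_t i;

	assert(fn);
	for (i = 0; i < nr; i++) {
		int ret = fn(&oids[i], data);

		if (ret)
			return ret; /* propagate the value and stop */
	}
	return 0;
}

/* Context handed to the callback through the data pointer. */
struct find_ctx {
	struct sketch_oid want;
	size_t visited;
};

/* Returns 1 to stop at the first match, 0 to keep going. */
static int stop_at_match(const struct sketch_oid *oid, void *data)
{
	struct find_ctx *ctx = data;

	ctx->visited++;
	return !memcmp(oid->hash, ctx->want.hash, SKETCH_RAWSZ);
}
```

With this shape, a return value of 0 from the iteration means every
element was visited, and any other value is whatever the callback chose
to report.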