git-commit-vandalism/oid-array.c

94 lines
1.8 KiB
C
Raw Normal View History

#include "cache.h"
#include "oid-array.h"
#include "sha1-lookup.h"
void oid_array_append(struct oid_array *array, const struct object_id *oid)
{
ALLOC_GROW(array->oid, array->nr + 1, array->alloc);
oidcpy(&array->oid[array->nr++], oid);
array->sorted = 0;
}
static int void_hashcmp(const void *a, const void *b)
{
return oidcmp(a, b);
}
void oid_array_sort(struct oid_array *array)
{
if (array->sorted)
return;
QSORT(array->oid, array->nr, void_hashcmp);
array->sorted = 1;
}
static const unsigned char *sha1_access(size_t index, void *table)
{
struct object_id *array = table;
return array[index].hash;
}
int oid_array_lookup(struct oid_array *array, const struct object_id *oid)
{
oid_array_sort(array);
return sha1_pos(oid->hash, array->oid, array->nr, sha1_access);
}
void oid_array_clear(struct oid_array *array)
{
FREE_AND_NULL(array->oid);
array->nr = 0;
array->alloc = 0;
array->sorted = 0;
}
get_short_oid: sort ambiguous objects by type, then SHA-1 Change the output emitted when an ambiguous object is encountered so that we show tags first, then commits, followed by trees, and finally blobs. Within each type we show objects in hashcmp() order. Before this change the objects were only ordered by hashcmp(). The reason for doing this is that the output looks better as a result, e.g. the v2.17.0 tag before this change on "git show e8f2" would display: hint: The candidates are: hint: e8f2093055 tree hint: e8f21caf94 commit 2013-06-24 - bash prompt: print unique detached HEAD abbreviated object name hint: e8f21d02f7 blob hint: e8f21d577c blob hint: e8f25a3a50 tree hint: e8f26250fa commit 2017-02-03 - Merge pull request #996 from jeffhostetler/jeffhostetler/register_rename_src hint: e8f2650052 tag v2.17.0 hint: e8f2867228 blob hint: e8f28d537c tree hint: e8f2a35526 blob hint: e8f2bc0c06 commit 2015-05-10 - Documentation: note behavior for multiple remote.url entries hint: e8f2cf6ec0 tree Now we'll instead show: hint: e8f2650052 tag v2.17.0 hint: e8f21caf94 commit 2013-06-24 - bash prompt: print unique detached HEAD abbreviated object name hint: e8f26250fa commit 2017-02-03 - Merge pull request #996 from jeffhostetler/jeffhostetler/register_rename_src hint: e8f2bc0c06 commit 2015-05-10 - Documentation: note behavior for multiple remote.url entries hint: e8f2093055 tree hint: e8f25a3a50 tree hint: e8f28d537c tree hint: e8f2cf6ec0 tree hint: e8f21d02f7 blob hint: e8f21d577c blob hint: e8f2867228 blob hint: e8f2a35526 blob Since we show the commit data in the output that's nicely aligned once we sort by object type. The decision to show tags before commits is pretty arbitrary. I don't want to order by object_type since there tags come last after blobs, which doesn't make sense if we want to show the most important things first. I could display them after commits, but it's much less likely that we'll display a tag, so if there is one it makes sense to show it prominently at the top. A note on the implementation: Derrick rightly pointed out[1] that we're bending over backwards here in get_short_oid() to first de-duplicate the list, and then emit it, but could simply do it in one step. The reason for that is that oid_array_for_each_unique() doesn't actually require that the array be sorted by oid_array_sort(), it just needs to be sorted in some order that guarantees that all objects with the same ID are adjacent to one another, which (barring a hash collision, which'll be someone else's problem) the sort_ambiguous() function does. I agree that would be simpler for this code, and had forgotten why I initially wrote it like this[2]. But on further reflection I think it's better to do more work here just so we're not underhandedly using the oid-array API where we lie about the list being sorted. That would break any subsequent use of oid_array_lookup() in subtle ways. I could get around that by hacking the API itself to support this use-case and documenting it, which I did as a WIP patch in [3], but I think it's too much code smell just for this one call site. It's simpler for the API to just introduce a oid_array_for_each() function to eagerly spew out the list without sorting or de-duplication, and then do the de-duplication and sorting in two passes. 1. https://public-inbox.org/git/20180501130318.58251-1-dstolee@microsoft.com/ 2. https://public-inbox.org/git/876047ze9v.fsf@evledraar.gmail.com/ 3. https://public-inbox.org/git/874ljrzctc.fsf@evledraar.gmail.com/ Helped-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-05-10 14:43:02 +02:00
int oid_array_for_each(struct oid_array *array,
for_each_oid_fn fn,
void *data)
{
size_t i;
get_short_oid: sort ambiguous objects by type, then SHA-1 Change the output emitted when an ambiguous object is encountered so that we show tags first, then commits, followed by trees, and finally blobs. Within each type we show objects in hashcmp() order. Before this change the objects were only ordered by hashcmp(). The reason for doing this is that the output looks better as a result, e.g. the v2.17.0 tag before this change on "git show e8f2" would display: hint: The candidates are: hint: e8f2093055 tree hint: e8f21caf94 commit 2013-06-24 - bash prompt: print unique detached HEAD abbreviated object name hint: e8f21d02f7 blob hint: e8f21d577c blob hint: e8f25a3a50 tree hint: e8f26250fa commit 2017-02-03 - Merge pull request #996 from jeffhostetler/jeffhostetler/register_rename_src hint: e8f2650052 tag v2.17.0 hint: e8f2867228 blob hint: e8f28d537c tree hint: e8f2a35526 blob hint: e8f2bc0c06 commit 2015-05-10 - Documentation: note behavior for multiple remote.url entries hint: e8f2cf6ec0 tree Now we'll instead show: hint: e8f2650052 tag v2.17.0 hint: e8f21caf94 commit 2013-06-24 - bash prompt: print unique detached HEAD abbreviated object name hint: e8f26250fa commit 2017-02-03 - Merge pull request #996 from jeffhostetler/jeffhostetler/register_rename_src hint: e8f2bc0c06 commit 2015-05-10 - Documentation: note behavior for multiple remote.url entries hint: e8f2093055 tree hint: e8f25a3a50 tree hint: e8f28d537c tree hint: e8f2cf6ec0 tree hint: e8f21d02f7 blob hint: e8f21d577c blob hint: e8f2867228 blob hint: e8f2a35526 blob Since we show the commit data in the output that's nicely aligned once we sort by object type. The decision to show tags before commits is pretty arbitrary. I don't want to order by object_type since there tags come last after blobs, which doesn't make sense if we want to show the most important things first. I could display them after commits, but it's much less likely that we'll display a tag, so if there is one it makes sense to show it prominently at the top. A note on the implementation: Derrick rightly pointed out[1] that we're bending over backwards here in get_short_oid() to first de-duplicate the list, and then emit it, but could simply do it in one step. The reason for that is that oid_array_for_each_unique() doesn't actually require that the array be sorted by oid_array_sort(), it just needs to be sorted in some order that guarantees that all objects with the same ID are adjacent to one another, which (barring a hash collision, which'll be someone else's problem) the sort_ambiguous() function does. I agree that would be simpler for this code, and had forgotten why I initially wrote it like this[2]. But on further reflection I think it's better to do more work here just so we're not underhandedly using the oid-array API where we lie about the list being sorted. That would break any subsequent use of oid_array_lookup() in subtle ways. I could get around that by hacking the API itself to support this use-case and documenting it, which I did as a WIP patch in [3], but I think it's too much code smell just for this one call site. It's simpler for the API to just introduce a oid_array_for_each() function to eagerly spew out the list without sorting or de-duplication, and then do the de-duplication and sorting in two passes. 1. https://public-inbox.org/git/20180501130318.58251-1-dstolee@microsoft.com/ 2. https://public-inbox.org/git/876047ze9v.fsf@evledraar.gmail.com/ 3. https://public-inbox.org/git/874ljrzctc.fsf@evledraar.gmail.com/ Helped-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-05-10 14:43:02 +02:00
/* No oid_array_sort() here! See oid-array.h */
get_short_oid: sort ambiguous objects by type, then SHA-1 Change the output emitted when an ambiguous object is encountered so that we show tags first, then commits, followed by trees, and finally blobs. Within each type we show objects in hashcmp() order. Before this change the objects were only ordered by hashcmp(). The reason for doing this is that the output looks better as a result, e.g. the v2.17.0 tag before this change on "git show e8f2" would display: hint: The candidates are: hint: e8f2093055 tree hint: e8f21caf94 commit 2013-06-24 - bash prompt: print unique detached HEAD abbreviated object name hint: e8f21d02f7 blob hint: e8f21d577c blob hint: e8f25a3a50 tree hint: e8f26250fa commit 2017-02-03 - Merge pull request #996 from jeffhostetler/jeffhostetler/register_rename_src hint: e8f2650052 tag v2.17.0 hint: e8f2867228 blob hint: e8f28d537c tree hint: e8f2a35526 blob hint: e8f2bc0c06 commit 2015-05-10 - Documentation: note behavior for multiple remote.url entries hint: e8f2cf6ec0 tree Now we'll instead show: hint: e8f2650052 tag v2.17.0 hint: e8f21caf94 commit 2013-06-24 - bash prompt: print unique detached HEAD abbreviated object name hint: e8f26250fa commit 2017-02-03 - Merge pull request #996 from jeffhostetler/jeffhostetler/register_rename_src hint: e8f2bc0c06 commit 2015-05-10 - Documentation: note behavior for multiple remote.url entries hint: e8f2093055 tree hint: e8f25a3a50 tree hint: e8f28d537c tree hint: e8f2cf6ec0 tree hint: e8f21d02f7 blob hint: e8f21d577c blob hint: e8f2867228 blob hint: e8f2a35526 blob Since we show the commit data in the output that's nicely aligned once we sort by object type. The decision to show tags before commits is pretty arbitrary. I don't want to order by object_type since there tags come last after blobs, which doesn't make sense if we want to show the most important things first. I could display them after commits, but it's much less likely that we'll display a tag, so if there is one it makes sense to show it prominently at the top. A note on the implementation: Derrick rightly pointed out[1] that we're bending over backwards here in get_short_oid() to first de-duplicate the list, and then emit it, but could simply do it in one step. The reason for that is that oid_array_for_each_unique() doesn't actually require that the array be sorted by oid_array_sort(), it just needs to be sorted in some order that guarantees that all objects with the same ID are adjacent to one another, which (barring a hash collision, which'll be someone else's problem) the sort_ambiguous() function does. I agree that would be simpler for this code, and had forgotten why I initially wrote it like this[2]. But on further reflection I think it's better to do more work here just so we're not underhandedly using the oid-array API where we lie about the list being sorted. That would break any subsequent use of oid_array_lookup() in subtle ways. I could get around that by hacking the API itself to support this use-case and documenting it, which I did as a WIP patch in [3], but I think it's too much code smell just for this one call site. It's simpler for the API to just introduce a oid_array_for_each() function to eagerly spew out the list without sorting or de-duplication, and then do the de-duplication and sorting in two passes. 1. https://public-inbox.org/git/20180501130318.58251-1-dstolee@microsoft.com/ 2. https://public-inbox.org/git/876047ze9v.fsf@evledraar.gmail.com/ 3. https://public-inbox.org/git/874ljrzctc.fsf@evledraar.gmail.com/ Helped-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-05-10 14:43:02 +02:00
for (i = 0; i < array->nr; i++) {
int ret = fn(array->oid + i, data);
if (ret)
return ret;
}
return 0;
}
int oid_array_for_each_unique(struct oid_array *array,
for_each_oid_fn fn,
void *data)
{
size_t i;
oid_array_sort(array);
oid-array: provide a for-loop iterator We provide oid_array_for_each_unique() for iterating over the de-duplicated items in an array. But it's awkward to use for two reasons: 1. It uses a callback, which means marshaling arguments into a struct and passing it to the callback with a void parameter. 2. The callback doesn't know the numeric index of the oid we're looking at. This is useful for things like progress meters. Iterating with a for-loop is much more natural for some cases, but the caller has to do the de-duping itself. However, we can provide a small helper to make this easier (see the docstring in the header for an example use). The caller does have to remember to sort the array first. We could add an assertion into the helper that array->sorted is set, but I didn't want to complicate what is otherwise a pretty fast code path. I also considered adding a full iterator type with init/next/end functions (similar to what we have for hashmaps). But it ended up making the callers much harder to read. This version keeps us close to a basic for-loop. Yet another option would be adding an option to sort the array and compact out the duplicates. This would mean iterating over the array an extra time, though that's probably not a big deal (we did just do an O(n log n) sort). But we'd still have to write a for-loop to iterate, so it doesn't really make anything easier for the caller. No new test, since we'll convert the callback iterator (which is covered by t0064, among other callers) to use the new code. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-12-07 20:11:00 +01:00
for (i = 0; i < array->nr; i = oid_array_next_unique(array, i)) {
int ret = fn(array->oid + i, data);
if (ret)
return ret;
}
return 0;
}
void oid_array_filter(struct oid_array *array,
for_each_oid_fn want,
void *cb_data)
{
size_t nr = array->nr, src, dst;
struct object_id *oids = array->oid;
for (src = dst = 0; src < nr; src++) {
if (want(&oids[src], cb_data)) {
if (src != dst)
oidcpy(&oids[dst], &oids[src]);
dst++;
}
}
array->nr = dst;
}