git-commit-vandalism/refs
Taylor Blau 52acddf36c string-list: multi-delimiter string_list_split_in_place()
Enhance `string_list_split_in_place()` to accept multiple characters as
delimiters instead of a single character.

Instead of using `strchr(3)` to locate the first occurrence of the given
delimiter character, `string_list_split_in_place()` now uses
`strcspn(3)` to move past the initial segment of characters that are not
in the delimiting set, stopping at the first delimiter (or at the end of
the string).

When only a single delimiting character is provided, `strcspn(3)` (which
also underlies `strpbrk(3)`) has equivalent performance to `strchr(3)`.
Modern `strcspn(3)` implementations treat an empty delimiter set or a
singleton delimiter as a special case and fall back to calling
`strchrnul()`. Both glibc[1] and musl[2] implement `strcspn(3)` this
way.
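
As a rough illustration of the single-delimiter case (this shows only
that the two calls locate the same position, not how they perform), a
minimal sketch in standard C follows. The helper names
`first_delim_offset()` and `first_char_offset()` are hypothetical and
are not part of git:

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: offset of the first byte of `s` that appears in
 * `delims`, or strlen(s) if no such byte exists. strcspn() measures the
 * leading run of bytes that are NOT in `delims`.
 */
static size_t first_delim_offset(const char *s, const char *delims)
{
	return strcspn(s, delims);
}

/*
 * Hypothetical helper: the same offset computed with strchr() for a
 * single delimiter byte `c`, falling back to the end of the string
 * (as strchrnul() would) when `c` is absent.
 */
static size_t first_char_offset(const char *s, char c)
{
	const char *p = strchr(s, c);

	return p ? (size_t)(p - s) : strlen(s);
}
```

For a one-character delimiter set, both helpers return the same offset
whether or not the delimiter occurs in the input.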

This change is one step toward removing `strtok(3)` from the tree. Note
that `string_list_split_in_place()` is not a strict replacement for
`strtok()`, since it will happily turn sequential delimiter characters
into empty entries in the resulting string_list. For example:

    string_list_split_in_place(&xs, "foo:;:bar:;:baz", ":;", -1)

would yield a string list of:

    ["foo", "", "", "bar", "", "", "baz"]

Callers that wish to emulate the behavior of `strtok(3)` more directly
should call `string_list_remove_empty_items()` after splitting.
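
The splitting behavior described above can be sketched in standard C
without git's string-list API. This is only an approximation under
stated assumptions: `split_in_place()` and `remove_empty()` are
hypothetical stand-ins for `string_list_split_in_place()` and
`string_list_remove_empty_items()`, they store pieces in a
caller-provided array instead of a string_list, and there is no
maxsplit handling:

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for string_list_split_in_place(): split `s` on
 * any byte in `delims`, overwriting each delimiter with NUL, and store
 * pointers to the pieces in `out`. At most `cap` pieces are stored; any
 * remaining input is ignored. Returns the number of pieces stored.
 */
static size_t split_in_place(char *s, const char *delims,
			     char **out, size_t cap)
{
	size_t n = 0;

	while (n < cap) {
		/* Skip the leading run of bytes that are not delimiters. */
		char *end = s + strcspn(s, delims);

		out[n++] = s;
		if (!*end)
			break;		/* hit the terminating NUL: last piece */
		*end = '\0';		/* cut the piece at the delimiter */
		s = end + 1;		/* adjacent delimiters give "" pieces */
	}
	return n;
}

/*
 * Hypothetical stand-in for string_list_remove_empty_items(): compact
 * `items` so that only non-empty strings remain. Returns the new count.
 */
static size_t remove_empty(char **items, size_t n)
{
	size_t src, dst = 0;

	for (src = 0; src < n; src++)
		if (*items[src])
			items[dst++] = items[src];
	return dst;
}
```

Because the split writes NUL bytes into its input, the buffer must be
writable (for instance `char buf[] = "foo:;:bar:;:baz";`), not a string
literal. Splitting that buffer on ":;" stores seven pieces, and
compacting them with `remove_empty()` leaves "foo", "bar" and "baz",
which is what a `strtok()` loop over the same input would produce.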

To avoid regressions for the new multi-character delimiter cases, update
t0063 in this patch as well.

[1]: https://sourceware.org/git/?p=glibc.git;a=blob;f=string/strcspn.c;hb=glibc-2.37#l35
[2]: https://git.musl-libc.org/cgit/musl/tree/src/string/strcspn.c?h=v1.2.3#n11

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-24 16:01:28 -07:00
debug.c cache.h: remove dependence on hex.h; make other files include it explicitly 2023-02-23 17:25:29 -08:00
files-backend.c write-or-die.h: move declarations for write-or-die.c functions from cache.h 2023-03-21 10:56:54 -07:00
iterator.c treewide: remove unnecessary cache.h inclusion from several sources 2023-03-21 10:56:51 -07:00
packed-backend.c string-list: multi-delimiter string_list_split_in_place() 2023-04-24 16:01:28 -07:00
packed-backend.h Revert "Merge branch 'ps/avoid-unnecessary-hook-invocation-with-packed-refs'" 2022-04-13 15:51:33 -07:00
ref-cache.c alloc.h: move ALLOC_GROW() functions from cache.h 2023-02-23 17:25:28 -08:00
ref-cache.h Merge branch 'jt/no-abuse-alternate-odb-for-submodules' 2021-10-25 16:06:56 -07:00
refs-internal.h treewide: remove unnecessary cache.h includes 2023-02-23 17:25:28 -08:00