Split up the return code for "header too long" from the generic
negative return value unpack_loose_header() returns, and report via
error() if we exceed MAX_HEADER_LEN.
As a test added earlier in this series in t1006-cat-file.sh shows,
we already correctly emit zlib errors from zlib.c in this case, so
we have no need to carry those return codes further down the
stack. Let's instead just return ULHR_TOO_LONG saying we ran into the
MAX_HEADER_LEN limit, or other negative values for "unable to unpack
<OID> header".
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In a preceding commit we changed and documented unpack_loose_header(),
narrowing it from its previous behavior of returning any negative
value or zero to returning only -1 or 0.
Let's add an "enum unpack_loose_header_result" type and use it for
these return values, and have the compiler assert that we're
exhaustively covering all of them.
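A sketch of the new type as of this commit (a later commit in this
series, shown above, splits out ULHR_TOO_LONG; the member names here
assume the same ULHR_* prefix):

    enum unpack_loose_header_result {
            ULHR_OK,
            ULHR_BAD,
    };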
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Combine the unpack_loose_short_header(),
unpack_loose_header_to_strbuf() and unpack_loose_header() functions
into one.
The unpack_loose_header_to_strbuf() function was added in
46f034483e (sha1_file: support reading from a loose object of unknown
type, 2015-05-03).
Its code was mostly copy/pasted between it and both of
unpack_loose_header() and unpack_loose_short_header(). We now have a
single unpack_loose_header() function which accepts an optional
"struct strbuf *" instead.
I think the remaining unpack_loose_header() function could be further
simplified: we're carrying some complexity just to be able to emit a
garbage type longer than MAX_HEADER_LEN. We could alternatively just
say "we found a garbage type <first 32 bytes>..." instead, but let's
leave the current behavior in place for now.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Make the parse_loose_header_extended() function public and remove the
parse_loose_header() wrapper. The only direct user of it outside of
object-file.c itself was in streaming.c; that caller can simply pass
the required "struct object_info *" instead.
This change is being done in preparation for teaching
read_loose_object() to accept a flag to pass to
parse_loose_header(). It isn't strictly necessary for that change, we
could simply use parse_loose_header_extended() there, but will leave
the API in a better end state.
It would be a better end-state to have already moved the declaration
of these functions to object-store.h to avoid the forward declaration
of "struct object_info" in cache.h, but let's leave that cleanup for
some other time.[1]
1. https://lore.kernel.org/git/patch-v6-09.22-5b9278e7bb4-20210907T104559Z-avarab@gmail.com/
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Return a -1 when git_inflate() fails instead of whatever Z_* status
we'd get from zlib.c. This makes no difference to any error we report,
but makes it more obvious that we don't care about the specific zlib
error codes here.
See d21f842690 (unpack_sha1_header(): detect malformed object header,
2016-09-25) for the commit that added the "return status" code. As far
as I can tell there was never a real reason (e.g. different reporting)
for carrying down the "status" as opposed to "-1".
At the time that d21f842690 was written there was a corresponding
"ret < Z_OK" check right after the unpack_sha1_header() call (the
"unpack_sha1_header()" function was later rename to our current
"unpack_loose_header()").
However, that check was removed in c84a1f3ed4 (sha1_file: refactor
read_object, 2017-06-21) without changing the corresponding return
code.
So let's do the minor cleanup of also changing this function to return
a -1.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When the loose_object_info() function returns an error, stop faking
up "oi->typep" as OBJ_BAD. Let the return value of the function
itself suffice. This code cleanup simplifies subsequent changes.
That we set this at all is a relic from the past. Before
052fe5eaca (sha1_loose_object_info: make type lookup optional,
2013-07-12) we would always return the type_from_string(type) via the
parse_sha1_header() function, or -1 (i.e. OBJ_BAD) if we couldn't
parse it.
Then in a combination of 46f034483e (sha1_file: support reading from
a loose object of unknown type, 2015-05-03) and
b3ea7dd32d (sha1_loose_object_info: handle errors from
unpack_sha1_rest, 2017-10-05) our API drifted even further towards
conflating the two again.
Having read the code paths involved carefully I think this is OK. We
are just about to return -1, and we have only one caller:
do_oid_object_info_extended(). That function will in turn go on to
return -1 when we return -1 here.
This might be introducing a subtle bug where a caller of
oid_object_info_extended() would inspect its "typep" and expect a
meaningful value if the function returned -1.
Such a problem would not occur for its simpler sister function
oid_object_info(). That one always returns the "enum object_type",
which in the error case would be OBJ_BAD (i.e. -1).
Having read the code for all the callers of these functions I don't
believe any such bug is being introduced here, and in any case we'd
likely already have such a bug for the "sizep" member (although
blindly checking "typep" first would be a more common case).
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add more tests for the current --allow-unknown-type behavior. As noted
in [1] I don't think much of this makes sense, but let's test for it
as-is so we can see if the behavior changes in the future.
1. https://lore.kernel.org/git/87r1i4qf4h.fsf@evledraar.gmail.com/
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Fix a blind spot in the tests for "cat-file" (and by proxy, the guts of
object-file.c) by testing that when we can't decode a loose object
with zlib we'll emit an error from zlib.c.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When we look up a missing object with cat_one_file(), what error we
print out currently depends on whether we'll error out early in
get_oid_with_context(), or whether we'll get an error later from
oid_object_info_extended().
The --allow-unknown-type flag then changes whether we pass the
"OBJECT_INFO_ALLOW_UNKNOWN_TYPE" flag to get_oid_with_context() or
not.
The "-p" flag is yet another special-case in printing the same output
on the deadbeef OID as we'd emit on the deadbeef_short OID for the
"-s" and "-t" options, it also doesn't support the
"--allow-unknown-type" flag at all.
Let's test the combination of the two sets of [-t, -s, -p] and
[--{no-}allow-unknown-type] (the --no-allow-unknown-type is implicit
in not supplying it), as well as a [missing,bogus] object pair.
This extends tests added in 3e370f9faf (t1006: add tests for git
cat-file --allow-unknown-type, 2015-05-03).
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Change the short/long bogus object type variables into a form
where the two sets can be used concurrently. This'll be used by
subsequently added tests.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There weren't any output tests for this scenario; let's ensure that we
don't regress on it in the changes that come after this.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If we move an object around between .git/objects/?? directories to
simulate a hash mismatch, "git fsck" will currently hard die() in
object-file.c. This behavior will be fixed in subsequent commits, but
let's test for it as-is for now.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Refactor one of the fsck tests to use a throwaway repository. It's a
pervasive pattern in t1450-fsck.sh to spend a lot of effort on the
teardown of a test so we're not leaving corrupt content for the next
test.
We can instead use the pattern of creating a named sub-repository;
then we don't have to worry about cleaning up after ourselves, as
nobody will care what state the broken "hash-mismatch" repository is
in after this test runs.
See [1] for related discussion on various "modern" test patterns that
can be used to avoid verbosity and increase reliability.
1. https://lore.kernel.org/git/87y27veeyj.fsf@evledraar.gmail.com/
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Fix a blind spot in the fsck tests by checking what we do when we
encounter an unknown "garbage" type produced with hash-object's
--literally option.
This behavior needs to be improved, which'll be done in subsequent
patches, but for now let's test for the current behavior.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This function was removed in ad0fb65999 (repo-settings: parse
core.untrackedCache, 2019-08-13), but not its corresponding *.h entry.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The init_log_tree_opt() and log_tree_opt_parse() functions were
removed in cd2bdc5309 (Common option parsing for "git log --diff" and
friends, 2006-04-14), but not their corresponding *.h declaration.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This function was removed in 0579f91dd7 (grep: enable threading with
-p and -W using lazy attribute lookup, 2011-12-12), but not its
corresponding *.h declaration.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The cmd_tar_tree() function itself was removed in
925ceccf05 (tar-tree: remove deprecated command, 2013-11-10).
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In the preceding commit we introduced REF_STATES_INIT, but did not
give the "struct show_info" a corresponding initializer. Let's do
that, and make it use "REF_STATES_INIT" and "STRING_LIST_INIT_DUP";
doing that requires changing "list" and "states" away from being
pointers.
The resulting end-state is simpler since we omit the local "info_list"
and "states" variables in show() as well as the memset().
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Use a new REF_STATES_INIT designated initializer instead of assigning
to the "strdup_strings" member of the previously memzero()'d version
of this struct.
The pattern of assigning to "strdup_strings" dates back to
211c89682e (Make git-remote a builtin, 2008-02-29) (when it was
"strdup_paths"), i.e. long before we used anything like our current
established *_INIT patterns consistently.
Then in e61e0cc6b7 (builtin-remote: teach show to display remote
HEAD, 2009-02-25) and e5dcbfd9ab (builtin-remote: new show output
style for push refspecs, 2009-02-25) we added some more of these.
As it turns out we only initialized this struct three times, all the
other uses were of pointers to those initialized structs. So let's
initialize it in those three places, skip the memset(), and pass those
structs down appropriately.
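A sketch of the pattern change in one of those three places
(illustrative; the actual struct has several string_list members):

    /* before: zero everything, then fix up one member */
    struct ref_states states;
    memset(&states, 0, sizeof(states));
    states.new_refs.strdup_strings = 1;

    /* after: designated initialization up-front */
    struct ref_states states = REF_STATES_INIT;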
This would be a behavior change if we had codepaths that relied, say,
on implicitly having had "new_refs" initialized to
STRING_LIST_INIT_NODUP with the memset(), but only set the
"strdup_strings" on some other struct, and then called
string_list_append() on "new_refs". There isn't any such codepath;
all of the late assignments to "strdup_strings" assigned to those
structs that we'd use for those codepaths.
So just initializing them all up-front makes for easier-to-understand
code; i.e. in the pre-image it looked as though we had that tricky
edge case, but we didn't.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Change the initialization pattern of "struct urlmatch_config" to use
an *_INIT macro and designated initializers. Right now there's no
other "struct" member of "struct urlmatch_config" which would require
its own *_INIT, but it's good practice not to assume that. Let's also
change this to a designated initializer while we're at it.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The bottom-up mergesort implementation needs to skip through sublists a
lot. A recursive version could avoid that, but would require log2(n)
stack frames. Explicitly manage a stack of sorted sublists of various
lengths instead to avoid fast-forwarding while also keeping a lid on
memory usage.
While this patch was developed independently, a ranks stack is also used
in https://github.com/mono/mono/blob/master/mono/eglib/sort.frag.h by
the Mono project.
The idea is to keep slots for log2(n_max) sorted sublists, one for each
power of 2. Such a construct can accommodate lists of any length up to
n_max. Since there is a known maximum number of items (effectively
SIZE_MAX), we can preallocate the whole rank stack.
We add items one by one, which is akin to incrementing a binary number.
Make use of that by keeping track of the number of items and checking
bits in it, instead of checking for NULL in the rank stack, to
determine whether a sublist of a certain rank exists; this avoids
memory accesses.
The first item can go into the empty first slot as a sublist of length
2^0. The second one needs to be merged with the previous sublist and
the result goes into the empty second slot as a sublist of length 2^1.
The third one goes into the vacated first slot, and so on. At the end we
merge all the sublists to get the result.
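A minimal self-contained sketch of the scheme on a bare int list
(illustrative only; names, types and details differ from the actual
mergesort.c code):

    struct item { struct item *next; int value; };

    /* Stable merge: on ties, prefer 'a', the list of earlier items. */
    static struct item *merge(struct item *a, struct item *b)
    {
            struct item *head, **tail = &head;
            while (a && b) {
                    if (a->value <= b->value) {
                            *tail = a; tail = &a->next; a = a->next;
                    } else {
                            *tail = b; tail = &b->next; b = b->next;
                    }
            }
            *tail = a ? a : b;
            return head;
    }

    static struct item *sort(struct item *list)
    {
            /*
             * ranks[i] holds a sorted sublist of 2^i items, but only
             * if bit i of 'n' is set -- we check bits, not NULL.
             */
            struct item *ranks[64];
            size_t n = 0;
            int i;

            while (list) {
                    struct item *next = list->next;
                    list->next = NULL;
                    /* adding an item is like incrementing a binary counter */
                    for (i = 0; n & ((size_t)1 << i); i++)
                            list = merge(ranks[i], list);
                    ranks[i] = list;
                    n++;
                    list = next;
            }
            /*
             * Final pass: merge the remaining sublists, lowest rank
             * (most recent items) first, to preserve stability.
             */
            list = NULL;
            for (i = 0; n; i++, n >>= 1)
                    if (n & 1)
                            list = list ? merge(ranks[i], list) : ranks[i];
            return list;
    }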
The new version still performs a stable sort by making sure to put items
seen earlier first when the compare function indicates equality. That's
done by preferring items from sublists with a higher rank.
The new merge function also tries to minimize the number of operations.
Like blame.c::blame_merge(), the function doesn't set the next pointer
if it already points to the right item, and it exits when it reaches the
end of one of the two sublists that it's given. The old code couldn't
do the latter because it kept all items in a single list.
The number of comparisons stays the same, though. Here's example output
of "test-tool mergesort test" for the rand distributions with the most
number of comparisons with the ranks stack:
$ t/helper/test-tool mergesort test | awk '
NR > 1 && $1 != "rand" {next}
$7 > max[$3] {max[$3] = $7; line[$3] = $0}
END {for (n in line) print line[n]}
'
distribut mode n m get_next set_next compare verdict
rand copy 100 32 669 420 569 OK
rand dither 1023 64 9997 5396 8974 OK
rand dither 1024 512 10007 6159 8983 OK
rand dither 1025 256 10993 5988 9968 OK
Here are the differences to the results without this patch:
distribut mode n m get_next set_next compare
rand copy 100 32 -515 -280 0
rand dither 1023 64 -6376 -4834 0
rand dither 1024 512 -6377 -4081 0
rand dither 1025 256 -7461 -5287 0
The numbers of get_next and set_next calls are reduced significantly.
NB: These winners are different from the ones shown in the patch that
introduced the unriffle mode because the addition of the unriffle_skewed
mode in between changed the consumption of rand() values.
Here are the distributions with the most comparisons overall with the
ranks stack:
$ t/helper/test-tool mergesort test | awk '
$7 > max[$3] {max[$3] = $7; line[$3] = $0}
END {for (n in line) print line[n]}
'
distribut mode n m get_next set_next compare verdict
sawtooth unriffle_skewed 100 128 689 632 589 OK
sawtooth unriffle_skewed 1023 1024 10230 10220 9207 OK
sawtooth unriffle 1024 1024 10241 10240 9217 OK
sawtooth unriffle_skewed 1025 2048 11266 10242 10241 OK
And here the differences to before:
distribut mode n m get_next set_next compare
sawtooth unriffle_skewed 100 128 -495 -68 0
sawtooth unriffle_skewed 1023 1024 -6143 -10 0
sawtooth unriffle 1024 1024 -6143 0 0
sawtooth unriffle_skewed 1025 2048 -7188 -1033 0
We get a similar reduction of get_next calls here, but only a slight
reduction of set_next calls, if at all.
And here are the results of p0071-sort.sh before:
0071.12: llist_mergesort() unsorted 0.36(0.33+0.01)
0071.14: llist_mergesort() sorted 0.15(0.13+0.01)
0071.16: llist_mergesort() reversed 0.16(0.14+0.01)
... and here the ones with this patch:
0071.12: llist_mergesort() unsorted 0.24(0.22+0.01)
0071.14: llist_mergesort() sorted 0.12(0.10+0.01)
0071.16: llist_mergesort() reversed 0.12(0.10+0.01)
NB: We can't use t/perf/run to compare revisions in one run because it
uses the test-tool from the worktree, not from the revisions being
tested.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Check if sorting takes advantage of already sorted or reversed content,
or if that corner case actually decreases performance, like it would for
a simplistic quicksort implementation.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a mode that turns a sorted list into adversarial input for a
bottom-up mergesort implementation that doubles the length of sorted
sublists at each level -- like our llist_mergesort().
While unriffle mode splits the list in half at each recursion step,
unriffle_skewed splits it into 2^l items and the rest, with 2^l being
the highest power of two smaller than the number of items and thus
2^l >= rest. The rest is unriffled with the tail of the first half to
require a merge to compare the maximum number of elements.
It complements the unriffle mode, which targets balanced merges. If
the number of elements is a power of two then both actually produce the
same result, as 2^l == rest == n/2 at each recursion step in that case.
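A sketch of the split-point computation (a hypothetical helper, not
the test-helper's actual code):

    /* highest power of two smaller than n, for n >= 2 */
    static size_t skewed_split(size_t n)
    {
            size_t p = 1;
            while (p * 2 < n)
                    p *= 2;
            return p;       /* 512 for n=1024; 1024 for n=1025 */
    }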
Here are the results:
$ t/helper/test-tool mergesort test | awk '
$7 > max[$3] {max[$3] = $7; line[$3] = $0}
END {for (n in line) print line[n]}
'
distribut mode n m get_next set_next compare verdict
sawtooth unriffle_skewed 100 128 1184 700 589 OK
sawtooth unriffle_skewed 1023 1024 16373 10230 9207 OK
sawtooth unriffle 1024 1024 16384 10240 9217 OK
sawtooth unriffle_skewed 1025 2048 18454 11275 10241 OK
The sawtooth distribution with m>=n produces a sorted list and
unriffle_skewed mode turns it into adversarial input for unbalanced
merges, which it wins in all cases except for n=1024 -- the resulting
list is the same, but unriffle is tested before unriffle_skewed, so its
result is selected by the AWK script.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a mode that turns sorted items into adversarial input for mergesort.
Do that by running mergesort in reverse and rearranging the items in
such a way that each merge needs the maximum number of operations to
undo it.
Riffling is a card shuffling technique that involves splitting a deck
into two halves and then interleaving them. A perfect riffle takes
one card from each half in turn. That's similar to the most expensive
merge, which has to take one item from each sublist in turn and thus
requires the maximum number of comparisons (n-1).
So unriffle does that in reverse, i.e. it generates the first sublist
out of the items at even indexes and the second sublist out of the items
at odd indexes, without changing their order in any other way. Done
recursively until we reach the trivial sublist length of one, this
twists the list into an order that requires the maximum effort for
mergesort to untangle.
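A minimal sketch of that recursion on an array (illustrative; the
test helper works on its own item representation):

    #include <string.h>

    /* tmp must have room for n ints */
    static void unriffle(int *arr, size_t n, int *tmp)
    {
            size_t i, j = 0;
            if (n < 2)
                    return;
            for (i = 0; i < n; i += 2)
                    tmp[j++] = arr[i];      /* first sublist: even indexes */
            for (i = 1; i < n; i += 2)
                    tmp[j++] = arr[i];      /* second sublist: odd indexes */
            memcpy(arr, tmp, n * sizeof(*arr));
            unriffle(arr, (n + 1) / 2, tmp);        /* recurse on each half */
            unriffle(arr + (n + 1) / 2, n / 2, tmp);
    }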
As a baseline, here are the rand distributions with the highest number
of comparisons from "test-tool mergesort test":
$ t/helper/test-tool mergesort test | awk '
NR > 1 && $1 != "rand" {next}
$7 > max[$3] {max[$3] = $7; line[$3] = $0}
END {for (n in line) print line[n]}
'
distribut mode n m get_next set_next compare verdict
rand copy 100 32 1184 700 569 OK
rand reverse_1st_half 1023 256 16373 10230 8976 OK
rand reverse_1st_half 1024 512 16384 10240 8993 OK
rand dither 1025 64 18454 11275 9970 OK
And here are the most expensive ones overall:
$ t/helper/test-tool mergesort test | awk '
$7 > max[$3] {max[$3] = $7; line[$3] = $0}
END {for (n in line) print line[n]}
'
distribut mode n m get_next set_next compare verdict
stagger reverse 100 64 1184 700 580 OK
sawtooth unriffle 1023 1024 16373 10230 9179 OK
sawtooth unriffle 1024 1024 16384 10240 9217 OK
stagger unriffle 1025 2048 18454 11275 10241 OK
The sawtooth distribution with m>=n generates a sorted list. The
unriffle mode is designed to turn that into adversarial input for
mergesort, and that checks out for n=1023 and n=1024, where it produces
the list that requires the most comparisons.
Item counts that are not powers of two have other winners, and that's
because unriffle recursively splits lists into equal-sized halves, while
llist_mergesort() splits them into the biggest power of two smaller than
n and the rest, e.g. for n=1025 it sorts the first 1024 separately and
finally merges them to the last item.
So unriffle mode works as designed for the intended use case, but to
consistently generate adversarial input for unbalanced merges we need
something else.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a subcommand for printing test data. It can be used to generate
special test cases and feed them into the sort subcommand or sort(1) for
performance measurements. It may also be useful to illustrate the
effect of distributions, modes and their parameters.
It generates n integers with the specified distribution and its
distribution-specific parameter m. E.g. m is the maximum value for
the plateau distribution and the length and height of individual teeth
of the sawtooth distribution.
The generated values are printed as zero-padded eight-digit hexadecimal
numbers to make sure alphabetic and numeric order are the same.
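E.g. a sketch of the formatting (not necessarily the helper's exact
code):

    printf("%08x\n", value);    /* 2 -> "00000002", 10 -> "0000000a" */

so "0000000a" sorts after "00000002" both alphabetically and
numerically.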
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Adapt the qsort certification program from "Engineering a Sort Function"
by Bentley and McIlroy for testing our linked list sort function. It
generates several lists with various distribution patterns and counts
the number of operations llist_mergesort() needs to order them. It
compares the result to the output of a trusted sort function (qsort(3))
and also checks if the sort is stable.
Also add a test script that makes use of the new subcommand.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Give the code for sorting a text file its own sub-command. This allows
extending the helper, which we'll do in the following patches.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Strip line ending characters to make sure empty lines are sorted like
sort(1) does.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `buf` strbuf is reused again later in the same function, so there
is no benefit to calling strbuf_release(). The subsequent usage is
already using strbuf_reset() to reset the buffer, so releasing it
early is only going to lead to a wasteful reallocation.
Remove the early call to strbuf_release(). The same strbuf is already
cleaned up in the "finish:" section so nothing is leaked, either.
Signed-off-by: David Aguilar <davvid@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add helper functions to handle the unlinking and writing of the
dir-diff submodule and symlink stand-in files.
Use the helpers to implement the guts of the hashmap loops. This
eliminates duplicate code and safeguards the submodules hashmap loop
against the symlink-chasing behavior that 5bafb3576a (difftool: fix
symlink-file writing in dir-diff mode, 2021-09-22) addressed.
The submodules loop should not strictly require the unlink() call
that this introduces, but it does not hurt either, beyond the cost of
the extra unlink().
Signed-off-by: David Aguilar <davvid@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The paths generated by difftool are passed to user-facing diff tools.
Using paths with repeated slashes in them is a cosmetic blemish that
is exposed to users and can be avoided.
Use a strbuf to create the buffer used for the dir-diff tmpdir.
Strip trailing slashes from the value read from TMPDIR to avoid
repeated slashes in the generated paths.
Adjust the error handling to avoid leaking strbufs and to avoid
returning -1 to cmd_main().
Signed-off-by: David Aguilar <davvid@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
These changes are made in preparation for the colorization support
for the "git log" options that rely on regex functionality
(i.e. "--author", "--committer" and "--grep"). These changes are
necessary primarily because match_one_pattern() expects header lines
to be prefixed; however, in pretty, the prefixes are stripped from
the lines because the name-email pairs need to go through additional
parsing before they can be printed, and because next_match() doesn't
handle the case of "ctx == GREP_CONTEXT_HEAD" at all. So, teach
next_match() how to handle the new case and move
match_one_pattern()'s core logic to headerless_match_one_pattern(),
while preserving match_one_pattern()'s uses that depend on the
additional processing.
Signed-off-by: Hamza Mahfooz <someguy@effective-light.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In some circumstances, "git grep --textconv --recurse-submodules"
ignores the textconv attributes from the submodules and erroneously
applies the attributes defined in the superproject on the submodules'
files. The textconv cache is also saved on the superproject, even for
submodule objects.
A fix for these problems will probably require at least three changes:
- Some textconv and attributes functions (as well as their callees) will
have to be adjusted to work with arbitrary repositories. Note that
"fill_textconv()", for example, already receives a "struct repository"
but it writes the textconv cache using "write_loose_object()", which
implicitly works on "the_repository".
- grep.c functions will have to call textconv/userdiff routines passing
the "repo" field from "struct grep_source" instead of the one from
"struct grep_opt". The latter always points to "the_repository" on
"git grep" executions (see its initialization in builtin/grep.c), but
the former points to the correct repository that each source (an
object, file, or buffer) comes from.
- "userdiff_find_by_path()" might need to use a different attributes
stack for each repository it works on or reset its internal static
stack when the repository is changed throughout the calls.
For now, let's add some tests to demonstrate these problems, and also
update a NEEDSWORK comment in grep.h that mentions this bug to reference
the added tests.
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When repacking into a geometric series and writing a multi-pack bitmap,
it is beneficial to have the largest resulting pack be the preferred
object source in the bitmap's MIDX, since selecting the large packs can
lead to fewer broken delta chains and better compression.
Teach 'git repack' to identify this pack and pass it to the MIDX write
machinery in order to mark it as preferred.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Teach `git repack` a new `--write-midx` option for callers that wish to
persist a multi-pack index in their repository while repacking.
There are two existing alternatives to this new flag, but they don't
cover our particular use-case. These alternatives are:
- Call 'git multi-pack-index write' after running 'git repack', or
- Set 'GIT_TEST_MULTI_PACK_INDEX=1' in your environment when running
'git repack'.
The former works, but introduces a gap in bitmap coverage between
repacking and writing a new MIDX (since the repack may have deleted a
pack included in the existing MIDX, invalidating it altogether).
Setting the 'GIT_TEST_' environment variable is obviously unsupported.
In fact, even if it were supported officially, it still wouldn't work,
because it generates the MIDX *after* redundant packs have been dropped,
leading to the same issue as above.
Introduce a new option which eliminates this race by teaching `git
repack` to generate the MIDX at the critical point: after the new packs
have been written and moved into place, but before the redundant packs
have been removed.
This option is compatible with `git repack`'s '--bitmap' option (it
changes the interpretation to be: "write a bitmap corresponding to the
MIDX after one has been generated").
There is a little bit of additional noise in the patch below to avoid
repeating ourselves when selecting which packs to delete. Instead of a
single loop as before (where we iterate over 'existing_packs', decide if
a pack is worth deleting, and if so, delete it), we have two loops (the
first where we decide which ones are worth deleting, and the second
where we actually do the deleting). This makes it so we have a single
check we can make consistently when (1) telling the MIDX which packs we
want to exclude, and (2) actually unlinking the redundant packs.
There is also a tiny change to short-circuit the body of
write_midx_included_packs() when no packs remain in the case of an empty
repository. The MIDX code does not handle this, so avoid trying to
generate a MIDX covering zero packs in the first place.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We only ask whether stderr is a tty before calling
'prune_packed_objects()', but the subsequent patch will add another use.
Extract this check into a variable so that both can use it without
having to call 'isatty()' twice.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The new variable `existing_kept_packs` (and corresponding parameter
`fname_kept_list`) added by the previous patch make it seem as if
they are subsets of `existing_packs` and `fname_list`, respectively.
In reality, each pair is disjoint: one stores the packs without .keep
files, and the other stores the packs with .keep files. Rename each to
more clearly reflect this.
Suggested-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In order to be able to write a multi-pack index during repacking, `git
repack` must keep track of which packs it wants to write into the MIDX.
This set is the union of existing packs which will not be deleted,
new pack(s) generated as a result of the repack, and .keep packs.
Prior to this patch, `git repack` populated the list of existing
packs only when repacking all-into-one (i.e., with `-A` or `-a`), but
we will soon need to know this list when writing a MIDX even when not
repacking all-into-one.
Populate the list of existing packs unconditionally, and guard
removing packs from that list only when repacking all-into-one.
Additionally, keep track of filenames of kept packs separately, since
this, too, will be used in an upcoming patch.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
To figure out which commits we can write a bitmap for, the multi-pack
index/bitmap code does a reachability traversal, marking any commit
which can be found in the MIDX as eligible to receive a bitmap.
This approach will cause a problem when multi-pack bitmaps are able to
be generated from `git repack`, since the reference tips can change
during the repack. Even though we ignore commits that don't exist in
the MIDX (when doing a scan of the ref tips), it's possible that a
commit in the MIDX reaches something that isn't.
This can happen when a multi-pack index contains some pack which refers
to loose objects (e.g., if a pack was pushed after starting the repack
but before generating the MIDX which depends on an object which is
stored as loose in the repository, and by definition isn't included in
the multi-pack index).
By taking a snapshot of the references before we start repacking, we can
close that race window. In the above scenario (where we have a packed
object pointing at a loose one), we'll either (a) take a snapshot of the
references before seeing the packed one, or (b) take it after, at which
point we can guarantee that the loose object will be packed and included
in the MIDX.
This patch does just that. It writes a temporary "reference snapshot",
which is a list of OIDs that are at the ref tips before writing a
multi-pack bitmap. References that are "preferred" (i.e., are a suffix
of at least one value of the 'pack.preferBitmapTips' configuration) are
marked with a special '+'.
The format is simple: one line per commit at each tip, with an optional
'+' at the beginning (for preferred references, as described above).
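For example, a snapshot with one preferred and one ordinary tip would
look like this (placeholder OIDs):

    +<oid at the tip of the preferred ref>
    <oid at the tip of another ref>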
When provided, the reference snapshot is used to drive bitmap selection
instead of the MIDX code doing its own traversal. When it isn't
provided, the usual traversal takes place instead.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
To power a new `--write-midx` mode, `git repack` will want to write a
multi-pack index containing a certain set of packs in the repository.
This new option will be used by `git repack` to write a MIDX which
contains only the packs which will survive after the repack (that is, it
will exclude any packs which are about to be deleted).
This patch effectively exposes the function implemented in the previous
commit via the `git multi-pack-index` builtin. An alternative approach
would have been to call that function from the `git repack` builtin
directly, but this introduces awkward problems around closing and
reopening the object store, so the MIDX will be written out-of-process.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Expose a variant of the write_midx_file() function which ignores packs
that aren't included in an explicit "allow" list.
This will be used in an upcoming patch to power a new `--stdin-packs`
mode of `git multi-pack-index write` for callers that only want to
include certain packs in a MIDX (and ignore any packs which may have
happened to enter the repository independently, e.g., from pushes).
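A sketch of the shape of such a variant (the name and exact
parameters here are assumptions for illustration, not necessarily
what these patches end up with):

    int write_midx_file_only(const char *object_dir,
                             struct string_list *packs_to_include,
                             const char *preferred_pack_name,
                             unsigned flags);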
Those patches will provide test coverage for this new function.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
6a8cbc41ba (developer: enable pedantic by default, 2021-09-03)
enables pedantic mode in as many compilers as possible to help gather
feedback on future tightening, so let's do so.
-Wpedantic is missing in some really old gcc 4 versions, so let's
restrict it to gcc5 and clang4 (it does work in clang3 AFAIK, but it
is unlikely that a developer will use such an old compiler anyway).
MinGW gcc is the only one which has -Wno-pedantic-ms-format, and while
that is available also in older compilers, the Windows SDK provides
gcc10, so let's aim for that.
Note that in order to target the flag to only Windows, additional
changes were needed in config.mak.uname to propagate the OS
detection; this also involved some minor refactoring, but is
functionally equivalent.
Helped-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Fix a bad landmine of a bug which has been with us ever since
PARSE_OPT_SHELL_EVAL was added in 47e9cd28f8 (parseopt: wrap
rev-parse --parseopt usage for eval consumption, 2010-06-12).
It's an argument to parse_options() and should therefore be in "enum
parse_opt_flags", but it was added to the per-option "enum
parse_opt_option_flags" by mistake.
Therefore, as soon as we'd have an enum member in the former that
reached its value of "1 << 8" we'd run into a seemingly bizarre bug
where that new option would turn on the unrelated PARSE_OPT_SHELL_EVAL
in "git rev-parse --parseopt" by proxy.
I manually checked that no other enum members suffered from such
overlap, by setting the values to non-overlapping values, and making
the relevant codepaths BUG() out if the given value was above/below
the expected (excluding flags=0 in the case of "enum
parse_opt_flags").
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The summary line had xy, while the description (and other sub-sections)
has XY.
Signed-off-by: Orgad Shaneh <orgads@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* jk/ref-paranoia: (71 commits)
refs: drop "broken" flag from for_each_fullref_in()
ref-filter: drop broken-ref code entirely
ref-filter: stop setting FILTER_REFS_INCLUDE_BROKEN
repack, prune: drop GIT_REF_PARANOIA settings
refs: turn on GIT_REF_PARANOIA by default
refs: omit dangling symrefs when using GIT_REF_PARANOIA
refs: add DO_FOR_EACH_OMIT_DANGLING_SYMREFS flag
refs-internal.h: reorganize DO_FOR_EACH_* flag documentation
refs-internal.h: move DO_FOR_EACH_* flags next to each other
t5312: be more assertive about command failure
t5312: test non-destructive repack
t5312: create bogus ref as necessary
t5312: drop "verbose" helper
t5600: provide detached HEAD for corruption failures
t5516: don't use HEAD ref for invalid ref-deletion tests
t7900: clean up some more broken refs
The eighth batch
t0000: avoid masking git exit value through pipes
tree-diff: fix leak when not HAVE_ALLOCA_H
pack-revindex.h: correct the time complexity descriptions
...
Remove the now-unused "incomplete" parameter from create_dir_entry();
all its callers specify it as "1", so let's drop the "incomplete=0"
case. The last caller to use it was search_for_subdir(), but that code
was removed in the preceding commit.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Remove the "mkdir" parameter from the find_containing_dir() function,
the add_ref_entry() function removed in the preceding commit was its
last user.
Since "mkdir" is always "0" we can also remove the parameter from
search_for_subdir(), which in turn means that we can delete most of
that function.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>