git-commit-vandalism

Author	SHA1	Message	Date
Jeff King	61cc4be7ec	t1006: stop using 0-padded timestamps The fake objects in t1006 use dummy timestamps like "0000000000 +0000". While this does make them look more like normal timestamps (which, unless it is 1970, have many digits), it actually violates our fsck checks, which complain about zero-padded timestamps. This doesn't currently break anything, but let's future-proof our tests against a version of hash-object which is a little more careful about its input. We don't actually care about the exact values here (and in fact, the helper functions in this script end up removing the timestamps anyway, so we don't even have to adjust other parts of the tests). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-01-18 12:59:44 -08:00
Taylor Blau	db9d67f2e9	builtin/cat-file.c: support NUL-delimited input with `-z` When callers are using `cat-file` via one of the stdin-driven `--batch` modes, all input is newline-delimited. This presents a problem when callers wish to ask about, e.g. tree-entries that have a newline character present in their filename. To support this niche scenario, introduce a new `-z` mode to the `--batch`, `--batch-check`, and `--batch-command` suite of options that instructs `cat-file` to treat its input as NUL-delimited, allowing the individual commands themselves to have newlines present. The refactoring here is slightly unfortunate, since we turn loops like: while (strbuf_getline(&buf, stdin) != EOF) into: while (1) { int ret; if (opt->nul_terminated) ret = strbuf_getline_nul(&input, stdin); else ret = strbuf_getline(&input, stdin); if (ret == EOF) break; } It's tempting to think that we could use `strbuf_getwholeline()` and specify either `\n` or `\0` as the terminating character. But for input on platforms that include a CR character preceeding the LF, this wouldn't quite be the same, since `strbuf_getline(...)` will trim any trailing CR, while `strbuf_getwholeline(&buf, stdin, '\n')` will not. In the future, we could clean this up further by introducing a variant of `strbuf_getwholeline()` that addresses the aforementioned gap, but that approach felt too heavy-handed for this pair of uses. Some tests are added in t1006 to ensure that `cat-file` produces the same output in `--batch`, `--batch-check`, and `--batch-command` modes with and without the new `-z` option. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-07-22 21:42:06 -07:00
Taylor Blau	3639fefe7d	t1006: extract --batch-command inputs to variables A future commit will want to ensure that various `--batch`-related options produce the same output whether their input is newline terminated, or NUL terminated (and a to-be-implemented `-z` option exists). To prepare for this, extract the given input(s) into separate variables to that their LF characters can easily be converted into NUL bytes when testing the new `-z` mode. This is consistent with other tests in t1006 (which these days is no longer a shining example of our CodingGuidelines). Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-07-22 21:42:05 -07:00
Ævar Arnfjörð Bjarmason	4627c67fa6	object-file: fix a unpack_loose_header() regression in `3b6a8db3b0` Fix a regression in my `3b6a8db3b0` (object-file.c: use "enum" return type for unpack_loose_header(), 2021-10-01) revealed both by running the test suite with --valgrind, and with the amended "git fsck" test. In practice this regression in v2.34.0 caused us to claim that we couldn't parse the header, as opposed to not being able to unpack it. Before the change in the C code the test_cmp added here would emit: -error: unable to unpack header of ./objects/e6/9de29bb2d1d6434b8b29ae775ad8c2e48c5391 +error: unable to parse header of ./objects/e6/9de29bb2d1d6434b8b29ae775ad8c2e48c5391 I.e. we'd proceed to call parse_loose_header() on the uninitialized "hdr" value, and it would have been very unlikely for that uninitialized memory to be a valid git object. The other callers of unpack_loose_header() were already checking the enum values exhaustively. See `3b6a8db3b0` and `5848fb11ac` (object-file.c: return ULHR_TOO_LONG on "header too long", 2021-10-01). Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-05-12 15:42:26 -07:00
John Cai	440c705ea6	cat-file: add --batch-command mode Add a new flag --batch-command that accepts commands and arguments from stdin, similar to git-update-ref --stdin. At GitLab, we use a pair of long running cat-file processes when accessing object content. One for iterating over object metadata with --batch-check, and the other to grab object contents with --batch. However, if we had --batch-command, we wouldn't need to keep both processes around, and instead just have one --batch-command process where we can flip between getting object info, and getting object contents. Since we have a pair of cat-file processes per repository, this means we can get rid of roughly half of long lived git cat-file processes. Given there are many repositories being accessed at any given time, this can lead to huge savings. git cat-file --batch-command will enter an interactive command mode whereby the user can enter in commands and their arguments that get queued in memory: <command1> [arg1] [arg2] LF <command2> [arg1] [arg2] LF When --buffer mode is used, commands will be queued in memory until a flush command is issued that execute them: flush LF The reason for a flush command is that when a consumer process (A) talks to a git cat-file process (B) and interactively writes to and reads from it in --buffer mode, (A) needs to be able to control when the buffer is flushed to stdout. Currently, from (A)'s perspective, the only way is to either 1. kill (B)'s process 2. send an invalid object to stdin. 1. is not ideal from a performance perspective as it will require spawning a new cat-file process each time, and 2. is hacky and not a good long term solution. With this mechanism of queueing up commands and letting (A) issue a flush command, process (A) can control when the buffer is flushed and can guarantee it will receive all of the output when in --buffer mode. --batch-command also will not allow (B) to flush to stdout until a flush is received. This patch adds the basic structure for adding command which can be extended in the future to add more commands. It also adds the following two commands (on top of the flush command): contents <object> LF info <object> LF The contents command takes an <object> argument and prints out the object contents. The info command takes an <object> argument and prints out the object metadata. These can be used in the following way with --buffer: info <object> LF contents <object> LF contents <object> LF info <object> LF flush LF info <object> LF flush LF When used without --buffer: info <object> LF contents <object> LF contents <object> LF info <object> LF info <object> LF Helped-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: John Cai <johncai86@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-02-18 11:21:46 -08:00
John Cai	4cf5d53b62	cat-file: add remove_timestamp helper maybe_remove_timestamp() takes arguments, but it would be useful to have a function that reads from stdin and strips the timestamp. This would allow tests to pipe data into a function to remove timestamps, and wouldn't have to always assign a variable. This is especially helpful when the data is multiple lines. Keep maybe_remove_timestamp() the same, but add a remove_timestamp helper that reads from stdin. The tests in the next patch will make use of this. Signed-off-by: John Cai <johncai86@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-02-18 11:21:46 -08:00
Junio C Hamano	008028a910	Merge branch 'ab/cat-file' Assorted updates to "git cat-file", especially "-h". * ab/cat-file: cat-file: s/_/-/ in typo'd usage_msg_optf() message cat-file: don't whitespace-pad "(...)" in SYNOPSIS and usage output cat-file: use GET_OID_ONLY_TO_DIE in --(textconv\|filters) object-name.c: don't have GET_OID_ONLY_TO_DIE imply *_QUIETLY cat-file: correct and improve usage information cat-file: fix remaining usage bugs cat-file: make --batch-all-objects a CMDMODE cat-file: move "usage" variable to cmd_cat_file() cat-file docs: fix SYNOPSIS and "-h" output parse-options API: add a usage_msg_optf() cat-file tests: test messaging on bad objects/paths cat-file tests: test bad usage	2022-02-05 09:42:31 -08:00
Junio C Hamano	4f4b18497a	Merge branch 'es/test-chain-lint' Broken &&-chains in the test scripts have been corrected. * es/test-chain-lint: t6000-t9999: detect and signal failure within loop t5000-t5999: detect and signal failure within loop t4000-t4999: detect and signal failure within loop t0000-t3999: detect and signal failure within loop tests: simplify by dropping unnecessary `for` loops tests: apply modern idiom for exiting loop upon failure tests: apply modern idiom for signaling test failure tests: fix broken &&-chains in `{...}` groups tests: fix broken &&-chains in `$(...)` command substitutions tests: fix broken &&-chains in compound statements tests: use test_write_lines() to generate line-oriented output tests: simplify construction of large blocks of text t9107: use shell parameter expansion to avoid breaking &&-chain t6300: make `%(raw:size) --shell` test more robust t5516: drop unnecessary subshell and command invocation t4202: clarify intent by creating expected content less cleverly t1020: avoid aborting entire test script when one test fails t1010: fix unnoticed failure on Windows t/lib-pager: use sane_unset() to avoid breaking &&-chain	2022-01-03 16:24:15 -08:00
Ævar Arnfjörð Bjarmason	b3fe468075	cat-file: fix remaining usage bugs With the migration of --batch-all-objects to OPT_CMDMODE() in the preceding commit one bug with combining it and other OPT_CMDMODE() options was solved, but we were still left with e.g. --buffer silently being discarded when not in batch mode. Fix all those bugs, and in addition emit errors telling the user specifically what options can't be combined with what other options, before this we'd usually just emit the cryptic usage text and leave the users to work it out by themselves. This change is rather large, because to do so we need to untangle the options processing so that we can not only error out, but emit sensible errors, and e.g. emit errors about options before errors about stray argc elements (as they might become valid if the option were removed). Some of the output changes ("error:" to "fatal:" with usage_msg_opt[f]()), but none of the exit codes change, except in those cases where we silently accepted bad option combinations before, now we'll error out. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-12-30 13:05:29 -08:00
Ævar Arnfjörð Bjarmason	485fd2c3da	cat-file: make --batch-all-objects a CMDMODE The usage of OPT_CMDMODE() in "cat-file"[1] was added in parallel with the development of[3] the --batch-all-objects option[4], so we've since grown[5] checks that it can't be combined with other command modes, when it should just be made a top-level command-mode instead. It doesn't combine with --filters, --textconv etc. By giving parse_options() information about what options are mutually exclusive with one another we can get the die() message being removed here for free, we didn't even use that removed message in some cases, e.g. for both of: --batch-all-objects --textconv --batch-all-objects --filters We'd take the "goto usage" in the "if (opt)" branch, and never reach the previous message. Now we'll emit e.g.: $ git cat-file --batch-all-objects --filters error: option `filters' is incompatible with --batch-all-objects 1. `b48158ac94` (cat-file: make the options mutually exclusive, 2015-05-03) 2. https://lore.kernel.org/git/xmqqtwspgusf.fsf@gitster.dls.corp.google.com/ 3. https://lore.kernel.org/git/20150622104559.GG14475@peff.net/ 4. `6a951937ae` (cat-file: add --batch-all-objects option, 2015-06-22) 5. `321459439e` (cat-file: support --textconv/--filters in batch mode, 2016-09-09) Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-12-30 13:05:29 -08:00
Ævar Arnfjörð Bjarmason	ddf8420b59	cat-file tests: test bad usage Stress test the usage emitted when options are combined in ways that isn't supported. Let's test various option combinations, some of these we buggily allow right now. E.g. this reveals a bug in `321459439e` (cat-file: support --textconv/--filters in batch mode, 2016-09-09) that we'll fix in a subsequent commit. We're supposed to be emitting a relevant message when --batch-all-objects is combined with --textconv or --filters, but we don't. The cases of needing to assign to opt=2 in the "opt" loop are because on those we do the right thing already, in subsequent commits the "test_expect_failure" cases will be fixed, and the for-loops unified. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-12-30 13:05:28 -08:00
Eric Sunshine	7abcbcb7ea	tests: fix broken &&-chains in `{...}` groups The top-level &&-chain checker built into t/test-lib.sh causes tests to magically exit with code 117 if the &&-chain is broken. However, it has the shortcoming that the magic does not work within `{...}` groups, `(...)` subshells, `$(...)` substitutions, or within bodies of compound statements, such as `if`, `for`, `while`, `case`, etc. `chainlint.sed` partly fills in the gap by catching broken &&-chains in `(...)` subshells, but bugs can still lurk behind broken &&-chains in the other cases. Fix broken &&-chains in `{...}` groups in order to reduce the number of possible lurking bugs. Signed-off-by: Eric Sunshine <sunshine@sunshineco.com> Reviewed-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-12-13 10:29:48 -08:00
Eric Sunshine	c576868eaf	tests: fix broken &&-chains in `$(...)` command substitutions The top-level &&-chain checker built into t/test-lib.sh causes tests to magically exit with code 117 if the &&-chain is broken. However, it has the shortcoming that the magic does not work within `{...}` groups, `(...)` subshells, `$(...)` substitutions, or within bodies of compound statements, such as `if`, `for`, `while`, `case`, etc. `chainlint.sed` partly fills in the gap by catching broken &&-chains in `(...)` subshells, but bugs can still lurk behind broken &&-chains in the other cases. Fix broken &&-chains in `$(...)` command substitutions in order to reduce the number of possible lurking bugs. Signed-off-by: Eric Sunshine <sunshine@sunshineco.com> Reviewed-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-12-13 10:29:48 -08:00
Han-Wen Nienhuys	e9706a188f	refs: introduce REF_SKIP_OID_VERIFICATION flag This lets the ref-store test helper write non-existent or unparsable objects into the ref storage. Use this to make t1006 and t3800 independent of the files storage backend. Signed-off-by: Han-Wen Nienhuys <hanwen@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-12-07 13:15:19 -08:00
Junio C Hamano	061a21d36d	Merge branch 'ab/fsck-unexpected-type' "git fsck" has been taught to report mismatch between expected and actual types of an object better. * ab/fsck-unexpected-type: fsck: report invalid object type-path combinations fsck: don't hard die on invalid object types object-file.c: stop dying in parse_loose_header() object-file.c: return ULHR_TOO_LONG on "header too long" object-file.c: use "enum" return type for unpack_loose_header() object-file.c: simplify unpack_loose_short_header() object-file.c: make parse_loose_header_extended() public object-file.c: return -1, not "status" from unpack_loose_header() object-file.c: don't set "typep" when returning non-zero cat-file tests: test for current --allow-unknown-type behavior cat-file tests: add corrupt loose object test cat-file tests: test for missing/bogus object with -t, -s and -p cat-file tests: move bogus_* variable declarations earlier fsck tests: test for garbage appended to a loose object fsck tests: test current hash/type mismatch behavior fsck tests: refactor one test to use a sub-repo fsck tests: add test for fsck-ing an unknown type	2021-10-25 16:06:56 -07:00
Jeff King	5c5b29b459	cat-file: disable refs/replace with --batch-all-objects When we're enumerating all objects in the object database, it doesn't make sense to respect refs/replace. The point of this option is to enumerate all of the objects in the database at a low level. By definition we'd already show the replacement object's contents (under its real oid), and showing those contents under another oid is almost certainly working against what the user is trying to do. Note that you could make the same argument for something like: git show-index <foo.idx \| awk '{print $2}' \| git cat-file --batch but there we can't know in cat-file exactly what the user intended, because we don't know the source of the input. They could be trying to do low-level debugging, or they could be doing something more high-level (e.g., imagine a porcelain built around cat-file for its object accesses). So in those cases, we'll have to rely on the user specifying "git --no-replace-objects" to tell us what to do. One _could_ make an argument that "cat-file --batch" is sufficiently low-level plumbing that it should not respect replace-objects at all (and the caller should do any replacement if they want it). But we have been doing so for some time. The history is a little tangled: - looking back as far as v1.6.6, we would not respect replace refs for --batch-check, but would for --batch (because the former used sha1_object_info(), and the replace mechanism only affected actual object reads) - this discrepancy was made even weirder by `98e2092b50` (cat-file: teach --batch to stream blob objects, 2013-07-10), where we always output the header using the --batch-check code, and then printed the object separately. This could lead to "cat-file --batch" dying (when it notices the size or type changed for a non-blob object) or even producing bogus output (in streaming mode, we didn't notice that we wrote the wrong number of bytes). - that persisted until `1f7117ef7a` (sha1_file: perform object replacement in sha1_object_info_extended(), 2013-12-11), which then respected replace refs for both forms. So it has worked reliably this way for over 7 years, and we should make sure it continues to do so. That could also be an argument that --batch-all-objects should not change behavior (which this patch is doing), but I really consider the current behavior to be an unintended bug. It's a side effect of how the code is implemented (feeding the oids back into oid_object_info() rather than looking at what we found while reading the loose and packed object storage). The implementation is straight-forward: we just disable the global read_replace_refs flag when we're in --batch-all-objects mode. It would perhaps be a little cleaner to change the flag we pass to oid_object_info_extended(), but that's not enough. We also read objects via read_object_file() and stream_blob_to_fd(). The former could switch to its _extended() form, but the streaming code has no mechanism for disabling replace refs. Setting the global flag works, and as a bonus, it's impossible to have any "oops, we're sometimes replacing the object and sometimes not" bugs in the output (like the ones caused by `98e2092b50` above). The tests here cover the regular-input and --batch-all-objects cases, for both --batch-check and --batch. There is a test in t6050 that covers the regular-input case with --batch already, but this new one goes much further in actually verifying the output (plus covering --batch-check explicitly). This is perhaps a little overkill and the tests would be simpler just covering --batch-check, but I wanted to make sure we're checking that --batch output is consistent between the header and the content. The global-flag technique used here makes that easy to get right, but this is future-proofing us against regressions. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-10-08 15:45:14 -07:00
Jeff King	e879295b20	t1006: clean up broken objects A few of the tests create intentionally broken objects with broken types. Let's clean them up after we're done with them, so that later tests don't get confused (we hadn't noticed because this only affects tests which use --batch-all-objects, but I'm about to add more). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-10-08 15:45:14 -07:00
Ævar Arnfjörð Bjarmason	96e41f58fe	fsck: report invalid object type-path combinations Improve the error that's emitted in cases where we find a loose object we parse, but which isn't at the location we expect it to be. Before this change we'd prefix the error with a not-a-OID derived from the path at which the object was found, due to an emergent behavior in how we'd end up with an "OID" in these codepaths. Now we'll instead say what object we hashed, and what path it was found at. Before this patch series e.g.: $ git hash-object --stdin -w -t blob </dev/null `e69de29bb2` $ mv objects/e6/ objects/e7 Would emit ("[...]" used to abbreviate the OIDs): git fsck error: hash mismatch for ./objects/e7/9d[...] (expected e79d[...]) error: e79d[...]: object corrupt or missing: ./objects/e7/9d[...] Now we'll instead emit: error: e69d[...]: hash-path mismatch, found at: ./objects/e7/9d[...] Furthermore, we'll do the right thing when the object type and its location are bad. I.e. this case: $ git hash-object --stdin -w -t garbage --literally </dev/null 8315a83d2acc4c174aed59430f9a9c4ed926440f $ mv objects/83 objects/84 As noted in an earlier commits we'd simply die early in those cases, until preceding commits fixed the hard die on invalid object type: $ git fsck fatal: invalid object type Now we'll instead emit sensible error messages: $ git fsck error: 8315[...]: hash-path mismatch, found at: ./objects/84/15[...] error: 8315[...]: object is of unknown type 'garbage': ./objects/84/15[...] In both fsck.c and object-file.c we're using null_oid as a sentinel value for checking whether we got far enough to be certain that the issue was indeed this OID mismatch. We need to add the "object corrupt or missing" special-case to deal with cases where read_loose_object() will return an error before completing check_object_signature(), e.g. if we have an error in unpack_loose_rest() because we find garbage after the valid gzip content: $ git hash-object --stdin -w -t blob </dev/null `e69de29bb2` $ chmod 755 objects/e6/9de29bb2d1d6434b8b29ae775ad8c2e48c5391 $ echo garbage >>objects/e6/9de29bb2d1d6434b8b29ae775ad8c2e48c5391 $ git fsck error: garbage at end of loose object 'e69d[...]' error: unable to unpack contents of ./objects/e6/9d[...] error: e69d[...]: object corrupt or missing: ./objects/e6/9d[...] There is currently some weird messaging in the edge case when the two are combined, i.e. because we're not explicitly passing along an error state about this specific scenario from check_stream_oid() via read_loose_object() we'll end up printing the null OID if an object is of an unknown type and it can't be unpacked by zlib, e.g.: $ git hash-object --stdin -w -t garbage --literally </dev/null 8315a83d2acc4c174aed59430f9a9c4ed926440f $ chmod 755 objects/83/15a83d2acc4c174aed59430f9a9c4ed926440f $ echo garbage >>objects/83/15a83d2acc4c174aed59430f9a9c4ed926440f $ /usr/bin/git fsck fatal: invalid object type $ ~/g/git/git fsck error: garbage at end of loose object '8315a83d2acc4c174aed59430f9a9c4ed926440f' error: unable to unpack contents of ./objects/83/15a83d2acc4c174aed59430f9a9c4ed926440f error: 8315a83d2acc4c174aed59430f9a9c4ed926440f: object corrupt or missing: ./objects/83/15a83d2acc4c174aed59430f9a9c4ed926440f error: 0000000000000000000000000000000000000000: object is of unknown type 'garbage': ./objects/83/15a83d2acc4c174aed59430f9a9c4ed926440f [...] I think it's OK to leave that for future improvements, which would involve enum-ifying more error state as we've done with "enum unpack_loose_header_result" in preceding commits. In these increasingly more obscure cases the worst that can happen is that we'll get slightly nonsensical or inapplicable error messages. There's other such potential edge cases, all of which might produce some confusing messaging, but still be handled correctly as far as passing along errors goes. E.g. if check_object_signature() returns and oideq(real_oid, null_oid()) is true, which could happen if it returns -1 due to the read_istream() call having failed. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-10-01 15:06:01 -07:00
Ævar Arnfjörð Bjarmason	5848fb11ac	object-file.c: return ULHR_TOO_LONG on "header too long" Split up the return code for "header too long" from the generic negative return value unpack_loose_header() returns, and report via error() if we exceed MAX_HEADER_LEN. As a test added earlier in this series in t1006-cat-file.sh shows we'll correctly emit zlib errors from zlib.c already in this case, so we have no need to carry those return codes further down the stack. Let's instead just return ULHR_TOO_LONG saying we ran into the MAX_HEADER_LEN limit, or other negative values for "unable to unpack <OID> header". Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-10-01 15:06:00 -07:00
Ævar Arnfjörð Bjarmason	dd45a56246	cat-file tests: test for current --allow-unknown-type behavior Add more tests for the current --allow-unknown-type behavior. As noted in [1] I don't think much of this makes sense, but let's test for it as-is so we can see if the behavior changes in the future. 1. https://lore.kernel.org/git/87r1i4qf4h.fsf@evledraar.gmail.com/ Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-10-01 15:06:00 -07:00
Ævar Arnfjörð Bjarmason	7e7d220d9d	cat-file tests: add corrupt loose object test Fix a blindspot in the tests for "cat-file" (and by proxy, the guts of object-file.c) by testing that when we can't decode a loose object with zlib we'll emit an error from zlib.c. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-10-01 15:05:59 -07:00
Ævar Arnfjörð Bjarmason	59b8283d55	cat-file tests: test for missing/bogus object with -t, -s and -p When we look up a missing object with cat_one_file() what error we print out currently depends on whether we'll error out early in get_oid_with_context(), or if we'll get an error later from oid_object_info_extended(). The --allow-unknown-type flag then changes whether we pass the "OBJECT_INFO_ALLOW_UNKNOWN_TYPE" flag to get_oid_with_context() or not. The "-p" flag is yet another special-case in printing the same output on the deadbeef OID as we'd emit on the deadbeef_short OID for the "-s" and "-t" options, it also doesn't support the "--allow-unknown-type" flag at all. Let's test the combination of the two sets of [-t, -s, -p] and [--{no-}allow-unknown-type] (the --no-allow-unknown-type is implicit in not supplying it), as well as a [missing,bogus] object pair. This extends tests added in `3e370f9faf` (t1006: add tests for git cat-file --allow-unknown-type, 2015-05-03). Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-10-01 15:05:59 -07:00
Ævar Arnfjörð Bjarmason	70e4a57762	cat-file tests: move bogus_* variable declarations earlier Change the short/long bogus bogus object type variables into a form where the two sets can be used concurrently. This'll be used by subsequently added tests. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-10-01 15:05:59 -07:00
Junio C Hamano	5d96bcbc06	Merge branch 'zh/cat-file-batch-fix' "git cat-file --batch-all-objects"" misbehaved when "--batch" is in use and did not ask for certain object traits. * zh/cat-file-batch-fix: cat-file: merge two block into one cat-file: handle trivial --batch format with --batch-all-objects	2021-07-13 16:52:49 -07:00
ZheNing Hu	e16acc80a7	cat-file: handle trivial --batch format with --batch-all-objects The --batch code to print an object assumes we found out the type of the object from calling oid_object_info_extended(). This is true for the default format, but even in a custom format, we manually modify the object_info struct to ask for the type. This assumption was broken by `845de33a5b` (cat-file: avoid noop calls to sha1_object_info_extended, 2016-05-18). That commit skips the call to oid_object_info_extended() entirely when --batch-all-objects is in use, and the custom format does not include any placeholders that require calling it. Or when the custom format only include placeholders like %(objectname) or %(rest), oid_object_info_extended() will not get the type of the object. This results in an error when we try to confirm that the type didn't change: $ git cat-file --batch=batman --batch-all-objects batman fatal: object `000023961a` changed type!? and also has other subtle effects (e.g., we'd fail to stream a blob, since we don't realize it's a blob in the first place). We can fix this by flipping the order of the setup. The check for "do we need to get the object info" must come _after_ we've decided whether we need to look up the type. Helped-by: Jeff King <peff@peff.net> Signed-off-by: ZheNing Hu <adlternative@gmail.com> Acked-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-06-04 07:50:26 +09:00
Ævar Arnfjörð Bjarmason	acf9de4c94	mktag: use fsck instead of custom verify_tag() Change the validation logic in "mktag" to use fsck's fsck_tag() instead of its own custom parser. Curiously the logic for both dates back to the same commit[1]. Let's unify them so we're not maintaining two sets functions to verify that a tag is OK. The behavior of fsck_tag() and the old "mktag" code being removed here is different in few aspects. I think it makes sense to remove some of those checks, namely: A. fsck only cares that the timezone matches [-+][0-9]{4}. The mktag code disallowed values larger than 1400. Yes there's currently no timezone with a greater offset[2], but since we allow any number of non-offical timezones (e.g. +1234) passing this through seems fine. Git also won't break in the future if e.g. French Polynesia decides it needs to outdo the Line Islands when it comes to timezone extravagance. B. fsck allows missing author names such as "tagger <email>", mktag wouldn't, but would allow e.g. "tagger [2 spaces] <email>" (but not "tagger [1 space] <email>"). Now we allow all of these. C. Like B, but "mktag" disallowed spaces in the <email> part, fsck allows it. In some ways fsck_tag() is stricter than "mktag" was, namely: D. fsck disallows zero-padded dates, but mktag didn't care. So e.g. the timestamp "0000000000 +0000" produces an error now. A test in "t1006-cat-file.sh" relied on this, it's been changed to use "hash-object" (without fsck) instead. There was one check I deemed worth keeping by porting it over to fsck_tag(): E. "mktag" did not allow any custom headers, and by extension (as an empty commit is allowed) also forbade an extra stray trailing newline after the headers it knew about. Add a new check in the "ignore" category to fsck and use it. This somewhat abuses the facility added in `efaba7cc77` (fsck: optionally ignore specific fsck issues completely, 2015-06-22). This is somewhat of hack, but probably the least invasive change we can make here. The fsck command will shuffle these categories around, e.g. under --strict the "info" becomes a "warn" and "warn" becomes "error". Existing users of fsck's (and others, e.g. index-pack) --strict option rely on this. So we need to put something into a category that'll be ignored by all existing users of the API. Pretending that fsck.extraHeaderEntry=error ("ignore" by default) was set serves to do this for us. 1. `ec4465adb3` (Add "tag" objects that can be used to sign other objects., 2005-04-25) 2. https://en.wikipedia.org/wiki/List_of_UTC_time_offsets Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-05 14:58:29 -08:00
brian m. carlson	e023ff0691	t: remove test_oid_init in tests Now that we call test_oid_init in the setup for all test scripts, there's no point in calling it individually. Remove all of the places where we've done so to help keep tests tidy. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Reviewed-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-07-30 09:16:49 -07:00
Junio C Hamano	f2e2136ad7	Merge branch 'md/test-cleanup' Various test scripts have been updated for style and also correct handling of exit status of various commands. * md/test-cleanup: tests: order arguments to git-rev-list properly t9109: don't swallow Git errors upstream of pipes tests: don't swallow Git errors upstream of pipes t/*: fix ordering of expected/observed arguments tests: standardize pipe placement Documentation: add shell guidelines t/README: reformat Do, Don't, Keep in mind lists	2018-10-16 16:16:01 +09:00
Matthew DeVore	bdbc17e86a	tests: standardize pipe placement Instead of using a line-continuation and pipe on the second line, take advantage of the shell's implicit line continuation after a pipe character. So for example, instead of some long line \ \| next line use some long line \| next line And add a blank line before and after the pipe where it aids readability (it usually does). This better matches the coding style documented in Documentation/CodingGuidelines and used in shell scripts elsewhere in the tree. Signed-off-by: Matthew DeVore <matvore@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-10-07 08:51:18 +09:00
brian m. carlson	e95f53137d	t1006: make hash size independent Compute the size of the tree and commit objects we're creating by checking for the size of an object ID and computing the resulting sizes accordingly. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-09-17 08:10:32 -07:00
Jeff King	0750bb5b51	cat-file: support "unordered" output for --batch-all-objects If you're going to access the contents of every object in a packfile, it's generally much more efficient to do so in pack order, rather than in hash order. That increases the locality of access within the packfile, which in turn is friendlier to the delta base cache, since the packfile puts related deltas next to each other. By contrast, hash order is effectively random, since the sha1 has no discernible relationship to the content. This patch introduces an "--unordered" option to cat-file which iterates over packs in pack-order under the hood. You can see the results when dumping all of the file content: $ time ./git cat-file --batch-all-objects --buffer --batch \| wc -c 6883195596 real 0m44.491s user 0m42.902s sys 0m5.230s $ time ./git cat-file --unordered \ --batch-all-objects --buffer --batch \| wc -c 6883195596 real 0m6.075s user 0m4.774s sys 0m3.548s Same output, different order, way faster. The same speed-up applies even if you end up accessing the object content in a different process, like: git cat-file --batch-all-objects --buffer --batch-check \| grep blob \| git cat-file --batch='%(objectname) %(rest)' \| wc -c Adding "--unordered" to the first command drops the runtime in git.git from 24s to 3.5s. Side note: there are actually further speedups available for doing it all in-process now. Since we are outputting the object content during the actual pack iteration, we know where to find the object and could skip the extra lookup done by oid_object_info(). This patch stops short of that optimization since the underlying API isn't ready for us to make those sorts of direct requests. So if --unordered is so much better, why not make it the default? Two reasons: 1. We've promised in the documentation that --batch-all-objects outputs in hash order. Since cat-file is plumbing, people may be relying on that default, and we can't change it. 2. It's actually _slower_ for some cases. We have to compute the pack revindex to walk in pack order. And our de-duplication step uses an oidset, rather than a sort-and-dedup, which can end up being more expensive. If we're just accessing the type and size of each object, for example, like: git cat-file --batch-all-objects --buffer --batch-check my best-of-five warm cache timings go from 900ms to 1100ms using --unordered. Though it's possible in a cold-cache or under memory pressure that we could do better, since we'd have better locality within the packfile. And one final question: why is it "--unordered" and not "--pack-order"? The answer is again two-fold: 1. "pack order" isn't a well-defined thing across the whole set of objects. We're hitting loose objects, as well as objects in multiple packs, and the only ordering we're promising is _within_ a single pack. The rest is apparently random. 2. The point here is optimization. So we don't want to promise any particular ordering, but only to say that we will choose an ordering which is likely to be efficient for accessing the object content. That leaves the door open for further changes in the future without having to add another compatibility option. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-08-13 13:48:31 -07:00
Jeff King	aa2f5ef500	t1006: test cat-file --batch-all-objects with duplicates The test for --batch-all-objects in t1006 covers a variety of object storage situations, but one thing it doesn't cover is that we avoid mentioning duplicate objects. We won't have any because running "git repack -ad" will have packed them all and deleted the loose ones. This does work (because we sort and de-dup the output list), but it's good to include it in our test. And doubly so for when we add an unordered mode which has to de-dup in a different way. Note that we cannot just re-create one of the objects, as Git will omit the write of an object that is already present. However, we can create a new pack with one of the objects, which forces the duplication. One alternative would be to just use "git repack -a" instead of "-ad". But then _every_ object would be duplicated as loose and packed, and we might miss a bug that omits packed objects (because we'd show their loose counterparts). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-08-13 13:48:29 -07:00
brian m. carlson	8125a58b91	t: switch $_z40 to $ZERO_OID Switch all uses of $_z40 to $ZERO_OID so that they work correctly with larger hashes. This commit was created by using the following sed command to modify all files in the t directory except t/test-lib.sh: sed -i 's/\$_z40/$ZERO_OID/g' Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-05-14 11:02:00 +09:00
Nguyễn Thái Ngọc Duy	c680668d1a	t/helper: merge test-genrandom into test-tool Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-03-27 08:45:47 -07:00
Ville Skyttä	2e3a16b279	Spelling fixes <BAD> <CORRECTED> accidently accidentally commited committed dependancy dependency emtpy empty existance existence explicitely explicitly git-upload-achive git-upload-archive hierachy hierarchy indegee indegree intial initial mulitple multiple non-existant non-existent precendence. precedence. priviledged privileged programatically programmatically psuedo-binary pseudo-binary soemwhere somewhere successfull successful transfering transferring uncommited uncommitted unkown unknown usefull useful writting writing Signed-off-by: Ville Skyttä <ville.skytta@iki.fi> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-08-11 14:35:42 -07:00
Jeff King	3115ee45c8	cat-file: sort and de-dup output of --batch-all-objects The sorting we could probably live without, but printing duplicates is just a hassle for the user, who must then de-dup themselves (or risk a wrong answer if they are doing something like counting objects with a particular property). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2015-06-26 09:24:42 -07:00
Jeff King	6a951937ae	cat-file: add --batch-all-objects option It can sometimes be useful to examine all objects in the repository. Normally this is done with "git rev-list --all --objects", but: 1. That shows only reachable objects. You may want to look at all available objects. 2. It's slow. We actually open each object to walk the graph. If your operation is OK with seeing unreachable objects, it's an order of magnitude faster to just enumerate the loose directories and pack indices. You can do this yourself using "ls" and "git show-index", but it's non-obvious. This patch adds an option to "cat-file --batch-check" to operate on all available objects (rather than reading names from stdin). This is based on a proposal by Charles Bailey to provide a separate "git list-all-objects" command. That is more orthogonal, as it splits enumerating the objects from getting information about them. However, in practice you will either: a. Feed the list of objects directly into cat-file anyway, so you can find out information about them. Keeping it in a single process is more efficient. b. Ask the listing process to start telling you more information about the objects, in which case you will reinvent cat-file's batch-check formatter. Adding a cat-file option is simple and efficient. And if you really do want just the object names, you can always do: git cat-file --batch-check='%(objectname)' --batch-all-objects Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2015-06-22 14:55:52 -07:00
Junio C Hamano	67f0b6f3b2	Merge branch 'dt/cat-file-follow-symlinks' "git cat-file --batch(-check)" learned the "--follow-symlinks" option that follows an in-tree symbolic link when asked about an object via extended SHA-1 syntax, e.g. HEAD:RelNotes that points at Documentation/RelNotes/2.5.0.txt. With the new option, the command behaves as if HEAD:Documentation/RelNotes/2.5.0.txt was given as input instead. * dt/cat-file-follow-symlinks: cat-file: add --follow-symlinks to --batch sha1_name: get_sha1_with_context learns to follow symlinks tree-walk: learn get_tree_entry_follow_symlinks	2015-06-01 12:45:16 -07:00
David Turner	122d53464b	cat-file: add --follow-symlinks to --batch This wires the in-repo-symlink following code through to the cat-file builtin. In the event of an out-of-repo link, cat-file will print the link in a new format. Signed-off-by: David Turner <dturner@twopensource.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2015-05-20 13:46:21 -07:00
Karthik Nayak	3e370f9faf	t1006: add tests for git cat-file --allow-unknown-type Signed-off-by: Karthik Nayak <karthik.188@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2015-05-06 13:47:18 -07:00
Jeff King	99094a7ad4	t: fix trivial &&-chain breakage These are tests which are missing a link in their &&-chain, but during a setup phase. We may fail to notice failure in commands that build the test environment, but these are typically not expected to fail at all (but it's still good to double-check that our test environment is what we expect). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2015-03-20 10:20:14 -07:00
Junio C Hamano	b2132068c6	Merge branch 'jk/oi-delta-base' Teach "cat-file --batch" to show delta-base object name for a packed object that is represented as a delta. * jk/oi-delta-base: cat-file: provide %(deltabase) batch format sha1_object_info_extended: provide delta base sha1s	2014-01-10 10:33:11 -08:00
Junio C Hamano	604ada435b	Merge branch 'jk/cat-file-regression-fix' "git cat-file --batch=", an admittedly useless command, did not behave very well. * jk/cat-file-regression-fix: cat-file: handle --batch format with missing type/size cat-file: pass expand_data to print_object_or_die	2013-12-27 14:58:11 -08:00
Jeff King	65ea9c3c3d	cat-file: provide %(deltabase) batch format It can be useful for debugging or analysis to see which objects are stored as delta bases on top of others. This information is available by running `git verify-pack`, but that is extremely expensive (and is harder than necessary to parse). Instead, let's make it available as a cat-file query format, which makes it fast and simple to get the bases for a subset of the objects. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-12-26 11:54:26 -08:00
Jeff King	6554dfa97a	cat-file: handle --batch format with missing type/size Commit `98e2092` taught cat-file to stream blobs with --batch, which requires that we look up the object type before loading it into memory. As a result, we now print the object header from information in sha1_object_info, and the actual contents from the read_sha1_file. We double-check that the information we printed in the header matches the content we are about to show. Later, commit `93d2a60` allowed custom header lines for --batch, and commit `5b08640` made type lookups optional. As a result, specifying a header line without the type or size means that we will not look up those items at all. This causes our double-checking to erroneously die with an error; we think the type or size has changed, when in fact it was simply left at "0". For the size, we can fix this by only doing the consistency double-check when we have retrieved the size via sha1_object_info. In the case that we have not retrieved the value, that means we also did not print it, so there is nothing for us to check that we are consistent with. We could do the same for the type. However, besides our consistency check, we also care about the type in deciding whether to stream or not. So instead of handling the case where we do not know the type, this patch instead makes sure that we always trigger a type lookup when we are printing, so that even a format without the type will stream as we would in the normal case. Reviewed-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-12-12 11:31:25 -08:00
Junio C Hamano	c17fa972d3	Merge branch 'sb/sha1-loose-object-info-check-existence' "git cat-file --batch-check=ok" did not check the existence of the named object. * sb/sha1-loose-object-info-check-existence: sha1_loose_object_info(): do not return success on missing object	2013-12-05 13:00:12 -08:00
Junio C Hamano	4ef8d1dd03	sha1_loose_object_info(): do not return success on missing object Since `052fe5ea` (sha1_loose_object_info: make type lookup optional, 2013-07-12), sha1_loose_object_info() returns happily without checking if the object in question exists, which is not what the the caller sha1_object_info_extended() expects; the caller does not even bother checking the existence of the object itself. Noticed-by: Sven Brauch <svenbrauch@googlemail.com> Helped-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-11-06 11:03:33 -08:00
Jeff King	97be04077f	cat-file: only split on whitespace when %(rest) is used Commit `c334b87b` (cat-file: split --batch input lines on whitespace, 2013-07-11) taught `cat-file --batch-check` to split input lines on the first whitespace, and stash everything after the first token into the %(rest) output format element. It claimed: Object names cannot contain spaces, so any input with spaces would have resulted in a "missing" line. But that is not correct. Refs, object sha1s, and various peeling suffixes cannot contain spaces, but some object names can. In particular: 1. Tree paths like "[<tree>]:path with whitespace" 2. Reflog specifications like "@{2 days ago}" 3. Commit searches like "rev^{/grep me}" or ":/grep me" To remain backwards compatible, we cannot split on whitespace by default, hence we will ship 1.8.4 with the commit reverted. Resurrect its attempt but in a weaker form; only do the splitting when "%(rest)" is used in the output format. Since that element did not exist at all before `c334b87`, old scripts cannot be affected. The existence of object names with spaces does mean that you cannot reliably do: echo ":path with space and other data" \| git cat-file --batch-check="%(objectname) %(rest)" as it would split the path and feed only ":path" to get_sha1. But that command is nonsensical. If you wanted to see "and other data" in "%(rest)", git cannot possibly know where the filename ends and the "rest" begins. It might be more robust to have something like "-z" to separate the input elements. But this patch is still a reasonable step before having that. It makes the easy cases easy; people who do not care about %(rest) do not have to consider it, and the %(rest) code handles the spaces and newlines of "rev-list --objects" correctly. Hard cases remain hard but possible (if you might get whitespace in your input, you do not get to use %(rest) and must split and join the output yourself using more flexible tools). And most importantly, it does not preclude us from having different splitting rules later if a "-z" (or similar) option is added. So we can make the hard cases easier later, if we choose to. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-08-05 09:30:48 -07:00
Junio C Hamano	062aeee8aa	Revert "cat-file: split --batch input lines on whitespace" This reverts commit c334b87b30c1464a1ab563fe1fb8de5eaf0e5bac; the update assumed that people only used the command to read from "rev-list --objects" output, whose lines begin with a 40-hex object name followed by a whitespace, but it turns out that scripts feed random extended SHA-1 expressions (e.g. "HEAD:$pathname") in which a whitespace has to be kept.	2013-08-02 09:29:30 -07:00
Jeff King	c334b87b30	cat-file: split --batch input lines on whitespace If we get an input line to --batch or --batch-check that looks like "HEAD foo bar", we will currently feed the whole thing to get_sha1(). This means that to use --batch-check with `rev-list --objects`, one must pre-process the input, like: git rev-list --objects HEAD \| cut -d' ' -f1 \| git cat-file --batch-check Besides being more typing and slightly less efficient to invoke `cut`, the result loses information: we no longer know which path each object was found at. This patch teaches cat-file to split input lines at the first whitespace. Everything to the left of the whitespace is considered an object name, and everything to the right is made available as the %(reset) atom. So you can now do: git rev-list --objects HEAD \| git cat-file --batch-check='%(objectsize) %(rest)' to collect object sizes at particular paths. Even if %(rest) is not used, we always do the whitespace split (which means you can simply eliminate the `cut` command from the first example above). This whitespace split is backwards compatible for any reasonable input. Object names cannot contain spaces, so any input with spaces would have resulted in a "missing" line. The only input hurt is if somebody really expected input of the form "HEAD is a fine-looking ref!" to fail; it will now parse HEAD, and make "is a fine-looking ref!" available as %(rest). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-07-12 09:18:42 -07:00

1 2

60 Commits