Commit Graph

14 Commits

Jeff King
945ed00957 t1060: check partial clone of misnamed blob
A recent commit (upload-pack: skip parse-object re-hashing of "want"
objects, 2022-09-06) loosened the behavior of upload-pack so that it
does not verify the sha1 of objects it receives directly via "want"
requests. The existing corruption tests in t1060 aren't affected by
this: the corruptions are blobs reachable from commits, and the client
requests the commits.

The more interesting case here is a partial clone, where the client will
directly ask for the corrupted blob when it does an on-demand fetch of
the filtered object. And that is not covered at all, so let's add a
test.

It's important here that we use the "misnamed" corruption and not
"bit-error". The latter is sufficiently corrupted that upload-pack
cannot even figure out the type of the object, so it bails identically
both before and after the recent change. But with "misnamed" and the
hash checks enabled, upload-pack sees the problem (though the error messages are a
bit confusing because of the inability to create a "struct object" to
store the flags):

  error: hash mismatch d95f3ad14dee633a758d2e331151e950dd13e4ed
  fatal: git upload-pack: not our ref d95f3ad14dee633a758d2e331151e950dd13e4ed
  fatal: remote error: upload-pack: not our ref d95f3ad14dee633a758d2e331151e950dd13e4ed

After the change to skip the hash check, the server side happily sends
the bogus object, but the client correctly realizes that it did not get
the necessary data:

  remote: Enumerating objects: 1, done.
  remote: Counting objects: 100% (1/1), done.
  remote: Total 1 (delta 0), reused 0 (delta 0), pack-reused 0
  Receiving objects: 100% (1/1), 49 bytes | 49.00 KiB/s, done.
  fatal: bad revision 'd95f3ad14dee633a758d2e331151e950dd13e4ed'
  error: [...]/misnamed did not send all necessary objects

which is exactly what we expect to happen.
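
A rough sketch of the kind of test this adds (the "misnamed" repository
comes from the existing t1060 setup; everything else, including the
uploadpack.allowfilter override, is illustrative rather than the
literal test):

  test_expect_success 'lazy fetch of corrupted blob fails' '
      test_must_fail git -c uploadpack.allowfilter=true \
          clone --no-local --filter=blob:none misnamed corrupt-partial
  '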

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-09-07 15:08:51 -07:00
Ævar Arnfjörð Bjarmason
e75d2f7f73 revisions API: have release_revisions() release "filter"
Extend the release_revisions() function so that it frees the
"filter" in the "struct rev_info". This in combination with a
preceding change to free "cmdline" means that we can mark another set
of tests as passing under "TEST_PASSES_SANITIZE_LEAK=true".

The "filter" member was added recently in ffaa137f64 (revision: put
object filter into struct rev_info, 2022-03-09), and this fixes leaks
intruded in the subsequent leak 7940941de1 (pack-objects: use
rev.filter when possible, 2022-03-09) and 105c6f14ad (bundle: parse
filter capability, 2022-03-09).

The "builtin/pack-objects.c" leak in 7940941de1 was effectively with
us already, but the variable was referred to by a "static" file-scoped
variable. The "bundle.c " leak in 105c6f14ad was newly introduced
with the new "filter" feature for bundles.

The "t5600-clone-fail-cleanup.sh" change here to add
"TEST_PASSES_SANITIZE_LEAK=true" is one of the cases where
run-command.c not carrying the abort() exit code upwards would have
had that test passing before, but now it *actually* passes[1]. We
should fix the lack of 1=1 mapping of SANITIZE=leak testing to actual
leaks some other time, but it's an existing edge case, let's just mark
the really-passing test as passing for now.

1. https://lore.kernel.org/git/220303.86fsnz5o9w.gmgdl@evledraar.gmail.com/
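
One way to exercise such a marking locally (a sketch; the exact make
invocation may differ by platform):

  # build git with LeakSanitizer, then run one of the newly-marked
  # tests on its own
  make SANITIZE=leak
  cd t
  ./t5600-clone-fail-cleanup.sh -v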

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-04-13 23:56:09 -07:00
Ævar Arnfjörð Bjarmason
7a98d9ab00 revisions API: have release_revisions() release "cmdline"
Extend the release_revisions() function so that it frees the
"cmdline" in the "struct rev_info". This in combination with a
preceding change to free "commits" and "mailmap" means that we can
whitelist another test under "TEST_PASSES_SANITIZE_LEAK=true".

There was a proposal in [1] to do away with the xstrdup()-ing of this
in add_rev_cmdline(); perhaps that would be worthwhile, but for now let's
just free() it.

We could also make that a "char *" in "struct rev_cmdline_entry"
itself, but since we own it let's expose it as a constant to outside
callers. I proposed that in [2] but have since changed my mind. See
14d30cdfc0 (ref-filter: fix memory leak in `free_array_item()`,
2019-07-10), c514c62a4f (checkout: fix leak of non-existent branch
names, 2020-08-14) and other log history hits for "free((char *)" for
prior art.

This includes the tests we had false-positive passes on before my
6798b08e84 (perl Git.pm: don't ignore signalled failure in
_cmd_close(), 2022-02-01); now they pass for real.

Since there are 66 tests matching t/t[0-9]*git-svn*.sh, it's easier to
list those that don't pass than to touch most of those 66. So let's
introduce a "TEST_FAILS_SANITIZE_LEAK=true", which, if set in a test,
stops lib-git-svn.sh from setting "TEST_PASSES_SANITIZE_LEAK=true".
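
A sketch of what the opt-out looks like at the top of a still-leaking
git-svn test script (the description string is illustrative; only the
two variables come from this change):

  # opt out before sourcing the library; without this line,
  # lib-git-svn.sh would set TEST_PASSES_SANITIZE_LEAK=true for us
  test_description='illustrative git-svn test that still leaks'
  TEST_FAILS_SANITIZE_LEAK=true
  . ./lib-git-svn.sh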

This change also marks all the tests from which we removed
"TEST_FAILS_SANITIZE_LEAK=true" in an earlier commit due to removing
the UNLEAK() from cmd_format_patch(); we can now assert that its API
use doesn't leak any "struct rev_info" memory.

This change also made the test "t5503-tagfollow.sh" pass on current
master, but that would regress when combined with
ps/fetch-atomic-fixup's de004e848a (t5503: simplify setup of test
which exercises failure of backfill, 2022-03-03) (through no fault of
that topic: that change started using "git clone" in the test, which
has an outstanding leak). Let's leave that test out for now to avoid
in-flight semantic conflicts.

1. https://lore.kernel.org/git/YUj%2FgFRh6pwrZalY@carlos-mbp.lan/
2. https://lore.kernel.org/git/87o88obkb1.fsf@evledraar.gmail.com/

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-04-13 23:56:09 -07:00
Junio C Hamano
83b13e284c Merge branch 'jk/virtual-objects-do-exist'
A recent update broke the "is this object available to us?" check for
well-known objects like an empty tree (which should yield "yes",
even when there is no on-disk object for an empty tree), which has
been corrected.

* jk/virtual-objects-do-exist:
  rev-list: allow cached objects in existence check
2019-03-20 15:16:07 +09:00
Jeff King
f06ab027ef rev-list: allow cached objects in existence check
This fixes a regression in 7c0fe330d5 (rev-list: handle missing tree
objects properly, 2018-10-05) where rev-list will now complain about the
empty tree when it doesn't physically exist on disk.

Before that commit, we relied on the traversal code in list-objects.c to
walk through the trees. Since it uses parse_tree(), we'd do a normal
object lookup that includes looking in the set of "cached" objects
(which is where our magic internal empty-tree kicks in).

After that commit, we instead tell list-objects.c not to die on any
missing trees, and we check them ourselves using has_object_file(). But
that function uses OBJECT_INFO_SKIP_CACHED, which means we won't use our
internal empty tree.

This normally wouldn't come up. For most operations, Git will try to
write out the empty tree object as it would any other object. And
pack-objects in a push or fetch will send the empty tree (even if it's
virtual on the sending side). However, there are cases where this can
matter. One I found in the wild:

  1. The root tree of a commit became empty by deleting all files,
     without using an index. In this case it was done using libgit2's
     tree builder API, but as the included test shows, it can easily be
     done with regular git using hash-object.

     The resulting repo works OK, as we'd avoid walking over our own
     reachable commits for a connectivity check.

  2. Cloning with --reference pointing to the repository from (1) can
     trigger the problem, because we tell the other side we already have
     that commit (and hence the empty tree), but then walk over it
     during the connectivity check (where we complain about it missing).

Arguably the workflow in step (1) should be more careful about writing
the empty tree object if we're referencing it. But this workflow did
work prior to 7c0fe330d5, so let's restore it.
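
For illustration, the workflow in step (1) can be reproduced with
stock git roughly like this (a sketch; repository and message names
are made up):

  # build a commit whose root tree is the empty tree without ever
  # writing the empty tree object to disk (note: no "-w" on hash-object)
  git init sparse-src
  cd sparse-src
  tree=$(git hash-object -t tree /dev/null)    # computes the id only
  commit=$(git commit-tree -m "delete everything" "$tree")
  git update-ref HEAD "$commit"
  # cloning elsewhere with --reference pointing at sparse-src then runs
  # into step (2) above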

This patch makes the minimal fix, which is to swap in a direct call to
oid_object_info_extended(), minus the SKIP_CACHED flag, in place of
calling has_object_file(). This is all that has_object_file() is doing
under the hood. And there's little danger of unrelated fallout from
other unexpected "cached" objects, since there's only one call site
that adds such a cached object, and it's in git-blame.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-03-05 22:28:29 +09:00
Ævar Arnfjörð Bjarmason
aa984dbe5e index-pack tests: don't leave test repo dirty at end
Change a test added in 51054177b3 ("index-pack: detect local
corruption in collision check", 2017-04-01) so that the repository
isn't left dirty at the end.

Due to the caveats explained in 720dae5a19 ("config doc: elaborate on
fetch.fsckObjects security", 2018-07-27) even a "fetch" that fails
will write to the local object store, so let's copy the bit-error test
directory before running this test.
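
The shape of the fix, as a sketch (the "bit-error" directory comes
from the earlier t1060 setup; the fetch source here is illustrative):

  # run the failing fetch inside a throwaway copy of the corrupted
  # repo, so whatever the fetch writes cannot dirty the original
  cp -R bit-error bit-error-cp &&
  test_when_finished "rm -rf bit-error-cp" &&
  (
      cd bit-error-cp &&
      test_must_fail git fetch ../some-other-repo
  )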

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-10-31 11:12:05 +09:00
Jeff King
51054177b3 index-pack: detect local corruption in collision check
When we notice that we have a local copy of an incoming
object, we compare the two objects to make sure we haven't
found a collision. Before we get to the actual object
bytes, though, we compare the type and size from
sha1_object_info().

If our local object is corrupted, then the type will be
OBJ_BAD, which obviously will not match the incoming type,
and we'll report "SHA1 COLLISION FOUND" (with capital
letters and everything). This is confusing, as the problem
is not a collision but rather local corruption. We should
report that instead (just like we do if reading the rest of
the object content fails a few lines later).

Note that we _could_ just ignore the error and mark it as a
non-collision. That would let you "git fetch" to replace a
corrupted object. But it's not a very reliable method for
repairing a repository. The earlier want/have negotiation
tries to get the other side to omit objects we already have,
and it would not realize that we are "missing" this
corrupted object. So we're better off complaining loudly
when we see corruption, and letting the user take more
drastic measures to repair (like making a full clone
elsewhere and copying the pack into place).

Note that the test sets transfer.unpackLimit in the
receiving repository so that we use index-pack (which is
what does the collision check). Normally for such a small
push we'd use unpack-objects, which would simply try to
write the loose object, and discard the new one when we see
that there's already an old one.
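
For reference, a sketch of the kind of setup described above (the bare
repository name "dst.git" is illustrative, not the literal test):

  # with transfer.unpackLimit=1 the receiving side keeps even a
  # one-object push as a pack, so index-pack and its collision check run
  git -C dst.git config transfer.unpackLimit 1
  git push dst.git HEAD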

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-04-01 10:48:11 -07:00
Jeff King
93cff9a978 sha1_loose_object_info: return error for corrupted objects
When sha1_loose_object_info() finds that a loose object file
cannot be stat(2)ed or mmap(2)ed, it returns -1 to signal an
error to the caller.  However, if it found that the loose
object file is corrupt and the object data cannot be used
from it, it stuffs OBJ_BAD into the "type" field of the
object_info, but returns zero (i.e., success), which can
confuse callers.

This is due to 052fe5eac (sha1_loose_object_info: make type
lookup optional, 2013-07-12), which switched the return to a
strict success/error, rather than returning the type (but
botched the return).

Callers of regular sha1_object_info() don't notice the
difference, as that function returns the type (which is
OBJ_BAD in this case). However, direct callers of
sha1_object_info_extended() see the function return success,
but without setting any meaningful values in the object_info
struct, leading them to access potentially uninitialized
memory.

The easiest way to see the bug is via "cat-file -s", which
will happily ignore the corruption and report whatever
value happened to be in the "size" variable.
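
A quick way to see that symptom from the command line (a sketch; the
corruption here is simply overwriting the loose object file):

  # write a loose blob, corrupt its file in place, then ask for its size
  blob=$(echo content | git hash-object -w --stdin)
  path=.git/objects/$(echo "$blob" | sed 's|^..|&/|')
  chmod +w "$path"
  echo garbage >"$path"
  # before this fix: prints whatever was lying around in "size";
  # after it: cat-file reports an error for the corrupt object
  git cat-file -s "$blob"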

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-04-01 10:45:16 -07:00
Jeff King
d3b34622f6 clone: leave repo in place after checkout errors
If we manage to clone a remote repository but run into an
error in the checkout, it is probably sane to leave the repo
directory in place. That lets the user examine the situation
without spending time to re-clone from the remote (which may
be a lengthy process).
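
The resulting behavior, sketched against the "misnamed" repository
that this test file sets up (output elided):

  # the checkout step fails on the corrupted blob, but the repository
  # directory is left behind for inspection instead of being removed
  git clone misnamed attempt || echo "clone reported failure"
  ls attempt/.git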

Rather than try to convert each die() from the checkout code
path into an error(), we simply set a flag that tells the
"remove_junk" atexit function to print a helpful message and
leave the repo in place.

Note that the test added in this patch actually passes
without the code change. The reason is that the cleanup code
is buggy; we chdir into the working tree for the checkout,
but still may use relative paths to remove the directories
(which means if you cloned into "foo", we would accidentally
remove "foo" from the working tree!).  There's no point in
fixing it now, since this patch means we will never try to
remove anything after the chdir, anyway.

[jc: replaced the message with a more succinct version from
Jonathan]

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-03-29 15:20:55 -07:00
Jeff King
0433ad128c clone: run check_everything_connected
When we fetch from a remote, we do a revision walk to make
sure that what we received is connected to our existing
history. We do not do the same check for clone, which should
be able to check that we received an intact history graph.

The upside of this patch is that it will make clone more
resilient against propagating repository corruption. The
downside is that we will now traverse "rev-list --objects
--all" down to the roots, which may take some time (it is
especially noticeable for a "--local --bare" clone).
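
Conceptually the new check boils down to a walk like the following
over the freshly cloned repository (a sketch of the idea; the real
code goes through check_everything_connected()):

  # walk every object reachable from all refs; a missing or corrupt
  # object makes the walk, and therefore the clone, fail
  git -C new-clone rev-list --objects --all >/dev/null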

Note that we need to adjust t5710, which tries to make such
a bogus clone. Rather than checking after the fact that our
clone is bogus, we can simplify it to just make sure "git
clone" reports failure.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-03-27 13:47:18 -07:00
Jeff King
0aac7bb287 clone: die on errors from unpack_trees
When clone is populating the working tree, it ignores the
return status from unpack_trees; this means we may report a
successful clone, even when the checkout fails.

When checkout fails, we may want to leave the $GIT_DIR in
place, as it might be possible to recover the data through
further use of "git checkout" (e.g., if the checkout failed
due to a transient error, disk full, etc). However, we
already die on a number of other checkout-related errors, so
this patch follows that pattern.

In addition to marking a now-passing test, we need to adjust
t5710, which blindly assumed it could make bogus clones of
very deep alternates hierarchies. By using "--bare", we can
avoid it actually touching any objects.
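
That works because a bare clone performs no checkout at all, so none
of the objects hiding behind the deep alternates chain ever need to be
read at this point; roughly (repository names invented):

  # clone bare: refs and objects are copied/linked, nothing is checked out
  git clone --bare repo-with-deep-alternates bogus-copy.git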

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-03-27 13:47:15 -07:00
Jeff King
0e15ad9b73 add tests for cloning corrupted repositories
We try not to let corruption pass unnoticed over fetches and
clones. For the most part, this works, but there are some
broken corner cases, including:

  1. We do not detect missing objects over git-aware
     transports. This is a little hard to test, because the
     sending side will actually complain about the missing
     object.

     To fool it, we corrupt a repository such that we have a
     "misnamed" object: it claims to be sha1 X, but is
     really Y. This lets the sender blindly transmit it, but
     it is the receiver's responsibility to verify that what
     it got is sane (and it does not); see the sketch after this list.

  2. We do not detect missing or misnamed blobs during the
     checkout phase of clone.
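
A hand-rolled version of the "misnamed" corruption from point (1)
looks roughly like this (a sketch; blob contents are illustrative):

  # make the loose file for blob $x actually contain blob $y's data
  x=$(echo one | git hash-object -w --stdin)
  y=$(echo two | git hash-object -w --stdin)
  xpath=.git/objects/$(echo "$x" | sed 's|^..|&/|')
  ypath=.git/objects/$(echo "$y" | sed 's|^..|&/|')
  chmod +w "$xpath"
  cp "$ypath" "$xpath"
  # a sender will happily stream object $x; the receiver has to notice
  # that what arrived does not hash to $x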

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-03-27 13:47:11 -07:00
Jeff King
d9c31e14d0 streaming_write_entry: propagate streaming errors
When we are streaming an index blob to disk, we store the
error from stream_blob_to_fd in the "result" variable, and
then immediately overwrite that with the return value of
"close". That means we catch errors on close (e.g., problems
committing the file to disk), but miss anything which
happened before then.

We can fix this by using bitwise-OR to accumulate errors in
our result variable.

While we're here, we can also simplify the error handling
with an early return, which makes it easier to see under
which circumstances we need to clean up.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-03-27 13:47:09 -07:00
Jeff King
7b6257b0d4 add test for streaming corrupt blobs
We do not have many tests for handling corrupt objects. This
new test at least checks that we detect a byte error in a
corrupt blob object while streaming it out with cat-file.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-03-27 13:47:06 -07:00