diff --git a/content/posts/commit-vandalism.md b/content/posts/commit-vandalism.md new file mode 100644 index 0000000..8371831 --- /dev/null +++ b/content/posts/commit-vandalism.md @@ -0,0 +1,338 @@ +Title: git commit --vandalism +Date: 2023-05-25 18:55 +Author: Error +Slug: commit-vandalism +Summary: SHA-1 has been broken for some time, but is still used in git. To find out how git behaves when it encounters such a hash collision, I vandalized its source code and caused some collisions myself. +License: CC-BY-NC + https://creativecommons.org/licenses/by-nc/4.0/ + +## Disclaimer + +This is not a vulnerability report. +I intentionally disabled safeguards in the code in order to be able to observe this behavior. +However, it's a nice example of why you shouldn't use SHA-1 for anything security-related and a good opportunity to learn more about the inner workings of git. + +## SHAttering git + +In 2017, the Cryptology Group at Centrum Wiskunde & Informatica (CWI) and the Google Research Security, Privacy and Anti-abuse Group announced the first (public) SHA-1 hash collision. +They generated two PDF files with the same hash in an attack they called SHAttered. +While it still required a lot of processing power (the equivalent of 6,500 years of single-CPU computations and 110 years of single-GPU computations), it showed that such attacks are not only theoretically possible - they are technically feasible. [^1] + +IDs in git, for example for commits, are generated using SHA-1 by default. +As explained by git's creator, Linus Torvalds, the hash function is not used for security. +It's simply used to generate a checksum, similar to a CRC.[°The quote "We check a checksum that's cryptographically secure. Nobody has been able to break SHA-1" didn't hold up that well, but in general, Linus' point still stands: git uses SHA-1 for consistency checks, not for security. ] [^2] +But what happens if a collision occurs, or someone intentionally causes a collision? +Let's try it out. 
+ +So if the SHAttered files already have a hash collision, they could be used to cause a collision in git, right? +Let's put them in a repository and find out. + + $ wget https://shattered.io/static/shattered-1.pdf -O shattered-1.pdf -q + $ sha1sum shattered-1.pdf + 38762cf7f55934b34d179ae6a4c80cadccbb7f0a shattered-1.pdf + $ wget https://shattered.io/static/shattered-2.pdf -O shattered-2.pdf -q + $ sha1sum shattered-2.pdf + 38762cf7f55934b34d179ae6a4c80cadccbb7f0a shattered-2.pdf + $ diff shattered-1.pdf shattered-2.pdf + Binary files shattered-1.pdf and shattered-2.pdf differ + +As promised, these files have different contents, but the same SHA-1 hash. +The next step is to add them to a git repository and observe what happens. + + $ cp shattered-1.pdf shattered.pdf + $ git add shattered.pdf + $ git commit -m 'shattered' + [main (Root-Commit) 0aa8c5a] shattered + 1 file changed, 0 insertions(+), 0 deletions(-) + create mode 100644 shattered.pdf + $ cp shattered-2.pdf shattered.pdf -f + $ git add shattered.pdf + $ git commit -m 'shattered' + [main f1906a7] shattered + 1 file changed, 0 insertions(+), 0 deletions(-) + +Looking at the result, this did not seem to cause any issues. +If the commit ID is a SHA-1 hash, shouldn't this cause a commit ID collision? +Such a collision did not happen. +The file was updated successfully in the second commit and checking out the first commit restores the original version. +A closer look shows that git calculates different hashes for these files. + + $ git hash-object shattered-1.pdf + b621eeccd5c7edac9b7dcba35a8d5afd075e24f2 + $ git hash-object shattered-2.pdf + ba9aaa145ccd24ef760cf31c74d8f7ca1a2e47b0 + +## Objects and IDs in git + +Internally, git is a key-value store. +It generates an ID for each object that it stores, which can later be used to identify and retrieve the object. +By default, this ID is a SHA-1 hash. +The input given to the hash function depends on the object type. 
+For files, which are stored as so-called blobs, the ID is generated by hashing the string `blob`, the length of the file, a null byte and the file itself. [^3] + + $ echo "so long and thanks for all the fish" > test.txt + $ git hash-object test.txt + 8b86cb67f1f19db567a100b55edb5466a33e7fb7 + $ printf "blob $(wc -c 1683729684 +0200 + committer error 1683729684 +0200 + + shattered + +The commit ID is generated by hashing this information, prepended by the same header that is used for the other object types: `$object-type $length\0`.[°You can find a nice example of how the commit hash is constructed [here](https://gist.github.com/masak/2415865 "How is git commit sha1 formed").] + +## Causing Collisions + +### Vandalism + +OK, so that's the reason why `git hash-object` returns different values for the two SHAttered files.[°Try it yourself if you want: prepending any data to these files will make their SHA-1 hashes differ. `echo test | cat - shattered-1.pdf | sha1sum` results in a different hash than `echo test | cat - shattered-2.pdf | sha1sum`] +Further, this explains why the commit hashes didn't collide. +Reusing the existing SHA-1 hash collision from SHAttered won't get us anywhere, but collisions can still happen. +It's unlikely to find one by accident and it's expensive to cause them intentionally, but it's not impossible. +How would git behave in this case? +Let's vandalize the code a little bit and find out. + +Git can be built with multiple different SHA-1 backends. +By default, it uses an implementation with a collision attack detection mechanism.[°This collision attack detection is not relevant for our experiment for now. We'll get back to it later.] +It can be found in the `SHA1DC` directory. + + int SHA1DCFinal(unsigned char output[20], SHA1_CTX *ctx) + { + uint32_t last = ctx->total & 63; + uint32_t padn = (last < 56) ? 
(56 - last) : (120 - last); + uint64_t total; + SHA1DCUpdate(ctx, (const char*)(sha1_padding), padn); + + total = ctx->total - padn; + total <<= 3; + ctx->buffer[56] = (unsigned char)(total >> 56); + ctx->buffer[57] = (unsigned char)(total >> 48); + ctx->buffer[58] = (unsigned char)(total >> 40); + ctx->buffer[59] = (unsigned char)(total >> 32); + ctx->buffer[60] = (unsigned char)(total >> 24); + ctx->buffer[61] = (unsigned char)(total >> 16); + ctx->buffer[62] = (unsigned char)(total >> 8); + ctx->buffer[63] = (unsigned char)(total); + sha1_process(ctx, (uint32_t*)(ctx->buffer)); + output[0] = (unsigned char)(ctx->ihv[0] >> 24); + output[1] = 1; //(unsigned char)(ctx->ihv[0] >> 16); + output[2] = 1; //(unsigned char)(ctx->ihv[0] >> 8); + output[3] = 1; //(unsigned char)(ctx->ihv[0]); + output[4] = 1; //(unsigned char)(ctx->ihv[1] >> 24); + output[5] = 1; //(unsigned char)(ctx->ihv[1] >> 16); + output[6] = 1; //(unsigned char)(ctx->ihv[1] >> 8); + output[7] = 1; //(unsigned char)(ctx->ihv[1]); + output[8] = 1; //(unsigned char)(ctx->ihv[2] >> 24); + output[9] = 1; //(unsigned char)(ctx->ihv[2] >> 16); + output[10] = 1; //(unsigned char)(ctx->ihv[2] >> 8); + output[11] = 1; //(unsigned char)(ctx->ihv[2]); + output[12] = 1; //(unsigned char)(ctx->ihv[3] >> 24); + output[13] = 1; //(unsigned char)(ctx->ihv[3] >> 16); + output[14] = 1; //(unsigned char)(ctx->ihv[3] >> 8); + output[15] = 1; //(unsigned char)(ctx->ihv[3]); + output[16] = 1; //(unsigned char)(ctx->ihv[4] >> 24); + output[17] = 1; //(unsigned char)(ctx->ihv[4] >> 16); + output[18] = 1; //(unsigned char)(ctx->ihv[4] >> 8); + output[19] = 1; //(unsigned char)(ctx->ihv[4]); + return ctx->found_collision; + } + +In order to cause collisions between object IDs in git, it is helpful to reduce the size of the identifier. +To achieve this, every byte of the hash after the first one was set to 1. +This reduces the effective length of the identifier to 1 byte. 
+256 values may still sound like a lot, but there's no need to check every possible value. +We're not looking for a specific hash; any collision will do. +The first collisions should occur after a few tries.[°Such collisions are already very likely after a surprisingly small number of attempts if the value range is not too large. Look up the birthday problem if you don't know it.] + +### Test Setup + +So let's compile git with our "improvement" and try to cause some collisions. + + $ touch test + $ ../git/git add test + $ ../git/git commit -m 'test' + [main (root-commit) c101010] test + 1 file changed, 1 insertion(+) + create mode 100644 test + $ touch test2 + $ ../git/git add test2 + $ ../git/git commit -m 'test2' + [main db01010] test2 + 1 file changed, 0 insertions(+), 0 deletions(-) + create mode 100644 test2 + $ ../git/git log + commit db01010101010101010101010101010101010101 (HEAD -> main) + Author: error + Date: Sun May 7 20:32:54 2023 +0200 + + test2 + + commit c101010101010101010101010101010101010101 + Author: error + Date: Sun May 7 20:32:00 2023 +0200 + + test + +The first two commits may not collide, but it's already obvious what our little modification to the source code did: only the first 2 characters of the commit hash differ; everything after that is filled with the same two characters. + +After continuing to add commits to this repository, the first collision was observed on the 7th attempt. + + $ touch test7 + $ ../git/git add test7 + $ ../git/git commit -m 'test7' + fatal: 0f01010101010101010101010101010101010101 is not a valid 'tree' object + $ ../git/git cat-file -p 0f01010101010101010101010101010101010101 + tree 8501010101010101010101010101010101010101 + parent e801010101010101010101010101010101010101 + author error 1683484473 +0200 + committer error 1683484473 +0200 + + test5 + +The collision happened between the tree object for the new commit and an already existing commit. 
+Interestingly, git noticed the problem only when it tried to add the tree to the commit and found an object of the wrong type instead. +The original object was not modified in the process. +It looks like the attempt to add a new object failed silently. + +On the 15th attempt, another interesting collision occurred: + + $ touch test15 + $ ../git/git add test15 + $ ../git/git commit -m 'test15' + [main e201010] test12 + 1 file changed, 0 insertions(+), 0 deletions(-) + create mode 100644 test12 + $ ../git/git log + commit e201010101010101010101010101010101010101 (HEAD -> main) + Author: error + Date: Sun May 7 21:12:38 2023 +0200 + + test12 + + commit 3b01010101010101010101010101010101010101 + Author: error + Date: Sun May 7 21:12:06 2023 +0200 + + test11 + + commit 6001010101010101010101010101010101010101 + Author: error + Date: Sun May 7 21:11:40 2023 +0200 + + test10 + +In this case, both of the colliding objects are commits. +This time, git did not even print an error message. +After the creation of the new commit object once again failed silently, git proceeded to check out the existing commit with the same hash.[°People assume that a commit history is a strict progression of cause to effect. But actually from a non-linear, non-subjective viewpoint it's more like a big ball of wibbly-wobbly timey-wimey... stuff.] +This confirms the assumption that git will keep the existing object in case of a collision. +The other commits that got rolled back are still present as objects in git and can be checked out, but especially if the user does not notice what happened here and continues working, this can seriously mess up the commit history. + +### Observations + +Those tests were continued until the following collisions occurred and git's behavior could be observed: + +Collisions between two blobs +: Creating the new blob object fails silently. The existing blob object remains unchanged. 
+ +Collisions between blobs and trees +: Creating the new blob object fails silently. The existing tree object remains unchanged. + +Collisions between blobs and commits +: Creating the new blob object fails silently. The existing commit object remains unchanged. + +Collisions between blobs and tags +: Creating the new blob object fails silently. The existing tag object remains unchanged. + +Collisions between trees and blobs +: Creating the new tree object fails silently. The existing blob object remains unchanged. Trying to commit this results in an error message complaining that the object is not a valid tree object. + +Collisions between two trees +: Creating the tree object fails silently. The existing tree object remains unchanged. Trying to commit this, git commits the old tree again. This effectively means a rollback to the commit associated with the old tree while keeping the commit history. + +Collisions between trees and commits +: Creating the new tree object fails silently. The existing commit object remains unchanged. Trying to commit this results in an error message complaining that the object is not a valid tree object. + +Collisions between trees and tags +: Creating the new tree object fails silently. The existing tag object remains unchanged. Trying to commit this results in an error message complaining that the object is not a valid tree object. + +Collisions between commits and blobs +: Creating the new commit object fails. The existing blob object remains unchanged. Git attempts to check out the new commit (that wasn't created) and fails with the error message `fatal: cannot update ref 'refs/heads/main': trying to write non-commit object $HASH to branch 'refs/heads/main'`. + +Collisions between commits and trees +: Creating the new commit object fails. The existing tree object remains unchanged. 
Git attempts to check out the new commit (that wasn't created) and fails with the error message `fatal: cannot update ref 'refs/heads/main': trying to write non-commit object $HASH to branch 'refs/heads/main'`. + +Collisions between two commits +: Creating the new commit object fails silently. The existing commit remains unchanged and is checked out. + +Collisions between commits and tags +: Creating the new commit object fails. The existing tag object remains unchanged. Git attempts to check out the new commit (that wasn't created) and fails with the error message `fatal: cannot update ref 'refs/heads/main': trying to write non-commit object $HASH to branch 'refs/heads/main'`. + +Collisions between tags and blobs +: A new file is created under `.git/refs/tags` pointing at the hash, but the creation of the new tag object under `.git/objects` fails silently. The existing blob object remains unchanged. + +Collisions between tags and trees +: A new file is created under `.git/refs/tags` pointing at the hash, but the creation of the new tag object under `.git/objects` fails silently. The existing tree object remains unchanged. The tag is displayed by `git tag -l`, but attempts at checking it out fail with the error message `fatal: Cannot switch branch to a non-commit`. + +Collisions between tags and commits +: A new file is created under `.git/refs/tags` pointing at the hash, but the creation of this new tag object under `.git/objects` fails silently. The existing commit object remains unchanged. The tag is displayed by `git tag -l`, but is interpreted as a lightweight tag for the colliding commit. + +Collisions between two tags +: A new file is created under `.git/refs/tags` pointing at the hash, but the creation of this new tag object under `.git/objects` fails silently. The existing tag object remains unchanged. Since the new tag reference object points at the old tag object, the new tag will be an alias for the old tag, with the same message and object reference. 
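The pattern behind these observations can be reproduced outside of git. The following Python sketch is a toy model, not git's actual implementation: `object_id`, `ToyObjectStore` and the `id_bytes` parameter are hypothetical names introduced for illustration. It builds IDs from the `$object-type $length\0` header, imitates the vandalized hash by keeping only the leading bytes of the digest, and assumes that a write to an already used ID is dropped without an error.

```python
import hashlib
import itertools


def object_id(obj_type: str, payload: bytes, id_bytes: int = 20) -> str:
    """Hash type, length, NUL byte and payload; optionally cripple the result."""
    header = f"{obj_type} {len(payload)}".encode() + b"\x00"
    digest = hashlib.sha1(header + payload).digest()
    # Imitate the vandalized build: keep the first id_bytes bytes of the
    # digest and set every remaining byte to 0x01.
    return (digest[:id_bytes] + b"\x01" * (20 - id_bytes)).hex()


class ToyObjectStore:
    """Toy key-value store in which an existing object always wins."""

    def __init__(self, id_bytes: int = 20):
        self.id_bytes = id_bytes
        self.objects = {}  # object ID -> (type, payload)

    def write(self, obj_type: str, payload: bytes) -> str:
        oid = object_id(obj_type, payload, self.id_bytes)
        # Assumption taken from the observations: if the ID is already in
        # use, the new object is silently discarded.
        self.objects.setdefault(oid, (obj_type, payload))
        return oid

    def read(self, oid: str, expected_type: str) -> bytes:
        obj_type, payload = self.objects[oid]
        if obj_type != expected_type:
            # Only a type mismatch makes the collision visible.
            raise TypeError(f"{oid} is not a valid '{expected_type}' object")
        return payload


if __name__ == "__main__":
    store = ToyObjectStore(id_bytes=1)
    commit_id = store.write("commit", b"test5")
    # Write made-up tree payloads until one of them lands on the commit's ID.
    for i in itertools.count():
        if store.write("tree", f"tree {i}".encode()) == commit_id:
            break
    try:
        store.read(commit_id, "tree")
    except TypeError as err:
        print("fatal:", err)
```

With `id_bytes=20` the function returns the regular, uncrippled ID. In this model, a collision between two objects of the same type goes unnoticed, because `read` simply returns the older payload.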
+ +### Summary + +Git does not overwrite existing objects. +Creating a new object with a hash that's already associated with another object always fails silently. +However, depending on the object type, git shows some interesting behavior. +If the object type fits, git will just continue with it as if nothing is wrong, e.g. it will commit an old tree or check out an old commit. +Git only responds with an error if the object types are incompatible, e.g. if the collision causes a reference to a tree to point at a tag object instead. + +## Mitigation + +Hash collisions, intentional or not, can become a problem for git. +To mitigate this risk, git includes a mechanism that detects SHA-1 collision attacks and reacts by hashing the suspected block 3 times, extending SHA-1 from 80 to 240 steps in these cases. +This ensures that different hashes are generated in these cases.[^4] + +Further, git does not only support SHA-1. +It supports SHA-256, too. +Unlike SHA-1, SHA-256 is considered cryptographically secure. + +## Conclusion + +Actual SHA-1 collisions are very unlikely to occur as a coincidence. +The collisions in this experiment could only be observed after the code was modified to limit the effective hash size to 1 byte. +Otherwise, it would not have been possible to create collisions with the available resources. +However, SHAttered shows that it is feasible to intentionally cause such a collision.[°Whether or not attackers with the resources required to do this are part of your threat model is up to you to decide.] + +In case of a collision, git shows some interesting behavior that may not be noticed immediately. +This may leave the git repository in an unintended state. +Further, attackers could modify files and commits by replacing objects with specifically crafted objects that produce the same hash. +This risk is mitigated by the use of the collision attack detection mechanism. 
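To put the 1-byte hash size used in the experiment into perspective, the chance of an accidental collision can be estimated with the birthday problem. The sketch below assumes that object IDs are uniformly distributed, which is an idealization, and `collision_probability` is a hypothetical helper written for this post, not part of git.

```python
def collision_probability(num_objects: int, id_bits: int) -> float:
    """Chance that at least two of num_objects uniformly random IDs collide."""
    space = 2 ** id_bits
    if num_objects > space:
        return 1.0  # pigeonhole principle: more objects than possible IDs
    p_unique = 1.0
    for k in range(num_objects):
        # The k-th new object has to avoid the k IDs that are already taken.
        p_unique *= (space - k) / space
    return 1.0 - p_unique


if __name__ == "__main__":
    for n in (5, 10, 20, 40):
        print(n, "objects, 8-bit IDs:", round(collision_probability(n, 8), 3))
    # For full 160-bit IDs the result is below floating-point resolution.
    print("10000 objects, 160-bit IDs:", collision_probability(10_000, 160))
```

Keep in mind that every commit in the experiment writes more than one object (at least a tree and a commit object), so the number of objects grows faster than the number of commits.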
+ +Even though the risk is already mitigated, this example shows why SHA-1 should not be used for cryptographic hashing and was a good opportunity to learn more about git itself. +If you want to try it out yourself, you can have a look at the code with my modification [here](https://git.undefinedbehavior.de/undef/git-commit-vandalism "git commit vandalism - undefined git server"). + + +[^1]: M. Stevens, E. Bursztein, P. Karpman, A. Albertini and Y. Markov (2017, February) "The first collision for full SHA-1", [https://shattered.io/static/shattered.pdf](https://shattered.io/static/shattered.pdf "The first collision for full SHA-1"). +[^2]: L. Torvalds (2007, May) "Tech Talk: Linus Torvalds on git", [https://www.youtube.com/watch?v=4XpnKHJAok8&t=56m20s](https://www.youtube.com/watch?v=4XpnKHJAok8&t=56m20s "Tech Talk: Linus Torvalds on git"). +[^3]: S. Chacon, B. Straub et al. (2014) "10.2 Git Internals - Git Objects" in *Pro Git*, [https://git-scm.com/book/en/v2/Git-Internals-Git-Objects](https://git-scm.com/book/en/v2/Git-Internals-Git-Objects "Git Internals - Git Objects"). +[^4]: M. Stevens, D. Shumow (2017) "sha1dc/sha1.h", [https://github.com/git/git/blob/master/sha1dc/sha1.h](https://github.com/git/git/blob/master/sha1dc/sha1.h "sha1dc/sha1.h").