While git doesn't track empty directories, git archive can be tricked
into putting some into archives. One way is to construct an empty tree
object, as t5004 does. While that is supported by the object database,
it can't be represented in the index and thus it's unlikely to occur in
the wild.
Another way is using the literal name of a directory in an exclude
pathspec -- its contents are excluded, but the directory stub is
included. That's inconsistent: exclude pathspecs containing wildcards
don't leave empty directories in the archive.
Yet another way is to have a few levels of nested subdirectories (e.g.
d1/d2/d3/file1) and to ignore the entries at the leaves (e.g. file1).
The directories with the ignored content are ignored as well (e.g. d3),
but their empty parents are included (e.g. d2).
As empty directories are not supported by git, they should also not be
written into archives. If an empty directory is really needed then it
can be tracked and archived by placing an empty .gitignore file in it.
There already is a mechanism in place for suppressing empty directories.
When read_tree_recursive() encounters a directory excluded by a pathspec
then it enters it anyway because it might contain included entries. It
calls the callback function before it is able to decide if the directory
is actually needed. For that reason git archive adds directories to a
queue and writes entries for them only when it encounters the first
child item -- but currently only if pathspecs with wildcards are used.
Queue *all* directories, whether or not any pathspecs are present.
This prevents git archive from writing entries for empty directories in
all cases.
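For illustration, here is a minimal stand-alone sketch of that queueing
scheme; the names are made up and this is not git's actual archive.c
code:

    #include <stdio.h>
    #include <stdlib.h>

    struct queued_dir {
        struct queued_dir *up;      /* enclosing directory, if also queued */
        char path[4096];
    };

    static struct queued_dir *pending;  /* innermost queued directory */

    /* Seeing a directory only queues it; no entry is written yet. */
    static void queue_directory(const char *path)
    {
        struct queued_dir *d = calloc(1, sizeof(*d));
        snprintf(d->path, sizeof(d->path), "%s", path);
        d->up = pending;
        pending = d;
    }

    /* Write all queued directories, outermost first, and clear the queue. */
    static void write_pending(void)
    {
        struct queued_dir *d = pending;
        if (!d)
            return;
        pending = d->up;
        write_pending();            /* parents before children */
        printf("dir entry:  %s\n", d->path);
        free(d);
    }

    /* The first included child flushes its queued parent directories. */
    static void write_file(const char *path)
    {
        write_pending();
        printf("file entry: %s\n", path);
    }

    int main(void)
    {
        /* d1/d2/d3/file1 with file1 excluded: the directories are queued,
         * but no included child ever flushes them, so they are dropped
         * (the real code pops them when the traversal leaves them). */
        queue_directory("d1/");
        queue_directory("d1/d2/");
        queue_directory("d1/d2/d3/");
        while (pending) {
            struct queued_dir *d = pending;
            pending = d->up;
            free(d);
        }

        /* e1/file1 is included: writing it emits an entry for e1/ first. */
        queue_directory("e1/");
        write_file("e1/file1");
        return 0;
    }
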
Suggested-by: Jeff King <peff@peff.net>
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Check if unzip supports the ZIP64 format and skip the tests that create
big archives otherwise. Also skip the test that archives a big file on
32-bit platforms because the git object system can't unpack files
bigger than 4GB there.
Reported-by: Torsten Bögershausen <tboegi@web.de>
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Write a zip64 extended information extra field for big files as part of
their local headers and as part of their central directory headers.
Also write a zip64 version of the data descriptor in that case.
If we're streaming then we don't know the compressed size at the time
we write the header, and deflate can end up making a file bigger
instead of smaller if we're unlucky. In that case, write a local zip64
header already for files with a size of 2GB or more, to be on the safe
side.
Both sizes need to be included in the local zip64 header, but the extra
field for the directory must only contain 64-bit equivalents for 32-bit
values of 0xffffffff.
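For orientation, here is a stand-alone sketch of the two layouts; the
0x0001 header ID and the field order come from the ZIP APPNOTE, the
function names are made up, and this is not git's actual archive-zip.c
code:

    #include <stdint.h>
    #include <stdio.h>

    static unsigned char *put_le16(unsigned char *p, uint16_t v)
    {
        *p++ = v & 0xff;
        *p++ = (v >> 8) & 0xff;
        return p;
    }

    static unsigned char *put_le64(unsigned char *p, uint64_t v)
    {
        int i;
        for (i = 0; i < 8; i++)
            *p++ = (v >> (8 * i)) & 0xff;
        return p;
    }

    /* Local header variant: if it is written at all, both sizes are
     * included and the 32-bit size fields are set to 0xffffffff. */
    static size_t zip64_extra_local(unsigned char *buf,
                                    uint64_t size, uint64_t csize)
    {
        unsigned char *p = buf;
        p = put_le16(p, 0x0001);    /* zip64 extended information */
        p = put_le16(p, 16);        /* two 64-bit values follow */
        p = put_le64(p, size);      /* uncompressed size */
        p = put_le64(p, csize);     /* compressed size */
        return p - buf;
    }

    /* Central directory variant: include only the 64-bit values whose
     * 32-bit counterparts are stored as 0xffffffff. */
    static size_t zip64_extra_dir(unsigned char *buf, uint64_t size,
                                  uint64_t csize, uint64_t offset)
    {
        unsigned char *p = buf + 4; /* ID and length are filled in last */
        if (size >= 0xffffffffULL)
            p = put_le64(p, size);
        if (csize >= 0xffffffffULL)
            p = put_le64(p, csize);
        if (offset >= 0xffffffffULL)
            p = put_le64(p, offset);
        if (p == buf + 4)
            return 0;               /* nothing overflowed, no field needed */
        put_le16(buf, 0x0001);
        put_le16(buf + 2, (uint16_t)(p - buf - 4));
        return p - buf;
    }

    int main(void)
    {
        unsigned char buf[32];
        uint64_t size = 5ULL << 30; /* an (uncompressed) 5GB file */

        printf("local extra field: %zu bytes\n",
               zip64_extra_local(buf, size, size));
        printf("dir extra field:   %zu bytes\n",
               zip64_extra_dir(buf, size, size, 0));
        return 0;
    }
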
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a zip64 extended information extra field to the central directory
and emit the zip64 end of central directory records as well as locator
if the offset of an entry within the archive exceeds 4GB.
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Test the creation of ZIP archives bigger than 4GB and containing files
bigger than 4GB. They are marked as EXPENSIVE because they take quite a
while and because the first one needs a bit more than 4GB of disk space
to store the resulting archive.
The big archive in the first test is made up of a tree containing
thousands of copies of a small file. Yet the test has to write out the
full archive because unzip doesn't offer a way to read from stdin.
The big file in the second test is provided as a zipped pack file to
avoid writing another 4GB file to disk and then adding it.
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Support more than 65535 entries cleanly by writing a "zip64 end of
central directory record" (with a 64-bit field for the number of
entries) before the usual "end of central directory record" (which
contains only a 16-bit field). InfoZIP's zip does the same.
Archives with 65535 or fewer entries are not affected.
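A stand-alone sketch of the records involved and of when they become
necessary; field widths and signatures follow the ZIP APPNOTE, the
names are made up, and this is not git's actual archive-zip.c code:

    #include <stdint.h>
    #include <stdio.h>

    /* Field layout only; a real writer has to emit these little-endian
     * and unpadded, so they cannot simply be dumped from memory. */

    struct zip64_end_of_directory_record {
        uint32_t signature;             /* 0x06064b50 */
        uint64_t record_size;           /* size of the rest of this record */
        uint16_t creator_version;
        uint16_t needed_version;
        uint32_t disk;
        uint32_t directory_start_disk;
        uint64_t entries_on_this_disk;  /* 64-bit entry counts */
        uint64_t entries;
        uint64_t directory_size;
        uint64_t directory_offset;
    };

    struct zip64_directory_locator {
        uint32_t signature;             /* 0x07064b50 */
        uint32_t directory_disk;
        uint64_t directory_offset;      /* where the zip64 record starts */
        uint32_t number_of_disks;
    };

    struct end_of_directory_record {
        uint32_t signature;             /* 0x06054b50 */
        uint16_t disk;
        uint16_t directory_start_disk;
        uint16_t entries_on_this_disk;  /* capped at 0xffff */
        uint16_t entries;               /* capped at 0xffff */
        uint32_t directory_size;
        uint32_t directory_offset;      /* capped at 0xffffffff */
        uint16_t comment_length;
    };

    /* The zip64 record and its locator go right before the classic end
     * of central directory record whenever the 16-bit entry count or a
     * 32-bit offset field is too small. */
    static int needs_zip64(uint64_t entries, uint64_t max_offset)
    {
        return entries > 0xffff || max_offset >= 0xffffffffULL;
    }

    int main(void)
    {
        printf("100000 small entries: %d\n", needs_zip64(100000, 4096));
        printf("one entry past 4GB:   %d\n", needs_zip64(1, 5ULL << 30));
        printf("65535 small entries:  %d\n", needs_zip64(65535, 4096));
        return 0;
    }
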
Programs that extract all files, like InfoZIP's unzip and 7-Zip,
ignored the field and could extract all files already. Software that
relies on the ZIP file directory to quickly show a list of the
contained files and simulate a normal directory -- like Windows'
built-in ZIP functionality -- only saw a subset of the included files.
Windows has supported ZIP64 since Vista, according to
https://en.wikipedia.org/wiki/Zip_%28file_format%29#ZIP64.
Suggested-by: Johannes Schauer <josch@debian.org>
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A ZIP file directory has a 16-bit field for the number of entries it
contains. There are 64-bit extensions to deal with that. Demonstrate
that git archive --format=zip currently doesn't use them and instead
overflows the field.
InfoZIP's unzip doesn't care about this field and extracts all files
anyway. Software that uses the directory for quickly presenting a
filesystem-like view -- notably Windows -- depends on it, but doesn't
lend itself to an automatic test case easily. Use InfoZIP's zipinfo,
which probably isn't available everywhere but at least provides *some*
way to check this field.
To speed things up a bit, create and commit only a subset of the
files, build a fake tree out of duplicates, and pass that to git
archive.
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If we are expecting a command to produce a particular exit
code, we can use test_expect_code. However, some cases are
more complicated, and want to accept one of a range of exit
codes. For these, we end up with something like:
    cmd;
    case "$?" in
    ...
That unfortunately breaks the &&-chain and fools
--chain-lint. Since these special cases are so few, we can
wrap them in a block, like this:
    { cmd; ret=$?; } &&
    case "$ret" in
    ...
This accomplishes the same thing, and retains the &&-chain
(the exit status fed to the && is that of the assignment,
which should always be true). It's technically longer, but
it is probably a good thing for unusual code like this to
stand out.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This reverts commit 10f343ea81; its output is no longer bit-for-bit
identical to that of older versions of Git, and the infrastructure
kernel.org uses to (pretend to) upload tarballs depends on that
equivalence.
Implementations of "tar" that do not understand an extended pax
header would extract its contents into a regular file; make sure the
permission bits of this file follow the same tar.umask configuration
setting.
* bc/archive-pax-header-mode:
archive: honor tar.umask even for pax headers
git archive's tar format uses extended pax headers to encode metadata
into the archive. Most tar implementations correctly treat these as
metadata, but some that do not understand the pax format extract these
as files instead. Apply the tar.umask setting to these entries to
prevent tampering by other users.
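As a tiny illustration of the resulting mode (a sketch, not git's
archive-tar.c code; 002 is tar.umask's documented default):

    #include <stdio.h>

    int main(void)
    {
        unsigned int tar_umask = 002;   /* e.g. from "git config tar.umask" */

        /* Masked like a regular entry's permission bits: with the default
         * umask of 002 the pax header pseudo-file ends up as 0664 instead
         * of a world-writable 0666. */
        printf("pax header entry mode: %04o\n", 0666 & ~tar_umask);
        return 0;
    }
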
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
test_cmp() is primarily meant to compare text files (and display the
difference for debug purposes).
Raw "cmp" is better suited to compare binary files (tar, zip, etc.).
On MinGW, test_cmp is a shell function, mingw_test_cmp, that tries to
read both files into memory, stripping CR characters (introduced in
commit 4d715ac0).
This function usually speeds things up, as fork is extremely slow on
Windows. Unsurprisingly, though, it is itself extremely slow and
sometimes even crashes when comparing large tar or zip files.
Signed-off-by: Stepan Kasal <kasal@ucw.cz>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Fixes tests added in the 1.8.2 era that are broken on BSDs.
* rs/empty-archive:
t5004: resurrect original empty tar archive test
t5004: avoid using tar for checking emptiness of archive
Test 2 of t5004 checks if a supposedly empty tar archive really
contains no files. 24676f02 (t5004: fix issue with empty archive test
and bsdtar) removed our commit hash to make it work with bsdtar, but
the test still fails on NetBSD and OpenBSD, which use their own tar
that considers a tar file containing only NULs as broken.
Here's what the different archivers do when asked to create a tar
file without entries:
$ uname -v
NetBSD 6.0.1 (GENERIC)
$ gtar --version | head -1
tar (GNU tar) 1.26
$ bsdtar --version
bsdtar 2.8.4 - libarchive 2.8.4
$ : >zero.tar
$ perl -e 'print "\0" x 10240' >tenk.tar
$ sha1 zero.tar tenk.tar
SHA1 (zero.tar) = da39a3ee5e6b4b0d3255bfef95601890afd80709
SHA1 (tenk.tar) = 34e163be8e43c5631d8b92e9c43ab0bf0fa62b9c
$ : | tar cf - -T - | sha1
da39a3ee5e6b4b0d3255bfef95601890afd80709
$ : | gtar cf - -T - | sha1
34e163be8e43c5631d8b92e9c43ab0bf0fa62b9c
$ : | bsdtar cf - -T - | sha1
34e163be8e43c5631d8b92e9c43ab0bf0fa62b9c
So NetBSD's native tar creates an empty file, while GNU tar and bsdtar
both give us 10KB of NULs -- just like git archive with an empty tree.
Now let's see how the archivers handle these two kinds of empty tar
files:
$ tar tf zero.tar; echo $?
tar: Unexpected EOF on archive file
1
$ gtar tf zero.tar; echo $?
gtar: This does not look like a tar archive
gtar: Exiting with failure status due to previous errors
2
$ bsdtar tf zero.tar; echo $?
0
$ tar tf tenk.tar; echo $?
tar: Cannot identify format. Searching...
tar: End of archive volume 1 reached
tar: Sorry, unable to determine archive format.
1
$ gtar tf tenk.tar; echo $?
0
$ bsdtar tf tenk.tar; echo $?
0
NetBSD's tar complains about both, bsdtar happily accepts any of them
and GNU tar doesn't like zero-length archive files. So the safest
course of action is to stay with our block-of-NULs format which is
compatible with GNU tar and bsdtar, as we can't make NetBSD's native
tar happy anyway.
We can simplify our test, however, by taking tar out of the picture.
Instead of extracting the archive and checking for the non-presence of
files, check if the file has a size of 10KB and contains only NULs.
This makes t5004 pass on NetBSD and OpenBSD.
Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a test to verify the emptiness of an archive by extracting its
contents. Don't run this test if the version of tar doesn't support
archives containing only a comment header, though.
The existing check 'tar archive of empty tree is empty' used to work
like that (minus the tar capability check) but was changed to depend
on the exact representation of empty tar files created by git archive
instead of on the behaviour of tar in order to avoid issues with
different tar versions.
The different approaches test different things: The existing one is
for empty trees, for which we know the exact expected output and thus
we can simply check it without extracting; the new one is for commits
with empty trees, whose archives include timestamps and so the more
"natural" check by extraction is a better fit because it focuses on
the interesting aspect, namely the absence of any archive entries.
Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Versions of tar that don't know pax headers -- like the ones in NetBSD 6
and OpenBSD 5.2 -- extract them as regular files. Explicitly ignore the
file created for our global header when checking the list of extracted
files, as this is normal and harmless fall-back behaviour. This fixes
test 3 of t5004 on these platforms.
Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
bsdtar, which is the default tar on Mac OS X, handles empty archives
just fine but reports archives containing only a pax extended header
comment as damaged. Work around the issue by explicitly generating
the archive for the tree and not the commit, which causes git archive
to omit the commit hash comment record from the tar file.
Reported-by: BJ Hargrave <bj@bjhargrave.com>
Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
git-archive relies on get_pathspec to convert its argv into
a list of pathspecs. When get_pathspec is given an empty
argv list, it returns a single pathspec, the empty string,
to indicate that everything matches. When we feed this to
our path_exists function, we typically see that the pathspec
turns up at least one item in the tree, and we are happy.
But when our tree is empty, we erroneously think it is
because the pathspec is too limited, when in fact it is
simply that there is nothing to be found in the tree. This
is a weird corner case, but the correct behavior is almost
certainly to produce an empty archive, not to exit with an
error.
This patch teaches git-archive to create empty archives when
there is no pathspec given (we continue to complain if a
pathspec is given, since it obviously is not matched). It
also confirms that the tar and zip writers produce sane
output in this instance.
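A rough sketch of that decision; the names are made up and this is not
the actual git-archive code:

    #include <stdio.h>

    /*
     * With no pathspec on the command line, get_pathspec() hands back a
     * single empty string that matches everything, so matching nothing
     * only means the tree itself is empty -- not that the user asked
     * for something that doesn't exist.
     */
    static int empty_result_is_ok(const char *pathspec, int nr_matched)
    {
        if (nr_matched > 0)
            return 1;
        if (!pathspec || !*pathspec)
            return 1;               /* "match everything" on an empty tree */
        return 0;                   /* explicit pathspec matched nothing */
    }

    int main(void)
    {
        printf("empty tree, no pathspec:  %d\n", empty_result_is_ok("", 0));
        printf("empty tree, 'sub/' given: %d\n",
               empty_result_is_ok("sub/", 0));
        return 0;
    }
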
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>