Clear the git_zstream variable at the start of git_deflate_init() etc.
so that callers don't have to do that.
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Set the text flag for ZIP archive entries that look like text files so
that unzip -a can be used to perform end-of-line conversions. Info-ZIP
zip does the same.
Detect binary files the same way as git diff and git grep do, namely by
checking for the attribute "diff" and its negation "-diff", and if none
is found by falling back to checking for the presence of NUL bytes in
the first few bytes of the file contents.
7-Zip, Windows' built-in ZIP functionality and Info-ZIP unzip without
the switch -a are not affected by the change and still extract text
files without doing any end-of-line conversions.
NB: The actual end-of-line style used in the archive entries doesn't
matter to unzip -a, as it converts any CR, CRLF and LF to the line end
characters appropriate for the platform it is running on.
Suggested-by: Ulrike Fischer <luatex@nililand.de>
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It is set to zero just 3 lines before.
Reported by cppcheck.
Signed-off-by: Stefan Beller <stefanbeller@googlemail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We use the function git_deflate_init() -- which wraps the zlib function
deflateInit() -- to initialize compression of ZIP file entries. This
results in compressed data prefixed with a two-bytes long header and
followed by a four-bytes trailer. ZIP file entries consist of ZIP
headers and raw compressed data instead, so we remove the zlib wrapper
before writing the result.
We can ask zlib for the the raw compressed data without the unwanted
parts in the first place by using deflateInit2() and specifying a
negative number of bits to size the window. For that purpose, factor
out the function do_git_deflate_init() and add git_deflate_init_raw(),
which wraps it. Then use the latter in archive-zip.c and get rid of
the code that stripped the zlib header and trailer.
Also rename the helper function zlib_deflate() to zlib_deflate_raw()
to reflect the change.
Thus we avoid generating data that we throw away anyway, the code
becomes shorter and some magic constants are removed.
Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Currently ZIP archive entries of files with export-subst attribute are
broken if they are stored uncompressed.
We get the size of a file from sha1_object_info(), but this number is
likely wrong for files whose contents are changed due to export-subst
placeholder expansion. We use sha1_file_to_archive() to get the
expanded file contents and size in that case. We proceed to use that
size for the uncompressed size field (good), but the compressed size
field is set based on the size from sha1_object_info() (bad).
This matters only for uncompressed files because for deflated files
we use the correct value after compression is done. And for files
without export-subst expansion the sizes from sha1_object_info() and
sha1_file_to_archive() are the same, so they are unaffected as well.
This patch fixes the issue by setting the compressed size based on the
uncompressed size only after we actually know the latter.
Also make use of the test file substfile1 to check for the breakage;
it was only stored verbatim so far. For that purpose, set the
attribute export-subst and replace its contents with the expected
expansion after committing.
Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Improve compatibility of our zip output to fill uncompressed size
in the header, which we can do without seeking back (even though it
should not be necessary).
* rs/zip-with-uncompressed-size-in-the-header:
archive-zip: write uncompressed size into header even with streaming
We record the uncompressed and compressed sizes and the CRC of streamed
files as zero in the local header of the file. The actual values are
recorded in an extra data descriptor after the file content, and in the
usual ZIP directory entry at the end of the archive.
While we know the compressed size and the CRC only after we processed
the contents, we actually know the uncompressed size right from the
start. And for files that we store uncompressed we also already know
their final size.
Do it like InfoZIP's zip and recored the known values, even though they
can be reconstructed using the ZIP directory and the data descriptors
alone. InfoZIP's unzip worked fine before, but NetBSD's version
actually depends on these fields.
The uncompressed size is already set by sha1_object_info(). We just
need to initialize the compressed size to zero or the uncompressed size
depending on the compression method (0 means storing). The CRC was
propertly initialized already.
Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
File modification times in ZIP files are encoded in DOS format: local
time with a granularity of two seconds. Add an extra field to all
archive entries to also record the mtime in Unix' fashion, as UTC with
a granularity of one second.
This has the desirable side-effect of convincing Info-ZIP unzip 6.00
to respect general purpose flag 11, which is used to indicate that a
file name is encoded in UTF-8. Any extra field would do, actually,
but the extended timestamp is a reasonably small one (22 bytes per
entry). Archives created by Info-ZIP zip 3.0 contain it, too (but
with ctime and atime as well).
Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Set general purpose flag 11 if we encounter a path that contains
non-ASCII characters. We assume that all paths are given as UTF-8; no
conversion is done.
Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Set general purpose flag 11 if we encounter a path that contains
non-ASCII characters. We assume that all paths are given as UTF-8; no
conversion is done.
The flag seems to be ignored by unzip unless we also mark the archive
entry as coming from a Unix system. This is done by setting the field
creator_version ("version made by" in the standard[1]) to 0x03NN.
The NN part represents the version of the standard supported by us, and
this patch sets it to 3f (for version 6.3) for Unix paths. We keep
creator_version set to 0 (FAT filesystem, standard version 0) in the
non-special cases, as before.
But when we declare a file to have a Unix path, then we have to set the
file mode as well, or unzip will extract the files with the permission
set 0000, i.e. inaccessible by all.
[1] http://www.pkware.com/documents/casestudies/APPNOTE.TXT
Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
After an entry has been streamed out, its CRC and sizes are written as
part of a data descriptor.
For simplicity, we make the buffer for the compressed chunks twice as
big as for the uncompressed ones, to be sure the result fit in even
if deflate makes them bigger.
t5000 verifies output. t1050 makes sure the command always respects
core.bigfilethreshold
Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Write a data descriptor containing the CRC of the entry and its sizes
after streaming it out. For simplicity, do that only if we're storing
files (option -0) for now.
t5000 verifies output. t1050 makes sure the command always respects
core.bigfilethreshold
Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We're going to reuse them soon for streaming. Also, update the ZIP
directory only at the very end, which will also make streaming easier.
Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We only need size and compressed_size.
Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
archive-tar.c and archive-zip.c now perform conversion check, with
help of sha1_file_to_archive() from archive.c
This gives backends more freedom in dealing with (streaming) large
blobs.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Some tar filters may be very expensive to run, so sites do
not want to expose them via upload-archive. This patch lets
users configure tar.<filter>.remote to turn them off.
By default, gzip filters are left on, as they are about as
expensive as creating zip archives.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The current archivers are very static; when you are in the
write_tar_archive function, you know you are writing a tar.
However, to facilitate runtime-configurable archivers
that will share a common write function we need to tell the
function which archiver was used.
As a convenience, we also provide an opaque data pointer in
the archiver struct so that individual archivers can put
something useful there when they register themselves.
Technically they could just use the "name" field to look in
an internal map of names to data, but this is much simpler.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Most of the tar and zip code was nicely split out into two
abstracted files which knew only about their specific
formats. The entry point to this code was a single "write
archive" function.
However, as these basic formats grow more complex (e.g., by
handling multiple file extensions and format names), a
static list of the entry point functions won't be enough.
Instead, let's provide a way for the tar and zip code to
tell the main archive code what they support by registering
archiver names and functions.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The size of objects we read from the repository and data we try to put
into the repository are represented in "unsigned long", so that on larger
architectures we can handle objects that weigh more than 4GB.
But the interface defined in zlib.h to communicate with inflate/deflate
limits avail_in (how many bytes of input are we calling zlib with) and
avail_out (how many bytes of output from zlib are we ready to accept)
fields effectively to 4GB by defining their type to be uInt.
In many places in our code, we allocate a large buffer (e.g. mmap'ing a
large loose object file) and tell zlib its size by assigning the size to
avail_in field of the stream, but that will truncate the high octets of
the real size. The worst part of this story is that we often pass around
z_stream (the state object used by zlib) to keep track of the number of
used bytes in input/output buffer by inspecting these two fields, which
practically limits our callchain to the same 4GB limit.
Wrap z_stream in another structure git_zstream that can express avail_in
and avail_out in unsigned long. For now, just die() when the caller gives
a size that cannot be given to a single zlib call. In later patches in the
series, we would make git_inflate() and git_deflate() internally loop to
give callers an illusion that our "improved" version of zlib interface can
operate on a buffer larger than 4GB in one go.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Wrap deflateInit, deflate, and deflateEnd for everybody, and the sole use
of deflateInit2 in remote-curl.c to tell the library to use gzip header
and trailer in git_deflate_init_gzip().
There is only one caller that cares about the status from deflateEnd().
Introduce git_deflate_end_gently() to let that sole caller retrieve the
status and act on it (i.e. die) for now, but we would probably want to
make inflate_end/deflate_end die when they ran out of memory and get
rid of the _gently() kind.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
zlib_compression_level is the compression level used for git's object store.
It's 1 by default, which is the fastest setting. This variable is also used
as the default compression level for ZIP archives created by git archive.
For archives, however, zlib's own default of 6 is more appropriate, as it's
favouring small size over speed -- archive creation is not that performance
critical most of the time.
This patch makes git archive independent from git's internal compression
level setting. It affects invocations of git archive without explicitly
specified compression level option, only.
Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Replace the code that calls backend specific argument parsers by a
simple flag mechanism. This reduces code size and complexity.
We can add back such a mechanism (based on incremental parse_opt(),
perhaps) when we need it. The compression level parameter, though,
is going to be shared by future compressing backends like tgz.
Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add the exported function write_archive_entries() to archive.c, which uses
the new ability of read_tree_recursive() to pass a context pointer to its
callback in order to centralize previously duplicated code.
The new callback function write_archive_entry() does the work that every
archiver backend needs to do: loading file contents, entering subdirectories,
handling file attributes, constructing the full path of the entry. All that
done, it calls the backend specific write_archive_entry_fn_t function.
Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Calculate the length of base and save it in a new member of struct
archiver_args. This way we don't have to compute it in each of the
format backends.
Note: parse_archive_args() guarantees that ->base won't ever be NULL.
Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a pointer parameter to read_tree_recursive(), which is passed to the
callback function. This allows callers of read_tree_recursive() to
share data with the callback without resorting to global variables. All
current callers pass NULL.
Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Paths marked with this attribute are not output to git-archive
output.
Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Ulrik Sverdrup noticed that git-archive doesn't correctly apply the attribute
export-subst when the option --prefix is given, too.
When it checked if a file has the attribute turned on, git-archive would try
to look up the full path -- including the prefix -- in .gitattributes. That's
wrong, as the prefix doesn't need to have any relation to any existing
directories, tracked or not.
This patch makes git-archive ignore the prefix when looking up if value of the
attribute export-subst for a file.
Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add support for a new attribute, specfile. Files marked as being
specfiles are expanded by git-archive when they are written to an
archive. It has no effect on worktree files. The same placeholders
as those for the option --pretty=format: of git-log et al. can be
used.
The attribute is useful for creating auto-updating specfiles. It is
limited by the underlying function format_commit_message(), though.
E.g. currently there is no placeholder for git-describe like output,
and expanded specfiles can't contain NUL bytes. That can be fixed
in format_commit_message() later and will then benefit users of
git-log, too.
Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As noted by Johan Herland, git-archive is a kind of checkout and needs
to apply any checkout filters that might be configured.
This patch adds the convenience function convert_sha1_file which returns
a buffer containing the object's contents, after converting, if necessary
(i.e. it's a combination of read_sha1_file and convert_to_working_tree).
Direct calls to read_sha1_file in git-archive are then replaced by calls
to convert_sha1_file.
Since convert_sha1_file expects its path argument to be NUL-terminated --
a convention it inherits from convert_to_working_tree -- the patch also
changes the path handling in archive-tar.c to always NUL-terminate the
string. It used to solely rely on the len field of struct strbuf before.
archive-zip.c already NUL-terminates the path and thus needs no such
change.
Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <junkio@cox.net>
Both archive-tar and archive-zip needed to be taught about subprojects.
The tar function died when trying to read the subproject commit object,
while the zip function reported "unsupported file mode".
This fixes both by representing the subproject as an empty directory.
Signed-off-by: Lars Hjemli <hjemli@gmail.com>
Signed-off-by: Junio C Hamano <junkio@cox.net>
We currently have two parallel notation for dealing with object types
in the code: a string and a numerical value. One of them is obviously
redundent, and the most used one requires more stack space and a bunch
of strcmp() all over the place.
This is an initial step for the removal of the version using a char array
found in object reading code paths. The patch is unfortunately large but
there is no sane way to split it in smaller parts without breaking the
system.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
This is a mechanical clean-up of the way *.c files include
system header files.
(1) sources under compat/, platform sha-1 implementations, and
xdelta code are exempt from the following rules;
(2) the first #include must be "git-compat-util.h" or one of
our own header file that includes it first (e.g. config.h,
builtin.h, pkt-line.h);
(3) system headers that are included in "git-compat-util.h"
need not be included in individual C source files.
(4) "git-compat-util.h" does not have to include subsystem
specific header files (e.g. expat.h).
Signed-off-by: Junio C Hamano <junkio@cox.net>
We can't rely on sizeof(struct zip_*) returning the sum of
all struct members. At least on ARM padding is added at the
end, as Gerrit Pape reported. This fixes the problem but
still lets the compiler do the summing by introducing
explicit padding at the end of the structs and then taking
its offset as the combined size of the preceding members.
As Junio correctly notes, the _end[] marker array's size
must be greater than zero for compatibility with compilers
other than gcc. The space wasted by the markers can safely
be neglected because we only have one instance of each
struct, i.e. in sum 3 wasted bytes on i386, and 0 on ARM. :)
We still rely on the compiler to not add padding between the
struct members, but that's reasonable given that all of them
are unsigned char arrays.
Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <junkio@cox.net>
Z_NULL is defined as 0, use a proper NULL pointer in its stead.
Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <junkio@cox.net>
Add symlink support to ZIP file creation, and a few tests.
This implementation sets the "version made by" field
(creator_version) to Unix for symlinks, only; regular files and
directories are still marked as originating from FAT/VFAT/NTFS.
Also set "external file attributes" (attr2) to 0 for regular
files and 16 for directories (FAT attribute), and to the file
mode for symlinks.
We could always set the creator_version to Unix and include the
mode, but then Info-ZIP unzip would set the mode of the extracted
files to *exactly* the value stored in attr2. The FAT trick
makes it apply the umask instead. Note: FAT has no executable
bit, so this information is not stored in the ZIP file.
Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <junkio@cox.net>
Use 10 for the "version needed to extract" field. This is the
default value, and we want to use it because we don't do anything
special. Info-ZIP's zip uses it, too.
Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <junkio@cox.net>