git-commit-vandalism

Author	SHA1	Message	Date
René Scharfe	4f4be00d30	archive-tar: use internal gzip by default Drop the dependency on gzip(1) and use our internal implementation to create tar.gz and tgz files. Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-06-15 13:19:47 -07:00
René Scharfe	23fcf8b09f	archive-tar: use OS_CODE 3 (Unix) for internal gzip gzip(1) encodes the OS it runs on in the 10th byte of its output. It uses the following OS_CODE values according to its tailor.h [1]: 0 - MS-DOS 3 - UNIX 5 - Atari ST 6 - OS/2 10 - TOPS-20 11 - Windows NT The gzip.exe that comes with Git for Windows uses OS_CODE 3 for some reason, so this value is used on practically all supported platforms when generating tgz archives using gzip(1). Zlib uses a bigger set of values according to its zutil.h [2], aligned with section 4.4.2 of the ZIP specification, APPNOTE.txt [3]: 0 - MS-DOS 1 - Amiga 3 - UNIX 4 - VM/CMS 5 - Atari ST 6 - OS/2 7 - Macintosh 8 - Z-System 10 - Windows NT 11 - MVS (OS/390 - Z/OS) 13 - Acorn Risc 16 - BeOS 18 - OS/400 19 - OS X (Darwin) Thus the internal gzip implementation in archive-tar.c sets different OS_CODE header values on major platforms Windows and macOS. Git for Windows uses its own zlib-based variant since v2.20.1 by default and thus embeds OS_CODE 10 in tgz archives. The tar archive for a commit is generated consistently on all systems (by the same Git version). The OS_CODE in the gzip header does not influence extraction. Avoid leaking OS information and make tgz archives constistent and reproducable (with the same Git and libz versions) by using OS_CODE 3 everywhere. At least on macOS 12.4 this produces the same output as gzip(1) for the examples I tried: # before $ git -c tar.tgz.command='git archive gzip' archive --format=tgz v2.36.0 \| shasum 3abbffb40b7c63cf9b7d91afc682f11682f80759 - # with this patch $ git -c tar.tgz.command='git archive gzip' archive --format=tgz v2.36.0 \| shasum dc6dc6ba9636d522799085d0d77ab6a110bcc141 - $ git archive --format=tar v2.36.0 \| gzip -cn \| shasum dc6dc6ba9636d522799085d0d77ab6a110bcc141 - [1] https://git.savannah.gnu.org/cgit/gzip.git/tree/tailor.h [2] https://github.com/madler/zlib/blob/master/zutil.h [3] https://pkware.cachefly.net/webdocs/casestudies/APPNOTE.TXT Helped-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-06-15 13:19:47 -07:00
René Scharfe	76d7602631	archive-tar: add internal gzip implementation Git uses zlib for its own object store, but calls gzip when creating tgz archives. Add an option to perform the gzip compression for the latter using zlib, without depending on the external gzip binary. Plug it in by making write_block a function pointer and switching to a compressing variant if the filter command has the magic value "git archive gzip". Does that indirection slow down tar creation? Not really, at least not in this test: $ hyperfine -w3 -L rev HEAD,origin/main -p 'git checkout {rev} && make' \ './git -C ../linux archive --format=tar HEAD # {rev}' Benchmark #1: ./git -C ../linux archive --format=tar HEAD # HEAD Time (mean ± σ): 4.044 s ± 0.007 s [User: 3.901 s, System: 0.137 s] Range (min … max): 4.038 s … 4.059 s 10 runs Benchmark #2: ./git -C ../linux archive --format=tar HEAD # origin/main Time (mean ± σ): 4.047 s ± 0.009 s [User: 3.903 s, System: 0.138 s] Range (min … max): 4.038 s … 4.066 s 10 runs How does tgz creation perform? $ hyperfine -w3 -L command 'gzip -cn','git archive gzip' \ './git -c tar.tgz.command="{command}" -C ../linux archive --format=tgz HEAD' Benchmark #1: ./git -c tar.tgz.command="gzip -cn" -C ../linux archive --format=tgz HEAD Time (mean ± σ): 20.404 s ± 0.006 s [User: 23.943 s, System: 0.401 s] Range (min … max): 20.395 s … 20.414 s 10 runs Benchmark #2: ./git -c tar.tgz.command="git archive gzip" -C ../linux archive --format=tgz HEAD Time (mean ± σ): 23.807 s ± 0.023 s [User: 23.655 s, System: 0.145 s] Range (min … max): 23.782 s … 23.857 s 10 runs Summary './git -c tar.tgz.command="gzip -cn" -C ../linux archive --format=tgz HEAD' ran 1.17 ± 0.00 times faster than './git -c tar.tgz.command="git archive gzip" -C ../linux archive --format=tgz HEAD' So the internal implementation takes 17% longer on the Linux repo, but uses 2% less CPU time. That's because the external gzip can run in parallel on its own processor, while the internal one works sequentially and avoids the inter-process communication overhead. What are the benefits? Only an internal sequential implementation can offer this eco mode, and it allows avoiding the gzip(1) requirement. This implementation uses the helper functions from our zlib.c instead of the convenient gz* functions from zlib, because the latter doesn't give the control over the generated gzip header that the next patch requires. Original-patch-by: Rohit Ashiwal <rohit.ashiwal265@gmail.com> Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-06-15 13:19:47 -07:00
René Scharfe	dfce1186c6	archive-tar: factor out write_block() All tar archive writes have the same size and are done to the same file descriptor. Move them to a common function, write_block(), to reduce code duplication and make it easy to change the destination. Original-patch-by: Rohit Ashiwal <rohit.ashiwal265@gmail.com> Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-06-15 13:19:47 -07:00
René Scharfe	96b9e5151b	archive: rename archiver data field to filter_command The void pointer "data" in struct archiver is only used to store filter commands to pass tar archives to, like gzip. Rename it accordingly and also turn it into a char pointer to document the fact that it's a string reference. Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-06-15 13:19:46 -07:00
Ævar Arnfjörð Bjarmason	4fbedd4dc0	archive-*.c: use designated initializers for "struct archiver" Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-02-24 15:59:20 -08:00
Ævar Arnfjörð Bjarmason	7f14609e29	run-command API users: use strvec_push(), not argv construction Change a pattern of hardcoding an "argv" array size, populating it and assigning to the "argv" member of "struct child_process" to instead use "strvec_push()" to add data to the "args" member. As noted in the preceding commit this moves us further towards being able to remove the "argv" member in a subsequent commit These callers could have used strvec_pushl(), but moving to strvec_push() makes the diff easier to read, and keeps the arguments aligned as before. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-11-25 22:15:07 -08:00
René Scharfe	ca56dadb4b	use CALLOC_ARRAY Add and apply a semantic patch for converting code that open-codes CALLOC_ARRAY to use it instead. It shortens the code and infers the element size automatically. Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-03-13 16:00:09 -08:00
René Scharfe	cde8ea9c66	archive: support compression levels beyond 9 Compression programs like zip, gzip, bzip2 and xz allow to adjust the trade-off between CPU cost and size gain with numerical options from -1 for fast compression and -9 for high compression ratio. zip also accepts -0 for storing files verbatim. git archive directly support these single-digit compression levels for ZIP output and passes them to filters like gzip. Zstandard additionally supports compression level options -10 to -19, or up to -22 with --ultra. This seems to work with git archive in most cases, e.g. it will produce an archive with -19 without complaining, but since it only supports single-digit compression level options this is the same as -1 -9 and thus -9. Allow git archive to accept multi-digit compression levels to support the full range supported by zstd. Explicitly reject them for the ZIP format, as otherwise deflateInit2() would just fail with a somewhat cryptic "stream consistency error". Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-11-09 11:25:45 -08:00
René Scharfe	200589abcb	archive: read short blobs in archive.c::write_archive_entry() Centralize reading of symlink destinations and the contents of regular files that are too small to be streamed. This reduces code duplication and allows future patches to add support for adding non-tracked files to archives. The backends are expected to stream blobs if buffer is NULL. object_file_to_archive() is only called from archive.c and thus no longer exported. Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-09-19 15:56:05 -07:00
Jeff King	f5914f4b6b	parse_config_key(): return subsection len as size_t We return the length to a subset of a string using an "int *" out-parameter. This is fine most of the time, as we'd expect config keys to be relatively short, but it could behave oddly if we had a gigantic config key. A more appropriate type is size_t. Let's switch over, which lets our callers use size_t as appropriate (they are bound by our type because they must pass the out-parameter as a pointer). This is mostly just a cleanup to make it clear this code handles long strings correctly. In practice, our config parser already chokes on long key names (because of a similar int/size_t mixup!). When doing an int/size_t conversion, we have to be careful that nobody was trying to assign a negative value to the variable. I manually confirmed that for each case here. They tend to just feed the result to xmemdupz() or similar; in a few cases I adjusted the parameter types for helper functions to make sure the size_t is preserved. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-04-10 14:44:29 -07:00
Matheus Tavares	c8123e72f6	streaming: allow open_istream() to handle any repo Some callers of open_istream() at archive-tar.c and archive-zip.c are capable of working on arbitrary repositories but the repo struct is not passed down to open_istream(), which uses the_repository internally. For now, that's not a problem since the said callers are only being called with the_repository. But to be consistent and avoid future problems, let's allow open_istream() to receive a struct repository and use that instead of the_repository. This parameter addition will also be used in a future patch to make sha1-file.c:check_object_signature() be able to work on arbitrary repos. Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-01-31 10:45:39 -08:00
Junio C Hamano	af2b8faf49	Merge branch 'rs/pax-extended-header-length-fix' "git archive" recorded incorrect length in extended pax header in some corner cases, which has been corrected. * rs/pax-extended-header-length-fix: archive-tar: turn length miscalculation warning into BUG archive-tar: use size_t in strbuf_append_ext_header() archive-tar: fix pax extended header length calculation archive-tar: report wrong pax extended header length	2019-09-09 12:26:37 -07:00
René Scharfe	71d41ff651	archive-tar: turn length miscalculation warning into BUG Now that we're confident our pax extended header calculation is correct, turn the criticality of the assertion up to the maximum, from warning right up to BUG. Simplify the test, as the stderr comparison step would not be reached in case the BUG message is triggered. Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-08-19 10:49:00 -07:00
René Scharfe	17e9ef00d2	archive-tar: use size_t in strbuf_append_ext_header() One of its callers already passes in a size_t value. Use it consistently in this function. Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-08-19 10:49:00 -07:00
René Scharfe	82a46af13e	archive-tar: fix pax extended header length calculation A pax extended header record starts with a decimal number. Its value is the length of the whole record, including its own length. The calculation of that number in strbuf_append_ext_header() is off by one in case the length of the rest is close to a higher order of magnitude. This affects paths and link targets a bit shorter than 1000, 10000, 100000 etc. characters -- paths with a length of up to 100 fit into the tar header and don't need a pax extended header. The mistake has been present since the function was added by `ae64bbc18c` ("tar-tree: Introduce write_entry()", 2006-03-25). Account for digits added to len during the loop and keep incrementing until we have enough space for len and the rest. The crucial change is to check against the current value of len before each iteration, instead of against its value before the loop. Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-08-19 10:48:02 -07:00
René Scharfe	4060c1990a	archive-tar: report wrong pax extended header length Extended header entries contain a length value that is a bit tricky to calculate because it includes its own length (number of decimal digits) as well. We get it wrong in corner cases. Add a check, report wrong results as a warning and add a test for exercising it. Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-08-19 10:47:28 -07:00
brian m. carlson	bbf05cf70e	archive: convert struct archiver_args to object_id Change the commit_sha1 member to be called "commit_oid" and change it to be a pointer to struct object_id. Additionally, update some uses of GIT_SHA1_HEXSZ and hard-coded values to use the_hash_algo instead. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-04-01 11:57:39 +09:00
Nguyễn Thái Ngọc Duy	ec36c42a63	Indent code with TABs We indent with TABs and sometimes for fine alignment, TABs followed by spaces, but never all spaces (unless the indentation is less than 8 columns). Indenting with spaces slips through in some places. Fix them. Imported code and compat/ are left alone on purpose. The former should remain as close as upstream as possible. The latter pretty much has separate maintainers, it's up to them to decide. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-12-09 12:37:32 +09:00
Torsten Bögershausen	ca473cef91	Upcast size_t variables to uintmax_t when printing When printing variables which contain a size, today "unsigned long" is used at many places. In order to be able to change the type from "unsigned long" into size_t some day in the future, we need to have a way to print 64 bit variables on a system that has "unsigned long" defined to be 32 bit, like Win64. Upcast all those variables into uintmax_t before they are printed. This is to prepare for a bigger change, when "unsigned long" will be converted into size_t for variables which may be > 4Gib. Signed-off-by: Torsten Bögershausen <tboegi@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-11-12 16:43:52 +09:00
Junio C Hamano	dc0f6f9e1d	Merge branch 'nd/no-the-index' The more library-ish parts of the codebase learned to work on the in-core index-state instance that is passed in by their callers, instead of always working on the singleton "the_index" instance. * nd/no-the-index: (24 commits) blame.c: remove implicit dependency on the_index apply.c: remove implicit dependency on the_index apply.c: make init_apply_state() take a struct repository apply.c: pass struct apply_state to more functions resolve-undo.c: use the right index instead of the_index archive-.c: use the right repository archive.c: avoid access to the_index grep: use the right index instead of the_index attr: remove index from git_attr_set_direction() entry.c: use the right index instead of the_index submodule.c: use the right index instead of the_index pathspec.c: use the right index instead of the_index unpack-trees: avoid the_index in verify_absent() unpack-trees: convert clear_ce_flags to avoid the_index unpack-trees: don't shadow global var the_index unpack-trees: add a note about path invalidation unpack-trees: remove 'extern' on function declaration ls-files: correct index argument to get_convert_attr_ascii() preload-index.c: use the right index instead of the_index dir.c: remove an implicit dependency on the_index in pathspec code ...	2018-08-20 11:33:53 -07:00
Junio C Hamano	4bea8485e3	Merge branch 'nd/i18n' Many more strings are prepared for l10n. * nd/i18n: (23 commits) transport-helper.c: mark more strings for translation transport.c: mark more strings for translation sha1-file.c: mark more strings for translation sequencer.c: mark more strings for translation replace-object.c: mark more strings for translation refspec.c: mark more strings for translation refs.c: mark more strings for translation pkt-line.c: mark more strings for translation object.c: mark more strings for translation exec-cmd.c: mark more strings for translation environment.c: mark more strings for translation dir.c: mark more strings for translation convert.c: mark more strings for translation connect.c: mark more strings for translation config.c: mark more strings for translation commit-graph.c: mark more strings for translation builtin/replace.c: mark more strings for translation builtin/pack-objects.c: mark more strings for translation builtin/grep.c: mark strings for translation builtin/config.c: mark more strings for translation ...	2018-08-15 15:08:23 -07:00
Nguyễn Thái Ngọc Duy	b67b55127c	archive-*.c: use the right repository With 'struct archive_args' gaining new repository pointer, we don't have to assume the_repository in the archive backends anymore. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-08-13 14:14:44 -07:00
Nguyễn Thái Ngọc Duy	d0482e697c	archive-tar.c: mark more strings for translation Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-07-23 11:19:09 -07:00
Junio C Hamano	00624d608c	Merge branch 'sb/object-store-grafts' The conversion to pass "the_repository" and then "a_repository" throughout the object access API continues. * sb/object-store-grafts: commit: allow lookup_commit_graft to handle arbitrary repositories commit: allow prepare_commit_graft to handle arbitrary repositories shallow: migrate shallow information into the object parser path.c: migrate global git_path_* to take a repository argument cache: convert get_graft_file to handle arbitrary repositories commit: convert read_graft_file to handle arbitrary repositories commit: convert register_commit_graft to handle arbitrary repositories commit: convert commit_graft_pos() to handle arbitrary repositories shallow: add repository argument to is_repository_shallow shallow: add repository argument to check_shallow_file_for_update shallow: add repository argument to register_shallow shallow: add repository argument to set_alternate_shallow_file commit: add repository argument to lookup_commit_graft commit: add repository argument to prepare_commit_graft commit: add repository argument to read_graft_file commit: add repository argument to register_commit_graft commit: add repository argument to commit_graft_pos object: move grafts to object parser object-store: move object access functions to object-store.h	2018-07-18 12:20:28 -07:00
Junio C Hamano	50f08db594	Merge branch 'js/use-bug-macro' Developer support update, by using BUG() macro instead of die() to mark codepaths that should not happen more clearly. * js/use-bug-macro: BUG_exit_code: fix sparse "symbol not declared" warning Convert remaining die*(BUG) messages Replace all die("BUG: ...") calls by BUG() ones run-command: use BUG() to report bugs, not die() test-tool: help verifying BUG() code paths	2018-05-30 14:04:07 +09:00
Stefan Beller	cbd53a2193	object-store: move object access functions to object-store.h This should make these functions easier to find and cache.h less overwhelming to read. In particular, this moves: - read_object_file - oid_object_info - write_object_file As a result, most of the codebase needs to #include object-store.h. In this patch the #include is only added to files that would fail to compile otherwise. It would be better to #include wherever identifiers from the header are used. That can happen later when we have better tooling for it. Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-05-16 11:42:03 +09:00
Johannes Schindelin	033abf97fc	Replace all die("BUG: ...") calls by BUG() ones In `d8193743e0` (usage.c: add BUG() function, 2017-05-12), a new macro was introduced to use for reporting bugs instead of die(). It was then subsequently used to convert one single caller in `588a538ae5` (setup_git_env: convert die("BUG") to BUG(), 2017-05-12). The cover letter of the patch series containing this patch (cf 20170513032414.mfrwabt4hovujde2@sigill.intra.peff.net) is not terribly clear why only one call site was converted, or what the plan is for other, similar calls to die() to report bugs. Let's just convert all remaining ones in one fell swoop. This trick was performed by this invocation: sed -i 's/die("BUG: /BUG("/g' $(git grep -l 'die("BUG' \*.c) Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-05-06 19:06:13 +09:00
Stefan Beller	0df8e96566	cache.h: add repository argument to oid_object_info Add a repository argument to allow the callers of oid_object_info to be more specific about which repository to handle. This is a small mechanical change; it doesn't change the implementation to handle repositories other than the_repository yet. As with the previous commits, use a macro to catch callers passing a repository other than the_repository at compile time. Signed-off-by: Stefan Beller <sbeller@google.com> Reviewed-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-04-26 10:54:27 +09:00
brian m. carlson	abef9020e3	sha1_file: convert sha1_object_info* to object_id Convert sha1_object_info and sha1_object_info_extended to take pointers to struct object_id and rename them to use "oid" instead of "sha1" in their names. Update the declaration and definition and apply the following semantic patch, plus the standard object_id transforms: @@ expression E1, E2; @@ - sha1_object_info(E1.hash, E2) + oid_object_info(&E1, E2) @@ expression E1, E2; @@ - sha1_object_info(E1->hash, E2) + oid_object_info(E1, E2) @@ expression E1, E2, E3; @@ - sha1_object_info_extended(E1.hash, E2, E3) + oid_object_info_extended(&E1, E2, E3) @@ expression E1, E2, E3; @@ - sha1_object_info_extended(E1->hash, E2, E3) + oid_object_info_extended(E1, E2, E3) Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-03-14 09:23:49 -07:00
brian m. carlson	ef7b5195f1	streaming: convert open_istream to use struct object_id Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-03-14 09:23:49 -07:00
brian m. carlson	e5ec981a4b	archive: convert sha1_file_to_archive to struct object_id Convert this function to take a pointer to struct object_id and rename it object_file_to_archive. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-03-14 09:23:48 -07:00
brian m. carlson	015ff4f822	archive: convert write_archive_entry_fn_t to object_id Convert the write_archive_entry_fn_t type to use a pointer to struct object_id. Convert various static functions in the tar and zip archivers also. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-03-14 09:23:48 -07:00
Junio C Hamano	f31d23a399	Merge branch 'bw/config-h' Fix configuration codepath to pay proper attention to commondir that is used in multi-worktree situation, and isolate config API into its own header file. * bw/config-h: config: don't implicitly use gitdir or commondir config: respect commondir setup: teach discover_git_directory to respect the commondir config: don't include config.h by default config: remove git_config_iter config: create config.h	2017-06-24 14:28:41 -07:00
Brandon Williams	b2141fc1d2	config: don't include config.h by default Stop including config.h by default in cache.h. Instead only include config.h in those files which require use of the config system. Signed-off-by: Brandon Williams <bmwill@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-06-15 12:56:22 -07:00
Ramsay Jones	3f789719a6	archive-tar: fix a sparse 'constant too large' warning Commit `dddbad728c` ("timestamp_t: a new data type for timestamps", 26-04-2017) introduced a new typedef 'timestamp_t', as a synonym for an unsigned long, which was used at the time to represent timestamps in git. A later commit `28f4aee3fb` ("use uintmax_t for timestamps", 26-04-2017) changed the typedef to use an 'uintmax_t' for the timestamp representation type. When building on a 32-bit Linux system, sparse complains that a constant (USTAR_MAX_MTIME) used to detect a 'far-future mtime' timestamp, is too large; 'warning: constant 077777777777UL is so big it is unsigned long long' on lines 335 and 338 of archive-tar.c. Note that both gcc and clang only issue a warning if this constant is used in a context that requires an 'unsigned long' (rather than an uintmax_t). (Since TIME_MAX is no longer equal to 0xFFFFFFFF, even on a 32-bit system, the macro USTAR_MAX_MTIME is set to 077777777777UL, which cannot be represented as an 'unsigned long' constant). In order to suppress the warning, change the definition of the macro constant USTAR_MAX_MTIME to use an 'ULL' type suffix. In a similar vein, on systems which use a 64-bit representation of the 'unsigned long' type, the USTAR_MAX_SIZE constant macro is defined with the value 077777777777ULL. Although this does not cause any warning messages to be issued, it would be more appropriate for this constant to use an 'UL' type suffix rather than 'ULL'. Signed-off-by: Ramsay Jones <ramsay@ramsayjones.plus.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-05-09 11:23:14 +09:00
Johannes Schindelin	dddbad728c	timestamp_t: a new data type for timestamps Git's source code assumes that unsigned long is at least as precise as time_t. Which is incorrect, and causes a lot of problems, in particular where unsigned long is only 32-bit (notably on Windows, even in 64-bit versions). So let's just use a more appropriate data type instead. In preparation for this, we introduce the new `timestamp_t` data type. By necessity, this is a very, very large patch, as it has to replace all timestamps' data type in one go. As we will use a data type that is not necessarily identical to `time_t`, we need to be very careful to use `time_t` whenever we interact with the system functions, and `timestamp_t` everywhere else. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-04-27 13:07:39 +09:00
Junio C Hamano	57734b4e88	Merge branch 'jk/big-and-future-archive-tar' A small code clean-up. * jk/big-and-future-archive-tar: archive-tar: make write_extended_header() void	2016-08-12 09:47:37 -07:00
René Scharfe	560b0e8f52	archive-tar: make write_extended_header() void The function write_extended_header() only ever returns 0. Simplify it and its caller by dropping its return value, like we did with write_global_extended_header() earlier. Signed-off-by: Rene Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-08-06 10:31:51 -07:00
Junio C Hamano	29493589e9	archive-tar: huge offset and future timestamps would not work on 32-bit As we are not yet moving everything to size_t but still using ulong internally when talking about the size of object, platforms with 32-bit long will not be able to produce tar archive with 4GB+ file, and cannot grok 077777777777UL as a constant. Disable the extended header feature and do not test it on them. Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-07-15 10:51:55 -07:00
Jeff King	5caeeb83bc	archive-tar: drop return value We never do any error checks, and so never return anything but "0". Let's just drop this to simplify the code. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-07-01 10:26:28 -07:00
Jeff King	6e8e0991e5	archive-tar: write extended headers for far-future mtime The ustar format represents timestamps as seconds since the epoch, but only has room to store 11 octal digits. To express anything larger, we need to use an extended header. This is exactly the same case we fixed for the size field in the previous commit, and the solution here follows the same pattern. This is even mentioned as an issue in `f2f0267` (archive-tar: use xsnprintf for trivial formatting, 2015-09-24), but since it only affected things far in the future, it wasn't deemed worth dealing with. But note that my calculations claiming thousands of years were off there; because our xsnprintf produces a NUL byte, we only have until the year 2242 to fix this. Given that this is just around the corner (geologically speaking, anyway), and because it's easy to fix, let's just make it work. Unlike the previous fix for "size", where we had to write an individual extended header for each file, we can write one global header (since we have only one mtime for the whole archive). There's a slight bit of trickiness there. We may already be writing a global header with a "comment" field for the commit sha1. So we need to write our new field into the same header. To do this, we push the decision of whether to write such a header down into write_global_extended_header(), which will now assemble the header as it sees fit, and will return early if we have nothing to write (in practice, we'll only have a large mtime if it comes from a commit, but this makes it also work if you set your system clock ahead such that time() returns a huge value). Note that we don't (and never did) handle negative timestamps (i.e., before 1970). This would probably not be too hard to support in the same way, but since git does not support negative timestamps at all, I didn't bother here. After writing the extended header, we munge the timestamp in the ustar headers to the maximum-allowable size. This is wrong, but it's the least-wrong thing we can provide to a tar implementation that doesn't understand pax headers (it's also what GNU tar does). Helped-by: René Scharfe <l.s.r@web.de> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-07-01 10:26:01 -07:00
Jeff King	d1657b570a	archive-tar: write extended headers for file sizes >= 8GB The ustar format has a fixed-length field for the size of each file entry which is supposed to contain up to 11 bytes of octal-formatted data plus a NUL or space terminator. These means that the largest size we can represent is 077777777777, or 1 byte short of 8GB. The correct solution for a larger file, according to POSIX.1-2001, is to add an extended pax header, similar to how we handle long filenames. This patch does that, and writes zero for the size field in the ustar header (the last bit is not mentioned by POSIX, but it matches how GNU tar behaves with --format=pax). This should be a strict improvement over the current behavior, which is to die in xsnprintf with a "BUG". However, there's some interesting history here. Prior to `f2f0267` (archive-tar: use xsnprintf for trivial formatting, 2015-09-24), we silently overflowed the "size" field. The extra bytes ended up in the "mtime" field of the header, which was then immediately written itself, overwriting our extra bytes. What that means depends on how many bytes we wrote. If the size was 64GB or greater, then we actually overflowed digits into the mtime field, meaning our value was effectively right-shifted by those lost octal digits. And this patch is again a strict improvement over that. But if the size was between 8GB and 64GB, then our 12-byte field held all of the actual digits, and only our NUL terminator overflowed. According to POSIX, there should be a NUL or space at the end of the field. However, GNU tar seems to be lenient here, and will correctly parse a size up 64GB (minus one) from the field. So sizes in this range might have just worked, depending on the implementation reading the tarfile. This patch is mostly still an improvement there, as the 8GB limit is specifically mentioned in POSIX as the correct limit. But it's possible that it could be a regression (versus the pre-f2f0267 state) if all of the following are true: 1. You have a file between 8GB and 64GB. 2. Your tar implementation _doesn't_ know about pax extended headers. 3. Your tar implementation _does_ parse 12-byte sizes from the ustar header without a delimiter. It's probably not worth worrying about such an obscure set of conditions, but I'm documenting it here just in case. Helped-by: René Scharfe <l.s.r@web.de> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-07-01 10:25:46 -07:00
Jeff King	9e6c1e91a3	archive-tar: convert snprintf to xsnprintf Commit `f2f0267` (archive-tar: use xsnprintf for trivial formatting, 2015-09-24) converted cases of "sprintf" to "xsnprintf", but accidentally left one as just "snprintf". This meant that we could silently truncate the resulting buffer instead of flagging an error. In practice, this is impossible to achieve, as we are formatting a ustar checksum, which can be at most 7 characters. But the point of xsnprintf is to document and check for "should be impossible" conditions; this site was just accidentally mis-converted during `f2f0267`. Noticed-by: Paul Green <Paul.Green@stratus.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-05-26 10:44:26 -07:00
Jeff King	f2f0267529	archive-tar: use xsnprintf for trivial formatting When we generate tar headers, we sprintf() values directly into a struct with the fixed-size header values. For the most part this is fine, as we are formatting small values (e.g., the octal format of "mode & 0x7777" is of fixed length). But it's still a good idea to use xsnprintf here. It communicates to readers what our expectation is, and it provides a run-time check that we are not overflowing the buffers. The one exception here is the mtime, which comes from the epoch time of the commit we are archiving. For sane values, this fits into the 12-byte value allocated in the header. But since git can handle 64-bit times, if I claim to be a visitor from the year 10,000 AD, I can overflow the buffer. This turns out to be harmless, as we simply overflow into the chksum field, which is then overwritten. This case is also best as an xsnprintf. It should never come up, short of extremely malformed dates, and in that case we are probably better off dying than silently truncating the date value (and we cannot expand the size of the buffer, since it is dictated by the ustar format). Our friends in the year 5138 (when we legitimately flip to a 12-digit epoch) can deal with that problem then. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2015-09-25 10:18:18 -07:00
Jeff King	5096d4909f	convert trivial sprintf / strcpy calls to xsnprintf We sometimes sprintf into fixed-size buffers when we know that the buffer is large enough to fit the input (either because it's a constant, or because it's numeric input that is bounded in size). Likewise with strcpy of constant strings. However, these sites make it hard to audit sprintf and strcpy calls for buffer overflows, as a reader has to cross-reference the size of the array with the input. Let's use xsnprintf instead, which communicates to a reader that we don't expect this to overflow (and catches the mistake in case we do). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2015-09-25 10:18:18 -07:00
Jeff King	108332c7a0	archive-tar: fix minor indentation violation This looks like a simple omission from `8539070` (archive-tar: unindent write_tar_entry by one level, 2012-05-03). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2015-09-25 10:18:18 -07:00
Junio C Hamano	15c6ef7b06	Revert "archive: honor tar.umask even for pax headers" This reverts commit `10f343ea81`, whose output is no longer bit-for-bit equivalent from the older versions of Git, which the infrastructure to (pretend to) upload tarballs kernel.org uses depends on.	2014-10-20 12:04:46 -07:00
Junio C Hamano	825fd93767	Merge branch 'rs/child-process-init' Code clean-up. * rs/child-process-init: run-command: inline prepare_run_command_v_opt() run-command: call run_command_v_opt_cd_env() instead of duplicating it run-command: introduce child_process_init() run-command: introduce CHILD_PROCESS_INIT	2014-09-11 10:33:27 -07:00
René Scharfe	d318027932	run-command: introduce CHILD_PROCESS_INIT Most struct child_process variables are cleared using memset first after declaration. Provide a macro, CHILD_PROCESS_INIT, that can be used to initialize them statically instead. That's shorter, doesn't require a function call and is slightly more readable (especially given that we already have STRBUF_INIT, ARGV_ARRAY_INIT etc.). Helped-by: Johannes Sixt <j6t@kdbg.org> Signed-off-by: Rene Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-08-20 09:53:37 -07:00

1 2

91 Commits