git-commit-vandalism

Author	SHA1	Message	Date
Junio C Hamano	193e28f050	Merge branch 'tr/packed-object-info-wo-recursion' Attempts to reduce the stack footprint of sha1_object_info() and unpack_entry() codepaths. * tr/packed-object-info-wo-recursion: sha1_file: remove recursion in unpack_entry Refactor parts of in_delta_base_cache/cache_or_unpack_entry sha1_file: remove recursion in packed_object_info	2013-04-18 11:46:23 -07:00
Junio C Hamano	b9c78e9723	Merge branch 'jk/check-corrupt-objects-carefully' Have the streaming interface and other codepaths more carefully examine for corrupt objects. * jk/check-corrupt-objects-carefully: clone: leave repo in place after checkout errors clone: run check_everything_connected clone: die on errors from unpack_trees add tests for cloning corrupted repositories streaming_write_entry: propagate streaming errors add test for streaming corrupt blobs avoid infinite loop in read_istream_loose read_istream_filtered: propagate read error from upstream check_sha1_signature: check return value from read_istream stream_blob_to_fd: detect errors reading from stream	2013-04-03 09:34:29 -07:00
Junio C Hamano	37ba4c61d0	Merge branch 'sw/safe-create-leading-dir-race' * sw/safe-create-leading-dir-race: safe_create_leading_directories: fix race that could give a false negative	2013-04-02 15:09:48 -07:00
Jeff King	f54fac5378	check_sha1_signature: check return value from read_istream It's possible for read_istream to return an error, in which case we just end up in an infinite loop (aside from EOF, we do not even look at the result, but just feed it straight into our running hash). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-03-27 13:46:55 -07:00
Thomas Rast	abe601bba5	sha1_file: remove recursion in unpack_entry Similar to the recursion in packed_object_info(), this leads to problems on stack-space-constrained systems in the presence of long delta chains. We proceed in three phases: 1. Dig through the delta chain, saving each delta object's offsets and size on an ad-hoc stack. 2. Unpack the base object at the bottom. 3. Unpack and apply the deltas from the stack. Signed-off-by: Thomas Rast <trast@student.ethz.ch> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-03-27 13:25:16 -07:00
Thomas Rast	84dd81c126	Refactor parts of in_delta_base_cache/cache_or_unpack_entry The delta base cache lookup and test were shared. Refactor them; we'll need both parts again. Also, we'll use the clearing routine later. Signed-off-by: Thomas Rast <trast@student.ethz.ch> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-03-27 13:24:43 -07:00
Steven Walter	928734d993	safe_create_leading_directories: fix race that could give a false negative If two processes are racing to create the same directory tree, they will both see that the directory doesn't exist, both try to mkdir(), and one of them will fail. This is okay, as we only care that the directory gets created. So, we add a check for EEXIST from mkdir, and continue when the directory exists, taking the same codepath as the case where the earlier stat() succeeds and finds a directory. Signed-off-by: Steven Walter <stevenrwalter@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-03-26 21:07:42 -07:00
Thomas Rast	790d96c023	sha1_file: remove recursion in packed_object_info packed_object_info() and packed_delta_info() were mutually recursive. The former would handle ordinary types and defer deltas to the latter; the latter would use the former to resolve the delta base. This arrangement, however, leads to trouble with threaded index-pack and long delta chains on platforms where thread stacks are small, as happened on OS X (512kB thread stacks by default) with the chromium repo. The task of the two functions is not all that hard to describe without any recursion, however. It proceeds in three steps: - determine the representation type and size, based on the outermost object (delta or not) - follow through the delta chain, if any - determine the object type from what is found at the end of the delta chain The only complication stems from the error recovery. If parsing fails at any step, we want to mark that object (within the pack) as bad and try getting the corresponding SHA1 from elsewhere. If that also fails, we want to repeat this process back up the delta chain until we find a reasonable solution or conclude that there is no way to reconstruct the object. (This is conveniently checked by t5303.) To achieve that within the pack, we keep track of the entire delta chain in a stack. When things go sour, we process that stack from the top, marking entries as bad and attempting to re-resolve by sha1. To avoid excessive malloc(), the stack starts out with a small stack-allocated array. The choice of 64 is based on the default of pack.depth, which is 50, in the hope that it covers "most" delta chains without any need for malloc(). It's much harder to make the actual re-resolving by sha1 nonrecursive, so we skip that. If you can't afford that recursion, your corruption problems are more serious than your stack size problems. Reported-by: Stefan Zager <szager@google.com> Signed-off-by: Thomas Rast <trast@student.ethz.ch> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-03-25 15:48:18 -07:00
Nguyễn Thái Ngọc Duy	543c5caa6c	count-objects: report garbage files in pack directory too prepare_packed_git_one() is modified to allow count-objects to hook a report function to so we don't need to duplicate the pack searching logic in count-objects.c. When report_pack_garbage is NULL, the overhead is insignificant. The garbage is reported with warning() instead of error() in packed garbage case because it's not an error to have garbage. Loose garbage is still reported as errors and will be converted to warnings later. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-02-15 08:13:13 -08:00
Nguyễn Thái Ngọc Duy	d90906a902	sha1_file: reorder code in prepare_packed_git_one() The current loop does while (...) { if (it is not an .idx file) continue; process .idx file; } and is reordered to while (...) { if (it is an .idx file) { process .idx file; } } This makes it easier to add new extension file processing. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-02-13 07:42:05 -08:00
Michael Haggerty	c595016402	link_alt_odb_entries(): take (char *, len) rather than two pointers Change link_alt_odb_entries() to take the length of the "alt" parameter rather than a pointer to the end of the "alt" string. This is the more common calling convention and simplifies the code a tiny bit. Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu> Signed-off-by: Jeff King <peff@peff.net>	2012-11-08 12:06:53 -05:00
Michael Haggerty	6eac50d827	link_alt_odb_entries(): use string_list_split_in_place() Change link_alt_odb_entry() to take a NUL-terminated string instead of (char *, len). Use string_list_split_in_place() rather than inline code in link_alt_odb_entries(). This approach saves some code and also avoids the (probably harmless) error of passing a non-NUL-terminated string to is_absolute_path(). Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu> Signed-off-by: Jeff King <peff@peff.net>	2012-11-08 12:06:53 -05:00
Joachim Schmitz	a0788266d3	sha1_file.c: introduce get_max_fd_limit() helper Not all platforms have getrlimit(), but there are other ways to see the maximum number of files that a process can have open. If getrlimit() is unavailable, fall back to sysconf(_SC_OPEN_MAX) if available, and use OPEN_MAX from <limits.h>. Signed-off-by: Joachim Schmitz <jojo@schmitz-digital.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2012-08-24 09:46:01 -07:00
Junio C Hamano	fbea95ce10	Merge branch 'hv/link-alt-odb-entry' The code to avoid mistaken attempt to add the object directory itself as its own alternate could read beyond end of a string while comparison. * hv/link-alt-odb-entry: link_alt_odb_entry: fix read over array bounds reported by valgrind	2012-07-30 12:55:01 -07:00
Heiko Voigt	cb2912c324	link_alt_odb_entry: fix read over array bounds reported by valgrind pfxlen can be longer than the path in objdir when relative_base contains the path to gits object directory. Here we are interested in checking if ent->base[] (the part that corresponds to .git/objects) is the same string as objdir, and the code NUL-terminated ent->base[] to LEADING PATH\0XX/XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX\0 in preparation for these "duplicate check" step (before we return from the function, the first NUL is turned into '/' so that we can fill XX when probing for loose objects). All we need to do is to compare the string with the path to our object directory. Signed-off-by: Heiko Voigt <hvoigt@hvoigt.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2012-07-29 18:02:51 -07:00
Junio C Hamano	4809ff858b	Merge branch 'hv/submodule-alt-odb' When peeking into object stores of submodules, the code forgot that they might borrow objects from alternate object stores on their own. By Heiko Voigt * hv/submodule-alt-odb: teach add_submodule_odb() to look for alternates	2012-05-23 13:35:06 -07:00
Heiko Voigt	5e73633dbf	teach add_submodule_odb() to look for alternates Since we allow to link other object databases when loading a submodules database we should also load possible alternates. Signed-off-by: Heiko Voigt <hvoigt@hvoigt.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2012-05-14 11:56:42 -07:00
Pete Wyckoff	5eaeda70de	remove blank filename in error message When write_loose_object() finds that it is unable to create a temporary file, it complains, for instance: unable to create temporary sha1 filename : Too many open files That extra space was supposed to be the name of the file, and will be an empty string if the git_mkstemps_mode() fails. The name of the temporary file is unimportant; delete it. Signed-off-by: Pete Wyckoff <pw@padd.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2012-04-30 15:45:54 -07:00
Pete Wyckoff	82247e9bd5	remove superfluous newlines in error messages The error handling routines add a newline. Remove the duplicate ones in error messages. Signed-off-by: Pete Wyckoff <pw@padd.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2012-04-30 15:45:51 -07:00
Nguyễn Thái Ngọc Duy	090ea12671	parse_object: avoid putting whole blob in core Traditionally, all the callers of check_sha1_signature() first called read_sha1_file() to prepare the whole object data in core, and called this function. The function is used to revalidate what we read from the object database actually matches the object name we used to ask for the data from the object database. Update the API to allow callers to pass NULL as the object data, and have the function read and hash the object data using streaming API to recompute the object name, without having to hold everything in core at the same time. This is most useful in parse_object() that parses a blob object, because this caller does not have to keep the actual blob data around in memory after a "struct blob" is returned. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2012-03-07 09:07:38 -08:00
Junio C Hamano	a09a0c2709	Merge branch 'jk/maint-avoid-streaming-filtered-contents' into maint * jk/maint-avoid-streaming-filtered-contents: do not stream large files to pack when filters are in use teach dry-run convert_to_git not to require a src buffer teach convert_to_git a "dry run" mode	2012-03-04 22:16:40 -08:00
Junio C Hamano	31e3d834b3	Merge branch 'jk/maint-avoid-streaming-filtered-contents' * jk/maint-avoid-streaming-filtered-contents: do not stream large files to pack when filters are in use teach dry-run convert_to_git not to require a src buffer teach convert_to_git a "dry run" mode	2012-02-26 23:05:38 -08:00
Jeff King	4f22b1015d	do not stream large files to pack when filters are in use Because git's object format requires us to specify the number of bytes in the object in its header, we must know the size before streaming a blob into the object database. This is not a problem when adding a regular file, as we can get the size from stat(). However, when filters are in use (such as autocrlf, or the ident, filter, or eol gitattributes), we have no idea what the ultimate size will be. The current code just punts on the whole issue and ignores filter configuration entirely for files larger than core.bigfilethreshold. This can generate confusing results if you use filters for large binary files, as the filter will suddenly stop working as the file goes over a certain size. Rather than try to handle unknown input sizes with streaming, this patch just turns off the streaming optimization when filters are in use. This has a slight performance regression in a very specific case: if you have autocrlf on, but no gitattributes, a large binary file will avoid the streaming code path because we don't know beforehand whether it will need conversion or not. But if you are handling large binary files, you should be marking them as such via attributes (or at least not using autocrlf, and instead marking your text files as such). And the flip side is that if you have a large _non_-binary file, there is a correctness improvement; before we did not apply the conversion at all. The first half of the new t1051 script covers these failures on input. The second half tests the matching output code paths. These already work correctly, and do not need any adjustment. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2012-02-24 14:18:20 -08:00
Junio C Hamano	f3ccea8dd4	Merge branch 'nd/find-pack-entry-recent-cache-invalidation' into maint * nd/find-pack-entry-recent-cache-invalidation: find_pack_entry(): do not keep packed_git pointer locally sha1_file.c: move the core logic of find_pack_entry() into fill_pack_entry()	2012-02-21 14:56:36 -08:00
Junio C Hamano	c6a4e3f7a7	Merge branch 'mm/empty-loose-error-message' into maint * mm/empty-loose-error-message: fsck: give accurate error message on empty loose object files	2012-02-16 14:00:25 -08:00
Junio C Hamano	dd5253b4bd	Merge branch 'nd/find-pack-entry-recent-cache-invalidation' * nd/find-pack-entry-recent-cache-invalidation: find_pack_entry(): do not keep packed_git pointer locally sha1_file.c: move the core logic of find_pack_entry() into fill_pack_entry()	2012-02-12 22:43:03 -08:00
Junio C Hamano	8c18a6f3fa	Merge branch 'mm/empty-loose-error-message' * mm/empty-loose-error-message: fsck: give accurate error message on empty loose object files	2012-02-12 22:42:02 -08:00
Matthieu Moy	33e42de0d2	fsck: give accurate error message on empty loose object files Since `3ba7a06552` (A loose object is not corrupt if it cannot be read due to EMFILE), "git fsck" on a repository with an empty loose object file complains with the error message fatal: failed to read object <sha1>: Invalid argument This comes from a failure of mmap on this empty file, which sets errno to EINVAL. Instead of calling xmmap on empty file, we display a clean error message ourselves, and return a NULL pointer. The new message is error: object file .git/objects/09/<rest-of-sha1> is empty fatal: loose object <sha1> (stored in .git/objects/09/<rest-of-sha1>) is corrupt The second line was already there before the regression in `3ba7a06552`, and the first is an additional message, that should help diagnosing the problem for the user. Signed-off-by: Matthieu Moy <Matthieu.Moy@imag.fr> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2012-02-06 11:05:36 -08:00
Nguyễn Thái Ngọc Duy	c01f51cc75	find_pack_entry(): do not keep packed_git pointer locally Commit `f7c22cc` (always start looking up objects in the last used pack first - 2007-05-30) introduce a static packed_git* pointer as an optimization. The kept pointer however may become invalid if free_pack_by_name() happens to free that particular pack. Current code base does not access packs after calling free_pack_by_name() so it should not be a problem. Anyway, move the pointer out so that free_pack_by_name() can reset it to avoid running into troubles in future. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Acked-by: Nicolas Pitre <nico@fluxnic.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2012-02-01 14:12:42 -08:00
Nguyễn Thái Ngọc Duy	95099731bf	sha1_file.c: move the core logic of find_pack_entry() into fill_pack_entry() The new helper function implements the logic to find the offset for the object in one pack and fill a pack_entry structure. The next patch will restructure the loop and will call the helper from two places. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Acked-by: Nicolas Pitre <nico@fluxnic.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2012-02-01 14:12:41 -08:00
Ævar Arnfjörð Bjarmason	ab1900a36e	Appease Sun Studio by renaming "tmpfile" On Solaris the system headers define the "tmpfile" name, which'll cause Git compiled with Sun Studio 12 Update 1 to whine about us redefining the name: "pack-write.c", line 76: warning: name redefined by pragma redefine_extname declared static: tmpfile (E_PRAGMA_REDEFINE_STATIC) "sha1_file.c", line 2455: warning: name redefined by pragma redefine_extname declared static: tmpfile (E_PRAGMA_REDEFINE_STATIC) "fast-import.c", line 858: warning: name redefined by pragma redefine_extname declared static: tmpfile (E_PRAGMA_REDEFINE_STATIC) "builtin/index-pack.c", line 175: warning: name redefined by pragma redefine_extname declared static: tmpfile (E_PRAGMA_REDEFINE_STATIC) Just renaming the "tmpfile" variable to "tmp_file" in the relevant places is the easiest way to fix this. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2011-12-21 10:21:04 -08:00
Junio C Hamano	48b303675a	Merge branch 'jc/stream-to-pack' * jc/stream-to-pack: bulk-checkin: replace fast-import based implementation csum-file: introduce sha1file_checkpoint finish_tmp_packfile(): a helper function create_tmp_packfile(): a helper function write_pack_header(): a helper function Conflicts: pack.h	2011-12-16 22:33:40 -08:00
Junio C Hamano	df6246ed78	Merge branch 'nd/misc-cleanups' into maint * nd/misc-cleanups: unpack_object_header_buffer(): clear the size field upon error tree_entry_interesting: make use of local pointer "item" tree_entry_interesting(): give meaningful names to return values read_directory_recursive: reduce one indentation level get_tree_entry(): do not call find_tree_entry() on an empty tree tree-walk.c: do not leak internal structure in tree_entry_len()	2011-12-13 22:02:51 -08:00
Junio C Hamano	62cdb6b23a	Merge branch 'nd/misc-cleanups' * nd/misc-cleanups: unpack_object_header_buffer(): clear the size field upon error tree_entry_interesting: make use of local pointer "item" tree_entry_interesting(): give meaningful names to return values read_directory_recursive: reduce one indentation level get_tree_entry(): do not call find_tree_entry() on an empty tree tree-walk.c: do not leak internal structure in tree_entry_len()	2011-12-05 15:10:20 -08:00
Junio C Hamano	568508e765	bulk-checkin: replace fast-import based implementation This extends the earlier approach to stream a large file directly from the filesystem to its own packfile, and allows "git add" to send large files directly into a single pack. Older code used to spawn fast-import, but the new bulk-checkin API replaces it. Signed-off-by: Junio C Hamano <gitster@pobox.com>	2011-12-01 11:46:09 -08:00
Ramkumar Ramachandra	5e12e78e52	sha1_file: don't mix enum with int Signed-off-by: Ramkumar Ramachandra <artagnon@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2011-11-15 16:09:20 -08:00
Junio C Hamano	ea4f9685cb	unpack_object_header_buffer(): clear the size field upon error The callers do not use the returned size when the function says it did not use any bytes and sets the type to OBJ_BAD, so this should not matter in practice, but it is a good code hygiene anyway. Signed-off-by: Junio C Hamano <gitster@pobox.com>	2011-10-27 11:42:57 -07:00
Junio C Hamano	2070950633	Merge branch 'jk/maint-pack-objects-compete-with-delete' * jk/maint-pack-objects-compete-with-delete: downgrade "packfile cannot be accessed" errors to warnings pack-objects: protect against disappearing packs	2011-10-21 16:04:33 -07:00
Jeff King	58a6a9cc43	downgrade "packfile cannot be accessed" errors to warnings These can happen if another process simultaneously prunes a pack. But that is not usually an error condition, because a properly-running prune should have repacked the object into a new pack. So we will notice that the pack has disappeared unexpectedly, print a message, try other packs (possibly after re-scanning the list of packs), and find it in the new pack. Acked-by: Nicolas Pitre <nico@fluxnic.net> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2011-10-14 11:43:09 -07:00
Jeff King	4c08018204	pack-objects: protect against disappearing packs It's possible that while pack-objects is running, a simultaneously running prune process might delete a pack that we are interested in. Because we load the pack indices early on, we know that the pack contains our item, but by the time we try to open and map it, it is gone. Since `c715f78`, we already protect against this in the normal object access code path, but pack-objects accesses the packs at a lower level. In the normal access path, we call find_pack_entry, which will call find_pack_entry_one on each pack index, which does the actual lookup. If it gets a hit, we will actually open and verify the validity of the matching packfile (using c715f78's is_pack_valid). If we can't open it, we'll issue a warning and pretend that we didn't find it, causing us to go on to the next pack (or on to loose objects). Furthermore, we will cache the descriptor to the opened packfile. Which means that later, when we actually try to access the object, we are likely to still have that packfile opened, and won't care if it has been unlinked from the filesystem. Notice the "likely" above. If there is another pack access in the interim, and we run out of descriptors, we could close the pack. And then a later attempt to access the closed pack could fail (we'll try to re-open it, of course, but it may have been deleted). In practice, this doesn't happen because we tend to look up items and then access them immediately. Pack-objects does not follow this code path. Instead, it accesses the packs at a much lower level, using find_pack_entry_one directly. This means we skip the is_pack_valid check, and may end up with the name of a packfile, but no open descriptor. We can add the same is_pack_valid check here. Unfortunately, the access patterns of pack-objects are not quite as nice for keeping lookup and object access together. We look up each object as we find out about it, and the only later when writing the packfile do we necessarily access it. Which means that the opened packfile may be closed in the interim. In practice, however, adding this check still has value, for three reasons. 1. If you have a reasonable number of packs and/or a reasonable file descriptor limit, you can keep all of your packs open simultaneously. If this is the case, then the race is impossible to trigger. 2. Even if you can't keep all packs open at once, you may end up keeping the deleted one open (i.e., you may get lucky). 3. The race window is shortened. You may notice early that the pack is gone, and not try to access it. Triggering the problem without this check means deleting the pack any time after we read the list of index files, but before we access the looked-up objects. Triggering it with this check means deleting the pack means deleting the pack after we do a lookup (and successfully access the packfile), but before we access the object. Which is a smaller window. Acked-by: Nicolas Pitre <nico@fluxnic.net> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2011-10-14 11:42:37 -07:00
Junio C Hamano	e99f8c6dcf	Merge branch 'wh/normalize-alt-odb-path' * wh/normalize-alt-odb-path: sha1_file: normalize alt_odb path before comparing and storing	2011-10-05 12:36:22 -07:00
Hui Wang	5bdf0a8468	sha1_file: normalize alt_odb path before comparing and storing When it needs to compare and add an alt object path to the alt_odb_list, we normalize this path first since comparing normalized path is easy to get correct result. Use strbuf to replace some string operations, since it is cleaner and safer. Helped-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Hui Wang <Hui.Wang@windriver.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2011-09-07 11:47:43 -07:00
Junio C Hamano	2478bd8318	Merge branch 'jc/maint-clone-alternates' * jc/maint-clone-alternates: clone: clone from a repository with relative alternates clone: allow more than one --reference Conflicts: builtin/clone.c	2011-08-28 21:19:21 -07:00
Junio C Hamano	6fcb384869	Merge branch 'rt/zlib-smaller-window' * rt/zlib-smaller-window: test: consolidate definition of $LF Tolerate zlib deflation with window size < 32Kb	2011-08-23 15:40:33 -07:00
Junio C Hamano	e6baf4a1ae	clone: clone from a repository with relative alternates Cloning from a local repository blindly copies or hardlinks all the files under objects/ hierarchy. This results in two issues: - If the repository cloned has an "objects/info/alternates" file, and the command line of clone specifies --reference, the ones specified on the command line get overwritten by the copy from the original repository. - An entry in a "objects/info/alternates" file can specify the object stores it borrows objects from as a path relative to the "objects/" directory. When cloning a repository with such an alternates file, if the new repository is not sitting next to the original repository, such relative paths needs to be adjusted so that they can be used in the new repository. This updates add_to_alternates_file() to take the path to the alternate object store, including the "/objects" part at the end (earlier, it was taking the path to $GIT_DIR and was adding "/objects" itself), as it is technically possible to specify in objects/info/alternates file the path of a directory whose name does not end with "/objects". Signed-off-by: Junio C Hamano <gitster@pobox.com>	2011-08-23 09:56:14 -07:00
Roberto Tyley	7f684a2aff	Tolerate zlib deflation with window size < 32Kb Git currently reports loose objects as 'corrupt' if they've been deflated using a window size less than 32Kb, because the experimental_loose_object() function doesn't recognise the header byte as a zlib header. This patch makes the function tolerant of all valid window sizes (15-bit to 8-bit) - but doesn't sacrifice it's accuracy in distingushing the standard loose-object format from the experimental (now abandoned) format. On memory constrained systems zlib may use a much smaller window size - working on Agit, I found that Android uses a 4KB window; giving a header byte of 0x48, not 0x78. Consequently all loose objects generated appear 'corrupt', which is why Agit is a read-only Git client at this time - I don't want my client to generate Git repos that other clients treat as broken :( This patch makes Git tolerant of different deflate settings - it might appear that it changes experimental_loose_object() to the point where it could incorrectly identify the experimental format as the standard one, but the two criteria (bitmask & checksum) can only give a false result for an experimental object where both of the following are true: 1) object size is exactly 8 bytes when uncompressed (bitmask) 2) [single-byte in-pack git type&size header] * 256 + [1st byte of the following zlib header] % 31 = 0 (checksum) As it happens, for all possible combinations of valid object type (1-4) and window bits (0-7), the only time when the checksum will be divisible by 31 is for 0x1838 - ie object type 1, a Commit - which, due the fields all Commit objects must contain, could never be as small as 8 bytes in size. Given this, the combination of the two criteria (bitmask & checksum) always correctly determines the buffer format, and is more tolerant than the previous version. The alternative to this patch is simply removing support for the experimental format, which I am also totally cool with. References: Android uses a 4KB window for deflation: http://android.git.kernel.org/?p=platform/libcore.git;a=blob;f=luni/src/main/native/java_util_zip_Deflater.cpp;h=c0b2feff196e63a7b85d97cf9ae5bb2583409c28;hb=refs/heads/gingerbread#l53 Code snippet searching for false positives with the zlib checksum: https://gist.github.com/1118177 Signed-off-by: Roberto Tyley <roberto.tyley@guardian.co.uk> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2011-08-11 13:02:47 -07:00
Junio C Hamano	96790ca029	Merge branch 'jc/pack-order-tweak' * jc/pack-order-tweak: pack-objects: optimize "recency order" core: log offset pack data accesses happened	2011-08-05 14:54:57 -07:00
Junio C Hamano	d48929e1c3	Merge branch 'jc/legacy-loose-object' into maint * jc/legacy-loose-object: sha1_file.c: "legacy" is really the current format	2011-08-01 14:43:58 -07:00
Junio C Hamano	d907bf8ef3	Merge branch 'jc/index-pack' * jc/index-pack: verify-pack: use index-pack --verify index-pack: show histogram when emulating "verify-pack -v" index-pack: start learning to emulate "verify-pack -v" index-pack: a miniscule refactor index-pack --verify: read anomalous offsets from v2 idx file write_idx_file: need_large_offset() helper function index-pack: --verify write_idx_file: introduce a struct to hold idx customization options index-pack: group the delta-base array entries also by type Conflicts: builtin/verify-pack.c cache.h sha1_file.c	2011-07-19 09:54:51 -07:00
Junio C Hamano	eb4f4076aa	Merge branch 'jc/zlib-wrap' * jc/zlib-wrap: zlib: allow feeding more than 4GB in one go zlib: zlib can only process 4GB at a time zlib: wrap deflateBound() too zlib: wrap deflate side of the API zlib: wrap inflateInit2 used to accept only for gzip format zlib: wrap remaining calls to direct inflate/inflateEnd zlib wrapper: refactor error message formatter Conflicts: sha1_file.c	2011-07-19 09:33:04 -07:00

1 2 3 4 5 ...

545 Commits