git-commit-vandalism

Author	SHA1	Message	Date
Junio C Hamano	0963008cbf	Merge branch 'nd/i18n-progress' Mark the progress indicators from various time-consuming commands for i18n/l10n. * nd/i18n-progress: i18n: mark all progress lines for translation	2014-03-14 14:26:31 -07:00
Nguyễn Thái Ngọc Duy	b790e0f67c	upload-pack: send shallow info over stdin to pack-objects Before `cdab485` (upload-pack: delegate rev walking in shallow fetch to pack-objects - 2013-08-16) upload-pack does not write to the source repository. `cdab485` starts to write $GIT_DIR/shallow_XXXXXX if it's a shallow fetch, so the source repo must be writable. git:// servers do not need write access to repos and usually don't have it, which means `cdab485` breaks shallow clone over git:// Instead of using a temporary file as the media for shallow points, we can send them over stdin to pack-objects as well. Prepend shallow SHA-1 with --shallow so pack-objects knows what is what. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-03-11 13:32:10 -07:00
Dmitry S. Dolzhenko	25e1940709	builtin/pack-objects.c: use ALLOC_GROW() in check_pbase_path() Signed-off-by: Dmitry S. Dolzhenko <dmitrys.dolzhenko@yandex.ru> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-03-03 14:44:11 -08:00
Sun He	5889271114	finish_tmp_packfile(): use strbuf for pathname construction The old version fixes a maximum length on the buffer, which could be a problem if one is not certain of the length of get_object_directory(). Using strbuf can avoid the potential bug. Helped-by: Michael Haggerty <mhagger@alum.mit.edu> Helped-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Sun He <sunheehnus@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-03-03 12:15:10 -08:00
Junio C Hamano	2156a98045	Merge branch 'sh/write-pack-file-warning-message-fix' into sh/finish-tmp-packfile * sh/write-pack-file-warning-message-fix: write_pack_file: use correct variable in diagnostic	2014-03-03 12:13:20 -08:00
Sun He	0eea5a6e91	write_pack_file: use correct variable in diagnostic 'pack_tmp_name' is the subject of the utime() check, so report it in the warning, not the uninitialized 'tmpname' Signed-off-by: Sun He <sunheehnus@gmail.com> Reviewed-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-03-03 10:43:40 -08:00
Junio C Hamano	0f9e62e084	Merge branch 'jk/pack-bitmap' Borrow the bitmap index into packfiles from JGit to speed up enumeration of objects involved in a commit range without having to fully traverse the history. * jk/pack-bitmap: (26 commits) ewah: unconditionally ntohll ewah data ewah: support platforms that require aligned reads read-cache: use get_be32 instead of hand-rolled ntoh_l block-sha1: factor out get_be and put_be wrappers do not discard revindex when re-preparing packfiles pack-bitmap: implement optional name_hash cache t/perf: add tests for pack bitmaps t: add basic bitmap functionality tests count-objects: recognize .bitmap in garbage-checking repack: consider bitmaps when performing repacks repack: handle optional files created by pack-objects repack: turn exts array into array-of-struct repack: stop using magic number for ARRAY_SIZE(exts) pack-objects: implement bitmap writing rev-list: add bitmap mode to speed up object lists pack-objects: use bitmaps when packing objects pack-objects: split add_object_entry pack-bitmap: add support for bitmap indexes documentation: add documentation for the bitmap format ewah: compressed bitmap implementation ...	2014-02-27 14:01:48 -08:00
Nguyễn Thái Ngọc Duy	754dbc43f0	i18n: mark all progress lines for translation Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-02-24 09:08:37 -08:00
Michael Haggerty	afc711b8e1	rename read_replace_refs to check_replace_refs The semantics of this flag were changed in commit `e1111cef23` (inline lookup_replace_object() calls), but it wasn't renamed at the time to minimize code churn. Rename it now, and add a comment explaining its use. Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-02-20 14:16:55 -08:00
Vicent Marti	ae4f07fbcc	pack-bitmap: implement optional name_hash cache When we use pack bitmaps rather than walking the object graph, we end up with the list of objects to include in the packfile, but we do not know the path at which any tree or blob objects would be found. In a recently packed repository, this is fine. A fetch would use the paths only as a heuristic in the delta compression phase, and a fully packed repository should not need to do much delta compression. As time passes, though, we may acquire more objects on top of our large bitmapped pack. If clients fetch frequently, then they never even look at the bitmapped history, and all works as usual. However, a client who has not fetched since the last bitmap repack will have "have" tips in the bitmapped history, but "want" newer objects. The bitmaps themselves degrade gracefully in this circumstance. We manually walk the more recent bits of history, and then use bitmaps when we hit them. But we would also like to perform delta compression between the newer objects and the bitmapped objects (both to delta against what we know the user already has, but also between "new" and "old" objects that the user is fetching). The lack of pathnames makes our delta heuristics much less effective. This patch adds an optional cache of the 32-bit name_hash values to the end of the bitmap file. If present, a reader can use it to match bitmapped and non-bitmapped names during delta compression. 
Here are perf results for p5310: Test origin/master HEAD^ HEAD ------------------------------------------------------------------------------------------------- 5310.2: repack to disk 36.81(37.82+1.43) 47.70(48.74+1.41) +29.6% 47.75(48.70+1.51) +29.7% 5310.3: simulated clone 30.78(29.70+2.14) 1.08(0.97+0.10) -96.5% 1.07(0.94+0.12) -96.5% 5310.4: simulated fetch 3.16(6.10+0.08) 3.54(10.65+0.06) +12.0% 1.70(3.07+0.06) -46.2% 5310.6: partial bitmap 36.76(43.19+1.81) 6.71(11.25+0.76) -81.7% 4.08(6.26+0.46) -88.9% You can see that the time spent on an incremental fetch goes down, as our delta heuristics are able to do their work. And we save time on the partial bitmap clone for the same reason. Signed-off-by: Vicent Marti <tanoku@gmail.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-12-30 12:19:23 -08:00
Vicent Marti	7cc8f97108	pack-objects: implement bitmap writing This commit further extends the functionality of `pack-objects` by allowing it to write out a `.bitmap` index next to any written packs, together with the `.idx` index that currently gets written. If bitmap writing is enabled for a given repository (either by calling `pack-objects` with the `--write-bitmap-index` flag or by having `pack.writebitmaps` set to `true` in the config) and pack-objects is writing a packfile that would normally be indexed (i.e. not piping to stdout), we will attempt to write the corresponding bitmap index for the packfile. Bitmap index writing happens after the packfile and its index have been successfully written to disk (`finish_tmp_packfile`). The process is performed in several steps: 1. `bitmap_writer_set_checksum`: this call stores the partial checksum for the packfile being written; the checksum will be written in the resulting bitmap index to verify its integrity 2. `bitmap_writer_build_type_index`: this call uses the array of `struct object_entry` that has just been sorted when writing out the actual packfile index to disk to generate 4 type-index bitmaps (one for each object type). These bitmaps have their nth bit set if the given object is of the bitmap's type. E.g. the nth bit of the Commits bitmap will be 1 if the nth object in the packfile index is a commit. This is a very cheap operation because the bitmap writing code has access to the metadata stored in the `struct object_entry` array, and hence the real type for each object in the packfile. 3. `bitmap_writer_reuse_bitmaps`: if a bitmap index already exists for one of the packfiles we're trying to repack, this call will efficiently rebuild the existing bitmaps so they can be reused on the new index. 
All the existing bitmaps will be stored in a `reuse` hash table, and the commit selection phase will prioritize these when selecting, as they can be written directly to the new index without having to perform a revision walk to fill the bitmap. This can greatly speed up the repack of a repository that already has bitmaps. 4. `bitmap_writer_select_commits`: if bitmap writing is enabled for a given `pack-objects` run, the sequence of commits generated during the Counting Objects phase will be stored in an array. We then use that array to build up the list of selected commits. Writing a bitmap in the index for each object in the repository would be cost-prohibitive, so we use a simple heuristic to pick the commits that will be indexed with bitmaps. The current heuristics are a simplified version of JGit's original implementation. We select a higher density of commits depending on their age: the 100 most recent commits are always selected, after that we pick 1 commit of each 100, and the gap increases as the commits grow older. On top of that, we make sure that every single branch that has not been merged (all the tips that would be required from a clone) gets their own bitmap, and when selecting commits between a gap, we tend to prioritize the commit with the most parents. Do note that there is no right/wrong way to perform commit selection; different selection algorithms will result in different commits being selected, but there's no such thing as "missing a commit". The bitmap walker algorithm implemented in `prepare_bitmap_walk` is able to adapt to missing bitmaps by performing manual walks that complete the bitmap: the ideal selection algorithm, however, would select the commits that are more likely to be used as roots for a walk in the future (e.g. the tips of each branch, and so on) to ensure a bitmap for them is always available. 5. `bitmap_writer_build`: this is the computationally expensive part of bitmap generation. 
Based on the list of commits that were selected in the previous step, we perform several incremental walks to generate the bitmap for each commit. The walks begin from the oldest commit, and are built up incrementally for each branch. E.g. consider this DAG where A, B, C, D, E, F are the selected commits, and a, b, c, e are a chunk of simplified history that will not receive bitmaps (the E--e--F line forks off from B). A---a---B--b--C--c--D \ E--e--F We start by building the bitmap for A, using A as the root for a revision walk and marking all the objects that are reachable until the walk is over. Once this bitmap is stored, we reuse the bitmap walker to perform the walk for B, assuming that once we reach A again, the walk will be terminated because A has already been SEEN on the previous walk. This process is repeated for C and D, but when we try to generate the bitmaps for E, we can reuse neither the current walk nor the bitmap we have generated so far. What we do now is reset the walk, clear the bitmap, and perform the walk from scratch using E as the origin. This new walk, however, does not need to be completed. Once we hit B, we can look up the bitmap we have already stored for that commit and OR it with the existing bitmap we've composed so far, allowing us to limit the walk early. After all the bitmaps have been generated, another iteration through the list of commits is performed to find the best XOR offsets for compression before writing them to disk. Because of the incremental nature of these bitmaps, XORing one of them with its predecessor results in a minimal "bitmap delta" most of the time. We can write this delta to the on-disk bitmap index, and then re-compose the original bitmaps by XORing them again when loaded. This is a phase very similar to pack-objects' `find_delta` (using bitmaps instead of objects, of course), except the heuristics have been greatly simplified: we only check the 10 bitmaps before any given one to find the best-compressing one. 
This gives good results in practice, because there is locality in the ordering of the objects (and therefore bitmaps) in the packfile. 6. `bitmap_writer_finish`: the last step in the process is serializing to disk all the bitmap data that has been generated in the two previous steps. The bitmap is written to a tmp file and then moved atomically to its final destination, using the same process as `pack-write.c:write_idx_file`. Signed-off-by: Vicent Marti <tanoku@gmail.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-12-30 12:19:22 -08:00
Vicent Marti	6b8fda2db1	pack-objects: use bitmaps when packing objects In this patch, we use the bitmap API to perform the `Counting Objects` phase in pack-objects, rather than a traditional walk through the object graph. For a reasonably-packed large repo, the time to fetch and clone is often dominated by the full-object revision walk during the Counting Objects phase. Using bitmaps can reduce the CPU time required on the server (and therefore start sending the actual pack data with less delay). For bitmaps to be used, the following must be true: 1. We must be packing to stdout (as a normal `pack-objects` from `upload-pack` would do). 2. There must be a .bitmap index containing at least one of the "have" objects that the client is asking for. 3. Bitmaps must be enabled (they are enabled by default, but can be disabled by setting `pack.usebitmaps` to false, or by using `--no-use-bitmap-index` on the command-line). If any of these is not true, we fall back to doing a normal walk of the object graph. Here are some sample timings from a full pack of `torvalds/linux` (i.e. something very similar to what would be generated for a clone of the repository) that show the speedup produced by various methods: [existing graph traversal] $ time git pack-objects --all --stdout --no-use-bitmap-index \ </dev/null >/dev/null Counting objects: 3237103, done. Compressing objects: 100% (508752/508752), done. Total 3237103 (delta 2699584), reused 3237103 (delta 2699584) real 0m44.111s user 0m42.396s sys 0m3.544s [bitmaps only, without partial pack reuse; note that pack reuse is automatic, so timing this required a patch to disable it] $ time git pack-objects --all --stdout </dev/null >/dev/null Counting objects: 3237103, done. Compressing objects: 100% (508752/508752), done. 
Total 3237103 (delta 2699584), reused 3237103 (delta 2699584) real 0m5.413s user 0m5.604s sys 0m1.804s [bitmaps with pack reuse (what you get with this patch)] $ time git pack-objects --all --stdout </dev/null >/dev/null Reusing existing pack: 3237103, done. Total 3237103 (delta 0), reused 0 (delta 0) real 0m1.636s user 0m1.460s sys 0m0.172s Signed-off-by: Vicent Marti <tanoku@gmail.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-12-30 12:19:22 -08:00
Jeff King	ce2bc42456	pack-objects: split add_object_entry This function actually does three things: 1. Check whether we've already added the object to our packing list. 2. Check whether the object meets our criteria for adding. 3. Actually add the object to our packing list. It's a little hard to see these three phases, because they happen linearly in the rather long function. Instead, this patch breaks them up into three separate helper functions. The result is a little easier to follow, though it unfortunately suffers from some optimization interdependencies between the stages (e.g., during step 3 we use the packing list index from step 1 and the packfile information from step 2). More importantly, though, the various parts can be composed differently, as they will be in the next patch. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-12-30 12:19:22 -08:00
Jeff King	9af270e8c2	do not pretend sha1write returns errors The sha1write function returns an int, but it will always be "0". The failure-prone parts of the function happen in the "flush" callback, which cannot pass an error back to us. So we just end up calling die() during the flush. Let's just drop the return value altogether, as it only confuses callers into thinking that it might be useful. Only one call site actually checked the return value. We can drop that check, since it just led to a die() anyway. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-12-26 11:50:20 -08:00
Christian Couder	5955654823	replace {pre,suf}fixcmp() with {starts,ends}_with() Leaving only the function definitions and declarations so that any new topic in flight can still make use of the old functions, replace existing uses of the prefixcmp() and suffixcmp() with new API functions. The change can be recreated by mechanically applying this: $ git grep -l -e prefixcmp -e suffixcmp -- \*.c \| grep -v strbuf\\.c \| xargs perl -pi -e ' s\|!prefixcmp\(\|starts_with\(\|g; s\|prefixcmp\(\|!starts_with\(\|g; s\|!suffixcmp\(\|ends_with\(\|g; s\|suffixcmp\(\|!ends_with\(\|g; ' on the result of preparatory changes in this series. Signed-off-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-12-05 14:13:21 -08:00
Vicent Marti	68fb36eb92	pack-objects: factor out name_hash As the pack-objects system grows beyond the single pack-objects.c file, more parts (like the soon-to-exist bitmap code) will need to compute hashes for matching deltas. Factor out name_hash to make it available to other files. Signed-off-by: Vicent Marti <tanoku@gmail.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-10-24 15:44:52 -07:00
Vicent Marti	2834bc27c1	pack-objects: refactor the packing list The hash table that stores the packing list for a given `pack-objects` run was tightly coupled to the pack-objects code. In this commit, we refactor the hash table and the underlying storage array into a `packing_data` struct. The functionality for accessing and adding entries to the packing list is hence accessible from other parts of Git besides the `pack-objects` builtin. This refactoring is a requirement for further patches in this series that will require accessing the commit packing list from outside of `pack-objects`. The hash table implementation has been minimally altered: we now use table sizes which are always a power of two, to ensure a uniform index distribution in the array. Signed-off-by: Vicent Marti <tanoku@gmail.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-10-24 15:44:48 -07:00
Junio C Hamano	eeb8e8373f	Merge branch 'jc/pack-objects' * jc/pack-objects: pack-objects: shrink struct object_entry	2013-10-23 13:21:26 -07:00
Junio C Hamano	238504b014	Merge branch 'nd/fetch-into-shallow' When there is no sufficient overlap between old and new history during a fetch into a shallow repository, we unnecessarily sent objects the sending side knows the receiving end has. * nd/fetch-into-shallow: Add testcase for needless objects during a shallow fetch list-objects: mark more commits as edges in mark_edges_uninteresting list-objects: reduce one argument in mark_edges_uninteresting upload-pack: delegate rev walking in shallow fetch to pack-objects shallow: add setup_temporary_shallow() shallow: only add shallow graft points to new shallow file move setup_alternate_shallow and write_shallow_commits to shallow.c	2013-09-20 12:25:32 -07:00
Nguyễn Thái Ngọc Duy	e76a5fb459	list-objects: reduce one argument in mark_edges_uninteresting mark_edges_uninteresting() is always called with this form mark_edges_uninteresting(revs->commits, revs, ...); Remove the first argument and let mark_edges_uninteresting figure that out by itself. It helps answer the question "are this commit list and revs related in any way?" when looking at mark_edges_uninteresting implementation. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-08-28 11:54:18 -07:00
Brandon Casey	7c3ecb3254	Don't close pack fd when free'ing pack windows Now that close_one_pack() has been introduced to handle file descriptor pressure, it is not strictly necessary to close the pack file descriptor in unuse_one_window() when we're under memory pressure. Jeff King provided a justification for leaving the pack file open: If you close packfile descriptors, you can run into racy situations where somebody else is repacking and deleting packs, and they go away while you are trying to access them. If you keep a descriptor open, you're fine; they last to the end of the process. If you don't, then they disappear from under you. For normal object access, this isn't that big a deal; we just rescan the packs and retry. But if you are packing yourself (e.g., because you are a pack-objects started by upload-pack for a clone or fetch), it's much harder to recover (and we print some warnings). Let's do so (or uh, not do so). Signed-off-by: Brandon Casey <drafnel@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-08-02 09:27:26 -07:00
Junio C Hamano	63cdcfa40f	pack-objects: shrink struct object_entry Turn some boolean fields into bitfields and use uint32_t for name hash. This shrinks the size of the structure from 128 bytes to 120 bytes. Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-02-04 15:23:35 -08:00
Jeff King	315ea32f1b	Merge branch 'jk/peel-ref' Speeds up "git upload-pack" (what is invoked by "git fetch" on the other side of the connection) by reducing the cost to advertise the branches and tags that are available in the repository. * jk/peel-ref: upload-pack: use peel_ref for ref advertisements peel_ref: check object type before loading peel_ref: do not return a null sha1 peel_ref: use faster deref_tag_noverify	2012-10-25 06:42:27 -04:00
Jeff King	e6dbffa67b	peel_ref: do not return a null sha1 The idea of the peel_ref function is to dereference tag objects recursively until we hit a non-tag, and return the sha1. Conceptually, it should return 0 if it is successful (and fill in the sha1), or -1 if there was nothing to peel. However, the current behavior is much more confusing. For a regular loose ref, the behavior is as described above. But there is an optimization to reuse the peeled-ref value for a ref that came from a packed-refs file. If we have such a ref, we return its peeled value, even if that peeled value is null (indicating that we know the ref definitely does _not_ peel). It might seem like such information is useful to the caller, who would then know not to bother loading and trying to peel the object. Except that they should not bother loading and trying to peel the object _anyway_, because that fallback is already handled by peel_ref. In other words, the whole point of calling this function is that it handles those details internally, and you either get a sha1, or you know that it is not peel-able. This patch catches the null sha1 case internally and converts it into a -1 return value (i.e., there is nothing to peel). This simplifies callers, which do not need to bother checking themselves. Two callers are worth noting: - in pack-objects, a comment indicates that there is a difference between non-peelable tags and unannotated tags. But that is not the case (before or after this patch). Whether you get a null sha1 has to do with internal details of how peel_ref operated. - in show-ref, if peel_ref returns a failure, the caller tries to decide whether to try peeling manually based on whether the REF_ISPACKED flag is set. But this doesn't make any sense. If the flag is set, that does not necessarily mean the ref came from a packed-refs file with the "peeled" extension. 
But it doesn't matter, because even if it didn't, there's no point in trying to peel it ourselves, as peel_ref would already have done so. In other words, the fallback peeling is guaranteed to fail. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2012-10-04 20:34:28 -07:00
Nguyễn Thái Ngọc Duy	4c6881204b	i18n: pack-objects: mark parseopt strings for translation Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2012-08-20 12:23:18 -07:00
Junio C Hamano	0958a24d73	Merge branch 'jc/sha1-name-more' Teaches the object name parser things like a "git describe" output is always a commit object, "A" in "git log A" must be a committish, and "A" and "B" in "git log A...B" both must be committish, etc., to prolong the lifetime of abbreviated object names. * jc/sha1-name-more: (27 commits) t1512: match the "other" object names t1512: ignore whitespaces in wc -l output rev-parse --disambiguate=<prefix> rev-parse: A and B in "rev-parse A..B" refer to committish reset: the command takes committish commit-tree: the command wants a tree and commits apply: --build-fake-ancestor expects blobs sha1_name.c: add support for disambiguating other types revision.c: the "log" family, except for "show", takes committish revision.c: allow handle_revision_arg() to take other flags sha1_name.c: introduce get_sha1_committish() sha1_name.c: teach lookup context to get_sha1_with_context() sha1_name.c: many short names can only be committish sha1_name.c: get_sha1_1() takes lookup flags sha1_name.c: get_describe_name() by definition groks only commits sha1_name.c: teach get_short_sha1() a commit-only option sha1_name.c: allow get_short_sha1() to take other flags get_sha1(): fix error status regression sha1_name.c: restructure disambiguation of short names sha1_name.c: correct misnamed "canonical" and "res" ...	2012-07-22 12:55:07 -07:00
Junio C Hamano	8e676e8ba5	revision.c: allow handle_revision_arg() to take other flags The existing "cant_be_filename" that tells the function that the caller knows the arg is not a path (hence it does not have to be checked for absence of the file whose name matches it) is made into a bit in the flag word. Signed-off-by: Junio C Hamano <gitster@pobox.com>	2012-07-09 16:42:22 -07:00
Nguyễn Thái Ngọc Duy	cf2ba13ac6	pack-objects: use streaming interface for reading large loose blobs git usually streams large blobs directly to packs. But there are cases where git can create large loose blobs (unpack-objects or hash-object over pipe). Or they can come from other git implementations. core.bigfilethreshold can also be lowered down and introduce a new wave of large loose blobs. Use streaming interface to read/compress/write these blobs in one go. Fall back to normal way if somehow streaming interface cannot be used. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2012-05-29 10:50:56 -07:00
Nguyễn Thái Ngọc Duy	c9018b0305	pack-objects: refactor write_object() into helper functions The function first decides if we want to copy data taken from existing pack verbatim or we want to encode the data ourselves for the packfile we are creating and then carries out the decision. Separate the latter phase into two helper functions, one for the case the data is reused, the other for the case the data is produced anew. A little twist is that it can later turn out that we cannot reuse the data after we initially decide to do so; in such a case, the "reuse" helper makes a call to "generate" helper. It is easier to follow than the current fallback code that uses "goto" inside a single large function. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2012-05-18 14:22:15 -07:00
Nguyễn Thái Ngọc Duy	754980d023	pack-objects, streaming: turn "xx >= big_file_threshold" to ".. > .." This is because all other places do "xx > big_file_threshold" Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2012-05-18 14:21:19 -07:00
Jeff King	7e52f5660e	gc: do not explode objects which will be immediately pruned When we pack everything into one big pack with "git repack -Ad", any unreferenced objects in to-be-deleted packs are exploded into loose objects, with the intent that they will be examined and possibly cleaned up by the next run of "git prune". Since the exploded objects will receive the mtime of the pack from which they come, if the source pack is old, those loose objects will end up pruned immediately. In that case, it is much more efficient to skip the exploding step entirely for these objects. This patch teaches pack-objects to receive the expiration information and avoid writing these objects out. It also teaches "git gc" to pass the value of gc.pruneexpire to repack (which in turn learns to pass it along to pack-objects) so that this optimization happens automatically during "git gc" and "git gc --auto". Signed-off-by: Jeff King <peff@peff.net> Acked-by: Nicolas Pitre <nico@fluxnic.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2012-04-11 11:09:49 -07:00
Michał Kiedrowicz	2b34e486bc	pack-objects: Fix compilation with NO_PTHREADS It looks like commit `99fb6e04` (pack-objects: convert to use parse_options(), 2012-02-01) moved the #ifdef NO_PTHREADS around but did not notice that the 'arg' variable is no longer available. Signed-off-by: Michał Kiedrowicz <michal.kiedrowicz@gmail.com> Acked-by: Nguyen Thai Ngoc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2012-02-26 17:46:00 -08:00
Nguyễn Thái Ngọc Duy	99fb6e04cb	pack-objects: convert to use parse_options() Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2012-02-01 13:05:00 -08:00
Nguyễn Thái Ngọc Duy	3a2ec52e99	pack-objects: remove bogus comment The comment was introduced in `b5d97e6` (pack-objects: run rev-list equivalent internally. - 2006-09-04), stating that git pack-objects [options] base-name <refs...> is acceptable and refs should be passed into rev-list. But that's not true. All arguments after base-name are ignored. Remove the comment and reject this syntax (i.e. no more arguments after base name) Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2012-02-01 13:04:11 -08:00
Nguyễn Thái Ngọc Duy	6a301345a5	pack-objects: do not accept "--index-version=version," Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2012-02-01 13:03:46 -08:00
Junio C Hamano	c4a01a3cbb	Merge branch 'maint' * maint: Update draft release notes to 1.7.8.4 Update draft release notes to 1.7.7.6 Update draft release notes to 1.7.6.6 thin-pack: try harder to use preferred base objects as base	2012-01-12 23:33:39 -08:00
Junio C Hamano	5a6a939481	Merge branch 'maint-1.7.7' into maint * maint-1.7.7: Update draft release notes to 1.7.7.6 Update draft release notes to 1.7.6.6 thin-pack: try harder to use preferred base objects as base	2012-01-12 23:31:46 -08:00
Junio C Hamano	901c907d83	Merge branch 'maint-1.7.6' into maint-1.7.7 * maint-1.7.6: Update draft release notes to 1.7.6.6 thin-pack: try harder to use preferred base objects as base	2012-01-12 23:31:05 -08:00
Jeff King	15f07e061e	thin-pack: try harder to use preferred base objects as base When creating a pack using objects that reside in existing packs, we try to avoid recomputing futile delta between an object (trg) and a candidate for its base object (src) if they are stored in the same packfile, and trg is not recorded as a delta already. This heuristics makes sense because it is likely that we tried to express trg as a delta based on src but it did not produce a good delta when we created the existing pack. As the pack heuristics prefer producing delta to remove data, and Linus's law dictates that the size of a file grows over time, we tend to record the newest version of the file as inflated, and older ones as delta against it. When creating a thin-pack to transfer recent history, it is likely that we will try to send an object that is recorded in full, as it is newer. But the heuristics to avoid recomputing futile delta effectively forbids us from attempting to express such an object as a delta based on another object. Sending an object in full is often more expensive than sending a suboptimal delta based on other objects, and it is even more so if we could use an object we know the receiving end already has (i.e. preferred base object) as the delta base. Tweak the recomputation avoidance logic, so that we do not punt on computing delta against a preferred base object. The effect of this change can be seen on two simulated upload-pack workloads. The first is based on 44 reflog entries from my git.git origin/master reflog, and represents the packs that kernel.org sent me git updates for the past month or two. The second workload represents much larger fetches, going from git's v1.0.0 tag to v1.1.0, then v1.1.0 to v1.2.0, and so on. 
The table below shows the average generated pack size and the average CPU time consumed for each dataset, both before and after the patch: dataset \| reflog \| tags --------------------------------- before \| 53358 \| 2750977 size after \| 32398 \| 2668479 change \| -39% \| -3% --------------------------------- before \| 0.18 \| 1.12 CPU after \| 0.18 \| 1.15 change \| +0% \| +3% This patch makes a much bigger difference for packs with a shorter slice of history (since its effect is seen at the boundaries of the pack) though it has some benefit even for larger packs. Signed-off-by: Jeff King <peff@peff.net> Acked-by: Nicolas Pitre <nico@fluxnic.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2012-01-12 23:06:20 -08:00
Junio C Hamano	48b303675a	Merge branch 'jc/stream-to-pack' * jc/stream-to-pack: bulk-checkin: replace fast-import based implementation csum-file: introduce sha1file_checkpoint finish_tmp_packfile(): a helper function create_tmp_packfile(): a helper function write_pack_header(): a helper function Conflicts: pack.h	2011-12-16 22:33:40 -08:00
Junio C Hamano	2e8722fc9e	Merge branch 'jc/maint-pack-object-cycle' into maint * jc/maint-pack-object-cycle: pack-object: tolerate broken packs that have duplicated objects Conflicts: builtin/pack-objects.c	2011-12-13 22:04:50 -08:00
Junio C Hamano	df6246ed78	Merge branch 'nd/misc-cleanups' into maint * nd/misc-cleanups: unpack_object_header_buffer(): clear the size field upon error tree_entry_interesting: make use of local pointer "item" tree_entry_interesting(): give meaningful names to return values read_directory_recursive: reduce one indentation level get_tree_entry(): do not call find_tree_entry() on an empty tree tree-walk.c: do not leak internal structure in tree_entry_len()	2011-12-13 22:02:51 -08:00
Junio C Hamano	cddec4f8ae	Merge branch 'jc/maint-pack-object-cycle' * jc/maint-pack-object-cycle: pack-object: tolerate broken packs that have duplicated objects Conflicts: builtin/pack-objects.c	2011-12-05 15:19:34 -08:00
Junio C Hamano	62cdb6b23a	Merge branch 'nd/misc-cleanups' * nd/misc-cleanups: unpack_object_header_buffer(): clear the size field upon error tree_entry_interesting: make use of local pointer "item" tree_entry_interesting(): give meaningful names to return values read_directory_recursive: reduce one indentation level get_tree_entry(): do not call find_tree_entry() on an empty tree tree-walk.c: do not leak internal structure in tree_entry_len()	2011-12-05 15:10:20 -08:00
Junio C Hamano	568508e765	bulk-checkin: replace fast-import based implementation This extends the earlier approach to stream a large file directly from the filesystem to its own packfile, and allows "git add" to send large files directly into a single pack. Older code used to spawn fast-import, but the new bulk-checkin API replaces it. Signed-off-by: Junio C Hamano <gitster@pobox.com>	2011-12-01 11:46:09 -08:00
Junio C Hamano	f63c79dbc8	pack-object: tolerate broken packs that have duplicated objects When --reuse-delta is in effect (which is the default), and an existing pack in the repository has the same object registered twice (e.g. one copy in a non-delta format and the other copy in a delta against some other object), an attempt to repack the repository can result in a cyclic delta dependency, causing write_one() function to infinitely recurse into itself. Detect such a case and break the loopy dependency by writing out an object that is involved in such a loop in the non-delta format. Signed-off-by: Junio C Hamano <gitster@pobox.com>	2011-11-16 22:06:08 -08:00
Junio C Hamano	84a9ea90e1	Merge branch 'dm/pack-objects-update' * dm/pack-objects-update: pack-objects: don't traverse objects unnecessarily pack-objects: rewrite add_descendants_to_write_order() iteratively pack-objects: use unsigned int for counter and offset values pack-objects: mark add_to_write_order() as inline	2011-11-01 15:20:07 -07:00
Junio C Hamano	0e990530ae	finish_tmp_packfile(): a helper function Factor out a small logic out of the private write_pack_file() function in builtin/pack-objects.c. This changes the order of finishing multi-pack generation slightly. The code used to - adjust shared perm of temporary packfile - rename temporary packfile to the final name - update mtime of the packfile under the final name - adjust shared perm of temporary idxfile - rename temporary idxfile to the final name but because the helper does not want to do the mtime thing, the updated code does that step first and then all the rest. Signed-off-by: Junio C Hamano <gitster@pobox.com>	2011-10-28 12:34:09 -07:00
Junio C Hamano	cdf9db3c83	create_tmp_packfile(): a helper function Factor out a small logic out of the private write_pack_file() function in builtin/pack-objects.c Signed-off-by: Junio C Hamano <gitster@pobox.com>	2011-10-28 11:52:14 -07:00
Junio C Hamano	c0ad465725	write_pack_header(): a helper function Factor out a small logic out of the private write_pack_file() function in builtin/pack-objects.c Signed-off-by: Junio C Hamano <gitster@pobox.com>	2011-10-28 11:40:48 -07:00
Nguyễn Thái Ngọc Duy	0de1633783	tree-walk.c: do not leak internal structure in tree_entry_len() tree_entry_len() does not simply take two random arguments and return a tree length. The two pointers must point to a tree item structure, or struct name_entry. Passing random pointers will return incorrect value. Force callers to pass struct name_entry instead of two pointers (with hope that they don't manually construct struct name_entry themselves) Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2011-10-27 11:08:26 -07:00
Junio C Hamano	2070950633	Merge branch 'jk/maint-pack-objects-compete-with-delete' * jk/maint-pack-objects-compete-with-delete: downgrade "packfile cannot be accessed" errors to warnings pack-objects: protect against disappearing packs	2011-10-21 16:04:33 -07:00
Dan McGee	38d4debb6d	pack-objects: don't traverse objects unnecessarily This brings back some of the performance lost in optimizing recency order inside pack objects. We were doing extreme amounts of object re-traversal: for the 2.14 million objects in the Linux kernel repository, we were calling add_to_write_order() over 1.03 billion times (a 0.2% hit rate, making 99.8% of these calls extraneous). Two optimizations take place here: we can start our objects array iteration from a known point where we left off before we started trying to find our tags, and we don't need to do the deep dives required by add_family_to_write_order() if the object has already been marked as filled. These two optimizations bring some pretty spectacular results via `perf stat`: task-clock: 83373 ms --> 43800 ms (50% faster) cycles: 221,633,461,676 --> 116,307,209,986 (47% fewer) instructions: 149,299,179,939 --> 122,998,800,184 (18% fewer) Helped-by: Ramsay Jones (format string fix in "die" message) Signed-off-by: Dan McGee <dpmcgee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2011-10-20 17:17:49 -07:00
Dan McGee	f380872f0a	pack-objects: rewrite add_descendants_to_write_order() iteratively This removes the need to call this function recursively, shrinking the code size slightly and netting a small performance increase. Signed-off-by: Dan McGee <dpmcgee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2011-10-18 00:16:32 -07:00
Dan McGee	92bef1a14a	pack-objects: use unsigned int for counter and offset values This is done in some of the new pack layout code introduced in commit `1b4bb16b9e`. This more closely matches the nr_objects global that is unsigned that these variables are based off of and bounded by. Signed-off-by: Dan McGee <dpmcgee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2011-10-18 00:16:32 -07:00
Dan McGee	be12681896	pack-objects: mark add_to_write_order() as inline This function is a whole 26 bytes when compiled on x86_64, but is currently invoked over 1.037 billion times when running pack-objects on the Linux kernel git repository. This is hitting the point where micro-optimizations do make a difference, and inlining it only increases the object file size by 38 bytes. As reported by perf, this dropped task-clock from 84183 to 83373 ms, and total cycles from 223.5 billion to 221.6 billion. Not astronomical, but worth getting for adding one word. Signed-off-by: Dan McGee <dpmcgee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2011-10-18 00:16:31 -07:00
Jeff King	58a6a9cc43	downgrade "packfile cannot be accessed" errors to warnings These can happen if another process simultaneously prunes a pack. But that is not usually an error condition, because a properly-running prune should have repacked the object into a new pack. So we will notice that the pack has disappeared unexpectedly, print a message, try other packs (possibly after re-scanning the list of packs), and find it in the new pack. Acked-by: Nicolas Pitre <nico@fluxnic.net> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2011-10-14 11:43:09 -07:00
Jeff King	4c08018204	pack-objects: protect against disappearing packs It's possible that while pack-objects is running, a simultaneously running prune process might delete a pack that we are interested in. Because we load the pack indices early on, we know that the pack contains our item, but by the time we try to open and map it, it is gone. Since `c715f78`, we already protect against this in the normal object access code path, but pack-objects accesses the packs at a lower level. In the normal access path, we call find_pack_entry, which will call find_pack_entry_one on each pack index, which does the actual lookup. If it gets a hit, we will actually open and verify the validity of the matching packfile (using c715f78's is_pack_valid). If we can't open it, we'll issue a warning and pretend that we didn't find it, causing us to go on to the next pack (or on to loose objects). Furthermore, we will cache the descriptor to the opened packfile. Which means that later, when we actually try to access the object, we are likely to still have that packfile opened, and won't care if it has been unlinked from the filesystem. Notice the "likely" above. If there is another pack access in the interim, and we run out of descriptors, we could close the pack. And then a later attempt to access the closed pack could fail (we'll try to re-open it, of course, but it may have been deleted). In practice, this doesn't happen because we tend to look up items and then access them immediately. Pack-objects does not follow this code path. Instead, it accesses the packs at a much lower level, using find_pack_entry_one directly. This means we skip the is_pack_valid check, and may end up with the name of a packfile, but no open descriptor. We can add the same is_pack_valid check here. Unfortunately, the access patterns of pack-objects are not quite as nice for keeping lookup and object access together. 
We look up each object as we find out about it, and only later when writing the packfile do we necessarily access it. Which means that the opened packfile may be closed in the interim. In practice, however, adding this check still has value, for three reasons. 1. If you have a reasonable number of packs and/or a reasonable file descriptor limit, you can keep all of your packs open simultaneously. If this is the case, then the race is impossible to trigger. 2. Even if you can't keep all packs open at once, you may end up keeping the deleted one open (i.e., you may get lucky). 3. The race window is shortened. You may notice early that the pack is gone, and not try to access it. Triggering the problem without this check means deleting the pack any time after we read the list of index files, but before we access the looked-up objects. Triggering it with this check means deleting the pack after we do a lookup (and successfully access the packfile), but before we access the object. Which is a smaller window. Acked-by: Nicolas Pitre <nico@fluxnic.net> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2011-10-14 11:42:37 -07:00
Junio C Hamano	2e2e7e9dd0	Merge branch 'jc/fetch-verify' * jc/fetch-verify: fetch: verify we have everything we need before updating our ref rev-list --verify-object list-objects: pass callback data to show_objects()	2011-10-05 12:36:20 -07:00
Junio C Hamano	4947367267	list-objects: pass callback data to show_objects() The traverse_commit_list() API takes two callback functions, one to show commit objects, and the other to show other kinds of objects. Even though the former has a callback data parameter, so that the callback does not have to rely on global state, the latter does not. Give the show_objects() callback the same callback data parameter. Signed-off-by: Junio C Hamano <gitster@pobox.com>	2011-09-01 15:46:12 -07:00
Junio C Hamano	324b6b1678	Merge branch 'mh/check-attr-relative' * mh/check-attr-relative: (29 commits) test-path-utils: Add subcommand "prefix_path" test-path-utils: Add subcommand "absolute_path" git-check-attr: Normalize paths git-check-attr: Demonstrate problems with relative paths git-check-attr: Demonstrate problems with unnormalized paths git-check-attr: test that no output is written to stderr Rename git_checkattr() to git_check_attr() git-check-attr: Fix command-line handling to match docs git-check-attr: Drive two tests using the same raw data git-check-attr: Add an --all option to show all attributes git-check-attr: Error out if no pathnames are specified git-check-attr: Process command-line args more systematically git-check-attr: Handle each error separately git-check-attr: Extract a function error_with_usage() git-check-attr: Introduce a new variable git-check-attr: Extract a function output_attr() Allow querying all attributes on a file Remove redundant check Remove redundant call to bootstrap_attr_stack() Extract a function collect_all_attrs() ...	2011-08-17 17:36:22 -07:00
Junio C Hamano	96790ca029	Merge branch 'jc/pack-order-tweak' * jc/pack-order-tweak: pack-objects: optimize "recency order" core: log offset pack data accesses happened	2011-08-05 14:54:57 -07:00
Michael Haggerty	d932f4eb9f	Rename git_checkattr() to git_check_attr() Suggested by: Junio Hamano <gitster@pobox.com> Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2011-08-04 15:53:21 -07:00
Junio C Hamano	d907bf8ef3	Merge branch 'jc/index-pack' * jc/index-pack: verify-pack: use index-pack --verify index-pack: show histogram when emulating "verify-pack -v" index-pack: start learning to emulate "verify-pack -v" index-pack: a miniscule refactor index-pack --verify: read anomalous offsets from v2 idx file write_idx_file: need_large_offset() helper function index-pack: --verify write_idx_file: introduce a struct to hold idx customization options index-pack: group the delta-base array entries also by type Conflicts: builtin/verify-pack.c cache.h sha1_file.c	2011-07-19 09:54:51 -07:00
Junio C Hamano	1b4bb16b9e	pack-objects: optimize "recency order" This optimizes the "recency order" (see pack-heuristics.txt in Documentation/technical/ directory) used to order objects within a packfile in three ways: - Commits at the tip of tags are written together, in the hope that revision traversal done in incremental fetch (which starts by putting them in a revision queue marked as UNINTERESTING) will see a better locality of these objects; - In the original recency order, trees and blobs are intermixed. Write trees together before blobs, in the hope that this will improve locality when running pathspec-limited revision traversal, i.e. "git log paths..."; - When writing blob objects out, write the whole family of blobs that use the same delta base object together, by starting from the root of the delta chain, and writing its immediate children in a width-first manner, in the hope that this will again improve locality when reading blobs that belong to the same path, which are likely to be deltified against each other. I tried various workloads in the Linux kernel repositories (HEAD at v3.0-rc6-71-g4dd1b49) packed with v1.7.6 and with this patch, counting how large seeks are needed between adjacent accesses to objects in the pack, and the result looks promising. The history has 2072052 objects, weighing some 490MiB. * Simple commit-only log. $ git log >/dev/null There are 254656 commits in total. 
v1.7.6 with patch Total number of access : 258,031 258,032 0.0% percentile : 12 12 10.0% percentile : 259 259 20.0% percentile : 294 294 30.0% percentile : 326 326 40.0% percentile : 363 363 50.0% percentile : 415 415 60.0% percentile : 513 513 70.0% percentile : 857 858 80.0% percentile : 10,434 10,441 90.0% percentile : 91,985 91,996 95.0% percentile : 260,852 260,885 99.0% percentile : 1,150,680 1,152,811 99.9% percentile : 3,148,435 3,148,435 Less than 2MiB seek: 99.70% 99.69% 95% of the pack accesses look at data that is no further than 260kB from the previous location we accessed. The patch does not change the order of commit objects very much, and the result is very similar. * Pathspec-limited log. $ git log drivers/net >/dev/null The path is touched by 26551 commits and merges (among 254656 total). v1.7.6 with patch Total number of access : 559,511 558,663 0.0% percentile : 0 0 10.0% percentile : 182 167 20.0% percentile : 259 233 30.0% percentile : 357 304 40.0% percentile : 714 485 50.0% percentile : 5,046 3,976 60.0% percentile : 688,671 443,578 70.0% percentile : 319,574,732 110,370,100 80.0% percentile : 361,647,599 123,707,229 90.0% percentile : 393,195,669 128,947,636 95.0% percentile : 405,496,875 131,609,321 99.0% percentile : 412,942,470 133,078,115 99.5% percentile : 413,172,266 133,163,349 99.9% percentile : 413,354,356 133,240,445 Less than 2MiB seek: 61.71% 62.87% With the current pack heuristics, more than 30% of accesses have to seek further than 300MB; the updated pack heuristics ensures that less than 0.1% of accesses have to seek further than 135MB. This is largely due to the fact that the updated heuristics does not mix blobs and trees together. * Blame. $ git blame drivers/net/ne.c >/dev/null The path is touched by 34 commits and merges. 
v1.7.6 with patch Total number of access : 178,147 178,166 0.0% percentile : 0 0 10.0% percentile : 142 139 20.0% percentile : 222 194 30.0% percentile : 373 300 40.0% percentile : 1,168 837 50.0% percentile : 11,248 7,334 60.0% percentile : 305,121,284 106,850,130 70.0% percentile : 361,427,854 123,709,715 80.0% percentile : 388,127,343 128,171,047 90.0% percentile : 399,987,762 130,200,707 95.0% percentile : 408,230,673 132,174,308 99.0% percentile : 412,947,017 133,181,160 99.5% percentile : 413,312,798 133,220,425 99.9% percentile : 413,352,366 133,269,051 Less than 2MiB seek: 56.47% 56.83% The result is very similar to the pathspec-limited log above, which only looks at the tree objects. * Packing recent history. $ (git for-each-ref --format='^%(refname)' refs/tags; echo HEAD) \| git pack-objects --revs --stdout >/dev/null This should pack data worth 71 commits. v1.7.6 with patch Total number of access : 11,511 11,514 0.0% percentile : 0 0 10.0% percentile : 48 47 20.0% percentile : 134 98 30.0% percentile : 332 178 40.0% percentile : 1,386 293 50.0% percentile : 8,030 478 60.0% percentile : 33,676 1,195 70.0% percentile : 147,268 26,216 80.0% percentile : 9,178,662 464,598 90.0% percentile : 67,922,665 965,782 95.0% percentile : 87,773,251 1,226,102 99.0% percentile : 98,011,763 1,932,377 99.5% percentile : 100,074,427 33,642,128 99.9% percentile : 105,336,398 275,772,650 Less than 2MiB seek: 77.09% 99.04% The long-tail part of the result looks worse with the patch, but the change helps majority of the access. 99.04% of the accesses need less than 2MiB of seeking, compared to 77.09% with the current packing heuristics. * Index pack. 
$ git index-pack -v .git/objects/pack/pack*.pack v1.7.6 with patch Total number of access : 2,791,228 2,788,802 0.0% percentile : 9 9 10.0% percentile : 140 89 20.0% percentile : 233 167 30.0% percentile : 322 235 40.0% percentile : 464 310 50.0% percentile : 862 423 60.0% percentile : 2,566 686 70.0% percentile : 25,827 1,498 80.0% percentile : 1,317,862 4,971 90.0% percentile : 11,926,385 119,398 95.0% percentile : 41,304,149 952,519 99.0% percentile : 227,613,070 6,709,650 99.5% percentile : 321,265,121 11,734,871 99.9% percentile : 382,919,785 33,155,191 Less than 2MiB seek: 81.73% 96.92% As the index-pack command already walks objects in the delta chain order, writing the blobs out in the delta chain order seems to drastically improve the locality of access. Note that a half-a-gigabyte packfile comfortably fits in the buffer cache, and you would be unlikely to see much performance difference on a modern and reasonably beefy machine with enough memory and local disks. Benchmarking with cold cache (or over NFS) would be interesting. Signed-off-by: Junio C Hamano <gitster@pobox.com>	2011-07-08 10:03:24 -07:00
Junio C Hamano	ef49a7a012	zlib: zlib can only process 4GB at a time The size of objects we read from the repository and data we try to put into the repository are represented in "unsigned long", so that on larger architectures we can handle objects that weigh more than 4GB. But the interface defined in zlib.h to communicate with inflate/deflate limits avail_in (how many bytes of input are we calling zlib with) and avail_out (how many bytes of output from zlib are we ready to accept) fields effectively to 4GB by defining their type to be uInt. In many places in our code, we allocate a large buffer (e.g. mmap'ing a large loose object file) and tell zlib its size by assigning the size to avail_in field of the stream, but that will truncate the high octets of the real size. The worst part of this story is that we often pass around z_stream (the state object used by zlib) to keep track of the number of used bytes in input/output buffer by inspecting these two fields, which practically limits our callchain to the same 4GB limit. Wrap z_stream in another structure git_zstream that can express avail_in and avail_out in unsigned long. For now, just die() when the caller gives a size that cannot be given to a single zlib call. In later patches in the series, we would make git_inflate() and git_deflate() internally loop to give callers an illusion that our "improved" version of zlib interface can operate on a buffer larger than 4GB in one go. Signed-off-by: Junio C Hamano <gitster@pobox.com>	2011-06-10 11:52:15 -07:00
Junio C Hamano	225a6f1068	zlib: wrap deflateBound() too Signed-off-by: Junio C Hamano <gitster@pobox.com>	2011-06-10 11:18:17 -07:00
Junio C Hamano	55bb5c9147	zlib: wrap deflate side of the API Wrap deflateInit, deflate, and deflateEnd for everybody, and the sole use of deflateInit2 in remote-curl.c to tell the library to use gzip header and trailer in git_deflate_init_gzip(). There is only one caller that cares about the status from deflateEnd(). Introduce git_deflate_end_gently() to let that sole caller retrieve the status and act on it (i.e. die) for now, but we would probably want to make inflate_end/deflate_end die when they ran out of memory and get rid of the _gently() kind. Signed-off-by: Junio C Hamano <gitster@pobox.com>	2011-06-10 11:10:29 -07:00
Junio C Hamano	15366280c2	Teach core.bigfilethreashold to pack-objects The pack-objects command should take notice of the object file and refrain from attempting to delta large ones, to be consistent with the fast-import command. Signed-off-by: Junio C Hamano <gitster@pobox.com>	2011-04-05 20:25:49 -07:00
Junio C Hamano	ebcfb3791a	write_idx_file: introduce a struct to hold idx customization options Remove two globals, pack_idx_default version and pack_idx_off32_limit, and place them in a pack_idx_option structure. Allow callers to pass it to write_idx_file() as a parameter. Adjust all callers to the API change. Signed-off-by: Junio C Hamano <gitster@pobox.com>	2011-02-27 23:29:03 -08:00
Junio C Hamano	b361888dd5	thread-utils.h: simplify the inclusion All files that include this header file use the same four line incantation: #ifndef NO_PTHREADS #include <pthread.h> #include "thread-utils.h" #endif Move the responsibility for that gymnastics to the header file from the files that include it. This approach makes it easier to later declare new services that are related to threading in thread-utils.h and have them available to all the threading code. Signed-off-by: Junio C Hamano <gitster@pobox.com>	2010-12-10 12:58:06 -08:00
Junio C Hamano	39f04dbaac	Merge branch 'jn/thinner-wrapper' * jn/thinner-wrapper: Remove pack file handling dependency from wrapper.o pack-objects: mark file-local variable static wrapper: give zlib wrappers their own translation unit strbuf: move strbuf_branchname to sha1_name.c path helpers: move git_mkstemp* to wrapper.c wrapper: move odb_* to environment.c wrapper: move xmmap() to sha1_file.c	2010-12-03 16:13:06 -08:00
Jonathan Nieder	bc9b21755e	pack-objects: mark file-local variable static old_try_to_free_routine is not meant for use from other files. Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2010-11-10 11:08:04 -08:00
Nicolas Pitre	71064a956b	make pack-objects a bit more resilient to repo corruption Right now, packing valid objects could fail when creating a thin pack simply because a pack edge object used as a preferred base is corrupted. Since preferred base objects are not strictly needed to produce a valid pack, let's not consider the inability to read them as a fatal error. Delta compression may well be attempted against other objects in the search window. To avoid warning storms (we are in the inner loop of the delta search window) a warning is emitted only on the first occurrence. Signed-off-by: Nicolas Pitre <nico@fluxnic.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2010-10-22 14:59:58 -07:00
Štěpán Němec	884220653f	Put a space between `<' and argument in pack-objects usage string This makes it consistent with other places (including the git-pack-objects(1) manpage itself) and avoids possible confusion (I, for one, mistook `<object-list' for a `<object-list>' typo at first when preparing this series). Signed-off-by: Štěpán Němec <stepnem@gmail.com> Acked-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2010-10-08 12:31:08 -07:00
Štěpán Němec	0adda9362a	Use parentheses and `...' where appropriate Remove some stray usage of other bracket types and asterisks for the same purpose. Signed-off-by: Štěpán Němec <stepnem@gmail.com> Acked-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2010-10-08 12:31:07 -07:00
Štěpán Němec	62b4698e55	Use angles for placeholders consistently Signed-off-by: Štěpán Němec <stepnem@gmail.com> Acked-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2010-10-08 12:29:52 -07:00
Erik Faye-Lund	c03c83152d	do not depend on signed integer overflow Signed integer overflow is not defined in C, so do not depend on it. This fixes a problem with GCC 4.4.0 and -O3 where the optimizer would consider "consumed_bytes > consumed_bytes + bytes" as a constant expression, and never execute the die()-call. Signed-off-by: Erik Faye-Lund <kusmabite@gmail.com> Acked-by: Nicolas Pitre <nico@fluxnic.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2010-10-06 11:10:07 -07:00
Johannes Schindelin	8695353147	Fix typo in pack-objects' usage Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Pat Thoyts <patthoyts@users.sourceforge.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2010-09-30 12:22:02 -07:00
Junio C Hamano	44e08b003d	Merge branch 'js/try-to-free-stackable' * js/try-to-free-stackable: Do not call release_pack_memory in malloc wrappers when GIT_TRACE is used Have set_try_to_free_routine return the previous routine	2010-06-13 11:21:21 -07:00
Junio C Hamano	ea5f75a64a	Merge branch 'np/malloc-threading' * np/malloc-threading: Thread-safe xmalloc and xrealloc needs a recursive mutex Make xmalloc and xrealloc thread-safe	2010-05-21 04:02:16 -07:00
Junio C Hamano	2e0e8b68e3	Merge branch 'lt/deepen-builtin-source' * lt/deepen-builtin-source: Move 'builtin-*' into a 'builtin/' subdirectory Conflicts: Makefile	2010-03-10 15:25:18 -08:00
Linus Torvalds	81b50f3ce4	Move 'builtin-*' into a 'builtin/' subdirectory This shrinks the top-level directory a bit, and makes it much more pleasant to use auto-completion on the thing. Instead of [torvalds@nehalem git]$ em buil<tab> Display all 180 possibilities? (y or n) [torvalds@nehalem git]$ em builtin-sh builtin-shortlog.c builtin-show-branch.c builtin-show-ref.c builtin-shortlog.o builtin-show-branch.o builtin-show-ref.o [torvalds@nehalem git]$ em builtin-shor<tab> builtin-shortlog.c builtin-shortlog.o [torvalds@nehalem git]$ em builtin-shortlog.c you get [torvalds@nehalem git]$ em buil<tab> [type] builtin/ builtin.h [torvalds@nehalem git]$ em builtin [auto-completes to] [torvalds@nehalem git]$ em builtin/sh<tab> [type] shortlog.c shortlog.o show-branch.c show-branch.o show-ref.c show-ref.o [torvalds@nehalem git]$ em builtin/sho [auto-completes to] [torvalds@nehalem git]$ em builtin/shor<tab> [type] shortlog.c shortlog.o [torvalds@nehalem git]$ em builtin/shortlog.c which doesn't seem all that different, but not having that annoying break in "Display all 180 possibilities?" is quite a relief. NOTE! If you do this in a clean tree (no object files etc), or using an editor that has auto-completion rules that ignores '.o' files, you won't see that annoying 'Display all 180 possibilities?' message - it will just show the choices instead. I think bash has some cut-off around 100 choices or something. So the reason I see this is that I'm using an odd editor, and thus don't have the rules to cut down on auto-completion. But you can simulate that by using 'ls' instead, or something similar. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2010-02-22 14:29:41 -08:00
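
The message of commit `1b4bb16b9e` in the log above evaluates pack layouts by "counting how large seeks are needed between adjacent accesses to objects in the pack" and then reporting percentiles and the share of seeks under 2MiB. The commit does not show how those figures were produced, so the following is only a minimal sketch of that kind of measurement. The function names (`seek_distances`, `percentile`, `share_below`) are hypothetical, the nearest-rank percentile method is an assumption, and the input is taken to be a plain list of byte offsets in access order.

```python
import math

TWO_MIB = 2 * 1024 * 1024  # threshold used in the "Less than 2MiB seek" rows


def seek_distances(offsets):
    """Absolute byte distance between each pair of adjacent accesses."""
    return [abs(cur - prev) for prev, cur in zip(offsets, offsets[1:])]


def percentile(values, pct):
    """Nearest-rank percentile (assumed method, not taken from the commit)."""
    if not values:
        raise ValueError("percentile() needs at least one value")
    ordered = sorted(values)
    # Rank is 1-based; clamp to 1 so that pct == 0 returns the minimum.
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def share_below(values, limit=TWO_MIB):
    """Fraction of seek distances strictly smaller than `limit` bytes."""
    if not values:
        raise ValueError("share_below() needs at least one value")
    return sum(1 for v in values if v < limit) / len(values)
```

Given a hypothetical access trace such as `[0, 100, 50, 5_000_000]`, `seek_distances` yields three distances, and `percentile` and `share_below` can then be applied to them to build a summary shaped like the ones quoted in the commit message.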


183 Commits