git-commit-vandalism

Author	SHA1	Message	Date
Junio C Hamano	1ddb4d7e5e	Merge branch 'nd/upload-pack-shallow' Serving objects from a shallow repository needs to write a temporary file to be used, but the serving upload-pack may not have write access to the repository which is meant to be read-only. Instead feed these temporary shallow bounds from the standard input of pack-objects so that we do not have to use a temporary file. * nd/upload-pack-shallow: upload-pack: send shallow info over stdin to pack-objects	2014-03-21 12:49:08 -07:00
Junio C Hamano	f4eec8ce05	Merge branch 'sh/finish-tmp-packfile' * sh/finish-tmp-packfile: finish_tmp_packfile():use strbuf for pathname construction	2014-03-18 13:50:24 -07:00
Junio C Hamano	fe9122a352	Merge branch 'dd/use-alloc-grow' Replace open-coded reallocation with ALLOC_GROW() macro. * dd/use-alloc-grow: sha1_file.c: use ALLOC_GROW() in pretend_sha1_file() read-cache.c: use ALLOC_GROW() in add_index_entry() builtin/mktree.c: use ALLOC_GROW() in append_to_tree() attr.c: use ALLOC_GROW() in handle_attr_line() dir.c: use ALLOC_GROW() in create_simplify() reflog-walk.c: use ALLOC_GROW() replace_object.c: use ALLOC_GROW() in register_replace_object() patch-ids.c: use ALLOC_GROW() in add_commit() diffcore-rename.c: use ALLOC_GROW() diff.c: use ALLOC_GROW() commit.c: use ALLOC_GROW() in register_commit_graft() cache-tree.c: use ALLOC_GROW() in find_subtree() bundle.c: use ALLOC_GROW() in add_to_ref_list() builtin/pack-objects.c: use ALLOC_GROW() in check_pbase_path()	2014-03-18 13:50:21 -07:00
Junio C Hamano	56e2874a81	Merge branch 'sh/write-pack-file-warning-message-fix' A warning from "git pack-objects" were generated by referring to an incorrect variable when forming the filename that we had trouble with. * sh/write-pack-file-warning-message-fix: write_pack_file: use correct variable in diagnostic	2014-03-14 14:27:17 -07:00
Junio C Hamano	3e30cb0fbf	Merge branch 'mh/replace-refs-variable-rename' * mh/replace-refs-variable-rename: Document some functions defined in object.c Add docstrings for lookup_replace_object() and do_lookup_replace_object() rename read_replace_refs to check_replace_refs	2014-03-14 14:27:06 -07:00
Junio C Hamano	0963008cbf	Merge branch 'nd/i18n-progress' Mark the progress indicators from various time-consuming commands for i18n/l10n. * nd/i18n-progress: i18n: mark all progress lines for translation	2014-03-14 14:26:31 -07:00
Nguyễn Thái Ngọc Duy	b790e0f67c	upload-pack: send shallow info over stdin to pack-objects Before `cdab485` (upload-pack: delegate rev walking in shallow fetch to pack-objects - 2013-08-16) upload-pack does not write to the source repository. `cdab485` starts to write $GIT_DIR/shallow_XXXXXX if it's a shallow fetch, so the source repo must be writable. git:// servers do not need write access to repos and usually don't have it, which means `cdab485` breaks shallow clone over git:// Instead of using a temporary file as the media for shallow points, we can send them over stdin to pack-objects as well. Prepend shallow SHA-1 with --shallow so pack-objects knows what is what. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-03-11 13:32:10 -07:00
Dmitry S. Dolzhenko	25e1940709	builtin/pack-objects.c: use ALLOC_GROW() in check_pbase_path() Signed-off-by: Dmitry S. Dolzhenko <dmitrys.dolzhenko@yandex.ru> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-03-03 14:44:11 -08:00
Sun He	5889271114	finish_tmp_packfile():use strbuf for pathname construction The old version fixes a maximum length on the buffer, which could be a problem if one is not certain of the length of get_object_directory(). Using strbuf can avoid the protential bug. Helped-by: Michael Haggerty <mhagger@alum.mit.edu> Helped-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Sun He <sunheehnus@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-03-03 12:15:10 -08:00
Junio C Hamano	2156a98045	Merge branch 'sh/write-pack-file-warning-message-fix' into sh/finish-tmp-packfile * sh/write-pack-file-warning-message-fix: write_pack_file: use correct variable in diagnostic	2014-03-03 12:13:20 -08:00
Sun He	0eea5a6e91	write_pack_file: use correct variable in diagnostic 'pack_tmp_name' is the subject of the utime() check, so report it in the warning, not the uninitialized 'tmpname' Signed-off-by: Sun He <sunheehnus@gmail.com> Reviewed-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-03-03 10:43:40 -08:00
Junio C Hamano	0f9e62e084	Merge branch 'jk/pack-bitmap' Borrow the bitmap index into packfiles from JGit to speed up enumeration of objects involved in a commit range without having to fully traverse the history. * jk/pack-bitmap: (26 commits) ewah: unconditionally ntohll ewah data ewah: support platforms that require aligned reads read-cache: use get_be32 instead of hand-rolled ntoh_l block-sha1: factor out get_be and put_be wrappers do not discard revindex when re-preparing packfiles pack-bitmap: implement optional name_hash cache t/perf: add tests for pack bitmaps t: add basic bitmap functionality tests count-objects: recognize .bitmap in garbage-checking repack: consider bitmaps when performing repacks repack: handle optional files created by pack-objects repack: turn exts array into array-of-struct repack: stop using magic number for ARRAY_SIZE(exts) pack-objects: implement bitmap writing rev-list: add bitmap mode to speed up object lists pack-objects: use bitmaps when packing objects pack-objects: split add_object_entry pack-bitmap: add support for bitmap indexes documentation: add documentation for the bitmap format ewah: compressed bitmap implementation ...	2014-02-27 14:01:48 -08:00
Nguyễn Thái Ngọc Duy	754dbc43f0	i18n: mark all progress lines for translation Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-02-24 09:08:37 -08:00
Michael Haggerty	afc711b8e1	rename read_replace_refs to check_replace_refs The semantics of this flag was changed in commit `e1111cef23` inline lookup_replace_object() calls but wasn't renamed at the time to minimize code churn. Rename it now, and add a comment explaining its use. Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-02-20 14:16:55 -08:00
Vicent Marti	ae4f07fbcc	pack-bitmap: implement optional name_hash cache When we use pack bitmaps rather than walking the object graph, we end up with the list of objects to include in the packfile, but we do not know the path at which any tree or blob objects would be found. In a recently packed repository, this is fine. A fetch would use the paths only as a heuristic in the delta compression phase, and a fully packed repository should not need to do much delta compression. As time passes, though, we may acquire more objects on top of our large bitmapped pack. If clients fetch frequently, then they never even look at the bitmapped history, and all works as usual. However, a client who has not fetched since the last bitmap repack will have "have" tips in the bitmapped history, but "want" newer objects. The bitmaps themselves degrade gracefully in this circumstance. We manually walk the more recent bits of history, and then use bitmaps when we hit them. But we would also like to perform delta compression between the newer objects and the bitmapped objects (both to delta against what we know the user already has, but also between "new" and "old" objects that the user is fetching). The lack of pathnames makes our delta heuristics much less effective. This patch adds an optional cache of the 32-bit name_hash values to the end of the bitmap file. If present, a reader can use it to match bitmapped and non-bitmapped names during delta compression. Here are perf results for p5310: Test origin/master HEAD^ HEAD ------------------------------------------------------------------------------------------------- 5310.2: repack to disk 36.81(37.82+1.43) 47.70(48.74+1.41) +29.6% 47.75(48.70+1.51) +29.7% 5310.3: simulated clone 30.78(29.70+2.14) 1.08(0.97+0.10) -96.5% 1.07(0.94+0.12) -96.5% 5310.4: simulated fetch 3.16(6.10+0.08) 3.54(10.65+0.06) +12.0% 1.70(3.07+0.06) -46.2% 5310.6: partial bitmap 36.76(43.19+1.81) 6.71(11.25+0.76) -81.7% 4.08(6.26+0.46) -88.9% You can see that the time spent on an incremental fetch goes down, as our delta heuristics are able to do their work. And we save time on the partial bitmap clone for the same reason. Signed-off-by: Vicent Marti <tanoku@gmail.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-12-30 12:19:23 -08:00
Vicent Marti	7cc8f97108	pack-objects: implement bitmap writing This commit extends more the functionality of `pack-objects` by allowing it to write out a `.bitmap` index next to any written packs, together with the `.idx` index that currently gets written. If bitmap writing is enabled for a given repository (either by calling `pack-objects` with the `--write-bitmap-index` flag or by having `pack.writebitmaps` set to `true` in the config) and pack-objects is writing a packfile that would normally be indexed (i.e. not piping to stdout), we will attempt to write the corresponding bitmap index for the packfile. Bitmap index writing happens after the packfile and its index has been successfully written to disk (`finish_tmp_packfile`). The process is performed in several steps: 1. `bitmap_writer_set_checksum`: this call stores the partial checksum for the packfile being written; the checksum will be written in the resulting bitmap index to verify its integrity 2. `bitmap_writer_build_type_index`: this call uses the array of `struct object_entry` that has just been sorted when writing out the actual packfile index to disk to generate 4 type-index bitmaps (one for each object type). These bitmaps have their nth bit set if the given object is of the bitmap's type. E.g. the nth bit of the Commits bitmap will be 1 if the nth object in the packfile index is a commit. This is a very cheap operation because the bitmap writing code has access to the metadata stored in the `struct object_entry` array, and hence the real type for each object in the packfile. 3. `bitmap_writer_reuse_bitmaps`: if there exists an existing bitmap index for one of the packfiles we're trying to repack, this call will efficiently rebuild the existing bitmaps so they can be reused on the new index. All the existing bitmaps will be stored in a `reuse` hash table, and the commit selection phase will prioritize these when selecting, as they can be written directly to the new index without having to perform a revision walk to fill the bitmap. This can greatly speed up the repack of a repository that already has bitmaps. 4. `bitmap_writer_select_commits`: if bitmap writing is enabled for a given `pack-objects` run, the sequence of commits generated during the Counting Objects phase will be stored in an array. We then use that array to build up the list of selected commits. Writing a bitmap in the index for each object in the repository would be cost-prohibitive, so we use a simple heuristic to pick the commits that will be indexed with bitmaps. The current heuristics are a simplified version of JGit's original implementation. We select a higher density of commits depending on their age: the 100 most recent commits are always selected, after that we pick 1 commit of each 100, and the gap increases as the commits grow older. On top of that, we make sure that every single branch that has not been merged (all the tips that would be required from a clone) gets their own bitmap, and when selecting commits between a gap, we tend to prioritize the commit with the most parents. Do note that there is no right/wrong way to perform commit selection; different selection algorithms will result in different commits being selected, but there's no such thing as "missing a commit". The bitmap walker algorithm implemented in `prepare_bitmap_walk` is able to adapt to missing bitmaps by performing manual walks that complete the bitmap: the ideal selection algorithm, however, would select the commits that are more likely to be used as roots for a walk in the future (e.g. the tips of each branch, and so on) to ensure a bitmap for them is always available. 5. `bitmap_writer_build`: this is the computationally expensive part of bitmap generation. Based on the list of commits that were selected in the previous step, we perform several incremental walks to generate the bitmap for each commit. The walks begin from the oldest commit, and are built up incrementally for each branch. E.g. consider this dag where A, B, C, D, E, F are the selected commits, and a, b, c, e are a chunk of simplified history that will not receive bitmaps. A---a---B--b--C--c--D \ E--e--F We start by building the bitmap for A, using A as the root for a revision walk and marking all the objects that are reachable until the walk is over. Once this bitmap is stored, we reuse the bitmap walker to perform the walk for B, assuming that once we reach A again, the walk will be terminated because A has already been SEEN on the previous walk. This process is repeated for C, and D, but when we try to generate the bitmaps for E, we can reuse neither the current walk nor the bitmap we have generated so far. What we do now is resetting both the walk and clearing the bitmap, and performing the walk from scratch using E as the origin. This new walk, however, does not need to be completed. Once we hit B, we can lookup the bitmap we have already stored for that commit and OR it with the existing bitmap we've composed so far, allowing us to limit the walk early. After all the bitmaps have been generated, another iteration through the list of commits is performed to find the best XOR offsets for compression before writing them to disk. Because of the incremental nature of these bitmaps, XORing one of them with its predecesor results in a minimal "bitmap delta" most of the time. We can write this delta to the on-disk bitmap index, and then re-compose the original bitmaps by XORing them again when loaded. This is a phase very similar to pack-object's `find_delta` (using bitmaps instead of objects, of course), except the heuristics have been greatly simplified: we only check the 10 bitmaps before any given one to find best compressing one. This gives good results in practice, because there is locality in the ordering of the objects (and therefore bitmaps) in the packfile. 6. `bitmap_writer_finish`: the last step in the process is serializing to disk all the bitmap data that has been generated in the two previous steps. The bitmap is written to a tmp file and then moved atomically to its final destination, using the same process as `pack-write.c:write_idx_file`. Signed-off-by: Vicent Marti <tanoku@gmail.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-12-30 12:19:22 -08:00
Vicent Marti	6b8fda2db1	pack-objects: use bitmaps when packing objects In this patch, we use the bitmap API to perform the `Counting Objects` phase in pack-objects, rather than a traditional walk through the object graph. For a reasonably-packed large repo, the time to fetch and clone is often dominated by the full-object revision walk during the Counting Objects phase. Using bitmaps can reduce the CPU time required on the server (and therefore start sending the actual pack data with less delay). For bitmaps to be used, the following must be true: 1. We must be packing to stdout (as a normal `pack-objects` from `upload-pack` would do). 2. There must be a .bitmap index containing at least one of the "have" objects that the client is asking for. 3. Bitmaps must be enabled (they are enabled by default, but can be disabled by setting `pack.usebitmaps` to false, or by using `--no-use-bitmap-index` on the command-line). If any of these is not true, we fall back to doing a normal walk of the object graph. Here are some sample timings from a full pack of `torvalds/linux` (i.e. something very similar to what would be generated for a clone of the repository) that show the speedup produced by various methods: [existing graph traversal] $ time git pack-objects --all --stdout --no-use-bitmap-index \ </dev/null >/dev/null Counting objects: 3237103, done. Compressing objects: 100% (508752/508752), done. Total 3237103 (delta 2699584), reused 3237103 (delta 2699584) real 0m44.111s user 0m42.396s sys 0m3.544s [bitmaps only, without partial pack reuse; note that pack reuse is automatic, so timing this required a patch to disable it] $ time git pack-objects --all --stdout </dev/null >/dev/null Counting objects: 3237103, done. Compressing objects: 100% (508752/508752), done. Total 3237103 (delta 2699584), reused 3237103 (delta 2699584) real 0m5.413s user 0m5.604s sys 0m1.804s [bitmaps with pack reuse (what you get with this patch)] $ time git pack-objects --all --stdout </dev/null >/dev/null Reusing existing pack: 3237103, done. Total 3237103 (delta 0), reused 0 (delta 0) real 0m1.636s user 0m1.460s sys 0m0.172s Signed-off-by: Vicent Marti <tanoku@gmail.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-12-30 12:19:22 -08:00
Jeff King	ce2bc42456	pack-objects: split add_object_entry This function actually does three things: 1. Check whether we've already added the object to our packing list. 2. Check whether the object meets our criteria for adding. 3. Actually add the object to our packing list. It's a little hard to see these three phases, because they happen linearly in the rather long function. Instead, this patch breaks them up into three separate helper functions. The result is a little easier to follow, though it unfortunately suffers from some optimization interdependencies between the stages (e.g., during step 3 we use the packing list index from step 1 and the packfile information from step 2). More importantly, though, the various parts can be composed differently, as they will be in the next patch. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-12-30 12:19:22 -08:00
Jeff King	9af270e8c2	do not pretend sha1write returns errors The sha1write function returns an int, but it will always be "0". The failure-prone parts of the function happen in the "flush" callback, which cannot pass an error back to us. So we just end up calling die() during the flush. Let's just drop the return value altogether, as it only confuses callers into thinking that it might be useful. Only one call site actually checked the return value. We can drop that check, since it just led to a die() anyway. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-12-26 11:50:20 -08:00
Christian Couder	5955654823	replace {pre,suf}fixcmp() with {starts,ends}_with() Leaving only the function definitions and declarations so that any new topic in flight can still make use of the old functions, replace existing uses of the prefixcmp() and suffixcmp() with new API functions. The change can be recreated by mechanically applying this: $ git grep -l -e prefixcmp -e suffixcmp -- \*.c \| grep -v strbuf\\.c \| xargs perl -pi -e ' s\|!prefixcmp\(\|starts_with\(\|g; s\|prefixcmp\(\|!starts_with\(\|g; s\|!suffixcmp\(\|ends_with\(\|g; s\|suffixcmp\(\|!ends_with\(\|g; ' on the result of preparatory changes in this series. Signed-off-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-12-05 14:13:21 -08:00
Vicent Marti	68fb36eb92	pack-objects: factor out name_hash As the pack-objects system grows beyond the single pack-objects.c file, more parts (like the soon-to-exist bitmap code) will need to compute hashes for matching deltas. Factor out name_hash to make it available to other files. Signed-off-by: Vicent Marti <tanoku@gmail.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-10-24 15:44:52 -07:00
Vicent Marti	2834bc27c1	pack-objects: refactor the packing list The hash table that stores the packing list for a given `pack-objects` run was tightly coupled to the pack-objects code. In this commit, we refactor the hash table and the underlying storage array into a `packing_data` struct. The functionality for accessing and adding entries to the packing list is hence accessible from other parts of Git besides the `pack-objects` builtin. This refactoring is a requirement for further patches in this series that will require accessing the commit packing list from outside of `pack-objects`. The hash table implementation has been minimally altered: we now use table sizes which are always a power of two, to ensure a uniform index distribution in the array. Signed-off-by: Vicent Marti <tanoku@gmail.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-10-24 15:44:48 -07:00
Junio C Hamano	eeb8e8373f	Merge branch 'jc/pack-objects' * jc/pack-objects: pack-objects: shrink struct object_entry	2013-10-23 13:21:26 -07:00
Junio C Hamano	238504b014	Merge branch 'nd/fetch-into-shallow' When there is no sufficient overlap between old and new history during a fetch into a shallow repository, we unnecessarily sent objects the sending side knows the receiving end has. * nd/fetch-into-shallow: Add testcase for needless objects during a shallow fetch list-objects: mark more commits as edges in mark_edges_uninteresting list-objects: reduce one argument in mark_edges_uninteresting upload-pack: delegate rev walking in shallow fetch to pack-objects shallow: add setup_temporary_shallow() shallow: only add shallow graft points to new shallow file move setup_alternate_shallow and write_shallow_commits to shallow.c	2013-09-20 12:25:32 -07:00
Nguyễn Thái Ngọc Duy	e76a5fb459	list-objects: reduce one argument in mark_edges_uninteresting mark_edges_uninteresting() is always called with this form mark_edges_uninteresting(revs->commits, revs, ...); Remove the first argument and let mark_edges_uninteresting figure that out by itself. It helps answer the question "are this commit list and revs related in any way?" when looking at mark_edges_uninteresting implementation. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-08-28 11:54:18 -07:00
Brandon Casey	7c3ecb3254	Don't close pack fd when free'ing pack windows Now that close_one_pack() has been introduced to handle file descriptor pressure, it is not strictly necessary to close the pack file descriptor in unuse_one_window() when we're under memory pressure. Jeff King provided a justification for leaving the pack file open: If you close packfile descriptors, you can run into racy situations where somebody else is repacking and deleting packs, and they go away while you are trying to access them. If you keep a descriptor open, you're fine; they last to the end of the process. If you don't, then they disappear from under you. For normal object access, this isn't that big a deal; we just rescan the packs and retry. But if you are packing yourself (e.g., because you are a pack-objects started by upload-pack for a clone or fetch), it's much harder to recover (and we print some warnings). Let's do so (or uh, not do so). Signed-off-by: Brandon Casey <drafnel@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-08-02 09:27:26 -07:00
Junio C Hamano	63cdcfa40f	pack-objects: shrink struct object_entry Turn some boolean fields into bitfields and use uint32_t for name hash. This shrinks the size of the structure from 128 bytes to 120 bytes. Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-02-04 15:23:35 -08:00
Jeff King	315ea32f1b	Merge branch 'jk/peel-ref' Speeds up "git upload-pack" (what is invoked by "git fetch" on the other side of the connection) by reducing the cost to advertise the branches and tags that are available in the repository. * jk/peel-ref: upload-pack: use peel_ref for ref advertisements peel_ref: check object type before loading peel_ref: do not return a null sha1 peel_ref: use faster deref_tag_noverify	2012-10-25 06:42:27 -04:00
Jeff King	e6dbffa67b	peel_ref: do not return a null sha1 The idea of the peel_ref function is to dereference tag objects recursively until we hit a non-tag, and return the sha1. Conceptually, it should return 0 if it is successful (and fill in the sha1), or -1 if there was nothing to peel. However, the current behavior is much more confusing. For a regular loose ref, the behavior is as described above. But there is an optimization to reuse the peeled-ref value for a ref that came from a packed-refs file. If we have such a ref, we return its peeled value, even if that peeled value is null (indicating that we know the ref definitely does _not_ peel). It might seem like such information is useful to the caller, who would then know not to bother loading and trying to peel the object. Except that they should not bother loading and trying to peel the object _anyway_, because that fallback is already handled by peel_ref. In other words, the whole point of calling this function is that it handles those details internally, and you either get a sha1, or you know that it is not peel-able. This patch catches the null sha1 case internally and converts it into a -1 return value (i.e., there is nothing to peel). This simplifies callers, which do not need to bother checking themselves. Two callers are worth noting: - in pack-objects, a comment indicates that there is a difference between non-peelable tags and unannotated tags. But that is not the case (before or after this patch). Whether you get a null sha1 has to do with internal details of how peel_ref operated. - in show-ref, if peel_ref returns a failure, the caller tries to decide whether to try peeling manually based on whether the REF_ISPACKED flag is set. But this doesn't make any sense. If the flag is set, that does not necessarily mean the ref came from a packed-refs file with the "peeled" extension. But it doesn't matter, because even if it didn't, there's no point in trying to peel it ourselves, as peel_ref would already have done so. In other words, the fallback peeling is guaranteed to fail. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2012-10-04 20:34:28 -07:00
Nguyễn Thái Ngọc Duy	4c6881204b	i18n: pack-objects: mark parseopt strings for translation Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2012-08-20 12:23:18 -07:00
Junio C Hamano	0958a24d73	Merge branch 'jc/sha1-name-more' Teaches the object name parser things like a "git describe" output is always a commit object, "A" in "git log A" must be a committish, and "A" and "B" in "git log A...B" both must be committish, etc., to prolong the lifetime of abbreviated object names. * jc/sha1-name-more: (27 commits) t1512: match the "other" object names t1512: ignore whitespaces in wc -l output rev-parse --disambiguate=<prefix> rev-parse: A and B in "rev-parse A..B" refer to committish reset: the command takes committish commit-tree: the command wants a tree and commits apply: --build-fake-ancestor expects blobs sha1_name.c: add support for disambiguating other types revision.c: the "log" family, except for "show", takes committish revision.c: allow handle_revision_arg() to take other flags sha1_name.c: introduce get_sha1_committish() sha1_name.c: teach lookup context to get_sha1_with_context() sha1_name.c: many short names can only be committish sha1_name.c: get_sha1_1() takes lookup flags sha1_name.c: get_describe_name() by definition groks only commits sha1_name.c: teach get_short_sha1() a commit-only option sha1_name.c: allow get_short_sha1() to take other flags get_sha1(): fix error status regression sha1_name.c: restructure disambiguation of short names sha1_name.c: correct misnamed "canonical" and "res" ...	2012-07-22 12:55:07 -07:00
Junio C Hamano	8e676e8ba5	revision.c: allow handle_revision_arg() to take other flags The existing "cant_be_filename" that tells the function that the caller knows the arg is not a path (hence it does not have to be checked for absense of the file whose name matches it) is made into a bit in the flag word. Signed-off-by: Junio C Hamano <gitster@pobox.com>	2012-07-09 16:42:22 -07:00
Nguyễn Thái Ngọc Duy	cf2ba13ac6	pack-objects: use streaming interface for reading large loose blobs git usually streams large blobs directly to packs. But there are cases where git can create large loose blobs (unpack-objects or hash-object over pipe). Or they can come from other git implementations. core.bigfilethreshold can also be lowered down and introduce a new wave of large loose blobs. Use streaming interface to read/compress/write these blobs in one go. Fall back to normal way if somehow streaming interface cannot be used. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2012-05-29 10:50:56 -07:00
Nguyễn Thái Ngọc Duy	c9018b0305	pack-objects: refactor write_object() into helper functions The function first decides if we want to copy data taken from existing pack verbatim or we want to encode the data ourselves for the packfile we are creating and then carries out the decision. Separate the latter phase into two helper functions, one for the case the data is reused, the other for the case the data is produced anew. A little twist is that it can later turn out that we cannot reuse the data after we initially decide to do so; in such a case, the "reuse" helper makes a call to "generate" helper. It is easier to follow than the current fallback code that uses "goto" inside a single large function. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2012-05-18 14:22:15 -07:00
Nguyễn Thái Ngọc Duy	754980d023	pack-objects, streaming: turn "xx >= big_file_threshold" to ".. > .." This is because all other places do "xx > big_file_threshold" Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2012-05-18 14:21:19 -07:00
Jeff King	7e52f5660e	gc: do not explode objects which will be immediately pruned When we pack everything into one big pack with "git repack -Ad", any unreferenced objects in to-be-deleted packs are exploded into loose objects, with the intent that they will be examined and possibly cleaned up by the next run of "git prune". Since the exploded objects will receive the mtime of the pack from which they come, if the source pack is old, those loose objects will end up pruned immediately. In that case, it is much more efficient to skip the exploding step entirely for these objects. This patch teaches pack-objects to receive the expiration information and avoid writing these objects out. It also teaches "git gc" to pass the value of gc.pruneexpire to repack (which in turn learns to pass it along to pack-objects) so that this optimization happens automatically during "git gc" and "git gc --auto". Signed-off-by: Jeff King <peff@peff.net> Acked-by: Nicolas Pitre <nico@fluxnic.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2012-04-11 11:09:49 -07:00
Michał Kiedrowicz	2b34e486bc	pack-objects: Fix compilation with NO_PTHREDS It looks like commit `99fb6e04` (pack-objects: convert to use parse_options(), 2012-02-01) moved the #ifdef NO_PTHREDS around but hasn't noticed that the 'arg' variable no longer is available. Signed-off-by: Michał Kiedrowicz <michal.kiedrowicz@gmail.com> Acked-by: Nguyen Thai Ngoc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2012-02-26 17:46:00 -08:00
Nguyễn Thái Ngọc Duy	99fb6e04cb	pack-objects: convert to use parse_options() Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2012-02-01 13:05:00 -08:00
Nguyễn Thái Ngọc Duy	3a2ec52e99	pack-objects: remove bogus comment The comment was introduced in `b5d97e6` (pack-objects: run rev-list equivalent internally. - 2006-09-04), stating that git pack-objects [options] base-name <refs...> is acceptable and refs should be passed into rev-list. But that's not true. All arguments after base-name are ignored. Remove the comment and reject this syntax (i.e. no more arguments after base name) Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2012-02-01 13:04:11 -08:00
Nguyễn Thái Ngọc Duy	6a301345a5	pack-objects: do not accept "--index-version=version," Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2012-02-01 13:03:46 -08:00
Junio C Hamano	c4a01a3cbb	Merge branch 'maint' * maint: Update draft release notes to 1.7.8.4 Update draft release notes to 1.7.7.6 Update draft release notes to 1.7.6.6 thin-pack: try harder to use preferred base objects as base	2012-01-12 23:33:39 -08:00
Junio C Hamano	5a6a939481	Merge branch 'maint-1.7.7' into maint * maint-1.7.7: Update draft release notes to 1.7.7.6 Update draft release notes to 1.7.6.6 thin-pack: try harder to use preferred base objects as base	2012-01-12 23:31:46 -08:00
Junio C Hamano	901c907d83	Merge branch 'maint-1.7.6' into maint-1.7.7 * maint-1.7.6: Update draft release notes to 1.7.6.6 thin-pack: try harder to use preferred base objects as base	2012-01-12 23:31:05 -08:00
Jeff King	15f07e061e	thin-pack: try harder to use preferred base objects as base When creating a pack using objects that reside in existing packs, we try to avoid recomputing futile delta between an object (trg) and a candidate for its base object (src) if they are stored in the same packfile, and trg is not recorded as a delta already. This heuristics makes sense because it is likely that we tried to express trg as a delta based on src but it did not produce a good delta when we created the existing pack. As the pack heuristics prefer producing delta to remove data, and Linus's law dictates that the size of a file grows over time, we tend to record the newest version of the file as inflated, and older ones as delta against it. When creating a thin-pack to transfer recent history, it is likely that we will try to send an object that is recorded in full, as it is newer. But the heuristics to avoid recomputing futile delta effectively forbids us from attempting to express such an object as a delta based on another object. Sending an object in full is often more expensive than sending a suboptimal delta based on other objects, and it is even more so if we could use an object we know the receiving end already has (i.e. preferred base object) as the delta base. Tweak the recomputation avoidance logic, so that we do not punt on computing delta against a preferred base object. The effect of this change can be seen on two simulated upload-pack workloads. The first is based on 44 reflog entries from my git.git origin/master reflog, and represents the packs that kernel.org sent me git updates for the past month or two. The second workload represents much larger fetches, going from git's v1.0.0 tag to v1.1.0, then v1.1.0 to v1.2.0, and so on. The table below shows the average generated pack size and the average CPU time consumed for each dataset, both before and after the patch: dataset \| reflog \| tags --------------------------------- before \| 53358 \| 2750977 size after \| 32398 \| 2668479 change \| -39% \| -3% --------------------------------- before \| 0.18 \| 1.12 CPU after \| 0.18 \| 1.15 change \| +0% \| +3% This patch makes a much bigger difference for packs with a shorter slice of history (since its effect is seen at the boundaries of the pack) though it has some benefit even for larger packs. Signed-off-by: Jeff King <peff@peff.net> Acked-by: Nicolas Pitre <nico@fluxnic.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2012-01-12 23:06:20 -08:00
Junio C Hamano	48b303675a	Merge branch 'jc/stream-to-pack' * jc/stream-to-pack: bulk-checkin: replace fast-import based implementation csum-file: introduce sha1file_checkpoint finish_tmp_packfile(): a helper function create_tmp_packfile(): a helper function write_pack_header(): a helper function Conflicts: pack.h	2011-12-16 22:33:40 -08:00
Junio C Hamano	2e8722fc9e	Merge branch 'jc/maint-pack-object-cycle' into maint * jc/maint-pack-object-cycle: pack-object: tolerate broken packs that have duplicated objects Conflicts: builtin/pack-objects.c	2011-12-13 22:04:50 -08:00
Junio C Hamano	df6246ed78	Merge branch 'nd/misc-cleanups' into maint * nd/misc-cleanups: unpack_object_header_buffer(): clear the size field upon error tree_entry_interesting: make use of local pointer "item" tree_entry_interesting(): give meaningful names to return values read_directory_recursive: reduce one indentation level get_tree_entry(): do not call find_tree_entry() on an empty tree tree-walk.c: do not leak internal structure in tree_entry_len()	2011-12-13 22:02:51 -08:00
Junio C Hamano	cddec4f8ae	Merge branch 'jc/maint-pack-object-cycle' * jc/maint-pack-object-cycle: pack-object: tolerate broken packs that have duplicated objects Conflicts: builtin/pack-objects.c	2011-12-05 15:19:34 -08:00
Junio C Hamano	62cdb6b23a	Merge branch 'nd/misc-cleanups' * nd/misc-cleanups: unpack_object_header_buffer(): clear the size field upon error tree_entry_interesting: make use of local pointer "item" tree_entry_interesting(): give meaningful names to return values read_directory_recursive: reduce one indentation level get_tree_entry(): do not call find_tree_entry() on an empty tree tree-walk.c: do not leak internal structure in tree_entry_len()	2011-12-05 15:10:20 -08:00
Junio C Hamano	568508e765	bulk-checkin: replace fast-import based implementation This extends the earlier approach to stream a large file directly from the filesystem to its own packfile, and allows "git add" to send large files directly into a single pack. Older code used to spawn fast-import, but the new bulk-checkin API replaces it. Signed-off-by: Junio C Hamano <gitster@pobox.com>	2011-12-01 11:46:09 -08:00

1 2

88 Commits