git-commit-vandalism

Author	SHA1	Message	Date
Junio C Hamano	13158b9910	Merge branch 'jk/promisor-optim' Handling of "promisor packs" that allows certain objects to be missing and lazily retrievable has been optimized (a bit). * jk/promisor-optim: revision: avoid parsing with --exclude-promisor-objects lookup_unknown_object(): take a repository argument is_promisor_object(): free tree buffer after parsing	2021-04-30 13:50:24 +09:00
Rafael Silva	a643157d5a	repack: avoid loosening promisor objects in partial clones When `git repack -A -d` is run in a partial clone, `pack-objects` is invoked twice: once to repack all promisor objects, and once to repack all non-promisor objects. The latter `pack-objects` invocation is with --exclude-promisor-objects and --unpack-unreachable, which loosens all objects unused during this invocation. Unfortunately, this includes promisor objects. Because the -d argument to `git repack` subsequently deletes all loose objects also in packs, these just-loosened promisor objects will be immediately deleted. However, this extra disk churn is unnecessary in the first place. For example, in a newly-cloned partial repo that filters all blob objects (e.g. `--filter=blob:none`), `repack` ends up unpacking all trees and commits into the filesystem because every object, in this particular case, is a promisor object. Depending on the repo size, this increases the disk usage considerably: In my copy of the linux.git, the object directory peaked 26GB of more disk usage. In order to avoid this extra disk churn, pass the names of the promisor packfiles as --keep-pack arguments to the second invocation of `pack-objects`. This informs `pack-objects` that the promisor objects are already in a safe packfile and, therefore, do not need to be loosened. For testing, we need to validate whether any object was loosened. However, the "evidence" (loosened objects) is deleted during the process which prevents us from inspecting the object directory. Instead, let's teach `pack-objects` to count loosened objects and emit via trace2 thus allowing inspecting the debug events after the process is finished. This new event is used on the added regression test. Lastly, add a new perf test to evaluate the performance impact made by this changes (tested on git.git): Test HEAD^ HEAD ---------------------------------------------------------- 5600.3: gc 134.38(41.93+90.95) 7.80(6.72+1.35) -94.2% For a bigger repository, such as linux.git, the improvement is even bigger: Test HEAD^ HEAD ------------------------------------------------------------------- 5600.3: gc 6833.00(918.07+3162.74) 268.79(227.02+39.18) -96.1% These improvements are particular big because every object in the newly-cloned partial repository is a promisor object. Reported-by: SZEDER Gábor <szeder.dev@gmail.com> Helped-by: Jeff King <peff@peff.net> Helped-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Rafael Silva <rafaeloliveira.cs@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-04-28 13:36:13 +09:00
brian m. carlson	71b7672b67	builtin/pack-objects: avoid using struct object_id for pack hash We use struct object_id for the names of objects. It isn't intended to be used for other hash values that don't name objects such as the pack hash. Because struct object_id will soon need to have its algorithm member set, using it in this code path would mean that we didn't set that member, only the hash member, which would result in a crash. For both of these reasons, switch to using an unsigned char array of size GIT_MAX_RAWSZ. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-04-27 16:31:39 +09:00
Junio C Hamano	2eebac2c49	Merge branch 'jk/pack-objects-bitmap-progress-fix' When "git pack-objects" makes a literal copy of a part of existing packfile using the reachability bitmaps, its update to the progress meter was broken. * jk/pack-objects-bitmap-progress-fix: pack-objects: update "nr_seen" progress based on pack-reused count	2021-04-20 17:23:35 -07:00
Patrick Steinhardt	9cf68b27d5	rev-list: allow filtering of provided items When providing an object filter, it is currently impossible to also filter provided items. E.g. when executing `git rev-list HEAD` , the commit this reference points to will be treated as user-provided and is thus excluded from the filtering mechanism. This makes it harder than necessary to properly use the new `--filter=object:type` filter given that even if the user wants to only see blobs, he'll still see commits of provided references. Improve this by introducing a new `--filter-provided-objects` option to the git-rev-parse(1) command. If given, then all user-provided references will be subject to filtering. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-04-19 14:09:11 -07:00
Jeff King	45a187cc34	lookup_unknown_object(): take a repository argument All of the other lookup_foo() functions take a repository argument, but lookup_unknown_object() was never converted, and it uses the_repository internally. Let's fix that. We could leave a wrapper that uses the_repository, but there aren't that many calls, so we'll just convert them all. I looked briefly at each site to see if we had a repository struct (besides the_repository) we could pass, but none of them do (so this conversion to pass the_repository is a pure noop in each case, though it does take us one step closer to eventually getting rid of the_repository). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-04-13 13:18:46 -07:00
Jeff King	8e118e8490	pack-objects: update "nr_seen" progress based on pack-reused count When serving a clone or fetch with bitmaps, after deciding which objects need to be sent our "pack reuse" mechanism kicks in: we try to send more-or-less verbatim a bunch of objects from the beginning of the bitmapped packfile without even adding them to the to_pack.objects array. After deciding which objects will be in the "reused" portion, we update nr_result to account for those, and then trigger display_progress() to show the user (who is undoubtedly dazzled that we managed to enumerate so many objects so quickly). But then something confusing happens: the "Enumerating objects" progress meter jumps _backwards_, counting up from zero the number of objects we actually add into to_pack.objects. This worked correctly once upon a time, but was broken in `5af050437a` (pack-objects: show some progress when counting kept objects, 2018-04-15), when the latter half of that progress meter switched to using a separate nr_seen counter, rather than nr_result. Nobody noticed for two reasons: - prior to the pack-reuse fixes from `a14aebeac3` (Merge branch 'jk/packfile-reuse-cleanup', 2020-02-14), the reuse code almost never kicked in anyway - the output looks _kind of_ correct. The "backwards" moment is hard to catch, because we overwrite the old progress number with the new one, and the larger number is displayed only for a second. So unless you look at that exact second, you just see the much smaller value, counting up to the number of non-reused objects (though of course if you catch it in stderr, or look at GIT_TRACE_PACKET from a server with bitmaps, you can see both values). This smaller output isn't wrong per se, but isn't counting what we ever intended to. We should give the user the whole number of objects we considered (which, as per 5af050437a's original purpose, is already _not_ a count of what goes into to_pack.objects). The follow-on "Counting objects" meter shows the actual number of objects we feed into that array. We can easily fix this by bumping (and showing) nr_seen for the pack-reused objects. When the included test is run without this patch, the second pack-objects invocation produces "Enumerating objects: 1" to show the one loose object, even though the resulting pack has hundreds of objects in it. With it, we jump to "Enumerating objects: 674" after deciding on reuse, and then "675" when we add in the loose object. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-04-12 11:31:30 -07:00
Taylor Blau	3f267a1128	builtin/pack-objects.c: respect 'pack.preferBitmapTips' When writing a new pack with a bitmap, it is sometimes convenient to indicate some reference prefixes which should receive priority when selecting which commits to receive bitmaps. A truly motivated caller could accomplish this by setting 'pack.islandCore', (since all commits in the core island are similarly marked as preferred) but this requires callers to opt into using delta islands, which they may or may not want to do. Introduce a new multi-valued configuration, 'pack.preferBitmapTips' to allow callers to specify a list of reference prefixes. All references which have a prefix contained in 'pack.preferBitmapTips' will mark their tips as "preferred" in the same way as commits are marked as preferred for selection by 'pack.islandCore'. The choice of the verb "prefer" is intentional: marking the NEEDS_BITMAP flag on an object does not guarantee that that object will receive a bitmap. It merely guarantees that that commit will receive a bitmap over any other commit in the same window by bitmap_writer_select_commits(). The test this patch adds reflects this quirk, too. It only tests that a commit (which didn't receive bitmaps by default) is selected for bitmaps after changing the value of 'pack.preferBitmapTips' to include it. Other commits may lose their bitmaps as a byproduct of how the selection process works (bitmap_writer_select_commits() ignores the remainder of a window after seeing a commit with the NEEDS_BITMAP flag). This configuration will aide in selecting important references for multi-pack bitmaps, since they do not respect the same pack.islandCore configuration. (They could, but doing so may be confusing, since it is packs--not bitmaps--which are influenced by the delta-islands configuration). In a fork network repository (one which lists all forks of a given repository as remotes), for example, it is useful to set pack.preferBitmapTips to 'refs/remotes/<root>/heads' and 'refs/remotes/<root>/tags', where '<root>' is an opaque identifier referring to the repository which is at the base of the fork chain. Suggested-by: Jeff King <peff@peff.net> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-03-31 23:14:03 -07:00
Junio C Hamano	4730c5e273	Merge branch 'hx/pack-objects-chunk-comment' Comment update. * hx/pack-objects-chunk-comment: pack-objects: fix comment of reused_chunk.difference	2021-03-30 14:35:37 -07:00
Junio C Hamano	2744383cbd	Merge branch 'tb/geometric-repack' "git repack" so far has been only capable of repacking everything under the sun into a single pack (or split by size). A cleverer strategy to reduce the cost of repacking a repository has been introduced. * tb/geometric-repack: builtin/pack-objects.c: ignore missing links with --stdin-packs builtin/repack.c: reword comment around pack-objects flags builtin/repack.c: be more conservative with unsigned overflows builtin/repack.c: assign pack split later t7703: test --geometric repack with loose objects builtin/repack.c: do not repack single packs with --geometric builtin/repack.c: add '--geometric' option packfile: add kept-pack cache for find_kept_pack_entry() builtin/pack-objects.c: rewrite honor-pack-keep logic p5303: measure time to repack with keep p5303: add missing &&-chains builtin/pack-objects.c: add '--stdin-packs' option revision: learn '--no-kept-objects' packfile: introduce 'find_kept_pack_entry()'	2021-03-24 14:36:27 -07:00
Han Xin	bf12013f1a	pack-objects: fix comment of reused_chunk.difference As record_reused_object(offset, offset - hashfile_total(out)) said, reused_chunk.difference should be the offset of original packfile minus the offset of the generated packfile. But the comment presented an opposite way. Signed-off-by: Han Xin <hanxin.hx@alibaba-inc.com> Acked-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-03-24 13:03:22 -07:00
Taylor Blau	14e7b8344f	builtin/pack-objects.c: ignore missing links with --stdin-packs When 'git pack-objects --stdin-packs' encounters a commit in a pack, it marks it as a starting point of a best-effort reachability traversal that is used to populate the name-hash of the objects listed in the given packs. The traversal expects that it should be able to walk the ancestors of all commits in a pack without issue. Ordinarily this is the case, but it is possible to having missing parents from an unreachable part of the repository. In that case, we'd consider any missing objects in the unreachable portion of the graph to be junk. This should be handled gracefully: since the traversal is best-effort (i.e., we don't strictly need to fill in all of the name-hash fields), we should simply ignore any missing links. This patch does that (by setting the 'ignore_missing_links' bit on the rev_info struct), and ensures we don't regress in the future by adding a test which demonstrates this case. It is a little over-eager, since it will also ignore missing links in reachable parts of the packs (which would indicate a corrupted repository), but '--stdin-packs' is explicitly not about reachability. So this step isn't making anything worse for a repository which contains packs missing reachable objects (since we never drop objects with '--stdin-packs'). Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-03-19 11:19:29 -07:00
René Scharfe	ca56dadb4b	use CALLOC_ARRAY Add and apply a semantic patch for converting code that open-codes CALLOC_ARRAY to use it instead. It shortens the code and infers the element size automatically. Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-03-13 16:00:09 -08:00
Jeff King	6325da14af	builtin/pack-objects.c: rewrite honor-pack-keep logic Now that we have find_kept_pack_entry(), we don't have to manually keep hunting through every pack to find a possible "kept" duplicate of the object. This should be faster, assuming only a portion of your total packs are actually kept. Note that we have to re-order the logic a bit here; we can deal with the disqualifying situations first (e.g., finding the object in a non-local pack with --local), then "kept" situation(s), and then just fall back to other "--local" conditions. Here are the results from p5303 (measurements again taken on the kernel): Test HEAD^ HEAD ----------------------------------------------------------------------------------------------- 5303.5: repack (1) 57.26(54.59+10.84) 57.34(54.66+10.88) +0.1% 5303.6: repack with kept (1) 57.33(54.80+10.51) 57.38(54.83+10.49) +0.1% 5303.11: repack (50) 71.54(88.57+4.84) 71.70(88.99+4.74) +0.2% 5303.12: repack with kept (50) 85.12(102.05+4.94) 72.58(89.61+4.78) -14.7% 5303.17: repack (1000) 216.87(490.79+14.57) 217.19(491.72+14.25) +0.1% 5303.18: repack with kept (1000) 665.63(938.87+15.76) 246.12(520.07+14.93) -63.0% and the --stdin-packs timings: 5303.7: repack with --stdin-packs (1) 0.01(0.01+0.00) 0.00(0.00+0.00) -100.0% 5303.13: repack with --stdin-packs (50) 3.53(12.07+0.24) 3.43(11.75+0.24) -2.8% 5303.19: repack with --stdin-packs (1000) 195.83(371.82+8.10) 130.50(307.15+7.66) -33.4% So our repack with an empty .keep pack is roughly as fast as one without a .keep pack up to 50 packs. But the --stdin-packs case scales a little better, too. Notably, it is faster than a repack of the same size and a kept pack. It looks at fewer objects, of course, but the penalty for looking at many packs isn't as costly. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-22 23:30:52 -08:00
Taylor Blau	339bce27f4	builtin/pack-objects.c: add '--stdin-packs' option In an upcoming commit, 'git repack' will want to create a pack comprised of all of the objects in some packs (the included packs) excluding any objects in some other packs (the excluded packs). This caller could iterate those packs themselves and feed the objects it finds to 'git pack-objects' directly over stdin, but this approach has a few downsides: - It requires every caller that wants to drive 'git pack-objects' in this way to implement pack iteration themselves. This forces the caller to think about details like what order objects are fed to pack-objects, which callers would likely rather not do. - If the set of objects in included packs is large, it requires sending a lot of data over a pipe, which is inefficient. - The caller is forced to keep track of the excluded objects, too, and make sure that it doesn't send any objects that appear in both included and excluded packs. But the biggest downside is the lack of a reachability traversal. Because the caller passes in a list of objects directly, those objects don't get a namehash assigned to them, which can have a negative impact on the delta selection process, causing 'git pack-objects' to fail to find good deltas even when they exist. The caller could formulate a reachability traversal themselves, but the only way to drive 'git pack-objects' in this way is to do a full traversal, and then remove objects in the excluded packs after the traversal is complete. This can be detrimental to callers who care about performance, especially in repositories with many objects. Introduce 'git pack-objects --stdin-packs' which remedies these four concerns. 'git pack-objects --stdin-packs' expects a list of pack names on stdin, where 'pack-xyz.pack' denotes that pack as included, and '^pack-xyz.pack' denotes it as excluded. The resulting pack includes all objects that are present in at least one included pack, and aren't present in any excluded pack. To address the delta selection problem, 'git pack-objects --stdin-packs' works as follows. First, it assembles a list of objects that it is going to pack, as above. Then, a reachability traversal is started, whose tips are any commits mentioned in included packs. Upon visiting an object, we find its corresponding object_entry in the to_pack list, and set its namehash parameter appropriately. To avoid the traversal visiting more objects than it needs to, the traversal is halted upon encountering an object which can be found in an excluded pack (by marking the excluded packs as kept in-core, and passing --no-kept-objects=in-core to the revision machinery). This can cause the traversal to halt early, for example if an object in an included pack is an ancestor of ones in excluded packs. But stopping early is OK, since filling in the namehash fields of objects in the to_pack list is only additive (i.e., having it helps the delta selection process, but leaving it blank doesn't impact the correctness of the resulting pack). Even still, it is unlikely that this hurts us much in practice, since the 'git repack --geometric' caller (which is introduced in a later commit) marks small packs as included, and large ones as excluded. During ordinary use, the small packs usually represent pushes after a large repack, and so are unlikely to be ancestors of objects that already exist in the repository. (I found it convenient while developing this patch to have 'git pack-objects' report the number of objects which were visited and got their namehash fields filled in during traversal. This is also included in the below patch via trace2 data lines). Suggested-by: Jeff King <peff@peff.net> Signed-off-by: Taylor Blau <me@ttaylorr.com> Reviewed-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-22 23:30:52 -08:00
Junio C Hamano	3c12d0b885	Merge branch 'tb/pack-revindex-on-disk' Introduce an on-disk file to record revindex for packdata, which traditionally was always created on the fly and only in-core. * tb/pack-revindex-on-disk: t5325: check both on-disk and in-memory reverse index pack-revindex: ensure that on-disk reverse indexes are given precedence t: support GIT_TEST_WRITE_REV_INDEX t: prepare for GIT_TEST_WRITE_REV_INDEX Documentation/config/pack.txt: advertise 'pack.writeReverseIndex' builtin/pack-objects.c: respect 'pack.writeReverseIndex' builtin/index-pack.c: write reverse indexes builtin/index-pack.c: allow stripping arbitrary extensions pack-write.c: prepare to write 'pack-.rev' files packfile: prepare for the existence of '.rev' files	2021-02-12 14:21:04 -08:00
Junio C Hamano	77db59c2f9	Merge branch 'jv/pack-objects-narrower-ref-iteration' The "pack-objects" command needs to iterate over all the tags when automatic tag following is enabled, but it actually iterated over all refs and then discarded everything outside "refs/tags/" hierarchy, which was quite wasteful. * jv/pack-objects-narrower-ref-iteration: builtin/pack-objects.c: avoid iterating all refs	2021-02-05 16:40:45 -08:00
Junio C Hamano	973e20b83f	Merge branch 'jk/peel-iterated-oid' The peel_ref() API has been replaced with peel_iterated_oid(). * jk/peel-iterated-oid: refs: switch peel_ref() to peel_iterated_oid()	2021-02-03 15:04:49 -08:00
Taylor Blau	e8c58f894b	t: support GIT_TEST_WRITE_REV_INDEX Add a new option that unconditionally enables the pack.writeReverseIndex setting in order to run the whole test suite in a mode that generates on-disk reverse indexes. Additionally, enable this mode in the second run of tests under linux-gcc in 'ci/run-build-and-tests.sh'. Once on-disk reverse indexes are proven out over several releases, we can change the default value of that configuration to 'true', and drop this patch. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-25 18:32:44 -08:00
Taylor Blau	c97733435a	builtin/pack-objects.c: respect 'pack.writeReverseIndex' Now that we have an implementation that can write the new reverse index format, enable writing a .rev file in 'git pack-objects' by consulting the pack.writeReverseIndex configuration variable. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-25 18:32:43 -08:00
Jacob Vosmaer	be18153b97	builtin/pack-objects.c: avoid iterating all refs In git-pack-objects, we iterate over all the tags if the --include-tag option is passed on the command line. For some reason this uses for_each_ref which is expensive if the repo has many refs. We should use for_each_tag_ref instead. Because the add_ref_tag callback will now only visit tags we simplified it a bit. The motivation for this change is that we observed performance issues with a repository on gitlab.com that has 500,000 refs but only 2,000 tags. The fetch traffic on that repo is dominated by CI, and when we changed CI to fetch with 'git fetch --no-tags' we saw a dramatic change in the CPU profile of git-pack-objects. This lead us to this particular ref walk. More details in: https://gitlab.com/gitlab-com/gl-infra/scalability/-/issues/746#note_483546598 Signed-off-by: Jacob Vosmaer <jacob@gitlab.com> Reviewed-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-22 17:27:42 -08:00
Jeff King	36a317929b	refs: switch peel_ref() to peel_iterated_oid() The peel_ref() interface is confusing and error-prone: - it's typically used by ref iteration callbacks that have both a refname and oid. But since they pass only the refname, we may load the ref value from the filesystem again. This is inefficient, but also means we are open to a race if somebody simultaneously updates the ref. E.g., this: int some_ref_cb(const char refname, const struct object_id oid, ...) { if (!peel_ref(refname, &peeled)) printf("%s peels to %s", oid_to_hex(oid), oid_to_hex(&peeled); } could print nonsense. It is correct to say "refname peels to..." (you may see the "before" value or the "after" value, either of which is consistent), but mentioning both oids may be mixing before/after values. Worse, whether this is possible depends on whether the optimization to read from the current iterator value kicks in. So it is actually not possible with: for_each_ref(some_ref_cb); but it _is_ possible with: head_ref(some_ref_cb); which does not use the iterator mechanism (though in practice, HEAD should never peel to anything, so this may not be triggerable). - it must take a fully-qualified refname for the read_ref_full() code path to work. Yet we routinely pass it partial refnames from callbacks to for_each_tag_ref(), etc. This happens to work when iterating because there we do not call read_ref_full() at all, and only use the passed refname to check if it is the same as the iterator. But the requirements for the function parameters are quite unclear. Instead of taking a refname, let's instead take an oid. That fixes both problems. It's a little funny for a "ref" function not to involve refs at all. The key thing is that it's optimizing under the hood based on having access to the ref iterator. So let's change the name to make it clear why you'd want this function versus just peel_object(). There are two other directions I considered but rejected: - we could pass the peel information into the each_ref_fn callback. However, we don't know if the caller actually wants it or not. For packed-refs, providing it is essentially free. But for loose refs, we actually have to peel the object, which would be wasteful in most cases. We could likewise pass in a flag to the callback indicating whether the peeled information is known, but that complicates those callbacks, as they then have to decide whether to manually peel themselves. Plus it requires changing the interface of every callback, whether they care about peeling or not, and there are many of them. - we could make a function to return the peeled value of the current iterated ref (computing it if necessary), and BUG() otherwise. I.e.: int peel_current_iterated_ref(struct object_id *out); Each of the current callers is an each_ref_fn callback, so they'd mostly be happy. But: - we use those callbacks with functions like head_ref(), which do not use the iteration code. So we'd need to handle the fallback case there, anyway. - it's possible that a caller would want to call into generic code that sometimes is used during iteration and sometimes not. This encapsulates the logic to do the fast thing when possible, and fallback when necessary. The implementation is mostly obvious, but I want to call out a few things in the patch: - the test-tool coverage for peel_ref() is now meaningless, as it all collapses to a single peel_object() call (arguably they were pretty uninteresting before; the tricky part of that function is the fast-path we see during iteration, but these calls didn't trigger that). I've just dropped it entirely, though note that some other tests relied on the tags we created; I've moved that creation to the tests where it matters. - we no longer need to take a ref_store parameter, since we'd never look up a ref now. We do still rely on a global "current iterator" variable which _could_ be kept per-ref-store. But in practice this is only useful if there are multiple recursive iterations, at which point the more appropriate solution is probably a stack of iterators. No caller used the actual ref-store parameter anyway (they all call the wrapper that passes the_repository). - the original only kicked in the optimization when the "refname" pointer matched (i.e., not string comparison). We do likewise with the "oid" parameter here, but fall back to doing an actual oideq() call. This in theory lets us kick in the optimization more often, though in practice no current caller cares. It should never be wrong, though (peeling is a property of an object, so two refs pointing to the same object would peel identically). - the original took care not to touch the peeled out-parameter unless we found something to put in it. But no caller cares about this, and anyway, it is enforced by peel_object() itself (and even in the optimized iterator case, that's where we eventually end up). We can shorten the code and avoid an extra copy by just passing the out-parameter through the stack. Signed-off-by: Jeff King <peff@peff.net> Reviewed-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-21 15:51:31 -08:00
Taylor Blau	eb3fd99efd	check_object(): convert to new revindex API Replace direct accesses to the revindex with calls to 'offset_to_pack_pos()' and 'pack_pos_to_index()'. Since this caller already had some error checking (it can jump to the 'give_up' label if it encounters an error), we can easily check whether or not the provided offset points to an object in the given pack. This error checking existed prior to this patch, too, since the caller checks whether the return value from 'find_pack_revindex()' was NULL or not. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-13 21:53:45 -08:00
Taylor Blau	6a5c10c45f	write_reused_pack_verbatim(): convert to new revindex API Replace a direct access to the revindex array with 'pack_pos_to_offset()'. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-13 21:53:45 -08:00
Taylor Blau	66cbd3e2fb	write_reused_pack_one(): convert to new revindex API Replace direct revindex accesses with calls to 'pack_pos_to_offset()' and 'pack_pos_to_index()'. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-13 21:53:45 -08:00
Taylor Blau	952fc6870d	write_reuse_object(): convert to new revindex API First replace 'find_pack_revindex()' with its replacement 'offset_to_pack_pos()'. This prevents any bogus OFS_DELTA that may make its way through until 'write_reuse_object()' from causing a bad memory read (if 'revidx' is 'NULL') Next, replace a direct access of '->nr' with the wrapper function 'pack_pos_to_index()'. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-13 21:53:45 -08:00
Jeff King	449fa5ee06	pack-bitmap-write: ignore BITMAP_FLAG_REUSE The on-disk bitmap format has a flag to mark a bitmap to be "reused". This is a rather curious feature, and works like this: - a run of pack-objects would decide to mark the last 80% of the bitmaps it generates with the reuse flag - the next time we generate bitmaps, we'd see those reuse flags from the last run, and mark those commits as special: - we'd be more likely to select those commits to get bitmaps in the new output - when generating the bitmap for a selected commit, we'd reuse the old bitmap as-is (rearranging the bits to match the new pack, of course) However, neither of these behaviors particularly makes sense. Just because a commit happened to be bitmapped last time does not make it a good candidate for having a bitmap this time. In particular, we may choose bitmaps based on how recent they are in history, or whether a ref tip points to them, and those things will change. We're better off re-considering fresh which commits are good candidates. Reusing the existing bitmap _is_ a reasonable thing to do to save computation. But only reusing exact bitmaps is a weak form of this. If we have an old bitmap for A and now want a new bitmap for its child, we should be able to compute that only by looking at trees and that are new to the child. But this code would consider only exact reuse (which is perhaps why it was eager to select those commits in the first place). Furthermore, the recent switch to the reverse-edge algorithm for generating bitmaps dropped this optimization entirely (and yet still performs better). So let's do a few cleanups: - drop the whole "reusing bitmaps" phase of generating bitmaps. It's not helping anything, and is mostly unused code (or worse, code that is using CPU but not doing anything useful) - drop the use of the on-disk reuse flag to select commits to bitmap - stop setting the on-disk reuse flag in bitmaps we generate (since nothing respects it anymore) We will keep a few innards of the reuse code, which will help us implement a more capable version of the "reuse" optimization: - simplify rebuild_existing_bitmaps() into a function that only builds the mapping of bits between the old and new orders, but doesn't actually convert any bitmaps - make rebuild_bitmap() public; we'll call it lazily to convert bitmaps as we traverse (using the mapping created above) Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-12-08 14:48:17 -08:00
Junio C Hamano	2a978f8273	Merge branch 'jc/object-names-are-not-sha-1' A few end-user facing messages have been updated to be hash-algorithm agnostic. * jc/object-names-are-not-sha-1: messages: avoid SHA-1 in end-user facing messages	2020-08-19 16:14:52 -07:00
Junio C Hamano	4279000d3e	messages: avoid SHA-1 in end-user facing messages There are still a handful mentions of SHA-1 when we meant the (hexadecimal) object names in end-user facing messages. Rewrite them. I was hoping that this can mostly be s/SHA-1/object name/, but a few messages needed rephrasing to keep the result readable. Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-08-14 09:33:37 -07:00
Junio C Hamano	d1a8a8979d	Merge branch 'jt/has_object' A new helper function has_object() has been introduced to make it easier to mark object existence checks that do and don't want to trigger lazy fetches, and a few such checks are converted using it. * jt/has_object: fsck: do not lazy fetch known non-promisor object pack-objects: no fetch when allow-{any,promisor} apply: do not lazy fetch when applying binary sha1-file: introduce no-lazy-fetch has_object()	2020-08-13 14:13:39 -07:00
Junio C Hamano	46b225f153	Merge branch 'jk/strvec' The argv_array API is useful for not just managing argv but any "vector" (NULL-terminated array) of strings, and has seen adoption to a certain degree. It has been renamed to "strvec" to reduce the barrier to adoption. * jk/strvec: strvec: rename struct fields strvec: drop argv_array compatibility layer strvec: update documention to avoid argv_array strvec: fix indentation in renamed calls strvec: convert remaining callers away from argv_array name strvec: convert more callers away from argv_array name strvec: convert builtin/ callers away from argv_array name quote: rename sq_dequote_to_argv_array to mention strvec strvec: rename files from argv-array to strvec argv-array: rename to strvec argv-array: use size_t for count and alloc	2020-08-10 10:23:57 -07:00
Jonathan Tan	ee47243d76	pack-objects: no fetch when allow-{any,promisor} The options --missing=allow-{any,promisor} were introduced in `caf3827e2f` ("rev-list: add list-objects filtering support", 2017-11-22) with the following note in the commit message: This patch introduces handling of missing objects to help debugging and development of the "partial clone" mechanism, and once the mechanism is implemented, for a power user to perform operations that are missing-object aware without incurring the cost of checking if a missing link is expected. The idea that these options are missing-object aware (and thus do not need to lazily fetch objects, unlike unaware commands that assume that all objects are present) are assumed in later commits such as `07ef3c6604` ("fetch test: use more robust test for filtered objects", 2020-01-15). However, the current implementations of these options use has_object_file(), which indeed lazily fetches missing objects. Teach these implementations not to do so. Also, update the documentation of these options to be clearer. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-08-06 13:01:03 -07:00
Jeff King	d70a9eb611	strvec: rename struct fields The "argc" and "argv" names made sense when the struct was argv_array, but now they're just confusing. Let's rename them to "nr" (which we use for counts elsewhere) and "v" (which is rather terse, but reads well when combined with typical variable names like "args.v"). Note that we have to update all of the callers immediately. Playing tricks with the preprocessor is hard here, because we wouldn't want to rewrite unrelated tokens. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-07-30 19:18:06 -07:00
Jeff King	22f9b7f3f5	strvec: convert builtin/ callers away from argv_array name We eventually want to drop the argv_array name and just use strvec consistently. There's no particular reason we have to do it all at once, or care about interactions between converted and unconverted bits. Because of our preprocessor compat layer, the names are interchangeable to the compiler (so even a definition and declaration using different names is OK). This patch converts all of the files in builtin/ to keep the diff to a manageable size. The conversion was done purely mechanically with: git ls-files '.c' '.h' \| xargs perl -i -pe ' s/ARGV_ARRAY/STRVEC/g; s/argv_array/strvec/g; ' and then selectively staging files with "git add builtin/". We'll deal with any indentation/style fallouts separately. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-07-28 15:02:18 -07:00
Jeff King	dbbcd44fb4	strvec: rename files from argv-array to strvec This requires updating #include lines across the code-base, but that's all fairly mechanical, and was done with: git ls-files '.c' '.h' \| xargs perl -i -pe 's/argv-array.h/strvec.h/' Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-07-28 15:02:17 -07:00
Jonathan Tan	e00549aa9b	pack-objects: prefetch objects to be packed When an object to be packed is noticed to be missing, prefetch all to-be-packed objects in one batch. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-07-21 14:29:42 -07:00
Jonathan Tan	8d5cf95735	pack-objects: refactor to oid_object_info_extended Use oid_object_info_extended() instead of oid_object_info() because a subsequent commit needs to specify an additional flag here. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-07-21 14:29:42 -07:00
Jonathan Tan	dd4b732df7	upload-pack: send part of packfile response as uri Teach upload-pack to send part of its packfile response as URIs. An administrator may configure a repository with one or more "uploadpack.blobpackfileuri" lines, each line containing an OID, a pack hash, and a URI. A client may configure fetch.uriprotocols to be a comma-separated list of protocols that it is willing to use to fetch additional packfiles - this list will be sent to the server. Whenever an object with one of those OIDs would appear in the packfile transmitted by upload-pack, the server may exclude that object, and instead send the URI. The client will then download the packs referred to by those URIs before performing the connectivity check. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-06-10 18:06:34 -07:00
Junio C Hamano	896833b268	Merge branch 'tb/shallow-cleanup' Code cleanup. * tb/shallow-cleanup: shallow: use struct 'shallow_lock' for additional safety shallow.h: document '{commit,rollback}_shallow_file' shallow: extract a header file for shallow-related functions commit: make 'commit_graft_pos' non-static	2020-05-13 12:19:18 -07:00
Taylor Blau	120ad2b0f1	shallow: extract a header file for shallow-related functions There are many functions in commit.h that are more related to shallow repositories than they are to any sort of generic commit machinery. Likely this began when there were only a few shallow-related functions, and commit.h seemed a reasonable enough place to put them. But, now there are a good number of shallow-related functions, and placing them all in 'commit.h' doesn't make sense. This patch extracts a 'shallow.h', which takes all of the declarations from 'commit.h' for functions which already exist in 'shallow.c'. We will bring the remaining shallow-related functions defined in 'commit.c' in a subsequent patch. For now, move only the ones that already are implemented in 'shallow.c', and update the necessary includes. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-04-30 14:19:13 -07:00
Denton Liu	203c85339f	Use OPT_CALLBACK and OPT_CALLBACK_F In the codebase, there are many options which use OPTION_CALLBACK in a plain ol' struct definition. However, we have the OPT_CALLBACK and OPT_CALLBACK_F macros which are meant to abstract these plain struct definitions away. These macros are useful as they semantically signal to developers that these are just normal callback option with nothing fancy happening. Replace plain struct definitions of OPTION_CALLBACK with OPT_CALLBACK or OPT_CALLBACK_F where applicable. The heavy lifting was done using the following (disgusting) shell script: #!/bin/sh do_replacement () { tr '\n' '\r' \| sed -e 's/{\sOPTION_CALLBACK,\s$[^,]$,$[^,]$,$[^,]$,$[^,]$,$[^,]$,\s0,$\s[^[:space:]}]$\s}/OPT_CALLBACK(\1,\2,\3,\4,\5,\6)/g' \| sed -e 's/{\sOPTION_CALLBACK,\s$[^,]$,$[^,]$,$[^,]$,$[^,]$,$[^,]$,$[^,]$,$\s[^[:space:]}]$\s}/OPT_CALLBACK_F(\1,\2,\3,\4,\5,\6,\7)/g' \| tr '\r' '\n' } for f in $(git ls-files \.c) do do_replacement <"$f" >"$f.tmp" mv "$f.tmp" "$f" done The result was manually inspected and then reformatted to match the style of the surrounding code. Finally, using `git grep OPTION_CALLBACK \.c`, leftover results which were not handled by the script were manually transformed. Signed-off-by: Denton Liu <liu.denton@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-04-28 10:47:10 -07:00
Junio C Hamano	a768f866e9	Merge branch 'jk/oid-array-cleanups' Code cleanup. * jk/oid-array-cleanups: oidset: stop referring to sha1-array ref-filter: stop referring to "sha1 array" bisect: stop referring to sha1_array test-tool: rename sha1-array to oid-array oid_array: rename source file from sha1-array oid_array: use size_t for iteration oid_array: use size_t for count and allocation	2020-04-22 13:42:49 -07:00
Jeff King	fe299ec5ae	oid_array: rename source file from sha1-array We renamed the actual data structure in `910650d2f8` (Rename sha1_array to oid_array, 2017-03-31), but the file is still called sha1-array. Besides being slightly confusing, it makes it more annoying to grep for leftover occurrences of "sha1" in various files, because the header is included in so many places. Let's complete the transition by renaming the source and header files (and fixing up a few comment references). I kept the "-" in the name, as that seems to be our style; cf. `fc1395f4a4` (sha1_file.c: rename to use dash in file name, 2018-04-10). We also have oidmap.h and oidset.h without any punctuation, but those are "struct oidmap" and "struct oidset" in the code. We _could_ make this "oidarray" to match, but somehow it looks uglier to me because of the length of "array" (plus it would be a very invasive patch for little gain). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-03-30 10:59:08 -07:00
Junio C Hamano	9fadedd637	Merge branch 'ds/default-pack-use-sparse-to-true' The 'pack.useSparse' configuration variable now defaults to 'true', enabling an optimization that has been experimental since Git 2.21. * ds/default-pack-use-sparse-to-true: pack-objects: flip the use of GIT_TEST_PACK_SPARSE config: set pack.useSparse=true by default	2020-03-29 09:32:51 -07:00
Junio C Hamano	f8cb64e3d4	Merge branch 'bc/sha-256-part-1-of-4' SHA-256 transition continues. * bc/sha-256-part-1-of-4: (22 commits) fast-import: add options for rewriting submodules fast-import: add a generic function to iterate over marks fast-import: make find_marks work on any mark set fast-import: add helper function for inserting mark object entries fast-import: permit reading multiple marks files commit: use expected signature header for SHA-256 worktree: allow repository version 1 init-db: move writing repo version into a function builtin/init-db: add environment variable for new repo hash builtin/init-db: allow specifying hash algorithm on command line setup: allow check_repository_format to read repository format t/helper: make repository tests hash independent t/helper: initialize repository if necessary t/helper/test-dump-split-index: initialize git repository t6300: make hash algorithm independent t6300: abstract away SHA-1-specific constants t: use hash-specific lookup tables to define test constants repository: require a build flag to use SHA-256 hex: add functions to parse hex object IDs in any algorithm hex: introduce parsing variants taking hash algorithms ...	2020-03-26 17:11:20 -07:00
Derrick Stolee	2d657ab95f	pack-objects: flip the use of GIT_TEST_PACK_SPARSE The environment variable GIT_TEST_PACK_SPARSE was previously used to allow testing the --sparse option for "git pack-objects" in the test suite. This allowed interesting cases of "git push" to also test this algorithm. Since pack.useSparse is now true by default, we do not need this variable to _enable_ the --sparse option, but instead to _disable_ it. This flips how we work with the variable a bit. When checking for the variable, default to a value of -1 for "unset". If unset, then take the default from the repo settings, which is currently 1. Then, the --[no-]sparse command-line option will override either of these settings. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-03-20 14:22:32 -07:00
Junio C Hamano	e8e71848ea	Merge branch 'jk/nth-packed-object-id' Code cleanup to use "struct object_id" more by replacing use of "char sha1" jk/nth-packed-object-id: packfile: drop nth_packed_object_sha1() packed_object_info(): use object_id internally for delta base packed_object_info(): use object_id for returning delta base pack-check: push oid lookup into loop pack-check: convert "internal error" die to a BUG() pack-bitmap: use object_id when loading on-disk bitmaps pack-objects: use object_id struct in pack-reuse code pack-objects: convert oe_set_delta_ext() to use object_id pack-objects: read delta base oid into object_id struct nth_packed_object_oid(): use customary integer return	2020-03-05 10:43:03 -08:00
Junio C Hamano	0df82d99da	Merge branch 'jk/object-filter-with-bitmap' The object reachability bitmap machinery and the partial cloning machinery were not prepared to work well together, because some object-filtering criteria that partial clones use inherently rely on object traversal, but the bitmap machinery is an optimization to bypass that object traversal. There however are some cases where they can work together, and they were taught about them. * jk/object-filter-with-bitmap: rev-list --count: comment on the use of count_right++ pack-objects: support filters with bitmaps pack-bitmap: implement BLOB_LIMIT filtering pack-bitmap: implement BLOB_NONE filtering bitmap: add bitmap_unset() function rev-list: use bitmap filters for traversal pack-bitmap: basic noop bitmap filter infrastructure rev-list: allow commit-only bitmap traversals t5310: factor out bitmap traversal comparison rev-list: allow bitmaps when counting objects rev-list: make --count work with --objects rev-list: factor out bitmap-optimized routines pack-bitmap: refuse to do a bitmap traversal with pathspecs rev-list: fallback to non-bitmap traversal when filtering pack-bitmap: fix leak of haves/wants object lists pack-bitmap: factor out type iterator initialization	2020-03-02 15:07:18 -08:00
Jeff King	f66d4e0250	pack-objects: use object_id struct in pack-reuse code When the pack-reuse code is dumping an OFS_DELTA entry to a client that doesn't support it, we re-write it as a REF_DELTA. To do so, we use nth_packed_object_sha1() to get the oid, but that function is soon going away in favor of the more type-safe nth_packed_object_id(). Let's switch now in preparation. Note that this does incur an extra hash copy (from the pack idx mmap to the object_id and then to the output, rather than straight from mmap to the output). But this is not worth worrying about. It's probably not measurable even when it triggers, and this is fallback code that we expect to trigger very rarely (since everybody supports OFS_DELTA these days anyway). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-02-24 12:55:53 -08:00
Jeff King	a93c141dde	pack-objects: convert oe_set_delta_ext() to use object_id We already store an object_id internally, and now our sole caller also has one. Let's stop passing around the internal hash array, which adds a bit of type safety. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-02-24 12:55:52 -08:00
Jeff King	3f83fd5e44	pack-objects: read delta base oid into object_id struct When we're considering reusing an on-disk delta, we get the oid of the base as a pointer to unsigned char bytes of the hash, either into the packfile itself (for REF_DELTA) or into the pack idx (using the revindex to convert the offset into an index entry). Instead, we'd prefer to use a more type-safe object_id as much as possible. We can get the pack idx using nth_packed_object_id() instead. For the packfile bytes, we can copy them out using oidread(). This doesn't even incur an extra copy overall, since the next thing we'd always do with that pointer is pass it to can_reuse_delta(), which needs an object_id anyway (and called oidread() itself). So this patch also converts that function to take the object_id directly. Note that we did previously use NULL as a sentinel value when the object isn't a delta. We could probably get away with using the null oid for this, but instead we'll use an explicit boolean flag, which should make things more obvious for people reading the code later. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-02-24 12:55:52 -08:00
Jeff King	0763671b8e	nth_packed_object_oid(): use customary integer return Our nth_packed_object_sha1() function returns NULL for error. So when we wrapped it with nth_packed_object_oid(), we kept the same semantics. But it's a bit funny, because the caller actually passes in an out parameter, and the pointer we return is just that same struct they passed to us (or NULL). It's not too terrible, but it does make the interface a little non-idiomatic. Let's switch to our usual "0 for success, negative for error" return value. Most callers either don't check it, or are trivially converted. The one that requires the biggest change is actually improved, as we can ditch an extra aliased pointer variable. Since we are changing the interface in a subtle way that the compiler wouldn't catch, let's also change the name to catch any topics in flight. We can drop the 'o' and make it nth_packed_object_id(). That's slightly shorter, but also less redundant since the 'o' stands for "object" already. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-02-24 12:55:42 -08:00
brian m. carlson	207899137d	builtin/pack-objects: make hash agnostic Avoid hard-coding a hash size, instead preferring to use the_hash_algo. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-02-24 09:33:11 -08:00
Junio C Hamano	78e67cda42	Merge branch 'mt/use-passed-repo-more-in-funcs' Some codepaths were given a repository instance as a parameter to work in the repository, but passed the_repository instance to its callees, which has been cleaned up (somewhat). * mt/use-passed-repo-more-in-funcs: sha1-file: allow check_object_signature() to handle any repo sha1-file: pass git_hash_algo to hash_object_file() sha1-file: pass git_hash_algo to write_object_file_prepare() streaming: allow open_istream() to handle any repo pack-check: use given repo's hash_algo at verify_packfile() cache-tree: use given repo's hash_algo at verify_one() diff: make diff_populate_filespec() honor its repo argument	2020-02-14 12:54:22 -08:00
Junio C Hamano	a14aebeac3	Merge branch 'jk/packfile-reuse-cleanup' The way "git pack-objects" reuses objects stored in existing pack to generate its result has been improved. * jk/packfile-reuse-cleanup: pack-bitmap: don't rely on bitmap_git->reuse_objects pack-objects: add checks for duplicate objects pack-objects: improve partial packfile reuse builtin/pack-objects: introduce obj_is_packed() pack-objects: introduce pack.allowPackReuse csum-file: introduce hashfile_total() pack-bitmap: simplify bitmap_has_oid_in_uninteresting() pack-bitmap: uninteresting oid can be outside bitmapped packfile pack-bitmap: introduce bitmap_walk_contains() ewah/bitmap: introduce bitmap_word_alloc() packfile: expose get_delta_base() builtin/pack-objects: report reused packfile objects	2020-02-14 12:54:19 -08:00
Jeff King	3ab3185f99	pack-objects: support filters with bitmaps Just as rev-list recently learned to combine filters and bitmaps, let's do the same for pack-objects. The infrastructure is all there; we just need to pass along our filter options, and the pack-bitmap code will decide to use bitmaps or not. This unsurprisingly makes things faster for partial clones of large repositories (here we're cloning linux.git): Test HEAD^ HEAD ------------------------------------------------------------------------------ 5310.11: simulated partial clone 38.94(37.28+5.87) 11.06(11.27+4.07) -71.6% Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-02-14 10:46:22 -08:00
Jeff King	6663ae0a08	pack-bitmap: basic noop bitmap filter infrastructure Currently you can't use object filters with bitmaps, but we plan to support at least some filters with bitmaps. Let's introduce some infrastructure that will help us do that: - prepare_bitmap_walk() now accepts a list_objects_filter_options parameter (which can be NULL for no filtering; all the current callers pass this) - we'll bail early if the filter is incompatible with bitmaps (just as we would if there were no bitmaps at all). Currently all filters are incompatible. - we'll filter the resulting bitmap; since there are no supported filters yet, this is always a noop. There should be no behavior change yet, but we'll support some actual filters in a future patch. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-02-14 10:46:22 -08:00
Jeff King	4eb707ebd6	rev-list: allow commit-only bitmap traversals Ever since we added reachability bitmap support, we've been able to use it with rev-list to get the full list of objects, like: git rev-list --objects --use-bitmap-index --all But you can't do so without --objects, since we weren't ready to just show the commits. However, the internals of the bitmap code are mostly ready for this: they avoid opening up trees when walking to fill in the bitmaps. We just need to actually pass in the rev_info to traverse_bitmap_commit_list() so it knows which types to bother triggering our callback for. For completeness, the perf test now covers both the existing --objects case, as well as the new commits-only behavior (the objects one got way faster when we introduced bitmaps, but obviously isn't improved now). Here are numbers for linux.git: Test HEAD^ HEAD ------------------------------------------------------------------------ 5310.7: rev-list (commits) 8.29(8.10+0.19) 1.76(1.72+0.04) -78.8% 5310.8: rev-list (objects) 8.06(7.94+0.12) 8.14(7.94+0.13) +1.0% That run was cheating a little, as I didn't have any commit-graph in the repository, and we'd built it by default these days when running git-gc. Here are numbers with a commit-graph: Test HEAD^ HEAD ------------------------------------------------------------------------ 5310.7: rev-list (commits) 0.70(0.58+0.12) 0.51(0.46+0.04) -27.1% 5310.8: rev-list (objects) 6.20(6.09+0.10) 6.27(6.16+0.11) +1.1% Still an improvement, but a lot less impressive. We could have the perf script remove any commit-graph to show the out-sized effect, but it probably makes sense to leave it in what would be a more typical setup. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-02-14 10:46:22 -08:00
Matheus Tavares	c8123e72f6	streaming: allow open_istream() to handle any repo Some callers of open_istream() at archive-tar.c and archive-zip.c are capable of working on arbitrary repositories but the repo struct is not passed down to open_istream(), which uses the_repository internally. For now, that's not a problem since the said callers are only being called with the_repository. But to be consistent and avoid future problems, let's allow open_istream() to receive a struct repository and use that instead of the_repository. This parameter addition will also be used in a future patch to make sha1-file.c:check_object_signature() be able to work on arbitrary repos. Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-01-31 10:45:39 -08:00
Jeff King	92fb0db94c	pack-objects: add checks for duplicate objects Additional checks are added in have_duplicate_entry() and obj_is_packed() to avoid duplicate objects in the reuse bitmap. It was probably buggy to not have such a check before. Git as a client would never both asks for a tag by sha1 and specify "include-tag", but libgit2 will, so a libgit2 client cloning from a Git server would trigger the bug. If a client both asks for a tag by sha1 and specifies "include-tag", we may end up including the tag in the reuse bitmap (due to the first thing), and then later adding it to the packlist (due to the second). This results in duplicate objects in the pack, which git chokes on. We should notice that we are already including it when doing the include-tag portion, and avoid adding it to the packlist. The simplest place to fix this is right in add_ref_tag(), where we could avoid peeling the tag at all if we know that we are already including it. However, this pushes the check instead into have_duplicate_entry(). This fixes not only this case, but also means that we cannot have any similar problems lurking in other code. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-01-23 10:51:50 -08:00
Jeff King	bb514de356	pack-objects: improve partial packfile reuse The old code to reuse deltas from an existing packfile just tried to dump a whole segment of the pack verbatim. That's faster than the traditional way of actually adding objects to the packing list, but it didn't kick in very often. This new code is really going for a middle ground: do _some_ per-object work, but way less than we'd traditionally do. The general strategy of the new code is to make a bitmap of objects from the packfile we'll include, and then iterate over it, writing out each object exactly as it is in our on-disk pack, but _not_ adding it to our packlist (which costs memory, and increases the search space for deltas). One complication is that if we're omitting some objects, we can't set a delta against a base that we're not sending. So we have to check each object in try_partial_reuse() to make sure we have its delta. About performance, in the worst case we might have interleaved objects that we are sending or not sending, and we'd have as many chunks as objects. But in practice we send big chunks. For instance, packing torvalds/linux on GitHub servers now reused 6.5M objects, but only needed ~50k chunks. Helped-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-01-23 10:51:50 -08:00
Jeff King	ff483026a9	builtin/pack-objects: introduce obj_is_packed() Let's refactor the way we check if an object is packed by introducing obj_is_packed(). This function is now a simple wrapper around packlist_find(), but it will evolve in a following commit. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-01-23 10:51:50 -08:00
Jeff King	e704fc7978	pack-objects: introduce pack.allowPackReuse Let's make it possible to configure if we want pack reuse or not. The main reason it might not be wanted is probably debugging and performance testing, though pack reuse _might_ cause larger packs, because we wouldn't consider the reused objects as bases for finding new deltas. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-01-23 10:51:50 -08:00
Elijah Newren	15beaaa3d1	Fix spelling errors in code comments Reported-by: Jens Schleusener <Jens.Schleusener@fossies.org> Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-11-10 16:00:54 +09:00
Junio C Hamano	d8ce144e11	Merge branch 'jk/misc-uninitialized-fixes' Various fixes to codepaths gcc 9 had trouble following dataflow. * jk/misc-uninitialized-fixes: pack-objects: drop packlist index_pos optimization test-read-cache: drop namelen variable diff-delta: set size out-parameter to 0 for NULL delta bulk-checkin: zero-initialize hashfile_checkpoint pack-objects: use object_id in packlist_alloc() git-am: handle missing "author" when parsing commit	2019-09-30 13:19:30 +09:00
Junio C Hamano	128666753b	Merge branch 'jk/drop-release-pack-memory' xmalloc() used to have a mechanism to ditch memory and address space resources as the last resort upon seeing an allocation failure from the underlying malloc(), which made the code complex and thread-unsafe with dubious benefit, as major memory resource users already do limit their uses with various other mechanisms. It has been simplified away. * jk/drop-release-pack-memory: packfile: drop release_pack_memory()	2019-09-18 11:50:08 -07:00
Jeff King	bab28d9f97	builtin/pack-objects: report reused packfile objects To see when packfile reuse kicks in or not, it is useful to show reused packfile objects statistics in the output of upload-pack. Helped-by: James Ramsay <james@jramsay.com.au> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-09-13 14:40:33 -07:00
Junio C Hamano	f4f8dfe127	Merge branch 'ds/feature-macros' A mechanism to affect the default setting for a (related) group of configuration variables is introduced. * ds/feature-macros: repo-settings: create feature.experimental setting repo-settings: create feature.manyFiles setting repo-settings: parse core.untrackedCache commit-graph: turn on commit-graph by default t6501: use 'git gc' in quiet mode repo-settings: consolidate some config settings	2019-09-09 12:26:36 -07:00
Jeff King	3a37876b5d	pack-objects: drop packlist index_pos optimization Once upon a time, the code to add an object to our packing list in pack-objects all lived in a single function. It computed the position within the hash table once, then used it to check if the object was already present, and if not, to add it. Later, in `2834bc27c1` (pack-objects: refactor the packing list, 2013-10-24), this was split into two functions: packlist_find() and packlist_alloc(). We ended up with an "index_pos" variable that gets passed through several functions to make it from one to the other. The resulting code is rather confusing to follow. The "index_pos" variable is sometimes undefined, if we don't yet have a hash table. This works out in practice because in that case packlist_alloc() won't use it at all, since it will have to create/grow the hash table. But it's hard to verify that, and it does cause gcc 9.2.1's -Wmaybe-uninitialized to complain when compiled with "-flto -O3" (rightfully, since we do pass the uninitialized value as a function parameter, even if nobody ends up using it). All of this is to save computing the hash index again when we're inserting into the hash table, which I found doesn't make a measurable difference in the program runtime (which is not surprising, since we're doing all kinds of other heavyweight things for each object). Let's just drop this index_pos variable entirely, simplifying the code (and pleasing the compiler). We might be better still refactoring this custom hash table to use one of our existing implementations (an oidmap, or a kh_oid_map). I stopped short of that here, but this would be the likely first step towards that anyway. Reported-by: Stephan Beyer <s-beyer@gmx.net> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-09-06 11:03:42 -07:00
Jeff King	f1cbd033e2	pack-objects: use object_id in packlist_alloc() The only caller of packlist_alloc() already has a "struct object_id", and we immediately copy the hash they pass us into our own object_id. Let's avoid the unnecessary round-trip to a raw sha1 pointer. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-09-06 11:03:39 -07:00
Derrick Stolee	7211b9e753	repo-settings: consolidate some config settings There are a few important config settings that are not loaded during git_default_config. These are instead loaded on-demand. Centralize these config options to a single scan, and store all of the values in a repo_settings struct. The values for each setting are initialized as negative to indicate "unset". This centralization will be particularly important in a later change to introduce "meta" config settings that change the defaults for these config settings. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-08-13 13:33:54 -07:00
Jeff King	9827d4c185	packfile: drop release_pack_memory() Long ago, in `97bfeb34df` (Release pack windows before reporting out of memory., 2006-12-24), we taught xmalloc() and friends to try unmapping pack windows when malloc() failed. It's unlikely that his helps a lot in practice, and it has some downsides. First, the downsides: 1. It makes xmalloc() not thread-safe. We've worked around this in pack-objects.c, which installs its own locking version of the try_to_free_routine(). But other threaded code doesn't. 2. It makes the system as a whole harder to reason about. Functions which allocate heap memory under the hood may have farther-reaching effects than expected. That might be worth the tradeoff if there's a benefit. But in practice, it seems unlikely. We're generally dealing with mmap'd files, so the OS is going to do a much better job at responding to memory pressure by dropping individual pages (the exception is systems with NO_MMAP, but even there the OS can probably respond just as well with swapping). So the only thing we're really freeing is address space. On 64-bit systems, we have plenty of that to go around. On 32-bit systems, it could possibly help. But around the same time we made two other changes: `77ccc5bbd1` (Introduce new config option for mmap limit., 2006-12-23) and `60bb8b1453` (Fully activate the sliding window pack access., 2006-12-23). Together that means that a 32-bit system should have no more than 256MB total of packed-git mmaps at one time, split between a few 32MB windows. It's unlikely we have any address space problems since then, but we don't have any data since the features were all added at the same time. Likewise, xmmap() will try to free memory. At first glance, it seems like we'd need this (when we try to mmap a new window, we might need to close an old one to save address space on a 32-bit system). But we're saved again by core.packedGitLimit: if we're going to exceed our 256MB limit, we'll close an existing window before we even call mmap(). So it seems unlikely that this feature is actually doing anything useful. And while we don't have reports of it harming anything (probably because it rarely if ever kicks in), it would be nice to simplify the system overall. This patch drops the whole try_to_free system from xmalloc(), as well as the manual pack memory release in xmmap(). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-08-13 12:21:33 -07:00
Jeff King	25575015ca	repack: silence warnings when auto-enabled bitmaps cannot be built Depending on various config options, a full repack may not be able to build a reachability bitmap index (e.g., if pack.packSizeLimit forces us to write multiple packs). In these cases pack-objects may write a warning to stderr. Since `36eba0323d` (repack: enable bitmaps by default on bare repos, 2019-03-14), we may generate these warnings even when the user did not explicitly ask for bitmaps. This has two downsides: - it can be confusing, if they don't know what bitmaps are - a daemonized auto-gc will write this to its log file, and the presence of the warning may suppress further auto-gc (until gc.logExpiry has elapsed) Let's have repack communicate to pack-objects that the choice to turn on bitmaps was not made explicitly by the user, which in turn allows pack-objects to suppress these warnings. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-07-31 13:15:51 -07:00
Junio C Hamano	1eb0a12ec3	Merge branch 'nd/tree-walk-with-repo' The tree-walk API learned to pass an in-core repository instance throughout more codepaths. * nd/tree-walk-with-repo: t7814: do not generate same commits in different repos Use the right 'struct repository' instead of the_repository match-trees.c: remove the_repo from shift_tree*() tree-walk.c: remove the_repo from get_tree_entry_follow_symlinks() tree-walk.c: remove the_repo from get_tree_entry() tree-walk.c: remove the_repo from fill_tree_descriptor() sha1-file.c: remove the_repo from read_object_with_reference()	2019-07-19 11:30:21 -07:00
Junio C Hamano	a7db4c193d	Merge branch 'jk/oidhash' Code clean-up to remove hardcoded SHA-1 hash from many places. * jk/oidhash: hashmap: convert sha1hash() to oidhash() hash.h: move object_id definition from cache.h khash: rename oid helper functions khash: drop sha1-specific map types pack-bitmap: convert khash_sha1 maps into kh_oid_map delta-islands: convert island_marks khash to use oids khash: rename kh_oid_t to kh_oid_set khash: drop broken oid_map typedef object: convert create_object() to use object_id object: convert internal hash_obj() to object_id object: convert lookup_object() to use object_id object: convert lookup_unknown_object() to use object_id pack-objects: convert locate_object_entry_hash() to object_id pack-objects: convert packlist_find() to use object_id pack-bitmap-write: convert some helpers to use object_id upload-pack: rename a "sha1" variable to "oid" describe: fix accidental oid/hash type-punning	2019-07-09 15:25:43 -07:00
Junio C Hamano	a4c8352e1e	Merge branch 'jk/delta-islands-progress-fix' The codepath to compute delta islands used to spew progress output without giving the callers any way to squelch it, which has been fixed. * jk/delta-islands-progress-fix: delta-islands: respect progress flag	2019-07-09 15:25:43 -07:00
Nguyễn Thái Ngọc Duy	d3b4705ab8	sha1-file.c: remove the_repo from read_object_with_reference() Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-06-27 12:45:17 -07:00
Jeff King	bdbdf42f8a	delta-islands: respect progress flag The delta island code always prints "Marked %d islands", even if progress has been suppressed with --no-progress or by sending stderr to a non-tty. Let's pass a progress boolean to load_delta_islands(). We already do the same thing for the progress meter in resolve_tree_islands(). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-06-20 13:29:49 -07:00
Jeff King	0ebbcf70e6	object: convert lookup_unknown_object() to use object_id There are no callers left of lookup_unknown_object() that aren't just passing us the "hash" member of a "struct object_id". Let's take the whole struct, which gets us closer to removing all raw sha1 variables. It also matches the existing conversions of lookup_blob(), etc. The conversions of callers were done by hand, but they're all mechanical one-liners. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-06-20 10:06:19 -07:00
Jeff King	3df28caefb	pack-objects: convert packlist_find() to use object_id We take a raw hash pointer, but most of our callers have a "struct object_id" already. Let's switch to taking the full struct, which will let us continue removing uses of raw sha1 buffers. There are two callers that do need special attention: - in rebuild_existing_bitmaps(), we need to switch to nth_packed_object_oid(). This incurs an extra hash copy over pointing straight to the mmap'd sha1, but it shouldn't be measurable compared to the rest of the operation. - in can_reuse_delta() we already spent the effort to copy the sha1 into a "struct object_id", but now we just have to do so a little earlier in the function (we can't easily convert that function's callers because they may be pointing at mmap'd REF_DELTA blocks). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-06-20 09:54:58 -07:00
Junio C Hamano	c0e78f7e46	Merge branch 'jk/unused-params-final-batch' * jk/unused-params-final-batch: verify-commit: simplify parameters to run_gpg_verify() show-branch: drop unused parameter from show_independent() rev-list: drop unused void pointer from finish_commit() remove_all_fetch_refspecs(): drop unused "remote" parameter receive-pack: drop unused "commands" from prepare_shallow_update() pack-objects: drop unused rev_info parameters name-rev: drop unused parameters from is_better_name() mktree: drop unused length parameter wt-status: drop unused status parameter read-cache: drop unused parameter from threaded load clone: drop dest parameter from copy_alternates() submodule: drop unused prefix parameter from some functions builtin: consistently pass cmd_* prefix to parse_options cmd_{read,write}_tree: rename "unused" variable that is used	2019-06-13 13:19:34 -07:00
Junio C Hamano	454b419729	Merge branch 'ds/midx-too-many-packs' The code to generate the multi-pack idx file was not prepared to see too many packfiles and ran out of open file descriptor, which has been corrected. * ds/midx-too-many-packs: midx: add packs to packed_git linked list midx: pass a repository pointer	2019-05-19 16:45:30 +09:00
Junio C Hamano	2bfb182bc5	Merge branch 'ew/repack-with-bitmaps-by-default' The connectivity bitmaps are created by default in bare repositories now; also the pathname hash-cache is created by default to avoid making crappy deltas when repacking. * ew/repack-with-bitmaps-by-default: pack-objects: default to writing bitmap hash-cache t5310: correctly remove bitmaps for jgit test repack: enable bitmaps by default on bare repos	2019-05-13 23:50:32 +09:00
Jeff King	7a2a721687	pack-objects: drop unused rev_info parameters When collecting the list of objects to pack in get_object_list(), we pass our rev_info struct around to some functions that don't need it. This is due to `03a9683d22` (Simplify is_kept_pack(), 2009-02-28), where the kept-pack handling was moved out of the revision machinery. Let's drop these unused parameters. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-05-13 14:22:54 +09:00
Junio C Hamano	3d67555744	Merge branch 'jk/pack-objects-reports-num-objects-to-trace2' The "git pack-objects" command learned to report the number of objects it packed via the trace2 mechanism. * jk/pack-objects-reports-num-objects-to-trace2: pack-objects: write objects packed to trace2	2019-05-09 00:37:23 +09:00
Derrick Stolee	64404a24cf	midx: pass a repository pointer Much of the multi-pack-index code focuses on the multi_pack_index struct, and so we only pass a pointer to the current one. However, we will insert a dependency on the packed_git linked list in a future change, so we will need a repository reference. Inserting these parameters is a significant enough change to split out. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-05-07 13:48:41 +09:00
Jonathan Tan	9ed8790282	pack-objects: write objects packed to trace2 This is useful when investigating performance of pushes, and other times when no progress information is written (because the pack is written to stdout). Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-04-12 15:19:46 +09:00
brian m. carlson	3c7714485d	pack-bitmap: switch hash tables to use struct object_id Instead of storing unsigned char pointers in the hash tables, switch to storing instances of struct object_id. Update several internal functions and one external function to take pointers to struct object_id. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-04-01 11:57:37 +09:00
Jeff King	d4316604f8	pack-objects: default to writing bitmap hash-cache Enabling pack.writebitmaphashcache should always be a performance win. It costs only 4 bytes per object on disk, and the timings in `ae4f07fbcc` (pack-bitmap: implement optional name_hash cache, 2013-12-21) show it improving fetch and partial-bitmap clone times by 40-50%. The only reason we didn't enable it by default at the time is that early versions of JGit's bitmap reader complained about the presence of optional header bits it didn't understand. But that was changed in JGit's d2fa3987a (Use bitcheck to check for presence of OPT_FULL option, 2013-10-30), which made it into JGit v3.5.0 in late 2014. So let's turn this option on by default. It's backwards-compatible with all versions of Git, and if you are also using JGit on the same repository, you'd only run into problems using a version that's almost 5 years old. We'll drop the manual setting from all of our test scripts, including perf tests. This isn't strictly necessary, but it has two advantages: 1. If the hash-cache ever stops being enabled by default, our perf regression tests will notice. 2. We can use the modified perf tests to show off the behavior of an otherwise unconfigured repo, as shown below. These are the results of a few of a perf tests against linux.git that showed interesting results. You can see the expected speedup in 5310.4, which was noted in `ae4f07fbcc`. Curiously, 5310.8 did not improve (and actually got slower), despite seeing the opposite in `ae4f07fbcc`. I don't have an explanation for that. The tests from p5311 did not exist back then, but do show improvements (a smaller pack due to better deltas, which we found in less time). Test HEAD^ HEAD ------------------------------------------------------------------------------------- 5310.4: simulated fetch 7.39(22.70+0.25) 5.64(11.43+0.22) -23.7% 5310.8: clone (partial bitmap) 18.45(24.83+1.19) 19.94(28.40+1.36) +8.1% 5311.31: server (128 days) 0.41(1.13+0.05) 0.34(0.72+0.02) -17.1% 5311.32: size (128 days) 7.4M 7.0M -4.8% 5311.33: client (128 days) 1.33(1.49+0.06) 1.29(1.37+0.12) -3.0% Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-03-18 14:11:15 +09:00
Derrick Stolee	ae417807b0	trace2:data: pack-objects: add trace2 regions When studying the performance of 'git push' we would like to know how much time is spent at various parts of the command. One area that could cause performance trouble is 'git pack-objects'. Add trace2 regions around the three main actions taken in this command: 1. Enumerate objects. 2. Prepare pack. 3. Write pack-file. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-02-22 15:28:21 -08:00
Junio C Hamano	3a14fdec88	Merge branch 'sl/const' Code cleanup. * sl/const: various: tighten constness of some local variables	2019-02-06 22:05:28 -08:00
Junio C Hamano	cba595ab1a	Merge branch 'jk/loose-object-cache-oid' Code clean-up. * jk/loose-object-cache-oid: prefer "hash mismatch" to "sha1 mismatch" sha1-file: avoid "sha1 file" for generic use in messages sha1-file: prefer "loose object file" to "sha1 file" in messages sha1-file: drop has_sha1_file() convert has_sha1_file() callers to has_object_file() sha1-file: convert pass-through functions to object_id sha1-file: modernize loose header/stream functions sha1-file: modernize loose object file functions http: use struct object_id instead of bare sha1 update comment references to sha1_object_info() sha1-file: fix outdated sha1 comment references	2019-02-06 22:05:27 -08:00
Junio C Hamano	5fda343321	Merge branch 'ds/push-sparse-tree-walk' "git pack-objects" learned another algorithm to compute the set of objects to send, that trades the resulting packfile off to save traversal cost to favor small pushes. * ds/push-sparse-tree-walk: pack-objects: create GIT_TEST_PACK_SPARSE pack-objects: create pack.useSparse setting revision: implement sparse algorithm list-objects: consume sparse tree walk revision: add mark_tree_uninteresting_sparse	2019-02-06 22:05:25 -08:00
Junio C Hamano	7589e63648	Merge branch 'nd/the-index-final' The assumption to work on the single "in-core index" instance has been reduced from the library-ish part of the codebase. * nd/the-index-final: cache.h: flip NO_THE_INDEX_COMPATIBILITY_MACROS switch read-cache.c: remove the_* from index_has_changes() merge-recursive.c: remove implicit dependency on the_repository merge-recursive.c: remove implicit dependency on the_index sha1-name.c: remove implicit dependency on the_index read-cache.c: replace update_index_if_able with repo_& read-cache.c: kill read_index() checkout: avoid the_index when possible repository.c: replace hold_locked_index() with repo_hold_locked_index() notes-utils.c: remove the_repository references grep: use grep_opt->repo instead of explict repo argument	2019-02-06 22:05:23 -08:00
Junio C Hamano	d243a323a5	Merge branch 'ph/pack-objects-mutex-fix' "git pack-objects" incorrectly used uninitialized mutex, which has been corrected. * ph/pack-objects-mutex-fix: pack-objects: merge read_lock and lock in packing_data struct pack-objects: move read mutex to packing_data struct	2019-02-05 14:26:16 -08:00
Shahzad Lone	33de80b1d5	various: tighten constness of some local variables Signed-off-by: Shahzad Lone <shahzadlone@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-02-04 09:57:10 -08:00
Junio C Hamano	371820d5f1	Merge branch 'bc/tree-walk-oid' The code to walk tree objects has been taught that we may be working with object names that are not computed with SHA-1. * bc/tree-walk-oid: cache: make oidcpy always copy GIT_MAX_RAWSZ bytes tree-walk: store object_id in a separate member match-trees: use hashcpy to splice trees match-trees: compute buffer offset correctly when splicing tree-walk: copy object ID before use	2019-01-29 12:47:56 -08:00
Patrick Hogg	edb673cf10	pack-objects: merge read_lock and lock in packing_data struct Rename the packing_data lock to obd_lock and upgrade it to a recursive mutex to make it suitable for current read_lock usages. Additionally remove the superfluous #ifndef NO_PTHREADS guard around mutex initialization in prepare_packing_data as the mutex functions themselves are already protected. Signed-off-by: Patrick Hogg <phogg@novamoon.net> Helped-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-01-28 11:22:12 -08:00
Patrick Hogg	459307b139	pack-objects: move read mutex to packing_data struct `ac77d0c37` ("pack-objects: shrink size field in struct object_entry", 2018-04-14) added an extra usage of read_lock/read_unlock in the newly introduced oe_get_size_slow for thread safety in parallel calls to try_delta(). Unfortunately oe_get_size_slow is also used in serial code, some of which is called before the first invocation of ll_find_deltas. As such the read mutex is not guaranteed to be initialized. Resolve this by moving the read mutex to packing_data and initializing it in prepare_packing_data which is initialized in cmd_pack_objects. Signed-off-by: Patrick Hogg <phogg@novamoon.net> Reviewed-by: Duy Nguyen <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-01-28 11:22:06 -08:00
Nguyễn Thái Ngọc Duy	f8adbec9fe	cache.h: flip NO_THE_INDEX_COMPATIBILITY_MACROS switch By default, index compat macros are off from now on, because they could hide the_index dependency. Only those in builtin can use it. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-01-24 11:55:06 -08:00

1 2 3 4 5 ...

471 Commits