git-commit-vandalism

Author	SHA1	Message	Date
Han Xin	a1bf5ca29f	unpack-objects: low memory footprint for get_data() in dry_run mode As the name implies, "get_data(size)" will allocate and return a given amount of memory. Allocating memory for a large blob object may cause the system to run out of memory. Before preparing to replace calling of "get_data()" to unpack large blob objects in later commits, refactor "get_data()" to reduce memory footprint for dry_run mode. Because in dry_run mode, "get_data()" is only used to check the integrity of data, and the returned buffer is not used at all, we can allocate a smaller buffer and use it as zstream output. Make the function return NULL in the dry-run mode, as no callers use the returned buffer. The "find [...]objects/?? -type f | wc -l" test idiom being used here is adapted from the same "find" use added to another test in `d9545c7f46` (fast-import: implement unpack limit, 2016-04-25). Suggested-by: Jiang Xin <zhiyou.jx@alibaba-inc.com> Signed-off-by: Han Xin <chiyutianyi@gmail.com> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-06-13 10:22:35 -07:00
Junio C Hamano	a50036da1a	Merge branch 'tb/cruft-packs' A mechanism to pack unreachable objects into a "cruft pack", instead of ejecting them into loose form to be reclaimed later, has been introduced. * tb/cruft-packs: sha1-file.c: don't freshen cruft packs builtin/gc.c: conditionally avoid pruning objects via loose builtin/repack.c: add cruft packs to MIDX during geometric repack builtin/repack.c: use named flags for existing_packs builtin/repack.c: allow configuring cruft pack generation builtin/repack.c: support generating a cruft pack builtin/pack-objects.c: --cruft with expiration reachable: report precise timestamps from objects in cruft packs reachable: add options to add_unseen_recent_objects_to_traversal builtin/pack-objects.c: --cruft without expiration builtin/pack-objects.c: return from create_object_entry() t/helper: add 'pack-mtimes' test-tool pack-mtimes: support writing pack .mtimes files chunk-format.h: extract oid_version() pack-write: pass 'struct packing_data' to 'stage_tmp_packfiles' pack-mtimes: support reading .mtimes files Documentation/technical: add cruft-packs.txt	2022-06-03 14:30:37 -07:00
Junio C Hamano	28db3b7b71	Merge branch 'jx/l10n-workflow-change' A workflow change for translators is being proposed. * jx/l10n-workflow-change: l10n: Document the new l10n workflow Makefile: add "po-init" rule to initialize po/XX.po Makefile: add "po-update" rule to update po/XX.po po/git.pot: don't check in result of "make pot" po/git.pot: this is now a generated file Makefile: remove duplicate and unwanted files in FOUND_SOURCE_FILES i18n CI: stop allowing non-ASCII source messages in po/git.pot Makefile: have "make pot" not "reset --hard" Makefile: generate "po/git.pot" from stable LOCALIZED_C Makefile: sort source files before feeding to xgettext	2022-06-03 14:30:36 -07:00
Junio C Hamano	16a0e92ddc	Merge branch 'tb/geom-repack-with-keep-and-max' Teach "git repack --geometric" to work better with "--keep-pack" and avoid corrupting the repository when packsize limit is used. * tb/geom-repack-with-keep-and-max: builtin/repack.c: ensure that `names` is sorted t7703: demonstrate object corruption with pack.packSizeLimit repack: respect --keep-pack with geometric repack	2022-06-03 14:30:36 -07:00
Junio C Hamano	c276c21da6	Merge branch 'ds/sparse-sparse-checkout' "sparse-checkout" learns to work well with the sparse-index feature. * ds/sparse-sparse-checkout: sparse-checkout: integrate with sparse index p2000: add test for 'git sparse-checkout [add|set]' sparse-index: complete partial expansion sparse-index: partially expand directories sparse-checkout: --no-sparse-index needs a full index cache-tree: implement cache_tree_find_path() sparse-index: introduce partially-sparse indexes sparse-index: create expand_index() t1092: stress test 'git sparse-checkout set' t1092: refactor 'sparse-index contents' test	2022-06-03 14:30:35 -07:00
Junio C Hamano	091680472d	Merge branch 'tb/midx-race-in-pack-objects' The multi-pack-index code did not protect the packfile it is going to depend on from getting removed while in use, which has been corrected. * tb/midx-race-in-pack-objects: builtin/pack-objects.c: ensure pack validity from MIDX bitmap objects builtin/pack-objects.c: ensure included `--stdin-packs` exist builtin/pack-objects.c: avoid redundant NULL check pack-bitmap.c: check preferred pack validity when opening MIDX bitmap	2022-06-03 14:30:35 -07:00
Junio C Hamano	b3b2ddced2	Merge branch 'ds/bundle-uri' Preliminary code refactoring around transport and bundle code. * ds/bundle-uri: bundle.h: make "fd" version of read_bundle_header() public remote: allow relative_url() to return an absolute url remote: move relative_url() http: make http_get_file() external fetch-pack: move --keep=* option filling to a function fetch-pack: add a deref_without_lazy_fetch_extended() dir API: add a generalized path_match_flags() function connect.c: refactor sending of agent & object-format	2022-06-03 14:30:34 -07:00
Junio C Hamano	83937e9592	Merge branch 'ns/batch-fsync' Introduce a filesystem-dependent mechanism to optimize the way the bits for many loose object files are ensured to hit the disk platter. * ns/batch-fsync: core.fsyncmethod: performance tests for batch mode t/perf: add iteration setup mechanism to perf-lib core.fsyncmethod: tests for batch mode test-lib-functions: add parsing helpers for ls-files and ls-tree core.fsync: use batch mode and sync loose objects by default on Windows unpack-objects: use the bulk-checkin infrastructure update-index: use the bulk-checkin infrastructure builtin/add: add ODB transaction around add_files_to_cache cache-tree: use ODB transaction around writing a tree core.fsyncmethod: batched disk flushes for loose-objects bulk-checkin: rebrand plug/unplug APIs as 'odb transactions' bulk-checkin: rename 'state' variable and separate 'plugged' boolean	2022-06-03 14:30:34 -07:00
Junio C Hamano	377d347eb3	Merge branch 'en/sparse-cone-becomes-default' Deprecate non-cone mode of the sparse-checkout feature. * en/sparse-cone-becomes-default: Documentation: some sparsity wording clarifications git-sparse-checkout.txt: mark non-cone mode as deprecated git-sparse-checkout.txt: flesh out pattern set sections a bit git-sparse-checkout.txt: add a new EXAMPLES section git-sparse-checkout.txt: shuffle some sections and mark as internal git-sparse-checkout.txt: update docs for deprecation of 'init' git-sparse-checkout.txt: wording updates for the cone mode default sparse-checkout: make --cone the default tests: stop assuming --no-cone is the default mode for sparse-checkout	2022-06-03 14:30:33 -07:00
Junio C Hamano	1fc1879839	Merge branch 'js/use-builtin-add-i' "git add -i" was rewritten in C some time ago and has been in testing; the reimplementation is now exposed to general public by default. * js/use-builtin-add-i: add -i: default to the built-in implementation t2016: require the PERL prereq only when necessary	2022-05-30 23:24:03 -07:00
Taylor Blau	5b92477f89	builtin/gc.c: conditionally avoid pruning objects via loose Expose the new `git repack --cruft` mode from `git gc` via a new opt-in flag. When invoked like `git gc --cruft`, `git gc` will avoid exploding unreachable objects as loose ones, and instead create a cruft pack and `.mtimes` file. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-05-26 15:48:26 -07:00
Taylor Blau	ddee3703b3	builtin/repack.c: add cruft packs to MIDX during geometric repack When using cruft packs, the following race can occur when a geometric repack that writes a MIDX bitmap takes place afterwards: - First, create an unreachable object and do an all-into-one cruft repack which stores that object in the repository's cruft pack. - Then make that object reachable. - Finally, do a geometric repack and write a MIDX bitmap. Assuming that we are sufficiently unlucky as to select a commit from the MIDX which reaches that object for bitmapping, then the `git multi-pack-index` process will complain that that object is missing. The reason is because we don't include cruft packs in the MIDX when doing a geometric repack. Since the "make that object reachable" doesn't necessarily mean that we'll create a new copy of that object in one of the packs that will get rolled up as part of a geometric repack, it's possible that the MIDX won't see any copies of that now-reachable object. Of course, it's desirable to avoid including cruft packs in the MIDX because it causes the MIDX to store a bunch of objects which are likely to get thrown away. But excluding that pack does open us up to the above race. This patch demonstrates the bug, and resolves it by including cruft packs in the MIDX even when doing a geometric repack. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-05-26 15:48:26 -07:00
Taylor Blau	72263ffc32	builtin/repack.c: use named flags for existing_packs We use the `util` pointer for items in the `existing_packs` string list to indicate which packs are going to be deleted. Since that has so far been the only use of that `util` pointer, we just set it to 0 or 1. But we're going to add an additional state to this field in the next patch, so prepare for that by adding a #define for the first bit so we can more expressively inspect the flags state. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-05-26 15:48:26 -07:00
Taylor Blau	4571324b99	builtin/repack.c: allow configuring cruft pack generation In servers which set the pack.window configuration to a large value, we can wind up spending quite a lot of time finding new bases when breaking delta chains between reachable and unreachable objects while generating a cruft pack. Introduce a handful of `repack.cruft*` configuration variables to control the parameters used by pack-objects when generating a cruft pack. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-05-26 15:48:26 -07:00
Taylor Blau	f9825d1cf7	builtin/repack.c: support generating a cruft pack Expose a way to split the contents of a repository into a main and cruft pack when doing an all-into-one repack with `git repack --cruft -d`, and a complementary configuration variable. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-05-26 15:48:26 -07:00
Taylor Blau	a7d493833f	builtin/pack-objects.c: --cruft with expiration In a previous patch, pack-objects learned how to generate a cruft pack so long as no objects are dropped. This patch teaches pack-objects to handle the case where a non-never `--cruft-expiration` value is passed. This case is slightly more complicated than before, because we want pack-objects to save unreachable objects which would have been pruned when there is another recent (i.e., non-prunable) unreachable object which reaches the other. We'll call these objects "unreachable but reachable-from-recent". Here is how pack-objects handles `--cruft-expiration`: - Instead of adding all objects outside of the kept pack(s) into the packing list, only handle the ones whose mtime is within the grace period. - Construct a reachability traversal whose tips are the unreachable-but-recent objects. - Then, walk along that traversal, stopping if we reach an object in the kept pack. At each step along the traversal, we add the object we are visiting to the packing list. In the majority of these cases, any object we visit in this traversal will already be in our packing list. But we will sometimes encounter reachable-from-recent cruft objects, which we want to retain even if they aged out of the grace period. The most subtle point of this process is that we actually don't need to bother to update the rescued object's mtime. Even though we will write an .mtimes file with a value that is older than the expiration window, it will continue to survive cruft repacks so long as any objects which reach it haven't aged out. That is, a future repack will also exclude that object from the initial packing list, only to discover it later on when doing the reachability traversal. Finally, stopping early once an object is found in a kept pack is safe to do because the kept packs ordinarily represent which packs will survive after repacking. Assuming that it _isn't_ safe to halt a traversal early would mean that there is some ancestor object which is missing, which implies repository corruption (i.e., the complete set of reachable objects isn't present). Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-05-26 15:48:26 -07:00
Taylor Blau	2fb90409b8	reachable: add options to add_unseen_recent_objects_to_traversal This function behaves very similarly to what we will need in pack-objects in order to implement cruft packs with expiration. But it is lacking a couple of things. Namely, it needs: - a mechanism to communicate the timestamps of individual recent objects to some external caller - and, in the case of packed objects, our future caller will also want to know the originating pack, as well as the offset within that pack at which the object can be found - finally, it needs a way to skip over packs which are marked as kept in-core. To address the first two, add a callback interface in this patch which reports the time of each recent object, as well as a (packed_git, off_t) pair for packed objects. Likewise, add a new option to the packed object iterators to skip over packs which are marked as kept in core. This option will become implicitly tested in a future patch. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-05-26 15:48:26 -07:00
Taylor Blau	b757353676	builtin/pack-objects.c: --cruft without expiration Teach `pack-objects` how to generate a cruft pack when no objects are dropped (i.e., `--cruft-expiration=never`). Later patches will teach `pack-objects` how to generate a cruft pack that prunes objects. When generating a cruft pack which does not prune objects, we want to collect all unreachable objects into a single pack (noting and updating their mtimes as we accumulate them). Ordinary use will pass the result of a `git repack -A` as a kept pack, so when this patch says "kept pack", readers should think "reachable objects". Generating a non-expiring cruft packs works as follows: - Callers provide a list of every pack they know about, and indicate which packs are about to be removed. - All packs which are going to be removed (we'll call these the redundant ones) are marked as kept in-core. Any packs the caller did not mention (but are known to the `pack-objects` process) are also marked as kept in-core. Packs not mentioned by the caller are assumed to be unknown to them, i.e., they entered the repository after the caller decided which packs should be kept and which should be discarded. Since we do not want to include objects in these "unknown" packs (because we don't know which of their objects are or aren't reachable), these are also marked as kept in-core. - Then, we enumerate all objects in the repository, and add them to our packing list if they do not appear in an in-core kept pack. This results in a new cruft pack which contains all known objects that aren't included in the kept packs. When the kept pack is the result of `git repack -A`, the resulting pack contains all unreachable objects. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-05-26 15:48:26 -07:00
Taylor Blau	fa23090b0c	builtin/pack-objects.c: return from create_object_entry() A new caller in the next commit will want to immediately modify the object_entry structure created by create_object_entry(). Instead of forcing that caller to wastefully look-up the entry we just created, return it from create_object_entry() instead. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-05-26 15:48:26 -07:00
Taylor Blau	1c573cdd72	pack-write: pass 'struct packing_data' to 'stage_tmp_packfiles' This structure will be used to communicate the per-object mtimes when writing a cruft pack. Here, we need the full packing_data structure because the mtime information is stored in an array there, not on the individual object_entry's themselves (to avoid paying the overhead in structure width for operations which do not generate a cruft pack). We haven't passed this information down before because one of the two callers (in bulk-checkin.c) does not have a packing_data structure at all. In that case (where no cruft pack will be generated), NULL is passed instead. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-05-26 15:48:26 -07:00
Taylor Blau	94cd775a6c	pack-mtimes: support reading .mtimes files To store the individual mtimes of objects in a cruft pack, introduce a new `.mtimes` format that can optionally accompany a single pack in the repository. The format is defined in Documentation/technical/pack-format.txt, and stores a 4-byte network order timestamp for each object in name (index) order. This patch prepares for cruft packs by defining the `.mtimes` format, and introducing a basic API that callers can use to read out individual mtimes. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-05-26 15:48:26 -07:00
Junio C Hamano	2785b71ef9	Merge branch 'ac/remote-v-with-object-list-filters' "git remote -v" now shows the list-objects-filter used during fetching from the remote, if available. * ac/remote-v-with-object-list-filters: builtin/remote.c: teach `-v` to list filters for promisor remotes	2022-05-26 14:51:32 -07:00
Junio C Hamano	f49c478f62	Merge branch 'tk/simple-autosetupmerge' "git -c branch.autosetupmerge=simple branch $A $B" will set the $B as $A's upstream only when $A and $B share the same name, and "git -c push.default=simple" on branch $A would push to update the branch $A at the remote $B came from. Also more places use the sole remote, if exists, before defaulting to 'origin'. * tk/simple-autosetupmerge: push: new config option "push.autoSetupRemote" supports "simple" push push: default to single remote even when not named origin branch: new autosetupmerge option 'simple' for matching branches	2022-05-26 14:51:30 -07:00
Ævar Arnfjörð Bjarmason	6dd9a91c32	i18n CI: stop allowing non-ASCII source messages in po/git.pot In the preceding commit we moved away from using xgettext(1) to both generate the po/git.pot, and to merge the incrementally generated po/git.pot+ file as we sourced translations from C, shell and Perl. Doing it this way, which dates back to my initial implementation[1][2][3] was conflating two things: With xgettext(1) the --from-code both controls what encoding is specified in the po/git.pot's header, and what encoding we allow in source messages. We don't ever want to allow non-ASCII in source messages, and doing so has hid e.g. a buggy message introduced in `a6226fd772` (submodule--helper: convert the bulk of cmd_add() to C, 2021-08-10) from us, we'd warn about it before, but only when running "make pot", but the operation would still succeed. Now we'll error out on it when running "make pot". Since the preceding Makefile changes made this easy: let's add a "make check-pot" target with the same prerequisites as the "po/git.pot" target, but without changing the file "po/git.pot". Running it as part of the "static-analysis" CI target will ensure that we catch any such issues in the future. E.g.: $ make check-pot XGETTEXT .build/pot/po/builtin/submodule--helper.c.po xgettext: Non-ASCII string at builtin/submodule--helper.c:3381. Please specify the source encoding through --from-code. make: *** [.build/pot/po/builtin/submodule--helper.c.po] Error 1 1. `cd5513a716` (i18n: Makefile: "pot" target to extract messages marked for translation, 2011-02-22) 2. `adc3b2b276` (Makefile: add xgettext target for *.sh files, 2011-05-14) 3. `5e9637c629` (i18n: add infrastructure for translating Git with gettext, 2011-11-18) Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Jiang Xin <zhiyou.jx@alibaba-inc.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-05-26 10:30:28 -07:00
Junio C Hamano	3846c2a1ed	Merge branch 'tb/receive-pack-code-cleanup' Code clean-up. * tb/receive-pack-code-cleanup: builtin/receive-pack.c: remove redundant 'if'	2022-05-25 16:42:49 -07:00
Junio C Hamano	fa61b7703e	Merge branch 'jc/avoid-redundant-submodule-fetch' "git fetch --recurse-submodules" from multiple remotes (either from a remote group, or "--all") used to make one extra "git fetch" in the submodules, which has been corrected. * jc/avoid-redundant-submodule-fetch: fetch: do not run a redundant fetch from submodule	2022-05-25 16:42:49 -07:00
Junio C Hamano	5ed49a75f3	Merge branch 'os/fetch-check-not-current-branch' The way "git fetch" without "--update-head-ok" ensures that HEAD in no worktree points at any ref being updated was too wasteful, which has been optimized a bit. * os/fetch-check-not-current-branch: fetch: limit shared symref check only for local branches	2022-05-25 16:42:48 -07:00
Junio C Hamano	18254f14f2	Merge branch 'jc/show-branch-g-current' The "--current" option of "git show-branch" should have been made incompatible with the "--reflog" mode, but this was not enforced, which has been corrected. * jc/show-branch-g-current: show-branch: -g and --current are incompatible	2022-05-25 16:42:47 -07:00
Taylor Blau	4090511e40	builtin/pack-objects.c: ensure pack validity from MIDX bitmap objects When using a multi-pack bitmap, pack-objects will try to perform its traversal using a call to `traverse_bitmap_commit_list()`, which calls `add_object_entry_from_bitmap()` to add each object it finds to its packing list. This path can cause pack-objects to add objects from packs that don't have open pack_fds on them, by avoiding a call to `is_pack_valid()`. This is because we only call `is_pack_valid()` on the preferred pack (in order to do verbatim reuse via `reuse_partial_packfile_from_bitmap()`) and not others when loading a MIDX bitmap. In this case, `add_object_entry_from_bitmap()` will check whether it wants each object entry by calling `want_object_in_pack()`, which will call `want_found_object` (since its caller already supplied a `found_pack`). In most cases (particularly without `--local`, and when `ignored_packed_keep_on_disk` and `ignored_packed_keep_in_core` are both "0"), we'll take the entry from the pack contained in the MIDX bitmap, all without an open pack_fd. When we then try to use that entry later to assemble the actual pack, we'll be susceptible to any simultaneous writers moving that pack out of the way (e.g., due to a concurrent repack) without having an open file descriptor, causing races that result in errors like: remote: Enumerating objects: 1498802, done. remote: fatal: packfile ./objects/pack/pack-e57d433b5a588daa37fbe946e2b28dfaec03a93e.pack cannot be accessed remote: aborting due to possible repository corruption on the remote side. This race can happen even with multi-pack bitmaps, since we may open a MIDX bitmap that is being rewritten long before its packs are actually unlinked. Work around this by calling `is_pack_valid()` from within `want_found_object()`, matching the behavior in `want_object_in_pack_one()` (which has an analogous call). Most calls to `is_pack_valid()` should be basically no-ops, since only the first call requires us to open a file (subsequent calls realize the file is already open, and return immediately). Importantly, when `want_object_in_pack()` is given a non-NULL `found_pack`, but `want_found_object()` rejects the copy of the object in that pack, we must reset `found_pack` and `found_offset` to NULL and 0, respectively. Failing to do so could lead to other checks in `want_object_in_pack()` (such as `want_object_in_pack_one()`) using the same (invalid) pack as `found_pack`, meaning that we don't call `is_pack_valid()` because `p == *found_pack`. This can lead the caller to believe it can use a copy of an object from an invalid pack. An alternative approach to closing this race would have been to call `is_pack_valid()` on _all_ packs in a multi-pack bitmap on load. This has a couple of problems: - it is unnecessarily expensive in the cases where we don't actually need to open any packs (e.g., in `git rev-list --use-bitmap-index --count`) - more importantly, it means any time we would have hit this race, we'll avoid using bitmaps altogether, leading to significant slowdowns by forcing a full object traversal Co-authored-by: Victoria Dye <vdye@github.com> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-05-24 14:27:20 -07:00
Taylor Blau	5045759de8	builtin/pack-objects.c: ensure included `--stdin-packs` exist A subsequent patch will teach `want_object_in_pack()` to set its `found_pack` and `found_offset` pointers to NULL when the provided pack does not pass the `is_pack_valid()` check. The `--stdin-packs` mode of `pack-objects` is not quite prepared to handle this. To prepare it for this change, do the following two things: - Ensure provided packs pass the `is_pack_valid()` check when collecting the caller-provided packs into the "included" and "excluded" lists. - Gracefully handle any _invalid_ packs being passed to `want_object_in_pack()`. Calling `is_pack_valid()` early on makes it substantially less likely that we will have to deal with a pack going away, since we'll have an open file descriptor on its contents much earlier. But even packs with open descriptors can become invalid in the future if we (a) hit our open descriptor limit, forcing us to close some open packs, and (b) one of those just-closed packs has gone away in the meantime. `add_object_entry_from_pack()` depends on having a non-NULL `*found_pack`, since it passes that pointer to `packed_object_info()`, meaning that we would SEGV if the pointer became NULL (like we propose to do in `want_object_in_pack()` in the following patch). But avoiding calling `packed_object_info()` entirely is OK, too, since its only purpose is to identify which objects in the included packs are commits, so that they can form the tips of the advisory traversal used to discover the object namehashes. Failing to do this means that at worst we will produce lower-quality deltas, but it does not prevent us from generating the pack as long as we can find a copy of each object from the disappearing pack in some other part of the repository. Co-authored-by: Victoria Dye <vdye@github.com> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-05-24 14:27:19 -07:00
Taylor Blau	58a6abb7ba	builtin/pack-objects.c: avoid redundant NULL check Before calling `for_each_object_in_pack()`, the caller `read_packs_list_from_stdin()` loops through each of the `include_packs` and checks that its `->util` pointer (which is used to store the `struct packed_git *` itself) is non-NULL. This check is redundant, because `read_packs_list_from_stdin()` already checks that the included packs are non-NULL earlier on in the same function (and it does not add any new entries in between). Remove this check, since it is not doing anything in the meantime. Co-authored-by: Victoria Dye <vdye@github.com> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-05-24 14:27:19 -07:00
Junio C Hamano	ea78f9ee7a	Merge branch 'ab/commit-plug-leaks' Leakfix in the top-level called-once function. * ab/commit-plug-leaks: commit: fix "author_ident" leak	2022-05-23 14:39:54 -07:00
Derrick Stolee	598b1e7d09	sparse-checkout: integrate with sparse index When modifying the sparse-checkout definition, the sparse-checkout builtin calls update_sparsity() to modify the SKIP_WORKTREE bits of all cache entries in the index. Before, we needed the index to be fully expanded in order to ensure we had the full list of files necessary that match the new patterns. Insert a call to reset_sparse_directories() that expands sparse directories that are within the new pattern list, but only far enough that every necessary file path now exists as a cache entry. The remaining logic within update_sparsity() will modify the SKIP_WORKTREE bits appropriately. This allows us to disable command_requires_full_index within the sparse-checkout builtin. Add tests that demonstrate that we are not expanding to a full index unnecessarily. We can see the improved performance in the p2000 test script: Test HEAD~1 HEAD ------------------------------------------------------------------------ 2000.24: git ... (sparse-v3) 2.14(1.55+0.58) 1.57(1.03+0.53) -26.6% 2000.25: git ... (sparse-v4) 2.20(1.62+0.57) 1.58(0.98+0.59) -28.2% These reductions of 26-28% are small compared to most examples, but the time is dominated by writing a new copy of the base repository to the worktree and then deleting it again. The fact that the previous index expansion was such a large portion of the time is telling how important it is to complete this sparse index integration. Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-05-23 11:08:22 -07:00
Derrick Stolee	2d443389fd	sparse-checkout: --no-sparse-index needs a full index When the --no-sparse-index option is supplied, the sparse-checkout builtin should explicitly ask to expand a sparse index to a full one. This is currently done implicitly due to the command_requires_full_index protection, but that will be removed in an upcoming change. Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-05-23 11:08:21 -07:00
Derrick Stolee	9fadb373dd	sparse-index: introduce partially-sparse indexes A future change will present a temporary, in-memory mode where the index can both contain sparse directory entries but also not be completely collapsed to the smallest possible sparse directories. This will be necessary for modifying the sparse-checkout definition while using a sparse index. For now, convert the single-bit member 'sparse_index' in 'struct index_state' to be an 'enum sparse_index_mode' with three modes: * INDEX_EXPANDED (0): No sparse directories exist. This is always the case for repositories that do not use cone-mode sparse-checkout. * INDEX_COLLAPSED: Sparse directories may exist. Files outside the sparse-checkout cone are reduced to sparse directory entries whenever possible. * INDEX_PARTIALLY_SPARSE: Sparse directories may exist. Some file entries outside the sparse-checkout cone may exist. Running convert_to_sparse() may further reduce those files to sparse directory entries. The main reason to store this extra information is to allow convert_to_sparse() to short-circuit when the index is already in INDEX_EXPANDED mode but to actually do the necessary work when in INDEX_PARTIALLY_SPARSE mode. The INDEX_PARTIALLY_SPARSE mode will be used in an upcoming change. Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-05-23 11:08:21 -07:00
Junio C Hamano	538dc459a0	Merge branch 'ep/maint-equals-null-cocci' Introduce and apply coccinelle rule to discourage an explicit comparison between a pointer and NULL, and apply the clean-up to the maintenance track. * ep/maint-equals-null-cocci: tree-wide: apply equals-null.cocci tree-wide: apply equals-null.cocci contrib/coccinnelle: add equals-null.cocci	2022-05-20 15:26:59 -07:00
Junio C Hamano	acdeb10f91	Merge branch 'ds/sparse-colon-path' "git show :<path>" learned to work better with the sparse-index feature. * ds/sparse-colon-path: rev-parse: integrate with sparse index object-name: diagnose trees in index properly object-name: reject trees found in the index show: integrate with the sparse index t1092: add compatibility tests for 'git show'	2022-05-20 15:26:58 -07:00
Junio C Hamano	5a9253cd45	Merge branch 'vd/sparse-stash' Teach "git stash" to work better with sparse index entries. * vd/sparse-stash: unpack-trees: preserve index sparsity stash: apply stash using 'merge_ort_nonrecursive()' read-cache: set sparsity when index is new sparse-index: expose 'is_sparse_index_allowed()' stash: integrate with sparse index stash: expand sparse-checkout compatibility testing	2022-05-20 15:26:58 -07:00
Junio C Hamano	945b9f2c31	Merge branch 'cd/bisect-messages-from-pre-flight-states' "git bisect" was too silent before it is ready to start computing the actual bisection, which has been corrected. * cd/bisect-messages-from-pre-flight-states: bisect: output bisect setup status in bisect log bisect: output state before we are ready to compute bisection	2022-05-20 15:26:58 -07:00
Junio C Hamano	ed54e1b31a	Merge branch 'gc/pull-recurse-submodules' "git pull" without "--recurse-submodules=<arg>" made submodule.recurse take precedence over fetch.recurseSubmodules by mistake, which has been corrected. * gc/pull-recurse-submodules: pull: do not let submodule.recurse override fetch.recurseSubmodules	2022-05-20 15:26:57 -07:00
Junio C Hamano	87d6bec2c8	Merge branch 'gf/unused-includes' Remove unused includes. * gf/unused-includes: apply.c: remove unnecessary include serve.c: remove unnecessary include	2022-05-20 15:26:53 -07:00
Taylor Blau	66731ff921	builtin/repack.c: ensure that `names` is sorted The previous patch demonstrates a scenario where the list of packs written by `pack-objects` (and stored in the `names` string_list) is out-of-order, and can thus cause us to delete packs we shouldn't. This patch resolves that bug by ensuring that `names` is sorted in all cases, not just when delete_redundant && pack_everything & ALL_INTO_ONE is true. Because we did sort `names` in that case (which, prior to `--geometric` repacks, was the only time we would actually delete packs), this is only a bug for `--geometric` repacks. It would be sufficient to only sort `names` when `delete_redundant` is set to a non-zero value. But sorting a small list of strings is cheap, and it is defensive against future calls to `string_list_has_string()` on this list. Co-discovered-by: Victoria Dye <vdye@github.com> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-05-20 13:54:44 -07:00
Victoria Dye	4b5a808bb9	repack: respect --keep-pack with geometric repack Update 'repack' to ignore packs named on the command line with the '--keep-pack' option. Specifically, modify 'init_pack_geometry()' to treat command line-kept packs the same way it treats packs with an on-disk '.keep' file (that is, skip the pack and do not include it in the 'geometry' structure). Without this handling, a '--keep-pack' pack would be included in the 'geometry' structure. If the pack is before the geometry split line (with at least one other pack and/or loose objects present), 'repack' assumes the pack's contents are "rolled up" into another pack via 'pack-objects'. However, because the internally-invoked 'pack-objects' properly excludes '--keep-pack' objects, any new pack it creates will not contain the kept objects. Finally, 'repack' deletes the '--keep-pack' as "redundant" (since it assumes 'pack-objects' created a new pack with its contents), resulting in possible object loss and repository corruption. Add a test ensuring that '--keep-pack' packs are now appropriately handled. Co-authored-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Victoria Dye <vdye@github.com> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-05-20 12:56:29 -07:00
Taylor Blau	af845a604d	builtin/receive-pack.c: remove redundant 'if' In `c7c4bdeccf` (run-command API: remove "env" member, always use "env_array", 2021-11-25), there was a push to replace cld.env = env->v; with strvec_pushv(&cld.env_array, env->v); The conversion in `c7c4bdeccf` was mostly plug-and-play, with the snag that some instances of strvec_pushv() became guarded with a NULL check to ensure that the second argument was non-NULL. This conversion was slightly over-eager to add a conditional in builtin/receive-pack.c::unpack(), since we know at the point that we add the result of `tmp_objdir_env()` into the child process's environment, that `tmp_objdir` is non-NULL. This follows from the conditional just before our strvec_pushv() call (which returns from the function if `tmp_objdir` was NULL), as well as the call to tmp_objdir_add_as_alternate() just below, which relies on its argument (`tmp_objdir`) being non-NULL. In the meantime, this extra conditional isn't hurting anything. But it is redundant and thus unnecessarily confusing. So let's remove it. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-05-18 13:58:39 -07:00
Junio C Hamano	0353c68818	fetch: do not run a redundant fetch from submodule When `7dce19d3` (fetch/pull: Add the --recurse-submodules option, 2010-11-12) introduced the "--recurse-submodules" option, the approach taken was to perform fetches in submodules only once, after all the main fetching (it may usually be a fetch from a single remote, but it could be fetching from a group of remotes using fetch_multiple()) succeeded. Later we added "--all" to fetch from all defined remotes, which complicated things even more. If your project has a submodule, and you try to run "git fetch --recurse-submodules --all", you'd see a fetch for the top-level, which invokes another fetch for the submodule, followed by another fetch for the same submodule. All but the last fetch for the submodule come from a "git fetch --recurse-submodules" subprocess that is spawned via the fetch_multiple() interface for the remotes, and the last fetch comes from the code at the end. Because recursive fetching from submodules is done in each fetch for the top-level in fetch_multiple(), the last fetch in the submodule is redundant. It only matters when fetch_one() interacts with a single remote at the top-level. While we are at it, there is one optimization that exists in dealing with a group of remotes, but is missing when "--all" is used. In the former, when the group turns out to be a group of one, instead of spawning "git fetch" as a subprocess via the fetch_multiple() interface, we use the normal fetch_one() code path. Do the same when handling "--all", if it turns out that we have only one remote defined. Reviewed-by: Glen Choo <chooglen@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-05-18 09:08:57 -07:00
Derrick Stolee	1d04e719e7	remote: move relative_url() This method was initially written in `63e95beb0` (submodule: port resolve_relative_url from shell to C, 2016-04-15). As we will need similar functionality in the bundle URI feature, extract this to be available in remote.h. The code is almost exactly the same, except for the following trivial differences: * Fix whitespace and wrapping issues with the prototype and argument lists. * Let's call starts_with_dot_{,dot_}slash_native() instead of the functionally identical "starts_with_dot_{,dot_}slash()" wrappers in "builtin/submodule--helper.c". Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-05-16 15:02:10 -07:00
Ævar Arnfjörð Bjarmason	9fd512c8d6	dir API: add a generalized path_match_flags() function Add a path_match_flags() function and have the two sets of starts_with_dot_{,dot_}slash() functions added in `63e95beb08` (submodule: port resolve_relative_url from shell to C, 2016-04-15) and `a2b26ffb1a` (fsck: convert gitmodules url to URL passed to curl, 2020-04-18) be thin wrappers for it. As the latter of those notes, the fsck version was copied from the initial builtin/submodule--helper.c version. Since the code added in `a2b26ffb1a` was really doing the same as win32_is_dir_sep() added in `1cadad6f65` (git clone <url> C:\cygwin\home\USER\repo' is working (again), 2018-12-15) let's move the latter to git-compat-util.h as is_xplatform_dir_sep(). We can then call either it or the platform-specific is_dir_sep() from this new function. Let's likewise change code in various other places that was hardcoding checks for "'/' || '\\'" with the new is_xplatform_dir_sep(). As can be seen in those callers some of them still concern themselves with ':' (Mac OS classic?), but let's leave the question of whether that should be consolidated for some other time. As we expect to make wider use of the "native" case in the future, define and use two starts_with_dot_{,dot_}slash_native() convenience wrappers. This makes the diff in builtin/submodule--helper.c much smaller. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-05-16 15:02:09 -07:00
Orgad Shaneh	f7400da800	fetch: limit shared symref check only for local branches This check was introduced in `8ee5d73137` (Fix fetch/pull when run without --update-head-ok, 2008-10-13) in order to protect against replacing the ref of the active branch by mistake, for example by running git fetch origin master:master. It was later extended in `8bc1f39f41` (fetch: protect branches checked out in all worktrees, 2021-12-01) to scan all worktrees. This operation is very expensive (takes about 30s in my repository) when there are many tags or branches, and it is executed on every fetch, even if no local heads are updated at all. Limit it to protect only refs/heads/* to improve fetch performance. Signed-off-by: Orgad Shaneh <orgads@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-05-16 10:58:01 -07:00
Junio C Hamano	00d8c31105	commit: fix "author_ident" leak Since `4c28e4ada0` (commit: die before asking to edit the log message, 2010-12-20), we have been "leaking" the "author_ident" when prepare_to_commit() fails. Instead of returning from right there, introduce an exit status variable and jump to the clean-up label at the end. Instead of explicitly releasing the resource with strbuf_release(), mark the variable with UNLEAK() at the end, together with two other variables that are already marked as such. If this were in a utility function that is called a number of times, we should explicitly release resources that grow proportionally to the size of the problem being solved, but cmd_commit() is like main() and there is no point in spending extra cycles to release individual pieces of resource at the end, just before process exit will clean everything for us for free anyway. This fixes a leak demonstrated by e.g. "t3505-cherry-pick-empty.sh", but unfortunately we cannot mark it or other affected tests as passing now with "TEST_PASSES_SANITIZE_LEAK=true" as we'll need to fix many other memory leaks before doing so. Incidentally there are two tests that always pass the leak checker with or without this change. Mark them as such. This is based on an earlier patch by Ævar, but takes a different approach that is more maintainable. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-05-12 15:51:32 -07:00
Glen Choo	5819417365	pull: do not let submodule.recurse override fetch.recurseSubmodules Fix a bug in "git pull" where `submodule.recurse` is preferred over `fetch.recurseSubmodules` when performing a fetch (Documentation/config/fetch.txt says that `fetch.recurseSubmodules` should be preferred.). Do this by passing the value of the "--recurse-submodules" CLI option to the underlying fetch, instead of passing a value that combines the CLI option and config variables. In other words, this bug occurred because builtin/pull.c is conflating two similar-sounding, but different concepts: - Whether "git pull" itself should care about submodules e.g. whether it should update the submodule worktrees after performing a merge. - The value of "--recurse-submodules" to pass to the underlying "git fetch". Thus, when `submodule.recurse` is set, the underlying "git fetch" gets invoked with "--recurse-submodules[=value]", overriding the value of `fetch.recurseSubmodules`. An alternative (and more obvious) approach to fix the bug would be to teach "git pull" to understand `fetch.recurseSubmodules`, but the proposed solution works better because: - We don't maintain two identical config-parsing implementations in "git pull" and "git fetch". - It works better with other commands invoked by "git pull" e.g. "git merge" won't accidentally respect `fetch.recurseSubmodules`. Reported-by: Huang Zou <huang.zou@schrodinger.com> Helped-by: Philippe Blain <levraiphilippeblain@gmail.com> Signed-off-by: Glen Choo <chooglen@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-05-11 15:42:30 -07:00

10281 Commits