git-commit-vandalism

Author	SHA1	Message	Date
Junio C Hamano	3a4d71f52f	Merge branch 'jt/fetch-pack-trace2-filter-spec' "git fetch" client logs the partial clone filter used in the trace2 output. * jt/fetch-pack-trace2-filter-spec: fetch-pack: write effective filter to trace2	2022-08-05 15:52:14 -07:00
Junio C Hamano	4e0d160bbc	Merge branch 'rs/mergesort' Make our mergesort implementation type-safe. * rs/mergesort: mergesort: remove llist_mergesort() packfile: use DEFINE_LIST_SORT fetch-pack: use DEFINE_LIST_SORT commit: use DEFINE_LIST_SORT blame: use DEFINE_LIST_SORT test-mergesort: use DEFINE_LIST_SORT test-mergesort: use DEFINE_LIST_SORT_DEBUG mergesort: add macros for typed sort of linked lists mergesort: tighten merge loop mergesort: unify ranks loops	2022-08-03 13:36:09 -07:00
Jonathan Tan	1007557a4e	fetch-pack: write effective filter to trace2 Administrators of a managed Git environment (like the one at $DAYJOB) might want to quantify the performance change of fetches with and without filters from the client's point of view, and also detect if a server does not support it. Therefore, log the filter information being sent to the server whenever a fetch (or clone) occurs. Note that this is not necessarily the same as what's specified on the CLI, because during a fetch, the configured filter is used whenever a filter is not specified on the CLI. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-07-26 23:45:01 -07:00
René Scharfe	6fc9fec07b	fetch-pack: use DEFINE_LIST_SORT Build a static typed ref sorting function using DEFINE_LIST_SORT along with a typed comparison function near its only two callers instead of having an exported version that calls llist_mergesort(). This gets rid of the next pointer accessor functions and their calling overhead at the cost of a slightly increased object text size. Before: __TEXT __DATA __OBJC others dec hex 23231 389 0 113689 137309 2185d fetch-pack.o 29158 80 0 146864 176102 2afe6 remote.o With this patch: __TEXT __DATA __OBJC others dec hex 23591 389 0 117759 141739 229ab fetch-pack.o 29070 80 0 145718 174868 2ab14 remote.o Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-07-17 15:20:39 -07:00
Junio C Hamano	b3b2ddced2	Merge branch 'ds/bundle-uri' Preliminary code refactoring around transport and bundle code. * ds/bundle-uri: bundle.h: make "fd" version of read_bundle_header() public remote: allow relative_url() to return an absolute url remote: move relative_url() http: make http_get_file() external fetch-pack: move --keep=* option filling to a function fetch-pack: add a deref_without_lazy_fetch_extended() dir API: add a generalized path_match_flags() function connect.c: refactor sending of agent & object-format	2022-06-03 14:30:34 -07:00
Junio C Hamano	9cf4e0c8d2	Merge branch 'jt/fetch-peek-optional-section' "git fetch" unnecessarily failed when an unexpected optional section appeared in the output, which has been corrected. * jt/fetch-peek-optional-section: fetch-pack: make unexpected peek result non-fatal	2022-05-25 16:42:48 -07:00
Ævar Arnfjörð Bjarmason	1f6cf4508e	fetch-pack: move --keep=* option filling to a function Move the populating of the --keep=* option argument to "index-pack" to a static function, a subsequent commit will make use of it in another function. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-05-16 15:02:09 -07:00
Ævar Arnfjörð Bjarmason	a6e65fb39c	fetch-pack: add a deref_without_lazy_fetch_extended() Add a version of the deref_without_lazy_fetch function which can be called with custom oi_flags and to grab information about the "object_type". This will be used for the bundle-uri client in a subsequent commit. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-05-16 15:02:09 -07:00
Jonathan Tan	7709acf7be	fetch-pack: make unexpected peek result non-fatal When a Git server responds to a fetch request, it may send optional sections before the packfile section. To handle this, the Git client calls packet_reader_peek() (see process_section_header()) in order to see what's next without consuming the line. However, as implemented, Git errors out whenever what's peeked is not an ordinary line. This is not only unexpected (here, we only need to know whether the upcoming line is the section header we want) but causes errors to include the name of a section header that is irrelevant to the cause of the error. For example, at $DAYJOB, we have seen "fatal: error reading section header 'shallow-info'" error messages when none of the repositories involved are shallow. Therefore, fix this so that the peek returns 1 if the upcoming line is the wanted section header and nothing else. Because of this change, reader->line may now be NULL later in the function, so update the error message printing code accordingly. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-05-16 09:11:12 -07:00
Robert Coup	4dfd0925cb	fetch-pack: add refetch Allow a "refetch" where the contents of the local object store are ignored and a full fetch is performed, not attempting to find or negotiate common commits with the remote. A key use case is to apply a new partial clone blob/tree filter and refetch all the associated matching content, which would otherwise not be transferred when the commit objects are already present locally. Signed-off-by: Robert Coup <robert@coup.net.nz> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-03-28 10:25:52 -07:00
Junio C Hamano	68fd3b35f7	Merge branch 'ps/fetch-optim-with-commit-graph' A couple of optimization to "git fetch". * ps/fetch-optim-with-commit-graph: fetch: skip computing output width when not printing anything fetch-pack: use commit-graph when computing cutoff	2022-02-23 16:58:03 -08:00
Junio C Hamano	c69e455bbc	Merge branch 'bs/forbid-i18n-of-protocol-token-in-fetch-pack' L10n support for a few error messages. * bs/forbid-i18n-of-protocol-token-in-fetch-pack: fetch-pack: parameterize message containing 'ready' keyword	2022-02-23 16:58:03 -08:00
Bagas Sanjaya	3d3c23b3a7	fetch-pack: parameterize message containing 'ready' keyword The protocol keyword 'ready' isn't meant for translation. Pass it as parameter instead of spell it in die() message (and potentially confuse translators). Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-02-11 14:37:09 -08:00
Patrick Steinhardt	6fd1cc8f98	fetch-pack: use commit-graph when computing cutoff During packfile negotiation we iterate over all refs announced by the remote side to check whether their IDs refer to commits already known to us. If a commit is known to us already, then its date is a potential cutoff point for commits we have in common with the remote side. There is potentially a lot of commits announced by the remote depending on how many refs there are in the remote repository, and for every one of them we need to search for it in our object database and, if found, parse the corresponding object to find out whether it is a candidate for the cutoff date. This can be sped up by trying to look up commits via the commit-graph first, which is a lot more efficient. Benchmarks in a repository with about 2,1 million refs and an up-to-date commit-graph show an almost 20% speedup when mirror-fetching: Benchmark 1: git fetch +refs/:refs/ (v2.35.0) Time (mean ± σ): 115.587 s ± 2.009 s [User: 109.874 s, System: 11.305 s] Range (min … max): 113.584 s … 118.820 s 5 runs Benchmark 2: git fetch +refs/:refs/ (HEAD) Time (mean ± σ): 96.859 s ± 0.624 s [User: 91.948 s, System: 10.980 s] Range (min … max): 96.180 s … 97.875 s 5 runs Summary 'git fetch +refs/:refs/ (HEAD)' ran 1.19 ± 0.02 times faster than 'git fetch +refs/:refs/ (v2.35.0)' Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-02-10 09:59:38 -08:00
Jean-Noël Avila	6fa00ee843	i18n: factorize "--foo requires --bar" and the like They are all replaced by "the option '%s' requires '%s'", which is a new string but replaces 17 previous unique strings. Signed-off-by: Jean-Noël Avila <jn.avila@free.fr> Reviewed-by: Johannes Sixt <j6t@kdbg.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-01-05 13:31:00 -08:00
Junio C Hamano	353a27ad95	Merge branch 'jk/fetch-pack-avoid-sigpipe-to-index-pack' "git fetch", when received a bad packfile, can fail with SIGPIPE. This wasn't wrong per-se, but we now detect the situation and fail in a more predictable way. * jk/fetch-pack-avoid-sigpipe-to-index-pack: fetch-pack: ignore SIGPIPE when writing to index-pack	2021-12-10 14:35:12 -08:00
Jeff King	2a4aed42ec	fetch-pack: ignore SIGPIPE when writing to index-pack When fetching, we send the incoming pack to index-pack (or unpack-objects) via the sideband demuxer. If index-pack hits an error (e.g., because an object fails fsck), then it will die immediately. This may cause us to get SIGPIPE on the fetch, as we're still trying to write pack contents from the sideband demuxer (which is typically a thread, and thus takes down the whole fetch process). You can see this in action with: ./t5702-protocol-v2.sh --stress --run=59 which ends with (wrapped for readability): test_must_fail: died by signal 13: git -c protocol.version=2 \ -c transfer.fsckobjects=1 -c fetch.uriprotocols=http,https \ clone http://127.0.0.1:5708/smart/http_parent http_child not ok 59 - packfile-uri with transfer.fsckobjects fails on bad object This is mostly cosmetic. The actual error of interest (in this case, the object that failed the fsck check) comes from index-pack straight to stderr, so the user still sees it. They _might_ even see fetch-pack complaining about index-pack failing, because the main thread is racing with the sideband-demuxer. But they'll definitely see the signal death in the exit code, which is what the test is complaining about. We can make this more predictable by just ignoring SIGPIPE. The sideband demuxer uses write_or_die(), so it will notice and stop (gracefully, because we hook die_routine() to exit just the thread). And during this section we're not writing anywhere else where we'd be concerned about SIGPIPE preventing us from wasting effort writing to nowhere. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-11-19 14:22:16 -08:00
Ivan Frade	88e9b1e3fc	fetch-pack: redact packfile urls in traces In some setups, packfile uris act as bearer token. It is not recommended to expose them plainly in logs, although in special circunstances (e.g. debug) it makes sense to write them. Redact the packfile URL paths by default, unless the GIT_TRACE_REDACT variable is set to false. This mimics the redacting of the Authorization header in HTTP. Signed-off-by: Ivan Frade <ifrade@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-11-11 10:07:43 -08:00
Patrick Steinhardt	62b5a35a33	fetch-pack: optimize loading of refs via commit graph In order to negotiate a packfile, we need to dereference refs to see which commits we have in common with the remote. To do so, we first look up the object's type -- if it's a tag, we peel until we hit a non-tag object. If we hit a commit eventually, then we return that commit. In case the object ID points to a commit directly, we can avoid the initial lookup of the object type by opportunistically looking up the commit via the commit-graph, if available, which gives us a slight speed bump of about 2% in a huge repository with about 2.3M refs: Benchmark #1: HEAD~: git-fetch Time (mean ± σ): 31.634 s ± 0.258 s [User: 28.400 s, System: 5.090 s] Range (min … max): 31.280 s … 31.896 s 5 runs Benchmark #2: HEAD: git-fetch Time (mean ± σ): 31.129 s ± 0.543 s [User: 27.976 s, System: 5.056 s] Range (min … max): 30.172 s … 31.479 s 5 runs Summary 'HEAD: git-fetch' ran 1.02 ± 0.02 times faster than 'HEAD~: git-fetch' In case this fails, we fall back to the old code which peels the objects to a commit. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-09-01 12:43:56 -07:00
Patrick Steinhardt	9fec7b2130	connected: refactor iterator to return next object ID directly The object ID iterator used by the connectivity checks returns the next object ID via an out-parameter and then uses a return code to indicate whether an item was found. This is a bit roundabout: instead of a separate error code, we can just return the next object ID directly and use `NULL` pointers as indicator that the iterator got no items left. Furthermore, this avoids a copy of the object ID. Refactor the iterator and all its implementations to return object IDs directly. This brings a tiny performance improvement when doing a mirror-fetch of a repository with about 2.3M refs: Benchmark #1: 328dc58b49919c43897240f2eabfa30be2ce32a4~: git-fetch Time (mean ± σ): 30.110 s ± 0.148 s [User: 27.161 s, System: 5.075 s] Range (min … max): 29.934 s … 30.406 s 10 runs Benchmark #2: 328dc58b49919c43897240f2eabfa30be2ce32a4: git-fetch Time (mean ± σ): 29.899 s ± 0.109 s [User: 26.916 s, System: 5.104 s] Range (min … max): 29.696 s … 29.996 s 10 runs Summary '328dc58b49919c43897240f2eabfa30be2ce32a4: git-fetch' ran 1.01 ± 0.01 times faster than '328dc58b49919c43897240f2eabfa30be2ce32a4~: git-fetch' While this 1% speedup could be labelled as statistically insignificant, the speedup is consistent on my machine. Furthermore, this is an end to end test, so it is expected that the improvement in the connectivity check itself is more significant. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-09-01 12:43:56 -07:00
Junio C Hamano	1b2be06e04	Merge branch 'ps/fetch-pack-load-refs-optim' Loading of ref tips to prepare for common ancestry negotiation in "git fetch-pack" has been optimized by taking advantage of the commit graph when available. * ps/fetch-pack-load-refs-optim: fetch-pack: speed up loading of refs via commit graph	2021-08-24 15:32:41 -07:00
Patrick Steinhardt	3e5e6c6e94	fetch-pack: speed up loading of refs via commit graph When doing reference negotiation, git-fetch-pack(1) is loading all refs from disk in order to determine which commits it has in common with the remote repository. This can be quite expensive in repositories with many references though: in a real-world repository with around 2.2 million refs, fetching a single commit by its ID takes around 44 seconds. Dominating the loading time is decompression and parsing of the objects which are referenced by commits. Given the fact that we only care about commits (or tags which can be peeled to one) in this context, there is thus an easy performance win by switching the parsing logic to make use of the commit graph in case we have one available. Like this, we avoid hitting the object database to parse these commits but instead only load them from the commit-graph. This results in a significant performance boost when executing git-fetch in said repository with 2.2 million refs: Benchmark #1: HEAD~: git fetch $remote $commit Time (mean ± σ): 44.168 s ± 0.341 s [User: 42.985 s, System: 1.106 s] Range (min … max): 43.565 s … 44.577 s 10 runs Benchmark #2: HEAD: git fetch $remote $commit Time (mean ± σ): 19.498 s ± 0.724 s [User: 18.751 s, System: 0.690 s] Range (min … max): 18.629 s … 20.454 s 10 runs Summary 'HEAD: git fetch $remote $commit' ran 2.27 ± 0.09 times faster than 'HEAD~: git fetch $remote $commit' Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-08-04 10:45:32 -07:00
Jeff King	ae1a7eefff	fetch-pack: signal v2 server that we are done making requests When fetching with the v0 protocol over ssh (or a local upload-pack with pipes), the server closes the connection as soon as it is finished sending the pack. So even though the client may still be operating on the data via index-pack (e.g., resolving deltas, checking connectivity, etc), the server has released all resources. With the v2 protocol, however, the server considers the ssh session only as a transport, with individual requests coming over it. After sending the pack, it goes back to its main loop, waiting for another request to come from the client. As a result, the ssh session hangs around until the client process ends, which may be much later (because resolving deltas, etc, may consume a lot of CPU). This is bad for two reasons: - it's consuming resources on the server to leave open a connection that won't see any more use - if something bad happens to the ssh connection in the meantime (say, it gets killed by the network because it's idle, as happened in a real-world report), then ssh will exit non-zero, and we'll propagate the error up the stack. The server is correct here not to hang up after serving the pack. The v2 protocol's design is meant to allow multiple requests like this, and hanging up would be the wrong thing for a hypothetical client which was planning to make more requests (though in practice, the git.git client never would, and I doubt any other implementations would either). The right thing is instead for the client to signal to the server that it's not interested in making more requests. We can do that by closing the pipe descriptor we use to write to ssh. This will propagate to the server upload-pack as an EOF when it tries to read the next request (and then it will close its half, and the whole connection will go away). It's important to do this "half duplex" shutdown, because we have to do it _before_ we actually receive the pack. This is an artifact of the way fetch-pack and index-pack (or unpack-objects) interact. We hand the connection off to index-pack (really, a sideband demuxer which feeds it), and then wait until it returns. And it doesn't do that until it has resolved all of the deltas in the pack, even though it was done reading from the server long before. So just closing the connection fully after index-pack returns would be too late; we'd have held it open much longer than was necessary. And teaching index-pack to close the connection is awkward. It's not even seeing the whole conversation (the sideband demuxer is, but it doesn't actually know what's in the packets, or when the end comes). Note that this close() is happening deep within the transport code. It's possible that a caller would want to perform other operations over the same ssh transport after receiving the pack. But as of the current code, none of the callers do, and there haven't been discussions of any plans to change this. If we need to support that later, we can probably do so by passing down a flag for "you're the last request on the transport; it's OK to close" instead of the code just assuming that's true. The description above all discusses v2 ssh, so it's worth thinking about how this interacts with other protocols: - in v0 protocols, we could do the same half-duplex shutdown (it just goes into the v0 do_fetch_pack() instead). This does work, but since it doesn't have the same persistence problem in the first place, there's little reason to change it at this point. - local fetches against git-upload-pack on the same machine will behave the same as ssh (they are talking over two pipes, and see EOF on their input pipe) - fetches against git-daemon will run this same code, and close one of the descriptors. In practice, this won't do anything, since there our two descriptors are dups of each other, and not part of a half-duplex pair. The right thing would probably be to call shutdown(SHUT_WR) on it. I didn't bother with that here. It doesn't face the same error-code problem (since it's just a TCP connection), so it's really only an optimization problem. And git:// is not that widely used these days, and has less impact on server resources than an ssh termination. - v2 http doesn't suffer from this problem in the first place, as our pipes terminate at a local git-remote-https, which is passing data along as individual requests via curl. Probably curl is keeping the TCP/TLS connection open for more requests, and we might be able to tell it manually "hey, we are done making requests now". But I think that's much less important. It again doesn't suffer from the error-code problem, and HTTP keepalive is pretty well understood (importantly, the timeouts can be set low, because clients like curl know how to reconnect for subsequent requests if necessary). So it's probably not worth figuring out how to tell curl that we're done (though if we do, this patch is probably the first step anyway; fetch-pack closes the pipe back to remote-https, which would be the signal that it should tell curl we're done). The code is pretty straightforward. We close the pipe at the right moment, and set it to -1 to mark it as invalid. I modified the later cleanup code to avoid calling close(-1). That's not strictly necessary, since close(-1) is a noop, but hopefully makes things a bit more obvious to a reader. I suspect that trying to call more transport functions after the close() (e.g., calling transport_fetch_refs() again) would fail, as it's not smart enough to realize we need to re-open the ssh connection. But that's already true when v0 is in use. And no current callers want to do that (and again, the solution is probably a flag in the transport code to keep things open, which can be added later). There's no test here, as the situation it covers is inherently racy (the question is when upload-pack exits, compared to when index-pack finishes resolving deltas and exits). The rather gross shell snippet below does recreate the problematic situation; when run on a sufficiently-large repository (git.git works fine), it kills an "idle" upload-pack while the client is resolving deltas, leading to a failed clone. ( git clone --no-local --progress . foo.git 2>&1 echo >&2 "clone exit code=$?" ) \| tr '\r' '\n' \| while read line do case "$done,$line" in ,Resolving*) echo "hit resolving deltas; killing upload-pack" killall -9 git-upload-pack done=t ;; esac done Reported-by: Greg Pflaum <greg.pflaum@pnp-hcl.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-05-20 07:38:40 +09:00
Jonathan Tan	9c1e657a8f	fetch: teach independent negotiation (no packfile) Currently, the packfile negotiation step within a Git fetch cannot be done independent of sending the packfile, even though there is at least one application wherein this is useful. Therefore, make it possible for this negotiation step to be done independently. A subsequent commit will use this for one such application - push negotiation. This feature is for protocol v2 only. (An implementation for protocol v0 would require a separate implementation in the fetch, transport, and transport helper code.) In the protocol, the main hindrance towards independent negotiation is that the server can unilaterally decide to send the packfile. This is solved by a "wait-for-done" argument: the server will then wait for the client to say "done". In practice, the client will never say it; instead it will cease requests once it is satisfied. In the client, the main change lies in the transport and transport helper code. fetch_refs_via_pack() performs everything needed - protocol version and capability checks, and the negotiation itself. There are 2 code paths that do not go through fetch_refs_via_pack() that needed to be individually excluded: the bundle transport (excluded through requiring smart_options, which the bundle transport doesn't support) and transport helpers that do not support takeover. If or when we support independent negotiation for protocol v0, we will need to modify these 2 code paths to support it. But for now, report failure if independent negotiation is requested in these cases. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-05-05 10:41:29 +09:00
Jonathan Tan	6871d0cec6	fetch-pack: refactor command and capability write A subsequent commit will need this functionality independent of the rest of send_fetch_request(), so put this into its own function. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-04-08 21:50:22 -07:00
Jonathan Tan	57c3451b2e	fetch-pack: refactor add_haves() A subsequent commit will need part, but not all, of the functionality in add_haves(), so move some of its functionality to its sole caller send_fetch_request(). Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-04-08 21:50:21 -07:00
Jonathan Tan	8102570374	fetch-pack: refactor process_acks() A subsequent commit will need part, but not all, of the functionality in process_acks(), so move some of its functionality to its sole caller do_fetch_pack_v2(). As a side effect, the resulting code is also shorter. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-04-08 21:50:21 -07:00
Junio C Hamano	6db01a7308	Merge branch 'jt/fetch-pack-request-fix' into jt/push-negotiation * jt/fetch-pack-request-fix: fetch-pack: buffer object-format with other args	2021-04-08 21:50:10 -07:00
Jonathan Tan	81ed96a9b2	fetch-pack: buffer object-format with other args In send_fetch_request(), "object-format" is written directly to the file descriptor, as opposed to the other arguments, which are buffered. Buffer "object-format" as well. "object-format" must be buffered; in particular, it must appear after "command=fetch" in the request. This divergence was introduced in `4b831208bb` ("fetch-pack: parse and advertise the object-format capability", 2020-05-27), perhaps as an oversight (the surrounding code at the point of this commit has already been using a request buffer.) Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-04-08 21:49:47 -07:00
Junio C Hamano	22eee7f455	Merge branch 'll/clone-reject-shallow' "git clone --reject-shallow" option fails the clone as soon as we notice that we are cloning from a shallow repository. * ll/clone-reject-shallow: builtin/clone.c: add --reject-shallow option	2021-04-08 13:23:25 -07:00
Junio C Hamano	5644419d04	Merge branch 'ab/fsck-api-cleanup' Fsck API clean-up. * ab/fsck-api-cleanup: fetch-pack: use new fsck API to printing dangling submodules fetch-pack: use file-scope static struct for fsck_options fetch-pack: don't needlessly copy fsck_options fsck.c: move gitmodules_{found,done} into fsck_options fsck.c: add an fsck_set_msg_type() API that takes enums fsck.c: pass along the fsck_msg_id in the fsck_error callback fsck.[ch]: move FOREACH_FSCK_MSG_ID & fsck_msg_id from .c to .h fsck.c: give "FOREACH_MSG_ID" a more specific name fsck.c: undefine temporary STR macro after use fsck.c: call parse_msg_type() early in fsck_set_msg_type() fsck.h: re-order and re-assign "enum fsck_msg_type" fsck.h: move FSCK_{FATAL,INFO,ERROR,WARN,IGNORE} into an enum fsck.c: refactor fsck_msg_type() to limit scope of "int msg_type" fsck.c: rename remaining fsck_msg_id "id" to "msg_id" fsck.c: remove (mostly) redundant append_msg_id() function fsck.c: rename variables in fsck_set_msg_type() for less confusion fsck.h: use "enum object_type" instead of "int" fsck.h: use designed initializers for FSCK_OPTIONS_{DEFAULT,STRICT} fsck.c: refactor and rename common config callback	2021-04-07 16:54:09 -07:00
Li Linchao	4fe788b1b0	builtin/clone.c: add --reject-shallow option In some scenarios, users may want more history than the repository offered for cloning, which happens to be a shallow repository, can give them. But because users don't know it is a shallow repository until they download it to local, we may want to refuse to clone this kind of repository, without creating any unnecessary files. The '--depth=x' option cannot be used as a solution; the source may be deep enough to give us 'x' commits when cloned, but the user may later need to deepen the history to arbitrary depth. Teach '--reject-shallow' option to "git clone" to abort as soon as we find out that we are cloning from a shallow repository. Signed-off-by: Li Linchao <lilinchao@oschina.cn> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-04-01 12:58:58 -07:00
Ævar Arnfjörð Bjarmason	3745e2693d	fetch-pack: use new fsck API to printing dangling submodules Refactor the check added in `5476e1efde` (fetch-pack: print and use dangling .gitmodules, 2021-02-22) to make use of us now passing the "msg_id" to the user defined "error_func". We can now compare against the FSCK_MSG_GITMODULES_MISSING instead of parsing the generated message. Let's also replace register_found_gitmodules() with directly manipulating the "gitmodules_found" member. A recent commit moved it into "fsck_options" so we could do this here. I'm sticking this callback in fsck.c. Perhaps in the future we'd like to accumulate such callbacks into another file (maybe fsck-cb.c, similar to parse-options-cb.c?), but while we've got just the one let's just put it into fsck.c. A better alternative in this case would be some library some more obvious library shared by fetch-pack.c ad builtin/index-pack.c, but there isn't such a thing. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-03-28 19:03:10 -07:00
Ævar Arnfjörð Bjarmason	c96e184cae	fetch-pack: use file-scope static struct for fsck_options Change code added in `5476e1efde` (fetch-pack: print and use dangling .gitmodules, 2021-02-22) so that we use a file-scoped "static struct fsck_options" instead of defining one in the "fsck_gitmodules_oids()" function. We use this pattern in all of builtin/{fsck,index-pack,mktag,unpack-objects}.c. It's odd to see fetch-pack be the odd one out. One might think that we're using other fsck_options structs in fetch-pack, or doing on fsck twice there, but we're not. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-03-28 19:03:10 -07:00
Ævar Arnfjörð Bjarmason	c15087d17b	fsck.c: move gitmodules_{found,done} into fsck_options Move the gitmodules_{found,done} static variables added in `159e7b080b` (fsck: detect gitmodules files, 2018-05-02) into the fsck_options struct. It makes sense to keep all the context in the same place. This requires changing the recently added register_found_gitmodules() function added in `5476e1efde` (fetch-pack: print and use dangling .gitmodules, 2021-02-22) to take fsck_options. That function will be removed in a subsequent commit, but as it'll require the new gitmodules_found attribute of "fsck_options" we need this intermediate step first. An earlier version of this patch removed the small amount of duplication we now have between FSCK_OPTIONS_{DEFAULT,STRICT} with a FSCK_OPTIONS_COMMON macro. I don't think such de-duplication is worth it for this amount of copy/pasting. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-03-28 19:03:10 -07:00
René Scharfe	ca56dadb4b	use CALLOC_ARRAY Add and apply a semantic patch for converting code that open-codes CALLOC_ARRAY to use it instead. It shortens the code and infers the element size automatically. Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-03-13 16:00:09 -08:00
Junio C Hamano	6c46f864e5	Merge branch 'jt/transfer-fsck-across-packs-fix' The code to fsck objects received across multiple packs during a single git fetch session has been broken when the packfile URI feature was in use. A workaround has been added by disabling the codepath to avoid keeping a packfile that is too small. * jt/transfer-fsck-across-packs-fix: fetch-pack: do not mix --pack_header and packfile uri	2021-03-08 16:04:47 -08:00
Jonathan Tan	2aec3bc4b6	fetch-pack: do not mix --pack_header and packfile uri When fetching (as opposed to cloning) from a repository with packfile URIs enabled, an error like this may occur: fatal: pack has bad object at offset 12: unknown object type 5 fatal: finish_http_pack_request gave result -1 fatal: fetch-pack: expected keep then TAB at start of http-fetch output This bug was introduced in `b664e9ffa1` ("fetch-pack: with packfile URIs, use index-pack arg", 2021-02-22), when the index-pack args used when processing the inline packfile of a fetch response and when processing packfile URIs were unified. This bug happens because fetch, by default, partially reads (and consumes) the header of the inline packfile to determine if it should store the downloaded objects as a packfile or loose objects, and thus passes --pack_header=<...> to index-pack to inform it that some bytes are missing. However, when it subsequently fetches the additional packfiles linked by URIs, it reuses the same index-pack arguments, thus wrongly passing --index-pack-arg=--pack_header=<...> when no bytes are missing. This does not happen when cloning because "git clone" always passes do_keep, which instructs the fetch mechanism to always retain the packfile, eliminating the need to read the header. There are a few ways to fix this, including filtering out pack_header arguments when downloading the additional packfiles, but I decided to stick to always using index-pack throughout when packfile URIs are present - thus, Git no longer needs to read the bytes, and no longer needs --pack_header here. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-03-05 15:04:09 -08:00
Junio C Hamano	6ee353d42f	Merge branch 'jt/transfer-fsck-across-packs' The approach to "fsck" the incoming objects in "index-pack" is attractive for performance reasons (we have them already in core, inflated and ready to be inspected), but fundamentally cannot be applied fully when we receive more than one pack stream, as a tree object in one pack may refer to a blob object in another pack as ".gitmodules", when we want to inspect blobs that are used as ".gitmodules" file, for example. Teach "index-pack" to emit objects that must be inspected later and check them in the calling "fetch-pack" process. * jt/transfer-fsck-across-packs: fetch-pack: print and use dangling .gitmodules fetch-pack: with packfile URIs, use index-pack arg http-fetch: allow custom index-pack args http: allow custom index-pack args	2021-03-01 14:02:57 -08:00
Jonathan Tan	5476e1efde	fetch-pack: print and use dangling .gitmodules Teach index-pack to print dangling .gitmodules links after its "keep" or "pack" line instead of declaring an error, and teach fetch-pack to check such lines printed. This allows the tree side of the .gitmodules link to be in one packfile and the blob side to be in another without failing the fsck check, because it is now fetch-pack which checks such objects after all packfiles have been downloaded and indexed (and not index-pack on an individual packfile, as it is before this commit). Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-22 12:07:40 -08:00
Jonathan Tan	b664e9ffa1	fetch-pack: with packfile URIs, use index-pack arg Unify the index-pack arguments used when processing the inline pack and when downloading packfiles referenced by URIs. This is done by teaching get_pack() to also store the index-pack arguments whenever at least one packfile URI is given, and then when processing the packfile URI(s), using the stored arguments. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-22 12:07:40 -08:00
Jonathan Tan	27e35ba6c6	http-fetch: allow custom index-pack args This is the next step in teaching fetch-pack to pass its index-pack arguments when processing packfiles referenced by URIs. The "--keep" in fetch-pack.c will be replaced with a full message in a subsequent commit. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-22 12:07:40 -08:00
Christian Couder	33add2ad7d	fetch-pack: refactor writing promisor file Let's replace the 2 different pieces of code that write a promisor file in 'builtin/repack.c' and 'fetch-pack.c' with a new function called 'write_promisor_file()' in 'pack-write.c' and 'pack.h'. This might also help us in the future, if we want to put back the ref names and associated hashes that were in the promisor files we are repacking in 'builtin/repack.c' as suggested by a NEEDSWORK comment just above the code we are refactoring. Signed-off-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-12 16:01:07 -08:00
Christian Couder	9d7fa3be31	fetch-pack: rename helper to create_promisor_file() As we are going to refactor the code that actually writes the promisor file into a separate function in a following commit, let's rename the current write_promisor_file() function to create_promisor_file(). Signed-off-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-12 16:01:07 -08:00
Junio C Hamano	eae47db865	Merge branch 'rs/fetch-pack-invalid-lockfile' "fetch-pack" could pass NULL pointer to unlink(2) when it sees an invalid filename; the error checking has been tightened to make this impossible. * rs/fetch-pack-invalid-lockfile: fetch-pack: disregard invalid pack lockfiles	2020-12-08 15:11:20 -08:00
René Scharfe	6031af387e	fetch-pack: disregard invalid pack lockfiles `9da69a6539` (fetch-pack: support more than one pack lockfile, 2020-06-10) started to use a string_list for pack lockfile names instead of a single string pointer. It removed a NULL check from transport_unlock_pack() as well, which is the function that eventually deletes these lockfiles and releases their name strings. index_pack_lockfile() can return NULL if it doesn't like the contents it reads from the file descriptor passed to it. unlink(2) is declared to not accept NULL pointers (at least with glibc). Undefined Behavior Sanitizer together with Address Sanitizer detects a case where a NULL lockfile name is passed to unlink(2) by transport_unlock_pack() in t1060 (make SANITIZE=address,undefined; cd t; ./t1060-object-corruption.sh). Reinstate the NULL check to avoid undefined behavior, but put it right at the source, so that the number of items in the string_list reflects the number of valid lockfiles. Signed-off-by: René Scharfe <l.s.r@web.de> Reviewed-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-11-30 14:35:00 -08:00
Josh Steadmon	1e905bbc00	fetch-pack: advertise session ID in capabilities When the server sent a session-id capability and transfer.advertiseSID is true, advertise fetch-pack's own session ID back to the server. Signed-off-by: Josh Steadmon <steadmon@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-11-11 18:26:53 -08:00
Junio C Hamano	b4100f366c	Merge branch 'jt/lazy-fetch' Updates to on-demand fetching code in lazily cloned repositories. * jt/lazy-fetch: fetch: no FETCH_HEAD display if --no-write-fetch-head fetch-pack: remove no_dependents code promisor-remote: lazy-fetch objects in subprocess fetch-pack: do not lazy-fetch during ref iteration fetch: only populate existing_refs if needed fetch: avoid reading submodule config until needed fetch: allow refspecs specified through stdin negotiator/noop: add noop fetch negotiator	2020-09-03 12:37:04 -07:00
Junio C Hamano	bdccf5e086	Merge branch 'jt/fetch-pack-loosen-validation-with-packfile-uri' Bugfix for "git fetch" when the packfile URI capability is in use. * jt/fetch-pack-loosen-validation-with-packfile-uri: fetch-pack: make packfile URIs work with transfer.fsckobjects fetch-pack: document only_packfile in get_pack() (various): document from_promisor parameter	2020-09-03 12:37:01 -07:00
Jonathan Tan	0bd96bea2f	fetch-pack: make packfile URIs work with transfer.fsckobjects When fetching with packfile URIs and transfer.fsckobjects=1, use the --fsck-objects instead of the --strict flag when invoking index-pack so that links are not checked, only objects. This is because incomplete links are expected. (A subsequent connectivity check will be done when all the packs have been downloaded regardless of whether transfer.fsckobjects is set.) This is similar to `98a2ea46c2` ("fetch-pack: do not check links for partial fetch", 2018-03-15), but for packfile URIs instead of partial clones. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-08-24 17:34:24 -07:00

1 2 3 4 5 ...

380 Commits