The strbuf_getline() interface allows a byte other than LF or NUL as
the line terminator, but this is only because I wrote these
codepaths anticipating that there might be a value other than NUL
and LF that could be useful when I introduced line_termination long
time ago. No useful caller that uses other value has emerged.
By now, it is clear that the interface is overly broad without a
good reason. Many codepaths have hardcoded preference to read
either LF terminated or NUL terminated records from their input, and
then call strbuf_getline() with LF or NUL as the third parameter.
This step introduces two thin wrappers around strbuf_getline(),
namely, strbuf_getline_lf() and strbuf_getline_nul(), and
mechanically rewrites these call sites to call either one of
them. The changes contained in this patch are:
* introduction of these two functions in strbuf.[ch]
* mechanical conversion of all callers to strbuf_getline() with
either '\n' or '\0' as the third parameter to instead call the
respective thin wrapper.
After this step, output from "git grep 'strbuf_getline('" would
become a lot smaller. An interim goal of this series is to make
this an empty set, so that we can have strbuf_getline_crlf() take
over the shorter name strbuf_getline().
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Convert this function to use struct object_id. Express several
hardcoded constants in terms of GIT_SHA1_HEXSZ.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Jeff King <peff@peff.net>
Use struct object_id in three fields in struct ref and convert all the
necessary places that use it.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Jeff King <peff@peff.net>
This patch puts the usage info strings that were not already in docopt-
like format into docopt-like format, which will be a litle easier for
end users and a lot easier for translators. Changes include:
- Placing angle brackets around fill-in-the-blank parameters
- Putting dashes in multiword parameter names
- Adding spaces to [-f|--foobar] to make [-f | --foobar]
- Replacing <foobar>* with [<foobar>...]
Signed-off-by: Alex Henrie <alexhenrie24@gmail.com>
Reviewed-by: Matthieu Moy <Matthieu.Moy@imag.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Fetching from a shallow-cloned repository used to be forbidden,
primarily because the codepaths involved were not carefully vetted
and we did not bother supporting such usage. This attempts to allow
object transfer out of a shallow-cloned repository in a controlled
way (i.e. the receiver become a shallow repository with truncated
history).
* nd/shallow-clone: (31 commits)
t5537: fix incorrect expectation in test case 10
shallow: remove unused code
send-pack.c: mark a file-local function static
git-clone.txt: remove shallow clone limitations
prune: clean .git/shallow after pruning objects
clone: use git protocol for cloning shallow repo locally
send-pack: support pushing from a shallow clone via http
receive-pack: support pushing to a shallow clone via http
smart-http: support shallow fetch/clone
remote-curl: pass ref SHA-1 to fetch-pack as well
send-pack: support pushing to a shallow clone
receive-pack: allow pushes that update .git/shallow
connected.c: add new variant that runs with --shallow-file
add GIT_SHALLOW_FILE to propagate --shallow-file to subprocesses
receive/send-pack: support pushing from a shallow clone
receive-pack: reorder some code in unpack()
fetch: add --update-shallow to accept refs that update .git/shallow
upload-pack: make sure deepening preserves shallow roots
fetch: support fetching from a shallow repository
clone: support remote shallow repository
...
Be more careful when parsing remote repository URL given in the
scp-style host:path notation.
* tb/clone-ssh-with-colon-for-port:
git_connect(): use common return point
connect.c: refactor url parsing
git_connect(): refactor the port handling for ssh
git fetch: support host:/~repo
t5500: add test cases for diag-url
git fetch-pack: add --diag-url
git_connect: factor out discovery of the protocol and its parts
git_connect: remove artificial limit of a remote command
t5601: add tests for ssh
t5601: remove clear_ssh, refactor setup_ssh_wrapper
Cloning from a shallow repository does not follow the "8 steps for new
.git/shallow" because if it does we need to get through step 6 for all
refs. That means commit walking down to the bottom.
Instead the rule to create .git/shallow is simpler and, more
importantly, cheap: if a shallow commit is found in the pack, it's
probably used (i.e. reachable from some refs), so we add it. Others
are dropped.
One may notice this method seems flawed by the word "probably". A
shallow commit may not be reachable from any refs at all if it's
attached to an object island (a group of objects that are not
reachable by any refs).
If that object island is not complete, a new fetch request may send
more objects to connect it to some ref. At that time, because we
incorrectly installed the shallow commit in this island, the user will
not see anything after that commit (fsck is still ok). This is not
desired.
Given that object islands are rare (C Git never sends such islands for
security reasons) and do not really harm the repository integrity, a
tradeoff is made to surprise the user occasionally but work faster
everyday.
A new option --strict could be added later that follows exactly the 8
steps. "git prune" can also learn to remove dangling objects _and_ the
shallow commits that are attached to them from .git/shallow.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
No callers pass a non-empty pointer as shallow_points at this
stage. As a result, all clients still refuse to talk to shallow
repository on the other end.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The main purpose is to trace the URL parser called by git_connect() in
connect.c
The main features of the parser can be listed as this:
- parse out host and path for URLs with a scheme (git:// file:// ssh://)
- parse host names embedded by [] correctly
- extract the port number, if present
- separate URLs like "file" (which are local)
from URLs like "host:repo" which should use ssh
Add the new parameter "--diag-url" to "git fetch-pack", which prints
the value for protocol, host and path to stderr and exits.
Signed-off-by: Torsten Bögershausen <tboegi@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Leaving only the function definitions and declarations so that any
new topic in flight can still make use of the old functions, replace
existing uses of the prefixcmp() and suffixcmp() with new API
functions.
The change can be recreated by mechanically applying this:
$ git grep -l -e prefixcmp -e suffixcmp -- \*.c |
grep -v strbuf\\.c |
xargs perl -pi -e '
s|!prefixcmp\(|starts_with\(|g;
s|prefixcmp\(|!starts_with\(|g;
s|!suffixcmp\(|ends_with\(|g;
s|suffixcmp\(|!ends_with\(|g;
'
on the result of preparatory changes in this series.
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Allow a safer "rewind of the remote tip" push than blind "--force",
by requiring that the overwritten remote ref to be unchanged since
the new history to replace it was prepared.
The machinery is more or less ready. The "--force" option is again
the big red button to override any safety, thanks to J6t's sanity
(the original round allowed --lockref to defeat --force).
The logic to choose the default implemented here is fragile
(e.g. "git fetch" after seeing a failure will update the
remote-tracking branch and will make the next "push" pass,
defeating the safety pretty easily). It is suitable only for the
simplest workflows, and it may hurt users more than it helps them.
* jc/push-cas:
push: teach --force-with-lease to smart-http transport
send-pack: fix parsing of --force-with-lease option
t5540/5541: smart-http does not support "--force-with-lease"
t5533: test "push --force-with-lease"
push --force-with-lease: tie it all together
push --force-with-lease: implement logic to populate old_sha1_expect[]
remote.c: add command line option parser for "--force-with-lease"
builtin/push.c: use OPT_BOOL, not OPT_BOOLEAN
cache.h: move remote/connect API out of it
This is an extension of c6807a4 (clone: open a shortcut for
connectivity check - 2013-05-26) to reduce the cost of connectivity
check at clone time, this time with smart http protocol.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The definition of "struct ref" in "cache.h", a header file so
central to the system, always confused me. This structure is not
about the local ref used by sha1-name API to name local objects.
It is what refspecs are expanded into, after finding out what refs
the other side has, to define what refs are updated after object
transfer succeeds to what values. It belongs to "remote.h" together
with "struct refspec".
While we are at it, also move the types and functions related to the
Git transport connection to a new header file connect.h
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Clean up pkt-line API, implementation and its callers to make them
more robust.
* jk/pkt-line-cleanup:
do not use GIT_TRACE_PACKET=3 in tests
remote-curl: always parse incoming refs
remote-curl: move ref-parsing code up in file
remote-curl: pass buffer straight to get_remote_heads
teach get_remote_heads to read from a memory buffer
pkt-line: share buffer/descriptor reading implementation
pkt-line: provide a LARGE_PACKET_MAX static buffer
pkt-line: move LARGE_PACKET_MAX definition from sideband
pkt-line: teach packet_read_line to chomp newlines
pkt-line: provide a generic reading function with options
pkt-line: drop safe_write function
pkt-line: move a misplaced comment
write_or_die: raise SIGPIPE when we get EPIPE
upload-archive: use argv_array to store client arguments
upload-archive: do not copy repo name
send-pack: prefer prefixcmp over memcmp in receive_status
fetch-pack: fix out-of-bounds buffer offset in get_ack
upload-pack: remove packet debugging harness
upload-pack: do not add duplicate objects to shallow list
upload-pack: use get_sha1_hex to parse "shallow" lines
Now that we can read packet data from memory as easily as a
descriptor, get_remote_heads can take either one as a
source. This will allow further refactoring in remote-curl.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Most of the callers of packet_read_line just read into a
static 1000-byte buffer (callers which handle arbitrary
binary data already use LARGE_PACKET_MAX). This works fine
in practice, because:
1. The only variable-sized data in these lines is a ref
name, and refs tend to be a lot shorter than 1000
characters.
2. When sending ref lines, git-core always limits itself
to 1000 byte packets.
However, the only limit given in the protocol specification
in Documentation/technical/protocol-common.txt is
LARGE_PACKET_MAX; the 1000 byte limit is mentioned only in
pack-protocol.txt, and then only describing what we write,
not as a specific limit for readers.
This patch lets us bump the 1000-byte limit to
LARGE_PACKET_MAX. Even though git-core will never write a
packet where this makes a difference, there are two good
reasons to do this:
1. Other git implementations may have followed
protocol-common.txt and used a larger maximum size. We
don't bump into it in practice because it would involve
very long ref names.
2. We may want to increase the 1000-byte limit one day.
Since packets are transferred before any capabilities,
it's difficult to do this in a backwards-compatible
way. But if we bump the size of buffer the readers can
handle, eventually older versions of git will be
obsolete enough that we can justify bumping the
writers, as well. We don't have plans to do this
anytime soon, but there is no reason not to start the
clock ticking now.
Just bumping all of the reading bufs to LARGE_PACKET_MAX
would waste memory. Instead, since most readers just read
into a temporary buffer anyway, let's provide a single
static buffer that all callers can use. We can further wrap
this detail away by having the packet_read_line wrapper just
use the buffer transparently and return a pointer to the
static storage. That covers most of the cases, and the
remaining ones already read into their own LARGE_PACKET_MAX
buffers.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The packets sent during ref negotiation are all terminated
by newline; even though the code to chomp these newlines is
short, we end up doing it in a lot of places.
This patch teaches packet_read_line to auto-chomp the
trailing newline; this lets us get rid of a lot of inline
chomping code.
As a result, some call-sites which are not reading
line-oriented data (e.g., when reading chunks of packfiles
alongside sideband) transition away from packet_read_line to
the generic packet_read interface. This patch converts all
of the existing callsites.
Since the function signature of packet_read_line does not
change (but its behavior does), there is a possibility of
new callsites being introduced in later commits, silently
introducing an incompatibility. However, since a later
patch in this series will change the signature, such a
commit would have to be merged directly into this commit,
not to the tip of the series; we can therefore ignore the
issue.
This is an internal cleanup and should produce no change of
behavior in the normal case. However, there is one corner
case to note. Callers of packet_read_line have never been
able to tell the difference between a flush packet ("0000")
and an empty packet ("0004"), as both cause packet_read_line
to return a length of 0. Readers treat them identically,
even though Documentation/technical/protocol-common.txt says
we must not; it also says that implementations should not
send an empty pkt-line.
By stripping out the newline before the result gets to the
caller, we will now treat the newline-only packet ("0005\n")
the same as an empty packet, which in turn gets treated like
a flush packet. In practice this doesn't matter, as neither
empty nor newline-only packets are part of git's protocols
(at least not for the line-oriented bits, and readers who
are not expecting line-oriented packets will be calling
packet_read directly, anyway). But even if we do decide to
care about the distinction later, it is orthogonal to this
patch. The right place to tighten would be to stop treating
empty packets as flush packets, and this change does not
make doing so any harder.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Even though "git fetch" has full infrastructure to parse refspecs to
be fetched and match them against the list of refs to come up with
the final list of refs to be fetched, the list of refs that are
requested to be fetched were internally converted to a plain list of
strings at the transport layer and then passed to the underlying
fetch-pack driver.
Stop this conversion and instead pass around an array of refs.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
fetch_pack() is used by transport.c, part of libgit.a while it stays
in builtin/fetch-pack.c. Move it to fetch-pack.c so that we won't get
undefined reference if a program that uses libgit.a happens to pull it
in.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
This helps removes the hack in fetch_pack() that copies my_args to args.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
It used to be that if "--all", "--depth", and also explicit references
were sought, then the explicit references were not handled correctly
in filter_refs() because the "--all --depth" code took precedence over
the explicit reference handling, and the explicit references were
never noted as having been found. So check for explicitly sought
references before proceeding to the "--all --depth" logic.
This fixes two test cases in t5500.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Set the final value at initialization rather than initializing it then
sometimes changing it.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This simplifies the logic without changing the behavior.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Simplify flow within loop: first decide whether to keep the reference,
then keep/free it. This makes it clearer that each ref has exactly
two possible destinies, and removes duplication of the code for
appending the reference to the linked list.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Instead of temporarily storing matched refs to temporary array
"return_refs", simply append them to newlist as we go. This changes
the order of references in newlist to strictly sorted if "--all" and
"--depth" and named references are all specified, but that usage is
broken anyway (see the last two tests in t5500).
This changes the last test in t5500 from segfaulting into just
emitting a spurious error (this will be fixed in a moment).
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Remove any references that are available from the remote from the
sought list (rather than overwriting their names with NUL characters,
as previously). Mark matching entries by writing a non-NULL pointer
to string_list_item::util during the iteration, then use
filter_string_list() later to filter out the entries that have been
marked.
Document this aspect of fetch_pack() in a comment in the header file.
(More documentation is obviously still needed.)
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
fetch_pack() removes duplicates from the "sought" list, thereby
shrinking the list. But previously, the caller was not informed about
the shrinkage. This would cause a spurious error message to be
emitted by cmd_fetch_pack() if "git fetch-pack" is called with
duplicate refnames.
Instead, remove duplicates using string_list_remove_duplicates(),
which adjusts sought->nr to reflect the new length of the list.
The last test of t5500 inexplicably *required* "git fetch-pack" to
fail when fetching a list of references that contains duplicates;
i.e., it insisted on the buggy behavior. So change the test to expect
the correct behavior.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Once a match has been found at sought_pos, the entry is zeroed and no
future attempts will match that entry. So increment sought_pos to
avoid checking against the zeroed-out entry during the next iteration.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Instead of juggling <nr_heads,heads> (sometimes called
<nr_match,match>), pass around the list of references to be sought in
a single string_list variable called "sought". Future commits will
make more use of string_list functionality.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Fetch-pack's verbose mode is more of a debugging mode (and
in fact takes two "-v" arguments to trigger via the
porcelain layer). Let's mention the server version as
another possible item of interest.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In the same spirit as the previous fix, stop asking for thin-pack, no-progress
and include-tag capabilities when the other end does not claim to support them.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commit ff5effdf taught both clients and servers of the git protocol
to send an "agent" capability that just advertises their version for
statistics and debugging purposes. The protocol-capabilities.txt
document however indicates that the client's advertisement is
actually a response, and should never include capabilities not
mentioned in the server's advertisement.
Adding the unconditional advertisement in the server programs was
OK, then, but the clients broke the protocol. The server
implementation of git-core itself does not care, but at least one
does: the Google Code git server (or any server using Dulwich), will
hang up with an internal error upon seeing an unknown capability.
Instead, each client must record whether we saw an agent string from
the server, and respond with its agent only if the server mentioned
it first.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Instead of having the client advertise a particular version
number in the git protocol, we have managed extensions and
backwards compatibility by having clients and servers
advertise capabilities that they support. This is far more
robust than having each side consult a table of
known versions, and provides sufficient information for the
protocol interaction to complete.
However, it does not allow servers to keep statistics on
which client versions are being used. This information is
not necessary to complete the network request (the
capabilities provide enough information for that), but it
may be helpful to conduct a general survey of client
versions in use.
We already send the client version in the user-agent header
for http requests; adding it here allows us to gather
similar statistics for non-http requests.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The way "fetch-pack" that is given multiple references to fetch tried to
remove duplicates was very inefficient.
By Jeff King
* jk/fetch-pack-remove-dups-optim:
fetch-pack: sort incoming heads list earlier
fetch-pack: avoid quadratic loop in filter_refs
fetch-pack: sort the list of incoming refs
add sorting infrastructure for list refs
fetch-pack: avoid quadratic behavior in remove_duplicates
fetch-pack: sort incoming heads
Tighten constness of some local variables in a callchain.
By Michael Haggerty
* mh/fetch-pack-constness:
cmd_fetch_pack(): respect constness of argv parameter
cmd_fetch_pack(): combine the loop termination conditions
cmd_fetch_pack(): handle non-option arguments outside of the loop
cmd_fetch_pack(): declare dest to be const
Commit 4435968 started sorting heads fed to fetch-pack so
that later commits could use more optimized algorithms;
commit 7db8d53 switched the remove_duplicates function to
such an algorithm.
Of course, the sorting is more effective if you do it
_before_ the algorithm in question.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We have a list of refs that we want to compare against the
"match" array. The current code searches the match list
linearly, giving quadratic behavior over the number of refs
when you want to fetch all of them.
Instead, we can compare the lists as we go, giving us linear
behavior.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Having the list sorted means we can avoid some quadratic
algorithms when comparing lists.
These should typically be sorted already, but they do come
from the remote, so let's be extra careful. Our ref-sorting
implementation does a mergesort, so we do not have to care
about performance degrading in the common case that the list
is already sorted.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We remove duplicate entries from the list of refs we are
fed in fetch-pack. The original algorithm is quadratic over
the number of refs, but since the list is now guaranteed to
be sorted, we can do it in linear time.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There's no reason to preserve the incoming order of the
heads we're requested to fetch. By having them sorted, we
can replace some of the quadratic algorithms with linear
ones.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The old code cast away the constness of the strings passed to the
function in argument argv[], which could result in their being
modified by filter_refs(). Fix by copying reference names from argv
and putting them into our own array (similarly to how refnames passed
to stdin were already handled).
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If an argument that does not start with '-' is found, the loop is
terminated. So move that check into the for-loop condition.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This makes it more obvious that the code is always executed unless
there is an error, and that the first initialization of nr_heads is
unnecessary.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There is no need for it to be non-const, and this avoids the need
for casting away the constness of an argv element.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When "git fetch" encounters repositories with too many references, the
command line of "fetch-pack" that is run by a helper e.g. remote-curl,
may fail to hold all of them. Now such an internal invocation can feed
the references through the standard input of "fetch-pack".
By Ivan Todoroski
* it/fetch-pack-many-refs:
remote-curl: main test case for the OS command line overflow
fetch-pack: test cases for the new --stdin option
remote-curl: send the refs to fetch-pack on stdin
fetch-pack: new --stdin option to read refs from stdin
If a remote repo has too many tags (or branches), cloning it over the
smart HTTP transport can fail because remote-curl.c puts all the refs
from the remote repo on the fetch-pack command line. This can make the
command line longer than the global OS command line limit, causing
fetch-pack to fail.
This is especially a problem on Windows where the command line limit is
orders of magnitude shorter than Linux. There are already real repos out
there that msysGit cannot clone over smart HTTP due to this problem.
Here is an easy way to trigger this problem:
git init too-many-refs
cd too-many-refs
echo bla > bla.txt
git add .
git commit -m test
sha=$(git rev-parse HEAD)
tag=$(perl -e 'print "bla" x 30')
for i in `seq 50000`; do
echo $sha refs/tags/$tag-$i >> .git/packed-refs
done
Then share this repo over the smart HTTP protocol and try cloning it:
$ git clone http://localhost/.../too-many-refs/.git
Cloning into 'too-many-refs'...
fatal: cannot exec 'fetch-pack': Argument list too long
50k tags is obviously an absurd number, but it is required to
demonstrate the problem on Linux because it has a much more generous
command line limit. On Windows the clone fails with as little as 500
tags in the above loop, which is getting uncomfortably close to the
number of tags you might see in real long lived repos.
This is not just theoretical, msysGit is already failing to clone our
company repo due to this. It's a large repo converted from CVS, nearly
10 years of history.
Four possible solutions were discussed on the Git mailing list (in no
particular order):
1) Call fetch-pack multiple times with smaller batches of refs.
This was dismissed as inefficient and inelegant.
2) Add option --refs-fd=$n to pass a an fd from where to read the refs.
This was rejected because inheriting descriptors other than
stdin/stdout/stderr through exec() is apparently problematic on Windows,
plus it would require changes to the run-command API to open extra
pipes.
3) Add option --refs-from=$tmpfile to pass the refs using a temp file.
This was not favored because of the temp file requirement.
4) Add option --stdin to pass the refs on stdin, one per line.
In the end this option was chosen as the most efficient and most
desirable from scripting perspective.
There was however a small complication when using stdin to pass refs to
fetch-pack. The --stateless-rpc option to fetch-pack also uses stdin for
communication with the remote server.
If we are going to sneak refs on stdin line by line, it would have to be
done very carefully in the presence of --stateless-rpc, because when
reading refs line by line we might read ahead too much data into our
buffer and eat some of the remote protocol data which is also coming on
stdin.
One way to solve this would be to refactor get_remote_heads() in
fetch-pack.c to accept a residual buffer from our stdin line parsing
above, but this function is used in several places so other callers
would be burdened by this residual buffer interface even when most of
them don't need it.
In the end we settled on the following solution:
If --stdin is specified without --stateless-rpc, fetch-pack would read
the refs from stdin one per line, in a script friendly format.
However if --stdin is specified together with --stateless-rpc,
fetch-pack would read the refs from stdin in packetized format
(pkt-line) with a flush packet terminating the list of refs. This way we
can read the exact number of bytes that we need from stdin, and then
get_remote_heads() can continue reading from the same fd without losing
a single byte of remote protocol data.
This way the --stdin option only loses generality and scriptability when
used together with --stateless-rpc, which is not easily scriptable
anyway because it also uses pkt-line when talking to the remote server.
Signed-off-by: Ivan Todoroski <grnch@gmx.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>