With the same mechanism as used to tell where "HEAD" points at to
the other end, we can tell the target of other symbolic refs as
well.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
One long-standing flaw in the pack transfer protocol was that there
was no way to tell the other end which branch "HEAD" points at.
With a capability "symref=HEAD:refs/heads/master", let the sender to
tell the receiver what symbolic ref points at what ref.
This capability can be repeated more than once to represent symbolic
refs other than HEAD, such as "refs/remotes/origin/HEAD").
Add an infrastructure to collect symbolic refs, format them as extra
capabilities and put it on the wire. For now, just send information
on the "HEAD" and nothing else.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The callee does not use cb_data, and the caller is an intermediate
function in a callchain that later wants to use the cb_data for its
own use. Clarify the code by breaking the dataflow explicitly by
not passing cb_data down to mark_our_ref().
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There is no reason not to turn on keepalives by default.
They take very little bandwidth, and significantly less than
the progress reporting they are replacing. And in the case
that progress reporting is on, we should never need to send
a keepalive anyway, as we will constantly be showing
progress and resetting the keepalive timer.
We do not necessarily know what the client's idea of a
reasonable timeout is, so let's keep this on the low side of
5 seconds. That is high enough that we will always prefer
our normal 1-second progress reports to sending a keepalive
packet, but low enough that no sane client should consider
the connection hung.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When upload-pack has started pack-objects, there may be a quiet
period while pack-objects prepares the pack (i.e., counting objects
and delta compression). Normally we would see (and send to the
client) progress information, but if "--quiet" is in effect,
pack-objects will produce nothing at all until the pack data is
ready. On a large repository, this can take tens of seconds (or even
minutes if the system is loaded or the repository is badly packed).
Clients or intermediate proxies can sometimes give up in this
situation, assuming that the server or connection has hung.
This patch introduces a "keepalive" option; if upload-pack sees no
data from pack-objects for a certain number of seconds, it will send
an empty sideband data packet to let the other side know that we are
still working on it.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
upload-pack has a special revision walking code for shallow
recipients. It works almost like the similar code in pack-objects
except:
1. in upload-pack, graft points could be added for deepening;
2. also when the repository is deepened, the shallow point will be
moved further away from the tip, but the old shallow point will be
marked as edge to produce more efficient packs. See 6523078 (make
shallow repository deepening more network efficient - 2009-09-03).
Pass the file to pack-objects via --shallow-file. This will override
$GIT_DIR/shallow and give pack-objects the exact repository shape
that upload-pack has.
mark edge commits by revision command arguments. Even if old shallow
points are passed as "--not" revisions as in this patch, they will not
be picked up by mark_edges_uninteresting() because this function looks
up to parents for edges, while in this case the edge is the children,
in the opposite direction. This will be fixed in an later patch when
all given uninteresting commits are marked as edges.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The definition of "struct ref" in "cache.h", a header file so
central to the system, always confused me. This structure is not
about the local ref used by sha1-name API to name local objects.
It is what refspecs are expanded into, after finding out what refs
the other side has, to define what refs are updated after object
transfer succeeds to what values. It belongs to "remote.h" together
with "struct refspec".
While we are at it, also move the types and functions related to the
Git transport connection to a new header file connect.h
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When the client sends a 'shallow' line for an object that the server does
not have, the server currently dies with the error: "did not find object
for shallow <obj-id>". The client may have truncated the history at
the commit by fetching shallowly from a different server, or the commit
may have been garbage collected by the server. In either case, this
unknown commit is not relevant for calculating the pack that is to be
sent and can be safely ignored, and it is not used when recomputing where
the updated history of the client is cauterised.
The documentation in technical/pack-protocol.txt has been updated to
remove the restriction that "Clients MUST NOT mention an obj-id which it
does not know exists on the server". This requirement is not realistic
because clients cannot know whether an object has been garbage collected
by the server.
Signed-off-by: Michael Heemskerk <mheemskerk@atlassian.com>
Reviewed-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Clean up pkt-line API, implementation and its callers to make them
more robust.
* jk/pkt-line-cleanup:
do not use GIT_TRACE_PACKET=3 in tests
remote-curl: always parse incoming refs
remote-curl: move ref-parsing code up in file
remote-curl: pass buffer straight to get_remote_heads
teach get_remote_heads to read from a memory buffer
pkt-line: share buffer/descriptor reading implementation
pkt-line: provide a LARGE_PACKET_MAX static buffer
pkt-line: move LARGE_PACKET_MAX definition from sideband
pkt-line: teach packet_read_line to chomp newlines
pkt-line: provide a generic reading function with options
pkt-line: drop safe_write function
pkt-line: move a misplaced comment
write_or_die: raise SIGPIPE when we get EPIPE
upload-archive: use argv_array to store client arguments
upload-archive: do not copy repo name
send-pack: prefer prefixcmp over memcmp in receive_status
fetch-pack: fix out-of-bounds buffer offset in get_ack
upload-pack: remove packet debugging harness
upload-pack: do not add duplicate objects to shallow list
upload-pack: use get_sha1_hex to parse "shallow" lines
Recent optimization broke shallow clones.
* jk/peel-ref:
upload-pack: load non-tip "want" objects from disk
upload-pack: make sure "want" objects are parsed
upload-pack: drop lookup-before-parse optimization
Allows requests to fetch objects at any tip of refs (including
hidden ones). It seems that there may be use cases even outside
Gerrit (e.g. $gmane/215701).
* jc/fetch-raw-sha1:
fetch: fetch objects by their exact SHA-1 object names
upload-pack: optionally allow fetching from the tips of hidden refs
fetch: use struct ref to represent refs to be fetched
parse_fetch_refspec(): clarify the codeflow a bit
It is a long-time security feature that upload-pack will not
serve any "want" lines that do not correspond to the tip of
one of our refs. Traditionally, this was enforced by
checking the objects in the in-memory hash; they should have
been loaded and received the OUR_REF flag during the
advertisement.
The stateless-rpc mode, however, has a race condition here:
one process advertises, and another receives the want lines,
so the refs may have changed in the interim. To address
this, commit 051e400 added a new verification mode; if the
object is not OUR_REF, we set a "has_non_tip" flag, and then
later verify that the requested objects are reachable from
our current tips.
However, we still die immediately when the object is not in
our in-memory hash, and at this point we should only have
loaded our tip objects. So the check_non_tip code path does
not ever actually trigger, as any non-tip objects would
have already caused us to die.
We can fix that by using parse_object instead of
lookup_object, which will load the object from disk if it
has not already been loaded.
We still need to check that parse_object does not return
NULL, though, as it is possible we do not have the object
at all. A more appropriate error message would be "no such
object" rather than "not our ref"; however, we do not want
to leak information about what objects are or are not in
the object database, so we continue to use the same "not
our ref" message that would be produced by an unreachable
object.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When upload-pack receives a "want" line from the client, it
adds it to an object array. We call lookup_object to find
the actual object, which will only check for objects already
in memory. This works because we are expecting to find
objects that we already loaded during the ref advertisement.
We use the resulting object structs for a variety of
purposes. Some of them care only about the object flags, but
others care about the type of the object (e.g.,
ok_to_give_up), or even feed them to the revision parser
(when --depth is used), which assumes that objects it
receives are fully parsed.
Once upon a time, this was OK; any object we loaded into
memory would also have been parsed. But since 435c833
(upload-pack: use peel_ref for ref advertisements,
2012-10-04), we try to avoid parsing objects during the ref
advertisement. This means that lookup_object may return an
object with a type of OBJ_NONE. The resulting mess depends
on the exact set of objects, but can include the revision
parser barfing, or the shallow code sending the wrong set of
objects.
This patch teaches upload-pack to parse each "want" object
as we receive it. We do not replace the lookup_object call
with parse_object, as the current code is careful not to let
just any object appear on a "want" line, but rather only one
we have previously advertised (whereas parse_object would
actually load any arbitrary object from disk).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When we receive a "have" line from the client, we want to
load the object pointed to by the sha1. However, we are
careful to do:
o = lookup_object(sha1);
if (!o || !o->parsed)
o = parse_object(sha1);
to avoid loading the object from disk if we have already
seen it. However, since ccdc603 (parse_object: try internal
cache before reading object db), parse_object already does
this optimization internally. We can just call parse_object
directly.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Most of the callers of packet_read_line just read into a
static 1000-byte buffer (callers which handle arbitrary
binary data already use LARGE_PACKET_MAX). This works fine
in practice, because:
1. The only variable-sized data in these lines is a ref
name, and refs tend to be a lot shorter than 1000
characters.
2. When sending ref lines, git-core always limits itself
to 1000 byte packets.
However, the only limit given in the protocol specification
in Documentation/technical/protocol-common.txt is
LARGE_PACKET_MAX; the 1000 byte limit is mentioned only in
pack-protocol.txt, and then only describing what we write,
not as a specific limit for readers.
This patch lets us bump the 1000-byte limit to
LARGE_PACKET_MAX. Even though git-core will never write a
packet where this makes a difference, there are two good
reasons to do this:
1. Other git implementations may have followed
protocol-common.txt and used a larger maximum size. We
don't bump into it in practice because it would involve
very long ref names.
2. We may want to increase the 1000-byte limit one day.
Since packets are transferred before any capabilities,
it's difficult to do this in a backwards-compatible
way. But if we bump the size of buffer the readers can
handle, eventually older versions of git will be
obsolete enough that we can justify bumping the
writers, as well. We don't have plans to do this
anytime soon, but there is no reason not to start the
clock ticking now.
Just bumping all of the reading bufs to LARGE_PACKET_MAX
would waste memory. Instead, since most readers just read
into a temporary buffer anyway, let's provide a single
static buffer that all callers can use. We can further wrap
this detail away by having the packet_read_line wrapper just
use the buffer transparently and return a pointer to the
static storage. That covers most of the cases, and the
remaining ones already read into their own LARGE_PACKET_MAX
buffers.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The packets sent during ref negotiation are all terminated
by newline; even though the code to chomp these newlines is
short, we end up doing it in a lot of places.
This patch teaches packet_read_line to auto-chomp the
trailing newline; this lets us get rid of a lot of inline
chomping code.
As a result, some call-sites which are not reading
line-oriented data (e.g., when reading chunks of packfiles
alongside sideband) transition away from packet_read_line to
the generic packet_read interface. This patch converts all
of the existing callsites.
Since the function signature of packet_read_line does not
change (but its behavior does), there is a possibility of
new callsites being introduced in later commits, silently
introducing an incompatibility. However, since a later
patch in this series will change the signature, such a
commit would have to be merged directly into this commit,
not to the tip of the series; we can therefore ignore the
issue.
This is an internal cleanup and should produce no change of
behavior in the normal case. However, there is one corner
case to note. Callers of packet_read_line have never been
able to tell the difference between a flush packet ("0000")
and an empty packet ("0004"), as both cause packet_read_line
to return a length of 0. Readers treat them identically,
even though Documentation/technical/protocol-common.txt says
we must not; it also says that implementations should not
send an empty pkt-line.
By stripping out the newline before the result gets to the
caller, we will now treat the newline-only packet ("0005\n")
the same as an empty packet, which in turn gets treated like
a flush packet. In practice this doesn't matter, as neither
empty nor newline-only packets are part of git's protocols
(at least not for the line-oriented bits, and readers who
are not expecting line-oriented packets will be calling
packet_read directly, anyway). But even if we do decide to
care about the distinction later, it is orthogonal to this
patch. The right place to tighten would be to stop treating
empty packets as flush packets, and this change does not
make doing so any harder.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This is just write_or_die by another name. The one
distinction is that write_or_die will treat EPIPE specially
by suppressing error messages. That's fine, as we die by
SIGPIPE anyway (and in the off chance that it is disabled,
write_or_die will simulate it).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If you set the GIT_DEBUG_SEND_PACK environment variable,
upload-pack will dump lines it receives in the receive_needs
phase to a descriptor. This debugging harness is a strict
subset of what GIT_TRACE_PACKET can do. Let's just drop it
in favor of that.
A few tests used GIT_DEBUG_SEND_PACK to confirm which
objects get sent; we have to adapt them to the new output
format.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When the client tells us it has a shallow object via
"shallow <sha1>", we make sure we have the object, mark it
with a flag, then add it to a dynamic array of shallow
objects. This means that a client can get us to allocate
arbitrary amounts of memory just by flooding us with shallow
lines (whether they have the objects or not). You can
demonstrate it easily with:
yes '0035shallow e83c5163316f89bfbde7d9ab23ca2e25604af290' |
git-upload-pack git.git
We already protect against duplicates in want lines by
checking if our flag is already set; let's do the same thing
here. Note that a client can still get us to allocate some
amount of memory by marking every object in the repo as
"shallow" (or "want"). But this at least bounds it with the
number of objects in the repository, which is not under the
control of an upload-pack client.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When we receive a line like "shallow <sha1>" from the
client, we feed the <sha1> part to get_sha1. This is a
mistake, as the argument on a shallow line is defined by
Documentation/technical/pack-protocol.txt to contain an
"obj-id". This is never defined in the BNF, but it is clear
from the text and from the other uses that it is meant to be
a hex sha1, not an arbitrary identifier (and that is what
fetch-pack has always sent).
We should be using get_sha1_hex instead, which doesn't allow
the client to request arbitrary junk like "HEAD@{yesterday}".
Because this is just marking shallow objects, the client
couldn't actually do anything interesting (like fetching
objects from unreachable reflog entries), but we should keep
our parsing tight to be on the safe side.
Because get_sha1 is for the most part a superset of
get_sha1_hex, in theory the only behavior change should be
disallowing non-hex object references. However, there is
one interesting exception: get_sha1 will only parse
a 40-character hex sha1 if the string has exactly 40
characters, whereas get_sha1_hex will just eat the first 40
characters, leaving the rest. That means that current
versions of git-upload-pack will not accept a "shallow"
packet that has a trailing newline, even though the protocol
documentation is clear that newlines are allowed (even
encouraged) in non-binary parts of the protocol.
This never mattered in practice, though, because fetch-pack,
contrary to the protocol documentation, does not include a
newline in its shallow lines. JGit follows its lead (though
it correctly is strict on the parsing end about wanting a
hex object id).
We do not adjust fetch-pack to send newlines here, as it
would break communication with older versions of git (and
there is no actual benefit to doing so, except for
consistency with other parts of the protocol).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Allow the server side to redact the refs/ namespace it shows to the
client.
Will merge to 'master'.
* jc/hidden-refs:
upload/receive-pack: allow hiding ref hierarchies
upload-pack: simplify request validation
upload-pack: share more code
With uploadpack.allowtipsha1inwant configuration option set, future
versions of "git fetch" that allow an exact object name (likely to
have been obtained out of band) on the LHS of the fetch refspec can
make a request with a "want" line that names an object that may not
have been advertised due to transfer.hiderefs configuration.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A repository may have refs that are only used for its internal
bookkeeping purposes that should not be exposed to the others that
come over the network.
Teach upload-pack to omit some refs from its initial advertisement
by paying attention to the uploadpack.hiderefs multi-valued
configuration variable. Do the same to receive-pack via the
receive.hiderefs variable. As a convenient short-hand, allow using
transfer.hiderefs to set the value to both of these variables.
Any ref that is under the hierarchies listed on the value of these
variable is excluded from responses to requests made by "ls-remote",
"fetch", etc. (for upload-pack) and "push" (for receive-pack).
Because these hidden refs do not count as OUR_REF, an attempt to
fetch objects at the tip of them will be rejected, and because these
refs do not get advertised, "git push :" will not see local branches
that have the same name as them as "matching" ones to be sent.
An attempt to update/delete these hidden refs with an explicit
refspec, e.g. "git push origin :refs/hidden/22", is rejected. This
is not a new restriction. To the pusher, it would appear that there
is no such ref, so its push request will conclude with "Now that I
sent you all the data, it is time for you to update the refs. I saw
that the ref did not exist when I started pushing, and I want the
result to point at this commit". The receiving end will apply the
compare-and-swap rule to this request and rejects the push with
"Well, your update request conflicts with somebody else; I see there
is such a ref.", which is the right thing to do. Otherwise a push to
a hidden ref will always be "the last one wins", which is not a good
default.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git fetch --depth" was broken in at least three ways. The
resulting history was deeper than specified by one commit, it was
unclear how to wipe the shallowness of the repository with the
command, and documentation was misleading.
* nd/fetch-depth-is-broken:
fetch: elaborate --depth action
upload-pack: fix off-by-one depth calculation in shallow clone
fetch: add --unshallow for turning shallow repo into complete one
Long time ago, we used to punt on a large (read: asking for more
than 256 refs) fetch request and instead sent a full pack, because
we couldn't fit many refs on the command line of rev-list we run
internally to enumerate the objects to be sent. To fix this,
565ebbf (upload-pack: tighten request validation., 2005-10-24),
added a check to count the number of refs in the request and matched
with the number of refs we advertised, and changed the invocation of
rev-list to pass "--all" to it, still keeping us under the command
line argument limit.
However, these days we feed the list of objects requested and the
list of objects the other end is known to have via standard input,
so there is no longer a valid reason to special case a full clone
request. Remove the code associated with "create_full_pack" to
simplify the logic.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We mark the objects pointed at our refs with "OUR_REF" flag in two
functions (mark_our_ref() and send_ref()), but we can just use the
former as a helper for the latter.
Update the way mark_our_ref() prepares in-core object to use
lookup_unknown_object() to delay reading the actual object data,
just like we did in 435c833 (upload-pack: use peel_ref for ref
advertisements, 2012-10-04).
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A minor consistency check patch that does not have much relevance
to the real world.
* nd/upload-pack-shallow-must-be-commit:
upload-pack: only accept commits from "shallow" line
The user can do --depth=2147483647 (*) for restoring complete repo
now. But it's hard to remember. Any other numbers larger than the
longest commit chain in the repository would also do, but some
guessing may be involved. Make easy-to-remember --unshallow an alias
for --depth=2147483647.
Make upload-pack recognize this special number as infinite depth. The
effect is essentially the same as before, except that upload-pack is
more efficient because it does not have to traverse to the bottom
anymore.
The chance of a user actually wanting exactly 2147483647 commits
depth, not infinite, on a repository with a history that long, is
probably too small to consider. The client can learn to add or
subtract one commit to avoid the special treatment when that actually
happens.
(*) This is the largest positive number a 32-bit signed integer can
contain. JGit and older C Git store depth as "int" so both are OK
with this number. Dulwich does not support shallow clone.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We only allow cuts at commits, not arbitrary objects. upload-pack will
fail eventually in register_shallow if a non-commit is given with a
generic error "Object %s is a %s, not a commit". Check it early and
give a more accurate error.
This should never show up in an ordinary session. It's for buggy
clients, or when the user manually edits .git/shallow.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When upload-pack advertises refs, we attempt to peel tags
and advertise the peeled version. We currently hand-roll the
tag dereferencing, and use as many optimizations as we can
to avoid loading non-tag objects into memory.
Not only has peel_ref recently learned these optimizations,
too, but it also contains an even more important one: it
has access to the "peeled" data from the pack-refs file.
That means we can avoid not only loading annotated tags
entirely, but also avoid doing any kind of object lookup at
all.
This cut the CPU time to advertise refs by 50% in the
linux-2.6 repo, as measured by:
echo 0000 | git-upload-pack . >/dev/null
best-of-five, warm cache, objects and refs fully packed:
[before] [after]
real 0m0.026s real 0m0.013s
user 0m0.024s user 0m0.008s
sys 0m0.000s sys 0m0.000s
Those numbers are irrelevantly small compared to an actual
fetch. Here's a larger repo (400K refs, of which 12K are
unique, and of which only 107 are unique annotated tags):
[before] [after]
real 0m0.704s real 0m0.596s
user 0m0.600s user 0m0.496s
sys 0m0.096s sys 0m0.092s
This shows only a 15% speedup (mostly because it has fewer
actual tags to parse), but a larger absolute value (100ms,
which isn't a lot compared to a real fetch, but this
advertisement happens on every fetch, even if the client is
just finding out they are completely up to date).
In truly pathological cases, where you have a large number
of unique annotated tags, it can make an even bigger
difference. Here are the numbers for a linux-2.6 repository
that has had every seventh commit tagged (so about 50K
tags):
[before] [after]
real 0m0.443s real 0m0.097s
user 0m0.416s user 0m0.080s
sys 0m0.024s sys 0m0.012s
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Instead of having the client advertise a particular version
number in the git protocol, we have managed extensions and
backwards compatibility by having clients and servers
advertise capabilities that they support. This is far more
robust than having each side consult a table of
known versions, and provides sufficient information for the
protocol interaction to complete.
However, it does not allow servers to keep statistics on
which client versions are being used. This information is
not necessary to complete the network request (the
capabilities provide enough information for that), but it
may be helpful to conduct a general survey of client
versions in use.
We already send the client version in the user-agent header
for http requests; adding it here allows us to gather
similar statistics for non-http requests.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We have been carefully choosing feature names used in the protocol
extensions so that the vocabulary does not contain a word that is a
substring of another word, so it is not a real problem, but we have
recently added "quiet" feature word, which would mean we cannot later
add some other word with "quiet" (e.g. "quiet-push"), which is awkward.
Let's make sure that we can eventually be able to do so by teaching the
clients and servers that feature words consist of non whitespace
letters. This parser also allows us to later add features with parameters
e.g. "feature=1.5" (parameter values need to be quoted for whitespaces,
but we will worry about the detauls when we do introduce them).
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Clemens Buchacher <drizzd@aon.at>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When upload-pack advertises refs, it dereferences any tags
it sees, and shows the resulting sha1 to the client. It does
this by calling deref_tag. That function must load and parse
each tag object to find the sha1 of the tagged object.
However, it also ends up parsing the tagged object itself,
which is not strictly necessary for upload-pack's use.
Each tag produces two object loads (assuming it is not a
recursive tag), when it could get away with only a single
one. Dropping the second load halves the effort we spend.
The downside is that we are no longer verifying the
resulting object by loading it. In particular:
1. We never cross-check the "type" field given in the tag
object with the type of the pointed-to object. If the
tag says it points to a tag but doesn't, then we will
keep peeling and realize the error. If the tag says it
points to a non-tag but actually points to a tag, we
will stop peeling and just advertise the pointed-to
tag.
2. If we are missing the pointed-to object, we will not
realize (because we never even look it up in the object
db).
However, both of these are errors in the object database,
and both will be detected if a client actually requests the
broken objects in question. So we are simply pushing the
verification away from the advertising stage, and down to
the actual fetching stage.
On my test repo with 120K refs, this drops the time to
advertise the refs from ~3.2s to ~2.0s.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When we advertise a ref, the first thing we do is parse the
pointed-to object. This gives us two things:
1. a "struct object" we can use to store flags
2. the type of the object, so we know whether we need to
dereference it as a tag
Instead, we can just use lookup_unknown_object to get an
object struct, and then fill in just the type field using
sha1_object_info (which, in the case of packed files, can
find the information without actually inflating the object
data).
This can save time if you have a large number of refs, and
the client isn't actually going to request those refs (e.g.,
because most of them are already up-to-date).
The downside is that we are no longer verifying objects that
we advertise by fully parsing them (however, we do still
know we actually have them, because sha1_object_info must
find them to get the type). While we might fail to detect a
corrupt object here, if the client actually fetches the
object, we will parse (and verify) it then.
On a repository with 120K refs, the advertisement portion of
upload-pack goes from ~3.4s to 3.2s (the failure to speed up
more is largely due to the fact that most of these refs are
tags, which need dereferenced to find the tag destination
anyway).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Change the skeleton implementation of i18n in Git to one that can show
localized strings to users for our C, Shell and Perl programs using
either GNU libintl or the Solaris gettext implementation.
This new internationalization support is enabled by default. If
gettext isn't available, or if Git is compiled with
NO_GETTEXT=YesPlease, Git falls back on its current behavior of
showing interface messages in English. When using the autoconf script
we'll auto-detect if the gettext libraries are installed and act
appropriately.
This change is somewhat large because as well as adding a C, Shell and
Perl i18n interface we're adding a lot of tests for them, and for
those tests to work we need a skeleton PO file to actually test
translations. A minimal Icelandic translation is included for this
purpose. Icelandic includes multi-byte characters which makes it easy
to test various edge cases, and it's a language I happen to
understand.
The rest of the commit message goes into detail about various
sub-parts of this commit.
= Installation
Gettext .mo files will be installed and looked for in the standard
$(prefix)/share/locale path. GIT_TEXTDOMAINDIR can also be set to
override that, but that's only intended to be used to test Git itself.
= Perl
Perl code that's to be localized should use the new Git::I18n
module. It imports a __ function into the caller's package by default.
Instead of using the high level Locale::TextDomain interface I've
opted to use the low-level (equivalent to the C interface)
Locale::Messages module, which Locale::TextDomain itself uses.
Locale::TextDomain does a lot of redundant work we don't need, and
some of it would potentially introduce bugs. It tries to set the
$TEXTDOMAIN based on package of the caller, and has its own
hardcoded paths where it'll search for messages.
I found it easier just to completely avoid it rather than try to
circumvent its behavior. In any case, this is an issue wholly
internal Git::I18N. Its guts can be changed later if that's deemed
necessary.
See <AANLkTilYD_NyIZMyj9dHtVk-ylVBfvyxpCC7982LWnVd@mail.gmail.com> for
a further elaboration on this topic.
= Shell
Shell code that's to be localized should use the git-sh-i18n
library. It's basically just a wrapper for the system's gettext.sh.
If gettext.sh isn't available we'll fall back on gettext(1) if it's
available. The latter is available without the former on Solaris,
which has its own non-GNU gettext implementation. We also need to
emulate eval_gettext() there.
If neither are present we'll use a dumb printf(1) fall-through
wrapper.
= About libcharset.h and langinfo.h
We use libcharset to query the character set of the current locale if
it's available. I.e. we'll use it instead of nl_langinfo if
HAVE_LIBCHARSET_H is set.
The GNU gettext manual recommends using langinfo.h's
nl_langinfo(CODESET) to acquire the current character set, but on
systems that have libcharset.h's locale_charset() using the latter is
either saner, or the only option on those systems.
GNU and Solaris have a nl_langinfo(CODESET), FreeBSD can use either,
but MinGW and some others need to use libcharset.h's locale_charset()
instead.
=Credits
This patch is based on work by Jeff Epler <jepler@unpythonic.net> who
did the initial Makefile / C work, and a lot of comments from the Git
mailing list, including Jonathan Nieder, Jakub Narebski, Johannes
Sixt, Erik Faye-Lund, Peter Krefting, Junio C Hamano, Thomas Rast and
others.
[jc: squashed a small Makefile fix from Ramsay]
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* jc/fetch-verify:
fetch: verify we have everything we need before updating our ref
rev-list --verify-object
list-objects: pass callback data to show_objects()
The traverse_commit_list() API takes two callback functions, one to show
commit objects, and the other to show other kinds of objects. Even though
the former has a callback data parameter, so that the callback does not
have to rely on global state, the latter does not.
Give the show_objects() callback the same callback data parameter.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This fixes a segfault introduced by 051e400; via it, no longer able to
trigger the http/smartserv race.
Signed-off-by: Brian Harring <ferringb@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There are two copies of traverse_commit_list callback that show the object
name followed by pathname the object was found, to produce output similar
to "rev-list --objects".
Unify them.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A request to fetch from a client over smart HTTP protocol is served in
multiple steps. In the first round, the server side shows the set of refs
it has and their values, and the client picks from them and sends "I want
to fetch the history leading to these commits".
When the server tries to respond to this second request, its refs may have
progressed by a push from elsewhere. By design, we do not allow fetching
objects that are not at the tip of an advertised ref, and the server
rejects such a request. The client needs to try again, which is not ideal
especially for a busy server.
Teach upload-pack (which is the workhorse driven by git-daemon and smart
http server interface) that it is OK for a smart-http client to ask for
commits that are not at the tip of any advertised ref, as long as they are
reachable from advertised refs.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Change upload-pack and receive-pack to use the namespace-prefixed refs
when working with the repository, and use the unprefixed refs when
talking to the client, maintaining the masquerade. This allows
clone, pull, fetch, and push to work with a suitably configured
GIT_NAMESPACE.
receive-pack advertises refs outside the current namespace as .have refs
(as it currently does for refs in alternates), so that the client can
use them to minimize data transfer but will otherwise ignore them.
With appropriate configuration, this also allows http-backend to expose
namespaces as multiple repositories with different paths. This only
requires setting GIT_NAMESPACE, which http-backend passes through to
upload-pack and receive-pack.
Signed-off-by: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: Jamey Sharp <jamey@minilop.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In a pthread-enabled version of upload-pack, there's a race condition
that can cause a deadlock on the fflush(NULL) we call from run-command.
What happens is this:
1. Upload-pack is informed we are doing a shallow clone.
2. We call start_async() to spawn a thread that will generate rev-list
results to feed to pack-objects. It gets a file descriptor to a
pipe which will eventually hook to pack-objects.
3. The rev-list thread uses fdopen to create a new output stream
around the fd we gave it, called pack_pipe.
4. The thread writes results to pack_pipe. Outside of our control,
libc is doing locking on the stream. We keep writing until the OS
pipe buffer is full, and then we block in write(), still holding
the lock.
5. The main thread now uses start_command to spawn pack-objects.
Before forking, it calls fflush(NULL) to flush every stdio output
buffer. It blocks trying to get the lock on pack_pipe.
And we have a deadlock. The thread will block until somebody starts
reading from the pipe. But nobody will read from the pipe until we
finish flushing to the pipe.
To fix this, we swap the start order: we start the
pack-objects reader first, and then the rev-list writer
after. Thus the problematic fflush(NULL) happens before we
even open the new file descriptor (and even if it didn't,
flushing should no longer block, as the reader at the end of
the pipe is now active).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* sp/maint-fetch-pack-stop-early:
enable "no-done" extension only when fetching over smart-http
* sp/maint-upload-pack-stop-early:
enable "no-done" extension only when serving over smart-http
Last night I had to make these two emergency reverts, but now we have a
better understanding of which part of the topic was broken, let's get rid
of the revert to fix it correctly.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Do not advertise no-done capability when upload-pack is not serving over
smart-http, as there is no way for this server to know when it should stop
reading in-flight data from the client, even though it is necessary to
drain all the in-flight data in order to unblock the client.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Acked-by: Shawn O. Pearce <spearce@spearce.org>
This reverts 3e63b21 (upload-pack: Implement no-done capability,
2011-03-14). Together with 761ecf0 (fetch-pack: Implement no-done
capability, 2011-03-14) it seems to make the fetch-pack process out of
sync and makes it keep talking long after upload-pack stopped listening to
it, terminating the process with SIGPIPE.
If the client requests both multi_ack_detailed and no-done then
upload-pack is free to immediately send a PACK following its first
'ACK %s ready' message. The upload-pack response actually winds
up being:
ACK %s common
... (maybe more) ...
ACK %s ready
NAK
ACK %s
PACK.... the pack stream ....
For smart HTTP connections this saves one HTTP RPC, reducing
the overall latency for a trivial fetch. For git:// and ssh://
a no-done option slightly reduces latency by removing one
server->client->server round-trip at the end of the common
ancestor negotiation.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If a client is merely following the remote (and has not made any
new commits itself), all "have %s" lines sent by the client will be
common to the server. As all lines are common upload-pack never
calls ok_to_give_up() and does not compute if it has a good cut
point in the commit graph.
Without this computation the following client is going to send all
tagged commits, as these were determined to be COMMON_REF during the
initial advertisement, but the client does not parse their history
to transitively pass the COMMON flag and empty its queue of commits.
For git.git with 339 commit tags, it takes clients 11 rounds of
negotation to fully send all tagged commits and exhaust its queue
of things to send as common. This is pretty slow for a client that
has not done any local development activity.
Force computing ok_to_give_up() and send "ACK %s ready" at the end
of the current round if this round only contained common objects
and ok_to_give_up() was therefore not called. This may allow the
client to break early, avoiding transmission of the COMMON_REFs.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This shows a trace of all packets coming in or out of a given
program. This can help with debugging object negotiation or
other protocol issues.
To keep the code changes simple, we operate at the lowest
level, meaning we don't necessarily understand what's in the
packets. The one exception is a packet starting with "PACK",
which causes us to skip that packet and turn off tracing
(since the gigantic pack data will not be interesting to
read, at least not in the trace format).
We show both written and read packets. In the local case,
this may mean you will see packets twice (written by the
sender and read by the receiver). However, for cases where
the other end is remote, this allows you to see the full
conversation.
Packet tracing can be enabled with GIT_TRACE_PACKET=<foo>,
where <foo> takes the same arguments as GIT_TRACE.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add commit_list prefix to insert_by_date function and to sort_by_date,
so it's clear that these functions refer to commit_list structure.
Signed-off-by: Thiago Farina <tfransosi@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When printing an error message saying a ref was requested that we do not
have, only print that ref, rather than the ref and everything sent to us
on the same packet line (e.g. protocol support specifications).
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A bit of history in chronological order, the newest at bottom:
- 80ccaa7 (upload-pack: Move the revision walker into a separate function.)
do_rev_list was introduced with create_full_pack argument
- 21edd3f (upload-pack: Run rev-list in an asynchronous function.)
do_rev_list was now called by start_async, create_full_pack was
passed by rev_list.data
- f0cea83 (Shift object enumeration out of upload-pack)
rev_list.data was now zero permanently. Creating full pack was
done by passing --all to pack-objects
- ae6a560 (run-command: support custom fd-set in async)
rev_list.data = 0 was found out redudant and got rid of.
Get rid of the code as well, for less headache while reading do_rev_list.
[jc: noticed by Elijah Newren]
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This patch adds the possibility to supply a set of non-0 file
descriptors for async process communication instead of the
default-created pipe.
Additionally, we now support bi-directional communiction with the
async procedure, by giving the async function both read and write
file descriptors.
To retain compatiblity and similar "API feel" with start_command,
we require start_async callers to set .out = -1 to get a readable
file descriptor. If either of .in or .out is 0, we supply no file
descriptor to the async process.
[sp: Note: Erik started this patch, and a huge bulk of it is
his work. All bugs were introduced later by Shawn.]
Signed-off-by: Erik Faye-Lund <kusmabite@gmail.com>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This hook runs after "git fetch" in the repository the objects are
fetched from as the user who fetched, and has security implications.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* sp/smart-http: (37 commits)
http-backend: Let gcc check the format of more printf-type functions.
http-backend: Fix access beyond end of string.
http-backend: Fix bad treatment of uintmax_t in Content-Length
t5551-http-fetch: Work around broken Accept header in libcurl
t5551-http-fetch: Work around some libcurl versions
http-backend: Protect GIT_PROJECT_ROOT from /../ requests
Git-aware CGI to provide dumb HTTP transport
http-backend: Test configuration options
http-backend: Use http.getanyfile to disable dumb HTTP serving
test smart http fetch and push
http tests: use /dumb/ URL prefix
set httpd port before sourcing lib-httpd
t5540-http-push: remove redundant fetches
Smart HTTP fetch: gzip requests
Smart fetch over HTTP: client side
Smart push over HTTP: client side
Discover refs via smart HTTP server when available
http-backend: more explict LocationMatch
http-backend: add example for gitweb on same URL
http-backend: use mod_alias instead of mod_rewrite
...
Conflicts:
.gitignore
remote-curl.c
In theory it is possible for sideband channel #2 to be delayed if
pack data is quick to come up for sideband channel #1. And because
data for channel #2 is read only 128 bytes at a time while pack data
is read 8192 bytes at a time, it is possible for many pack blocks to
be sent to the client before the progress message fifo is emptied,
making the situation even worse. This would result in totally garbled
progress display on the client's console as local progress gets mixed
with partial remote progress lines.
Let's prevent such situations by giving transmission priority to
progress messages over pack data at all times.
Signed-off-by: Nicolas Pitre <nico@fluxnic.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When --stateless-rpc is passed as a command line parameter to
upload-pack or receive-pack the programs now assume they may
perform only a single read-write cycle with stdin and stdout.
This fits with the HTTP POST request processing model where a
program may read the request, write a response, and must exit.
When --advertise-refs is passed as a command line parameter only
the initial ref advertisement is output, and the program exits
immediately. This fits with the HTTP GET request model, where
no request content is received but a response must be produced.
HTTP headers and/or environment are not processed here, but
instead are assumed to be handled by the program invoking
either service backend.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When multi_ack_detailed is enabled the ACK continue messages returned
by the remote upload-pack are broken out to describe the different
states within the peer. This permits the client to better understand
the server's in-memory state.
The fetch-pack/upload-pack protocol now looks like:
NAK
---------------------------------
Always sent in response to "done" if there was no common base
selected from the "have" lines (or no have lines were sent).
* no multi_ack or multi_ack_detailed:
Sent when the client has sent a pkt-line flush ("0000") and
the server has not yet found a common base object.
* either multi_ack or multi_ack_detailed:
Always sent in response to a pkt-line flush.
ACK %s
-----------------------------------
* no multi_ack or multi_ack_detailed:
Sent in response to "have" when the object exists on the remote
side and is therefore an object in common between the peers.
The argument is the SHA-1 of the common object.
* either multi_ack or multi_ack_detailed:
Sent in response to "done" if there are common objects.
The argument is the last SHA-1 determined to be common.
ACK %s continue
-----------------------------------
* multi_ack only:
Sent in response to "have".
The remote side wants the client to consider this object as
common, and immediately stop transmitting additional "have"
lines for objects that are reachable from it. The reason
the client should stop is not given, but is one of the two
cases below available under multi_ack_detailed.
ACK %s common
-----------------------------------
* multi_ack_detailed only:
Sent in response to "have". Both sides have this object.
Like with "ACK %s continue" above the client should stop
sending have lines reachable for objects from the argument.
ACK %s ready
-----------------------------------
* multi_ack_detailed only:
Sent in response to "have".
The client should stop transmitting objects which are reachable
from the argument, and send "done" soon to get the objects.
If the remote side has the specified object, it should
first send an "ACK %s common" message prior to sending
"ACK %s ready".
Clients may still submit additional "have" lines if there are
more side branches for the client to explore that might be added
to the common set and reduce the number of objects to transfer.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There were several unchecked use of fdopen(); replace them with xfdopen()
that checks and dies.
Signed-off-by: Jim Meyering <meyering@redhat.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In 2d14d65 (Use a clearer style to issue commands to remote helpers,
2009-09-03) I happened to notice two changes like this:
- write_in_full(helper->in, "list\n", 5);
+
+ strbuf_addstr(&buf, "list\n");
+ write_in_full(helper->in, buf.buf, buf.len);
+ strbuf_reset(&buf);
IMHO, it would be better to define a new function,
static inline ssize_t write_str_in_full(int fd, const char *str)
{
return write_in_full(fd, str, strlen(str));
}
and then use it like this:
- strbuf_addstr(&buf, "list\n");
- write_in_full(helper->in, buf.buf, buf.len);
- strbuf_reset(&buf);
+ write_str_in_full(helper->in, "list\n");
Thus not requiring the added allocation, and still avoiding
the maintenance risk of literal string lengths.
These days, compilers are good enough that strlen("literal")
imposes no run-time cost.
Transformed via this:
perl -pi -e \
's/write_in_full\((.*?), (".*?"), \d+\)/write_str_in_full($1, $2)/'\
$(git grep -l 'write_in_full.*"')
Signed-off-by: Jim Meyering <meyering@redhat.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
First of all, I can't find any reason why thin pack generation is
explicitly disabled when dealing with a shallow repository. The
possible delta base objects are collected from the edge commits which
are always obtained through history walking with the same shallow refs
as the client, Therefore the client is always going to have those base
objects available. So let's remove that restriction.
Then we can make shallow repository deepening much more efficient by
using the remote's unshallowed commits as edge commits to get preferred
base objects for thin pack generation. On git.git, this makes the data
transfer for the deepening of a shallow repository from depth 1 to depth 2
around 134 KB instead of 3.68 MB.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The majority of code in core git appears to use a single
space after if/for/while. This is an attempt to bring more
code to this standard. These are entirely cosmetic changes.
Signed-off-by: Brian Gianforcaro <b.gianfo@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A request to clone the repository does not give any "have" but asks for
all the refs we offer with "want". When a request does not ask to clone
the repository fully, but asks to fetch some refs into an empty
repository, it will not give any "have" but its "want" won't ask for all
the refs we offer.
If we suppose (and I would say this is a rather big if) that it makes
sense to distinguish these two cases, a hook cannot reliably do this
alone. The hook can detect lack of "have" and bunch of "want", but there
is no direct way to tell if the other end asked for all refs we offered,
or merely most of them.
Between the time we talked with the other end and the time the hook got
called, we may have acquired more refs or lost some refs in the repository
by concurrent operations. Given that we plan to introduce selective
advertisement of refs with a protocol extension, it would become even more
difficult for hooks to guess between these two cases.
This adds "kind [clone|fetch]" to hook's input, as a stable interface to
allow the hooks to tell these cases apart.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
After upload-pack successfully finishes its operation, post-upload-pack
hook can be called for logging purposes.
The hook is passed various pieces of information, one per line, from its
standard input. Currently the following items can be fed to the hook, but
more types of information may be added in the future:
want SHA-1::
40-byte hexadecimal object name the client asked to include in the
resulting pack. Can occur one or more times in the input.
have SHA-1::
40-byte hexadecimal object name the client asked to exclude from
the resulting pack, claiming to have them already. Can occur zero
or more times in the input.
time float::
Number of seconds spent for creating the packfile.
size decimal::
Size of the resulting packfile in bytes.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* cc/replace:
t6050: check pushing something based on a replaced commit
Documentation: add documentation for "git replace"
Add git-replace to .gitignore
builtin-replace: use "usage_msg_opt" to give better error messages
parse-options: add new function "usage_msg_opt"
builtin-replace: teach "git replace" to actually replace
Add new "git replace" command
environment: add global variable to disable replacement
mktag: call "check_sha1_signature" with the replacement sha1
replace_object: add a test case
object: call "check_sha1_signature" with the replacement sha1
sha1_file: add a "read_sha1_file_repl" function
replace_object: add mechanism to replace objects found in "refs/replace/"
refs: add a "for_each_replace_ref" function
upload-pack runs pack-objects, which generates progress indicator output
on its stderr. If the client requests a sideband, this indicator is sent
to the client; but if it did not, then the progress is written to
upload-pack's own stderr.
If upload-pack is itself run from git-daemon (and if the client did not
request a sideband) the progress indicator never reaches the client and it
need not be generated in the first place. With this patch the progress
indicator is suppressed in this situation.
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Offload object enumeration in upload-pack to pack-objects, but fall
back on internal revision walker for shallow interaction. Aside from
architecturally making more sense, this also leaves the door open for
pack-objects to employ a revision cache mechanism. Test t5530 updated
in order to explicitly check both enumeration methods.
Signed-off-by: Nick Edelen <sirnot@gmail.com>
Acked-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This new "read_replace_refs" global variable is set to 1 by
default, so that replace refs are used by default. But
reachability traversal and packing commands ("cmd_fsck",
"cmd_prune", "cmd_pack_objects", "upload_pack",
"cmd_unpack_objects") set it to 0, as they must work with the
original DAG.
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* lt/pack-object-memuse:
show_object(): push path_name() call further down
process_{tree,blob}: show objects without buffering
Conflicts:
builtin-pack-objects.c
builtin-rev-list.c
list-objects.c
list-objects.h
upload-pack.c
In particular, pushing the "path_name()" call _into_ the show() function
would seem to allow
- more clarity into who "owns" the name (ie now when we free the name in
the show_object callback, it's because we generated it ourselves by
calling path_name())
- not calling path_name() at all, either because we don't care about the
name in the first place, or because we are actually happy walking the
linked list of "struct name_path *" and the last component.
Now, I didn't do that latter optimization, because it would require some
more coding, but especially looking at "builtin-pack-objects.c", we really
don't even want the whole pathname, we really would be better off with the
list of path components.
Why? We use that name for two things:
- add_preferred_base_object(), which actually _wants_ to traverse the
path, and now does it by looking for '/' characters!
- for 'name_hash()', which only cares about the last 16 characters of a
name, so again, generating the full name seems to be just unnecessary
work.
Anyway, so I didn't look any closer at those things, but it did convince
me that the "show_object()" calling convention was crazy, and we're
actually better off doing _less_ in list-objects.c, and giving people
access to the internal data structures so that they can decide whether
they want to generate a path-name or not.
This patch does that, and then for people who did use the name (even if
they might do something more clever in the future), it just does the
straightforward "name = path_name(path, component); .. free(name);" thing.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Here's a less trivial thing, and slightly more dubious one.
I was looking at that "struct object_array objects", and wondering why we
do that. I have honestly totally forgotten. Why not just call the "show()"
function as we encounter the objects? Rather than add the objects to the
object_array, and then at the very end going through the array and doing a
'show' on all, just do things more incrementally.
Now, there are possible downsides to this:
- the "buffer using object_array" _can_ in theory result in at least
better I-cache usage (two tight loops rather than one more spread out
one). I don't think this is a real issue, but in theory..
- this _does_ change the order of the objects printed. Instead of doing a
"process_tree(revs, commit->tree, &objects, NULL, "");" in the loop
over the commits (which puts all the root trees _first_ in the object
list, this patch just adds them to the list of pending objects, and
then we'll traverse them in that order (and thus show each root tree
object together with the objects we discover under it)
I _think_ the new ordering actually makes more sense, but the object
ordering is actually a subtle thing when it comes to packing
efficiency, so any change in order is going to have implications for
packing. Good or bad, I dunno.
- There may be some reason why we did it that odd way with the object
array, that I have simply forgotten.
Anyway, now that we don't buffer up the objects before showing them
that may actually result in lower memory usage during that whole
traverse_commit_list() phase.
This is seriously not very deeply tested. It makes sense to me, it seems
to pass all the tests, it looks ok, but...
Does anybody remember why we did that "object_array" thing? It used to be
an "object_list" a long long time ago, but got changed into the array due
to better memory usage patterns (those linked lists of obejcts are
horrible from a memory allocation standpoint). But I wonder why we didn't
do this back then. Maybe there's a reason for it.
Or maybe there _used_ to be a reason, and no longer is.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The goal of this patch is to get rid of the "static struct rev_info
revs" static variable in "builtin-rev-list.c".
To do that, we need to pass the revs to the "show_commit" function
in "builtin-rev-list.c" and this in turn means that the
"traverse_commit_list" function in "list-objects.c" must be passed
functions pointers to functions with 2 parameters instead of one.
So we have to change all the callers and all the functions passed
to "traverse_commit_list".
Anyway this makes the code more clean and more generic, so it
should be a good thing in the long run.
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
These weren't used outside and can be safely moved
Signed-off-by: Benjamin Kramer <benny.kra@googlemail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Certain remote commands, when asked to do something in a
particular directory that was not actually a git repository,
would say "unable to chdir or not a git archive". The
"chdir" bit is an unnecessary detail, and the term "git
archive" is much less common these days than "git repository".
So let's switch them all to:
fatal: '%s' does not appear to be a git repository
Signed-off-by: Jeff King <peff@peff.net>
Acked-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Programs that use git_config need to find the global configuration.
When runtime prefix computation is enabled, this requires that
git_extract_argv0_path() is called early in the program's main().
This commit adds the necessary calls.
Signed-off-by: Steffen Prohaska <prohaska@zib.de>
Acked-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This is a mechanical conversion of all '*.c' files with:
s/((?:die|error|warning)\("git)-(\S+:)/$1 $2/;
The result was manually inspected and no false positive was found.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We will need the command invocation path in system_path(). This path was
passed to setup_path(), but system_path() can be called earlier, for
example via:
main
commit_pager_choice
setup_pager
git_config
git_etc_gitconfig
system_path
Therefore, we introduce git_set_argv0_path() and call it as soon as
possible.
Signed-off-by: Johannes Sixt <johannes.sixt@telecom.at>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In upload-pack we must explicitly close the output channel of rev-list.
(On Unix, the channel is closed automatically because process that runs
rev-list terminates.)
Signed-off-by: Johannes Sixt <johannes.sixt@telecom.at>
The new protocol extension "include-tag" allows the client side
of the connection (fetch-pack) to request that the server side of the
native git protocol (upload-pack / pack-objects) use --include-tag
as it prepares the packfile, thus ensuring that an annotated tag object
will be included in the resulting packfile if the object it refers to
was also included into the packfile.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
To facilitate testing and verification of the requests sent by
git-fetch to the remote side we permit logging the received packet
lines to the file descriptor specified in GIT_DEBUG_SEND_PACK has
been set. Special start and end lines are included to indicate
the start and end of each connection.
$ GIT_DEBUG_SEND_PACK=3 git fetch 3>UPLOAD_LOG
$ cat UPLOAD_LOG
#S
want 8e10cf4e007ad7e003463c30c34b1050b039db78 multi_ack side-band-64k thin-pack ofs-delta
want ddfa4a33562179aca1ace2bcc662244a17d0b503
#E
#S
want 3253df4d1cf6fb138b52b1938473bcfec1483223 multi_ack side-band-64k thin-pack ofs-delta
#E
>From the above trace the first connection opened by git-fetch was to
download two refs (with values 8e and dd) and the second connection
was opened to automatically follow an annotated tag (32).
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* mk/maint-parse-careful:
receive-pack: use strict mode for unpacking objects
index-pack: introduce checking mode
unpack-objects: prevent writing of inconsistent objects
unpack-object: cache for non written objects
add common fsck error printing function
builtin-fsck: move common object checking code to fsck.c
builtin-fsck: reports missing parent commits
Remove unused object-ref code
builtin-fsck: move away from object-refs to fsck_walk
add generic, type aware object chain walker
Conflicts:
Makefile
builtin-fsck.c
* mk/maint-parse-careful:
peel_onion: handle NULL
check return value from parse_commit() in various functions
parse_commit: don't fail, if object is NULL
revision.c: handle tag->tagged == NULL
reachable.c::process_tree/blob: check for NULL
process_tag: handle tag->tagged == NULL
check results of parse_commit in merge_bases
list-objects.c::process_tree/blob: check for NULL
reachable.c::add_one_tree: handle NULL from lookup_tree
mark_blob/tree_uninteresting: check for NULL
get_sha1_oneline: check return value of parse_object
read_object_with_reference: don't read beyond the buffer
A failure in prepare_revision_walk can be caused by
a not parseable object.
Signed-off-by: Martin Koegler <mkoegler@auto.tuwien.ac.at>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since git-upload-pack has to spawn git-pack-objects, it has to make sure
that the latter can be found in the PATH. Without this patch an attempt
to clone or pull via ssh from a server fails if the git tools are not in
the standard PATH on the server even though git clone or git pull were
invoked with --upload-pack=/path/to/git-upload-pack.
Signed-off-by: Johannes Sixt <johannes.sixt@telecom.at>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
upload-pack spawns two processes, rev-list and pack-objects, and carefully
monitors their status so that it can report failure to the remote end.
This change removes the complicated procedures on the grounds of the
following observations:
- If everything is OK, rev-list closes its output pipe end, upon which
pack-objects (which reads from the pipe) sees EOF and terminates itself,
closing its output (and error) pipes. upload-pack reads from both until
it sees EOF in both. It collects the exit codes of the child processes
(which indicate success) and terminates successfully.
- If rev-list sees an error, it closes its output and terminates with
failure. pack-objects sees EOF in its input and terminates successfully.
Again upload-pack reads its inputs until EOF. When it now collects
the exit codes of its child processes, it notices the failure of rev-list
and signals failure to the remote end.
- If pack-objects sees an error, it terminates with failure. Since this
breaks the pipe to rev-list, rev-list is killed with SIGPIPE.
upload-pack reads its input until EOF, then collects the exit codes of
the child processes, notices their failures, and signals failure to the
remote end.
- If upload-pack itself dies unexpectedly, pack-objects is killed with
SIGPIPE, and subsequently also rev-list.
The upshot of this is that precise monitoring of child processes is not
required because both terminate if either one of them dies unexpectedly.
This allows us to use finish_command() and finish_async() instead of
an explicit waitpid(2) call.
The change is smaller than it looks because most of it only reduces the
indentation of a large part of the inner loop.
Signed-off-by: Johannes Sixt <johannes.sixt@telecom.at>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This gets rid of an explicit fork().
Since upload-pack has to coordinate two processes (rev-list and
pack-objects), we cannot use the normal finish_async(), but have to monitor
the process explicitly. Hence, there are no changes at this front.
Signed-off-by: Johannes Sixt <johannes.sixt@telecom.at>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
This allows us later to use start_async() with this function, and at
the same time is a nice cleanup that makes a long function
(create_pack_file()) shorter.
Signed-off-by: Johannes Sixt <johannes.sixt@telecom.at>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
This gets rid of an explicit fork/exec.
Since upload-pack has to coordinate two processes (rev-list and
pack-objects), we cannot use the normal finish_command(), but have to
monitor the processes explicitly. Hence, the waitpid() call remains.
Signed-off-by: Johannes Sixt <johannes.sixt@telecom.at>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
This uses "git-apply --whitespace=strip" to fix whitespace errors that have
crept in to our source files over time. There are a few files that need
to have trailing whitespaces (most notably, test vectors). The results
still passes the test, and build result in Documentation/ area is unchanged.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Right now, we don't close the read end of the pipe when git-upload-pack
runs git-pack-object, so we hang forever (why don't we get SIGALRM?)
instead of dying with SIGPIPE if the latter dies, which seems to be the
norm if the client disconnects.
Thanks to Johannes Schindelin <Johannes.Schindelin@gmx.de> for
pointing out where this close() needed to go.
This patch has been tested on kernel.org for several weeks and appear
to resolve the problem of git-upload-pack processes hanging around
forever.
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Junio C Hamano <junkio@cox.net>
* 'js/fetch-progress' (early part):
Fixup no-progress for fetch & clone
fetch & clone: do not output progress when not on a tty
Conflicts:
git-fetch.sh
The intent of the commit 'fetch & clone: do not output progress when
not on a tty' was to make fetching and cloning less chatty when
output was not redirected (such as in a cron job).
However, there was a serious thinko in that commit. It assumed that
the client _and_ the server got this update at the same time. But
this is obviously not the case, and therefore upload-pack died on
seeing the option "--no-progress".
This patch fixes that issue by making it a protocol option. So, until
your server is updated, you still see the progress, but once the
server has this patch, it will be quiet.
A minor issue was also fixed: when cloning, the checkout did not
heed no_progress.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <junkio@cox.net>
Previous step converted use of strncmp() with literal string
mechanically even when the result is only used as a boolean:
if (!strncmp("foo", arg, 3)) ==> if (!(-prefixcmp(arg, "foo")))
This step manually cleans them up to read:
if (!prefixcmp(arg, "foo"))
Signed-off-by: Junio C Hamano <junkio@cox.net>
This mechanically converts strncmp() to use prefixcmp(), but only when
the parameters match specific patterns, so that they can be verified
easily. Leftover from this will be fixed in a separate step, including
idiotic conversions like
if (!strncmp("foo", arg, 3))
=>
if (!(-prefixcmp(arg, "foo")))
This was done by using this script in px.perl
#!/usr/bin/perl -i.bak -p
if (/strncmp\(([^,]+), "([^\\"]*)", (\d+)\)/ && (length($2) == $3)) {
s|strncmp\(([^,]+), "([^\\"]*)", (\d+)\)|prefixcmp($1, "$2")|;
}
if (/strncmp\("([^\\"]*)", ([^,]+), (\d+)\)/ && (length($1) == $3)) {
s|strncmp\("([^\\"]*)", ([^,]+), (\d+)\)|(-prefixcmp($2, "$1"))|;
}
and running:
$ git grep -l strncmp -- '*.c' | xargs perl px.perl
Signed-off-by: Junio C Hamano <junkio@cox.net>
This adds the option "--no-progress" to fetch-pack and upload-pack,
and makes fetch and clone pass this option when stdout is not a tty.
While at documenting that option, also document --strict and --timeout
options for upload-pack.
Signed-off-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Signed-off-by: Junio C Hamano <junkio@cox.net>
We currently do not support fetching/cloning from a shallow repository
nor pushing into one. Make sure these are not attempted so that we
do not have to worry about corrupting repositories needlessly.
Signed-off-by: Junio C Hamano <junkio@cox.net>
We have a number of badly checked write() calls. Often we are
expecting write() to write exactly the size we requested or fail,
this fails to handle interrupts or short writes. Switch to using
the new write_in_full(). Otherwise we at a minimum need to check
for EINTR and EAGAIN, where this is appropriate use xwrite().
Note, the changes to config handling are much larger and handled
in the next patch in the sequence.
Signed-off-by: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
We have a number of badly checked read() calls. Often we are
expecting read() to read exactly the size we requested or fail, this
fails to handle interrupts or short reads. Add a read_in_full()
providing those semantics. Otherwise we at a minimum need to check
for EINTR and EAGAIN, where this is appropriate use xread().
Signed-off-by: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
This is to adjust to:
count-objects -v: show number of packs as well.
which will break a test in this series.
Signed-off-by: Junio C Hamano <junkio@cox.net>
This is a mechanical clean-up of the way *.c files include
system header files.
(1) sources under compat/, platform sha-1 implementations, and
xdelta code are exempt from the following rules;
(2) the first #include must be "git-compat-util.h" or one of
our own header file that includes it first (e.g. config.h,
builtin.h, pkt-line.h);
(3) system headers that are included in "git-compat-util.h"
need not be included in individual C source files.
(4) "git-compat-util.h" does not have to include subsystem
specific header files (e.g. expat.h).
Signed-off-by: Junio C Hamano <junkio@cox.net>
A commit may have been put on the shallow list, and then reached from
another branch and marked NOT_SHALLOW without being removed from the
list.
Signed-off-by: Alexandre Julliard <julliard@winehq.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
Now, by saying "git fetch -depth <n> <repo>" you can deepen
a shallow repository.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <junkio@cox.net>
By specifying a depth, you can now clone a repository such that
all fetched ancestor-chains' length is at most "depth". For example,
if the upstream repository has only 2 branches ("A" and "B"), which
are linear, and you specify depth 3, you will get A, A~1, A~2, A~3,
B, B~1, B~2, and B~3. The ends are automatically made shallow
commits.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <junkio@cox.net>
A shallow commit is a commit which has parents, which in turn are
"grafted away", i.e. the commit appears as if it were a root.
Since these shallow commits should not be edited by the user, but
only by core git, they are recorded in the file $GIT_DIR/shallow.
A repository containing shallow commits is called shallow.
The advantage of a shallow repository is that even if the upstream
contains lots of history, your local (shallow) repository needs not
occupy much disk space.
The disadvantage is that you might miss a merge base when pulling
some remote branch.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <junkio@cox.net>
It is trivial to do now, and it is needed for the upcoming shallow
clone stuff.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <junkio@cox.net>
* lj/refs: (63 commits)
Fix show-ref usagestring
t3200: git-branch testsuite update
sha1_name.c: avoid compilation warnings.
Make git-branch a builtin
ref-log: fix D/F conflict coming from deleted refs.
git-revert with conflicts to behave as git-merge with conflicts
core.logallrefupdates thinko-fix
git-pack-refs --all
core.logallrefupdates create new log file only for branch heads.
Remove bashism from t3210-pack-refs.sh
ref-log: allow ref@{count} syntax.
pack-refs: call fflush before fsync.
pack-refs: use lockfile as everybody else does.
git-fetch: do not look into $GIT_DIR/refs to see if a tag exists.
lock_ref_sha1_basic does not remove empty directories on BSD
Do not create tag leading directories since git update-ref does it.
Check that a tag exists using show-ref instead of looking for the ref file.
Use git-update-ref to delete a tag instead of rm()ing the ref file.
Fix refs.c;:repack_without_ref() clean-up path
Clean up "git-branch.sh" and add remove recursive dir test cases.
...
There is no reason not to always do this when both ends agree.
Therefore a client that can accept offsets to delta base always sends
the "ofs-delta" flag. The server will stream a pack with or without
offset to delta base depending on whether that flag is provided or not
with no additional cost.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
This adds a "int *flag" parameter to resolve_ref() and makes
for_each_ref() family to call callback function with an extra
"int flag" parameter. They are used to give two bits of
information (REF_ISSYMREF and REF_ISPACKED) about the ref.
Signed-off-by: Junio C Hamano <junkio@cox.net>
This is a long overdue fix to the API for for_each_ref() family
of functions. It allows the callers to specify a callback data
pointer, so that the caller does not have to use static
variables to communicate with the callback funciton.
The updated for_each_ref() family takes a function of type
int (*fn)(const char *, const unsigned char *, void *)
and a void pointer as parameters, and calls the function with
the name of the ref and its SHA-1 with the caller-supplied void
pointer as parameters.
The commit updates two callers, builtin-name-rev.c and
builtin-pack-refs.c as an example.
Signed-off-by: Junio C Hamano <junkio@cox.net>
The original side-band support added to the upload-pack protocol used the
default 1000-byte packet length. The pkt-line format allows up to 64k, so
prepare the receiver for the maximum size, and have the uploader and
downloader negotiate if larger packet length is allowed.
Signed-off-by: Junio C Hamano <junkio@cox.net>
The server side support; this is just the very low level, and the
caller needs to know which band it wants to send things out.
Signed-off-by: Junio C Hamano <junkio@cox.net>
(cherry picked from b786552b67878c7780c50def4c069d46dc54efbe commit)
This abstracts away the size of the hash values when copying them
from memory location to memory location, much as the introduction
of hashcmp abstracted away hash value comparsion.
A few call sites were using char* rather than unsigned char* so
I added the cast rather than open hashcpy to be void*. This is a
reasonable tradeoff as most call sites already use unsigned char*
and the existing hashcmp is also declared to be unsigned char*.
[jc: Splitted the patch to "master" part, to be followed by a
patch for merge-recursive.c which is not in "master" yet.
Fixed the cast in the latter hunk to combine-diff.c which was
wrong in the original.
Also converted ones left-over in combine-diff.c, diff-lib.c and
upload-pack.c ]
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
[jc: I needed to hand merge the changes to the updated codebase,
so the result needs to be checked.]
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Junio C Hamano <junkio@cox.net>
When the downloader has more roots than we do, we will see many
"have" that leads to the root we do not have and let the other
side walk all the way to that root. Tell them to stop sending the
branch'es ancestors by sending a fake "ACK" when we already have
common ancestor for the wanted refs.
Signed-off-by: Junio C Hamano <junkio@cox.net>
No changes to what it does, but separating the codepath clearly
with if ... else if ... chain makes it easier to follow.
Signed-off-by: Junio C Hamano <junkio@cox.net>
This updates the type-enumeration constants introduced to reduce
the memory footprint of "struct object" to match the type bits
already used in the packfile format, by removing the former
(i.e. TYPE_* constant macros) and using the latter (i.e. enum
object_type) throughout the code for consistency.
Eventually we can stop passing around the "type strings"
entirely, and this will help - no confusion about two different
integer enumeration.
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
By using object_array data structure, lift the old limitation of
MAX_HAS/MAX_NEEDS. While we are at it, rename the variables
that hold the objects we use to compute common ancestor to match
the message used at the protocol level. What the other end has
and we also do are "have"s, and what the other end asks for are
"want"s.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Merlyn reports that <sys/poll.h> on OpenBSD 3.8 includes <ctype.h>
and having our custom ctype (done in git-compat-util.h which is
included via cache.h) makes upload-pack.c uncompilable. Try to
work it around by including the system headers first.
Signed-off-by: Junio C Hamano <junkio@cox.net>
This implements a protocol extension between fetch-pack and
upload-pack to allow stderr stream from upload-pack (primarily
used for the progress bar display) to be passed back.
Signed-off-by: Junio C Hamano <junkio@cox.net>