Until recently, get_remote_heads only knew how to read refs
from a file descriptor. To hack around this, we spawned a
thread (or forked a process) to write the buffer back to us.
Now that we can just pass it our buffer directly, we don't
have to use this hack anymore.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Now that we can read packet data from memory as easily as a
descriptor, get_remote_heads can take either one as a
source. This will allow further refactoring in remote-curl.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The packet_read function reads from a descriptor. The
packet_get_line function is similar, but reads from an
in-memory buffer, and uses a completely separate
implementation. This patch teaches the generic packet_read
function to accept either source, and we can do away with
packet_get_line's implementation.
There are two other differences to account for between the
old and new functions. The first is that we used to read
into a strbuf, but now read into a fixed size buffer. The
only two callers are fine with that, and in fact it
simplifies their code, since they can use the same
static-buffer interface as the rest of the packet_read_line
callers (and we provide a similar convenience wrapper for
reading from a buffer rather than a descriptor).
This is technically an externally-visible behavior change in
that we used to accept arbitrary sized packets up to 65532
bytes, and now cap out at LARGE_PACKET_MAX, 65520. In
practice this doesn't matter, as we use it only for parsing
smart-http headers (of which there is exactly one defined,
and it is small and fixed-size). And any extension headers
would be breaking the protocol to go over LARGE_PACKET_MAX
anyway.
The other difference is that packet_get_line would return
on error rather than dying. However, both callers of
packet_get_line are actually improved by dying.
The first caller does its own error checking, but we can
drop that; as a result, we'll actually get more specific
reporting about protocol breakage when packet_read dies
internally. The only downside is that packet_read will not
print the smart-http URL that failed, but that's not a big
deal; anybody not debugging can already see the remote's URL
already, and anybody debugging would want to run with
GIT_CURL_VERBOSE anyway to see way more information.
The second caller, which is just trying to skip past any
extra smart-http headers (of which there are none defined,
but which we allow to keep room for future expansion), did
not error check at all. As a result, it would treat an error
just like a flush packet. The resulting mess would generally
cause an error later in get_remote_heads, but now we get
error reporting much closer to the source of the problem.
Brown-paper-bag-fixes-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Most of the callers of packet_read_line just read into a
static 1000-byte buffer (callers which handle arbitrary
binary data already use LARGE_PACKET_MAX). This works fine
in practice, because:
1. The only variable-sized data in these lines is a ref
name, and refs tend to be a lot shorter than 1000
characters.
2. When sending ref lines, git-core always limits itself
to 1000 byte packets.
However, the only limit given in the protocol specification
in Documentation/technical/protocol-common.txt is
LARGE_PACKET_MAX; the 1000 byte limit is mentioned only in
pack-protocol.txt, and then only describing what we write,
not as a specific limit for readers.
This patch lets us bump the 1000-byte limit to
LARGE_PACKET_MAX. Even though git-core will never write a
packet where this makes a difference, there are two good
reasons to do this:
1. Other git implementations may have followed
protocol-common.txt and used a larger maximum size. We
don't bump into it in practice because it would involve
very long ref names.
2. We may want to increase the 1000-byte limit one day.
Since packets are transferred before any capabilities,
it's difficult to do this in a backwards-compatible
way. But if we bump the size of buffer the readers can
handle, eventually older versions of git will be
obsolete enough that we can justify bumping the
writers, as well. We don't have plans to do this
anytime soon, but there is no reason not to start the
clock ticking now.
Just bumping all of the reading bufs to LARGE_PACKET_MAX
would waste memory. Instead, since most readers just read
into a temporary buffer anyway, let's provide a single
static buffer that all callers can use. We can further wrap
this detail away by having the packet_read_line wrapper just
use the buffer transparently and return a pointer to the
static storage. That covers most of the cases, and the
remaining ones already read into their own LARGE_PACKET_MAX
buffers.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Having the packet sizes defined near the packet read/write
functions makes more sense.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The packets sent during ref negotiation are all terminated
by newline; even though the code to chomp these newlines is
short, we end up doing it in a lot of places.
This patch teaches packet_read_line to auto-chomp the
trailing newline; this lets us get rid of a lot of inline
chomping code.
As a result, some call-sites which are not reading
line-oriented data (e.g., when reading chunks of packfiles
alongside sideband) transition away from packet_read_line to
the generic packet_read interface. This patch converts all
of the existing callsites.
Since the function signature of packet_read_line does not
change (but its behavior does), there is a possibility of
new callsites being introduced in later commits, silently
introducing an incompatibility. However, since a later
patch in this series will change the signature, such a
commit would have to be merged directly into this commit,
not to the tip of the series; we can therefore ignore the
issue.
This is an internal cleanup and should produce no change of
behavior in the normal case. However, there is one corner
case to note. Callers of packet_read_line have never been
able to tell the difference between a flush packet ("0000")
and an empty packet ("0004"), as both cause packet_read_line
to return a length of 0. Readers treat them identically,
even though Documentation/technical/protocol-common.txt says
we must not; it also says that implementations should not
send an empty pkt-line.
By stripping out the newline before the result gets to the
caller, we will now treat the newline-only packet ("0005\n")
the same as an empty packet, which in turn gets treated like
a flush packet. In practice this doesn't matter, as neither
empty nor newline-only packets are part of git's protocols
(at least not for the line-oriented bits, and readers who
are not expecting line-oriented packets will be calling
packet_read directly, anyway). But even if we do decide to
care about the distinction later, it is orthogonal to this
patch. The right place to tighten would be to stop treating
empty packets as flush packets, and this change does not
make doing so any harder.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Originally we had a single function for reading packetized
data: packet_read_line. Commit 46284dd grew a more "gentle"
form, packet_read, that returns an error instead of dying
upon reading a truncated input stream. However, it is not
clear from the names which should be called, or what the
difference is.
Let's instead make packet_read be a generic public interface
that can take option flags, and update the single callsite
that uses it. This is less code, more clear, and paves the
way for introducing more options into the generic interface
later. The function signature is changed, so there should be
no hidden conflicts with topics in flight.
While we're at it, we'll document how error conditions are
handled based on the options, and rename the confusing
"return_line_fail" option to "gentle_on_eof". While we are
cleaning up the names, we can drop the "return_line_fail"
checks in packet_read_internal entirely. They look like
this:
ret = safe_read(..., return_line_fail);
if (return_line_fail && ret < 0)
...
The check for return_line_fail is a no-op; safe_read will
only ever return an error value if return_line_fail was true
in the first place.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This is just write_or_die by another name. The one
distinction is that write_or_die will treat EPIPE specially
by suppressing error messages. That's fine, as we die by
SIGPIPE anyway (and in the off chance that it is disabled,
write_or_die will simulate it).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The comment describing the packet writing interface was
originally written above packet_write, but migrated to be
above safe_write in f3a3214, probably because it is meant to
generally describe the packet writing interface and not a
single function. Let's move it into the header file, where
users of the interface are more likely to see it.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The write_or_die function will always die on an error,
including EPIPE. However, it currently treats EPIPE
specially by suppressing any error message, and by exiting
with exit code 0.
Suppressing the error message makes some sense; a pipe death
may just be a sign that the other side is not interested in
what we have to say. However, exiting with a successful
error code is not a good idea, as write_or_die is frequently
used in cases where we want to be careful about having
written all of the output, and we may need to signal to our
caller that we have done so (e.g., you would not want a push
whose other end has hung up to report success).
This distinction doesn't typically matter in git, because we
do not ignore SIGPIPE in the first place. Which means that
we will not get EPIPE, but instead will just die when we get
a SIGPIPE. But it's possible for a default handler to be set
by a parent process, or for us to add a callsite inside one
of our few SIGPIPE-ignoring blocks of code.
This patch converts write_or_die to actually raise SIGPIPE
when we see EPIPE, rather than exiting with zero. This
brings the behavior in line with the "normal" case that we
die from SIGPIPE (and any callers who want to check why we
died will see the same thing). We also give the same
treatment to other related functions, including
write_or_whine_pipe and maybe_flush_or_die.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The current parsing scheme for upload-archive is to pack
arguments into a fixed-size buffer, separated by NULs, and
put a pointer to each argument in the buffer into a
fixed-size argv array.
This works fine, and the limits are high enough that nobody
reasonable is going to hit them, but it makes the code hard
to follow. Instead, let's just stuff the arguments into an
argv_array, which is much simpler. That lifts the "all
arguments must fit inside 4K together" limit.
We could also trivially lift the MAX_ARGS limitation (in
fact, we have to keep extra code to enforce it). But that
would mean a client could force us to allocate an arbitrary
amount of memory simply by sending us "argument" lines. By
limiting the MAX_ARGS, we limit an attacker to about 4
megabytes (64 times a maximum 64K packet buffer). That may
sound like a lot compared to the 4K limit, but it's not a
big deal compared to what git-archive will actually allocate
while working (e.g., to load blobs into memory). The
important thing is that it is bounded.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
According to the comment, enter_repo will modify its input.
However, this has not been the case since 1c64b48
(enter_repo: do not modify input, 2011-10-04). Drop the
now-useless copy.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This code predates prefixcmp, so it used memcmp along with
static sizes. Replacing these memcmps with prefixcmp makes
the code much more readable, and the lack of static sizes
will make refactoring it in future patches simpler.
Note that we used to be unnecessarily liberal in parsing the
"unpack" status line, and would accept "unpack ok\njunk". No
version of git has ever produced that, and it violates the
BNF in Documentation/technical/pack-protocol.txt. Let's take
this opportunity to tighten the check by converting the
prefix comparison into a strcmp.
While we're in the area, let's also fix a vague error
message that does not follow our usual conventions (it
writes directly to stderr and does not use the "error:"
prefix).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When we read acks from the remote, we expect either:
ACK <sha1>
or
ACK <sha1> <multi-ack-flag>
We parse the "ACK <sha1>" bit from the line, and then start
looking for the flag strings at "line+45"; if we don't have
them, we assume it's of the first type. But if we do have
the first type, then line+45 is not necessarily inside our
string at all!
It turns out that this works most of the time due to the way
we parse the packets. They should come in with a newline,
and packet_read puts an extra NUL into the buffer, so we end
up with:
ACK <sha1>\n\0
with the newline at offset 44 and the NUL at offset 45. We
then strip the newline, putting a NUL at offset 44. So
when we look at "line+45", we are looking past the end of
our string; but it's OK, because we hit the terminator from
the original string.
This breaks down, however, if the other side does not
terminate their packets with a newline. In that case, our
packet is one character shorter, and we start looking
through uninitialized memory for the flag. No known
implementation sends such a packet, so it has never come up
in practice.
This patch tightens the check by looking for a short,
flagless ACK before trying to parse the flag.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If you set the GIT_DEBUG_SEND_PACK environment variable,
upload-pack will dump lines it receives in the receive_needs
phase to a descriptor. This debugging harness is a strict
subset of what GIT_TRACE_PACKET can do. Let's just drop it
in favor of that.
A few tests used GIT_DEBUG_SEND_PACK to confirm which
objects get sent; we have to adapt them to the new output
format.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When the client tells us it has a shallow object via
"shallow <sha1>", we make sure we have the object, mark it
with a flag, then add it to a dynamic array of shallow
objects. This means that a client can get us to allocate
arbitrary amounts of memory just by flooding us with shallow
lines (whether they have the objects or not). You can
demonstrate it easily with:
yes '0035shallow e83c5163316f89bfbde7d9ab23ca2e25604af290' |
git-upload-pack git.git
We already protect against duplicates in want lines by
checking if our flag is already set; let's do the same thing
here. Note that a client can still get us to allocate some
amount of memory by marking every object in the repo as
"shallow" (or "want"). But this at least bounds it with the
number of objects in the repository, which is not under the
control of an upload-pack client.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When we receive a line like "shallow <sha1>" from the
client, we feed the <sha1> part to get_sha1. This is a
mistake, as the argument on a shallow line is defined by
Documentation/technical/pack-protocol.txt to contain an
"obj-id". This is never defined in the BNF, but it is clear
from the text and from the other uses that it is meant to be
a hex sha1, not an arbitrary identifier (and that is what
fetch-pack has always sent).
We should be using get_sha1_hex instead, which doesn't allow
the client to request arbitrary junk like "HEAD@{yesterday}".
Because this is just marking shallow objects, the client
couldn't actually do anything interesting (like fetching
objects from unreachable reflog entries), but we should keep
our parsing tight to be on the safe side.
Because get_sha1 is for the most part a superset of
get_sha1_hex, in theory the only behavior change should be
disallowing non-hex object references. However, there is
one interesting exception: get_sha1 will only parse
a 40-character hex sha1 if the string has exactly 40
characters, whereas get_sha1_hex will just eat the first 40
characters, leaving the rest. That means that current
versions of git-upload-pack will not accept a "shallow"
packet that has a trailing newline, even though the protocol
documentation is clear that newlines are allowed (even
encouraged) in non-binary parts of the protocol.
This never mattered in practice, though, because fetch-pack,
contrary to the protocol documentation, does not include a
newline in its shallow lines. JGit follows its lead (though
it correctly is strict on the parsing end about wanting a
hex object id).
We do not adjust fetch-pack to send newlines here, as it
would break communication with older versions of git (and
there is no actual benefit to doing so, except for
consistency with other parts of the protocol).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Allow the server side to redact the refs/ namespace it shows to the
client.
Will merge to 'master'.
* jc/hidden-refs:
upload/receive-pack: allow hiding ref hierarchies
upload-pack: simplify request validation
upload-pack: share more code
Add diff.algorithm configuration so that the user does not type
"diff --histogram".
* mp/diff-algo-config:
diff: Introduce --diff-algorithm command line option
config: Introduce diff.algorithm variable
git-completion.bash: Autocomplete --minimal and --histogram for git-diff
Allows skipping the untracked check GIT_PS1_SHOWUNTRACKEDFILES
asks for the git-prompt (in contrib/) per repository.
* mw/bash-prompt-show-untracked-config:
t9903: add extra tests for bash.showDirtyState
t9903: add tests for bash.showUntrackedFiles
shell prompt: add bash.showUntrackedFiles option
"git log --grep=<pattern>" used to look for the pattern in literal
bytes of the commit log message and ignored the log-output encoding.
* jk/read-commit-buffer-data-after-free:
log: re-encode commit messages before grepping
"make COMPUTE_HEADER_DEPENDENCIES=no clean" would try to run "rm
-rf $(dep_dirs)" with an empty dep_dir, but some implementations of
"rm -rf" barf on an empty argument list.
* mk/make-rm-depdirs-could-be-empty:
Makefile: don't run "rm" without any files
The merge at 5bf72ed2 missed another instance of <filepattern> that
we were converting to <pathspec>.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
acd2a45 (Refuse updating the current branch in a non-bare repository
via push, 2009-02-11) changed the default to refuse such a push, but
it forgot to update the docs.
7d182f5 (Documentation: receive.denyCurrentBranch defaults to
'refuse', 2010-03-17) updated Documentation/config.txt, but forgot to
update the user manual.
Signed-off-by: W. Trevor King <wking@tremily.us>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Refactors a lot of repetitive code sequence from the graph drawing
code and adds it to the combined diff output.
* jk/diff-graph-cleanup:
combine-diff.c: teach combined diffs about line prefix
diff.c: use diff_line_prefix() where applicable
diff: add diff_line_prefix function
diff.c: make constant string arguments const
diff: write prefix to the correct file
graph: output padding for merge subsequent parents
* bw/get-tz-offset-perl:
cvsimport: format commit timestamp ourselves without using strftime
perl/Git.pm: fix get_tz_offset to properly handle DST boundary cases
Move Git::SVN::get_tz to Git::get_tz_offset
Use a new helper that prints a message and counts its display width
to align the help messages parse-options produces.
* jx/utf8-printf-width:
Add utf8_fprintf helper that returns correct number of columns
Instead of requiring the full 40-hex object names on the index
line, we can read submodule commit object names from the textual
diff when synthesizing a fake ancestore tree for "git am -3".
* jc/extended-fake-ancestor-for-gitlink:
apply: verify submodule commit object name better
contrib/subtree updates, but here are only the ones that looked
ready. The remainder of the patches will have another day.
* dg/subtree-fixes:
contrib/subtree: make the manual directory if needed
contrib/subtree: honor DESTDIR
contrib/subtree: fix synopsis
contrib/subtree: better error handling for 'subtree add'
contrib/subtree: use %B for split subject/body
contrib/subtree: remove test number comments
Add 3 extra tests for the bash.showDirtyState config option; the
tests now cover all combinations of the shell var being set/unset
and the config option being missing/enabled/disabled, given a dirty
file.
Signed-off-by: Martin Erik Werner <martinerikwerner@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add 4 tests for the bash.showUntrackedFiles config option, covering
all combinations of the shell var being set/unset and the config
option being enabled/disabled (the other 2 cases, missing config
with and without shell variable, are already covered by existing
tests).
Signed-off-by: Martin Erik Werner <martinerikwerner@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When COMPUTE_HEADER_DEPENDENCIES is set to "auto" and the compiler
does not support it, $(dep_dirs) becomes empty. "make clean" runs
"rm -rf $(dep_dirs)", which can fail in such a case.
Signed-off-by: Matt Kraai <matt.kraai@amo.abbott.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a config option 'bash.showUntrackedFiles' which allows enabling
the prompt showing untracked files on a per-repository basis. This is
useful for some repositories where the 'git ls-files ...' command may
take a long time.
Signed-off-by: Martin Erik Werner <martinerikwerner@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commit eff80a9 (Allow custom "comment char") introduced a custom comment
character for commit messages but did not teach git-rebase--interactive
to use it.
Change git-rebase--interactive to read core.commentchar and use its
value when generating commit messages and for the command list.
Signed-off-by: John Keeping <john@keeping.me.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>