strbuf_getwholeline() did not NUL-terminate the buffer on certain
corner cases in its error codepath.
* jk/getwholeline-getdelim-empty:
strbuf_getwholeline: NUL-terminate getdelim buffer on error
Commit 0cc30e0 (strbuf_getwholeline: use getdelim if it is
available, 2015-04-16) tries to clean up after getdelim()
returns EOF, but gets one case wrong, which can lead in some
obscure cases to us reading uninitialized memory.
After getdelim() returns -1, we re-initialize the strbuf
only if sb->buf is NULL. The thinking was that either:
1. We fed an existing allocated buffer to getdelim(), and
at most it would have realloc'd, leaving our NUL in
place.
2. We didn't have a buffer to feed, so we gave getdelim()
NULL; sb->buf will remain NULL, and we just want to
restore the empty slopbuf.
But that second case isn't quite right. getdelim() may
allocate a buffer, write nothing into it, and then return
EOF. The resulting strbuf rightfully has sb->len set to "0",
but is missing the NUL terminator in the first byte.
Most call-sites are fine with this. They see the EOF and
don't bother looking at the strbuf. Or they notice that
sb->len is empty, and don't look at the contents. But
there's at least one case that does neither, and relies on
parsing the resulting (possibly zero-length) string:
fast-import. You can see this in action with the new test
(though we probably only notice failure there when run with
--valgrind or ASAN).
We can fix this by unconditionally resetting the strbuf when
we have a buffer after getdelim(). That fixes case 2 above.
Case 1 is probably already fine in practice, but it does not
hurt for us to re-assert our invariants (especially because
we are relying on whatever getdelim() happens to do, which
may vary from platform to platform). Our fix covers that
case, too.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We do not want the output to be interrupted by a NUL byte, so we
cannot use raw fputs. Introduce strbuf_write to avoid having long
arguments in run-command.c.
Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Update various codepaths to avoid manually-counted malloc().
* jk/tighten-alloc: (22 commits)
ewah: convert to REALLOC_ARRAY, etc
convert ewah/bitmap code to use xmalloc
diff_populate_gitlink: use a strbuf
transport_anonymize_url: use xstrfmt
git-compat-util: drop mempcpy compat code
sequencer: simplify memory allocation of get_message
test-path-utils: fix normalize_path_copy output buffer size
fetch-pack: simplify add_sought_entry
fast-import: simplify allocation in start_packfile
write_untracked_extension: use FLEX_ALLOC helper
prepare_{git,shell}_cmd: use argv_array
use st_add and st_mult for allocation size computation
convert trivial cases to FLEX_ARRAY macros
use xmallocz to avoid size arithmetic
convert trivial cases to ALLOC_ARRAY
convert manual allocations to argv_array
argv-array: add detach function
add helpers for allocating flex-array structs
harden REALLOC_ARRAY and xcalloc against size_t overflow
tree-diff: catch integer overflow in combine_diff_path allocation
...
We frequently allocate strings as xmalloc(len + 1), where
the extra 1 is for the NUL terminator. This can be done more
simply with xmallocz, which also checks for integer
overflow.
There's no case where switching xmalloc(n+1) to xmallocz(n)
is wrong; the result is the same length, and malloc made no
guarantees about what was in the buffer anyway. But in some
cases, we can stop manually placing NUL at the end of the
allocated buffer. But that's only safe if it's clear that
the contents will always fill the buffer.
In each case where this patch does so, I manually examined
the control flow, and I tried to err on the side of caution.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The preliminary clean-up for jc/peace-with-crlf topic.
* jc/strbuf-getline:
strbuf: give strbuf_getline() to the "most text friendly" variant
checkout-index: there are only two possible line terminations
update-index: there are only two possible line terminations
check-ignore: there are only two possible line terminations
check-attr: there are only two possible line terminations
mktree: there are only two possible line terminations
strbuf: introduce strbuf_getline_{lf,nul}()
strbuf: make strbuf_getline_crlf() global
strbuf: miniscule style fix
Now there is no direct caller to strbuf_getline(), we can demote it
to file-scope static that is private to strbuf.c and rename it to
strbuf_getdelim(). Rename strbuf_getline_crlf(), which is designed
to be the most "text friendly" variant, and allow it to take over
this simplest name, strbuf_getline(), so we can add more uses of it
without having to type _crlf over and over again in the coming
steps.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The strbuf_getline() interface allows a byte other than LF or NUL as
the line terminator, but this is only because I wrote these
codepaths anticipating that there might be a value other than NUL
and LF that could be useful when I introduced line_termination long
time ago. No useful caller that uses other value has emerged.
By now, it is clear that the interface is overly broad without a
good reason. Many codepaths have hardcoded preference to read
either LF terminated or NUL terminated records from their input, and
then call strbuf_getline() with LF or NUL as the third parameter.
This step introduces two thin wrappers around strbuf_getline(),
namely, strbuf_getline_lf() and strbuf_getline_nul(), and
mechanically rewrites these call sites to call either one of
them. The changes contained in this patch are:
* introduction of these two functions in strbuf.[ch]
* mechanical conversion of all callers to strbuf_getline() with
either '\n' or '\0' as the third parameter to instead call the
respective thin wrapper.
After this step, output from "git grep 'strbuf_getline('" would
become a lot smaller. An interim goal of this series is to make
this an empty set, so that we can have strbuf_getline_crlf() take
over the shorter name strbuf_getline().
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Often we read "text" files that are supplied by the end user
(e.g. commit log message that was edited with $GIT_EDITOR upon 'git
commit -e'), and in some environments lines in a text file are
terminated with CRLF. Existing strbuf_getline() knows to read a
single line and then strip the terminating byte from the result, but
it is handy to have a version that is more tailored for a "text"
input that takes both '\n' and '\r\n' as line terminator (aka
<newline> in POSIX lingo) and returns the body of the line after
stripping <newline>.
Recently reimplemented "git am" uses such a function implemented
privately; move it to strbuf.[ch] and make it available for others.
Note that we do not blindly replace calls to strbuf_getline() that
uses LF as the line terminator with calls to strbuf_getline_crlf()
and this is very much deliberate. Some callers may want to treat an
incoming line that ends with CR (and terminated with LF) to have a
payload that includes the final CR, and such a blind replacement
will result in misconversion when done without code audit.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The new call will read from a file descriptor into a strbuf once. The
underlying call xread is just run once. xread only reattempts
reading in case of EINTR, which makes it suitable to use for a
nonblocking read.
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The internal stripspace() function has been moved to where it
logically belongs to, i.e. strbuf API, and the command line parser
of "git stripspace" has been updated to use the parse_options API.
* tk/stripspace:
stripspace: use parse-options for command-line parsing
strbuf: make stripspace() part of strbuf
This function is also used in other builtins than stripspace, so it
makes sense to have it in a more generic place. Since it operates
on an strbuf and the function is declared in strbuf.h, move it to
strbuf.c and add the corresponding prefix to its name, just like
other API functions in the strbuf_* family.
Also switch all current users of stripspace() to the new function
name and keep a temporary wrapper inline function for any topic
branches still using stripspace().
Reviewed-by: Matthieu Moy <Matthieu.Moy@imag.fr>
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We sometimes sprintf into fixed-size buffers when we know
that the buffer is large enough to fit the input (either
because it's a constant, or because it's numeric input that
is bounded in size). Likewise with strcpy of constant
strings.
However, these sites make it hard to audit sprintf and
strcpy calls for buffer overflows, as a reader has to
cross-reference the size of the array with the input. Let's
use xsnprintf instead, which communicates to a reader that
we don't expect this to overflow (and catches the mistake in
case we do).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The sha1_to_hex and find_unique_abbrev functions always
write into reusable static buffers. There are a few problems
with this:
- future calls overwrite our result. This is especially
annoying with find_unique_abbrev, which does not have a
ring of buffers, so you cannot even printf() a result
that has two abbreviated sha1s.
- if you want to put the result into another buffer, we
often strcpy, which looks suspicious when auditing for
overflows.
This patch introduces sha1_to_hex_r and find_unique_abbrev_r,
which write into a user-provided buffer. Of course this is
just punting on the overflow-auditing, as the buffer
obviously needs to be GIT_SHA1_HEXSZ + 1 bytes. But it is
much easier to audit, since that is a well-known size.
We retain the non-reentrant forms, which just become thin
wrappers around the reentrant ones. This patch also adds a
strbuf variant of find_unique_abbrev, which will be handy in
later patches.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
strbuf_read() used to have one extra iteration (and an unnecessary
strbuf_grow() of 8kB), which was eliminated.
* jh/strbuf-read-use-read-in-full:
strbuf_read(): skip unnecessary strbuf_grow() at eof
The loop in strbuf_read() uses xread() repeatedly while extending
the strbuf until the call returns zero. If the buffer is
sufficiently large to begin with, this results in xread()
returning the remainder of the file to the end (returning
non-zero), the loop extending the strbuf, and then making another
call to xread() to have it return zero.
By using read_in_full(), we can tell when the read reached the end
of file: when it returns less than was requested, it's eof. This
way we can avoid an extra iteration that allocates an extra 8kB
that is never used.
Signed-off-by: Jim Hill <gjthill@gmail.com>
Reviewed-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Teach "git log" and friends a new "--date=format:..." option to
format timestamps using system's strftime(3).
* jk/date-mode-format:
strbuf: make strbuf_addftime more robust
introduce "format" date-mode
convert "enum date_mode" into a struct
show-branch: use DATE_RELATIVE instead of magic number
The return value of strftime is poorly designed; when it
returns 0, the caller cannot tell if the buffer was not
large enough, or if the output was actually 0 bytes. In the
original implementation of strbuf_addftime, we simply punted
and guessed that our 128-byte hint would be large enough.
We can do better, though, if we're willing to treat strftime
like less of a black box. We can munge the incoming format
to make sure that it never produces 0-length output, and
then "fix" the resulting output. That lets us reliably grow
the buffer based on strftime's return value.
Clever-idea-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It is currently declared to return int, which could overflow for
large files.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This feeds the format directly to strftime. Besides being a
little more flexible, the main advantage is that your system
strftime may know more about your locale's preferred format
(e.g., how to spell the days of the week).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We spend a lot of time in strbuf_getwholeline in a tight
loop reading characters from a stdio handle into a buffer.
The libc getdelim() function can do this for us with less
overhead. It's in POSIX.1-2008, and was a GNU extension
before that. Therefore we can't rely on it, but can fall
back to the existing getc loop when it is not available.
The HAVE_GETDELIM knob is turned on automatically for Linux,
where we have glibc. We don't need to set any new
feature-test macros, because we already define _GNU_SOURCE.
Other systems that implement getdelim may need to other
macros (probably _POSIX_C_SOURCE >= 200809L), but we can
address that along with setting the Makefile knob after
testing the feature on those systems.
Running "git rev-parse refs/heads/does-not-exist" on a repo
with an extremely large (1.6GB) packed-refs file went from
(best-of-5):
real 0m8.601s
user 0m8.084s
sys 0m0.524s
to:
real 0m6.768s
user 0m6.340s
sys 0m0.432s
for a wall-clock speedup of 21%.
Based on a patch from Rasmus Villemoes <rv@rasmusvillemoes.dk>.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As with the recent speedup to strbuf_addch, we can avoid
calling strbuf_grow() in a tight loop of single-character
adds by instead checking strbuf_avail.
Note that we would instead call strbuf_addch directly here,
but it does more work than necessary: it will NUL-terminate
the result for each character read. Instead, in this loop we
read the characters one by one and then add the terminator
manually at the end.
Running "git rev-parse refs/heads/does-not-exist" on a repo
with an extremely large (1.6GB) packed-refs file went from
(best-of-5):
real 0m10.948s
user 0m10.548s
sys 0m0.412s
to:
real 0m8.601s
user 0m8.084s
sys 0m0.524s
for a wall-clock speedup of 21%.
Helped-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
strbuf_getwholeline calls getc in a tight loop. On modern
libc implementations, the stdio code locks the handle for
every operation, which means we are paying a significant
overhead. We can get around this by locking the handle for
the whole loop and using the unlocked variant.
Running "git rev-parse refs/heads/does-not-exist" on a repo
with an extremely large (1.6GB) packed-refs file went from:
real 0m18.900s
user 0m18.472s
sys 0m0.448s
to:
real 0m10.953s
user 0m10.384s
sys 0m0.580s
for a wall-clock speedup of 42%. All times are best-of-3,
and done on a glibc 2.19 system.
Note that we call into strbuf_grow while holding the lock.
It's possible for that function to call other stdio
functions (e.g., printing to stderr when dying due to malloc
error); however, the POSIX.1-2001 definition of flockfile
makes it clear that the locks are per-handle, so we are fine
unless somebody else tries to read from our same handle.
This doesn't ever happen in the current code, and is
unlikely to be added in the future (we would have to do
something exotic like add a die_routine that tried to read
from stdin).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
strbuf_getwholeline calls fgetc in a tight loop. Using the
getc form, which can be implemented as a macro, should be
faster (and we do not care about it evaluating our argument
twice, as we just have a plain variable).
On my glibc system, running "git rev-parse
refs/heads/does-not-exist" on a file with an extremely large
(1.6GB) packed-refs file went from (best of 3 runs):
real 0m19.383s
user 0m18.876s
sys 0m0.528s
to:
real 0m18.900s
user 0m18.472s
sys 0m0.448s
for a wall-clock speedup of 2.5%.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The commented output used to blindly add a SP before the payload
line, resulting in "# \t<indented text>\n" when the payload began
with a HT. Instead, produce "#\t<indented text>\n".
* jc/strbuf-add-lines-avoid-sp-ht-sequence:
strbuf_add_commented_lines(): avoid SP-HT sequence in commented lines
The strbuf_add_commented_lines() function passes a pair of prefixes,
one to be used for a non-empty line, and the other for an empty
line, to underlying add_lines(). The former is set to a comment
char followed by a SP, while the latter is set to just the comment
char. This is designed to give a SP after the comment character,
e.g. "# <user text>\n", on a line with some text, and to avoid
emitting an unsightly "# \n" for an empty line.
Teach this machinery to also use the latter space-less prefix when
the payload line begins with a tab, to show e.g. "#\t<user text>\n";
otherwise we will end up showing "# \t<user text>\n" which is
similarly unsightly.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Move strbuf_addchars() to strbuf.c, where it belongs, and make it
available for other callers.
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Reduce the use of fixed sized buffer passed to getcwd() calls
by introducing xgetcwd() helper.
* rs/strbuf-getcwd:
use strbuf_add_absolute_path() to add absolute paths
abspath: convert absolute_path() to strbuf
use xgetcwd() to set $GIT_DIR
use xgetcwd() to get the current directory or die
wrapper: add xgetcwd()
abspath: convert real_path_internal() to strbuf
abspath: use strbuf_getcwd() to remember original working directory
setup: convert setup_git_directory_gently_1 et al. to strbuf
unix-sockets: use strbuf_getcwd()
strbuf: add strbuf_getcwd()
Move most of the code of absolute_path() into the new function
strbuf_add_absolute_path() and in the process transform it to use
struct strbuf and xgetcwd() instead of a PATH_MAX-sized buffer,
which can be too small on some file systems.
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add strbuf_getcwd(), which puts the current working directory into a
strbuf. Because it doesn't use a fixed-size buffer it supports
arbitrarily long paths, provided the platform's getcwd() does as well.
At least on Linux and FreeBSD it handles paths longer than PATH_MAX
just fine.
Suggested-by: Karsten Blees <karsten.blees@gmail.com>
Helped-by: Duy Nguyen <pclouds@gmail.com>
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* jk/strip-suffix:
prepare_packed_git_one: refactor duplicate-pack check
verify-pack: use strbuf_strip_suffix
strbuf: implement strbuf_strip_suffix
index-pack: use strip_suffix to avoid magic numbers
use strip_suffix instead of ends_with in simple cases
replace has_extension with ends_with
implement ends_with via strip_suffix
add strip_suffix function
sha1_file: replace PATH_MAX buffer with strbuf in prepare_packed_git_one()
The ends_with function is essentially a simplified version
of strip_suffix, in which we throw away the stripped length.
Implementing it as an inline on top of strip_suffix has two
advantages:
1. We save a bit of duplicated code.
2. The suffix is typically a string literal, and we call
strlen on it. By making the function inline, many
compilers can replace the strlen call with a constant.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
You can use a strbuf to build up a string from parts, and
then detach it. In the general case, you might use multiple
strbuf_add* functions to do the building. However, in many
cases, a single strbuf_addf is sufficient, and we end up
with:
struct strbuf buf = STRBUF_INIT;
...
strbuf_addf(&buf, fmt, some, args);
str = strbuf_detach(&buf, NULL);
We can make this much more readable (and avoid introducing
an extra variable, which can clutter the code) by
introducing a convenience function:
str = xstrfmt(fmt, some, args);
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Propagate the error messages from the webserver better to the
client coming over the HTTP transport.
* jk/http-errors:
http: default text charset to iso-8859-1
remote-curl: reencode http error messages
strbuf: add strbuf_reencode helper
http: optionally extract charset parameter from content-type
http: extract type/subtype portion of content-type
t5550: test display of remote http error messages
t/lib-httpd: use write_script to copy CGI scripts
test-lib: preserve GIT_CURL_VERBOSE from the environment
This is a convenience wrapper around `reencode_string_len`
and `strbuf_attach`.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This is a convenience wrapper to call tolower on each
character of the string.
This makes config's lowercase() function obsolete, though
note that because we have a strbuf, we are careful to
operate over the whole strbuf, rather than assuming that a
NUL is the end-of-string.
We could continue to offer a pure-string lowercase, but
there would be no callers (in most pure-string cases, we
actually duplicate and lowercase the duplicate, for which we
have the xstrdup_tolower wrapper).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We have two implementations of the same function; let's drop
that to one. We take the name from daemon.c, but the
implementation (which is just slightly more efficient) from
the config code.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
strbuf_trim() strips whitespace from the end, then the beginning of
a strbuf. Those operations are duplicated in strbuf_rtrim() and
strbuf_ltrim().
Replace strbuf_trim() implementation with calls to strbuf_rtrim(),
then strbuf_ltrim().
Signed-off-by: Brian Gesiak <modocache@gmail.com>
Reviewed-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As starts_with() and ends_with() have been used to
replace prefixcmp() and suffixcmp() respectively,
we can now remove them.
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
prefixcmp() and suffixcmp() share the common "cmp" suffix that
typically are used to name functions that can be used for ordering,
but they can't, because they are not antisymmetric:
prefixcmp("foo", "foobar") < 0
prefixcmp("foobar", "foo") == 0
We in fact do not use these functions for ordering. Replace them
with functions that just check for equality.
Add starts_with() and end_with() that will be used to replace
prefixcmp() and suffixcmp(), respectively, as the first step. These
are named after corresponding functions/methods in programming
languages, like Java, Python and Ruby.
In vcs-svn/fast_export.c, there was already an ends_with() function
that did the same thing. Let's use the new one instead while at it.
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Humanization of downloaded size is done in the same function as text
formatting in 'process.c'. The code cannot be reused easily elsewhere.
Separate text formatting from size simplification and make the
function public in strbuf so that it can easily be used by other
callers.
We now can use strbuf_humanise_bytes() for both downloaded size and
download speed calculation. One of the drawbacks is that speed will
now look like this when download is stalled: "0 bytes/s" instead of
"0 KiB/s".
Signed-off-by: Antoine Pelisse <apelisse@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Some users do want to write a line that begin with a pound sign, #,
in their commit log message. Many tracking system recognise
a token of #<bugid> form, for example.
The support we offer these use cases is not very friendly to the end
users. They have a choice between
- Don't do it. Avoid such a line by rewrapping or indenting; and
- Use --cleanup=whitespace but remove all the hint lines we add.
Give them a way to set a custom comment char, e.g.
$ git -c core.commentchar="%" commit
so that they do not have to do either of the two workarounds.
[jc: although I started the topic, all the tests and documentation
updates, many of the call sites of the new strbuf_add_commented_*()
functions, and the change to git-submodule.sh scripted Porcelain are
from Ralf.]
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Ralf Thielow <ralf.thielow@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Update imap-send to reuse xml quoting code from http-push codepath,
clean up some code, and fix a small bug.
* mh/unify-xml-in-imap-send-and-http-push:
wrap_in_html(): process message in bulk rather than line-by-line
wrap_in_html(): use strbuf_addstr_xml_quoted()
imap-send: change msg_data from storing (ptr, len) to storing strbuf
imap-send: correctly report errors reading from stdin
imap-send: store all_msgs as a strbuf
lf_to_crlf(): NUL-terminate msg_data::data
xml_entities(): use function strbuf_addstr_xml_quoted()
Add new function strbuf_add_xml_quoted()
Substantially the same code is present in http-push.c and imap-send.c,
so make a library function out of it.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The word "delimiter" suggests that the argument separates the
substrings, whereas in fact (1) the delimiter characters are included
in the output, and (2) if the input string ends with the delimiter,
then the output does not include a final empty string. So rename the
"delim" arguments of the strbuf_split() family of functions to
"terminator", which is more suggestive of how it is used.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Jeff King <peff@peff.net>
While iterating, update str and slen to keep track of the part of the
string that hasn't been processed yet rather than computing things
relative to the start of the original string. This eliminates one
local variable, reduces the scope of another, and reduces the amount
of arithmetic needed within the loop.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Jeff King <peff@peff.net>
Use ALLOC_GROW() rather than inline code to manage memory in
strbuf_split_buf(). Rename "pos" to "nr" because it better describes
the use of the variable and it better conforms to the "ALLOC_GROW"
idiom.
Also, instead of adding a sentinal NULL value after each entry is
added to the list, only add it once after all of the entries have been
added.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Jeff King <peff@peff.net>
The current behavior is to return NULL when strbuf did not
actually allocate a string. This can be quite surprising to
callers, though, who may feed the strbuf from arbitrary data
and expect to always get a valid value.
In most cases, it does not make a difference because calling
any strbuf function will cause an allocation (even if the
function ends up not inserting any data). But if the code is
structured like:
struct strbuf buf = STRBUF_INIT;
if (some_condition)
strbuf_addstr(&buf, some_string);
return strbuf_detach(&buf, NULL);
then you may or may not return NULL, depending on the
condition. This can cause us to segfault in http-push
(when fed an empty URL) and in http-backend (when an empty
parameter like "foo=bar&&" is in the $QUERY_STRING).
This patch forces strbuf_detach to allocate an empty
NUL-terminated string when it is called on a strbuf that has
not been allocated.
I investigated all call-sites of strbuf_detach. The majority
are either not affected by the change (because they call a
strbuf_* function unconditionally), or can handle the empty
string just as easily as NULL.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
These functions are helpful when we do not want to expose \n to
translators. For example
printf("hello world\n");
can be converted to
printf_ln(_("hello world"));
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* tr/maint-bundle-long-subject:
t5704: match tests to modern style
strbuf: improve strbuf_get*line documentation
bundle: use a strbuf to scan the log for boundary commits
bundle: put strbuf_readline_fd in strbuf.c with adjustments
The comment even said that it should eventually go there. While at
it, match the calling convention and name of the function to the
strbuf_get*line family. So it now is strbuf_getwholeline_fd.
Signed-off-by: Thomas Rast <trast@student.ethz.ch>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* jk/credentials:
t: add test harness for external credential helpers
credentials: add "store" helper
strbuf: add strbuf_add*_urlencode
Makefile: unix sockets may not available on some platforms
credentials: add "cache" helper
docs: end-user documentation for the credential subsystem
credential: make relevance of http path configurable
credential: add credential.*.username
credential: apply helper config
http: use credential API to get passwords
credential: add function for parsing url components
introduce credentials API
t5550: fix typo
test-lib: add test_config_global variant
Conflicts:
strbuf.c
This just follows the rfc3986 rules for percent-encoding
url data into a strbuf.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When a contributor asks the integrator to merge her history, a signed tag
can be a good vehicle to communicate the authenticity of the request while
conveying other information such as the purpose of the topic.
E.g. a signed tag "for-linus" can be created, and the integrator can run:
$ git pull git://example.com/work.git/ for-linus
This would allow the integrator to run "git verify-tag FETCH_HEAD" to
validate the signed tag.
Update fmt-merge-msg so that it pre-fills the merge message template with
the body (but not signature) of the tag object to help the integrator write
a better merge message, in the same spirit as the existing merge.log summary
lines.
The message that comes from GPG signature validation is also included in
the merge message template to help the integrator verify it, but they are
prefixed with "#" to make them comments.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This use of strbuf_grow() is a historical artifact that was once used to
ensure that strbuf.buf was allocated and properly nul-terminated. This
was added before the introduction of the slopbuf in b315c5c0, which
guarantees that strbuf.buf always points to a usable nul-terminated string.
So let's remove it.
Signed-off-by: Brandon Casey <drafnel@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In the case where sb is initialized to the slopbuf (through
strbuf_init(sb,0) or STRBUF_INIT), strbuf_grow() loses the terminating
nul: it grows the buffer, but gives ALLOC_GROW a NULL source to avoid
it being freed. So ALLOC_GROW does not copy anything to the new
memory area.
This subtly broke the call to strbuf_getline in read_next_command()
[fast-import.c:1855], which goes
strbuf_detach(&command_buf, NULL); # command_buf is now = STRBUF_INIT
stdin_eof = strbuf_getline(&command_buf, stdin, '\n');
if (stdin_eof)
return EOF;
In strbuf_getwholeline, this did
strbuf_grow(sb, 0); # loses nul-termination
if (feof(fp))
return EOF;
strbuf_reset(sb); # this would have nul-terminated!
Valgrind found this because fast-import subsequently uses prefixcmp()
on command_buf.buf, which after the EOF exit contains only
uninitialized memory.
Arguably strbuf_getwholeline is also broken, in that it touches the
buffer before deciding whether to do any work. However, it seems more
futureproof to not let the strbuf API lose the nul-termination by its
own fault.
So make sure that strbuf_grow() puts in a nul even if it has nowhere
to copy it from. This makes strbuf_grow(sb, 0) a semantic no-op as
far as readers of the buffer are concerned.
Also remove the nul-termination added by strbuf_init, which is made
redudant.
Signed-off-by: Thomas Rast <trast@student.ethz.ch>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* jk/maint-config-param:
config: use strbuf_split_str instead of a temporary strbuf
strbuf: allow strbuf_split to work on non-strbufs
config: avoid segfault when parsing command-line config
config: die on error in command-line config
fix "git -c" parsing of values with equals signs
strbuf_split: add a max parameter
The strbuf_split function takes a strbuf as input, and
outputs a list of strbufs. However, there is no reason that
the input has to be a strbuf, and not an arbitrary buffer.
This patch adds strbuf_split_buf for a length-delimited
buffer, and strbuf_split_str for NUL-terminated strings.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Sometimes when splitting, you only want a limited number of
fields, and for the final field to contain "everything
else", even if it includes the delimiter.
This patch introduces strbuf_split_max, which provides a
"max number of fields" parameter; it behaves similarly to
perl's "split" with a 3rd field.
The existing 2-argument form of strbuf_split is retained for
compatibility and ease-of-use.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
strbuf_init does not zero-terminate the initial buffer when hint is
non-zero. Fix this so we can rely on the string to be zero-terminated
even if we haven't filled it with anything yet.
Signed-off-by: Erik Faye-Lund <kusmabite@gmail.com>
Acked-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In a variable-args function, the code for writing into a strbuf is
non-trivial. We ended up cutting and pasting it in several places
because there was no vprintf-style function for strbufs (which in turn
was held up by a lack of va_copy).
Now that we have a fallback va_copy, we can add strbuf_vaddf, the
strbuf equivalent of vsprintf. And we can clean up the cut and paste
mess.
Signed-off-by: Jeff King <peff@peff.net>
Improved-by: Christian Couder <christian.couder@gmail.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The idiom (a + b < a) works fine for detecting that an unsigned
integer has overflowed, but a more explicit
unsigned_add_overflows(a, b)
might be easier to read.
Define such a macro, expanding roughly to ((a) < UINT_MAX - (b)).
Because the expansion uses each argument only once outside of sizeof()
expressions, it is safe to use with arguments that have side effects.
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
strbuf_branchname is a thin wrapper around interpret_branch_name
from sha1_name.o. Most strbuf.o users do not need it.
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The current command line parser is overly lax in places and allows a
branch whose name begins with a hyphen e.g. "-foo" to be created, but the
parseopt infrastructure in general does not like to parse anything that
begins with a dash as a short-hand refname. "git checkout -foo" won't
work, nor will "git branch -d -foo" (even though "git branch -d -- -foo"
works, it does so by mistake; we should not be taking anything but
pathspecs after double-dash).
All the codepaths that create a new branch ref, including the destination
of "branch -m src dst", use strbuf_check_branch_ref() to validate if the
given name is suitable as a branch name. Tighten it to disallow a branch
that begins with a hyphen.
You can still get rid of historical mistakes with
$ git update-ref -d refs/heads/-foo
and third-party Porcelains are free to keep using update-ref to create
refs with a path component that begins with "-".
Issue originally raised by Clemens Buchacher.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* ap/merge-backend-opts:
Document that merge strategies can now take their own options
Extend merge-subtree tests to test -Xsubtree=dir.
Make "subtree" part more orthogonal to the rest of merge-recursive.
pull: Fix parsing of -X<option>
Teach git-pull to pass -X<option> to git-merge
git merge -X<option>
git-merge-file --ours, --theirs
Conflicts:
git-compat-util.h
* jk/warn-author-committer-after-commit:
user_ident_sufficiently_given(): refactor the logic to be usable from elsewhere
commit.c::print_summary: do not release the format string too early
commit: allow suppression of implicit identity advice
commit: show interesting ident information in summary
strbuf: add strbuf_addbuf_percentquote
strbuf_expand: convert "%%" to "%"
Conflicts:
builtin-commit.c
ident.c
Teach "-X <option>" command line argument to "git merge" that is passed to
strategy implementations. "ours" and "theirs" autoresolution introduced
by the previous commit can be asked to the recursive strategy.
Signed-off-by: Avery Pennarun <apenwarr@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This is handy for creating strings which will be fed to printf() or
strbuf_expand().
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The only way to safely quote arbitrary text in a pretty-print user
format is to replace instances of "%" with "%x25". This is slightly
unreadable, and many users would expect "%%" to produce a single
"%", as that is what printf format specifiers do.
This patch converts "%%" to "%" for all users of strbuf_expand():
(1) git-daemon interpolated paths
(2) pretty-print user formats
(3) merge driver command lines
Case (1) was already doing the conversion itself outside of
strbuf_expand(). Case (2) is the intended beneficiary of this patch.
Case (3) users probably won't notice, but as this is user-facing
behavior, consistently providing the quoting mechanism makes sense.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This function is just like strbuf_getline() except it retains the
line-termination character. This function will be used by the mailinfo
and mailsplit builtins which require the entire line for parsing.
Signed-off-by: Brandon Casey <drafnel@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
size_t res cannot be less than 0. fread returns 0 on error.
Reported-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This allows a common calling sequence
strbuf_branchname(&ref, name);
strbuf_splice(&ref, 0, 0, "refs/heads/", 11);
if (check_ref_format(ref.buf))
die(...);
to be refactored into
if (strbuf_check_branch_ref(&ref, name))
die(...);
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The function takes a user-supplied string that is supposed to be a branch
name, and puts it in a strbuf after expanding possible shorthand notation.
A handful of open coded sequence to do this in the existing code have been
changed to use this helper function.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It can be less object code and may be even faster, even if at the
moment there is no callers to take an advantage of that. This
implementation can be trivially made inlinable later.
Signed-off-by: Alex Riesen <raa.lkml@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Make all strbuf functions that can fail free() their memory on error if
they have allocated it. They don't shrink buffers that have been grown,
though.
This allows for easier error handling, as callers only need to call
strbuf_release() if A) the command succeeded or B) if they would have had
to do so anyway because they added something to the strbuf themselves.
Bonus hunk: document strbuf_readlink.
Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It was already what 'git apply' did in read_old_data(), just export it
as a real function, and make it be more generic.
In particular, this handles the case of the lstat() st_size data not
matching the readlink() return value properly (which apparently happens
at least on NTFS under Linux). But as a result of this you could also
use the new function without even knowing how big the link is going to
be, and it will allocate an appropriately sized buffer.
So we pass in the st_size of the link as just a hint, rather than a
fixed requirement.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The new callback function strbuf_expand_dict_cb() can be used together
with strbuf_expand() if there is only a small number of placeholders
for static replacement texts. It expects its dictionary as an array of
placeholder+value pairs as context parameter, terminated by an entry
with the placeholder member set to NULL.
The new helper is intended to aid converting the remaining calls of
interpolate(). strbuf_expand() is smaller, more flexible and can be
used to go faster than interpolate(), so it should replace the latter.
Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Currently, the --pretty=format prefix is looked up in a
tight loop in strbuf_expand(), if prefix is found it is then
used as argument for format_commit_item() that does another
search by a switch statement to select the proper operation.
Because the switch statement is already able to discard
unknown matches we don't need the prefix lookup before
to call format_commit_item().
Signed-off-by: Marco Costalba <mcostalba@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Now the routine is an open-coded loop that avoids an extra
strlen() in the previous implementation, it got a bit too big to
be inlined. Uninlining it makes code footprint smaller but the
result still retains the avoidance of strlen() cost.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Solaris 9's vsnprintf implementation returns -1 if we pass it a
buffer of length 0. The only way to get it to give us the actual
length necessary for the formatted string is to grow the buffer
out to have at least 1 byte available in the strbuf and then ask
it to compute the length.
If the available space is 0 I'm growing it out by 64 to ensure
we will get an accurate length estimate from all implementations.
Some callers may need to grow the strbuf again but 64 should be a
reasonable enough initial growth.
We also no longer silently fail to append to the string when we are
faced with a broken vsnprintf implementation. On Solaris 9 this
silent failure caused me to no longer be able to execute "git clone"
as we tried to exec the empty string rather than "git-clone".
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a new function, strbuf_adddup(), that appends a duplicate of a
part of a struct strbuf to end of the latter.
Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Some of the --pretty=format placeholders expansions are expensive to
calculate. This is made worse by the current code's use of
interpolate(), which requires _all_ placeholders are to be prepared
up front.
One way to speed this up is to check which placeholders are present
in the format string and to prepare only the expansions that are
needed. That still leaves the allocation overhead of interpolate().
Another way is to use a callback based approach together with the
strbuf library to keep allocations to a minimum and avoid string
copies. That's what this patch does. It introduces a new strbuf
function, strbuf_expand().
The function takes a format string, list of placeholder strings,
a user supplied function 'fn', and an opaque pointer 'context'
to tell 'fn' what thingy to operate on.
The function 'fn' is expected to accept a strbuf, a parsed
placeholder string and the 'context' pointer, and append the
interpolated value for the 'context' thingy, according to the
format specified by the placeholder.
Thanks to Pierre Habouzit for his suggestion to use strchrnul() and
the code surrounding its callsite. And thanks to Junio for most of
this commit message. :)
Here my measurements of most of Paul Mackerras' test cases that
highlighted the performance problem (best of three runs):
(master)
$ time git log --pretty=oneline >/dev/null
real 0m0.390s
user 0m0.340s
sys 0m0.040s
(master)
$ time git log --pretty=raw >/dev/null
real 0m0.434s
user 0m0.408s
sys 0m0.016s
(master)
$ time git log --pretty="format:%H {%P} %ct" >/dev/null
real 0m1.347s
user 0m0.080s
sys 0m1.256s
(interp_find_active -- Dscho)
$ time ./git log --pretty="format:%H {%P} %ct" >/dev/null
real 0m0.694s
user 0m0.020s
sys 0m0.672s
(strbuf_expand -- this patch)
$ time ./git log --pretty="format:%H {%P} %ct" >/dev/null
real 0m0.395s
user 0m0.352s
sys 0m0.028s
Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* make strbuf_read_file take a size hint (works like strbuf_read)
* use it in a couple of places.
Signed-off-by: Pierre Habouzit <madcoder@debian.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
For that purpose, the ->buf is always initialized with a char * buf living
in the strbuf module. It is made a char * so that we can sloppily accept
things that perform: sb->buf[0] = '\0', and because you can't pass "" as an
initializer for ->buf without making gcc unhappy for very good reasons.
strbuf_init/_detach/_grow have been fixed to trust ->alloc and not ->buf
anymore.
as a consequence strbuf_detach is _mandatory_ to detach a buffer, copying
->buf isn't an option anymore, if ->buf is going to escape from the scope,
and eventually be free'd.
API changes:
* strbuf_setlen now always works, so just make strbuf_reset a convenience
macro.
* strbuf_detatch takes a size_t* optional argument (meaning it can be
NULL) to copy the buffer's len, as it was needed for this refactor to
make the code more readable, and working like the callers.
Signed-off-by: Pierre Habouzit <madcoder@debian.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add strbuf_remove, change strbuf_insert:
As both are special cases of strbuf_splice, implement them as such.
gcc is able to do the math and generate almost optimal code this way.
Add strbuf_swap:
Exchange the values of its arguments.
Use it in fast-import.c
Also fix spacing issues in strbuf.h
Signed-off-by: Pierre Habouzit <madcoder@debian.org>