We should never need to write the null sha1 into an index
entry (short of the 1 in 2^160 chance that somebody actually
has content that hashes to it). If we attempt to do so, it
is much more likely that it is a bug, since we use the null
sha1 as a sentinel value to mean "not valid".
The presence of null sha1s in the index (which can come
from, among other things, "update-index --cacheinfo", or by
reading a corrupted tree) can cause problems for later
readers, because they cannot distinguish the literal null
sha1 from its use a sentinel value. For example, "git
diff-files" on such an entry would make it appear as if it
is stat-dirty, and until recently, the diff code assumed
such an entry meant that we should be diffing a working tree
file rather than a blob.
Ideally, we would stop such entries from entering even our
in-core index. However, we do sometimes legitimately add
entries with null sha1s in order to represent these sentinel
situations; simply forbidding them in add_index_entry breaks
a lot of the existing code. However, we can at least make
sure that our in-core sentinel representation never makes it
to disk.
To be thorough, we will test an attempt to add both a blob
and a submodule entry. In the former case, we might run into
problems anyway because we will be missing the blob object.
But in the latter case, we do not enforce connectivity
across gitlink entries, making this our only point of
enforcement. The current implementation does not care which
type of entry we are seeing, but testing both cases helps
future-proof the test suite in case that changes.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The diff code represents paths using the diff_filespec
struct. This struct has a sha1 to represent the sha1 of the
content at that path, as well as a sha1_valid member which
indicates whether its sha1 field is actually useful. If
sha1_valid is not true, then the filespec represents a
working tree file (e.g., for the no-index case, or for when
the index is not up-to-date).
The diff_filespec is only used internally, though. At the
interfaces to the diff subsystem, callers feed the sha1
directly, and we create a diff_filespec from it. It's at
that point that we look at the sha1 and decide whether it is
valid or not; callers may pass the null sha1 as a sentinel
value to indicate that it is not.
We should not typically see the null sha1 coming from any
other source (e.g., in the index itself, or from a tree).
However, a corrupt tree might have a null sha1, which would
cause "diff --patch" to accidentally diff the working tree
version of a file instead of treating it as a blob.
This patch extends the edges of the diff interface to accept
a "sha1_valid" flag whenever we accept a sha1, and to use
that flag when creating a filespec. In some cases, this
means passing the flag through several layers, making the
code change larger than would be desirable.
One alternative would be to simply die() upon seeing
corrupted trees with null sha1s. However, this fix more
directly addresses the problem (while bogus sha1s in a tree
are probably a bad thing, it is really the sentinel
confusion sending us down the wrong code path that is what
makes it devastating). And it means that git is more capable
of examining and debugging these corrupted trees. For
example, you can still "diff --raw" such a tree to find out
when the bogus entry was introduced; you just cannot do a
"--patch" diff (just as you could not with any other
corrupted tree, as we do not have any content to diff).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git bundle" did not record boundary commits correctly when there
are many of them.
By Thomas Rast
* tr/maint-bundle-boundary:
bundle: keep around names passed to add_pending_object()
t5510: ensure we stay in the toplevel test dir
t5510: refactor bundle->pack conversion
"git diff-index" and its friends at the plumbing level showed the
"diff --git" header and nothing else for a path whose cached stat
info is dirty without actual difference when asked to produce a
patch. This was a longstanding bug that we could have fixed long
time ago.
By Junio C Hamano
* jc/maint-diff-patch-header:
diff -p: squelch "diff --git" header for stat-dirty paths
t4011: illustrate "diff-index -p" on stat-dirty paths
t4011: modernise style
The code to synthesize the fake ancestor tree used by 3-way merge
fallback in "git am" was not prepared to read a patch created with
a non-standard -p<num> value.
* jc/am-3-nonstandard-popt:
test: "am -3" can accept non-standard -p<num>
am -3: allow nonstandard -p<num> option
A section in a config file with a missing "]" reports the next line
as bad, same goes to a value with a missing end quote.
This happens because the error is not detected until the end of the
line, when line number is already increased. Fix this by decreasing
line number by one for these cases.
Signed-off-by: Martin Stenberg <martin@gnutiken.se>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As the fast-import manual explains:
The value of <path> must be in canonical form. That is it must
not:
. contain an empty directory component (e.g. foo//bar is invalid),
. end with a directory separator (e.g. foo/ is invalid),
. start with a directory separator (e.g. /foo is invalid),
Unfortunately the "ls" command accepts these invalid syntaxes and
responds by declaring that the indicated path is missing. This is too
subtle and causes importers to silently misbehave; better to error out
so the operator knows what's happening.
The C, R, and M commands already error out for such paths.
Reported-by: Andrew Sayers <andrew-git@pileofstuff.org>
Analysis-by: David Barr <davidbarr@google.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
OS X's sed and grep would complain with (respectively)
sed: 1: "/^-/{p;q}": extra characters at the end of q command
grep: Regular expression too big
For sed, use an explicit ; to terminate the q command.
For grep, spell the "40 hex digits" explicitly in the regex, which
should be safe as other tests already use this and we haven't got
breakage reports on OS X about them.
Signed-off-by: Thomas Rast <trast@inf.ethz.ch>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* jk/maint-avoid-streaming-filtered-contents:
do not stream large files to pack when filters are in use
teach dry-run convert_to_git not to require a src buffer
teach convert_to_git a "dry run" mode
* tr/maint-bundle-long-subject:
t5704: match tests to modern style
strbuf: improve strbuf_get*line documentation
bundle: use a strbuf to scan the log for boundary commits
bundle: put strbuf_readline_fd in strbuf.c with adjustments
The construct 'while IFS== read' makes dash 0.5.6 execute
read without changing IFS, which results in test breakages
all over the place in t0300. Neither dash 0.5.5.1 and older
nor dash 0.5.7 and newer are affected: The problem was
introduded resp. fixed by the commits
55c46b7 ([BUILTIN] Honor tab as IFS whitespace when
splitting fields in readcmd, 2009-08-11)
1d806ac ([VAR] Do not poplocalvars prematurely on regular
utilities, 2010-05-27)
in http://git.kernel.org/?p=utils/dash/dash.git
Putting 'IFS==' before that line makes all versions of dash
work.
This looks like a dash bug, not a misinterpretation of the
standard. However, it's worth working around for two
reasons. One, this version of dash was released in Fedora
14-16, so the bug is found in the wild. And two, at least
one other shell, Solaris /bin/sh, choked on this by
persisting IFS after the read invocation. That is not a
shell we usually care about, and I think this use of IFS is
acceptable by POSIX (which allows other behavior near
"special builtins", but "read" is not one of those). But it
seems that this may be a subtle, not-well-tested case for
some shells. Given that the workaround is so simple, it's
worth just being defensive.
Signed-off-by: Michael J Gruber <git@drmicha.warpmail.net>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Prepare expected output inside test_expect_success that uses it.
Also remove excess blank lines.
Signed-off-by: Tom Grennan <tmgrennan@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If any test script is run directly with Solaris 10 /usr/xpg4/bin/sh or
/bin/ksh, it fails spuriously with a message like:
t0000-basic.sh[31]: unset: bad argument count
This happens because those shells bail out when encountering a call to
"unset" with no arguments, and such unset call could take place in
'test-lib.sh'. Fix that issue, and add a proper comment to ensure we
don't regress in this respect.
Signed-off-by: Stefano Lattarini <stefano.lattarini@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The 'name' field passed to add_pending_object() is used to later
deduplicate in object_array_remove_duplicates().
git-bundle had a bug in this area since 18449ab (git-bundle: avoid
packing objects which are in the prerequisites, 2007-03-08): it passed
the name of each boundary object in a static buffer. In other words,
all that object_array_remove_duplicates() saw was the name of the
*last* added boundary object.
The recent switch to a strbuf in bc2fed4 (bundle: use a strbuf to scan
the log for boundary commits, 2012-02-22) made this slightly worse: we
now free the buffer at the end, so it is not even guaranteed that it
still points into addressable memory by the time object_array_remove_
duplicates looks at it. On the plus side however, it was now
detectable by valgrind.
The fix is easy: pass a copy of the string to add_pending_object.
Signed-off-by: Thomas Rast <trast@student.ethz.ch>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The last test descended into a subdir without ever re-emerging, which
is not so nice to the next test writer.
Signed-off-by: Thomas Rast <trast@student.ethz.ch>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It's not so much a conversion as a "strip everything up to and
including the first blank line", but it will come in handy again.
Signed-off-by: Thomas Rast <trast@student.ethz.ch>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The plumbing "diff" commands look at the working tree files without
refreshing the index themselves for performance reasons (the calling
script is expected to do that upfront just once, before calling one or
more of them). In the early days of git, they showed the "diff --git"
header before they actually ask the xdiff machinery to produce patches,
and ended up showing only these headers if the real contents are the same
and the difference they noticed was only because the stat info cached in
the index did not match that of the working tree. It was too late for the
implementation to take the header that it already emitted back.
But 3e97c7c (No diff -b/-w output for all-whitespace changes, 2009-11-19)
introduced necessary logic to keep the meta-information headers in a
strbuf and delay their output until the xdiff machinery noticed actual
changes. This was primarily in order to generate patches that ignore
whitespaces. When operating under "-w" mode, we wouldn't know if the
header is needed until we actually look at the resulting patch, so it was
a sensible thing to do, but we did not realize that the same reasoning
applies to stat-dirty paths.
Later, 296c6bb (diff: fix "git show -C -C" output when renaming a binary
file, 2010-05-26) generalized this machinery and added must_show_header
toggle. This is turned on when the header must be shown even when there
is no patch to be produced, e.g. only the mode was changed, or the path
was renamed, without changing the contents. However, when it did so, it
still kept the special case for the "-w" mode, which meant that the
plumbing would keep showing these phantom changes.
This corrects this historical inconsistency by allowing the plumbing to
omit paths that are only stat-dirty from its output in the same way as it
handles whitespace only changes under "-w" option.
The change in the behaviour can be seen in the updated test.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The plumbing that looks at the working tree, i.e. "diff-index" and
"diff-files", always emit the "diff --git a/path b/path" header lines
without anything else for paths that are only stat-dirty (i.e. different
only because the cached stat information in the index no longer matches
that of the working tree, but the real contents are the same), when
these commands are run with "-p" option to produce patches.
Illustrate this current behaviour. Also demonstrate that with the "-w"
option, we (correctly) hold off showing a "diff --git" header until actual
differences have been found. This also suppresses the header for merely
stat-dirty files, which is inconsistent.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Match the style to more modern test scripts, namely:
- The first line of each test has prereq, title and opening sq for the
script body. This makes the test shorter while reducing the need for
backslashes.
- Be prepared for the case in which the previous test may have failed.
If a test wants to start from not having "frotz" that the previous test
may have created, write "rm -f frotz", not "rm frotz".
- Prepare the expected output inside your own test.
- The order of comparison to check the result is "diff expected actual",
so that the output will show how the output from the git you just broke
is different from what is expected.
- Write no SP between redirection '>' (or '<' for that matter) and the
filename.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When using regexp search ('sr' parameter / $search_use_regexp variable
is true), check first that regexp is valid.
Without this patch we would get an error from Perl during search (if
searching is performed by gitweb), or highlighting matches substring
(if applicable), if user provided invalid regexp... which means broken
HTML, with error page (including HTTP headers) generated after gitweb
already produced some output.
Add test that illustrates such error: for example for regexp "*\.git"
we would get the following error:
Quantifier follows nothing in regex; marked by <-- HERE in m/* <-- HERE \.git/
at /var/www/cgi-bin/gitweb.cgi line 3084.
Reported-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk>
Signed-off-by: Jakub Narebski <jnareb@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
print_ref_list looks up the merge_filter_ref and assumes that a valid
pointer is returned. When the object doesn't exist, it tries to
dereference a NULL pointer. This can be the case when git branch
--merged is given an argument that isn't a valid commit name.
Check whether the lookup returns a NULL pointer and die with an error
if it does. Add a test, while we're at it.
Signed-off-by: Carlos Martín Nieto <cmn@elego.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This adds a test for the previous one to make sure that "am -3 -p0" can
read patches created with the --no-prefix option.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
git-am.sh's check_patch_format function would attempt to preview
the patch to guess its format, but would go into an infinite loop
when the patch file happened to be empty. The solution: exit the
loop when "read" fails, not when the line var, "$l1" becomes empty.
Signed-off-by: Jim Meyering <meyering@redhat.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This works in both bash and dash:
$ bash -c 'VAR=1 env' | grep VAR
VAR=1
$ dash -c 'VAR=1 env' | grep VAR
VAR=1
But environment variables assigned this way are not necessarily propagated
through a function in POSIX compliant shells:
$ bash -c 'f() { "$@"
}; VAR=1 f "env"' | grep VAR
VAR=1
$ dash -c 'f() { "$@"
}; VAR=1 f "env"' | grep VAR
Fix constructs like this, in particular, setting variables through
test_must_fail.
Based-on-patch-by: Vitor Antunes <vitor.hda@gmail.com>
Signed-off-by: Pete Wyckoff <pw@padd.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Plain old $# works to count the number of arguments in
either bash or dash, even if the arguments have spaces.
Based-on-patch-by: Vitor Antunes <vitor.hda@gmail.com>
Signed-off-by: Pete Wyckoff <pw@padd.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When the --use-client-spec is given to clone, and the clone
path is a subset of the full tree as specified in the client,
future submits will go to the wrong place.
Factor out getClientSpec() so both clone/sync and submit can
use it. Introduce getClientRoot() that is needed for the client
spec case, and use it instead of p4Where().
Test the five possible submit behaviors (add, modify, rename,
copy, delete).
Reported-by: Laurent Charrière <lcharriere@promptu.com>
Signed-off-by: Pete Wyckoff <pw@padd.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If --use-client-spec was given, set the matching configuration
variable. This is necessary to ensure that future submits
work properly.
The alternatives of requiring the user to set it, or providing
a command-line option on every submit, are error prone.
Signed-off-by: Pete Wyckoff <pw@padd.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Because git's object format requires us to specify the
number of bytes in the object in its header, we must know
the size before streaming a blob into the object database.
This is not a problem when adding a regular file, as we can
get the size from stat(). However, when filters are in use
(such as autocrlf, or the ident, filter, or eol
gitattributes), we have no idea what the ultimate size will
be.
The current code just punts on the whole issue and ignores
filter configuration entirely for files larger than
core.bigfilethreshold. This can generate confusing results
if you use filters for large binary files, as the filter
will suddenly stop working as the file goes over a certain
size. Rather than try to handle unknown input sizes with
streaming, this patch just turns off the streaming
optimization when filters are in use.
This has a slight performance regression in a very specific
case: if you have autocrlf on, but no gitattributes, a large
binary file will avoid the streaming code path because we
don't know beforehand whether it will need conversion or
not. But if you are handling large binary files, you should
be marking them as such via attributes (or at least not
using autocrlf, and instead marking your text files as
such). And the flip side is that if you have a large
_non_-binary file, there is a correctness improvement;
before we did not apply the conversion at all.
The first half of the new t1051 script covers these failures
on input. The second half tests the matching output code
paths. These already work correctly, and do not need any
adjustment.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The test did not adhere to the current style on several counts:
. empty lines around the test blocks, but within the test string
. ': > file' or even just '> file' with an extra space
. inconsistent indentation
. hand-rolled commits instead of using test_commit
Fix all of them.
There's a catch to the last point: test_commit creates a tag, which the
original test did not create. We still change it to test_commit, and
explicitly delete the tags, so as to highlight that the test relies on not
having them.
Signed-off-by: Thomas Rast <trast@student.ethz.ch>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The first part of the bundle header contains the boundary commits, and
could be approximated by
# v2 git bundle
$(git rev-list --pretty=oneline --boundary <ARGS> | grep ^-)
git-bundle actually spawns exactly this rev-list invocation, and does
the grepping internally.
There was a subtle bug in the latter step: it used fgets() with a
1024-byte buffer. If the user has sufficiently long subjects (e.g.,
by not adhering to the git oneline-subject convention in the first
place), the 'oneline' format can easily overflow the buffer. fgets()
then returns the rest of the line in the next call(s). If one of
these remaining parts started with '-', git-bundle would mistakenly
insert it into the bundle thinking it was a boundary commit.
Fix it by using strbuf_getwholeline() instead, which handles arbitrary
line lengths correctly.
Note that on the receiving side in parse_bundle_header() we were
already using strbuf_getwholeline_fd(), so that part is safe.
Reported-by: Jannis Pohlmann <jannis.pohlmann@codethink.co.uk>
Signed-off-by: Thomas Rast <trast@student.ethz.ch>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* jk/grep-binary-attribute:
grep: pre-load userdiff drivers when threaded
grep: load file data after checking binary-ness
grep: respect diff attributes for binary-ness
grep: cache userdiff_driver in grep_source
grep: drop grep_buffer's "name" parameter
convert git-grep to use grep_source interface
grep: refactor the concept of "grep source" into an object
grep: move sha1-reading mutex into low-level code
grep: make locking flag global
Commit ff7f218 (gitweb: Fix file links in "grep" search, 2012-01-05),
added $file_href variable, to reduce duplication and have the fix
applied in single place.
Unfortunately it made variable defined inside the loop, not taking into
account the fact that $file_href was set only if file changed.
Therefore for files with multiple matches $file_href was undefined for
second and subsequent matches.
Fix this bug by moving $file_href declaration outside loop.
Adds tests for almost all forms of sarch in gitweb, which were missing
from testuite. Note that it only tests if there are no warnings, and
it doesn't check that gitweb finds what it should find.
Signed-off-by: Jakub Narebski <jnareb@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When running "git add --refresh <pathspec>", we incorrectly showed the
path that is unmerged even if it is outside the specified pathspec, even
though we did honor pathspec and refreshed only the paths that matched.
Note that this cange does not affect "git update-index --refresh"; for
hysterical raisins, it does not take a pathspec (it takes real paths) and
more importantly itss command line options are parsed and executed one by
one as they are encountered, so "git update-index --refresh foo" means
"first refresh the index, and then update the entry 'foo' by hashing the
contents in file 'foo'", not "refresh only entry 'foo'".
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In a repository whose HEAD points to an unborn branch with no commits,
"heads" view and "summary" view (which shows what is shown in "heads"
view) compared the object names of commits at the tip of branches with the
output from "git rev-parse HEAD", which caused comparison of a string with
undef and resulted in a warning in the server log.
This can happen if non-bare repository (with default 'master' branch)
is updated not via committing but by other means like push to it, or
Gerrit. It can happen also just after running "git checkout --orphan
<new branch>" but before creating any new commit on this branch.
Rewrite the comparison so that it also works when $head points at nothing;
in such a case, no branch can be "the current branch", add a test for it.
While at it, rename local variable $head to $head_at, as it points to
current commit rather than current branch name (HEAD contents).
The code still incorrectly shows all branches that point at the same
commit as what HEAD points as "the current branch", even when HEAD is
detached. Fixing this bug is outside the scope of this patch.
Reported-by: Rajesh Boyapati
Signed-off-by: Jakub Narebski <jnareb@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>