There are two spots that call lookup_object() and assume
that a non-NULL result means we have the object:
1. When we're checking the objects given to us by the user
on the command line.
2. When we're checking if a reflog entry is valid.
This generally follows fsck's mental model that we will have
looked at and loaded a "struct object" for each object in
the repository. But it misses one case: if another object
_mentioned_ an object, but we didn't actually parse it or
verify that it exists, it will still have a struct.
It's not clear if this is a triggerable bug or not.
Certainly the later parts of the reachability check need to
be careful of this, and do so by checking the HAS_OBJ flag.
But both of these steps happen before we start traversing,
so probably we won't have followed any links yet. Still,
it's easy enough to be defensive here.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since fsck tries to continue as much as it can after seeing
an error, we still do the reachability check even if some
heads we were given on the command-line are bogus. But if
_none_ of the heads is is valid, we fallback to checking all
refs and the index, which is not what the user asked for at
all.
Instead of checking "heads", the number of successful heads
we got, check "argc" (which we know only has non-options in
it, because parse_options removed the others).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Instead of checking reachability from the refs, you can ask
fsck to check from a particular set of heads. However, the
error checking here is quite lax. In particular:
1. It claims lookup_object() will report an error, which
is not true. It only does a hash lookup, and the user
has no clue that their argument was skipped.
2. When either the name or sha1 cannot be resolved, we
continue to exit with a successful error code, even
though we didn't check what the user asked us to.
This patch fixes both of these cases.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Normally fsck makes a pass over all objects to check their
integrity, and then follows up with a reachability check to
make sure we have all of the referenced objects (and to know
which ones are dangling). The latter checks for the HAS_OBJ
flag in obj->flags to see if we found the object in the
first pass.
Commit 02976bf85 (fsck: introduce `git fsck --connectivity-only`,
2015-06-22) taught fsck to skip the initial pass, and to
fallback to has_sha1_file() instead of the HAS_OBJ check.
However, it converted only one HAS_OBJ check to use
has_sha1_file(). But there are many other places in
builtin/fsck.c that assume that the flag is set (or that
lookup_object() will return an object at all). This leads to
several bugs with --connectivity-only:
1. mark_object() will not queue objects for examination,
so recursively following links from commits to trees,
etc, did nothing. I.e., we were checking the
reachability of hardly anything at all.
2. When a set of heads is given on the command-line, we
use lookup_object() to see if they exist. But without
the initial pass, we assume nothing exists.
3. When loading reflog entries, we do a similar
lookup_object() check, and complain that the reflog is
broken if the object doesn't exist in our hash.
So in short, --connectivity-only is broken pretty badly, and
will claim that your repository is fine when it's not.
Presumably nobody noticed for a few reasons.
One is that the embedded test does not actually test the
recursive nature of the reachability check. All of the
missing objects are still in the index, and we directly
check items from the index. This patch modifies the test to
delete the index, which shows off breakage (1).
Another is that --connectivity-only just skips the initial
pass for loose objects. So on a real repository, the packed
objects were still checked correctly. But on the flipside,
it means that "git fsck --connectivity-only" still checks
the sha1 of all of the packed objects, nullifying its
original purpose of being a faster git-fsck.
And of course the final problem is that the bug only shows
up when there _is_ corruption, which is rare. So anybody
running "git fsck --connectivity-only" proactively would
assume it was being thorough, when it was not.
One possibility for fixing this is to find all of the spots
that rely on HAS_OBJ and tweak them for the connectivity-only
case. But besides the risk that we might miss a spot (and I
found three already, corresponding to the three bugs above),
there are other parts of fsck that _can't_ work without a
full list of objects. E.g., the list of dangling objects.
Instead, let's make the connectivity-only case look more
like the normal case. Rather than skip the initial pass
completely, we'll do an abbreviated one that sets up the
HAS_OBJ flag for each object, without actually loading the
object data.
That's simple and fast, and we don't have to care about the
connectivity_only flag in the rest of the code at all.
While we're at it, let's make sure we treat loose and packed
objects the same (i.e., setting up dummy objects for both
and skipping the actual sha1 check). That makes the
connectivity-only check actually fast on a real repo (40
seconds versus 180 seconds on my copy of linux.git).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
After checking connectivity, fsck looks through the list of
any objects we've seen mentioned, and reports unreachable
and un-"used" ones as dangling. However, it skips any object
which is not marked as "parsed", as that is an object that
we _don't_ have (but that somebody mentioned).
Since 6e454b9a3 (clear parsed flag when we free tree
buffers, 2013-06-05), that flag can't be relied on, and the
correct method is to check the HAS_OBJ flag. The cleanup in
that commit missed this callsite, though. As a result, we
would generally fail to report dangling trees.
We never noticed because there were no tests in this area
(for trees or otherwise). Let's add some.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This test creates a multi-level set of trees, but its
cleanup routine only removes the top-level tree. After the
test finishes, the inner tree and the blob it points to
remain, making the inner tree dangling.
A later test ("cleaned up") verifies that we've removed any
cruft and "git fsck" output is clean. This passes only
because of a bug in git-fsck which fails to notice dangling
trees.
In preparation for fixing the bug, let's teach this earlier
test to clean up after itself correctly. We have to remove
the inner tree (and therefore the blob, too, which becomes
dangling after removing that tree).
Since the setup code happens inside a subshell, we can't
just set a variable for each object. However, we can stuff
all of the sha1s into the $T output variable, which is not
used for anything except cleanup.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Improve the rule to convert "unsigned char [20]" into "struct
object_id *" in contrib/coccinelle/
* rs/cocci:
cocci: avoid self-references in object_id transformations
Update to the test framework made in 2.9 timeframe broke running
the tests under valgrind, which has been fixed.
* nd/test-helpers:
valgrind: support test helpers
Portability update and workaround for builds on recent Mac OS X.
* ls/macos-update:
travis-ci: disable GIT_TEST_HTTPD for macOS
Makefile: set NO_OPENSSL on macOS by default
Fix for a racy false-positive test failure.
* as/merge-attr-sleep:
t6026: clarify the point of "kill $(cat sleep.pid)"
t6026: ensure that long-running script really is
Revert "t6026-merge-attr: don't fail if sleep exits early"
Revert "t6026-merge-attr: ensure that the merge driver was called"
t6026-merge-attr: ensure that the merge driver was called
t6026-merge-attr: don't fail if sleep exits early
Recent update to git-sh-setup (a library of shell functions that
are used by our in-tree scripted Porcelain commands) included
another shell library git-sh-i18n without specifying where it is,
relying on the $PATH. This has been fixed to be more explicit by
prefixing $(git --exec-path) output in front.
* ak/sh-setup-dot-source-i18n-fix:
git-sh-setup: be explicit where to dot-source git-sh-i18n from.
"git daemon" used fixed-length buffers to turn URL to the
repository the client asked for into the server side directory
path, using snprintf() to avoid overflowing these buffers, but
allowed possibly truncated paths to the directory. This has been
tightened to reject such a request that causes overlong path to be
required to serve.
* jk/daemon-path-ok-check-truncation:
daemon: detect and reject too-long paths
The code that we have used for the past 10+ years to cycle
4-element ring buffers turns out to be not quite portable in
theoretical world.
* rs/ring-buffer-wraparound:
hex: make wraparound of the index into ring-buffer explicit
"git send-email" attempts to pick up valid e-mails from the
trailers, but people in real world write non-addresses there, like
"Cc: Stable <add@re.ss> # 4.8+", which broke the output depending
on the availability and vintage of Mail::Address perl module.
* mm/send-email-cc-cruft-after-address:
Git.pm: add comment pointing to t9000
t9000-addresses: update expected results after fix
parse_mailboxes: accept extra text after <...> address
The command-line completion script (in contrib/) learned to
complete "git cmd ^mas<HT>" to complete the negative end of
reference to "git cmd ^master".
* cp/completion-negative-refs:
completion: support excluding refs
Extract a small helper out of the function that reads the authors
script file "git am" internally uses.
This by itself is not useful until a second caller appears in the
future for "rebase -i" helper.
* jc/am-read-author-file:
am: refactor read_author_script()
Since 650c44925 (common-main: call git_extract_argv0_path(),
2016-07-01), the argv[0] that is seen in cmd_main() of
individual programs is always the basename of the
executable, as common-main strips off the full path. This
can produce confusing results for git-daemon, which wants to
re-exec itself.
For instance, if the program was originally run as
"/usr/lib/git/git-daemon", it will try just re-execing
"git-daemon", which will find the first instance in $PATH.
If git's exec-path has not been prepended to $PATH, we may
find the git-daemon from a different version (or no
git-daemon at all).
Normally this isn't a problem. Git commands are run as "git
daemon", the git wrapper puts the exec-path at the front of
$PATH, and argv[0] is already "daemon" anyway. But running
git-daemon via its full exec-path, while not really a
recommended method, did work prior to 650c44925. Let's make
it work again.
The real goal of 650c44925 was not to munge argv[0], but to
reliably set the argv0_path global. The only reason it
munges at all is that one caller, the git.c wrapper,
piggy-backed on that computation to find the command
basename. Instead, let's leave argv[0] untouched in
common-main, and have git.c do its own basename computation.
While we're at it, let's drop the return value from
git_extract_argv0_path(). It was only ever used in this one
callsite, and its dual purposes is what led to this
confusion in the first place.
Note that by changing the interface, the compiler can
confirm for us that there are no other callers storing the
return value. But the compiler can't tell us whether any of
the cmd_main() functions (besides git.c) were relying on the
basename munging. However, we can observe that prior to
650c44925, no other cmd_main() functions did that munging,
and no new cmd_main() functions have been introduced since
then. So we can't be regressing any of those cases.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Translate one message introduced by commit:
* 358718064b i18n: fix unmatched single quote in error message
Signed-off-by: Jiang Xin <worldhello.net@gmail.com>
"git archive" and "git mailinfo" stopped reading from local
configuration file with a recent update.
* jc/setup-cleanup-fix:
archive: read local configuration
mailinfo: read local configuration
"git rebase -i" did not work well with core.commentchar
configuration variable for two reasons, both of which have been
fixed.
* js/rebase-i-commentchar-fix:
rebase -i: handle core.commentChar=auto
stripspace: respect repository config
rebase -i: highlight problems with core.commentchar
Using a %(HEAD) placeholder in "for-each-ref --format=" option
caused the command to segfault when on an unborn branch.
* jc/for-each-ref-head-segfault-fix:
for-each-ref: do not segv with %(HEAD) on an unborn branch
Since b9605bc4f2 ("config: only read .git/config from configured
repos", 2016-09-12), we do not read from ".git/config" unless we
know we are in a repository. "git archive" however didn't do the
repository discovery and instead relied on the old behaviour.
Teach the command to run a "gentle" version of repository discovery
so that local configuration variables are honoured.
[jc: stole tests from peff]
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since b9605bc4f2 ("config: only read .git/config from configured
repos", 2016-09-12), we do not read from ".git/config" unless we
know we are in a repository. "git mailinfo" however didn't do the
repository discovery and instead relied on the old behaviour. This
was mostly OK because it was merely run as a helper program by other
porcelain scripts that first chdir's up to the root of the working
tree.
Teach the command to run a "gentle" version of repository discovery
so that local configuration variables like mailinfo.scissors are
honoured.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Git 2.11.0-rc2 introduced one small l10n update, and this commit fixed
the affected translations all in one batch.
Signed-off-by: Jiang Xin <worldhello.net@gmail.com>