Commit Graph

51447 Commits

Author SHA1 Message Date
Junio C Hamano
9e0f06d55d Sync with Git 2.14.4
* maint-2.14:
  Git 2.14.4
  Git 2.13.7
  verify_path: disallow symlinks in .gitmodules
  update-index: stat updated files earlier
  verify_dotfile: mention case-insensitivity in comment
  verify_path: drop clever fallthrough
  skip_prefix: add case-insensitive variant
  is_{hfs,ntfs}_dotgitmodules: add tests
  is_ntfs_dotgit: match other .git files
  is_hfs_dotgit: match other .git files
  is_ntfs_dotgit: use a size_t for traversing string
  submodule-config: verify submodule names as paths
2018-05-22 14:15:14 +09:00
Junio C Hamano
4dde7b8799 Git 2.14.4
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-05-22 14:12:02 +09:00
Junio C Hamano
7b01c71b64 Sync with Git 2.13.7
* maint-2.13:
  Git 2.13.7
  verify_path: disallow symlinks in .gitmodules
  update-index: stat updated files earlier
  verify_dotfile: mention case-insensitivity in comment
  verify_path: drop clever fallthrough
  skip_prefix: add case-insensitive variant
  is_{hfs,ntfs}_dotgitmodules: add tests
  is_ntfs_dotgit: match other .git files
  is_hfs_dotgit: match other .git files
  is_ntfs_dotgit: use a size_t for traversing string
  submodule-config: verify submodule names as paths
2018-05-22 14:10:49 +09:00
Junio C Hamano
0114f71344 Git 2.13.7
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-05-22 13:50:36 +09:00
Junio C Hamano
8528c31d98 Merge branch 'jk/submodule-fix-loose' into maint-2.13
* jk/submodule-fix-loose:
  verify_path: disallow symlinks in .gitmodules
  update-index: stat updated files earlier
  verify_dotfile: mention case-insensitivity in comment
  verify_path: drop clever fallthrough
  skip_prefix: add case-insensitive variant
  is_{hfs,ntfs}_dotgitmodules: add tests
  is_ntfs_dotgit: match other .git files
  is_hfs_dotgit: match other .git files
  is_ntfs_dotgit: use a size_t for traversing string
  submodule-config: verify submodule names as paths
2018-05-22 13:48:26 +09:00
Jeff King
b7b1fca175 fsck: complain when .gitmodules is a symlink
We've recently forbidden .gitmodules to be a symlink in
verify_path(). And it's an easy way to circumvent our fsck
checks for .gitmodules content. So let's complain when we
see it.

Signed-off-by: Jeff King <peff@peff.net>
2018-05-21 23:55:12 -04:00
Jeff King
73c3f0f704 index-pack: check .gitmodules files with --strict
Now that the internal fsck code has all of the plumbing we
need, we can start checking incoming .gitmodules files.
Naively, it seems like we would just need to add a call to
fsck_finish() after we've processed all of the objects. And
that would be enough to cover the initial test included
here. But there are two extra bits:

  1. We currently don't bother calling fsck_object() at all
     for blobs, since it has traditionally been a noop. We'd
     actually catch these blobs in fsck_finish() at the end,
     but it's more efficient to check them when we already
     have the object loaded in memory.

  2. The second pass done by fsck_finish() needs to access
     the objects, but we're actually indexing the pack in
     this process. In theory we could give the fsck code a
     special callback for accessing the in-pack data, but
     it's actually quite tricky:

       a. We don't have an internal efficient index mapping
	  oids to packfile offsets. We only generate it on
	  the fly as part of writing out the .idx file.

       b. We'd still have to reconstruct deltas, which means
          we'd basically have to replicate all of the
	  reading logic in packfile.c.

     Instead, let's avoid running fsck_finish() until after
     we've written out the .idx file, and then just add it
     to our internal packed_git list.

     This does mean that the objects are "in the repository"
     before we finish our fsck checks. But unpack-objects
     already exhibits this same behavior, and it's an
     acceptable tradeoff here for the same reason: the
     quarantine mechanism means that pushes will be
     fully protected.

In addition to a basic push test in t7415, we add a sneaky
pack that reverses the usual object order in the pack,
requiring that index-pack access the tree and blob during
the "finish" step.

This already works for unpack-objects (since it will have
written out loose objects), but we'll check it with this
sneaky pack for good measure.

Signed-off-by: Jeff King <peff@peff.net>
2018-05-21 23:55:12 -04:00
Jeff King
6e328d6cae unpack-objects: call fsck_finish() after fscking objects
As with the previous commit, we must call fsck's "finish"
function in order to catch any queued objects for
.gitmodules checks.

This second pass will be able to access any incoming
objects, because we will have exploded them to loose objects
by now.

This isn't quite ideal, because it means that bad objects
may have been written to the object database (and a
subsequent operation could then reference them, even if the
other side doesn't send the objects again). However, this is
sufficient when used with receive.fsckObjects, since those
loose objects will all be placed in a temporary quarantine
area that will get wiped if we find any problems.

Signed-off-by: Jeff King <peff@peff.net>
2018-05-21 23:55:12 -04:00
Jeff King
1995b5e03e fsck: call fsck_finish() after fscking objects
Now that the internal fsck code is capable of checking
.gitmodules files, we just need to teach its callers to use
the "finish" function to check any queued objects.

With this, we can now catch the malicious case in t7415 with
git-fsck.

Signed-off-by: Jeff King <peff@peff.net>
2018-05-21 23:55:12 -04:00
Jeff King
ed8b10f631 fsck: check .gitmodules content
This patch detects and blocks submodule names which do not
match the policy set forth in submodule-config. These should
already be caught by the submodule code itself, but putting
the check here means that newer versions of Git can protect
older ones from malicious entries (e.g., a server with
receive.fsckObjects will block the objects, protecting
clients which fetch from it).

As a side effect, this means fsck will also complain about
.gitmodules files that cannot be parsed (or were larger than
core.bigFileThreshold).

Signed-off-by: Jeff King <peff@peff.net>
2018-05-21 23:55:12 -04:00
Jeff King
2738744426 fsck: handle promisor objects in .gitmodules check
If we have a tree that points to a .gitmodules blob but
don't have that blob, we can't check its contents. This
produces an fsck error when we encounter it.

But in the case of a promisor object, this absence is
expected, and we must not complain.  Note that this can
technically circumvent our transfer.fsckObjects check.
Imagine a client fetches a tree, but not the matching
.gitmodules blob. An fsck of the incoming objects will show
that we don't have enough information. Later, we do fetch
the actual blob. But we have no idea that it's a .gitmodules
file.

The only ways to get around this would be to re-scan all of
the existing trees whenever new ones enter (which is
expensive), or to somehow persist the gitmodules_found set
between fsck runs (which is complicated).

In practice, it's probably OK to ignore the problem. Any
repository which has all of the objects (including the one
serving the promisor packs) can perform the checks. Since
promisor packs are inherently about a hierarchical topology
in which clients rely on upstream repositories, those
upstream repositories can protect all of their downstream
clients from broken objects.

Signed-off-by: Jeff King <peff@peff.net>
2018-05-21 23:55:12 -04:00
Jeff King
159e7b080b fsck: detect gitmodules files
In preparation for performing fsck checks on .gitmodules
files, this commit plumbs in the actual detection of the
files. Note that unlike most other fsck checks, this cannot
be a property of a single object: we must know that the
object is found at a ".gitmodules" path at the root tree of
a commit.

Since the fsck code only sees one object at a time, we have
to mark the related objects to fit the puzzle together. When
we see a commit we mark its tree as a root tree, and when
we see a root tree with a .gitmodules file, we mark the
corresponding blob to be checked.

In an ideal world, we'd check the objects in topological
order: commits followed by trees followed by blobs. In that
case we can avoid ever loading an object twice, since all
markings would be complete by the time we get to the marked
objects. And indeed, if we are checking a single packfile,
this is the order in which Git will generally write the
objects. But we can't count on that:

  1. git-fsck may show us the objects in arbitrary order
     (loose objects are fed in sha1 order, but we may also
     have multiple packs, and we process each pack fully in
     sequence).

  2. The type ordering is just what git-pack-objects happens
     to write now. The pack format does not require a
     specific order, and it's possible that future versions
     of Git (or a custom version trying to fool official
     Git's fsck checks!) may order it differently.

  3. We may not even be fscking all of the relevant objects
     at once. Consider pushing with transfer.fsckObjects,
     where one push adds a blob at path "foo", and then a
     second push adds the same blob at path ".gitmodules".
     The blob is not part of the second push at all, but we
     need to mark and check it.

So in the general case, we need to make up to three passes
over the objects: once to make sure we've seen all commits,
then once to cover any trees we might have missed, and then
a final pass to cover any .gitmodules blobs we found in the
second pass.

We can simplify things a bit by loosening the requirement
that we find .gitmodules only at root trees. Technically
a file like "subdir/.gitmodules" is not parsed by Git, but
it's not unreasonable for us to declare that Git is aware of
all ".gitmodules" files and make them eligible for checking.
That lets us drop the root-tree requirement, which
eliminates one pass entirely. And it makes our worst case
much better: instead of potentially queueing every root tree
to be re-examined, the worst case is that we queue each
unique .gitmodules blob for a second look.

This patch just adds the boilerplate to find .gitmodules
files. The actual content checks will come in a subsequent
commit.

Signed-off-by: Jeff King <peff@peff.net>
2018-05-21 23:55:12 -04:00
Jeff King
7ac4f3a007 fsck: actually fsck blob data
Because fscking a blob has always been a noop, we didn't
bother passing around the blob data. In preparation for
content-level checks, let's fix up a few things:

  1. The fsck_object() function just returns success for any
     blob. Let's a noop fsck_blob(), which we can fill in
     with actual logic later.

  2. The fsck_loose() function in builtin/fsck.c
     just threw away blob content after loading it. Let's
     hold onto it until after we've called fsck_object().

     The easiest way to do this is to just drop the
     parse_loose_object() helper entirely. Incidentally,
     this also fixes a memory leak: if we successfully
     loaded the object data but did not parse it, we would
     have left the function without freeing it.

  3. When fsck_loose() loads the object data, it
     does so with a custom read_loose_object() helper. This
     function streams any blobs, regardless of size, under
     the assumption that we're only checking the sha1.

     Instead, let's actually load blobs smaller than
     big_file_threshold, as the normal object-reading
     code-paths would do. This lets us fsck small files, and
     a NULL return is an indication that the blob was so big
     that it needed to be streamed, and we can pass that
     information along to fsck_blob().

Signed-off-by: Jeff King <peff@peff.net>
2018-05-21 23:55:12 -04:00
Jeff King
ed9c322062 fsck: simplify ".git" check
There's no need for us to manually check for ".git"; it's a
subset of the other filesystem-specific tests. Dropping it
makes our code slightly shorter. More importantly, the
existing code may make a reader wonder why ".GIT" is not
covered here, and whether that is a bug (it isn't, as it's
also covered in the filesystem-specific tests).

Signed-off-by: Jeff King <peff@peff.net>
2018-05-21 23:55:12 -04:00
Jeff King
db5a58c1bd index-pack: make fsck error message more specific
If fsck reports an error, we say only "Error in object".
This isn't quite as bad as it might seem, since the fsck
code would have dumped some errors to stderr already. But it
might help to give a little more context. The earlier output
would not have even mentioned "fsck", and that may be a clue
that the "fsck.*" or "*.fsckObjects" config may be relevant.

Signed-off-by: Jeff King <peff@peff.net>
2018-05-21 23:55:12 -04:00
Jeff King
eedd5949f5 Merge branch 'jk/submodule-name-verify-fix' into jk/submodule-name-verify-fsck
* jk/submodule-name-verify-fix:
  verify_path: disallow symlinks in .gitmodules
  update-index: stat updated files earlier
  verify_path: drop clever fallthrough
  skip_prefix: add icase-insensitive variant
  is_{hfs,ntfs}_dotgitmodules: add tests
  path: match NTFS short names for more .git files
  is_hfs_dotgit: match other .git files
  is_ntfs_dotgit: use a size_t for traversing string
  submodule-config: verify submodule names as paths

Note that this includes two bits of evil-merge:

 - there's a new call to verify_path() that doesn't actually
   have a mode available. It should be OK to pass "0" here,
   since we're just manipulating the untracked cache, not an
   actual index entry.

 - the lstat() in builtin/update-index.c:update_one() needs
   to be updated to handle the fsmonitor case (without this
   it still behaves correctly, but does an unnecessary
   lstat).
2018-05-21 23:54:28 -04:00
Jeff King
10ecfa7649 verify_path: disallow symlinks in .gitmodules
There are a few reasons it's not a good idea to make
.gitmodules a symlink, including:

  1. It won't be portable to systems without symlinks.

  2. It may behave inconsistently, since Git may look at
     this file in the index or a tree without bothering to
     resolve any symbolic links. We don't do this _yet_, but
     the config infrastructure is there and it's planned for
     the future.

With some clever code, we could make (2) work. And some
people may not care about (1) if they only work on one
platform. But there are a few security reasons to simply
disallow it:

  a. A symlinked .gitmodules file may circumvent any fsck
     checks of the content.

  b. Git may read and write from the on-disk file without
     sanity checking the symlink target. So for example, if
     you link ".gitmodules" to "../oops" and run "git
     submodule add", we'll write to the file "oops" outside
     the repository.

Again, both of those are problems that _could_ be solved
with sufficient code, but given the complications in (1) and
(2), we're better off just outlawing it explicitly.

Note the slightly tricky call to verify_path() in
update-index's update_one(). There we may not have a mode if
we're not updating from the filesystem (e.g., we might just
be removing the file). Passing "0" as the mode there works
fine; since it's not a symlink, we'll just skip the extra
checks.

Signed-off-by: Jeff King <peff@peff.net>
2018-05-21 23:50:11 -04:00
Jeff King
eb12dd0c76 update-index: stat updated files earlier
In the update_one(), we check verify_path() on the proposed
path before doing anything else. In preparation for having
verify_path() look at the file mode, let's stat the file
earlier, so we can check the mode accurately.

This is made a bit trickier by the fact that this function
only does an lstat in a few code paths (the ones that flow
down through process_path()). So we can speculatively do the
lstat() here and pass the results down, and just use a dummy
mode for cases where we won't actually be updating the index
from the filesystem.

Signed-off-by: Jeff King <peff@peff.net>
2018-05-21 23:50:11 -04:00
Jeff King
641084b618 verify_dotfile: mention case-insensitivity in comment
We're more restrictive than we need to be in matching ".GIT"
on case-sensitive filesystems; let's make a note that this
is intentional.

Signed-off-by: Jeff King <peff@peff.net>
2018-05-21 23:50:11 -04:00
Jeff King
e19e5e66d6 verify_path: drop clever fallthrough
We check ".git" and ".." in the same switch statement, and
fall through the cases to share the end-of-component check.
While this saves us a line or two, it makes modifying the
function much harder. Let's just write it out.

Signed-off-by: Jeff King <peff@peff.net>
2018-05-21 23:50:11 -04:00
Jeff King
41a80924ae skip_prefix: add case-insensitive variant
We have the convenient skip_prefix() helper, but if you want
to do case-insensitive matching, you're stuck doing it by
hand. We could add an extra parameter to the function to
let callers ask for this, but the function is small and
somewhat performance-critical. Let's just re-implement it
for the case-insensitive version.

Signed-off-by: Jeff King <peff@peff.net>
2018-05-21 23:50:11 -04:00
Johannes Schindelin
dc2d9ba318 is_{hfs,ntfs}_dotgitmodules: add tests
This tests primarily for NTFS issues, but also adds one example of an
HFS+ issue.

Thanks go to Congyi Wu for coming up with the list of examples where
NTFS would possibly equate the filename with `.gitmodules`.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Jeff King <peff@peff.net>
2018-05-21 23:50:11 -04:00
Johannes Schindelin
e7cb0b4455 is_ntfs_dotgit: match other .git files
When we started to catch NTFS short names that clash with .git, we only
looked for GIT~1. This is sufficient because we only ever clone into an
empty directory, so .git is guaranteed to be the first subdirectory or
file in that directory.

However, even with a fresh clone, .gitmodules is *not* necessarily the
first file to be written that would want the NTFS short name GITMOD~1: a
malicious repository can add .gitmodul0000 and friends, which sorts
before `.gitmodules` and is therefore checked out *first*. For that
reason, we have to test not only for ~1 short names, but for others,
too.

It's hard to just adapt the existing checks in is_ntfs_dotgit(): since
Windows 2000 (i.e., in all Windows versions still supported by Git),
NTFS short names are only generated in the <prefix>~<number> form up to
number 4. After that, a *different* prefix is used, calculated from the
long file name using an undocumented, but stable algorithm.

For example, the short name of .gitmodules would be GITMOD~1, but if it
is taken, and all of ~2, ~3 and ~4 are taken, too, the short name
GI7EBA~1 will be used. From there, collisions are handled by
incrementing the number, shortening the prefix as needed (until ~9999999
is reached, in which case NTFS will not allow the file to be created).

We'd also want to handle .gitignore and .gitattributes, which suffer
from a similar problem, using the fall-back short names GI250A~1 and
GI7D29~1, respectively.

To accommodate for that, we could reimplement the hashing algorithm, but
it is just safer and simpler to provide the known prefixes. This
algorithm has been reverse-engineered and described at
https://usn.pw/blog/gen/2015/06/09/filenames/, which is defunct but
still available via https://web.archive.org/.

These can be recomputed by running the following Perl script:

-- snip --
use warnings;
use strict;

sub compute_short_name_hash ($) {
        my $checksum = 0;
        foreach (split('', $_[0])) {
                $checksum = ($checksum * 0x25 + ord($_)) & 0xffff;
        }

        $checksum = ($checksum * 314159269) & 0xffffffff;
        $checksum = 1 + (~$checksum & 0x7fffffff) if ($checksum & 0x80000000);
        $checksum -= (($checksum * 1152921497) >> 60) * 1000000007;

        return scalar reverse sprintf("%x", $checksum & 0xffff);
}

print compute_short_name_hash($ARGV[0]);
-- snap --

E.g., running that with the argument ".gitignore" will
result in "250a" (which then becomes "gi250a" in the code).

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Jeff King <peff@peff.net>
2018-05-21 23:50:11 -04:00
Jeff King
0fc333ba20 is_hfs_dotgit: match other .git files
Both verify_path() and fsck match ".git", ".GIT", and other
variants specific to HFS+. Let's allow matching other
special files like ".gitmodules", which we'll later use to
enforce extra restrictions via verify_path() and fsck.

Signed-off-by: Jeff King <peff@peff.net>
2018-05-21 23:50:11 -04:00
Jeff King
11a9f4d807 is_ntfs_dotgit: use a size_t for traversing string
We walk through the "name" string using an int, which can
wrap to a negative value and cause us to read random memory
before our array (e.g., by creating a tree with a name >2GB,
since "int" is still 32 bits even on most 64-bit platforms).
Worse, this is easy to trigger during the fsck_tree() check,
which is supposed to be protecting us from malicious
garbage.

Note one bit of trickiness in the existing code: we
sometimes assign -1 to "len" at the end of the loop, and
then rely on the "len++" in the for-loop's increment to take
it back to 0. This is still legal with a size_t, since
assigning -1 will turn into SIZE_MAX, which then wraps
around to 0 on increment.

Signed-off-by: Jeff King <peff@peff.net>
2018-05-21 23:50:11 -04:00
Jeff King
0383bbb901 submodule-config: verify submodule names as paths
Submodule "names" come from the untrusted .gitmodules file,
but we blindly append them to $GIT_DIR/modules to create our
on-disk repo paths. This means you can do bad things by
putting "../" into the name (among other things).

Let's sanity-check these names to avoid building a path that
can be exploited. There are two main decisions:

  1. What should the allowed syntax be?

     It's tempting to reuse verify_path(), since submodule
     names typically come from in-repo paths. But there are
     two reasons not to:

       a. It's technically more strict than what we need, as
          we really care only about breaking out of the
          $GIT_DIR/modules/ hierarchy.  E.g., having a
          submodule named "foo/.git" isn't actually
          dangerous, and it's possible that somebody has
          manually given such a funny name.

       b. Since we'll eventually use this checking logic in
          fsck to prevent downstream repositories, it should
          be consistent across platforms. Because
          verify_path() relies on is_dir_sep(), it wouldn't
          block "foo\..\bar" on a non-Windows machine.

  2. Where should we enforce it? These days most of the
     .gitmodules reads go through submodule-config.c, so
     I've put it there in the reading step. That should
     cover all of the C code.

     We also construct the name for "git submodule add"
     inside the git-submodule.sh script. This is probably
     not a big deal for security since the name is coming
     from the user anyway, but it would be polite to remind
     them if the name they pick is invalid (and we need to
     expose the name-checker to the shell anyway for our
     test scripts).

     This patch issues a warning when reading .gitmodules
     and just ignores the related config entry completely.
     This will generally end up producing a sensible error,
     as it works the same as a .gitmodules file which is
     missing a submodule entry (so "submodule update" will
     barf, but "git clone --recurse-submodules" will print
     an error but not abort the clone.

     There is one minor oddity, which is that we print the
     warning once per malformed config key (since that's how
     the config subsystem gives us the entries). So in the
     new test, for example, the user would see three
     warnings. That's OK, since the intent is that this case
     should never come up outside of malicious repositories
     (and then it might even benefit the user to see the
     message multiple times).

Credit for finding this vulnerability and the proof of
concept from which the test script was adapted goes to
Etienne Stalmans.

Signed-off-by: Jeff King <peff@peff.net>
2018-05-21 23:50:11 -04:00
Jeff King
2dc417ab1f get_main_ref_store: BUG() when outside a repository
If we don't have a repository, then we can't initialize the
ref store.  Prior to 64a741619d (refs: store the main ref
store inside the repository struct, 2018-04-11), we'd try to
access get_git_dir(), and outside a repository that would
trigger a BUG(). After that commit, though, we directly use
the_repository->git_dir; if it's NULL we'll just segfault.

Let's catch this case and restore the BUG() behavior.
Obviously we don't ever want to hit this code, but a BUG()
is a lot more helpful than a segfault if we do.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-05-21 09:45:23 +09:00
Stefan Beller
7a1dc605af object.c: clear replace map before freeing it
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-05-18 07:53:33 +09:00
SZEDER Gábor
4362da078e t7005-editor: get rid of the SPACES_IN_FILENAMES prereq
The last two tests 'editor with a space' and 'core.editor with a
space' in 't7005-editor.sh' need the SPACES_IN_FILENAMES prereq to
ensure that they are only run on filesystems that allow, well, spaces
in filenames.  However, we have been putting a space in the name of
the trash directory for just over a decade now, so we wouldn't be able
to run any of our tests on such a filesystem in the first place.

This prereq is therefore unnecessary, remove it.

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-05-15 12:27:29 +09:00
René Scharfe
7484cf538e t5512: run git fetch inside test
Do the preparatory fetch inside the test of ls-remote --symref to avoid
cluttering the test output and to be able to catch unexpected fetch
failures.

Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-05-14 08:49:17 +09:00
Nguyễn Thái Ngọc Duy
b5d5a567fb column: fix off-by-one default width
By default we want to fill the whole screen if possible, but we do not
want to use up _all_ terminal columns because the last character is
going hit the border, push the cursor over and wrap. Keep it at
default value zero, which will make print_columns() set the width at
term_columns() - 1.

This affects the test in t7004 because effective column width before
was 40 but now 39 so we need to compensate it by one or the output at
39 columns has a different layout.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-05-13 10:45:05 +09:00
Jeff King
be11f7ad60 pager: set COLUMNS to term_columns()
After we invoke the pager, our stdout goes to a pipe, not the
terminal, meaning we can no longer use an ioctl to get the
terminal width. For that reason, ad6c3739a3 (pager: find out
the terminal width before spawning the pager, 2012-02-12)
started caching the terminal width.

But that cache is only an in-process variable. Any programs
we spawn will also not be able to run that ioctl, but won't
have access to our cache. They'll end up falling back to our
80-column default.

You can see the problem with:

  git tag --column=row

Since git-tag spawns a pager these days, its spawned
git-column helper will see neither the terminal on stdout
nor a useful COLUMNS value (assuming you do not export it
from your shell already). And you'll end up with 80-column
output in the pager, regardless of your terminal size.

We can fix this by setting COLUMNS right before spawning the
pager. That fixes this case, as well as any more complicated
ones (e.g., a paged program spawns another script which then
generates columnized output).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-05-13 10:45:03 +09:00
Nguyễn Thái Ngọc Duy
011b648646 pack-format.txt: more details on pack file format
The current document mentions OBJ_* constants without their actual
values. A git developer would know these are from cache.h but that's
not very friendly to a person who wants to read this file to implement
a pack file parser.

Similarly, the deltified representation is not documented at all (the
"document" is basically patch-delta.c). Translate that C code to
English with a bit more about what ofs-delta and ref-delta mean.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-05-13 10:20:03 +09:00
Jeff King
ee930754d8 apply: clarify "-p" documentation
We're not really removing slashes, but slash-separated path
components. Let's make that more clear.

Reported-by: kelly elton <its.the.doc@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-05-11 12:34:17 +09:00
SZEDER Gábor
f6b82970aa t5516-fetch-push: fix broken &&-chain
b2dc968e60 (t5516: refactor oddball tests, 2008-11-07) accidentaly
broke the &&-chain in the test 'push does not update local refs on
failure', but since it was in a subshell, chain-lint couldn't notice
it.

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-05-11 12:32:53 +09:00
SZEDER Gábor
cfb482b6c3 t5516-fetch-push: fix 'push with dry-run' test
In a while-at-it cleanup replacing a 'cd dir && <...> && cd ..' with a
subshell, commit 28391a80a9 (receive-pack: allow deletion of corrupt
refs, 2007-11-29) also moved the assignment of the $old_commit
variable to that subshell.  This variable, however, is used outside of
that subshell as a parameter of check_push_result(), to check that a
ref still points to the commit where it is supposed to.  With the
variable remaining unset outside the subshell check_push_result()
doesn't perform that check at all.

Use 'git -C <dir> cmd...', so we don't need to change directory, and
thus don't need the subshell either when setting $old_commit.

Furthermore, change check_push_result() to require at least three
parameters (the repo, the oid, and at least one ref), so it will catch
similar issues earlier should they ever arise.

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-05-11 12:32:51 +09:00
SZEDER Gábor
87a6bb701a t5310-pack-bitmaps: make JGit tests work with GIT_TEST_SPLIT_INDEX
The two JGit tests 'we can read jgit bitmaps' and 'jgit can read our
bitmaps' in 't5310-pack-bitmaps.sh' fail when run with
GIT_TEST_SPLIT_INDEX=YesPlease.  Both tests create a clone of the test
repository to check bitmap interoperability with JGit.  With split
indexes enabled the index in the clone repositories contains the
'link' extension, which JGit doesn't support and, consequently, an
exception aborts it:

  <...>
  org.eclipse.jgit.api.errors.JGitInternalException: DIRC extension 'link' not supported by this version.
          at org.eclipse.jgit.dircache.DirCache.readFrom(DirCache.java:562)
  <...>

Since testing bitmaps doesn't need a worktree in the first place,
let's just create bare clones for the two JGit tests, so the cloned
won't have an index, and these two tests can be executed even with
split index enabled.

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Reviewed-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-05-11 12:14:34 +09:00
Antonio Ospite
f7997e3682 doc: fix config API documentation about config_with_options
In commit dc8441fdb ("config: don't implicitly use gitdir or commondir",
2017-06-14) the function git_config_with_options was renamed to
config_with_options to better reflect the fact that it does not access
the git global config or the repo config by default.

However Documentation/technical/api-config.txt still refers to the
previous name, fix that.

While at it also update the documentation about the extra parameters,
because they too changed since the initial definition.

Signed-off-by: Antonio Ospite <ao2@ao2.it>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-05-11 10:26:33 +09:00
Stefan Beller
74fd0705bb replace-object.c: remove the_repository from prepare_replace_object
This was missed in 5982da9d2c (replace-object: allow
prepare_replace_object to handle arbitrary repositories, 2018-04-11)

Technically the code works correctly as the replace_map is the same
size in different repositories, however it is hard to read. So convert
the code to the familiar pattern of dereferencing the pointer that we
assign in the sizeof itself.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-05-10 19:16:37 +09:00
Stefan Beller
d587307205 object.c: free replace map in raw_object_store_clear
The replace map for objects was missed to free in the object store in
the conversion of 174774cd51 (Merge branch 'sb/object-store-replace',
2018-05-08)

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-05-10 19:06:51 +09:00
Martin Ågren
0fa5a2ed8d lock_file: move static locks into functions
Placing `struct lock_file`s on the stack used to be a bad idea, because
the temp- and lockfile-machinery would keep a pointer into the struct.
But after 076aa2cbd (tempfile: auto-allocate tempfiles on heap,
2017-09-05), we can safely have lockfiles on the stack. (This applies
even if a user returns early, leaving a locked lock behind.)

Each of these `struct lock_file`s is used from within a single function.
Move them into the respective functions to make the scope clearer and
drop the staticness.

For good measure, I have inspected these sites and come to believe that
they always release the lock, with the possible exception of bailing out
using `die()` or `exit()` or by returning from a `cmd_foo()`.

As pointed out by Jeff King, it would be bad if someone held on to a
`struct lock_file *` for some reason. After some grepping, I agree with
his findings: no-one appears to be doing that.

After this commit, the remaining occurrences of "static struct
lock_file" are locks that are used from within different functions. That
is, they need to remain static. (Short of more intrusive changes like
passing around pointers to non-static locks.)

Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-05-10 14:55:40 +09:00
Martin Ågren
b227586831 lock_file: make function-local locks non-static
Placing `struct lock_file`s on the stack used to be a bad idea, because
the temp- and lockfile-machinery would keep a pointer into the struct.
But after 076aa2cbd (tempfile: auto-allocate tempfiles on heap,
2017-09-05), we can safely have lockfiles on the stack. (This applies
even if a user returns early, leaving a locked lock behind.)

These `struct lock_file`s are local to their respective functions and we
can drop their staticness.

For good measure, I have inspected these sites and come to believe that
they always release the lock, with the possible exception of bailing out
using `die()` or `exit()` or by returning from a `cmd_foo()`.

As pointed out by Jeff King, it would be bad if someone held on to a
`struct lock_file *` for some reason. After some grepping, I agree with
his findings: no-one appears to be doing that.

Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-05-10 14:54:45 +09:00
Martin Ågren
3c6fad4a3f refs.c: do not die if locking fails in delete_pseudoref()
After taking the lock we check whether we got it and die otherwise. But
since we take the lock using `LOCK_DIE_ON_ERROR`, we would already have
died.

Considering the choice between dropping the dead code and dropping the
flag, let's go for option number three: Drop the flag, write an error
instead of dying, then return -1. This function already returns -1 for
another error, so the caller (or rather, its callers) should be able to
handle this. There is some inconsistency around how we handle errors in
this function and elsewhere in this file, but let's take this small step
towards gentle error-reporting now and leave the rest for another time.

While at it, make the lock non-static and reduce its scope. (Placing
`struct lock_file`s on the stack used to be a bad idea, because the
temp- and lockfile-machinery would keep a pointer into the struct. But
after 076aa2cbd (tempfile: auto-allocate tempfiles on heap, 2017-09-05),
we can safely have lockfiles on the stack.)

Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-05-10 14:54:45 +09:00
Martin Ågren
010845157a refs.c: do not die if locking fails in write_pseudoref()
If we could not take the lock, we add an error to the `strbuf err` and
return. However, this code is dead. The reason is that we take the lock
using `LOCK_DIE_ON_ERROR`. Drop the flag to allow our more gentle
error-handling to actually kick in.

We could instead just drop the dead code and die here. But everything is
prepared for gently propagating the error, so let's do that instead.

There is similar dead code in `delete_pseudoref()`, but let's save that
for the next patch.

While at it, make the lock non-static. (Placing `struct lock_file`s on
the stack used to be a bad idea, because the temp- and
lockfile-machinery would keep a pointer into the struct. But after
076aa2cbd (tempfile: auto-allocate tempfiles on heap, 2017-09-05), we
can safely have lockfiles on the stack.)

Reviewed-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-05-10 14:54:45 +09:00
Martin Ågren
75d9a25e1f t/helper/test-write-cache: clean up lock-handling
Die in case writing the index fails, so that the caller can notice
(instead of, say, being impressed by how performant the writing is).

While at it, note that after opening a lock with `LOCK_DIE_ON_ERROR`, we
do not need to worry about whether we succeeded. Also, we can move the
`struct lock_file` into the function and drop the staticness. (Placing
`struct lock_file`s on the stack used to be a bad idea, because the
temp- and lockfile-machinery would keep a pointer into the struct. But
after 076aa2cbd (tempfile: auto-allocate tempfiles on heap, 2017-09-05),
we can safely have lockfiles on the stack.)

Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-05-10 14:54:44 +09:00
SZEDER Gábor
4ed79d5203 t6050-replace: don't disable stdin for the whole test script
The test script 't6050-replace.sh' starts off with redirecting the
whole test script's stdin from /dev/null.  This redirection has been
there since the test script was introduced in a3e8267225
(replace_object: add a test case, 2009-01-23), but the commit message
doesn't explain why it was deemed necessary.  AFAICT, it has never
been necessary, and t6050 runs just fine and succeeds even without it,
not only the current version but past versions as well.

Besides being unnecessary, this redirection is also harmful, as it
prevents the test helper functions 'test_pause' and 'debug' from
working properly in t6050, because we can't enter any commands to the
shell and the debugger, respectively.

So let's remove that redirection.

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-05-09 11:21:53 +09:00
Ben Peart
6f10a09e0a merge: pass aggressive when rename detection is turned off
Set aggressive flag in git_merge_trees() when rename detection is turned off.
This allows read_tree() to auto resolve more cases that would have otherwise
been handled by the rename detection.

Reviewed-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Ben Peart <benpeart@microsoft.com>
Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-05-08 16:19:41 +09:00
Ben Peart
85b460305c merge: add merge.renames config setting
Add the ability to control rename detection for merge via a config setting.
This setting behaves the same and defaults to the value of diff.renames but only
applies to merge.

Reviewed-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Helped-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Ben Peart <benpeart@microsoft.com>
Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-05-08 16:19:41 +09:00
Ben Peart
a7152e9d22 merge: update documentation for {merge,diff}.renameLimit
Update the documentation to better indicate that the renameLimit setting is
ignored if rename detection is turned off via command line options or config
settings.

Signed-off-by: Ben Peart <benpeart@microsoft.com>
Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-05-08 16:19:41 +09:00
Elijah Newren
1de70dbd1a merge-recursive: fix check for skipability of working tree updates
The can-working-tree-updates-be-skipped check has had a long and blemished
history.  The update can be skipped iff:
  a) The merge is clean
  b) The merge matches what was in HEAD (content, mode, pathname)
  c) The target path is usable (i.e. not involved in D/F conflict)

Traditionally, we split b into parts:
  b1) The merged result matches the content and mode found in HEAD
  b2) The merged target path existed in HEAD

Steps a & b1 are easy to check; we have always gotten those right.  While
it is easy to overlook step c, this was fixed seven years ago with commit
4ab9a157d0 ("merge_content(): Check whether D/F conflicts are still
present", 2010-09-20).  merge-recursive didn't have a readily available
way to directly check step b2, so various approximations were used:

  * In commit b2c8c0a762 ("merge-recursive: When we detect we can skip
    an update, actually skip it", 2011-02-28), it was noted that although
    the code claimed it was skipping the update, it did not actually skip
    the update.  The code was made to skip it, but used lstat(path, ...)
    as an approximation to path-was-tracked-in-index-before-merge.

  * In commit 5b448b8530 ("merge-recursive: When we detect we can skip
    an update, actually skip it", 2011-08-11), the problem with using
    lstat was noted.  It was changed to the approximation
       path2 && strcmp(path, path2)
    which is also wrong.  !path2 || strcmp(path, path2) would have been
    better, but would have fallen short with directory renames.

  * In c5b761fb27 ("merge-recursive: ensure we write updates for
    directory-renamed file", 2018-02-14), the problem with the previous
    approximation was noted and changed to
       was_tracked(path)
    That looks close to what we were trying to answer, but was_tracked()
    as implemented at the time should have been named is_tracked(); it
    returned something different than what we were looking for.

  * To make matters more complex, fixing was_tracked() isn't sufficient
    because the splitting of b into b1 and b2 is wrong.  Consider the
    following merge with a rename/add conflict:
       side A: modify foo, add unrelated bar
       side B: rename foo->bar (but don't modify the mode or contents)
    In this case, the three-way merge of original foo, A's foo, and B's
    bar will result in a desired pathname of bar with the same
    mode/contents that A had for foo.  Thus, A had the right mode and
    contents for the file, and it had the right pathname present (namely,
    bar), but the bar that was present was unrelated to the contents, so
    the working tree update was not skippable.

Fix this by introducing a new function:
   was_tracked_and_matches(o, path, &mfi.oid, mfi.mode)
and use it to directly check for condition b.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-05-08 16:11:00 +09:00