Commit Graph

751 Commits

Author SHA1 Message Date
Jeff King
48bcc1c3cc add_packed_git: convert strcpy into xsnprintf
We have the path "foo.idx", and we create a buffer big
enough to hold "foo.pack" and "foo.keep", and then strcpy
straight into it. This isn't a bug (we have enough space),
but it's very hard to tell from the strcpy that this is so.

Let's instead use strip_suffix to take off the ".idx",
record the size of our allocation, and use xsnprintf to make
sure we don't violate our assumptions.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-09-25 10:18:18 -07:00
Jeff King
ef1286d3c0 use xsnprintf for generating git object headers
We generally use 32-byte buffers to format git's "type size"
header fields. These should not generally overflow unless
you can produce some truly gigantic objects (and our types
come from our internal array of constant strings). But it is
a good idea to use xsnprintf to make sure this is the case.

Note that we slightly modify the interface to
write_sha1_file_prepare, which nows uses "hdrlen" as an "in"
parameter as well as an "out" (on the way in it stores the
allocated size of the header, and on the way out it returns
the ultimate size of the header).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-09-25 10:18:18 -07:00
Junio C Hamano
f0bc854623 Sync with 2.5.2 2015-09-09 14:30:35 -07:00
Junio C Hamano
3d3caf0b78 Sync with 2.4.9 2015-09-04 10:43:23 -07:00
Junio C Hamano
ef0e938a1a Sync with 2.3.9 2015-09-04 10:34:19 -07:00
Junio C Hamano
8267cd11d6 Sync with 2.2.3 2015-09-04 10:29:28 -07:00
Jeff King
5015f01c12 read_info_alternates: handle paths larger than PATH_MAX
This function assumes that the relative_base path passed
into it is no larger than PATH_MAX, and writes into a
fixed-size buffer. However, this path may not have actually
come from the filesystem; for example, add_submodule_odb
generates a path using a strbuf and passes it in. This is
hard to trigger in practice, though, because the long
submodule directory would have to exist on disk before we
would try to open its info/alternates file.

We can easily avoid the bug, though, by simply creating the
filename on the heap.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-09-04 09:36:51 -07:00
Junio C Hamano
cbcd3dcaa8 Merge branch 'cb/open-noatime-clear-errno' into maint
When trying to see that an object does not exist, a state errno
leaked from our "first try to open a packfile with O_NOATIME and
then if it fails retry without it" logic on a system that refuses
O_NOATIME.  This confused us and caused us to die, saying that the
packfile is unreadable, when we should have just reported that the
object does not exist in that packfile to the caller.

* cb/open-noatime-clear-errno:
  git_open_noatime: return with errno=0 on success
2015-09-03 19:17:49 -07:00
Junio C Hamano
9b8d731995 Merge branch 'cb/open-noatime-clear-errno'
When trying to see that an object does not exist, a state errno
leaked from our "first try to open a packfile with O_NOATIME and
then if it fails retry without it" logic on a system that refuses
O_NOATIME.  This confused us and caused us to die, saying that the
packfile is unreadable, when we should have just reported that the
object does not exist in that packfile to the caller.

* cb/open-noatime-clear-errno:
  git_open_noatime: return with errno=0 on success
2015-08-25 14:57:10 -07:00
Junio C Hamano
8c9155e031 Merge branch 'jk/git-path'
git_path() and mkpath() are handy helper functions but it is easy
to misuse, as the callers need to be careful to keep the number of
active results below 4.  Their uses have been reduced.

* jk/git-path:
  memoize common git-path "constant" files
  get_repo_path: refactor path-allocation
  find_hook: keep our own static buffer
  refs.c: remove_empty_directories can take a strbuf
  refs.c: avoid git_path assignment in lock_ref_sha1_basic
  refs.c: avoid repeated git_path calls in rename_tmp_log
  refs.c: simplify strbufs in reflog setup and writing
  path.c: drop git_path_submodule
  refs.c: remove extra git_path calls from read_loose_refs
  remote.c: drop extraneous local variable from migrate_file
  prefer mkpathdup to mkpath in assignments
  prefer git_pathdup to git_path in some possibly-dangerous cases
  add_to_alternates_file: don't add duplicate entries
  t5700: modernize style
  cache.h: complete set of git_path_submodule helpers
  cache.h: clarify documentation for git_path, et al
2015-08-19 14:48:56 -07:00
Junio C Hamano
51a22ce147 Merge branch 'jc/finalize-temp-file'
Long overdue micro clean-up.

* jc/finalize-temp-file:
  sha1_file.c: rename move_temp_to_file() to finalize_object_file()
2015-08-19 14:48:55 -07:00
Junio C Hamano
0a489b0680 prepare_packed_git(): refactor garbage reporting in pack directory
The hook to report "garbage" files in $GIT_OBJECT_DIRECTORY/pack/
could be generic but is too specific to count-object's needs.

Move the part to produce human-readable messages to count-objects,
and refine the interface to callback with the "bits" with values
defined in the cache.h header file, so that other callers (e.g.
prune) can later use the same mechanism to enumerate different
kinds of garbage files and do something intelligent about them,
other than reporting in textual messages.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-08-17 09:14:59 -07:00
Clemens Buchacher
dff6f280df git_open_noatime: return with errno=0 on success
In read_sha1_file_extended we die if read_object fails with a fatal
error. We detect a fatal error if errno is non-zero and is not
ENOENT. If the object could not be read because it does not exist,
this is not considered a fatal error and we want to return NULL.

Somewhere down the line, read_object calls git_open_noatime to open
a pack index file, for example. We first try open with O_NOATIME.
If O_NOATIME fails with EPERM, we retry without O_NOATIME. When the
second open succeeds, errno is however still set to EPERM from the
first attempt. When we finally determine that the object does not
exist, read_object returns NULL and read_sha1_file_extended dies
with a fatal error:

    fatal: failed to read object <sha1>: Operation not permitted

Fix this by resetting errno to zero before we call open again.

Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Clemens Buchacher <clemens.buchacher@intel.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-08-12 13:56:19 -07:00
Johannes Sixt
094c7e6352 prune: close directory earlier during loose-object directory traversal
27e1e22d (prune: factor out loose-object directory traversal, 2014-10-16)
introduced a new function for_each_loose_file_in_objdir() with a helper
for_each_file_in_obj_subdir(). The latter calls callbacks for each file
found during a directory traversal and finally also a callback for the
directory itself.

git-prune uses the function to clean up the object directory. In
particular, in the directory callback it calls rmdir(). On Windows XP,
this rmdir call fails, because the directory is still open while the
callback is called. Close the directory before calling the callback.

Signed-off-by: Johannes Sixt <j6t@kdbg.org>
Acked-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-08-12 12:06:00 -07:00
Jeff King
77b9b1d13a add_to_alternates_file: don't add duplicate entries
The add_to_alternates_file function blindly uses
hold_lock_file_for_append to copy the existing contents, and
then adds the new line to it. This has two minor problems:

  1. We might add duplicate entries, which are ugly and
     inefficient.

  2. We do not check that the file ends with a newline, in
     which case we would bogusly append to the final line.
     This is quite unlikely in practice, though, as we call
     this function only from git-clone, so presumably we are
     the only writers of the file (and we always add a
     newline).

Instead of using hold_lock_file_for_append, let's copy the
file line by line, which ensures all records are properly
terminated. If we see an extra line, we can simply abort the
update (there is no point in even copying the rest, as we
know that it would be identical to the original).

As a bonus, we also get rid of some calls to the
static-buffer mkpath and git_path functions.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-08-10 15:15:42 -07:00
Junio C Hamano
cb5add5868 sha1_file.c: rename move_temp_to_file() to finalize_object_file()
Since 5a688fe4 ("core.sharedrepository = 0mode" should set, not
loosen, 2009-03-25), we kept reminding ourselves:

    NEEDSWORK: this should be renamed to finalize_temp_file() as
    "moving" is only a part of what it does, when no patch between
    master to pu changes the call sites of this function.

without doing anything about it.  Let's do so.

The purpose of this function was not to move but to finalize.  The
detail of the primarily implementation of finalizing was to link the
temporary file to its final name and then to unlink, which wasn't
even "moving".  The alternative implementation did "move" by calling
rename(2), which is a fun tangent.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-08-10 11:10:37 -07:00
Junio C Hamano
7c696007ca Merge branch 'jk/fix-refresh-utime' into maint
Fix a small bug in our use of umask() return value.

* jk/fix-refresh-utime:
  check_and_freshen_file: fix reversed success-check
2015-07-27 12:21:40 -07:00
Junio C Hamano
de62fe8c42 Merge branch 'jk/index-pack-reduce-recheck' into maint
Disable "have we lost a race with competing repack?" check while
receiving a huge object transfer that runs index-pack.

* jk/index-pack-reduce-recheck:
  index-pack: avoid excessive re-reading of pack directory
2015-07-27 12:21:38 -07:00
Junio C Hamano
1f9e0a5348 Merge branch 'jk/fix-refresh-utime'
Fix a small bug in our use of umask() return value.

* jk/fix-refresh-utime:
  check_and_freshen_file: fix reversed success-check
2015-07-10 14:26:15 -07:00
Junio C Hamano
c07173f215 Merge branch 'jk/maint-for-each-packed-object'
The for_each_packed_object() API function did not iterate over
objects in a packfile that hasn't been used yet.

* jk/maint-for-each-packed-object:
  for_each_packed_object: automatically open pack index
2015-07-09 14:31:43 -07:00
Jeff King
3096b2ecdb check_and_freshen_file: fix reversed success-check
When we want to write out a loose object file, we have
always first made sure we don't already have the object
somewhere. Since 33d4221 (write_sha1_file: freshen existing
objects, 2014-10-15), we also update the timestamp on the
file, so that a simultaneous prune knows somebody is
likely to reference it soon.

If our utime() call fails, we treat this the same as not
having the object in the first place; the safe thing to do
is write out another copy. However, the loose-object check
accidentally inverts the utime() check; it returns failure
_only_ when the utime() call actually succeeded. Thus it was
failing to protect us there, and in the normal case where
utime() succeeds, it caused us to pointlessly write out and
link the object.

This passed our freshening tests, because writing out the
new object is certainly _one_ way of updating its utime. So
the normal case was inefficient, but not wrong.

While we're here, let's also drop a comment in front of the
check_and_freshen functions, making a note of their return
type (since it is not our usual "0 for success, -1 for
error").

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-07-08 15:58:28 -07:00
Junio C Hamano
c5baf18a40 Merge branch 'jk/diagnose-config-mmap-failure' into maint
The configuration reader/writer uses mmap(2) interface to access
the files; when we find a directory, it barfed with "Out of memory?".

* jk/diagnose-config-mmap-failure:
  xmmap(): drop "Out of memory?"
  config.c: rewrite ENODEV into EISDIR when mmap fails
  config.c: avoid xmmap error messages
  config.c: fix mmap leak when writing config
  read-cache.c: drop PROT_WRITE from mmap of index
2015-06-25 11:02:11 -07:00
Junio C Hamano
712b351bd3 Merge branch 'jk/index-pack-reduce-recheck'
Disable "have we lost a race with competing repack?" check while
receiving a huge object transfer that runs index-pack.

* jk/index-pack-reduce-recheck:
  index-pack: avoid excessive re-reading of pack directory
2015-06-24 12:21:54 -07:00
Jeff King
f813e9ea5f for_each_packed_object: automatically open pack index
When for_each_packed_object is called, we call
prepare_packed_git() to make sure we have the actual list of
packs. But the latter does not actually open the pack
indices, meaning that pack->nr_objects may simply be 0 if
the pack has not otherwise been used since the program
started.

In practice, this didn't come up for the current callers,
because they iterate the packed objects only after iterating
all reachable objects (so for it to matter you would have to
have a pack consisting only of unreachable objects). But it
is a dangerous and confusing interface that should be fixed
for future callers.

Note that we do not end the iteration when a pack cannot be
opened, but we do return an error. That lets you complete
the iteration even in actively-repacked repository where an
.idx file may racily go away, but it also lets callers know
that they may not have gotten the complete list (which the
current reachability-check caller does care about).

We have to tweak one of the prune tests due to the changed
return value; an earlier test creates bogus .idx files and
does not clean them up. Having to make this tweak is a good
thing; it means we will not prune in a broken repository,
and the test confirms that we do not negatively impact a
more lenient caller, count-objects.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-06-22 14:53:58 -07:00
Junio C Hamano
070d276cc1 Merge branch 'jh/filter-empty-contents' into maint
The clean/smudge interface did not work well when filtering an
empty contents (failed and then passed the empty input through).
It can be argued that a filter that produces anything but empty for
an empty input is nonsense, but if the user wants to do strange
things, then why not?

* jh/filter-empty-contents:
  sha1_file: pass empty buffer to index empty file
2015-06-16 14:33:44 -07:00
Junio C Hamano
dee47925c1 Merge branch 'jk/diagnose-config-mmap-failure'
The configuration reader/writer uses mmap(2) interface to access
the files; when we find a directory, it barfed with "Out of memory?".

* jk/diagnose-config-mmap-failure:
  xmmap(): drop "Out of memory?"
  config.c: rewrite ENODEV into EISDIR when mmap fails
  config.c: avoid xmmap error messages
  config.c: fix mmap leak when writing config
  read-cache.c: drop PROT_WRITE from mmap of index
2015-06-11 09:29:55 -07:00
Jeff King
0eeb077be7 index-pack: avoid excessive re-reading of pack directory
Since 45e8a74 (has_sha1_file: re-check pack directory before
giving up, 2013-08-30), we spend extra effort for
has_sha1_file to give the right answer when somebody else is
repacking. Usually this effort does not matter, because
after finding that the object does not exist, the next step
is usually to die().

However, some code paths make a large number of
has_sha1_file checks which are _not_ expected to return 1.
The collision test in index-pack.c is such a case. On a
local system, this can cause a performance slowdown of
around 5%. But on a system with high-latency system calls
(like NFS), it can be much worse.

This patch introduces a "quick" flag to has_sha1_file which
callers can use when they would prefer high performance at
the cost of false negatives during repacks. There may be
other code paths that can use this, but the index-pack one
is the most obviously critical, so we'll start with
switching that one.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-06-09 12:26:35 -07:00
Junio C Hamano
3c91e9966a Merge branch 'jk/sha1-file-reduce-useless-warnings' into maint
* jk/sha1-file-reduce-useless-warnings:
  sha1_file: squelch "packfile cannot be accessed" warnings
2015-06-05 12:00:07 -07:00
Junio C Hamano
152722f155 Merge branch 'jh/filter-empty-contents'
The clean/smudge interface did not work well when filtering an
empty contents (failed and then passed the empty input through).
It can be argued that a filter that produces anything but empty for
an empty input is nonsense, but if the user wants to do strange
things, then why not?

* jh/filter-empty-contents:
  sha1_file: pass empty buffer to index empty file
2015-06-01 12:45:11 -07:00
Junio C Hamano
9ca0aaf6de xmmap(): drop "Out of memory?"
We show that message with die_errno(), but the OS is ought to know
why mmap(2) failed much better than we do.  There is no reason for
us to say "Out of memory?" here.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-05-28 11:35:25 -07:00
Jeff King
1570856b51 config.c: avoid xmmap error messages
The config-writing code uses xmmap to map the existing
config file, which will die if the map fails. This has two
downsides:

  1. The error message is not very helpful, as it lacks any
     context about the file we are mapping:

       $ mkdir foo
       $ git config --file=foo some.key value
       fatal: Out of memory? mmap failed: No such device

  2. We normally do not die in this code path; instead, we'd
     rather report the error and return an appropriate exit
     status (which is part of the public interface
     documented in git-config.1).

This patch introduces a "gentle" form of xmmap which lets us
produce our own error message. We do not want to use mmap
directly, because we would like to use the other
compatibility elements of xmmap (e.g., handling 0-length
maps portably).

The end result is:

    $ git.compile config --file=foo some.key value
    error: unable to mmap 'foo': No such device
    $ echo $?
    3

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-05-28 11:33:18 -07:00
Junio C Hamano
1e6c8babf8 Merge branch 'jc/hash-object' into maint
"hash-object --literally" introduced in v2.2 was not prepared to
take a really long object type name.

* jc/hash-object:
  write_sha1_file(): do not use a separate sha1[] array
  t1007: add hash-object --literally tests
  hash-object --literally: fix buffer overrun with extra-long object type
  git-hash-object.txt: document --literally option
2015-05-26 13:49:25 -07:00
Junio C Hamano
3b7d373ae2 Merge branch 'kn/cat-file-literally'
Add the "--allow-unknown-type" option to "cat-file" to allow
inspecting loose objects of an experimental or a broken type.

* kn/cat-file-literally:
  t1006: add tests for git cat-file --allow-unknown-type
  cat-file: teach cat-file a '--allow-unknown-type' option
  cat-file: make the options mutually exclusive
  sha1_file: support reading from a loose object of unknown type
2015-05-19 13:17:58 -07:00
Jim Hill
f6a1e1e288 sha1_file: pass empty buffer to index empty file
`git add` of an empty file with a filter pops complaints from
`copy_fd` about a bad file descriptor.

This traces back to these lines in sha1_file.c:index_core:

	if (!size) {
		ret = index_mem(sha1, NULL, size, type, path, flags);

The problem here is that content to be added to the index can be
supplied from an fd, or from a memory buffer, or from a pathname. This
call is supplying a NULL buffer pointer and a zero size.

Downstream logic takes the complete absence of a buffer to mean the
data is to be found elsewhere -- for instance, these, from convert.c:

	if (params->src) {
		write_err = (write_in_full(child_process.in, params->src, params->size) < 0);
	} else {
		write_err = copy_fd(params->fd, child_process.in);
	}

~If there's a buffer, write from that, otherwise the data must be coming
from an open fd.~

Perfectly reasonable logic in a routine that's going to write from
either a buffer or an fd.

So change `index_core` to supply an empty buffer when indexing an empty
file.

There's a patch out there that instead changes the logic quoted above to
take a `-1` fd to mean "use the buffer", but it seems to me that the
distinction between a missing buffer and an empty one carries intrinsic
semantics, where the logic change is adapting the code to handle
incorrect arguments.

Signed-off-by: Jim Hill <gjthill@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-05-18 10:15:20 -07:00
Junio C Hamano
ebb464f0cb Merge branch 'jk/prune-mtime' into maint
Access to objects in repositories that borrow from another one on a
slow NFS server unnecessarily got more expensive due to recent code
becoming more cautious in a naive way not to lose objects to pruning.

* jk/prune-mtime:
  sha1_file: only freshen packs once per run
  sha1_file: freshen pack objects before loose
  reachable: only mark local objects as recent
2015-05-13 14:05:50 -07:00
Junio C Hamano
051086b947 Merge branch 'jc/hash-object'
"hash-object --literally" introduced in v2.2 was not prepared to
take a really long object type name.

* jc/hash-object:
  write_sha1_file(): do not use a separate sha1[] array
  t1007: add hash-object --literally tests
  hash-object --literally: fix buffer overrun with extra-long object type
  git-hash-object.txt: document --literally option
2015-05-11 14:23:59 -07:00
Junio C Hamano
cedeffeee0 Merge branch 'jk/sha1-file-reduce-useless-warnings'
* jk/sha1-file-reduce-useless-warnings:
  sha1_file: squelch "packfile cannot be accessed" warnings
2015-05-11 14:23:41 -07:00
Junio C Hamano
68a2e6a2c8 Merge branch 'nd/multiple-work-trees'
A replacement for contrib/workdir/git-new-workdir that does not
rely on symbolic links and make sharing of objects and refs safer
by making the borrowee and borrowers aware of each other.

* nd/multiple-work-trees: (41 commits)
  prune --worktrees: fix expire vs worktree existence condition
  t1501: fix test with split index
  t2026: fix broken &&-chain
  t2026 needs procondition SANITY
  git-checkout.txt: a note about multiple checkout support for submodules
  checkout: add --ignore-other-wortrees
  checkout: pass whole struct to parse_branchname_arg instead of individual flags
  git-common-dir: make "modules/" per-working-directory directory
  checkout: do not fail if target is an empty directory
  t2025: add a test to make sure grafts is working from a linked checkout
  checkout: don't require a work tree when checking out into a new one
  git_path(): keep "info/sparse-checkout" per work-tree
  count-objects: report unused files in $GIT_DIR/worktrees/...
  gc: support prune --worktrees
  gc: factor out gc.pruneexpire parsing code
  gc: style change -- no SP before closing parenthesis
  checkout: clean up half-prepared directories in --to mode
  checkout: reject if the branch is already checked out elsewhere
  prune: strategies for linked checkouts
  checkout: support checking out into a new working directory
  ...
2015-05-11 14:23:39 -07:00
Karthik Nayak
46f034483e sha1_file: support reading from a loose object of unknown type
Update sha1_loose_object_info() to optionally allow it to read
from a loose object file of unknown/bogus type; as the function
usually returns the type of the object it read in the form of enum
for known types, add an optional "typename" field to receive the
name of the type in textual form and a flag to indicate the reading
of a loose object file of unknown/bogus type.

Add parse_sha1_header_extended() which acts as a wrapper around
parse_sha1_header() allowing more information to be obtained.

Add unpack_sha1_header_to_strbuf() to unpack sha1 headers of
unknown/corrupt objects which have a unknown sha1 header size to
a strbuf structure. This was written by Junio C Hamano but tested
by me.

Helped-by: Junio C Hamano <gitster@pobox.com>
Helped-by: Eric Sunshine <sunshine@sunshineco.com>
Helped-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk>
Hepled-by: Jeff King <peff@peff.net>
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-05-06 13:35:48 -07:00
Junio C Hamano
e3b199aef1 Merge branch 'jk/prune-mtime'
Access to objects in repositories that borrow from another one on a
slow NFS server unnecessarily got more expensive due to recent code
becoming more cautious in a naive way not to lose objects to pruning.

* jk/prune-mtime:
  sha1_file: only freshen packs once per run
  sha1_file: freshen pack objects before loose
  reachable: only mark local objects as recent
2015-05-05 21:00:37 -07:00
Junio C Hamano
1427a7ff70 write_sha1_file(): do not use a separate sha1[] array
In the beginning, write_sha1_file() did not have a way to tell the
caller the name of the object it wrote to the caller.  This was
changed in d6d3f9d0 (This implements the new "recursive tree"
write-tree., 2005-04-09) by adding the "returnsha1" parameter to the
function so that the callers who are interested in the value can
optionally pass a pointer to receive it.

It turns out that all callers do want to know the name of the object
it just has written.  Nobody passes a NULL to this parameter, hence
it is not necessary to use a separate sha1[] array to receive the
result from  write_sha1_file_prepare(), and copy the result to the
returnsha1 supplied by the caller.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-05-05 10:17:56 -07:00
Eric Sunshine
0c3db67cc8 hash-object --literally: fix buffer overrun with extra-long object type
"hash-object" learned in 5ba9a93 (hash-object: add --literally
option, 2014-09-11) to allow crafting a corrupt/broken object of
unknown type.

When the user-provided type is particularly long, however, it can
overflow the relatively small stack-based character array handed to
write_sha1_file_prepare() by hash_sha1_file() and write_sha1_file(),
leading to stack corruption (and crash).  Introduce a custom helper
to allow arbitrarily long typenames just for "hash-object --literally".

[jc: Eric's original used a strbuf in the more common codepaths, and
I rewrote it to avoid penalizing the non-literally code. Bugs are mine]

Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-05-05 10:14:18 -07:00
Jeff King
ee1c6c34ac sha1_file: only freshen packs once per run
Since 33d4221 (write_sha1_file: freshen existing objects,
2014-10-15), we update the mtime of existing objects that we
would have written out (had they not existed). For the
common case in which many objects are packed, we may update
the mtime on a single packfile repeatedly. This can result
in a noticeable performance problem if calling utime() is
expensive (e.g., because your storage is on NFS).

We can fix this by keeping a per-pack flag that lets us
freshen only once per program invocation.

An alternative would be to keep the packed_git.mtime flag up
to date as we freshen, and freshen only once every N
seconds. In practice, it's not worth the complexity. We are
racing against prune expiration times here, which inherently
must be set to accomodate reasonable program running times
(because they really care about the time between an object
being written and it becoming referenced, and the latter is
typically the last step a program takes).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-04-20 13:09:40 -07:00
Jeff King
b5f52f372e sha1_file: freshen pack objects before loose
When writing out an object file, we first check whether it
already exists and if so optimize out the write. Prior to
33d4221, we did this by calling has_sha1_file(), which will
check for packed objects followed by loose. Since that
commit, we check loose objects first.

For the common case of a repository whose objects are mostly
packed, this means we will make a lot of extra access()
system calls checking for loose objects. We should follow
the same packed-then-loose order that all of our other
lookups use.

Reported-by: Stefan Saasen <ssaasen@atlassian.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-04-20 13:09:38 -07:00
Jeff King
1385bb7ba3 reachable: only mark local objects as recent
When pruning and repacking a repository that has an
alternate object store configured, we may traverse a large
number of objects in the alternate. This serves no purpose,
and may be expensive to do. A longer explanation is below.

Commits d3038d2 and abcb865 taught prune and pack-objects
(respectively) to treat "recent" objects as tips for
reachability, so that we keep whole chunks of history. They
built on the object traversal in 660c889 (sha1_file: add
for_each iterators for loose and packed objects,
2014-10-15), which covers both local and alternate objects.

In both cases, covering alternate objects is unnecessary, as
both commands can only drop objects from the local
repository. In the case of prune, we traverse only the local
object directory. And in the case of repacking, while we may
or may not include local objects in our pack, we will never
reach into the alternate with "repack -d". The "-l" option
is only a question of whether we are migrating objects from
the alternate into our repository, or leaving them
untouched.

It is possible that we may drop an object that is depended
upon by another object in the alternate. For example,
imagine two repositories, A and B, with A pointing to B as
an alternate. Now imagine a commit that is in B which
references a tree that is only in A. Traversing from recent
objects in B might prevent A from dropping that tree. But
this case isn't worth covering. Repo B should take
responsibility for its own objects. It would never have had
the commit in the first place if it did not also have the
tree, and assuming it is using the same "keep recent chunks
of history" scheme, then it would itself keep the tree, as
well.

So checking the alternate objects is not worth doing, and
come with a significant performance impact. In both cases,
we skip any recent objects that have already been marked
SEEN (i.e., that we know are already reachable for prune, or
included in the pack for a repack). So there is a slight
waste of time in opening the alternate packs at all, only to
notice that we have already considered each object. But much
worse, the alternate repository may have a large number of
objects that are not reachable from the local repository at
all, and we end up adding them to the traversal.

We can fix this by considering only local unseen objects.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-04-20 13:09:27 -07:00
Jeff King
319b678a7b sha1_file: squelch "packfile cannot be accessed" warnings
When we find an object in a packfile index, we make sure we
can still open the packfile itself (or that it is already
open), as it might have been deleted by a simultaneous
repack. If we can't access the packfile, we print a warning
for the user and tell the caller that we don't have the
object (we can then look in other packfiles, or find a loose
version, before giving up).

The warning we print to the user isn't really accomplishing
anything, and it is potentially confusing to users. In the
normal case, it is complete noise; we find the object
elsewhere, and the user does not have to care that we racily
saw a packfile index that became stale. It didn't affect the
operation at all.

A possibly more interesting case is when we later can't find
the object, and report failure to the user. In this case the
warning could be considered a clue toward that ultimate
failure. But it's not really a useful clue in practice. We
wouldn't even print it consistently (since we are racing
with another process, we might not even see the .idx file,
or we might win the race and open the packfile, completing
the operation).

This patch drops the warning entirely (not only from the
fill_pack_entry site, but also from an identical use in
pack-objects). If we did find the warning interesting in the
error case, we could stuff it away and reveal it to the user
when we later die() due to the broken object. But that
complexity just isn't worth it.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-03-30 21:47:39 -07:00
Junio C Hamano
a393c6bfd9 Merge branch 'rs/deflate-init-cleanup' into maint
Code simplification.

* rs/deflate-init-cleanup:
  zlib: initialize git_zstream in git_deflate_init{,_gzip,_raw}
2015-03-23 11:23:38 -07:00
Junio C Hamano
6902c4da58 Merge branch 'rs/deflate-init-cleanup'
Code simplification.

* rs/deflate-init-cleanup:
  zlib: initialize git_zstream in git_deflate_init{,_gzip,_raw}
2015-03-17 16:01:26 -07:00
René Scharfe
9a6f1287fb zlib: initialize git_zstream in git_deflate_init{,_gzip,_raw}
Clear the git_zstream variable at the start of git_deflate_init() etc.
so that callers don't have to do that.

Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-03-05 15:46:03 -08:00
Junio C Hamano
cbc8d6d8f8 Merge branch 'jk/prune-mtime' into maint
In v2.2.0, we broke "git prune" that runs in a repository that
borrows from an alternate object store.

* jk/prune-mtime:
  sha1_file: fix iterating loose alternate objects
  for_each_loose_file_in_objdir: take an optional strbuf path
2015-03-05 13:13:08 -08:00
Junio C Hamano
7cf6232e2c Merge branch 'jk/prune-mtime'
In v2.2.0, we broke "git prune" that runs in a repository that
borrows from an alternate object store.

* jk/prune-mtime:
  sha1_file: fix iterating loose alternate objects
  for_each_loose_file_in_objdir: take an optional strbuf path
2015-02-22 12:28:28 -08:00
Jonathon Mah
b0a4264277 sha1_file: fix iterating loose alternate objects
The string in 'base' contains a path suffix to a specific object;
when its value is used, the suffix must either be filled (as in
stat_sha1_file, open_sha1_file, check_and_freshen_nonlocal) or
cleared (as in prepare_packed_git) to avoid junk at the end.

660c889e (sha1_file: add for_each iterators for loose and packed
objects, 2014-10-15) introduced loose_from_alt_odb(), but this did
neither and treated 'base' as a complete path to the "base" object
directory, instead of a pointer to the "base" of the full path
string.

The trailing path after 'base' is still initialized to NUL, hiding
the bug in some common cases.  Additionally the descendent
for_each_file_in_obj_subdir() function swallows ENOENT, so an error
only shows if the alternate's path was last filled with a valid
object (where statting /path/to/existing/00/0bjectfile/00 fails).

Signed-off-by: Jonathon Mah <me@JonathonMah.com>
Helped-by: Kyle J. McKay <mackyle@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-02-09 14:14:56 -08:00
Jeff King
e6f875e052 for_each_loose_file_in_objdir: take an optional strbuf path
We feed a root "objdir" path to this iterator function,
which then copies the result into a strbuf, so that it can
repeatedly append the object sub-directories to it. Let's
make it easy for callers to just pass us a strbuf in the
first place.

We leave the original interface as a convenience for callers
who want to just pass a const string like the result of
get_object_directory().

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-02-09 14:14:53 -08:00
Nguyễn Thái Ngọc Duy
dcf692625a path.c: make get_pathname() call sites return const char *
Before the previous commit, get_pathname returns an array of PATH_MAX
length. Even if git_path() and similar functions does not use the
whole array, git_path() caller can, in theory.

After the commit, get_pathname() may return a buffer that has just
enough room for the returned string and git_path() caller should never
write beyond that.

Make git_path(), mkpath() and git_path_submodule() return a const
buffer to make sure callers do not write in it at all.

This could have been part of the previous commit, but the "const"
conversion is too much distraction from the core changes in path.c.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-12-01 11:00:10 -08:00
Michael Haggerty
3383e19984 sort_string_list(): rename to string_list_sort()
The new name is more consistent with the names of other
string_list-related functions.

Suggested-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-11-25 10:11:34 -08:00
Junio C Hamano
d70e331c0e Merge branch 'jk/prune-mtime'
Tighten the logic to decide that an unreachable cruft is
sufficiently old by covering corner cases such as an ancient object
becoming reachable and then going unreachable again, in which case
its retention period should be prolonged.

* jk/prune-mtime: (28 commits)
  drop add_object_array_with_mode
  revision: remove definition of unused 'add_object' function
  pack-objects: double-check options before discarding objects
  repack: pack objects mentioned by the index
  pack-objects: use argv_array
  reachable: use revision machinery's --indexed-objects code
  rev-list: add --indexed-objects option
  rev-list: document --reflog option
  t5516: test pushing a tag of an otherwise unreferenced blob
  traverse_commit_list: support pending blobs/trees with paths
  make add_object_array_with_context interface more sane
  write_sha1_file: freshen existing objects
  pack-objects: match prune logic for discarding objects
  pack-objects: refactor unpack-unreachable expiration check
  prune: keep objects reachable from recent objects
  sha1_file: add for_each iterators for loose and packed objects
  count-objects: use for_each_loose_file_in_objdir
  count-objects: do not use xsize_t when counting object size
  prune-packed: use for_each_loose_file_in_objdir
  reachable: mark index blobs as SEEN
  ...
2014-10-29 10:07:56 -07:00
Jeff King
33d4221c79 write_sha1_file: freshen existing objects
When we try to write a loose object file, we first check
whether that object already exists. If so, we skip the
write as an optimization. However, this can interfere with
prune's strategy of using mtimes to mark files in progress.

For example, if a branch contains a particular tree object
and is deleted, that tree object may become unreachable, and
have an old mtime. If a new operation then tries to write
the same tree, this ends up as a noop; we notice we
already have the object and do nothing. A prune running
simultaneously with this operation will see the object as
old, and may delete it.

We can solve this by "freshening" objects that we avoid
writing by updating their mtime. The algorithm for doing so
is essentially the same as that of has_sha1_file. Therefore
we provide a new (static) interface "check_and_freshen",
which finds and optionally freshens the object. It's trivial
to implement freshening and simple checking by tweaking a
single parameter.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-10-16 10:10:43 -07:00
Jeff King
660c889e46 sha1_file: add for_each iterators for loose and packed objects
We typically iterate over the reachable objects in a
repository by starting at the tips and walking the graph.
There's no easy way to iterate over all of the objects,
including unreachable ones. Let's provide a way of doing so.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-10-16 10:10:41 -07:00
Jeff King
27e1e22d5e prune: factor out loose-object directory traversal
Prune has to walk $GIT_DIR/objects/?? in order to find the
set of loose objects to prune. Other parts of the code
(e.g., count-objects) want to do the same. Let's factor it
out into a reusable for_each-style function.

Note that this is not quite a straight code movement. The
original code had strange behavior when it found a file of
the form "[0-9a-f]{2}/.{38}" that did _not_ contain all hex
digits. It executed a "break" from the loop, meaning that we
stopped pruning in that directory (but still pruned other
directories!). This was probably a bug; we do not want to
process the file as an object, but we should keep going
otherwise (and that is how the new code handles it).

We are also a little more careful with loose object
directories which fail to open. The original code silently
ignored any failures, but the new code will complain about
any problems besides ENOENT.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-10-16 10:10:39 -07:00
Jeff King
fe1b22686f foreach_alt_odb: propagate return value from callback
We check the return value of the callback and stop iterating
if it is non-zero. However, we do not make the non-zero
return value available to the caller, so they have no way of
knowing whether the operation succeeded or not (technically
they can keep their own error flag in the callback data, but
that is unlike our other for_each functions).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-10-16 10:10:35 -07:00
Junio C Hamano
bd107e1052 Merge branch 'mh/lockfile'
The lockfile API and its users have been cleaned up.

* mh/lockfile: (38 commits)
  lockfile.h: extract new header file for the functions in lockfile.c
  hold_locked_index(): move from lockfile.c to read-cache.c
  hold_lock_file_for_append(): restore errno before returning
  get_locked_file_path(): new function
  lockfile.c: rename static functions
  lockfile: rename LOCK_NODEREF to LOCK_NO_DEREF
  commit_lock_file_to(): refactor a helper out of commit_lock_file()
  trim_last_path_component(): replace last_path_elm()
  resolve_symlink(): take a strbuf parameter
  resolve_symlink(): use a strbuf for internal scratch space
  lockfile: change lock_file::filename into a strbuf
  commit_lock_file(): use a strbuf to manage temporary space
  try_merge_strategy(): use a statically-allocated lock_file object
  try_merge_strategy(): remove redundant lock_file allocation
  struct lock_file: declare some fields volatile
  lockfile: avoid transitory invalid states
  git_config_set_multivar_in_file(): avoid call to rollback_lock_file()
  dump_marks(): remove a redundant call to rollback_lock_file()
  api-lockfile: document edge cases
  commit_lock_file(): rollback lock file on failure to rename
  ...
2014-10-14 10:49:45 -07:00
Junio C Hamano
f0d8900175 Merge branch 'sp/stream-clean-filter'
When running a required clean filter, we do not have to mmap the
original before feeding the filter.  Instead, stream the file
contents directly to the filter and process its output.

* sp/stream-clean-filter:
  sha1_file: don't convert off_t to size_t too early to avoid potential die()
  convert: stream from fd to required clean filter to reduce used address space
  copy_fd(): do not close the input file descriptor
  mmap_limit: introduce GIT_MMAP_LIMIT to allow testing expected mmap size
  memory_limit: use git_env_ulong() to parse GIT_ALLOC_LIMIT
  config.c: add git_env_ulong() to parse environment variable
  convert: drop arguments other than 'path' from would_convert_to_git()
2014-10-08 13:05:32 -07:00
Michael Haggerty
697cc8efd9 lockfile.h: extract new header file for the functions in lockfile.c
Move the interface declaration for the functions in lockfile.c from
cache.h to a new file, lockfile.h. Add #includes where necessary (and
remove some redundant includes of cache.h by files that already
include builtin.h).

Move the documentation of the lock_file state diagram from lockfile.c
to the new header file.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-10-01 13:56:14 -07:00
Steffen Prohaska
9079ab7cb6 sha1_file: don't convert off_t to size_t too early to avoid potential die()
xsize_t() checks if an off_t argument can be safely converted to
a size_t return value.  If the check is executed too early, it could
fail for large files on 32-bit architectures even if the size_t code
path is not taken.  Other paths might be able to handle the large file.
Specifically, index_stream_convert_blob() is able to handle a large file
if a filter is configured that returns a small result.

Signed-off-by: Steffen Prohaska <prohaska@zib.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-09-22 12:40:55 -07:00
Junio C Hamano
bedd3b4b7b Merge branch 'nd/large-blobs'
Teach a few codepaths to punt (instead of dying) when large blobs
that would not fit in core are involved in the operation.

* nd/large-blobs:
  diff: shortcut for diff'ing two binary SHA-1 objects
  diff --stat: mark any file larger than core.bigfilethreshold binary
  diff.c: allow to pass more flags to diff_populate_filespec
  sha1_file.c: do not die failing to malloc in unpack_compressed_entry
  wrapper.c: introduce gentle xmallocz that does not die()
2014-09-11 10:33:33 -07:00
Junio C Hamano
f655651e09 Merge branch 'rs/strbuf-getcwd'
Reduce the use of fixed sized buffer passed to getcwd() calls
by introducing xgetcwd() helper.

* rs/strbuf-getcwd:
  use strbuf_add_absolute_path() to add absolute paths
  abspath: convert absolute_path() to strbuf
  use xgetcwd() to set $GIT_DIR
  use xgetcwd() to get the current directory or die
  wrapper: add xgetcwd()
  abspath: convert real_path_internal() to strbuf
  abspath: use strbuf_getcwd() to remember original working directory
  setup: convert setup_git_directory_gently_1 et al. to strbuf
  unix-sockets: use strbuf_getcwd()
  strbuf: add strbuf_getcwd()
2014-09-02 13:28:44 -07:00
Steffen Prohaska
9035d75a2b convert: stream from fd to required clean filter to reduce used address space
The data is streamed to the filter process anyway.  Better avoid mapping
the file if possible.  This is especially useful if a clean filter
reduces the size, for example if it computes a sha1 for binary data,
like git media.  The file size that the previous implementation could
handle was limited by the available address space; large files for
example could not be handled with (32-bit) msysgit.  The new
implementation can filter files of any size as long as the filter output
is small enough.

The new code path is only taken if the filter is required.  The filter
consumes data directly from the fd.  If it fails, the original data is
not immediately available.  The condition can easily be handled as
a fatal error, which is expected for a required filter anyway.

If the filter was not required, the condition would need to be handled
in a different way, like seeking to 0 and reading the data.  But this
would require more restructuring of the code and is probably not worth
it.  The obvious approach of falling back to reading all data would not
help achieving the main purpose of this patch, which is to handle large
files with limited address space.  If reading all data is an option, we
can simply take the old code path right away and mmap the entire file.

The environment variable GIT_MMAP_LIMIT, which has been introduced in
a previous commit is used to test that the expected code path is taken.
A related test that exercises required filters is modified to verify
that the data actually has been modified on its way from the file system
to the object store.

Signed-off-by: Steffen Prohaska <prohaska@zib.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-08-28 10:25:15 -07:00
Steffen Prohaska
02710228dd mmap_limit: introduce GIT_MMAP_LIMIT to allow testing expected mmap size
In order to test expectations about mmap in a way similar to testing
expectations about malloc with GIT_ALLOC_LIMIT introduced by
d41489a6 (Add more large blob test cases, 2012-03-07), introduce a
new environment variable GIT_MMAP_LIMIT to limit the largest allowed
mmap length.

xmmap() is modified to check the size of the requested region and
fail it if it is beyond the limit.  Together with GIT_ALLOC_LIMIT
tests can now confirm expectations about memory consumption.

Signed-off-by: Steffen Prohaska <prohaska@zib.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-08-28 10:25:14 -07:00
Steffen Prohaska
7ce7c7607b convert: drop arguments other than 'path' from would_convert_to_git()
It is only the path that matters in the decision whether to filter
or not.  Clarify this by making path the only argument of
would_convert_to_git().

Signed-off-by: Steffen Prohaska <prohaska@zib.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-08-21 15:27:20 -07:00
Nguyễn Thái Ngọc Duy
735efde838 sha1_file.c: do not die failing to malloc in unpack_compressed_entry
Fewer die() gives better control to the caller, provided that the
caller _can_ handle it. And in unpack_compressed_entry() case, it can,
because unpack_compressed_entry() already returns NULL if it fails to
inflate data.

A side effect from this is fsck continues to run when very large blobs
are present (and do not fit in memory).

Noticed-by: Dale R. Worley <worley@alum.mit.edu>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-08-18 10:15:19 -07:00
Junio C Hamano
9f2de9c121 Merge branch 'kb/perf-trace'
* kb/perf-trace:
  api-trace.txt: add trace API documentation
  progress: simplify performance measurement by using getnanotime()
  wt-status: simplify performance measurement by using getnanotime()
  git: add performance tracing for git's main() function to debug scripts
  trace: add trace_performance facility to debug performance issues
  trace: add high resolution timer function to debug performance issues
  trace: add 'file:line' to all trace output
  trace: move code around, in preparation to file:line output
  trace: add current timestamp to all trace output
  trace: disable additional trace output for unit tests
  trace: add infrastructure to augment trace output with additional info
  sha1_file: change GIT_TRACE_PACK_ACCESS logging to use trace API
  Documentation/git.txt: improve documentation of 'GIT_TRACE*' variables
  trace: improve trace performance
  trace: remove redundant printf format attribute
  trace: consistently name the format parameter
  trace: move trace declarations from cache.h to new trace.h
2014-07-22 10:59:19 -07:00
Junio C Hamano
a8c565b227 Merge branch 'ek/alt-odb-entry-fix'
* ek/alt-odb-entry-fix:
  sha1_file: do not add own object directory as alternate
2014-07-21 11:18:46 -07:00
Junio C Hamano
6e4094731a Merge branch 'jk/strip-suffix'
* jk/strip-suffix:
  prepare_packed_git_one: refactor duplicate-pack check
  verify-pack: use strbuf_strip_suffix
  strbuf: implement strbuf_strip_suffix
  index-pack: use strip_suffix to avoid magic numbers
  use strip_suffix instead of ends_with in simple cases
  replace has_extension with ends_with
  implement ends_with via strip_suffix
  add strip_suffix function
  sha1_file: replace PATH_MAX buffer with strbuf in prepare_packed_git_one()
2014-07-16 11:26:00 -07:00
Junio C Hamano
5a3db94539 Merge branch 'rs/fix-alt-odb-path-comparison' into maint
Code to avoid adding the same alternate object store twice was
subtly broken for a long time, but nobody seems to have noticed.

* rs/fix-alt-odb-path-comparison:
  sha1_file: avoid overrunning alternate object base string
2014-07-16 11:17:08 -07:00
Ephrim Khong
539e75069f sha1_file: do not add own object directory as alternate
When adding alternate object directories, we try not to add the
directory of the current repository to avoid cycles.  Unfortunately,
that test was broken, since it compared an absolute with a relative
path.

Signed-off-by: Ephrim Khong <dr.khong@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-07-15 11:50:15 -07:00
Karsten Blees
67dc598ec4 sha1_file: change GIT_TRACE_PACK_ACCESS logging to use trace API
This changes GIT_TRACE_PACK_ACCESS functionality as follows:
 * supports the same options as GIT_TRACE (e.g. printing to stderr)
 * no longer supports relative paths
 * appends to the trace file rather than overwriting

Signed-off-by: Karsten Blees <blees@dcon.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-07-13 21:25:18 -07:00
Junio C Hamano
b41a4636ee Merge branch 'rs/fix-alt-odb-path-comparison'
* rs/fix-alt-odb-path-comparison:
  sha1_file: avoid overrunning alternate object base string
2014-07-10 11:27:52 -07:00
René Scharfe
80b47854ca sha1_file: avoid overrunning alternate object base string
While checking if a new alternate object database is a duplicate make
sure that old and new base paths have the same length before comparing
them with memcmp.  This avoids overrunning the buffer of the existing
entry if the new one is longer and it stops rejecting foobar/ after
foo/ was already added.

Signed-off-by: Rene Scharfe <ls.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-07-01 13:30:50 -07:00
Jeff King
47bf4b0fc5 prepare_packed_git_one: refactor duplicate-pack check
When we are reloading the list of packs, we check whether a
particular pack has been loaded. This is slightly tricky,
because we load packs based on the presence of their ".idx"
files, but record the name of the matching ".pack" file.
Therefore we want to compare their bases.

The existing code stripped off ".idx" from a file we found,
then compared that whole base length to strings containing
the ".pack" version. This meant we could end up comparing
bytes past what the ".pack" string contained, if the ".idx"
file name was much longer.

In practice, it worked OK because memcmp would end up seeing
a difference in the two strings and would return early
before hitting the full length. However, memcmp may
sometimes read extra bytes past a difference (e.g., because
it is comparing 64-bit words), or is even free to compare in
reverse order.

Furthermore, our memcmp made no guarantees that we matched
the whole pack name, up to ".pack". So "foo.idx" would match
"foo-bar.pack", which is wrong (but does not typically
happen, because our pack names have a fixed size).

We can fix both issues, avoid magic numbers, and document
that we expect to compare against a string with ".pack" by
using strip_suffix.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-06-30 13:43:32 -07:00
Jeff King
2975c770ca replace has_extension with ends_with
These two are almost the same function, with the exception
that has_extension only matches if there is content before
the suffix. So ends_with(".exe", ".exe") is true, but
has_extension would not be.

This distinction does not matter to any of the callers,
though, and we can just replace uses of has_extension with
ends_with. We prefer the "ends_with" name because it is more
generic, and there is nothing about the function that
requires it to be used for file extensions.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-06-30 13:43:16 -07:00
René Scharfe
880fb8de67 sha1_file: replace PATH_MAX buffer with strbuf in prepare_packed_git_one()
Instead of using strbuf to create a message string in case a path is
too long for our fixed-size buffer, replace that buffer with a strbuf
and thus get rid of the limitation.

Helped-by: Duy Nguyen <pclouds@gmail.com>
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-06-30 13:43:16 -07:00
Junio C Hamano
81bd9b1000 Merge branch 'jk/report-fail-to-read-objects-better' into maint
Reworded the error message given upon a failure to open an existing
loose object file due to e.g. permission issues; it was reported as
the object being corrupt, but that is not quite true.

* jk/report-fail-to-read-objects-better:
  open_sha1_file: report "most interesting" errno
2014-06-25 11:43:58 -07:00
Junio C Hamano
9d1d882e9c Merge branch 'jk/report-fail-to-read-objects-better'
* jk/report-fail-to-read-objects-better:
  open_sha1_file: report "most interesting" errno
2014-06-16 10:06:15 -07:00
Jeff King
d6c8a05bd5 open_sha1_file: report "most interesting" errno
When we try to open a loose object file, we first attempt to
open in the local object database, and then try any
alternates. This means that the errno value when we return
will be from the last place we looked (and due to the way
the code is structured, simply ENOENT if we do not have have
any alternates).

This can cause confusing error messages, as read_sha1_file
checks for ENOENT when reporting a missing object. If errno
is something else, we report that. If it is ENOENT, but
has_loose_object reports that we have it, then we claim the
object is corrupted. For example:

    $ chmod 0 .git/objects/??/*
    $ git rev-list --all
    fatal: loose object b2d6fab18b92d49eac46dc3c5a0bcafabda20131 (stored in .git/objects/b2/d6fab18b92d49eac46dc3c5a0bcafabda20131) is corrupt

This patch instead keeps track of the "most interesting"
errno we receive during our search. We consider ENOENT to be
the least interesting of all, and otherwise report the first
error found (so problems in the object database take
precedence over ones in alternates). Here it is with this
patch:

    $ git rev-list --all
    fatal: failed to read object b2d6fab18b92d49eac46dc3c5a0bcafabda20131: Permission denied

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-05-15 10:03:06 -07:00
Junio C Hamano
d59c12d7ad Merge branch 'jl/nor-or-nand-and'
Eradicate mistaken use of "nor" (that is, essentially "nor" used
not in "neither A nor B" ;-)) from in-code comments, command output
strings, and documentations.

* jl/nor-or-nand-and:
  code and test: fix misuses of "nor"
  comments: fix misuses of "nor"
  contrib: fix misuses of "nor"
  Documentation: fix misuses of "nor"
2014-04-08 12:00:28 -07:00
Justin Lebar
235e8d5914 code and test: fix misuses of "nor"
Signed-off-by: Justin Lebar <jlebar@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-03-31 15:29:33 -07:00
Justin Lebar
01689909eb comments: fix misuses of "nor"
Signed-off-by: Justin Lebar <jlebar@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-03-31 15:29:27 -07:00
Junio C Hamano
fe9122a352 Merge branch 'dd/use-alloc-grow'
Replace open-coded reallocation with ALLOC_GROW() macro.

* dd/use-alloc-grow:
  sha1_file.c: use ALLOC_GROW() in pretend_sha1_file()
  read-cache.c: use ALLOC_GROW() in add_index_entry()
  builtin/mktree.c: use ALLOC_GROW() in append_to_tree()
  attr.c: use ALLOC_GROW() in handle_attr_line()
  dir.c: use ALLOC_GROW() in create_simplify()
  reflog-walk.c: use ALLOC_GROW()
  replace_object.c: use ALLOC_GROW() in register_replace_object()
  patch-ids.c: use ALLOC_GROW() in add_commit()
  diffcore-rename.c: use ALLOC_GROW()
  diff.c: use ALLOC_GROW()
  commit.c: use ALLOC_GROW() in register_commit_graft()
  cache-tree.c: use ALLOC_GROW() in find_subtree()
  bundle.c: use ALLOC_GROW() in add_to_ref_list()
  builtin/pack-objects.c: use ALLOC_GROW() in check_pbase_path()
2014-03-18 13:50:21 -07:00
Junio C Hamano
decba94d2c Merge branch 'nd/sha1-file-delta-stack-leakage-fix'
Fix a small leak in the delta stack used when resolving a long
delta chain at runtime.

* nd/sha1-file-delta-stack-leakage-fix:
  sha1_file: fix delta_stack memory leak in unpack_entry
2014-03-18 13:49:23 -07:00
Junio C Hamano
060be00621 Merge branch 'mh/object-code-cleanup'
* mh/object-code-cleanup:
  sha1_file.c: document a bunch of functions defined in the file
  sha1_file_name(): declare to return a const string
  find_pack_entry(): document last_found_pack
  replace_object: use struct members instead of an array
2014-03-14 14:26:29 -07:00
Dmitry S. Dolzhenko
c7353967ca sha1_file.c: use ALLOC_GROW() in pretend_sha1_file()
Helped-by: He Sun <sunheehnus@gmail.com>
Signed-off-by: Dmitry S. Dolzhenko <dmitrys.dolzhenko@yandex.ru>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-03-03 14:54:58 -08:00
Junio C Hamano
0f9e62e084 Merge branch 'jk/pack-bitmap'
Borrow the bitmap index into packfiles from JGit to speed up
enumeration of objects involved in a commit range without having to
fully traverse the history.

* jk/pack-bitmap: (26 commits)
  ewah: unconditionally ntohll ewah data
  ewah: support platforms that require aligned reads
  read-cache: use get_be32 instead of hand-rolled ntoh_l
  block-sha1: factor out get_be and put_be wrappers
  do not discard revindex when re-preparing packfiles
  pack-bitmap: implement optional name_hash cache
  t/perf: add tests for pack bitmaps
  t: add basic bitmap functionality tests
  count-objects: recognize .bitmap in garbage-checking
  repack: consider bitmaps when performing repacks
  repack: handle optional files created by pack-objects
  repack: turn exts array into array-of-struct
  repack: stop using magic number for ARRAY_SIZE(exts)
  pack-objects: implement bitmap writing
  rev-list: add bitmap mode to speed up object lists
  pack-objects: use bitmaps when packing objects
  pack-objects: split add_object_entry
  pack-bitmap: add support for bitmap indexes
  documentation: add documentation for the bitmap format
  ewah: compressed bitmap implementation
  ...
2014-02-27 14:01:48 -08:00
Michael Haggerty
d40d535b89 sha1_file.c: document a bunch of functions defined in the file
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Acked-by: Nicolas Pitre <nico@fluxnic.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-02-24 16:01:11 -08:00
Michael Haggerty
30d6c6eabf sha1_file_name(): declare to return a const string
Change the return value of sha1_file_name() to (const char *).
(Callers have no business mucking about here.)  Change callers
accordingly, deleting a few superfluous temporary variables along the
way.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-02-24 09:10:22 -08:00
Michael Haggerty
1b1005d1b5 find_pack_entry(): document last_found_pack
Add a comment at the declaration of last_found_pack and where it is
used in find_pack_entry().  In the latter, separate the cases (1) to
make a place for the new comment and (2) to turn the success case into
affirmative logic.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Reviewed-by: Nicolas Pitre <nico@fluxnic.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-02-24 09:09:56 -08:00
Nguyễn Thái Ngọc Duy
019d1e65f5 sha1_file: fix delta_stack memory leak in unpack_entry
This delta_stack array can grow to any length depending on the actual
delta chain, but we forget to free it. Normally it does not matter
because we use small_delta_stack[] from stack and small_delta_stack
can hold 64-delta chains, more than standard --depth=50 in pack-objects.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-02-24 09:07:12 -08:00
Junio C Hamano
33d4669aaa Merge branch 'ss/safe-create-leading-dir-with-slash'
"git clone $origin foo\bar\baz" on Windows failed to create the
leading directories (i.e. a moral-equivalent of "mkdir -p").

* ss/safe-create-leading-dir-with-slash:
  safe_create_leading_directories(): on Windows, \ can separate path components
2014-01-27 10:45:37 -08:00
Junio C Hamano
d0956cfa8e Merge branch 'mh/safe-create-leading-directories'
Code clean-up and protection against concurrent write access to the
ref namespace.

* mh/safe-create-leading-directories:
  rename_tmp_log(): on SCLD_VANISHED, retry
  rename_tmp_log(): limit the number of remote_empty_directories() attempts
  rename_tmp_log(): handle a possible mkdir/rmdir race
  rename_ref(): extract function rename_tmp_log()
  remove_dir_recurse(): handle disappearing files and directories
  remove_dir_recurse(): tighten condition for removing unreadable dir
  lock_ref_sha1_basic(): if locking fails with ENOENT, retry
  lock_ref_sha1_basic(): on SCLD_VANISHED, retry
  safe_create_leading_directories(): add new error value SCLD_VANISHED
  cmd_init_db(): when creating directories, handle errors conservatively
  safe_create_leading_directories(): introduce enum for return values
  safe_create_leading_directories(): always restore slash at end of loop
  safe_create_leading_directories(): split on first of multiple slashes
  safe_create_leading_directories(): rename local variable
  safe_create_leading_directories(): add explicit "slash" pointer
  safe_create_leading_directories(): reduce scope of local variable
  safe_create_leading_directories(): fix format of "if" chaining
2014-01-27 10:45:33 -08:00
Michael Haggerty
0f5274033e safe_create_leading_directories(): on Windows, \ can separate path components
When cloning to a directory "C:\foo\bar" from Windows' cmd.exe where
"foo" does not exist yet, Git would throw an error like

    fatal: could not create work tree dir 'c:\foo\bar'.: No such file or directory

Fix this by not hard-coding a platform specific directory separator
into safe_create_leading_directories().

This patch, including its entire commit message, is derived from a
patch by Sebastian Schuberth.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Sebastian Schuberth <sschuberth@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-01-22 11:00:07 -08:00
Jeff King
1a6d8b9148 do not discard revindex when re-preparing packfiles
When an object lookup fails, we re-read the objects/pack
directory to pick up any new packfiles that may have been
created since our last read. We also discard any pack
revindex structs we've allocated.

The discarding is a problem for the pack-bitmap code, which keeps
a pointer to the revindex for the bitmapped pack. After the
discard, the pointer is invalid, and we may read free()d
memory.

Other revindex users do not keep a bare pointer to the
revindex; instead, they always access it through
revindex_for_pack(), which lazily builds the revindex. So
one solution is to teach the pack-bitmap code a similar
trick. It would be slightly less efficient, but probably not
all that noticeable.

However, it turns out this discarding is not actually
necessary. When we call reprepare_packed_git, we do not
throw away our old pack list. We keep the existing entries,
and only add in new ones. So there is no safety problem; we
will still have the pack struct that matches each revindex.
The packfile itself may go away, of course, but we are
already prepared to handle that, and it may happen outside
of reprepare_packed_git anyway.

Throwing away the revindex may save some RAM if the pack
never gets reused (about 12 bytes per object). But it also
wastes some CPU time (to regenerate the index) if the pack
does get reused. It's hard to say which is more valuable,
but in either case, it happens very rarely (only when we
race with a simultaneous repack). Just leaving the revindex
in place is simple and safe both for current and future
code.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-01-16 14:33:46 -08:00