Commit Graph

647 Commits

Author SHA1 Message Date
Junio C Hamano
d907bf8ef3 Merge branch 'jc/index-pack'
* jc/index-pack:
  verify-pack: use index-pack --verify
  index-pack: show histogram when emulating "verify-pack -v"
  index-pack: start learning to emulate "verify-pack -v"
  index-pack: a miniscule refactor
  index-pack --verify: read anomalous offsets from v2 idx file
  write_idx_file: need_large_offset() helper function
  index-pack: --verify
  write_idx_file: introduce a struct to hold idx customization options
  index-pack: group the delta-base array entries also by type

Conflicts:
	builtin/verify-pack.c
	cache.h
	sha1_file.c
2011-07-19 09:54:51 -07:00
Junio C Hamano
eb4f4076aa Merge branch 'jc/zlib-wrap'
* jc/zlib-wrap:
  zlib: allow feeding more than 4GB in one go
  zlib: zlib can only process 4GB at a time
  zlib: wrap deflateBound() too
  zlib: wrap deflate side of the API
  zlib: wrap inflateInit2 used to accept only for gzip format
  zlib: wrap remaining calls to direct inflate/inflateEnd
  zlib wrapper: refactor error message formatter

Conflicts:
	sha1_file.c
2011-07-19 09:33:04 -07:00
Junio C Hamano
5f2e448370 Merge branch 'jc/legacy-loose-object'
* jc/legacy-loose-object:
  sha1_file.c: "legacy" is really the current format
2011-07-13 14:31:34 -07:00
Junio C Hamano
5f44324d88 core: log offset pack data accesses happened
In a workload other than "git log" (without pathspec nor any option that
causes us to inspect trees and blobs), the recency pack order is said to
cause the access jump around quite a bit. Add a hook to allow us observe
how bad it is.

"git config core.logpackaccess /var/tmp/pal.txt" will give you the log
in the specified file.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-07-06 19:09:29 -07:00
Junio C Hamano
ef49a7a012 zlib: zlib can only process 4GB at a time
The size of objects we read from the repository and data we try to put
into the repository are represented in "unsigned long", so that on larger
architectures we can handle objects that weigh more than 4GB.

But the interface defined in zlib.h to communicate with inflate/deflate
limits avail_in (how many bytes of input are we calling zlib with) and
avail_out (how many bytes of output from zlib are we ready to accept)
fields effectively to 4GB by defining their type to be uInt.

In many places in our code, we allocate a large buffer (e.g. mmap'ing a
large loose object file) and tell zlib its size by assigning the size to
avail_in field of the stream, but that will truncate the high octets of
the real size. The worst part of this story is that we often pass around
z_stream (the state object used by zlib) to keep track of the number of
used bytes in input/output buffer by inspecting these two fields, which
practically limits our callchain to the same 4GB limit.

Wrap z_stream in another structure git_zstream that can express avail_in
and avail_out in unsigned long. For now, just die() when the caller gives
a size that cannot be given to a single zlib call. In later patches in the
series, we would make git_inflate() and git_deflate() internally loop to
give callers an illusion that our "improved" version of zlib interface can
operate on a buffer larger than 4GB in one go.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-06-10 11:52:15 -07:00
Junio C Hamano
55bb5c9147 zlib: wrap deflate side of the API
Wrap deflateInit, deflate, and deflateEnd for everybody, and the sole use
of deflateInit2 in remote-curl.c to tell the library to use gzip header
and trailer in git_deflate_init_gzip().

There is only one caller that cares about the status from deflateEnd().
Introduce git_deflate_end_gently() to let that sole caller retrieve the
status and act on it (i.e. die) for now, but we would probably want to
make inflate_end/deflate_end die when they ran out of memory and get
rid of the _gently() kind.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-06-10 11:10:29 -07:00
Junio C Hamano
cc5c54e78b sha1_file.c: "legacy" is really the current format
Every time I look at the read-loose-object codepath, legacy_loose_object()
function makes my brain go through mental contortion. When we were playing
with the experimental loose object format, it may have made sense to call
the traditional format "legacy", in the hope that the experimental one
will some day replace it to become official, but it never happened.

This renames the function (and negates its return value) to detect if we
are looking at the experimental format, and move the code around in its
caller which used to do "if we are looing at legacy, do this special case,
otherwise the normal case is this". The codepath to read from the loose
objects in experimental format is the "unlikely" case.

Someday after Git 2.0, we should drop the support of this format.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-06-08 16:39:33 -07:00
Junio C Hamano
3de89c9d42 verify-pack: use index-pack --verify
This finally gets rid of the inefficient verify-pack implementation that
walks objects in the packfile in their object name order and replaces it
with a call to index-pack --verify. As a side effect, it also removes
packed_object_info_detail() API which is rather expensive.

As this changes the way errors are reported (verify-pack used to rely on
the usual runtime error detection routine unpack_entry() to diagnose the
CRC errors in an entry in the *.idx file; index-pack --verify checks the
whole *.idx file in one go), update a test that expected the string "CRC"
to appear in the error message.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-06-05 22:45:38 -07:00
Jim Meyering
23c7df6bdd sha1_file: use the correct type (ssize_t, not size_t) for read-style function
Using an unsigned type, we would fail to detect a read error and then
proceed to try to write (size_t)-1 bytes.

Signed-off-by: Jim Meyering <meyering@redhat.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-05-26 11:25:59 -07:00
Junio C Hamano
5cfe4256d9 Merge branch 'jc/bigfile'
* jc/bigfile:
  Bigfile: teach "git add" to send a large file straight to a pack
  index_fd(): split into two helper functions
  index_fd(): turn write_object and format_check arguments into one flag
2011-05-25 16:23:26 -07:00
Junio C Hamano
f0270efd46 sha1_file.c: expose helpers to read loose objects
Make map_sha1_file(), parse_sha1_header() and unpack_sha1_header()
available to the streaming read API by exporting them via cache.h header
file.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-05-20 23:16:53 -07:00
Junio C Hamano
f8c8abc5b7 unpack_object_header(): make it public
This function is used to read and skip over the per-object header
in a packfile.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-05-20 18:38:54 -07:00
Junio C Hamano
5266d369b2 sha1_object_info_extended(): hint about objects in delta-base cache
An object found in the delta-base cache is not guaranteed to
stay there, but we know it came from a pack and it is likely
to give us a quick access if we read_sha1_file() it right now,
which is a piece of useful information.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-05-20 18:38:50 -07:00
Junio C Hamano
61d7503da1 Merge branch 'jc/replacing'
* jc/replacing:
  read_sha1_file(): allow selective bypassing of replacement mechanism
  inline lookup_replace_object() calls
  read_sha1_file(): get rid of read_sha1_file_repl() madness
  t6050: make sure we test not just commit replacement
  Declare lookup_replace_object() in cache.h, not in commit.h

Conflicts:
	environment.c
2011-05-19 20:37:21 -07:00
Junio C Hamano
9a49059022 sha1_object_info_extended(): expose a bit more info
The original interface for sha1_object_info() takes an object name and
gives back a type and its size (the latter is given only when it was
asked).  The new interface wraps its implementation and exposes a bit
more pieces of information that the interface used to discard, namely:

 - where the object is stored (loose? cached? packed?)
 - if packed, where in which packfile?

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---

 * In the earlier round, this used u.pack.delta to record the length of
   the delta chain, but the caller is not necessarily interested in the
   length of the delta chain per-se, but may only want to know if it is a
   delta against another object or is stored as a deflated data. Calling
   packed_object_info_detail() involves walking the reverse index chain to
   compute the store size of the object and is unnecessarily expensive.

   We could resurrect the code if a new caller wants to know, but I doubt
   it.
2011-05-19 14:22:47 -07:00
Junio C Hamano
b9a62cbeb9 packed_object_info_detail(): do not return a string
Instead return an integer that can be given to typename() if
the caller wants a string, just like everybody else does.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-05-16 22:13:34 -07:00
Junio C Hamano
02071b27f1 Merge branches 'jc/convert', 'jc/bigfile' and 'jc/replacing' into jc/streaming
* jc/convert:
  convert: make it harder to screw up adding a conversion attribute
  convert: make it safer to add conversion attributes
  convert: give saner names to crlf/eol variables, types and functions
  convert: rename the "eol" global variable to "core_eol"

* jc/bigfile:
  Bigfile: teach "git add" to send a large file straight to a pack
  index_fd(): split into two helper functions
  index_fd(): turn write_object and format_check arguments into one flag

* jc/replacing:
  read_sha1_file(): allow selective bypassing of replacement mechanism
  inline lookup_replace_object() calls
  read_sha1_file(): get rid of read_sha1_file_repl() madness
  t6050: make sure we test not just commit replacement
  Declare lookup_replace_object() in cache.h, not in commit.h
2011-05-15 16:30:13 -07:00
Junio C Hamano
f4e516834e git_open_noatime(): drop unused parameter
Since commit c793430 (Limit file descriptors used by packs, 2011-02-28),
the extra parameter added in f2e872aa (Work around EMFILE when there are
too many pack files, 2010-11-01) is not used anymore.

Remove it.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
Acked-by: Shawn O. Pearce <spearce@spearce.org>
2011-05-15 15:24:52 -07:00
Junio C Hamano
ccf5ace0dc sha1_file: typofix
The number zero is spelled "zero", not "zer0".

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-05-15 15:24:36 -07:00
Junio C Hamano
5bf29b9500 read_sha1_file(): allow selective bypassing of replacement mechanism
The way "object replacement" mechanism was tucked to the read_sha1_file()
interface was suboptimal in a couple of ways:

 - Callers that want it to die with useful diagnosis upon seeing a corrupt
   object does not have a way to say that they do not want any object
   replacement.

 - Callers who do not want it to die but want to handle the errors
   themselves are told to arrange to call read_object(), but the function
   does not use the replacement mechanism, and also it is a file scope
   static function that not many callers can call to begin with.

This adds a read_sha1_file_extended() that takes a set of flags; the
callers of read_sha1_file() passes a flag READ_SHA1_FILE_REPLACE to ask
for object replacement mechanism to kick in.

Later, we could add another flag bit to tell the function to return an
error instead of dying and then remove the misguided "call read_object()
yourself".

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-05-15 15:23:34 -07:00
Junio C Hamano
4bbf5a2615 read_sha1_file(): get rid of read_sha1_file_repl() madness
Most callers want to silently get a replacement object, and they do not
care what the real name of the replacement object is.  Worse yet, no sane
interface to return the underlying object without replacement is provided.

Remove the function and make only the few callers that want the name of
the replacement object find it themselves.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-05-15 15:23:33 -07:00
Junio C Hamano
4dd1fbc7b1 Bigfile: teach "git add" to send a large file straight to a pack
When adding a new content to the repository, we have always slurped
the blob in its entirety in-core first, and computed the object name
and compressed it into a loose object file.  Handling large binary
files (e.g.  video and audio asset for games) has been problematic
because of this design.

At the middle level of "git add" callchain is an internal API
index_fd() that takes an open file descriptor to read from the
working tree file being added with its size. Teach it to call out to
fast-import when adding a large blob.

The write-out codepath in entry.c::write_entry() should be taught to
stream, instead of reading everything in core. This should not be so
hard to implement, especially if we limit ourselves only to loose
object files and non-delta representation in packfiles.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-05-13 16:11:18 -07:00
Junio C Hamano
7b41e1e15b index_fd(): split into two helper functions
Split out the case where we do not know the size of the input (hence we
read everything into a strbuf before doing anything) to index_pipe(), and
the other case where we mmap or read the whole data to index_bulk().

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-05-09 11:58:19 -07:00
Junio C Hamano
c4ce46fc7a index_fd(): turn write_object and format_check arguments into one flag
The "format_check" parameter tucked after the existing parameters is too
ugly an afterthought to live in any reasonable API.

Combine it with the other boolean parameter "write_object" into a single
"flags" parameter.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-05-09 11:58:19 -07:00
Jim Meyering
0353a0c4ec remove doubled words, e.g., s/to to/to/, and fix related typos
I found that some doubled words had snuck back into projects from which
I'd already removed them, so now there's a "syntax-check" makefile rule in
gnulib to help prevent recurrence.

Running the command below spotted a few in git, too:

  git ls-files | xargs perl -0777 -n \
    -e 'while (/\b(then?|[iao]n|i[fst]|but|f?or|at|and|[dt])\s+\1\b/gims)' \
    -e '{$n=($` =~ tr/\n/\n/ + 1); ($v=$&)=~s/\n/\\n/g;' \
    -e 'print "$ARGV:$n:$v\n"}'

Signed-off-by: Jim Meyering <meyering@redhat.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-04-13 11:59:11 -07:00
Junio C Hamano
ad7bb2f68c Merge branch 'jc/maint-rerere-in-workdir'
* jc/maint-rerere-in-workdir:
  rerere: make sure it works even in a workdir attached to a young repository
2011-03-26 20:13:16 -07:00
Junio C Hamano
90a6464b4a rerere: make sure it works even in a workdir attached to a young repository
The git-new-workdir script in contrib/ makes a new work tree by sharing
many subdirectories of the .git directory with the original repository.
When rerere.enabled is set in the original repository, but the user has
not encountered any conflicts yet, the original repository may not yet
have .git/rr-cache directory.

When rerere wants to run in a new work tree created from such a young
original repository, it fails to mkdir(2) .git/rr-cache that is a symlink
to a yet-to-be-created directory.

There are three possible approaches to this:

 - A naive solution is not to create a symlink in the git-new-workdir
   script to a directory the original does not have (yet).  This is not a
   solution, as we tend to lazily create subdirectories of .git/, and
   having rerere.enabled configuration set is a strong indication that the
   user _wants_ to have this lazy creation to happen;

 - We could always create .git/rr-cache upon repository creation.  This is
   tempting but will not help people with existing repositories.

 - Detect this case by seeing that mkdir(2) failed with EEXIST, checking
   that the path is a symlink, and try running mkdir(2) on the link
   target.

This patch solves the issue by doing the third one.

Strictly speaking, this is incomplete.  It does not attempt to handle
relative symbolic link that points into the original repository, but this
is good enough to help people who use contrib/workdir/git-new-workdir
script.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-03-23 16:05:44 -07:00
Junio C Hamano
3ed8868474 Merge branch 'jn/maint-c99-format'
* jn/maint-c99-format:
  unbreak and eliminate NO_C99_FORMAT
  mktag: avoid %td in format string
2011-03-23 14:55:46 -07:00
Jonathan Nieder
28bd70d811 unbreak and eliminate NO_C99_FORMAT
In the spirit of v1.5.0.2~21 (Check for PRIuMAX rather than
NO_C99_FORMAT in fast-import.c, 2007-02-20), use PRIuMAX from
git-compat-util.h on all platforms instead of C99-specific formats
like %zu with dangerous fallbacks to %u or %lu.

So now C99-challenged platforms can build git without provoking
warnings or errors from printf, even if pointers do not have the same
size as an int or long.

The need for a fallback PRIuMAX is detected in git-compat-util.h with
"#ifndef PRIuMAX".  So while at it, simplify the Makefile and configure
script by eliminating the NO_C99_FORMAT knob altogether.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-03-17 15:30:49 -07:00
Junio C Hamano
674ef90904 Merge branch 'sp/maint-fd-limit'
* sp/maint-fd-limit:
  sha1_file.c: Don't retain open fds on small packs
  mingw: add minimum getrlimit() compatibility stub
  Limit file descriptors used by packs
2011-03-15 14:22:23 -07:00
Shawn O. Pearce
d131b7afea sha1_file.c: Don't retain open fds on small packs
If a pack file is small enough that its entire contents fits within
one mmap window, mmap the file and then immediately close its file
descriptor.  This reduces the number of file descriptors that are
needed to read from repositories with many tiny pack files, such
as one that has received 1000 pushes (and created 1000 small pack
files) since its last repack.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-03-02 11:25:30 -08:00
Shawn O. Pearce
c7934306d1 Limit file descriptors used by packs
Rather than using 'errno == EMFILE' after a failed open() call
to indicate the process is out of file descriptors and an LRU
pack window should be closed, place a hard upper limit on the
number of open packs based on the actual rlimit of the process.

By using a hard upper limit that is below the rlimit of the current
process it is not necessary to check for EMFILE on every single
fd-allocating system call.  Instead reserving 25 file descriptors
makes it safe to assume the system call won't fail due to being over
the filedescriptor limit.  Here 25 is chosen as a WAG, but considers
3 for stdin/stdout/stderr, and at least a few for other Git code
to operate on temporary files.  An additional 20 is reserved as it
is not known what the C library needs to perform other services on
Git's behalf, such as nsswitch or name resolution.

This fixes a case where running `git gc --auto` in a repository
with more than 1024 packs (but an rlimit of 1024 open fds) fails
due to the temporary output file not being able to allocate a
file descriptor.  The output file is opened by pack-objects after
object enumeration and delta compression are done, both of which
have already opened all of the packs and fully populated the file
descriptor table.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-02-28 13:08:31 -08:00
Junio C Hamano
fc7ae9c156 Merge branch 'nd/hash-object-sanity'
* nd/hash-object-sanity:
  Make hash-object more robust against malformed objects

Conflicts:
	cache.h
2011-02-27 21:58:30 -08:00
Jonathan Nieder
dab0d4108d correct type of EMPTY_TREE_SHA1_BIN
Functions such as hashcmp that expect a binary SHA-1 value take
parameters of type "unsigned char *" to avoid accepting a textual
SHA-1 passed by mistake.  Unfortunately, this means passing the string
literal EMPTY_TREE_SHA1_BIN requires an ugly cast.  Tweak the
definition of EMPTY_TREE_SHA1_BIN to produce a value of more
convenient type.

In the future the definition might change to

	extern const unsigned char empty_tree_sha1_bin[20];
	#define EMPTY_TREE_SHA1_BIN empty_tree_sha1_bin

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-02-14 10:48:06 -08:00
Nguyễn Thái Ngọc Duy
c4d9986f5f sha1_object_info: examine cached_object store too
Cached object store was added in d66b37b (Add pretend_sha1_file()
interface. - 2007-02-04) as a way to temporarily inject some objects
to object store.

But only read_sha1_file() knows about this store. While it will return
an object from this store, sha1_object_info() will happily say
"object not found".

Teach sha1_object_info() about the cached store for consistency.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-02-07 15:05:48 -08:00
Nguyễn Thái Ngọc Duy
c597ba8010 sha1_file.c: move find_cached_object up so sha1_object_info can use it
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-02-07 15:05:46 -08:00
Nguyễn Thái Ngọc Duy
c879daa237 Make hash-object more robust against malformed objects
Commits, trees and tags have structure. Don't let users feed git
with malformed ones. Sooner or later git will die() when
encountering them.

Note that this patch does not check semantics. A tree that points
to non-existent objects is perfectly OK (and should be so, users
may choose to add commit first, then its associated tree for example).

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-02-07 15:05:25 -08:00
Björn Steinbrink
25f3af3f9d Correctly report corrupted objects
The errno check added in commit 3ba7a06 "A loose object is not corrupt
if it cannot be read due to EMFILE" only checked for whether errno is
not ENOENT and thus incorrectly treated "no error" as an error
condition.

Because of that, it never reached the code path that would report that
the object is corrupted and instead caused funny errors like:

  fatal: failed to read object 333c4768ce595793fdab1ef3a036413e2a883853: Success

So we have to extend the check to cover the case in which the object
file was successfully read, but its contents are corrupted.

Reported-by: Will Palmer <wmpalmer@gmail.com>
Signed-off-by: Björn Steinbrink <B.Steinbrink@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-01-20 13:18:51 -08:00
Junio C Hamano
39f04dbaac Merge branch 'jn/thinner-wrapper'
* jn/thinner-wrapper:
  Remove pack file handling dependency from wrapper.o
  pack-objects: mark file-local variable static
  wrapper: give zlib wrappers their own translation unit
  strbuf: move strbuf_branchname to sha1_name.c
  path helpers: move git_mkstemp* to wrapper.c
  wrapper: move odb_* to environment.c
  wrapper: move xmmap() to sha1_file.c
2010-12-03 16:13:06 -08:00
Jonathan Nieder
e050029385 Remove pack file handling dependency from wrapper.o
As v1.7.0-rc0~43 (slim down "git show-index", 2010-01-21) explains,
use of xmalloc() brings in a dependency on zlib, the sha1 lib, and the
rest of git's object file access machinery via try_to_free_pack_memory.
That is overkill when xmalloc is just being used as a convenience
wrapper to exit when no memory is available.

So defer setting try_to_free_pack_memory as try_to_free_routine until
the first packfile is opened in add_packed_git().

After this change, a simple program using xmalloc() and no other
functions will not pull in any code from libgit.a aside from wrapper.o
and usage.o.

Improved-by: René Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-11-10 11:11:07 -08:00
Jonathan Nieder
58ecbd5ede wrapper: move xmmap() to sha1_file.c
wrapper.o depends on sha1_file.o for a number of reasons.  One is
release_pack_memory().

xmmap function calls mmap, discarding unused pack windows when
necessary to relieve memory pressure.  Simple git programs using
wrapper.o as a friendly libc do not need this functionality.
So move xmmap to sha1_file.o, where release_pack_memory() is.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-11-10 11:03:13 -08:00
Shawn O. Pearce
f2e872aa5e Work around EMFILE when there are too many pack files
When opening any files in the object database, release unused pack
windows if the open(2) syscall fails due to EMFILE (too many open
files in this process).  This allows Git to degrade gracefully on
a repository with thousands of pack files, and a commit stored in
a loose object in the middle of the history.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-11-03 10:21:46 -07:00
Shawn O. Pearce
4865d2b662 Use git_open_noatime when accessing pack data
This utility function avoids an unnecessary update of the access time
for a loose object file.  Just as the atime isn't useful on a loose
object, its not useful on the pack or the corresonding idx file.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-11-03 09:25:58 -07:00
Junio C Hamano
3ba7a06552 A loose object is not corrupt if it cannot be read due to EMFILE
"git fsck" bails out with a claim that a loose object that cannot be
read but exists on the filesystem to be corrupt, which is wrong when
read_object() failed due to e.g. EMFILE.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-11-03 09:24:57 -07:00
Junio C Hamano
b6c4ceccb3 read_sha1_file(): report correct name of packfile with a corrupt object
Clarify the error reporting logic by moving the normal codepath (i.e. we
read the object we wanted to read correctly) up and return early.

The logic to report the name of the packfile with a corrupt object,
introduced by e8b15e6 (sha1_file: Show the the type and path to corrupt
objects, 2010-06-10), was totally bogus.  The function that knows which
bad object came from what packfile is has_packed_and_bad(); make it report
which packfile the problem was found.

"Corrupt" is already an adjective, e.g. an object is "corrupt"; we do not
have to say "corrupted object".

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-11-03 09:24:47 -07:00
Ævar Arnfjörð Bjarmason
e8b15e6156 sha1_file: Show the the type and path to corrupt objects
Change the error message that's displayed when we encounter corrupt
objects to be more specific. We now print the type (loose or packed)
of corrupted objects, along with the full path to the file in
question.

Before:

    $ git cat-file blob 909ef997367880aaf2133bafa1f1a71aa28e09df
    fatal: object 909ef997367880aaf2133bafa1f1a71aa28e09df is corrupted

After:

    $ git cat-file blob 909ef997367880aaf2133bafa1f1a71aa28e09df
    fatal: loose object 909ef997367880aaf2133bafa1f1a71aa28e09df (stored in .git/objects/90/9ef997367880aaf2133bafa1f1a71aa28e09df) is corrupted

Knowing the path helps to quickly analyze what's wrong:

    $ file .git/objects/90/9ef997367880aaf2133bafa1f1a71aa28e09df
    .git/objects/90/9ef997367880aaf2133bafa1f1a71aa28e09df: empty

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-07-14 15:35:12 -07:00
Junio C Hamano
e391fdfc69 Merge branch 'jk/maint-sha1-file-name-fix'
* jk/maint-sha1-file-name-fix:
  remove over-eager caching in sha1_file_name
2010-06-13 11:22:00 -07:00
Jeff King
560fb6a183 remove over-eager caching in sha1_file_name
This function takes a sha1 and produces a loose object
filename. It caches the location of the object directory so
that it can fill the sha1 information directly without
allocating a new buffer (and in its original incarnation,
without calling getenv(), though these days we cache that
with the code in environment.c).

This cached base directory can become stale, however, if in
a single process git changes the location of the object
directory (e.g., by running setup_work_tree, which will
chdir to the new worktree).

In most cases this isn't a problem, because we tend to set
up the git repository location and do any chdir()s before
actually looking up any objects, so the first lookup will
cache the correct location. In the case of reset --hard,
however, we do something like:

  1. look up the commit object

  2. notice we are doing --hard, run setup_work_tree

  3. look up the tree object to reset

Step (3) fails because our cache object directory value is
bogus.

This patch simply removes the caching. We use a static
buffer instead of allocating one each time (the original
version treated the malloc'd buffer as a static, so there is
no change in calling semantics).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-05-25 09:21:28 -07:00
Junio C Hamano
035bf8d7c4 Merge branch 'sp/maint-dumb-http-pack-reidx'
* sp/maint-dumb-http-pack-reidx:
  http.c::new_http_pack_request: do away with the temp variable filename
  http-fetch: Use temporary files for pack-*.idx until verified
  http-fetch: Use index-pack rather than verify-pack to check packs
  Allow parse_pack_index on temporary files
  Extract verify_pack_index for reuse from verify_pack
  Introduce close_pack_index to permit replacement
  http.c: Remove unnecessary strdup of sha1_to_hex result
  http.c: Don't store destination name in request structures
  http.c: Drop useless != NULL test in finish_http_pack_request
  http.c: Tiny refactoring of finish_http_pack_request
  t5550-http-fetch: Use subshell for repository operations
  http.c: Remove bad free of static block
2010-05-21 04:02:19 -07:00
Junio C Hamano
636e87d705 Merge branch 'maint'
* maint:
  Documentation/gitdiffcore: fix order in pickaxe description
  Documentation: fix minor inconsistency
  Documentation: rebase -i ignores options passed to "git am"
  hash_object: correction for zero length file
2010-05-18 22:39:56 -07:00
Dmitry Potapov
08bda2085c hash_object: correction for zero length file
The check whether size is zero was done after if size <= SMALL_FILE_SIZE,
as result, zero size case was never triggered. Instead zero length file
was treated as any other small file. This did not caused any problem, but
if we have a special case for size equal to zero, it is better to make it
work and avoid redundant malloc().

Signed-off-by: Dmitry Potapov <dpotapov@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-05-18 21:46:36 -07:00
Shawn O. Pearce
7b64469a36 Allow parse_pack_index on temporary files
The easiest way to verify a pack index is to open it through the
standard parse_pack_index function, permitting the header check
to happen when the file is mapped.  However, the dumb HTTP client
needs to verify a pack index before its moved into its proper file
name within the objects/pack directory, to prevent a corrupt index
from being made available.  So permit the caller to specify the
exact path of the index file.

For now we're still using the final destination name within the
sole call site in http.c, but eventually we will start to parse
the temporary path instead.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-04-19 17:56:17 -07:00
Shawn O. Pearce
fa5fc15d6e Introduce close_pack_index to permit replacement
By closing the pack index, a caller can later overwrite the index
with an updated index file, possibly after converting from v1 to
the v2 format.  Because p->index_data is NULL after close, on the
next access the index will be opened again and the other members
will be updated with new data.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-04-19 17:56:08 -07:00
Jeff King
40d52ff77b make commit_tree a library function
Until now, this has been part of the commit-tree builtin.
However, it is already used by other builtins (like commit,
merge, and notes), and it would be useful to access it from
library code.

The check_valid helper has to come along, too, but is given
a more library-ish name of "assert_sha1_type".

Otherwise, the code is unchanged. There are still a few
rough edges for a library function, like printing the utf8
warning to stderr, but we can address those if and when they
come up as inappropriate.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-04-01 23:53:54 -07:00
Jeff King
c00e657df2 fix const-correctness of write_sha1_file
These should take const buffers as input data, but zlib's
next_in pointer is not const-correct. Let's fix it at the
zlib level, though, so the cast happens in one obvious
place. This should be safe, as a similar cast is used in
zlib's example code for a const array.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-04-01 23:49:03 -07:00
Junio C Hamano
493e433277 Merge branch 'mm/mkstemps-mode-for-packfiles' into maint
* mm/mkstemps-mode-for-packfiles:
  Use git_mkstemp_mode instead of plain mkstemp to create object files
  git_mkstemps_mode: don't set errno to EINVAL on exit.
  Use git_mkstemp_mode and xmkstemp_mode in odb_mkstemp, not chmod later.
  git_mkstemp_mode, xmkstemp_mode: variants of gitmkstemps with mode argument.
  Move gitmkstemps to path.c
  Add a testcase for ACL with restrictive umask.
2010-03-08 00:36:00 -08:00
Junio C Hamano
c2b456b895 Merge branch 'nd/root-git'
* nd/root-git:
  Add test for using Git at root of file system
  Support working directory located at root
  Move offset_1st_component() to path.c
  init-db, rev-parse --git-dir: do not append redundant slash
  make_absolute_path(): Do not append redundant slash

Conflicts:
	setup.c
	sha1_file.c
2010-03-07 12:47:15 -08:00
Junio C Hamano
87912fd617 Merge branch 'mm/mkstemps-mode-for-packfiles'
* mm/mkstemps-mode-for-packfiles:
  Use git_mkstemp_mode instead of plain mkstemp to create object files
  git_mkstemps_mode: don't set errno to EINVAL on exit.
  Use git_mkstemp_mode and xmkstemp_mode in odb_mkstemp, not chmod later.
  git_mkstemp_mode, xmkstemp_mode: variants of gitmkstemps with mode argument.
  Move gitmkstemps to path.c
  Add a testcase for ACL with restrictive umask.
2010-03-07 12:47:14 -08:00
Junio C Hamano
780fc9a0a6 Merge branch 'dp/read-not-mmap-small-loose-object' into maint
* dp/read-not-mmap-small-loose-object:
  hash-object: don't use mmap() for small files
2010-03-04 22:26:17 -08:00
Junio C Hamano
34c014d13e Merge branch 'np/compress-loose-object-memsave'
* np/compress-loose-object-memsave:
  sha1_file: be paranoid when creating loose objects
  sha1_file: don't malloc the whole compressed result when writing out objects
2010-03-02 12:44:09 -08:00
Matthieu Moy
5256b00631 Use git_mkstemp_mode instead of plain mkstemp to create object files
We used to unnecessarily give the read permission to group and others,
regardless of the umask, which isn't serious because the objects are
still protected by their containing directory, but isn't necessary
either.

Signed-off-by: Matthieu Moy <Matthieu.Moy@imag.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-02-22 15:24:46 -08:00
Nicolas Pitre
748af44c63 sha1_file: be paranoid when creating loose objects
We don't want the data being deflated and stored into loose objects
to be different from what we expect.  While the deflated data is
protected by a CRC which is good enough for safe data retrieval
operations, we still want to be doubly sure that the source data used
at object creation time is still what we expected once that data has
been deflated and its CRC32 computed.

The most plausible data corruption may occur if the source file is
modified while Git is deflating and writing it out in a loose object.
Or Git itself could have a bug causing memory corruption.  Or even bad
RAM could cause trouble.  So it is best to make sure everything is
coherent and checksum protected from beginning to end.

To do so we compute the SHA1 of the data being deflated _after_ the
deflate operation has consumed that data, and make sure it matches
with the expected SHA1.  This way we can rely on the CRC32 checked by
the inflate operation to provide a good indication that the data is still
coherent with its SHA1 hash.  One pathological case we ignore is when
the data is modified before (or during) deflate call, but changed back
before it is hashed.

There is some overhead of course. Using 'git add' on a set of large files:

Before:

	real    0m25.210s
	user    0m23.783s
	sys     0m1.408s

After:

	real    0m26.537s
	user    0m25.175s
	sys     0m1.358s

The overhead is around 5% for full data coherency guarantee.

Signed-off-by: Nicolas Pitre <nico@fluxnic.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-02-21 22:33:25 -08:00
Dmitry Potapov
ea68b0ce9f hash-object: don't use mmap() for small files
Using read() instead of mmap() can be 39% speed up for 1Kb files and is
1% speed up 1Mb files. For larger files, it is better to use mmap(),
because the difference between is not significant, and when there is not
enough memory, mmap() performs much better, because it avoids swapping.

Signed-off-by: Dmitry Potapov <dpotapov@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-02-21 11:39:10 -08:00
Nicolas Pitre
9892bebafe sha1_file: don't malloc the whole compressed result when writing out objects
There is no real advantage to malloc the whole output buffer and
deflate the data in a single pass when writing loose objects. That is
like only 1% faster while using more memory, especially with large
files where memory usage is far more. It is best to deflate and write
the data out in small chunks reusing the same memory instead.

For example, using 'git add' on a few large files averaging 40 MB ...

Before:
21.45user 1.10system 0:22.57elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+828040outputs (0major+142640minor)pagefaults 0swaps

After:
21.50user 1.25system 0:22.76elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+828040outputs (0major+104408minor)pagefaults 0swaps

While the runtime stayed relatively the same, the number of minor page
faults went down significantly.

Signed-off-by: Nicolas Pitre <nico@fluxnic.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-02-21 11:36:23 -08:00
Nguyễn Thái Ngọc Duy
4bb43de259 Move offset_1st_component() to path.c
The implementation is also lightly modified to use is_dir_sep()
instead of hardcoding '/'.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-02-16 08:54:34 -08:00
Junio C Hamano
a0075d9e6a Merge branch 'il/maint-xmallocz'
* il/maint-xmallocz:
  Fix integer overflow in unpack_compressed_entry()
  Fix integer overflow in unpack_sha1_rest()
  Fix integer overflow in patch_delta()
  Add xmallocz()
2010-01-27 14:56:38 -08:00
Ilari Liusvaara
4ab07e4d10 Fix integer overflow in unpack_compressed_entry()
Signed-off-by: Ilari Liusvaara <ilari.liusvaara@elisanet.fi>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-01-26 13:00:16 -08:00
Ilari Liusvaara
3aee68aa68 Fix integer overflow in unpack_sha1_rest()
[jc: later NUL termination by the caller becomes unnecessary]

Signed-off-by: Ilari Liusvaara <ilari.liusvaara@elisanet.fi>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-01-26 13:00:10 -08:00
Linus Torvalds
a5031214c4 slim down "git show-index"
As the documentation says, this is primarily for debugging, and
in the longer term we should rename it to test-show-index or something.

In the meantime, just avoid xmalloc (which slurps in the rest of git), and
separating out the trivial hex functions into "hex.o".

This results in

  [torvalds@nehalem git]$ size git-show-index
       text    data     bss     dec     hex filename
     222818    2276  112688  337782   52776 git-show-index (before)
       5696     624    1264    7584    1da0 git-show-index (after)

which is a whole lot better.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-01-21 20:03:45 -08:00
Junio C Hamano
356521ab22 sha1_file.c: remove unused function
has_pack_file() is not used anywhere.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-01-12 01:06:09 -08:00
Junio C Hamano
39eea7bdd9 Fix incorrect error check while reading deflated pack data
The loop in get_size_from_delta() feeds a deflated delta data from the
pack stream _until_ we get inflated result of 20 bytes[*] or we reach the
end of stream.

    Side note. This magic number 20 does not have anything to do with the
    size of the hash we use, but comes from 1a3b55c (reduce delta head
    inflated size, 2006-10-18).

The loop reads like this:

    do {
        in = use_pack();
        stream.next_in = in;
        st = git_inflate(&stream, Z_FINISH);
        curpos += stream.next_in - in;
    } while ((st == Z_OK || st == Z_BUF_ERROR) &&
             stream.total_out < sizeof(delta_head));

This git_inflate() can return:

 - Z_STREAM_END, if use_pack() fed it enough input and the delta itself
   was smaller than 20 bytes;

 - Z_OK, when some progress has been made;

 - Z_BUF_ERROR, if no progress is possible, because we either ran out of
   input (due to corrupt pack), or we ran out of output before we saw the
   end of the stream.

The fix b3118bd (sha1_file: Fix infinite loop when pack is corrupted,
2009-10-14) attempted was against a corruption that appears to be a valid
stream that produces a result larger than the output buffer, but we are
not even trying to read the stream to the end in this loop.  If avail_out
becomes zero, total_out will be the same as sizeof(delta_head) so the loop
will terminate without the "fix".  There is no fix from b3118bd needed for
this loop, in other words.

The loop in unpack_compressed_entry() is quite a different story.  It
feeds a deflated stream (either delta or base) and allows the stream to
produce output up to what we expect but no more.

    do {
        in = use_pack();
        stream.next_in = in;
        st = git_inflate(&stream, Z_FINISH);
        curpos += stream.next_in - in;
    } while (st == Z_OK || st == Z_BUF_ERROR)

This _does_ risk falling into an endless interation, as we can exhaust
avail_out if the length we expect is smaller than what the stream wants to
produce (due to pack corruption).  In such a case, avail_out will become
zero and inflate() will return Z_BUF_ERROR, while avail_in may (or may
not) be zero.

But this is not a right fix:

    do {
        in = use_pack();
        stream.next_in = in;
        st = git_inflate(&stream, Z_FINISH);
+       if (st == Z_BUF_ERROR && (stream.avail_in || !stream.avail_out)
+               break; /* wants more input??? */
        curpos += stream.next_in - in;
    } while (st == Z_OK || st == Z_BUF_ERROR)

as Z_BUF_ERROR from inflate() may be telling us that avail_in has also run
out before reading the end of stream marker.  In such a case, both avail_in
and avail_out would be zero, and the loop should iterate to allow the end
of stream marker to be seen by inflate from the input stream.

The right fix for this loop is likely to be to increment the initial
avail_out by one (we allocate one extra byte to terminate it with NUL
anyway, so there is no risk to overrun the buffer), and break out if we
see that avail_out has become zero, in order to detect that the stream
wants to produce more than what we expect.  After the loop, we have a
check that exactly tests this condition:

    if ((st != Z_STREAM_END) || stream.total_out != size) {
        free(buffer);
        return NULL;
    }

So here is a patch (without my previous botched attempts) to fix this
issue.  The first hunk reverts the corresponding hunk from b3118bd, and
the second hunk is the same fix proposed earlier.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-10-21 23:19:47 -07:00
Shawn O. Pearce
b3118bdc91 sha1_file: Fix infinite loop when pack is corrupted
Some types of corruption to a pack may confuse the deflate stream
which stores an object.  In Andy's reported case a 36 byte region
of the pack was overwritten, leading to what appeared to be a valid
deflate stream that was trying to produce a result larger than our
allocated output buffer could accept.

Z_BUF_ERROR is returned from inflate() if either the input buffer
needs more input bytes, or the output buffer has run out of space.
Previously we only considered the former case, as it meant we needed
to move the stream's input buffer to the next window in the pack.

We now abort the loop if inflate() returns Z_BUF_ERROR without
consuming the entire input buffer it was given, or has filled
the entire output buffer but has not yet returned Z_STREAM_END.
Either state is a clear indicator that this loop is not working
as expected, and should not continue.

This problem cannot occur with loose objects as we open the entire
loose object as a single buffer and treat Z_BUF_ERROR as an error.

Reported-by: Andy Isaacson <adi@hexapodia.org>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Acked-by: Nicolas Pitre <nico@fluxnic.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-10-14 13:39:37 -07:00
Junio C Hamano
f00ecbe42b Merge branch 'cc/replace'
* cc/replace:
  t6050: check pushing something based on a replaced commit
  Documentation: add documentation for "git replace"
  Add git-replace to .gitignore
  builtin-replace: use "usage_msg_opt" to give better error messages
  parse-options: add new function "usage_msg_opt"
  builtin-replace: teach "git replace" to actually replace
  Add new "git replace" command
  environment: add global variable to disable replacement
  mktag: call "check_sha1_signature" with the replacement sha1
  replace_object: add a test case
  object: call "check_sha1_signature" with the replacement sha1
  sha1_file: add a "read_sha1_file_repl" function
  replace_object: add mechanism to replace objects found in "refs/replace/"
  refs: add a "for_each_replace_ref" function
2009-08-21 18:47:53 -07:00
Pierre Habouzit
f630cfda88 refactor: use bitsizeof() instead of 8 * sizeof()
Signed-off-by: Pierre Habouzit <madcoder@debian.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-07-22 21:57:41 -07:00
Junio C Hamano
dd787c19c4 Merge branch 'tr/die_errno'
* tr/die_errno:
  Use die_errno() instead of die() when checking syscalls
  Convert existing die(..., strerror(errno)) to die_errno()
  die_errno(): double % in strerror() output just in case
  Introduce die_errno() that appends strerror(errno) to die()
2009-07-06 09:39:46 -07:00
Thomas Rast
d824cbba02 Convert existing die(..., strerror(errno)) to die_errno()
Change calls to die(..., strerror(errno)) to use the new die_errno().

In the process, also make slight style adjustments: at least state
_something_ about the function that failed (instead of just printing
the pathname), and put paths in single quotes.

Signed-off-by: Thomas Rast <trast@student.ethz.ch>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-06-27 11:14:53 -07:00
Linus Torvalds
48fb7deb5b Fix big left-shifts of unsigned char
Shifting 'unsigned char' or 'unsigned short' left can result in sign
extension errors, since the C integer promotion rules means that the
unsigned char/short will get implicitly promoted to a signed 'int' due to
the shift (or due to other operations).

This normally doesn't matter, but if you shift things up sufficiently, it
will now set the sign bit in 'int', and a subsequent cast to a bigger type
(eg 'long' or 'unsigned long') will now sign-extend the value despite the
original expression being unsigned.

One example of this would be something like

	unsigned long size;
	unsigned char c;

	size += c << 24;

where despite all the variables being unsigned, 'c << 24' ends up being a
signed entity, and will get sign-extended when then doing the addition in
an 'unsigned long' type.

Since git uses 'unsigned char' pointers extensively, we actually have this
bug in a couple of places.

I may have missed some, but this is the result of looking at

	git grep '[^0-9 	][ 	]*<<[ 	][a-z]' -- '*.c' '*.h'
	git grep '<<[   ]*24'

which catches at least the common byte cases (shifting variables by a
variable amount, and shifting by 24 bits).

I also grepped for just 'unsigned char' variables in general, and
converted the ones that most obviously ended up getting implicitly cast
immediately anyway (eg hash_name(), encode_85()).

In addition to just avoiding 'unsigned char', this patch also tries to use
a common idiom for the delta header size thing. We had three different
variations on it: "& 0x7fUL" in one place (getting the sign extension
right), and "& ~0x80" and "& 0x7f" in two other places (not getting it
right). Apart from making them all just avoid using "unsigned char" at
all, I also unified them to then use a simple "& 0x7f".

I considered making a sparse extension which warns about doing implicit
casts from unsigned types to signed types, but it gets rather complex very
quickly, so this is just a hack.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-06-18 09:22:46 -07:00
Christian Couder
f5552aee39 sha1_file: add a "read_sha1_file_repl" function
This new function will replace "read_sha1_file". This latter function
becoming just a stub to call the former will a NULL "replacement"
argument.

This new function is needed because sometimes we need to use the
replacement sha1.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-05-31 17:02:59 -07:00
Christian Couder
6809557029 replace_object: add mechanism to replace objects found in "refs/replace/"
The code implementing this mechanism has been copied more-or-less
from the commit graft code.

This mechanism is used in "read_sha1_file". sha1 passed to this
function that match a ref name in "refs/replace/" are replaced by
the sha1 that has been read in the ref.

We "die" if the replacement recursion depth is too high or if we
can't read the replacement object.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-05-31 17:02:59 -07:00
Junio C Hamano
2c5942dbae Merge branch 'ar/unlink-err' into maint
* ar/unlink-err:
  print unlink(2) errno in copy_or_link_directory
  replace direct calls to unlink(2) with unlink_or_warn
  Introduce an unlink(2) wrapper which gives warning if unlink failed
2009-05-25 19:01:50 -07:00
Junio C Hamano
065b0702f7 Merge branch 'maint'
* maint:
  grep: fix word-regexp colouring
  completion: use git rev-parse to detect bare repos
  Cope better with a _lot_ of packs
  for-each-ref: fix segfault in copy_email
2009-05-20 18:59:09 -07:00
Johannes Schindelin
fd73ccf279 Cope better with a _lot_ of packs
You might end up with a situation where you have tons of pack files, e.g.
when using hg2git.  In this situation, all kinds of operations may
end up with a "too many files open" error.  Let's recover gracefully from
that.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Looks-right-to-me-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-05-20 18:23:06 -07:00
Junio C Hamano
36587681b4 Merge branch 'ar/unlink-err'
* ar/unlink-err:
  print unlink(2) errno in copy_or_link_directory
  replace direct calls to unlink(2) with unlink_or_warn
  Introduce an unlink(2) wrapper which gives warning if unlink failed
2009-05-18 09:01:06 -07:00
Felipe Contreras
4b25d091ba Fix a bunch of pointer declarations (codestyle)
Essentially; s/type* /type */ as per the coding guidelines.

Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-05-01 15:17:31 -07:00
Alex Riesen
691f1a28bf replace direct calls to unlink(2) with unlink_or_warn
This helps to notice when something's going wrong, especially on
systems which lock open files.

I used the following criteria when selecting the code for replacement:
- it was already printing a warning for the unlink failures
- it is in a function which already printing something or is
  called from such a function
- it is in a static function, returning void and the function is only
  called from a builtin main function (cmd_)
- it is in a function which handles emergency exit (signal handlers)
- it is in a function which is obvously cleaning up the lockfiles

Signed-off-by: Alex Riesen <raa.lkml@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-04-29 18:37:41 -07:00
Johannes Schindelin
348df16679 Rename core.unreliableHardlinks to core.createObject
"Unreliable hardlinks" is a misleading description for what is happening.
So rename it to something less misleading.

Suggested by Linus Torvalds.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-04-29 16:50:07 -07:00
Johannes Schindelin
be66a6c43d Add an option not to use link(src, dest) && unlink(src) when that is unreliable
It seems that accessing NTFS partitions with ufsd (at least on my EeePC)
has an unnerving bug: if you link() a file and unlink() it right away,
the target of the link() will have the correct size, but consist of NULs.

It seems as if the calls are simply not serialized correctly, as single-stepping
through the function move_temp_to_file() works flawlessly.

As ufsd is "Commertial software" (sic!), I cannot fix it, and have to work
around it in Git.

At the same time, it seems that this fixes msysGit issues 222 and 229 to
assume that Windows cannot handle link() && unlink().

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Acked-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-04-25 09:49:21 -07:00
Junio C Hamano
03a39a9184 Merge branch 'jc/shared-literally'
* jc/shared-literally:
  t1301: loosen test for forced modes
  set_shared_perm(): sometimes we know what the final mode bits should look like
  move_temp_to_file(): do not forget to chmod() in "Coda hack" codepath
  Move chmod(foo, 0444) into move_temp_to_file()
  "core.sharedrepository = 0mode" should set, not loosen
2009-04-06 00:42:52 -07:00
Junio C Hamano
3c91bf6805 Merge branch 'jc/maint-1.6.0-keep-pack'
* jc/maint-1.6.0-keep-pack:
  pack-objects: don't loosen objects available in alternate or kept packs
  t7700: demonstrate repack flaw which may loosen objects unnecessarily
  Remove --kept-pack-only option and associated infrastructure
  pack-objects: only repack or loosen objects residing in "local" packs
  git-repack.sh: don't use --kept-pack-only option to pack-objects
  t7700-repack: add two new tests demonstrating repacking flaws

Conflicts:
	t/t7700-repack.sh
2009-04-01 22:34:19 -07:00
Junio C Hamano
17e61b8288 set_shared_perm(): sometimes we know what the final mode bits should look like
adjust_shared_perm() first obtains the mode bits from lstat(2), expecting
to find what the result of applying user's umask is, and then tweaks it
as necessary.  When the file to be adjusted is created with mkstemp(3),
however, the mode thusly obtained does not have anything to do with user's
umask, and we would need to start from 0444 in such a case and there is no
point running lstat(2) for such a path.

This introduces a new API set_shared_perm() to bypass the lstat(2) and
instead force setting the mode bits to the desired value directly.
adjust_shared_perm() becomes a thin wrapper to the function.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-03-28 08:02:15 -07:00
Junio C Hamano
3be1f18e1b move_temp_to_file(): do not forget to chmod() in "Coda hack" codepath
Now move_temp_to_file() is responsible for doing everything that is
necessary to turn a tempfile in $GIT_DIR into its final form, it must make
sure "Coda hack" codepath correctly makes the file read-only.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-03-28 08:01:21 -07:00
Johan Herland
fb8b193670 Move chmod(foo, 0444) into move_temp_to_file()
When writing out a loose object or a pack (index), move_temp_to_file() is
called to finalize the resulting file. These files (loose files and packs)
should all have permission mode 0444 (modulo adjust_shared_perm()).
Therefore, instead of doing chmod(foo, 0444) explicitly from each callsite
(or even forgetting to chmod() at all), do the chmod() call from within
move_temp_to_file().

Signed-off-by: Johan Herland <johan@herland.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-03-27 22:10:58 -07:00
Junio C Hamano
5a688fe470 "core.sharedrepository = 0mode" should set, not loosen
This fixes the behaviour of octal notation to how it is defined in the
documentation, while keeping the traditional "loosen only" semantics
intact for "group" and "everybody".

Three main points of this patch are:

 - For an explicit octal notation, the internal shared_repository variable
   is set to a negative value, so that we can tell "group" (which is to
   "OR" in 0660) and 0660 (which is to "SET" to 0660);

 - git-init did not set shared_repository variable early enough to affect
   the initial creation of many files, notably copied templates and the
   configuration.  We set it very early when a command-line option
   specifies a custom value.

 - Many codepaths create files inside $GIT_DIR by various ways that all
   involve mkstemp(), and then call move_temp_to_file() to rename it to
   its final destination.  We can add adjust_shared_perm() call here; for
   the traditional "loosen-only", this would be a no-op for many codepaths
   because the mode is already loose enough, but with the new behaviour it
   makes a difference.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-03-27 21:51:04 -07:00
Junio C Hamano
89fbda2425 Merge branch 'maint'
* maint:
  Increase the size of the die/warning buffer to avoid truncation
  close_sha1_file(): make it easier to diagnose errors
  avoid possible overflow in delta size filtering computation
2009-03-24 19:45:57 -07:00
Junio C Hamano
b0de555410 Merge branch 'maint-1.6.1' into maint
* maint-1.6.1:
  close_sha1_file(): make it easier to diagnose errors
  avoid possible overflow in delta size filtering computation
2009-03-24 15:31:21 -07:00
Junio C Hamano
2a5643da73 Merge branch 'maint-1.6.0' into maint-1.6.1
* maint-1.6.0:
  close_sha1_file(): make it easier to diagnose errors
  avoid possible overflow in delta size filtering computation
2009-03-24 15:31:15 -07:00
Linus Torvalds
e8bd78c3fc close_sha1_file(): make it easier to diagnose errors
A bug report with "unable to write sha1 file" made us realize that we do
not have enough information to guess why close() is failing.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-03-24 14:39:20 -07:00
Brandon Casey
4d6acb7041 Remove --kept-pack-only option and associated infrastructure
This option to pack-objects/rev-list was created to improve the -A and -a
options of repack.  It was found to be lacking in that it did not provide
the ability to differentiate between local and non-local kept packs, and
found to be unnecessary since objects residing in local kept packs can be
filtered out by the --honor-pack-keep option.

Signed-off-by: Brandon Casey <casey@nrlssc.navy.mil>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-03-20 13:32:33 -07:00
Junio C Hamano
aec813062b Merge branch 'jc/maint-1.6.0-keep-pack'
* jc/maint-1.6.0-keep-pack:
  is_kept_pack(): final clean-up
  Simplify is_kept_pack()
  Consolidate ignore_packed logic more
  has_sha1_kept_pack(): take "struct rev_info"
  has_sha1_pack(): refactor "pretend these packs do not exist" interface
  git-repack: resist stray environment variable
2009-03-11 13:49:56 -07:00
Junio C Hamano
69e020ae00 is_kept_pack(): final clean-up
Now is_kept_pack() is just a member lookup into a structure, we can write
it as such.

Also rewrite the sole caller of has_sha1_kept_pack() to switch on the
criteria the callee uses (namely, revs->kept_pack_only) between calling
has_sha1_kept_pack() and has_sha1_pack(), so that these two callees do not
have to take a pointer to struct rev_info as an argument.

This removes the header file dependency issue temporarily introduced by
the earlier commit, so we revert changes associated to that as well.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-02-28 01:06:06 -08:00
Junio C Hamano
03a9683d22 Simplify is_kept_pack()
This removes --unpacked=<packfile> parameter from the revision parser, and
rewrites its use in git-repack to pass a single --kept-pack-only option
instead.

The new --kept-pack-only option means just that.  When this option is
given, is_kept_pack() that used to say "not on the --unpacked=<packfile>
list" now says "the packfile has corresponding .keep file".

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-02-28 01:06:06 -08:00
Junio C Hamano
386cb77210 Consolidate ignore_packed logic more
This refactors three loops that check if a given packfile is on the
ignore_packed list into a function is_kept_pack().  The function returns
false for a pack on the list, and true for a pack not on the list, because
this list is solely used by "git repack" to pass list of packfiles that do
not have corresponding .keep files, i.e. a packfile not on the list is
"kept".

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-02-28 01:06:06 -08:00
Junio C Hamano
b8431b033f has_sha1_kept_pack(): take "struct rev_info"
Its "ignore_packed" parameter always comes from struct rev_info.  This
patch makes the function take a pointer to the surrounding structure, so
that the refactoring in the next patch becomes easier to review.

There is an unfortunate header file dependency and the easiest workaround
is to temporarily move the function declaration from cache.h to
revision.h; this will be moved back to cache.h once the function loses
this "ignore_packed" parameter altogether in the later part of the
series.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-02-28 01:06:06 -08:00
Junio C Hamano
cd673c1f17 has_sha1_pack(): refactor "pretend these packs do not exist" interface
Most of the callers of this function except only one pass NULL to its last
parameter, ignore_packed.

Introduce has_sha1_kept_pack() function that has the function signature
and the semantics of this function, and convert the sole caller that does
not pass NULL to call this new function.

All other callers and has_sha1_pack() lose the ignore_packed parameter.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-02-28 01:06:06 -08:00
Felipe Contreras
a9d98a148d sha1_file.c: fix typo
it's != its

Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-02-25 00:49:54 -08:00
Junio C Hamano
30aa4fb15f Merge branch 'maint'
* maint:
  Prepare for 1.6.1.4.
  Make repack less likely to corrupt repository
  fast-export: ensure we traverse commits in topological order
  Clear the delta base cache if a pack is rebuilt

Conflicts:
	RelNotes
2009-02-11 18:47:30 -08:00
Junio C Hamano
7a134dbbc9 Merge branch 'maint-1.6.0' into maint
* maint-1.6.0:
  Make repack less likely to corrupt repository
  fast-export: ensure we traverse commits in topological order
  Clear the delta base cache if a pack is rebuilt
2009-02-11 18:32:37 -08:00
Shawn O. Pearce
fa3a0c94dc Clear the delta base cache if a pack is rebuilt
There is some risk that re-opening a regenerated pack file with
different offsets could leave stale entries within the delta base
cache that could be matched up against other objects using the same
"struct packed_git*" and pack offset.

Throwing away the entire delta base cache in this case is safer,
as we don't have to worry about a recycled "struct packed_git*"
matching to the wrong base object, resulting in delta apply
errors while unpacking an object.

Suggested-by: Daniel Barkalow <barkalow@iabervon.org>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-02-11 10:25:24 -08:00
Junio C Hamano
fd8475d9fb Merge branch 'maint'
* maint:
  Clear the delta base cache during fast-import checkpoint
2009-02-10 21:30:45 -08:00
Junio C Hamano
9b27ea9518 Merge branch 'maint-1.6.0' into maint
* maint-1.6.0:
  Clear the delta base cache during fast-import checkpoint
2009-02-10 15:32:26 -08:00
Shawn O. Pearce
3d20c636af Clear the delta base cache during fast-import checkpoint
Otherwise we may reuse the same memory address for a totally
different "struct packed_git", and a previously cached object from
the prior occupant might be returned when trying to unpack an object
from the new pack.

Found-by: Daniel Barkalow <barkalow@iabervon.org>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-02-10 15:30:59 -08:00
Junio C Hamano
141b6b83d7 Merge branch 'lt/maint-wrap-zlib' into maint
* lt/maint-wrap-zlib:
  Wrap inflate and other zlib routines for better error reporting

Conflicts:
	http-push.c
	http-walker.c
	sha1_file.c
2009-02-05 18:01:00 -08:00
Junio C Hamano
8c95d3c31b Sync with 1.6.1.2
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-01-29 00:32:52 -08:00
Junio C Hamano
8561b522d7 Merge branch 'maint-1.6.0' into maint
* maint-1.6.0:
  avoid 31-bit truncation in write_loose_object
2009-01-28 23:41:28 -08:00
Jeff King
915308b187 avoid 31-bit truncation in write_loose_object
The size of the content we are adding may be larger than
2.1G (i.e., "git add gigantic-file"). Most of the code-path
to do so uses size_t or unsigned long to record the size,
but write_loose_object uses a signed int.

On platforms where "int" is 32-bits (which includes x86_64
Linux platforms), we end up passing malloc a negative size.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-01-28 23:40:53 -08:00
Junio C Hamano
36dd939393 Merge branch 'lt/maint-wrap-zlib'
* lt/maint-wrap-zlib:
  Wrap inflate and other zlib routines for better error reporting

Conflicts:
	http-push.c
	http-walker.c
	sha1_file.c
2009-01-21 16:55:17 -08:00
Christian Couder
c2c5b27051 sha1_file: make "read_object" static
This function is only used from "sha1_file.c".

And as we want to add a "replace_object" hook in "read_sha1_file",
we must not let people bypass the hook using something other than
"read_sha1_file".

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-01-13 00:14:55 -08:00
Linus Torvalds
39c68542fc Wrap inflate and other zlib routines for better error reporting
R. Tyler Ballance reported a mysterious transient repository corruption;
after much digging, it turns out that we were not catching and reporting
memory allocation errors from some calls we make to zlib.

This one _just_ wraps things; it doesn't do the "retry on low memory
error" part, at least not yet. It is an independent issue from the
reporting.  Some of the errors are expected and passed back to the caller,
but we die when zlib reports it failed to allocate memory for now.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-01-11 02:13:06 -08:00
Linus Torvalds
b760d3aa74 Make 'index_path()' use 'strbuf_readlink()'
This makes us able to properly index symlinks even on filesystems where
st_size doesn't match the true size of the link.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-12-17 13:36:34 -08:00
Junio C Hamano
de0db42278 Merge branch 'maint'
* maint:
  fsck: reduce stack footprint
  make sure packs to be replaced are closed beforehand
2008-12-11 00:36:31 -08:00
Nicolas Pitre
c74faea19e make sure packs to be replaced are closed beforehand
Especially on Windows where an opened file cannot be replaced, make
sure pack-objects always close packs it is about to replace. Even on
non Windows systems, this could save potential bad results if ever
objects were to be read from the new pack file using offset from the old
index.

This should fix t5303 on Windows.

Signed-off-by: Nicolas Pitre <nico@cam.org>
Tested-by: Johannes Sixt <j6t@kdbg.org> (MinGW)
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-12-10 17:56:05 -08:00
Junio C Hamano
0fd9d7e66d Merge branch 'bc/maint-keep-pack' into maint
* bc/maint-keep-pack:
  repack: only unpack-unreachable if we are deleting redundant packs
  t7700: test that 'repack -a' packs alternate packed objects
  pack-objects: extend --local to mean ignore non-local loose objects too
  sha1_file.c: split has_loose_object() into local and non-local counterparts
  t7700: demonstrate mishandling of loose objects in an alternate ODB
  builtin-gc.c: use new pack_keep bitfield to detect .keep file existence
  repack: do not fall back to incremental repacking with [-a|-A]
  repack: don't repack local objects in packs with .keep file
  pack-objects: new option --honor-pack-keep
  packed_git: convert pack_local flag into a bitfield and add pack_keep
  t7700: demonstrate mishandling of objects in packs with a .keep file
2008-12-02 23:00:04 -08:00
Junio C Hamano
455d0f5c23 Merge branch 'maint'
* maint:
  sha1_file.c: resolve confusion EACCES vs EPERM
  sha1_file: avoid bogus "file exists" error message
  git checkout: don't warn about unborn branch if -f is already passed
  bash: offer refs instead of filenames for 'git revert'
  bash: remove dashed command leftovers
  git-p4: fix keyword-expansion regex
  fast-export: use an unsorted string list for extra_refs
  Add new testcase to show fast-export does not always exports all tags
2008-11-27 19:23:51 -08:00
Sam Vilain
35243577ab sha1_file.c: resolve confusion EACCES vs EPERM
An earlier commit 916d081 (Nicer error messages in case saving an object
to db goes wrong, 2006-11-09) confused EACCES with EPERM, the latter of
which is an unlikely error from mkstemp().

Signed-off-by: Sam Vilain <sam@vilain.net>
2008-11-27 19:11:21 -08:00
Joey Hess
65117abc04 sha1_file: avoid bogus "file exists" error message
This avoids the following misleading error message:

error: unable to create temporary sha1 filename ./objects/15: File exists

mkstemp can fail for many reasons, one of which, ENOENT, can occur if
the directory for the temp file doesn't exist. create_tmpfile tried to
handle this case by always trying to mkdir the directory, even if it
already existed. This caused errno to be clobbered, so one cannot tell
why mkstemp really failed, and it truncated the buffer to just the
directory name, resulting in the strange error message shown above.

Note that in both occasions that I've seen this failure, it has not been
due to a missing directory, or bad permissions, but some other, unknown
mkstemp failure mode that did not occur when I ran git again. This code
could perhaps be made more robust by retrying mkstemp, in case it was a
transient failure.

Signed-off-by: Joey Hess <joey@kitenet.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-11-27 18:48:53 -08:00
Joey Hess
cbacbf4e55 sha1_file: avoid bogus "file exists" error message
This avoids the following misleading error message:

error: unable to create temporary sha1 filename ./objects/15: File exists

mkstemp can fail for many reasons, one of which, ENOENT, can occur if
the directory for the temp file doesn't exist. create_tmpfile tried to
handle this case by always trying to mkdir the directory, even if it
already existed. This caused errno to be clobbered, so one cannot tell
why mkstemp really failed, and it truncated the buffer to just the
directory name, resulting in the strange error message shown above.

Note that in both occasions that I've seen this failure, it has not been
due to a missing directory, or bad permissions, but some other, unknown
mkstemp failure mode that did not occur when I ran git again. This code
could perhaps be made more robust by retrying mkstemp, in case it was a
transient failure.

Signed-off-by: Joey Hess <joey@kitenet.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-11-23 19:44:19 -08:00
Alex Riesen
f755bb996b Fix handle leak in sha1_file/unpack_objects if there were damaged object data
In the case of bad packed object CRC, unuse_pack wasn't called after
check_pack_crc which calls use_pack.

Signed-off-by: Alex Riesen <raa.lkml@gmail.com>
Acked-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-11-23 19:31:05 -08:00
Junio C Hamano
47a792539a Merge branch 'jk/commit-v-strip'
* jk/commit-v-strip:
  status: show "-v" diff even for initial commit
  wt-status: refactor initial commit printing
  define empty tree sha1 as a macro
2008-11-16 00:48:59 -08:00
Junio C Hamano
7b51b77dbc Merge branch 'np/pack-safer'
* np/pack-safer:
  t5303: fix printf format string for portability
  t5303: work around printf breakage in dash
  pack-objects: don't leak pack window reference when splitting packs
  extend test coverage for latest pack corruption resilience improvements
  pack-objects: allow "fixing" a corrupted pack without a full repack
  make find_pack_revindex() aware of the nasty world
  make check_object() resilient to pack corruptions
  make packed_object_info() resilient to pack corruptions
  make unpack_object_header() non fatal
  better validation on delta base object offsets
  close another possibility for propagating pack corruption
2008-11-12 22:26:35 -08:00
Junio C Hamano
ecbbfb15a4 Merge branch 'bc/maint-keep-pack'
* bc/maint-keep-pack:
  t7700: test that 'repack -a' packs alternate packed objects
  pack-objects: extend --local to mean ignore non-local loose objects too
  sha1_file.c: split has_loose_object() into local and non-local counterparts
  t7700: demonstrate mishandling of loose objects in an alternate ODB
  builtin-gc.c: use new pack_keep bitfield to detect .keep file existence
  repack: do not fall back to incremental repacking with [-a|-A]
  repack: don't repack local objects in packs with .keep file
  pack-objects: new option --honor-pack-keep
  packed_git: convert pack_local flag into a bitfield and add pack_keep
  t7700: demonstrate mishandling of objects in packs with a .keep file
2008-11-12 22:00:43 -08:00
Jeff King
14d9c57896 define empty tree sha1 as a macro
This can potentially be used in a few places, so let's make
it available to all parts of the code.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-11-12 12:52:21 -08:00
Brandon Casey
0f4dc14ac4 sha1_file.c: split has_loose_object() into local and non-local counterparts
Signed-off-by: Brandon Casey <drafnel@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-11-12 10:29:22 -08:00
Brandon Casey
8d25931d6f packed_git: convert pack_local flag into a bitfield and add pack_keep
pack_keep will be set when a pack file has an associated .keep file.

Signed-off-by: Brandon Casey <casey@nrlssc.navy.mil>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-11-12 10:28:08 -08:00
Nicolas Pitre
08698b1e32 make find_pack_revindex() aware of the nasty world
It currently calls die() whenever given offset is not found thinking
that such thing should never happen.  But this offset may come from a
corrupted pack whych _could_ happen and not be found.  Callers should
deal with this possibility gracefully instead.

Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-11-02 15:22:35 -08:00
Nicolas Pitre
3d77d8774f make packed_object_info() resilient to pack corruptions
In the same spirit as commit 8eca0b47ff, let's try to survive a pack
corruption by making packed_object_info() able to fall back to alternate
packs or loose objects.

Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-11-02 15:22:35 -08:00
Nicolas Pitre
09ded04b7e make unpack_object_header() non fatal
It is possible to have pack corruption in the object header.  Currently
unpack_object_header() simply die() on them instead of letting the caller
deal with that gracefully.

So let's have unpack_object_header() return an error instead, and find
a better name for unpack_object_header_gently() in that context.  All
callers of unpack_object_header() are ready for it.

Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-11-02 15:22:34 -08:00
Nicolas Pitre
d8f325563d better validation on delta base object offsets
In one case, it was possible to have a bad offset equal to 0 effectively
pointing a delta onto itself and crashing git after too many recursions.
In the other cases, a negative offset could result due to off_t being
signed.  Catch those.

Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-11-02 15:22:34 -08:00
Nicolas Pitre
0e8189e270 close another possibility for propagating pack corruption
Abstract
--------

With index v2 we have a per object CRC to allow quick and safe reuse of
pack data when repacking.  This, however, doesn't currently prevent a
stealth corruption from being propagated into a new pack when _not_
reusing pack data as demonstrated by the modification to t5302 included
here.

The Context
-----------

The Git database is all checksummed with SHA1 hashes.  Any kind of
corruption can be confirmed by verifying this per object hash against
corresponding data.  However this can be costly to perform systematically
and therefore this check is often not performed at run time when
accessing the object database.

First, the loose object format is entirely compressed with zlib which
already provide a CRC verification of its own when inflating data.  Any
disk corruption would be caught already in this case.

Then, packed objects are also compressed with zlib but only for their
actual payload.  The object headers and delta base references are not
deflated for obvious performance reasons, however this leave them
vulnerable to potentially undetected disk corruptions.  Object types
are often validated against the expected type when they're requested,
and deflated size must always match the size recorded in the object header,
so those cases are pretty much covered as well.

Where corruptions could go unnoticed is in the delta base reference.
Of course, in the OBJ_REF_DELTA case,  the odds for a SHA1 reference to
get corrupted so it actually matches the SHA1 of another object with the
same size (the delta header stores the expected size of the base object
to apply against) are virtually zero.  In the OBJ_OFS_DELTA case, the
reference is a pack offset which would have to match the start boundary
of a different base object but still with the same size, and although this
is relatively much more "probable" than in the OBJ_REF_DELTA case, the
probability is also about zero in absolute terms.  Still, the possibility
exists as demonstrated in t5302 and is certainly greater than a SHA1
collision, especially in the OBJ_OFS_DELTA case which is now the default
when repacking.

Again, repacking by reusing existing pack data is OK since the per object
CRC provided by index v2 guards against any such corruptions. What t5302
failed to test is a full repack in such case.

The Solution
------------

As unlikely as this kind of stealth corruption can be in practice, it
certainly isn't acceptable to propagate it into a freshly created pack.
But, because this is so unlikely, we don't want to pay the run time cost
associated with extra validation checks all the time either.  Furthermore,
consequences of such corruption in anything but repacking should be rather
visible, and even if it could be quite unpleasant, it still has far less
severe consequences than actively creating bad packs.

So the best compromize is to check packed object CRC when unpacking
objects, and only during the compression/writing phase of a repack, and
only when not streaming the result.  The cost of this is minimal (less
than 1% CPU time), and visible only with a full repack.

Someone with a stats background could provide an objective evaluation of
this, but I suspect that it's bad RAM that has more potential for data
corruptions at this point, even in those cases where this extra check
is not performed.  Still, it is best to prevent a known hole for
corruption when recreating object data into a new pack.

What about the streamed pack case?  Well, any client receiving a pack
must always consider that pack as untrusty and perform full validation
anyway, hence no such stealth corruption could be propagated to remote
repositoryes already.  It is therefore worthless doing local validation
in that case.

Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-11-02 15:22:15 -08:00
Junio C Hamano
581000a419 Merge branch 'jc/maint-co-track' into maint
* jc/maint-co-track:
  Enhance hold_lock_file_for_{update,append}() API
  demonstrate breakage of detached checkout with symbolic link HEAD
  Fix "checkout --track -b newbranch" on detached HEAD
2008-11-02 13:36:14 -08:00
Junio C Hamano
a157400c97 Merge branch 'jc/maint-co-track'
* jc/maint-co-track:
  Enhance hold_lock_file_for_{update,append}() API
  demonstrate breakage of detached checkout with symbolic link HEAD
  Fix "checkout --track -b newbranch" on detached HEAD

Conflicts:
	builtin-commit.c
2008-10-21 17:58:11 -07:00
Junio C Hamano
acd3b9eca8 Enhance hold_lock_file_for_{update,append}() API
This changes the "die_on_error" boolean parameter to a mere "flags", and
changes the existing callers of hold_lock_file_for_update/append()
functions to pass LOCK_DIE_ON_ERROR.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-10-19 12:35:37 -07:00
Junio C Hamano
58e0fa5416 Merge branch 'maint'
* maint:
  Hopefully the final draft release notes update before 1.6.0.3
  diff(1): clarify what "T"ypechange status means
  contrib: update packinfo.pl to not use dashed commands
  force_object_loose: Fix memory leak
  tests: shell negation portability fix
2008-10-18 08:26:44 -07:00
Björn Steinbrink
1fb23e6550 force_object_loose: Fix memory leak
read_packed_sha1 expectes its caller to free the buffer it returns, which
force_object_loose didn't do.

This leak is eventually triggered by "git gc", when it is manually invoked
or there are too many packs around, making gc totally unusable when there
are lots of unreachable objects.

Signed-off-by: Björn Steinbrink <B.Steinbrink@gmx.de>
Acked-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-10-18 06:19:06 -07:00
Brandon Casey
f285a2d7ed Replace calls to strbuf_init(&foo, 0) with STRBUF_INIT initializer
Many call sites use strbuf_init(&foo, 0) to initialize local
strbuf variable "foo" which has not been accessed since its
declaration. These can be replaced with a static initialization
using the STRBUF_INIT macro which is just as readable, saves a
function call, and takes up fewer lines.

Signed-off-by: Brandon Casey <casey@nrlssc.navy.mil>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2008-10-12 12:36:19 -07:00
Miklos Vajna
749bc58c5e Cleanup in sha1_file.c::cache_or_unpack_entry()
This patch just removes an unnecessary goto which makes the code easier
to read and shorter.

Signed-off-by: Miklos Vajna <vmiklos@frugalware.org>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2008-10-09 08:55:42 -07:00
Nicolas Pitre
9126f0091f fix openssl headers conflicting with custom SHA1 implementations
On ARM I have the following compilation errors:

    CC fast-import.o
In file included from cache.h:8,
                 from builtin.h:6,
                 from fast-import.c:142:
arm/sha1.h:14: error: conflicting types for 'SHA_CTX'
/usr/include/openssl/sha.h:105: error: previous declaration of 'SHA_CTX' was here
arm/sha1.h:16: error: conflicting types for 'SHA1_Init'
/usr/include/openssl/sha.h:115: error: previous declaration of 'SHA1_Init' was here
arm/sha1.h:17: error: conflicting types for 'SHA1_Update'
/usr/include/openssl/sha.h:116: error: previous declaration of 'SHA1_Update' was here
arm/sha1.h:18: error: conflicting types for 'SHA1_Final'
/usr/include/openssl/sha.h:117: error: previous declaration of 'SHA1_Final' was here
make: *** [fast-import.o] Error 1

This is because openssl header files are always included in
git-compat-util.h since commit 684ec6c63c whenever NO_OPENSSL is not
set, which somehow brings in <openssl/sha1.h> clashing with the custom
ARM version.  Compilation of git is probably broken on PPC too for the
same reason.

Turns out that the only file requiring openssl/ssl.h and openssl/err.h
is imap-send.c.  But only moving those problematic includes there
doesn't solve the issue as it also includes cache.h which brings in the
conflicting local SHA1 header file.

As suggested by Jeff King, the best solution is to rename our references
to SHA1 functions and structure to something git specific, and define those
according to the implementation used.

Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2008-10-02 18:06:56 -07:00
Shawn O. Pearce
1ad6d46235 Merge branch 'jc/alternate-push'
* jc/alternate-push:
  push: receiver end advertises refs from alternate repositories
  push: prepare sender to receive extended ref information from the receiver
  receive-pack: make it a builtin
  is_directory(): a generic helper function
2008-09-25 09:39:24 -07:00
Shawn O. Pearce
58245a5e40 Merge branch 'jc/safe-c-l-d'
* jc/safe-c-l-d:
  safe_create_leading_directories(): make it about "leading" directories
2008-09-25 08:50:01 -07:00
Junio C Hamano
3791f77c28 Merge branch 'maint'
* maint:
  sha1_file: link() returns -1 on failure, not errno
  Make git archive respect core.autocrlf when creating zip format archives
  Add new test to demonstrate git archive core.autocrlf inconsistency
  gitweb: avoid warnings for commits without body
  Clarified gitattributes documentation regarding custom hunk header.
  git-svn: fix handling of even funkier branch names
  git-svn: Always create a new RA when calling do_switch for svn://
  git-svn: factor out svnserve test code for later use
  diff/diff-files: do not use --cc too aggressively
2008-09-18 20:30:12 -07:00
Thomas Rast
e32c0a9c38 sha1_file: link() returns -1 on failure, not errno
5723fe7 (Avoid cross-directory renames and linking on object creation,
2008-06-14) changed the call to use link() directly instead of through a
custom wrapper, but forgot that it returns 0 or -1, not 0 or errno.

Signed-off-by: Thomas Rast <trast@student.ethz.ch>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-09-18 19:51:13 -07:00
Junio C Hamano
d79796bcf0 push: receiver end advertises refs from alternate repositories
Earlier, when pushing into a repository that borrows from alternate object
stores, we followed the longstanding design decision not to trust refs in
the alternate repository that houses the object store we are borrowing
from.  If your public repository is borrowing from Linus's public
repository, you pushed into it long time ago, and now when you try to push
your updated history that is in sync with more recent history from Linus,
you will end up sending not just your own development, but also the
changes you acquired through Linus's tree, even though the objects needed
for the latter already exists at the receiving end.  This is because the
receiving end does not advertise that the objects only reachable from the
borrowed repository (i.e. Linus's) are already available there.

This solves the issue by making the receiving end advertise refs from
borrowed repositories.  They are not sent with their true names but with a
phoney name ".have" to make sure that the old senders will safely ignore
them (otherwise, the old senders will misbehave, trying to push matching
refs, and mirror push that deletes refs that only exist at the receiving
end).

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-09-09 09:27:46 -07:00
Junio C Hamano
90b4a71c49 is_directory(): a generic helper function
A simple "grep -e stat --and -e S_ISDIR" revealed there are many
open-coded implementations of this function.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-09-09 09:27:45 -07:00
Junio C Hamano
5f0bdf50c2 safe_create_leading_directories(): make it about "leading" directories
We used to allow callers to pass "foo/bar/" to make sure both "foo" and
"foo/bar" exist and have good permissions, but this interface is too error
prone.  If a caller mistakenly passes a path with trailing slashes
(perhaps it forgot to verify the user input) even when it wants to later
mkdir "bar" itself, it will find that it cannot mkdir "bar".  If such a
caller does not bother to check the error for EEXIST, it may even
errorneously die().

Because we have no existing callers to use that obscure feature, this
patch removes it to avoid confusion.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-09-03 22:35:32 -07:00
Junio C Hamano
5a1e8707a6 Merge branch 'np/verify-pack'
* np/verify-pack:
  discard revindex data when pack list changes
2008-08-27 16:39:46 -07:00
Nicolas Pitre
4b480c6716 discard revindex data when pack list changes
This is needed to fix verify-pack -v with multiple pack arguments.

Also, in theory, revindex data (if any) must be discarded whenever
reprepare_packed_git() is called. In practice this is hard to trigger
though.

Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-08-22 22:00:22 -07:00
Junio C Hamano
d8eec50468 Merge branch 'dp/hash-literally'
* dp/hash-literally:
  add --no-filters option to git hash-object
  add --path option to git hash-object
  use parse_options() in git hash-object
  correct usage help string for git-hash-object
  correct argument checking test for git hash-object
  teach index_fd to work with pipes
2008-08-19 21:43:25 -07:00
Steven Grimm
ddd63e64e4 Optimize sha1_object_info for loose objects, not concurrent repacks
When dealing with a repository with lots of loose objects, sha1_object_info
would rescan the packs directory every time an unpacked object was referenced
before finally giving up and looking for the loose object. This caused a lot
of extra unnecessary system calls during git pack-objects; the code was
rereading the entire pack directory once for each loose object file.

This patch looks for a loose object before falling back to rescanning the
pack directory, rather than the other way around.

Signed-off-by: Steven Grimm <koreth@midwinter.com>
Acked-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-08-05 21:21:20 -07:00
Dmitry Potapov
43df4f86e0 teach index_fd to work with pipes
index_fd can now work with file descriptors that are not normal files
but any readable file. If the given file descriptor is a regular file
then mmap() is used; for other files, strbuf_read is used.

The path parameter, which has been used as hint for filters, can be
NULL now to indicate that the file should be hashed literally without
any filter.

The index_pipe function is removed as redundant.

Signed-off-by: Dmitry Potapov <dpotapov@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-08-03 13:14:35 -07:00
Nicolas Pitre
ac9391093f restore legacy behavior for read_sha1_file()
Since commit 8eca0b47ff, it is possible
for read_sha1_file() to return NULL even with existing objects when they
are corrupted.  Previously a corrupted object would have terminated the
program immediately, effectively making read_sha1_file() return NULL
only when specified object is not found.

Let's restore this behavior for all users of read_sha1_file() and
provide a separate function with the ability to not terminate when
bad objects are encountered.

Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-07-14 23:35:32 -07:00
Junio C Hamano
948e7471e0 Merge branch 'sp/maint-pack-memuse'
* sp/maint-pack-memuse:
  Correct pack memory leak causing git gc to try to exceed ulimit

Conflicts:

	sha1_file.c
2008-07-09 14:46:46 -07:00
Shawn O. Pearce
eac12e2d4d Correct pack memory leak causing git gc to try to exceed ulimit
When recursing to unpack a delta base we must unuse_pack() so that
the pack window for the current object does not remain pinned in
memory while the delta base is itself being unpacked and materialized
for our use.

On a long delta chain of 50 objects we may need to access 6 different
windows from a very large (>3G) pack file in order to obtain all
of the delta base content.  If the process ulimit permits us to
map/allocate only 1.5G we must release windows during this recursion
to ensure we stay within the ulimit and transition memory from pack
cache to standard malloc, or other mmap needs.

Inserting an unuse_pack() call prior to the recursion allows us to
avoid pinning the current window, making it available for garbage
collection if memory runs low.

This has been broken since at least before 1.5.1-rc1, and very
likely earlier than that.  Its fixed now.  :)

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-07-09 14:45:42 -07:00
Ramsay Jones
6e1c23442a Fix some warnings (on cygwin) to allow -Werror
When printing valuds of type uint32_t, we should use PRIu32, and should
not assume that it is unsigned int.  On 32-bit platforms, it could be
defined as unsigned long. The same caution applies to ntohl().

Signed-off-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-07-05 17:26:29 -07:00
Junio C Hamano
bb1ab2db08 Merge branch 'j6t/mingw'
* j6t/mingw: (38 commits)
  compat/pread.c: Add a forward declaration to fix a warning
  Windows: Fix ntohl() related warnings about printf formatting
  Windows: TMP and TEMP environment variables specify a temporary directory.
  Windows: Make 'git help -a' work.
  Windows: Work around an oddity when a pipe with no reader is written to.
  Windows: Make the pager work.
  When installing, be prepared that template_dir may be relative.
  Windows: Use a relative default template_dir and ETC_GITCONFIG
  Windows: Compute the fallback for exec_path from the program invocation.
  Turn builtin_exec_path into a function.
  Windows: Use a customized struct stat that also has the st_blocks member.
  Windows: Add a custom implementation for utime().
  Windows: Add a new lstat and fstat implementation based on Win32 API.
  Windows: Implement a custom spawnve().
  Windows: Implement wrappers for gethostbyname(), socket(), and connect().
  Windows: Work around incompatible sort and find.
  Windows: Implement asynchronous functions as threads.
  Windows: Disambiguate DOS style paths from SSH URLs.
  Windows: A rudimentary poll() emulation.
  Windows: Implement start_command().
  ...
2008-07-02 21:57:52 -07:00
Junio C Hamano
abf7e0df17 Merge branch 'lt/config-fsync'
* lt/config-fsync:
  Add config option to enable 'fsync()' of object files
  Split up default "i18n" and "branch" config parsing into helper routines
  Split up default "user" config parsing into helper routine
  Split up default "core" config parsing into helper routine
2008-06-25 13:19:49 -07:00
Jeff King
2beebd22f4 clone: create intermediate directories of destination repo
The shell version used to use "mkdir -p" to create the repo
path, but the C version just calls "mkdir". Let's replicate
the old behavior. We have to create the git and worktree
leading dirs separately; while most of the time, the
worktree dir contains the git dir (as .git), the user can
override this using GIT_WORK_TREE.

We can reuse safe_create_leading_directories, but we need to
make a copy of our const buffer to do so. Since
merge-recursive uses the same pattern, we can factor this
out into a global function. This has two other cleanup
advantages for merge-recursive:

  1. mkdir_p wasn't a very good name. "mkdir -p foo/bar" actually
     creates bar, but this function just creates the leading
     directories.

  2. mkdir_p took a mode argument, but it was completely
     ignored.

Acked-by: Daniel Barkalow <barkalow@iabervon.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-06-25 11:44:15 -07:00
Nicolas Pitre
99093238bb optimize verify-pack a bit
Using find_pack_entry_one() to get object offsets is rather suboptimal
when nth_packed_object_offset() can be used directly.

Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-06-24 23:58:57 -07:00
Jeff King
8e21d63b02 clone: create intermediate directories of destination repo
The shell version used to use "mkdir -p" to create the repo
path, but the C version just calls "mkdir". Let's replicate
the old behavior. We have to create the git and worktree
leading dirs separately; while most of the time, the
worktree dir contains the git dir (as .git), the user can
override this using GIT_WORK_TREE.

We can reuse safe_create_leading_directories, but we need to
make a copy of our const buffer to do so. Since
merge-recursive uses the same pattern, we can factor this
out into a global function. This has two other cleanup
advantages for merge-recursive:

  1. mkdir_p wasn't a very good name. "mkdir -p foo/bar" actually
     creates bar, but this function just creates the leading
     directories.

  2. mkdir_p took a mode argument, but it was completely
     ignored.

Acked-by: Daniel Barkalow <barkalow@iabervon.org>

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-06-24 23:23:21 -07:00
Nicolas Pitre
27d69a465d refactor pack structure allocation
New pack structures are currently allocated in 2 different places
and all members have to be initialized explicitly.  This is prone
to errors leading to segmentation faults as found by Teemu Likonen.

Let's have a common place where this structure is allocated, and have
all members explicitly initialized to zero.

Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-06-24 17:03:44 -07:00
Nicolas Pitre
8eca0b47ff implement some resilience against pack corruptions
We should be able to fall back to loose objects or alternative packs when
a pack becomes corrupted.  This is especially true when an object exists
in one pack only as a delta but its base object is corrupted.  Currently
there is no way to retrieve the former object even if the later is
available in another pack or loose.

This patch allows for a delta to be resolved (with a performance cost)
using a base object from a source other than the pack where that delta
is located.  Same thing for non-delta objects: rather than failing
outright, a search is made in other packs or used loose when the
currently active pack has it but corrupted.

Of course git will become extremely noisy with error messages when that
happens.  However, if the operation succeeds nevertheless, a simple
'git repack -a -f -d' will "fix" the corrupted repository given that all
corrupted objects have a good duplicate somewhere in the object store,
possibly manually copied from another source.

Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-06-23 21:29:33 -07:00
Patrick Higgins
6ff6af62ec Workaround for AIX mkstemp()
The AIX mkstemp will modify it's template parameter to an empty string if
the call fails. This caused a subsequent mkdir to fail.

Signed-off-by: Patrick Higgins <patrick.higgins@cexp.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-06-23 16:13:38 -07:00
Johannes Sixt
8385abfda5 Windows: Handle absolute paths in safe_create_leading_directories().
In this function we must be careful to handle drive-local paths else there
is a danger that it runs into an infinite loop.

Signed-off-by: Johannes Sixt <johannes.sixt@telecom.at>
2008-06-23 13:30:27 +02:00
Johannes Sixt
80ba074f41 Windows: Use the Windows style PATH separator ';'.
Signed-off-by: Johannes Sixt <johannes.sixt@telecom.at>
2008-06-22 11:32:45 +02:00
Linus Torvalds
aafe9fbaf4 Add config option to enable 'fsync()' of object files
As explained in the documentation[*] this is totally useless on
filesystems that do ordered/journalled data writes, but it can be a
useful safety feature on filesystems like HFS+ that only journal the
metadata, not the actual file contents.

It defaults to off, although we could presumably in theory some day
auto-enable it on a per-filesystem basis.

[*] Yes, I updated the docs for the thing.  Hell really _has_ frozen
    over, and the four horsemen are probably just beyond the horizon.
    EVERYBODY PANIC!

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-06-18 16:50:35 -07:00
Junio C Hamano
79c6dca413 sha1_file.c: simplify parse_pack_index()
It was implemented as a thin wrapper around an otherwise unused
helper function parse_pack_index_file().  The code becomes simpler
and easier to read by consolidating the two.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-06-16 22:19:00 -07:00
Junio C Hamano
3bfaf01857 create_tempfile: make sure that leading directories can be accessible by peers
In a shared repository, we should make sure adjust_shared_perm() is called
after creating the initial fan-out directories under objects/ directory.

Earlier an logico called the function only when mkdir() failed; we should
do so when mkdir() succeeded.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-06-16 22:02:12 -07:00
Linus Torvalds
1421c5f274 write_loose_object: don't bother trying to read an old object
Before even calling this, all callers have done a "has_sha1_file(sha1)"
or "has_loose_object(sha1)" check, so there is no point in doing a
second check.

If something races with us on object creation, we handle that in the
final link() that moves it to the right place.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-06-16 21:46:47 -07:00
Linus Torvalds
c529d75a75 Simplify and rename find_sha1_file()
Now that we've made the loose SHA1 file reading more careful and
streamlined, we only use the old find_sha1_file() function for checking
whether a loose object file exists at all.

As such, the whole 'return stat information' part of it was just
pointless (nobody cares any more), and the naming of the function is not
really all that relevant either.

So simplify it to not do a 'stat()', but just an existence check (which
is what the callers want), and rename it to 'has_loose_object()' which
matches the use.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-06-14 14:39:22 -07:00
Linus Torvalds
44d1c19ee8 Make loose object file reading more careful
We used to do 'stat()+open()+mmap()+close()' to read the loose object
file data, which does work fine, but has a couple of problems:

 - it unnecessarily walks the filename twice (at 'stat()' time and then
   again to open it)

 - NFS generally has open-close consistency guarantees, which means that
   the initial 'stat()' was technically done outside of the normal
   consistency rules.

So change it to do 'open()+fstat()+mmap()+close()' instead, which avoids
both these issues.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-06-14 14:39:22 -07:00
Linus Torvalds
5723fe7e3c Avoid cross-directory renames and linking on object creation
Instead of creating new temporary objects in the top-level git object
directory, create them in the same directory they will finally end up in
anyway.  This avoids making the final atomic "rename to stable name"
operation be a cross-directory event, which makes it a lot easier for
various filesystems.

Several filesystems do things like change the inode number when moving
files across directories (or refuse to do it entirely).

In particular, it can also cause problems for NFS implementations that
change the filehandle of a file when it moves to a different directory,
like the old user-space NFS server did, and like the Linux knfsd still
does if you don't export your filesystems with 'no_subtree_check' or if
you export a filesystem that doesn't have stable inode numbers across
renames).

This change also obviously implies creating the object fan-out
subdirectory at tempfile creation time, rather than at the final
move_temp_to_file() time.  Which actually accounts for most of the size
of the patch.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-06-14 14:39:22 -07:00
Junio C Hamano
6483925999 sha1_file.c: dead code removal
write_sha1_from_fd() and write_sha1_to_fd() were dead code nobody called,
neither the latter's helper repack_object() was.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-06-13 23:00:51 -07:00
Linus Torvalds
e9039dd351 Consolidate SHA1 object file close
This consolidates the common operations for closing the new temporary file
that we have written, before we move it into place with the final name.

There's some common code there (make it read-only and check for errors on
close), but more importantly, this also gives a single place to add an
fsync_or_die() call if we want to add a safe mode.

This was triggered due to Denis Bueno apparently twice being able to
corrupt his git repository on OS X due to an unlucky combination of kernel
crashes and a not-very-robust filesystem.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-06-10 22:23:18 -07:00
Junio C Hamano
6eec46bdda fix sha1_pack_index_name()
An earlier commit 633f43e (Remove redundant code, eliminate one static
variable, 2008-05-24) had a thinko (perhaps an eyeno) that broke
sha1_pack_index_name() function.  One symptom of this was that the http
walker is now completely broken.

This should fix it.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-05-28 10:24:32 -07:00
Junio C Hamano
b84c343c88 Merge branch 'db/clone-in-c'
* db/clone-in-c:
  Add test for cloning with "--reference" repo being a subset of source repo
  Add a test for another combination of --reference
  Test that --reference actually suppresses fetching referenced objects
  clone: fall back to copying if hardlinking fails
  builtin-clone.c: Need to closedir() in copy_or_link_directory()
  builtin-clone: fix initial checkout
  Build in clone
  Provide API access to init_db()
  Add a function to set a non-default work tree
  Allow for having for_each_ref() list extra refs
  Have a constant extern refspec for "--tags"
  Add a library function to add an alternate to the alternates file
  Add a lockfile function to append to a file
  Mark the list of refs to fetch as const

Conflicts:

	cache.h
	t/t5700-clone-reference.sh
2008-05-25 13:41:37 -07:00
Heikki Orsila
633f43e1f7 Remove redundant code, eliminate one static variable
Signed-off-by: Heikki Orsila <heikki.orsila@iki.fi>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-05-24 22:05:06 -07:00
Nicolas Pitre
bbac73117e add a force_object_loose() function
This is meant to force the creation of a loose object even if it
already exists packed.  Needed for the next commit.

Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-05-13 22:42:33 -07:00
Daniel Barkalow
bef70b22ba Add a library function to add an alternate to the alternates file
This is in the core so that, if the alternates file has already been
read, the addition can be parsed and put into effect for the current
process.

Signed-off-by: Daniel Barkalow <barkalow@iabervon.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-05-04 17:41:44 -07:00
Heikki Orsila
c697ad143b Cleanup xread() loops to use read_in_full()
Signed-off-by: Heikki Orsila <heikki.orsila@iki.fi>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-05-03 22:15:25 -07:00
Junio C Hamano
628522ec14 sha1-lookup: more memory efficient search in sorted list of SHA-1
Currently, when looking for a packed object from the pack idx, a
simple binary search is used.

A conventional binary search loop looks like this:

        unsigned lo, hi;
        do {
                unsigned mi = (lo + hi) / 2;
                int cmp = "entry pointed at by mi" minus "target";
                if (!cmp)
                        return mi; "mi is the wanted one"
                if (cmp > 0)
                        hi = mi; "mi is larger than target"
                else
                        lo = mi+1; "mi is smaller than target"
        } while (lo < hi);
	"did not find what we wanted"

The invariants are:

  - When entering the loop, 'lo' points at a slot that is never
    above the target (it could be at the target), 'hi' points at
    a slot that is guaranteed to be above the target (it can
    never be at the target).

  - We find a point 'mi' between 'lo' and 'hi' ('mi' could be
    the same as 'lo', but never can be as high as 'hi'), and
    check if 'mi' hits the target.  There are three cases:

     - if it is a hit, we have found what we are looking for;

     - if it is strictly higher than the target, we set it to
       'hi', and repeat the search.

     - if it is strictly lower than the target, we update 'lo'
       to one slot after it, because we allow 'lo' to be at the
       target and 'mi' is known to be below the target.

    If the loop exits, there is no matching entry.

When choosing 'mi', we do not have to take the "middle" but
anywhere in between 'lo' and 'hi', as long as lo <= mi < hi is
satisfied.  When we somehow know that the distance between the
target and 'lo' is much shorter than the target and 'hi', we
could pick 'mi' that is much closer to 'lo' than (hi+lo)/2,
which a conventional binary search would pick.

This patch takes advantage of the fact that the SHA-1 is a good
hash function, and as long as there are enough entries in the
table, we can expect uniform distribution.  An entry that begins
with for example "deadbeef..." is much likely to appear much
later than in the midway of a reasonably populated table.  In
fact, it can be expected to be near 87% (222/256) from the top
of the table.

This is a work-in-progress and has switches to allow easier
experiments and debugging.  Exporting GIT_USE_LOOKUP environment
variable enables this code.

On my admittedly memory starved machine, with a partial KDE
repository (3.0G pack with 95M idx):

    $ GIT_USE_LOOKUP=t git log -800 --stat HEAD >/dev/null
    3.93user 0.16system 0:04.09elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k
    0inputs+0outputs (0major+55588minor)pagefaults 0swaps

Without the patch, the numbers are:

    $ git log -800 --stat HEAD >/dev/null
    4.00user 0.15system 0:04.17elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
    0inputs+0outputs (0major+60258minor)pagefaults 0swaps

In the same repository:

    $ GIT_USE_LOOKUP=t git log -2000 HEAD >/dev/null
    0.12user 0.00system 0:00.12elapsed 97%CPU (0avgtext+0avgdata 0maxresident)k
    0inputs+0outputs (0major+4241minor)pagefaults 0swaps

Without the patch, the numbers are:

    $ git log -2000 HEAD >/dev/null
    0.05user 0.01system 0:00.07elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k
    0inputs+0outputs (0major+8506minor)pagefaults 0swaps

There isn't much time difference, but the number of minor faults
seems to show that we are touching much smaller number of pages,
which is expected.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-04-09 01:23:52 -07:00
Nicolas Pitre
70f5d5d31c fix unimplemented packed_object_info_detail() features
Since commit eb32d236df, there was a TODO
comment in packed_object_info_detail() about the SHA1 of base object to
OBJ_OFS_DELTA objects.  So here it is at last.

While at it, providing the actual storage size information as well is now
trivial.

Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-03-01 01:44:46 -08:00
Junio C Hamano
c484166374 Merge branch 'jk/empty-tree'
* jk/empty-tree:
  add--interactive: handle initial commit better
  hard-code the empty tree object
2008-02-20 16:13:28 -08:00
Junio C Hamano
ee4f06c0a6 Merge branch 'mk/maint-parse-careful'
* mk/maint-parse-careful:
  peel_onion: handle NULL
  check return value from parse_commit() in various functions
  parse_commit: don't fail, if object is NULL
  revision.c: handle tag->tagged == NULL
  reachable.c::process_tree/blob: check for NULL
  process_tag: handle tag->tagged == NULL
  check results of parse_commit in merge_bases
  list-objects.c::process_tree/blob: check for NULL
  reachable.c::add_one_tree: handle NULL from lookup_tree
  mark_blob/tree_uninteresting: check for NULL
  get_sha1_oneline: check return value of parse_object
  read_object_with_reference: don't read beyond the buffer
2008-02-18 20:56:01 -08:00
Martin Koegler
50974ec994 read_object_with_reference: don't read beyond the buffer
Signed-off-by: Martin Koegler <mkoegler@auto.tuwien.ac.at>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-02-18 19:20:17 -08:00
Jeff King
346245a1bb hard-code the empty tree object
Now any commands may reference the empty tree object by its
sha1 (4b825dc642). This is
useful for showing some diffs, especially for initial
commits.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-02-13 13:44:17 -08:00
Steffen Prohaska
21e5ad50fc safecrlf: Add mechanism to warn about irreversible crlf conversions
CRLF conversion bears a slight chance of corrupting data.
autocrlf=true will convert CRLF to LF during commit and LF to
CRLF during checkout.  A file that contains a mixture of LF and
CRLF before the commit cannot be recreated by git.  For text
files this is the right thing to do: it corrects line endings
such that we have only LF line endings in the repository.
But for binary files that are accidentally classified as text the
conversion can corrupt data.

If you recognize such corruption early you can easily fix it by
setting the conversion type explicitly in .gitattributes.  Right
after committing you still have the original file in your work
tree and this file is not yet corrupted.  You can explicitly tell
git that this file is binary and git will handle the file
appropriately.

Unfortunately, the desired effect of cleaning up text files with
mixed line endings and the undesired effect of corrupting binary
files cannot be distinguished.  In both cases CRLFs are removed
in an irreversible way.  For text files this is the right thing
to do because CRLFs are line endings, while for binary files
converting CRLFs corrupts data.

This patch adds a mechanism that can either warn the user about
an irreversible conversion or can even refuse to convert.  The
mechanism is controlled by the variable core.safecrlf, with the
following values:

 - false: disable safecrlf mechanism
 - warn: warn about irreversible conversions
 - true: refuse irreversible conversions

The default is to warn.  Users are only affected by this default
if core.autocrlf is set.  But the current default of git is to
leave core.autocrlf unset, so users will not see warnings unless
they deliberately chose to activate the autocrlf mechanism.

The safecrlf mechanism's details depend on the git command.  The
general principles when safecrlf is active (not false) are:

 - we warn/error out if files in the work tree can modified in an
   irreversible way without giving the user a chance to backup the
   original file.

 - for read-only operations that do not modify files in the work tree
   we do not not print annoying warnings.

There are exceptions.  Even though...

 - "git add" itself does not touch the files in the work tree, the
   next checkout would, so the safety triggers;

 - "git apply" to update a text file with a patch does touch the files
   in the work tree, but the operation is about text files and CRLF
   conversion is about fixing the line ending inconsistencies, so the
   safety does not trigger;

 - "git diff" itself does not touch the files in the work tree, it is
   often run to inspect the changes you intend to next "git add".  To
   catch potential problems early, safety triggers.

The concept of a safety check was originally proposed in a similar
way by Linus Torvalds.  Thanks to Dimitry Potapov for insisting
on getting the naked LF/autocrlf=true case right.

Signed-off-by: Steffen Prohaska <prohaska@zib.de>
2008-02-06 13:07:28 -08:00
Shawn O. Pearce
c9ced051c3 Fix random fast-import errors when compiled with NO_MMAP
fast-import was relying on the fact that on most systems mmap() and
write() are synchronized by the filesystem's buffer cache.  We were
relying on the ability to mmap() 20 bytes beyond the current end
of the file, then later fill in those bytes with a future write()
call, then read them through the previously obtained mmap() address.

This isn't always true with some implementations of NFS, but it is
especially not true with our NO_MMAP=YesPlease build time option used
on some platforms.  If fast-import was built with NO_MMAP=YesPlease
we used the malloc()+pread() emulation and the subsequent write()
call does not update the trailing 20 bytes of a previously obtained
"mmap()" (aka malloc'd) address.

Under NO_MMAP that behavior causes unpack_entry() in sha1_file.c to
be unable to read an object header (or data) that has been unlucky
enough to be written to the packfile at a location such that it
is in the trailing 20 bytes of a window previously opened on that
same packfile.

This bug has gone unnoticed for a very long time as it is highly data
dependent.  Not only does the object have to be placed at the right
position, but it also needs to be positioned behind some other object
that has been accessed due to a branch cache invalidation.  In other
words the stars had to align just right, and if you did run into
this bug you probably should also have purchased a lottery ticket.

Fortunately the workaround is a lot easier than the bug explanation.

Before we allow unpack_entry() to read data from a pack window
that has also (possibly) been modified through write() we force
all existing windows on that packfile to be closed.  By closing
the windows we ensure that any new access via the emulated mmap()
will reread the packfile, updating to the current file content.

This comes at a slight performance degredation as we cannot reuse
previously cached windows when we update the packfile.  But it
is a fairly minor difference as the window closes happen at only
two points:

 - When the packfile is finalized and its .idx is generated:

   At this stage we are getting ready to update the refs and any
   data access into the packfile is going to be random, and is
   going after only the branch tips (to ensure they are valid).
   Our existing windows (if any) are not likely to be positioned
   at useful locations to access those final tip commits so we
   probably were closing them before anyway.

 - When the branch cache missed and we need to reload:

   At this point fast-import is getting change commands for the next
   commit and it needs to go re-read a tree object it previously
   had written out to the packfile.  What windows we had (if any)
   are not likely to cover the tree in question so we probably were
   closing them before anyway.

We do try to avoid unnecessarily closing windows in the second case
by checking to see if the packfile size has increased since the
last time we called unpack_entry() on that packfile.  If the size
has not changed then we have not written additional data, and any
existing window is still vaild.  This nicely handles the cases where
fast-import is going through a branch cache reload and needs to read
many trees at once.  During such an event we are not likely to be
updating the packfile so we do not cycle the windows between reads.

With this change in place t9301-fast-export.sh (which was broken
by c3b0dec509) finally works again.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-01-17 22:39:20 -08:00
Jim Meyering
790296fd88 Fix grammar nits in documentation and in code comments.
Signed-off-by: Jim Meyering <meyering@redhat.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-01-03 09:15:17 -08:00
Steffen Prohaska
9e42d6a1c5 sha1_file.c: Fix size_t related printf format warnings
The old way of fixing warnings did not succeed on MinGW.  MinGW
does not support C99 printf format strings for size_t [1].  But
gcc on MinGW issues warnings if C99 printf format is not used.
Hence, the old stragegy to avoid warnings fails.

[1] http://www.mingw.org/MinGWiki/index.php/C99

This commits passes arguments of type size_t through a tiny
helper functions that casts to the type expected by the format
string.

Signed-off-by: Steffen Prohaska <prohaska@zib.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-11-28 16:03:38 -08:00
Johannes Sixt
85dadc3894 Use is_absolute_path() in sha1_file.c.
There are some places that test for an absolute path. Use the helper
function to ease porting.

Signed-off-by: Johannes Sixt <johannes.sixt@telecom.at>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-11-14 15:18:39 -08:00
Junio C Hamano
e2b7eaf0ca Merge branch 'maint'
* maint:
  RelNotes-1.5.3.5: describe recent fixes
  merge-recursive.c: mrtree in merge() is not used before set
  sha1_file.c: avoid gcc signed overflow warnings
  Fix a small memory leak in builtin-add
  honor the http.sslVerify option in shell scripts
2007-10-29 12:53:54 -07:00
Junio C Hamano
7109c889f1 sha1_file.c: avoid gcc signed overflow warnings
With the recent gcc, we get:

sha1_file.c: In check_packed_git_:
sha1_file.c:527: warning: assuming signed overflow does not
occur when assuming that (X + c) < X is always false
sha1_file.c:527: warning: assuming signed overflow does not
occur when assuming that (X + c) < X is always false

for a piece of code that tries to make sure that off_t is large
enough to hold more than 2^32 offset.  The test tried to make
sure these do not wrap-around:

    /* make sure we can deal with large pack offsets */
    off_t x = 0x7fffffffUL, y = 0xffffffffUL;
    if (x > (x + 1) || y > (y + 1)) {

but gcc assumes it can do whatever optimization it wants for a
signed overflow (undefined behaviour) and warns about this
construct.

Follow Linus's suggestion to check sizeof(off_t) instead to work
around the problem.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-10-29 11:56:57 -07:00