Commit Graph

485 Commits

Author SHA1 Message Date
Junio C Hamano
f8c8abc5b7 unpack_object_header(): make it public
This function is used to read and skip over the per-object header
in a packfile.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-05-20 18:38:54 -07:00
Junio C Hamano
5266d369b2 sha1_object_info_extended(): hint about objects in delta-base cache
An object found in the delta-base cache is not guaranteed to
stay there, but we know it came from a pack and it is likely
to give us a quick access if we read_sha1_file() it right now,
which is a piece of useful information.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-05-20 18:38:50 -07:00
Junio C Hamano
9a49059022 sha1_object_info_extended(): expose a bit more info
The original interface for sha1_object_info() takes an object name and
gives back a type and its size (the latter is given only when it was
asked).  The new interface wraps its implementation and exposes a few
more pieces of information that the interface used to discard, namely:

 - where the object is stored (loose? cached? packed?)
 - if packed, where in which packfile?

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---

 * In the earlier round, this used u.pack.delta to record the length of
   the delta chain, but the caller is not necessarily interested in the
   length of the delta chain per se, but may only want to know if it is a
   delta against another object or is stored as deflated data. Calling
   packed_object_info_detail() involves walking the reverse index chain to
   compute the store size of the object and is unnecessarily expensive.

   We could resurrect the code if a new caller wants to know, but I doubt
   it.
2011-05-19 14:22:47 -07:00
Junio C Hamano
b9a62cbeb9 packed_object_info_detail(): do not return a string
Instead return an integer that can be given to typename() if
the caller wants a string, just like everybody else does.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-05-16 22:13:34 -07:00
Junio C Hamano
02071b27f1 Merge branches 'jc/convert', 'jc/bigfile' and 'jc/replacing' into jc/streaming
* jc/convert:
  convert: make it harder to screw up adding a conversion attribute
  convert: make it safer to add conversion attributes
  convert: give saner names to crlf/eol variables, types and functions
  convert: rename the "eol" global variable to "core_eol"

* jc/bigfile:
  Bigfile: teach "git add" to send a large file straight to a pack
  index_fd(): split into two helper functions
  index_fd(): turn write_object and format_check arguments into one flag

* jc/replacing:
  read_sha1_file(): allow selective bypassing of replacement mechanism
  inline lookup_replace_object() calls
  read_sha1_file(): get rid of read_sha1_file_repl() madness
  t6050: make sure we test not just commit replacement
  Declare lookup_replace_object() in cache.h, not in commit.h
2011-05-15 16:30:13 -07:00
Junio C Hamano
f4e516834e git_open_noatime(): drop unused parameter
Since commit c793430 (Limit file descriptors used by packs, 2011-02-28),
the extra parameter added in f2e872aa (Work around EMFILE when there are
too many pack files, 2010-11-01) is not used anymore.

Remove it.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
Acked-by: Shawn O. Pearce <spearce@spearce.org>
2011-05-15 15:24:52 -07:00
Junio C Hamano
ccf5ace0dc sha1_file: typofix
The number zero is spelled "zero", not "zer0".

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-05-15 15:24:36 -07:00
Junio C Hamano
5bf29b9500 read_sha1_file(): allow selective bypassing of replacement mechanism
The way the "object replacement" mechanism was tucked into the
read_sha1_file() interface was suboptimal in a couple of ways:

 - Callers that want it to die with a useful diagnosis upon seeing a corrupt
   object do not have a way to say that they do not want any object
   replacement.

 - Callers who do not want it to die but want to handle the errors
   themselves are told to arrange to call read_object(), but the function
   does not use the replacement mechanism, and also it is a file scope
   static function that not many callers can call to begin with.

This adds a read_sha1_file_extended() that takes a set of flags; the
callers of read_sha1_file() pass a flag READ_SHA1_FILE_REPLACE to ask
for the object replacement mechanism to kick in.

Later, we could add another flag bit to tell the function to return an
error instead of dying and then remove the misguided "call read_object()
yourself".

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-05-15 15:23:34 -07:00
Junio C Hamano
4bbf5a2615 read_sha1_file(): get rid of read_sha1_file_repl() madness
Most callers want to silently get a replacement object, and they do not
care what the real name of the replacement object is.  Worse yet, no sane
interface to return the underlying object without replacement is provided.

Remove the function and make only the few callers that want the name of
the replacement object find it themselves.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-05-15 15:23:33 -07:00
Junio C Hamano
4dd1fbc7b1 Bigfile: teach "git add" to send a large file straight to a pack
When adding new content to the repository, we have always slurped
the blob in its entirety in-core first, and computed the object name
and compressed it into a loose object file.  Handling large binary
files (e.g.  video and audio assets for games) has been problematic
because of this design.

In the middle of the "git add" call chain is an internal API
index_fd() that takes an open file descriptor for the working tree
file being added, along with its size. Teach it to call out to
fast-import when adding a large blob.

The write-out codepath in entry.c::write_entry() should be taught to
stream, instead of reading everything in core. This should not be so
hard to implement, especially if we limit ourselves only to loose
object files and non-delta representation in packfiles.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-05-13 16:11:18 -07:00
Junio C Hamano
7b41e1e15b index_fd(): split into two helper functions
Split out the case where we do not know the size of the input (hence we
read everything into a strbuf before doing anything) to index_pipe(), and
the other case where we mmap or read the whole data to index_bulk().

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-05-09 11:58:19 -07:00
Junio C Hamano
c4ce46fc7a index_fd(): turn write_object and format_check arguments into one flag
The "format_check" parameter tucked after the existing parameters is too
ugly an afterthought to live in any reasonable API.

Combine it with the other boolean parameter "write_object" into a single
"flags" parameter.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-05-09 11:58:19 -07:00
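As a rough illustration of the change described in c4ce46fc7a, the sketch below folds two positional booleans into one flags word. The names `SKETCH_WRITE_OBJECT`, `SKETCH_FORMAT_CHECK`, `flags_from_booleans()` and `flag_is_set()` are invented for this example and are not taken from the commit.

```c
#include <assert.h>

/* Hypothetical flag names, chosen for this sketch only. */
enum {
	SKETCH_WRITE_OBJECT = 1 << 0,
	SKETCH_FORMAT_CHECK = 1 << 1
};

/* Old call shape: two positional booleans that are easy to swap. */
static unsigned flags_from_booleans(int write_object, int format_check)
{
	unsigned flags = 0;

	if (write_object)
		flags |= SKETCH_WRITE_OBJECT;
	if (format_check)
		flags |= SKETCH_FORMAT_CHECK;
	return flags;
}

/* New call shape: the callee tests individual bits of one parameter. */
static int flag_is_set(unsigned flags, unsigned bit)
{
	assert(bit != 0);
	return (flags & bit) != 0;
}
```

A caller would then write something like `flag_is_set(flags, SKETCH_FORMAT_CHECK)`, which names the option at the call site instead of relying on argument position.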
Jim Meyering
0353a0c4ec remove doubled words, e.g., s/to to/to/, and fix related typos
I found that some doubled words had snuck back into projects from which
I'd already removed them, so now there's a "syntax-check" makefile rule in
gnulib to help prevent recurrence.

Running the command below spotted a few in git, too:

  git ls-files | xargs perl -0777 -n \
    -e 'while (/\b(then?|[iao]n|i[fst]|but|f?or|at|and|[dt])\s+\1\b/gims)' \
    -e '{$n=($` =~ tr/\n/\n/ + 1); ($v=$&)=~s/\n/\\n/g;' \
    -e 'print "$ARGV:$n:$v\n"}'

Signed-off-by: Jim Meyering <meyering@redhat.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-04-13 11:59:11 -07:00
Junio C Hamano
ad7bb2f68c Merge branch 'jc/maint-rerere-in-workdir'
* jc/maint-rerere-in-workdir:
  rerere: make sure it works even in a workdir attached to a young repository
2011-03-26 20:13:16 -07:00
Junio C Hamano
90a6464b4a rerere: make sure it works even in a workdir attached to a young repository
The git-new-workdir script in contrib/ makes a new work tree by sharing
many subdirectories of the .git directory with the original repository.
When rerere.enabled is set in the original repository, but the user has
not encountered any conflicts yet, the original repository may not yet
have .git/rr-cache directory.

When rerere wants to run in a new work tree created from such a young
original repository, it fails to mkdir(2) .git/rr-cache that is a symlink
to a yet-to-be-created directory.

There are three possible approaches to this:

 - A naive solution is not to create a symlink in the git-new-workdir
   script to a directory the original does not have (yet).  This is not a
   solution, as we tend to lazily create subdirectories of .git/, and
   having the rerere.enabled configuration set is a strong indication that
   the user _wants_ this lazy creation to happen;

 - We could always create .git/rr-cache upon repository creation.  This is
   tempting but will not help people with existing repositories.

 - Detect this case by seeing that mkdir(2) failed with EEXIST, checking
   that the path is a symlink, and try running mkdir(2) on the link
   target.

This patch solves the issue by doing the third one.

Strictly speaking, this is incomplete.  It does not attempt to handle
a relative symbolic link that points into the original repository, but
this is good enough to help people who use the
contrib/workdir/git-new-workdir script.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-03-23 16:05:44 -07:00
Junio C Hamano
3ed8868474 Merge branch 'jn/maint-c99-format'
* jn/maint-c99-format:
  unbreak and eliminate NO_C99_FORMAT
  mktag: avoid %td in format string
2011-03-23 14:55:46 -07:00
Jonathan Nieder
28bd70d811 unbreak and eliminate NO_C99_FORMAT
In the spirit of v1.5.0.2~21 (Check for PRIuMAX rather than
NO_C99_FORMAT in fast-import.c, 2007-02-20), use PRIuMAX from
git-compat-util.h on all platforms instead of C99-specific formats
like %zu with dangerous fallbacks to %u or %lu.

So now C99-challenged platforms can build git without provoking
warnings or errors from printf, even if pointers do not have the same
size as an int or long.

The need for a fallback PRIuMAX is detected in git-compat-util.h with
"#ifndef PRIuMAX".  So while at it, simplify the Makefile and configure
script by eliminating the NO_C99_FORMAT knob altogether.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-03-17 15:30:49 -07:00
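The PRIuMAX idiom that 28bd70d811 standardizes on can be sketched as follows. This assumes a platform that ships `<inttypes.h>`; per the commit message, git-compat-util.h supplies a fallback definition where PRIuMAX is missing. The function name `format_size()` is made up for the example.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Print a size_t without relying on the C99-only "%zu": widen the
 * value to uintmax_t and use the PRIuMAX format macro instead.
 */
static int format_size(char *buf, size_t buflen, size_t n)
{
	return snprintf(buf, buflen, "%" PRIuMAX " bytes", (uintmax_t)n);
}
```

The cast matters: passing a size_t where the format expects uintmax_t would be undefined on platforms where the two types differ in width.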
Junio C Hamano
674ef90904 Merge branch 'sp/maint-fd-limit'
* sp/maint-fd-limit:
  sha1_file.c: Don't retain open fds on small packs
  mingw: add minimum getrlimit() compatibility stub
  Limit file descriptors used by packs
2011-03-15 14:22:23 -07:00
Shawn O. Pearce
d131b7afea sha1_file.c: Don't retain open fds on small packs
If a pack file is small enough that its entire contents fit within
one mmap window, mmap the file and then immediately close its file
descriptor.  This reduces the number of file descriptors that are
needed to read from repositories with many tiny pack files, such
as one that has received 1000 pushes (and created 1000 small pack
files) since its last repack.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-03-02 11:25:30 -08:00
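The idea in d131b7afea relies on a POSIX guarantee: a mapping stays valid after the descriptor it was created from is closed. A minimal sketch, with a hypothetical helper `map_whole_file()` that is not Git's actual pack-window code:

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: map an entire file read-only and give the
 * descriptor back right away.  Returns MAP_FAILED on error or for an
 * empty file; on success *len holds the mapped length.
 */
static void *map_whole_file(const char *path, size_t *len)
{
	struct stat st;
	void *map;
	int fd;

	assert(path != NULL && len != NULL);
	fd = open(path, O_RDONLY);
	if (fd < 0)
		return MAP_FAILED;
	if (fstat(fd, &st) || st.st_size <= 0) {
		close(fd);
		return MAP_FAILED;
	}
	map = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
	close(fd);	/* the mapping outlives the descriptor */
	if (map != MAP_FAILED)
		*len = (size_t)st.st_size;
	return map;
}
```

The caller releases the mapping with `munmap(map, len)` when done; no descriptor is held in the meantime.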
Shawn O. Pearce
c7934306d1 Limit file descriptors used by packs
Rather than using 'errno == EMFILE' after a failed open() call
to indicate the process is out of file descriptors and an LRU
pack window should be closed, place a hard upper limit on the
number of open packs based on the actual rlimit of the process.

By using a hard upper limit that is below the rlimit of the current
process it is not necessary to check for EMFILE on every single
fd-allocating system call.  Instead reserving 25 file descriptors
makes it safe to assume the system call won't fail due to being over
the file descriptor limit.  Here 25 is chosen as a WAG, but considers
3 for stdin/stdout/stderr, and at least a few for other Git code
to operate on temporary files.  An additional 20 are reserved as it
is not known what the C library needs to perform other services on
Git's behalf, such as nsswitch or name resolution.

This fixes a case where running `git gc --auto` in a repository
with more than 1024 packs (but an rlimit of 1024 open fds) fails
due to the temporary output file not being able to allocate a
file descriptor.  The output file is opened by pack-objects after
object enumeration and delta compression are done, both of which
have already opened all of the packs and fully populated the file
descriptor table.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-02-28 13:08:31 -08:00
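The budget computation described in c7934306d1 can be sketched like this. The reserve of 25 comes from the commit message; the function names, the floor of one open pack, and the clamp to UINT_MAX are assumptions made for this example.

```c
#include <assert.h>
#include <limits.h>
#include <sys/resource.h>

#define SKETCH_RESERVED_FDS 25	/* figure quoted in the commit message */

/* Pure helper: how many packs may stay open under a given fd limit. */
static unsigned int pack_fd_budget(unsigned long limit, unsigned int reserved)
{
	unsigned long budget;

	if (limit <= reserved)
		return 1;	/* assumption: always allow at least one pack */
	budget = limit - reserved;
	return budget > UINT_MAX ? UINT_MAX : (unsigned int)budget;
}

/* Apply the helper to the soft RLIMIT_NOFILE of the current process. */
static unsigned int pack_fd_budget_for_process(void)
{
	struct rlimit lim;

	if (getrlimit(RLIMIT_NOFILE, &lim))
		return 1;	/* be conservative if the limit is unknown */
	return pack_fd_budget((unsigned long)lim.rlim_cur, SKETCH_RESERVED_FDS);
}
```

With an rlimit of 1024, as in the `git gc --auto` case above, this leaves 999 descriptors for packs and 25 for everything else.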
Junio C Hamano
fc7ae9c156 Merge branch 'nd/hash-object-sanity'
* nd/hash-object-sanity:
  Make hash-object more robust against malformed objects

Conflicts:
	cache.h
2011-02-27 21:58:30 -08:00
Jonathan Nieder
dab0d4108d correct type of EMPTY_TREE_SHA1_BIN
Functions such as hashcmp that expect a binary SHA-1 value take
parameters of type "unsigned char *" to avoid accepting a textual
SHA-1 passed by mistake.  Unfortunately, this means passing the string
literal EMPTY_TREE_SHA1_BIN requires an ugly cast.  Tweak the
definition of EMPTY_TREE_SHA1_BIN to produce a value of more
convenient type.

In the future the definition might change to

	extern const unsigned char empty_tree_sha1_bin[20];
	#define EMPTY_TREE_SHA1_BIN empty_tree_sha1_bin

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-02-14 10:48:06 -08:00
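To make the typing issue in dab0d4108d concrete: a string literal has type `char *`-compatible array type, so it cannot be passed to a function expecting `const unsigned char *` without a cast. The sketch below hides that cast inside the macro. The `SKETCH_` names and `sketch_hashcmp()` are invented here; the 20 bytes are the well-known SHA-1 of the empty tree (4b825dc642cb6eb9a060e54bf8d69288fbee4904).

```c
#include <assert.h>
#include <string.h>

/* The empty tree's SHA-1 as a 20-byte binary string literal. */
#define SKETCH_EMPTY_TREE_BIN_LITERAL \
	"\x4b\x82\x5d\xc6\x42\xcb\x6e\xb9\xa0\x60" \
	"\xe5\x4b\xf8\xd6\x92\x88\xfb\xee\x49\x04"

/* Cast once, in the definition, so callers do not have to. */
#define SKETCH_EMPTY_TREE_BIN \
	((const unsigned char *)SKETCH_EMPTY_TREE_BIN_LITERAL)

/* Stand-in for a binary comparison such as hashcmp(). */
static int sketch_hashcmp(const unsigned char *a, const unsigned char *b)
{
	return memcmp(a, b, 20);
}
```

A call such as `sketch_hashcmp(sha1, SKETCH_EMPTY_TREE_BIN)` then compiles cleanly, while passing a 40-character hex string by mistake still draws a pointer-type warning.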
Nguyễn Thái Ngọc Duy
c4d9986f5f sha1_object_info: examine cached_object store too
Cached object store was added in d66b37b (Add pretend_sha1_file()
interface. - 2007-02-04) as a way to temporarily inject some objects
into the object store.

But only read_sha1_file() knows about this store. While it will return
an object from this store, sha1_object_info() will happily say
"object not found".

Teach sha1_object_info() about the cached store for consistency.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-02-07 15:05:48 -08:00
Nguyễn Thái Ngọc Duy
c597ba8010 sha1_file.c: move find_cached_object up so sha1_object_info can use it
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-02-07 15:05:46 -08:00
Nguyễn Thái Ngọc Duy
c879daa237 Make hash-object more robust against malformed objects
Commits, trees and tags have structure. Don't let users feed git
with malformed ones. Sooner or later git will die() when
encountering them.

Note that this patch does not check semantics. A tree that points
to non-existent objects is perfectly OK (and should be so, users
may choose to add commit first, then its associated tree for example).

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-02-07 15:05:25 -08:00
Björn Steinbrink
25f3af3f9d Correctly report corrupted objects
The errno check added in commit 3ba7a06 "A loose object is not corrupt
if it cannot be read due to EMFILE" only checked for whether errno is
not ENOENT and thus incorrectly treated "no error" as an error
condition.

Because of that, it never reached the code path that would report that
the object is corrupted and instead caused funny errors like:

  fatal: failed to read object 333c4768ce595793fdab1ef3a036413e2a883853: Success

So we have to extend the check to cover the case in which the object
file was successfully read, but its contents are corrupted.

Reported-by: Will Palmer <wmpalmer@gmail.com>
Signed-off-by: Björn Steinbrink <B.Steinbrink@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-01-20 13:18:51 -08:00
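The bug fixed by 25f3af3f9d comes down to one condition. The sketch below isolates it in two hypothetical predicates (the names are not from Git) that decide whether a failed read should be reported via errno: the earlier check fires even when errno is 0, which is what produced the ": Success" message.

```c
#include <assert.h>
#include <errno.h>

/* Earlier check: anything other than ENOENT counts as an OS error. */
static int report_errno_buggy(int err)
{
	return err != ENOENT;
}

/*
 * Corrected check: errno == 0 means the file was read successfully,
 * so the failure must be corrupt contents, not an OS-level error.
 */
static int report_errno_fixed(int err)
{
	return err != 0 && err != ENOENT;
}
```

When the fixed predicate returns 0, control can fall through to the code that reports the object as corrupt.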
Junio C Hamano
39f04dbaac Merge branch 'jn/thinner-wrapper'
* jn/thinner-wrapper:
  Remove pack file handling dependency from wrapper.o
  pack-objects: mark file-local variable static
  wrapper: give zlib wrappers their own translation unit
  strbuf: move strbuf_branchname to sha1_name.c
  path helpers: move git_mkstemp* to wrapper.c
  wrapper: move odb_* to environment.c
  wrapper: move xmmap() to sha1_file.c
2010-12-03 16:13:06 -08:00
Jonathan Nieder
e050029385 Remove pack file handling dependency from wrapper.o
As v1.7.0-rc0~43 (slim down "git show-index", 2010-01-21) explains,
use of xmalloc() brings in a dependency on zlib, the sha1 lib, and the
rest of git's object file access machinery via try_to_free_pack_memory.
That is overkill when xmalloc is just being used as a convenience
wrapper to exit when no memory is available.

So defer setting try_to_free_pack_memory as try_to_free_routine until
the first packfile is opened in add_packed_git().

After this change, a simple program using xmalloc() and no other
functions will not pull in any code from libgit.a aside from wrapper.o
and usage.o.

Improved-by: René Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-11-10 11:11:07 -08:00
Jonathan Nieder
58ecbd5ede wrapper: move xmmap() to sha1_file.c
wrapper.o depends on sha1_file.o for a number of reasons.  One is
release_pack_memory().

The xmmap function calls mmap, discarding unused pack windows when
necessary to relieve memory pressure.  Simple git programs using
wrapper.o as a friendly libc do not need this functionality.
So move xmmap to sha1_file.o, where release_pack_memory() is.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-11-10 11:03:13 -08:00
Shawn O. Pearce
f2e872aa5e Work around EMFILE when there are too many pack files
When opening any files in the object database, release unused pack
windows if the open(2) syscall fails due to EMFILE (too many open
files in this process).  This allows Git to degrade gracefully on
a repository with thousands of pack files, and a commit stored in
a loose object in the middle of the history.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-11-03 10:21:46 -07:00
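The retry pattern described in f2e872aa5e can be sketched as a loop around open(2). The function name `open_with_fd_relief()` and the `release_one` callback are hypothetical; in Git the callback's role is played by the code that releases an unused pack window.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Hypothetical wrapper: open `path` read-only.  On EMFILE, ask the
 * caller-supplied callback to free one descriptor and try again.  The
 * callback returns nonzero if it managed to release something.
 */
static int open_with_fd_relief(const char *path, int (*release_one)(void))
{
	assert(path != NULL);
	for (;;) {
		int fd = open(path, O_RDONLY);

		if (fd >= 0 || errno != EMFILE)
			return fd;	/* success, or an unrelated error */
		if (!release_one || !release_one()) {
			errno = EMFILE;	/* nothing left to release */
			return -1;
		}
	}
}
```

Errors other than EMFILE pass straight through, so callers still see ENOENT for a missing object file.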
Shawn O. Pearce
4865d2b662 Use git_open_noatime when accessing pack data
This utility function avoids an unnecessary update of the access time
for a loose object file.  Just as the atime isn't useful on a loose
object, it's not useful on the pack or the corresponding idx file.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-11-03 09:25:58 -07:00
Junio C Hamano
3ba7a06552 A loose object is not corrupt if it cannot be read due to EMFILE
"git fsck" bails out with a claim that a loose object that cannot be
read but exists on the filesystem is corrupt, which is wrong when
read_object() failed due to e.g. EMFILE.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-11-03 09:24:57 -07:00
Junio C Hamano
b6c4ceccb3 read_sha1_file(): report correct name of packfile with a corrupt object
Clarify the error reporting logic by moving the normal codepath (i.e. we
read the object we wanted to read correctly) up and return early.

The logic to report the name of the packfile with a corrupt object,
introduced by e8b15e6 (sha1_file: Show the the type and path to corrupt
objects, 2010-06-10), was totally bogus.  The function that knows which
bad object came from what packfile is has_packed_and_bad(); make it report
in which packfile the problem was found.

"Corrupt" is already an adjective, e.g. an object is "corrupt"; we do not
have to say "corrupted object".

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-11-03 09:24:47 -07:00
Ævar Arnfjörð Bjarmason
e8b15e6156 sha1_file: Show the the type and path to corrupt objects
Change the error message that's displayed when we encounter corrupt
objects to be more specific. We now print the type (loose or packed)
of corrupted objects, along with the full path to the file in
question.

Before:

    $ git cat-file blob 909ef997367880aaf2133bafa1f1a71aa28e09df
    fatal: object 909ef997367880aaf2133bafa1f1a71aa28e09df is corrupted

After:

    $ git cat-file blob 909ef997367880aaf2133bafa1f1a71aa28e09df
    fatal: loose object 909ef997367880aaf2133bafa1f1a71aa28e09df (stored in .git/objects/90/9ef997367880aaf2133bafa1f1a71aa28e09df) is corrupted

Knowing the path helps to quickly analyze what's wrong:

    $ file .git/objects/90/9ef997367880aaf2133bafa1f1a71aa28e09df
    .git/objects/90/9ef997367880aaf2133bafa1f1a71aa28e09df: empty

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-07-14 15:35:12 -07:00
Junio C Hamano
e391fdfc69 Merge branch 'jk/maint-sha1-file-name-fix'
* jk/maint-sha1-file-name-fix:
  remove over-eager caching in sha1_file_name
2010-06-13 11:22:00 -07:00
Jeff King
560fb6a183 remove over-eager caching in sha1_file_name
This function takes a sha1 and produces a loose object
filename. It caches the location of the object directory so
that it can fill the sha1 information directly without
allocating a new buffer (and in its original incarnation,
without calling getenv(), though these days we cache that
with the code in environment.c).

This cached base directory can become stale, however, if in
a single process git changes the location of the object
directory (e.g., by running setup_work_tree, which will
chdir to the new worktree).

In most cases this isn't a problem, because we tend to set
up the git repository location and do any chdir()s before
actually looking up any objects, so the first lookup will
cache the correct location. In the case of reset --hard,
however, we do something like:

  1. look up the commit object

  2. notice we are doing --hard, run setup_work_tree

  3. look up the tree object to reset

Step (3) fails because our cached object directory value is
bogus.

This patch simply removes the caching. We use a static
buffer instead of allocating one each time (the original
version treated the malloc'd buffer as a static, so there is
no change in calling semantics).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-05-25 09:21:28 -07:00
Junio C Hamano
035bf8d7c4 Merge branch 'sp/maint-dumb-http-pack-reidx'
* sp/maint-dumb-http-pack-reidx:
  http.c::new_http_pack_request: do away with the temp variable filename
  http-fetch: Use temporary files for pack-*.idx until verified
  http-fetch: Use index-pack rather than verify-pack to check packs
  Allow parse_pack_index on temporary files
  Extract verify_pack_index for reuse from verify_pack
  Introduce close_pack_index to permit replacement
  http.c: Remove unnecessary strdup of sha1_to_hex result
  http.c: Don't store destination name in request structures
  http.c: Drop useless != NULL test in finish_http_pack_request
  http.c: Tiny refactoring of finish_http_pack_request
  t5550-http-fetch: Use subshell for repository operations
  http.c: Remove bad free of static block
2010-05-21 04:02:19 -07:00
Junio C Hamano
636e87d705 Merge branch 'maint'
* maint:
  Documentation/gitdiffcore: fix order in pickaxe description
  Documentation: fix minor inconsistency
  Documentation: rebase -i ignores options passed to "git am"
  hash_object: correction for zero length file
2010-05-18 22:39:56 -07:00
Dmitry Potapov
08bda2085c hash_object: correction for zero length file
The check for whether size is zero was done after the check for
size <= SMALL_FILE_SIZE; as a result, the zero size case was never
triggered. Instead a zero length file was treated as any other small
file. This did not cause any problem, but if we have a special case for
size equal to zero, it is better to make it work and avoid a redundant
malloc().

Signed-off-by: Dmitry Potapov <dpotapov@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-05-18 21:46:36 -07:00
Shawn O. Pearce
7b64469a36 Allow parse_pack_index on temporary files
The easiest way to verify a pack index is to open it through the
standard parse_pack_index function, permitting the header check
to happen when the file is mapped.  However, the dumb HTTP client
needs to verify a pack index before it is moved into its proper file
name within the objects/pack directory, to prevent a corrupt index
from being made available.  So permit the caller to specify the
exact path of the index file.

For now we're still using the final destination name within the
sole call site in http.c, but eventually we will start to parse
the temporary path instead.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-04-19 17:56:17 -07:00
Shawn O. Pearce
fa5fc15d6e Introduce close_pack_index to permit replacement
By closing the pack index, a caller can later overwrite the index
with an updated index file, possibly after converting from v1 to
the v2 format.  Because p->index_data is NULL after close, on the
next access the index will be opened again and the other members
will be updated with new data.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-04-19 17:56:08 -07:00
Jeff King
40d52ff77b make commit_tree a library function
Until now, this has been part of the commit-tree builtin.
However, it is already used by other builtins (like commit,
merge, and notes), and it would be useful to access it from
library code.

The check_valid helper has to come along, too, but is given
a more library-ish name of "assert_sha1_type".

Otherwise, the code is unchanged. There are still a few
rough edges for a library function, like printing the utf8
warning to stderr, but we can address those if and when they
come up as inappropriate.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-04-01 23:53:54 -07:00
Jeff King
c00e657df2 fix const-correctness of write_sha1_file
These should take const buffers as input data, but zlib's
next_in pointer is not const-correct. Let's fix it at the
zlib level, though, so the cast happens in one obvious
place. This should be safe, as a similar cast is used in
zlib's example code for a const array.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-04-01 23:49:03 -07:00
Junio C Hamano
493e433277 Merge branch 'mm/mkstemps-mode-for-packfiles' into maint
* mm/mkstemps-mode-for-packfiles:
  Use git_mkstemp_mode instead of plain mkstemp to create object files
  git_mkstemps_mode: don't set errno to EINVAL on exit.
  Use git_mkstemp_mode and xmkstemp_mode in odb_mkstemp, not chmod later.
  git_mkstemp_mode, xmkstemp_mode: variants of gitmkstemps with mode argument.
  Move gitmkstemps to path.c
  Add a testcase for ACL with restrictive umask.
2010-03-08 00:36:00 -08:00
Junio C Hamano
c2b456b895 Merge branch 'nd/root-git'
* nd/root-git:
  Add test for using Git at root of file system
  Support working directory located at root
  Move offset_1st_component() to path.c
  init-db, rev-parse --git-dir: do not append redundant slash
  make_absolute_path(): Do not append redundant slash

Conflicts:
	setup.c
	sha1_file.c
2010-03-07 12:47:15 -08:00
Junio C Hamano
87912fd617 Merge branch 'mm/mkstemps-mode-for-packfiles'
* mm/mkstemps-mode-for-packfiles:
  Use git_mkstemp_mode instead of plain mkstemp to create object files
  git_mkstemps_mode: don't set errno to EINVAL on exit.
  Use git_mkstemp_mode and xmkstemp_mode in odb_mkstemp, not chmod later.
  git_mkstemp_mode, xmkstemp_mode: variants of gitmkstemps with mode argument.
  Move gitmkstemps to path.c
  Add a testcase for ACL with restrictive umask.
2010-03-07 12:47:14 -08:00
Junio C Hamano
780fc9a0a6 Merge branch 'dp/read-not-mmap-small-loose-object' into maint
* dp/read-not-mmap-small-loose-object:
  hash-object: don't use mmap() for small files
2010-03-04 22:26:17 -08:00
Junio C Hamano
34c014d13e Merge branch 'np/compress-loose-object-memsave'
* np/compress-loose-object-memsave:
  sha1_file: be paranoid when creating loose objects
  sha1_file: don't malloc the whole compressed result when writing out objects
2010-03-02 12:44:09 -08:00
Matthieu Moy
5256b00631 Use git_mkstemp_mode instead of plain mkstemp to create object files
We used to unnecessarily give the read permission to group and others,
regardless of the umask, which isn't serious because the objects are
still protected by their containing directory, but isn't necessary
either.

Signed-off-by: Matthieu Moy <Matthieu.Moy@imag.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-02-22 15:24:46 -08:00
Nicolas Pitre
748af44c63 sha1_file: be paranoid when creating loose objects
We don't want the data being deflated and stored into loose objects
to be different from what we expect.  While the deflated data is
protected by a CRC which is good enough for safe data retrieval
operations, we still want to be doubly sure that the source data used
at object creation time is still what we expected once that data has
been deflated and its CRC32 computed.

The most plausible data corruption may occur if the source file is
modified while Git is deflating and writing it out in a loose object.
Or Git itself could have a bug causing memory corruption.  Or even bad
RAM could cause trouble.  So it is best to make sure everything is
coherent and checksum protected from beginning to end.

To do so we compute the SHA1 of the data being deflated _after_ the
deflate operation has consumed that data, and make sure it matches
with the expected SHA1.  This way we can rely on the CRC32 checked by
the inflate operation to provide a good indication that the data is still
coherent with its SHA1 hash.  One pathological case we ignore is when
the data is modified before (or during) deflate call, but changed back
before it is hashed.

There is some overhead of course. Using 'git add' on a set of large files:

Before:

	real    0m25.210s
	user    0m23.783s
	sys     0m1.408s

After:

	real    0m26.537s
	user    0m25.175s
	sys     0m1.358s

The overhead is around 5% for full data coherency guarantee.

Signed-off-by: Nicolas Pitre <nico@fluxnic.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-02-21 22:33:25 -08:00