Commit Graph

405 Commits

Author SHA1 Message Date
Junio C Hamano
3c91bf6805 Merge branch 'jc/maint-1.6.0-keep-pack'
* jc/maint-1.6.0-keep-pack:
  pack-objects: don't loosen objects available in alternate or kept packs
  t7700: demonstrate repack flaw which may loosen objects unnecessarily
  Remove --kept-pack-only option and associated infrastructure
  pack-objects: only repack or loosen objects residing in "local" packs
  git-repack.sh: don't use --kept-pack-only option to pack-objects
  t7700-repack: add two new tests demonstrating repacking flaws

Conflicts:
	t/t7700-repack.sh
2009-04-01 22:34:19 -07:00
Junio C Hamano
89fbda2425 Merge branch 'maint'
* maint:
  Increase the size of the die/warning buffer to avoid truncation
  close_sha1_file(): make it easier to diagnose errors
  avoid possible overflow in delta size filtering computation
2009-03-24 19:45:57 -07:00
Junio C Hamano
b0de555410 Merge branch 'maint-1.6.1' into maint
* maint-1.6.1:
  close_sha1_file(): make it easier to diagnose errors
  avoid possible overflow in delta size filtering computation
2009-03-24 15:31:21 -07:00
Junio C Hamano
2a5643da73 Merge branch 'maint-1.6.0' into maint-1.6.1
* maint-1.6.0:
  close_sha1_file(): make it easier to diagnose errors
  avoid possible overflow in delta size filtering computation
2009-03-24 15:31:15 -07:00
Linus Torvalds
e8bd78c3fc close_sha1_file(): make it easier to diagnose errors
A bug report with "unable to write sha1 file" made us realize that we do
not have enough information to guess why close() is failing.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-03-24 14:39:20 -07:00
Brandon Casey
4d6acb7041 Remove --kept-pack-only option and associated infrastructure
This option to pack-objects/rev-list was created to improve the -A and -a
options of repack.  It was found to be lacking in that it did not provide
the ability to differentiate between local and non-local kept packs, and
found to be unnecessary since objects residing in local kept packs can be
filtered out by the --honor-pack-keep option.

Signed-off-by: Brandon Casey <casey@nrlssc.navy.mil>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-03-20 13:32:33 -07:00
Junio C Hamano
aec813062b Merge branch 'jc/maint-1.6.0-keep-pack'
* jc/maint-1.6.0-keep-pack:
  is_kept_pack(): final clean-up
  Simplify is_kept_pack()
  Consolidate ignore_packed logic more
  has_sha1_kept_pack(): take "struct rev_info"
  has_sha1_pack(): refactor "pretend these packs do not exist" interface
  git-repack: resist stray environment variable
2009-03-11 13:49:56 -07:00
Junio C Hamano
69e020ae00 is_kept_pack(): final clean-up
Now is_kept_pack() is just a member lookup into a structure, we can write
it as such.

Also rewrite the sole caller of has_sha1_kept_pack() to switch on the
criteria the callee uses (namely, revs->kept_pack_only) between calling
has_sha1_kept_pack() and has_sha1_pack(), so that these two callees do not
have to take a pointer to struct rev_info as an argument.

This removes the header file dependency issue temporarily introduced by
the earlier commit, so we revert changes associated to that as well.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-02-28 01:06:06 -08:00
Junio C Hamano
03a9683d22 Simplify is_kept_pack()
This removes --unpacked=<packfile> parameter from the revision parser, and
rewrites its use in git-repack to pass a single --kept-pack-only option
instead.

The new --kept-pack-only option means just that.  When this option is
given, is_kept_pack() that used to say "not on the --unpacked=<packfile>
list" now says "the packfile has corresponding .keep file".

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-02-28 01:06:06 -08:00
Junio C Hamano
386cb77210 Consolidate ignore_packed logic more
This refactors three loops that check if a given packfile is on the
ignore_packed list into a function is_kept_pack().  The function returns
false for a pack on the list, and true for a pack not on the list, because
this list is solely used by "git repack" to pass list of packfiles that do
not have corresponding .keep files, i.e. a packfile not on the list is
"kept".

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-02-28 01:06:06 -08:00
Junio C Hamano
b8431b033f has_sha1_kept_pack(): take "struct rev_info"
Its "ignore_packed" parameter always comes from struct rev_info.  This
patch makes the function take a pointer to the surrounding structure, so
that the refactoring in the next patch becomes easier to review.

There is an unfortunate header file dependency and the easiest workaround
is to temporarily move the function declaration from cache.h to
revision.h; this will be moved back to cache.h once the function loses
this "ignore_packed" parameter altogether in the later part of the
series.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-02-28 01:06:06 -08:00
Junio C Hamano
cd673c1f17 has_sha1_pack(): refactor "pretend these packs do not exist" interface
Most of the callers of this function except only one pass NULL to its last
parameter, ignore_packed.

Introduce has_sha1_kept_pack() function that has the function signature
and the semantics of this function, and convert the sole caller that does
not pass NULL to call this new function.

All other callers and has_sha1_pack() lose the ignore_packed parameter.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-02-28 01:06:06 -08:00
Felipe Contreras
a9d98a148d sha1_file.c: fix typo
it's != its

Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-02-25 00:49:54 -08:00
Junio C Hamano
30aa4fb15f Merge branch 'maint'
* maint:
  Prepare for 1.6.1.4.
  Make repack less likely to corrupt repository
  fast-export: ensure we traverse commits in topological order
  Clear the delta base cache if a pack is rebuilt

Conflicts:
	RelNotes
2009-02-11 18:47:30 -08:00
Junio C Hamano
7a134dbbc9 Merge branch 'maint-1.6.0' into maint
* maint-1.6.0:
  Make repack less likely to corrupt repository
  fast-export: ensure we traverse commits in topological order
  Clear the delta base cache if a pack is rebuilt
2009-02-11 18:32:37 -08:00
Shawn O. Pearce
fa3a0c94dc Clear the delta base cache if a pack is rebuilt
There is some risk that re-opening a regenerated pack file with
different offsets could leave stale entries within the delta base
cache that could be matched up against other objects using the same
"struct packed_git*" and pack offset.

Throwing away the entire delta base cache in this case is safer,
as we don't have to worry about a recycled "struct packed_git*"
matching to the wrong base object, resulting in delta apply
errors while unpacking an object.

Suggested-by: Daniel Barkalow <barkalow@iabervon.org>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-02-11 10:25:24 -08:00
Junio C Hamano
fd8475d9fb Merge branch 'maint'
* maint:
  Clear the delta base cache during fast-import checkpoint
2009-02-10 21:30:45 -08:00
Junio C Hamano
9b27ea9518 Merge branch 'maint-1.6.0' into maint
* maint-1.6.0:
  Clear the delta base cache during fast-import checkpoint
2009-02-10 15:32:26 -08:00
Shawn O. Pearce
3d20c636af Clear the delta base cache during fast-import checkpoint
Otherwise we may reuse the same memory address for a totally
different "struct packed_git", and a previously cached object from
the prior occupant might be returned when trying to unpack an object
from the new pack.

Found-by: Daniel Barkalow <barkalow@iabervon.org>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-02-10 15:30:59 -08:00
Junio C Hamano
141b6b83d7 Merge branch 'lt/maint-wrap-zlib' into maint
* lt/maint-wrap-zlib:
  Wrap inflate and other zlib routines for better error reporting

Conflicts:
	http-push.c
	http-walker.c
	sha1_file.c
2009-02-05 18:01:00 -08:00
Junio C Hamano
8c95d3c31b Sync with 1.6.1.2
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-01-29 00:32:52 -08:00
Junio C Hamano
8561b522d7 Merge branch 'maint-1.6.0' into maint
* maint-1.6.0:
  avoid 31-bit truncation in write_loose_object
2009-01-28 23:41:28 -08:00
Jeff King
915308b187 avoid 31-bit truncation in write_loose_object
The size of the content we are adding may be larger than
2.1G (i.e., "git add gigantic-file"). Most of the code-path
to do so uses size_t or unsigned long to record the size,
but write_loose_object uses a signed int.

On platforms where "int" is 32-bits (which includes x86_64
Linux platforms), we end up passing malloc a negative size.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-01-28 23:40:53 -08:00
Junio C Hamano
36dd939393 Merge branch 'lt/maint-wrap-zlib'
* lt/maint-wrap-zlib:
  Wrap inflate and other zlib routines for better error reporting

Conflicts:
	http-push.c
	http-walker.c
	sha1_file.c
2009-01-21 16:55:17 -08:00
Christian Couder
c2c5b27051 sha1_file: make "read_object" static
This function is only used from "sha1_file.c".

And as we want to add a "replace_object" hook in "read_sha1_file",
we must not let people bypass the hook using something other than
"read_sha1_file".

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-01-13 00:14:55 -08:00
Linus Torvalds
39c68542fc Wrap inflate and other zlib routines for better error reporting
R. Tyler Ballance reported a mysterious transient repository corruption;
after much digging, it turns out that we were not catching and reporting
memory allocation errors from some calls we make to zlib.

This one _just_ wraps things; it doesn't do the "retry on low memory
error" part, at least not yet. It is an independent issue from the
reporting.  Some of the errors are expected and passed back to the caller,
but we die when zlib reports it failed to allocate memory for now.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-01-11 02:13:06 -08:00
Linus Torvalds
b760d3aa74 Make 'index_path()' use 'strbuf_readlink()'
This makes us able to properly index symlinks even on filesystems where
st_size doesn't match the true size of the link.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-12-17 13:36:34 -08:00
Junio C Hamano
de0db42278 Merge branch 'maint'
* maint:
  fsck: reduce stack footprint
  make sure packs to be replaced are closed beforehand
2008-12-11 00:36:31 -08:00
Nicolas Pitre
c74faea19e make sure packs to be replaced are closed beforehand
Especially on Windows where an opened file cannot be replaced, make
sure pack-objects always close packs it is about to replace. Even on
non Windows systems, this could save potential bad results if ever
objects were to be read from the new pack file using offset from the old
index.

This should fix t5303 on Windows.

Signed-off-by: Nicolas Pitre <nico@cam.org>
Tested-by: Johannes Sixt <j6t@kdbg.org> (MinGW)
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-12-10 17:56:05 -08:00
Junio C Hamano
0fd9d7e66d Merge branch 'bc/maint-keep-pack' into maint
* bc/maint-keep-pack:
  repack: only unpack-unreachable if we are deleting redundant packs
  t7700: test that 'repack -a' packs alternate packed objects
  pack-objects: extend --local to mean ignore non-local loose objects too
  sha1_file.c: split has_loose_object() into local and non-local counterparts
  t7700: demonstrate mishandling of loose objects in an alternate ODB
  builtin-gc.c: use new pack_keep bitfield to detect .keep file existence
  repack: do not fall back to incremental repacking with [-a|-A]
  repack: don't repack local objects in packs with .keep file
  pack-objects: new option --honor-pack-keep
  packed_git: convert pack_local flag into a bitfield and add pack_keep
  t7700: demonstrate mishandling of objects in packs with a .keep file
2008-12-02 23:00:04 -08:00
Junio C Hamano
455d0f5c23 Merge branch 'maint'
* maint:
  sha1_file.c: resolve confusion EACCES vs EPERM
  sha1_file: avoid bogus "file exists" error message
  git checkout: don't warn about unborn branch if -f is already passed
  bash: offer refs instead of filenames for 'git revert'
  bash: remove dashed command leftovers
  git-p4: fix keyword-expansion regex
  fast-export: use an unsorted string list for extra_refs
  Add new testcase to show fast-export does not always exports all tags
2008-11-27 19:23:51 -08:00
Sam Vilain
35243577ab sha1_file.c: resolve confusion EACCES vs EPERM
An earlier commit 916d081 (Nicer error messages in case saving an object
to db goes wrong, 2006-11-09) confused EACCES with EPERM, the latter of
which is an unlikely error from mkstemp().

Signed-off-by: Sam Vilain <sam@vilain.net>
2008-11-27 19:11:21 -08:00
Joey Hess
65117abc04 sha1_file: avoid bogus "file exists" error message
This avoids the following misleading error message:

error: unable to create temporary sha1 filename ./objects/15: File exists

mkstemp can fail for many reasons, one of which, ENOENT, can occur if
the directory for the temp file doesn't exist. create_tmpfile tried to
handle this case by always trying to mkdir the directory, even if it
already existed. This caused errno to be clobbered, so one cannot tell
why mkstemp really failed, and it truncated the buffer to just the
directory name, resulting in the strange error message shown above.

Note that in both occasions that I've seen this failure, it has not been
due to a missing directory, or bad permissions, but some other, unknown
mkstemp failure mode that did not occur when I ran git again. This code
could perhaps be made more robust by retrying mkstemp, in case it was a
transient failure.

Signed-off-by: Joey Hess <joey@kitenet.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-11-27 18:48:53 -08:00
Joey Hess
cbacbf4e55 sha1_file: avoid bogus "file exists" error message
This avoids the following misleading error message:

error: unable to create temporary sha1 filename ./objects/15: File exists

mkstemp can fail for many reasons, one of which, ENOENT, can occur if
the directory for the temp file doesn't exist. create_tmpfile tried to
handle this case by always trying to mkdir the directory, even if it
already existed. This caused errno to be clobbered, so one cannot tell
why mkstemp really failed, and it truncated the buffer to just the
directory name, resulting in the strange error message shown above.

Note that in both occasions that I've seen this failure, it has not been
due to a missing directory, or bad permissions, but some other, unknown
mkstemp failure mode that did not occur when I ran git again. This code
could perhaps be made more robust by retrying mkstemp, in case it was a
transient failure.

Signed-off-by: Joey Hess <joey@kitenet.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-11-23 19:44:19 -08:00
Alex Riesen
f755bb996b Fix handle leak in sha1_file/unpack_objects if there were damaged object data
In the case of bad packed object CRC, unuse_pack wasn't called after
check_pack_crc which calls use_pack.

Signed-off-by: Alex Riesen <raa.lkml@gmail.com>
Acked-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-11-23 19:31:05 -08:00
Junio C Hamano
47a792539a Merge branch 'jk/commit-v-strip'
* jk/commit-v-strip:
  status: show "-v" diff even for initial commit
  wt-status: refactor initial commit printing
  define empty tree sha1 as a macro
2008-11-16 00:48:59 -08:00
Junio C Hamano
7b51b77dbc Merge branch 'np/pack-safer'
* np/pack-safer:
  t5303: fix printf format string for portability
  t5303: work around printf breakage in dash
  pack-objects: don't leak pack window reference when splitting packs
  extend test coverage for latest pack corruption resilience improvements
  pack-objects: allow "fixing" a corrupted pack without a full repack
  make find_pack_revindex() aware of the nasty world
  make check_object() resilient to pack corruptions
  make packed_object_info() resilient to pack corruptions
  make unpack_object_header() non fatal
  better validation on delta base object offsets
  close another possibility for propagating pack corruption
2008-11-12 22:26:35 -08:00
Junio C Hamano
ecbbfb15a4 Merge branch 'bc/maint-keep-pack'
* bc/maint-keep-pack:
  t7700: test that 'repack -a' packs alternate packed objects
  pack-objects: extend --local to mean ignore non-local loose objects too
  sha1_file.c: split has_loose_object() into local and non-local counterparts
  t7700: demonstrate mishandling of loose objects in an alternate ODB
  builtin-gc.c: use new pack_keep bitfield to detect .keep file existence
  repack: do not fall back to incremental repacking with [-a|-A]
  repack: don't repack local objects in packs with .keep file
  pack-objects: new option --honor-pack-keep
  packed_git: convert pack_local flag into a bitfield and add pack_keep
  t7700: demonstrate mishandling of objects in packs with a .keep file
2008-11-12 22:00:43 -08:00
Jeff King
14d9c57896 define empty tree sha1 as a macro
This can potentially be used in a few places, so let's make
it available to all parts of the code.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-11-12 12:52:21 -08:00
Brandon Casey
0f4dc14ac4 sha1_file.c: split has_loose_object() into local and non-local counterparts
Signed-off-by: Brandon Casey <drafnel@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-11-12 10:29:22 -08:00
Brandon Casey
8d25931d6f packed_git: convert pack_local flag into a bitfield and add pack_keep
pack_keep will be set when a pack file has an associated .keep file.

Signed-off-by: Brandon Casey <casey@nrlssc.navy.mil>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-11-12 10:28:08 -08:00
Nicolas Pitre
08698b1e32 make find_pack_revindex() aware of the nasty world
It currently calls die() whenever given offset is not found thinking
that such thing should never happen.  But this offset may come from a
corrupted pack whych _could_ happen and not be found.  Callers should
deal with this possibility gracefully instead.

Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-11-02 15:22:35 -08:00
Nicolas Pitre
3d77d8774f make packed_object_info() resilient to pack corruptions
In the same spirit as commit 8eca0b47ff, let's try to survive a pack
corruption by making packed_object_info() able to fall back to alternate
packs or loose objects.

Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-11-02 15:22:35 -08:00
Nicolas Pitre
09ded04b7e make unpack_object_header() non fatal
It is possible to have pack corruption in the object header.  Currently
unpack_object_header() simply die() on them instead of letting the caller
deal with that gracefully.

So let's have unpack_object_header() return an error instead, and find
a better name for unpack_object_header_gently() in that context.  All
callers of unpack_object_header() are ready for it.

Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-11-02 15:22:34 -08:00
Nicolas Pitre
d8f325563d better validation on delta base object offsets
In one case, it was possible to have a bad offset equal to 0 effectively
pointing a delta onto itself and crashing git after too many recursions.
In the other cases, a negative offset could result due to off_t being
signed.  Catch those.

Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-11-02 15:22:34 -08:00
Nicolas Pitre
0e8189e270 close another possibility for propagating pack corruption
Abstract
--------

With index v2 we have a per object CRC to allow quick and safe reuse of
pack data when repacking.  This, however, doesn't currently prevent a
stealth corruption from being propagated into a new pack when _not_
reusing pack data as demonstrated by the modification to t5302 included
here.

The Context
-----------

The Git database is all checksummed with SHA1 hashes.  Any kind of
corruption can be confirmed by verifying this per object hash against
corresponding data.  However this can be costly to perform systematically
and therefore this check is often not performed at run time when
accessing the object database.

First, the loose object format is entirely compressed with zlib which
already provide a CRC verification of its own when inflating data.  Any
disk corruption would be caught already in this case.

Then, packed objects are also compressed with zlib but only for their
actual payload.  The object headers and delta base references are not
deflated for obvious performance reasons, however this leave them
vulnerable to potentially undetected disk corruptions.  Object types
are often validated against the expected type when they're requested,
and deflated size must always match the size recorded in the object header,
so those cases are pretty much covered as well.

Where corruptions could go unnoticed is in the delta base reference.
Of course, in the OBJ_REF_DELTA case,  the odds for a SHA1 reference to
get corrupted so it actually matches the SHA1 of another object with the
same size (the delta header stores the expected size of the base object
to apply against) are virtually zero.  In the OBJ_OFS_DELTA case, the
reference is a pack offset which would have to match the start boundary
of a different base object but still with the same size, and although this
is relatively much more "probable" than in the OBJ_REF_DELTA case, the
probability is also about zero in absolute terms.  Still, the possibility
exists as demonstrated in t5302 and is certainly greater than a SHA1
collision, especially in the OBJ_OFS_DELTA case which is now the default
when repacking.

Again, repacking by reusing existing pack data is OK since the per object
CRC provided by index v2 guards against any such corruptions. What t5302
failed to test is a full repack in such case.

The Solution
------------

As unlikely as this kind of stealth corruption can be in practice, it
certainly isn't acceptable to propagate it into a freshly created pack.
But, because this is so unlikely, we don't want to pay the run time cost
associated with extra validation checks all the time either.  Furthermore,
consequences of such corruption in anything but repacking should be rather
visible, and even if it could be quite unpleasant, it still has far less
severe consequences than actively creating bad packs.

So the best compromize is to check packed object CRC when unpacking
objects, and only during the compression/writing phase of a repack, and
only when not streaming the result.  The cost of this is minimal (less
than 1% CPU time), and visible only with a full repack.

Someone with a stats background could provide an objective evaluation of
this, but I suspect that it's bad RAM that has more potential for data
corruptions at this point, even in those cases where this extra check
is not performed.  Still, it is best to prevent a known hole for
corruption when recreating object data into a new pack.

What about the streamed pack case?  Well, any client receiving a pack
must always consider that pack as untrusty and perform full validation
anyway, hence no such stealth corruption could be propagated to remote
repositoryes already.  It is therefore worthless doing local validation
in that case.

Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-11-02 15:22:15 -08:00
Junio C Hamano
581000a419 Merge branch 'jc/maint-co-track' into maint
* jc/maint-co-track:
  Enhance hold_lock_file_for_{update,append}() API
  demonstrate breakage of detached checkout with symbolic link HEAD
  Fix "checkout --track -b newbranch" on detached HEAD
2008-11-02 13:36:14 -08:00
Junio C Hamano
a157400c97 Merge branch 'jc/maint-co-track'
* jc/maint-co-track:
  Enhance hold_lock_file_for_{update,append}() API
  demonstrate breakage of detached checkout with symbolic link HEAD
  Fix "checkout --track -b newbranch" on detached HEAD

Conflicts:
	builtin-commit.c
2008-10-21 17:58:11 -07:00
Junio C Hamano
acd3b9eca8 Enhance hold_lock_file_for_{update,append}() API
This changes the "die_on_error" boolean parameter to a mere "flags", and
changes the existing callers of hold_lock_file_for_update/append()
functions to pass LOCK_DIE_ON_ERROR.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-10-19 12:35:37 -07:00
Junio C Hamano
58e0fa5416 Merge branch 'maint'
* maint:
  Hopefully the final draft release notes update before 1.6.0.3
  diff(1): clarify what "T"ypechange status means
  contrib: update packinfo.pl to not use dashed commands
  force_object_loose: Fix memory leak
  tests: shell negation portability fix
2008-10-18 08:26:44 -07:00