Commits, trees and tags have structure. Don't let users feed git
with malformed ones. Sooner or later git will die() when
encountering them.
Note that this patch does not check semantics. A tree that points
to non-existent objects is perfectly OK (and should be so, users
may choose to add commit first, then its associated tree for example).
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The errno check added in commit 3ba7a06 "A loose object is not corrupt
if it cannot be read due to EMFILE" only checked for whether errno is
not ENOENT and thus incorrectly treated "no error" as an error
condition.
Because of that, it never reached the code path that would report that
the object is corrupted and instead caused funny errors like:
fatal: failed to read object 333c4768ce595793fdab1ef3a036413e2a883853: Success
So we have to extend the check to cover the case in which the object
file was successfully read, but its contents are corrupted.
Reported-by: Will Palmer <wmpalmer@gmail.com>
Signed-off-by: Björn Steinbrink <B.Steinbrink@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* jn/thinner-wrapper:
Remove pack file handling dependency from wrapper.o
pack-objects: mark file-local variable static
wrapper: give zlib wrappers their own translation unit
strbuf: move strbuf_branchname to sha1_name.c
path helpers: move git_mkstemp* to wrapper.c
wrapper: move odb_* to environment.c
wrapper: move xmmap() to sha1_file.c
As v1.7.0-rc0~43 (slim down "git show-index", 2010-01-21) explains,
use of xmalloc() brings in a dependency on zlib, the sha1 lib, and the
rest of git's object file access machinery via try_to_free_pack_memory.
That is overkill when xmalloc is just being used as a convenience
wrapper to exit when no memory is available.
So defer setting try_to_free_pack_memory as try_to_free_routine until
the first packfile is opened in add_packed_git().
After this change, a simple program using xmalloc() and no other
functions will not pull in any code from libgit.a aside from wrapper.o
and usage.o.
Improved-by: René Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
wrapper.o depends on sha1_file.o for a number of reasons. One is
release_pack_memory().
xmmap function calls mmap, discarding unused pack windows when
necessary to relieve memory pressure. Simple git programs using
wrapper.o as a friendly libc do not need this functionality.
So move xmmap to sha1_file.o, where release_pack_memory() is.
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When opening any files in the object database, release unused pack
windows if the open(2) syscall fails due to EMFILE (too many open
files in this process). This allows Git to degrade gracefully on
a repository with thousands of pack files, and a commit stored in
a loose object in the middle of the history.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This utility function avoids an unnecessary update of the access time
for a loose object file. Just as the atime isn't useful on a loose
object, its not useful on the pack or the corresonding idx file.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git fsck" bails out with a claim that a loose object that cannot be
read but exists on the filesystem to be corrupt, which is wrong when
read_object() failed due to e.g. EMFILE.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Clarify the error reporting logic by moving the normal codepath (i.e. we
read the object we wanted to read correctly) up and return early.
The logic to report the name of the packfile with a corrupt object,
introduced by e8b15e6 (sha1_file: Show the the type and path to corrupt
objects, 2010-06-10), was totally bogus. The function that knows which
bad object came from what packfile is has_packed_and_bad(); make it report
which packfile the problem was found.
"Corrupt" is already an adjective, e.g. an object is "corrupt"; we do not
have to say "corrupted object".
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Change the error message that's displayed when we encounter corrupt
objects to be more specific. We now print the type (loose or packed)
of corrupted objects, along with the full path to the file in
question.
Before:
$ git cat-file blob 909ef997367880aaf2133bafa1f1a71aa28e09df
fatal: object 909ef997367880aaf2133bafa1f1a71aa28e09df is corrupted
After:
$ git cat-file blob 909ef997367880aaf2133bafa1f1a71aa28e09df
fatal: loose object 909ef997367880aaf2133bafa1f1a71aa28e09df (stored in .git/objects/90/9ef997367880aaf2133bafa1f1a71aa28e09df) is corrupted
Knowing the path helps to quickly analyze what's wrong:
$ file .git/objects/90/9ef997367880aaf2133bafa1f1a71aa28e09df
.git/objects/90/9ef997367880aaf2133bafa1f1a71aa28e09df: empty
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This function takes a sha1 and produces a loose object
filename. It caches the location of the object directory so
that it can fill the sha1 information directly without
allocating a new buffer (and in its original incarnation,
without calling getenv(), though these days we cache that
with the code in environment.c).
This cached base directory can become stale, however, if in
a single process git changes the location of the object
directory (e.g., by running setup_work_tree, which will
chdir to the new worktree).
In most cases this isn't a problem, because we tend to set
up the git repository location and do any chdir()s before
actually looking up any objects, so the first lookup will
cache the correct location. In the case of reset --hard,
however, we do something like:
1. look up the commit object
2. notice we are doing --hard, run setup_work_tree
3. look up the tree object to reset
Step (3) fails because our cache object directory value is
bogus.
This patch simply removes the caching. We use a static
buffer instead of allocating one each time (the original
version treated the malloc'd buffer as a static, so there is
no change in calling semantics).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* sp/maint-dumb-http-pack-reidx:
http.c::new_http_pack_request: do away with the temp variable filename
http-fetch: Use temporary files for pack-*.idx until verified
http-fetch: Use index-pack rather than verify-pack to check packs
Allow parse_pack_index on temporary files
Extract verify_pack_index for reuse from verify_pack
Introduce close_pack_index to permit replacement
http.c: Remove unnecessary strdup of sha1_to_hex result
http.c: Don't store destination name in request structures
http.c: Drop useless != NULL test in finish_http_pack_request
http.c: Tiny refactoring of finish_http_pack_request
t5550-http-fetch: Use subshell for repository operations
http.c: Remove bad free of static block
* maint:
Documentation/gitdiffcore: fix order in pickaxe description
Documentation: fix minor inconsistency
Documentation: rebase -i ignores options passed to "git am"
hash_object: correction for zero length file
The check whether size is zero was done after if size <= SMALL_FILE_SIZE,
as result, zero size case was never triggered. Instead zero length file
was treated as any other small file. This did not caused any problem, but
if we have a special case for size equal to zero, it is better to make it
work and avoid redundant malloc().
Signed-off-by: Dmitry Potapov <dpotapov@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The easiest way to verify a pack index is to open it through the
standard parse_pack_index function, permitting the header check
to happen when the file is mapped. However, the dumb HTTP client
needs to verify a pack index before its moved into its proper file
name within the objects/pack directory, to prevent a corrupt index
from being made available. So permit the caller to specify the
exact path of the index file.
For now we're still using the final destination name within the
sole call site in http.c, but eventually we will start to parse
the temporary path instead.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
By closing the pack index, a caller can later overwrite the index
with an updated index file, possibly after converting from v1 to
the v2 format. Because p->index_data is NULL after close, on the
next access the index will be opened again and the other members
will be updated with new data.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Until now, this has been part of the commit-tree builtin.
However, it is already used by other builtins (like commit,
merge, and notes), and it would be useful to access it from
library code.
The check_valid helper has to come along, too, but is given
a more library-ish name of "assert_sha1_type".
Otherwise, the code is unchanged. There are still a few
rough edges for a library function, like printing the utf8
warning to stderr, but we can address those if and when they
come up as inappropriate.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
These should take const buffers as input data, but zlib's
next_in pointer is not const-correct. Let's fix it at the
zlib level, though, so the cast happens in one obvious
place. This should be safe, as a similar cast is used in
zlib's example code for a const array.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* mm/mkstemps-mode-for-packfiles:
Use git_mkstemp_mode instead of plain mkstemp to create object files
git_mkstemps_mode: don't set errno to EINVAL on exit.
Use git_mkstemp_mode and xmkstemp_mode in odb_mkstemp, not chmod later.
git_mkstemp_mode, xmkstemp_mode: variants of gitmkstemps with mode argument.
Move gitmkstemps to path.c
Add a testcase for ACL with restrictive umask.
* nd/root-git:
Add test for using Git at root of file system
Support working directory located at root
Move offset_1st_component() to path.c
init-db, rev-parse --git-dir: do not append redundant slash
make_absolute_path(): Do not append redundant slash
Conflicts:
setup.c
sha1_file.c
* mm/mkstemps-mode-for-packfiles:
Use git_mkstemp_mode instead of plain mkstemp to create object files
git_mkstemps_mode: don't set errno to EINVAL on exit.
Use git_mkstemp_mode and xmkstemp_mode in odb_mkstemp, not chmod later.
git_mkstemp_mode, xmkstemp_mode: variants of gitmkstemps with mode argument.
Move gitmkstemps to path.c
Add a testcase for ACL with restrictive umask.
* np/compress-loose-object-memsave:
sha1_file: be paranoid when creating loose objects
sha1_file: don't malloc the whole compressed result when writing out objects
We used to unnecessarily give the read permission to group and others,
regardless of the umask, which isn't serious because the objects are
still protected by their containing directory, but isn't necessary
either.
Signed-off-by: Matthieu Moy <Matthieu.Moy@imag.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We don't want the data being deflated and stored into loose objects
to be different from what we expect. While the deflated data is
protected by a CRC which is good enough for safe data retrieval
operations, we still want to be doubly sure that the source data used
at object creation time is still what we expected once that data has
been deflated and its CRC32 computed.
The most plausible data corruption may occur if the source file is
modified while Git is deflating and writing it out in a loose object.
Or Git itself could have a bug causing memory corruption. Or even bad
RAM could cause trouble. So it is best to make sure everything is
coherent and checksum protected from beginning to end.
To do so we compute the SHA1 of the data being deflated _after_ the
deflate operation has consumed that data, and make sure it matches
with the expected SHA1. This way we can rely on the CRC32 checked by
the inflate operation to provide a good indication that the data is still
coherent with its SHA1 hash. One pathological case we ignore is when
the data is modified before (or during) deflate call, but changed back
before it is hashed.
There is some overhead of course. Using 'git add' on a set of large files:
Before:
real 0m25.210s
user 0m23.783s
sys 0m1.408s
After:
real 0m26.537s
user 0m25.175s
sys 0m1.358s
The overhead is around 5% for full data coherency guarantee.
Signed-off-by: Nicolas Pitre <nico@fluxnic.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Using read() instead of mmap() can be 39% speed up for 1Kb files and is
1% speed up 1Mb files. For larger files, it is better to use mmap(),
because the difference between is not significant, and when there is not
enough memory, mmap() performs much better, because it avoids swapping.
Signed-off-by: Dmitry Potapov <dpotapov@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There is no real advantage to malloc the whole output buffer and
deflate the data in a single pass when writing loose objects. That is
like only 1% faster while using more memory, especially with large
files where memory usage is far more. It is best to deflate and write
the data out in small chunks reusing the same memory instead.
For example, using 'git add' on a few large files averaging 40 MB ...
Before:
21.45user 1.10system 0:22.57elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+828040outputs (0major+142640minor)pagefaults 0swaps
After:
21.50user 1.25system 0:22.76elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+828040outputs (0major+104408minor)pagefaults 0swaps
While the runtime stayed relatively the same, the number of minor page
faults went down significantly.
Signed-off-by: Nicolas Pitre <nico@fluxnic.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The implementation is also lightly modified to use is_dir_sep()
instead of hardcoding '/'.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
[jc: later NUL termination by the caller becomes unnecessary]
Signed-off-by: Ilari Liusvaara <ilari.liusvaara@elisanet.fi>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As the documentation says, this is primarily for debugging, and
in the longer term we should rename it to test-show-index or something.
In the meantime, just avoid xmalloc (which slurps in the rest of git), and
separating out the trivial hex functions into "hex.o".
This results in
[torvalds@nehalem git]$ size git-show-index
text data bss dec hex filename
222818 2276 112688 337782 52776 git-show-index (before)
5696 624 1264 7584 1da0 git-show-index (after)
which is a whole lot better.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The loop in get_size_from_delta() feeds a deflated delta data from the
pack stream _until_ we get inflated result of 20 bytes[*] or we reach the
end of stream.
Side note. This magic number 20 does not have anything to do with the
size of the hash we use, but comes from 1a3b55c (reduce delta head
inflated size, 2006-10-18).
The loop reads like this:
do {
in = use_pack();
stream.next_in = in;
st = git_inflate(&stream, Z_FINISH);
curpos += stream.next_in - in;
} while ((st == Z_OK || st == Z_BUF_ERROR) &&
stream.total_out < sizeof(delta_head));
This git_inflate() can return:
- Z_STREAM_END, if use_pack() fed it enough input and the delta itself
was smaller than 20 bytes;
- Z_OK, when some progress has been made;
- Z_BUF_ERROR, if no progress is possible, because we either ran out of
input (due to corrupt pack), or we ran out of output before we saw the
end of the stream.
The fix b3118bd (sha1_file: Fix infinite loop when pack is corrupted,
2009-10-14) attempted was against a corruption that appears to be a valid
stream that produces a result larger than the output buffer, but we are
not even trying to read the stream to the end in this loop. If avail_out
becomes zero, total_out will be the same as sizeof(delta_head) so the loop
will terminate without the "fix". There is no fix from b3118bd needed for
this loop, in other words.
The loop in unpack_compressed_entry() is quite a different story. It
feeds a deflated stream (either delta or base) and allows the stream to
produce output up to what we expect but no more.
do {
in = use_pack();
stream.next_in = in;
st = git_inflate(&stream, Z_FINISH);
curpos += stream.next_in - in;
} while (st == Z_OK || st == Z_BUF_ERROR)
This _does_ risk falling into an endless interation, as we can exhaust
avail_out if the length we expect is smaller than what the stream wants to
produce (due to pack corruption). In such a case, avail_out will become
zero and inflate() will return Z_BUF_ERROR, while avail_in may (or may
not) be zero.
But this is not a right fix:
do {
in = use_pack();
stream.next_in = in;
st = git_inflate(&stream, Z_FINISH);
+ if (st == Z_BUF_ERROR && (stream.avail_in || !stream.avail_out)
+ break; /* wants more input??? */
curpos += stream.next_in - in;
} while (st == Z_OK || st == Z_BUF_ERROR)
as Z_BUF_ERROR from inflate() may be telling us that avail_in has also run
out before reading the end of stream marker. In such a case, both avail_in
and avail_out would be zero, and the loop should iterate to allow the end
of stream marker to be seen by inflate from the input stream.
The right fix for this loop is likely to be to increment the initial
avail_out by one (we allocate one extra byte to terminate it with NUL
anyway, so there is no risk to overrun the buffer), and break out if we
see that avail_out has become zero, in order to detect that the stream
wants to produce more than what we expect. After the loop, we have a
check that exactly tests this condition:
if ((st != Z_STREAM_END) || stream.total_out != size) {
free(buffer);
return NULL;
}
So here is a patch (without my previous botched attempts) to fix this
issue. The first hunk reverts the corresponding hunk from b3118bd, and
the second hunk is the same fix proposed earlier.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Some types of corruption to a pack may confuse the deflate stream
which stores an object. In Andy's reported case a 36 byte region
of the pack was overwritten, leading to what appeared to be a valid
deflate stream that was trying to produce a result larger than our
allocated output buffer could accept.
Z_BUF_ERROR is returned from inflate() if either the input buffer
needs more input bytes, or the output buffer has run out of space.
Previously we only considered the former case, as it meant we needed
to move the stream's input buffer to the next window in the pack.
We now abort the loop if inflate() returns Z_BUF_ERROR without
consuming the entire input buffer it was given, or has filled
the entire output buffer but has not yet returned Z_STREAM_END.
Either state is a clear indicator that this loop is not working
as expected, and should not continue.
This problem cannot occur with loose objects as we open the entire
loose object as a single buffer and treat Z_BUF_ERROR as an error.
Reported-by: Andy Isaacson <adi@hexapodia.org>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Acked-by: Nicolas Pitre <nico@fluxnic.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* cc/replace:
t6050: check pushing something based on a replaced commit
Documentation: add documentation for "git replace"
Add git-replace to .gitignore
builtin-replace: use "usage_msg_opt" to give better error messages
parse-options: add new function "usage_msg_opt"
builtin-replace: teach "git replace" to actually replace
Add new "git replace" command
environment: add global variable to disable replacement
mktag: call "check_sha1_signature" with the replacement sha1
replace_object: add a test case
object: call "check_sha1_signature" with the replacement sha1
sha1_file: add a "read_sha1_file_repl" function
replace_object: add mechanism to replace objects found in "refs/replace/"
refs: add a "for_each_replace_ref" function
* tr/die_errno:
Use die_errno() instead of die() when checking syscalls
Convert existing die(..., strerror(errno)) to die_errno()
die_errno(): double % in strerror() output just in case
Introduce die_errno() that appends strerror(errno) to die()
Change calls to die(..., strerror(errno)) to use the new die_errno().
In the process, also make slight style adjustments: at least state
_something_ about the function that failed (instead of just printing
the pathname), and put paths in single quotes.
Signed-off-by: Thomas Rast <trast@student.ethz.ch>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Shifting 'unsigned char' or 'unsigned short' left can result in sign
extension errors, since the C integer promotion rules means that the
unsigned char/short will get implicitly promoted to a signed 'int' due to
the shift (or due to other operations).
This normally doesn't matter, but if you shift things up sufficiently, it
will now set the sign bit in 'int', and a subsequent cast to a bigger type
(eg 'long' or 'unsigned long') will now sign-extend the value despite the
original expression being unsigned.
One example of this would be something like
unsigned long size;
unsigned char c;
size += c << 24;
where despite all the variables being unsigned, 'c << 24' ends up being a
signed entity, and will get sign-extended when then doing the addition in
an 'unsigned long' type.
Since git uses 'unsigned char' pointers extensively, we actually have this
bug in a couple of places.
I may have missed some, but this is the result of looking at
git grep '[^0-9 ][ ]*<<[ ][a-z]' -- '*.c' '*.h'
git grep '<<[ ]*24'
which catches at least the common byte cases (shifting variables by a
variable amount, and shifting by 24 bits).
I also grepped for just 'unsigned char' variables in general, and
converted the ones that most obviously ended up getting implicitly cast
immediately anyway (eg hash_name(), encode_85()).
In addition to just avoiding 'unsigned char', this patch also tries to use
a common idiom for the delta header size thing. We had three different
variations on it: "& 0x7fUL" in one place (getting the sign extension
right), and "& ~0x80" and "& 0x7f" in two other places (not getting it
right). Apart from making them all just avoid using "unsigned char" at
all, I also unified them to then use a simple "& 0x7f".
I considered making a sparse extension which warns about doing implicit
casts from unsigned types to signed types, but it gets rather complex very
quickly, so this is just a hack.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This new function will replace "read_sha1_file". This latter function
becoming just a stub to call the former will a NULL "replacement"
argument.
This new function is needed because sometimes we need to use the
replacement sha1.
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The code implementing this mechanism has been copied more-or-less
from the commit graft code.
This mechanism is used in "read_sha1_file". sha1 passed to this
function that match a ref name in "refs/replace/" are replaced by
the sha1 that has been read in the ref.
We "die" if the replacement recursion depth is too high or if we
can't read the replacement object.
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* ar/unlink-err:
print unlink(2) errno in copy_or_link_directory
replace direct calls to unlink(2) with unlink_or_warn
Introduce an unlink(2) wrapper which gives warning if unlink failed
* maint:
grep: fix word-regexp colouring
completion: use git rev-parse to detect bare repos
Cope better with a _lot_ of packs
for-each-ref: fix segfault in copy_email
You might end up with a situation where you have tons of pack files, e.g.
when using hg2git. In this situation, all kinds of operations may
end up with a "too many files open" error. Let's recover gracefully from
that.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Looks-right-to-me-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* ar/unlink-err:
print unlink(2) errno in copy_or_link_directory
replace direct calls to unlink(2) with unlink_or_warn
Introduce an unlink(2) wrapper which gives warning if unlink failed
Essentially; s/type* /type */ as per the coding guidelines.
Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This helps to notice when something's going wrong, especially on
systems which lock open files.
I used the following criteria when selecting the code for replacement:
- it was already printing a warning for the unlink failures
- it is in a function which already printing something or is
called from such a function
- it is in a static function, returning void and the function is only
called from a builtin main function (cmd_)
- it is in a function which handles emergency exit (signal handlers)
- it is in a function which is obvously cleaning up the lockfiles
Signed-off-by: Alex Riesen <raa.lkml@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"Unreliable hardlinks" is a misleading description for what is happening.
So rename it to something less misleading.
Suggested by Linus Torvalds.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It seems that accessing NTFS partitions with ufsd (at least on my EeePC)
has an unnerving bug: if you link() a file and unlink() it right away,
the target of the link() will have the correct size, but consist of NULs.
It seems as if the calls are simply not serialized correctly, as single-stepping
through the function move_temp_to_file() works flawlessly.
As ufsd is "Commertial software" (sic!), I cannot fix it, and have to work
around it in Git.
At the same time, it seems that this fixes msysGit issues 222 and 229 to
assume that Windows cannot handle link() && unlink().
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Acked-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* jc/shared-literally:
t1301: loosen test for forced modes
set_shared_perm(): sometimes we know what the final mode bits should look like
move_temp_to_file(): do not forget to chmod() in "Coda hack" codepath
Move chmod(foo, 0444) into move_temp_to_file()
"core.sharedrepository = 0mode" should set, not loosen
* jc/maint-1.6.0-keep-pack:
pack-objects: don't loosen objects available in alternate or kept packs
t7700: demonstrate repack flaw which may loosen objects unnecessarily
Remove --kept-pack-only option and associated infrastructure
pack-objects: only repack or loosen objects residing in "local" packs
git-repack.sh: don't use --kept-pack-only option to pack-objects
t7700-repack: add two new tests demonstrating repacking flaws
Conflicts:
t/t7700-repack.sh
adjust_shared_perm() first obtains the mode bits from lstat(2), expecting
to find what the result of applying user's umask is, and then tweaks it
as necessary. When the file to be adjusted is created with mkstemp(3),
however, the mode thusly obtained does not have anything to do with user's
umask, and we would need to start from 0444 in such a case and there is no
point running lstat(2) for such a path.
This introduces a new API set_shared_perm() to bypass the lstat(2) and
instead force setting the mode bits to the desired value directly.
adjust_shared_perm() becomes a thin wrapper to the function.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Now move_temp_to_file() is responsible for doing everything that is
necessary to turn a tempfile in $GIT_DIR into its final form, it must make
sure "Coda hack" codepath correctly makes the file read-only.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When writing out a loose object or a pack (index), move_temp_to_file() is
called to finalize the resulting file. These files (loose files and packs)
should all have permission mode 0444 (modulo adjust_shared_perm()).
Therefore, instead of doing chmod(foo, 0444) explicitly from each callsite
(or even forgetting to chmod() at all), do the chmod() call from within
move_temp_to_file().
Signed-off-by: Johan Herland <johan@herland.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This fixes the behaviour of octal notation to how it is defined in the
documentation, while keeping the traditional "loosen only" semantics
intact for "group" and "everybody".
Three main points of this patch are:
- For an explicit octal notation, the internal shared_repository variable
is set to a negative value, so that we can tell "group" (which is to
"OR" in 0660) and 0660 (which is to "SET" to 0660);
- git-init did not set shared_repository variable early enough to affect
the initial creation of many files, notably copied templates and the
configuration. We set it very early when a command-line option
specifies a custom value.
- Many codepaths create files inside $GIT_DIR by various ways that all
involve mkstemp(), and then call move_temp_to_file() to rename it to
its final destination. We can add adjust_shared_perm() call here; for
the traditional "loosen-only", this would be a no-op for many codepaths
because the mode is already loose enough, but with the new behaviour it
makes a difference.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* maint:
Increase the size of the die/warning buffer to avoid truncation
close_sha1_file(): make it easier to diagnose errors
avoid possible overflow in delta size filtering computation
A bug report with "unable to write sha1 file" made us realize that we do
not have enough information to guess why close() is failing.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This option to pack-objects/rev-list was created to improve the -A and -a
options of repack. It was found to be lacking in that it did not provide
the ability to differentiate between local and non-local kept packs, and
found to be unnecessary since objects residing in local kept packs can be
filtered out by the --honor-pack-keep option.
Signed-off-by: Brandon Casey <casey@nrlssc.navy.mil>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* jc/maint-1.6.0-keep-pack:
is_kept_pack(): final clean-up
Simplify is_kept_pack()
Consolidate ignore_packed logic more
has_sha1_kept_pack(): take "struct rev_info"
has_sha1_pack(): refactor "pretend these packs do not exist" interface
git-repack: resist stray environment variable
Now is_kept_pack() is just a member lookup into a structure, we can write
it as such.
Also rewrite the sole caller of has_sha1_kept_pack() to switch on the
criteria the callee uses (namely, revs->kept_pack_only) between calling
has_sha1_kept_pack() and has_sha1_pack(), so that these two callees do not
have to take a pointer to struct rev_info as an argument.
This removes the header file dependency issue temporarily introduced by
the earlier commit, so we revert changes associated to that as well.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This removes --unpacked=<packfile> parameter from the revision parser, and
rewrites its use in git-repack to pass a single --kept-pack-only option
instead.
The new --kept-pack-only option means just that. When this option is
given, is_kept_pack() that used to say "not on the --unpacked=<packfile>
list" now says "the packfile has corresponding .keep file".
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This refactors three loops that check if a given packfile is on the
ignore_packed list into a function is_kept_pack(). The function returns
false for a pack on the list, and true for a pack not on the list, because
this list is solely used by "git repack" to pass list of packfiles that do
not have corresponding .keep files, i.e. a packfile not on the list is
"kept".
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Its "ignore_packed" parameter always comes from struct rev_info. This
patch makes the function take a pointer to the surrounding structure, so
that the refactoring in the next patch becomes easier to review.
There is an unfortunate header file dependency and the easiest workaround
is to temporarily move the function declaration from cache.h to
revision.h; this will be moved back to cache.h once the function loses
this "ignore_packed" parameter altogether in the later part of the
series.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Most of the callers of this function except only one pass NULL to its last
parameter, ignore_packed.
Introduce has_sha1_kept_pack() function that has the function signature
and the semantics of this function, and convert the sole caller that does
not pass NULL to call this new function.
All other callers and has_sha1_pack() lose the ignore_packed parameter.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* maint:
Prepare for 1.6.1.4.
Make repack less likely to corrupt repository
fast-export: ensure we traverse commits in topological order
Clear the delta base cache if a pack is rebuilt
Conflicts:
RelNotes
* maint-1.6.0:
Make repack less likely to corrupt repository
fast-export: ensure we traverse commits in topological order
Clear the delta base cache if a pack is rebuilt
There is some risk that re-opening a regenerated pack file with
different offsets could leave stale entries within the delta base
cache that could be matched up against other objects using the same
"struct packed_git*" and pack offset.
Throwing away the entire delta base cache in this case is safer,
as we don't have to worry about a recycled "struct packed_git*"
matching to the wrong base object, resulting in delta apply
errors while unpacking an object.
Suggested-by: Daniel Barkalow <barkalow@iabervon.org>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Otherwise we may reuse the same memory address for a totally
different "struct packed_git", and a previously cached object from
the prior occupant might be returned when trying to unpack an object
from the new pack.
Found-by: Daniel Barkalow <barkalow@iabervon.org>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The size of the content we are adding may be larger than
2.1G (i.e., "git add gigantic-file"). Most of the code-path
to do so uses size_t or unsigned long to record the size,
but write_loose_object uses a signed int.
On platforms where "int" is 32-bits (which includes x86_64
Linux platforms), we end up passing malloc a negative size.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This function is only used from "sha1_file.c".
And as we want to add a "replace_object" hook in "read_sha1_file",
we must not let people bypass the hook using something other than
"read_sha1_file".
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
R. Tyler Ballance reported a mysterious transient repository corruption;
after much digging, it turns out that we were not catching and reporting
memory allocation errors from some calls we make to zlib.
This one _just_ wraps things; it doesn't do the "retry on low memory
error" part, at least not yet. It is an independent issue from the
reporting. Some of the errors are expected and passed back to the caller,
but we die when zlib reports it failed to allocate memory for now.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This makes us able to properly index symlinks even on filesystems where
st_size doesn't match the true size of the link.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Especially on Windows where an opened file cannot be replaced, make
sure pack-objects always close packs it is about to replace. Even on
non Windows systems, this could save potential bad results if ever
objects were to be read from the new pack file using offset from the old
index.
This should fix t5303 on Windows.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Tested-by: Johannes Sixt <j6t@kdbg.org> (MinGW)
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* bc/maint-keep-pack:
repack: only unpack-unreachable if we are deleting redundant packs
t7700: test that 'repack -a' packs alternate packed objects
pack-objects: extend --local to mean ignore non-local loose objects too
sha1_file.c: split has_loose_object() into local and non-local counterparts
t7700: demonstrate mishandling of loose objects in an alternate ODB
builtin-gc.c: use new pack_keep bitfield to detect .keep file existence
repack: do not fall back to incremental repacking with [-a|-A]
repack: don't repack local objects in packs with .keep file
pack-objects: new option --honor-pack-keep
packed_git: convert pack_local flag into a bitfield and add pack_keep
t7700: demonstrate mishandling of objects in packs with a .keep file
* maint:
sha1_file.c: resolve confusion EACCES vs EPERM
sha1_file: avoid bogus "file exists" error message
git checkout: don't warn about unborn branch if -f is already passed
bash: offer refs instead of filenames for 'git revert'
bash: remove dashed command leftovers
git-p4: fix keyword-expansion regex
fast-export: use an unsorted string list for extra_refs
Add new testcase to show fast-export does not always exports all tags
An earlier commit 916d081 (Nicer error messages in case saving an object
to db goes wrong, 2006-11-09) confused EACCES with EPERM, the latter of
which is an unlikely error from mkstemp().
Signed-off-by: Sam Vilain <sam@vilain.net>
This avoids the following misleading error message:
error: unable to create temporary sha1 filename ./objects/15: File exists
mkstemp can fail for many reasons, one of which, ENOENT, can occur if
the directory for the temp file doesn't exist. create_tmpfile tried to
handle this case by always trying to mkdir the directory, even if it
already existed. This caused errno to be clobbered, so one cannot tell
why mkstemp really failed, and it truncated the buffer to just the
directory name, resulting in the strange error message shown above.
Note that in both occasions that I've seen this failure, it has not been
due to a missing directory, or bad permissions, but some other, unknown
mkstemp failure mode that did not occur when I ran git again. This code
could perhaps be made more robust by retrying mkstemp, in case it was a
transient failure.
Signed-off-by: Joey Hess <joey@kitenet.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This avoids the following misleading error message:
error: unable to create temporary sha1 filename ./objects/15: File exists
mkstemp can fail for many reasons, one of which, ENOENT, can occur if
the directory for the temp file doesn't exist. create_tmpfile tried to
handle this case by always trying to mkdir the directory, even if it
already existed. This caused errno to be clobbered, so one cannot tell
why mkstemp really failed, and it truncated the buffer to just the
directory name, resulting in the strange error message shown above.
Note that in both occasions that I've seen this failure, it has not been
due to a missing directory, or bad permissions, but some other, unknown
mkstemp failure mode that did not occur when I ran git again. This code
could perhaps be made more robust by retrying mkstemp, in case it was a
transient failure.
Signed-off-by: Joey Hess <joey@kitenet.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In the case of bad packed object CRC, unuse_pack wasn't called after
check_pack_crc which calls use_pack.
Signed-off-by: Alex Riesen <raa.lkml@gmail.com>
Acked-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* np/pack-safer:
t5303: fix printf format string for portability
t5303: work around printf breakage in dash
pack-objects: don't leak pack window reference when splitting packs
extend test coverage for latest pack corruption resilience improvements
pack-objects: allow "fixing" a corrupted pack without a full repack
make find_pack_revindex() aware of the nasty world
make check_object() resilient to pack corruptions
make packed_object_info() resilient to pack corruptions
make unpack_object_header() non fatal
better validation on delta base object offsets
close another possibility for propagating pack corruption
* bc/maint-keep-pack:
t7700: test that 'repack -a' packs alternate packed objects
pack-objects: extend --local to mean ignore non-local loose objects too
sha1_file.c: split has_loose_object() into local and non-local counterparts
t7700: demonstrate mishandling of loose objects in an alternate ODB
builtin-gc.c: use new pack_keep bitfield to detect .keep file existence
repack: do not fall back to incremental repacking with [-a|-A]
repack: don't repack local objects in packs with .keep file
pack-objects: new option --honor-pack-keep
packed_git: convert pack_local flag into a bitfield and add pack_keep
t7700: demonstrate mishandling of objects in packs with a .keep file
This can potentially be used in a few places, so let's make
it available to all parts of the code.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
pack_keep will be set when a pack file has an associated .keep file.
Signed-off-by: Brandon Casey <casey@nrlssc.navy.mil>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It currently calls die() whenever given offset is not found thinking
that such thing should never happen. But this offset may come from a
corrupted pack whych _could_ happen and not be found. Callers should
deal with this possibility gracefully instead.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In the same spirit as commit 8eca0b47ff, let's try to survive a pack
corruption by making packed_object_info() able to fall back to alternate
packs or loose objects.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It is possible to have pack corruption in the object header. Currently
unpack_object_header() simply die() on them instead of letting the caller
deal with that gracefully.
So let's have unpack_object_header() return an error instead, and find
a better name for unpack_object_header_gently() in that context. All
callers of unpack_object_header() are ready for it.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>