Commit Graph

56 Commits

Author SHA1 Message Date
Junio C Hamano
cf0b1793ea Merge branch 'sb/object-store'
Refactoring the internal global data structure to make it possible
to open multiple repositories, work with and then close them.

Rerolled by Duy on top of a separate preliminary clean-up topic.
The resulting structure of the topics looked very sensible.

* sb/object-store: (27 commits)
  sha1_file: allow sha1_loose_object_info to handle arbitrary repositories
  sha1_file: allow map_sha1_file to handle arbitrary repositories
  sha1_file: allow map_sha1_file_1 to handle arbitrary repositories
  sha1_file: allow open_sha1_file to handle arbitrary repositories
  sha1_file: allow stat_sha1_file to handle arbitrary repositories
  sha1_file: allow sha1_file_name to handle arbitrary repositories
  sha1_file: add repository argument to sha1_loose_object_info
  sha1_file: add repository argument to map_sha1_file
  sha1_file: add repository argument to map_sha1_file_1
  sha1_file: add repository argument to open_sha1_file
  sha1_file: add repository argument to stat_sha1_file
  sha1_file: add repository argument to sha1_file_name
  sha1_file: allow prepare_alt_odb to handle arbitrary repositories
  sha1_file: allow link_alt_odb_entries to handle arbitrary repositories
  sha1_file: add repository argument to prepare_alt_odb
  sha1_file: add repository argument to link_alt_odb_entries
  sha1_file: add repository argument to read_info_alternates
  sha1_file: add repository argument to link_alt_odb_entry
  sha1_file: add raw_object_store argument to alt_odb_usable
  pack: move approximate object count to object store
  ...
2018-04-11 13:09:55 +09:00
Stefan Beller
a80d72db2a object-store: move packed_git and packed_git_mru to object store
In a process with multiple repositories open, packfile accessors
should be associated to a single repository and not shared globally.
Move packed_git and packed_git_mru into the_repository and adjust
callers to reflect this.

[nd: while at there, wrap access to these two fields in get_packed_git()
and get_packed_git_mru(). This allows us to lazily initialize these
fields without caller doing that explicitly]

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-03-26 10:05:46 -07:00
brian m. carlson
17e65451e3 sha1_file: convert check_sha1_signature to struct object_id
Convert this function to take a pointer to struct object_id and rename
it check_object_signature.  Introduce temporaries to convert the return
values of lookup_replace_object and lookup_replace_object_extended into
struct object_id.

The temporaries are needed because in order to convert
lookup_replace_object, open_istream needs to be converted, and
open_istream needs check_sha1_signature to be converted, causing a loop
of dependencies.  The temporaries will be removed in a future patch.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-03-14 09:23:49 -07:00
Junio C Hamano
169c9c0169 Merge branch 'bw/c-plus-plus'
Avoid using identifiers that clash with C++ keywords.  Even though
it is not a goal to compile Git with C++ compilers, changes like
this help use of code analysis tools that targets C++ on our
codebase.

* bw/c-plus-plus: (37 commits)
  replace: rename 'new' variables
  trailer: rename 'template' variables
  tempfile: rename 'template' variables
  wrapper: rename 'template' variables
  environment: rename 'namespace' variables
  diff: rename 'template' variables
  environment: rename 'template' variables
  init-db: rename 'template' variables
  unpack-trees: rename 'new' variables
  trailer: rename 'new' variables
  submodule: rename 'new' variables
  split-index: rename 'new' variables
  remote: rename 'new' variables
  ref-filter: rename 'new' variables
  read-cache: rename 'new' variables
  line-log: rename 'new' variables
  imap-send: rename 'new' variables
  http: rename 'new' variables
  entry: rename 'new' variables
  diffcore-delta: rename 'new' variables
  ...
2018-03-06 14:54:07 -08:00
Brandon Williams
debca9d2fe object: rename function 'typename' to 'type_name'
Rename C++ keyword in order to bring the codebase closer to being able
to be compiled with a C++ compiler.

Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-02-14 13:10:05 -08:00
brian m. carlson
ccc12e0676 pack-check: convert various uses of SHA-1 to abstract forms
Convert various explicit calls to use SHA-1 functions and constants to
references to the_hash_algo.  Make several strings more generic with
respect to the hash algorithm used.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-02-02 11:28:41 -08:00
Jonathan Tan
0317f45576 pack: move open_pack_index(), parse_pack_index()
alloc_packed_git() in packfile.c is duplicated from sha1_file.c. In a
subsequent commit, alloc_packed_git() will be removed from sha1_file.c.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-08-23 15:12:06 -07:00
brian m. carlson
9fd750461b Convert the verify_pack callback to struct object_id
Make the verify_pack_callback take a pointer to struct object_id.
Change the pack checksum to use GIT_MAX_RAWSZ, even though it is not
strictly an object ID.  Doing so ensures resilience against future hash
size changes, and allows us to remove hard-coded assumptions about how
big the buffer needs to be.

Also, use a union to convert the pointer from nth_packed_object_sha1 to
to a pointer to struct object_id.  This behavior is compatible with GCC
and clang and explicitly sanctioned by C11.  The alternatives are to
just perform a cast, which would run afoul of strict aliasing rules, but
should just work, and changing the pointer into an instance of struct
object_id and copying the value.  The latter operation could seriously
bloat memory usage on fsck, which already uses a lot of memory on some
repositories.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-05-08 15:12:57 +09:00
Junio C Hamano
b8688adb12 Merge branch 'rs/qsort'
We call "qsort(array, nelem, sizeof(array[0]), fn)", and most of
the time third parameter is redundant.  A new QSORT() macro lets us
omit it.

* rs/qsort:
  show-branch: use QSORT
  use QSORT, part 2
  coccicheck: use --all-includes by default
  remove unnecessary check before QSORT
  use QSORT
  add QSORT
2016-10-10 14:03:46 -07:00
Junio C Hamano
5c2c2d4aee Merge branch 'jk/verify-packfile-gently'
A low-level function verify_packfile() was meant to show errors
that were detected without dying itself, but under some conditions
it didn't and died instead, which has been fixed.

* jk/verify-packfile-gently:
  verify_packfile: check pack validity before accessing data
2016-09-29 16:57:13 -07:00
René Scharfe
9ed0d8d6e6 use QSORT
Apply the semantic patch contrib/coccinelle/qsort.cocci to the code
base, replacing calls of qsort(3) with QSORT.  The resulting code is
shorter and supports empty arrays with NULL pointers.

Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-29 15:42:18 -07:00
Jeff King
a9445d859e verify_packfile: check pack validity before accessing data
The verify_packfile() does not explicitly open the packfile;
instead, it starts with a sha1 checksum over the whole pack,
and relies on use_pack() to open the packfile as a side
effect.

If the pack cannot be opened for whatever reason (either
because its header information is corrupted, or perhaps
because a simultaneous repack deleted it), then use_pack()
will die(), as it has no way to return an error. This is not
ideal, as verify_packfile() otherwise tries to gently return
an error (this lets programs like git-fsck go on to check
other packs).

Instead, let's check is_pack_valid() up front, and return an
error if it fails. This will open the pack as a side effect,
and then use_pack() will later rely on our cached
descriptor, and avoid calling die().

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-22 11:18:13 -07:00
Nguyễn Thái Ngọc Duy
ec9d224903 fsck: use streaming interface for large blobs in pack
For blobs, we want to make sure the on-disk data is not corrupted
(i.e. can be inflated and produce the expected SHA-1). Blob content is
opaque, there's nothing else inside to check for.

For really large blobs, we may want to avoid unpacking the entire blob
in memory, just to check whether it produces the same SHA-1. On 32-bit
systems, we may not have enough virtual address space for such memory
allocation. And even on 64-bit where it's not a problem, allocating a
lot more memory could result in kicking other parts of systems to swap
file, generating lots of I/O and slowing everything down.

For this particular operation, not unpacking the blob and letting
check_sha1_signature, which supports streaming interface, do the job
is sufficient. check_sha1_signature() is not shown in the diff,
unfortunately. But if will be called when "data_valid && !data" is
false.

We will call the callback function "fn" with NULL as "data". The only
callback of this function is fsck_obj_buffer(), which does not touch
"data" at all if it's a blob.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-07-13 09:15:29 -07:00
Jeff King
b32fa95fd8 convert trivial cases to ALLOC_ARRAY
Each of these cases can be converted to use ALLOC_ARRAY or
REALLOC_ARRAY, which has two advantages:

  1. It automatically checks the array-size multiplication
     for overflow.

  2. It always uses sizeof(*array) for the element-size,
     so that it can never go out of sync with the declared
     type of the array.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-02-22 14:51:09 -08:00
David Turner
8c24d832cd verify_pack: do not ignore return value of verification function
In verify_pack, a caller-supplied verification function is called.
The function returns an int.  If that return value is non-zero,
verify_pack should fail.

The only caller of verify_pack is in builtin/fsck.c, whose verify_fn
returns a meaningful error code (which was then ignored).  Now, fsck
might return a different error code (with more detail).  This would
happen in the unlikely event that a commit or tree that is a valid git
object but not a valid instance of its type gets into a pack.

Signed-off-by: David Turner <dturner@twopensource.com>
Signed-off-by: Jeff King <peff@peff.net>
2015-12-01 18:19:35 -05:00
Nguyễn Thái Ngọc Duy
1e49f22f07 fsck: print progress
fsck is usually a long process and it would be nice if it prints
progress from time to time.

Progress meter is not printed when --verbose is given because
--verbose prints a lot, there's no need for "alive" indicator.
Progress meter may provide "% complete" information but it would
be lost anyway in the flood of text.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-11-06 20:31:28 -08:00
Nguyễn Thái Ngọc Duy
c9486eb04d fsck: avoid reading every object twice
During verify_pack() all objects are read for SHA-1 check. Then
fsck_sha1() is called on every object, which read the object again
(fsck_sha1 -> parse_object -> read_sha1_file).

Avoid reading an object twice, do fsck_sha1 while we have an object
uncompressed data in verify_pack.

On git.git, with this patch I got:

$ /usr/bin/time ./git fsck >/dev/null
98.97user 0.90system 1:40.01elapsed 99%CPU (0avgtext+0avgdata 616624maxresident)k
0inputs+0outputs (0major+194186minor)pagefaults 0swaps

Without it:

$ /usr/bin/time ./git fsck >/dev/null
231.23user 2.35system 3:53.82elapsed 99%CPU (0avgtext+0avgdata 636688maxresident)k
0inputs+0outputs (0major+461629minor)pagefaults 0swaps

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-11-06 20:31:28 -08:00
Nguyễn Thái Ngọc Duy
473935188c verify_packfile(): check as many object as possible in a pack
verify_packfile() checks for whole pack integerity first, then each
object individually. Once we get past whole pack check, we can
identify all objects in the pack. If there's an error with one object,
we should continue to check the next objects to salvage as many
objects as possible instead of stopping the process.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-11-06 20:31:28 -08:00
Junio C Hamano
ef49a7a012 zlib: zlib can only process 4GB at a time
The size of objects we read from the repository and data we try to put
into the repository are represented in "unsigned long", so that on larger
architectures we can handle objects that weigh more than 4GB.

But the interface defined in zlib.h to communicate with inflate/deflate
limits avail_in (how many bytes of input are we calling zlib with) and
avail_out (how many bytes of output from zlib are we ready to accept)
fields effectively to 4GB by defining their type to be uInt.

In many places in our code, we allocate a large buffer (e.g. mmap'ing a
large loose object file) and tell zlib its size by assigning the size to
avail_in field of the stream, but that will truncate the high octets of
the real size. The worst part of this story is that we often pass around
z_stream (the state object used by zlib) to keep track of the number of
used bytes in input/output buffer by inspecting these two fields, which
practically limits our callchain to the same 4GB limit.

Wrap z_stream in another structure git_zstream that can express avail_in
and avail_out in unsigned long. For now, just die() when the caller gives
a size that cannot be given to a single zlib call. In later patches in the
series, we would make git_inflate() and git_deflate() internally loop to
give callers an illusion that our "improved" version of zlib interface can
operate on a buffer larger than 4GB in one go.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-06-10 11:52:15 -07:00
Stephen Boyd
1e4cd68c00 sparse: Fix errors and silence warnings
* load_file() returns a void pointer but is using 0 for the return
   value

 * builtin/receive-pack.c forgot to include builtin.h

 * packet_trace_prefix can be marked static

 * ll_merge takes a pointer for its last argument, not an int

 * crc32 expects a pointer as the second argument but Z_NULL is defined
   to be 0 (see 38f4d13 sparse fix: Using plain integer as NULL pointer,
   2006-11-18 for more info)

Signed-off-by: Stephen Boyd <bebarino@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-04-03 10:14:53 -07:00
Jonathan Nieder
9cba13ca5d standardize brace placement in struct definitions
In a struct definitions, unlike functions, the prevailing style is for
the opening brace to go on the same line as the struct name, like so:

 struct foo {
	int bar;
	char *baz;
 };

Indeed, grepping for 'struct [a-z_]* {$' yields about 5 times as many
matches as 'struct [a-z_]*$'.

Linus sayeth:

 Heretic people all over the world have claimed that this inconsistency
 is ...  well ...  inconsistent, but all right-thinking people know that
 (a) K&R are _right_ and (b) K&R are right.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-03-16 12:49:02 -07:00
Ralf Wildenhues
22e5e58a3c Typos in code comments, an error message, documentation
Signed-off-by: Ralf Wildenhues <Ralf.Wildenhues@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-08-22 13:25:08 -07:00
Shawn O. Pearce
9b0aa72870 Extract verify_pack_index for reuse from verify_pack
The dumb HTTP transport should verify an index is completely valid
before trying to use it.  That requires checking the header/footer
but also checking the complete content SHA-1.  All of this logic is
already in the front half of verify_pack, so pull it out into a new
function that can be reused.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-04-19 17:56:13 -07:00
Mike Hommey
4277c6709d Don't expect verify_pack() callers to set pack_size
Since use_pack() will end up populating pack_size if it is not already set,
we can just adapt the code in verify_packfile() such that it doesn't require
pack_size to be set beforehand.

This allows callers not to have to set pack_size themselves, and we can thus
revert changes from 1c23d794 (Don't die in git-http-fetch when fetching packs).

Signed-off-by: Mike Hommey <mh@glandium.org>
Signed-off-by: Tay Ray Chuan <rctay89@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-06-06 10:56:27 -07:00
Nicolas Pitre
9126f0091f fix openssl headers conflicting with custom SHA1 implementations
On ARM I have the following compilation errors:

    CC fast-import.o
In file included from cache.h:8,
                 from builtin.h:6,
                 from fast-import.c:142:
arm/sha1.h:14: error: conflicting types for 'SHA_CTX'
/usr/include/openssl/sha.h:105: error: previous declaration of 'SHA_CTX' was here
arm/sha1.h:16: error: conflicting types for 'SHA1_Init'
/usr/include/openssl/sha.h:115: error: previous declaration of 'SHA1_Init' was here
arm/sha1.h:17: error: conflicting types for 'SHA1_Update'
/usr/include/openssl/sha.h:116: error: previous declaration of 'SHA1_Update' was here
arm/sha1.h:18: error: conflicting types for 'SHA1_Final'
/usr/include/openssl/sha.h:117: error: previous declaration of 'SHA1_Final' was here
make: *** [fast-import.o] Error 1

This is because openssl header files are always included in
git-compat-util.h since commit 684ec6c63c whenever NO_OPENSSL is not
set, which somehow brings in <openssl/sha1.h> clashing with the custom
ARM version.  Compilation of git is probably broken on PPC too for the
same reason.

Turns out that the only file requiring openssl/ssl.h and openssl/err.h
is imap-send.c.  But only moving those problematic includes there
doesn't solve the issue as it also includes cache.h which brings in the
conflicting local SHA1 header file.

As suggested by Jeff King, the best solution is to rename our references
to SHA1 functions and structure to something git specific, and define those
according to the implementation used.

Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2008-10-02 18:06:56 -07:00
Nicolas Pitre
c41a4a9468 verify-pack: check packed object CRC when using index version 2
To do so, check_pack_crc() moved from builtin-pack-objects.c to
pack-check.c where it is more logical to share.

Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-06-24 23:58:57 -07:00
Nicolas Pitre
77d3ecee85 move show_pack_info() where it belongs
This is called when verify_pack() has its verbose argument set, and
verbose in this context makes sense only for the actual 'git verify-pack'
command.  Therefore let's move show_pack_info() to builtin-verify-pack.c
instead and remove useless verbose argument from verify_pack().

Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-06-24 23:58:57 -07:00
Nicolas Pitre
99093238bb optimize verify-pack a bit
Using find_pack_entry_one() to get object offsets is rather suboptimal
when nth_packed_object_offset() can be used directly.

Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-06-24 23:58:57 -07:00
Nicolas Pitre
1f5c74f6cf call init_pack_revindex() lazily
This makes life much easier for next patch, as well as being more efficient
when the revindex is actually not used.

Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-06-23 21:25:20 -07:00
Nicolas Pitre
6241360498 make verify-pack a bit more useful with bad packs
When a pack gets corrupted, its SHA1 checksum will fail.  However, this
is more useful to let the test go on in order to find the actual
problem location than only complain about the SHA1 mismatch and
bail out.

Also, it is more useful to compare the stored pack SHA1 with the one in
the index file instead of the computed SHA1 since the computed SHA1
from a corrupted pack won't match the one stored in the index either.

Finally a few code and message cleanups were thrown in as a bonus.

Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-06-01 23:25:38 -07:00
Nicolas Pitre
5f4347bba3 add storage size output to 'git verify-pack -v'
This can possibly break external scripts that depend on the previous
output, but those script can't possibly be critical to Git usage, and
fixing them should be trivial.

Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-03-01 01:44:46 -08:00
Nicolas Pitre
70f5d5d31c fix unimplemented packed_object_info_detail() features
Since commit eb32d236df, there was a TODO
comment in packed_object_info_detail() about the SHA1 of base object to
OBJ_OFS_DELTA objects.  So here it is at last.

While at it, providing the actual storage size information as well is now
trivial.

Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-03-01 01:44:46 -08:00
Alexandre Julliard
3af51928ab pack-check: Sort entries by pack offset before unpacking them.
Because of the way objects are sorted in a pack, unpacking them in
disk order is much more efficient than random access. Tests on the
Wine repository show a gain in pack validation time of about 35%.

Signed-off-by: Alexandre Julliard <julliard@winehq.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-06-06 16:04:02 -07:00
Shawn O. Pearce
d079837eee Lazily open pack index files on demand
In some repository configurations the user may have many packfiles,
but all of the recent commits/trees/tags/blobs are likely to
be in the most recent packfile (the one with the newest mtime).
It is therefore common to be able to complete an entire operation
by accessing only one packfile, even if there are 25 packfiles
available to the repository.

Rather than opening and mmaping the corresponding .idx file for
every pack found, we now only open and map the .idx when we suspect
there might be an object of interest in there.

Of course we cannot known in advance which packfile contains an
object, so we still need to scan the entire packed_git list to
locate anything.  But odds are users want to access objects in the
most recently created packfiles first, and that may be all they
ever need for the current operation.

Junio observed in b867092f that placing recent packfiles before
older ones can slightly improve access times for recent objects,
without degrading it for historical object access.

This change improves upon Junio's observations by trying even harder
to avoid the .idx files that we won't need.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-05-26 20:28:08 -07:00
Nicolas Pitre
ddcf786fd7 fixes to output of git-verify-pack -v
Now that the default delta depth is 50, it is a good idea to also bump
MAX_CHAIN to 50.

While at it, make the display a bit prettier by making the MAX_CHAIN
limit inclusive, and display the number of deltas that are above that
limit at the end instead of the beginning.

Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-05-25 21:42:47 -07:00
Nicolas Pitre
57059091fa get rid of num_packed_objects()
The coming index format change doesn't allow for the number of objects
to be determined from the size of the index file directly.  Instead, Let's
initialize a field in the packed_git structure with the object count when
the index is validated since the count is always known at that point.

While at it let's reorder some struct packed_git fields to avoid padding
due to needed 64-bit alignment for some of them.

Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-04-10 12:48:14 -07:00
Nicolas Pitre
d72308e01c clean up and optimize nth_packed_object_sha1() usage
Let's avoid the open coded pack index reference in pack-object and use
nth_packed_object_sha1() instead.  This will help encapsulating index
format differences in one place.

And while at it there is no reason to copy SHA1's over and over while a
direct pointer to it in the index will do just fine.

Signed-off-by: Nicolas Pitre <nico@cam.org>
Acked-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-04-05 14:59:47 -07:00
Nicolas Pitre
4287307833 [PATCH] clean up pack index handling a bit
Especially with the new index format to come, it is more appropriate
to encapsulate more into check_packed_git_idx() and assume less of the
index format in struct packed_git.

To that effect, the index_base is renamed to index_data with void * type
so it is not used directly but other pointers initialized with it. This
allows for a couple pointer cast removal, as well as providing a better
generic name to grep for when adding support for new index versions or
formats.

And index_data is declared const too while at it.

Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-03-16 21:27:36 -07:00
Shawn O. Pearce
c4001d92be Use off_t when we really mean a file offset.
Not all platforms have declared 'unsigned long' to be a 64 bit value,
but we want to support a 64 bit packfile (or close enough anyway)
in the near future as some projects are getting large enough that
their packed size exceeds 4 GiB.

By using off_t, the POSIX type that is declared to mean an offset
within a file, we support whatever maximum file size the underlying
operating system will handle.  For most modern systems this is up
around 2^60 or higher.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-03-07 11:06:25 -08:00
Shawn O. Pearce
326bf39677 Use uint32_t for all packed object counts.
As we permit up to 2^32-1 objects in a single packfile we cannot
use a signed int to represent the object offset within a packfile,
after 2^31-1 objects we will start seeing negative indexes and
error out or compute bad addresses within the mmap'd index.

This is a minor cleanup that does not introduce any significant
logic changes.  It is roach free.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-03-07 11:02:33 -08:00
Nicolas Pitre
21666f1aae convert object type handling from a string to a number
We currently have two parallel notation for dealing with object types
in the code: a string and a numerical value.  One of them is obviously
redundent, and the most used one requires more stack space and a bunch
of strcmp() all over the place.

This is an initial step for the removal of the version using a char array
found in object reading code paths.  The patch is unfortunately large but
there is no sane way to split it in smaller parts without breaking the
system.

Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-02-27 01:34:21 -08:00
Shawn O. Pearce
079afb18fe Loop over pack_windows when inflating/accessing data.
When multiple mmaps start getting used for all pack file access it
is not possible to get all data associated with a specific object
in one contiguous memory region.  This limitation prevents simply
passing a single address and length to SHA1_Update or to inflate.

Instead we need to loop until we have processed all data of interest.

As we loop over the data we are always interested in reusing the same
window 'cursor', as the prior window will no longer be of any use
to us.  This allows the use_pack() call to automatically decrement
the use count of the prior window before setting up access for us
to the next window.

Within each loop we need to make use of the available length output
parameter of use_pack() to tell us how many bytes are available in
the current memory region, as we cannot tell otherwise.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-12-29 11:36:44 -08:00
Shawn O. Pearce
03e79c88aa Replace use_packed_git with window cursors.
Part of the implementation concept of the sliding mmap window for
pack access is to permit multiple windows per pack to be mapped
independently.  Since the inuse_cnt is associated with the mmap and
not with the file, this value is in struct pack_window and needs to
be incremented/decremented for each pack_window accessed by any code.

To faciliate that implementation we need to replace all uses of
use_packed_git() and unuse_packed_git() with a different API that
follows struct pack_window objects rather than struct packed_git.

The way this works is when we need to start accessing a pack for
the first time we should setup a new window 'cursor' by declaring
a local and setting it to NULL:

  struct pack_windows *w_curs = NULL;

To obtain the memory region which contains a specific section of
the pack file we invoke use_pack(), supplying the address of our
current window cursor:

  unsigned int len;
  unsigned char *addr = use_pack(p, &w_curs, offset, &len);

the returned address `addr` will be the first byte at `offset`
within the pack file.  The optional variable len will also be
updated with the number of bytes remaining following the address.

Multiple calls to use_pack() with the same window cursor will
update the window cursor, moving it from one window to another
when necessary.  In this way each window cursor variable maintains
only one struct pack_window inuse at a time.

Finally before exiting the scope which originally declared the window
cursor we must invoke unuse_pack() to unuse the current window (which
may be different from the one that was first obtained from use_pack):

  unuse_pack(&w_curs);

This implementation is still not complete with regards to multiple
windows, as only one window per pack file is supported right now.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-12-29 11:36:44 -08:00
Shawn O. Pearce
c41ee586dc Refactor packed_git to prepare for sliding mmap windows.
The idea behind the sliding mmap window pack reader implementation
is to have multiple mmap regions active against the same pack file,
thereby allowing the process to mmap in only the active/hot sections
of the pack and reduce overall virtual address space usage.

To implement this we need to refactor the mmap related data
(pack_base, pack_use_cnt) out of struct packed_git and move them
into a new struct pack_window.

We are refactoring the code to support a single struct pack_window
per packfile, thereby emulating the prior behavior of mmap'ing the
entire pack file.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-12-29 11:36:44 -08:00
Shawn O. Pearce
4d703a1a90 Replace unpack_entry_gently with unpack_entry.
The unpack_entry_gently function currently has only two callers:
the delta base resolution in sha1_file.c and the main loop of
pack-check.c.  Both of these must change to using unpack_entry
directly when we implement sliding window mmap logic, so I'm doing
it earlier to help break down the change set.

This may cause a slight performance decrease for delta base
resolution as well as for pack-check.c's verify_packfile(), as
the pack use counter will be incremented and decremented for every
object that is unpacked.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-12-29 11:36:44 -08:00
Nicolas Pitre
43057304c0 many cleanups to sha1_file.c
Those cleanups are mainly to set the table for the support of deltas
with base objects referenced by offsets instead of sha1.  This means
that many pack lookup functions are converted to take a pack/offset
tuple instead of a sha1.

This eliminates many struct pack_entry usages since this structure
carried redundent information in many cases, and it increased stack
footprint needlessly for a couple recursively called functions that used
to declare a local copy of it for every recursion loop.

In the process, packed_object_info_detail() has been reorganized as well
so to look much saner and more amenable to deltas with offset support.

Finally the appropriate adjustments have been made to functions that
depend on the above changes.  But there is no functionality changes yet
simply some code refactoring at this point.

Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-09-23 01:51:33 -07:00
David Rientjes
a89fccd281 Do not use memcmp(sha1_1, sha1_2, 20) with hardcoded length.
Introduces global inline:

	hashcmp(const unsigned char *sha1, const unsigned char *sha2)

Uses memcmp for comparison and returns the result based on the length of
the hash name (a future runtime decision).

Acked-by: Alex Riesen <raa.lkml@gmail.com>
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-08-17 14:23:53 -07:00
Florian Forster
1d7f171c3a Remove all void-pointer arithmetic.
ANSI C99 doesn't allow void-pointer arithmetic. This patch fixes this in
various ways. Usually the strategy that required the least changes was used.

Signed-off-by: Florian Forster <octo@verplant.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-06-20 01:59:46 -07:00
Junio C Hamano
55e1805dff verify-pack: check integrity in a saner order.
Check internal integrity to report corrupt pack or idx, and
then check cross-integrity between idx and pack.

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-04-27 15:42:17 -07:00
Junio C Hamano
473ab1659b verify-pack -v: show delta-chain histogram.
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-03-05 11:22:01 -08:00