Commit Graph

145 Commits

Author SHA1 Message Date
Junio C Hamano
f4315eed7f Merge branch 'jk/pack-tag-of-tag' into maint
"git pack-objects --include-tag" was taught that when we know that
we are sending an object C, we want a tag B that directly points at
C but also a tag A that points at the tag B.  We used to miss the
intermediate tag B in some cases.

* jk/pack-tag-of-tag:
  pack-objects: walk tag chains for --include-tag
  t5305: simplify packname handling
  t5305: use "git -C"
  t5305: drop "dry-run" of unpack-objects
  t5305: move cleanup into test block
2016-10-03 13:22:13 -07:00
Jeff King
b773ddea2c pack-objects: walk tag chains for --include-tag
When pack-objects is given --include-tag, it peels each tag
ref down to a non-tag object, and if that non-tag object is
going to be packed, we include the tag, too. But what
happens if we have a chain of tags (e.g., tag "A" points to
tag "B", which points to commit "C")?

We'll peel down to "C" and realize that we want to include
tag "A", but we do not ever consider tag "B", leading to a
broken pack (assuming "B" was not otherwise selected).
Instead, we have to walk the whole chain, adding any tags we
find to the pack.

Interestingly, it doesn't seem possible to trigger this
problem with "git fetch", but you can with "git clone
--single-branch". The reason is that we generate the correct
pack when the client explicitly asks for "A" (because we do
a real reachability analysis there), and "fetch" is more
willing to do so. There are basically two cases:

  1. If "C" is already a ref tip, then the client can deduce
     that it needs "A" itself (via find_non_local_tags), and
     will ask for it explicitly rather than relying on the
     include-tag capability. Everything works.

  2. If "C" is not already a ref tip, then we hope for
     include-tag to send us the correct tag. But it doesn't;
     it generates a broken pack. However, the next step is
     to do a follow-up run of find_non_local_tags(),
     followed by fetch_refs() to backfill any tags we
     learned about.

     In the normal case, fetch_refs() calls quickfetch(),
     which does a connectivity check and sees we have no
     new objects to fetch. We just write the refs.

     But for the broken-pack case, the connectivity check
     fails, and quickfetch will follow-up with the remote,
     asking explicitly for each of the ref tips. This picks
     up the missing object in a new pack.

For a regular "git clone", we are similarly OK, because we
explicitly request all of the tag refs, and get a correct
pack. But with "--single-branch", we kick in tag
auto-following via "include-tag", but do _not_ do a
follow-up backfill. We just take whatever the server sent us
via include-tag and write out tag refs for any tag objects
we were sent. So prior to c6807a4 (clone: open a shortcut
for connectivity check, 2013-05-26), we actually claimed the
clone was a success, but the result was silently
corrupted!  Since c6807a4, index-pack's connectivity
check catches this case, and we correctly complain.

The included test directly checks that pack-objects does not
generate a broken pack, but also confirms that "clone
--single-branch" does not hit the bug.

Note that tag chains introduce another interesting question:
if we are packing the tag "B" but not the commit "C", should
"A" be included?

Both before and after this patch, we do not include "A",
because the initial peel_ref() check only knows about the
bottom-most level, "C". To realize that "B" is involved at
all, we would have to switch to an incremental peel, in
which we examine each tagged object, asking if it is being
packed (and including the outer tag if so).

But that runs contrary to the optimizations in peel_ref(),
which avoid accessing the objects at all, in favor of using
the value we pull from packed-refs. It's OK to walk the
whole chain once we know we're going to include the tag (we
have to access it anyway, so the effort is proportional to
the pack we're generating). But for the initial selection,
we have to look at every ref. If we're only packing a few
objects, we'd still have to parse every single referenced
tag object just to confirm that it isn't part of a tag
chain.

This could be addressed if packed-refs stored the complete
tag chain for each peeled ref (in most cases, this would be
the same cost as now, as each "chain" is only a single
link). But given the size of that project, it's out of scope
for this fix (and probably nobody cares enough anyway, as
it's such an obscure situation). This commit limits itself
to just avoiding the creation of a broken pack.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-07 11:45:31 -07:00
Junio C Hamano
78849622ec Merge branch 'jk/pack-objects-optim'
"git pack-objects" has a few options that tell it not to pack
objects found in certain packfiles, which require it to scan .idx
files of all available packs.  The codepaths involved in these
operations have been optimized for a common case of not having any
non-local pack and/or any .kept pack.

* jk/pack-objects-optim:
  pack-objects: compute local/ignore_pack_keep early
  pack-objects: break out of want_object loop early
  find_pack_entry: replace last_found_pack with MRU cache
  add generic most-recently-used list
  sha1_file: drop free_pack_by_name
  t/perf: add tests for many-pack scenarios
2016-08-08 14:48:39 -07:00
Junio C Hamano
aa9136a87e Merge branch 'nd/pack-ofs-4gb-limit' into maint
"git pack-objects" and "git index-pack" mostly operate with off_t
when talking about the offset of objects in a packfile, but there
were a handful of places that used "unsigned long" to hold that
value, leading to an unintended truncation.

* nd/pack-ofs-4gb-limit:
  fsck: use streaming interface for large blobs in pack
  pack-objects: do not truncate result in-pack object size on 32-bit systems
  index-pack: correct "offset" type in unpack_entry_data()
  index-pack: report correct bad object offsets even if they are large
  index-pack: correct "len" type in unpack_data()
  sha1_file.c: use type off_t* for object_info->disk_sizep
  pack-objects: pass length to check_pack_crc() without truncation
2016-08-08 14:21:36 -07:00
Jeff King
56dfeb6263 pack-objects: compute local/ignore_pack_keep early
In want_object_in_pack(), we can exit early from our loop if
neither "local" nor "ignore_pack_keep" are set. If they are,
however, we must examine each pack to see if it has the
object and is non-local or has a ".keep".

It's quite common for there to be no non-local or .keep
packs at all, in which case we know ahead of time that
looking further will be pointless. We can pre-compute this
by simply iterating over the list of packs ahead of time,
and dropping the flags if there are no packs that could
match.

Another similar strategy would be to modify the loop in
want_object_in_pack() to notice that we have already found
the object once, and that we are looping only to check for
"local" and "keep" attributes. If a pack has neither of
those, we can skip the call to find_pack_entry_one(), which
is the expensive part of the loop.

This has two advantages:

  - it isn't all-or-nothing; we still get some improvement
    when there's a small number of kept or non-local packs,
    and a large number of non-kept local packs

  - it eliminates any possible race where we add new
    non-local or kept packs after our initial scan. In
    practice, I don't think this race matters; we already
    cache the packed_git information, so somebody who adds a
    new pack or .keep file after we've started will not be
    noticed at all, unless we happen to need to call
    reprepare_packed_git() because a lookup fails.

    In other words, we're already racy, and the race is not
    a big deal (losing the race means we might include an
    object in the pack that would not otherwise be, which is
    an acceptable outcome).

However, it also has a disadvantage: we still loop over the
rest of the packs for each object to check their flags. This
is much less expensive than doing the object lookup, but
still not free. So if we wanted to implement that strategy
to cover the non-all-or-nothing cases, we could do so in
addition to this one (so you get the most speedup in the
all-or-nothing case, and the best we can do in the other
cases). But given that the all-or-nothing case is likely the
most common, it is probably not worth the trouble, and we
can revisit this later if evidence points otherwise.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-07-29 11:05:08 -07:00
Jeff King
cd37996795 pack-objects: break out of want_object loop early
When pack-objects collects the list of objects to pack
(either from stdin, or via its internal rev-list), it
filters each one through want_object_in_pack().

This function loops through each existing packfile, looking
for the object. When we find it, we mark the pack/offset
combo for later use. However, we can't just return "yes, we
want it" at that point. If --honor-pack-keep is in effect,
we must keep looking to find it in _all_ packs, to make sure
none of them has a .keep. Likewise, if --local is in effect,
we must make sure it is not present in any non-local pack.

As a result, the sum effort of these calls is effectively
O(nr_objects * nr_packs). In an ordinary repository, we have
only a handful of packs, and this doesn't make a big
difference. But in pathological cases, it can slow the
counting phase to a crawl.

This patch notices the case that we have neither "--local"
nor "--honor-pack-keep" in effect and breaks out of the loop
early, after finding the first instance. Note that our worst
case is still "objects * packs" (i.e., we might find each
object in the last pack we look in), but in practice we will
often break out early. On an "average" repo, my git.git with
8 packs, this shows a modest 2% (a few dozen milliseconds)
improvement in the counting-objects phase of "git
pack-objects --all <foo" (hackily instrumented by sticking
exit(0) right after list_objects).

But in a much more pathological case, it makes a bigger
difference. I ran the same command on a real-world example
with ~9 million objects across 1300 packs. The counting time
dropped from 413s to 45s, an improvement of about 89%.

Note that this patch won't do anything by itself for a
normal "git gc", as it uses both --honor-pack-keep and
--local.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-07-29 11:05:07 -07:00
Junio C Hamano
ad2d777604 Merge branch 'nd/pack-ofs-4gb-limit'
"git pack-objects" and "git index-pack" mostly operate with off_t
when talking about the offset of objects in a packfile, but there
were a handful of places that used "unsigned long" to hold that
value, leading to an unintended truncation.

* nd/pack-ofs-4gb-limit:
  fsck: use streaming interface for large blobs in pack
  pack-objects: do not truncate result in-pack object size on 32-bit systems
  index-pack: correct "offset" type in unpack_entry_data()
  index-pack: report correct bad object offsets even if they are large
  index-pack: correct "len" type in unpack_data()
  sha1_file.c: use type off_t* for object_info->disk_sizep
  pack-objects: pass length to check_pack_crc() without truncation
2016-07-28 10:34:42 -07:00
Nguyễn Thái Ngọc Duy
af92a645d3 pack-objects: do not truncate result in-pack object size on 32-bit systems
A typical diff will not show what's going on and you need to see full
functions. The core code is like this, at the end of of write_one()

	e->idx.offset = *offset;
	size = write_object(f, e, *offset);
	if (!size) {
		e->idx.offset = recursing;
		return WRITE_ONE_BREAK;
	}
	written_list[nr_written++] = &e->idx;

	/* make sure off_t is sufficiently large not to wrap */
	if (signed_add_overflows(*offset, size))
		die("pack too large for current definition of off_t");
	*offset += size;

Here we can see that the in-pack object size is returned by
write_object (or indirectly by write_reuse_object). And it's used to
calculate object offsets, which end up in the pack index file,
generated at the end.

If "size" overflows (on 32-bit sytems, unsigned long is 32-bit while
off_t can be 64-bit), we got wrong offsets and produce incorrect .idx
file, which may make it look like the .pack file is corrupted.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-07-13 09:15:17 -07:00
Nguyễn Thái Ngọc Duy
211c61c6cf pack-objects: pass length to check_pack_crc() without truncation
On 32 bit systems with large file support, unsigned long is 32-bit
while the two offsets in the subtraction expression (pack-objects has
the exact same expression as in sha1_file.c but not shown in diff) are
in 64-bit. If an in-pack object is larger than 2^32 len/datalen is
truncated and we get a misleading "error: bad packed object CRC for
..." as a result.

Use off_t for len and datalen. check_pack_crc() already accepts this
argument as off_t and can deal with 4+ GB.

Noticed-by: Christoph Michelbach <michelbach94@gmail.com>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-07-12 10:14:29 -07:00
Jeff King
e26a8c4721 repack: extend --keep-unreachable to loose objects
If you use "repack -adk" currently, we will pack all objects
that are already packed into the new pack, and then drop the
old packs. However, loose unreachable objects will be left
as-is. In theory these are meant to expire eventually with
"git prune". But if you are using "repack -k", you probably
want to keep things forever and therefore do not run "git
prune" at all. Meaning those loose objects may build up over
time and end up fooling any object-count heuristics (such as
the one done by "gc --auto", though since git-gc does not
support "repack -k", this really applies to whatever custom
scripts people might have driving "repack -k").

With this patch, we instead stuff any loose unreachable
objects into the pack along with the already-packed
unreachable objects. This may seem wasteful, but it is
really no more so than using "repack -k" in the first place.
We are at a slight disadvantage, in that we have no useful
ordering for the result, or names to hand to the delta code.
However, this is again no worse than what "repack -k" is
already doing for the packed objects. The packing of these
objects doesn't matter much because they should not be
accessed frequently (unless they actually _do_ become
referenced, but then they would get moved to a different
part of the packfile during the next repack).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-06-14 13:57:45 -07:00
Junio C Hamano
40cfc95856 Merge branch 'nd/error-errno'
The code for warning_errno/die_errno has been refactored and a new
error_errno() reporting helper is introduced.

* nd/error-errno: (41 commits)
  wrapper.c: use warning_errno()
  vcs-svn: use error_errno()
  upload-pack.c: use error_errno()
  unpack-trees.c: use error_errno()
  transport-helper.c: use error_errno()
  sha1_file.c: use {error,die,warning}_errno()
  server-info.c: use error_errno()
  sequencer.c: use error_errno()
  run-command.c: use error_errno()
  rerere.c: use error_errno() and warning_errno()
  reachable.c: use error_errno()
  mailmap.c: use error_errno()
  ident.c: use warning_errno()
  http.c: use error_errno() and warning_errno()
  grep.c: use error_errno()
  gpg-interface.c: use error_errno()
  fast-import.c: use error_errno()
  entry.c: use error_errno()
  editor.c: use error_errno()
  diff-no-index.c: use error_errno()
  ...
2016-05-17 14:38:28 -07:00
Junio C Hamano
54c2af5aa3 Merge branch 'ew/doc-split-pack-disables-bitmap'
Doc update.

* ew/doc-split-pack-disables-bitmap:
  pack-objects: warn on split packs disabling bitmaps
2016-05-10 13:40:28 -07:00
Nguyễn Thái Ngọc Duy
54d47394b4 builtin/pack-objects.c: use die_errno() and warning_errno()
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-05-09 12:29:08 -07:00
Eric Wong
9cea46cdda pack-objects: warn on split packs disabling bitmaps
It can be tempting for a server admin to want a stable set of
long-lived packs for dumb clients; but also want to enable bitmaps
to serve smart clients more quickly.

Unfortunately, such a configuration is impossible; so at least warn
users of this incompatibility since commit 21134714 (pack-objects:
turn off bitmaps when we split packs, 2014-10-16).

Tested the warning by inspecting the output of:

	make -C t t5310-pack-bitmaps.sh GIT_TEST_OPTS=-v

Signed-off-by: Eric Wong <normalperson@yhbt.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-04-28 09:58:14 -07:00
brian m. carlson
7d924c9139 struct name_entry: use struct object_id instead of unsigned char sha1[20]
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-04-25 14:23:42 -07:00
Junio C Hamano
11529ecec9 Merge branch 'jk/tighten-alloc'
Update various codepaths to avoid manually-counted malloc().

* jk/tighten-alloc: (22 commits)
  ewah: convert to REALLOC_ARRAY, etc
  convert ewah/bitmap code to use xmalloc
  diff_populate_gitlink: use a strbuf
  transport_anonymize_url: use xstrfmt
  git-compat-util: drop mempcpy compat code
  sequencer: simplify memory allocation of get_message
  test-path-utils: fix normalize_path_copy output buffer size
  fetch-pack: simplify add_sought_entry
  fast-import: simplify allocation in start_packfile
  write_untracked_extension: use FLEX_ALLOC helper
  prepare_{git,shell}_cmd: use argv_array
  use st_add and st_mult for allocation size computation
  convert trivial cases to FLEX_ARRAY macros
  use xmallocz to avoid size arithmetic
  convert trivial cases to ALLOC_ARRAY
  convert manual allocations to argv_array
  argv-array: add detach function
  add helpers for allocating flex-array structs
  harden REALLOC_ARRAY and xcalloc against size_t overflow
  tree-diff: catch integer overflow in combine_diff_path allocation
  ...
2016-02-26 13:37:16 -08:00
Jeff King
b32fa95fd8 convert trivial cases to ALLOC_ARRAY
Each of these cases can be converted to use ALLOC_ARRAY or
REALLOC_ARRAY, which has two advantages:

  1. It automatically checks the array-size multiplication
     for overflow.

  2. It always uses sizeof(*array) for the element-size,
     so that it can never go out of sync with the declared
     type of the array.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-02-22 14:51:09 -08:00
Jeff King
de1e67d070 list-objects: pass full pathname to callbacks
When we find a blob at "a/b/c", we currently pass this to
our show_object_fn callbacks as two components: "a/b/" and
"c". Callbacks which want the full value then call
path_name(), which concatenates the two. But this is an
inefficient interface; the path is a strbuf, and we could
simply append "c" to it temporarily, then roll back the
length, without creating a new copy.

So we could improve this by teaching the callsites of
path_name() this trick (and there are only 3). But we can
also notice that no callback actually cares about the
broken-down representation, and simply pass each callback
the full path "a/b/c" as a string. The callback code becomes
even simpler, then, as we do not have to worry about freeing
an allocated buffer, nor rolling back our modification to
the strbuf.

This is theoretically less efficient, as some callbacks
would not bother to format the final path component. But in
practice this is not measurable. Since we use the same
strbuf over and over, our work to grow it is amortized, and
we really only pay to memcpy a few bytes.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-02-12 12:51:17 -08:00
Jeff King
bd64516aca list-objects: drop name_path entirely
In the previous commit, we left name_path as a thin wrapper
around a strbuf. This patch drops it entirely. As a result,
every show_object_fn callback needs to be adjusted. However,
none of their code needs to be changed at all, because the
only use was to pass it to path_name(), which now handles
the bare strbuf.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-02-12 12:51:15 -08:00
brian m. carlson
ed1c9977cb Remove get_object_hash.
Convert all instances of get_object_hash to use an appropriate reference
to the hash member of the oid member of struct object.  This provides no
functional change, as it is essentially a macro substitution.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Jeff King <peff@peff.net>
2015-11-20 08:02:05 -05:00
brian m. carlson
f2fd0760f6 Convert struct object to object_id
struct object is one of the major data structures dealing with object
IDs.  Convert it to use struct object_id instead of an unsigned char
array.  Convert get_object_hash to refer to the new member as well.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Jeff King <peff@peff.net>
2015-11-20 08:02:05 -05:00
brian m. carlson
7999b2cf77 Add several uses of get_object_hash.
Convert most instances where the sha1 member of struct object is
dereferenced to use get_object_hash.  Most instances that are passed to
functions that have versions taking struct object_id, such as
get_sha1_hex/get_oid_hex, or instances that can be trivially converted
to use struct object_id instead, are not converted.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Jeff King <peff@peff.net>
2015-11-20 08:02:05 -05:00
Junio C Hamano
8746e30541 Merge branch 'ah/pack-objects-usage-strings'
Usage string fix.

* ah/pack-objects-usage-strings:
  pack-objects: place angle brackets around placeholders in usage strings
2015-09-01 16:31:12 -07:00
Alex Henrie
b8c1d27577 pack-objects: place angle brackets around placeholders in usage strings
Signed-off-by: Alex Henrie <alexhenrie24@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-08-28 11:59:10 -07:00
Charles Bailey
2a514ed805 parse-options: move unsigned long option parsing out of pack-objects.c
The unsigned long option parsing (including 'k'/'m'/'g' suffix
parsing) is more widely applicable.  Add support for OPT_MAGNITUDE
to parse-options.h and change pack-objects.c use this support.

The error behavior on parse errors follows that of OPT_INTEGER.  The
name of the option that failed to parse is reported with a brief
message describing the expect format for the option argument and
then the full usage message for the command invoked.

This differs from the previous behavior for OPT_ULONG used in
pack-objects for --max-pack-size and --window-memory which used to
display the value supplied in the error message and did not display
the full usage message.

Signed-off-by: Charles Bailey <cbailey32@bloomberg.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-06-22 15:07:21 -07:00
Michael Haggerty
d155254c73 builtin/pack-objects: rewrite to take an object_id argument
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-05-25 12:19:29 -07:00
Michael Haggerty
2b2a5be394 each_ref_fn: change to take an object_id parameter
Change typedef each_ref_fn to take a "const struct object_id *oid"
parameter instead of "const unsigned char *sha1".

To aid this transition, implement an adapter that can be used to wrap
old-style functions matching the old typedef, which is now called
"each_ref_sha1_fn"), and make such functions callable via the new
interface. This requires the old function and its cb_data to be
wrapped in a "struct each_ref_fn_sha1_adapter", and that object to be
used as the cb_data for an adapter function, each_ref_fn_adapter().

This is an enormous diff, but most of it consists of simple,
mechanical changes to the sites that call any of the "for_each_ref"
family of functions. Subsequent to this change, the call sites can be
rewritten one by one to use the new interface.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-05-25 12:19:27 -07:00
Junio C Hamano
cedeffeee0 Merge branch 'jk/sha1-file-reduce-useless-warnings'
* jk/sha1-file-reduce-useless-warnings:
  sha1_file: squelch "packfile cannot be accessed" warnings
2015-05-11 14:23:41 -07:00
Jeff King
319b678a7b sha1_file: squelch "packfile cannot be accessed" warnings
When we find an object in a packfile index, we make sure we
can still open the packfile itself (or that it is already
open), as it might have been deleted by a simultaneous
repack. If we can't access the packfile, we print a warning
for the user and tell the caller that we don't have the
object (we can then look in other packfiles, or find a loose
version, before giving up).

The warning we print to the user isn't really accomplishing
anything, and it is potentially confusing to users. In the
normal case, it is complete noise; we find the object
elsewhere, and the user does not have to care that we racily
saw a packfile index that became stale. It didn't affect the
operation at all.

A possibly more interesting case is when we later can't find
the object, and report failure to the user. In this case the
warning could be considered a clue toward that ultimate
failure. But it's not really a useful clue in practice. We
wouldn't even print it consistently (since we are racing
with another process, we might not even see the .idx file,
or we might win the race and open the packfile, completing
the operation).

This patch drops the warning entirely (not only from the
fill_pack_entry site, but also from an identical use in
pack-objects). If we did find the warning interesting in the
error case, we could stuff it away and reveal it to the user
when we later die() due to the broken object. But that
complexity just isn't worth it.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-03-30 21:47:39 -07:00
Junio C Hamano
6902c4da58 Merge branch 'rs/deflate-init-cleanup'
Code simplification.

* rs/deflate-init-cleanup:
  zlib: initialize git_zstream in git_deflate_init{,_gzip,_raw}
2015-03-17 16:01:26 -07:00
René Scharfe
9a6f1287fb zlib: initialize git_zstream in git_deflate_init{,_gzip,_raw}
Clear the git_zstream variable at the start of git_deflate_init() etc.
so that callers don't have to do that.

Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-03-05 15:46:03 -08:00
brian m. carlson
2dacf26d09 pack-objects: use --objects-edge-aggressive for shallow repos
When fetching into or pushing from a shallow repository, we want to
aggressively mark edges as uninteresting, since this decreases the pack
size.  However, aggressively marking edges can negatively affect
performance on large non-shallow repositories with lots of refs.

Teach pack-objects a --shallow option to indicate that we're pushing
from or fetching into a shallow repository.  Use
--objects-edge-aggressive only for shallow repositories and otherwise
use --objects-edge, which performs better in the general case.  Update
the callers to pass the --shallow option when they are dealing with a
shallow repository.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-12-29 09:58:25 -08:00
brian m. carlson
1684c1b219 rev-list: add an option to mark fewer edges as uninteresting
In commit fbd4a70 (list-objects: mark more commits as edges in
mark_edges_uninteresting - 2013-08-16), we marked an increasing number
of edges uninteresting.  This change, and the subsequent change to make
this conditional on --objects-edge, are used by --thin to make much
smaller packs for shallow clones.

Unfortunately, they cause a significant performance regression when
pushing non-shallow clones with lots of refs (23.322 seconds vs.
4.785 seconds with 22400 refs).  Add an option to git rev-list,
--objects-edge-aggressive, that preserves this more aggressive behavior,
while leaving --objects-edge to provide more performant behavior.
Preserve the current behavior for the moment by using the aggressive
option.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-12-29 09:57:55 -08:00
Junio C Hamano
d70e331c0e Merge branch 'jk/prune-mtime'
Tighten the logic to decide that an unreachable cruft is
sufficiently old by covering corner cases such as an ancient object
becoming reachable and then going unreachable again, in which case
its retention period should be prolonged.

* jk/prune-mtime: (28 commits)
  drop add_object_array_with_mode
  revision: remove definition of unused 'add_object' function
  pack-objects: double-check options before discarding objects
  repack: pack objects mentioned by the index
  pack-objects: use argv_array
  reachable: use revision machinery's --indexed-objects code
  rev-list: add --indexed-objects option
  rev-list: document --reflog option
  t5516: test pushing a tag of an otherwise unreferenced blob
  traverse_commit_list: support pending blobs/trees with paths
  make add_object_array_with_context interface more sane
  write_sha1_file: freshen existing objects
  pack-objects: match prune logic for discarding objects
  pack-objects: refactor unpack-unreachable expiration check
  prune: keep objects reachable from recent objects
  sha1_file: add for_each iterators for loose and packed objects
  count-objects: use for_each_loose_file_in_objdir
  count-objects: do not use xsize_t when counting object size
  prune-packed: use for_each_loose_file_in_objdir
  reachable: mark index blobs as SEEN
  ...
2014-10-29 10:07:56 -07:00
Junio C Hamano
e4da4fbe0e Merge branch 'eb/no-pthreads'
Allow us build with NO_PTHREADS=NoThanks compilation option.

* eb/no-pthreads:
  Handle atexit list internaly for unthreaded builds
  pack-objects: set number of threads before checking and warning
  index-pack: fix compilation with NO_PTHREADS
2014-10-24 14:59:10 -07:00
Junio C Hamano
26a22d8d00 Merge branch 'jk/pack-objects-no-bitmap-when-splitting'
Splitting pack-objects output into multiple packs is incompatible
with the use of reachability bitmap.

* jk/pack-objects-no-bitmap-when-splitting:
  pack-objects: turn off bitmaps when we split packs
2014-10-24 14:56:10 -07:00
Jeff King
2113471478 pack-objects: turn off bitmaps when we split packs
If a pack.packSizeLimit is set, we may split the pack data
across multiple packfiles. This means we cannot generate
.bitmap files, as they require that all of the reachable
objects are in the same pack. We check that condition when
we are generating the list of objects to pack (and disable
bitmaps if we are not packing everything), but we forgot to
update it when we notice that we needed to split (which
doesn't happen until the actual write phase).

The resulting bitmaps are quite bogus (they mention entries
that do not exist in the pack!) and can cause a fetch or
push to send insufficient objects.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-10-19 15:08:38 -07:00
Jeff King
b1e757f363 pack-objects: double-check options before discarding objects
When we are given an expiration time like
--unpack-unreachable=2.weeks.ago, we avoid writing out old,
unreachable loose objects entirely, under the assumption
that running "prune" would simply delete them immediately
anyway. However, this is only valid if we computed the same
set of reachable objects as prune would.

In practice, this is the case, because only git-repack uses
the --unpack-unreachable option with an expiration, and it
always feeds as many objects into the pack as possible. But
we can double-check at runtime just to be sure.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-10-19 15:07:07 -07:00
Jeff King
c90f9e13ab repack: pack objects mentioned by the index
When we pack all objects, we use only the objects reachable
from references and reflogs. This misses any objects which
are reachable from the index, but not yet referenced.

By itself this isn't a big deal; the objects can remain
loose until they are actually used in a commit. However, it
does create a problem when we drop packed but unreachable
objects. We try to optimize out the writing of objects that
we will immediately prune, which means we must follow the
same rules as prune in determining what is reachable. And
prune uses the index for this purpose.

This is rather uncommon in practice, as objects in the index
would not usually have been packed in the first place. But
it could happen in a sequence like:

  1. You make a commit on a branch that references blob X.

  2. You repack, moving X into the pack.

  3. You delete the branch (and its reflog), so that X is
     unreferenced.

  4. You "git add" blob X so that it is now referenced only
     by the index.

  5. You repack again with git-gc. The pack-objects we
     invoke will see that X is neither referenced nor
     recent and not bother loosening it.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-10-19 15:07:07 -07:00
Jeff King
edfbb2aa53 pack-objects: use argv_array
This saves us from having to bump the rp_av count when we
add new traversal options.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-10-19 15:07:07 -07:00
Jeff King
abcb86553d pack-objects: match prune logic for discarding objects
A recent commit taught git-prune to keep non-recent objects
that are reachable from recent ones. However, pack-objects,
when loosening unreachable objects, tries to optimize out
the write in the case that the object will be immediately
pruned. It now gets this wrong, since its rule does not
reflect the new prune code (and this can be seen by running
t6501 with a strategically placed repack).

Let's teach pack-objects similar logic.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-10-16 10:10:43 -07:00
Jeff King
d0d46abc16 pack-objects: refactor unpack-unreachable expiration check
When we are loosening unreachable packed objects, we do not
bother to process objects that would simply be pruned
immediately anyway. The "would be pruned" check is a simple
comparison, but is about to get more complicated. Let's pull
it out into a separate function.

Note that this is slightly less efficient than the original,
which avoided even opening old packs, since no object in
them could pass the current check, which cares only about
the pack mtime.  But the new rules will depend on the exact
object, so we need to perform the check even for old packs.

Note also that we fix a minor buglet when the pack mtime is
exactly the same as the expiration time. The prune code
considers that worth pruning, whereas our check here
considered it worth keeping. This wasn't a big deal. Besides
being unlikely to happen, the result was simply that the
object was loosened and then pruned, missing the
optimization. Still, we can easily fix it while we are here.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-10-16 10:10:42 -07:00
Junio C Hamano
0c45d258ec pack-objects: set number of threads before checking and warning
Under NO_PTHREADS build, we warn when delta_search_threads is not
set to 1, because that is the only sensible value on a single
threaded build.

However, the auto detection that kicks in when that variable is set
to 0 (e.g. there is no configuration variable or command line option,
or an explicit --threads=0 is given from the command line to override
the pack.threads configuration to force auto-detection) was not done
before the condition to issue this warning was tested.

Move the auto-detection code and place it at an appropriate spot.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-10-13 12:53:46 -07:00
René Scharfe
2756ca4347 use REALLOC_ARRAY for changing the allocation size of arrays
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-09-18 09:13:42 -07:00
Junio C Hamano
a3d54f9a1f Merge branch 'jk/pack-shallow-always-without-bitmap'
Reachability bitmaps do not work with shallow operations.
Fixes regression in 2.0.

* jk/pack-shallow-always-without-bitmap:
  pack-objects: turn off bitmaps when we see --shallow lines
2014-08-26 11:16:25 -07:00
Jeff King
f7f91086a3 pack-objects: turn off bitmaps when we see --shallow lines
Reachability bitmaps do not work with shallow operations,
because they cache a view of the object reachability that
represents the true objects. Whereas a shallow repository
(or a shallow operation in a repository) is inherently
cutting off the object graph with a graft.

We explicitly disallow the use of bitmaps in shallow
repositories by checking is_repository_shallow(), and we
should continue to do that. However, we also want to
disallow bitmaps when we are serving a fetch to a shallow
client, since we momentarily take on their grafted view of
the world.

It used to be enough to call is_repository_shallow at the
start of pack-objects.  Upload-pack wrote the other side's
shallow state to a temporary file and pointed the whole
pack-objects process at this state with "git --shallow-file",
and from the perspective of pack-objects, we really were
in a shallow repo.  But since b790e0f (upload-pack: send
shallow info over stdin to pack-objects, 2014-03-11), we do
it differently: we send --shallow lines to pack-objects over
stdin, and it registers them itself.

This means that our is_repository_shallow check is way too
early (we have not been told about the shallowness yet), and
that it is insufficient (calling is_repository_shallow is
not enough, as the shallow grafts we register do not change
its return value). Instead, we can just turn off bitmaps
explicitly when we see these lines.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-08-12 12:17:19 -07:00
Junio C Hamano
25f3119000 Merge branch 'jk/repack-pack-writebitmaps-config'
* jk/repack-pack-writebitmaps-config:
  t7700: drop explicit --no-pack-kept-objects from .keep test
  repack: introduce repack.writeBitmaps config option
  repack: simplify handling of --write-bitmap-index
  pack-objects: stop respecting pack.writebitmaps
2014-06-25 12:23:19 -07:00
Jeff King
15a906c5e9 pack-objects: stop respecting pack.writebitmaps
The handling of the pack.writebitmaps config option
originally happened in pack-objects, which is quite
low-level. It would make more sense for drivers of
pack-objects to read the config, and then manipulate
pack-objects with command-line options.

Recently, repack learned to do so, making the low-level read
of pack.writebitmaps redundant here. Other callers, like
upload-pack, would not generally want to write bitmaps
anyway.

This could be considered a regression for somebody who is
driving pack-objects themselves outside of repack and
expects the config option to be used. However, such users
seem rather unlikely given how new the bitmap code is (and
the fact that they would basically be reimplementing repack
in the first place).

Note that we do not do anything with pack.writeBitmapHashCache
here. That option is not about "do we write bimaps", but
rather "when we are writing bitmaps, how do we do it?". You
would want that to kick in anytime you decide to write them,
similar to how pack.indexVersion is used.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-06-10 14:01:53 -07:00
Junio C Hamano
967f8c9184 Merge branch 'jk/pack-bitmap'
* jk/pack-bitmap:
  pack-objects: do not reuse packfiles without --delta-base-offset
  add `ignore_missing_links` mode to revwalk
2014-04-08 12:00:33 -07:00
Junio C Hamano
d59c12d7ad Merge branch 'jl/nor-or-nand-and'
Eradicate mistaken use of "nor" (that is, essentially "nor" used
not in "neither A nor B" ;-)) from in-code comments, command output
strings, and documentations.

* jl/nor-or-nand-and:
  code and test: fix misuses of "nor"
  comments: fix misuses of "nor"
  contrib: fix misuses of "nor"
  Documentation: fix misuses of "nor"
2014-04-08 12:00:28 -07:00