When we pack all objects, we use only the objects reachable
from references and reflogs. This misses any objects which
are reachable from the index, but not yet referenced.
By itself this isn't a big deal; the objects can remain
loose until they are actually used in a commit. However, it
does create a problem when we drop packed but unreachable
objects. We try to optimize out the writing of objects that
we will immediately prune, which means we must follow the
same rules as prune in determining what is reachable. And
prune uses the index for this purpose.
This is rather uncommon in practice, as objects in the index
would not usually have been packed in the first place. But
it could happen in a sequence like:
1. You make a commit on a branch that references blob X.
2. You repack, moving X into the pack.
3. You delete the branch (and its reflog), so that X is
unreferenced.
4. You "git add" blob X so that it is now referenced only
by the index.
5. You repack again with git-gc. The pack-objects we
invoke will see that X is neither referenced nor
recent and not bother loosening it.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Code cleanup.
* jk/prune-packed-server-info:
repack: call prune_packed_objects() and update_server_info() directly
server-info: clean up after writing info/packs
make update-server-info more robust
prune-packed: fix minor memory leak
Call the functions behind git prune-packed and git update-server-info
directly instead of using run_command(). This is shorter, easier and
quicker.
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Most struct child_process variables are cleared using memset first after
declaration. Provide a macro, CHILD_PROCESS_INIT, that can be used to
initialize them statically instead. That's shorter, doesn't require a
function call and is slightly more readable (especially given that we
already have STRBUF_INIT, ARGV_ARRAY_INIT etc.).
Helped-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* jk/strip-suffix:
prepare_packed_git_one: refactor duplicate-pack check
verify-pack: use strbuf_strip_suffix
strbuf: implement strbuf_strip_suffix
index-pack: use strip_suffix to avoid magic numbers
use strip_suffix instead of ends_with in simple cases
replace has_extension with ends_with
implement ends_with via strip_suffix
add strip_suffix function
sha1_file: replace PATH_MAX buffer with strbuf in prepare_packed_git_one()
When stripping a suffix like:
if (ends_with(str, "foo"))
buf = xmemdupz(str, strlen(str) - 3);
we can instead use strip_suffix to avoid the constant 3,
which must match the literal "foo" (we sometimes use
strlen("foo") instead, but that means we are repeating
ourselves). The example above becomes:
if (strip_suffix(str, "foo", &len))
buf = xmemdupz(str, len);
This also saves a strlen(), since we calculate the string
length when detecting the suffix.
Note that in some cases we also switch from xstrndup to
xmemdupz, which saves a further strlen call.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Recent updates to "git repack" started to duplicate objects that
are in packfiles marked with .keep flag into the new packfile by
mistake.
* jk/repack-pack-keep-objects:
repack: s/write_bitmap/&s/ in code
repack: respect pack.writebitmaps
repack: do not accidentally pack kept objects by default
We currently have pack.writeBitmaps, which originally
operated at the pack-objects level. This should really have
been a repack.* option from day one. Let's give it the more
sensible name, but keep the old version as a deprecated
synonym.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We previously needed to pass --no-write-bitmap-index
explicitly to pack-objects to override its reading of
pack.writebitmaps from the config. Now that it no longer
does so, we can assume that bitmaps are off by default, and
only turn them on when necessary. This also lets us avoid a
confusing tri-state flag for write_bitmaps.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The config name is "writeBitmaps", so the internal variable
missing the plural is unnecessarily confusing to write.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The config option to turn on bitmaps is read all the way
down in the plumbing of pack-objects. This makes it hard for
other options in the porcelain of repack to make decisions
based on the bitmap setting. For example,
repack.packKeptObjects tries to kick in by default only when
bitmaps are turned on. But it can't do so reliably because
it doesn't yet know whether we are using bitmaps.
This patch teaches repack to respect pack.writebitmaps. It
means we pass a redundant command-line flag to pack-objects,
but that's OK; it shouldn't affect the outcome.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commit ee34a2b (repack: add `repack.packKeptObjects` config
var, 2014-03-03) added a flag which could duplicate kept
objects, but did not mean to turn it on by default. Instead,
the option is tied by default to the decision to write
bitmaps, like:
if (pack_kept_objects < 0)
pack_kept_objects = write_bitmap;
after which we expect pack_kept_objects to be a boolean 0 or
1. However, that assignment neglects that write_bitmap is
_also_ a tri-state with "-1" as the default, and with
neither option given, we accidentally turn the option on.
This patch is the minimal fix to restore the desired
behavior for the default state. Further patches will fix the
more complicated cases.
Note the update to t7700. It failed to turn on bitmaps,
meaning we were actually confirming the wrong behavior!
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The git-repack command always passes `--honor-pack-keep`
to pack-objects. This has traditionally been a good thing,
as we do not want to duplicate those objects in a new pack,
and we are not going to delete the old pack.
However, when bitmaps are in use, it is important for a full
repack to include all reachable objects, even if they may be
duplicated in a .keep pack. Otherwise, we cannot generate
the bitmaps, as the on-disk format requires the set of
objects in the pack to be fully closed.
Even if the repository does not generally have .keep files,
a simultaneous push could cause a race condition in which a
.keep file exists at the moment of a repack. The repack may
try to include those objects in one of two situations:
1. The pushed .keep pack contains objects that were
already in the repository (e.g., blobs due to a revert of
an old commit).
2. Receive-pack updates the refs, making the objects
reachable, but before it removes the .keep file, the
repack runs.
In either case, we may prefer to duplicate some objects in
the new, full pack, and let the next repack (after the .keep
file is cleaned up) take care of removing them.
This patch introduces both a command-line and config option
to disable the `--honor-pack-keep` option. By default, it
is triggered when pack.writeBitmaps (or `--write-bitmap-index`
is turned on), but specifying it explicitly can override the
behavior (e.g., in cases where you prefer .keep files to
bitmaps, but only when they are present).
Note that this option just disables the pack-objects
behavior. We still leave packs with a .keep in place, as we
do not necessarily know that we have duplicated all of their
objects.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Borrow the bitmap index into packfiles from JGit to speed up
enumeration of objects involved in a commit range without having to
fully traverse the history.
* jk/pack-bitmap: (26 commits)
ewah: unconditionally ntohll ewah data
ewah: support platforms that require aligned reads
read-cache: use get_be32 instead of hand-rolled ntoh_l
block-sha1: factor out get_be and put_be wrappers
do not discard revindex when re-preparing packfiles
pack-bitmap: implement optional name_hash cache
t/perf: add tests for pack bitmaps
t: add basic bitmap functionality tests
count-objects: recognize .bitmap in garbage-checking
repack: consider bitmaps when performing repacks
repack: handle optional files created by pack-objects
repack: turn exts array into array-of-struct
repack: stop using magic number for ARRAY_SIZE(exts)
pack-objects: implement bitmap writing
rev-list: add bitmap mode to speed up object lists
pack-objects: use bitmaps when packing objects
pack-objects: split add_object_entry
pack-bitmap: add support for bitmap indexes
documentation: add documentation for the bitmap format
ewah: compressed bitmap implementation
...
When a repo was fully repacked, and is repacked again, we may run
into the situation that "new" packfiles have the same name as
already existing ones (traditionally packfiles have been named after
the list of names of objects in them, so repacking all the objects
in a single pack would have produced a packfile with the same name).
The logic is to rename the existing ones into filename like
"old-XXX", create the new ones and then remove the "old-" ones.
When something went wrong in the middle, this sequence is rolled
back by renaming the "old-" files back.
The renaming into "old-" did not work as intended, because
file_exists() was done on "XXX", not "pack-XXX". Also when rolling
back the change, the code tried to rename "old-pack-XXX" but the
saved ones are named "old-XXX", so this couldn't have worked.
Signed-off-by: Torsten Bögershausen <tboegi@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git repack --max-pack-size=8g" stopped being parsed correctly when
the command was reimplemented in C.
* sb/repack-in-c:
repack: propagate pack-objects options as strings
repack: make parsed string options const-correct
repack: fix typo in max-pack-size option
In the original shell version of git-repack, any options
destined for pack-objects were left as strings, and passed
as a whole. Since the C rewrite in commit a1bbc6c (repack:
rewrite the shell script in C, 2013-09-15), we now parse
these values to integers internally, then reformat the
integers when passing the option to pack-objects.
This has the advantage that we catch format errors earlier
(i.e., when repack is invoked, rather than when pack-objects
is invoked).
It has three disadvantages, though:
1. Our internal data types may not be the right size. In
the case of "--window-memory" and "--max-pack-size",
these are "unsigned long" in pack-objects, but we can
only represent a regular "int".
2. Our parsing routines might not be the same as those of
pack-objects. For the two options above, pack-objects
understands "100m" to mean "100 megabytes", but repack
does not.
3. We have to keep a sentinel value to know whether it is
worth passing the option along. In the case of
"--window-memory", we currently do not pass it if the
value is "0". But that is a meaningful value to
pack-objects, where it overrides any configured value.
We can fix all of these by simply passing the strings from
the user along to pack-objects verbatim. This does not
actually fix anything for "--depth" or "--window", but these
are converted, too, for consistency.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When we use OPT_STRING to parse an option, we get back a
pointer into the argv array, which should be "const char *".
The compiler doesn't notice because it gets passed through a
"void *" in the option struct.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When we see "--max-pack-size", we accidentally propagated
this to pack-objects as "--max_pack_size", which does not
work at all.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since `pack-objects` will write a `.bitmap` file next to the `.pack` and
`.idx` files, this commit teaches `git-repack` to consider the new
bitmap indexes (if they exist) when performing repack operations.
This implies moving old bitmap indexes out of the way if we are
repacking a repository that already has them, and moving the newly
generated bitmap indexes into the `objects/pack` directory, next to
their corresponding packfiles.
Since `git repack` is now capable of handling these `.bitmap` files,
a normal `git gc` run on a repository that has `pack.writebitmaps` set
to true in its config file will generate bitmap indexes as part of the
garbage collection process.
Alternatively, `git repack` can be called with the `-b` switch to
explicitly generate bitmap indexes if you are experimenting
and don't want them on all the time.
Signed-off-by: Vicent Marti <tanoku@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We ask pack-objects to pack to a set of temporary files, and
then rename them into place. Some files that pack-objects
creates may be optional (like a .bitmap file), in which case
we would not want to call rename(). We already call stat()
and make the chmod optional if the file cannot be accessed.
We could simply skip the rename step in this case, but that
would be a minor regression in noticing problems with
non-optional files (like the .pack and .idx files).
Instead, we can now annotate extensions as optional, and
skip them if they don't exist (and otherwise rely on
rename() to barf).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This is slightly more verbose, but will let us annotate the
extensions with further options in future commits.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We have a static array of extensions, but hardcode the size
of the array in our loops. Let's pull out this magic number,
which will make it easier to change.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Its value is the same as the number of entries in the "names"
string_list, so just use "names.nr" in its place.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Acked-by: Stefan Beller <stefanbeller@googlemail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Leaving only the function definitions and declarations so that any
new topic in flight can still make use of the old functions, replace
existing uses of the prefixcmp() and suffixcmp() with new API
functions.
The change can be recreated by mechanically applying this:
$ git grep -l -e prefixcmp -e suffixcmp -- \*.c |
grep -v strbuf\\.c |
xargs perl -pi -e '
s|!prefixcmp\(|starts_with\(|g;
s|prefixcmp\(|!starts_with\(|g;
s|!suffixcmp\(|ends_with\(|g;
s|suffixcmp\(|!ends_with\(|g;
'
on the result of preparatory changes in this series.
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
During the review process of the previous commit (repack: rewrite the
shell script in C), Johannes Sixt proposed to retain any exit codes from
the sub-process, which makes it probably more obvious in case of failure.
As the commit before should behave as close to the original shell
script, the proposed change is put in this extra commit.
The infrastructure however was already setup in the previous commit.
(Having a local 'ret' variable)
Signed-off-by: Stefan Beller <stefanbeller@googlemail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The motivation of this patch is to get closer to a goal of being
able to have a core subset of git functionality built in to git.
That would mean
* people on Windows could get a copy of at least the core parts
of Git without having to install a Unix-style shell
* people using git in on servers with chrooted environments
do not need to worry about standard tools lacking for shell
scripts.
This patch is meant to be mostly a literal translation of the
git-repack script; the intent is that later patches would start using
more library facilities, but this patch is meant to be as close to a
no-op as possible so it doesn't do that kind of thing.
Signed-off-by: Stefan Beller <stefanbeller@googlemail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>