When "gc" needs to retain unreachable objects, packing them into
cruft packs (instead of exploding them into loose object files) has
been offered as a more efficient option for some time. Now the use
of cruft packs has been made the default and is no longer considered
an experimental feature.
* tb/enable-cruft-packs-by-default:
repository.h: drop unused `gc_cruft_packs`
builtin/gc.c: make `gc.cruftPacks` enabled by default
t/t9300-fast-import.sh: prepare for `gc --cruft` by default
t/t6500-gc.sh: add additional test cases
t/t6500-gc.sh: refactor cruft pack tests
t/t6501-freshen-objects.sh: prepare for `gc --cruft` by default
t/t5304-prune.sh: prepare for `gc --cruft` by default
builtin/gc.c: ignore cruft packs with `--keep-largest-pack`
builtin/repack.c: fix incorrect reference to '-C'
pack-write.c: plug a leak in stage_tmp_packfiles()
In the documentation, show the timestamp recorded in the commit
instead of the time the formatter was run.
* fc/doc-use-datestamp-in-commit:
doc: set actual revdate for manpages
The on-disk reverse index that allows mapping from the pack offset
to the object name for the object stored at the offset has been
enabled by default.
* tb/pack-revindex-on-disk:
t: invert `GIT_TEST_WRITE_REV_INDEX`
config: enable `pack.writeReverseIndex` by default
pack-revindex: introduce `pack.readReverseIndex`
pack-revindex: introduce GIT_TEST_REV_INDEX_DIE_ON_DISK
pack-revindex: make `load_pack_revindex` take a repository
t5325: mark as leak-free
pack-write.c: plug a leak in stage_tmp_packfiles()
Geometric repacking ("git repack --geometric=<n>") in a repository
that borrows from an alternate object database had various corner
case bugs, which have been corrected.
* ps/fix-geom-repack-with-alternates:
repack: disable writing bitmaps when doing a local repack
repack: honor `-l` when calculating pack geometry
t/helper: allow chmtime to print verbosely without modifying mtime
pack-objects: extend test coverage of `--stdin-packs` with alternates
pack-objects: fix error when same packfile is included and excluded
pack-objects: fix error when packing same pack twice
pack-objects: split out `--stdin-packs` tests into separate file
repack: fix generating multi-pack-index with only non-local packs
repack: fix trying to use preferred pack in alternates
midx: fix segfault with no packs and invalid preferred pack
The sendemail-validate hook learned to pass, via environment
variables, the total number of input files and where in that
sequence each invocation is.
* rj/send-email-validate-hook-count-messages:
send-email: export patch counters in validate environment
The code to parse the capability list for the v0 on-wire protocol
fell into an infinite loop when a capability appeared multiple
times, which has been corrected.
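A multi-valued capability such as `symref=` can appear more than once in the v0 capability list, so any loop that looks up all of its values must always move past what it has already examined. The sketch below illustrates that property under stated assumptions: `next_capability_value()` is a hypothetical helper (not Git's actual parser), and it assumes a plain space-separated capability string with `size_t` offsets.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: find the next "name=value" token in a
 * space-separated capability list, starting at *offset. On a match,
 * store the value's length in *len and return a pointer to it. In
 * every case *offset is advanced past each examined token, so repeated
 * calls make progress and eventually return NULL.
 */
static const char *next_capability_value(const char *list, const char *name,
					 size_t *offset, size_t *len)
{
	size_t name_len = strlen(name);
	size_t list_len = strlen(list);

	while (*offset < list_len) {
		const char *tok = list + *offset;
		size_t tok_len = strcspn(tok, " "); /* token ends at space or NUL */

		/* Advance past the token (and its separator) before testing it. */
		*offset += tok_len;
		if (*offset < list_len)
			*offset += 1;

		if (tok_len > name_len && !strncmp(tok, name, name_len) &&
		    tok[name_len] == '=') {
			*len = tok_len - name_len - 1;
			return tok + name_len + 1;
		}
	}
	return NULL;
}
```

Calling it repeatedly with the same `offset` walks every `symref=` value in turn and returns NULL once the list is exhausted. Because the offset grows by at least one byte per iteration, the loop cannot spin forever even on empty tokens.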
* jk/protocol-cap-parse-fix:
v0 protocol: use size_t for capability length/offset
t5512: test "ls-remote --heads --symref" filtering with v0 and v2
t5512: allow any protocol version for filtered symref test
t5512: add v2 support for "ls-remote --symref" test
v0 protocol: fix sha1/sha256 confusion for capabilities^{}
t5512: stop referring to "v1" protocol
v0 protocol: fix infinite loop when parsing multi-valued capabilities
Header clean-up.
* en/header-split-cache-h: (24 commits)
protocol.h: move definition of DEFAULT_GIT_PORT from cache.h
mailmap, quote: move declarations of global vars to correct unit
treewide: reduce includes of cache.h in other headers
treewide: remove double forward declaration of read_in_full
cache.h: remove unnecessary includes
treewide: remove cache.h inclusion due to pager.h changes
pager.h: move declarations for pager.c functions from cache.h
treewide: remove cache.h inclusion due to editor.h changes
editor: move editor-related functions and declarations into common file
treewide: remove cache.h inclusion due to object.h changes
object.h: move some inline functions and defines from cache.h
treewide: remove cache.h inclusion due to object-file.h changes
object-file.h: move declarations for object-file.c functions from cache.h
treewide: remove cache.h inclusion due to git-zlib changes
git-zlib: move declarations for git-zlib functions from cache.h
treewide: remove cache.h inclusion due to object-name.h changes
object-name.h: move declarations for object-name.c functions from cache.h
treewide: remove unnecessary cache.h inclusion
treewide: be explicit about dependence on mem-pool.h
treewide: be explicit about dependence on oid-array.h
...
"git branch --format=...", "git for-each-ref --format=..." and
"git tag --format=..." learned "--omit-empty" to hide refs whose
formatting result is an empty string from the output.
* ow/ref-filter-omit-empty:
branch, for-each-ref, tag: add option to omit empty lines
"git archive" run from a subdirectory mishandled attributes and
paths outside the current directory.
* rs/archive-from-subdirectory-fixes:
archive: improve support for running in subdirectory
"git clone --local" stops copying from an original repository that
has symbolic links inside its $GIT_DIR; an error message when that
happens has been updated.
* gc/better-error-when-local-clone-fails-with-symlink:
clone: error specifically with --local and symlinked objects
Code clean-up to replace a hardcoded constant with a CPP macro.
* rs/get-tar-commit-id-use-defined-const:
get-tar-commit-id: use TYPEFLAG_GLOBAL_HEADER instead of magic value
The approxidate() API has been simplified by losing an extra
function that did the same thing as another one.
* rs/remove-approxidate-relative:
date: remove approxidate_relative()
The userdiff regexp patterns for various filetypes that are built
into the system have been updated to avoid triggering regexp errors
from UTF-8 aware regex engines.
* rs/userdiff-multibyte-regex:
userdiff: support regexec(3) with multi-byte support
The examples are an ordered list; however, they are complex enough that
a callout is inside example 1, which confuses the parsers because the
list continuation (`+`) is ambiguous (are we continuing the previous
list item, or the previous callout?).
We could use an open block as the asciidoctor documentation suggests,
but that has a tiny formatting issue (a newline is missing).
To simplify things for everyone (the reader, the writer, and the parser)
let's use subsections.
After this change, the HTML documentation generated with asciidoc has
the right indentation.
Cc: Jeff King <peff@peff.net>
Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The callouts are directly tied to the listing above. Remove spaces to
make it clear they are one and the same.
Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As of the previous commit, all callers that need to read the value of
`gc.cruftPacks` do so without using the `repo_settings` struct, making
its `gc_cruft_packs` member unused. Drop it accordingly.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Back in 5b92477f89 (builtin/gc.c: conditionally avoid pruning objects
via loose, 2022-05-20), `git gc` learned the `--cruft` option and
`gc.cruftPacks` configuration to opt in to writing cruft packs when
collecting or pruning unreachable objects.
Cruft packs were introduced with the merge in a50036da1a (Merge branch
'tb/cruft-packs', 2022-06-03). They address the problem of "loose object
explosions", where Git will write out many individual loose objects when
there is a large number of unreachable objects that have not yet aged
past `--prune=<date>`.
Instead of keeping track of those unreachable yet recent objects via
their loose object file's mtime, cruft packs collect all unreachable
objects into a single pack with a corresponding `*.mtimes` file that
acts as a table to store the mtimes of all unreachable objects. This
removes the need to store unreachable objects loose while they age out
of the repository, and so avoids the problem of loose object explosions.
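To make the `*.mtimes` idea concrete, here is a minimal sketch assuming a hypothetical in-memory `struct cruft_view` that pairs each object in a cruft pack with its recorded mtime. Expiring objects then becomes a comparison of each table entry against the `--prune=<date>` cutoff, instead of a stat() of each loose file. Treating objects strictly newer than the cutoff as survivors is an assumption of this sketch.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical in-memory view of a cruft pack: objects[i] is stored in the
 * pack and mtimes[i] is its entry in the accompanying *.mtimes table.
 */
struct cruft_view {
	size_t nr;
	const char *const *objects; /* object names, for illustration only */
	const uint32_t *mtimes;     /* seconds since the epoch */
};

/*
 * Set keep[i] to 1 for objects newer than the prune cutoff (they would be
 * carried into the next cruft pack) and to 0 for objects that may be
 * pruned. Returns the number of objects kept.
 */
static size_t mark_survivors(const struct cruft_view *cv, uint32_t cutoff,
			     unsigned char *keep)
{
	size_t kept = 0;

	for (size_t i = 0; i < cv->nr; i++) {
		keep[i] = cv->mtimes[i] > cutoff;
		kept += keep[i];
	}
	return kept;
}
```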
Beyond avoiding loose object explosions, cruft packs also act as a more
efficient mechanism to store unreachable objects as they age out of a
repository. This is because similar unreachable objects can serve as
delta bases for one another, which is not possible when each is stored
as a separate loose object.
In 5b92477f89, the feature was introduced as experimental. Since then,
GitHub has been running these patches in every repository, generating
hundreds of millions of cruft packs along the way. The feature is
battle-tested, and avoids many pathological cases such as the one
described above. Users who run `git gc` either manually or via
`git maintenance` can benefit from having cruft packs.
As such, enable cruft pack generation to take place by default (by
making `gc.cruftPacks` have the default of "true" rather than "false").
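The precedence this change relies on can be sketched as follows, assuming the usual Git convention that a command-line flag overrides configuration. The tri-state enum and `resolve_cruft_packs()` are hypothetical names, and the sketch ignores any other options that influence how unreachable objects are handled.

```c
#include <stdbool.h>

/* Hypothetical tri-state for an option that may or may not have been given. */
enum tristate { TRI_UNSET = -1, TRI_OFF = 0, TRI_ON = 1 };

/*
 * Decide whether `git gc` writes a cruft pack: --cruft/--no-cruft on the
 * command line wins over gc.cruftPacks, which wins over the built-in
 * default. With this change, the default becomes true.
 */
static bool resolve_cruft_packs(enum tristate cli, enum tristate config)
{
	if (cli != TRI_UNSET)
		return cli == TRI_ON;
	if (config != TRI_UNSET)
		return config == TRI_ON;
	return true;
}
```

Under this model, only an explicit `--no-cruft` or `gc.cruftPacks=false` keeps the old loose-object behavior.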
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In a similar fashion as previous commits, adjust the fast-import tests
to prepare for "git gc" generating a cruft pack by default.
This adjustment is slightly different, however. Instead of relying on us
writing out the objects loose, and then calling `git prune` to remove
them, t9300 needs to be prepared to drop objects that would be moved
into cruft packs.
To do this, we can combine the `git gc` invocation with `git prune` into
one `git gc --prune`, which handles pruning both loose objects, and
objects that would otherwise be written to a cruft pack.
Likely this pattern of "git gc && git prune" started all the way back in
03db4525d3 (Support gitlinks in fast-import., 2008-07-19), which
happened after deprecating `git gc --prune` in 9e7d501990 (builtin-gc.c:
deprecate --prune, it now really has no effect, 2008-05-09).
After `--prune` was un-deprecated in 58e9d9d472 (gc: make --prune useful
again by accepting an optional parameter, 2009-02-14), this script got a
handful of new "git gc && git prune" instances via 4cedb78cb5
(fast-import: add input format tests, 2011-08-11). These could have been
`git gc --prune`, but weren't (likely taking after 03db4525d3).
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In the last commit, we refactored some of the tests in t6500 to make it
clearer when cruft packs will and won't be generated by `git gc`.
Add the remaining cases not covered by the previous patch into this one,
which enumerates all possible combinations of arguments that will
produce (or not produce) a cruft pack.
This prepares us for a future commit which will change the default value
of `gc.cruftPacks` by ensuring that we understand which invocations do
and do not change as a result.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In 12253ab6d0 (gc: add tests for --cruft and friends, 2022-10-26), we
added a handful of tests to t6500 to ensure that `git gc` respected the
value of `--cruft` and `gc.cruftPacks`.
Then, in c695592850 (config: let feature.experimental imply
gc.cruftPacks=true, 2022-10-26), another set of similar tests was added
to ensure that `feature.experimental` correctly implied enabling cruft
pack generation (or not).
These tests are similar and could be consolidated. Do so in this patch
to prepare for expanding the set of command-line invocations that enable
or disable writing cruft packs. This makes it possible to easily test
more combinations of arguments without being overly repetitive.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In a similar spirit as previous commits, prepare for `gc --cruft`
becoming the default by ensuring that the tests in t6501 explicitly
cover the case of freshening loose objects not using cruft packs.
We could run this test twice, once with `--cruft` and once with
`--no-cruft`, but doing so is unnecessary, since we already test object
rescuing, freshening, and dealing with corrupt parts of the unreachable
object graph extensively via t5329.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Many of the tests in t5304 run `git gc`, and rely on its behavior that
unreachable-but-recent objects are written out loose. This is sensible,
since t5304 deals specifically with this kind of pruning.
If left unattended, however, this test would break when the default
behavior of a bare "git gc" is adjusted to generate a cruft pack by
default.
Ensure that these tests continue to work as-is (and continue to provide
coverage of loose object pruning) by passing `--no-cruft` explicitly.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When cruft packs were implemented, we never adjusted the code for `git
gc`'s `--keep-largest-pack` and `gc.bigPackThreshold` to ignore cruft
packs. The option and the configuration variable share a common
implementation, but including cruft packs is wrong in both cases:
- Running `git gc --keep-largest-pack` in a repository where the
largest pack is the cruft pack itself will make it impossible for
`git gc` to prune objects, since the cruft pack itself is kept.
- The same is true for `gc.bigPackThreshold`, if the size of the cruft
pack exceeds the limit set by the caller.
In the future, it is possible that `gc.bigPackThreshold` could be used
to write a separate cruft pack containing any new unreachable objects
that entered the repository since the last time a cruft pack was
written.
There are some complexities to doing so, mainly around pruning objects
that are in an existing cruft pack above the threshold (which would
either need to be rewritten, or else have its pruning delayed).
Rewriting a substantially similar cruft pack isn't ideal, but it is
significantly better than the status quo.
If users have large cruft packs that they don't want to rewrite, they
can mark them as `*.keep` packs. But in general, if a repository has a
cruft pack that is so large it is slowing down GCs, it should probably
be pruned anyway.
In the meantime, ignore cruft packs in the common implementation for
both of these options, and add a pair of tests to prevent any future
regressions here.
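A minimal sketch of both selections with cruft packs skipped, assuming a hypothetical `struct pack_info` whose `is_cruft` flag stands for "has a `*.mtimes` file". Whether the threshold comparison is `>=` or `>` is an assumption here, not taken from Git's code.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified description of one packfile. */
struct pack_info {
	const char *name;
	uint64_t size; /* bytes */
	int is_cruft;  /* has an accompanying *.mtimes file */
	int keep;      /* set when this gc run should leave the pack alone */
};

/*
 * --keep-largest-pack: return the index of the largest non-cruft pack,
 * or -1 if every pack is a cruft pack (or there are none).
 */
static long find_largest_non_cruft(const struct pack_info *packs, size_t nr)
{
	long best = -1;

	for (size_t i = 0; i < nr; i++) {
		if (packs[i].is_cruft)
			continue; /* never shield the cruft pack from pruning */
		if (best < 0 || packs[i].size > packs[best].size)
			best = (long)i;
	}
	return best;
}

/*
 * gc.bigPackThreshold: mark every non-cruft pack whose size reaches the
 * threshold as kept, and return how many packs were marked.
 */
static size_t mark_big_packs(struct pack_info *packs, size_t nr,
			     uint64_t threshold)
{
	size_t marked = 0;

	for (size_t i = 0; i < nr; i++) {
		if (packs[i].is_cruft || packs[i].size < threshold)
			continue;
		packs[i].keep = 1;
		marked++;
	}
	return marked;
}
```

Even when the cruft pack is the biggest one, neither function selects it, so its unreachable objects remain eligible for pruning.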
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When cruft packs were originally being developed, `-C` was designated as
the short-form for `--cruft` (as in `git repack -C`).
This was dropped due to confusion with Git's top-level `-C` option
before submitting to the list. But the reference to it in
`--cruft-expiration`'s help text was never updated. Fix that dangling
reference in this patch.
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The function `stage_tmp_packfiles()` generates a filename to use for
staging the contents of what will become the pack's ".mtimes" file.
The name is generated in `write_mtimes_file()` and the result is
returned to `stage_tmp_packfiles()`, which uses it to rename the
temporary file into place via `rename_tmp_packfiles()`.
`write_mtimes_file()` returns a `const char *`, indicating that callers
are not expected to free its result (similar to, e.g., `oid_to_hex()`).
But callers are expected to free its result, so this return type is
incorrect.
Change the function's signature to return a non-const `char *`, and free
it at the end of `stage_tmp_packfiles()`.
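The ownership convention behind this fix can be shown with a minimal sketch. `make_tmp_mtimes_name()` and `stage_files_sketch()` are hypothetical stand-ins, not the real `write_mtimes_file()` and `stage_tmp_packfiles()`, and the file handling itself is elided.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical stand-in for a function like write_mtimes_file(): it
 * returns a heap-allocated name. The non-const `char *` return type
 * signals that the caller owns the string and must free() it.
 */
static char *make_tmp_mtimes_name(const char *prefix)
{
	const char *suffix = ".mtimes";
	size_t len = strlen(prefix) + strlen(suffix) + 1;
	char *name = malloc(len);

	if (name)
		snprintf(name, len, "%s%s", prefix, suffix);
	return name;
}

/* Hypothetical caller: use the name, then release it to avoid a leak. */
static int stage_files_sketch(const char *prefix)
{
	char *mtimes_name = make_tmp_mtimes_name(prefix);

	if (!mtimes_name)
		return -1;
	/* ... rename the temporary file into place using mtimes_name ... */
	free(mtimes_name);
	return 0;
}
```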
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Michael J Gruber noticed that connecting via the git:// protocol no
longer worked after a recent header clean-up. This was caused by a
funny interaction of a few gotchas. First, a necessary definition
    #define DEFAULT_GIT_PORT 9418
was made invisible to a place where
    const char *port = STR(DEFAULT_GIT_PORT);
was expecting to turn the integer into "9418" with a clever STR()
macro, and ended up stringifying it to
    const char *port = "DEFAULT_GIT_PORT";
without giving compilers any chance to notice such a mistake.
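The stringification pitfall can be reproduced in a few lines. This sketch assumes a two-level macro in the style of the STR() described above (the `STR_`/`STR` names here are illustrative), and `HYPOTHETICAL_MISSING_PORT` is deliberately never defined to show what happens when a definition is not visible.

```c
/*
 * Two-level stringification: the indirection lets the preprocessor expand
 * the argument before the '#' operator turns it into a string literal.
 */
#define STR_(x) #x
#define STR(x) STR_(x)

#define DEFAULT_GIT_PORT 9418

/* A guard like this turns a missing definition into a hard compile error. */
#ifndef DEFAULT_GIT_PORT
#error "DEFAULT_GIT_PORT must be defined before it is stringified"
#endif

/* Expands to "9418" because DEFAULT_GIT_PORT is visible here. */
static const char *defined_port(void)
{
	return STR(DEFAULT_GIT_PORT);
}

/*
 * HYPOTHETICAL_MISSING_PORT is never defined, so the argument is left
 * unexpanded and the result is the literal "HYPOTHETICAL_MISSING_PORT":
 * the same silent failure as in the bug described above.
 */
static const char *undefined_port(void)
{
	return STR(HYPOTHETICAL_MISSING_PORT);
}
```

The `#ifndef`/`#error` guard is one way to let the compiler notice a missing definition instead of silently producing the macro's own name.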
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Clean-up of the code path that deals with merge strategy option
handling in "git rebase".
* pw/rebase-cleanup-merge-strategy-option-handling:
rebase: remove a couple of redundant strategy tests
rebase -m: fix serialization of strategy options
rebase -m: cleanup --strategy-option handling
sequencer: use struct strvec to store merge strategy options
rebase: stop reading and writing unnecessary strategy state
"git branch -d origin/master" would say "no such branch", but it is
likely a missed "-r" if refs/remotes/origin/master exists. The
command has been taught to give such a hint in its error message.
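The decision behind the new hint can be sketched as follows, assuming a hypothetical ref store represented as a plain array of full ref names. The helper names are hypothetical, and the exact wording of Git's error message is not reproduced here.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical ref store: a plain array of full ref names. */
static bool ref_exists(const char *const *refs, size_t nr, const char *ref)
{
	for (size_t i = 0; i < nr; i++)
		if (!strcmp(refs[i], ref))
			return true;
	return false;
}

/*
 * For "git branch -d <name>": if refs/heads/<name> is missing but
 * refs/remotes/<name> exists, the user most likely forgot "-r".
 * (Names too long for the buffers are truncated in this sketch.)
 */
static bool should_suggest_remote(const char *const *refs, size_t nr,
				  const char *name)
{
	char local[256], remote[256];

	snprintf(local, sizeof(local), "refs/heads/%s", name);
	snprintf(remote, sizeof(remote), "refs/remotes/%s", name);
	return !ref_exists(refs, nr, local) && ref_exists(refs, nr, remote);
}
```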
* cm/branch-delete-error-message-update:
branch: improve error log on branch not found by checking remotes refs
"git mergetool" and "git difftool" learned a new configuration,
guiDefault, to optionally favor the configured guitool over the
non-GUI tool automatically when $DISPLAY is set.
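A sketch of the `auto` decision, written in C purely for illustration. The string values "true", "false" and "auto" and the non-empty `$DISPLAY` test are assumptions about how the setting behaves, and `use_gui_tool()` is a hypothetical name.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical decision for the guiDefault setting. `value` is the
 * configured string (or NULL when unset) and `display` is the value of
 * $DISPLAY (or NULL when unset).
 */
static bool use_gui_tool(const char *value, const char *display)
{
	if (!value)
		return false;               /* unset: keep the non-GUI tool */
	if (!strcmp(value, "true"))
		return true;
	if (!strcmp(value, "auto"))
		return display && *display; /* non-empty $DISPLAY selects the GUI */
	return false;                   /* "false" and anything else */
}
```

A caller would pass `getenv("DISPLAY")` as the second argument; taking it as a parameter keeps the decision easy to test.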
* tk/mergetool-gui-default-config:
mergetool: new config guiDefault supports auto-toggling gui by DISPLAY
While parsing a .rev file, we check the header information to be sure it
makes sense. This happens before doing any additional validation such as
a checksum or value check. In order to differentiate between a bad
header and a non-existent file, we need to update the API for loading a
reverse index.
Make load_pack_revindex_from_disk() non-static and specify that a
positive value means "the file does not exist" while other errors during
parsing are negative values. Since an invalid header prevents setting up
the structures we would use for further validations, we can stop at that
point.
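The return-value convention can be illustrated with a minimal sketch. It assumes a .rev header of a 4-byte `RIDX` magic, a 4-byte version (1) and a 4-byte hash-function id (1 for SHA-1, 2 for SHA-256), all big-endian. `check_rev_header()` and `load_rev_header()` are hypothetical, not Git's functions: a missing file yields a positive value, and each invalid header field yields its own negative value.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define RIDX_SIGNATURE 0x52494458u /* "RIDX" */
#define RIDX_HEADER_SIZE 12

static uint32_t get_be32(const unsigned char *p)
{
	return (uint32_t)p[0] << 24 | (uint32_t)p[1] << 16 |
	       (uint32_t)p[2] << 8 | (uint32_t)p[3];
}

/*
 * Validate the three header fields. Returns 0 if valid, -1 if the header
 * is truncated, and a distinct negative value for each bad field.
 */
static int check_rev_header(const unsigned char *hdr, size_t len,
			    uint32_t expected_hash_id)
{
	if (len < RIDX_HEADER_SIZE)
		return -1; /* truncated */
	if (get_be32(hdr) != RIDX_SIGNATURE)
		return -2; /* bad signature */
	if (get_be32(hdr + 4) != 1)
		return -3; /* unsupported version */
	if (get_be32(hdr + 8) != expected_hash_id)
		return -4; /* wrong hash function */
	return 0;
}

/*
 * Positive: the file does not exist, so a caller may fall back to
 * computing the reverse index in memory. Negative: the file exists but
 * cannot be used. Zero: the header is valid.
 */
static int load_rev_header(const char *path, uint32_t expected_hash_id)
{
	unsigned char hdr[RIDX_HEADER_SIZE];
	size_t got;
	FILE *f = fopen(path, "rb");

	if (!f)
		return errno == ENOENT ? 1 : -1;
	got = fread(hdr, 1, sizeof(hdr), f);
	fclose(f);
	return check_rev_header(hdr, got, expected_hash_id);
}
```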
The place where we can distinguish between a missing file and a corrupt
file is inside load_revindex_from_disk(), which is used both by pack
rev-indexes and multi-pack-index rev-indexes. Some tests in t5326
demonstrate that it is critical to check for these conditions so that
positive (file does not exist) return values are still allowed.
Add tests that check the three header values.
Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When checking a rev-index file, it may be helpful to identify exactly
which positions are incorrect. Compare the rev-index to a
freshly-computed in-memory rev-index and report the comparison failures.
This additional check (on top of the checksum validation) can help find
files that were corrupted by a single bit flip on-disk or perhaps were
written incorrectly due to a bug in Git.
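A minimal sketch of such a comparison, assuming the reverse index lists index positions in pack-offset order (entry k is the index position of the object with the k-th smallest offset). `compute_revindex()` and `verify_revindex()` are hypothetical names, the caller supplies a scratch buffer of `nr` entries, and the file-scope comparator state makes it non-reentrant, which is acceptable for a sketch.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Pack offsets indexed by index position; read by the qsort comparator. */
static const uint64_t *sort_offsets;

static int cmp_by_offset(const void *a, const void *b)
{
	uint64_t oa = sort_offsets[*(const uint32_t *)a];
	uint64_t ob = sort_offsets[*(const uint32_t *)b];

	return (oa > ob) - (oa < ob);
}

/* out[k] = index position of the object with the k-th smallest offset. */
static void compute_revindex(const uint64_t *offsets, uint32_t nr,
			     uint32_t *out)
{
	for (uint32_t i = 0; i < nr; i++)
		out[i] = i;
	sort_offsets = offsets;
	qsort(out, nr, sizeof(*out), cmp_by_offset);
}

/*
 * Compare an on-disk reverse index with one computed from the pack
 * offsets, reporting each mismatching position to `report` (if non-NULL).
 * `scratch` must hold nr entries. Returns the number of mismatches.
 */
static size_t verify_revindex(const uint32_t *on_disk, const uint64_t *offsets,
			      uint32_t nr, uint32_t *scratch, FILE *report)
{
	size_t bad = 0;

	compute_revindex(offsets, nr, scratch);
	for (uint32_t k = 0; k < nr; k++) {
		if (on_disk[k] == scratch[k])
			continue;
		bad++;
		if (report)
			fprintf(report, "rev-index position %" PRIu32
				": expected %" PRIu32 ", found %" PRIu32 "\n",
				k, scratch[k], on_disk[k]);
	}
	return bad;
}
```

Reporting every differing position, rather than stopping at the first, is what makes a single flipped entry easy to spot.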
Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The previous change added calls to verify_pack_revindex() in
builtin/fsck.c, but the implementation of the method was left empty. Add
the first and most-obvious check to this method: checksum verification.
While here, create a helper method in the test script that makes it easy
to adjust the .rev file and check that 'git fsck' reports the correct
error message.
Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The 'fsck' builtin checks many of Git's on-disk data structures, but
does not currently validate the pack rev-index files (a .rev file to
pair with a .pack and .idx file).
Before doing a more-involved check process, create the scaffolding
within builtin/fsck.c to have a new error type and add that error type
when the API method verify_pack_revindex() returns an error. That method
does nothing currently, but we will add checks to it in later changes.
For now, check that 'git fsck' succeeds without any errors in the normal
case. Future checks will be paired with tests that corrupt the .rev file
appropriately.
Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>