The test for --batch-all-objects in t1006 covers a variety
of object storage situations, but one thing it doesn't cover
is that we avoid mentioning duplicate objects. We won't have
any because running "git repack -ad" will have packed them
all and deleted the loose ones.
This does work (because we sort and de-dup the output list),
but it's good to include it in our test. And doubly so for
when we add an unordered mode which has to de-dup in a
different way.
Note that we cannot just re-create one of the objects, as
Git will omit the write of an object that is already
present. However, we can create a new pack with one of the
objects, which forces the duplication.
One alternative would be to just use "git repack -a" instead
of "-ad". But then _every_ object would be duplicated as
loose and packed, and we might miss a bug that omits packed
objects (because we'd show their loose counterparts).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We currently iterate over objects within a pack in .idx
order, which uses the object hashes. That means that it
is effectively random with respect to the location of the
object within the pack. If you're going to access the actual
object data, there are two reasons to move linearly through
the pack itself:
1. It improves the locality of access in the packfile. In
the cold-cache case, this may mean fewer disk seeks, or
better usage of disk cache.
2. We store related deltas together in the packfile. Which
means that the delta base cache can operate much more
efficiently if we visit all of those related deltas in
sequence, as the earlier items are likely to still be
in the cache. Whereas if we visit the objects in
random order, our cache entries are much more likely to
have been evicted by unrelated deltas in the meantime.
So in general, if you're going to access the object contents
pack order is generally going to end up more efficient.
But if you're simply generating a list of object names, or
if you're going to end up sorting the result anyway, you're
better off just using the .idx order, as finding the pack
order means generating the in-memory pack-revindex.
According to the numbers in 8b8dfd5132 (pack-revindex:
radix-sort the revindex, 2013-07-11), that takes about 200ms
for linux.git, and 20ms for git.git (those numbers are a few
years old but are still a good ballpark).
That makes it a good optimization for some cases (we can
save tens of seconds in git.git by having good locality of
delta access, for a 20ms cost), but a bad one for others
(e.g., right now "cat-file --batch-all-objects
--batch-check="%(objectname)" is 170ms in git.git, so adding
20ms to that is noticeable).
Hence this patch makes it an optional flag. You can't
actually do any interesting timings yet, as it's not plumbed
through to any user-facing tools like cat-file. That will
come in a later patch.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We already mention the local/alternate behavior of these
functions, but we can help clarify a few other behaviors:
- there's no need to mention LOCAL_ONLY specifically, since
we already reference the flags by type (and as we add
more flags, we don't want to have to mention each)
- clarify that reachability doesn't matter here; this is
all accessible objects
- what ordering/uniqueness guarantees we give
- how pack-specific flags are handled for the loose case
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It's not wrong to pass our flags in an "unsigned", as we
know it will be at least as large as the enum. However,
using the enum in the declaration makes it more obvious
where to find the list of flags.
While we're here, let's also drop the "extern" noise-words
from the declarations, per our modern coding style.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
These flags were split between cache.h and packfile.h,
because some of the flags apply only to packs. However, they
share a single numeric namespace, since both are respected
for the packed variant. Let's make sure they're defined
together so that nobody accidentally adds a new flag in one
location that duplicates the other.
While we're here, let's also put them in an enum (which
helps debugger visibility) and use "(1<<n)" rather than
counting powers of 2 manually.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Look for broken "&&" chains that are hidden in subshell, many of
which have been found and corrected.
* es/chain-lint-in-subshell:
t/chainlint.sed: drop extra spaces from regex character class
t/chainlint: add chainlint "specialized" test cases
t/chainlint: add chainlint "complex" test cases
t/chainlint: add chainlint "cuddled" test cases
t/chainlint: add chainlint "loop" and "conditional" test cases
t/chainlint: add chainlint "nested subshell" test cases
t/chainlint: add chainlint "one-liner" test cases
t/chainlint: add chainlint "whitespace" test cases
t/chainlint: add chainlint "basic" test cases
t/Makefile: add machinery to check correctness of chainlint.sed
t/test-lib: teach --chain-lint to detect broken &&-chains in subshells
The lazy clone support had a few places where missing but promised
objects were not correctly tolerated, which have been fixed.
* jt/tags-to-promised-blobs-fix:
tag: don't warn if target is missing but promised
revision: tolerate promised targets of tags
Add a server-side knob to skip commits in exponential/fibbonacci
stride in an attempt to cover wider swath of history with a smaller
number of iterations, potentially accepting a larger packfile
transfer, instead of going back one commit a time during common
ancestor discovery during the "git fetch" transaction.
* jt/fetch-negotiator-skipping:
negotiator/skipping: skip commits during fetch
"git send-email" when using in a batched mode that limits the
number of messages sent in a single SMTP session lost the contents
of the variable used to choose between tls/ssl, unable to send the
second and later batches, which has been fixed.
* jm/send-email-tls-auth-on-batch:
send-email: fix tls AUTH when sending batch
"git rebase" started exporting GIT_DIR environment variable and
exposing it to hook scripts when part of it got rewritten in C.
Instead of matching the old scripted Porcelains' behaviour,
compensate by also exporting GIT_WORK_TREE environment as well to
lessen the damage. This can harm existing hooks that want to
operate on different repository, but the current behaviour is
already broken for them anyway.
* bc/sequencer-export-work-tree-as-well:
sequencer: pass absolute GIT_WORK_TREE to exec commands
Tests to cover conflict cases that involve submodules have been
added for merge-recursive.
* en/t7405-recursive-submodule-conflicts:
t7405: verify 'merge --abort' works after submodule/path conflicts
t7405: add a directory/submodule conflict
t7405: add a file/submodule conflict
Tests to cover various conflicting cases have been added for
merge-recursive.
* en/t6036-merge-recursive-tests:
t6036: add a failed conflict detection case: regular files, different modes
t6036: add a failed conflict detection case with conflicting types
t6036: add a failed conflict detection case with submodule add/add
t6036: add a failed conflict detection case with submodule modify/modify
t6036: add a failed conflict detection case with symlink add/add
t6036: add a failed conflict detection case with symlink modify/modify
The recursive merge strategy did not properly ensure there was no
change between HEAD and the index before performing its operation,
which has been corrected.
* en/dirty-merge-fixes:
merge: fix misleading pre-merge check documentation
merge-recursive: enforce rule that index matches head before merging
t6044: add more testcases with staged changes before a merge is invoked
merge-recursive: fix assumption that head tree being merged is HEAD
merge-recursive: make sure when we say we abort that we actually abort
t6044: add a testcase for index matching head, when head doesn't match HEAD
t6044: verify that merges expected to abort actually abort
index_has_changes(): avoid assuming operating on the_index
read-cache.c: move index_has_changes() from merge.c
"git rebase --rebase-merges" mode now handles octopus merges as
well.
* js/rebase-merge-octopus:
rebase --rebase-merges: adjust man page for octopus support
rebase --rebase-merges: add support for octopus merges
merge: allow reading the merge commit message from a file
"git gc --auto" opens file descriptors for the packfiles before
spawning "git repack/prune", which would upset Windows that does
not want a process to work on a file that is open by another
process. The issue has been worked around.
* kg/gc-auto-windows-workaround:
gc --auto: release pack files before auto packing
For a large tree, the index needs to hold many cache entries
allocated on heap. These cache entries are now allocated out of a
dedicated memory pool to amortize malloc(3) overhead.
* jm/cache-entry-from-mem-pool:
block alloc: add validations around cache_entry lifecyle
block alloc: allocate cache entries from mem_pool
mem-pool: fill out functionality
mem-pool: add life cycle management functions
mem-pool: only search head block for available space
block alloc: add lifecycle APIs for cache_entry structs
read-cache: teach make_cache_entry to take object_id
read-cache: teach refresh_cache_entry to take istate
"git fetch" learned a new option "--negotiation-tip" to limit the
set of commits it tells the other end as "have", to reduce wasted
bandwidth and cycles, which would be helpful when the receiving
repository has a lot of refs that have little to do with the
history at the remote it is fetching from.
* jt/fetch-nego-tip:
fetch-pack: support negotiation tip whitelist
Various glitches in the heuristics of merge-recursive strategy have
been documented in new tests.
* en/t6042-insane-merge-rename-testcases:
t6042: add testcase covering long chains of rename conflicts
t6042: add testcase covering rename/rename(2to1)/delete/delete conflict
t6042: add testcase covering rename/add/delete conflict type
Parsing of -L[<N>][,[<M>]] parameters "git blame" and "git log"
take has been tweaked.
* is/parsing-line-range:
log: prevent error if line range ends past end of file
blame: prevent error if range ends past end of file
Code restructuring and a small fix to transport protocol v2 during
fetching.
* jt/fetch-pack-negotiator:
fetch-pack: introduce negotiator API
fetch-pack: move common check and marking together
fetch-pack: make negotiation-related vars local
fetch-pack: use ref adv. to prune "have" sent
fetch-pack: directly end negotiation if ACK ready
fetch-pack: clear marks before re-marking
fetch-pack: split up everything_local()
"git checkout" and "git worktree add" learned to honor
checkout.defaultRemote when auto-vivifying a local branch out of a
remote tracking branch in a repository with multiple remotes that
have tracking branches that share the same names.
* ab/checkout-default-remote:
checkout & worktree: introduce checkout.defaultRemote
checkout: add advice for ambiguous "checkout <branch>"
builtin/checkout.c: use "ret" variable for return
checkout: pass the "num_matches" up to callers
checkout.c: change "unique" member to "num_matches"
checkout.c: introduce an *_INIT macro
checkout.h: wrap the arguments to unique_tracking_name()
checkout tests: index should be clean after dwim checkout
"git diff --color-moved" feature has further been tweaked.
* sb/diff-color-move-more:
diff.c: offer config option to control ws handling in move detection
diff.c: add white space mode to move detection that allows indent changes
diff.c: factor advance_or_nullify out of mark_color_as_moved
diff.c: decouple white space treatment from move detection algorithm
diff.c: add a blocks mode for moved code detection
diff.c: adjust hash function signature to match hashmap expectation
diff.c: do not pass diff options as keydata to hashmap
t4015: avoid git as a pipe input
xdiff/xdiffi.c: remove unneeded function declarations
xdiff/xdiff.h: remove unused flags
"git fsck" learns to make sure the optional commit-graph file is in
a sane state.
* ds/commit-graph-fsck: (23 commits)
coccinelle: update commit.cocci
commit-graph: update design document
gc: automatically write commit-graph files
commit-graph: add '--reachable' option
commit-graph: use string-list API for input
fsck: verify commit-graph
commit-graph: verify contents match checksum
commit-graph: test for corrupted octopus edge
commit-graph: verify commit date
commit-graph: verify generation number
commit-graph: verify parent list
commit-graph: verify root tree OIDs
commit-graph: verify objects exist
commit-graph: verify corrupt OID fanout and lookup
commit-graph: verify required chunks are present
commit-graph: verify catches corrupt signature
commit-graph: add 'verify' subcommand
commit-graph: load a root tree from specific graph
commit: force commit to parse from object database
commit-graph: parse commit from chosen graph
...
Recent "security fix" to pay attention to contents of ".gitmodules"
while accepting "git push" was a bit overly strict than necessary,
which has been adjusted.
* jk/fsck-gitmodules-gently:
fsck: downgrade gitmodulesParse default to "info"
fsck: split ".gitmodules too large" error from parse failure
fsck: silence stderr when parsing .gitmodules
config: add options parameter to git_config_from_mem
config: add CONFIG_ERROR_SILENT handler
config: turn die_on_error into caller-facing enum
Conversion from uchar[40] to struct object_id continues.
* bc/object-id:
pretty: switch hard-coded constants to the_hash_algo
sha1-file: convert constants to uses of the_hash_algo
log-tree: switch GIT_SHA1_HEXSZ to the_hash_algo->hexsz
diff: switch GIT_SHA1_HEXSZ to use the_hash_algo
builtin/merge-recursive: make hash independent
builtin/merge: switch to use the_hash_algo
builtin/fmt-merge-msg: make hash independent
builtin/update-index: simplify parsing of cacheinfo
builtin/update-index: convert to using the_hash_algo
refs/files-backend: use the_hash_algo for writing refs
sha1-name: use the_hash_algo when parsing object names
strbuf: allocate space with GIT_MAX_HEXSZ
commit: express tree entry constants in terms of the_hash_algo
hex: switch to using the_hash_algo
tree-walk: replace hard-coded constants with the_hash_algo
cache: update object ID functions for the_hash_algo
Tests to cover more D/F conflict cases have been added for
merge-recursive.
* en/t6036-recursive-corner-cases:
t6036: fix broken && chain in sub-shell
t6036: add lots of detail for directory/file conflicts in recursive case
httpd tests saw occasional breakage due to the way its access log
gets inspected by the tests, which has been updated to make them
less flaky.
* sg/httpd-test-unflake:
t/lib-httpd: avoid occasional failures when checking access.log
t/lib-httpd: add the strip_access_log() helper function
t5541: clean up truncating access log
"git pull --rebase" on a corrupt HEAD caused a segfault. In
general we substitute an empty tree object when running the in-core
equivalent of the diff-index command, and the codepath has been
corrected to do so as well to fix this issue.
* jk/has-uncommitted-changes-fix:
has_uncommitted_changes(): fall back to empty tree
This character class, like many others in this script, matches
horizontal whitespace consisting of spaces and tabs, however, a few
extra, entirely harmless, spaces somehow slipped into the expression.
Removing them is purely a cosmetic fix.
While at it, re-indent three lines with a single TAB each which were
incorrectly indented with six spaces. Also, a purely cosmetic fix.
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Honor core.commentchar when preparing the list of commits to replay
in "rebase -i".
* as/sequencer-customizable-comment-char:
sequencer: use configured comment character
Look for broken use of "VAR=VAL shell_func" in test scripts as part
of test-lint.
* es/test-lint-one-shot-export:
t/check-non-portable-shell: detect "FOO=bar shell_func"
t/check-non-portable-shell: make error messages more compact
t/check-non-portable-shell: stop being so polite
t6046/t9833: fix use of "VAR=VAL cmd" with a shell function
"git rev-parse ':/substring'" did not consider the history leading
only to HEAD when looking for a commit with the given substring,
when the HEAD is detached. This has been fixed.
* wc/find-commit-with-pattern-on-detached-head:
sha1-name.c: for ":/", find detached HEAD commits
"git reset --merge" (hence "git merge ---abort") and "git reset --hard"
had trouble working correctly in a sparsely checked out working
tree after a conflict, which has been corrected.
* mk/merge-in-sparse-checkout:
unpack-trees: do not fail reset because of unmerged skipped entry
Code clean-up.
* hs/push-cert-check-cleanup:
gpg-interface: make parse_gpg_output static and remove from interface header
builtin/receive-pack: use check_signature from gpg-interface
Handling of an empty range by "git cherry-pick" was inconsistent
depending on how the range ended up to be empty, which has been
corrected.
* jk/empty-pick-fix:
sequencer: don't say BUG on bogus input
sequencer: handle empty-set cases consistently
Partial clone support of "git clone" has been updated to correctly
validate the objects it receives from the other side. The server
side has been corrected to send objects that are directly
requested, even if they may match the filtering criteria (e.g. when
doing a "lazy blob" partial clone).
* jt/partial-clone-fsck-connectivity:
clone: check connectivity even if clone is partial
upload-pack: send refs' objects despite "filter"
The content-transfer-encoding of the message "git send-email" sends
out by default was 8bit, which can cause trouble when there is an
overlong line to bust RFC 5322/2822 limit. A new option 'auto' to
automatically switch to quoted-printable when there is such a line
in the payload has been introduced and is made the default.
* bc/send-email-auto-cte:
docs: correct RFC specifying email line length
send-email: automatically determine transfer-encoding
send-email: accept long lines with suitable transfer encoding
send-email: add an auto option for transfer encoding
The character display width table has been updated to match the
latest Unicode standard.
* bb/unicode-11-width:
unicode: update the width tables to Unicode 11
The codebase has been updated to compile cleanly with -pedantic
option.
* bb/pedantic:
utf8.c: avoid char overflow
string-list.c: avoid conversion from void * to function pointer
sequencer.c: avoid empty statements at top level
convert.c: replace "\e" escapes with "\033".
fixup! refs/refs-internal.h: avoid forward declaration of an enum
refs/refs-internal.h: avoid forward declaration of an enum
fixup! connect.h: avoid forward declaration of an enum
connect.h: avoid forward declaration of an enum