In order to merge one or many trees with the index, unpack-trees code
walks multiple trees in parallel with the index and performs n-way
merge. If we find out at start of a directory that all trees are the
same (by comparing OID) and cache-tree happens to be available for
that directory as well, we could avoid walking the trees because we
already know what these trees contain: it's flattened in what's called
"the index".
The upside is of course a lot less I/O since we can potentially skip
lots of trees (think subtrees). We also save CPU because we don't have
to inflate and apply the deltas. The downside is of course more
fragile code since the logic in some functions are now duplicated
elsewhere.
"checkout -" with this patch on webkit.git (275k files):
baseline new
--------------------------------------------------------------------
0.056651714 0.080394752 s: read cache .git/index
0.183101080 0.216010838 s: preload index
0.008584433 0.008534301 s: refresh index
0.633767589 0.251992198 s: traverse_trees
0.340265448 0.377031383 s: check_updates
0.381884638 0.372768105 s: cache_tree_update
1.401562947 1.045887251 s: unpack_trees
0.338687914 0.314983512 s: write index, changed mask = 2e
0.411927922 0.062572653 s: traverse_trees
0.000023335 0.000022544 s: check_updates
0.423697246 0.073795585 s: unpack_trees
0.423708360 0.073807557 s: diff-index
2.559524127 1.938191592 s: git command: git checkout -
Another measurement from Ben's running "git checkout" with over 500k
trees (on the whole series):
baseline new
----------------------------------------------------------------------
0.535510167 0.556558733 s: read cache .git/index
0.3057373 0.3147105 s: initialize name hash
0.0184082 0.023558433 s: preload index
0.086910967 0.089085967 s: refresh index
7.889590767 2.191554433 s: unpack trees
0.120760833 0.131941267 s: update worktree after a merge
2.2583504 2.572663167 s: repair cache-tree
0.8916137 0.959495233 s: write index, changed mask = 28
3.405199233 0.2710663 s: unpack trees
0.000999667 0.0021554 s: update worktree after a merge
3.4063306 0.273318333 s: diff-index
16.9524923 9.462943133 s: git command: git.exe checkout
This command calls unpack_trees() twice, the first time on 2way merge
and the second 1way merge. In both times, "unpack trees" time is
reduced to one third. Overall time reduction is not that impressive of
course because index operations take a big chunk. And there's that
repair cache-tree line.
PS. A note about cache-tree invalidation and the use of it in this
code.
We do invalidate cache-tree in _source_ index when we add new entries
to the (temporary) "result" index. But we also use the cache-tree from
source index in this optimization. Does this mean we end up having no
cache-tree in the source index to activate this optimization?
The answer is twisted: the order of finding a good cache-tree and
invalidating it matters. In this case we check for a good cache-tree
first in all_trees_same_as_cache_tree(), then we start to merge things
and potentially invalidate that same cache-tree in the process. Since
cache-tree invalidation happens after the optimization kicks in, we're
still good. But we may lose that cache-tree at the very first
call_unpack_fn() call in traverse_by_cache_tree().
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We're going to optimize unpack_trees() a bit in the following
patches. Let's add some tracing to measure how long it takes before
and after. This is the baseline ("git checkout -" on webkit.git, 275k
files on worktree)
performance: 0.056651714 s: read cache .git/index
performance: 0.183101080 s: preload index
performance: 0.008584433 s: refresh index
performance: 0.633767589 s: traverse_trees
performance: 0.340265448 s: check_updates
performance: 0.381884638 s: cache_tree_update
performance: 1.401562947 s: unpack_trees
performance: 0.338687914 s: write index, changed mask = 2e
performance: 0.411927922 s: traverse_trees
performance: 0.000023335 s: check_updates
performance: 0.423697246 s: unpack_trees
performance: 0.423708360 s: diff-index
performance: 2.559524127 s: git command: git checkout -
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Performance measurements are listed right now as a flat list, which is
fine when we measure big blocks. But when we start adding more and
more measurements, some of them could be just part of a bigger
measurement and a flat list gives a wrong impression that they are
executed at the same level instead of nested.
Add trace_performance_enter() and trace_performance_leave() to allow
indent these nested measurements. For now it does not help much
because the only nested thing is (lazy) name hash initialization
(e.g. called in diff-index from "git status"). This will help more
because I'm going to add some more tracing that's actually nested.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Checkout with the -p switch uses the "add interactive" framework which
is written in Perl.
One test added in 8d7b558bae ("checkout & worktree: introduce
checkout.defaultRemote", 2018-06-05) didn't declare the PERL
prerequisite, breaking the test when built with NO_PERL.
Reported-by: CB Bailey <cb@hashpling.org>
Signed-off-by: CB Bailey <cb@hashpling.org>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The caller of maybe_colorize_sideband() gives a counted buffer
<src, n>, but the callee checked src[] as if it were a NUL terminated
buffer. If src[] had all isspace() bytes in it, we would have made
n negative, and then
(1) made number of strncasecmp() calls to see if the remaining
bytes in src[] matched keywords, reading beyond the end of the
array (this actually happens even if n does not go negative),
and/or
(2) called strbuf_add() with negative count, most likely triggering
the "you want to use way too much memory" error due to unsigned
integer overflow.
Fix both issues by making sure we do not go beyond &src[n].
In the longer term we may want to accept size_t as parameter for
clarity (even though we know that a sideband message we are painting
typically would fit on a line on a terminal and int is sufficient).
Write it down as a NEEDSWORK comment.
Helped-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add the '--quiet' option to git worktree, as for the other git
commands. 'add' is the only command affected by it since all other
commands, except 'list', are currently silent by default.
[jc: appiled trivial fix-up to keep the tests from touching outside
the scratch area]
Helped-by: Martin Ågren <martin.agren@gmail.com>
Helped-by: Duy Nguyen <pclouds@gmail.com>
Helped-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Elia Pinto <gitter.spiros@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git pull --rebase=interactive" learned "i" as a short-hand for
"interactive".
* js/pull-rebase-type-shorthand:
pull --rebase=<type>: allow single-letter abbreviations for the type
The end result of documentation update has been made to be
inspected more easily to help developers.
* jk/diff-rendered-docs:
add a script to diff rendered documentation
"git merge --abort" etc. did not clean things up properly when
there were conflicted entries in the index in certain order that
are involved in D/F conflicts. This has been corrected.
* en/abort-df-conflict-fixes:
read-cache: fix directory/file conflict handling in read_index_unmerged()
t1015: demonstrate directory/file conflict recovery failures
The http-backend (used for smart-http transport) used to slurp the
whole input until EOF, without paying attention to CONTENT_LENGTH
that is supplied in the environment and instead expecting the Web
server to close the input stream. This has been fixed.
* mk/http-backend-content-length:
t5562: avoid non-portable "export FOO=bar" construct
http-backend: respect CONTENT_LENGTH for receive-pack
http-backend: respect CONTENT_LENGTH as specified by rfc3875
http-backend: cleanup writing to child process
A few atoms like %(objecttype) and %(objectsize) in the format
specifier of "for-each-ref --format=<format>" can be filled without
getting the full contents of the object, but just with the object
header. These cases have been optimized by calling
oid_object_info() API (instead of reading and inspecting the data).
* ot/ref-filter-object-info:
ref-filter: use oid_object_info() to get object
ref-filter: merge get_obj and get_object
ref-filter: initialize eaten variable
ref-filter: fill empty fields with empty values
ref-filter: add info_source to valid_atom
Noiseword "extern" has been removed from function decls in the
header files.
* nd/no-extern:
submodule.h: drop extern from function declaration
revision.h: drop extern from function declaration
repository.h: drop extern from function declaration
rerere.h: drop extern from function declaration
line-range.h: drop extern from function declaration
diff.h: remove extern from function declaration
diffcore.h: drop extern from function declaration
convert.h: drop 'extern' from function declaration
cache-tree.h: drop extern from function declaration
blame.h: drop extern on func declaration
attr.h: drop extern from function declaration
apply.h: drop extern on func declaration
The parse-options machinery learned to refrain from enclosing
placeholder string inside a "<bra" and "ket>" pair automatically
without PARSE_OPT_LITERAL_ARGHELP. Existing help text for option
arguments that are not formatted correctly have been identified and
fixed.
* rs/parse-opt-lithelp:
parse-options: automatically infer PARSE_OPT_LITERAL_ARGHELP
shortlog: correct option help for -w
send-pack: specify --force-with-lease argument help explicitly
pack-objects: specify --index-version argument help explicitly
difftool: remove angular brackets from argument help
add, update-index: fix --chmod argument help
push: use PARSE_OPT_LITERAL_ARGHELP instead of unbalanced brackets
Update to a few other topics around 'git fetch'.
* ab/fetch-nego:
fetch doc: cross-link two new negotiation options
negotiator: unknown fetch.negotiationAlgorithm should error out
"git fetch $there refs/heads/s" ought to fetch the tip of the
branch 's', but when "refs/heads/refs/heads/s", i.e. a branch whose
name is "refs/heads/s" exists at the same time, fetched that one
instead by mistake. This has been corrected to honor the usual
disambiguation rules for abbreviated refnames.
* jt/refspec-dwim-precedence-fix:
remote: make refspec follow the same disambiguation rule as local refs
The automatic tree-matching in "git merge -s subtree" was broken 5
years ago and nobody has noticed since then, which is now fixed.
* jk/merge-subtree-heuristics:
score_trees(): fix iteration over trees with missing entries
The test performed at the receiving end of "git push" to prevent
bad objects from entering repository can be customized via
receive.fsck.* configuration variables; we now have gained a
counterpart to do the same on the "git fetch" side, with
fetch.fsck.* configuration variables.
* ab/fsck-transfer-updates:
fsck: test and document unknown fsck.<msg-id> values
fsck: add stress tests for fsck.skipList
fsck: test & document {fetch,receive}.fsck.* config fallback
fetch: implement fetch.fsck.*
transfer.fsckObjects tests: untangle confusing setup
config doc: elaborate on fetch.fsckObjects security
config doc: elaborate on what transfer.fsckObjects does
config doc: unify the description of fsck.* and receive.fsck.*
config doc: don't describe *.fetchObjects twice
receive.fsck.<msg-id> tests: remove dead code
Paths that only differ in case work fine in a case-sensitive
filesystems, but if those repos are cloned in a case-insensitive one,
you'll get problems. The first thing to notice is "git status" will
never be clean with no indication what exactly is "dirty".
This patch helps the situation a bit by pointing out the problem at
clone time. Even though this patch talks about case sensitivity, the
patch makes no assumption about folding rules by the filesystem. It
simply observes that if an entry has been already checked out at clone
time when we're about to write a new path, some folding rules are
behind this.
In the case that we can't rely on filesystem (via inode number) to do
this check, fall back to fspathcmp() which is not perfect but should
not give false positives.
This patch is tested with vim-colorschemes and Sublime-Gitignore
repositories on a JFS partition with case insensitive support on
Linux.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Change the few conditional uses of FREE_AND_NULL(x) to be
unconditional. As noted in the standard[1] free(NULL) is perfectly
valid, so we might as well leave this check up to the C library.
1. http://pubs.opengroup.org/onlinepubs/9699919799/functions/free.html
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The description of this key does not really tell what the 'minimal'
mode checks and does not check. The description for the 'default'
mode is not much better and just says 'all fields', which is unclear
and is not even correct (e.g. we do not look at 'atime').
Spell out what are and what are not checked under the 'minimal' mode
relative to the 'default' mode to help those who want to decide if
they want to use the 'minimal' mode, also taking information about
this mode from the commit message of c08e4d5b5c (Enable minimal stat
checking - 2013-01-22).
Helped-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Skip merging the commit, updating the index and working directory if and
only if we are creating a new branch via "git checkout -b <new_branch>."
Any other checkout options will still go through the former code path.
If sparse_checkout is on, require the user to manually opt in to this
optimzed behavior by setting the config setting checkout.optimizeNewBranch
to true as we will no longer update the skip-worktree bit in the index, nor
add/remove files in the working directory to reflect the current sparse
checkout settings.
For comparison, running "git checkout -b <new_branch>" on a large repo takes:
14.6 seconds - without this patch
0.3 seconds - with this patch
Signed-off-by: Ben Peart <Ben.Peart@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add support for configuring default sort ordering for git branches. Command
line option will override this configured value, using the exact same
syntax.
Signed-off-by: Samuel Maftoul <samuel.maftoul@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This reduces the size of 'struct object_entry' from 88 bytes
to 80 and therefore makes packing objects more efficient.
For example on a Linux repo with 12M objects,
`git pack-objects --all` needs extra 96MB memory even if the
layer feature is not used.
Helped-by: Jeff King <peff@peff.net>
Helped-by: Duy Nguyen <pclouds@gmail.com>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This reduces the size of 'struct object_entry' and therefore
makes packing objects more efficient.
This also renames cmp_tree_depth() into tree_depth_compare(),
as it is more modern to have the name of the compare functions
end with "compare".
Helped-by: Jeff King <peff@peff.net>
Helped-by: Duy Nguyen <pclouds@gmail.com>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Implement simple support for --delta-islands option and
repack.useDeltaIslands config variable in git repack.
This allows users to setup delta islands in their config and
get the benefit of less disk usage while cloning and fetching
is still quite fast and not much more CPU intensive.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Implement support for delta islands in git pack-objects
and document how delta islands work in
"Documentation/git-pack-objects.txt" and Documentation/config.txt.
This allows users to setup delta islands in their config and
get the benefit of less disk usage while cloning and fetching
is still quite fast and not much more CPU intensive.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In a following commit, as we will use delta islands, we will
have to compute the write order for different layers, not just
for one.
Let's prepare for that by refactoring the code that will be
used to compute the write order for a given layer into a new
compute_layer_order() function.
This will make it easier to see and understand what the
following changes are doing.
Helped-by: Duy Nguyen <pclouds@gmail.com>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Hosting providers that allow users to "fork" existing
repos want those forks to share as much disk space as
possible.
Alternates are an existing solution to keep all the
objects from all the forks into a unique central repo,
but this can have some drawbacks. Especially when
packing the central repo, deltas will be created
between objects from different forks.
This can make cloning or fetching a fork much slower
and much more CPU intensive as Git might have to
compute new deltas for many objects to avoid sending
objects from a different fork.
Because the inefficiency primarily arises when an
object is deltified against another object that does
not exist in the same fork, we partition objects into
sets that appear in the same fork, and define
"delta islands". When finding delta base, we do not
allow an object outside the same island to be
considered as its base.
So "delta islands" is a way to store objects from
different forks in the same repo and packfile without
having deltas between objects from different forks.
This patch implements the delta islands mechanism in
"delta-islands.{c,h}", but does not yet make use of it.
A few new fields are added in 'struct object_entry'
in "pack-objects.h" though.
The documentation will follow in a patch that actually
uses delta islands in "builtin/pack-objects.c".
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
While at it fix a typo (s/independed/independent) and
make sure git is not in a chain of pipes.
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
--quit is supposed to be --abort but without restoring HEAD. Leaving
CHERRY_PICK_HEAD behind could make other commands mistake that
cherry-pick is still ongoing (e.g. "git commit --amend" will refuse to
work). Clean it too.
For --abort, this job of deleting CHERRY_PICK_HEAD is on "git reset"
so we don't need to do anything else. But let's add extra checks in
--abort tests to confirm.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If a merge command in the todo list specifies just a branch to merge
with no -C/-c argument then item->commit is NULL. This means that if
there are merge conflicts error_with_patch() is passed a NULL commit
which causes a segmentation fault when make_patch() tries to look it up.
This commit implements a minimal fix which fixes the crash and allows
the user to successfully commit a conflict resolution with 'git rebase
--continue'. It does not write .git/rebase-merge/patch,
.git/rebase-merge/stopped-sha or update REBASE_HEAD. To sensibly get the
hashes of the merge parents would require refactoring do_merge() to
extract the code that parses the merge parents into a separate function
which error_with_patch() could then use to write the parents into the
stopped-sha file. To create meaningful output make_patch() and 'git
rebase --show-current-patch' would also need to be modified to diff the
merge parent and merge base in this case.
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Acked-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Move the creation of conflicting-G from a test to the setup so that it
can be used in subsequent tests without creating the kind of implicit
dependencies that plague t3404. While we're at it simplify the
arguments to the test_commit() call the creates the conflicting commit.
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git fetch" sometimes failed to update the remote-tracking refs,
which has been corrected.
* jt/connectivity-check-after-unshallow:
fetch-pack: unify ref in and out param
The Travis CI scripts were taught to ship back the test data from
failed tests.
* sg/travis-retrieve-trash-upon-failure:
travis-ci: include the trash directories of failed tests in the trace log