Borrow the bitmap index into packfiles from JGit to speed up
enumeration of objects involved in a commit range without having to
fully traverse the history.
* jk/pack-bitmap: (26 commits)
ewah: unconditionally ntohll ewah data
ewah: support platforms that require aligned reads
read-cache: use get_be32 instead of hand-rolled ntoh_l
block-sha1: factor out get_be and put_be wrappers
do not discard revindex when re-preparing packfiles
pack-bitmap: implement optional name_hash cache
t/perf: add tests for pack bitmaps
t: add basic bitmap functionality tests
count-objects: recognize .bitmap in garbage-checking
repack: consider bitmaps when performing repacks
repack: handle optional files created by pack-objects
repack: turn exts array into array-of-struct
repack: stop using magic number for ARRAY_SIZE(exts)
pack-objects: implement bitmap writing
rev-list: add bitmap mode to speed up object lists
pack-objects: use bitmaps when packing objects
pack-objects: split add_object_entry
pack-bitmap: add support for bitmap indexes
documentation: add documentation for the bitmap format
ewah: compressed bitmap implementation
...
Code clean-up.
* dk/blame-janitorial:
builtin/blame.c::find_copy_in_blob: no need to scan for region end
blame.c: prepare_lines should not call xrealloc for every line
builtin/blame.c::prepare_lines: fix allocation size of sb->lineno
builtin/blame.c: eliminate same_suspect()
builtin/blame.c: struct blame_entry does not need a prev link
Teach "--gpg-sign" option to many commands that create commits.
* bc/gpg-sign-everywhere:
pull: add the --gpg-sign option.
rebase: add the --gpg-sign option
rebase: parse options in stuck-long mode
rebase: don't try to match -M option
rebase: remove useless arguments check
am: add the --gpg-sign option
am: parse options in stuck-long mode
git-sh-setup.sh: add variable to use the stuck-long mode
cherry-pick, revert: add the --gpg-sign option
A handful of documentation updates, all trivially harmless.
* al/docs:
docs/git-blame: explain more clearly the example pickaxe use
docs/git-clone: clarify use of --no-hardlinks option
docs/git-remote: capitalize first word of initial blurb
docs/merge-strategies: remove hyphen from mis-merges
Avoid having to assign port number to be used in tests manually.
* jk/test-ports:
tests: auto-set git-daemon port
tests: auto-set LIB_HTTPD_PORT from test name
* ks/tree-diff-walk:
tree-walk: finally switch over tree descriptors to contain a pre-parsed entry
revision: convert to using diff_tree_sha1()
line-log: convert to using diff_tree_sha1()
tree-diff: convert diff_root_tree_sha1() to just call diff_tree_sha1 with old=NULL
tree-diff: allow diff_tree_sha1 to accept NULL sha1
All subcommands that take pathspecs mishandled an in-tree symbolic
link when given it as a full path from the root (which arguably is
a sick way to use pathspecs). "git ls-files -s $(pwd)/RelNotes" in
our tree is an easy reproduction recipe.
* mw/symlinks:
setup: don't dereference in-tree symlinks for absolute paths
setup: add abspath_part_inside_repo() function
t0060: add tests for prefix_path when path begins with work tree
t0060: add test for prefix_path when path == work tree
t0060: add test for prefix_path on symlinks via absolute paths
t3004: add test for ls-files on symlinks via absolute paths
Make sure 'submodule update' modes that do not detach HEADs can
be used more pleasantly by checking out a concrete branch when
cloning them to prime the well.
* wk/submodule-on-branch:
Documentation: describe 'submodule update --remote' use case
submodule: explicit local branch creation in module_clone
submodule: document module_clone arguments in comments
submodule: make 'checkout' update_module mode more explicit
Shrink lifetime of variables by moving their definitions to an
inner scope where appropriate.
* ep/varscope:
builtin/gc.c: reduce scope of variables
builtin/fetch.c: reduce scope of variable
builtin/commit.c: reduce scope of variables
builtin/clean.c: reduce scope of variable
builtin/blame.c: reduce scope of variables
builtin/apply.c: reduce scope of variables
bisect.c: reduce scope of variable
When we replace broken macros from stdio.h in git-compat-util.h,
preprocessor.
* bs/stdio-undef-before-redef:
git-compat-util.h: #undef (v)snprintf before #define them
include.path variable (or any variable that expects a path that can
use ~username expansion) in the configuration file is not a
boolean, but the code failed to check it.
* jk/config-path-include-fix:
handle_path_include: don't look at NULL value
expand_user_path: do not look at NULL path
"git rev-parse --default" without the required option argument did
not diagnose it as an error.
* ds/rev-parse-required-args:
rev-parse: check i before using argv[i] against argc
"git diff --quiet -- pathspec1 pathspec2" sometimes did not return
correct status value.
* nd/diff-quiet-stat-dirty:
diff: do not quit early on stat-dirty files
diff.c: move diffcore_skip_stat_unmatch core logic out for reuse later
Allow "git cmd path/", when the 'path' is where a submodule is
bound to the top-level working tree, to match 'path', despite the
extra and unnecessary trailing slash.
* nd/submodule-pathspec-ending-with-slash:
clean: use cache_name_is_other()
clean: replace match_pathspec() with dir_path_match()
pathspec: pass directory indicator to match_pathspec_item()
match_pathspec: match pathspec "foo/" against directory "foo"
dir.c: prepare match_pathspec_item for taking more flags
pathspec: rename match_pathspec_depth() to match_pathspec()
pathspec: convert some match_pathspec_depth() to dir_path_match()
pathspec: convert some match_pathspec_depth() to ce_path_match()
Allow "merge-recursive" to work in an empty (temporary) working
tree again when there are renames involved, correcting an old
regression in 1.7.7 era.
* bk/refresh-missing-ok-in-merge-recursive:
merge-recursive.c: tolerate missing files while refreshing index
read-cache.c: extend make_cache_entry refresh flag with options
read-cache.c: refactor --ignore-missing implementation
t3030-merge-recursive: test known breakage with empty work tree
"git pull" learned to pay attention to pull.ff configuration
variable.
* da/pull-ff-configuration:
pull: add --ff-only to the help text
pull: add pull.ff configuration
Improvements to our hash table to get it to meet the needs of the
msysgit fscache project, with some nice performance improvements.
* kb/fast-hashmap:
name-hash: retire unused index_name_exists()
hashmap.h: use 'unsigned int' for hash-codes everywhere
test-hashmap.c: drop unnecessary #includes
.gitignore: test-hashmap is a generated file
read-cache.c: fix memory leaks caused by removed cache entries
builtin/update-index.c: cleanup update_one
fix 'git update-index --verbose --again' output
remove old hash.[ch] implementation
name-hash.c: remove cache entries instead of marking them CE_UNHASHED
name-hash.c: use new hash map implementation for cache entries
name-hash.c: remove unreferenced directory entries
name-hash.c: use new hash map implementation for directories
diffcore-rename.c: use new hash map implementation
diffcore-rename.c: simplify finding exact renames
diffcore-rename.c: move code around to prepare for the next patch
buitin/describe.c: use new hash map implementation
add a hashtable implementation that supports O(1) removal
submodule: don't access the .gitmodules cache entry after removing it
Introduce commit.gpgsign configuration variable to force every
commit to be GPG signed. The variable cannot be overriden from the
command line of some of the commands that create commits except for
"git commit" and "git commit-tree", but I am not convinced that it
is a good idea to sprinkle support for --no-gpg-sign everywhere,
which in turn means that this configuration variable may not be
such a good idea.
* nv/commit-gpgsign-config:
test the commit.gpgsign config option
commit-tree: add and document --no-gpg-sign
commit-tree: add the commit.gpgsign option to sign all commits
We sometimes write tempfiles of the form "shallow_XXXXXX"
during fetch/push operations with shallow repositories.
Under normal circumstances, we clean up the result when we
are done. However, we do no take steps to clean up after
ourselves when we exit due to die() or signal death.
This patch teaches the tempfile creation code to register
handlers to clean up after ourselves. To handle this, we
change the ownership semantics of the filename returned by
setup_temporary_shallow. It now keeps a copy of the filename
itself, and returns only a const pointer to it.
We can also do away with explicit tempfile removal in the
callers. They all exit not long after finishing with the
file, so they can rely on the auto-cleanup, simplifying the
code.
Note that we keep things simple and maintain only a single
filename to be cleaned. This is sufficient for the current
caller, but we future-proof it with a die("BUG").
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When we are about to write the shallow file, we check that
it has not changed since we last read it. Instead of
hand-rolling this, we can use stat_validity. This is built
around the index stat-check, so it is more robust than just
checking the mtime, as we do now (it uses the same check as
we do for index files).
The new code also handles the case of a shallow file
appearing unexpectedly. With the current code, two
simultaneous processes making us shallow (e.g., two "git
fetch --depth=1" running at the same time in a non-shallow
repository) can race to overwrite each other.
As a bonus, we also remove a race in determining the stat
information of what we read (we stat and then open, leaving
a race window; instead we should open and then fstat the
descriptor).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Refactor binary search in "commit_graft_pos" function: use
generic "sha1_pos" function.
Signed-off-by: Dmitry S. Dolzhenko <dmitrys.dolzhenko@yandex.ru>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When asking to fetch/pull a branch whose name is B or a tag whose
name is T, we used to show the command to run as:
git pull $URL B
git pull $URL tags/T
even when B and T were spelled in a more qualified way in order to
disambiguate, e.g. heads/B or refs/tags/T, but the recent update
lost this feature. Resurrect it.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This illustrates behaviour changes that result from the recent
change by Linus. Most show good changes, but there may be some
usability regressions:
- The command continues to fail when the user forgot to push out
before running the command, but the wording of the message has
been slightly changed.
- The command no longer guesses when asked to request the commit at
the HEAD be pulled after pushing it to a branch 'for-upstream',
even when that branch points at the correct commit. The user
must ask the command with the new "master:for-upstream" syntax.
The new behaviour needs to be documented in any case, but we need to
agree what the new behaviour should be before doing so first.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The previous two steps were meant to stop updating the explicit
refname the user gave to the command to a different ref that points
at it. Most notably, we no longer substitute a branch name the user
used with a name of the tag that points at the commit at the tip of
the branch (it still can be done with "local-branch:remote-tag").
However, they also lost the code that included the message in a
tag when the user _did_ ask the tag to be pulled. Resurrect it.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This allows a user to say that a local branch has a different name on
the remote server, using the same syntax that "git push" uses to create
that situation.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The current 'request-pull' will try to find matching commit on the given
remote, and rewrite the "please pull" line to match that remote ref.
That may be very helpful if your local tree doesn't match the layout of
the remote branches, but for the common case it's been a recurring
disaster, when "request-pull" is done against a delayed remote update, and
it rewrites the target branch randomly to some other branch name that
happens to have the same expected SHA1 (or more commonly, leaves it
blank).
To avoid that recurring problem, this changes "git request-pull" so that
it matches the ref name to be pulled against the *local* repository, and
then warns if the remote repository does not have that exact same branch
or tag name and content.
This means that git request-pull will never rewrite the ref-name you gave
it. If the local branch name is "xyzzy", that is the only branch name
that request-pull will ask the other side to fetch.
If the remote has that branch under a different name, that's your problem
and git request-pull will not try to fix it up (but git request-pull will
warn about the fact that no exact matching branch is found, and you can
edit the end result to then have the remote name you want if it doesn't
match your local one).
The new "find local ref" code will also complain loudly if you give an
ambiguous refname (eg you have both a tag and a branch with that same
name, and you don't specify "heads/name" or "tags/name").
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since 1a72cfd (commit -v: strip diffs and submodule shortlogs from the
commit message - 2013-12-05) we have a less fragile way to cut out
"git status" at the end of a commit message but it's only enabled for
stripping submodule shortlogs.
Add new cleanup option that reuses the same mechanism for the entire
"git status" without accidentally removing lines starting with '#'.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
db5360f3f4 (name-hash: refactor polymorphic index_name_exists();
2013-09-17) split index_name_exists() into index_file_exists() and
index_dir_exists() but retained index_name_exists() as a thin wrapper
to avoid disturbing possible in-flight topics. Since this change
landed in 'master' some time ago and there are no in-flight topics
referencing index_name_exists(), retire it.
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The tests are checking that :
- when commit.gpgsign is true, "git commit" creates signed commits
- when commit.gpgsign is false, "git commit" creates unsigned commits
- when commit.gpgsign is true, "git commit --no-gpg-sign" creates
unsigned commits
- when commit.gpgsign is true, "git rebase -f" creates signed commits
Signed-off-by: Nicolas Vigier <boklm@mars-attacks.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Document how to override commit.gpgsign configuration that is set to
true per "git commit" invocation (parse-options machinery lets us
say "--no-gpg-sign" to do so).
"git commit-tree" does not use parse-options, so manually add the
corresponding option for now.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If you want to GPG sign all your commits, you have to add the -S option
all the time. The commit.gpgsign config option allows to sign all
commits automatically.
Signed-off-by: Nicolas Vigier <boklm@mars-attacks.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When QUICK is set (i.e. with --quiet) we try to do as little work as
possible, stopping after seeing the first change. stat-dirty is
considered a "change" but it may turn out not, if no actual content is
changed. The actual content test is performed too late in the process
and the shortcut may be taken prematurely, leading to incorrect return
code.
Assume we do "git diff --quiet". If we have a stat-dirty file "a" and
a really dirty file "b". We break the loop in run_diff_files() and
stop after "a" because we have got a "change". Later in
diffcore_skip_stat_unmatch() we find out "a" is actually not
changed. But there's nothing else in the diff queue, we incorrectly
declare "no change", ignoring the fact that "b" is changed.
This also happens to "git diff --quiet HEAD" when it hits
diff_can_quit_early() in oneway_diff().
This patch does the content test earlier in order to keep going if "a"
is unchanged. The test result is cached so that when
diffcore_skip_stat_unmatch() is done in the end, we spend no cycles on
re-testing "a".
Reported-by: IWAMOTO Toshihiro <iwamoto@valinux.co.jp>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
where "correct paths" stands for paths that are different to all
parents.
Up until now, we were testing combined diff only on one file, or on
several files which were all different (t4038-diff-combined.sh).
As recent thinko in "simplify intersect_paths() further" showed, and
also, since we are going to rework code for finding paths different to
all parents, lets write at least basic tests.
Signed-off-by: Kirill Smelkov <kirr@mns.spb.ru>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Linus once said:
I actually wish more people understood the really core low-level
kind of coding. Not big, complex stuff like the lockless name
lookup, but simply good use of pointers-to-pointers etc. For
example, I've seen too many people who delete a singly-linked
list entry by keeping track of the "prev" entry, and then to
delete the entry, doing something like
if (prev)
prev->next = entry->next;
else
list_head = entry->next;
and whenever I see code like that, I just go "This person
doesn't understand pointers". And it's sadly quite common.
People who understand pointers just use a "pointer to the entry
pointer", and initialize that with the address of the
list_head. And then as they traverse the list, they can remove
the entry without using any conditionals, by just doing a "*pp =
entry->next".
Applying that simplification lets us lose 7 lines from this function
even while adding 2 lines of comment.
I was tempted to squash this into the original commit, but because
the benchmarking described in the commit log is without this
simplification, I decided to keep it a separate follow-up patch.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The field was used in order to speed-up name comparison and also to
mark removed paths by setting it to 0.
Because the updated code does significantly less strcmp and also
just removes paths from the list and free right after we know a path
will not be needed, it is not needed anymore.
Signed-off-by: Kirill Smelkov <kirr@mns.spb.ru>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When generating combined diff, for each commit, we intersect diff
paths from diff(parent_0,commit) to diff(parent_i,commit) comparing
all paths pairs, i.e. doing it the quadratic way. That is correct,
but could be optimized.
Paths come from trees in sorted (= tree) order, and so does diff_tree()
emits resulting paths in that order too. Now if we look at diffcore
transformations, all of them, except diffcore_order, preserve resulting
path ordering:
- skip_stat_unmatch, grep, pickaxe, filter
-- just skip elements -> order stays preserved
- break -- just breaks diff for a path, adding path
dup after the path -> order stays preserved
- detect rename/copy -- resulting paths are emitted sorted
(verified empirically)
So only diffcore_order changes diff paths ordering.
But diffcore_order meaning affects only presentation - i.e. only how to
show the diff, so we could do all the internal computations without
paths reordering, and order only resultant paths set. This is faster,
since, if we know two paths sets are all ordered, their intersection
could be done in linear time.
This patch does just that.
Timings for `git log --raw --no-abbrev --no-renames` without `-c` ("git log")
and with `-c` ("git log -c") before and after the patch are as follows:
linux.git v3.10..v3.11
log log -c
before 1.9s 20.4s
after 1.9s 16.6s
navy.git (private repo)
log log -c
before 0.83s 15.6s
after 0.83s 2.1s
P.S.
I think linux.git case is sped up not so much as the second one, since
in navy.git, there are more exotic (subtree, etc) merges.
P.P.S.
My tracing showed that the rest of the time (16.6s vs 1.9s) is usually
spent in computing huge diffs from commit to second parent. Will try to
deal with it, if I'll have time.
P.P.P.S.
For combine_diff_path, ->len is not needed anymore - will remove it in
the next noisy cleanup path, to maintain good signal/noise ratio here.
Signed-off-by: Kirill Smelkov <kirr@mns.spb.ru>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In the next patch combine-diff will have special code-path for taking
orderfile into account. Prepare for making changes by introducing
coverage tests for that case.
Signed-off-by: Kirill Smelkov <kirr@mns.spb.ru>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
diffcore_order() interface only accepts a queue of `struct
diff_filepair`.
In the next patches, we'll want to order `struct combine_diff_path`
by path, so let's first rework diffcore-order to also provide
generic low-level interface for ordering arbitrary objects, provided
they have path accessors.
The new interface is:
- `struct obj_order` for describing objects to ordering routine, and
- order_objects() for actually doing the ordering work.
Signed-off-by: Kirill Smelkov <kirr@mns.spb.ru>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This continues 4651ece8 (Switch over tree descriptors to contain a
pre-parsed entry) and moves the only rest computational part
mode = canon_mode(mode)
from tree_entry_extract() to tree entry decode phase - to
decode_tree_entry().
The reason to do it, is that canon_mode() is at least 2 conditional
jumps for regular files, and that could be noticeable should canon_mode()
be invoked several times.
That does not matter for current Git codebase, where typical tree
traversal is
while (t->size) {
sha1 = tree_entry_extract(t, &path, &mode);
...
update_tree_entry(t);
}
i.e. we do t -> sha1,path.mode "extraction" only once per entry. In such
cases, it does not matter performance-wise, where that mode
canonicalization is done - either once in tree_entry_extract(), or once
in decode_tree_entry() called by update_tree_entry() - it is
approximately the same.
But for future code, which could need to work with several tree_desc's
in parallel, it could be handy to operate on tree_desc descriptors, and
do "extracts" only when needed, or at all, access only relevant part of
it through structure fields directly.
And for such situations, having canon_mode() be done once in decode
phase is better - we won't need to pay the performance price of 2 extra
conditional jumps on every t->mode access.
So let's move mode canonicalization to decode_tree_entry(). That was the
final bit. Now after tree entry is decoded, it is fully ready and could
be accessed either directly via field, or through tree_entry_extract()
which this time got really "totally trivial".
Signed-off-by: Kirill Smelkov <kirr@mns.spb.ru>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
cmd_clean() has the exact same code of index_name_is_other(). Reduce
code duplication.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This instance was left out when many match_pathspec() call sites that
take input from dir_entry were converted to dir_path_match() because
it passed a path with the trailing slash stripped out to match_pathspec()
while the others did not. Stripping for all call sites back then would
be a regression because match_pathspec() did not know how to match
pathspec foo/ against _directory_ foo (the stripped version of path
"foo/").
match_pathspec() knows how to do it now. And dir_path_match() strips
the trailing slash also. Use the new function, because the stripping
code is removed in the next patch.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This patch activates the DO_MATCH_DIRECTORY code in m_p_i(), which
makes "git diff HEAD submodule/" and "git diff HEAD submodule" produce
the same output. Previously only the version without trailing slash
returns the difference (if any).
That's the effect of new ce_path_match(). dir_path_match() is not
executed by the new tests. And it should not introduce regressions.
Previously if path "dir/" is passed in with pathspec "dir/", they
obviously match. With new dir_path_match(), the path becomes
_directory_ "dir" vs pathspec "dir/", which is not executed by the old
code path in m_p_i(). The new code path is executed and produces the
same result.
The other case is pathspec "dir" and path "dir/" is now turned to
"dir" (with DO_MATCH_DIRECTORY). Still the same result before or after
the patch.
So why change? Because of the next patch about clean.c.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Currently we do support matching pathspec "foo/" against directory
"foo". That is because match_pathspec() has no way to tell "foo" is a
directory and matching "foo/" against _file_ "foo" is wrong.
The callers can now tell match_pathspec if "foo" is a directory, we
could make an exception for this case. Code is not executed though
because no callers pass the flag yet.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>