"git rev-list --count" whose walk-length is limited with "-n"
option did not work well with the counting optimized to look at the
bitmap index.
* jk/rev-list-count-with-bitmap:
rev-list: disable bitmaps when "-n" is used with listing objects
rev-list: "adjust" results of "--count --use-bitmap-index -n"
A bash-ism "local" has been removed from "git submodule" scripted
Porcelain.
* sb/submodule-helper-relative-path:
submodule: remove bashism from shell script
The way how "submodule--helper list" signals unmatch error to its
callers has been updated.
* sb/submodule-helper-list-signal-unmatch-via-exit-status:
submodule--helper: offer a consistent API
You can ask rev-list to use bitmaps to speed up an --objects
traversal, which should generally give you your answers much
faster.
Likewise, you can ask rev-list to limit such a traversal
with `-n`, in which case we'll show only a limited set of
commits (and only the tree and commit objects directly
reachable from those commits).
But if you do both together, the results are nonsensical. We
end up limiting any fallback traversal we do to _find_ the
bitmaps, but the actual set of objects we output will be
picked arbitrarily from the union of any bitmaps we do find,
and will involve the objects of many more commits.
It's possible that somebody might want this as a "show me
what you can, but limit the amount of work you do" flag.
But as with the prior commit clamping "--count", the results
are basically non-deterministic; you'll get the values from
some commits between `n` and the total number, and you can't
tell which.
And unlike the `--count` case, we can't easily generate the
"real" value from the bitmap values (you can't just walk
back `-n` commits and subtract out the reachable objects
from the boundary commits; the bitmaps for `X` record its
total reachability, so you don't know which objects are
directly from `X` itself, which from `X^`, and so on).
So let's just fallback to the non-bitmap code path in this
case, so we always give a sane answer.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If you ask rev-list for:
git rev-list --count --use-bitmap-index HEAD
we optimize out the actual traversal and just give you the
number of bits set in the commit bitmap. This is faster,
which is good.
But if you ask to limit the size of the traversal, like:
git rev-list --count --use-bitmap-index -n 100 HEAD
we'll still output the full bitmapped number we found. On
the surface, that might even seem OK. You explicitly asked
to use the bitmap index, and it was cheap to compute the
real answer, so we gave it to you.
But there's something much more complicated going on under
the hood. If we don't have a bitmap directly for HEAD, then
we have to actually traverse backwards, looking for a
bitmapped commit. And _that_ traversal is bounded by our
`-n` count.
This is a good thing, because it bounds the work we have to
do, which is probably what the user wanted by asking for
`-n`. But now it makes the output quite confusing. You might
get many values:
- your `-n` value, if we walked back and never found a
bitmap (or fewer if there weren't that many commits)
- the actual full count, if we found a bitmap root for
every path of our traversal with in the `-n` limit
- any number in between! We might have walked back and
found _some_ bitmaps, but then cut off the traversal
early with some commits not accounted for in the result.
So you cannot even see a value higher than your `-n` and say
"OK, bitmaps kicked in, this must be the real full count".
The only sane thing is for git to just clamp the value to a
maximum of the `-n` value, which means we should output the
exact same results whether bitmaps are in use or not.
The test in t5310 demonstrates this by using `-n 1`.
Without this patch we fail in the full-bitmap case (where we
do not have to traverse at all) but _not_ in the
partial-bitmap case (where we have to walk down to find an
actual bitmap). With this patch, both cases just work.
I didn't implement the crazy in-between case, just because
it's complicated to set up, and is really a subset of the
full-count case, which we do cover.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Junio pointed out `relative_path` was using bashisms via the
local variables. As the longer term goal is to rewrite most of the
submodule code in C, do it now.
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In 48308681 (2016-02-29, git submodule update: have a dedicated helper
for cloning), the helper communicated errors back only via exit code,
and dance with printing '#unmatched' in case of error was left to
git-submodule.sh as it uses the output of the helper and pipes it into
shell commands. This change makes the helper consistent by never
printing '#unmatched' in the helper but always handling these piping
issues in the actual shell script.
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git cat-file --batch-all" has been sped up, by taking advantage
of the fact that it does not have to read a list of objects, in two
ways.
* jk/cat-file-buffered-batch-all:
cat-file: default to --buffer when --batch-all-objects is used
cat-file: avoid noop calls to sha1_object_info_extended
Get rid of magic numbers and avoid running over the end of a NUL
terminated string by using starts_with() and skip_prefix() instead
of memcmp().
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Many commands normalize command line arguments from NFD to NFC
variant of UTF-8 on OSX, but commands in the "diff" family did
not, causing "git diff $path" to complain that no such path is
known to Git. They have been taught to do the normalization.
* ar/diff-args-osx-precompose:
diff: run arguments through precompose_argv
"git commit" learned to pay attention to "commit.verbose"
configuration variable and act as if "--verbose" option was
given from the command line.
* pb/commit-verbose-config:
commit: add a commit.verbose config variable
t7507-commit-verbose: improve test coverage by testing number of diffs
parse-options.c: make OPTION_COUNTUP respect "unspecified" values
t/t7507: improve test coverage
t0040-parse-options: improve test coverage
test-parse-options: print quiet as integer
t0040-test-parse-options.sh: fix style issues
"git format-patch" learned a new "--base" option to record what
(public, well-known) commit the original series was built on in
its output.
* xy/format-patch-base:
format-patch: introduce format.useAutoBase configuration
format-patch: introduce --base=auto option
format-patch: add '--base' option to record base tree info
patch-ids: make commit_patch_id() a public helper function
The experimental "multiple worktree" feature gains more safety to
forbid operations on a branch that is checked out or being actively
worked on elsewhere, by noticing that e.g. it is being rebased.
* nd/worktree-various-heads:
branch: do not rename a branch under bisect or rebase
worktree.c: check whether branch is bisected in another worktree
wt-status.c: split bisect detection out of wt_status_get_state()
worktree.c: check whether branch is rebased in another worktree
worktree.c: avoid referencing to worktrees[i] multiple times
wt-status.c: make wt_status_check_rebase() work on any worktree
wt-status.c: split rebase detection out of wt_status_get_state()
path.c: refactor and add worktree_git_path()
worktree.c: mark current worktree
worktree.c: make find_shared_symref() return struct worktree *
worktree.c: store "id" instead of "git_dir"
path.c: add git_common_path() and strbuf_git_common_path()
dir.c: rename str(n)cmp_icase to fspath(n)cmp
Traditionally cat-file's batch-mode does not do any output
buffering. The reason is that a caller may have pipes
connected to its input and output, and would want to use
cat-file interactively, getting output immediately for each
input it sends.
This may involve a lot of small write() calls, which can be
slow. So we introduced --buffer to improve this, but we
can't turn it on by default, as it would break the
interactive case above.
However, when --batch-all-objects is used, we do not read
stdin at all. We generate the output ourselves as quickly as
possible, and then exit. In this case buffering is a strict
win, and it is simply a hassle for the user to have to
remember to specify --buffer.
This patch makes --buffer the default when --batch-all-objects
is used. Specifying "--buffer" manually is still OK, and you
can even override it with "--no-buffer" if you're a
masochist (or debugging).
For some real numbers, running:
git cat-file --batch-all-objects --batch-check='%(objectname)'
on torvalds/linux goes from:
real 0m1.464s
user 0m1.208s
sys 0m0.252s
to:
real 0m1.230s
user 0m1.172s
sys 0m0.056s
for a 16% speedup.
Suggested-by: Charles Bailey <charles@hashpling.org>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It is not unreasonable to ask cat-file for a batch-check
format of simply "%(objectname)". At first glance this seems
like a noop (you are generally already feeding the object
names on stdin!), but it has a few uses:
1. With --batch-all-objects, you can generate a listing of
the sha1s present in the repository, without any input.
2. You do not have to feed sha1s; you can feed arbitrary
sha1 expressions and have git resolve them en masse.
3. You can even feed a raw sha1, with the result that git
will tell you whether we actually have the object or
not.
In case 3, the call to sha1_object_info is useful; it tells
us whether the object exists or not (technically we could
swap this out for has_sha1_file, but the cost is roughly the
same).
In case 2, the existence check is of debatable value. A
mass-resolution might prefer performance to safety (against
outputting a value for a corrupted ref, for example).
However, the object lookup cost is likely not as noticeable
compared to the resolution cost. And since we have provided
that safety in the past, the conservative choice is to keep
it.
In case 1, though, the object lookup is a definite noop; we
know about the object because we found it in the object
database. There is no new information gained by making the
call.
This patch detects that case and optimizes out the call.
Here are best-of-five timings for linux.git:
[before]
$ time git cat-file --buffer \
--batch-all-objects \
--batch-check='%(objectname)'
real 0m2.117s
user 0m2.044s
sys 0m0.072s
[after]
$ time git cat-file --buffer \
--batch-all-objects \
--batch-check='%(objectname)'
real 0m1.230s
user 0m1.176s
sys 0m0.052s
There are two implementation details to note here.
One is that we detect the noop case by seeing that "struct
object_info" does not request any information. But besides
object existence, there is one other piece of information
which sha1_object_info may fill in: whether the object is
cached, loose, or packed. We don't currently provide that
information in the output, but if we were to do so later,
we'd need to take note and disable the optimization in that
case.
And that leads to the second note. If we were to output
that information, a better implementation would be to
remember where we saw the object in --batch-all-objects in
the first place, and avoid looking it up again by sha1.
In fact, we could probably squeeze out some extra
performance for less-trivial cases, too, by remembering the
pack location where we saw the object, and going directly
there to find its information (like type, size, etc). That
would in theory make this optimization unnecessary.
I didn't pursue that path here for two reasons:
1. It's non-trivial to implement, and has memory
implications. Because we sort and de-dup the list of
output sha1s, we'd have to record the pack information
for each object, too.
2. It doesn't save as much as you might hope. It saves the
find_pack_entry() call, but getting the size and type
for deltified objects requires walking down the delta
chain (for the real type) or reading the delta data
header (for the size). These costs tend to dominate the
non-trivial cases.
By contrast, this optimization is easy and self-contained,
and speeds up a real-world case I've used.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The code for warning_errno/die_errno has been refactored and a new
error_errno() reporting helper is introduced.
* nd/error-errno: (41 commits)
wrapper.c: use warning_errno()
vcs-svn: use error_errno()
upload-pack.c: use error_errno()
unpack-trees.c: use error_errno()
transport-helper.c: use error_errno()
sha1_file.c: use {error,die,warning}_errno()
server-info.c: use error_errno()
sequencer.c: use error_errno()
run-command.c: use error_errno()
rerere.c: use error_errno() and warning_errno()
reachable.c: use error_errno()
mailmap.c: use error_errno()
ident.c: use warning_errno()
http.c: use error_errno() and warning_errno()
grep.c: use error_errno()
gpg-interface.c: use error_errno()
fast-import.c: use error_errno()
entry.c: use error_errno()
editor.c: use error_errno()
diff-no-index.c: use error_errno()
...
An earlier addition of "sanitize_submodule_env" with 14111fc4 (git:
submodule honor -c credential.* from command line, 2016-02-29)
turned out to be a convoluted no-op; implement what it wanted to do
correctly, and stop filtering settings given via "git -c var=val".
* jk/submodule-c-credential:
submodule: stop sanitizing config options
submodule: use prepare_submodule_repo_env consistently
submodule--helper: move config-sanitizing to submodule.c
submodule: export sanitized GIT_CONFIG_PARAMETERS
t5550: break submodule config test into multiple sub-tests
t5550: fix typo in $HTTPD_URL
Mark several messages for translation.
* va/i18n-misc-updates:
i18n: unpack-trees: avoid substituting only a verb in sentences
i18n: builtin/pull.c: split strings marked for translation
i18n: builtin/pull.c: mark placeholders for translation
i18n: git-parse-remote.sh: mark strings for translation
i18n: branch: move comment for translators
i18n: branch: unmark string for translation
i18n: builtin/rm.c: remove a comma ',' from string
i18n: unpack-trees: mark strings for translation
i18n: builtin/branch.c: mark option for translation
i18n: index-pack: use plural string instead of normal one
Update of "git submodule" to move pieces of logic to C continues.
* sb/submodule-init:
submodule init: redirect stdout to stderr
submodule--helper update-clone: abort gracefully on missing .gitmodules
submodule init: fail gracefully with a missing .gitmodules file
submodule: port init from shell to C
submodule: port resolve_relative_url from shell to C
When running diff commands, a pathspec containing decomposed
unicode code points is not converted to precomposed unicode form
under Mac OS X, but we normalize the paths in the index and the
history to precomposed form on that platform. As a result, the
pathspec would not match and no diff is shown.
Unlike many builtin commands, the "diff" family of commands do
not use parse_options(), which is how other builtin commands
indirectly call precompose_argv() to normalize argv[] into
precomposed form on Mac OSX. Teach these commands to call
precompose_argv() themselves.
Note that precomopose_argv() normalizes not just paths but all
command line arguments, so things like "git diff -G $string"
when $string has the decomposed form would first be normalized
into the precomposed form and would stop hitting the same string
in the decomposed form in the diff output with this change.
It is not a problem per-se, as "log" family of commands already use
parse_options() and call precompose_argv()--we can think of this
change as making the "diff" family of commands behave in a similar
way as the commands in the "log" family.
Signed-off-by: Alexander Rinass <alex@fournova.com>
Helped-by: Torsten Bögershausen <tboegi@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git commit-tree" plumbing command required the user to always sign
its result when the user sets the commit.gpgsign configuration
variable, which was an ancient mistake. Rework "git rebase" that
relied on this mistake so that it reads commit.gpgsign and pass (or
not pass) the -S option to "git commit-tree" to keep the end-user
expectation the same, while teaching "git commit-tree" to ignore
the configuration variable. This will stop requiring the users to
sign commit objects used internally as an implementation detail of
"git stash".
* jc/commit-tree-ignore-commit-gpgsign:
commit-tree: do not pay attention to commit.gpgsign
Add commit.verbose configuration variable as a convenience for those
who always prefer --verbose.
Add tests to check the behavior introduced by this commit and also to
verify that behavior of status doesn't break because of this commit.
Helped-by: Junio C Hamano <gitster@pobox.com>
Helped-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Pranit Bauva <pranit.bauva@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
While at there, improve the error message to say _what_ failed to
remove.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"errno" is already passed in as "err". Here we should use err instead of
errno. errno is probably a copy/paste mistake in e011054 (Teach
git-update-index about gitlinks - 2007-04-12)
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
While at there, improve the message a bit (what operation failed?) and
mark it for translation since the format string is now a sentence.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
All these error() calls do not print error message previously, but
because when they are called, errno should be set. Use error_errno()
instead to give more information.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There's one change, in split_mbox(), where an error() without strerror()
as argument is converted to error_errno(). This is correct because the
previous call is fopen (not shown in the context lines), which should
set errno if it returns NULL.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A couple of newlines are also removed, because both error() and
error_errno() automatically append a newline.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add comment drawing translator attention in order to align "Push
URL:" and "Fetch URL:" fields translation of git remote show output.
Aligning both fields makes the output more appealing and easier to
grasp.
Signed-off-by: Vasco Almeida <vascomalmeida@sapo.pt>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Move from unsigned char[20] to struct object_id continues.
* bc/object-id:
match-trees: convert several leaf functions to use struct object_id
tree-walk: convert tree_entry_extract() to use struct object_id
struct name_entry: use struct object_id instead of unsigned char sha1[20]
match-trees: convert shift_tree() and shift_tree_by() to use object_id
test-match-trees: convert to use struct object_id
sha1-name: introduce a get_oid() function
Many instances of duplicate words (e.g. "the the path") and
a few typoes are fixed, originally in multiple patches.
wildmatch: fix duplicate words of "the"
t: fix duplicate words of "output"
transport-helper: fix duplicate words of "read"
Git.pm: fix duplicate words of "return"
path: fix duplicate words of "look"
pack-protocol.txt: fix duplicate words of "the"
precompose-utf8: fix typo of "sequences"
split-index: fix typo
worktree.c: fix typo
remote-ext: fix typo
utf8: fix duplicate words of "the"
git-cvsserver: fix duplicate words
Signed-off-by: Li Peng <lip@dtdream.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The point of having a whitelist of command-line config
options to pass to submodules was two-fold:
1. It prevented obvious nonsense like using core.worktree
for multiple repos.
2. It could prevent surprise when the user did not mean
for the options to leak to the submodules (e.g.,
http.sslverify=false).
For case 1, the answer is mostly "if it hurts, don't do
that". For case 2, we can note that any such example has a
matching inverted surprise (e.g., a user who meant
http.sslverify=true to apply everywhere, but it didn't).
So this whitelist is probably not giving us any benefit, and
is already creating a hassle as people propose things to put
on it. Let's just drop it entirely.
Note that we still need to keep a special code path for
"prepare the submodule environment", because we still have
to take care to pass through $GIT_CONFIG_PARAMETERS (and
block the rest of the repo-specific environment variables).
We can do this easily from within the submodule shell
script, which lets us drop the submodule--helper option
entirely (and it's OK to do so because as a "--" program, it
is entirely a private implementation detail).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git describe --contains" often made a hard-to-justify choice of
tag to give name to a given commit, because it tried to come up
with a name with smallest number of hops from a tag, causing an old
commit whose close descendant that is recently tagged were not
described with respect to an old tag but with a newer tag. It did
not help that its computation of "hop" count was further tweaked to
penalize being on a side branch of a merge. The logic has been
updated to favor using the tag with the oldest tagger date, which
is a lot easier to explain to the end users: "We describe a commit
in terms of the (chronologically) oldest tag that contains the
commit."
* js/name-rev-use-oldest-ref:
name-rev: include taggerdate in considering the best name
ba3c69a9 (commit: teach --gpg-sign option, 2011-10-05) introduced a
"signed commit" by teaching the --[no]-gpg-sign option and the
commit.gpgsign configuration variable to various commands that
create commits.
Teaching these to "git commit" and "git merge", both of which are
end-user facing Porcelain commands, was perfectly fine. Allowing
the plumbing "git commit-tree" to suddenly change the behaviour to
surprise the scripts by paying attention to commit.gpgsign was not.
Among the in-tree scripts, filter-branch, quiltimport, rebase and
stash are the commands that run "commit-tree". If any of these
wants to allow users to always sign every single commit, they should
offer their own configuration (e.g. "filterBranch.gpgsign") with an
option to disable signing (e.g. "git filter-branch --no-gpgsign").
Ignoring commit.gpgsign option _obviously_ breaks the backward
compatibility, but it is easy to follow the standard pattern in
scripts to honor whatever configuration variable they choose to
follow. E.g.
case $(git config --bool commit.gpgsign) in
true) sign=-S ;;
*) sign= ;;
esac &&
git commit-tree $sign ...whatever other args...
Do so to make sure that "git rebase" keeps paying attention to the
configuration variable, which unfortunately is a documented mistake.
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Reroute the output of stdout to stderr as it is just informative
messages, not to be consumed by machines.
This should not regress any scripts that try to parse the
current output, as the output is already internationalized
and therefore unstable.
We want to init submodules from the helper for `submodule update`
in a later patch and the stdout output of said helper is consumed
by the parts of `submodule update` which are still written in shell.
So we have to be careful which messages are on stdout.
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Unify internal logic between "git tag -v" and "git verify-tag"
commands by making one directly call into the other.
* st/verify-tag:
tag -v: verify directly rather than exec-ing verify-tag
verify-tag: move tag verification code to tag.c
verify-tag: prepare verify_tag for libification
verify-tag: update variable name and type
t7030: test verifying multiple tags
builtin/verify-tag.c: ignore SIGPIPE in gpg-interface