When upload-pack advertises refs, we attempt to peel tags
and advertise the peeled version. We currently hand-roll the
tag dereferencing, and use as many optimizations as we can
to avoid loading non-tag objects into memory.
Not only has peel_ref recently learned these optimizations,
too, but it also contains an even more important one: it
has access to the "peeled" data from the pack-refs file.
That means we can avoid not only loading annotated tags
entirely, but also avoid doing any kind of object lookup at
all.
This cut the CPU time to advertise refs by 50% in the
linux-2.6 repo, as measured by:
echo 0000 | git-upload-pack . >/dev/null
best-of-five, warm cache, objects and refs fully packed:
[before] [after]
real 0m0.026s real 0m0.013s
user 0m0.024s user 0m0.008s
sys 0m0.000s sys 0m0.000s
Those numbers are irrelevantly small compared to an actual
fetch. Here's a larger repo (400K refs, of which 12K are
unique, and of which only 107 are unique annotated tags):
[before] [after]
real 0m0.704s real 0m0.596s
user 0m0.600s user 0m0.496s
sys 0m0.096s sys 0m0.092s
This shows only a 15% speedup (mostly because it has fewer
actual tags to parse), but a larger absolute value (100ms,
which isn't a lot compared to a real fetch, but this
advertisement happens on every fetch, even if the client is
just finding out they are completely up to date).
In truly pathological cases, where you have a large number
of unique annotated tags, it can make an even bigger
difference. Here are the numbers for a linux-2.6 repository
that has had every seventh commit tagged (so about 50K
tags):
[before] [after]
real 0m0.443s real 0m0.097s
user 0m0.416s user 0m0.080s
sys 0m0.024s sys 0m0.012s
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The point of peel_ref is to dereference tags; if the base
object is not a tag, then we can return early without even
loading the object into memory.
This patch accomplishes that by checking sha1_object_info
for the type. For a packed object, we can get away with just
looking in the pack index. For a loose object, we only need
to inflate the first couple of header bytes.
This is a bit of a gamble; if we do find a tag object, then
we will end up loading the content anyway, and the extra
lookup will have been wasteful. However, if it is not a tag
object, then we save loading the object entirely. Depending
on the ratio of non-tags to tags in the input, this can be a
minor win or minor loss.
However, it does give us one potential major win: if a ref
points to a large blob (e.g., via an unannotated tag), then
we can avoid looking at it entirely.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The idea of the peel_ref function is to dereference tag
objects recursively until we hit a non-tag, and return the
sha1. Conceptually, it should return 0 if it is successful
(and fill in the sha1), or -1 if there was nothing to peel.
However, the current behavior is much more confusing. For a
regular loose ref, the behavior is as described above. But
there is an optimization to reuse the peeled-ref value for a
ref that came from a packed-refs file. If we have such a
ref, we return its peeled value, even if that peeled value
is null (indicating that we know the ref definitely does
_not_ peel).
It might seem like such information is useful to the caller,
who would then know not to bother loading and trying to peel
the object. Except that they should not bother loading and
trying to peel the object _anyway_, because that fallback is
already handled by peel_ref. In other words, the whole point
of calling this function is that it handles those details
internally, and you either get a sha1, or you know that it
is not peel-able.
This patch catches the null sha1 case internally and
converts it into a -1 return value (i.e., there is nothing
to peel). This simplifies callers, which do not need to
bother checking themselves.
Two callers are worth noting:
- in pack-objects, a comment indicates that there is a
difference between non-peelable tags and unannotated
tags. But that is not the case (before or after this
patch). Whether you get a null sha1 has to do with
internal details of how peel_ref operated.
- in show-ref, if peel_ref returns a failure, the caller
tries to decide whether to try peeling manually based on
whether the REF_ISPACKED flag is set. But this doesn't
make any sense. If the flag is set, that does not
necessarily mean the ref came from a packed-refs file
with the "peeled" extension. But it doesn't matter,
because even if it didn't, there's no point in trying to
peel it ourselves, as peel_ref would already have done
so. In other words, the fallback peeling is guaranteed
to fail.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When we are asked to peel a ref to a sha1, we internally call
deref_tag, which will recursively parse each tagged object
until we reach a non-tag. This has the benefit that we will
verify our ability to load and parse the pointed-to object.
However, there is a performance downside: we may not need to
load that object at all (e.g., if we are listing peeled
simply listing peeled refs), or it may be a large object
that should follow a streaming code path (e.g., an annotated
tag of a large blob).
It makes more sense for peel_ref to choose the fast thing
rather than performing the extra check, for two reasons:
1. We will already sometimes short-circuit the tag parsing
in favor of a peeled entry from a packed-refs file. So
we are already favoring speed in some cases, and it is
not wise for a caller to rely on peel_ref to detect
corruption.
2. We already silently ignore much larger corruptions,
like a ref that points to a non-existent object, or a
tag object that exists but is corrupted.
2. peel_ref is not the right place to check for such a
database corruption. It is returning only the sha1
anyway, not the actual object. Any callers which use
that sha1 to load an object will soon discover the
corruption anyway, so we are really just pushing back
the discovery to later in the program.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Describe the behaviour, but do warn people against taking it too
literally and expect an abbreviation valid today will stay valid
forever.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When refactoring the merge-base computation to reduce the pairwise
O(n*(n-1)) traversals to parallel O(n) traversals, the code forgot
that timestamp based heuristics needs each commit to have been
parsed. This caused an empty "git pull" to spend cycles, traversing
the history all the way down to 0 (because an unparsed commit object
has 0 timestamp, and any other commit object with positive timestamp
will be processed for its parents, all getting parsed), only to come
up with a merge message to be used.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Teach the commands from the "log" family the "--grep-reflog" option
to limit output by string that appears in the reflog entry when the
"--walk-reflogs" option is in effect.
* nd/grep-reflog:
revision: make --grep search in notes too if shown
log --grep-reflog: reject the option without -g
revision: add --grep-reflog to filter commits by reflog messages
grep: prepare for new header field filter
A patch attached as application/octet-stream (e.g. not text/*) were
mishandled, not correctly honoring Content-Transfer-Encoding
(e.g. base64).
* lt/mailinfo-handle-attachment-more-sanely:
mailinfo: don't require "text" mime type for attachments
"gc --auto" notified the user that auto-packing has triggered even
under the "--quiet" option.
* tu/gc-auto-quiet:
silence git gc --auto --quiet output
When a tag T points at an object X that is of a type that is
different from what the tag records as, fsck should report it as an
error.
However, depending on the order X and T are checked individually,
the actual error message can be different. If X is checked first,
fsck remembers X's type and then when it checks T, it notices that T
records X as a wrong type (i.e. the complaint is about a broken tag
T). If T is checked first, on the other hand, fsck remembers that we
need to verify X is of the type tag records, and when it later
checks X, it notices that X is of a wrong type (i.e. the complaint
is about a broken object X).
The important thing is that fsck notices such an error and diagnoses
the issue on object X, but the test was expecting that we happen to
check objects in the order to make us detect issues with tag T, not
with object X. Remove this unwarranted assumption.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git submodule frotz" was not diagnosed as "frotz" being an unknown
subcommand to "git submodule"; the user instead got a complaint that
"git submodule status" was run with an unknown path "frotz".
* rr/maint-submodule-unknown-cmd:
submodule: if $command was not matched, don't parse other args
"git fetch" over http advertised that it supports "deflate", which
is much less common, and did not advertise more common "gzip" on its
Accept-Encoding header.
* sp/maint-http-enable-gzip:
Enable info/refs gzip decompression in HTTP client
"git fetch" over http had an old workaround for an unlikely server
misconfiguration; it turns out that this hurts debuggability of the
configuration in general, and has been reverted.
* sp/maint-http-info-refs-no-retry:
Revert "retry request without query when info/refs?query fails"
It already is listed in the "git config" documentation, but people
interested in pushing would first look at "git push" documentation.
Noticed-by: David Glasser
Signed-off-by: Ramkumar Ramachandra <artagnon@gmail.com>
Acked-by: Matthieu Moy <Matthieu.Moy@grenoble-inp.fr>
Fixed-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The 'a', 'i' and 'c' commands take a literal text to be added
followed by backslash, but then in the source we cannot indent
the literal text which makes it ugly.
We need to also remember to double the backslash inside double
quotes.
Avoid these issues altogether by having an extra line in a template
file and generate test vectors by deleting the line or replacing the
line and not using the 'a' command.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The actual external command to run for mergetool backend can be
specified with difftool/mergetool.$name.cmd configuration
variables, but this mechanism was ignored for the backends we
natively support.
* da/mergetool-custom:
mergetool--lib: Allow custom commands to override built-ins
"git status" honored the ignore=dirty settings in .gitmodules but
"git commit" didn't.
* os/commit-submodule-ignore:
commit: pay attention to submodule.$name.ignore in .gitmodules
Clarify the "blame" documentation to tell the users that there is
no need to ask for "--follow".
* jc/blame-follows-renames:
git blame: document that it always follows origin across whole-file renames
Send errors from "unpack-objects" and "index-pack" back to the "git
push" over the git and smart-http protocols, just like it is done
for a push over the ssh protocol.
* jk/receive-pack-unpack-error-to-pusher:
receive-pack: drop "n/a" on unpacker errors
receive-pack: send pack-processing stderr over sideband
receive-pack: redirect unpack-objects stdout to /dev/null
Running "git fetch" in a repository made with "git clone --single"
slurps all the branches, defeating the point of "--single".
* rt/maint-clone-single:
clone --single: limit the fetch refspec to fetched branch
Previously while reading the variable names in config files, there
was a 256 character limit with at most 128 of those characters being
used by the section header portion of the variable name. This
limitation was only enforced while reading the config files. It was
possible to write a config file that was not subsequently readable.
Instead of enforcing this limitation for both reading and writing,
remove it entirely by changing the var member of the config_file
struct to a strbuf instead of a fixed length buffer. Update all of
the parsing functions in config.c to use the strbuf instead of the
static buffer.
The parsing functions that returned the base length of the variable
name now return simply 0 for success and -1 for failure. The base
length information is obtained through the strbuf's len member.
We now send the buf member of the strbuf to external callback
functions to preserve the external api. None of the external
callers rely on the old size limitation for sizing their own buffers
so removing the limit should have no externally visible effect.
Signed-off-by: Ben Walton <bdwalton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Introduce a configuration variable diff.context that tells
Porcelain commands to use a non-default number of context
lines instead of 3 (the default). With this variable, users
do not have to keep repeating "git log -U8" from the command
line; instead, it becomes sufficient to say "git config
diff.context 8" just once.
Signed-off-by: Jeff Muizelaar <jmuizelaar@mozilla.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Currently "git am" does insane things if the mbox it is given contains
attachments with a MIME type that aren't "text/*".
In particular, it will still decode them, and pass them "one line at a
time" to the mail body filter, but because it has determined that they
aren't text (without actually looking at the contents, just at the mime
type) the "line" will be the encoding line (eg 'base64') rather than a
line of *content*.
Which then will cause the text filtering to fail, because we won't
correctly notice when the attachment text switches from the commit message
to the actual patch. Resulting in a patch failure, even if patch may be a
perfectly well-formed attachment, it's just that the message type may be
(for example) "application/octet-stream" instead of "text/plain".
Just remove all the bogus games with the message_type. The only difference
that code creates is how the data is passed to the filter function
(chunked per-pred-code line or per post-decode line), and that difference
is *wrong*, since chunking things per pre-decode line can never be a
sensible operation, and cannot possibly matter for binary data anyway.
This code goes all the way back to March of 2007, in commit 87ab799234
("builtin-mailinfo.c infrastrcture changes"), and apparently Don used to
pass random mbox contents to git. However, the pre-decode vs post-decode
logic really shouldn't matter even for that case, and more importantly, "I
fed git am crap" is not a valid reason to break *real* patch attachments.
If somebody really cares, and determines that some attachment is binary
data (by looking at the data, not the MIME-type), the whole attachment
should be dismissed, rather than fed in random-sized chunks to
"handle_filter()".
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Don Zickus <dzickus@redhat.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When adding a new submodule it can happen that .git/modules/<name> already
contains a submodule repo, e.g. when a submodule is removed from the work
tree and another submodule is added at the same path. But then the work
tree of the submodule will be populated using the existing repository and
not the one the user provided, which results in an incorrect work tree. On
the other hand the user might reactivate a submodule removed earlier, then
reusing that .git directory is the Right Thing to do.
As git can't decide what is the case, error out and tell the user she
should use either use a different name for the submodule with the "--name"
option or can reuse the .git directory for the newly added submodule by
providing the --force option (which only makes sense when the upstream
matches, so the error message lists all remotes of .git/modules/<name>).
In one test in t7406 the --force option had to be added to "git submodule
add", as that test re-adds a formerly removed submodule.
Reported-by: Jonathan Johnson <me@jondavidjohn.com>
Signed-off-by: Jens Lehmann <Jens.Lehmann@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Update German and Simplified Chinese translations.
* 'maint' of git://github.com/git-l10n/git-po:
l10n: de.po: correct translation of a 'rebase' message
l10n: Improve many translation for zh_CN
l10n: Unify the translation for '(un)expected'
* jc/maint-log-grep-all-match-1:
grep.c: make two symbols really file-scope static this time
t7810-grep: test --all-match with multiple --grep and --author options
t7810-grep: test interaction of multiple --grep and --author options
t7810-grep: test multiple --author with --all-match
t7810-grep: test multiple --grep with and without --all-match
t7810-grep: bring log --grep tests in common form
grep.c: mark private file-scope symbols as static
log: document use of multiple commit limiting options
log --grep/--author: honor --all-match honored for multiple --grep patterns
grep: show --debug output only once
grep: teach --debug option to dump the parse tree
With another reroll, it looks like the series is as polished as it
could be.
* rs/archive-zip-utf8:
archive-zip: write extended timestamp
archive-zip: support UTF-8 paths
Revert "archive-zip: support UTF-8 paths"
archive-zip: support UTF-8 paths
Allows users to turn off smart-http when talking to dumb-only
servers.
* jk/smart-http-switch:
remote-curl: let users turn off smart http
remote-curl: rename is_http variable
Kills an old workaround for a unlikely server misconfiguration that
hurts debuggability.
* sp/maint-http-info-refs-no-retry:
Revert "retry request without query when info/refs?query fails"
Teach an option to edit the insn sheet to "git rebase -i".
* aw/rebase-i-edit-todo:
rebase -i: suggest using --edit-todo to fix an unknown instruction
rebase -i: Add tests for "--edit-todo"
rebase -i: Teach "--edit-todo" action
rebase -i: Refactor help messages for todo file
rebase usage: subcommands can not be combined with -i
"git submodule add" initializes the name of a submodule to its path. This
was ok as long as the .git directory lived inside the submodule's work
tree, but since 1.7.8 it is stored in the .git/modules/<name> directory of
the superproject, making the submodule name survive the removal of the
submodule's work tree. This leads to problems when the user tries to add a
different submodule at the same path - and thus the same name - later, as
that will happily try to restore the submodule from the old repository
instead of the one the user specified and will lead to a checkout of the
wrong repository.
Add the new "--name" option to let the user provide a name for the
submodule. This enables the user to solve this conflict without having to
remove .git/modules/<name> by hand (which is no viable solution as it
makes it impossible to checkout a commit that records the old submodule
and populate it, as that will still check out the new submodule for the
same reason).
To achieve that the submodule's name is added to the parameter list of
the module_clone() helper function. This makes it possible to remove the
call of module_name() there because both callers of module_clone() already
know the name and can provide it as argument number two.
Reported-by: Jonathan Johnson <me@jondavidjohn.com>
Signed-off-by: Jens Lehmann <Jens.Lehmann@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>