Git's clean/smudge mechanism invokes an external filter process for
every single blob that is affected by a filter. If Git filters a lot of
blobs then the startup time of the external filter processes can become
a significant part of the overall Git execution time.
In a preliminary performance test this developer used a clean/smudge
filter written in golang to filter 12,000 files. This process took 364s
with the existing filter mechanism and 5s with the new mechanism. See
details here: https://github.com/github/git-lfs/pull/1382
This patch adds the `filter.<driver>.process` string option which, if
used, keeps the external filter process running and processes all blobs
with the packet format (pkt-line) based protocol over standard input and
standard output. The full protocol is explained in detail in
`Documentation/gitattributes.txt`.
A few key decisions:
* The long running filter process is referred to as filter protocol
version 2 because the existing single shot filter invocation is
considered version 1.
* Git sends a welcome message and expects a response right after the
external filter process has started. This ensures that Git will not
hang if a version 1 filter is incorrectly used with the
filter.<driver>.process option for version 2 filters. In addition,
Git can detect this kind of error and warn the user.
* The status of a filter operation (e.g. "success" or "error") is set
before the actual response and (if necessary!) re-set after the
response. The advantage of this two step status response is that if
the filter detects an error early, then the filter can communicate
this and Git does not even need to create structures to read the
response.
* All status responses are pkt-line lists terminated with a flush
packet. This allows us to send other status fields with the same
protocol in the future.
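The two step status response can be sketched roughly as follows. This is an illustration only, not code from this patch: the helper names are hypothetical, and the `key=value` form of each status line is an assumption. An empty second list is taken to mean "keep the earlier status".

```python
def parse_status_list(lines):
    """Turn a pkt-line list such as ["status=success"] into a dict.

    The key=value layout is an assumption made for this sketch.
    """
    fields = {}
    for line in lines:
        key, sep, value = line.partition("=")
        if sep:  # skip anything that is not key=value
            fields[key] = value
    return fields


def final_status(before_content, after_content):
    """Combine the status sent before the response with the one sent after.

    The second list may re-set the status; if it carries no status
    field, the status sent before the response stands.
    """
    status = parse_status_list(before_content).get("status")
    updated = parse_status_list(after_content).get("status")
    return updated if updated is not None else status
```

A filter that fails halfway through can thus report "success" first and re-set the status to "error" afterwards, while a filter that fails early reports "error" right away and the response body never has to be read.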
Helped-by: Martin-Louis Bright <mlbright@gmail.com>
Reviewed-by: Jakub Narebski <jnareb@gmail.com>
Signed-off-by: Lars Schneider <larsxschneider@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Refactor the existing 'single shot filter mechanism' and prepare the
new 'long running filter mechanism'.
Signed-off-by: Lars Schneider <larsxschneider@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
apply_filter() returns a boolean that tells the caller if it
"did convert or did not convert". The variable `ret` was used throughout
the function to track errors, with `1` denoting success and `0`
failure. This is unusual for the Git source where `0` denotes success.
Rename the variable and flip its value to make the function easier
to read for Git developers.
Signed-off-by: Lars Schneider <larsxschneider@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
write_packetized_from_fd() and write_packetized_from_buf() write a
stream of packets. All content packets use the maximal packet size
except for the last one. After the last content packet a `flush` control
packet is written.
read_packetized_to_strbuf() reads arbitrary sized packets until it
detects a `flush` packet.
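The behaviour described above can be modelled in a few lines of Python. This is a sketch under assumptions, not a port of the C functions: the function names are hypothetical, each packet is assumed to start with a 4 byte hex length that counts the header itself, the flush packet is assumed to be "0000", and the maximal payload is assumed to be 65516 bytes.

```python
import io

# Assumed limits: a 65520 byte packet minus the 4 byte length header.
LARGE_PACKET_DATA_MAX = 65516
FLUSH_PKT = b"0000"


def write_packetized(data, out):
    """Write data as maximal-size packets, then a flush packet."""
    for start in range(0, len(data), LARGE_PACKET_DATA_MAX):
        chunk = data[start:start + LARGE_PACKET_DATA_MAX]
        # The length header counts its own 4 bytes.
        out.write(b"%04x" % (len(chunk) + 4))
        out.write(chunk)
    out.write(FLUSH_PKT)


def read_packetized(inp):
    """Read packets of any size until a flush packet is seen."""
    content = bytearray()
    while True:
        header = inp.read(4)
        if len(header) != 4:
            raise EOFError("stream ended before a flush packet")
        length = int(header, 16)
        if length == 0:  # flush packet terminates the stream
            return bytes(content)
        if length < 4:
            raise ValueError("invalid packet length %d" % length)
        payload = inp.read(length - 4)
        if len(payload) != length - 4:
            raise EOFError("truncated packet")
        content += payload
```

Only the last content packet may be shorter than the maximum, so a reader that accepts arbitrary sizes can always consume what the writer produces.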
Signed-off-by: Lars Schneider <larsxschneider@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
packet_write_fmt_gently() uses format_packet() which lets the caller
only send string data via "%s". That means it cannot be used for
arbitrary data that may contain NULs.
Add packet_write_gently() which writes arbitrary data and does not die
in case of an error. The function is used by other pkt-line functions in
a subsequent patch.
Signed-off-by: Lars Schneider <larsxschneider@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
packet_flush() would die in case of a write error even though for some
callers an error would be acceptable. Add packet_flush_gently() which
writes a pkt-line flush packet like packet_flush() but does not die in
case of an error. The function is used in a subsequent patch.
Signed-off-by: Lars Schneider <larsxschneider@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
packet_write_fmt() would die in case of a write error even though for
some callers an error would be acceptable. Add packet_write_fmt_gently()
which writes a formatted pkt-line like packet_write_fmt() but does not
die in case of an error. The function is used in a subsequent patch.
Signed-off-by: Lars Schneider <larsxschneider@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Extract the set_packet_header() function, which converts an integer to
a 4 byte hex string. Make this function locally available so that other
pkt-line functions can use it.
Signed-off-by: Lars Schneider <larsxschneider@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
packet_write() should be called packet_write_fmt() because it is a
printf-like function that takes a format string as first parameter.
packet_write_fmt() should be used for text strings only. Arbitrary
binary data should use a new packet_write() function that is introduced
in a subsequent patch.
Suggested-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Lars Schneider <larsxschneider@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Some processes might want to perform cleanup tasks before Git kills them
due to the 'clean_on_exit' flag. Let's give them an interface for doing
this. The feature is used in a subsequent patch.
Please note that the cleanup callback is not executed if Git dies of a
signal. The reason is that only "async-signal-safe" functions would be
allowed to be called in that case. Since we cannot control what functions
the callback will use, we will not support the case. See 507d7804 for
more details.
Helped-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Lars Schneider <larsxschneider@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Move check_pipe() to run_command and make it public. This is necessary
to call the function from pkt-line in a subsequent patch.
While at it, make async_exit() static to run_command.c as it is no
longer used from outside.
Signed-off-by: Lars Schneider <larsxschneider@gmail.com>
Signed-off-by: Ramsay Jones <ramsay@ramsayjones.plus.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Use `test_config` to set the config, check that files are empty with
`test_must_be_empty`, compare files with `test_cmp`, and remove spaces
after ">" and "<".
Please note that the "rot13" filter configured in "setup" keeps using
`git config` instead of `test_config` because subsequent tests might
depend on it.
Reviewed-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Lars Schneider <larsxschneider@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Git filter driver commands with spaces (e.g. `filter.sh foo`) are hard
to read in error messages. Quote them to improve the readability.
Signed-off-by: Lars Schneider <larsxschneider@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Change the log formatting function to know about "git describe" output
such as "v2.8.0-4-g867ad08", in addition to just plain "867ad08".
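As an illustration of the idea, a pattern along these lines accepts both forms. It is a hypothetical sketch, not the regex gitweb uses: the allowed tag characters and the `<tag>-<count>-g<hash>` layout are assumptions.

```python
import re

# Hypothetical pattern: an optional "<tag>-<count>-g" prefix followed
# by 7 to 40 hex digits. Not the expression used in gitweb itself.
DESCRIBE_OR_HASH = re.compile(
    r"\b(?:[A-Za-z0-9._-]+-[0-9]+-g)?[0-9a-fA-F]{7,40}\b"
)


def find_candidates(message):
    """Return substrings of a commit message that look like revisions."""
    return [m.group(0) for m in DESCRIBE_OR_HASH.finditer(message)]
```

A purely textual match like this cannot tell whether the object exists, so a word such as "deadbeef" is picked up as well.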
There are still many valid refnames that we don't link to,
e.g. v2.10.0-rc1~2^2~1 is also a valid way to refer to
v2.8.0-4-g867ad08, but I'm not supporting that with this commit.
Similarly, it's trivially possible to create refnames like
"æ/var-gf6727b0" which won't be picked up by this regex.
There's surely room for improvement here, but I just wanted to address
the very common case of sticking "git describe" output into commit
messages without trying to link to all possible refnames. That's going
to be a rather futile exercise given that this is free text, and it
would be prohibitively expensive to look up whether the references in
question exist in our repository.
There was on-list discussion about how we could do better than this
patch. Junio suggested to update parse_commits() to call a new
"gitweb--helper" command which would pass each of the revision
candidates through "rev-parse --verify --quiet". That would cut down
on our false positives (e.g. we'll link to "deadbeef"), and also allow
us to be more aggressive in selecting candidate revisions.
That may be too expensive to work in practice, or it may
not. Investigating that would be a good follow-up to this patch.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Acked-by: Jakub Narębski <jnareb@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Change the minimum length of an abbreviated object identifier in the
commit message gitweb tries to turn into a link from 8 hexchars to 7.
This arbitrary minimum length of 8 was introduced in bfe2191 ("gitweb:
SHA-1 in commit log message links to "object" view", 2006-12-10), but
the default abbreviation length is 7, and has been for a long time.
It's still possible to reference SHA-1s down to 4 characters in length,
see v1.7.4-1-gdce9648's MINIMUM_ABBREV, but I can't see how to make
git actually produce that, so I doubt anyone is putting that into log
messages in practice, but people definitely do put 7 character SHA-1s
into log messages.
I think it's fairly dubious to link to things matching [0-9a-fA-F]
here as opposed to just [0-9a-f], that dates back to the initial
version of gitweb from 161332a ("first working version",
2005-08-07). Git will accept all-caps SHA-1s, but didn't ever produce
them as far as I can tell.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Acked-by: Jakub Narębski <jnareb@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Change a typo'd MIME type in a comment. The Content-Type is
application/xhtml+xml, not application/xhtm+xml.
Fixes up code originally added in 53c4031 ("gitweb: Strip
non-printable characters from syntax highlighter output", 2011-09-16).
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Acked-by: Jakub Narębski <jnareb@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Change "const char *" to "char *" in struct trailer_item and in the
return value of apply_command (since those strings are owned strings).
Change "struct conf_info *" to "const struct conf_info *" (since that
struct is not modified).
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When we auto-follow tags in a fetch, we look at all of the
tags advertised by the remote and fetch ones where we don't
already have the tag, but we do have the object it peels to.
This involves a lot of calls to has_sha1_file(), some of
which we can reasonably expect to fail. Since 45e8a74
(has_sha1_file: re-check pack directory before giving up,
2013-08-30), this may cause many calls to
reprepare_packed_git(), which is potentially expensive.
This has gone unnoticed for several years because it
requires a fairly unique setup to matter:
1. You need to have a lot of packs on the client side to
make reprepare_packed_git() expensive (the most
expensive part is finding duplicates in an unsorted
list, which is currently quadratic).
2. You need a large number of tag refs on the server side
that are candidates for auto-following (i.e., that the
client doesn't have). Each one triggers a re-read of
the pack directory.
3. Under normal circumstances, the client would
auto-follow those tags and after one large fetch, (2)
would no longer be true. But if those tags point to
history which is disconnected from what the client
otherwise fetches, then it will never auto-follow, and
those candidates will impact it on every fetch.
So when all three are true, each fetch pays an extra
O(nr_tags * nr_packs^2) cost, mostly in string comparisons
on the pack names. This was exacerbated by 47bf4b0
(prepare_packed_git_one: refactor duplicate-pack check,
2014-06-30) which uses a slightly more expensive string
check, under the assumption that the duplicate check doesn't
happen very often (and it shouldn't; the real problem here
is how often we are calling reprepare_packed_git()).
This patch teaches fetch to use HAS_SHA1_QUICK to sacrifice
accuracy for speed, in cases where we might be racy with a
simultaneous repack. This is similar to the fix in 0eeb077
(index-pack: avoid excessive re-reading of pack directory,
2015-06-09). As with that case, it's OK for has_sha1_file() to
occasionally say "no I don't have it" when we do, because
the worst case is not a corruption, but simply that we may
fail to auto-follow a tag that points to it.
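The trade-off can be shown with a toy model. Everything here is hypothetical (the class, the method names and the `quick` flag stand in for the real C code); it only counts how often a miss would force a re-read of the pack directory.

```python
class PackStore:
    """Toy model of an object store backed by a pack directory."""

    def __init__(self, objects):
        self.objects = set(objects)
        self.rescans = 0  # how often the pack directory was re-read

    def has_object(self, oid, quick=False):
        if oid in self.objects:
            return True
        if quick:
            # Accept a possibly wrong "no" instead of re-reading packs.
            return False
        # Stands in for the expensive re-read before giving up.
        self.rescans += 1
        return oid in self.objects


def count_rescans(store, candidate_oids, quick):
    """Look up every candidate and report the number of re-reads."""
    for oid in candidate_oids:
        store.has_object(oid, quick=quick)
    return store.rescans
```

With 1000 tag candidates that the client does not have, the slow path re-reads the pack directory 1000 times, while the quick path never does.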
Here are results from the included perf script, which sets
up a situation similar to the one described above:
Test            HEAD^               HEAD
----------------------------------------------------------
5550.4: fetch   11.21(10.42+0.78)   0.08(0.04+0.02) -99.3%
Reported-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Mark error messages for translation passed to error() and die()
functions.
Signed-off-by: Vasco Almeida <vascomalmeida@sapo.pt>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Mark messages for translation printed to stderr.
Signed-off-by: Vasco Almeida <vascomalmeida@sapo.pt>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The test introduced in this commit succeeds without the patch to Git.pm
if Mail::Address is installed, but fails otherwise because our in-house
parser does not accept any text after the email address. It succeeds
both with and without Mail::Address after this commit.
Mail::Address accepts extra text and considers it as part of the name,
iff the address is surrounded with <...>. The implementation mimics
this behavior as closely as possible.
This mostly restores the behavior we had before b1c8a11 (send-email:
allow multiple emails using --cc, --to and --bcc, 2015-06-30), but we
keep the possibility to handle comma-separated lists.
Reported-by: Larry Finger <Larry.Finger@lwfinger.net>
Signed-off-by: Matthieu Moy <Matthieu.Moy@imag.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In bare repositories, get_worktrees() still returns the main repository,
so git worktree list can show it. Ignore it in find_shared_symref so we
can still check out the main branch.
Signed-off-by: Dennis Kaarsemaker <dennis@kaarsemaker.net>
Acked-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
git-svn internals were previously not aware of repository
layout differences for users of the "git worktree" command.
Introduce this awareness by using "git rev-parse --git-path"
instead of relying on outdated uses of GIT_DIR and friends.
Thanks-to: Duy Nguyen <pclouds@gmail.com>
Reported-by: Mathieu Arnold <mat@freebsd.org>
Signed-off-by: Eric Wong <e@80x24.org>
Reducing the scope of where we change the record separator ($/)
avoids bugs in calls which rely on the input record separator
further down, such as the 'chomp' usage in command_oneline.
This is necessary for a future change to git-svn, but exists in
Git.pm since it seems useful for gitweb and our other Perl
scripts, too.
Signed-off-by: Eric Wong <e@80x24.org>
The --fork-point option looks in the reflog to try to find
where a derived branch forked from a base branch. However,
if the reflog for the base branch is totally empty (as it
commonly is right after cloning, which does not write a
reflog entry), then our for_each_reflog call will not find
any entries, and we will come up with no merge base, even
though there may be one with the current tip of the base.
We can fix this by just adding the current tip to
our list of collected entries.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
d64ea0f83b ("git-compat-util: add xstrdup_or_null helper",
2015-01-12) added a handy wrapper that allows us to get a duplicate
of a string or NULL if the original is NULL, but a handful of
codepaths predate its introduction or just weren't aware of it.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
According to gpg2's doc/DETAILS:
    For each signature only one of the codes GOODSIG, BADSIG,
    EXPSIG, EXPKEYSIG, REVKEYSIG or ERRSIG will be emitted.
gpg1 ("classic") behaves the same (although doc/DETAILS differs).
Currently, we parse gpg's status output for GOODSIG, BADSIG and
trust information and translate that into status codes G, B, U, N
for the %G? format specifier.
git-verify-* returns success in the GOODSIG case only. This is
somewhat in disagreement with gpg, which considers the first 5 of
the 6 above as VALIDSIG, but we err on the very safe side.
Introduce additional status codes E, X, Y, R for ERRSIG, EXPSIG,
EXPKEYSIG, and REVKEYSIG so that a user of %G? gets more information
about the absence of a 'G' on first glance.
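The mapping can be sketched as below. This is an illustration, not the parser in Git: the function name is hypothetical, the "[GNUPG:] " line prefix is an assumption about gpg's status output, returning N when no keyword is found is an assumption, and the trust-based U code is left out.

```python
# Status keywords and the %G? letters named in this message.
STATUS_TO_CODE = {
    "GOODSIG": "G",
    "BADSIG": "B",
    "ERRSIG": "E",
    "EXPSIG": "X",
    "EXPKEYSIG": "Y",
    "REVKEYSIG": "R",
}


def signature_code(status_output):
    """Return the letter for the first signature keyword found.

    Falls back to "N" (assumed here to mean "no signature") when
    none of the keywords appears in the status output.
    """
    for line in status_output.splitlines():
        if not line.startswith("[GNUPG:] "):
            continue
        parts = line.split()
        keyword = parts[1] if len(parts) > 1 else ""
        if keyword in STATUS_TO_CODE:
            return STATUS_TO_CODE[keyword]
    return "N"
```

Since only one of the six keywords is emitted per signature, returning on the first hit is enough for this sketch.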
Requested-by: Alex <agrambot@gmail.com>
Signed-off-by: Michael J Gruber <git@drmicha.warpmail.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The concerned message was marked for translation by 0c99171
("get_short_sha1: mark ambiguity error for translation", 2016-09-26).
Signed-off-by: Vasco Almeida <vascomalmeida@sapo.pt>
Acked-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Like a lot of old commit-traversal code, this keeps a
commit_list in commit-date order, and inserts parents
into the list. This means each insertion is potentially
linear, and the whole thing is quadratic (though the exact
runtime depends on the relationship between the commit dates
and the parent topology).
These days we have a priority queue, which can do the same
thing with a much better worst-case time.
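The difference is easy to see in a small model. The sketch below is hypothetical (it is not the C code touched here): it walks a commit graph newest-first using a binary heap, so each insertion costs O(log n) instead of a linear scan of a sorted list.

```python
import heapq


def walk_by_date(tips, parents, dates):
    """Visit commits newest-first, starting from the given tips.

    parents maps a commit to its parent list and dates maps a commit
    to a timestamp; both are stand-ins for real commit objects.
    """
    # heapq is a min-heap, so negate dates to pop the newest first.
    heap = [(-dates[c], c) for c in set(tips)]
    heapq.heapify(heap)
    seen = set(tips)
    order = []
    while heap:
        _, commit = heapq.heappop(heap)
        order.append(commit)
        for parent in parents.get(commit, ()):
            if parent not in seen:
                seen.add(parent)
                heapq.heappush(heap, (-dates[parent], parent))
    return order
```

The visiting order matches what a date-sorted list would give; only the cost of keeping the queue ordered changes.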
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A low-level function verify_packfile() was meant to show errors
that were detected without dying itself, but under some conditions
it didn't and died instead, which has been fixed.
* jk/verify-packfile-gently:
verify_packfile: check pack validity before accessing data
"git worktree", even though it used the default_abbrev setting that
ought to be affected by core.abbrev configuration variable, ignored
the variable setting. The command has been taught to read the
default set of configuration variables to correct this.
* jc/worktree-config:
worktree: honor configuration variables
Codepaths that read from an on-disk loose object were too loose in
validating that what they are reading is a proper object file and
sometimes read past the data they read from the disk, which has
been corrected. H/t to Gustavo Grieco for reporting.
* jc/verify-loose-object-header:
unpack_sha1_header(): detect malformed object header
streaming: make sure to notice corrupt object
The original command line syntax for "git merge", which was "git
merge <msg> HEAD <parent>...", has been deprecated for quite some
time, and "git gui" was the last in-tree user of the syntax. This
is finally fixed, so that we can move forward with the deprecation.
* rs/git-gui-use-modern-git-merge-syntax:
git-gui: stop using deprecated merge syntax
An author name that spelled a backslash-quoted double quote in the
human readable part, e.g. "My \"double quoted\" name", was not unquoted
correctly while applying a patch from a piece of e-mail.
* kd/mailinfo-quoted-string:
mailinfo: unescape quoted-pair in header fields
t5100-mailinfo: replace common path prefix with variable
Almost everybody uses DEFAULT_ABBREV to refer to the default
setting for the abbreviation, but "git blame" peeked into the
underlying variable, bypassing the macro for no good reason.
* jc/blame-abbrev:
blame: use DEFAULT_ABBREV macro
The "graph" API used in "git log --graph" miscounted the number of
output columns consumed so far when drawing a padding line, which
has been fixed; this did not affect any existing code as nobody
tried to write anything after the padding on such a line, though.
* jk/graph-padding-fix:
graph: fix extra spaces in graph_padding_line
The code that parses the format parameter of for-each-ref command
has seen a micro-optimization.
* sg/ref-filter-parse-optim:
ref-filter: strip format option after a field name only once while parsing
This is based on the existing gnome-keyring helper, but instead of
libgnome-keyring (which was specific to GNOME and is deprecated), it
uses libsecret which can support other implementations of XDG Secret
Service API.
Passes t0303-credential-external.sh.
Signed-off-by: Mantas Mikulėnas <grawity@gmail.com>
Reviewed-by: Dennis Kaarsemaker <dennis@kaarsemaker.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The <tree-ish> parameter is actually optional (see man page).
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Reviewed-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Teach mergetool to pass "-O<orderfile>" down to `git diff` when
specified on the command-line.
Helped-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: David Aguilar <davvid@gmail.com>
Reviewed-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Teach mergetool to get the list of files to edit via `diff` so that we
gain support for diff.orderFile.
Suggested-by: Luis Gutierrez <luisgutz@gmail.com>
Helped-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: David Aguilar <davvid@gmail.com>
Reviewed-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>