The filter-process interface learned to allow a process with long
latency give a "delayed" response.
* ls/filter-process-delayed:
convert: add "status=delayed" to filter process protocol
convert: refactor capabilities negotiation
convert: move multiple file filter error handling to separate function
convert: put the flags field before the flag itself for consistent style
t0021: write "OUT <size>" only on success
t0021: make debug log file name configurable
t0021: keep filter log files on comparison
Some `clean` / `smudge` filters may require a significant amount of
time to process a single blob (e.g. the Git LFS smudge filter might
perform network requests). During this process the Git checkout
operation is blocked and Git needs to wait until the filter is done to
continue with the checkout.
Teach the filter process protocol, introduced in edcc8581 ("convert: add
filter.<driver>.process option", 2016-10-16), to accept the status
"delayed" as response to a filter request. Upon this response Git
continues with the checkout operation. After the checkout operation Git
calls "finish_delayed_checkout" which queries the filter for remaining
blobs. If the filter is still working on the completion, then the filter
is expected to block. If the filter has completed all remaining blobs
then an empty response is expected.
Git has a multiple code paths that checkout a blob. Support delayed
checkouts only in `clone` (in unpack-trees.c) and `checkout` operations
for now. The optimization is most effective in these code paths as all
files of the tree are processed.
Signed-off-by: Lars Schneider <larsxschneider@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Before this change,
$ echo "* text=auto" >.gitattributes
$ echo "* eol=crlf" >>.gitattributes
would have the same effect as
$ echo "* text" >.gitattributes
$ git config core.eol crlf
Since the 'eol' attribute had higher priority than 'text=auto', this may
corrupt binary files and is not what most users expect to happen.
Make the 'eol' attribute to obey 'text=auto' and now
$ echo "* text=auto" >.gitattributes
$ echo "* eol=crlf" >>.gitattributes
behaves the same as
$ echo "* text=auto" >.gitattributes
$ git config core.eol crlf
In other words,
$ echo "* text=auto eol=crlf" >.gitattributes
has the same effect as
$ git config core.autocrlf true
and
$ echo "* text=auto eol=lf" >.gitattributes
has the same effect as
$ git config core.autocrlf input
Signed-off-by: Torsten Bögershausen <tboegi@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When working in a cross-platform environment, a user may want to
check if text files are stored normalized in the repository and
if .gitattributes are set appropriately.
Make it possible to let Git show the line endings in the index and
in the working tree and the effective text/eol attributes.
The end of line ("eolinfo") are shown like this:
"-text" binary (or with bare CR) file
"none" text file without any EOL
"lf" text file with LF
"crlf" text file with CRLF
"mixed" text file with mixed line endings.
The effective text/eol attribute is one of these:
"", "-text", "text", "text=auto", "text eol=lf", "text eol=crlf"
git ls-files --eol gives an output like this:
i/none w/none attr/text=auto t/t5100/empty
i/-text w/-text attr/-text t/test-binary-2.png
i/lf w/lf attr/text eol=lf t/t5100/rfc2047-info-0007
i/lf w/crlf attr/text eol=crlf doit.bat
i/mixed w/mixed attr/ locale/XX.po
to show what eol convention is used in the data in the index ('i'),
and in the working tree ('w'), and what attribute is in effect,
for each path that is shown.
Add test cases in t0027.
Helped-By: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Torsten Bögershausen <tboegi@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The data is streamed to the filter process anyway. Better avoid mapping
the file if possible. This is especially useful if a clean filter
reduces the size, for example if it computes a sha1 for binary data,
like git media. The file size that the previous implementation could
handle was limited by the available address space; large files for
example could not be handled with (32-bit) msysgit. The new
implementation can filter files of any size as long as the filter output
is small enough.
The new code path is only taken if the filter is required. The filter
consumes data directly from the fd. If it fails, the original data is
not immediately available. The condition can easily be handled as
a fatal error, which is expected for a required filter anyway.
If the filter was not required, the condition would need to be handled
in a different way, like seeking to 0 and reading the data. But this
would require more restructuring of the code and is probably not worth
it. The obvious approach of falling back to reading all data would not
help achieving the main purpose of this patch, which is to handle large
files with limited address space. If reading all data is an option, we
can simply take the old code path right away and mmap the entire file.
The environment variable GIT_MMAP_LIMIT, which has been introduced in
a previous commit is used to test that the expected code path is taken.
A related test that exercises required filters is modified to verify
that the data actually has been modified on its way from the file system
to the object store.
Signed-off-by: Steffen Prohaska <prohaska@zib.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It is only the path that matters in the decision whether to filter
or not. Clarify this by making path the only argument of
would_convert_to_git().
Signed-off-by: Steffen Prohaska <prohaska@zib.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Some callers may want to know whether convert_to_git will
actually do anything before performing the conversion
itself (e.g., to decide whether to stream or handle blobs
in-core). This patch lets callers specify the dry run mode
by passing a NULL destination buffer. The return value,
instead of indicating whether conversion happened, will
indicate whether conversion would occur.
For readability, we also include a wrapper function which
makes it more obvious we are not actually performing the
conversion.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Some filters may need to buffer the input and look-ahead inside it
to decide what to output, and they may consume more than zero bytes
of input and still not produce any output. After feeding all the
input, pass NULL as input as keep calling stream_filter() to let
such filters know there is no more input coming, and it is time for
them to produce the remaining output based on the buffered input.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This introduces an API to plug custom filters to an input stream.
The caller gets get_stream_filter("path") to obtain an appropriate
filter for the path, and then uses it when opening an input stream
via open_istream(). After that, the caller can read from the stream
with read_istream(), and close it with close_istream(), just like an
unfiltered stream.
This only adds a "null" filter that is a pass-thru filter, but later
changes can add LF-to-CRLF and other filters, and the callers of the
streaming API do not have to change.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Before adding the streaming filter API to the conversion layer,
move the existing declarations related to the conversion to its
own header file.
Signed-off-by: Junio C Hamano <gitster@pobox.com>