The copyfrom_source instruction appends data from the preimage buffer
to the end of output. Its arguments are a length and an offset
relative to the beginning of the source view.
With this change, the delta applier is able to reproduce all 5,636,613
blobs in the early history of the ASF repository. Tested with
mkfifo backflow
svn-fe <svn-asf-public-r0:940166 3<backflow |
git fast-import --cat-blob-fd=3 3>backflow
with svn-asf-public-r0:940166 produced by whatever version of
Subversion the dumps in /dump/ on svn.apache.org use (presumably
1.6.something).
Improved-by: Ramkumar Ramachandra <artagnon@gmail.com>
Improved-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Acked-by: Ramkumar Ramachandra <artagnon@gmail.com>
The copyfrom_target instruction copies appends data that is already
present in the current output view to the end of output. (The offset
argument is relative to the beginning of output produced in the
current window.)
The region copied is allowed to run past the end of the existing
output. To support that case, copy one character at a time rather
than calling memcpy or memmove. This allows copyfrom_target to be
used once to repeat a string many times. For example:
COPYFROM_DATA 2
COPYFROM_OUTPUT 10, 0
DATA "ab"
would produce the output "ababababababababababab".
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Acked-by: Ramkumar Ramachandra <artagnon@gmail.com>
By constraining the format of deltas, we can more easily detect
corruption and other breakage.
Requiring deltas not to provide unconsumed data also opens the
possibility of ignoring the declared amount of novel data and simply
streaming the data as needed to fulfill copyfrom_data requests.
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Acked-by: Ramkumar Ramachandra <artagnon@gmail.com>
The copyfrom_data instruction copies a few bytes verbatim from the
novel text section of a window to the postimage.
[jn: with memory leak fix from David]
Improved-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Acked-by: Ramkumar Ramachandra <artagnon@gmail.com>
Buffer the instruction section upon encountering it for later
interpretation.
An alternative design would involve parsing the instructions
at this point and buffering them in some processed form. Using
the unprocessed form is simpler.
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Acked-by: Ramkumar Ramachandra <artagnon@gmail.com>
Each window of an svndiff0-format delta includes a section for novel
text to be copied to the postimage (in the order it appears in the
window, possibly interspersed with other data).
Slurp in this data when encountering it. It is not actually necessary
to do so --- it would be just as easy to copy from delta to output
as part of interpreting the relevant instructions --- but this way,
the code that interprets svndiff0 instructions can proceed very
quickly because it does not require I/O.
Subversion's svndiff0 parser rejects deltas that do not consume all
the novel text that was provided. Omit that check for now so we can
test the new functionality right away, rather than waiting to learn
instructions that consume data.
Do check for truncated data sections. Subversion's parser rejects
deltas that end in the middle of a declared novel-text section, so it
should be safe for us to reject them, too.
Improved-by: Ramkumar Ramachandra <artagnon@gmail.com>
Improved-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Acked-by: Ramkumar Ramachandra <artagnon@gmail.com>
The source view offset heading each svndiff0 window represents a
number of bytes past the beginning of the preimage. Together with the
source view length, it dictates to the delta applier what portion of
the preimage instructions will refer to. Read that portion right away
using the sliding window code.
Maybe some day we will use mmap to read data more lazily.
Subversion's implementation tolerates source view offsets pointing
past the end of the preimage file but we do not, for simplicity.
This does not teach the delta applier to read instructions or copy
data from the source view. Deltas that could produce nonempty output
will still be rejected.
Improved-by: Ramkumar Ramachandra <artagnon@gmail.com>
Improved-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Acked-by: Ramkumar Ramachandra <artagnon@gmail.com>
Each window in a subversion delta (svndiff0-format file) starts with a
window header, consisting of five integers with variable-length
representation:
source view offset
source view length
output length
instructions length
auxiliary data length
Parse it. The result is not usable for deltas with nonempty postimage
yet; in fact, this only adds support for deltas without any
instructions or auxiliary data. This is a good place to stop, though,
since that little support lets us add some simple passing tests
concerning error handling to the test suite.
Improved-by: Ramkumar Ramachandra <artagnon@gmail.com>
Improved-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
A delta in the subversion delta (svndiff0) format consists of the
magic bytes SVN\0 followed by a sequence of windows of a certain well
specified format (starting with five integers).
Add an svndiff0_apply function and test-svn-fe -d commandline tool to
parse such a delta in the special case of not including any windows.
Later patches will add features to turn this into a fully functional
delta applier for svn-fe to use to parse the streams produced by
"svnrdump dump" and "svnadmin dump --deltas".
The content of symlinks starts with the word "link " in Subversion's
worldview, so we need to be able to prepend that text to input for the
sake of delta application. So initialization of the input state of
the delta preimage is left to the calling program, giving callers a
chance to seed the buffer with text of their choice.
Improved-by: Ramkumar Ramachandra <artagnon@gmail.com>
Improved-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Pass the log message by strbuf instead of as a C-style string and use
fwrite instead of printf to write it to fast-import so embedded '\0'
bytes can be preserved.
Currently "git log" doesn't show the embedded NULs but "git cat-file
commit" can.
While at it, stop including system headers from repo_tree.h. git
source files need to include git-compat-util.h (or cache.h or
builtin.h) sooner to ensure the appropriate feature test macros are
defined.
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
All previous users of buffer_read_string have already been converted
to use the more intuitive buffer_read_binary, so remove the old API to
avoid some confusion.
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
svn-fe errors out on revision 59151 of the ASF repository:
fatal: invalid dump: unexpected end of file
The proximate cause is a property with an embedded NUL character.
Previously such anomalies were ignored but commit c9d1c8ba
(2010-12-28) introduced a check strlen(val) == len to avoid reading
uninitialized data when a property list ends early and unfortunately
this test does not distinguish between "foo" followed by EOF and the
string "foo\0bar\0baz".
Fix it by using buffer_read_binary to read to a strbuf and checking
the actual length read. Most consumers of properties still use
C-style strings, so in practice an author or log message with embedded
NULs will be truncated, but a least this way svn-fe won't error out
(fixing the regression).
Reported-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
* git://github.com/gitster/git:
vcs-svn: Allow change nodes for root of tree (/)
vcs-svn: Implement Prop-delta handling
vcs-svn: Sharpen parsing of property lines
vcs-svn: Split off function for handling of individual properties
vcs-svn: Make source easier to read on small screens
vcs-svn: More dump format sanity checks
vcs-svn: Reject path nodes without Node-action
vcs-svn: Delay read of per-path properties
vcs-svn: Combine repo_replace and repo_modify functions
vcs-svn: Replace = Delete + Add
vcs-svn: handle_node: Handle deletion case early
vcs-svn: Use mark to indicate nodes with included text
vcs-svn: Unclutter handle_node by introducing have_props var
vcs-svn: Eliminate node_ctx.mark global
vcs-svn: Eliminate node_ctx.srcRev global
vcs-svn: Check for errors from open()
vcs-svn: Allow simple v3 dumps (no deltas yet)
Conflicts:
t/t9010-svn-fe.sh
vcs-svn/svndump.c
buffer_read_string works well for non line-oriented input except for
one problem: it does not tell the caller how many bytes were actually
written. This means that unless one is very careful about checking
for errors (and eof) the calling program cannot tell the difference
between the string "foo" followed by an early end of file and the
string "foo\0bar\0baz".
So introduce a variant that reports the length, too, a thinner wrapper
around strbuf_fread. Its result is written to a strbuf so the caller
does not need to keep track of the number of bytes read.
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
POSIX makes the behavior of read(2) from a pipe fairly clear: a read
from an empty pipe will block until there is data available and any
other read will not block, prefering to return a partial result.
Likewise, fread(3) and fgets(3) are clearly specified to act as
though implemented by calling fgetc(3) in a simple loop. But the
buffering behavior of fgetc is less clear.
Luckily, no sane platform is going to implement fgetc by calling the
equivalent of read(2) more than once. fgetc has to be able to
return without filling its buffer to preserve errno when errors are
encountered anyway. So let's assume the simpler behavior (trust) but
add some tests to catch insane platforms that violate that when they
come (verify).
First check that fread can handle a 0-length read from an empty fifo.
Because open(O_RDONLY) blocks until the writing end is open, open the
writing end of the fifo in advance in a subshell.
Next try short inputs from a pipe that is not filled all the way.
Lastly (two tests) try very large inputs from a pipe that will not fit
in the relevant buffers. The first of these tests reads a little
more than 8192 bytes, which is BUFSIZ (the size of stdio's buffers)
on this Linux machine. The second reads a little over 64 KiB (the
pipe capacity on Linux) and is not run unless requested by setting
the GIT_REMOTE_SVN_TEST_BIG_FILES environment variable.
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Do not expect an implicit newline after each input record.
Use a separate command to exercise buffer_skip_bytes.
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Split the line_buffer test into small pieces and move it to its
own file as preparation for adding more tests.
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Imitate the input format of test-obj-pool to support arbitrary
sequences of commands rather than alternating read/copy. This should
make it easier to add tests that exercise other line_buffer functions.
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Lazy fast-import frontend authors that want to rely on the backend to
keep track of the content of the imported trees _almost_ have what
they need in the 'cat-blob' command (v1.7.4-rc0~30^2~3, 2010-11-28).
But it is not quite enough, since
(1) cat-blob can be used to retrieve the content of files, but
not their mode, and
(2) using cat-blob requires the frontend to keep track of a name
(mark number or object id) for each blob to be retrieved
Introduce an 'ls' command to complement cat-blob and take care of the
remaining needs. The 'ls' command finds what is at a given path
within a given tree-ish (tag, commit, or tree):
'ls' SP <dataref> SP <path> LF
or in fast-import's active commit:
'ls' SP <path> LF
The response is a single line sent through the cat-blob channel,
imitating ls-tree output. So for example:
FE> ls :1 Documentation
gfi> 040000 tree 9e6c2b599341d28a2a375f8207507e0a2a627fe9 Documentation
FE> ls 9e6c2b599341d28a2a375f8207507e0a2a627fe9 git-fast-import.txt
gfi> 100644 blob 4f92954396e3f0f97e75b6838a5635b583708870 git-fast-import.txt
FE> ls :1 RelNotes
gfi> 120000 blob b942e49944 RelNotes
FE> cat-blob b942e49944
gfi> b942e49944 blob 32
gfi> Documentation/RelNotes/1.7.4.txt
The most interesting parts of the reply are the first word, which is
a 6-digit octal mode (regular file, executable, symlink, directory,
or submodule), and the part from the second space to the tab, which is
a <dataref> that can be used in later cat-blob, ls, and filemodify (M)
commands to refer to the content (blob, tree, or commit) at that path.
If there is nothing there, the response is "missing some/path".
The intent is for this command to be used to read files from the
active commit, so a frontend can apply patches to them, and to copy
files and directories from previous revisions.
For example, proposed updates to svn-fe use this command in place of
its internal representation of the repository directory structure.
This simplifies the frontend a great deal and means support for
resuming an import in a separate fast-import run (i.e., incremental
import) is basically free.
Signed-off-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Improved-by: Junio C Hamano <gitster@pobox.com>
Improved-by: Sverre Rabbelier <srabbelier@gmail.com>
* maint:
t/t7500-commit.sh: use test_cmp instead of test
t/gitweb-lib.sh: Ensure that errors are shown for --debug --immediate
gitweb/gitweb.perl: don't call S_ISREG() with undef
gitweb/gitweb.perl: remove use of qw(...) as parentheses
Change commit_msg_is() in t/t7500-commit.sh to use test_cmp instead of
the shell's test function. Now if a test fails we'll get test_cmp
output showing us what failed.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Because '--immediate' stops test suite after first error, therefore in
this mode
test_debug 'cat gitweb.log'
was never ran, thus in effect negating effect of '--debug' option.
This made finidng the cause of errors in gitweb test sute difficult.
Modify the gitweb_run test subroutine to run test_debug itself in the
case of errors (and also remove "test_debug 'cat gitweb.log'" from
gitweb tests).
This makes it possible to run *gitweb tests* with --immediate ---debug
combination of options; also it makes gitweb tests to not output
spurious debug data that is not considered error.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Jakub Narębski <jnareb@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Here is a 'feature' command for streams to use to require support for
the notemodify (N) command.
When the 'feature' facility was introduced (v1.7.0-rc0~95^2~4,
2009-12-04), the notes import feature was old news (v1.6.6-rc0~21^2~8,
2009-10-09) and it was not obvious it deserved to be a named feature.
But now that is clear, since all major non-git fast-import backends
lack support for it.
Details: on git version with this patch applied, any "feature notes"
command in the features/options section at the beginning of a stream
will be treated as a no-op. On fast-import implementations without
the feature (and older git versions), the command instead errors out
with a message like
This version of fast-import does not support feature notes.
So by declaring use of notes at the beginning of a stream, frontends
can avoid wasting time and other resources when the backend does not
support notes. (This would be especially important for backends that
do not support rewinding history after a botched import.)
Improved-by: Thomas Rast <trast@student.ethz.ch>
Improved-by: Sverre Rabbelier <srabbelier@gmail.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git diff --cached" (without revision) used to mean "git diff --cached
HEAD" (i.e. the user was too lazy to type HEAD). This "correctly"
failed when there was no commit yet. But was that correctness useful?
This patch changes the definition of what particular command means.
It is a request to show what _would_ be committed without further "git
add". The internal implementation is the same "git diff --cached HEAD"
when HEAD exists, but when there is no commit yet, it compares the index
with an empty tree object to achieve the desired result.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A test case verifies that filemode-only patches work as expected. Help
systems where "test -x" does not work by applying the test patch also to
the index, where the effects can be verified even on such systems.
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Pat Thoyts <patthoyts@users.sourceforge.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The first test did not run on msysGit due to the SYMLINKS constraint and
so subsequent tests failed because the test repository was not initialized.
Signed-off-by: Pat Thoyts <patthoyts@users.sourceforge.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This fixes a typo where the "git config" arguments "-f" and "--unset" were
swapped leading to the creation of a "--unset" file.
Signed-off-by: Jens Lehmann <Jens.Lehmann@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* maint:
rebase -i: clarify in-editor documentation of "exec"
tests: sanitize more git environment variables
fast-import: treat filemodify with empty tree as delete
rebase: give a better error message for bogus branch
rebase: use explicit "--" with checkout
Conflicts:
t/t9300-fast-import.sh
These variables should generally not be set in one's
environment, but they do get set by rebase, which means
doing an interactive rebase like:
pick abcd1234 foo
exec make test
will cause false negatives in the test suite.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Normal git processes do not allow one to build a tree with an empty
subtree entry without trying hard at it. This is in keeping with the
general UI philosophy: git tracks content, not empty directories.
v1.7.3-rc0~75^2 (2010-06-30) changed that by making it easy to include
an empty subtree in fast-import's active commit:
M 040000 4b825dc642 subdir
One can trigger this by reading an empty tree (for example, the tree
corresponding to an empty root commit) and trying to move it to a
subtree. It is better and more closely analogous to 'git read-tree
--prefix' to treat such commands as requests to remove the subtree.
Noticed-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* jn/setup-fixes:
t1510: fix typo in the comment of a test
Documentation updates for 'GIT_WORK_TREE without GIT_DIR' historical usecase
Subject: setup: officially support --work-tree without --git-dir
tests: compress the setup tests
tests: cosmetic improvements to the repo-setup test
t/README: hint about using $(pwd) rather than $PWD in tests
Fix expected values of setup tests on Windows
The original intention of --work-tree was to allow people to work in a
subdirectory of their working tree that does not have an embedded .git
directory. Because their working tree, which their $cwd was in, did not
have an embedded .git, they needed to use $GIT_DIR to specify where it is,
and because this meant there was no way to discover where the root level
of the working tree was, so we needed to add $GIT_WORK_TREE to tell git
where it was.
However, this facility has long been (mis)used by people's scripts to
start git from a working tree _with_ an embedded .git directory, let git
find .git directory, and then pretend as if an unrelated directory were
the associated working tree of the .git directory found by the discovery
process. It happens to work in simple cases, and is not worth causing
"regression" to these scripts.
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
New test helpers:
- setup_repo, to initialize a repository or gitfile pointing to a
repository, with core.bare and core.worktree set as specified;
- try_case, to run setup from a given directory and validate the
result, with GIT_DIR and GIT_WORK_TREE set as specified;
- try_repo, to initialize a repository and call "try_case" from the
toplevel and a subdirectory;
- run_wt_tests, to run a battery of tests that check for sane
behavior when GIT_WORK_TREE is set to various positions relative to
the .git dir and cwd.
Use these helpers to make the test shorter, less repetitive, and (one
hopes) easier to understand and modify.
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Give an overview in "sh t1510-repo-setup.sh --help" output.
Waste some vertical and horizontal space for clearer code.
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* rj/maint-test-fixes:
t9501-*.sh: Fix a test failure on Cygwin
lib-git-svn.sh: Add check for mis-configured web server variables
lib-git-svn.sh: Avoid setting web server variables unnecessarily
t9142: Move call to start_httpd into the setup test
t3600-rm.sh: Don't pass a non-existent prereq to test #15