13 Commits

Author SHA1 Message Date
Jonathan Nieder
150f75467c vcs-svn: allow import of > 4GiB files
There is no reason in principle that an svn-format dump would not be
able to represent a file whose length does not fit in a 32-bit
integer.  Use off_t consistently to represent file lengths (in place
of using uint32_t in some contexts) so we can handle that.

Most svn-fe code is already ready to do that without this patch and
passes values of type off_t around.  The type mismatch from stragglers
was noticed with gcc -Wtype-limits.

While at it, tighten the parsing of the Text-content-length field to
make sure it is a number and does not overflow, and tighten other
overflow checks as that value is passed around and manipulated.
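The overflow check described above can be sketched as a strict decimal parser. This is only an illustration, not svn-fe's code. `parse_length` and `OFF_T_MAX` are hypothetical names, and the sketch assumes off_t is a signed two's-complement type, as it is on common POSIX systems.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>
#include <sys/types.h>

/* Largest off_t value, assuming off_t is a signed two's-complement
 * integer type without padding bits (true on common POSIX systems). */
#define OFF_T_MAX ((off_t)(((uintmax_t)1 << (sizeof(off_t) * CHAR_BIT - 1)) - 1))

/*
 * Hypothetical helper: parse a Text-content-length value such as "5"
 * into *out.  Returns 0 on success, or -1 if the string is empty,
 * contains anything other than decimal digits, or does not fit in off_t.
 */
static int parse_length(const char *s, off_t *out)
{
	off_t value = 0;

	assert(s && out);
	if (!*s)
		return -1;
	for (; *s; s++) {
		int digit;

		if (*s < '0' || *s > '9')
			return -1;	/* signs, spaces and junk are rejected */
		digit = *s - '0';
		/* Reject before multiplying, so value * 10 + digit never overflows. */
		if (value > (OFF_T_MAX - digit) / 10)
			return -1;
		value = value * 10 + digit;
	}
	*out = value;
	return 0;
}
```

Testing `value > (OFF_T_MAX - digit) / 10` before the multiplication keeps the check itself free of signed overflow, which would be undefined behavior in C.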

Inspired-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-02-02 11:03:30 -08:00
David Barr
7a75e661c5 vcs-svn: implement text-delta handling
Handle input in Subversion's dumpfile format, version 3.  This is the
format produced by "svnrdump dump" and "svnadmin dump --deltas", and
the main difference between v3 dumpfiles and the dumpfiles already
handled is that these can include nodes whose properties and text are
expressed relative to some other node.

To handle such nodes, we find which node the text and properties are
based on, handle its property changes, use the cat-blob command to
request the basis blob from the fast-import backend, use the
svndiff0_apply() helper to apply the text delta on the fly, writing
output to a temporary file, and then measure that postimage file's
length and write its content to the fast-import stream.
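To make the delta step concrete, here is a minimal sketch of applying one window of a text delta using the three instruction kinds svndiff0 defines: copy from the source view, copy from earlier target output, and copy from the window's new data. It assumes the instructions have already been decoded into an array. Real svndiff0 packs them into bytes with variable-length integers and adds window headers, and neither is modeled here. `struct delta_insn` and `apply_window` are hypothetical and do not reflect the interface of svndiff0_apply().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* The three svndiff0 instruction kinds. */
enum delta_op { COPY_FROM_SOURCE, COPY_FROM_TARGET, COPY_FROM_NEW };

/* Hypothetical pre-decoded instruction. */
struct delta_insn {
	enum delta_op op;
	size_t offset;	/* ignored for COPY_FROM_NEW */
	size_t length;
};

/*
 * Apply one window: `src` is the preimage view and `newdata` is the
 * window's new-data section.  The result is written to `out` (capacity
 * `cap`).  Returns the number of bytes written, or (size_t)-1 if an
 * instruction refers outside its buffer.
 */
static size_t apply_window(const char *src, size_t src_len,
			   const char *newdata, size_t new_len,
			   const struct delta_insn *insns, size_t n_insns,
			   char *out, size_t cap)
{
	size_t out_len = 0, new_pos = 0, i, j;

	assert(out != NULL);
	for (i = 0; i < n_insns; i++) {
		const struct delta_insn *in = &insns[i];

		if (in->length > cap - out_len)
			return (size_t)-1;
		switch (in->op) {
		case COPY_FROM_SOURCE:
			if (in->offset > src_len || in->length > src_len - in->offset)
				return (size_t)-1;
			memcpy(out + out_len, src + in->offset, in->length);
			break;
		case COPY_FROM_TARGET:
			/* The copied range may overlap the bytes being written, so
			 * copy one byte at a time: "ab" can expand to "ababab". */
			if (in->offset >= out_len)
				return (size_t)-1;
			for (j = 0; j < in->length; j++)
				out[out_len + j] = out[in->offset + j];
			break;
		case COPY_FROM_NEW:
			/* New data is consumed sequentially. */
			if (in->length > new_len - new_pos)
				return (size_t)-1;
			memcpy(out + out_len, newdata + new_pos, in->length);
			new_pos += in->length;
			break;
		}
		out_len += in->length;
	}
	return out_len;
}
```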

The temporary postimage file is shared between delta-using nodes to
avoid some file system overhead.

The svn-fe interface needs to be more complicated to accommodate the
backward flow of information from the fast-import backend to svn-fe.
The backflow fd is not needed when parsing streams without deltas,
though, so existing scripts using svn-fe on v2 dumps should
continue to work.

NEEDSWORK: generalize interface so caller sets the backflow fd, close
temporary file before exiting

Signed-off-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2011-05-26 02:28:04 -05:00
Jonathan Nieder
c19d653c4f Merge branch 'db/svn-fe-code-purge' into svn-fe
* db/svn-fe-code-purge:
  vcs-svn: drop obj_pool
  vcs-svn: drop treap
  vcs-svn: drop string_pool
  vcs-svn: pass paths through to fast-import

Conflicts:
	vcs-svn/fast_export.c
	vcs-svn/fast_export.h
	vcs-svn/repo_tree.c
	vcs-svn/repo_tree.h
	vcs-svn/string_pool.c
	vcs-svn/svndump.c
	vcs-svn/trp.txt
2011-05-26 02:12:14 -05:00
Jonathan Nieder
9ecfa8ae4c Merge branch 'db/vcs-svn-incremental' into svn-fe
This teaches svn-fe to incrementally import into an existing
repository (at last!) at the expense of less convenient UI.  Think of
it as growing pains.  This opens the door to many excellent things,
and it would be a bad idea to discourage people from building on it
for much longer.

* db/vcs-svn-incremental:
  vcs-svn: avoid using ls command twice
  vcs-svn: use mark from previous import for parent commit
  vcs-svn: handle filenames with dq correctly
  vcs-svn: quote paths correctly for ls command
  vcs-svn: eliminate repo_tree structure
  vcs-svn: add a comment before each commit
  vcs-svn: save marks for imported commits
  vcs-svn: use higher mark numbers for blobs
  vcs-svn: set up channel to read fast-import cat-blob response

Conflicts:
	t/t9010-svn-fe.sh
	vcs-svn/fast_export.c
	vcs-svn/fast_export.h
	vcs-svn/repo_tree.c
	vcs-svn/svndump.c
2011-05-26 02:02:44 -05:00
Jonathan Nieder
195b7ca6f2 vcs-svn: handle log message with embedded NUL
Pass the log message by strbuf instead of as a C-style string and use
fwrite instead of printf to write it to fast-import so embedded '\0'
bytes can be preserved.

Currently "git log" doesn't show the embedded NULs but "git cat-file
commit" can.
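A minimal sketch of the length-counted write described above, assuming the message arrives as a pointer plus a length (as a strbuf provides). `write_data` is a hypothetical helper. fast-import's `data <count>` command reads exactly <count> bytes, which is why embedded NULs survive once the length is carried explicitly.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper: write a fast-import "data" command for a message
 * that may contain NUL bytes.  The length is passed in, as a strbuf
 * would carry it, instead of being computed with strlen().
 */
static int write_data(FILE *out, const char *msg, size_t len)
{
	assert(out != NULL && (msg != NULL || len == 0));
	if (fprintf(out, "data %zu\n", len) < 0)
		return -1;
	/* fwrite copies all len bytes; printf("%s") would stop at the first NUL. */
	if (len && fwrite(msg, 1, len, out) != len)
		return -1;
	return fputc('\n', out) == EOF ? -1 : 0;
}
```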

While at it, stop including system headers from repo_tree.h.  git
source files need to include git-compat-util.h (or cache.h or
builtin.h) sooner to ensure the appropriate feature test macros are
defined.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2011-03-26 00:49:37 -05:00
David Barr
030879718f vcs-svn: pass paths through to fast-import
Now that there is no internal representation of the repo, it is not
necessary to tokenise paths.  Use strbuf instead and bypass
string_pool.

This means svn-fe can handle arbitrarily long paths (as long as a
strbuf can fit them), with arbitrarily many path components.

While at it, since we now treat paths in their entirety, only quote
when necessary.
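Quoting only when necessary can be sketched as follows. The rule encoded here is the one documented in git-fast-import(1): C-style quoting is mandatory when a path starts with a double quote or contains LF, and inside the quotes LF, backslash, and double quote are escaped with a backslash. `path_needs_quoting` and `write_path` are hypothetical names, not svn-fe's functions, and the sketch ignores any command-specific quoting rules.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Quoting is required when a path starts with '"' or contains LF. */
static int path_needs_quoting(const char *path)
{
	return path[0] == '"' || strchr(path, '\n') != NULL;
}

/* Hypothetical helper: write `path` to `out`, quoting only when needed. */
static void write_path(FILE *out, const char *path)
{
	const char *p;

	assert(out && path);
	if (!path_needs_quoting(path)) {
		fputs(path, out);	/* common case: written literally */
		return;
	}
	fputc('"', out);
	for (p = path; *p; p++) {
		if (*p == '\n') {
			fputs("\\n", out);
		} else if (*p == '\\' || *p == '"') {
			fputc('\\', out);
			fputc(*p, out);
		} else {
			fputc(*p, out);
		}
	}
	fputc('"', out);
}
```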

Signed-off-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2011-03-22 18:32:58 -05:00
Jonathan Nieder
fa6c4bceab Merge branch 'db/strbufs-for-metadata' into db/svn-fe-code-purge
* db/strbufs-for-metadata:
  vcs-svn: use strbuf for author, UUID, and URL
  vcs-svn: use strbuf for revision log

Conflicts:
	vcs-svn/fast_export.c
	vcs-svn/fast_export.h
	vcs-svn/repo_tree.c
	vcs-svn/svndump.c
2011-03-22 18:19:46 -05:00
David Barr
7c5817d3ba vcs-svn: use strbuf for author, UUID, and URL
Use strbufs and strings instead of interned strings for values of rev,
dump, and node fields that happen to be strings.  After this change,
the only remaining string_pool use is for paths in the repo_tree API
and internals.

Functional change: treat an empty author, UUID, or URL as none at all.
So for example, in repos where the first revision has an empty
svn:author property, the first rev will be treated as by "nobody"
rather than by a person with empty name and email address created by
prepending an @ sign to the repository UUID.
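The functional change can be illustrated with a small formatter that follows the description above: an empty author becomes "nobody", and the email is the author name, an @ sign, and the repository UUID. `format_ident` is a hypothetical name, and the placeholder used when the UUID is empty ("local") is an assumption of this sketch, not something the commit message specifies.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: build the "Name <email>" part of a committer line.
 * Empty strings are treated as absent.  The "local" fallback for a
 * missing UUID is an assumption of this sketch.
 * Returns 0 on success, -1 if the result does not fit in buf.
 */
static int format_ident(char *buf, size_t size,
			const char *author, const char *uuid)
{
	const char *name, *host;
	int n;

	assert(buf && author && uuid);
	name = *author ? author : "nobody";
	host = *uuid ? uuid : "local";
	n = snprintf(buf, size, "%s <%s@%s>", name, name, host);
	return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```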

Signed-off-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2011-03-22 18:01:48 -05:00
Jonathan Nieder
723b7a2789 vcs-svn: eliminate repo_tree structure
Rely on fast-import for information about previous revs.

This requires always setting up backward flow of information, even for
v2 dumps.  On the plus side, it simplifies the code by quite a bit and
opens the door to further simplifications.
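Relying on fast-import for earlier revisions means parsing its replies to the `ls` command. The sketch below assumes the reply format documented in git-fast-import(1), either `<mode> SP <type> SP <dataref> HT <path>` or `missing SP <path>`. `struct ls_entry` and `parse_ls_response` are hypothetical, and the path is returned exactly as printed (a quoted path is not unquoted here).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Parsed form of one fast-import "ls" reply line. */
struct ls_entry {
	unsigned mode;		/* e.g. 0100644 or 040000 */
	char type[8];		/* "blob", "tree", or "commit" */
	char dataref[64];	/* object name as printed by fast-import */
	const char *path;	/* points into the input line */
};

/*
 * Hypothetical parser; the trailing LF must already be stripped.
 * Returns 1 on success, 0 for "missing <path>", -1 if malformed.
 */
static int parse_ls_response(const char *line, struct ls_entry *e)
{
	char *end;
	const char *sp, *tab;
	size_t len;

	assert(line && e);
	if (!strncmp(line, "missing ", 8))
		return 0;
	e->mode = (unsigned)strtoul(line, &end, 8);	/* mode is octal */
	if (end == line || *end != ' ')
		return -1;
	line = end + 1;
	sp = strchr(line, ' ');
	if (!sp || (size_t)(sp - line) >= sizeof(e->type))
		return -1;
	memcpy(e->type, line, sp - line);
	e->type[sp - line] = '\0';
	line = sp + 1;
	tab = strchr(line, '\t');
	if (!tab)
		return -1;
	len = (size_t)(tab - line);
	if (len >= sizeof(e->dataref))
		return -1;
	memcpy(e->dataref, line, len);
	e->dataref[len] = '\0';
	e->path = tab + 1;
	return 1;
}
```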

[db: adjusted to support final version of the cat-blob patch]
[jn: avoiding hard-coding git's name for the empty tree for
 portability to other backends]

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2011-03-07 01:43:58 -06:00
Jonathan Nieder
7e11902c99 vcs-svn: add a comment before each commit
Current svn-fe produces output like this:

	blob
	mark :7382321
	data 5
	hello

	blob
	mark :7382322
	data 5
	Hello

	commit
	mark :3
[...]
	M 100644 :7382321 hello.c
	M 100644 :7382322 hello2.c

This means svn-fe has to keep track of the paths modified in each
commit and the corresponding marks, instead of dealing with each file
as it arrives in input and then forgetting about it.  A better
strategy would be to use inline blobs:

	commit
	mark :3
[...]
	M 100644 inline hello.c
	data 5
	hello
[...]

As a first step towards that, teach svn-fe to notice when the
collection of blobs for each commit starts and write a comment
("# commit 3.") there.
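The inline-blob strategy shown in the second example can be sketched as a single helper that writes the filemodify command and its data together, so no mark has to be remembered for the blob. `write_inline_modify` is a hypothetical name, the mode is fixed at 100644 for brevity, and the path is assumed not to need quoting.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: emit "M 100644 inline <path>" followed by a data
 * command carrying the file content, inside an already-open commit.
 */
static int write_inline_modify(FILE *out, const char *path,
			       const char *data, size_t len)
{
	assert(out && path && (data != NULL || len == 0));
	if (fprintf(out, "M 100644 inline %s\ndata %zu\n", path, len) < 0)
		return -1;
	if (len && fwrite(data, 1, len, out) != len)
		return -1;
	return fputc('\n', out) == EOF ? -1 : 0;
}
```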

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2011-03-07 01:43:57 -06:00
David Barr
41529bbce4 vcs-svn: set up channel to read fast-import cat-blob response
Set up some plumbing: teach the svndump lib to pass a file descriptor
number to the fast_export lib, representing where cat-blob/ls
responses can be read from, and add a get_response_line helper
function to the fast_export lib to read a line from that file.

Unfortunately this means that svn-fe needs file descriptor 3 to be
redirected from somewhere (preferably the cat-blob stream of a
fast-import backend); otherwise it will fail:

	$ svndump <path> | svn-fe
	fatal: cannot read from file descriptor 3: Bad file descriptor

For the moment, "svn-fe 3</dev/null" works as a workaround but it
will not work for very long.  A fast-import backend that can retrieve
old commits is needed in order to be able to fulfill svn
"Node-copyfrom-rev" requests that refer to revs from a previous run.
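Once a response line has been read from file descriptor 3, it still has to be interpreted. Here is a minimal sketch of parsing the header of a cat-blob reply, assuming the format documented in git-fast-import(1): `<sha1> SP 'blob' SP <size> LF`, then the contents and an LF, or a line ending in ` missing` when the object does not exist. `parse_cat_blob_header` is a hypothetical name. After a successful parse, the caller would read exactly <size> bytes plus the trailing LF.

```c
#include <assert.h>
#include <errno.h>
#include <inttypes.h>
#include <string.h>

/*
 * Hypothetical parser for the first line of a cat-blob reply (LF already
 * stripped).  Returns 0 and stores the size for "<sha1> blob <size>",
 * 1 for "<dataref> missing", and -1 for anything else.
 */
static int parse_cat_blob_header(const char *line, uintmax_t *size)
{
	const char *sp, *p;
	char *end;
	uintmax_t v;

	assert(line && size);
	sp = strchr(line, ' ');
	if (!sp || sp == line)
		return -1;
	if (!strcmp(sp + 1, "missing"))
		return 1;
	if (strncmp(sp + 1, "blob ", 5))
		return -1;
	p = sp + 6;
	/* strtoumax accepts signs and leading spaces; insist on a digit. */
	if (*p < '0' || *p > '9')
		return -1;
	errno = 0;
	v = strtoumax(p, &end, 10);
	if (*end != '\0' || errno == ERANGE)
		return -1;
	*size = v;
	return 0;
}
```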

[jn: with new change description]

Based-on-patch-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2011-03-07 01:43:57 -06:00
Jonathan Nieder
e5e45ca1e3 vcs-svn: teach line_buffer to handle multiple input files
Collect the line_buffer state in a newly public line_buffer struct.
Callers can use multiple line_buffers to manage input from multiple
files at a time.

svn-fe's delta applier will use this to stream a delta from svnrdump
and the preimage it applies to from fast-import at the same time.

The tests don't take advantage of the new features, but I think that's
okay.  It is easier to find lingering examples of nonreentrant code by
searching for "static" in line_buffer.c.
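The idea of moving reader state into a struct can be sketched as follows. With no file-scope statics, two readers (for example a delta stream and a preimage stream) can be used at the same time. `struct line_reader` and its functions are hypothetical and are not the real line_buffer API. The fixed 1024-byte buffer is a simplification: longer lines come back in pieces.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* All reader state lives here rather than in static variables. */
struct line_reader {
	FILE *in;
	char line[1024];
};

static void line_reader_init(struct line_reader *r, FILE *in)
{
	assert(r && in);
	r->in = in;
	r->line[0] = '\0';
}

/* Return the next line without its trailing LF, or NULL at EOF. */
static const char *line_reader_read(struct line_reader *r)
{
	size_t len;

	if (!fgets(r->line, sizeof(r->line), r->in))
		return NULL;
	len = strlen(r->line);
	if (len && r->line[len - 1] == '\n')
		r->line[len - 1] = '\0';
	return r->line;
}
```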

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2011-02-26 04:57:59 -06:00
David Barr
c0e6c23dca Infrastructure to write revisions in fast-export format
repo_tree maintains the exporter's state and provides a facility to
call fast_export, which writes objects to stdout suitable for
consumption by fast-import.

The exported functions roughly correspond to Subversion FS operations.

 . repo_add, repo_modify, repo_copy, repo_replace, and repo_delete
   update the current commit, based roughly on the corresponding
   Subversion FS operation.

 . repo_commit calls out to fast_export to write the current commit to
   the fast-import stream in stdout.

 . repo_diff is used by the fast_export module to write the changes
   for a commit.

 . repo_reset erases the exporter's state, so valgrind can be happy.

[rr: squelched compiler warnings]
[jn: removed support for maintaining state on-disk, though we may
want to add it back later]

Signed-off-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Ramkumar Ramachandra <artagnon@gmail.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-08-14 19:35:37 -07:00