Commit Graph

138 Commits

Author SHA1 Message Date
Johannes Schindelin
dddbad728c timestamp_t: a new data type for timestamps
Git's source code assumes that unsigned long is at least as precise as
time_t. Which is incorrect, and causes a lot of problems, in particular
where unsigned long is only 32-bit (notably on Windows, even in 64-bit
versions).

So let's just use a more appropriate data type instead. In preparation
for this, we introduce the new `timestamp_t` data type.

By necessity, this is a very, very large patch, as it has to replace all
timestamps' data type in one go.

As we will use a data type that is not necessarily identical to `time_t`,
we need to be very careful to use `time_t` whenever we interact with the
system functions, and `timestamp_t` everywhere else.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-04-27 13:07:39 +09:00
Johannes Schindelin
cb71f8bdb5 PRItime: introduce a new "printf format" for timestamps
Currently, Git's source code treats all timestamps as if they were
unsigned longs. Therefore, it is okay to write "%lu" when printing them.

There is a substantial problem with that, though: at least on Windows,
time_t is *larger* than unsigned long, and hence we will want to switch
away from the ill-specified `unsigned long` data type.

So let's introduce the pseudo format "PRItime" (currently simply being
defined to "lu") to make it easier to change the data type used for
timestamps.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-04-23 20:19:15 -07:00
Junio C Hamano
ac5ce66adc Merge branch 'mr/vcs-svn-printf-ulong'
Code cleanup.

* mr/vcs-svn-printf-ulong:
  vcs-svn/fast_export: fix timestamp fmt specifiers
2016-09-21 15:15:27 -07:00
Mike Ralphson
5efc60c12f vcs-svn/fast_export: fix timestamp fmt specifiers
Two instances of %ld being used for unsigned longs

Signed-off-by: Mike Ralphson <mike.ralphson@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-14 08:56:50 -07:00
Nguyễn Thái Ngọc Duy
1c8ead97f8 vcs-svn: use error_errno()
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-05-09 12:29:08 -07:00
Christian Couder
956623157f strbuf: introduce starts_with() and ends_with()
prefixcmp() and suffixcmp() share the common "cmp" suffix that
typically are used to name functions that can be used for ordering,
but they can't, because they are not antisymmetric:

        prefixcmp("foo", "foobar") < 0
        prefixcmp("foobar", "foo") == 0

We in fact do not use these functions for ordering.  Replace them
with functions that just check for equality.

Add starts_with() and end_with() that will be used to replace
prefixcmp() and suffixcmp(), respectively, as the first step.  These
are named after corresponding functions/methods in programming
languages, like Java, Python and Ruby.

In vcs-svn/fast_export.c, there was already an ends_with() function
that did the same thing. Let's use the new one instead while at it.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-05 14:12:52 -08:00
Florian Achleitner
8e43a1d010 remote-svn: add incremental import
Search for a note attached to the ref to update and read it's
'Revision-number:'-line. Start import from the next svn revision.

If there is no next revision in the svn repo, svnrdump terminates with
a message on stderr an non-zero return value. This looks a little
weird, but there is no other way to know whether there is a new
revision in the svn repo.

On the start of an incremental import, the parent of the first commit
in the fast-import stream is set to the branch name to update. All
following commits specify their parent by a mark number. Previous mark
files are currently not reused.

Signed-off-by: Florian Achleitner <florian.achleitner.2.6.31@gmail.com>
Acked-by: David Michael Barr <b@rr-dav.id.au>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-10-07 14:10:17 -07:00
Florian Achleitner
a9a55613cb Create a note for every imported commit containing svn metadata
To provide metadata from svn dumps for further processing, e.g.
branch detection, attach a note to each imported commit that stores
additional information.  The notes are currently hard-coded in
refs/notes/svn/revs.  Currently the following lines from the svn dump
are directly accumulated in the note. This can be refined as needed.

 - "Revision-number"
 - "Node-path"
 - "Node-kind"
 - "Node-action"
 - "Node-copyfrom-path"
 - "Node-copyfrom-rev"

Signed-off-by: Florian Achleitner <florian.achleitner.2.6.31@gmail.com>
Acked-by: David Michael Barr <b@rr-dav.id.au>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-10-07 14:10:17 -07:00
Dmitry Ivankov
3c23953fb2 vcs-svn: add fast_export_note to create notes
fast_export lacked a method to writes notes to fast-import stream.
Add two new functions fast_export_note which is similar to
fast_export_modify. And also add fast_export_buf_to_data to be able to
write inline blobs that don't come from a line_buffer or from delta
application.

To be used like this:

  fast_export_begin_commit("refs/notes/somenotes", ...)
  fast_export_note("refs/heads/master", "inline")
  fast_export_buf_to_data(&data)

or maybe

  fast_export_note("refs/heads/master", sha1)

Signed-off-by: Dmitry Ivankov <divanorama@gmail.com>
Signed-off-by: Florian Achleitner <florian.achleitner.2.6.31@gmail.com>
Acked-by: David Michael Barr <b@rr-dav.id.au>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-10-07 14:10:17 -07:00
Florian Achleitner
271fd1fc2a remote-svn, vcs-svn: Enable fetching to private refs
The reference to update by the fast-import stream is hard-coded.  When
fetching from a remote the remote-helper shall update refs in a
private namespace, i.e. a private subdir of refs/.  This namespace is
defined by the 'refspec' capability, that the remote-helper advertises
as a reply to the 'capabilities' command.

Extend svndump and fast-export to allow passing the target ref.
Update svn-fe to be compatible.

Signed-off-by: Florian Achleitner <florian.achleitner.2.6.31@gmail.com>
Acked-by: David Michael Barr <b@rr-dav.id.au>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-10-07 14:10:17 -07:00
Florian Achleitner
fd871b94f6 Add svndump_init_fd to allow reading dumps from arbitrary FDs
The existing function only allows reading from a filename or from
stdin. Allow passing of a FD and an additional FD for the back report
pipe. This allows us to retrieve the name of the pipe in the caller.

Signed-off-by: Florian Achleitner <florian.achleitner.2.6.31@gmail.com>
Acked-by: David Michael Barr <b@rr-dav.id.au>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-10-07 14:10:16 -07:00
Junio C Hamano
fde1cc1dc2 Merge branch 'jn/vcs-svn'
vcs-svn updates to clean-up compilation, lift 32-bit limitations, etc.

* jn/vcs-svn:
  vcs-svn: allow 64-bit Prop-Content-Length
  vcs-svn: suppress a signed/unsigned comparison warning
  vcs-svn: suppress a signed/unsigned comparison warning
  vcs-svn: suppress signed/unsigned comparison warnings
  vcs-svn: use strstr instead of memmem
  vcs-svn: use constcmp instead of prefixcmp
  vcs-svn: simplify cleanup in apply_one_window
  vcs-svn: avoid self-assignment in dummy initialization of pre_off
  vcs-svn: drop no-op reset methods
  vcs-svn: suppress -Wtype-limits warning
  vcs-svn: allow import of > 4GiB files
  vcs-svn: rename check_overflow and its arguments for clarity
2012-07-13 15:37:04 -07:00
Jonathan Nieder
e32b79cb32 vcs-svn: allow 64-bit Prop-Content-Length
Currently the vcs-svn/ library only pays attention to the presence of
the Prop-Content-Length field and doesn't care about its value, but
some day we might care about the value.  Parse it as an off_t instead
of arbitrarily limiting to 32 bits for intuitiveness.

So now you can import from a dump with more than 2 GiB of properties
for a node.  In practice that isn't likely to happen often, and this
is mostly meant as a cleanup.

Based-on-patch-by: David Barr <davidbarr@google.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2012-07-05 23:26:53 -05:00
Jonathan Nieder
96a60a8709 vcs-svn: suppress a signed/unsigned comparison warning
All callers pass a nonnegative delta_len, so the code is already safe.
Add an assertion to ensure that remains so and add a cast to keep
clang and gcc -Wsign-compare from worrying.

Reported-by: David Barr <davidbarr@google.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2012-07-05 23:26:53 -05:00
David Barr
c68038effe vcs-svn: suppress a signed/unsigned comparison warning
The preceding code checks that view->max_off is nonnegative and
(off + width) fits in an off_t, so this code is already safe.

Signed-off-by: David Barr <davidbarr@google.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2012-07-05 23:26:53 -05:00
David Barr
6a0b4438af vcs-svn: suppress signed/unsigned comparison warnings
These are already safe because both sides of the comparison are
nonnegative.

This would normally not be important because Git is not -Wsign-compare
clean anyway, but we like to keep the vcs-svn/ lib to a higher
standard for convenience using it in other projects.

Signed-off-by: David Barr <davidbarr@google.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2012-07-05 23:26:53 -05:00
David Barr
53153e8382 vcs-svn: use strstr instead of memmem
memmem is a GNU extension.

Avoiding it makes the code clearer and makes it easier for projects
that don't share git's compat/ code, such as the standalone
svn-dump-fast-export project, to reuse the vcs-svn/ library.

Signed-off-by: David Barr <davidbarr@google.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2012-07-05 23:26:52 -05:00
David Barr
d8d8708bd6 vcs-svn: use constcmp instead of prefixcmp
Since the length of t is already known, we can simplify a little by
using memcmp() instead of strncmp() to carry out a prefix comparison.
All nearby code already does this.

Noticed in the standalone svn-dump-fast-export project which has not
needed to implement prefixcmp() yet.

Signed-off-by: David Barr <davidbarr@google.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2012-07-05 23:26:52 -05:00
David Barr
4a1613194a vcs-svn: simplify cleanup in apply_one_window
Currently the cleanup code looks like this:

	free resources
	return 0;
 error_out:
	free resources
	return -1;

Avoid duplicating the "free resources" part by keeping the return
value in a variable and sharing code between the success and
exceptional case:

	ret = 0;
 out:
	free resources
	return ret;

Noticed in the svn-dump-fast-export project, where using the error()
macro in void context produces a warning.

Signed-off-by: David Barr <davidbarr@google.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2012-07-05 23:26:52 -05:00
David Barr
4a5de8dd79 vcs-svn: avoid self-assignment in dummy initialization of pre_off
Without this change, clang complains:

 vcs-svn/svndiff.c:298:3: warning: Assigned value is garbage or undefined
                 off_t pre_off = pre_off; /* stupid GCC... */
                 ^               ~~~~~~~

This code uses an old and common idiom for suppressing an
"uninitialized variable" warning, and clang is wrong to warn about it.
The idiom tells the compiler to leave the variable uninitialized,
which saves a few bytes of code size, and, more importantly, allows
valgrind to check at runtime that the variable is properly initialized
by the time it is used.

But MSVC and clang do not know that idiom, so let's avoid it in
vcs-svn/ code.

Initialize pre_off to -1, a recognizably meaningless value, to allow
future code changes that cause pre_off to be used before it is
initialized to be caught early.

Signed-off-by: David Barr <davidbarr@google.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2012-07-05 23:26:52 -05:00
David Barr
3b8a305173 vcs-svn: drop no-op reset methods
Since v1.7.5~42^2~6 (vcs-svn: remove buffer_read_string)
buffer_reset() does nothing thus fast_export_reset() also.

Signed-off-by: David Barr <davidbarr@google.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2012-07-05 23:26:51 -05:00
Pete Wyckoff
82247e9bd5 remove superfluous newlines in error messages
The error handling routines add a newline.  Remove
the duplicate ones in error messages.

Signed-off-by: Pete Wyckoff <pw@padd.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-04-30 15:45:51 -07:00
Jonathan Nieder
3f790003a3 vcs-svn: suppress a -Wtype-limits warning
On 32-bit architectures with 64-bit file offsets, gcc 4.3 and earlier
produce the following warning:

	    CC vcs-svn/sliding_window.o
	vcs-svn/sliding_window.c: In function `check_overflow':
	vcs-svn/sliding_window.c:36: warning: comparison is always false \
	    due to limited range of data type

The warning appears even when gcc is run without any warning flags
(this is gcc bug 12963).  In later versions the same warning can be
reproduced with -Wtype-limits, which is implied by -Wextra.

On 64-bit architectures it really is possible for a size_t not to be
representable as an off_t so the check this is warning about is not
actually redundant.  But even false positives are distracting.  Avoid
the warning by making the "len" argument to check_overflow a
uintmax_t; no functional change intended.

Reported-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-02-02 11:05:18 -08:00
Jonathan Nieder
150f75467c vcs-svn: allow import of > 4GiB files
There is no reason in principle that an svn-format dump would not be
able to represent a file whose length does not fit in a 32-bit
integer.  Use off_t consistently to represent file lengths (in place
of using uint32_t in some contexts) so we can handle that.

Most svn-fe code is already ready to do that without this patch and
passes values of type off_t around.  The type mismatch from stragglers
was noticed with gcc -Wtype-limits.

While at it, tighten the parsing of the Text-content-length field to
make sure it is a number and does not overflow, and tighten other
overflow checks as that value is passed around and manipulated.

Inspired-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-02-02 11:03:30 -08:00
Ramsay Jones
173223aa62 vcs-svn: rename check_overflow arguments for clarity
Code using the argument names a and b just doesn't look right (not
sure why!).  Use more explicit names "offset" and "len" to make their
type and meaning clearer.

Also rename check_overflow() to check_offset_overflow() to clarify
that we are making sure that "len" bytes beyond "offset" still fits
the type to represent an offset.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-02-02 10:53:18 -08:00
Jonathan Nieder
5b8bf02930 vcs-svn: suppress -Wtype-limits warning
On 32-bit architectures with 64-bit file offsets, gcc 4.3 and earlier
produce the following warning:

	    CC vcs-svn/sliding_window.o
	vcs-svn/sliding_window.c: In function `check_overflow':
	vcs-svn/sliding_window.c:36: warning: comparison is always false \
	    due to limited range of data type

The warning appears even when gcc is run without any warning flags
(PR12963).  In later versions it can be reproduced with -Wtype-limits,
which is implied by -Wextra.

On 64-bit architectures it really is possible for a size_t not to be
representable as an off_t so the check being warned about is not
actually redundant.  But even false positives are distracting.  Avoid
the warning by making the "len" argument to check_overflow a
uintmax_t; no functional change intended.

Reported-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2012-02-02 05:33:37 -06:00
Jonathan Nieder
2d54b9ea8b vcs-svn: allow import of > 4GiB files
There is no reason in principle that an svn-format dump would not
be able to represent a file whose length does not fit in a 32-bit
integer.  Use off_t consistently (instead of uint32_t) to represent
file lengths so we can handle that.

Most of our code is already ready to do that without this patch and
already passes values of type off_t around.  The type mismatch due to
stragglers was noticed with gcc -Wtype-limits.

Inspired-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2012-02-02 05:33:13 -06:00
Ramsay Jones
ce8ebcdaf3 vcs-svn: rename check_overflow and its arguments for clarity
The canonical interpretation of a range a,b is as an interval [a,b),
not [a,a+b), so this function taking argument names a and b feels
unnatural.  Use more explicit names "offset" and "len" to make the
arguments' type and function clearer.

While at it, rename the function to convey that we are making sure
the sum of this offset and length do not overflow an off_t, not a
size_t.

[jn: split out from a patch from Ramsay Jones, then improved with
 advice from Thomas Rast, Dmitry Ivankov, and David Barr]

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Improved-by: Dmitry Ivankov <divanorama@gmail.com>
2012-02-02 05:28:51 -06:00
Junio C Hamano
58ebd9865d vcs-svn/svndiff.c: squelch false "unused" warning from gcc
Curiously, pre_len given to read_length() does not trigger the same warning
even though the code structure is the same. Most likely this is because
read_offset() is used only once and inlining it will make gcc realize that
it has a chance to do more flow analysis. Alas, the analysis is flawed, so
it does not help X-<.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-01-27 11:58:56 -08:00
Junio C Hamano
d475536658 Merge branch 'svn-fe' of git://repo.or.cz/git/jrn into jn/svn-fe
This simplifies svn-fe a great deal and fulfills a longstanding wish:
support for dumps with deltas in them, and incremental imports.

The cost is that commandline usage of the svn-fe tool becomes a little
more complicated since it no longer keeps state itself but instead reads
blobs back from fast-import in order to copy them between revisions and
apply deltas to them.

Also removes a couple of custom data structures and replaces them with
strbufs like other parts of Git.

* 'svn-fe' of git://repo.or.cz/git/jrn: (32 commits)
  vcs-svn: reset first_commit_done in fast_export_init
  vcs-svn: do not initialize report_buffer twice
  vcs-svn: avoid hangs from corrupt deltas
  vcs-svn: guard against overflow when computing preimage length
  vcs-svn: cap number of bytes read from sliding view
  test-svn-fe: split off "test-svn-fe -d" into a separate function
  vcs-svn: implement text-delta handling
  vcs-svn: let deltas use data from preimage
  vcs-svn: let deltas use data from postimage
  vcs-svn: verify that deltas consume all inline data
  vcs-svn: implement copyfrom_data delta instruction
  vcs-svn: read instructions from deltas
  vcs-svn: read inline data from deltas
  vcs-svn: read the preimage when applying deltas
  vcs-svn: parse svndiff0 window header
  vcs-svn: skeleton of an svn delta parser
  vcs-svn: make buffer_read_binary API more convenient
  vcs-svn: learn to maintain a sliding view of a file
  Makefile: list one vcs-svn/xdiff object or header per line
  vcs-svn: avoid using ls command twice
  ...

Conflicts:
	Makefile
	contrib/svn-fe/svn-fe.txt
2012-01-27 11:20:00 -08:00
Ævar Arnfjörð Bjarmason
952fba9c63 Fix a bitwise negation assignment issue spotted by Sun Studio
Change direct and indirect assignments of the bitwise negation of 0 to
uint32_t variables to have a "U" suffix. I.e. ~0U instead of ~0. This
eliminates warnings under Sun Studio 12 Update 1:

    "vcs-svn/string_pool.c", line 11: warning: initializer will be sign-extended: -1 (E_INIT_SIGN_EXTEND)
    "vcs-svn/string_pool.c", line 81: warning: initializer will be sign-extended: -1 (E_INIT_SIGN_EXTEND)
    "vcs-svn/repo_tree.c", line 112: warning: initializer will be sign-extended: -1 (E_INIT_SIGN_EXTEND)
    "vcs-svn/repo_tree.c", line 112: warning: initializer will be sign-extended: -1 (E_INIT_SIGN_EXTEND)
    "test-treap.c", line 34: warning: initializer will be sign-extended: -1 (E_INIT_SIGN_EXTEND)

The semantics are still the same as demonstrated by this program:

    $ cat test.c && make test && ./test
    #include <stdio.h>
    #include <stdint.h>

    int main(void)
    {
        uint32_t foo = ~0;
        uint32_t bar = ~0U;

        printf("foo = <%u> bar = <%u>\n", foo, bar);

        return 0;
    }
    cc     test.c   -o test
    "test.c", line 5: warning: initializer will be sign-extended: -1
    foo = <4294967295> bar = <4294967295>

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-12-21 10:19:40 -08:00
Dmitry Ivankov
c5bcbcdcfa vcs-svn: reset first_commit_done in fast_export_init
first_commit_done has zero as a default value, but it
is not reset back to zero in fast_export_init.

Reset it back to zero so that each export will have
proper initial state.

Signed-off-by: Dmitry Ivankov <divanorama@gmail.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2011-06-23 10:04:36 -05:00
Dmitry Ivankov
c5f1fbe7bc vcs-svn: do not initialize report_buffer twice
When importing from a dump with deltas, first fast_export_init calls
buffer_fdinit, and then init_report_buffer calls fdopen once again
when processing the first delta.  The second initialization is
redundant and leaks a FILE *.

Remove the redundant on-demand initialization to fix this.
Initializing directly in fast_export_init is simpler and lets the
caller pass an int specifying which fd to use instead of hard-coding
REPORT_FILENO.

Signed-off-by: Dmitry Ivankov <divanorama@gmail.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2011-06-21 05:02:57 -05:00
Jonathan Nieder
3ac10b2e3f vcs-svn: avoid hangs from corrupt deltas
A corrupt Subversion-format delta can request reads past the end of
the preimage.  Set sliding_view::max_off so such corruption is caught
when it appears rather than blocking in an impossible-to-fulfill
read() when input is coming from a socket or pipe.

Inspired-by: Ramkumar Ramachandra <artagnon@gmail.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2011-06-15 02:32:50 -05:00
Jonathan Nieder
abe27c0cbd vcs-svn: guard against overflow when computing preimage length
Signed integer overflow produces undefined behavior in C and off_t is
a signed type.  For predictable behavior, add some checks to protect
in advance against overflow.

On 32-bit systems ftell as called by buffer_tmpfile_prepare_to_read
is likely to fail with EOVERFLOW when reading the corresponding
postimage, and this patch does not fix that.  So it's more of a
futureproofing measure than a complete fix.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2011-06-15 02:32:50 -05:00
Jonathan Nieder
157415a9a9 Merge branch 'db/delta-applier' into db/text-delta
* db/delta-applier:
  vcs-svn: cap number of bytes read from sliding view
  test-svn-fe: split off "test-svn-fe -d" into a separate function
2011-06-15 02:31:35 -05:00
Jonathan Nieder
fbdd4f6fb4 vcs-svn: cap number of bytes read from sliding view
Introduce a "max_off" field in struct sliding_view, roughly
representing a maximum number of bytes that can be read from "file".
If it is set to a nonnegative integer, a call to move_window()
attempting to put the right endpoint beyond that offset will return
an error instead.

The idea is to use this when applying Subversion-format deltas to
prevent reads past the end of the preimage (which has known length).
Without such a check, corrupt deltas would cause svn-fe to block
indefinitely when data in the input pipe is exhausted.

Inspired-by: Ramkumar Ramachandra <artagnon@gmail.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2011-06-15 02:15:22 -05:00
David Barr
7a75e661c5 vcs-svn: implement text-delta handling
Handle input in Subversion's dumpfile format, version 3.  This is the
format produced by "svnrdump dump" and "svnadmin dump --deltas", and
the main difference between v3 dumpfiles and the dumpfiles already
handled is that these can include nodes whose properties and text are
expressed relative to some other node.

To handle such nodes, we find which node the text and properties are
based on, handle its property changes, use the cat-blob command to
request the basis blob from the fast-import backend, use the
svndiff0_apply() helper to apply the text delta on the fly, writing
output to a temporary file, and then measure that postimage file's
length and write its content to the fast-import stream.

The temporary postimage file is shared between delta-using nodes to
avoid some file system overhead.

The svn-fe interface needs to be more complicated to accomodate the
backward flow of information from the fast-import backend to svn-fe.
The backflow fd is not needed when parsing streams without deltas,
though, so existing scripts using svn-fe on v2 dumps should
continue to work.

NEEDSWORK: generalize interface so caller sets the backflow fd, close
temporary file before exiting

Signed-off-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2011-05-26 02:28:04 -05:00
Jonathan Nieder
e9f3f8b6f4 Merge branch 'db/delta-applier' into db/text-delta
* db/delta-applier:
  vcs-svn: let deltas use data from preimage
  vcs-svn: let deltas use data from postimage
  vcs-svn: verify that deltas consume all inline data
  vcs-svn: implement copyfrom_data delta instruction
  vcs-svn: read instructions from deltas
  vcs-svn: read inline data from deltas
  vcs-svn: read the preimage when applying deltas
  vcs-svn: parse svndiff0 window header
  vcs-svn: skeleton of an svn delta parser
  vcs-svn: make buffer_read_binary API more convenient
  vcs-svn: learn to maintain a sliding view of a file
  Makefile: list one vcs-svn/xdiff object or header per line

Conflicts:
	Makefile
	vcs-svn/LICENSE
2011-05-26 02:27:48 -05:00
Jonathan Nieder
c19d653c4f Merge branch 'db/svn-fe-code-purge' into svn-fe
* db/svn-fe-code-purge:
  vcs-svn: drop obj_pool
  vcs-svn: drop treap
  vcs-svn: drop string_pool
  vcs-svn: pass paths through to fast-import

Conflicts:
	vcs-svn/fast_export.c
	vcs-svn/fast_export.h
	vcs-svn/repo_tree.c
	vcs-svn/repo_tree.h
	vcs-svn/string_pool.c
	vcs-svn/svndump.c
	vcs-svn/trp.txt
2011-05-26 02:12:14 -05:00
Jonathan Nieder
9ecfa8ae4c Merge branch 'db/vcs-svn-incremental' into svn-fe
This teaches svn-fe to incrementally import into an existing
repository (at last!) at the expense of less convenient UI.  Think of
it as growing pains.  This opens the door to many excellent things,
and it would be a bad idea to discourage people from building on it
for much longer.

* db/vcs-svn-incremental:
  vcs-svn: avoid using ls command twice
  vcs-svn: use mark from previous import for parent commit
  vcs-svn: handle filenames with dq correctly
  vcs-svn: quote paths correctly for ls command
  vcs-svn: eliminate repo_tree structure
  vcs-svn: add a comment before each commit
  vcs-svn: save marks for imported commits
  vcs-svn: use higher mark numbers for blobs
  vcs-svn: set up channel to read fast-import cat-blob response

Conflicts:
	t/t9010-svn-fe.sh
	vcs-svn/fast_export.c
	vcs-svn/fast_export.h
	vcs-svn/repo_tree.c
	vcs-svn/svndump.c
2011-05-26 02:02:44 -05:00
Junio C Hamano
793066e790 Merge branch 'rj/sparse'
* rj/sparse:
  sparse: Fix some "symbol not declared" warnings
  sparse: Fix errors due to missing target-specific variables
  sparse: Fix an "symbol 'merge_file' not decared" warning
  sparse: Fix an "symbol 'format_subject' not declared" warning
  sparse: Fix some "Using plain integer as NULL pointer" warnings
  sparse: Fix an "symbol 'cmd_index_pack' not declared" warning
  Makefile: Use cgcc rather than sparse in the check target
2011-04-27 11:36:42 -07:00
Ramsay Jones
c51477229e sparse: Fix some "symbol not declared" warnings
In particular, sparse issues the "symbol 'a_symbol' was not declared.
Should it be static?" warnings for the following symbols:

    attr.c:468:12: 'git_etc_gitattributes'
    attr.c:476:5:  'git_attr_system'
    vcs-svn/svndump.c:282:6: 'svndump_read'
    vcs-svn/svndump.c:417:5: 'svndump_init'
    vcs-svn/svndump.c:432:6: 'svndump_deinit'
    vcs-svn/svndump.c:445:6: 'svndump_reset'

The symbols in attr.c only require file scope, so we add the static
modifier to their declaration.

The symbols in vcs-svn/svndump.c are external symbols, and they
already have extern declarations in the "svndump.h" header file,
so we simply include the header in svndump.c.

Signed-off-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-04-22 10:04:27 -07:00
Jim Meyering
0353a0c4ec remove doubled words, e.g., s/to to/to/, and fix related typos
I found that some doubled words had snuck back into projects from which
I'd already removed them, so now there's a "syntax-check" makefile rule in
gnulib to help prevent recurrence.

Running the command below spotted a few in git, too:

  git ls-files | xargs perl -0777 -n \
    -e 'while (/\b(then?|[iao]n|i[fst]|but|f?or|at|and|[dt])\s+\1\b/gims)' \
    -e '{$n=($` =~ tr/\n/\n/ + 1); ($v=$&)=~s/\n/\\n/g;' \
    -e 'print "$ARGV:$n:$v\n"}'

Signed-off-by: Jim Meyering <meyering@redhat.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-04-13 11:59:11 -07:00
Junio C Hamano
44a9cedb00 Merge branch 'svn-fe' of git://repo.or.cz/git/jrn
* 'svn-fe' of git://repo.or.cz/git/jrn:
  tests: kill backgrounded processes more robustly
  vcs-svn: a void function shouldn't try to return something
  tests: make sure input to sed is newline terminated
  vcs-svn: add missing cast to printf argument
2011-03-30 10:49:13 -07:00
Michael Witten
9e113988d3 vcs-svn: a void function shouldn't try to return something
As v1.7.4-rc0~184 (2010-10-04) and C99 §6.8.6.4.1 remind us, standard
C does not permit returning an expression of type void, even for a
tail call.

Noticed with gcc -pedantic:

 vcs-svn/svndump.c: In function 'handle_node':
 vcs-svn/svndump.c:213:3: warning: ISO C forbids 'return' with expression,
  in function returning void [-pedantic]

[jn: with simplified log message]

Signed-off-by: Michael Witten <mfwitten@gmail.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
2011-03-29 14:47:02 -05:00
Jonathan Nieder
8cc299daf2 vcs-svn: add missing cast to printf argument
gcc -m32 correctly warns:

 vcs-svn/fast_export.c: In function 'fast_export_commit':
 vcs-svn/fast_export.c:54:2: warning: format '%llu' expects
   argument of type 'long long unsigned int', but argument 2
   has type 'unsigned int' [-Wformat]

Fix it.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-03-28 09:46:44 -07:00
Jonathan Nieder
c846e41078 vcs-svn: let deltas use data from preimage
The copyfrom_source instruction appends data from the preimage buffer
to the end of output.  Its arguments are a length and an offset
relative to the beginning of the source view.

With this change, the delta applier is able to reproduce all 5,636,613
blobs in the early history of the ASF repository.  Tested with

	mkfifo backflow
	svn-fe <svn-asf-public-r0:940166 3<backflow |
	git fast-import --cat-blob-fd=3 3>backflow

with svn-asf-public-r0:940166 produced by whatever version of
Subversion the dumps in /dump/ on svn.apache.org use (presumably
1.6.something).

Improved-by: Ramkumar Ramachandra <artagnon@gmail.com>
Improved-by: David Barr <david.barr@cordelta.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Acked-by: Ramkumar Ramachandra <artagnon@gmail.com>
2011-03-28 00:33:48 -05:00
Jonathan Nieder
d3f131b57e vcs-svn: let deltas use data from postimage
The copyfrom_target instruction copies appends data that is already
present in the current output view to the end of output.  (The offset
argument is relative to the beginning of output produced in the
current window.)

The region copied is allowed to run past the end of the existing
output.  To support that case, copy one character at a time rather
than calling memcpy or memmove.  This allows copyfrom_target to be
used once to repeat a string many times.  For example:

	COPYFROM_DATA 2
	COPYFROM_OUTPUT 10, 0
	DATA "ab"

would produce the output "ababababababababababab".

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Acked-by: Ramkumar Ramachandra <artagnon@gmail.com>
2011-03-27 23:28:27 -05:00
Jonathan Nieder
4c9b93ed76 vcs-svn: verify that deltas consume all inline data
By constraining the format of deltas, we can more easily detect
corruption and other breakage.

Requiring deltas not to provide unconsumed data also opens the
possibility of ignoring the declared amount of novel data and simply
streaming the data as needed to fulfill copyfrom_data requests.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Acked-by: Ramkumar Ramachandra <artagnon@gmail.com>
2011-03-27 23:28:02 -05:00