The marks table can be used by the frontend to load any commit after
the import and compare it to whatever data the frontend knows about
that commit. If the mark idnums can be easily correlated to some
reference source then its relatively trivial to compare the GIT
tree to the reference to verify the accuracy of the import.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
cvs2svn has three phases: begin_commit, middle_commit, end_commit.
The ancester is computed in the middle_commit phase. So its easier
to generate a stream if the from command appears after the commit
message itself but before the file change commands.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Jon Smirl was finding it difficult to alter cvs2svn to generate
branch commands prior to the first commit of the same branch.
This change moves the 'from' command to be an optional parameter of
the 'commit' command, thereby allowing a new branch to be defined
at the moment it gets used to create the first commit on that branch.
This change makes it impossible to create a branch with no commits
on it as at least one commit is needed to register the branch.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Some architectures (e.g. SPARC) would require that we access pointers
only on pointer-sized alignments. So ensure the pool allocator
rounds out non-pointer sized allocations to the next pointer so we
don't generate bad memory addresses. This could have occurred if
we had previously allocated an atom whose string was not a whole
multiple of the pointer size, for example.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Tree reloading allows fast-import to swap out the least-recently used
branch by simply deallocating the data structures from memory that
were associated with that branch. Later if the branch becomes active
again it can lazily recreate those structures on demand by reloading
the necessary trees from the pack file it originally wrote them to.
The reloading process is implemented by mmap'ing the pack into
memory and using a much tighter variant of the pack reading code
contained in sha1_file.c. This was a blatent copy from sha1_file.c
but the unpacking functions were significantly simplified and are
actually now in a form that should make it easier to map only the
necessary regions of a pack rather than the entire file.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Tags received from the frontend are generated in memory in a simple
linked list in the order that the tag commands were sent by the
frontend. If multiple different tag objects for the same tag name
get generated the last one sent by the frontend will be the one
that gets written out at termination. Multiple tag objects for
the same name will cause all older tags of the same name to be lost.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
If the branch load count exceeds the number of branches created then
the frontend is causing fast-import to page branches into and out of
memory due to the way its ordering its commits. Performance can
likely be increased if the frontend were to alter its commit
sequence such that it stays on one branch before switching to another
branch, then never returns to the prior branch.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Marks are now saved when the mark directive gets used by the frontend
and may be used in place of a SHA1 expression to locate a previous
SHA1 which fast-import may have generated. This is particularly
useful with commits where the frontend does not (easily) have the
ability to compute the SHA1 for an arbitrary commit but needs it
to generate a branch or tag from that commit.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
The following command line options are now accepted before the
pack name:
--objects=n # replaces the object count after the pack name
--depth=n # delta chain depth to use (default is 10)
--active-branches=n # maximum number of branches to keep in memory
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Growing a tree caused all subtrees to be deallocated and put back
into the free list yet those subtree's contents were still actively
in use. Consequently they were doled out again and got stomped
on elsewhere. Releasing a tree is now performed in two parts,
either releasing only the content array or releasing the content
array and recursively releasing the subtree(s).
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
If a frontend is smart enough to import a symlink then we should
let them do so. We'll assume that they were smart enough to first
generate a blob to hold the link target, as that's how symlinks
get represented in GIT.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Frontend clients can now send a text stream to fast-import rather
than a binary stream. This should facilitate developing frontend
software as the data stream is easier to view, manipulate and debug
my hand and Mark-I eyeball.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
When accepting revision SHA1 IDs from the frontend verify the SHA1
actually refers to a blob and is known to exist. Its an error
to use a SHA1 in a tree if the blob doesn't exist as this would
cause git-fsck-objects to report a missing blob should the pack get
closed without the blob being appended into it or a subsequent pack.
So right now we'll just ask that the frontend "pre-declare" any
blobs it wants to use in a tree before it can use them.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
The tree of the current commit can be altered by file_change commands
before the commit gets written to the pack. The file changes are
rather primitive as they simply allow removal of a tree entry or
setting/adding a tree entry.
Currently trees and commits aren't being deltafied when written to
the pack and branch reloading from the current pack doesn't work,
so at most 5 branches can be worked with at any one time.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
This provides the basic data structures needed to store trees in
memory while we are processing them for a branch. What we are
attempting to do is track one complete tree for each branch that
the frontend has registered with us through the 'newb' (new_branch)
command. When the frontend edits that tree through 'updf' or 'delf'
commands we'll mark the affected tree(s) as being dirty and recompute
their objects during 'comt' (commit).
Currently the protocol is decidedly _not_ user friendly. I crashed
fast-import by giving it bad input data from Perl. I may try to
improve upon it, or at least upon its error handling.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Moved the new_blob logic off into a new subroutine and
invoked it when getting the 'blob' command.
Added statistics dump to STDERR when the program terminates listing
what it did at a high level. This is somewhat interesting.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Too many globals variables were being used not not enough
code was resuable to process trees and commits so this is
a simple refactoring of the existing blob processing code
to get into a state that will be easier to handle trees
and commits in.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Although its easy to ask the user to tell us how many objects they
will need, its probably better to dynamically grow the object table
in large units. But if the user can give us a hint as to roughly
how many objects then we can still use it during startup.
Also stopped printing the SHA1 strings to stdout as no user is
currently making use of that facility.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>