* sp/maint-fast-import-large-blob:
fast-import: Stream very large blobs directly to pack
bash: don't offer remote transport helpers as subcommands
Conflicts:
fast-import.c
If a blob is larger than the configured big-file-threshold, instead
of reading it into a single buffer obtained from malloc, stream it
onto the end of the current pack file. Streaming the larger objects
into the pack avoids the 4+ GiB memory footprint that occurs when
fast-import is processing 2+ GiB blobs.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* 'jh/notes' (early part):
Add more testcases to test fast-import of notes
Rename t9301 to t9350, to make room for more fast-import tests
fast-import: Proper notes tree manipulation
This patch teaches 'git fast-import' to automatically organize note objects
in a fast-import stream into an appropriate fanout structure. The notes API
in notes.h is NOT used to accomplish this, because trying to keep the
fast-import and notes data structures in sync would yield a significantly
larger patch with higher complexity.
Note objects are added with the 'N' command, and accounted for with a
per-branch counter, which is used to trigger fanout restructuring when
needed. Note that when restructuring the branch tree, _any_ entry whose
path consists of 40 hex chars (not including directory separators) will
be recognized as a note object. It is therefore not advisable to
manipulate note entries with M/D/R/C commands.
Since note objects are stored in the same tree structure as other objects,
the unloading and reloading of a fast-import branches handle note objects
transparently.
This patch has been improved by the following contributions:
- Shawn O. Pearce: Several style- and logic-related improvements
Cc: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Johan Herland <johan@herland.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
After specifying 'feature relative-marks' the paths specified with
'feature import-marks' and 'feature export-marks' are relative to an
internal directory in the current repository.
In git-fast-import this means that the paths are relative to the
'.git/info/fast-import' directory. However, other importers may use a
different location.
Add 'feature non-relative-marks' to disable this behavior, this way
it is possible to, for example, specify the import-marks location as
relative, and the export-marks location as non-relative.
Also add tests to verify this behavior.
Cc: Daniel Barkalow <barkalow@iabervon.org>
Signed-off-by: Sverre Rabbelier <srabbelier@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The --import-marks= option may be specified multiple times on the
commandline and should result in all marks being read in. Only one
import-marks feature may be specified in the stream, which is
overriden by any --import-marks= commandline options.
If one wishes to specify import-marks files in addition to the one
specified in the stream, it is easy to repeat the stream option as a
--import-marks= commandline option.
Also verify this behavior with tests.
Signed-off-by: Sverre Rabbelier <srabbelier@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Test the quiet option and verify that the commandline options
override it.
Also make sure that an unknown option command is rejected and that
non-git options are ignored.
Lastly, show that unknown options are rejected when parsed on the
commandline.
Signed-off-by: Sverre Rabbelier <srabbelier@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This allows the fronted to require a specific feature to be supported
by the backend, or abort.
Also add support for four initial feature, date-format=, force=,
import-marks=, export-marks=.
Signed-off-by: Sverre Rabbelier <srabbelier@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Introduce a 'notemodify' subcommand of the 'commit' command. This subcommand
is similar to 'filemodify', except that no mode is supplied (all notes have
mode 0644), and the path is set to the hex SHA1 of the given "comittish".
This enables fast import of note objects along with their associated commits,
since the notes can now be named using the mark references of their
corresponding commits.
The patch also includes a test case of the added functionality.
Signed-off-by: Johan Herland <johan@herland.net>
Acked-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Even though newer Porcelain tools always record the tagger information
when creating new tags, export/import pair should be able to faithfully
reproduce ancient tag objects that lack tagger information.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Acked-by: Shawn O. Pearce <spearce@spearce.org>
* js/maint-fetch-update-head:
pull: allow "git pull origin $something:$current_branch" into an unborn branch
Fix fetch/pull when run without --update-head-ok
Conflicts:
t/t5510-fetch.sh
Some confusing tutorials suggested that it would be a good idea to fetch
into the current branch with something like this:
git fetch origin master:master
(or even worse: the same command line with "pull" instead of "fetch").
While it might make sense to store what you want to pull, it typically is
plain wrong when the current branch is "master". This should only be
allowed when (an incorrect) "git pull origin master:master" tries to work
around by giving --update-head-ok to underlying "git fetch", and otherwise
we should refuse it, but somewhere along the lines we lost that behavior.
The check for the current branch is now _only_ performed in non-bare
repositories, which is an improvement from the original behaviour.
Some newer tests were depending on the broken behaviour of "git fetch"
this patch fixes, and have been adjusted.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Acked-by: Shawn O. Pearce <spearce@spearce.org>
Acked-by: Daniel Barkalow <barkalow@iabervon.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Also use "git hash-object" and "git rev-parse" without dash.
Signed-off-by: Nanako Shiraishi <nanako3@lavabit.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Many test scripts assumed that they will start in a 'trash' subdirectory
that is a single level down from the t/ directory, and referred to their
test vector files by asking for files like "../t9999/expect". This will
break if we move the 'trash' subdirectory elsewhere.
To solve this, we earlier introduced "$TEST_DIRECTORY" so that they can
refer to t/ directory reliably. This finally makes all the tests use
it to refer to the outside environment.
With this patch, and a one-liner not included here (because it would
contradict with what Dscho really wants to do):
| diff --git a/t/test-lib.sh b/t/test-lib.sh
| index 70ea7e0..60e69e4 100644
| --- a/t/test-lib.sh
| +++ b/t/test-lib.sh
| @@ -485,7 +485,7 @@ fi
| . ../GIT-BUILD-OPTIONS
|
| # Test repository
| -test="trash directory"
| +test="trash directory/another level/yet another"
| rm -fr "$test" || {
| trap - exit
| echo >&5 "FATAL: Cannot prepare test area"
all the tests still pass, but we would want extra sets of eyeballs on this
type of change to really make sure.
[jc: with help from Stephan Beyer on http-push tests I do not run myself;
credits for locating silly quoting errors go to Olivier Marin.]
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Currently fast-import/export cannot be used for
repositories with submodules. This patch extends
the relevant programs to make them correctly
process gitlinks.
Links can be represented by two forms of the
Modify command:
M 160000 SHA1 some/path
which sets the link target explicitly, or
M 160000 :mark some/path
where the mark refers to a commit. The latter
form can be used by importing tools to build
all submodules simultaneously in one physical
repository, and then simply fetch them apart.
Signed-off-by: Alexander Gavrilov <angavrilov@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This patch changes every occurrence of "! git" -- with the meaning
that a git call has to gracefully fail -- into "test_must_fail git".
This is useful to
- make sure the test does not fail because of a signal,
e.g. SIGSEGV, and
- advertise the use of "test_must_fail" for new tests.
Signed-off-by: Stephan Beyer <s-beyer@gmx.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As a general principle, we should not use "git diff" to validate the
results of what git command that is being tested has done. We would not
know if we are testing the command in question, or locating a bug in the
cute hack of "git diff --no-index".
Rather use test_cmp for that purpose.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* maint:
GIT 1.5.4.4
ident.c: reword error message when the user name cannot be determined
Fix dcommit, rebase when rewriteRoot is in use
Really make the LF after reset in fast-import optional
cmd_from() ends with a call to read_next_command(), which is needed
when using cmd_from() from commands where from is not the last element.
With reset, however, "from" is the last command, after which the flow
returns to the main loop, which calls read_next_command() again.
Because of this, always set unread_command_buf in cmd_reset_branch(),
even if cmd_from() was successful.
Add a test case for this in t9300-fast-import.sh.
Signed-off-by: Adeodato Simó <dato@net.com.org.es>
Acked-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Originally, test_expect_failure was designed to be the opposite
of test_expect_success, but this was a bad decision. Most tests
run a series of commands that leads to the single command that
needs to be tested, like this:
test_expect_{success,failure} 'test title' '
setup1 &&
setup2 &&
setup3 &&
what is to be tested
'
And expecting a failure exit from the whole sequence misses the
point of writing tests. Your setup$N that are supposed to
succeed may have failed without even reaching what you are
trying to test. The only valid use of test_expect_failure is to
check a trivial single command that is expected to fail, which
is a minority in tests of Porcelain-ish commands.
This large-ish patch rewrites all uses of test_expect_failure to
use test_expect_success and rewrites the condition of what is
tested, like this:
test_expect_success 'test title' '
setup1 &&
setup2 &&
setup3 &&
! this command should fail
'
test_expect_failure is redefined to serve as a reminder that
that test *should* succeed but due to a known breakage in git it
currently does not pass. So if git-foo command should create a
file 'bar' but you discovered a bug that it doesn't, you can
write a test like this:
test_expect_failure 'git-foo should create bar' '
rm -f bar &&
git foo &&
test -f bar
'
This construct acts similar to test_expect_success, but instead
of reporting "ok/FAIL" like test_expect_success does, the
outcome is reported as "FIXED/still broken".
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The existing checkpoint command is very useful to force fast-import
to dump the branches out to disk so that standard Git tools can
access them and the objects they refer to. However there was not a
way to know when fast-import had finished executing the checkpoint
and it was safe to read those refs.
The progress command can be used to make fast-import output any
message of the frontend's choosing to standard out. The frontend
can scan for these messages using select() or poll() to monitor a
pipe connected to the standard output of fast-import.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
For the same reasons as the prior change we want to allow frontends
to omit the trailing LF that usually delimits commands. In some
cases these just make the input stream more verbose looking than
it needs to be, and its just simpler for the frontend developer to
get started if our parser is slightly more lenient about where an
LF is required and where it isn't.
To make this optional LF feature work we now have to buffer up to one
line of input in command_buf. This buffering can happen if we look
at the current input command but don't recognize it at this point
in the code. In such a case we need to "unget" the entire line,
but we cannot depend upon the stdio library to let us do ungetc()
for that many characters at once.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
A few fast-import frontend developers have found it odd that we
require the LF following a `data` command, especially in the exact
byte count format. Technically we don't need this LF to parse
the stream properly, but having it here does make the stream more
readable to humans. We can easily make the LF optional by peeking
at the next byte available from the stream and pushing it back into
the buffer if its not LF.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Several frontend developers have asked that some form of stream
comments be permitted within a fast-import data stream. This way
they can include information from their own frontend program about
where specific data was taken from in the source system, or about
a decision that their frontend may have made while creating the
fast-import data stream.
This change introduces comments in the Bourne-shell/Tcl/Perl style.
Lines starting with '#' are ignored, up to and including the LF.
Unlike the above mentioned three languages however we do not look for
and ignore leading whitespace. This just simplifies the definition
of the comment format and the code that parses them.
To make comments work we had to stop using read_next_command() within
cmd_data() and directly invoke read_line() during the inline variant
of the function. This is necessary to retain any lines of the
input data that might otherwise look like a comment to fast-import.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Michael Haggerty <mhagger@alum.mit.edu> noticed while debugging a
Git backend for cvs2svn that fast-import was barfing when he tried
to use "TAG_FIXUP" as a branch name for temporary work needed to
cleanup the tree prior to creating an annotated tag object.
The reason we were rejecting the branch name was check_ref_format()
returns -2 when there are less than 2 '/' characters in the input
name. TAG_FIXUP has 0 '/' characters, but is technically just as
valid of a ref as HEAD and MERGE_HEAD, so we really should permit it
(and any other similar looking name) during import.
New test cases have been added to make sure we still detect very
wrong branch names (e.g. containing [ or starting with .) and yet
still permit reasonable names (e.g. TAG_FIXUP).
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
The tree recursion behavior of git-diff may appear
inconsistent to the user because it depends on the format of
the patch as well as whether one is diffing between trees or
against the index.
Since git-diff is a porcelain wrapper for low-level diff
commands, it makes sense for its behavior to be consistent
no matter what is being diffed. This patch turns on
recursion in all cases.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Some source material (e.g. Subversion dump files) perform directory
renames by telling us the directory was copied, then deleted in the
same revision. This makes it difficult for a frontend to convert
such data formats to a fast-import stream, as all the frontend has
on hand is "Copy a/ to b/; Delete a/" with no details about what
files are in a/, unless the frontend also kept track of all files.
The new 'C' subcommand within a commit allows the frontend to make a
recursive copy of one path to another path within the branch, without
needing to keep track of the individual file paths. The metadata
copy is performed in memory efficiently, but is implemented as a
copy-immediately operation, rather than copy-on-write.
With this new 'C' subcommand frontends could obviously implement an
'R' (rename) on their own as a combination of 'C' and 'D' (delete),
but since we have already offered up 'R' in the past and it is a
trivial thing to keep implemented I'm not going to deprecate it.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Some source material (e.g. Subversion dump files) perform directory
renames without telling us exactly which files in that subdirectory
were moved. This makes it hard for a frontend to convert such data
formats to a fast-import stream, as all the frontend has on hand
is "Rename a/ to b/" with no details about what files are in a/,
unless the frontend also kept track of all files.
The new 'R' subcommand within a commit allows the frontend to
rename either a file or an entire subdirectory, without needing to
know the object's SHA-1 or the specific files contained within it.
The rename is performed as efficiently as possible internally,
making it cheaper than a 'D'/'M' pair for a file rename.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
When e8438420bb allowed us to reload
the marks table on subsequent runs of fast-import we really broke
things, as we set pack_id to MAX_PACK_ID for any objects we imported
into the marks table. Creating a branch from that mark should fail
as we attempt to read the object through a non-existant packed_git
pointer. Instead we have to use the normal Git object system to
locate the older commit, as we ourselves do not have a reference
to the packed_git it resides in.
This bug only occurred because t9300 was not complete enough.
When we added the --import-marks feature we didn't actually test
its implementation enough to verify the function worked as intended.
I have corrected that, and included the changes as part of this fix.
Prior versions of fast-import fail the new test(s); this commit
allows them to pass.
Credit for this bug find goes to Simon Hausmann <simon@lst.de> as
he recently identified a similiar bug in the tree lazy-loading path.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
The Git tree sorting convention is more complex than just the name,
it needs to include the mode too to make sure trees sort as though
their name ends with "/".
This is a simple test case that verifies fast-import keeps the tree
ordering correct after editing the same tree twice in a single
input stream. A recent proposed patch series (that has not yet
been applied) will cause this test to fail, due to a bug in the
way the series handles sorting within the trees.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
* js/diff-ni:
Get rid of the dependency to GNU diff in the tests
diff --no-index: support /dev/null as filename
diff-ni: fix the diff with standard input
diff: support reading a file from stdin via "-"
I'm giving fast-import a lesson on how to reload the marks table
using the same format it outputs with --export-marks. This way
a frontend can reload the marks table from a prior import, making
incremental imports less painful.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Now that "git diff" handles stdin and relative paths outside the
working tree correctly, we can convert all instances of "diff -u"
to "git diff".
This commit is really the result of
$ perl -pi.bak -e 's/diff -u/git diff/' $(git grep -l "diff -u" t/)
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <junkio@cox.net>
(cherry picked from commit c699a40d68215c7e44a5b26117a35c8a56fbd387)
It was suggested on the mailing list that being able to use `from`
in any commit to reset the current branch is useful in some types of
importers, such as a darcs importer.
We originally did not permit resetting an existing branch with a
new `from` command during a `commit` command, but this restriction
was only to help debug the hacked up cvs2svn that Jon Smirl was
developing in parallel with git-fast-import. It is probably more
of a problem to disallow it than to allow it. So now we permit a
`from` during any `commit`.
While making the changes required to permit multiple `from`
commands on the same branch, I discovered we no longer needed the
last_commit field to be set to 0 during a reset, so that was removed.
(Reset was originally setting the field to 0 to signal cmd_from()
that it was OK to execute on the branch.)
While poking around in this section of fast-import I also realized
the `reset` command was not working as intended if the corresponding
`from` command was omitted (as allowed by the BNF grammar and the
code). If `from` was omitted we cleared out the tree but we left
the tree SHA-1 and parent commit SHA-1 intact. This is not what
the user intended in this case. Instead they would be trying to
reset the branch to have no parent and to have no tree, making the
branch look new-born during the next commit. We now clear these
SHA-1 values during `reset`, ensuring the branch looks new-born if
`from` does not get supplied.
New test cases for these were also added.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Most users don't need the pack boundary information that fast-import
was printing to standard output, especially if they were calling
it with --quiet.
Those users who do want this information probably want it captured
so they can go back and use it to repack the imported repository.
So dumping the boundary commits to a log file makes more sense then
printing them to standard output.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Some frontends may not be able to (easily) keep track of which files
are included in the branch, and which aren't. Performing this
tracking can be tedious and error prone for the frontend to do,
especially if its foreign data source cannot supply the changed
path list on a per-commit basis.
fast-import now allows a frontend to request that a branch's tree
be wiped clean (reset to the empty tree) at the start of a commit,
allowing the frontend to feed in all paths which belong on the branch.
This is ideal for a tar-file importer frontend, for example, as
the frontend just needs to reformat the tar data stream into a gfi
data stream, which may be something a few Perl regexps can take
care of. :)
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
If fast-import is being used to update an existing branch of
a repository, the user may not want to lose commits if another
process updates the same ref at the same time. For example, the
user might be using fast-import to make just one or two commits
against a live branch.
We now perform a fast-forward check during the ref updating process.
If updating a branch would cause commits in that branch to be lost,
we skip over it and display the new SHA1 to standard error.
This new default behavior can be overridden with `--force`, like
git-push and git-fetch.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Since some frontends may be working with source material where
the dates are only readily available as RFC 2822 strings, it is
more friendly if fast-import exposes Git's parse_date() function
to handle the conversion. This way the frontend doesn't need
to perform the parsing itself.
The new --date-format option to fast-import can be used by a
frontend to select which format it will supply date strings in.
The default is the standard `raw` Git format, which fast-import
has always supported. Format rfc2822 can be used to activate the
parse_date() function instead.
Because fast-import could also be useful for creating new, current
commits, the format `now` is also supported to generate the current
system timestamp. The implementation of `now` is a trivial call
to datestamp(), but is actually a whole whopping 3 lines so that
fast-import can verify the frontend really meant `now`.
As part of this change I have added validation of the `raw` date
format. Prior to this change fast-import would accept anything
in a `committer` command, even if it was seriously malformed.
Now fast-import requires the '> ' near the end of the string and
verifies the timestamp is formatted properly.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Its very annoying to need to specify the file content ahead of a
commit and use marks to connect the individual blobs to the commit's
file modification entry, especially if the frontend can't/won't
generate the blob SHA1s itself. Instead it would much easier to
use if we can accept the blob data at the same time as we receive
each file_change line.
Now fast-import accepts 'inline' instead of a mark idnum or blob
SHA1 within the 'M' type file_change command. If an inline is
detected the very next line must be a 'data n' command, supplying
the file data.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
It is error prone to list the value of each file twice, instead we
should list the value only once early in the script and reuse the
shell variable when we need to access it.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Now that its easier to craft test cases (thanks to 'data <<')
we should start to verify fast-import works as expected.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>