Some source material (e.g. Subversion dump files) perform directory
renames by telling us the directory was copied, then deleted in the
same revision. This makes it difficult for a frontend to convert
such data formats to a fast-import stream, as all the frontend has
on hand is "Copy a/ to b/; Delete a/" with no details about what
files are in a/, unless the frontend also kept track of all files.
The new 'C' subcommand within a commit allows the frontend to make a
recursive copy of one path to another path within the branch, without
needing to keep track of the individual file paths. The metadata
copy is performed in memory efficiently, but is implemented as a
copy-immediately operation, rather than copy-on-write.
With this new 'C' subcommand frontends could obviously implement an
'R' (rename) on their own as a combination of 'C' and 'D' (delete),
but since we have already offered up 'R' in the past and it is a
trivial thing to keep implemented I'm not going to deprecate it.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Some source material (e.g. Subversion dump files) perform directory
renames without telling us exactly which files in that subdirectory
were moved. This makes it hard for a frontend to convert such data
formats to a fast-import stream, as all the frontend has on hand
is "Rename a/ to b/" with no details about what files are in a/,
unless the frontend also kept track of all files.
The new 'R' subcommand within a commit allows the frontend to
rename either a file or an entire subdirectory, without needing to
know the object's SHA-1 or the specific files contained within it.
The rename is performed as efficiently as possible internally,
making it cheaper than a 'D'/'M' pair for a file rename.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
When e8438420bb allowed us to reload
the marks table on subsequent runs of fast-import we really broke
things, as we set pack_id to MAX_PACK_ID for any objects we imported
into the marks table. Creating a branch from that mark should fail
as we attempt to read the object through a non-existant packed_git
pointer. Instead we have to use the normal Git object system to
locate the older commit, as we ourselves do not have a reference
to the packed_git it resides in.
This bug only occurred because t9300 was not complete enough.
When we added the --import-marks feature we didn't actually test
its implementation enough to verify the function worked as intended.
I have corrected that, and included the changes as part of this fix.
Prior versions of fast-import fail the new test(s); this commit
allows them to pass.
Credit for this bug find goes to Simon Hausmann <simon@lst.de> as
he recently identified a similiar bug in the tree lazy-loading path.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
To resolve a corner case uncovered by Simon Hausmann I need to
reuse the logic for the SHA-1 expression version of the 'from '
command within the mark version of the 'from ' command. This change
doesn't alter any functionality, but is merely breaking the common
code out to a function that I can reuse.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Commit a5c1780a03 sets the pack_id of existing
objects to MAX_PACK_ID. When the same object is referenced later again it is
found in the local object hash. With such a pack_id fast-import should not try
to locate that object in the newly created pack(s).
Signed-off-by: Simon Hausmann <simon@lst.de>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Fix uninitialized last_object->no_free variable that is accessed in
store_object.
Signed-off-by: Simon Hausmann <simon@lst.de>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
git-checkout is also adapted to make use of this new option
instead of the handcrafted command sequence.
Signed-off-by: Sven Verdoolaege <skimo@kotnet.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
Include a generalized fixup_pack_header_footer() in this new file.
Needed by git-repack --max-pack-size feature in a later patchset.
[sp: Moved close(pack_fd) to callers, to support index-pack, and
changed name to better indicate it is for packfiles.]
Signed-off-by: Dana L. How <danahow@gmail.com>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
* maint:
http.c: Fix problem with repeated calls of http_init
Add missing reference to GIT_COMMITTER_DATE in git-commit-tree documentation
Fix import-tars fix.
Update .mailmap with "Michael"
Do not barf on too long action description
Catch empty pathnames in trees during fsck
Don't allow empty pathnames in fast-import
import-tars: be nice to wrong directory modes
git-svn: Added 'find-rev' command
git shortlog documentation: add long options and fix a typo
riddochc on #git noticed corruption caused by import-tars. This
was fixed in the prior commit by Dscho, but fast-import was wrong
to have allowed a tree to be created with an empty string as the
filename. No operating system allows this, and Git itself doesn't
accept this into the index.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Some users of fast-import have been trying to use it to rewrite
commits and trees, an activity where the all of the relevant blobs
are already available from the existing packfiles. In such a case
we don't want to repack a blob, even if the frontend application
has supplied us the raw data rather than a mark or a SHA-1 name.
I'm intentionally only checking the packfiles that existed when
fast-import started and am always ignoring all loose object files.
We ignore loose objects because fast-import tends to operate on a
very large number of objects in a very short timespan, and it is
usually creating new objects, not reusing existing ones. In such
a situtation the majority of the objects will not be found in the
existing packfiles, nor will they be loose object files. If the
frontend application really wants us to look at loose object files,
then they can just repack the repository before running fast-import.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
This fixes a problem reported by Randal Schwartz:
>I finally tracked down all the (albeit inconsequential) errors I was getting
>on both OpenBSD and OSX. It's the warn() function in usage.c. There's
>warn(3) in BSD-style distros. It'd take a "great rename" to change it, but if
>someone with better C skills than I have could do that, my linker and I would
>appreciate it.
It was annoying to me, too, when I was doing some mergetool testing on
Mac OS X, so here's a fix.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: "Randal L. Schwartz" <merlyn@stonehenge.com>
Signed-off-by: Junio C Hamano <junkio@cox.net>
When some operations are interrupted (or "die()'d" or crashed) then the
partial object/pack/index file may remain around. Make it more obvious
in their name that those files are temporary stuff and can be cleaned up
if no operation is in progress.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
Jeff King pointed out that these casts are quite unnecessary, as
the compiler should be doing them anyway, and may cause problems
in the future if the size of the argument for to_atom were to ever
be increased.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
When building up a tree for a commit, fast-import
dynamically allocates memory for the tree entries. When more
space is needed, the allocated memory is increased by a
constant amount. For very large trees, this means
re-allocating and memcpy()ing the memory O(n) times.
To compound this problem, releasing the previous tree
resource does not free the memory; it is kept in a pool
for future trees. This means that each of the O(n)
allocations will consume increasing amounts of memory,
giving O(n^2) memory consumption.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
* 'master' of git://repo.or.cz/git/fastimport:
Allow fast-import frontends to reload the marks table
Use atomic updates to the fast-import mark file
Preallocate memory earlier in fast-import
I'm giving fast-import a lesson on how to reload the marks table
using the same format it outputs with --export-marks. This way
a frontend can reload the marks table from a prior import, making
incremental imports less painful.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
When we allow fast-import frontends to reload a mark file from a
prior session we want to let them use the same file as they exported
the marks to. This makes it very simple for the frontend to save
state across incremental imports.
But we don't want to lose the old marks table if anything goes wrong
while writing our current marks table. So instead of truncating and
overwriting the path specified to --export-marks we use the standard
lockfile code to write the current marks out to a temporary file,
then rename it over the old marks table.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
I'm about to teach fast-import how to reload the marks file created
by a prior session. The general approach that I want to use is to
immediately parse the marks file when the specific argument is found
in argv, thereby allowing the caller to supply multiple marks files,
as the mark space can be sparsely populated.
To make that work out we need to allocate our object tables before
we parse the command line options. Since none of these tables
depend on the command line options, we can easily relocate them.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Always use an off_t value in pack-objects anytime we are dealing
with an offset to some data within a packfile.
Also fixed a minor uintmax_t that was incorrectly defined before.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
Not all platforms have declared 'unsigned long' to be a 64 bit value,
but we want to support a 64 bit packfile (or close enough anyway)
in the near future as some projects are getting large enough that
their packed size exceeds 4 GiB.
By using off_t, the POSIX type that is declared to mean an offset
within a file, we support whatever maximum file size the underlying
operating system will handle. For most modern systems this is up
around 2^60 or higher.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
We shouldn't attempt to assign constant strings into char*, as the
string is not writable at runtime. Likewise we should always be
treating unsigned values as unsigned values, not as signed values.
Most of these are very straightforward. The only exception is the
(unnecessary) xstrdup/free in builtin-branch.c for the detached
head case. Since this is a user-level interactive type program
and that particular code path is executed no more than once, I feel
that the extra xstrdup call is well worth the easy elimination of
this warning.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
* maint:
fast-import: Fail if a non-existant commit is used for merge
fast-import: Avoid infinite loop after reset
[sp: Minor evil merge to deal with type_names array moving
to be private in 'master'.]
Johannes Sixt noticed during one of his own imports that fast-import
did not fail if a non-existant commit is referenced by SHA-1 value
as an argument to the 'merge' command. This allowed the user to
unknowingly create commits that would fail in fsck, as the commit
contents would not be completely reachable.
A side effect of this bug was that a frontend process could mark
any SHA-1 object (blob, tree, tag) as a parent of a merge commit.
This should also fail in fsck, as the commit is not a valid commit.
We now use the same rule as the 'from' command. If a commit is
referenced in the 'merge' command by hex formatted SHA-1 then the
SHA-1 must be a commit or a tag that can be peeled back to a commit,
the commit must already exist, and must be readable by the core Git
infrastructure code. This requirement means that the commit must
have existed prior to fast-import starting, or the commit must have
been flushed out by a prior 'checkpoint' command.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Johannes Sixt noticed that a 'reset' command applied to a branch that
is already active in the branch LRU cache can cause fast-import to
relink the same branch into the LRU cache twice. This will cause
the LRU cache to contain a cycle, making unload_one_branch run in an
infinite loop as it tries to select the oldest branch for eviction.
I have trivially fixed the problem by adding an active bit to
each branch object; this bit indicates if the branch is already
in the LRU and allows us to avoid trying to add it a second time.
Converting the pack_id field into a bitfield makes this change take
up no additional memory.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
We currently have two parallel notation for dealing with object types
in the code: a string and a numerical value. One of them is obviously
redundent, and the most used one requires more stack space and a bunch
of strcmp() all over the place.
This is an initial step for the removal of the version using a char array
found in object reading code paths. The patch is unfortunately large but
there is no sane way to split it in smaller parts without breaking the
system.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
Sometime typename() is used, sometimes type_names[] is accessed directly.
Let's enforce typename() all the time which allows for validating the
type.
Also let's add a function to go from a name to a type and use it instead
of manual memcpy() when appropriate.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
Previous step converted use of strncmp() with literal string
mechanically even when the result is only used as a boolean:
if (!strncmp("foo", arg, 3)) ==> if (!(-prefixcmp(arg, "foo")))
This step manually cleans them up to read:
if (!prefixcmp(arg, "foo"))
Signed-off-by: Junio C Hamano <junkio@cox.net>
This mechanically converts strncmp() to use prefixcmp(), but only when
the parameters match specific patterns, so that they can be verified
easily. Leftover from this will be fixed in a separate step, including
idiotic conversions like
if (!strncmp("foo", arg, 3))
=>
if (!(-prefixcmp(arg, "foo")))
This was done by using this script in px.perl
#!/usr/bin/perl -i.bak -p
if (/strncmp\(([^,]+), "([^\\"]*)", (\d+)\)/ && (length($2) == $3)) {
s|strncmp\(([^,]+), "([^\\"]*)", (\d+)\)|prefixcmp($1, "$2")|;
}
if (/strncmp\("([^\\"]*)", ([^,]+), (\d+)\)/ && (length($1) == $3)) {
s|strncmp\("([^\\"]*)", ([^,]+), (\d+)\)|(-prefixcmp($2, "$1"))|;
}
and running:
$ git grep -l strncmp -- '*.c' | xargs perl px.perl
Signed-off-by: Junio C Hamano <junkio@cox.net>
Thanks to Simon 'corecode' Schubert <corecode@fs.ei.tum.de> for
the clean-up. Defining the C99 standard PRIuMAX when necessary
replaces UM_FMT and the awkward UM10_FMT. There are no direct
C99 translations for other uses of NO_C99_FORMAT in git, alas.
Signed-off-by: Jason Riedy <ejr@cs.berkeley.edu>
Signed-off-by: Junio C Hamano <junkio@cox.net>
Define UM_FMT and UM10_FMT and use in place of %ju and %10ju,
respectively. Both format as unsigned long long, so this
assumes the compiler supports long long.
Signed-off-by: Jason Riedy <jason@acm.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
It was suggested on the mailing list that being able to use `from`
in any commit to reset the current branch is useful in some types of
importers, such as a darcs importer.
We originally did not permit resetting an existing branch with a
new `from` command during a `commit` command, but this restriction
was only to help debug the hacked up cvs2svn that Jon Smirl was
developing in parallel with git-fast-import. It is probably more
of a problem to disallow it than to allow it. So now we permit a
`from` during any `commit`.
While making the changes required to permit multiple `from`
commands on the same branch, I discovered we no longer needed the
last_commit field to be set to 0 during a reset, so that was removed.
(Reset was originally setting the field to 0 to signal cmd_from()
that it was OK to execute on the branch.)
While poking around in this section of fast-import I also realized
the `reset` command was not working as intended if the corresponding
`from` command was omitted (as allowed by the BNF grammar and the
code). If `from` was omitted we cleared out the tree but we left
the tree SHA-1 and parent commit SHA-1 intact. This is not what
the user intended in this case. Instead they would be trying to
reset the branch to have no parent and to have no tree, making the
branch look new-born during the next commit. We now clear these
SHA-1 values during `reset`, ensuring the branch looks new-born if
`from` does not get supplied.
New test cases for these were also added.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Most users don't need the pack boundary information that fast-import
was printing to standard output, especially if they were calling
it with --quiet.
Those users who do want this information probably want it captured
so they can go back and use it to repack the imported repository.
So dumping the boundary commits to a log file makes more sense then
printing them to standard output.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Not on all platforms are size_t and unsigned long equivalent.
Since I do not know how portable %z is, I play safe, and just
cast the respective variables to unsigned long.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <junkio@cox.net>
Apparently fast-import used to die a horrible death if we
were unable to open the marks file for output. This is
slightly less than ideal, especially now that we dump
the marks as part of the `checkpoint` command.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
If the frontend asks us to checkpoint (via the explicit checkpoint
command) its probably because they are afraid the current import
will crash/fail/whatever and want to make sure they can pickup from
the last checkpoint. To do that sort of recovery, we will need the
current tip of every branch and tag available at the next startup.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Often users will be running fast-import from within a larger frontend
process, and this may be a frequent periodic tool such as a future
edition of `git-svn fetch`. We don't want to bombard users with our
large stats output if they won't be interested in it, so `--quiet`
is now an option to make gfi be more silent.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Some frontends may not be able to (easily) keep track of which files
are included in the branch, and which aren't. Performing this
tracking can be tedious and error prone for the frontend to do,
especially if its foreign data source cannot supply the changed
path list on a per-commit basis.
fast-import now allows a frontend to request that a branch's tree
be wiped clean (reset to the empty tree) at the start of a commit,
allowing the frontend to feed in all paths which belong on the branch.
This is ideal for a tar-file importer frontend, for example, as
the frontend just needs to reformat the tar data stream into a gfi
data stream, which may be something a few Perl regexps can take
care of. :)
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
If fast-import is being used to update an existing branch of
a repository, the user may not want to lose commits if another
process updates the same ref at the same time. For example, the
user might be using fast-import to make just one or two commits
against a live branch.
We now perform a fast-forward check during the ref updating process.
If updating a branch would cause commits in that branch to be lost,
we skip over it and display the new SHA1 to standard error.
This new default behavior can be overridden with `--force`, like
git-push and git-fetch.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Since some frontends may be working with source material where
the dates are only readily available as RFC 2822 strings, it is
more friendly if fast-import exposes Git's parse_date() function
to handle the conversion. This way the frontend doesn't need
to perform the parsing itself.
The new --date-format option to fast-import can be used by a
frontend to select which format it will supply date strings in.
The default is the standard `raw` Git format, which fast-import
has always supported. Format rfc2822 can be used to activate the
parse_date() function instead.
Because fast-import could also be useful for creating new, current
commits, the format `now` is also supported to generate the current
system timestamp. The implementation of `now` is a trivial call
to datestamp(), but is actually a whole whopping 3 lines so that
fast-import can verify the frontend really meant `now`.
As part of this change I have added validation of the `raw` date
format. Prior to this change fast-import would accept anything
in a `committer` command, even if it was seriously malformed.
Now fast-import requires the '> ' near the end of the string and
verifies the timestamp is formatted properly.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
There is no need to check for a NULL pointer before invoking free(),
the runtime library automatically performs this check anyway and
does nothing if a NULL pointer is supplied.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Junio noticed that I was using a different style in fast-import
for returned pointers than the rest of Git. Before merging this
code into the main git.git tree I'd like to make it consistent,
as this style variation was not intentional.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Junio noticed these warnings/errors in fast-import when compiling
with `-Werror -ansi -pedantic`. A few changes are to reduce compiler
warnings, while one (in cmd_merge) is a bug fix.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>