git-commit-vandalism/Documentation/git-fast-export.txt
Elijah Newren 530ca19c02 fast-export: add --reference-excluded-parents option
git filter-branch has a nifty feature allowing you to rewrite, e.g. just
the last 8 commits of a linear history
  git filter-branch $OPTIONS HEAD~8..HEAD

If you try the same with git fast-export, you instead get a history of
only 8 commits, with HEAD~7 being rewritten into a root commit.  There
are two alternatives:

  1) Don't use the negative revision specification, and when you're
     filtering the output to make modifications to the last 8 commits,
     just be careful to not modify any earlier commits somehow.

  2) First run 'git fast-export --export-marks=somefile HEAD~8', then
     run 'git fast-export --import-marks=somefile HEAD~8..HEAD'.

Both are more error prone than I'd like (the first for obvious reasons;
with the second option I have sometimes accidentally included too many
revisions in the first command and then found that the corresponding
extra revisions were not exported by the second command and thus were
not modified as I expected).  Also, both are poor from a performance
perspective.

Add a new --reference-excluded-parents option which will cause
fast-export to refer to commits outside the specified rev-list-args
range by their sha1sum.  Such a stream will only be useful in a
repository which already contains the necessary commits (much like the
restriction imposed when using --no-data).

Note from Peff:
  I think we might be able to do a little more optimization here. If
  we're exporting HEAD^..HEAD and there's an object in HEAD^ which is
  unchanged in HEAD, I think we'd still print it (because it would not
  be marked SHOWN), but we could omit it (by walking the tree of the
  boundary commits and marking them shown).  I don't think it's a
  blocker for what you're doing here, but just a possible future
  optimization.

Signed-off-by: Elijah Newren <newren@gmail.com>
Acked-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-11-17 18:43:52 +09:00

233 lines
8.6 KiB
Plaintext

git-fast-export(1)
==================
NAME
----
git-fast-export - Git data exporter
SYNOPSIS
--------
[verse]
'git fast-export [<options>]' | 'git fast-import'
DESCRIPTION
-----------
This program dumps the given revisions in a form suitable to be piped
into 'git fast-import'.
You can use it as a human-readable bundle replacement (see
linkgit:git-bundle[1]), or as a kind of an interactive
'git filter-branch'.
OPTIONS
-------
--progress=<n>::
Insert 'progress' statements every <n> objects, to be shown by
'git fast-import' during import.
--signed-tags=(verbatim|warn|warn-strip|strip|abort)::
Specify how to handle signed tags. Since any transformation
after the export can change the tag names (which can also happen
when excluding revisions) the signatures will not match.
+
When asking to 'abort' (which is the default), this program will die
when encountering a signed tag. With 'strip', the tags will silently
be made unsigned, with 'warn-strip' they will be made unsigned but a
warning will be displayed, with 'verbatim', they will be silently
exported and with 'warn', they will be exported, but you will see a
warning.
--tag-of-filtered-object=(abort|drop|rewrite)::
Specify how to handle tags whose tagged object is filtered out.
Since revisions and files to export can be limited by path,
tagged objects may be filtered completely.
+
When asking to 'abort' (which is the default), this program will die
when encountering such a tag. With 'drop' it will omit such tags from
the output. With 'rewrite', if the tagged object is a commit, it will
rewrite the tag to tag an ancestor commit (via parent rewriting; see
linkgit:git-rev-list[1])
-M::
-C::
Perform move and/or copy detection, as described in the
linkgit:git-diff[1] manual page, and use it to generate
rename and copy commands in the output dump.
+
Note that earlier versions of this command did not complain and
produced incorrect results if you gave these options.
--export-marks=<file>::
Dumps the internal marks table to <file> when complete.
Marks are written one per line as `:markid SHA-1`. Only marks
for revisions are dumped; marks for blobs are ignored.
Backends can use this file to validate imports after they
have been completed, or to save the marks table across
incremental runs. As <file> is only opened and truncated
at completion, the same path can also be safely given to
--import-marks.
The file will not be written if no new object has been
marked/exported.
--import-marks=<file>::
Before processing any input, load the marks specified in
<file>. The input file must exist, must be readable, and
must use the same format as produced by --export-marks.
+
Any commits that have already been marked will not be exported again.
If the backend uses a similar --import-marks file, this allows for
incremental bidirectional exporting of the repository by keeping the
marks the same across runs.
--fake-missing-tagger::
Some old repositories have tags without a tagger. The
fast-import protocol was pretty strict about that, and did not
allow that. So fake a tagger to be able to fast-import the
output.
--use-done-feature::
Start the stream with a 'feature done' stanza, and terminate
it with a 'done' command.
--no-data::
Skip output of blob objects and instead refer to blobs via
their original SHA-1 hash. This is useful when rewriting the
directory structure or history of a repository without
touching the contents of individual files. Note that the
resulting stream can only be used by a repository which
already contains the necessary objects.
--full-tree::
This option will cause fast-export to issue a "deleteall"
directive for each commit followed by a full list of all files
in the commit (as opposed to just listing the files which are
different from the commit's first parent).
--anonymize::
Anonymize the contents of the repository while still retaining
the shape of the history and stored tree. See the section on
`ANONYMIZING` below.
--reference-excluded-parents::
By default, running a command such as `git fast-export
master~5..master` will not include the commit master{tilde}5
and will make master{tilde}4 no longer have master{tilde}5 as
a parent (though both the old master{tilde}4 and new
master{tilde}4 will have all the same files). Use
--reference-excluded-parents to instead have the the stream
refer to commits in the excluded range of history by their
sha1sum. Note that the resulting stream can only be used by a
repository which already contains the necessary parent
commits.
--refspec::
Apply the specified refspec to each ref exported. Multiple of them can
be specified.
[<git-rev-list-args>...]::
A list of arguments, acceptable to 'git rev-parse' and
'git rev-list', that specifies the specific objects and references
to export. For example, `master~10..master` causes the
current master reference to be exported along with all objects
added since its 10th ancestor commit and (unless the
--reference-excluded-parents option is specified) all files
common to master{tilde}9 and master{tilde}10.
EXAMPLES
--------
-------------------------------------------------------------------
$ git fast-export --all | (cd /empty/repository && git fast-import)
-------------------------------------------------------------------
This will export the whole repository and import it into the existing
empty repository. Except for reencoding commits that are not in
UTF-8, it would be a one-to-one mirror.
-----------------------------------------------------
$ git fast-export master~5..master |
sed "s|refs/heads/master|refs/heads/other|" |
git fast-import
-----------------------------------------------------
This makes a new branch called 'other' from 'master~5..master'
(i.e. if 'master' has linear history, it will take the last 5 commits).
Note that this assumes that none of the blobs and commit messages
referenced by that revision range contains the string
'refs/heads/master'.
ANONYMIZING
-----------
If the `--anonymize` option is given, git will attempt to remove all
identifying information from the repository while still retaining enough
of the original tree and history patterns to reproduce some bugs. The
goal is that a git bug which is found on a private repository will
persist in the anonymized repository, and the latter can be shared with
git developers to help solve the bug.
With this option, git will replace all refnames, paths, blob contents,
commit and tag messages, names, and email addresses in the output with
anonymized data. Two instances of the same string will be replaced
equivalently (e.g., two commits with the same author will have the same
anonymized author in the output, but bear no resemblance to the original
author string). The relationship between commits, branches, and tags is
retained, as well as the commit timestamps (but the commit messages and
refnames bear no resemblance to the originals). The relative makeup of
the tree is retained (e.g., if you have a root tree with 10 files and 3
trees, so will the output), but their names and the contents of the
files will be replaced.
If you think you have found a git bug, you can start by exporting an
anonymized stream of the whole repository:
---------------------------------------------------
$ git fast-export --anonymize --all >anon-stream
---------------------------------------------------
Then confirm that the bug persists in a repository created from that
stream (many bugs will not, as they really do depend on the exact
repository contents):
---------------------------------------------------
$ git init anon-repo
$ cd anon-repo
$ git fast-import <../anon-stream
$ ... test your bug ...
---------------------------------------------------
If the anonymized repository shows the bug, it may be worth sharing
`anon-stream` along with a regular bug report. Note that the anonymized
stream compresses very well, so gzipping it is encouraged. If you want
to examine the stream to see that it does not contain any private data,
you can peruse it directly before sending. You may also want to try:
---------------------------------------------------
$ perl -pe 's/\d+/X/g' <anon-stream | sort -u | less
---------------------------------------------------
which shows all of the unique lines (with numbers converted to "X", to
collapse "User 0", "User 1", etc into "User X"). This produces a much
smaller output, and it is usually easy to quickly confirm that there is
no private data in the stream.
LIMITATIONS
-----------
Since 'git fast-import' cannot tag trees, you will not be
able to export the linux.git repository completely, as it contains
a tag referencing a tree instead of a commit.
SEE ALSO
--------
linkgit:git-fast-import[1]
GIT
---
Part of the linkgit:git[1] suite