git-commit-vandalism/builtin
Jeff King a872275098 teach fast-export an --anonymize option
Sometimes users want to report a bug they experience on
their repository, but they are not at liberty to share the
contents of the repository. It would be useful if they could
produce a repository whose history and tree have a shape
similar to the original's, but without leaking any information. This
"anonymized" repository could then be shared with developers
(assuming it still replicates the original problem).

This patch implements an "--anonymize" option to
fast-export, which generates a stream that can recreate such
a repository. Producing a single stream makes it easy for
the caller to verify that they are not leaking any useful
information. You can get an overview of what will be shared
by running a command like:

  git fast-export --anonymize --all |
  perl -pe 's/\d+/X/g' |
  sort -u |
  less

which will show every unique line we generate, modulo any
numbers (each anonymized token is assigned a number, like
"User 0", and we replace it consistently in the output).
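
To make the "assigned a number, replaced consistently" idea concrete,
here is a minimal sketch in Python. It is not the code from
fast-export.c; the Anonymizer class, the mask_numbers helper, and the
"author ..." line format are hypothetical and only illustrate why
masking digits and de-duplicating gives a short overview.

```python
import re


class Anonymizer:
    # Hypothetical helper: maps each distinct original token to a stable,
    # numbered placeholder such as "User 0", "User 1", ...
    def __init__(self, prefix):
        self.prefix = prefix
        self.seen = {}

    def anonymize(self, token):
        # The first time a token is seen it gets the next free number;
        # later occurrences reuse the same placeholder.
        if token not in self.seen:
            self.seen[token] = f"{self.prefix} {len(self.seen)}"
        return self.seen[token]


def mask_numbers(lines):
    # Rough analogue of the perl/sort pipeline above: replace every run of
    # digits with "X", then keep only the unique lines in sorted order.
    # (Python's sort order may differ from a locale-aware sort.)
    return sorted({re.sub(r"\d+", "X", line) for line in lines})


users = Anonymizer("User")
# Illustrative line format only, not actual fast-export output.
stream = [f"author {users.anonymize(name)}" for name in ["alice", "bob", "alice"]]
unique = mask_numbers(stream)

print(stream)  # ['author User 0', 'author User 1', 'author User 0']
print(unique)  # ['author User X']
```

Because the same input always maps to the same placeholder, the shape
of the data survives, while the digit-masked view collapses to a
handful of lines that are easy to read through for leaks.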

In addition to anonymizing, this produces test cases that
are relatively small (compared to the original repository)
and fast to generate (compared to using filter-branch, or
modifying the output of fast-export yourself). Here are
numbers for git.git:

  $ time git fast-export --anonymize --all \
         --tag-of-filtered-object=drop >output
  real    0m2.883s
  user    0m2.828s
  sys     0m0.052s

  $ gzip output
  $ ls -lh output.gz | awk '{print $5}'
  2.9M

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-08-27 10:42:16 -07:00
add.c read-cache: new API write_locked_index instead of write_index/write_cache 2014-06-13 11:49:10 -07:00
annotate.c annotate: use argv_array 2014-07-16 11:10:11 -07:00
apply.c Merge branch 'jk/misc-fixes-maint' 2014-07-28 11:30:41 -07:00
archive.c replace {pre,suf}fixcmp() with {starts,ends}_with() 2013-12-05 14:13:21 -08:00
bisect--helper.c Replace deprecated OPT_BOOLEAN by OPT_BOOL 2013-08-05 11:32:19 -07:00
blame.c Merge branch 'rs/code-cleaning' 2014-07-22 10:59:37 -07:00
branch.c refactor skip_prefix to return a boolean 2014-06-20 10:44:43 -07:00
bundle.c
cat-file.c Merge branch 'jk/warn-on-object-refname-ambiguity' 2014-03-25 11:07:36 -07:00
check-attr.c Merge branch 'jc/check-attr-honor-working-tree' into maint 2014-03-18 14:03:03 -07:00
check-ignore.c Merge branch 'dw/check-ignore-sans-index' 2013-09-20 12:37:32 -07:00
check-mailmap.c builtin: add git-check-mailmap command 2013-07-13 10:19:37 -07:00
check-ref-format.c
checkout-index.c entry.c: update cache_changed if refresh_cache is set in checkout_entry() 2014-06-13 11:49:39 -07:00
checkout.c Merge branch 'nd/split-index' 2014-07-16 11:25:40 -07:00
clean.c use xcalloc() to allocate zero-initialized memory 2014-07-21 10:30:21 -07:00
clone.c use local cloning if insteadOf makes a local URL 2014-07-17 11:17:13 -07:00
column.c replace {pre,suf}fixcmp() with {starts,ends}_with() 2013-12-05 14:13:21 -08:00
commit-tree.c commit_tree: take a pointer/len pair rather than a const strbuf 2014-06-12 10:29:41 -07:00
commit.c Merge branch 'ta/string-list-init' 2014-07-23 11:35:54 -07:00
config.c Merge branch 'jk/daemon-tolower' 2014-06-16 10:07:15 -07:00
count-objects.c count-objects: add -H option to humanize sizes 2013-04-10 13:27:26 -07:00
credential.c
describe.c hashmap: add simplified hashmap_get_from_hash() API 2014-07-07 13:56:35 -07:00
diff-files.c convert read_cache_preload() to take struct pathspec 2013-07-15 10:56:08 -07:00
diff-index.c convert read_cache_preload() to take struct pathspec 2013-07-15 10:56:08 -07:00
diff-tree.c Merge branch 'jk/alloc-commit-id' 2014-07-22 10:59:25 -07:00
diff.c Merge branch 'tg/diff-no-index-refactor' 2013-12-27 14:58:17 -08:00
fast-export.c teach fast-export an --anonymize option 2014-08-27 10:42:16 -07:00
fetch-pack.c Merge branch 'nd/shallow-clone' 2014-01-17 12:21:20 -08:00
fetch.c Merge branch 'jk/xstrfmt' 2014-07-09 11:34:05 -07:00
fmt-merge-msg.c Merge branch 'jk/xstrfmt' 2014-07-09 11:34:05 -07:00
for-each-ref.c use commit_list_count() to count the members of commit_lists 2014-07-17 13:36:25 -07:00
fsck.c refs.c: add a public is_branch function 2014-07-16 13:06:41 -07:00
gc.c Merge branch 'nd/daemonize-gc' into maint 2014-06-25 11:47:36 -07:00
get-tar-commit-id.c stop installing git-tar-tree link 2013-12-03 12:35:22 -08:00
grep.c Merge branch 'sk/spawn-less-case-insensitively-from-grep-O-i' into maint 2014-06-25 11:47:49 -07:00
hash-object.c hash-object: replace stdin parsing OPT_BOOLEAN by OPT_COUNTUP 2013-08-07 08:30:55 -07:00
help.c builtin/help.c: speed up is_git_command() by checking for builtin commands first 2014-01-06 11:26:31 -08:00
index-pack.c Merge branch 'maint' 2014-07-21 12:35:39 -07:00
init-db.c i18n: only extract comments marked with "TRANSLATORS:" 2014-04-17 11:09:56 -07:00
log.c Merge branch 'jk/commit-buffer-length' into maint 2014-07-16 11:16:38 -07:00
ls-files.c pathspec: pass directory indicator to match_pathspec_item() 2014-02-24 14:37:19 -08:00
ls-remote.c builtin/ls-remote.c: rearrange xcalloc arguments 2014-05-27 14:00:43 -07:00
ls-tree.c pathspec: rename match_pathspec_depth() to match_pathspec() 2014-02-24 14:37:14 -08:00
mailinfo.c Merge branch 'rs/mailinfo-header-cmp' into maint 2014-06-25 11:48:23 -07:00
mailsplit.c mailsplit: sort maildir filenames more cleverly 2013-03-02 22:52:44 -08:00
merge-base.c Merge branch 'bm/merge-base-octopus-dedup' into maint 2014-02-13 13:38:59 -08:00
merge-file.c Replace deprecated OPT_BOOLEAN by OPT_BOOL 2013-08-05 11:32:19 -07:00
merge-index.c Convert "struct cache_entry *" to "const ..." wherever possible 2013-07-09 09:12:48 -07:00
merge-ours.c
merge-recursive.c replace {pre,suf}fixcmp() with {starts,ends}_with() 2013-12-05 14:13:21 -08:00
merge-tree.c merge-tree: handle directory/empty conflict correctly 2013-05-06 22:17:00 -07:00
merge.c Merge branch 'rs/code-cleaning' 2014-07-16 11:33:09 -07:00
mktag.c
mktree.c builtin/mktree.c: use ALLOC_GROW() in append_to_tree() 2014-03-03 14:54:45 -08:00
mv.c Merge branch 'nd/split-index' 2014-07-16 11:25:40 -07:00
name-rev.c use xstrfmt to replace xmalloc + strcpy/strcat 2014-06-19 15:20:54 -07:00
notes.c Merge branch 'mh/ref-transaction' 2014-06-03 12:06:41 -07:00
pack-objects.c Merge branch 'jk/repack-pack-writebitmaps-config' 2014-06-25 12:23:19 -07:00
pack-redundant.c
pack-refs.c pack-refs: merge code from pack-refs.{c,h} into refs.{c,h} 2013-05-01 15:33:11 -07:00
patch-id.c patch-id: make it stable against hunk reordering 2014-06-10 13:09:24 -07:00
prune-packed.c i18n: mark all progress lines for translation 2014-02-24 09:08:37 -08:00
prune.c Merge branch 'mh/replace-refs-variable-rename' 2014-03-14 14:27:06 -07:00
push.c refactor skip_prefix to return a boolean 2014-06-20 10:44:43 -07:00
read-tree.c read-tree: note about dropping split-index mode or index version 2014-06-13 11:49:41 -07:00
receive-pack.c Merge branch 'jk/misc-fixes-maint' 2014-07-28 11:30:41 -07:00
reflog.c refs.c: add new functions reflog_exists and delete_reflog 2014-05-08 14:31:43 -07:00
remote-ext.c
remote-fd.c
remote.c Merge branch 'rs/ref-transaction-0' 2014-07-21 11:18:37 -07:00
repack.c Merge branch 'jk/strip-suffix' 2014-07-16 11:26:00 -07:00
replace.c Merge branch 'cc/replace-graft' 2014-07-27 15:14:18 -07:00
rerere.c rerere: fix for merge.conflictstyle 2014-04-30 10:30:02 -07:00
reset.c Merge branch 'nd/split-index' 2014-07-16 11:25:40 -07:00
rev-list.c commit: record buffer length in cache 2014-06-13 12:09:38 -07:00
rev-parse.c Merge branch 'jk/misc-fixes-maint' 2014-07-28 11:30:41 -07:00
revert.c parse-options: multi-word argh should use dash to separate words 2014-03-24 10:43:34 -07:00
rm.c read-cache: new API write_locked_index instead of write_index/write_cache 2014-06-13 11:49:10 -07:00
send-pack.c Merge branch 'nd/shallow-clone' 2014-01-17 12:21:20 -08:00
shortlog.c replace {pre,suf}fixcmp() with {starts,ends}_with() 2013-12-05 14:13:21 -08:00
show-branch.c Merge branch 'jk/misc-fixes-maint' 2014-07-28 11:30:41 -07:00
show-ref.c replace {pre,suf}fixcmp() with {starts,ends}_with() 2013-12-05 14:13:21 -08:00
stripspace.c builtin/stripspace.c: fix broken indentation 2013-09-06 13:33:17 -07:00
symbolic-ref.c replace {pre,suf}fixcmp() with {starts,ends}_with() 2013-12-05 14:13:21 -08:00
tag.c Merge branch 'jk/tag-sort' 2014-07-23 11:35:45 -07:00
unpack-file.c
unpack-objects.c Merge branch 'mh/replace-refs-variable-rename' 2014-03-14 14:27:06 -07:00
update-index.c Merge branch 'nd/split-index' 2014-07-16 11:25:40 -07:00
update-ref.c refs.c: change ref_transaction_update() to do error checking and return status 2014-07-14 11:54:42 -07:00
update-server-info.c
upload-archive.c replace {pre,suf}fixcmp() with {starts,ends}_with() 2013-12-05 14:13:21 -08:00
var.c
verify-commit.c verify-commit: scriptable commit signature verification 2014-06-23 15:50:31 -07:00
verify-pack.c verify-pack: use strbuf_strip_suffix 2014-06-30 13:43:32 -07:00
verify-tag.c
write-tree.c