git-commit-vandalism/Documentation
Thomas Braun e0e7cb8080 log -G: ignore binary files
The -G<regex> option of log looks for the differences whose patch text
contains added/removed lines that match regex.

Currently -G looks also into patches of binary files (which
according to [1]) is binary as well.

This has a couple of issues:

- It makes the pickaxe search slow. In a proprietary repository of the
  author with only ~5500 commits and a total .git size of ~300MB
  searching takes ~13 seconds

    $time git log -Gwave > /dev/null

    real    0m13,241s
    user    0m12,596s
    sys     0m0,644s

  whereas when we ignore binary files with this patch it takes ~4s

    $time ~/devel/git/git log -Gwave > /dev/null

    real    0m3,713s
    user    0m3,608s
    sys     0m0,105s

  which is a speedup of more than fourfold.

- The internally used algorithm for generating patch text is based on
  xdiff and its states in [1]

  > The output format of the binary patch file is proprietary
  > (and binary) and it is basically a collection of copy and insert
  > commands [..]

  which means that the current format could change once the internal
  algorithm is changed as the format is not standardized. In addition
  the git binary patch format used for preparing patches for git apply
  is *different* from the xdiff format as can be seen by comparing

  git log -p -a

    commit 6e95bf4bafccf14650d02ab57f3affe669be10cf
    Author: A U Thor <author@example.com>
    Date:   Thu Apr 7 15:14:13 2005 -0700

        modify binary file

    diff --git a/data.bin b/data.bin
    index f414c84..edfeb6f 100644
    --- a/data.bin
    +++ b/data.bin
    @@ -1,2 +1,4 @@
     a
     a^@a
    +a
    +a^@a

  with git log --binary

    commit 6e95bf4bafccf14650d02ab57f3affe669be10cf
    Author: A U Thor <author@example.com>
    Date:   Thu Apr 7 15:14:13 2005 -0700

        modify binary file

    diff --git a/data.bin b/data.bin
    index f414c84bd3aa25fa07836bb1fb73db784635e24b..edfeb6f501[..]
    GIT binary patch
    literal 12
    QcmYe~N@Pgn0zx1O01)N^ZvX%Q

    literal 6
    NcmYe~N@Pgn0ssWg0XP5v

  which seems unexpected.

To resolve these issues this patch makes -G<regex> ignore binary files
by default. Textconv filters are supported and also -a/--text for
getting the old and broken behaviour back.

The -S<block of text> option of log looks for differences that changes
the number of occurrences of the specified block of text (i.e.
addition/deletion) in a file. As we want to keep the current behaviour,
add a test to ensure it stays that way.

[1]: http://www.xmailserver.org/xdiff.html

Signed-off-by: Thomas Braun <thomas.braun@virtuell-zuhause.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-12-26 14:59:37 -08:00
..
config index: make index.threads=true enable ieot and eoie 2018-11-21 16:46:54 +09:00
howto Merge branch 'uk/merge-subtree-doc-update' into maint 2018-11-21 22:58:08 +09:00
RelNotes Git 2.20-rc1 2018-11-21 23:24:52 +09:00
technical Merge branch 'mw/doc-typofixes' into maint 2018-11-21 22:58:03 +09:00
.gitattributes
.gitignore add a script to diff rendered documentation 2018-08-06 12:30:23 -07:00
asciidoc.conf
asciidoctor-extensions.rb Documentation: implement linkgit macro for Asciidoctor 2017-01-31 12:18:18 -08:00
blame-options.txt Merge branch 'bc/blame-doc-fix' 2017-02-24 10:48:08 -08:00
build-docdep.perl
cat-texi.perl Documentation: remove unneeded argument in cat-texi.perl 2017-01-23 10:56:47 -08:00
cmd-list.perl
CodingGuidelines Merge branch 'jc/how-to-document-api' 2018-10-19 13:34:05 +09:00
config.txt Merge branch 'nd/packobjectshook-doc-fix' into maint 2018-11-21 22:58:01 +09:00
date-formats.txt Merge branch 'lr/doc-fix-cet' into maint 2017-01-17 15:19:08 -08:00
diff-format.txt Documentation: improve description for core.quotePath 2017-03-02 11:40:51 -08:00
diff-generate-patch.txt Documentation: improve description for core.quotePath 2017-03-02 11:40:51 -08:00
diff-options.txt log -G: ignore binary files 2018-12-26 14:59:37 -08:00
doc-diff Merge branch 'es/worktree-forced-ops-fix' 2018-09-17 13:53:59 -07:00
docbook-xsl.css
docbook.xsl
everyday.txto Documentation: fix linkgit references 2016-05-09 15:44:14 -07:00
fetch-options.txt Merge branch 'ab/fetch-tags-noclobber' 2018-09-17 13:54:00 -07:00
fix-texi.perl
git-add.txt doc: keep first level section header in upper case 2018-05-02 17:03:33 +09:00
git-am.txt Merge branch 'nd/rebase-show-current-patch' 2018-03-06 14:54:02 -08:00
git-annotate.txt Use proper syntax for replaceables in command docs 2018-05-25 17:16:47 +09:00
git-apply.txt Merge branch 'nd/diff-apply-ita' 2018-06-25 13:22:36 -07:00
git-archimport.txt git-archimport.1: specify what kind of Arch we're talking about 2018-09-21 09:28:58 -07:00
git-archive.txt
git-bisect-lk2009.txt doc: fix ASCII art tab spacing 2018-10-23 12:23:09 +09:00
git-bisect.txt Merge branch 'ak/bisect-doc-typofix' 2018-04-25 13:28:56 +09:00
git-blame.txt diff: --indent-heuristic is no longer experimental 2017-11-02 14:51:24 +09:00
git-branch.txt Merge branch 'jk/branch-l-1-repurpose' 2018-09-17 13:53:50 -07:00
git-bundle.txt doc: keep first level section header in upper case 2018-05-02 17:03:33 +09:00
git-cat-file.txt cat-file: support "unordered" output for --batch-all-objects 2018-08-13 13:48:31 -07:00
git-check-attr.txt Use proper syntax for replaceables in command docs 2018-05-25 17:16:47 +09:00
git-check-ignore.txt Use proper syntax for replaceables in command docs 2018-05-25 17:16:47 +09:00
git-check-mailmap.txt Use proper syntax for replaceables in command docs 2018-05-25 17:16:47 +09:00
git-check-ref-format.txt Doc/check-ref-format: clarify information about @{-N} syntax 2017-12-19 10:00:45 -08:00
git-checkout-index.txt
git-checkout.txt doc: fix ASCII art tab spacing 2018-10-23 12:23:09 +09:00
git-cherry-pick.txt Merge branch 'mm/doc-tt' into maint 2016-07-28 11:25:54 -07:00
git-cherry.txt
git-citool.txt
git-clean.txt doc: typeset short command-line options as literal 2016-06-28 08:20:52 -07:00
git-clone.txt Merge branch 'nd/doc-header' 2018-05-23 14:38:22 +09:00
git-column.txt git-column.1: clarify initial description, provide examples 2018-09-21 09:32:15 -07:00
git-commit-graph.txt Doc: refer to the "commit-graph file" with dash 2018-09-27 15:29:12 -07:00
git-commit-tree.txt Merge branch 'mm/doc-tt' into maint 2016-07-28 11:25:54 -07:00
git-commit.txt commit: add support for --fixup <commit> -m"<extra message>" 2017-12-22 13:10:24 -08:00
git-config.txt worktree: add per-worktree config files 2018-10-22 13:17:04 +09:00
git-count-objects.txt count-objects: report alternates via verbose mode 2016-10-10 13:52:37 -07:00
git-credential-cache--daemon.txt
git-credential-cache.txt Use proper syntax for replaceables in command docs 2018-05-25 17:16:47 +09:00
git-credential-store.txt Use proper syntax for replaceables in command docs 2018-05-25 17:16:47 +09:00
git-credential.txt
git-cvsexportcommit.txt
git-cvsimport.txt Merge branch 'jk/doc-cvs-update' into maint 2016-10-03 13:22:25 -07:00
git-cvsserver.txt Use proper syntax for replaceables in command docs 2018-05-25 17:16:47 +09:00
git-daemon.txt daemon: add --log-destination=(stderr|syslog|none) 2018-02-05 10:30:44 -08:00
git-describe.txt git-describe.1: clarify that "human readable" is also git-readable 2018-09-21 09:32:21 -07:00
git-diff-files.txt
git-diff-index.txt doc: keep first level section header in upper case 2018-05-02 17:03:33 +09:00
git-diff-tree.txt doc: keep first level section header in upper case 2018-05-02 17:03:33 +09:00
git-diff.txt doc: fix a typo and clarify a sentence 2018-10-11 09:39:15 +09:00
git-difftool.txt Document the --no-gui option in difftool 2017-02-08 13:30:28 -08:00
git-fast-export.txt Use proper syntax for replaceables in command docs 2018-05-25 17:16:47 +09:00
git-fast-import.txt Use proper syntax for replaceables in command docs 2018-05-25 17:16:47 +09:00
git-fetch-pack.txt Documentation: fix several one-character-off spelling errors 2018-04-09 14:15:02 +09:00
git-fetch.txt fetch: make the --prune-tags work with <url> 2018-02-09 13:10:13 -08:00
git-filter-branch.txt Merge branch 'nd/doc-header' 2018-05-23 14:38:22 +09:00
git-fmt-merge-msg.txt config.txt: move fmt-merge-msg-config.txt to config/ 2018-10-29 10:17:01 +09:00
git-for-each-ref.txt Merge branch 'jk/ui-color-always-to-auto' 2018-08-15 15:08:19 -07:00
git-format-patch.txt format-patch: allow --range-diff to apply to a lone-patch 2018-08-14 14:27:05 -07:00
git-fsck-objects.txt
git-fsck.txt fsck: verify commit-graph 2018-06-27 10:29:10 -07:00
git-gc.txt gc doc: mention the commit-graph in the intro 2018-10-11 15:40:27 +09:00
git-get-tar-commit-id.txt usage: do not insist that standard input must come from a file 2015-10-16 15:27:52 -07:00
git-grep.txt grep: add -r/--[no-]recursive 2018-10-03 21:25:57 -07:00
git-gui.txt doc: git-gui browser does not default to HEAD 2017-01-13 12:23:28 -08:00
git-hash-object.txt usage: do not insist that standard input must come from a file 2015-10-16 15:27:52 -07:00
git-help.txt Merge branch 'rv/alias-help' 2018-10-26 14:22:13 +09:00
git-http-backend.txt doc: more consistency in environment variables format 2016-06-08 12:04:37 -07:00
git-http-fetch.txt http-fetch: make -a standard behaviour 2018-04-24 10:55:02 +09:00
git-http-push.txt doc: keep first level section header in upper case 2018-05-02 17:03:33 +09:00
git-imap-send.txt git-imap-send.txt: move imap.* to a separate file 2018-10-29 10:17:02 +09:00
git-index-pack.txt doc: keep first level section header in upper case 2018-05-02 17:03:33 +09:00
git-init-db.txt
git-init.txt init: document dotfiles exclusion on template copy 2017-02-17 15:57:21 -08:00
git-instaweb.txt doc: change configuration variables format 2016-06-08 12:04:55 -07:00
git-interpret-trailers.txt Merge branch 'jk/trailer-fixes' into maint 2018-11-21 22:57:42 +09:00
git-log.txt git-[short]log.txt: unify quoted standalone -- 2018-04-18 12:49:26 +09:00
git-ls-files.txt Merge branch 'ah/misc-doc-updates' 2018-05-23 14:38:23 +09:00
git-ls-remote.txt Merge branch 'bw/server-options' 2018-05-23 14:38:15 +09:00
git-ls-tree.txt Documentation: improve description for core.quotePath 2017-03-02 11:40:51 -08:00
git-mailinfo.txt Merge branch 'va/mailinfo-doc-typofix' into maint 2016-05-26 13:17:14 -07:00
git-mailsplit.txt mailsplit: support unescaping mboxrd messages 2016-06-06 11:14:43 -07:00
git-merge-base.txt doc: fix ASCII art tab spacing 2018-10-23 12:23:09 +09:00
git-merge-file.txt merge-file: clamp exit code to maximum 127 2015-10-29 12:10:23 -07:00
git-merge-index.txt
git-merge-one-file.txt
git-merge-tree.txt
git-merge.txt config.txt: move merge-config.txt to config/ 2018-10-29 10:17:03 +09:00
git-mergetool--lib.txt
git-mergetool.txt mergetool: accept -g/--[no-]gui as arguments 2018-10-25 14:01:10 +09:00
git-mktag.txt usage: do not insist that standard input must come from a file 2015-10-16 15:27:52 -07:00
git-mktree.txt Documentation: normalize spelling of 'normalised' 2018-04-09 14:15:07 +09:00
git-multi-pack-index.txt multi-pack-index: add 'verify' verb 2018-09-17 13:49:38 -07:00
git-mv.txt doc: typeset short command-line options as literal 2016-06-28 08:20:52 -07:00
git-name-rev.txt doc: keep first level section header in upper case 2018-05-02 17:03:33 +09:00
git-notes.txt Documentation: spelling and grammar fixes 2018-06-22 14:26:23 -07:00
git-p4.txt git-p4: fully support unshelving changelists 2018-10-16 13:28:49 +09:00
git-pack-objects.txt pack-objects: add delta-islands support 2018-08-16 10:51:17 -07:00
git-pack-redundant.txt
git-pack-refs.txt
git-parse-remote.txt
git-patch-id.txt doc: remove unsupported parameter from patch-id 2017-07-28 14:41:32 -07:00
git-prune-packed.txt
git-prune.txt doc: keep first level section header in upper case 2018-05-02 17:03:33 +09:00
git-pull.txt Use proper syntax for replaceables in command docs 2018-05-25 17:16:47 +09:00
git-push.txt push doc: add spacing between two words 2018-09-19 12:43:50 -07:00
git-quiltimport.txt doc: more consistency in environment variables format 2016-06-08 12:04:37 -07:00
git-range-diff.txt range-diff doc: add a section about output stability 2018-11-12 12:05:38 +09:00
git-read-tree.txt doc: keep first level section header in upper case 2018-05-02 17:03:33 +09:00
git-rebase.txt Merge branch 'nd/config-split' 2018-11-13 22:37:16 +09:00
git-receive-pack.txt doc: keep first level section header in upper case 2018-05-02 17:03:33 +09:00
git-reflog.txt reflog expire: cover reflog from all worktrees 2018-10-22 13:32:54 +09:00
git-remote-ext.txt doc: keep first level section header in upper case 2018-05-02 17:03:33 +09:00
git-remote-fd.txt Spelling fixes 2016-08-11 14:35:42 -07:00
git-remote-helpers.txto
git-remote-testgit.txt
git-remote.txt Merge branch 'nd/remote-update-doc' 2018-06-04 21:39:49 +09:00
git-repack.txt Merge branch 'cc/delta-islands' 2018-09-17 13:53:55 -07:00
git-replace.txt replace: introduce --convert-graft-file 2018-04-30 11:12:30 +09:00
git-request-pull.txt doc: keep first level section header in upper case 2018-05-02 17:03:33 +09:00
git-rerere.txt rerere: add note about files with existing conflict markers 2018-08-29 09:03:29 -07:00
git-reset.txt reset: add new reset.quiet config setting 2018-10-24 11:57:07 +09:00
git-rev-list.txt rev-list: add list-objects filtering support 2017-11-22 14:11:57 +09:00
git-rev-parse.txt Use proper syntax for replaceables in command docs 2018-05-25 17:16:47 +09:00
git-revert.txt doc: typeset long command-line options as literal 2016-06-28 08:36:45 -07:00
git-rm.txt treewide: correct several "up-to-date" to "up to date" 2017-08-23 12:17:22 -07:00
git-send-email.txt Merge branch 'jw/send-email-no-auth' 2018-11-06 15:50:20 +09:00
git-send-pack.txt doc: keep first level section header in upper case 2018-05-02 17:03:33 +09:00
git-sh-i18n--envsubst.txt
git-sh-i18n.txt
git-sh-setup.txt doc: more consistency in environment variables format 2016-06-08 12:04:37 -07:00
git-shell.txt doc: keep first level section header in upper case 2018-05-02 17:03:33 +09:00
git-shortlog.txt git-[short]log.txt: unify quoted standalone -- 2018-04-18 12:49:26 +09:00
git-show-branch.txt doc: fix small typo in git show-branch 2018-10-18 12:26:51 +09:00
git-show-index.txt show-index: update documentation for index v2 2018-05-29 00:28:22 +09:00
git-show-ref.txt doc: keep first level section header in upper case 2018-05-02 17:03:33 +09:00
git-show.txt Use proper syntax for replaceables in command docs 2018-05-25 17:16:47 +09:00
git-stage.txt
git-stash.txt git-stash.txt: remove extra square bracket 2018-03-27 19:09:13 -07:00
git-status.txt Documentation: spelling and grammar fixes 2018-06-22 14:26:23 -07:00
git-stripspace.txt usage: do not insist that standard input must come from a file 2015-10-16 15:27:52 -07:00
git-submodule.txt Merge branch 'pc/submodule-helper-foreach' 2018-06-25 13:22:35 -07:00
git-svn.txt Documentation: spelling and grammar fixes 2018-06-22 14:26:23 -07:00
git-symbolic-ref.txt
git-tag.txt doc: fix descripion for 'git tag --format' 2018-10-23 12:23:09 +09:00
git-tools.txt doc: replace or.cz gitwiki link with git.wiki.kernel.org 2017-04-20 22:05:37 -07:00
git-unpack-file.txt
git-unpack-objects.txt unpack-objects: add --max-input-size=<size> option 2016-08-24 12:31:05 -07:00
git-update-index.txt Merge branch 'jc/update-index-doc' 2018-08-20 11:33:50 -07:00
git-update-ref.txt Merge branch 'ah/doc-updates' into maint 2018-11-21 22:58:07 +09:00
git-update-server-info.txt Documentation: use [verse] for SYNOPSIS sections 2011-07-06 14:26:26 -07:00
git-upload-archive.txt Documentation: match underline with the text 2015-10-22 10:16:12 -07:00
git-upload-pack.txt doc: fix inappropriate monospace formatting 2018-10-23 12:23:09 +09:00
git-var.txt doc: keep first level section header in upper case 2018-05-02 17:03:33 +09:00
git-verify-commit.txt Merge branch 'dn/gpg-doc' into maint 2016-07-06 13:06:36 -07:00
git-verify-pack.txt
git-verify-tag.txt builtin/verify-tag: add --format to verify-tag 2017-01-17 16:10:22 -08:00
git-web--browse.txt Use proper syntax for replaceables in command docs 2018-05-25 17:16:47 +09:00
git-whatchanged.txt
git-worktree.txt Merge branch 'nd/per-worktree-ref-iteration' 2018-11-13 22:37:26 +09:00
git-write-tree.txt
git.txt Merge branch 'ah/doc-updates' into maint 2018-11-21 22:58:07 +09:00
gitattributes.txt doc: fix inappropriate monospace formatting 2018-10-23 12:23:09 +09:00
gitcli.txt Use proper syntax for replaceables in command docs 2018-05-25 17:16:47 +09:00
gitcore-tutorial.txt treewide: correct several "up-to-date" to "up to date" 2017-08-23 12:17:22 -07:00
gitcredentials.txt doc: clarify gitcredentials path component matching 2018-09-27 15:24:50 -07:00
gitcvs-migration.txt Merge branch 'sb/doc-unify-bottom' 2017-02-15 12:54:20 -08:00
gitdiffcore.txt log -G: ignore binary files 2018-12-26 14:59:37 -08:00
giteveryday.txt Documentation: spelling and grammar fixes 2018-06-22 14:26:23 -07:00
gitglossary.txt Documentation: unify bottom "part of git suite" lines 2017-02-09 15:14:01 -08:00
githooks.txt git-p4: add the p4-pre-submit hook 2018-08-01 13:37:18 -07:00
gitignore.txt Merge branch 'nd/wildmatch-double-asterisk' 2018-11-13 22:37:19 +09:00
gitk.txt doc: convert [\--] to [--] 2018-04-18 12:49:26 +09:00
gitmodules.txt doc: fix inappropriate monospace formatting 2018-10-23 12:23:09 +09:00
gitnamespaces.txt doc: mention transfer data leaks in more places 2016-11-14 11:23:07 -08:00
gitremote-helpers.txt Merge branch 'bw/protocol-v2' 2018-05-08 15:59:16 +09:00
gitrepository-layout.txt doc: move extensions.worktreeConfig to the right place 2018-11-16 14:10:31 +09:00
gitrevisions.txt push doc: correct lies about how push refspecs work 2018-08-31 14:04:06 -07:00
gitsubmodules.txt doc: fix inappropriate monospace formatting 2018-10-23 12:23:09 +09:00
gittutorial-2.txt Documentation: unify bottom "part of git suite" lines 2017-02-09 15:14:01 -08:00
gittutorial.txt treewide: correct several "up-to-date" to "up to date" 2017-08-23 12:17:22 -07:00
gitweb.conf.txt doc: fix indentation of listing blocks in gitweb.conf.txt 2018-10-23 12:23:09 +09:00
gitweb.txt doc: use https links to Wikipedia to avoid http redirects 2017-05-15 13:04:54 +09:00
gitworkflows.txt Merge branch 'km/doc-workflows-typofix' 2018-06-18 10:18:42 -07:00
glossary-content.txt Documentation: spelling and grammar fixes 2018-06-22 14:26:23 -07:00
howto-index.sh
i18n.txt doc: camelCase the i18n config variables to improve readability 2017-07-17 15:11:26 -07:00
install-doc-quick.sh install-doc-quick: allow specifying what ref to install 2017-12-12 16:49:40 -08:00
install-webdoc.sh
line-range-format.txt
lint-gitlink.perl ci: validate "linkgit:" in documentation 2016-05-10 11:15:04 -07:00
mailmap.txt
Makefile Merge branch 'ts/doc-build-manpage-xsl-quietly' into maint 2018-11-21 22:57:53 +09:00
manpage-1.72.xsl
manpage-base-url.xsl.in
manpage-base.xsl
manpage-bold-literal.xsl
manpage-normal.xsl
manpage-quote-apos.xsl
manpage-suppress-sp.xsl
merge-options.txt merge: allow fast-forward when merging a tracked tag 2018-02-16 11:22:43 -08:00
merge-strategies.txt merge: add merge.renames config setting 2018-05-08 16:19:41 +09:00
pretty-formats.txt gpg-interface.c: obtain primary key fingerprint as well 2018-10-23 08:00:43 +09:00
pretty-options.txt Merge branch 'tr/doc-tt' into maint 2016-07-06 13:06:34 -07:00
pull-fetch-param.txt fetch doc: correct grammar in --force docs 2018-09-20 09:40:03 -07:00
rev-list-options.txt Merge branch 'md/exclude-promisor-objects-fix' 2018-11-06 15:50:21 +09:00
revisions.txt Merge branch 'wc/find-commit-with-pattern-on-detached-head' 2018-07-24 14:50:49 -07:00
sequencer.txt
SubmittingPatches SubmittingPatches: mention doc-diff 2018-08-21 12:54:33 -07:00
texi.xsl Documentation: add XSLT to fix DocBook for Texinfo 2017-01-23 10:56:53 -08:00
transfer-data-leaks.txt doc: mention transfer data leaks in more places 2016-11-14 11:23:07 -08:00
urls-remotes.txt Documentation: match underline with the text 2015-10-22 10:16:12 -07:00
urls.txt transport: drop support for git-over-rsync 2016-02-01 13:07:41 -08:00
user-manual.conf
user-manual.txt checkout: describe_detached_head: remove ellipsis after committish 2017-12-06 07:32:40 -08:00