Commit Graph

31 Commits

Author SHA1 Message Date
Junio C Hamano
dd4048d1c7 Merge branch 'en/ort-perf-batch-8'
Rename detection rework continues.

* en/ort-perf-batch-8:
  diffcore-rename: compute dir_rename_guess from dir_rename_counts
  diffcore-rename: limit dir_rename_counts computation to relevant dirs
  diffcore-rename: compute dir_rename_counts in stages
  diffcore-rename: extend cleanup_dir_rename_info()
  diffcore-rename: move dir_rename_counts into dir_rename_info struct
  diffcore-rename: add function for clearing dir_rename_count
  Move computation of dir_rename_count from merge-ort to diffcore-rename
  diffcore-rename: add a mapping of destination names to their indices
  diffcore-rename: provide basic implementation of idx_possible_rename()
  diffcore-rename: use directory rename guided basename comparisons
2021-03-22 14:00:24 -07:00
Junio C Hamano
12bd17521c Merge branch 'en/diffcore-rename'
Performance optimization work on the rename detection continues.

* en/diffcore-rename:
  merge-ort: call diffcore_rename() directly
  gitdiffcore doc: mention new preliminary step for rename detection
  diffcore-rename: guide inexact rename detection based on basenames
  diffcore-rename: complete find_basename_matches()
  diffcore-rename: compute basenames of source and dest candidates
  t4001: add a test comparing basename similarity and content similarity
  diffcore-rename: filter rename_src list when possible
  diffcore-rename: no point trying to find a match better than exact
2021-03-01 14:02:56 -08:00
Elijah Newren
37a2514364 diffcore-rename: use directory rename guided basename comparisons
A previous commit noted that it is very common for people to move files
across directories while keeping their filename the same.  The last few
commits took advantage of this and showed that we can accelerate rename
detection significantly using basenames; since files with the same
basename serve as likely rename candidates, we can check those first and
remove them from the rename candidate pool if they are sufficiently
similar.

Unfortunately, the previous optimization was limited by the fact that
the remaining basenames after exact rename detection are not always
unique.  Many repositories have hundreds of build files with the same
name (e.g. Makefile, .gitignore, build.gradle, etc.), and may even have
hundreds of source files with the same name.  (For example, the linux
kernel has 100 setup.c, 87 irq.c, and 112 core.c files.  A repository at
$DAYJOB has a lot of ObjectFactory.java and Plugin.java files).

For these files with non-unique basenames, we are faced with the task of
attempting to determine or guess which directory they may have been
relocated to.  Such a task is precisely the job of directory rename
detection.  However, there are two catches: (1) the directory rename
detection code has traditionally been part of the merge machinery rather
than diffcore-rename.c, and (2) directory rename detection currently
runs after regular rename detection is complete.  The 1st catch is just
an implementation issue that can be overcome by some code shuffling.
The 2nd requires us to add a further approximation: we only have access
to exact renames at this point, so we need to do directory rename
detection based on just exact renames.  In some cases we won't have
exact renames, in which case this extra optimization won't apply.  We
also choose to not apply the optimization unless we know that the
underlying directory was removed, which will require extra data to be
passed in to diffcore_rename_extended().  Also, even if we get a
prediction about which directory a file may have relocated to, we will
still need to check to see if there is a file in the predicted
directory, and then compare the two files to see if they meet the higher
min_basename_score threshold required for marking the two files as
renames.

This commit introduces an idx_possible_rename() function which will
do this directory rename detection for us and give us the index within
rename_dst of the resulting filename.  For now, this function is
hardcoded to return -1 (not found) and just hooks up how its results
would be used once we have a more complete implementation in place.

Reviewed-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-02-26 17:53:11 -08:00
Junio C Hamano
1eb4136ac2 diff: --{rotate,skip}-to=<path>
In the implementation of "git difftool", there is a case where the
user wants to start viewing the diffs at a specific path and
continue on to the rest, optionally wrapping around to the
beginning.  Since it is somewhat cumbersome to implement such a
feature as a post-processing step of "git diff" output, let's
support it internally with two new options.

 - "git diff --rotate-to=C", when the resulting patch would show
   paths A B C D E without the option, would "rotate" the paths to
   shows patch to C D E A B instead.  It is an error when there is
   no patch for C is shown.

 - "git diff --skip-to=C" would instead "skip" the paths before C,
   and shows patch to C D E.  Again, it is an error when there is no
   patch for C is shown.

 - "git log [-p]" also accepts these two options, but it is not an
   error if there is no change to the specified path.  Instead, the
   set of output paths are rotated or skipped to the specified path
   or the first path that sorts after the specified path.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-02-16 09:30:42 -08:00
Elijah Newren
07c9a7fcb5 gitdiffcore doc: mention new preliminary step for rename detection
The last few patches have introduced a new preliminary step when rename
detection is on but both break detection and copy detection are off.
Document this new step.  While we're at it, add a testcase that checks
the new behavior as well.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-02-15 18:02:16 -08:00
Thomas Braun
e0e7cb8080 log -G: ignore binary files
The -G<regex> option of log looks for the differences whose patch text
contains added/removed lines that match regex.

Currently -G looks also into patches of binary files (which
according to [1]) is binary as well.

This has a couple of issues:

- It makes the pickaxe search slow. In a proprietary repository of the
  author with only ~5500 commits and a total .git size of ~300MB
  searching takes ~13 seconds

    $time git log -Gwave > /dev/null

    real    0m13,241s
    user    0m12,596s
    sys     0m0,644s

  whereas when we ignore binary files with this patch it takes ~4s

    $time ~/devel/git/git log -Gwave > /dev/null

    real    0m3,713s
    user    0m3,608s
    sys     0m0,105s

  which is a speedup of more than fourfold.

- The internally used algorithm for generating patch text is based on
  xdiff and its states in [1]

  > The output format of the binary patch file is proprietary
  > (and binary) and it is basically a collection of copy and insert
  > commands [..]

  which means that the current format could change once the internal
  algorithm is changed as the format is not standardized. In addition
  the git binary patch format used for preparing patches for git apply
  is *different* from the xdiff format as can be seen by comparing

  git log -p -a

    commit 6e95bf4bafccf14650d02ab57f3affe669be10cf
    Author: A U Thor <author@example.com>
    Date:   Thu Apr 7 15:14:13 2005 -0700

        modify binary file

    diff --git a/data.bin b/data.bin
    index f414c84..edfeb6f 100644
    --- a/data.bin
    +++ b/data.bin
    @@ -1,2 +1,4 @@
     a
     a^@a
    +a
    +a^@a

  with git log --binary

    commit 6e95bf4bafccf14650d02ab57f3affe669be10cf
    Author: A U Thor <author@example.com>
    Date:   Thu Apr 7 15:14:13 2005 -0700

        modify binary file

    diff --git a/data.bin b/data.bin
    index f414c84bd3aa25fa07836bb1fb73db784635e24b..edfeb6f501[..]
    GIT binary patch
    literal 12
    QcmYe~N@Pgn0zx1O01)N^ZvX%Q

    literal 6
    NcmYe~N@Pgn0ssWg0XP5v

  which seems unexpected.

To resolve these issues this patch makes -G<regex> ignore binary files
by default. Textconv filters are supported and also -a/--text for
getting the old and broken behaviour back.

The -S<block of text> option of log looks for differences that changes
the number of occurrences of the specified block of text (i.e.
addition/deletion) in a file. As we want to keep the current behaviour,
add a test to ensure it stays that way.

[1]: http://www.xmailserver.org/xdiff.html

Signed-off-by: Thomas Braun <thomas.braun@virtuell-zuhause.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-12-26 14:59:37 -08:00
Patrick Steinhardt
b803ae4427 docs/diffcore: unquote "Complete Rewrites" in headers
The gitdiffcore documentation quotes the term "Complete Rewrites" in
headers for no real gain. This would make sense if the term could be
easily confused if not properly grouped together. But actually, the term
is quite obvious and thus does not really need any quoting, especially
regarding that it is not used anywhere else.

But more importanly, this brings up a bug when rendering man pages: when
trying to render quotes inside of a section header, we end up with
quotes which have been misaligned to the end of line. E.g.

    diffcore-break: For Splitting Up Complete Rewrites
    --------------------------------------------------

renders as

    DIFFCORE-BREAK: FOR SPLITTING UP  COMPLETE REWRITES""

, which is obviously wrong. While this is fixable for the man pages by
using double-quotes (e.g. ""COMPLETE REWRITES""), this again breaks it
for our generated HTML pages.

So fix the issue by simply dropping quotes inside of section headers,
which is currently only done for the term "Complete Rewrites".

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-02-28 11:34:38 -08:00
Patrick Steinhardt
1aa38199af docs/diffcore: fix grammar in diffcore-rename header
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-02-28 11:34:36 -08:00
Junio C Hamano
153a33f98c Merge branch 'sb/doc-unify-bottom'
Doc clean-up.

* sb/doc-unify-bottom:
  Documentation: unify bottom "part of git suite" lines
2017-02-15 12:54:20 -08:00
Stefan Beller
941b9c5270 Documentation: unify bottom "part of git suite" lines
We currently have 168 man pages that mention they are part of Git, you
can check yourself easily via:
  $ git grep "Part of the linkgit:git\[1\] suite" |wc -l
  168
However some have a trailing period, i.e.
  $ git grep "Part of the linkgit:git\[1\] suite." |wc -l
  8

Unify the bottom line in all man pages to not end with a period.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-02-09 15:14:01 -08:00
Matthieu Moy
bcf9626a71 doc: typeset long command-line options as literal
Similarly to the previous commit, use backquotes instead of
forward-quotes, for long options.

This was obtained with:

  perl -pi -e "s/'(--[a-z][a-z=<>-]*)'/\`\$1\`/g" *.txt

and manual tweak to remove false positive in ascii-art (o'--o'--o' to
describe rewritten history).

Signed-off-by: Matthieu Moy <Matthieu.Moy@imag.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-06-28 08:36:45 -07:00
Jeff King
1c262bb7b2 doc: convert \--option to --option
Older versions of AsciiDoc would convert the "--" in
"--option" into an emdash. According to 565e135
(Documentation: quote double-dash for AsciiDoc, 2011-06-29),
this is fixed in AsciiDoc 8.3.0. According to bf17126, we
don't support anything older than 8.4.1 anyway, so we no
longer need to worry about quoting.

Even though this does not change the output at all, there
are a few good reasons to drop the quoting:

  1. It makes the source prettier to read.

  2. We don't quote consistently, which may be confusing when
     reading the source.

  3. Asciidoctor does not like the quoting, and renders a
     literal backslash.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-05-12 22:14:46 -07:00
Ramkumar Ramachandra
5bc3f0b567 diffcore-pickaxe doc: document -S and -G properly
The documentation of -S and -G is very sketchy.  Completely rewrite the
sections in Documentation/diff-options.txt and
Documentation/gitdiffcore.txt.

References:
52e9578 ([PATCH] Introducing software archaeologist's tool "pickaxe".)
f506b8e (git log/diff: add -G<regexp> that greps in the patch text)

Inputs-from: Phil Hord <phil.hord@gmail.com>
Co-authored-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Ramkumar Ramachandra <artagnon@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-06-03 10:53:11 -07:00
Thomas Ackermann
d5fa1f1a69 The name of the hash function is "SHA-1", not "SHA1"
Use "SHA-1" instead of "SHA1" whenever we talk about the hash function.
When used as a programming symbol, we keep "SHA1".

Signed-off-by: Thomas Ackermann <th.acker@arcor.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-04-15 11:08:37 -07:00
Thomas Ackermann
2de9b71138 Documentation: the name of the system is 'Git', not 'git'
Signed-off-by: Thomas Ackermann <th.acker@arcor.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-02-01 13:53:33 -08:00
Jeff King
6cf378f0cb docs: stop using asciidoc no-inline-literal
In asciidoc 7, backticks like `foo` produced a typographic
effect, but did not otherwise affect the syntax. In asciidoc
8, backticks introduce an "inline literal" inside which markup
is not interpreted. To keep compatibility with existing
documents, asciidoc 8 has a "no-inline-literal" attribute to
keep the old behavior. We enabled this so that the
documentation could be built on either version.

It has been several years now, and asciidoc 7 is no longer
in wide use. We can now decide whether or not we want
inline literals on their own merits, which are:

  1. The source is much easier to read when the literal
     contains punctuation. You can use `master~1` instead
     of `master{tilde}1`.

  2. They are less error-prone. Because of point (1), we
     tend to make mistakes and forget the extra layer of
     quoting.

This patch removes the no-inline-literal attribute from the
Makefile and converts every use of backticks in the
documentation to an inline literal (they must be cleaned up,
or the example above would literally show "{tilde}" in the
output).

Problematic sites were found by grepping for '`.*[{\\]' and
examined and fixed manually. The results were then verified
by comparing the output of "html2text" on the set of
generated html pages. Doing so revealed that in addition to
making the source more readable, this patch fixes several
formatting bugs:

  - HTML rendering used the ellipsis character instead of
    literal "..." in code examples (like "git log A...B")

  - some code examples used the right-arrow character
    instead of '->' because they failed to quote

  - api-config.txt did not quote tilde, and the resulting
    HTML contained a bogus snippet like:

      <tt><sub></tt> foo <tt></sub>bar</tt>

    which caused some parsers to choke and omit whole
    sections of the page.

  - git-commit.txt confused ``foo`` (backticks inside a
    literal) with ``foo'' (matched double-quotes)

  - mentions of `A U Thor <author@example.com>` used to
    erroneously auto-generate a mailto footnote for
    author@example.com

  - the description of --word-diff=plain incorrectly showed
    the output as "[-removed-] and {added}", not "{+added+}".

  - using "prime" notation like:

      commit `C` and its replacement `C'`

    confused asciidoc into thinking that everything between
    the first backtick and the final apostrophe were meant
    to be inside matched quotes

  - asciidoc got confused by the escaping of some of our
    asterisks. In particular,

      `credential.\*` and `credential.<url>.\*`

    properly escaped the asterisk in the first case, but
    literally passed through the backslash in the second
    case.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-04-26 13:19:06 -07:00
Martin von Zweigbergk
7791a1d9b9 Documentation: use [verse] for SYNOPSIS sections
The SYNOPSIS sections of most commands that span several lines already
use [verse] to retain line breaks. Most commands that don't span
several lines seem not to use [verse]. In the HTML output, [verse]
does not only preserve line breaks, but also makes the section
indented, which causes a slight inconsistency between commands that
use [verse] and those that don't. Use [verse] in all SYNOPSIS sections
for consistency.

Also remove the blank lines from git-fetch.txt and git-rebase.txt to
align with the other man pages. In the case of git-rebase.txt, which
already uses [verse], the blank line makes the [verse] not apply to
the last line, so removing the blank line also makes the formatting
within the document more consistent.

While at it, add single quotes to 'git cvsimport' for consistency with
other commands.

Signed-off-by: Martin von Zweigbergk <martin.von.zweigbergk@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-07-06 14:26:26 -07:00
Junio C Hamano
a2c2cef0cd gitdiffcore doc: update pickaxe description
The old text described the original design (one side does not have it at
all while the other side has it); this was later amended to check if the
number of occurrences changed, which is what we currently do with -S.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-08-31 14:28:20 -07:00
Michael J Gruber
d07ef71575 Documentation/gitdiffcore: fix order in pickaxe description
Reverse the order of "origin" and "result" so that the sentence
really describes an addition rather than a removal.

Signed-off-by: Michael J Gruber <git@drmicha.warpmail.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-05-18 21:55:11 -07:00
Junio C Hamano
d16a5dafdc Merge branch 'maint-1.6.6' into maint
* maint-1.6.6:
  Documentation/git-clone: Transform description list into item list
  Documentation/urls: Remove spurious example markers
  Documentation/gitdiffcore: Remove misleading date in heading
  Documentation/git-reflog: Fix formatting of command lists
2010-03-21 17:00:22 -07:00
Michael J Gruber
dddfb3f126 Documentation/gitdiffcore: Remove misleading date in heading
Ever since the automatic conversion into man form, the heading
contained a misidentified subheading reading "June 2005".
Remove this since the documentation is more recent, and the correct
date is in the footer.

Signed-off-by: Michael J Gruber <git@drmicha.warpmail.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-03-21 14:40:18 -07:00
Thomas Rast
0b444cdb19 Documentation: spell 'git cmd' without dash throughout
The documentation was quite inconsistent when spelling 'git cmd' if it
only refers to the program, not to some specific invocation syntax:
both 'git-cmd' and 'git cmd' spellings exist.

The current trend goes towards dashless forms, and there is precedent
in 647ac70 (git-svn.txt: stop using dash-form of commands.,
2009-07-07) to actively eliminate the dashed variants.

Replace 'git-cmd' with 'git cmd' throughout, except where git-shell,
git-cvsserver, git-upload-pack, git-receive-pack, and
git-upload-archive are concerned, because those really live in the
$PATH.
2010-01-10 13:01:28 +01:00
Yann Dirson
264e0b9a3c Bust the ghost of long-defunct diffcore-pathspec.
This concept was retired by 77882f6 (Retire diffcore-pathspec.,
2006-04-10), more than 2 years ago.

Signed-off-by: Yann Dirson <ydirson@altern.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-09-19 19:48:30 -07:00
Jonathan Nieder
2fd02c92db manpages: italicize nongit command names (if they are in teletype font)
Some manual pages use teletype font to set command names. We
change them to use italics, instead.  This creates a visual
distinction between names of commands and command lines that
can be typed at the command line. It is also more consistent
with other man pages outside Git.

In this patch, the commands named are non-git commands like bash.

Signed-off-by: Jonathan Nieder <jrnieder@uchicago.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-07-05 11:24:40 -07:00
Jonathan Nieder
ba020ef5eb manpages: italicize git command names (which were in teletype font)
The names of git commands are not meant to be entered at the
commandline; they are just names. So we render them in italics,
as is usual for command names in manpages.

Using

	doit () {
	  perl -e 'for (<>) { s/\`(git-[^\`.]*)\`/'\''\1'\''/g; print }'
	}
	for i in git*.txt config.txt diff*.txt blame*.txt fetch*.txt i18n.txt \
	        merge*.txt pretty*.txt pull*.txt rev*.txt urls*.txt
	do
	  doit <"$i" >"$i+" && mv "$i+" "$i"
	done
	git diff

.

Signed-off-by: Jonathan Nieder <jrnieder@uchicago.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-07-05 11:24:40 -07:00
Jonathan Nieder
0979c10649 manpages: italicize command names
This includes nongit commands like RCS 'merge'.  This patch only
italicizes names of commands if they had no formatting before.

Signed-off-by: Jonathan Nieder <jrnieder@uchicago.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-07-05 11:24:39 -07:00
Jonathan Nieder
877276d4d3 gitdiffcore(7): fix awkward wording
The phrase "diff outputs" sounds awkward to my ear (I think
"output" is meant to be used as a substantive noun.)

Signed-off-by: Jonathan Nieder <jrnieder@uchicago.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-07-05 11:24:39 -07:00
Jonathan Nieder
483bc4f045 Documentation formatting and cleanup
Following what appears to be the predominant style, format
names of commands and commandlines both as `teletype text`.

While we're at it, add articles ("a" and "the") in some
places, italicize the name of the command in the manual page
synopsis line, and add a comma or two where it seems appropriate.

Signed-off-by: Jonathan Nieder <jrnieder@uchicago.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-07-01 17:20:16 -07:00
Jonathan Nieder
b1889c36d8 Documentation: be consistent about "git-" versus "git "
Since the git-* commands are not installed in $(bindir), using
"git-command <parameters>" in examples in the documentation is
not a good idea. On the other hand, it is nice to be able to
refer to each command using one hyphenated word. (There is no
escaping it, anyway: man page names cannot have spaces in them.)

This patch retains the dash in naming an operation, command,
program, process, or action. Complete command lines that can
be entered at a shell (i.e., without options omitted) are
made to use the dashless form.

The changes consist only of replacing some spaces with hyphens
and vice versa. After a "s/ /-/g", the unpatched and patched
versions are identical.

Signed-off-by: Jonathan Nieder <jrnieder@uchicago.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-07-01 17:20:15 -07:00
Christian Couder
9e1f0a85c6 documentation: move git(7) to git(1)
As the "git" man page describes the "git" command at the end-user
level, it seems better to move it to man section 1.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-06-06 11:18:28 -07:00
Christian Couder
30eba7bf2c documentation: convert "diffcore" and "repository-layout" to man pages
This patch renames the following documents and at the same time converts
them to the man format:

diffcore.txt          -> gitdiffcore.txt		(man section 7)
repository-layout.txt -> gitrepository-layout.txt	(man section 5)

Other documents that reference the above ones are changed accordingly.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-06-06 11:14:52 -07:00