Teach "git blame" and "git annotate" the --compaction-heuristic and
--indent-heuristic options that are now supported by "git diff".
Also teach them to honor the `diff.compactionHeuristic` and
`diff.indentHeuristic` configuration options.
It would be conceivable to introduce separate configuration options for
"blame" and "annotate"; for example `blame.compactionHeuristic` and
`blame.indentHeuristic`. But it would be confusing to users if blame
output is inconsistent with diff output, so it makes more sense for them
to respect the same configuration.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Some groups of added/deleted lines in diffs can be slid up or down,
because lines at the edges of the group are not unique. Picking good
shifts for such groups is not a matter of correctness but definitely has
a big effect on aesthetics. For example, consider the following two
diffs. The first is what standard Git emits:
--- a/9c572b21dd090a1e5c5bb397053bf8043ffe7fb4:git-send-email.perl
+++ b/6dcfa306f2b67b733a7eb2d7ded1bc9987809edb:git-send-email.perl
@@ -231,6 +231,9 @@ if (!defined $initial_reply_to && $prompting) {
}
if (!$smtp_server) {
+ $smtp_server = $repo->config('sendemail.smtpserver');
+}
+if (!$smtp_server) {
foreach (qw( /usr/sbin/sendmail /usr/lib/sendmail )) {
if (-x $_) {
$smtp_server = $_;
The following diff is equivalent, but is obviously preferable from an
aesthetic point of view:
--- a/9c572b21dd090a1e5c5bb397053bf8043ffe7fb4:git-send-email.perl
+++ b/6dcfa306f2b67b733a7eb2d7ded1bc9987809edb:git-send-email.perl
@@ -230,6 +230,9 @@ if (!defined $initial_reply_to && $prompting) {
$initial_reply_to =~ s/(^\s+|\s+$)//g;
}
+if (!$smtp_server) {
+ $smtp_server = $repo->config('sendemail.smtpserver');
+}
if (!$smtp_server) {
foreach (qw( /usr/sbin/sendmail /usr/lib/sendmail )) {
if (-x $_) {
This patch teaches Git to pick better positions for such "diff sliders"
using heuristics that take the positions of nearby blank lines and the
indentation of nearby lines into account.
The existing Git code basically always shifts such "sliders" as far down
in the file as possible. The only exception is when the slider can be
aligned with a group of changed lines in the other file, in which case
Git favors depicting the change as one add+delete block rather than one
add and a slightly offset delete block. This naive algorithm often
yields ugly diffs.
Commit d634d61ed6 improved the situation somewhat by preferring to
position add/delete groups to make their last line a blank line, when
that is possible. This heuristic does more good than harm, but (1) it
can only help if there are blank lines in the right places, and (2)
always picks the last blank line, even if there are others that might be
better. The end result is that it makes perhaps 1/3 as many errors as
the default Git algorithm, but that still leaves a lot of ugly diffs.
This commit implements a new and much better heuristic for picking
optimal "slider" positions using the following approach: First observe
that each hypothetical positioning of a diff slider introduces two
splits: one between the context lines preceding the group and the first
added/deleted line, and the other between the last added/deleted line
and the first line of context following it. It tries to find the
positioning that creates the least bad splits.
Splits are evaluated based only on the presence and locations of nearby
blank lines, and the indentation of lines near the split. Basically, it
prefers to introduce splits adjacent to blank lines, between lines that
are indented less, and between lines with the same level of indentation.
In more detail:
1. It measures the following characteristics of a proposed splitting
position in a `struct split_measurement`:
* the number of blank lines above the proposed split
* whether the line directly after the split is blank
* the number of blank lines following that line
* the indentation of the nearest non-blank line above the split
* the indentation of the line directly below the split
* the indentation of the nearest non-blank line after that line
2. It combines the measured attributes using a bunch of
empirically-optimized weighting factors to derive a `struct
split_score` that measures the "badness" of splitting the text at
that position.
3. It combines the `split_score` for the top and the bottom of the
slider at each of its possible positions, and selects the position
that has the best `split_score`.
I determined the initial set of weighting factors by collecting a corpus
of Git histories from 29 open-source software projects in various
programming languages. I generated many diffs from this corpus, and
determined the best positioning "by eye" for about 6600 diff sliders. I
used about half of the repositories in the corpus (corresponding to
about 2/3 of the sliders) as a training set, and optimized the weights
against this corpus using a crude automated search of the parameter
space to get the best agreement with the manually-determined values.
Then I tested the resulting heuristic against the full corpus. The
results are summarized in the following table, in column `indent-1`:
| repository | count | Git 2.9.0 | compaction | compaction-fixed | indent-1 | indent-2 |
| --------------------- | ----- | -------------- | -------------- | ---------------- | -------------- | -------------- |
| afnetworking | 109 | 89 (81.7%) | 37 (33.9%) | 37 (33.9%) | 2 (1.8%) | 2 (1.8%) |
| alamofire | 30 | 18 (60.0%) | 14 (46.7%) | 15 (50.0%) | 0 (0.0%) | 0 (0.0%) |
| angular | 184 | 127 (69.0%) | 39 (21.2%) | 23 (12.5%) | 5 (2.7%) | 5 (2.7%) |
| animate | 313 | 2 (0.6%) | 2 (0.6%) | 2 (0.6%) | 2 (0.6%) | 2 (0.6%) |
| ant | 380 | 356 (93.7%) | 152 (40.0%) | 148 (38.9%) | 15 (3.9%) | 15 (3.9%) | *
| bugzilla | 306 | 263 (85.9%) | 109 (35.6%) | 99 (32.4%) | 14 (4.6%) | 15 (4.9%) | *
| corefx | 126 | 91 (72.2%) | 22 (17.5%) | 21 (16.7%) | 6 (4.8%) | 6 (4.8%) |
| couchdb | 78 | 44 (56.4%) | 26 (33.3%) | 28 (35.9%) | 6 (7.7%) | 6 (7.7%) | *
| cpython | 937 | 158 (16.9%) | 50 (5.3%) | 49 (5.2%) | 5 (0.5%) | 5 (0.5%) | *
| discourse | 160 | 95 (59.4%) | 42 (26.2%) | 36 (22.5%) | 18 (11.2%) | 13 (8.1%) |
| docker | 307 | 194 (63.2%) | 198 (64.5%) | 253 (82.4%) | 8 (2.6%) | 8 (2.6%) | *
| electron | 163 | 132 (81.0%) | 38 (23.3%) | 39 (23.9%) | 6 (3.7%) | 6 (3.7%) |
| git | 536 | 470 (87.7%) | 73 (13.6%) | 78 (14.6%) | 16 (3.0%) | 16 (3.0%) | *
| gitflow | 127 | 0 (0.0%) | 0 (0.0%) | 0 (0.0%) | 0 (0.0%) | 0 (0.0%) |
| ionic | 133 | 89 (66.9%) | 29 (21.8%) | 38 (28.6%) | 1 (0.8%) | 1 (0.8%) |
| ipython | 482 | 362 (75.1%) | 167 (34.6%) | 169 (35.1%) | 11 (2.3%) | 11 (2.3%) | *
| junit | 161 | 147 (91.3%) | 67 (41.6%) | 66 (41.0%) | 1 (0.6%) | 1 (0.6%) | *
| lighttable | 15 | 5 (33.3%) | 0 (0.0%) | 2 (13.3%) | 0 (0.0%) | 0 (0.0%) |
| magit | 88 | 75 (85.2%) | 11 (12.5%) | 9 (10.2%) | 1 (1.1%) | 0 (0.0%) |
| neural-style | 28 | 0 (0.0%) | 0 (0.0%) | 0 (0.0%) | 0 (0.0%) | 0 (0.0%) |
| nodejs | 781 | 649 (83.1%) | 118 (15.1%) | 111 (14.2%) | 4 (0.5%) | 5 (0.6%) | *
| phpmyadmin | 491 | 481 (98.0%) | 75 (15.3%) | 48 (9.8%) | 2 (0.4%) | 2 (0.4%) | *
| react-native | 168 | 130 (77.4%) | 79 (47.0%) | 81 (48.2%) | 0 (0.0%) | 0 (0.0%) |
| rust | 171 | 128 (74.9%) | 30 (17.5%) | 27 (15.8%) | 16 (9.4%) | 14 (8.2%) |
| spark | 186 | 149 (80.1%) | 52 (28.0%) | 52 (28.0%) | 2 (1.1%) | 2 (1.1%) |
| tensorflow | 115 | 66 (57.4%) | 48 (41.7%) | 48 (41.7%) | 5 (4.3%) | 5 (4.3%) |
| test-more | 19 | 15 (78.9%) | 2 (10.5%) | 2 (10.5%) | 1 (5.3%) | 1 (5.3%) | *
| test-unit | 51 | 34 (66.7%) | 14 (27.5%) | 8 (15.7%) | 2 (3.9%) | 2 (3.9%) | *
| xmonad | 23 | 22 (95.7%) | 2 (8.7%) | 2 (8.7%) | 1 (4.3%) | 1 (4.3%) | *
| --------------------- | ----- | -------------- | -------------- | ---------------- | -------------- | -------------- |
| totals | 6668 | 4391 (65.9%) | 1496 (22.4%) | 1491 (22.4%) | 150 (2.2%) | 144 (2.2%) |
| totals (training set) | 4552 | 3195 (70.2%) | 1053 (23.1%) | 1061 (23.3%) | 86 (1.9%) | 88 (1.9%) |
| totals (test set) | 2116 | 1196 (56.5%) | 443 (20.9%) | 430 (20.3%) | 64 (3.0%) | 56 (2.6%) |
In this table, the numbers are the count and percentage of human-rated
sliders that the corresponding algorithm got *wrong*. The columns are
* "repository" - the name of the repository used. I used the diffs
between successive non-merge commits on the HEAD branch of the
corresponding repository.
* "count" - the number of sliders that were human-rated. I chose most,
but not all, sliders to rate from those among which the various
algorithms gave different answers.
* "Git 2.9.0" - the default algorithm used by `git diff` in Git 2.9.0.
* "compaction" - the heuristic used by `git diff --compaction-heuristic`
in Git 2.9.0.
* "compaction-fixed" - the heuristic used by `git diff
--compaction-heuristic` after the fixes from earlier in this patch
series. Note that the results are not dramatically different than
those for "compaction". Both produce non-ideal diffs only about 1/3 as
often as the default `git diff`.
* "indent-1" - the new `--indent-heuristic` algorithm, using the first
set of weighting factors, determined as described above.
* "indent-2" - the new `--indent-heuristic` algorithm, using the final
set of weighting factors, determined as described below.
* `*` - indicates that repo was part of training set used to determine
the first set of weighting factors.
The fact that the heuristic performed nearly as well on the test set as
on the training set in column "indent-1" is a good indication that the
heuristic was not over-trained. Given that fact, I ran a second round of
optimization, using the entire corpus as the training set. The resulting
set of weights gave the results in column "indent-2". These are the
weights included in this patch.
The final result gives consistently and significantly better results
across the whole corpus than either `git diff` or `git diff
--compaction-heuristic`. It makes only about 1/30 as many errors as the
former and about 1/10 as many errors as the latter. (And a good fraction
of the remaining errors are for diffs that involve weirdly-formatted
code, sometimes apparently machine-generated.)
The tools that were used to do this optimization and analysis, along
with the human-generated data values, are recorded in a separate project
[1].
This patch adds a new command-line option `--indent-heuristic`, and a
new configuration setting `diff.indentHeuristic`, that activate this
heuristic. This interface is only meant for testing purposes, and should
be finalized before including this change in any release.
[1] https://github.com/mhagger/diff-slider-tools
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
HTTP transport gained an option to produce more detailed debugging
trace.
* ep/http-curl-trace:
imap-send.c: introduce the GIT_TRACE_CURL enviroment variable
http.c: implement the GIT_TRACE_CURL environment variable
The documentation tries to consistently spell "GPG"; when
referring to the specific program name, "gpg" is used.
* dn/gpg-doc:
Documentation: GPG capitalization
The documentation set has been updated so that literal commands,
configuration variables and environment variables are consistently
typeset in fixed-width font and bold in manpages.
* tr/doc-tt:
doc: change configuration variables format
doc: more consistency in environment variables format
doc: change environment variables format
doc: clearer rule about formatting literals
"git worktree add" learned that '-' can be used as a short-hand for
"@{-1}", the previous branch.
* jg/dash-is-last-branch-in-worktree-add:
worktree: allow "-" short-hand for @{-1} in add command
An upstream project can make a recommendation to shallowly clone
some submodules in the .gitmodules file it ships.
* sb/submodule-recommend-shallowness:
submodule update: learn `--[no-]recommend-shallow` option
submodule-config: keep shallow recommendation around
"git fast-import" learned the same performance trick to avoid
creating too small a packfile as "git fetch" and "git push" have,
using *.unpackLimit configuration.
* ew/fast-import-unpack-limit:
fast-import: invalidate pack_id references after loosening
fast-import: implement unpack limit
When "GPG" is used in a sentence it is now consistently capitalized.
When referring to the binary it is left as "gpg".
Signed-off-by: David Nicolson <david.nicolson@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add example usage to the git-svn documentation.
Reported-by: Joseph Pecoraro <pecoraro@apple.com>
Signed-off-by: Alfred Perlstein <alfred@freebsd.org>
Reviewed-by: Eric Wong <e@80x24.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This was changed in 10a6cc8 (fetch --prune: Run prune before
fetching, 2014-01-02), but it seems that nobody in that
discussion realized we were advertising the "after"
explicitly.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It turns out that the earlier effort to update the heuristics may
want to use a bit more time to mature. Turn it off by default.
* jk/diff-compact-heuristic:
diff: disable compaction heuristic for now
http://lkml.kernel.org/g/20160610075043.GA13411@sigill.intra.peff.net
reports that a change to add a new "function" with common ending
with the existing one at the end of the file is shown like this:
def foo
do_foo_stuff()
+ common_ending()
+end
+
+def bar
+ do_bar_stuff()
+
common_ending()
end
when the new heuristic is in use. In reality, the change is to add
the blank line before "def bar" and everything below, which is what
the code without the new heuristic shows.
Disable the heuristics by default, and resurrect the documentation
for the option and the configuration variables, while clearly
marking the feature as still experimental.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This change configuration variables that where in italic style
to monospace font according to the guideline. It was obtained with
grep '[[:alpha:]]*\.[[:alpha:]]*::$' config.txt | \
sed -e 's/::$//' -e 's/\./\\\\./' | \
xargs -iP perl -pi -e "s/\'P\'/\`P\`/g" ./*.txt
Signed-off-by: Tom Russello <tom.russello@grenoble-inp.org>
Signed-off-by: Erwan Mathoniere <erwan.mathoniere@grenoble-inp.org>
Signed-off-by: Samuel Groot <samuel.groot@grenoble-inp.org>
Signed-off-by: Matthieu Moy <matthieu.moy@grenoble-inp.fr>
Reviewed-by: Matthieu Moy <Matthieu.Moy@imag.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Wrap with backticks (monospaced font) unwrapped or single-quotes wrapped
(italic type) environment variables which are followed by the word
"environment". It was obtained with:
perl -pi -e "s/\'?(\\\$?[0-9A-Z\_]+)\'?(?= environment ?)/\`\1\`/g" *.txt
One of the main purposes is to stick to the CodingGuidelines as possible so
that people writting new documentation by mimicking the existing are more likely
to have it right (even if they didn't read the CodingGuidelines).
Signed-off-by: Tom Russello <tom.russello@grenoble-inp.org>
Signed-off-by: Erwan Mathoniere <erwan.mathoniere@grenoble-inp.org>
Signed-off-by: Samuel Groot <samuel.groot@grenoble-inp.org>
Signed-off-by: Matthieu Moy <matthieu.moy@grenoble-inp.fr>
Reviewed-by: Matthieu Moy <Matthieu.Moy@imag.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This change GIT_* variables that where in italic style to monospaced font
according to the guideline. It was obtained with
perl -pi -e "s/\'(GIT_.*?)\'/\`\1\`/g" *.txt
One of the main purposes is to stick to the CodingGuidelines as possible so
that people writting new documentation by mimicking the existing are more likely
to have it right (even if they didn't read the CodingGuidelines).
Signed-off-by: Tom Russello <tom.russello@grenoble-inp.org>
Signed-off-by: Erwan Mathoniere <erwan.mathoniere@grenoble-inp.org>
Signed-off-by: Samuel Groot <samuel.groot@grenoble-inp.org>
Signed-off-by: Matthieu Moy <matthieu.moy@grenoble-inp.fr>
Reviewed-by: Matthieu Moy <Matthieu.Moy@imag.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Make the guideline text that we want for our documentation clearer.
Signed-off-by: Tom Russello <tom.russello@grenoble-inp.org>
Signed-off-by: Erwan Mathoniere <erwan.mathoniere@grenoble-inp.org>
Signed-off-by: Samuel Groot <samuel.groot@grenoble-inp.org>
Signed-off-by: Matthieu Moy <matthieu.moy@grenoble-inp.fr>
Reviewed-by: Matthieu Moy <Matthieu.Moy@imag.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A couple of bugs around core.autocrlf have been fixed.
* tb/core-eol-fix:
convert.c: ident + core.autocrlf didn't work
t0027: test cases for combined attributes
convert: allow core.autocrlf=input and core.eol=crlf
t0027: make commit_chk_wrnNNO() reliable
CSS is widely used, motivating it being included as a built-in pattern.
It must be noted that the word_regex for CSS (i.e. the regex defining
what is a word in the language) does not consider '.' and '#' characters
(in CSS selectors) to be part of the word. This behavior is documented
by the test t/t4018/css-rule.
The logic behind this behavior is the following: identifiers in CSS
selectors are identifiers in a HTML/XML document. Therefore, the '.'/'#'
character are not part of the identifier, but an indicator of the nature
of the identifier in HTML/XML (class or id). Diffing ".class1" and
".class2" must show that the class name is changed, but we still are
selecting a class.
Logic behind the "pattern" regex is:
1. reject lines ending with a colon/semicolon (properties)
2. if a line begins with a name in column 1, pick the whole line
Credits to Johannes Sixt (j6t@kdbg.org) for the pattern regex and most
of the tests.
Signed-off-by: William Duclot <william.duclot@ensimag.grenoble-inp.fr>
Signed-off-by: Matthieu Moy <matthieu.moy@grenoble-inp.fr>
Reviewed-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The manpage output of our documentation did not render well in
terminal; typeset literals in bold by default to make them stand
out more.
* em/man-bold-literal:
Documentation: bold literals in man
"git cherry-pick --help" had three instances of word "behavior",
one of which was spelled "behaviour", which is updated to match the
other two.
* pa/cherry-pick-doc-typo:
git-cherry-pick.txt: correct a small typo
Correct faulty recommendation to use "git submodule deinit ." when
de-initialising all submodules, which would result in a strange
error message in a pathological corner case.
* sb/submodule-deinit-all:
submodule deinit: require '--all' instead of '.' for all submodules
"http.cookieFile" configuration variable clearly wants a pathname,
but we forgot to treat it as such by e.g. applying tilde expansion.
* bn/http-cookiefile-config:
http: expand http.cookieFile as a path
Documentation: config: improve word ordering for http.cookieFile
Since `git worktree add` uses `git checkout` when `[<branch>]` is used,
and `git checkout -` is already supported, it makes sense to allow the
same shortcut in `git worktree add`.
Signed-off-by: Jordan DE GEA <jordan.de-gea@grenoble-inp.org>
Signed-off-by: Matthieu Moy <Matthieu.Moy@grenoble-inp.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Backticks are emphasized through monospaced styling in the HTML
version of Git documentation. But they were left unstyled in the
manual pages.
To make the man pages more comfortably read, `MAN_BOLD_LITERAL` was
added by 5121a6d (Documentation: option to render literal text as
bold for manpages, 2009-03-27). It allowed the user to build the
manpages with literals in bold style.
For precaution it was not set by default back then.
Since 79c461d (docs: default to more modern toolset, 2010-11-19), it
is assumed ASCIIDOC 8 and at least docbook-xsl 1.73 are used, so the
need for compatibility concern is much lessor now.
Remove `MAN_BOLD_LITERAL`, and typeset literals as bold by default .
Add `NO_MAN_BOLD_LITERAL`, a new Makefile option, disabling this
feature when defined.
Signed-off-by: Erwan MATHONIERE <erwan.mathoniere@grenoble-inp.org>
Signed-off-by: Samuel GROOT <samuel.groot@grenoble-inp.org>
Signed-off-by: Tom RUSSELLO <tom.russello@grenoble-inp.org>
Signed-off-by: Matthieu MOY <matthieu.moy@grenoble-inp.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Use the parse-options API rather than a hand-rolled option parser.
Description for --stateless-rpc and --advertise-refs come from
42526b4 (Add stateless RPC options to upload-pack,
receive-pack, 2009-10-30).
Signed-off-by: Antoine Queru <antoine.queru@grenoble-inp.org>
Signed-off-by: Matthieu Moy <matthieu.moy@grenoble-inp.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We forgot to add "git log --decorate=auto" to documentation when we
added the feature back in v2.1.0 timeframe.
* rj/log-decorate-auto:
log: document the --decorate=auto option
Most of the document mentions `behavior` instead of the British
variation, `behaviour`. This change makes it consistent.
Signed-off-by: Pablo Santiago Blum de Aguiar <scorphus@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
For those who use two-factor authentication with gmail, git-send-email
will not work unless it is setup with an app-specific password. The
example for setting up git-send-email for use with gmail will now
include information on generating and storing the app-specific password.
Signed-off-by: Michael Rappazzo <rappazzo@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>