git-commit-vandalism/Documentation/config/merge.txt
Elijah Newren 94b82d5686 rename: bump limit defaults yet again
These were last bumped in commit 92c57e5c1d (bump rename limit
defaults (again), 2011-02-19), and were bumped both because processors
had gotten faster, and because people were getting ugly merges that
caused problems and reporting it to the mailing list (suggesting that
folks were willing to spend more time waiting).

Since that time:
  * Linus has continued recommending kernel folks to set
    diff.renameLimit=0 (maps to 32767, currently)
  * Folks with repositories with lots of renames were happy to set
    merge.renameLimit above 32767, once the code supported that, to
    get correct cherry-picks
  * Processors have gotten faster
  * It has been discovered that the timing methodology used last time
    probably used too large example files.

The last point is probably worth explaining a bit more:

  * The "average" file size used appears to have been average blob size
    in the linux kernel history at the time (probably v2.6.25 or
    something close to it).
  * Since bigger files are modified more frequently, such a computation
    weights towards larger files.
  * Larger files may be more likely to be modified over time, but are
    not more likely to be renamed -- the mean and median blob size
    within a tree are a bit higher than the mean and median of blob
    sizes in the history leading up to that version for the linux
    kernel.
  * The mean blob size in v2.6.25 was half the average blob size in
    history leading to that point
  * The median blob size in v2.6.25 was about 40% of the mean blob size
    in v2.6.25.
  * Since the mean blob size is more than double the median blob size,
    any file as big as the mean will not be compared to any files of
    median size or less (because they'd be more than 50% dissimilar).
  * Since it is the number of files compared that provides the O(n^2)
    behavior, median-sized files should matter more than mean-sized
    ones.

The combined effect of the above is that the file size used in past
calculations was likely about 5x too large.  Combine that with a CPU
performance improvement of ~30%, and we can increase the limits by
a factor of sqrt(5/(1-.3)) = 2.67, while keeping the original stated
time limits.

Keeping the same approximate time limit probably makes sense for
diff.renameLimit (there is no progress feedback in e.g. git log -p),
but the experience above suggests merge.renameLimit could be extended
significantly.  In fact, it probably would make sense to have an
unlimited default setting for merge.renameLimit, but that would
likely need to be coupled with changes to how progress is displayed.
(See https://lore.kernel.org/git/YOx+Ok%2FEYvLqRMzJ@coredump.intra.peff.net/
for details in that area.)  For now, let's just bump the approximate
time limit from 10s to 1m.

(Note: We do not want to use actual time limits, because getting results
that depend on how loaded your system is that day feels bad, and because
we don't discover that we won't get all the renames until after we've
put in a lot of work rather than just upfront telling the user there are
too many files involved.)

Using the original time limit of 2s for diff.renameLimit, and bumping
merge.renameLimit from 10s to 60s, I found the following timings using
the simple script at the end of this commit message (on an AWS c5.xlarge
which reports as "Intel(R) Xeon(R) Platinum 8124M CPU @ 3.00GHz"):

      N   Timing
   1300    1.995s
   7100   59.973s

So let's round down to nice even numbers and bump the limits from
400->1000, and from 1000->7000.

Here is the measure_rename_perf script (adapted from
https://lore.kernel.org/git/20080211113516.GB6344@coredump.intra.peff.net/
in particular to avoid triggering the linear handling from
basename-guided rename detection):

    #!/bin/bash

    n=$1; shift

    rm -rf repo
    mkdir repo && cd repo
    git init -q -b main

    mkdata() {
      mkdir $1
      for i in `seq 1 $2`; do
        (sed "s/^/$i /" <../sample
         echo tag: $1
        ) >$1/$i
      done
    }

    mkdata initial $n
    git add .
    git commit -q -m initial

    mkdata new $n
    git add .
    cd new
    for i in *; do git mv $i $i.renamed; done
    cd ..
    git rm -q -rf initial
    git commit -q -m new

    time git diff-tree -M -l0 --summary HEAD^ HEAD

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-07-15 16:54:34 -07:00

119 lines
5.2 KiB
Plaintext

merge.conflictStyle::
Specify the style in which conflicted hunks are written out to
working tree files upon merge. The default is "merge", which
shows a `<<<<<<<` conflict marker, changes made by one side,
a `=======` marker, changes made by the other side, and then
a `>>>>>>>` marker. An alternate style, "diff3", adds a `|||||||`
marker and the original text before the `=======` marker.
merge.defaultToUpstream::
If merge is called without any commit argument, merge the upstream
branches configured for the current branch by using their last
observed values stored in their remote-tracking branches.
The values of the `branch.<current branch>.merge` that name the
branches at the remote named by `branch.<current branch>.remote`
are consulted, and then they are mapped via `remote.<remote>.fetch`
to their corresponding remote-tracking branches, and the tips of
these tracking branches are merged. Defaults to true.
merge.ff::
By default, Git does not create an extra merge commit when merging
a commit that is a descendant of the current commit. Instead, the
tip of the current branch is fast-forwarded. When set to `false`,
this variable tells Git to create an extra merge commit in such
a case (equivalent to giving the `--no-ff` option from the command
line). When set to `only`, only such fast-forward merges are
allowed (equivalent to giving the `--ff-only` option from the
command line).
merge.verifySignatures::
If true, this is equivalent to the --verify-signatures command
line option. See linkgit:git-merge[1] for details.
include::fmt-merge-msg.txt[]
merge.renameLimit::
The number of files to consider in the exhaustive portion of
rename detection during a merge. If not specified, defaults
to the value of diff.renameLimit. If neither
merge.renameLimit nor diff.renameLimit are specified,
currently defaults to 7000. This setting has no effect if
rename detection is turned off.
merge.renames::
Whether Git detects renames. If set to "false", rename detection
is disabled. If set to "true", basic rename detection is enabled.
Defaults to the value of diff.renames.
merge.directoryRenames::
Whether Git detects directory renames, affecting what happens at
merge time to new files added to a directory on one side of
history when that directory was renamed on the other side of
history. If merge.directoryRenames is set to "false", directory
rename detection is disabled, meaning that such new files will be
left behind in the old directory. If set to "true", directory
rename detection is enabled, meaning that such new files will be
moved into the new directory. If set to "conflict", a conflict
will be reported for such paths. If merge.renames is false,
merge.directoryRenames is ignored and treated as false. Defaults
to "conflict".
merge.renormalize::
Tell Git that canonical representation of files in the
repository has changed over time (e.g. earlier commits record
text files with CRLF line endings, but recent ones use LF line
endings). In such a repository, Git can convert the data
recorded in commits to a canonical form before performing a
merge to reduce unnecessary conflicts. For more information,
see section "Merging branches with differing checkin/checkout
attributes" in linkgit:gitattributes[5].
merge.stat::
Whether to print the diffstat between ORIG_HEAD and the merge result
at the end of the merge. True by default.
merge.autoStash::
When set to true, automatically create a temporary stash entry
before the operation begins, and apply it after the operation
ends. This means that you can run merge on a dirty worktree.
However, use with care: the final stash application after a
successful merge might result in non-trivial conflicts.
This option can be overridden by the `--no-autostash` and
`--autostash` options of linkgit:git-merge[1].
Defaults to false.
merge.tool::
Controls which merge tool is used by linkgit:git-mergetool[1].
The list below shows the valid built-in values.
Any other value is treated as a custom merge tool and requires
that a corresponding mergetool.<tool>.cmd variable is defined.
merge.guitool::
Controls which merge tool is used by linkgit:git-mergetool[1] when the
-g/--gui flag is specified. The list below shows the valid built-in values.
Any other value is treated as a custom merge tool and requires that a
corresponding mergetool.<guitool>.cmd variable is defined.
include::../mergetools-merge.txt[]
merge.verbosity::
Controls the amount of output shown by the recursive merge
strategy. Level 0 outputs nothing except a final error
message if conflicts were detected. Level 1 outputs only
conflicts, 2 outputs conflicts and file changes. Level 5 and
above outputs debugging information. The default is level 2.
Can be overridden by the `GIT_MERGE_VERBOSITY` environment variable.
merge.<driver>.name::
Defines a human-readable name for a custom low-level
merge driver. See linkgit:gitattributes[5] for details.
merge.<driver>.driver::
Defines the command that implements a custom low-level
merge driver. See linkgit:gitattributes[5] for details.
merge.<driver>.recursive::
Names a low-level merge driver to be used when
performing an internal merge between common ancestors.
See linkgit:gitattributes[5] for details.