Commit Graph

12 Commits

Author SHA1 Message Date
Elijah Newren
94b82d5686 rename: bump limit defaults yet again
These were last bumped in commit 92c57e5c1d (bump rename limit
defaults (again), 2011-02-19), and were bumped both because processors
had gotten faster, and because people were getting ugly merges that
caused problems and reporting it to the mailing list (suggesting that
folks were willing to spend more time waiting).

Since that time:
  * Linus has continued recommending kernel folks to set
    diff.renameLimit=0 (maps to 32767, currently)
  * Folks with repositories with lots of renames were happy to set
    merge.renameLimit above 32767, once the code supported that, to
    get correct cherry-picks
  * Processors have gotten faster
  * It has been discovered that the timing methodology used last time
    probably used too large example files.

The last point is probably worth explaining a bit more:

  * The "average" file size used appears to have been average blob size
    in the linux kernel history at the time (probably v2.6.25 or
    something close to it).
  * Since bigger files are modified more frequently, such a computation
    weights towards larger files.
  * Larger files may be more likely to be modified over time, but are
    not more likely to be renamed -- the mean and median blob size
    within a tree are a bit higher than the mean and median of blob
    sizes in the history leading up to that version for the linux
    kernel.
  * The mean blob size in v2.6.25 was half the average blob size in
    history leading to that point
  * The median blob size in v2.6.25 was about 40% of the mean blob size
    in v2.6.25.
  * Since the mean blob size is more than double the median blob size,
    any file as big as the mean will not be compared to any files of
    median size or less (because they'd be more than 50% dissimilar).
  * Since it is the number of files compared that provides the O(n^2)
    behavior, median-sized files should matter more than mean-sized
    ones.

The combined effect of the above is that the file size used in past
calculations was likely about 5x too large.  Combine that with a CPU
performance improvement of ~30%, and we can increase the limits by
a factor of sqrt(5/(1-.3)) = 2.67, while keeping the original stated
time limits.

Keeping the same approximate time limit probably makes sense for
diff.renameLimit (there is no progress feedback in e.g. git log -p),
but the experience above suggests merge.renameLimit could be extended
significantly.  In fact, it probably would make sense to have an
unlimited default setting for merge.renameLimit, but that would
likely need to be coupled with changes to how progress is displayed.
(See https://lore.kernel.org/git/YOx+Ok%2FEYvLqRMzJ@coredump.intra.peff.net/
for details in that area.)  For now, let's just bump the approximate
time limit from 10s to 1m.

(Note: We do not want to use actual time limits, because getting results
that depend on how loaded your system is that day feels bad, and because
we don't discover that we won't get all the renames until after we've
put in a lot of work rather than just upfront telling the user there are
too many files involved.)

Using the original time limit of 2s for diff.renameLimit, and bumping
merge.renameLimit from 10s to 60s, I found the following timings using
the simple script at the end of this commit message (on an AWS c5.xlarge
which reports as "Intel(R) Xeon(R) Platinum 8124M CPU @ 3.00GHz"):

      N   Timing
   1300    1.995s
   7100   59.973s

So let's round down to nice even numbers and bump the limits from
400->1000, and from 1000->7000.

Here is the measure_rename_perf script (adapted from
https://lore.kernel.org/git/20080211113516.GB6344@coredump.intra.peff.net/
in particular to avoid triggering the linear handling from
basename-guided rename detection):

    #!/bin/bash

    n=$1; shift

    rm -rf repo
    mkdir repo && cd repo
    git init -q -b main

    mkdata() {
      mkdir $1
      for i in `seq 1 $2`; do
        (sed "s/^/$i /" <../sample
         echo tag: $1
        ) >$1/$i
      done
    }

    mkdata initial $n
    git add .
    git commit -q -m initial

    mkdata new $n
    git add .
    cd new
    for i in *; do git mv $i $i.renamed; done
    cd ..
    git rm -q -rf initial
    git commit -q -m new

    time git diff-tree -M -l0 --summary HEAD^ HEAD

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-07-15 16:54:34 -07:00
Elijah Newren
6623a528e0 doc: clarify documentation for rename/copy limits
A few places in the docs implied that rename/copy detection is always
quadratic or that all (unpaired) files were involved in the quadratic
portion of rename/copy detection.  The following two commits each
introduced an exception to this:

    9027f53cb5 (Do linear-time/space rename logic for exact renames,
                  2007-10-25)
    bd24aa2f97 (diffcore-rename: guide inexact rename detection based
                  on basenames, 2021-02-14)

(As a side note, for copy detection, the basename guided inexact rename
detection is turned off and the exact renames will only result in
sources (without the dests) being removed from the set of files used in
quadratic detection.  So, for copy detection, the documentation was
closer to correct.)

Avoid implying that all files involved in rename/copy detection are
subject to the full quadratic algorithm.  While at it, also note the
default values for all these settings.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-07-15 16:54:24 -07:00
Sangeeta Jain
8ef9312464 diff: do not show submodule with untracked files as "-dirty"
Git diff reports a submodule directory as -dirty even when there are
only untracked files in the submodule directory. This is inconsistent
with what `git describe --dirty` says when run in the submodule
directory in that state.

Make `--ignore-submodules=untracked` the default for `git diff` when
there is no configuration variable or command line option, so that the
command would not give '-dirty' suffix to a submodule whose working
tree has untracked files, to make it consistent with `git
describe --dirty` that is run in the submodule working tree.

And also make `--ignore-submodules=none` the default for `git status`
so that the user doesn't end up deleting a submodule that has
uncommitted (untracked) files.

Signed-off-by: Sangeeta Jain <sangunb09@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-12-08 14:27:35 -08:00
Laurent Arnoud
c28ded83fc diff: add config option relative
The `diff.relative` boolean option set to `true` shows only changes in
the current directory/value specified by the `path` argument of the
`relative` option and shows pathnames relative to the aforementioned
directory.

Teach `--no-relative` to override earlier `--relative`

Add for git-format-patch(1) options documentation `--relative` and
`--no-relative`

Signed-off-by: Laurent Arnoud <laurent@spkdev.net>
Acked-by: Đoàn Trần Công Danh <congdanhqx@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-05-24 16:23:59 -07:00
Junio C Hamano
e1151704f2 Merge branch 'sg/diff-indent-heuristic-non-experimental'
We promoted the "indent heuristics" that decides where to split
diff hunks from experimental to the default a few years ago, but
some stale documentation still marked it as experimental, which has
been corrected.

* sg/diff-indent-heuristic-non-experimental:
  diff: 'diff.indentHeuristic' is no longer experimental
2019-09-09 12:26:36 -07:00
SZEDER Gábor
64e5e1fba1 diff: 'diff.indentHeuristic' is no longer experimental
The indent heuristic started out as experimental, but it's now our
default diff heuristic since 33de716387 (diff: enable indent heuristic
by default, 2017-05-08).  Alas, that commit didn't update the
documentation, and the description of the 'diff.indentHeuristic'
configuration variable still implies that it's experimental and not
the default.

Update the description of 'diff.indentHeuristic' to make it clear that
it's the default diff heuristic.

The description of the related '--indent-heuristic' option has already
been updated in bab76141da (diff: --indent-heuristic is no
longer experimental, 2017-10-29).

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-08-15 09:58:47 -07:00
Junio C Hamano
f496b064fc Merge branch 'nd/switch-and-restore'
Two new commands "git switch" and "git restore" are introduced to
split "checking out a branch to work on advancing its history" and
"checking out paths out of the index and/or a tree-ish to work on
advancing the current history" out of the single "git checkout"
command.

* nd/switch-and-restore: (46 commits)
  completion: disable dwim on "git switch -d"
  switch: allow to switch in the middle of bisect
  t2027: use test_must_be_empty
  Declare both git-switch and git-restore experimental
  help: move git-diff and git-reset to different groups
  doc: promote "git restore"
  user-manual.txt: prefer 'merge --abort' over 'reset --hard'
  completion: support restore
  t: add tests for restore
  restore: support --patch
  restore: replace --force with --ignore-unmerged
  restore: default to --source=HEAD when only --staged is specified
  restore: reject invalid combinations with --staged
  restore: add --worktree and --staged
  checkout: factor out worktree checkout code
  restore: disable overlay mode by default
  restore: make pathspec mandatory
  restore: take tree-ish from --source option instead
  checkout: split part of it to new command 'restore'
  doc: promote "git switch"
  ...
2019-07-09 15:25:44 -07:00
Nguyễn Thái Ngọc Duy
d787d311db checkout: split part of it to new command 'switch'
"git checkout" doing too many things is a source of confusion for many
users (and it even bites old timers sometimes). To remedy that, the
command will be split into two new ones: switch and restore. The good
old "git checkout" command is still here and will be until all (or most
of users) are sick of it.

See the new man page for the final design of switch. The actual
implementation though is still pretty much the same as "git checkout"
and not completely aligned with the man page. Following patches will
adjust their behavior to match the man page.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-04-02 13:56:59 +09:00
Martin Ågren
8d75a1d183 Documentation: turn middle-of-line tabs into spaces
These tabs happen to appear in columns where they don't stand out too
much, so the diff here is non-obvious. Some of these are rendered
differently by AsciiDoc and Asciidoctor (although the difference might
be invisible!), which is how I found a few of them. The remainder were
found using `git grep "[a-zA-Z.,)]$TAB[a-zA-Z]"`.

Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-03-07 09:25:32 +09:00
Martin Ågren
ef39d22fe6 config/diff.txt: drop spurious backtick
Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-03-07 09:25:32 +09:00
Junio C Hamano
fd7761a1cd Merge branch 'nd/config-split'
Split the overly large Documentation/config.txt file into million
little pieces.  This potentially allows each individual piece
included into the manual page of the command it affects more easily.

* nd/config-split: (81 commits)
  config.txt: remove config/dummy.txt
  config.txt: move worktree.* to a separate file
  config.txt: move web.* to a separate file
  config.txt: move versionsort.* to a separate file
  config.txt: move user.* to a separate file
  config.txt: move url.* to a separate file
  config.txt: move uploadpack.* to a separate file
  config.txt: move uploadarchive.* to a separate file
  config.txt: move transfer.* to a separate file
  config.txt: move tag.* to a separate file
  config.txt: move submodule.* to a separate file
  config.txt: move stash.* to a separate file
  config.txt: move status.* to a separate file
  config.txt: move splitIndex.* to a separate file
  config.txt: move showBranch.* to a separate file
  config.txt: move sequencer.* to a separate file
  config.txt: move sendemail-config.txt to config/
  config.txt: move reset.* to a separate file
  config.txt: move rerere.* to a separate file
  config.txt: move repack.* to a separate file
  ...
2018-11-13 22:37:16 +09:00
Nguyễn Thái Ngọc Duy
fa922d74c5 config.txt: move diff-config.txt to config/
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-10-29 10:17:01 +09:00