Commit Graph

62254 Commits

Author SHA1 Message Date
Derrick Stolee
3677773371 commit-reach: use heuristic in remove_redundant()
Reachability algorithms in commit-reach.c frequently benefit from using
the first-parent history as a heuristic for satisfying reachability
queries. The most obvious example was implemented in 4fbcca4e
(commit-reach: make can_all_from_reach... linear, 2018-07-20).

Update the walk in remove_redundant() to use this same heuristic. Here,
we are walking starting at the parents of the input commits. Sort those
parents and walk from the highest generation to lower. Each time, use
the heuristic of searching the first parent history before continuing to
expand the walk.

The order in which we explore the commits matters, so update
compare_commits_by_gen to break generation number ties with commit date.
This has no effect when the commits are in a commit-graph file with
corrected commit dates computed, but it will assist when the commits are
in the region "above" the commit-graph with "infinite" generation
number. Note that we cannot shift to use
compare_commits_by_gen_then_commit_date as the method prototype is
different. We use compare_commits_by_gen for QSORT() as opposed to as a
priority function.

The important piece is to ensure we short-circuit the walk when we find
that there is a single non-redundant commit. This happens frequently
when looking for merge-bases or comparing several tags with 'git
merge-base --independent'. Use a new count 'count_still_independent' and
if that hits 1 we can stop walking.

To update 'count_still_independent' properly, we add use of the RESULT
flag on the input commits. Then we can detect when we reach one of these
commits and decrease the count. We need to remove the RESULT flag at
that moment because we might re-visit that commit when popping the
stack.

We use the STALE flag to mark parents that have been added to the new
walk_start list, but we need to clear that flag before we start walking
so those flags don't halt our depth-first-search walk.

On my copy of the Linux kernel repository, the performance of 'git
merge-base --independent <all-tags>' goes from 1.1 seconds to 0.11
seconds.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-02-22 13:34:34 -08:00
Derrick Stolee
c8d693e1e6 commit-reach: move compare_commits_by_gen
Move this earlier in the file so it can be used by more methods.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-02-22 13:34:34 -08:00
Derrick Stolee
fbc21e3fbb commit-reach: use one walk in remove_redundant()
The current implementation of remove_redundant() uses several calls to
paint_down_to_common() to determine that commits are independent of each
other. This leads to quadratic behavior when many inputs are passed to
commands such as 'git merge-base'.

For example, in the Linux kernel repository, I tested the performance
by passing all tags:

 git merge-base --independent $(git for-each-ref refs/tags --format="$(refname)")

(Note: I had to delete the tags v2.6.11-tree and v2.6.11 as they do
not point to commits.)

Here is the performance improvement introduced by this change:

 Before: 16.4s
  After:  1.1s

This performance improvement requires the commit-graph file to be
present. We keep the old algorithm around as remove_redundant_no_gen()
and use it when generation_numbers_enabled() is false. This is similar
to other algorithms within commit-reach.c. The new algorithm is
implemented in remove_redundant_with_gen().

The basic approach is to do one commit walk instead of many. First, scan
all commits in the list and mark their _parents_ with the STALE flag.
This flag will indicate commits that are reachable from one of the
inputs, except not including themselves. Then, walk commits until
covering all commits up to the minimum generation number pushing the
STALE flag throughout.

At the end, we need to clear the STALE bit from all of the commits
we walked. We move the non-stale commits in 'array' to the beginning of
the list, and this might overwrite stale commits. However, we store an
array of commits that started the walk, and use clear_commit_marks() on
each of those starting commits. That method will walk the reachable
commits with the STALE bit and clear them all. This makes the algorithm
safe for re-entry or for other uses of those commits after this walk.

This logic is covered by tests in t6600-test-reach.sh, so the behavior
does not change. This is tested both in the case with a commit-graph and
without.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-02-22 13:34:34 -08:00
Christian Walther
3a837b58e3 doc: mention bigFileThreshold for packing
Knowing about the core.bigFileThreshold configuration variable is
helpful when examining pack file size differences between repositories.
Add a reference to it to the manpages a user is likely to read in this
situation.

Capitalize CONFIGURATION for consistency with other pages having such a
section.

Signed-off-by: Christian Walther <cwalther@gmx.ch>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-02-22 13:18:30 -08:00
Pratyush Yadav
b1056f60b6 Merge branch 'py/commit-comments'
Use git-stripspace to remove comment lines from the commit message. Also
use it to clean up whitespace instead of rolling our own logic.

* py/commit-comments:
  git-gui: remove lines starting with the comment character
2021-02-22 20:19:53 +05:30
Junio C Hamano
1b5b8cf072 Documentation: typofix --column description
f4ed0af6 (Merge branch 'nd/columns', 2012-05-03) brought in three
cut-and-pasted copies of malformatted descriptions.  Let's fix them
all the same way by marking the configuration variable names up as
monospace just like the command line option `--column` is typeset.

While we are at it, correct a missing space after the full stop that
ends the sentence.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-02-19 19:36:47 -08:00
Martin Ågren
f89f46b704 gitmailmap.txt: fix rendering of e-mail addresses
Both AsciiDoc and Asciidoctor are eager to pick up the e-mail addresses
in this document and turn them into references at the bottom of the
manpage / clickable links. We don't really need that for these dummy
addresses. Spell "@" as "&#64;" to make them not do this. In the open
block, we can instead avoid this by indenting the contents, similar to
the earlier blocks.

Fix a backtick which should have been a single quote mark. With all the
quoting that is going on around here, this mistake trips up the parsing
and rendering quite a bit.

Before this commit, we have the same failure mode with AsciiDoc 8.6.10
and Asciidoctor 1.5.5, and this change makes both of them happy.

Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Acked-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-02-18 10:53:33 -08:00
Martin Ågren
83171ede22 git.txt: fix monospace rendering
When we write `<name>`s with the "s" tucked on to the closing backtick,
we end up rendering the backticks literally. Rephrase this sentence
slightly to render this as monospace.

Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-02-18 10:53:33 -08:00
Pratyush Yadav
b9a43869c9 git-gui: remove lines starting with the comment character
The comment character is specified by the config variable
'core.commentchar'. Any lines starting with this character is considered
a comment and should not be included in the final commit message.

Teach git-gui to filter out lines in the commit message that start with
the comment character using git-stripspace. If the config is not set,
'#' is taken as the default. Also add a message educating users about
the comment character.

Signed-off-by: Pratyush Yadav <me@yadavpratyush.com>
2021-02-18 23:35:57 +05:30
Junio C Hamano
2283e0e9af The ninth batch
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-02-17 17:21:43 -08:00
Junio C Hamano
483e09e810 Merge branch 'ak/config-bad-bool-error'
The error message given when a configuration variable that is
expected to have a boolean value has been improved.

* ak/config-bad-bool-error:
  config: improve error message for boolean config
2021-02-17 17:21:43 -08:00
Junio C Hamano
e68f62be8d Merge branch 'js/reflog-expire-stale-fix'
"git reflog expire --stale-fix" can be used to repair the reflog by
removing entries that refer to objects that have been pruned away,
but was not careful to tolerate missing objects.

* js/reflog-expire-stale-fix:
  reflog expire --stale-fix: be generous about missing objects
2021-02-17 17:21:43 -08:00
Junio C Hamano
726b11d68a Merge branch 'js/commit-graph-warning'
When certain features (e.g. grafts) used in the repository are
incompatible with the use of the commit-graph, we used to silently
turned commit-graph off; we now tell the user what we are doing.

* js/commit-graph-warning:
  commit-graph: when incompatible with graphs, indicate why
2021-02-17 17:21:42 -08:00
Junio C Hamano
e9b4c483c7 Merge branch 'ew/rev-parse-since-test'
Test to make sure "git rev-parse one-thing one-thing" gives
the same thing twice (when one-thing is --since=X).

* ew/rev-parse-since-test:
  t1500: ensure current --since= behavior remains
2021-02-17 17:21:42 -08:00
Junio C Hamano
d494433d26 Merge branch 'ds/maintenance-pack-refs'
"git maintenance" tool learned a new "pack-refs" maintenance task.

* ds/maintenance-pack-refs:
  maintenance: incremental strategy runs pack-refs weekly
  maintenance: add pack-refs task
2021-02-17 17:21:42 -08:00
Junio C Hamano
fdf3a27ca9 Merge branch 'jx/t5411-unique-filenames'
Avoid individual tests in t5411 from getting affected by each other
by forcing them to use separate output files during the test.

* jx/t5411-unique-filenames:
  t5411: refactor check of refs using test_cmp_refs
  t5411: use different out file to prevent overwriting
2021-02-17 17:21:42 -08:00
Junio C Hamano
9e634a91c8 Merge branch 'js/fsck-name-objects-fix'
Fix "git fsck --name-objects" which apparently has not been used by
anybody who is motivated enough to report breakage.

* js/fsck-name-objects-fix:
  fsck --name-objects: be more careful parsing generation numbers
  t1450: robustify `remove_object()`
2021-02-17 17:21:42 -08:00
Junio C Hamano
9bdccbcda7 Merge branch 'jk/mailmap-only-at-root'
The .mailmap is documented to be read only from the root level of a
working tree, but a stray file in a bare repository also was read
by accident, which has been corrected.

* jk/mailmap-only-at-root:
  mailmap: only look for .mailmap in work tree
2021-02-17 17:21:42 -08:00
Junio C Hamano
f712632a51 Merge branch 'mt/grep-cached-untracked'
"git grep --untracked" is meant to be "let's ALSO find in these
files on the filesystem" when looking for matches in the working
tree files, and does not make any sense if the primary search is
done against the index, or the tree objects.  The "--cached" and
"--untracked" options have been marked as mutually incompatible.

* mt/grep-cached-untracked:
  grep: error out if --untracked is used with --cached
2021-02-17 17:21:41 -08:00
Junio C Hamano
78a26cb720 Merge branch 'sh/mergetool-hideresolved'
"git mergetool" feeds three versions (base, local and remote) of
a conflicted path unmodified.  The command learned to optionally
prepare these files with unconflicted parts already resolved.

* sh/mergetool-hideresolved:
  mergetool: add per-tool support and overrides for the hideResolved flag
  mergetool: break setup_tool out into separate initialization function
  mergetool: add hideResolved configuration
2021-02-17 17:21:41 -08:00
Junio C Hamano
aa2d3dbdf5 Merge branch 'jt/trace2-BUG'
Even though invocations of "die()" were logged to the trace2
system, "BUG()"s were not, which has been corrected.

* jt/trace2-BUG:
  usage: trace2 BUG() invocations
2021-02-17 17:21:41 -08:00
Junio C Hamano
dadc91ff0c Merge branch 'js/range-diff-one-side-only'
The "git range-diff" command learned "--(left|right)-only" option
to show only one side of the compared range.

* js/range-diff-one-side-only:
  range-diff: offer --left-only/--right-only options
  range-diff: move the diffopt initialization down one layer
  range-diff: combine all options in a single data structure
  range-diff: simplify code spawning `git log`
  range-diff: libify the read_patches() function again
  range-diff: avoid leaking memory in two error code paths
2021-02-17 17:21:41 -08:00
Junio C Hamano
77348b0e6e Merge branch 'js/range-diff-wo-dotdot'
There are other ways than ".." for a single token to denote a
"commit range", namely "<rev>^!" and "<rev>^-<n>", but "git
range-diff" did not understand them.

* js/range-diff-wo-dotdot:
  range-diff(docs): explain how to specify commit ranges
  range-diff/format-patch: handle commit ranges other than A..B
  range-diff/format-patch: refactor check for commit range
2021-02-17 17:21:41 -08:00
Junio C Hamano
69571dfe21 Merge branch 'jt/clone-unborn-head'
"git clone" tries to locally check out the branch pointed at by
HEAD of the remote repository after it is done, but the protocol
did not convey the information necessary to do so when copying an
empty repository.  The protocol v2 learned how to do so.

* jt/clone-unborn-head:
  clone: respect remote unborn HEAD
  connect, transport: encapsulate arg in struct
  ls-refs: report unborn targets of symrefs
2021-02-17 17:21:40 -08:00
Junio C Hamano
0871fb9af5 Merge branch 'mr/bisect-in-c-4'
Piecemeal of rewrite of "git bisect" in C continues.

* mr/bisect-in-c-4:
  bisect--helper: retire `--check-and-set-terms` subcommand
  bisect--helper: reimplement `bisect_skip` shell function in C
  bisect--helper: retire `--bisect-auto-next` subcommand
  bisect--helper: use `res` instead of return in BISECT_RESET case option
  bisect--helper: retire `--bisect-write` subcommand
  bisect--helper: reimplement `bisect_replay` shell function in C
  bisect--helper: reimplement `bisect_log` shell function in C
2021-02-17 17:21:40 -08:00
Junio C Hamano
5bd0b21bf7 Merge branch 'ds/commit-graph-genno-fix'
Fix incremental update of commit-graph file around corrected commit
date data.

* ds/commit-graph-genno-fix:
  commit-graph: prepare commit graph
  commit-graph: be extra careful about mixed generations
  commit-graph: compute generations separately
  commit-graph: validate layers for generation data
  commit-graph: always parse before commit_graph_data_at()
  commit-graph: use repo_parse_commit
2021-02-17 17:21:40 -08:00
Junio C Hamano
8b4701ae4f Merge branch 'ak/corrected-commit-date'
The commit-graph learned to use corrected commit dates instead of
the generation number to help topological revision traversal.

* ak/corrected-commit-date:
  doc: add corrected commit date info
  commit-reach: use corrected commit dates in paint_down_to_common()
  commit-graph: use generation v2 only if entire chain does
  commit-graph: implement generation data chunk
  commit-graph: implement corrected commit date
  commit-graph: return 64-bit generation number
  commit-graph: add a slab to store topological levels
  t6600-test-reach: generalize *_three_modes
  commit-graph: consolidate fill_commit_graph_info
  revision: parse parent in indegree_walk_step()
  commit-graph: fix regression when computing Bloom filters
2021-02-17 17:21:40 -08:00
Joey Salazar
9d336655ba doc: fix naming of response-end-pkt
Git Protocol version 2[1] defines 0002 as a Message Packet that indicates
the end of a response for stateless connections.

Change the naming of the 0002 Packet to 'Response End' to match the
parsing introduced in Wireshark's MR !1922 for consistency. A subsequent
MR in Wireshark will address additional mismatches.

[1] kernel.org/pub/software/scm/git/docs/technical/protocol-v2.html
[2] gitlab.com/wireshark/wireshark/-/merge_requests/1922

Signed-off-by: Joey Salazar <jgsal@protonmail.com>
Reviewed-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-02-17 16:30:43 -08:00
Jeff King
a1db097e10 docs/rev-list: add some examples of --disk-usage
It's not immediately obvious why --disk-usage might be a useful thing.
These examples show off a few of the real-world cases I've used it for.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-02-17 16:25:29 -08:00
Jeff King
669b458755 docs/rev-list: add an examples section
We currently don't show any examples of using git-rev-list at all. Let's
add some pretty elementary examples. They likely seem obvious to anybody
who has worked with the tool for a while, but my purpose here is
two-fold:

  - they may be enlightening to people who haven't used the tool a lot
    to give a general flavor of how it is meant to be used

  - they can serve as a starting point for adding more interesting
    examples (we can do that without the basic ones, of course, but I
    think it makes sense to show off the building blocks)

This set is far from exhaustive, but again, the purpose is to be a
starting point for further additions.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-02-17 16:22:13 -08:00
Martin Ågren
452d26448d rev-list-options.txt: fix rendering of bonus paragraph
In git-log(1) -- but not in git-shortlog(1) or git-rev-list(1) -- we
include a bonus paragraph in the description of `--first-parent`. But
we forgot to add a lone "+" for a list continuation, and we shouldn't
be indenting this second paragraph. As a result, we get a different
indentation and the `backticks` render literally.

Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-02-17 13:16:11 -08:00
Rafael Silva
8e16effe97 blame: remove unnecessary use of get_commit_info()
When `git blame --color-by-age`, the determine_line_heat() is called to
select how to color the output based on the commit's author date.  It
uses the get_commit_info() to parse the information into a `commit_info`
structure, however, this is actually unnecessary because the
determine_line_heat() caller also does the same.

Instead, let's change the determine_line_heat() to take a `commit_info`
structure and remove the internal call to get_commit_info() thus
cleaning up and optimizing the code path.

Enabling Git's trace2 API in order to record the execution time for
every call to determine_line_heat() function:

   + trace2_region_enter("blame", "determine_line_heat", the_repository);
     determine_line_heat(ent, &default_color);
   + trace2_region_enter("blame", "determine_line_heat", the_repository);

Then, running `git blame` for "kernel/fork.c" in linux.git and summing
all the execution time for every call (around 1.3k calls) resulted in
2.6x faster execution (best out 3):

   git built from 328c109303 (The eighth batch, 2021-02-12) = 42ms
   git built from 328c109303 + this change                  = 16ms

Signed-off-by: Rafael Silva <rafaeloliveira.cs@gmail.com>
Reviewed-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-02-17 11:04:17 -08:00
Jeff Hostetler
fcd19b09f8 fsmonitor: refactor initialization of fsmonitor_last_update token
Isolate and document initialization of `istate->fsmonitor_last_update`.
This field should contain a fsmonitor-specific opaque token, but we
need to initialize it before we can actually talk to a fsmonitor process,
so we create a generic default value.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Reviewed-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-02-16 17:14:35 -08:00
Kevin Willford
ff03836b9d fsmonitor: allow all entries for a folder to be invalidated
Allow fsmonitor to report directory changes by reporting paths with a
trailing slash.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Kevin Willford <Kevin.Willford@microsoft.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Reviewed-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-02-16 17:14:35 -08:00
Jeff Hostetler
29fbbf43a0 fsmonitor: log FSMN token when reading and writing the index
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Reviewed-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-02-16 17:14:35 -08:00
Jeff Hostetler
940b94f35c fsmonitor: log invocation of FSMonitor hook to trace2
Let's measure the time taken to request and receive FSMonitor data
via the hook API and the size of the response.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Reviewed-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-02-16 17:14:34 -08:00
Jeff Hostetler
15268d12be read-cache: log the number of scanned files to trace2
Report the number of files in the working directory that were read and
their hashes verified in `refresh_index()`.

FSMonitor improves the performance of commands like `git status` by
avoiding scanning the disk for changed files.  Let's measure this.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Reviewed-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-02-16 17:14:34 -08:00
Jeff Hostetler
a98e0f2d31 read-cache: log the number of lstat calls to trace2
Report the total number of calls made to lstat() inside of refresh_index().

FSMonitor improves the performance of commands like `git status` by
avoiding scanning the disk for changed files.  This can be seen in
`refresh_index()`.  Let's measure this.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Reviewed-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-02-16 17:14:34 -08:00
Jeff Hostetler
8c4b7503d0 preload-index: log the number of lstat calls to trace2
Report the total number of calls made to lstat() inside preload_index().

FSMonitor improves the performance of commands like `git status` by
avoiding scanning the disk for changed files.  This can be seen in
`preload_index()`.  Let's measure this.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Reviewed-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-02-16 17:14:34 -08:00
Jeff Hostetler
4f2009dce2 p7519: add trace logging during perf test
Add optional trace logging to allow us to better compare performance of
various fsmonitor providers and compare results with non-fsmonitor runs.

Currently, this includes Trace2 logging, but may be extended to include
other trace targets, such as GIT_TRACE_FSMONITOR if desired.

Using this logging helped me explain an odd behavior on MacOS where the
kernel was dropping events and causing the hook to Watchman to timeout.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Reviewed-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-02-16 17:14:34 -08:00
Jeff Hostetler
a7556c3bde p7519: move watchman cleanup earlier in the test
Shutdown Watchman after the Watchman-based tests and before the block of
"no fsmonitor" tests.

This helps ensure that Watchman cannot affect the test results for the
other.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Reviewed-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-02-16 17:14:34 -08:00
Jeff Hostetler
0917763d67 p7519: fix watchman watch-list test on Windows
Only use the final portion of the test trash directory file name
when verifying that Watchman was started.

On Windows and under the SDK, $GIT_WORKTREE is a cygwin-style
path with forward slashes and a "/c/" drive name.  However
`watchman watch-list` reports a proper Windows-style pathname
with drive letters and backslashes.  This causes the grep to
fail.  Since we don't really care about the full pathname (and
we really don't want to bother with normalizaing them), just see
if the test-name portion of the path is found.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Reviewed-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-02-16 17:14:34 -08:00
Jeff Hostetler
eb10e637cf p7519: do not rely on "xargs -d" in test
Convert the test to use a more portable method to update the mtime on a
large number of files under version control.

The Mac version of xargs does not support the "-d" option.
Likewise, the "-0" and "--null" options are not portable.

Furthermore, use `test-tool chmtime` rather than `touch` to update the
mtime to ensure that it is actually updated (especially on file systems
with only whole second resolution).

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Reviewed-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-02-16 17:14:34 -08:00
Matheus Tavares
3f7ba60350 checkout-index: omit entries with no tempname from --temp output
With --temp (or --stage=all, which implies --temp), checkout-index
writes a list to stdout associating temporary file names to the entries'
names. But if it fails to write an entry, and the failure happens before
even assigning a temporary filename to that entry, we get an odd output
line. This can be seen when trying to check out a symlink whose blob is
missing:

$ missing_blob=$(git hash-object --stdin </dev/null)
$ git update-index --add --cacheinfo 120000,$missing_blob,foo
$ git checkout-index --temp foo
error: unable to read sha1 file of foo (e69de29bb2)
        foo

The 'TAB foo' line is not much useful and it might break scripts that
expect the 'tempname TAB foo' output. So let's omit such entries from
the stdout list (but leaving the error message on stderr).

We could also consider omitting _all_ failed entries from the output
list, but that's probably not a good idea as the associated tempfiles
may have been created even when checkout failed, so scripts may want to
use the output list for cleanup.

Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-02-16 11:27:18 -08:00
Matheus Tavares
9334ea8e92 write_entry(): fix misuses of path in error messages
The variables `path` and `ce->name`, at write_entry(), usually have the
same contents, but that's not the case when using a checkout prefix or
writing to a tempfile. (In fact, `path` will be either empty or dirty
when writing to a tempfile.) Therefore, these variables cannot be used
interchangeably. In this sense, fix wrong uses of `path` in error
messages where it should really be `ce->name`, and add some regression
tests. (Note: there doesn't seem to be any misuse in the other way
around.)

Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-02-16 11:27:17 -08:00
Junio C Hamano
1eb4136ac2 diff: --{rotate,skip}-to=<path>
In the implementation of "git difftool", there is a case where the
user wants to start viewing the diffs at a specific path and
continue on to the rest, optionally wrapping around to the
beginning.  Since it is somewhat cumbersome to implement such a
feature as a post-processing step of "git diff" output, let's
support it internally with two new options.

 - "git diff --rotate-to=C", when the resulting patch would show
   paths A B C D E without the option, would "rotate" the paths to
   shows patch to C D E A B instead.  It is an error when there is
   no patch for C is shown.

 - "git diff --skip-to=C" would instead "skip" the paths before C,
   and shows patch to C D E.  Again, it is an error when there is no
   patch for C is shown.

 - "git log [-p]" also accepts these two options, but it is not an
   error if there is no change to the specified path.  Instead, the
   set of output paths are rotated or skipped to the specified path
   or the first path that sorts after the specified path.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-02-16 09:30:42 -08:00
Elijah Newren
f78cf97617 merge-ort: call diffcore_rename() directly
We want to pass additional information to diffcore_rename() (or some
variant thereof) without plumbing that extra information through
diff_tree_oid() and diffcore_std().  Further, since we will need to
gather additional special information related to diffs and are walking
the trees anyway in collect_merge_info(), it seems odd to have
diff_tree_oid()/diffcore_std() repeat those tree walks.  And there may
be times where we can avoid traversing into a subtree in
collect_merge_info() (based on additional information at our disposal),
that the basic diff logic would be unable to take advantage of.  For all
these reasons, just create the add and delete pairs ourself and then
call diffcore_rename() directly.

This change is primarily about enabling future optimizations; the
advantage of avoiding extra tree traversals is small compared to the
cost of rename detection, and the advantage of avoiding the extra tree
traversals is somewhat offset by the extra time spent in
collect_merge_info() collecting the additional data anyway.  However...

For the testcases mentioned in commit 557ac0350d ("merge-ort: begin
performance work; instrument with trace2_region_* calls", 2020-10-28),
this change improves the performance as follows:

                            Before                  After
    no-renames:       13.294 s ±  0.103 s    12.775 s ±  0.062 s
    mega-renames:    187.248 s ±  0.882 s   188.754 s ±  0.284 s
    just-one-mega:     5.557 s ±  0.017 s     5.599 s ±  0.019 s

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-02-15 18:02:16 -08:00
Elijah Newren
07c9a7fcb5 gitdiffcore doc: mention new preliminary step for rename detection
The last few patches have introduced a new preliminary step when rename
detection is on but both break detection and copy detection are off.
Document this new step.  While we're at it, add a testcase that checks
the new behavior as well.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-02-15 18:02:16 -08:00
Elijah Newren
bd24aa2f97 diffcore-rename: guide inexact rename detection based on basenames
Make use of the new find_basename_matches() function added in the last
two patches, to find renames more rapidly in cases where we can match up
files based on basenames.  As a quick reminder (see the last two commit
messages for more details), this means for example that
docs/extensions.txt and docs/config/extensions.txt are considered likely
renames if there are no remaining 'extensions.txt' files elsewhere among
the added and deleted files, and if a similarity check confirms they are
similar, then they are marked as a rename without looking for a better
similarity match among other files.  This is a behavioral change, as
covered in more detail in the previous commit message.

We do not use this heuristic together with either break or copy
detection.  The point of break detection is to say that filename
similarity does not imply file content similarity, and we only want to
know about file content similarity.  The point of copy detection is to
use more resources to check for additional similarities, while this is
an optimization that uses far less resources but which might also result
in finding slightly fewer similarities.  So the idea behind this
optimization goes against both of those features, and will be turned off
for both.

For the testcases mentioned in commit 557ac0350d ("merge-ort: begin
performance work; instrument with trace2_region_* calls", 2020-10-28),
this change improves the performance as follows:

                            Before                  After
    no-renames:       13.815 s ±  0.062 s    13.294 s ±  0.103 s
    mega-renames:   1799.937 s ±  0.493 s   187.248 s ±  0.882 s
    just-one-mega:    51.289 s ±  0.019 s     5.557 s ±  0.017 s

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-02-15 18:02:16 -08:00
Elijah Newren
da09f65127 diffcore-rename: complete find_basename_matches()
It is not uncommon in real world repositories for the majority of file
renames to not change the basename of the file; i.e. most "renames" are
just a move of files into different directories.  We can make use of
this to avoid comparing all rename source candidates with all rename
destination candidates, by first comparing sources to destinations with
the same basenames.  If two files with the same basename are
sufficiently similar, we record the rename; if not, we include those
files in the more exhaustive matrix comparison.

This means we are adding a set of preliminary additional comparisons,
but for each file we only compare it with at most one other file.  For
example, if there was a include/media/device.h that was deleted and a
src/module/media/device.h that was added, and there are no other
device.h files in the remaining sets of added and deleted files after
exact rename detection, then these two files would be compared in the
preliminary step.

This commit does not yet actually employ this new optimization, it
merely adds a function which can be used for this purpose.  The next
commit will do the necessary plumbing to make use of it.

Note that this optimization might give us different results than without
the optimization, because it's possible that despite files with the same
basename being sufficiently similar to be considered a rename, there's
an even better match between files without the same basename.  I think
that is okay for four reasons: (1) it's easy to explain to the users
what happened if it does ever occur (or even for them to intuitively
figure out), (2) as the next patch will show it provides such a large
performance boost that it's worth the tradeoff, and (3) it's somewhat
unlikely that despite having unique matching basenames that other files
serve as better matches.  Reason (4) takes a full paragraph to
explain...

If the previous three reasons aren't enough, consider what rename
detection already does.  Break detection is not the default, meaning
that if files have the same _fullname_, then they are considered related
even if they are 0% similar.  In fact, in such a case, we don't even
bother comparing the files to see if they are similar let alone
comparing them to all other files to see what they are most similar to.
Basically, we override content similarity based on sufficient filename
similarity.  Without the filename similarity (currently implemented as
an exact match of filename), we swing the pendulum the opposite
direction and say that filename similarity is irrelevant and compare a
full N x M matrix of sources and destinations to find out which have the
most similar contents.  This optimization just adds another form of
filename similarity comparison, but augments it with a file content
similarity check as well.  Basically, if two files have the same
basename and are sufficiently similar to be considered a rename, mark
them as such without comparing the two to all other rename candidates.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-02-15 18:02:16 -08:00