Commit 0b4396f068, (git-p4: make python2.7 the oldest supported version,
2019-12-13) pointed out that git-p4 uses Python 2.7-or-later features
in the code.
In addition, git-p4 gained enough support for Python 3 from
6cec21a82f, (git-p4: encode/decode communication with p4 for
python3, 2019-12-13).
Let's update our documentation to reflect that fact.
Signed-off-by: Đoàn Trần Công Danh <congdanhqx@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Test update.
* js/t5526-with-no-particular-primary-branch-name:
t5526: drop the prereq expecting the default branch name `main`
t5526: avoid depending on a specific default branch name
Tighten error checking in the codepath that responds to "git fetch".
* jk/check-config-parsing-error-in-upload-pack:
upload-pack: propagate return value from object filter config callback
Newer versions of xsltproc can assign IDs in HTML documents it
generates in a consistent manner. Use the feature to help format
HTML version of the user manual reproducibly.
* ae/doc-reproducible-html:
doc: make HTML manual reproducible
The glossary described a branch as an "active" line of development,
which is misleading---a stale and non-moving branch is still a
branch.
* so/glossary-branch-is-not-necessarily-active:
glossary: improve "branch" definition
"@" sometimes worked (e.g. "git push origin @:there") as a part of
a refspec element, but "git push origin @" did not work, which has
been corrected.
* fc/atmark-in-refspec:
refspec: make @ a synonym of HEAD
tests: push: trivial cleanup
tests: push: improve cleanup of HEAD tests
"git $cmd $args", when $cmd is not a recognised subcommand, by
default tries to see if $cmd is a typo of an existing subcommand
and optionally executes the corrected command if there is only one
possibility, depending on the setting of help.autocorrect; the
users can now disable the whole thing, including the cycles spent
to find a likely typo, by setting the configuration variable to
'never'.
* dd/help-autocorrect-never:
help.c: help.autocorrect=never means "do not compute suggestions"
register_rename_src() simply references the passed pair inside
rename_src. In contrast, add_rename_dst() did something entirely
different for rename_dst. Instead of copying the passed pair, it made a
copy of the second diff_filespec from the passed pair, referenced it,
and then set the diff_rename_dst.pair field to NULL. Later, when a
pairing is found, record_rename_pair() allocated a full diff_filepair
via diff_queue() and pointed its src and dst fields at the appropriate
diff_filespecs. This contrast between register_rename_src() for the
rename_src data structure and add_rename_dst() for the rename_dst data
structure is oddly inconsistent and requires more memory and work than
necessary. Let's just reference the original diff_filepair in
rename_dst as-is, just as we do with rename_src. Add a new
rename_dst.is_rename field, since the rename_dst.p field is never NULL
unlike the old rename_dst.pair field.
Taking advantage of this change and the fact that same-named paths will
be adjacent, we can get rid of the sorting of the array and most of the
lookups on it, allowing us to instead just append as we go. However,
there is one remaining reason to still keep locate_rename_dst():
handling broken pairs (i.e. when break detection is on). Those are
somewhat rare, but we can set up a simple strintmap to get the map
between the source and the index. Doing that allows us to still have a
fast lookup without sorting the rename_dst array. Since the sorting had
been done in a weakly quadratic manner, when many renames are involved
this time could add up.
There is still a strcmp() in add_rename_dst() that I have left in place
to make it easier to verify that the algorithm has the same results.
This strcmp() is there to check for duplicate destination entries (which
was the easiest way at the time to avoid segfaults in the
diffcore-rename code when trees had multiple entries at a given path).
The underlying double free()s are no longer an issue with the new
algorithm, but that can be addressed in a subsequent commit.
This patch is being submitted in a different order than its original
development, but in a large rebase of many commits with lots of renames
and with several optimizations to inexact rename detection, both setup
time and write back to output queue time from diffcore_rename() were
sizeable chunks of overall runtime. This patch accelerated the setup
time by about 65%, and final write back to the output queue time by
about 50%, resulting in an overall drop of 3.5% on the execution time of
rebasing a few dozen patches.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
register_rename_src() took pains to create an array in rename_src which
was sorted by pathname of the contained diff_filepair. The sorting was
entirely unnecessary since callers pass filepairs to us in sorted
order. We can simply append to the end of the rename_src array,
speeding up diffcore_rename() setup time.
Also, note that I dropped the return type on the function since it was
unconditionally discarded anyway.
This patch is being submitted in a different order than its original
development, but in a large rebase of many commits with lots of renames
and with several optimizations to inexact rename detection,
diffcore_rename() setup time was a sizeable chunk of overall runtime.
This patch dropped execution time of rebasing 35 commits with lots of
renames by 2% overall.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
While creating the last commit, I found a number of other cases where
git would segfault when faced with trees that have duplicate entries.
None of these segfaults are in the diffcore-rename code (they all occur
in cache-tree and unpack-trees). Further, to my knowledge, no one has
ever been adversely affected by these bugs, and given that it has been
15 years and folks have fixed a few other issues with historical
duplicate entries (as noted in the last commit), I am not sure we will
ever run into anyone having problems with these. So I am not sure these
are worth fixing, but it doesn't hurt to at least document these
failures in the same test file that is concerned with duplicate tree
entries.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commit 4d6be03b95 ("diffcore-rename: avoid processing duplicate
destinations", 2015-02-26) added t4058 to demonstrate that a workaround
it added to avoid double frees (namely to just turn off rename detection
when trees had duplicate entries) would indeed avoid segfaults. The
tests, though, give the impression that the expected diffs are "correct"
when in reality they are just "don't segfault, and do something
semi-reasonable under the circumstances". Add some notes to make this
clearer.
Also, commit 25d5ea410f ("[PATCH] Redo rename/copy detection logic.",
2005-05-24) added a similar workaround to avoid segfaults, but for
rename_src rather than rename_dst. I do not see any tests in the
testsuite to cover the collision detection of entries limited to the
source side, so add a couple.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Inexact rename detection works by comparing all sources to all
destinations, computing similarities, and then finding the best matches
among those that are sufficiently similar.
However, it is preceded by exact rename detection that works by
checking if there are files with identical hashes. If exact renames are
found, we can exclude some files from inexact rename detection.
The inexact rename detection loops over the full set of files, but
immediately skips those for which rename_dst[i].is_rename is true and
thus doesn't compare any sources to that destination. As such, these
paths shouldn't be included in the progress counter.
For the eagle eyed, this change hints at an actual optimization -- the
first one I presented at Git Merge 2020. I'll be submitting that
optimization later, once the basic merge-ort algorithm has merged.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
diffcore-rename had two different checks of the form
if ((a < limit || b < limit) &&
a * b <= limit * limit)
This can be simplified to
if (st_mult(a, b) <= st_mult(limit, limit))
which makes it clearer how we are checking for overflow, and makes it
much easier to parse given the drop from 8 to 4 variable appearances.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
too_many_rename_candidates() got the number of rename destinations via
an argument to the function, but the number of rename sources via a
global variable. That felt rather inconsistent. Pass in the number of
rename sources as an argument as well.
While we are at it... We had a local variable, num_src, that served two
purposes. Initially it was set to the global value, but later was used
for counting a subset of the number of sources. Since we now have a
function argument for the former usage, introduce a clearer variable
name for the latter usage.
This patch has no behavioral changes; it's just renaming and passing an
argument instead of grabbing it from the global namespace. (You may
find it easier to view the patch using git diff's --color-words option.)
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Our main data structures are rename_src and rename_dst. For counters of
these data structures, num_sources and num_destinations seem natural;
definitely more so than using num_create for the latter.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Eventually we want to be omit the advice when we can fast-forward
in which case there is no reason to require the user to choose
between rebase or merge.
In order to do so, we need to delay giving the advice up to the
point where we can check if we can fast-forward or not.
Additionally, config_get_rebase() was probably never its true home.
Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We would like to be able to make this check before the decision to
rebase is made in a future step. Besides, using a separate helper
makes the code easier to follow.
Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add code which determines which kind of special rename case each rename
corresponds to, but leave the handling of each type unimplemented for
now. Future commits will implement each one.
There is some tenuous resemblance to merge-recursive's
process_renames(), but comparing the two is very unlikely to yield any
insights. merge-ort's process_renames() is a bit complex and I would
prefer if I could simplify it more, but it is far easier to grok than
merge-recursive's function of the same name in my opinion. Plus,
merge-ort handles more rename conflict types than merge-recursive does.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Based heavily on merge-recursive's get_diffpairs() function, and also
includes the necessary paired call to diff_warn_rename_limit() so that
users will be warned if merge.renameLimit is not sufficiently large for
rename detection to run.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This will grow later, but we only need a few fields for basic rename
handling.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Update the documentation of the file system monitor extension to
describe version 2.
The format was extended to support opaque tokens in:
56c6910028 fsmonitor: change last update timestamp on the index_state to opaque token
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This was implemented in the 'git multi-pack-index' command and
merged in 468b3221 (Merge branch 'ds/multi-pack-verify',
2018-10-10).
And there's no 'git midx' command.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Reviewed-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
To give ample warning for users wishing to override Git's the fall-back
for an unconfigured `init.defaultBranch` (in case we decide to change it
in a future Git version), let's introduce some advice that is shown upon
`git init` when that value is not set.
Note: two test cases in Git's test suite want to verify that the
`stderr` output of `git init` is empty. It is now necessary to suppress
the advice, we now do that via the `init.defaultBranch` setting. While
not strictly necessary, we also set this to `false` in
`test_create_repo()`.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We are about to introduce a message giving users running `git init` some
advice about `init.defaultBranch`. This will necessarily be done in
`repo_default_branch_name()`.
Not all code paths want to show that advice, though. In particular, the
`git clone` codepath _specifically_ asks for `init_db()` to be quiet,
via the `INIT_DB_QUIET` flag.
In preparation for showing users above-mentioned advice, let's change
the function signature of `get_default_branch_name()` to accept the
parameter `quiet`.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In one of the next commits, we would like to give users some advice
regarding the initial branch name, and how to modify it.
To that end, it would be good if `git branch -m <name>` worked in a
freshly initialized repository without any commits. Let's make it so.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Our documentation does not mention any future plan to change 'master' to
other value. It is a good idea to document this, though.
Initial-patch-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The focus here is on adding a path_msg() which will queue up
warning/conflict/notice messages about the merge for later processing,
storing these in a pathname -> strbuf map. It might seem like a big
change, but it really just is:
* declaration of necessary map with some comments
* initialization and recording of data
* a bunch of code to iterate over the map at print/free time
* at least one caller in order to avoid an error about having an
unused function (which we provide in the form of implementing
modify/delete conflict handling).
At this stage, it is probably not clear why I am opting for delayed
output processing. There are multiple reasons:
1. Merges are supposed to abort if they would overwrite dirty changes
in the working tree. We cannot correctly determine whether changes
would be overwritten until both rename detection has occurred and
full processing of entries with the renames has finalized.
Warning/conflict/notice messages come up at intermediate codepaths
along the way, so unless we want spurious conflict/warning messages
being printed when the merge will be aborted anyway, we need to
save these messages and only print them when relevant.
2. There can be multiple messages for a single path, and we want all
messages for a give path to appear together instead of having them
grouped by conflict/warning type. This was a problem already with
merge-recursive.c but became even more important due to the
splitting apart of conflict types as discussed in the commit
message for 1f3c9ba707 ("t6425: be more flexible with rename/delete
conflict messages", 2020-08-10)
3. Some callers might want to avoid showing the output in certain
cases, such as if the end result is a clean merge. Rebases have
typically done this.
4. Some callers might not want the output to go to stdout or even
stderr, but might want to do something else with it entirely.
For example, a --remerge-diff option to `git show` or `git log
-p` that remerges on the fly and diffs merge commits against the
remerged version would benefit from stdout/stderr not being
written to in the standard form.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This simplistic and weird-looking patch is here to facilitate future
patch submissions. Adding this stub allows rename detection code to
reference it in one patch series, while a separate patch series can
define the implementation, and then both series can merge cleanly and
work nicely together at that point.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commit b658536f59 ("merge-ort: add some high-level algorithm structure",
2020-10-27) added high-level structure of the ort merge algorithm. As
we have added more and more functions, that high-level structure has
been slightly obscured. Since functions are still grouped according to
this high-level structure, add comments denoting sections where all the
functions are specifically tied to a piece of the high-level structure.
This function groupings include a few sub-divisions of the original
high-level structure, including some sub-divisions that are yet to be
submitted. Each has (or will have) several functions all serving as
helpers to one or two main functions for each section.
As an added bonus, the comments will serve to provide a small textual
separation between nearby sections and allow the next three patch series
to be submitted independently and merge cleanly.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This field will be used in future patches to allow removal of paths from
opt->priv->paths.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This field is not yet used, but will be used by both the rename handling
code, and the conflict type handling code in process_entry().
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Move most of merge_finalize() into a new helper function,
clear_internal_opts(). This is a step to facilitate recursive merges,
as well as some future optimizations.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Include blob.h for definition of blob_type, and commit-reach.h for
declarations of get_merge_bases() and in_merge_bases(). While none of
these are used yet, we want to avoid cross-dependencies in the next
three series of patches for merge-ort and merge them at the end; adding
these "#include"s now avoids textual conflicts.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
After checkout(), the working tree has the appropriate contents, and the
index matches the working copy. That means that all unmodified and
cleanly merged files have correct index entries, but conflicted entries
need to be updated.
We do this by looping over the conflicted entries, marking the existing
index entry for the path with CE_REMOVE, adding new higher order staged
for the path at the end of the index (ignoring normal index sort order),
and then at the end of the loop removing the CE_REMOVED-marked cache
entries and sorting the index.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since merge-ort creates a tree for its output, when there are no
conflicts, updating the working tree and index is as simple as using the
unpack_trees() machinery with a twoway_merge (i.e. doing the equivalent
of a "checkout" operation).
If there were conflicts in the merge, then since the tree we created
included all the conflict markers, then using the unpack_trees machinery
in this manner will still update the working tree correctly. Further,
all index entries corresponding to cleanly merged files will also be
updated correctly by this procedure. Index entries corresponding to
conflicted entries will appear as though the user had run "git add -u"
after the merge to accept all files as-is with conflict markers.
Thus, after running unpack_trees(), there needs to be a separate step
for updating the entries in the index corresponding to conflicted files.
This will be the job for the function record_conflicted_index_entris(),
which will be implemented in a subsequent commit.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This adds a basic implementation for merge_switch_to_result(), though
just in terms of a few new empty functions that will be defined in
subsequent commits.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Our order for processing of entries means that if we have a tree of
files that looks like
Makefile
src/moduleA/foo.c
src/moduleA/bar.c
src/moduleB/baz.c
src/moduleB/umm.c
tokens.txt
Then we will process paths in the order of the leftmost column below. I
have added two additional columns that help explain the algorithm that
follows; the 2nd column is there to remind us we have oid & mode info we
are tracking for each of these paths (which differs between the paths
which I'm not representing well here), and the third column annotates
the parent directory of the entry:
tokens.txt <version_info> ""
src/moduleB/umm.c <version_info> src/moduleB
src/moduleB/baz.c <version_info> src/moduleB
src/moduleB <version_info> src
src/moduleA/foo.c <version_info> src/moduleA
src/moduleA/bar.c <version_info> src/moduleA
src/moduleA <version_info> src
src <version_info> ""
Makefile <version_info> ""
When the parent directory changes, if it's a subdirectory of the previous
parent directory (e.g. "" -> src/moduleB) then we can just keep appending.
If the parent directory differs from the previous parent directory and is
not a subdirectory, then we should process that directory.
So, for example, when we get to this point:
tokens.txt <version_info> ""
src/moduleB/umm.c <version_info> src/moduleB
src/moduleB/baz.c <version_info> src/moduleB
and note that the next entry (src/moduleB) has a different parent than
the last one that isn't a subdirectory, we should write out a tree for it
100644 blob <HASH> umm.c
100644 blob <HASH> baz.c
then pop all the entries under that directory while recording the new
hash for that directory, leaving us with
tokens.txt <version_info> ""
src/moduleB <new version_info> src
This process repeats until at the end we get to
tokens.txt <version_info> ""
src <new version_info> ""
Makefile <version_info> ""
and then we can write out the toplevel tree. Since we potentially have
entries in our string_list corresponding to multiple different toplevel
directories, e.g. a slightly different repository might have:
whizbang.txt <version_info> ""
tokens.txt <version_info> ""
src/moduleD <new version_info> src
src/moduleC <new version_info> src
src/moduleB <new version_info> src
src/moduleA/foo.c <version_info> src/moduleA
src/moduleA/bar.c <version_info> src/moduleA
When src/moduleA is popped off, we need to know that the "last
directory" reverts back to src, and how many entries in our string_list
are associated with that parent directory. So I use an auxiliary
offsets string_list which would have (parent_directory,offset)
information of the form
"" 0
src 2
src/moduleA 5
Whenever I write out a tree for a subdirectory, I set versions.nr to
the final offset value and then decrement offsets.nr...and then add
an entry to versions with a hash for the new directory.
The idea is relatively simple, there's just a lot of accounting to
implement this.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Create a new function, write_tree(), which will take a list of
basenames, modes, and oids for a single directory and create a tree
object in the object-store. We do not yet have just basenames, modes,
and oids for just a single directory (we have a mixture of entries from
all directory levels in the hierarchy) so we still die() before the
current call to write_tree(), but the next patch will rectify that.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As a step towards transforming the processed path->conflict_info entries
into an actual tree object, start recording basenames, modes, and oids
in a dir_metadata structure. Subsequent commits will make use of this
to actually write a tree.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We want to handle paths below a directory before needing to handle the
directory itself. Also, we want to handle the directory immediately
after the paths below it, so we can't use simple lexicographic ordering
from strcmp (which would insert foo.txt between foo and foo/file.c).
Copy string_list_df_name_compare() from merge-recursive.c, and set up a
string list of paths sorted by that function so that we can iterate in
the desired order.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>