Commit Graph

54788 Commits

Author SHA1 Message Date
Martin Ågren
41a74bd013 t7510: invoke git as part of &&-chain
If `git commit-tree HEAD^{tree}` fails on us and produces no output on
stdout, we will substitute that empty string and execute `git tag
ninth-unsigned`, i.e., we will tag HEAD rather than a newly created
object. But we are lucky: we have a signature on HEAD, so we should
eventually fail the next test, where we verify that "ninth-unsigned" is
indeed unsigned.

We have a similar problem a few lines later. If `git commit-tree -S`
fails with no output, we will happily tag HEAD as "tenth-signed". Here,
we are not so lucky. The tag ends up on the same commit as
"eighth-signed-alt", and that's a signed commit, so t7510-signed-commit
will pass, despite `git commit-tree -S` failing.

Make these `git commit-tree` invocations a direct part of the &&-chain,
so that we can rely less on luck and set a better example for future
tests modeled after this one. Fix a 9/10 copy/paste error while at it.

Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Brandon Richardson <brandon1024.br@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-01-22 11:08:33 -08:00
Stefan Beller
5d124f419d git-submodule: abort if core.worktree could not be set correctly
74d4731da1 (submodule--helper: replace connect-gitdir-workingtree by
ensure-core-worktree, 2018-08-13) forgot to exit the submodule operation
if the helper could not ensure that core.worktree is set correctly.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-01-18 15:28:13 -08:00
David Turner
d1dd94b308 Do not print 'dangling' for cat-file in case of ambiguity
The return values -1 and -2 from get_oid could mean two different
things, depending on whether they were from an enum returned by
get_tree_entry_follow_symlinks, or from a different code path.  This
caused 'dangling' to be printed from a git cat-file in the case of an
ambiguous (-2) result.

Unify the results of get_oid* and get_tree_entry_follow_symlinks to be
one common type, with unambiguous values.

Signed-off-by: David Turner <novalis@novalis.org>
Reported-by: Eric Wong <e@80x24.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-01-18 15:22:02 -08:00
Junio C Hamano
16a465bc01 Third batch after 2.20
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-01-18 13:56:54 -08:00
Junio C Hamano
5104f8f1ac Merge branch 'js/gc-repack-close-before-remove'
"git gc" and "git repack" did not close the open packfiles that
they found unneeded before removing them, which didn't work on a
platform incapable of removing an open file.  This has been
corrected.

* js/gc-repack-close-before-remove:
  gc/repack: release packs when needed
2019-01-18 13:49:57 -08:00
Junio C Hamano
eab7584e37 Merge branch 'en/show-ref-doc-fix'
Doc update.

* en/show-ref-doc-fix:
  git-show-ref.txt: fix order of flags
2019-01-18 13:49:57 -08:00
Junio C Hamano
55574bd04a Merge branch 'ot/ref-filter-object-info'
The "--format=<placeholder>" option of for-each-ref, branch and tag
learned to show a few more traits of objects that can be learned by
the object_info API.

* ot/ref-filter-object-info:
  ref-filter: give uintmax_t to format with %PRIuMAX
  ref-filter: add docs for new options
  ref-filter: add tests for deltabase
  ref-filter: add deltabase option
  ref-filter: add tests for objectsize:disk
  ref-filter: add check for negative file size
  ref-filter: add objectsize:disk option
2019-01-18 13:49:56 -08:00
Junio C Hamano
3fe47ff444 Merge branch 'sg/stress-test'
Flaky tests can now be repeatedly run under load with the
"--stress" option.

* sg/stress-test:
  test-lib: add the '--stress' option to run a test repeatedly under load
  test-lib-functions: introduce the 'test_set_port' helper function
  test-lib: set $TRASH_DIRECTORY earlier
  test-lib: consolidate naming of test-results paths
  test-lib: parse command line options earlier
  test-lib: parse options in a for loop to keep $@ intact
  test-lib: extract Bash version check for '-x' tracing
  test-lib: translate SIGTERM and SIGHUP to an exit
2019-01-18 13:49:56 -08:00
Junio C Hamano
2c0a645d9e Merge branch 'rs/sha1-file-close-mapped-file-on-error'
Code clean-up.

* rs/sha1-file-close-mapped-file-on-error:
  sha1-file: close fd of empty file in map_sha1_file_1()
2019-01-18 13:49:56 -08:00
Junio C Hamano
eb8638abec Merge branch 'rs/loose-object-cache-perffix'
The loose object cache used to optimize existence look-up has been
updated.

* rs/loose-object-cache-perffix:
  object-store: retire odb_load_loose_cache()
  object-store: use one oid_array per subdirectory for loose cache
  object-store: factor out odb_clear_loose_cache()
  object-store: factor out odb_loose_cache()
2019-01-18 13:49:56 -08:00
Junio C Hamano
702bbfef3c Merge branch 'po/git-p4-wo-login'
"git p4" update.

* po/git-p4-wo-login:
  git-p4: fix problem when p4 login is not necessary
2019-01-18 13:49:56 -08:00
Junio C Hamano
41db137234 Merge branch 'mm/multimail-1.5'
Update "git multimail" from the upstream.

* mm/multimail-1.5:
  git-multimail: update to release 1.5.0
2019-01-18 13:49:55 -08:00
Junio C Hamano
9462ac7211 Merge branch 'tg/t5570-drop-racy-test'
An inherently racy test that caused intermittent failures has been
removed.

* tg/t5570-drop-racy-test:
  Revert "t/lib-git-daemon: record daemon log"
  t5570: drop racy test
2019-01-18 13:49:55 -08:00
Junio C Hamano
74ae0652c4 Merge branch 'jk/dev-build-format-security'
Earlier we added "-Wformat-security" to developer builds, assuming
that "-Wall" (which includes "-Wformat" which in turn is required
to use "-Wformat-security") is always in effect.  This is not true
when config.mak.autogen is in use, unfortunately.  This has been
fixed by unconditionally adding "-Wall" to developer builds.

* jk/dev-build-format-security:
  config.mak.dev: add -Wall, primarily for -Wformat, to help autoconf users
2019-01-18 13:49:55 -08:00
Junio C Hamano
77fbd96aeb Merge branch 'so/cherry-pick-always-allow-m1'
"git cherry-pick -m1" was forbidden when picking a non-merge
commit, even though there _is_ parent number 1 for such a commit.
This was done to avoid mistakes back when "cherry-pick" was about
picking a single commit, but is no longer useful with "cherry-pick"
that can pick a range of commits.  Now the "-m$num" option is
allowed when picking any commit, as long as $num names an existing
parent of the commit.

Technically this is a backward incompatible change; hopefully
nobody is relying on the error-checking behaviour.

* so/cherry-pick-always-allow-m1:
  t3506: validate '-m 1 -ff' is now accepted for non-merge commits
  t3502: validate '-m 1' argument is now accepted for non-merge commits
  cherry-pick: do not error on non-merge commits when '-m 1' is specified
  t3510: stop using '-m 1' to force failure mid-sequence of cherry-picks
2019-01-18 13:49:54 -08:00
Junio C Hamano
726f89c2dd Merge branch 'nd/worktree-remove-with-uninitialized-submodules'
"git worktree remove" and "git worktree move" refused to work when
there is a submodule involved.  This has been loosened to ignore
uninitialized submodules.

* nd/worktree-remove-with-uninitialized-submodules:
  worktree: allow to (re)move worktrees with uninitialized submodules
2019-01-18 13:49:54 -08:00
Junio C Hamano
bb20dbbc20 Merge branch 'sg/test-bash-version-fix'
The test suite tried to see if it is run under bash, but the check
itself failed under some other implementations of shell (notably
under NetBSD).  This has been corrected.

* sg/test-bash-version-fix:
  test-lib: check Bash version for '-x' without using shell arrays
2019-01-18 13:49:54 -08:00
Junio C Hamano
9f2eba2b90 Merge branch 'rb/hpe'
Portability updates for the HPE NonStop platform.

* rb/hpe:
  compat/regex/regcomp.c: define intptr_t and uintptr_t on NonStop
  git-compat-util.h: add FLOSS headers for HPE NonStop
  config.mak.uname: support for modern HPE NonStop config.
  transport-helper: drop read/write errno checks
  transport-helper: use xread instead of read
2019-01-18 13:49:54 -08:00
Junio C Hamano
e805dc1892 Merge branch 'ed/simplify-setup-git-dir'
Code simplification.

* ed/simplify-setup-git-dir:
  Simplify handling of setup_git_directory_gently() failure cases.
2019-01-18 13:49:54 -08:00
Junio C Hamano
b84e297753 Merge branch 'cy/zsh-completion-SP-in-path'
With zsh, "git cmd path<TAB>" was completed to "git cmd path name"
when the completed path has a special character like SP in it,
without any attempt to keep "path name" a single filename.  This
has been fixed to complete it to "git cmd path\ name" just like
Bash completion does.

* cy/zsh-completion-SP-in-path:
  completion: treat results of git ls-tree as file paths
  zsh: complete unquoted paths with spaces correctly
2019-01-18 13:49:54 -08:00
Junio C Hamano
c433857894 Merge branch 'cy/completion-typofix'
Typofix.

* cy/completion-typofix:
  completion: fix typo in git-completion.bash
2019-01-18 13:49:54 -08:00
Junio C Hamano
81bf66b760 Merge branch 'ew/ban-strncat'
The "strncat()" function is now among the banned functions.

* ew/ban-strncat:
  banned.h: mark strncat() as banned
2019-01-18 13:49:53 -08:00
Junio C Hamano
d01a3faa50 Merge branch 'ds/commit-graph-assert-missing-parents'
Tightening error checking in commit-graph writer.

* ds/commit-graph-assert-missing-parents:
  commit-graph: writing missing parents is a BUG
2019-01-18 13:49:53 -08:00
Junio C Hamano
540ee40e11 Merge branch 'es/doc-worktree-guessremote-config'
Doc clarification.

* es/doc-worktree-guessremote-config:
  doc/config: do a better job of introducing 'worktree.guessRemote'
2019-01-18 13:49:53 -08:00
Junio C Hamano
3942920966 Merge branch 'sb/submodule-unset-core-worktree-when-worktree-is-lost'
The core.worktree setting in a submodule repository should not be
pointing at a directory when the submodule loses its working tree
(e.g. getting deinit'ed), but the code did not properly maintain
this invariant.

* sb/submodule-unset-core-worktree-when-worktree-is-lost:
  submodule deinit: unset core.worktree
  submodule--helper: fix BUG message in ensure_core_worktree
  submodule: unset core.worktree if no working tree is present
  submodule update: add regression test with old style setups
2019-01-18 13:49:53 -08:00
Junio C Hamano
1ed943e9ae Merge branch 'ma/asciidoctor'
Some of the documentation pages formatted incorrectly with
Asciidoctor, which have been fixed.

* ma/asciidoctor:
  git-status.txt: render tables correctly under Asciidoctor
  Documentation: do not nest open blocks
  git-column.txt: fix section header
2019-01-18 13:49:53 -08:00
Junio C Hamano
ec27a94013 Merge branch 'jn/stripspace-wo-repository'
"git stripspace" should be usable outside a git repository, but
under the "-s" or "-c" mode, it didn't.

* jn/stripspace-wo-repository:
  stripspace: allow -s/-c outside git repository
2019-01-18 13:49:53 -08:00
Junio C Hamano
4744d03a47 Merge branch 'sb/submodule-fetchjobs-default-to-one'
"git submodule update" ought to use a single job unless asked, but
by mistake used multiple jobs, which has been fixed.

* sb/submodule-fetchjobs-default-to-one:
  submodule update: run at most one fetch job unless otherwise set
2019-01-18 13:49:52 -08:00
Junio C Hamano
9c51ad5853 Merge branch 'la/quiltimport-keep-non-patch'
"git quiltimport" learned "--keep-non-patch" option.

* la/quiltimport-keep-non-patch:
  git-quiltimport: add --keep-non-patch option
2019-01-18 13:49:52 -08:00
Junio C Hamano
3434569fc2 Merge branch 'nd/style-opening-brace'
Code clean-up.

* nd/style-opening-brace:
  style: the opening '{' of a function is in a separate line
2019-01-18 13:49:52 -08:00
Junio C Hamano
e07074d3f0 Merge branch 'ds/gc-doc-typofix'
Typofix.

* ds/gc-doc-typofix:
  git-gc.txt: fix typo about gc.writeCommitGraph
2019-01-18 13:49:52 -08:00
Johannes Schindelin
9e9da23c27 mingw: special-case arguments to sh
The MSYS2 runtime does its best to emulate the command-line wildcard
expansion and de-quoting which would be performed by the calling Unix
shell on Unix systems.

Those Unix shell quoting rules differ from the quoting rules applying to
Windows' cmd and Powershell, making it a little awkward to quote
command-line parameters properly when spawning other processes.

In particular, git.exe passes arguments to subprocesses that are *not*
intended to be interpreted as wildcards, and if they contain
backslashes, those are not to be interpreted as escape characters, e.g.
when passing Windows paths.

Note: this is only a problem when calling MSYS2 executables, not when
calling MINGW executables such as git.exe. However, we do call MSYS2
executables frequently, most notably when setting the use_shell flag in
the child_process structure.

There is no elegant way to determine whether the .exe file to be
executed is an MSYS2 program or a MINGW one. But since the use case of
passing a command line through the shell is so prevalent, we need to
work around this issue at least when executing sh.exe.

Let's introduce an ugly, hard-coded test whether argv[0] is "sh", and
whether it refers to the MSYS2 Bash, to determine whether we need to
quote the arguments differently than usual.

That still does not fix the issue completely, but at least it is
something.

Incidentally, this also fixes the problem where `git clone \\server\repo`
failed due to incorrect handling of the backslashes when handing the path
to the git-upload-pack process.

Further, we need to take care to quote not only whitespace and
backslashes, but also curly brackets. As aliases frequently go through
the MSYS2 Bash, and as aliases frequently get parameters such as
HEAD@{yesterday}, this is really important. As an early version of this
patch broke this, let's make sure that this does not regress by adding a
test case for that.

Helped-by: Kim Gybels <kgybels@infogroep.be>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-01-18 13:12:14 -08:00
Johannes Schindelin
5440df44c2 mingw (t5580): document bug when cloning from backslashed UNC paths
Due to a quirk in Git's method to spawn git-upload-pack, there is a
problem when passing paths with backslashes in them: Git will force the
command-line through the shell, which has different quoting semantics in
Git for Windows (being an MSYS2 program) than regular Win32 executables
such as git.exe itself.

The symptom is that the first of the two backslashes in UNC paths of the
form \\myserver\folder\repository.git is *stripped off*.

Document this bug by introducing a test case.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-01-18 13:12:14 -08:00
Jonathan Tan
e2f41a0a5a ls-refs: filter refs using namespace-stripped name
If a user fetches refs/heads/master from a repo with namespace "ns", the
remote is expected to (1) not send the real refs/heads/master, and (2)
send refs/namespaces/ns/refs/heads/master with the name
refs/heads/master. (1) indeed happens now, but not (2) - Git only sends
refs that have the user-given prefix, but it checks them against the
full name of the ref (the one starting with refs/namespaces), and not
the namespace-stripped one.

This is demonstrated by the patch in the test. Currently, it results in
"fatal: couldn't find remote ref refs/heads/master" despite both
unnamespaced and namespaced master being present. With the code change,
it produces the expected result.

Check the ref prefixes against the namespace-stripped name.

This bug was discovered through applying patches [1] that override
protocol.version to 2 in repositories when running tests, allowing us to
notice differences in behavior across different protocol versions.

[1] https://public-inbox.org/git/cover.1547677183.git.jonathantanmy@google.com/

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-01-18 12:48:41 -08:00
Linus Torvalds
acdd37769d Add 'human' date format
This adds --date=human, which skips the timezone if it matches the
current time-zone, and doesn't print the whole date if that matches (ie
skip printing year for dates that are "this year", but also skip the
whole date itself if it's in the last few days and we can just say what
weekday it was).

For really recent dates (same day), use the relative date stamp, while
for old dates (year doesn't match), don't bother with time and timezone.

Also add 'auto' date mode, which defaults to human if we're using the
pager.  So you can do

	git config --add log.date auto

and your "git log" commands will show the human-legible format unless
you're scripting things.

Note that this time format still shows the timezone for recent enough
events (but not so recent that they show up as relative dates).  You can
combine it with the "-local" suffix to never show timezones for an even
more simplified view.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Stephen P. Smith <ischis2@cox.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-01-18 10:31:23 -08:00
Johannes Schindelin
21853626ea built-in rebase: call git am directly
While the scripted `git rebase` still has to rely on the
`git-rebase--am.sh` script to implement the glue between the `rebase`
and the `am` commands, we can go a more direct route in the built-in
rebase and avoid using a shell script altogether.

This patch represents a straight-forward port of `git-rebase--am.sh` to
C, along with the glue code to call it directly from within
`builtin/rebase.c`.

This reduces the chances of Git for Windows running into trouble due to
problems with the POSIX emulation layer (known as "MSYS2 runtime",
itself a derivative of the Cygwin runtime): when no shell script is
called, the POSIX emulation layer is avoided altogether.

Note: we pass an empty action to `reset_head()` here when moving back to
the original branch, as no other action is applicable, really. This
parameter is used to initialize `unpack_trees()`' messages.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-01-18 10:11:45 -08:00
Johannes Schindelin
414f336069 rebase: teach reset_head() to optionally skip the worktree
This is what the legacy (scripted) rebase does in
`move_to_original_branch`, and we will need this functionality in the
next commit.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-01-18 10:11:45 -08:00
Johannes Schindelin
5b2237a876 rebase: avoid double reflog entry when switching branches
When switching a branch *and* updating said branch to a different
revision, let's avoid a double entry in HEAD's reflog by first updating
the branch and then adjusting the symbolic ref HEAD.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-01-18 10:11:45 -08:00
Johannes Schindelin
c5233708c5 rebase: move reset_head() into a better spot
Over the next commits, we want to make use of it in `run_am()` (i.e.
running the `--am` backend directly, without detouring to Unix shell
script code) which in turn will be called from `run_specific_rebase()`.

So let's move it before that latter function.

This commit is best viewed using --color-moved.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-01-18 10:11:45 -08:00
Johannes Schindelin
d8727b3687 abspath_part_inside_repo: respect core.ignoreCase
If the file system is case-insensitive, we really must be careful to
ignore differences in case only.

This fixes https://github.com/git-for-windows/git/issues/735

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-01-18 09:53:06 -08:00
Luke Diamand
7a10946ab9 git-p4: handle update of moved/copied files when updating a shelve
Perforce requires a complete list of files being operated on. If
git is updating an existing shelved changelist, then any files
which are moved or copied were not being added to this list.

Signed-off-by: Luke Diamand <luke@diamand.org>
Acked-by: Andrey Mazo <amazo@checkvideo.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-01-18 09:43:40 -08:00
Luke Diamand
7a10bb3a4c git-p4: add failing test for shelved CL update involving move/copy
Updating a shelved P4 changelist where one or more files have
been moved or copied does not work. Add a test for this.

The problem is that P4 requires a complete list of the files being
changed, and move/copy only includes the _source_ in the case of
updating a shelved changelist. This results in errors from Perforce
such as:

  //depot/src - needs tofile //depot/dst
  Submit aborted -- fix problems then use 'p4 submit -c 1234'

Signed-off-by: Luke Diamand <luke@diamand.org>
Acked-by: Andrey Mazo <amazo@checkvideo.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-01-18 09:43:34 -08:00
Derrick Stolee
99dbbfa8dd pack-objects: create GIT_TEST_PACK_SPARSE
Create a test variable GIT_TEST_PACK_SPARSE to enable the sparse
object walk algorithm by default during the test suite. Enabling
this variable ensures coverage in many interesting cases, such as
shallow clones, partial clones, and missing objects.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-01-17 13:44:44 -08:00
Derrick Stolee
3d036eb0d2 pack-objects: create pack.useSparse setting
The '--sparse' flag in 'git pack-objects' changes the algorithm
used to enumerate objects to one that is faster for individual
users pushing new objects that change only a small cone of the
working directory. The sparse algorithm is not recommended for a
server, which likely sends new objects that appear across the
entire working directory.

Create a 'pack.useSparse' setting that enables this new algorithm.
This allows 'git push' to use this algorithm without passing a
'--sparse' flag all the way through four levels of run_command()
calls.

If the '--no-sparse' flag is set, then this config setting is
overridden.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-01-17 13:44:43 -08:00
Derrick Stolee
d5d2e93577 revision: implement sparse algorithm
When enumerating objects to place in a pack-file during 'git
pack-objects --revs', we discover the "frontier" of commits
that we care about and the boundary with commit we find
uninteresting. From that point, we walk trees to discover which
trees and blobs are uninteresting. Finally, we walk trees from the
interesting commits to find the interesting objects that are
placed in the pack.

This commit introduces a new, "sparse" way to discover the
uninteresting trees. We use the perspective of a single user trying
to push their topic to a large repository. That user likely changed
a very small fraction of the paths in their working directory, but
we spend a lot of time walking all reachable trees.

The way to switch the logic to work in this sparse way is to start
caring about which paths introduce new trees. While it is not
possible to generate a diff between the frontier boundary and all
of the interesting commits, we can simulate that behavior by
inspecting all of the root trees as a whole, then recursing down
to the set of trees at each path.

We already had taken the first step by passing an oidset to
mark_trees_uninteresting_sparse(). We now create a dictionary
whose keys are paths and values are oidsets. We consider the set
of trees that appear at each path. While we inspect a tree, we
add its subtrees to the oidsets corresponding to the tree entry's
path. We also mark trees as UNINTERESTING if the tree we are
parsing is UNINTERESTING.

To actually improve the performance, we need to terminate our
recursion. If the oidset contains only UNINTERESTING trees, then
we do not continue the recursion. This avoids walking trees that
are likely to not be reachable from interesting trees. If the
oidset contains only interesting trees, then we will walk these
trees in the final stage that collects the intersting objects to
place in the pack. Thus, we only recurse if the oidset contains
both interesting and UNINITERESTING trees.

There are a few ways that this is not a universally better option.

First, we can pack extra objects. If someone copies a subtree
from one tree to another, the first tree will appear UNINTERESTING
and we will not recurse to see that the subtree should also be
UNINTERESTING. We will walk the new tree and see the subtree as
a "new" object and add it to the pack. A test is modified to
demonstrate this behavior and to verify that the new logic is
being exercised.

Second, we can have extra memory pressure. If instead of being a
single user pushing a small topic we are a server sending new
objects from across the entire working directory, then we will
gain very little (the recursion will rarely terminate early) but
will spend extra time maintaining the path-oidset dictionaries.

Despite these potential drawbacks, the benefits of the algorithm
are clear. By adding a counter to 'add_children_by_path' and
'mark_tree_contents_uninteresting', I measured the number of
parsed trees for the two algorithms in a variety of repos.

For git.git, I used the following input:

	v2.19.0
	^v2.19.0~10

 Objects to pack: 550
Walked (old alg): 282
Walked (new alg): 130

For the Linux repo, I used the following input:

	v4.18
	^v4.18~10

 Objects to pack:   518
Walked (old alg): 4,836
Walked (new alg):   188

The two repos above are rather "wide and flat" compared to
other repos that I have used in the past. As a comparison,
I tested an old topic branch in the Azure DevOps repo, which
has a much deeper folder structure than the Linux repo.

 Objects to pack:    220
Walked (old alg): 22,804
Walked (new alg):    129

I used the number of walked trees the main metric above because
it is consistent across multiple runs. When I ran my tests, the
performance of the pack-objects command with the same options
could change the end-to-end time by 10x depending on the file
system being warm. However, by repeating the same test on repeat
I could get more consistent timing results. The git.git and
Linux tests were too fast overall (less than 0.5s) to measure
an end-to-end difference. The Azure DevOps case was slow enough
to see the time improve from 15s to 1s in the warm case. The
cold case was 90s to 9s in my testing.

These improvements will have even larger benefits in the super-
large Windows repository. In our experiments, we see the
"Enumerate objects" phase of pack-objects taking 60-80% of the
end-to-end time of non-trivial pushes, taking longer than the
network time to send the pack and the server time to verify the
pack.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-01-17 13:44:42 -08:00
Derrick Stolee
4f6d26b167 list-objects: consume sparse tree walk
When creating a pack-file using 'git pack-objects --revs' we provide
a list of interesting and uninteresting commits. For example, a push
operation would make the local topic branch be interesting and the
known remote refs as uninteresting. We want to discover the set of
new objects to send to the server as a thin pack.

We walk these commits until we discover a frontier of commits such
that every commit walk starting at interesting commits ends in a root
commit or unintersting commit. We then need to discover which
non-commit objects are reachable from  uninteresting commits. This
commit walk is not changing during this series.

The mark_edges_uninteresting() method in list-objects.c iterates on
the commit list and does the following:

* If the commit is UNINTERSTING, then mark its root tree and every
  object it can reach as UNINTERESTING.

* If the commit is interesting, then mark the root tree of every
  UNINTERSTING parent (and all objects that tree can reach) as
  UNINTERSTING.

At the very end, we repeat the process on every commit directly
given to the revision walk from stdin. This helps ensure we properly
cover shallow commits that otherwise were not included in the
frontier.

The logic to recursively follow trees is in the
mark_tree_uninteresting() method in revision.c. The algorithm avoids
duplicate work by not recursing into trees that are already marked
UNINTERSTING.

Add a new 'sparse' option to the mark_edges_uninteresting() method
that performs this logic in a slightly different way. As we iterate
over the commits, we add all of the root trees to an oidset. Then,
call mark_trees_uninteresting_sparse() on that oidset. Note that we
include interesting trees in this process. The current implementation
of mark_trees_unintersting_sparse() will walk the same trees as
the old logic, but this will be replaced in a later change.

Add a '--sparse' flag in 'git pack-objects' to call this new logic.
Add a new test script t/t5322-pack-objects-sparse.sh that tests this
option. The tests currently demonstrate that the resulting object
list is the same as the old algorithm. This includes a case where
both algorithms pack an object that is not needed by a remote due to
limits on the explored set of trees. When the sparse algorithm is
changed in a later commit, we will add a test that demonstrates a
change of behavior in some cases.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-01-17 13:44:39 -08:00
Derrick Stolee
f1f5de442f revision: add mark_tree_uninteresting_sparse
In preparation for a new algorithm that walks fewer trees when
creating a pack from a set of revisions, create a method that
takes an oidset of tree oids and marks reachable objects as
UNINTERESTING.

The current implementation uses the existing
mark_tree_uninteresting to recursively walk the trees and blobs.
This will walk the same number of trees as the old mechanism. To
ensure that mark_tree_uninteresting walks the tree, we need to
remove the UNINTERESTING flag before calling the method. This
implementation will be replaced entirely in a later commit.

There is one new assumption in this approach: we are also given
the oids of the interesting trees. This implementation does not
use those trees at the moment, but we will use them in a later
rewrite of this method.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-01-17 13:44:37 -08:00
Jeff King
9e5da3d055 add: use separate ADD_CACHE_RENORMALIZE flag
Commit 9472935d81 (add: introduce "--renormalize", 2017-11-16) taught
git-add to pass HASH_RENORMALIZE to add_to_index(), which then passes
the flag along to index_path(). However, the flags taken by
add_to_index() and the ones taken by index_path() are distinct
namespaces. We cannot take HASH_* flags in add_to_index(), because they
overlap with the ADD_CACHE_* flags we already take (in this case,
HASH_RENORMALIZE conflicts with ADD_CACHE_IGNORE_ERRORS).

We can solve this by adding a new ADD_CACHE_RENORMALIZE flag, and using
it to set HASH_RENORMALIZE within add_to_index(). In order to make it
clear that these two flags come from distinct sets, let's also change
the name "newflags" in the function to "hash_flags".

Reported-by: Dmitriy Smirnov <dmitriy.smirnov@jetbrains.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-01-17 13:40:21 -08:00
Johannes Schindelin
da43c07294 t6042: work around speed optimization on Windows
When Git determines whether a file has changed, it looks at the mtime,
at the file size, and to detect changes even if the mtime is the same
(on Windows, the mtime granularity is 100ns, read: if two files are
written within the same 100ns time slot, they have the same mtime) and
even if the file size is the same, Git also looks at the inode/device
numbers.

This design obviously comes from a Linux background, where `lstat()`
calls were designed to be cheap.

On Windows, there is no `lstat()`. It has to be emulated. And while
obtaining the mtime and the file size is not all that expensive (you can
get both with a single `GetFileAttributesW()` call), obtaining the
equivalent of the inode and device numbers is very expensive (it
requires a call to `GetFileInformationByHandle()`, which in turn
requires a file handle, which is *a lot* more expensive than one might
imagine).

As it is very uncommon for developers to modify files within 100ns time
slots, Git for Windows chooses not to fill inode/device numbers
properly, but simply sets them to 0.

However, in t6042 the files file_v1 and file_v2 are typically written
within the same 100ns time slot, and they do not differ in file size. So
the minor modification is not picked up.

Let's work around this issue by avoiding the `git mv` calls in the
'mod6-setup: chains of rename/rename(1to2) and rename/rename(2to1)' test
case. The target files are overwritten anyway, so it is not like we
really rename those files. This fixes the issue because `git add` will
now add the files as new files (as opposed to existing, just renamed
files).

Functionally, we do not change anything because we replace two `git mv
<old> <new>` calls (where `<new>` is completely overwritten and `git
add`ed later anyway) by `git rm <old>` calls (removing other files, too,
that are also completely overwritten and `git add`ed later).

Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-01-17 12:57:12 -08:00
Jonathan Tan
07c3c2aa16 tests: define GIT_TEST_SIDEBAND_ALL
Define a GIT_TEST_SIDEBAND_ALL environment variable meant to be used
from tests. When set to true, this overrides uploadpack.allowsidebandall
to true, allowing the entire test suite to be run as if this
configuration is in place for all repositories.

As of this patch, all tests pass whether GIT_TEST_SIDEBAND_ALL is unset
or set to 1.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-01-17 11:25:07 -08:00