Another brown paper bag inconsistency fix for a new feature
introduced during this cycle.
* dl/stash-show-untracked-fixup:
stash show: use stash.showIncludeUntracked even when diff options given
The "simple-ipc" did not compile without pthreads support, but the
build procedure was not properly account for it.
* jh/simple-ipc-sans-pthread:
simple-ipc: correct ifdefs when NO_PTHREADS is defined
The "rev-parse" command did not diagnose the lack of argument to
"--path-format" option, which was introduced in v2.31 era, which
has been corrected.
* wm/rev-parse-path-format-wo-arg:
rev-parse: fix segfault with missing --path-format argument
Despite that tar is available everywhere, it's not required by POSIX.
In our build system, users are allowed to specify which tar to be used
in Makefile knobs. Furthermore, GNU tar (gtar) is prefered when autotools
is being used.
In our testsuite, 7 out of 9 tar-required-tests use "$TAR", the other
two use "tar".
Let's change the remaining two tests to "$TAR".
Signed-off-by: Đoàn Trần Công Danh <congdanhqx@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If options pertaining to how the diff is displayed is provided to
`git stash show`, the command will ignore the stash.showIncludeUntracked
configuration variable, defaulting to not showing any untracked files.
This is unintuitive behaviour since the format of the diff output and
whether or not to display untracked files are orthogonal.
Use stash.showIncludeUntracked even when diff options are given. Of
course, this is still overridable via the command-line options.
Update the documentation to explicitly say which configuration variables
will be overridden when a diff options are given.
Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Fix long standing inconsistency between -c/--cc that do imply -p on
one side, and -m that did not imply -p on the other side.
Change corresponding test accordingly, as "log -m" output should now
match one from "log -m -p", rather than from just "log".
Change documentation accordingly.
NOTES:
After this patch
git log -m
produces diffs without need to provide -p as well, that improves both
consistency and usability. It gets even more useful if one sets
"log.diffMerges" configuration variable to "first-parent" to force -m
produce usual diff with respect to first parent only.
This patch, however, does not change behavior when specific diff
format is explicitly provided on the command-line, so that commands
like
git log -m --raw
git log -m --stat
are not affected, nor does it change commands where specific diff
format is active by default, such as:
git diff-tree -m
It's also worth to be noticed that exact historical semantics of -m is
still provided by --diff-merges=separate.
Signed-off-by: Sergey Organov <sorganov@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This is refactoring change in preparation for the next commit that
will let -m imply -p.
The old name doesn't match the intention to let not only -c/-cc imply
-p, but also -m, that is not a "combined" format, so we rename the
flag accordingly.
Signed-off-by: Sergey Organov <sorganov@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Passing "-m" in "git log --first-parent -m" is not needed as
--first-parent implies --diff-merges=first-parent anyway. OTOH, it
will stop being harmless once we let "-m" imply "-p".
While we are at it, fix corresponding test description in t3903-stash
to match what it actually tests.
Signed-off-by: Sergey Organov <sorganov@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
rev-list doesn't utilize -m. It happens to eat it silently, so this
bug went unnoticed.
Signed-off-by: Sergey Organov <sorganov@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Move specific handling of "-m" for diff-index to diff-index.c, so
diff-merges is left to handle only diff for merges options.
Being a better design by itself, this is especially essential in
preparation for letting -m imply -p, as "diff-index -m" obviously
should not imply -p, as it's entirely unrelated.
To handle this, in addition to moving specific diff-index "-m" code
out of diff-merges, we introduce new
diff_merges_suppress_options_parsing()
and call it before generic options processing in cmd_diff_index().
This new diff_merges_suppress_options_parsing() could then be reused
and called before invocations of setup_revisions() for other commands
that don't need --diff-merges options, but that's outside of the scope
of these patch series.
Signed-off-by: Sergey Organov <sorganov@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
-m in "git diff-index" means "match missing", that differs
from its meaning in "git diff". Let's check it in diff-index.
Signed-off-by: Sergey Organov <sorganov@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We want to ensure we don't affect plumbing commands with our changes
of "-m" semantics, so add corresponding test.
Signed-off-by: Sergey Organov <sorganov@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This is to ensure we won't break different diff formats when we start
to imply "-p" by "-m".
Signed-off-by: Sergey Organov <sorganov@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This is to ensure we won't break different diff formats when we start
to imply "-p" by "-m".
Signed-off-by: Sergey Organov <sorganov@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This is to notice current behavior that we are going to change when we
start to imply "-p" by "-m".
Signed-off-by: Sergey Organov <sorganov@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Simple IPC always requires threads (in addition to various
platform-specific IPC support). Fix the ifdefs in the Makefile
to define SUPPORTS_SIMPLE_IPC when appropriate.
Previously, the Unix version of the code would only verify that
Unix domain sockets were available.
This problem was reported here:
https://lore.kernel.org/git/YKN5lXs4AoK%2FJFTO@coredump.intra.peff.net/T/#m08be8f1942ea8a2c36cfee0e51cdf06489fdeafc
Reported-by: Randall S. Becker <rsbecker@nexbridge.com>
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Classic string concatenation while forgetting a space character.
Signed-off-by: Jean-Noël Avila <jn.avila@free.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Refactor tr2_dst_try_uds_connect() to avoid a gcc warning[1] that
appears under -O3 (but not -O2). This makes the build pass under
DEVELOPER=1 without needing a DEVOPTS=no-error.
This can be reproduced with GCC Debian 8.3.0-6, but not e.g. with
clang 7.0.1-8+deb10u2. We've had this warning since
ee4512ed48 (trace2: create new combined trace facility, 2019-02-22).
As noted in [2] this warning happens because the compiler doesn't
assume that errno must be non-zero after a failed syscall.
Let's work around by using the well-established "saved_errno" pattern,
along with returning -1 ourselves instead of "errno". The caller can
thus rely on our "errno" on failure.
See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61846 for a related
bug report against GCC.
1.
trace2/tr2_dst.c: In function ‘tr2_dst_get_trace_fd.part.5’:
trace2/tr2_dst.c:296:10: warning: ‘fd’ may be used uninitialized in this function [-Wmaybe-uninitialized]
dst->fd = fd;
~~~~~~~~^~~~
trace2/tr2_dst.c:229:6: note: ‘fd’ was declared here
int fd;
^~
2. https://lore.kernel.org/git/20200404142131.GA679473@coredump.intra.peff.net/
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Regression fix for a change made during this cycle.
* cs/http-use-basic-after-failed-negotiate:
Revert "remote-curl: fall back to basic auth if Negotiate fails"
t5551: test http interaction with credential helpers
When there are many renames between the old base of a series of commits
and the new base, the way sequencer.c, merge-recursive.c, and
diffcore-rename.c have traditionally split the work resulted in
redetecting the same renames with each and every commit being
transplanted. To address this, the last several commits have been
creating a cache of rename detection results, determining when it was
safe to use such a cache in subsequent merge operations, adding helper
functions, and so on. See the previous half dozen commit messages for
additional discussion of this optimization, particularly the message a
few commits ago entitled "add code to check for whether cached renames
can be reused". This commit finally ties all of that work together,
modifying the merge algorithm to make use of these cached renames.
For the testcases mentioned in commit 557ac0350d ("merge-ort: begin
performance work; instrument with trace2_region_* calls", 2020-10-28),
this change improves the performance as follows:
Before After
no-renames: 5.665 s ± 0.129 s 5.622 s ± 0.059 s
mega-renames: 11.435 s ± 0.158 s 10.127 s ± 0.073 s
just-one-mega: 494.2 ms ± 6.1 ms 500.3 ms ± 3.8 ms
That's a fairly small improvement, but mostly because the previous
optimizations were so effective for these particular testcases; this
optimization only kicks in when the others don't. If we undid the
basename-guided rename detection and skip-irrelevant-renames
optimizations, then we'd see that this series by itself improved
performance as follows:
Before Basename Series After Just This Series
no-renames: 13.815 s ± 0.062 s 5.697 s ± 0.080 s
mega-renames: 1799.937 s ± 0.493 s 205.709 s ± 0.457 s
Since this optimization kicks in to help accelerate cases where the
previous optimizations do not apply, this last comparison shows that
this cached-renames optimization has the potential to help signficantly
in cases that don't meet the requirements for the other optimizations to
be effective.
The changes made in this optimization also lay some important groundwork
for a future optimization around having collect_merge_info() avoid
recursing into subtrees in more cases.
However, for this optimization to be effective, merge_switch_to_result()
should only be called when the rebase or cherry-pick operation has
either completed or hit a case where the user needs to resolve a
conflict or edit the result. If it is called after every commit, as
sequencer.c does, then the working tree and index are needlessly updated
with every commit and the cached metadata is tossed, defeating this
optimization. Refactoring sequencer.c to only call
merge_switch_to_result() at the end of the operation is a bigger
undertaking, and the practical benefits of this optimization will not be
realized until that work is performed. Since `test-tool fast-rebase`
only updates at the end of the operation, it was used to obtain the
timings above.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As documented in Documentation/technical/remembering-renames.txt, and as
tested for in the two testcases in t6429 with "rename same file
identically" in their description, there is one case where we need to
have renames in one commit NOT be cached for the next commit in our
rebase sequence -- namely, rename/rename(1to1) cases. Rather than
specifically trying to uncache those and fix up dir_rename_counts() to
match (which would also be valid but more work), we simply disable the
optimization when this really rare type of rename occurs.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If we have a usable rename cache, then we can remove from
relevant_sources all the paths that were cached;
diffcore_rename_extended() can then consider an even smaller set of
relevant_sources in its rename detection.
However, when diffcore_rename_extended() is done, we will need to take
the renames it detected and then add back in all the ones we had cached
from before.
Add helper functions for doing these two operations; the next commit
will make use of them.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Previous commits created an in-memory cache of the results of rename
detection, and added logic to detect when that cache could appropriately
be used in a subsequent merge operation -- but we were still
unconditionally clearing the cache with each new merge operation anyway.
If it is valid to reuse the cache from one of the two sides of history,
preserve that side.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Previously, callers of the merge-ort API could have passed an
uninitialized value for struct merge_result *result. However, we want
to check result to see if it has cached renames from a previous merge
that we can reuse; such values would be found behind result->priv.
However, if result->priv is uninitialized, attempting to access behind
it will give a segfault. So, we need result->priv to be NULL (which
will be the case if the caller does a memset(&result, 0)), or be written
by a previous call to the merge-ort machinery. Documenting this
requirement may help, but despite being the person who introduced this
requirement, I still missed it once and it did not fail in a very clear
way and led to a long debugging session.
Add a _properly_initialized field to merge_result; that value will be
0 if the caller zero'ed the merge_result, it will be set to a very
specific value by a previous run by the merge-ort machinery, and if it's
uninitialized it will most likely either be 0 or some value that does
not match the specific one we'd expect allowing us to throw a much more
meaningful error.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We need to know when renames detected in a previous merge operation can
be reused in a later merge operation. Consider the following setup
(from the git-rebase manpage):
A---B---C topic
/
D---E---F---G master
After rebasing, this will appear as:
A'--B'--C' topic
/
D---E---F---G master
Further, let's say that 'oldfile' was renamed to 'newfile' between E
and G. The rebase or cherry-pick of A onto G will involve a three-way
merge between E (as the merge base) and G and A. After detecting the
rename between E:oldfile and G:newfile, there will be a three-way
content merge of the following:
E:oldfile
G:newfile
A:oldfile
and produce a new result:
A':newfile
Now, when we want to pick B onto A', we will need to do a three-way
merge between A (as the merge-base) and A' and B. This will involve
a three-way content merge of
A:oldfile
A':newfile
B:oldfile
but only if we can detect that A:oldfile is similar enough to A':newfile
to be used together in a three-way content merge, i.e. only if we can
detect that A:oldfile and A':newfile are a rename. But we already know
that A:oldfile and A':newfile are similar enough to be used in a
three-way content merge, because that is precisely where A':newfile came
from in the previous merge.
Note that A & A' both appear in both merges. That gives us the
condition under which we can reuse renames.
There are a couple important points about this optimization:
- If the rebase or cherry-pick halts for user conflicts, these caches
are NOT saved anywhere. Thus, resuming a halted rebase or
cherry-pick will result in no reused renames for the next commit.
This is intentional, as user resolution can change files
significantly and in ways that violate the similarity assumptions
here.
- Technically, in a *very* narrow case this might give slightly
different results for rename detection. Using the example above,
if:
* E:oldfile had 20 lines
* G:newfile added 10 new lines at the beginning of the file
* A:oldfile deleted all but the first three lines of the file
then
=> A':newfile would have 13 lines, 3 of which matches those
in A:oldfile.
Consider the two cases:
* Without this optimization:
- the next step of the rebase operation (moving B to B')
would not detect the rename betwen A:oldfile and A':newfile
- we'd thus get a modify/delete conflict with the rebase
operation halting for the user to resolve, and have both
A':newfile and B:oldfile sitting in the working tree.
* With this optimization:
- the rename between A:oldfile and A':newfile would be detected
via the cache of renames
- a three-way merge between A:oldfile, A':newfile, and B:oldfile
would commence and be written to A':newfile
Now, is the difference in behavior a bug...or a bugfix? I can't
tell. Given that A:oldfile and A':newfile are not very similar,
when we three-way merge with B:oldfile it seems likely we'll hit a
conflict for the user to resolve. And it shouldn't be too hard for
users to see why we did that three-way merge; oldfile and newfile
*were* renames somewhere in the sequence. So, most of these corner
cases will still behave similarly -- namely, a conflict given to the
user to resolve. Also, consider the interesting case when commit B
is a clean revert of commit A. Without this optimization, a rebase
could not both apply a weird patch like A and then immediately
revert it; users would be forced to resolve merge conflicts. With
this optimization, it would successfully apply the clean revert.
So, there is certainly at least one case that behaves better. Even
if it's considered a "difference in behavior", I think both behaviors
are reasonable, and the time savings provided by this optimization
justify using the slightly altered rename heuristics.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Fill in cache_pairs, cached_target_names, and cached_irrelevant based on
rename detection results. Future commits will make use of these values.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When there are many renames between the old base of a series of commits
and the new base for a series of commits, the sequence of merges
employed to transplant those commits (from a cherry-pick or rebase
operation) will repeatedly detect the exact same renames. This is
wasted effort.
Add data structures which will be used to cache rename detection
results, along with the initialization and deallocation of these data
structures. Future commits will populate these caches, detect the
appropriate circumstances when they can be used, and employ them to
avoid re-detecting the same renames repeatedly.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We will soon be adding an optimization that caches (in memory only,
never written to disk) upstream renames during a sequence of merges such
as occurs during a cherry-pick or rebase operation. Add several tests
meant to stress such an implementation to ensure it does the right
thing, and include a test whose outcome we will later change due to this
optimization as well.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Previously, when fast-rebase hit a conflict, it simply aborted and left
HEAD, the index, and the working tree where they were before the
operation started. While fast-rebase does not support restarting from a
conflicted state, write the conflicted state out anyway as it gives us a
way to see what the conflicts are and write tests that check for them.
This will be important in the upcoming commits, because sequencer.c is
only superficially integrated with merge-ort.c; in particular, it calls
merge_switch_to_result() after EACH merge instead of only calling it at
the end of all the sequence of merges (or when a conflict is hit). This
not only causes needless updates to the working copy and index, but also
causes all intermediate data to be freed and tossed, preventing caching
information from one merge to the next. However, integrating
sequencer.c more deeply with merge-ort.c is a big task, and making this
small extension to fast-rebase.c provides us with a simple way to test
the edge and corner cases that we want to make sure continue working.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
assert() can succinctly document expectations for the code, and do so in
a way that may be useful to future folks trying to refactor the code and
change basic assumptions; it allows them to more quickly find some
places where their violations of previous assumptions trips things up.
Unfortunately, assert() can surround a function call with important
side-effects, which is a huge mistake since some users will compile with
assertions disabled. I've had to debug such mistakes before in other
codebases, so I should know better. Luckily, this was only in test
code, but it's still very embarrassing. Change an assert() to an if
(...) BUG (...).
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Remembering renames on the upstream side of history in an early merge of
a rebase or cherry-pick for re-use in a latter merge of the same
operation makes pretty good intuitive sense. However, trying to show
that it doesn't cause some subtle behavioral difference or some funny
edge or corner case is much more involved. And, in fact, it does
introduce a subtle behavioral change.
Document all the assumptions, special cases, and logic involved in such
an optimization, and describe why this optimization is safe under the
current optimizations/features/etc. -- even when the subtle behavioral
change is triggered.
Part of the point of adding this document that goes over the
optimization in such laborious detail, is that it is possible that
significant future changes (optimizations or feature changes) could
interact with this optimization in interesting ways; this document is
here to help folks making big changes sanity check that the assumptions
and arguments underlying this optimization are still valid. (As a side
note, creating this document forced me to review things in sufficient
detail that I found I was not properly caching directory-rename-induced
renames, resulting in the code not being aware of those renames and
causing unnecessary diffcore_rename_extended() calls in subsequent
merges.)
A subsequent commit will add several testcases based on this document
meant to stress-test the implementation and also document the case with
the subtle behavioral change.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The current documentation for color.pager is technically correct, but
slightly misleading and doesn't really clarify the purpose of the
variable. As explained in the original thread which added it:
https://lore.kernel.org/git/E1G6zPH-00062L-Je@moooo.ath.cx/
the point is to deal with pagers that don't understand colors. And hence it
being set to "true" is necessary for colorizing output to the pager, but
not sufficient by itself (you must also have enabled one of the other
color options, though note that these are set to "auto" by default these
days).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Find typos in "po/TEAMS" file using the "git-po-helper" program. These
typos were introduced from commit v2.24.0-1-g9917eca794 (l10n: zh_TW:
add translation for v2.24.0, 2019-11-20 19:14:22 +0800).
Signed-off-by: Jiang Xin <worldhello.net@gmail.com>
The "chainlint" feature in the test framework is a handy way to
catch common mistakes in writing new tests, but tends to get
expensive. An knob to selectively disable it has been introduced
to help running tests that the developer has not modified.
* jk/test-chainlint-softer:
t: avoid sed-based chain-linting in some expensive cases
The handling of "%(push)" formatting element of "for-each-ref" and
friends was broken when the same codepath started handling
"%(push:<what>)", which has been corrected.
* zh/ref-filter-push-remote-fix:
ref-filter: fix read invalid union member bug
"git clone" from SHA256 repository by Git built with SHA-1 as the
default hash algorithm over the dumb HTTP protocol did not
correctly set up the resulting repository, which has been corrected.
* ew/sha256-clone-remote-curl-fix:
remote-curl: fix clone on sha256 repos
"git clean" and "git ls-files -i" had confusion around working on
or showing ignored paths inside an ignored directory, which has
been corrected.
* en/dir-traversal:
dir: introduce readdir_skip_dot_and_dotdot() helper
dir: update stale description of treat_directory()
dir: traverse into untracked directories if they may have ignored subfiles
dir: avoid unnecessary traversal into ignored directory
t3001, t7300: add testcase showcasing missed directory traversal
t7300: add testcase showing unnecessary traversal into ignored directory
ls-files: error out on -i unless -o or -c are specified
dir: report number of visited directories and paths with trace2
dir: convert trace calls to trace2 equivalents
Use -1 as error return value throughout.
This removes spurious differences in the GIT_TRACE_REFS output, depending on the
ref storage backend active.
Before, the cached ref_iterator (but only that iterator!) would return
peel_object() output directly. No callers relied on the peel_status values
beyond success/failure. All calls to these functions go through
peel_iterated_oid(), which returns peel_object() as a fallback, but also
squashing the error values.
The iteration interface already passes REF_ISSYMREF and REF_ISBROKEN through the
flags argument, so the additional error values in enum peel_status provide no
value.
The ref iteration interface provides a separate peel() function because certain
formats (eg. packed-refs and reftable) can store the peeled object next to the
tag SHA1. Passing the peeled SHA1 as an optional argument to each_ref_fn maps
more naturally to the implementation of ref databases. Changing the code in this
way is left for a future refactoring.
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When fetching with the v0 protocol over ssh (or a local upload-pack with
pipes), the server closes the connection as soon as it is finished
sending the pack. So even though the client may still be operating on
the data via index-pack (e.g., resolving deltas, checking connectivity,
etc), the server has released all resources.
With the v2 protocol, however, the server considers the ssh session only
as a transport, with individual requests coming over it. After sending
the pack, it goes back to its main loop, waiting for another request to
come from the client. As a result, the ssh session hangs around until
the client process ends, which may be much later (because resolving
deltas, etc, may consume a lot of CPU).
This is bad for two reasons:
- it's consuming resources on the server to leave open a connection
that won't see any more use
- if something bad happens to the ssh connection in the meantime (say,
it gets killed by the network because it's idle, as happened in a
real-world report), then ssh will exit non-zero, and we'll propagate
the error up the stack.
The server is correct here not to hang up after serving the pack. The v2
protocol's design is meant to allow multiple requests like this, and
hanging up would be the wrong thing for a hypothetical client which was
planning to make more requests (though in practice, the git.git client
never would, and I doubt any other implementations would either).
The right thing is instead for the client to signal to the server that
it's not interested in making more requests. We can do that by closing
the pipe descriptor we use to write to ssh. This will propagate to the
server upload-pack as an EOF when it tries to read the next request (and
then it will close its half, and the whole connection will go away).
It's important to do this "half duplex" shutdown, because we have to do
it _before_ we actually receive the pack. This is an artifact of the way
fetch-pack and index-pack (or unpack-objects) interact. We hand the
connection off to index-pack (really, a sideband demuxer which feeds
it), and then wait until it returns. And it doesn't do that until it has
resolved all of the deltas in the pack, even though it was done reading
from the server long before.
So just closing the connection fully after index-pack returns would be
too late; we'd have held it open much longer than was necessary. And
teaching index-pack to close the connection is awkward. It's not even
seeing the whole conversation (the sideband demuxer is, but it doesn't
actually know what's in the packets, or when the end comes).
Note that this close() is happening deep within the transport code. It's
possible that a caller would want to perform other operations over the
same ssh transport after receiving the pack. But as of the current code,
none of the callers do, and there haven't been discussions of any plans
to change this. If we need to support that later, we can probably do so
by passing down a flag for "you're the last request on the transport;
it's OK to close" instead of the code just assuming that's true.
The description above all discusses v2 ssh, so it's worth thinking about
how this interacts with other protocols:
- in v0 protocols, we could do the same half-duplex shutdown (it just
goes into the v0 do_fetch_pack() instead). This does work, but since
it doesn't have the same persistence problem in the first place,
there's little reason to change it at this point.
- local fetches against git-upload-pack on the same machine will
behave the same as ssh (they are talking over two pipes, and see EOF
on their input pipe)
- fetches against git-daemon will run this same code, and close one of
the descriptors. In practice, this won't do anything, since there
our two descriptors are dups of each other, and not part of a
half-duplex pair. The right thing would probably be to call
shutdown(SHUT_WR) on it. I didn't bother with that here. It doesn't
face the same error-code problem (since it's just a TCP connection),
so it's really only an optimization problem. And git:// is not that
widely used these days, and has less impact on server resources than
an ssh termination.
- v2 http doesn't suffer from this problem in the first place, as our
pipes terminate at a local git-remote-https, which is passing data
along as individual requests via curl. Probably curl is keeping the
TCP/TLS connection open for more requests, and we might be able to
tell it manually "hey, we are done making requests now". But I think
that's much less important. It again doesn't suffer from the
error-code problem, and HTTP keepalive is pretty well understood
(importantly, the timeouts can be set low, because clients like curl
know how to reconnect for subsequent requests if necessary). So it's
probably not worth figuring out how to tell curl that we're done
(though if we do, this patch is probably the first step anyway;
fetch-pack closes the pipe back to remote-https, which would be the
signal that it should tell curl we're done).
The code is pretty straightforward. We close the pipe at the right
moment, and set it to -1 to mark it as invalid. I modified the later
cleanup code to avoid calling close(-1). That's not strictly necessary,
since close(-1) is a noop, but hopefully makes things a bit more obvious
to a reader.
I suspect that trying to call more transport functions after the close()
(e.g., calling transport_fetch_refs() again) would fail, as it's not
smart enough to realize we need to re-open the ssh connection. But
that's already true when v0 is in use. And no current callers want to do
that (and again, the solution is probably a flag in the transport code
to keep things open, which can be added later).
There's no test here, as the situation it covers is inherently racy (the
question is when upload-pack exits, compared to when index-pack finishes
resolving deltas and exits). The rather gross shell snippet below does
recreate the problematic situation; when run on a sufficiently-large
repository (git.git works fine), it kills an "idle" upload-pack while
the client is resolving deltas, leading to a failed clone.
(
git clone --no-local --progress . foo.git 2>&1
echo >&2 "clone exit code=$?"
) |
tr '\r' '\n' |
while read line
do
case "$done,$line" in
,Resolving*)
echo "hit resolving deltas; killing upload-pack"
killall -9 git-upload-pack
done=t
;;
esac
done
Reported-by: Greg Pflaum <greg.pflaum@pnp-hcl.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>