Commit Graph

399 Commits

Author SHA1 Message Date
Junio C Hamano
676278f8ea Merge branch 'bc/object-id-part17'
Preparation for SHA-256 upgrade continues.

* bc/object-id-part17: (26 commits)
  midx: switch to using the_hash_algo
  builtin/show-index: replace sha1_to_hex
  rerere: replace sha1_to_hex
  builtin/receive-pack: replace sha1_to_hex
  builtin/index-pack: replace sha1_to_hex
  packfile: replace sha1_to_hex
  wt-status: convert struct wt_status to object_id
  cache: remove null_sha1
  builtin/worktree: switch null_sha1 to null_oid
  builtin/repack: write object IDs of the proper length
  pack-write: use hash_to_hex when writing checksums
  sequencer: convert to use the_hash_algo
  bisect: switch to using the_hash_algo
  sha1-lookup: switch hard-coded constants to the_hash_algo
  config: use the_hash_algo in abbrev comparison
  combine-diff: replace GIT_SHA1_HEXSZ with the_hash_algo
  bundle: switch to use the_hash_algo
  connected: switch GIT_SHA1_HEXSZ to the_hash_algo
  show-index: switch hard-coded constants to the_hash_algo
  blame: remove needless comparison with GIT_SHA1_HEXSZ
  ...
2019-10-11 14:24:46 +09:00
Josh Steadmon
5fc31180d8 fetch: add trace2 instrumentation
Add trace2 regions to fetch-pack.c and builtins/fetch.c to better track
time spent in the various phases of a fetch:

* listing refs
* negotiation for protocol versions v0-v2
* fetching refs
* consuming refs

Signed-off-by: Josh Steadmon <steadmon@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-10-03 10:13:17 +09:00
Junio C Hamano
627b826834 Merge branch 'md/list-objects-filter-combo'
The list-objects-filter API (used to create a sparse/lazy clone)
learned to take a combined filter specification.

* md/list-objects-filter-combo:
  list-objects-filter-options: make parser void
  list-objects-filter-options: clean up use of ALLOC_GROW
  list-objects-filter-options: allow mult. --filter
  strbuf: give URL-encoding API a char predicate fn
  list-objects-filter-options: make filter_spec a string_list
  list-objects-filter-options: move error check up
  list-objects-filter: implement composite filters
  list-objects-filter-options: always supply *errbuf
  list-objects-filter: put omits set in filter struct
  list-objects-filter: encapsulate filter components
2019-09-18 11:50:09 -07:00
brian m. carlson
f6af19a9ad fetch-pack: use parse_oid_hex
Instead of hard-coding constants, use parse_oid_hex to compute a pointer
and use it in further parsing operations.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-08-19 15:04:57 -07:00
Derrick Stolee
aaf633c2ad repo-settings: create feature.experimental setting
The 'feature.experimental' setting includes config options that are
not committed to become defaults, but could use additional testing.

Update the following config settings to take new defaults, and to
use the repo_settings struct if not already using them:

* 'pack.useSparse=true'

* 'fetch.negotiationAlgorithm=skipping'

In the case of fetch.negotiationAlgorithm, the existing logic
would load the config option only when about to use the setting,
so had a die() statement on an unknown string value. This is
removed as now the config is parsed under prepare_repo_settings().
In general, this die() is probably misplaced and not valuable.
A test was removed that checked this die() statement executed.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-08-13 13:33:55 -07:00
Junio C Hamano
b49d337bfb Merge branch 'nd/fetch-capability-tweak'
Protocol capabilities that go over wire should never be translated,
but it was incorrectly marked for translation, which has been
corrected.  The output of protocol capabilities for debugging has
been tweaked a bit.

* nd/fetch-capability-tweak:
  fetch-pack: print server version at the top in -v -v
  fetch-pack: print all relevant supported capabilities with -v -v
  fetch-pack: move capability names out of i18n strings
2019-07-09 15:25:43 -07:00
Matthew DeVore
cf9ceb5a12 list-objects-filter-options: make filter_spec a string_list
Make the filter_spec string a string_list rather than a raw C string.
The list of strings must be concatted together to make a complete
filter_spec. A future patch will use this capability to build "combine:"
filter specs gradually.

A strbuf would seem to be a more natural choice for this object, but it
unfortunately requires initialization besides just zero'ing out the
memory.  This results in all container structs, and all containers of
those structs, etc., to also require initialization. Initializing them
all would be more cumbersome that simply using a string_list, which
behaves properly when its contents are zero'd.

For the purposes of code simplification, change behavior in how filter
specs are conveyed over the protocol: do not normalize the tree:<depth>
filter specs since there should be no server in existence that supports
tree:# but not tree:#k etc.

Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Matthew DeVore <matvore@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-06-28 08:41:53 -07:00
Nguyễn Thái Ngọc Duy
0e04297101 fetch-pack: print server version at the top in -v -v
Before the previous patch, the server version is printed after all the
"Server supports" lines. The previous one puts the version in the middle
of "Server supports" group.

Instead of moving it to the bottom, I move it to the top. Version may
stand out more at the top as we will have even more debug out after
capabilities.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-06-20 14:01:00 -07:00
Nguyễn Thái Ngọc Duy
5a88583b0b fetch-pack: print all relevant supported capabilities with -v -v
When we check if some capability is supported, we do print something in
verbose mode. Some capabilities are not printed though (and it made me
think it's not supported; I was more used to GIT_TRACE_PACKET) so let's
print them all.

It's a bit more code. And one could argue for printing all supported
capabilities the server sends us. But I think it's still valuable this
way because we see the capabilities that the client cares about.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-06-20 14:00:59 -07:00
Nguyễn Thái Ngọc Duy
0778b2931c fetch-pack: move capability names out of i18n strings
This reduces the work on translators since they only have one string to
translate (and I think it's still enough context to translate). It also
makes sure no capability name is translated by accident.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-06-20 14:00:59 -07:00
Jeff King
d0229abd93 object: convert lookup_object() to use object_id
There are no callers left of lookup_object() that aren't just passing us
the "hash" member of a "struct object_id". Let's take the whole struct,
which gets us closer to removing all raw sha1 variables.  It also
matches the existing conversions of lookup_blob(), etc.

The conversions of callers were done by hand, but they're all mechanical
one-liners.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-06-20 10:18:09 -07:00
Junio C Hamano
0e5387cd02 Merge branch 'jt/clone-server-option'
A brown-paper-bag bugfix to a change already in 'master'.

* jt/clone-server-option:
  fetch-pack: send server options after command
2019-05-30 10:50:46 -07:00
Jonathan Tan
5b204b7df3 fetch-pack: send server options after command
Currently, if any server options are specified during a protocol v2
fetch, server options will be sent before "command=fetch". Write server
options to the request buffer in send_fetch_request() so that the
components of the request are sent in the correct order.

The protocol documentation states that the command must come first. The
Git server implementation in serve.c (see process_request() in that
file) tolerates any order of command and capability, which is perhaps
why we haven't noticed this. This was noticed when testing against a
JGit server implementation, which follows the documentation in this
regard.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Acked-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-05-28 11:01:07 -07:00
Junio C Hamano
57a6b93236 Merge branch 'jk/fetch-reachability-error-fix'
Code clean-up and a fix for "git fetch" by an explicit object name
(as opposed to fetching refs by name).

* jk/fetch-reachability-error-fix:
  fetch: do not consider peeled tags as advertised tips
  remote.c: make singular free_ref() public
  fetch: use free_refs()
  pkt-line: prepare buffer before handling ERR packets
  upload-pack: send ERR packet for non-tip objects
  t5530: check protocol response for "not our ref"
  t5516: drop ok=sigpipe from unreachable-want tests
2019-04-25 16:41:23 +09:00
Junio C Hamano
732ce7aaca Merge branch 'jt/fetch-no-update-shallow-in-proto-v2'
Fix for protocol v2 support in "git fetch-pack" of shallow clones.

* jt/fetch-no-update-shallow-in-proto-v2:
  fetch-pack: respect --no-update-shallow in v2
  fetch-pack: call prepare_shallow_info only if v0
2019-04-25 16:41:16 +09:00
Junio C Hamano
abd7ccdd4d Merge branch 'jt/fetch-pack-wanted-refs-optim'
Performance fix around "git fetch" that grabs many refs.

* jt/fetch-pack-wanted-refs-optim:
  fetch-pack: binary search when storing wanted-refs
2019-04-25 16:41:16 +09:00
Jeff King
34066f0661 fetch: do not consider peeled tags as advertised tips
Our filter_refs() function accidentally considers the target of a peeled
tag to be advertised by the server, even though upload-pack on the
server side does not consider it so. This can result in the client
making a bogus fetch to the server, which will end with the server
complaining "not our ref". Whereas the correct behavior is for the
client to notice that the server will not allow the request and error
out immediately.

So as bugs go, this is not very serious (the outcome is the same either
way -- the fetch fails). But it's worth making the logic here correct
and consistent with other related cases (e.g., fetching an oid that the
server did not mention at all).

The crux of the issue comes from fdb69d33c4 (fetch-pack: always allow
fetching of literal SHA1s, 2017-05-15). After that, the strategy of
filter_refs() is basically:

  - for each advertised ref, try to match it with a "sought" ref
    provided by the user. Skip any malformed refs (which includes
    peeled values like "refs/tags/foo^{}"), and place any unmatched
    items onto the unmatched list.

  - if there are unmatched sought refs, then put all of the advertised
    tips into an oidset, including the unmatched ones.

  - for each sought ref, see if it's in the oidset, in which case it's
    legal for us to ask the server for it

The problem is in the second step. Our list of unmatched refs includes
the peeled refs, even though upload-pack does not allow them to be
directly fetched. So the simplest fix would be to exclude them during
that step.

However, we can observe that the unmatched list isn't used for anything
else, and is freed at the end. We can just free those malformed refs
immediately. That saves us having to check each ref a second time to see
if it's malformed.

Note that this code only kicks in when "strict" is in effect. I.e., if
we are using the v0 protocol and uploadpack.allowReachableSHA1InWant is
not in effect. With v2, all oids are allowed, and we do not bother
creating or consulting the oidset at all. To future-proof our test
against the upcoming GIT_TEST_PROTOCOL_VERSION flag, we'll manually mark
it as a v0-only test.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-04-15 14:00:52 +09:00
Jeff King
259eddde6a fetch: use free_refs()
There's no need for us to write this loop manually when a helper
function can already do it.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-04-15 14:00:51 +09:00
Jonathan Tan
b764300912 fetch-pack: binary search when storing wanted-refs
In do_fetch_pack_v2(), the "sought" array is sorted by name, and it is
not subsequently reordered (within the function). Therefore,
receive_wanted_refs() can assume that "sought" is sorted, and can thus
use a binary search when storing wanted-refs retrieved from the server.

Replace the existing linear search with a binary search. This improves
performance significantly when mirror cloning a repository with more
than 1 million refs.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-04-01 15:51:05 +09:00
Jonathan Tan
1339078f5e fetch-pack: respect --no-update-shallow in v2
In protocol v0, when sending "shallow" lines, the server distinguishes
between lines caused by the remote repo being shallow and lines caused
by client-specified depth settings. Unless "--update-shallow" is
specified, there is a difference in behavior: refs that reach the former
"shallow" lines, but not the latter, are rejected. But in v2, the server
does not, and the client treats all "shallow" lines like lines caused by
client-specified depth settings.

Full restoration of v0 functionality is not possible without protocol
change, but we can implement a heuristic: if we specify any depth
setting, treat all "shallow" lines like lines caused by client-specified
depth settings (that is, unaffected by "--no-update-shallow"), but
otherwise, treat them like lines caused by the remote repo being shallow
(that is, affected by "--no-update-shallow"). This restores most of v0
behavior, except in the case where a client fetches from a shallow
repository with depth settings.

This patch causes a test that previously failed with
GIT_TEST_PROTOCOL_VERSION=2 to pass.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-04-01 15:35:56 +09:00
Jonathan Tan
1e7d440b0a fetch-pack: call prepare_shallow_info only if v0
In fetch_pack(), be clearer that there is no shallow information before
the fetch when v2 is used - memset the struct shallow_info to 0 instead
of calling prepare_shallow_info().

This patch is in preparation for a future patch in which a v2 fetch
might call prepare_shallow_info() after shallow info has been retrieved
during the fetch, so I needed to ensure that prepare_shallow_info() is
not called before the fetch.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-04-01 15:35:54 +09:00
Jeff King
0f804b0bac fetch_pack(): drop unused parameters
We don't need the caller of fetch_pack() to pass in "dest", which is the
remote URL. Since ba227857d2 (Reduce the number of connects when
fetching, 2008-02-04), the caller is responsible for calling
git_connect() itself, and our "dest" parameter is unused.

That commit also started passing us the resulting "conn" child_process
from git_connect(). But likewise, we do not need do anything with it.
The descriptors in "fd" are enough for us, and the caller is responsible
for cleaning up "conn".

We can just drop both parameters.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-03-20 18:34:09 +09:00
Junio C Hamano
27cdbdd134 Merge branch 'jk/no-sigpipe-during-network-transport'
On platforms where "git fetch" is killed with SIGPIPE (e.g. OSX),
the upload-pack that runs on the other end that hangs up after
detecting an error could cause "git fetch" to die with a signal,
which led to a flakey test.  "git fetch" now ignores SIGPIPE during
the network portion of its operation (this is not a problem as we
check the return status from our write(2)s).

* jk/no-sigpipe-during-network-transport:
  fetch: ignore SIGPIPE during network operation
  fetch: avoid calling write_or_die()
2019-03-20 15:16:06 +09:00
Jeff King
37c80012f1 fetch: avoid calling write_or_die()
The write_or_die() function has one quirk that a caller might not
expect: when it sees EPIPE from the write() call, it translates that
into a death by SIGPIPE. This doesn't change the overall behavior (the
program exits either way), but it does potentially confuse test scripts
looking for a non-signal exit code.

Let's switch away from using write_or_die() in a few code paths, which
will give us more consistent exit codes. It also gives us the
opportunity to write more descriptive error messages, since we have
context that write_or_die() does not.

Note that this won't do much by itself, since we'd typically be killed
by SIGPIPE before write_or_die() even gets a chance to do its thing.
That will be addressed in the next patch.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-03-05 15:02:01 +09:00
Junio C Hamano
8f0f46539a Merge branch 'bc/fetch-pack-clear-alternate-shallow'
"git fetch" over protocol v2 that needs to make a second connection
to backfill tags did not clear a variable that holds shallow
repository information correctly, leading to an access of freed
piece of memory.

* bc/fetch-pack-clear-alternate-shallow:
  fetch-pack: clear alternate shallow in one more place
  fetch-pack: clear alternate shallow when complete
2019-02-06 22:05:30 -08:00
brian m. carlson
380ebab209 fetch-pack: clear alternate shallow in one more place
The previous one did not clear the variable in one codepath,
but we should aim to be complete.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
[jc: made a reroll into incremental, as the previous one already is
 in the next branch]
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-02-06 18:50:49 -08:00
Junio C Hamano
5f8b86db94 Merge branch 'jt/fetch-v2-sideband'
"git fetch" and "git upload-pack" learned to send all exchange over
the sideband channel while talking the v2 protocol.

* jt/fetch-v2-sideband:
  tests: define GIT_TEST_SIDEBAND_ALL
  {fetch,upload}-pack: sideband v2 fetch response
  sideband: reverse its dependency on pkt-line
  pkt-line: introduce struct packet_writer
  pack-protocol.txt: accept error packets in any context
  Use packet_reader instead of packet_read_line
2019-02-05 14:26:11 -08:00
Junio C Hamano
073312b4c7 Merge branch 'js/filter-options-should-use-plain-int'
Update the protocol message specification to allow only the limited
use of scaled quantities.  This is ensure potential compatibility
issues will not go out of hand.

* js/filter-options-should-use-plain-int:
  filter-options: expand scaled numbers
  tree:<depth>: skip some trees even when collecting omits
  list-objects-filter: teach tree:# how to handle >0
2019-02-05 14:26:10 -08:00
brian m. carlson
23311f3542 fetch-pack: clear alternate shallow when complete
When we write an alternate shallow file in update_shallow, we write it
into the lock file. The string stored in alternate_shallow_file is
copied from the lock file path, but it is freed the moment that the lock
file is closed, since we call strbuf_release to free that path.

This used to work, since we did not invoke git index-pack more than
once, but now that we do, we reuse the freed memory. Ensure we reset the
value to NULL to avoid using freed memory. git index-pack will read the
repository's shallow file, which will have been updated with the correct
information.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-02-04 13:33:32 -08:00
Jonathan Tan
07c3c2aa16 tests: define GIT_TEST_SIDEBAND_ALL
Define a GIT_TEST_SIDEBAND_ALL environment variable meant to be used
from tests. When set to true, this overrides uploadpack.allowsidebandall
to true, allowing the entire test suite to be run as if this
configuration is in place for all repositories.

As of this patch, all tests pass whether GIT_TEST_SIDEBAND_ALL is unset
or set to 1.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-01-17 11:25:07 -08:00
Jonathan Tan
0bbc0bc574 {fetch,upload}-pack: sideband v2 fetch response
Currently, a response to a fetch request has sideband support only while
the packfile is being sent, meaning that the server cannot send notices
until the start of the packfile.

Extend sideband support in protocol v2 fetch responses to the whole
response. upload-pack will advertise it if the
uploadpack.allowsidebandall configuration variable is set, and
fetch-pack will automatically request it if advertised.

If the sideband is to be used throughout the whole response, upload-pack
will use it to send errors instead of prefixing a PKT-LINE payload with
"ERR ".

This will be tested in a subsequent patch.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-01-17 11:25:07 -08:00
Josh Steadmon
87c2d9d310 filter-options: expand scaled numbers
When communicating with a remote server or a subprocess, use
expanded numbers rather than numbers with scaling suffix in the
object filter spec (e.g.  "limit:blob=1k" becomes
"limit:blob=1024").

Update the protocol docs to note that clients should always perform this
expansion, to allow for more compatibility between server
implementations.

Signed-off-by: Josh Steadmon <steadmon@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-01-15 15:42:31 -08:00
Junio C Hamano
17069c7fae Merge branch 'ms/packet-err-check' into jt/fetch-v2-sideband
* ms/packet-err-check:
  pack-protocol.txt: accept error packets in any context
  Use packet_reader instead of packet_read_line
2019-01-14 11:16:04 -08:00
Jonathan Tan
5056cf4a62 upload-pack: teach deepen-relative in protocol v2
Commit 685fbd3291 ("fetch-pack: perform a fetch using v2", 2018-03-15)
attempted to teach Git deepen-relative in protocol v2 (among other
things), but it didn't work:

 (1) fetch-pack.c needs to emit "deepen-relative".
 (2) upload-pack.c needs to ensure that the correct deepen_relative
     variable is passed to deepen() (there are two - the static variable
     and the one in struct upload_pack_data).
 (3) Before deepen() computes the list of reachable shallows, it first
     needs to mark all "our" refs as OUR_REF. v2 currently does not do
     this, because unlike v0, it is not needed otherwise.

Fix all this and include a test demonstrating that it works now. For
(2), the static variable deepen_relative is also eliminated, removing a
source of confusion.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Reviewed-by: Josh Steadmon <steadmon@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-01-10 14:53:49 -08:00
Jonathan Tan
bd0b42aed3 fetch-pack: do not take shallow lock unnecessarily
When fetching using protocol v2, the remote may send a "shallow-info"
section if the client is shallow. If so, Git as the client currently
takes the shallow file lock, even if the "shallow-info" section is
empty.

This is not a problem except that Git does not support taking the
shallow file lock after modifying the shallow file, because
is_repository_shallow() stores information that is never cleared. And
this take-after-modify occurs when Git does a tag-following fetch from a
shallow repository on a transport that does not support tag following
(since in this case, 2 fetches are performed).

To solve this issue, take the shallow file lock (and perform all other
shallow processing) only if the "shallow-info" section is non-empty;
otherwise, behave as if it were empty.

A full solution (probably, ensuring that any action of committing
shallow file locks also includes clearing the information stored by
is_repository_shallow()) would solve the issue without need for this
patch, but this patch is independently useful (as an optimization to
prevent writing a file in an unnecessary case), hence why I wrote it. I
have included a NEEDSWORK outlining the full solution.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-01-10 14:53:35 -08:00
Junio C Hamano
3b2f8a02fa Merge branch 'jk/loose-object-cache'
Code clean-up with optimization for the codepath that checks
(non-)existence of loose objects.

* jk/loose-object-cache:
  odb_load_loose_cache: fix strbuf leak
  fetch-pack: drop custom loose object cache
  sha1-file: use loose object cache for quick existence check
  object-store: provide helpers for loose_objects_cache
  sha1-file: use an object_directory for the main object dir
  handle alternates paths the same as the main object dir
  sha1_file_name(): overwrite buffer instead of appending
  rename "alternate_object_database" to "object_directory"
  submodule--helper: prefer strip_suffix() to ends_with()
  fsck: do not reuse child_process structs
2019-01-04 13:33:32 -08:00
Masaya Suzuki
2d103c31c2 pack-protocol.txt: accept error packets in any context
In the Git pack protocol definition, an error packet may appear only in
a certain context. However, servers can face a runtime error (e.g. I/O
error) at an arbitrary timing. This patch changes the protocol to allow
an error packet to be sent instead of any packet.

Without this protocol spec change, when a server cannot process a
request, there's no way to tell that to a client. Since the server
cannot produce a valid response, it would be forced to cut a connection
without telling why. With this protocol spec change, the server can be
more gentle in this situation. An old client may see these error packets
as an unexpected packet, but this is not worse than having an unexpected
EOF.

Following this protocol spec change, the error packet handling code is
moved to pkt-line.c. Implementation wise, this implementation uses
pkt-line to communicate with a subprocess. Since this is not a part of
Git protocol, it's possible that a packet that is not supposed to be an
error packet is mistakenly parsed as an error packet. This error packet
handling is enabled only for the Git pack protocol parsing code
considering this.

Signed-off-by: Masaya Suzuki <masayasuzuki@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-01-02 13:05:30 -08:00
Masaya Suzuki
01f9ec64c8 Use packet_reader instead of packet_read_line
By using and sharing a packet_reader while handling a Git pack protocol
request, the same reader option is used throughout the code. This makes
it easy to set a reader option to the request parsing code.

Signed-off-by: Masaya Suzuki <masayasuzuki@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-01-02 13:05:27 -08:00
Jeff King
97b2fa08b6 fetch-pack: drop custom loose object cache
Commit 024aa4696c (fetch-pack.c: use oidset to check existence of loose
object, 2018-03-14) added a cache to avoid calling stat() for a bunch of
loose objects we don't have.

Now that OBJECT_INFO_QUICK handles this caching itself, we can drop the
custom solution.

Note that this might perform slightly differently, as the original code
stopped calling readdir() when we saw more loose objects than there were
refs. So:

  1. The old code might have spent work on readdir() to fill the cache,
     but then decided there were too many loose objects, wasting that
     effort.

  2. The new code might spend a lot of time on readdir() if you have a
     lot of loose objects, even though there are very few objects to
     ask about.

In practice it probably won't matter either way; see the previous commit
for some discussion of the tradeoff.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-11-13 14:22:03 +09:00
Jonathan Tan
5400b2a2d9 fetch-pack: be more precise in parsing v2 response
Each section in a protocol v2 response is followed by either a DELIM
packet (indicating more sections to follow) or a FLUSH packet
(indicating none to follow). But when parsing the "acknowledgments"
section, do_fetch_pack_v2() is liberal in accepting both, but determines
whether to continue reading or not based solely on the contents of the
"acknowledgments" section, not on whether DELIM or FLUSH was read.

There is no issue with a protocol-compliant server, but can result in
confusing error messages when communicating with a server that
serves unexpected additional sections. Consider a server that sends
"new-section" after "acknowledgments":

 - client writes request
 - client reads the "acknowledgments" section which contains no "ready",
   then DELIM
 - since there was no "ready", client needs to continue negotiation, and
   writes request
 - client reads "new-section", and reports to the end user "expected
   'acknowledgments', received 'new-section'"

For the person debugging the involved Git implementation(s), the error
message is confusing in that "new-section" was not received in response
to the latest request, but to the first one.

One solution is to always continue reading after DELIM, but in this
case, we can do better. We know from the protocol that "ready" means at
least the packfile section is coming (hence, DELIM) and that no "ready"
means that no sections are to follow (hence, FLUSH). So teach
process_acks() to enforce this.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-11-01 14:30:22 +09:00
Junio C Hamano
465e73fff3 Merge branch 'tb/filter-alternate-refs'
When pushing into a repository that borrows its objects from an
alternate object store, "git receive-pack" that responds to the
push request on the other side lists the tips of refs in the
alternate to reduce the amount of objects transferred.  This
sometimes is detrimental when the number of refs in the alternate
is absurdly large, in which case the bandwidth saved in potentially
fewer objects transferred is wasted in excessively large ref
advertisement.  The alternate refs that are advertised are now
configurable with a pair of configuration variables.

* tb/filter-alternate-refs:
  transport.c: introduce core.alternateRefsPrefixes
  transport.c: introduce core.alternateRefsCommand
  transport.c: extract 'fill_alternate_refs_command'
  transport: drop refnames from for_each_alternate_ref
2018-10-19 13:34:08 +09:00
Junio C Hamano
0527fbab68 Merge branch 'jt/avoid-ls-refs'
Over some transports, fetching objects with an exact commit object
name can be done without first seeing the ref advertisements.  The
code has been optimized to exploit this.

* jt/avoid-ls-refs:
  fetch: do not list refs if fetching only hashes
  transport: list refs before fetch if necessary
  transport: do not list refs if possible
  transport: allow skipping of ref listing
2018-10-19 13:34:07 +09:00
Junio C Hamano
fa54cccf1f Merge branch 'jt/non-blob-lazy-fetch'
A partial clone that is configured to lazily fetch missing objects
will on-demand issue a "git fetch" request to the originating
repository to fill not-yet-obtained objects.  The request has been
optimized for requesting a tree object (and not the leaf blob
objects contained in it) by telling the originating repository that
no blobs are needed.

* jt/non-blob-lazy-fetch:
  fetch-pack: exclude blobs when lazy-fetching trees
  fetch-pack: avoid object flags if no_dependents
2018-10-19 13:34:07 +09:00
Jeff King
bdf4276c91 transport: drop refnames from for_each_alternate_ref
None of the current callers use the refname parameter we pass to their
callbacks. In theory somebody _could_ do so, but it's actually quite
weird if you think about it: it's a ref in somebody else's repository.
So the name has no meaning locally, and in fact there may be duplicates
if there are multiple alternates.

The users of this interface really only care about seeing some ref tips,
since that promises that the alternate has the full commit graph
reachable from there. So let's keep the information we pass back to the
bare minimum.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Acked-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-10-09 14:30:02 +09:00
Jonathan Tan
0177565148 transport: do not list refs if possible
When all refs to be fetched are exact OIDs, it is possible to perform a
fetch without requiring the remote to list refs if protocol v2 is used.
Teach Git to do this.

This currently has an effect only for lazy fetches done from partial
clones. The change necessary to likewise optimize "git fetch <remote>
<sha-1>" will be done in a subsequent patch.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-10-07 09:53:15 +09:00
René Scharfe
22a1646511 fetch-pack: load tip_oids eagerly iff needed
tip_oids_contain() lazily loads refs into an oidset at its first call.
It abuses the internal (sub)member .map.tablesize of that oidset to
check if it has done that already.

Determine if the oidset needs to be populated upfront and then do that
instead.  This duplicates a loop, but simplifies the existing one by
separating concerns between the two.

Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-10-04 11:12:13 -07:00
René Scharfe
bf73282c0b fetch-pack: factor out is_unmatched_ref()
Move the code to determine if a request is unmatched to its own little
helper.  This allows us to reuse it in a subsequent patch.

Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-10-04 11:12:13 -07:00
Jonathan Tan
4c7f9567ea fetch-pack: exclude blobs when lazy-fetching trees
A partial clone with missing trees can be obtained using "git clone
--filter=tree:none <repo>". In such a repository, when a tree needs to
be lazily fetched, any tree or blob it directly or indirectly references
is fetched as well, regardless of whether the original command required
those objects, or if the local repository already had some of them.

This is because the fetch protocol, which the lazy fetch uses, does not
allow clients to request that only the wanted objects be sent, which
would be the ideal solution. This patch implements a partial solution:
specify the "blob:none" filter, somewhat reducing the fetch payload.

This change has no effect when lazily fetching blobs (due to how filters
work). And if lazily fetching a commit (such repositories are difficult
to construct and is not a use case we support very well, but it is
possible), referenced commits and trees are still fetched - only the
blobs are not fetched.

The necessary code change is done in fetch_pack() instead of somewhere
closer to where the "filter" instruction is written to the wire so that
only one part of the code needs to be changed in order for users of all
protocol versions to benefit from this optimization.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-10-04 06:03:49 -07:00
Jonathan Tan
12f19a9825 fetch-pack: avoid object flags if no_dependents
When fetch_pack() is invoked as part of another Git command (due to a
lazy fetch from a partial clone, for example), it uses object flags that
may already be used by the outer Git command.

The commit that introduced the lazy fetch feature (88e2f9ed8e
("introduce fetch-object: fetch one promisor object", 2017-12-05)) tried
to avoid this overlap, but it did not avoid it totally. It was
successful in avoiding writing COMPLETE, but did not avoid reading
COMPLETE, and did not avoid writing and reading ALTERNATE.

Ensure that no flags are written or read by fetch_pack() in the case
where it is used to perform a lazy fetch. To do this, it is sufficient
to avoid checking completeness of wanted refs (unnecessary in the case
of lazy fetches), and to avoid negotiation-related work (in the current
implementation, already, no negotiation is performed). After that was
done, the lack of overlap was verified by checking all direct and
indirect usages of COMPLETE and ALTERNATE - that they are read or
written only if no_dependents is false.

There are other possible solutions to this issue:

 (1) Split fetch-pack.{c,h} into a flag-using part and a non-flag-using
     part, and whenever no_dependents is set, only use the
     non-flag-using part.
 (2) Make fetch_pack() be able to be used with arbitrary repository
     objects. fetch_pack() should then create its own repository object
     based on the given repository object, with its own object
     hashtable, so that the flags do not conflict.

(1) is possible but invasive - some functions would need to be split;
and such invasiveness would potentially be unnecessary if we ever were
to need (2) anyway. (2) would be useful if we were to support, say,
submodules that were partial clones themselves, but I don't know when or
if the Git project plans to support those.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-10-04 06:00:53 -07:00
Jeff King
9001dc2a74 convert "oidcmp() != 0" to "!oideq()"
This is the flip side of the previous two patches: checking
for a non-zero oidcmp() can be more strictly expressed as
inequality. Like those patches, we write "!= 0" in the
coccinelle transformation, which covers by isomorphism the
more common:

  if (oidcmp(E1, E2))

As with the previous two patches, this patch can be achieved
almost entirely by running "make coccicheck"; the only
differences are manual line-wrap fixes to match the original
code.

There is one thing to note for anybody replicating this,
though: coccinelle 1.0.4 seems to miss the case in
builtin/tag.c, even though it's basically the same as all
the others. Running with 1.0.7 does catch this, so
presumably it's just a coccinelle bug that was fixed in the
interim.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-29 11:32:49 -07:00
Junio C Hamano
f8ca71870a Merge branch 'ab/fsck-transfer-updates'
The test performed at the receiving end of "git push" to prevent
bad objects from entering repository can be customized via
receive.fsck.* configuration variables; we now have gained a
counterpart to do the same on the "git fetch" side, with
fetch.fsck.* configuration variables.

* ab/fsck-transfer-updates:
  fsck: test and document unknown fsck.<msg-id> values
  fsck: add stress tests for fsck.skipList
  fsck: test & document {fetch,receive}.fsck.* config fallback
  fetch: implement fetch.fsck.*
  transfer.fsckObjects tests: untangle confusing setup
  config doc: elaborate on fetch.fsckObjects security
  config doc: elaborate on what transfer.fsckObjects does
  config doc: unify the description of fsck.* and receive.fsck.*
  config doc: don't describe *.fetchObjects twice
  receive.fsck.<msg-id> tests: remove dead code
2018-08-17 13:09:54 -07:00
Junio C Hamano
b160b6e69d Merge branch 'jt/connectivity-check-after-unshallow'
"git fetch" sometimes failed to update the remote-tracking refs,
which has been corrected.

* jt/connectivity-check-after-unshallow:
  fetch-pack: unify ref in and out param
2018-08-15 15:08:28 -07:00
Junio C Hamano
ea30f539ef Merge branch 'bw/fetch-pack-i18n'
i18n updates.

* bw/fetch-pack-i18n:
  fetch-pack: mark die strings for translation
2018-08-15 15:08:20 -07:00
Junio C Hamano
7c85ee6c58 Merge branch 'jt/fetch-negotiator-skipping'
Add a server-side knob to skip commits in exponential/fibbonacci
stride in an attempt to cover wider swath of history with a smaller
number of iterations, potentially accepting a larger packfile
transfer, instead of going back one commit a time during common
ancestor discovery during the "git fetch" transaction.

* jt/fetch-negotiator-skipping:
  negotiator/skipping: skip commits during fetch
2018-08-02 15:30:46 -07:00
Junio C Hamano
30bf8d9f4f Merge branch 'jt/fetch-nego-tip'
"git fetch" learned a new option "--negotiation-tip" to limit the
set of commits it tells the other end as "have", to reduce wasted
bandwidth and cycles, which would be helpful when the receiving
repository has a lot of refs that have little to do with the
history at the remote it is fetching from.

* jt/fetch-nego-tip:
  fetch-pack: support negotiation tip whitelist
2018-08-02 15:30:43 -07:00
Junio C Hamano
3a2a1dc170 Merge branch 'sb/object-store-lookup'
lookup_commit_reference() and friends have been updated to find
in-core object for a specific in-core repository instance.

* sb/object-store-lookup: (32 commits)
  commit.c: allow lookup_commit_reference to handle arbitrary repositories
  commit.c: allow lookup_commit_reference_gently to handle arbitrary repositories
  tag.c: allow deref_tag to handle arbitrary repositories
  object.c: allow parse_object to handle arbitrary repositories
  object.c: allow parse_object_buffer to handle arbitrary repositories
  commit.c: allow get_cached_commit_buffer to handle arbitrary repositories
  commit.c: allow set_commit_buffer to handle arbitrary repositories
  commit.c: migrate the commit buffer to the parsed object store
  commit-slabs: remove realloc counter outside of slab struct
  commit.c: allow parse_commit_buffer to handle arbitrary repositories
  tag: allow parse_tag_buffer to handle arbitrary repositories
  tag: allow lookup_tag to handle arbitrary repositories
  commit: allow lookup_commit to handle arbitrary repositories
  tree: allow lookup_tree to handle arbitrary repositories
  blob: allow lookup_blob to handle arbitrary repositories
  object: allow lookup_object to handle arbitrary repositories
  object: allow object_as_type to handle arbitrary repositories
  tag: add repository argument to deref_tag
  tag: add repository argument to parse_tag_buffer
  tag: add repository argument to lookup_tag
  ...
2018-08-02 15:30:42 -07:00
Junio C Hamano
af8ac73801 Merge branch 'jt/fetch-pack-negotiator'
Code restructuring and a small fix to transport protocol v2 during
fetching.

* jt/fetch-pack-negotiator:
  fetch-pack: introduce negotiator API
  fetch-pack: move common check and marking together
  fetch-pack: make negotiation-related vars local
  fetch-pack: use ref adv. to prune "have" sent
  fetch-pack: directly end negotiation if ACK ready
  fetch-pack: clear marks before re-marking
  fetch-pack: split up everything_local()
2018-08-02 15:30:41 -07:00
Jonathan Tan
e2842b39f4 fetch-pack: unify ref in and out param
When a user fetches:
 - at least one up-to-date ref and at least one non-up-to-date ref,
 - using HTTP with protocol v0 (or something else that uses the fetch
   command of a remote helper)
some refs might not be updated after the fetch.

This bug was introduced in commit 989b8c4452 ("fetch-pack: put shallow
info in output parameter", 2018-06-28) which allowed transports to
report the refs that they have fetched in a new out-parameter
"fetched_refs". If they do so, transport_fetch_refs() makes this
information available to its caller.

Users of "fetched_refs" rely on the following 3 properties:
 (1) it is the complete list of refs that was passed to
     transport_fetch_refs(),
 (2) it has shallow information (REF_STATUS_REJECT_SHALLOW set if
     relevant), and
 (3) it has updated OIDs if ref-in-want was used (introduced after
     989b8c4452).

In an effort to satisfy (1), whenever transport_fetch_refs()
filters the refs sent to the transport, it re-adds the filtered refs to
whatever the transport supplies before returning it to the user.
However, the implementation in 989b8c4452 unconditionally re-adds the
filtered refs without checking if the transport refrained from reporting
anything in "fetched_refs" (which it is allowed to do), resulting in an
incomplete list, no longer satisfying (1).

An earlier effort to resolve this [1] solved the issue by readding the
filtered refs only if the transport did not refrain from reporting in
"fetched_refs", but after further discussion, it seems that the better
solution is to revert the API change that introduced "fetched_refs".
This API change was first suggested as part of a ref-in-want
implementation that allowed for ref patterns and, thus, there could be
drastic differences between the input refs and the refs actually fetched
[2]; we eventually decided to only allow exact ref names, but this API
change remained even though its necessity was decreased.

Therefore, revert this API change by reverting commit 989b8c4452, and
make receive_wanted_refs() update the OIDs in the sought array (like how
update_shallow() updates shallow information in the sought array)
instead. A test is also included to show that the user-visible bug
discussed at the beginning of this commit message no longer exists.

[1] https://public-inbox.org/git/20180801171806.GA122458@google.com/
[2] https://public-inbox.org/git/86a128c5fb710a41791e7183207c4d64889f9307.1485381677.git.jonathantanmy@google.com/

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-01 15:00:52 -07:00
Ævar Arnfjörð Bjarmason
1362df0d41 fetch: implement fetch.fsck.*
Implement support for fetch.fsck.* corresponding with the existing
receive.fsck.*. This allows for pedantically cloning repositories with
specific issues without turning off fetch.fsckObjects.

One such repository is https://github.com/robbyrussell/oh-my-zsh.git
which before this change will emit this error when cloned with
fetch.fsckObjects:

    error: object 2b7227859263b6aabcc28355b0b994995b7148b6: zeroPaddedFilemode: contains zero-padded file modes
    fatal: Error in object
    fatal: index-pack failed

Now with fetch.fsck.zeroPaddedFilemode=warn we'll warn about that
issue, but the clone will succeed:

    warning: object 2b7227859263b6aabcc28355b0b994995b7148b6: zeroPaddedFilemode: contains zero-padded file modes
    warning: object a18c4d13c2a5fa2d4ecd5346c50e119b999b807d: zeroPaddedFilemode: contains zero-padded file modes
    warning: object 84df066176c8da3fd59b13731a86d90f4f1e5c9d: zeroPaddedFilemode: contains zero-padded file modes

The motivation for this is to be able to turn on fetch.fsckObjects
globally across a fleet of computers but still be able to manually
clone various legacy repositories by either white-listing specific
issues, or better yet whitelist specific objects.

The use of --git-dir=* instead of -C in the tests could be considered
somewhat archaic, but the tests I'm adding here are duplicating the
corresponding receive.* tests with as few changes as possible.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-27 11:36:05 -07:00
Junio C Hamano
88df0fa659 Merge branch 'jt/connectivity-check-after-unshallow'
"git fetch" failed to correctly validate the set of objects it
received when making a shallow history deeper, which has been
corrected.

* jt/connectivity-check-after-unshallow:
  fetch-pack: write shallow, then check connectivity
  fetch-pack: implement ref-in-want
  fetch-pack: put shallow info in output parameter
  fetch: refactor to make function args narrower
  fetch: refactor fetch_refs into two functions
  fetch: refactor the population of peer ref OIDs
  upload-pack: test negotiation with changing repository
  upload-pack: implement ref-in-want
  test-pkt-line: add unpack-sideband subcommand
2018-07-24 14:50:44 -07:00
Brandon Williams
bbb19a8b06 fetch-pack: mark die strings for translation
Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-23 15:59:40 -07:00
Junio C Hamano
00624d608c Merge branch 'sb/object-store-grafts'
The conversion to pass "the_repository" and then "a_repository"
throughout the object access API continues.

* sb/object-store-grafts:
  commit: allow lookup_commit_graft to handle arbitrary repositories
  commit: allow prepare_commit_graft to handle arbitrary repositories
  shallow: migrate shallow information into the object parser
  path.c: migrate global git_path_* to take a repository argument
  cache: convert get_graft_file to handle arbitrary repositories
  commit: convert read_graft_file to handle arbitrary repositories
  commit: convert register_commit_graft to handle arbitrary repositories
  commit: convert commit_graft_pos() to handle arbitrary repositories
  shallow: add repository argument to is_repository_shallow
  shallow: add repository argument to check_shallow_file_for_update
  shallow: add repository argument to register_shallow
  shallow: add repository argument to set_alternate_shallow_file
  commit: add repository argument to lookup_commit_graft
  commit: add repository argument to prepare_commit_graft
  commit: add repository argument to read_graft_file
  commit: add repository argument to register_commit_graft
  commit: add repository argument to commit_graft_pos
  object: move grafts to object parser
  object-store: move object access functions to object-store.h
2018-07-18 12:20:28 -07:00
Jonathan Tan
42cc7485a2 negotiator/skipping: skip commits during fetch
Introduce a new negotiation algorithm used during fetch that skips
commits in an effort to find common ancestors faster. The skips grow
similarly to the Fibonacci sequence as the commit walk proceeds further
away from the tips. The skips may cause unnecessary commits to be
included in the packfile, but the negotiation step typically ends more
quickly.

Usage of this algorithm is guarded behind the configuration flag
fetch.negotiationAlgorithm.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-16 14:51:12 -07:00
Jonathan Tan
3390e42adb fetch-pack: support negotiation tip whitelist
During negotiation, fetch-pack eventually reports as "have" lines all
commits reachable from all refs. Allow the user to restrict the commits
sent in this way by providing a whitelist of tips; only the tips
themselves and their ancestors will be sent.

Both globs and single objects are supported.

This feature is only supported for protocols that support connect or
stateless-connect (such as HTTP with protocol v2).

This will speed up negotiation when the repository has multiple
relatively independent branches (for example, when a repository
interacts with multiple repositories, such as with linux-next [1] and
torvalds/linux [2]), and the user knows which local branch is likely to
have commits in common with the upstream branch they are fetching.

[1] https://kernel.googlesource.com/pub/scm/linux/kernel/git/next/linux-next/
[2] https://kernel.googlesource.com/pub/scm/linux/kernel/git/torvalds/linux/

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-03 15:00:41 -07:00
Jonathan Tan
cf1e7c0770 fetch-pack: write shallow, then check connectivity
When fetching, connectivity is checked after the shallow file is
updated. There are 2 issues with this: (1) the connectivity check is
only performed up to ancestors of existing refs (which is not thorough
enough if we were deepening an existing ref in the first place), and (2)
there is no rollback of the shallow file if the connectivity check
fails.

To solve (1), update the connectivity check to check the ancestry chain
completely in the case of a deepening fetch by refraining from passing
"--not --all" when invoking rev-list in connected.c.

To solve (2), have fetch_pack() perform its own connectivity check
before updating the shallow file. To support existing use cases in which
"git fetch-pack" is used to download objects without much regard as to
the connectivity of the resulting objects with respect to the existing
repository, the connectivity check is only done if necessary (that is,
the fetch is not a clone, and the fetch involves shallow/deepen
functionality). "git fetch" still performs its own connectivity check,
preserving correctness but sometimes performing redundant work. This
redundancy is mitigated by the fact that fetch_pack() reports if it has
performed a connectivity check itself, and if the transport supports
connect or stateless-connect, it will bubble up that report so that "git
fetch" knows not to perform the connectivity check in such a case.

This was noticed when a user tried to deepen an existing repository by
fetching with --no-shallow from a server that did not send all necessary
objects - the connectivity check as run by "git fetch" succeeded, but a
subsequent "git fsck" failed.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-03 14:57:44 -07:00
Stefan Beller
a74093da5e tag: add repository argument to deref_tag
Add a repository argument to allow the callers of deref_tag
to be more specific about which repository to act on. This is a small
mechanical change; it doesn't change the implementation to handle
repositories other than the_repository yet.

As with the previous commits, use a macro to catch callers passing a
repository other than the_repository at compile time.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-06-29 10:43:39 -07:00
Stefan Beller
c1f5eb4962 commit: add repository argument to lookup_commit
Add a repository argument to allow callers of lookup_commit to be more
specific about which repository to handle. This is a small mechanical
change; it doesn't change the implementation to handle repositories
other than the_repository yet.

As with the previous commits, use a macro to catch callers passing a
repository other than the_repository at compile time.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-06-29 10:43:39 -07:00
Stefan Beller
5abddd1eb7 object: add repository argument to lookup_object
Add a repository argument to allow callers of lookup_object to be more
specific about which repository to handle. This is a small mechanical
change; it doesn't change the implementation to handle repositories
other than the_repository yet.

As with the previous commits, use a macro to catch callers passing a
repository other than the_repository at compile time.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-06-29 10:43:38 -07:00
Stefan Beller
109cd76dd3 object: add repository argument to parse_object
Add a repository argument to allow the callers of parse_object
to be more specific about which repository to act on. This is a small
mechanical change; it doesn't change the implementation to handle
repositories other than the_repository yet.

As with the previous commits, use a macro to catch callers passing a
repository other than the_repository at compile time.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-06-29 10:43:38 -07:00
Junio C Hamano
b16b60f71b Merge branch 'sb/object-store-grafts' into sb/object-store-lookup
* sb/object-store-grafts:
  commit: allow lookup_commit_graft to handle arbitrary repositories
  commit: allow prepare_commit_graft to handle arbitrary repositories
  shallow: migrate shallow information into the object parser
  path.c: migrate global git_path_* to take a repository argument
  cache: convert get_graft_file to handle arbitrary repositories
  commit: convert read_graft_file to handle arbitrary repositories
  commit: convert register_commit_graft to handle arbitrary repositories
  commit: convert commit_graft_pos() to handle arbitrary repositories
  shallow: add repository argument to is_repository_shallow
  shallow: add repository argument to check_shallow_file_for_update
  shallow: add repository argument to register_shallow
  shallow: add repository argument to set_alternate_shallow_file
  commit: add repository argument to lookup_commit_graft
  commit: add repository argument to prepare_commit_graft
  commit: add repository argument to read_graft_file
  commit: add repository argument to register_commit_graft
  commit: add repository argument to commit_graft_pos
  object: move grafts to object parser
  object-store: move object access functions to object-store.h
2018-06-29 10:43:28 -07:00
Junio C Hamano
0079732e96 Merge branch 'jk/fetch-all-peeled-fix'
"git fetch-pack --all" used to unnecessarily fail upon seeing an
annotated tag that points at an object other than a commit.

* jk/fetch-all-peeled-fix:
  fetch-pack: test explicitly that --all can fetch tag references pointing to non-commits
  fetch-pack: don't try to fetch peel values with --all
2018-06-28 12:53:32 -07:00
Brandon Williams
733020517a fetch-pack: implement ref-in-want
Implement ref-in-want on the client side so that when a server supports
the "ref-in-want" feature, a client will send "want-ref" lines for each
reference the client wants to fetch.  This feature allows clients to
tolerate inconsistencies that exist when a remote repository's refs
change during the course of negotiation.

This allows a client to request to request a particular ref without
specifying the OID of the ref.  This means that instead of hitting an
error when a ref no longer points at the OID it did at the beginning of
negotiation, negotiation can continue and the value of that ref will be
sent at the termination of negotiation, just before a packfile is sent.

More information on the ref-in-want feature can be found in
Documentation/technical/protocol-v2.txt.

Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-06-28 09:33:30 -07:00
Brandon Williams
989b8c4452 fetch-pack: put shallow info in output parameter
Expand the transport fetch method signature, by adding an output
parameter, to allow transports to return information about the refs they
have fetched.  Then communicate shallow status information through this
mechanism instead of by modifying the input list of refs.

This does require clients to sometimes generate the ref map twice: once
from the list of refs provided by the remote (as is currently done) and
potentially once from the new list of refs that the fetch mechanism
provides.

Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-06-28 09:33:29 -07:00
Jonathan Tan
ec06283844 fetch-pack: introduce negotiator API
Introduce the new files fetch-negotiator.{h,c}, which contains an API
behind which the details of negotiation are abstracted. Currently, only
one algorithm is available: the existing one.

This patch is written to be easily reviewed: static functions are
moved verbatim from fetch-pack.c to negotiator/default.c, and it can be
seen that the lines replaced by negotiator->X() calls are present in the
X() functions respectively.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-06-15 08:44:24 -07:00
Jonathan Tan
d093bc7582 fetch-pack: move common check and marking together
When receiving 'ACK <object-id> continue' for a common commit, check if
the commit was already known to be common and mark it as such if not up
front. This should make future refactoring of how the information about
common commits is stored more straightforward.

No visible change intended.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-06-15 08:44:24 -07:00
Jonathan Tan
d30fe89c37 fetch-pack: make negotiation-related vars local
Reduce the number of global variables by making the priority queue and
the count of non-common commits in it local, passing them as a struct to
various functions where necessary.

This also helps in the case that fetch_pack() is invoked twice in the
same process (when tag following is required when using a transport that
does not support tag following), in that different priority queues will
now be used in each invocation, instead of reusing the possibly
non-empty one.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-06-15 08:44:23 -07:00
Jonathan Tan
af1c90d13e fetch-pack: use ref adv. to prune "have" sent
In negotiation using protocol v2, fetch-pack sometimes does not make
full use of the information obtained in the ref advertisement:
specifically, that if the server advertises a commit that the client
also has, the client never needs to inform the server that it has the
commit's parents, since it can just tell the server that it has the
advertised commit and it knows that the server can and will infer the
rest.

This is because, in do_fetch_pack_v2(), rev_list_insert_ref_oid() is
invoked before mark_complete_and_common_ref(). This means that if we
have a commit that is both our ref and their ref, it would be enqueued
by rev_list_insert_ref_oid() as SEEN, and since it is thus already SEEN,
mark_complete_and_common_ref() would not enqueue it.

If mark_complete_and_common_ref() were invoked first, as it is in
do_fetch_pack() for protocol v0, then mark_complete_and_common_ref()
would enqueue it with COMMON_REF | SEEN. The addition of COMMON_REF
ensures that its parents are not sent as "have" lines.

Change the order in do_fetch_pack_v2() to be consistent with
do_fetch_pack(), and to avoid sending unnecessary "have" lines.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-06-15 08:44:23 -07:00
Jonathan Tan
21bcf6e429 fetch-pack: directly end negotiation if ACK ready
When "ACK %s ready" is received, find_common() clears rev_list in an
attempt to stop further "have" lines from being sent [1]. It is much
more readable to explicitly break from the loop instead.

So explicitly break from the loop, and make the clearing of the rev_list
happen unconditionally.

[1] The rationale is further described in the originating commit
f2cba9299b ("fetch-pack: Finish negotation if remote replies "ACK %s
ready"", 2011-03-14).

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-06-15 08:44:23 -07:00
Jonathan Tan
af008558cc fetch-pack: clear marks before re-marking
If tag following is required when using a transport that does not
support tag following, fetch_pack() will be invoked twice in the same
process, necessitating a clearing of the object flags used by
fetch_pack() sometime during the second invocation. This is currently
done in find_common(), which means that the invocation of
mark_complete_and_common_ref() in do_fetch_pack() is useless.

(This cannot be reproduced with Git alone, because all transports that
come with Git support tag following.)

Therefore, move this clearing from find_common() to its
parent function do_fetch_pack(), right before it calls
mark_complete_and_common_ref().

This has been occurring since the commit that introduced the clearing of
marks, 420e9af498 ("Fix tag following", 2008-03-19).

The corresponding code for protocol v2 in do_fetch_pack_v2() does not
have this problem, as the clearing of flags is done before any marking
(whether by rev_list_insert_ref_oid() or
mark_complete_and_common_ref()).

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-06-14 11:31:24 -07:00
Jonathan Tan
34c2903456 fetch-pack: split up everything_local()
The function everything_local(), despite its name, also (1) marks
commits as COMPLETE and COMMON_REF and (2) invokes filter_refs() as
important side effects. Extract (1) into its own function
(mark_complete_and_common_ref()) and remove
(2).

The restoring of save_commit_buffer, which was introduced in a1c6d7c1a7
("fetch-pack: restore save_commit_buffer after use", 2017-12-08), is a
concern of the parse_object() call in mark_complete_and_common_ref(), so
it has been moved from the end of everything_local() to the end of
mark_complete_and_common_ref().

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-06-14 11:31:24 -07:00
Jeff King
e9502c0a7f fetch-pack: don't try to fetch peel values with --all
When "fetch-pack --all" sees a tag-to-blob on the remote, it
tries to fetch both the tag itself ("refs/tags/foo") and the
peeled value that the remote advertises ("refs/tags/foo^{}").
Asking for the object pointed to by the latter can cause
upload-pack to complain with "not our ref", since it does
not mark the peeled objects with the OUR_REF (unless they
were at the tip of some other ref).

Arguably upload-pack _should_ be marking those peeled
objects. But it never has in the past, since clients would
generally just ask for the tag and expect to get the peeled
value along with it. And that's how "git fetch" works, as
well as older versions of "fetch-pack --all".

The problem was introduced by 5f0fc64513 (fetch-pack:
eliminate spurious error messages, 2012-09-09). Before then,
the matching logic was something like:

  if (refname is ill-formed)
     do nothing
  else if (doing --all)
     always consider it matched
  else
     look through list of sought refs for a match

That commit wanted to flip the order of the second two arms
of that conditional. But we ended up with:

  if (refname is ill-formed)
    do nothing
  else
    look through list of sought refs for a match

  if (--all and no match so far)
    always consider it matched

That means tha an ill-formed ref will trigger the --all
conditional block, even though we should just be ignoring
it. We can fix that by having a single "else" with all of
the well-formed logic, that checks the sought refs and
"--all" in the correct order.

Reported-by: Kirill Smelkov <kirr@nexedi.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-06-11 12:50:58 -07:00
Junio C Hamano
54db5c0e1e Merge branch 'jt/partial-clone-proto-v2'
Transfer protocol v2 learned to support the partial clone.

* jt/partial-clone-proto-v2:
  {fetch,upload}-pack: support filter in protocol v2
  upload-pack: read config when serving protocol v2
  upload-pack: fix error message typo
2018-05-30 14:04:10 +09:00
Junio C Hamano
41267e9697 Merge branch 'bw/server-options'
The transport protocol v2 is getting updated further.

* bw/server-options:
  fetch: send server options when using protocol v2
  ls-remote: send server options when using protocol v2
  serve: introduce the server-option capability
2018-05-23 14:38:15 +09:00
Stefan Beller
102de880d2 path.c: migrate global git_path_* to take a repository argument
Migrate all git_path_* functions that are defined in path.c to take a
repository argument. Unlike other patches in this series, do not use the
 #define trick, as we rewrite the whole function, which is rather small.

This doesn't migrate all the functions, as other builtins have their own
local path functions defined using GIT_PATH_FUNC. So keep that macro
around to serve the other locations.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-05-18 08:13:10 +09:00
Stefan Beller
c88134870e shallow: add repository argument to is_repository_shallow
Add a repository argument to allow callers of is_repository_shallow
to be more specific about which repository to handle. This is a small
mechanical change; it doesn't change the implementation to handle
repositories other than the_repository yet.

As with the previous commits, use a macro to catch callers passing a
repository other than the_repository at compile time.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-05-18 08:13:10 +09:00
Stefan Beller
19143f139d shallow: add repository argument to register_shallow
Add a repository argument to allow callers of register_shallow
to be more specific about which repository to handle. This is a small
mechanical change; it doesn't change the implementation to handle
repositories other than the_repository yet.

As with the previous commits, use a macro to catch callers passing a
repository other than the_repository at compile time.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-05-18 08:13:10 +09:00
Stefan Beller
cbd53a2193 object-store: move object access functions to object-store.h
This should make these functions easier to find and cache.h less
overwhelming to read.

In particular, this moves:
- read_object_file
- oid_object_info
- write_object_file

As a result, most of the codebase needs to #include object-store.h.
In this patch the #include is only added to files that would fail to
compile otherwise.  It would be better to #include wherever
identifiers from the header are used.  That can happen later
when we have better tooling for it.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-05-16 11:42:03 +09:00
Junio C Hamano
9bfa0f9be3 Merge branch 'bw/protocol-v2'
The beginning of the next-gen transfer protocol.

* bw/protocol-v2: (35 commits)
  remote-curl: don't request v2 when pushing
  remote-curl: implement stateless-connect command
  http: eliminate "# service" line when using protocol v2
  http: don't always add Git-Protocol header
  http: allow providing extra headers for http requests
  remote-curl: store the protocol version the server responded with
  remote-curl: create copy of the service name
  pkt-line: add packet_buf_write_len function
  transport-helper: introduce stateless-connect
  transport-helper: refactor process_connect_service
  transport-helper: remove name parameter
  connect: don't request v2 when pushing
  connect: refactor git_connect to only get the protocol version once
  fetch-pack: support shallow requests
  fetch-pack: perform a fetch using v2
  upload-pack: introduce fetch server command
  push: pass ref prefixes when pushing
  fetch: pass ref prefixes when fetching
  ls-remote: pass ref prefixes when requesting a remote's refs
  transport: convert transport_get_remote_refs to take a list of ref prefixes
  ...
2018-05-08 15:59:16 +09:00
Jonathan Tan
ba95710a3b {fetch,upload}-pack: support filter in protocol v2
The fetch-pack/upload-pack protocol v2 was developed independently of
the filter parameter (used in partial fetches), thus it did not include
support for it. Add support for the filter parameter.

Like in the legacy protocol, the server advertises and supports "filter"
only if uploadpack.allowfilter is configured.

Like in the legacy protocol, the client continues with a warning if
"--filter" is specified, but the server does not advertise it.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-05-06 13:17:19 +09:00
Junio C Hamano
ea44c0a594 Merge branch 'bw/protocol-v2' into jt/partial-clone-proto-v2
The beginning of the next-gen transfer protocol.

* bw/protocol-v2: (35 commits)
  remote-curl: don't request v2 when pushing
  remote-curl: implement stateless-connect command
  http: eliminate "# service" line when using protocol v2
  http: don't always add Git-Protocol header
  http: allow providing extra headers for http requests
  remote-curl: store the protocol version the server responded with
  remote-curl: create copy of the service name
  pkt-line: add packet_buf_write_len function
  transport-helper: introduce stateless-connect
  transport-helper: refactor process_connect_service
  transport-helper: remove name parameter
  connect: don't request v2 when pushing
  connect: refactor git_connect to only get the protocol version once
  fetch-pack: support shallow requests
  fetch-pack: perform a fetch using v2
  upload-pack: introduce fetch server command
  push: pass ref prefixes when pushing
  fetch: pass ref prefixes when fetching
  ls-remote: pass ref prefixes when requesting a remote's refs
  transport: convert transport_get_remote_refs to take a list of ref prefixes
  ...
2018-05-02 18:54:10 +09:00
Brandon Williams
5e3548ef16 fetch: send server options when using protocol v2
Teach fetch to optionally accept server options by specifying them on
the cmdline via '-o' or '--server-option'.  These server options are
sent to the remote end when performing a fetch communicating using
protocol version 2.

If communicating using a protocol other than v2 the provided options are
ignored and not sent to the remote end.

Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-04-24 11:24:40 +09:00
Junio C Hamano
bbc39d4020 Merge branch 'bw/protocol-v2' into HEAD
* bw/protocol-v2: (35 commits)
  remote-curl: don't request v2 when pushing
  remote-curl: implement stateless-connect command
  http: eliminate "# service" line when using protocol v2
  http: don't always add Git-Protocol header
  http: allow providing extra headers for http requests
  remote-curl: store the protocol version the server responded with
  remote-curl: create copy of the service name
  pkt-line: add packet_buf_write_len function
  transport-helper: introduce stateless-connect
  transport-helper: refactor process_connect_service
  transport-helper: remove name parameter
  connect: don't request v2 when pushing
  connect: refactor git_connect to only get the protocol version once
  fetch-pack: support shallow requests
  fetch-pack: perform a fetch using v2
  upload-pack: introduce fetch server command
  push: pass ref prefixes when pushing
  fetch: pass ref prefixes when fetching
  ls-remote: pass ref prefixes when requesting a remote's refs
  transport: convert transport_get_remote_refs to take a list of ref prefixes
  ...
2018-04-24 11:24:22 +09:00
Stefan Beller
d807c4a01d exec_cmd: rename to use dash in file name
This is more consistent with the project style. The majority of Git's
source files use dashes in preference to underscores in their file names.

Signed-off-by: Stefan Beller <sbeller@google.com>
2018-04-11 18:11:00 +09:00
Junio C Hamano
3a1ec60c43 Merge branch 'sb/packfiles-in-repository'
Refactoring of the internal global data structure continues.

* sb/packfiles-in-repository:
  packfile: keep prepare_packed_git() private
  packfile: allow find_pack_entry to handle arbitrary repositories
  packfile: add repository argument to find_pack_entry
  packfile: allow reprepare_packed_git to handle arbitrary repositories
  packfile: allow prepare_packed_git to handle arbitrary repositories
  packfile: allow prepare_packed_git_one to handle arbitrary repositories
  packfile: add repository argument to reprepare_packed_git
  packfile: add repository argument to prepare_packed_git
  packfile: add repository argument to prepare_packed_git_one
  packfile: allow install_packed_git to handle arbitrary repositories
  packfile: allow rearrange_packed_git to handle arbitrary repositories
  packfile: allow prepare_packed_git_mru to handle arbitrary repositories
2018-04-11 13:09:55 +09:00
Junio C Hamano
5d806b74d5 Merge branch 'ti/fetch-everything-local-optim'
A "git fetch" from a repository with insane number of refs into a
repository that is already up-to-date still wasted too many cycles
making many lstat(2) calls to see if these objects at the tips
exist as loose objects locally.  These lstat(2) calls are optimized
away by enumerating all loose objects beforehand.

It is unknown if the new strategy negatively affects existing use
cases, fetching into a repository with many loose objects from a
repository with small number of refs.

* ti/fetch-everything-local-optim:
  fetch-pack.c: use oidset to check existence of loose object
2018-04-10 08:25:43 +09:00
Stefan Beller
a49d283435 packfile: add repository argument to reprepare_packed_git
See previous patch for explanation.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-03-26 10:07:43 -07:00
Brandon Williams
f7e2050105 fetch-pack: support shallow requests
Enable shallow clones and deepen requests using protocol version 2 if
the server 'fetch' command supports the 'shallow' feature.

Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-03-15 12:01:09 -07:00
Brandon Williams
685fbd3291 fetch-pack: perform a fetch using v2
When communicating with a v2 server, perform a fetch by requesting the
'fetch' command.

Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-03-15 12:01:08 -07:00
Jonathan Tan
98a2ea46c2 fetch-pack: do not check links for partial fetch
When doing a partial clone or fetch with transfer.fsckobjects=1, use the
--fsck-objects instead of the --strict flag when invoking index-pack so
that links are not checked, only objects. This is because incomplete
links are expected when doing a partial clone or fetch.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-03-15 10:16:27 -07:00
Takuto Ikuta
024aa4696c fetch-pack.c: use oidset to check existence of loose object
When fetching from a repository with large number of refs, because to
check existence of each refs in local repository to packed and loose
objects, 'git fetch' ends up doing a lot of lstat(2) to non-existing
loose form, which makes it slow.

Instead of making as many lstat(2) calls as the refs the remote side
advertised to see if these objects exist in the loose form, first
enumerate all the existing loose objects in hashmap beforehand and use
it to check existence of them if the number of refs is larger than the
number of loose objects.

With this patch, the number of lstat(2) calls in `git fetch` is reduced
from 411412 to 13794 for chromium repository, it has more than 480000
remote refs.

I took time stat of `git fetch` when fetch-pack happens for chromium
repository 3 times on linux with SSD.
* with this patch
8.105s
8.309s
7.640s
avg: 8.018s

* master
12.287s
11.175s
12.227s
avg: 11.896s

On my MacBook Air which has slower lstat(2).
* with this patch
14.501s

* master
1m16.027s

`git fetch` on slow disk will be improved largely.

Signed-off-by: Takuto Ikuta <tikuta@chromium.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-03-14 11:17:26 -07:00
Junio C Hamano
2fb346c06a Merge branch 'js/packet-read-line-check-null'
Some low level protocol codepath could crash when they get an
unexpected flush packet, which is now fixed.

* js/packet-read-line-check-null:
  always check for NULL return from packet_read_line()
  correct error messages for NULL packet_read_line()
2018-02-27 10:33:55 -08:00
Junio C Hamano
6bed209a20 Merge branch 'jh/partial-clone'
The machinery to clone & fetch, which in turn involves packing and
unpacking objects, have been told how to omit certain objects using
the filtering mechanism introduced by the jh/object-filtering
topic, and also mark the resulting pack as a promisor pack to
tolerate missing objects, taking advantage of the mechanism
introduced by the jh/fsck-promisors topic.

* jh/partial-clone:
  t5616: test bulk prefetch after partial fetch
  fetch: inherit filter-spec from partial clone
  t5616: end-to-end tests for partial clone
  fetch-pack: restore save_commit_buffer after use
  unpack-trees: batch fetching of missing blobs
  clone: partial clone
  partial-clone: define partial clone settings in config
  fetch: support filters
  fetch: refactor calculation of remote list
  fetch-pack: test support excluding large blobs
  fetch-pack: add --no-filter
  fetch-pack, index-pack, transport: partial clone
  upload-pack: add object filtering for partial clone
2018-02-13 13:39:04 -08:00
Junio C Hamano
f3d618d2bf Merge branch 'jh/fsck-promisors'
In preparation for implementing narrow/partial clone, the machinery
for checking object connectivity used by gc and fsck has been
taught that a missing object is OK when it is referenced by a
packfile specially marked as coming from trusted repository that
promises to make them available on-demand and lazily.

* jh/fsck-promisors:
  gc: do not repack promisor packfiles
  rev-list: support termination at promisor objects
  sha1_file: support lazily fetching missing objects
  introduce fetch-object: fetch one promisor object
  index-pack: refactor writing of .keep files
  fsck: support promisor objects as CLI argument
  fsck: support referenced promisor objects
  fsck: support refs pointing to promisor objects
  fsck: introduce partialclone extension
  extension.partialclone: introduce partial clone extension
2018-02-13 13:39:03 -08:00
Jeff King
bc9d4dc5b0 correct error messages for NULL packet_read_line()
The packet_read_line() function dies if it gets an
unexpected EOF. It only returns NULL if we get a flush
packet (or technically, a zero-length "0004" packet, but
nobody is supposed to send those, and they are
indistinguishable from a flush in this interface).

Let's correct error messages which claim an unexpected EOF;
it's really an unexpected flush packet.

While we're here, let's also check "!line" instead of
"!len" in the second case. The two events should always
coincide, but checking "!line" makes it more obvious that we
are not about to dereference NULL.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-02-08 12:37:30 -08:00
Jonathan Tan
a1c6d7c1a7 fetch-pack: restore save_commit_buffer after use
In fetch-pack, the global variable save_commit_buffer is set to 0, but
not restored to its original value after use.

In particular, if show_log() (in log-tree.c) is invoked after
fetch_pack() in the same process, show_log() will return before printing
out the commit message (because the invocation to
get_cached_commit_buffer() returns NULL, because the commit buffer was
not saved). I discovered this when attempting to run "git log -S" in a
partial clone, triggering the case where revision walking lazily loads
missing objects.

Therefore, restore save_commit_buffer to its original value after use.

An alternative to solve the problem I had is to replace
get_cached_commit_buffer() with get_commit_buffer(). That invocation was
introduced in commit a97934d ("use get_cached_commit_buffer where
appropriate", 2014-06-13) to replace "commit->buffer" introduced in
commit 3131b71 ("Add "--show-all" revision walker flag for debugging",
2008-02-13). In the latter commit, the commit author seems to be
deciding between not showing an unparsed commit at all and showing an
unparsed commit without the message (which is what the commit does), and
did not mention parsing the unparsed commit, so I prefer to preserve the
existing behavior.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-12-08 09:58:52 -08:00
Jeff Hostetler
640d8b72fe fetch-pack, index-pack, transport: partial clone
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-12-08 09:58:51 -08:00
Junio C Hamano
79bafd23a8 Merge branch 'jk/fewer-pack-rescan'
Internaly we use 0{40} as a placeholder object name to signal the
codepath that there is no such object (e.g. the fast-forward check
while "git fetch" stores a new remote-tracking ref says "we know
there is no 'old' thing pointed at by the ref, as we are creating
it anew" by passing 0{40} for the 'old' side), and expect that a
codepath to locate an in-core object to return NULL as a sign that
the object does not exist.  A look-up for an object that does not
exist however is quite costly with a repository with large number
of packfiles.  This access pattern has been optimized.

* jk/fewer-pack-rescan:
  sha1_file: fast-path null sha1 as a missing object
  everything_local: use "quick" object existence check
  p5551: add a script to test fetch pack-dir rescans
  t/perf/lib-pack: use fast-import checkpoint to create packs
  p5550: factor out nonsense-pack creation
2017-12-06 09:23:42 -08:00
Jonathan Tan
88e2f9ed8e introduce fetch-object: fetch one promisor object
Introduce fetch-object, providing the ability to fetch one object from a
promisor remote.

This uses fetch-pack. To do this, the transport mechanism has been
updated with 2 flags, "from-promisor" to indicate that the resulting
pack comes from a promisor remote (and thus should be annotated as such
by index-pack), and "no-dependents" to indicate that only the objects
themselves need to be fetched (but fetching additional objects is
nevertheless safe).

Whenever "no-dependents" is used, fetch-pack will refrain from using any
object flags, because it is most likely invoked as part of a dynamic
object fetch by another Git command (which may itself use object flags).
An alternative to this is to leave fetch-pack alone, and instead update
the allocation of flags so that fetch-pack's flags never overlap with
any others, but this will end up shrinking the number of flags available
to nearly every other Git command (that is, every Git command that
accesses objects), so the approach in this commit was used instead.

This will be tested in a subsequent commit.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-12-05 09:46:05 -08:00
Jeff King
c291293b2e everything_local: use "quick" object existence check
In b495697b82 (fetch-pack: avoid repeatedly re-scanning pack
directory, 2013-01-26), we noticed that everything_local()
could waste time trying to find and parse objects which we
_expect_ to be missing. The solution was to put
has_sha1_file() in front of parse_object() to skip the
more-expensive parse attempt.

That optimization was negated later when has_sha1_file()
learned to do the same re-scan in 45e8a74873 (has_sha1_file:
re-check pack directory before giving up, 2013-08-30).

We can restore it by using the "quick" flag to tell
has_sha1_file (actually has_object_file these days) that we
prefer speed to thoroughness for this call.  See also the
fixes in 5827a0354 and 0eeb077be7 for prior art and
discussion on using the "quick" flag for these cases.

The recently-added performance regression test in p5551
demonstrates the problem. You can see the original fix:

  Test            b495697b82^       b495697b82
  --------------------------------------------------------
  5551.4: fetch   1.68(1.33+0.35)   0.87(0.69+0.18) -48.2%

and then the regression:

  Test            45e8a74873^       45e8a74873
  ---------------------------------------------------------
  5551.4: fetch   0.96(0.77+0.19)   2.55(2.04+0.50) +165.6%

and now our fix:

  Test            HEAD^             HEAD
  --------------------------------------------------------
  5551.4: fetch   7.21(6.58+0.63)   5.47(5.04+0.43) -24.1%

You can also see that other things have gotten a lot slower
since 2013. We'll deal with those in separate patches.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-11-21 11:30:41 +09:00
Jonathan Tan
9e6fabde82 oidmap: map with OID as key
This is similar to using the hashmap in hashmap.c, but with an
easier-to-use API. In particular, custom entry comparisons no longer
need to be written, and lookups can be done without constructing a
temporary entry structure.

This is implemented as a thin wrapper over the hashmap API. In
particular, this means that there is an additional 4-byte overhead due
to the fact that the first 4 bytes of the hash is redundantly stored.
For now, I'm taking the simpler approach, but if need be, we can
reimplement oidmap without affecting the callers significantly.

oidset has been updated to use oidmap.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-10-01 17:18:03 +09:00
Jonathan Tan
0abe14f6a5 pack: move {,re}prepare_packed_git and approximate_object_count
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-08-23 15:12:07 -07:00
Junio C Hamano
f31d23a399 Merge branch 'bw/config-h'
Fix configuration codepath to pay proper attention to commondir
that is used in multi-worktree situation, and isolate config API
into its own header file.

* bw/config-h:
  config: don't implicitly use gitdir or commondir
  config: respect commondir
  setup: teach discover_git_directory to respect the commondir
  config: don't include config.h by default
  config: remove git_config_iter
  config: create config.h
2017-06-24 14:28:41 -07:00
Brandon Williams
b2141fc1d2 config: don't include config.h by default
Stop including config.h by default in cache.h.  Instead only include
config.h in those files which require use of the config system.

Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-06-15 12:56:22 -07:00
Junio C Hamano
02c531eba2 Merge branch 'jt/fetch-allow-tip-sha1-implicitly'
There is no good reason why "git fetch $there $sha1" should fail
when the $sha1 names an object at the tip of an advertised ref,
even when the other side hasn't enabled allowTipSHA1InWant.

* jt/fetch-allow-tip-sha1-implicitly:
  fetch-pack: always allow fetching of literal SHA1s
2017-05-30 11:16:43 +09:00
Junio C Hamano
6b526ced6f Merge branch 'bc/object-id'
Conversion from uchar[20] to struct object_id continues.

* bc/object-id: (53 commits)
  object: convert parse_object* to take struct object_id
  tree: convert parse_tree_indirect to struct object_id
  sequencer: convert do_recursive_merge to struct object_id
  diff-lib: convert do_diff_cache to struct object_id
  builtin/ls-tree: convert to struct object_id
  merge: convert checkout_fast_forward to struct object_id
  sequencer: convert fast_forward_to to struct object_id
  builtin/ls-files: convert overlay_tree_on_cache to object_id
  builtin/read-tree: convert to struct object_id
  sha1_name: convert internals of peel_onion to object_id
  upload-pack: convert remaining parse_object callers to object_id
  revision: convert remaining parse_object callers to object_id
  revision: rename add_pending_sha1 to add_pending_oid
  http-push: convert process_ls_object and descendants to object_id
  refs/files-backend: convert many internals to struct object_id
  refs: convert struct ref_update to use struct object_id
  ref-filter: convert some static functions to struct object_id
  Convert struct ref_array_item to struct object_id
  Convert the verify_pack callback to struct object_id
  Convert lookup_tag to struct object_id
  ...
2017-05-29 12:34:43 +09:00
Junio C Hamano
b15667bbdc Merge branch 'js/larger-timestamps'
Some platforms have ulong that is smaller than time_t, and our
historical use of ulong for timestamp would mean they cannot
represent some timestamp that the platform allows.  Invent a
separate and dedicated timestamp_t (so that we can distingiuish
timestamps and a vanilla ulongs, which along is already a good
move), and then declare uintmax_t is the type to be used as the
timestamp_t.

* js/larger-timestamps:
  archive-tar: fix a sparse 'constant too large' warning
  use uintmax_t for timestamps
  date.c: abort if the system time cannot handle one of our timestamps
  timestamp_t: a new data type for timestamps
  PRItime: introduce a new "printf format" for timestamps
  parse_timestamp(): specify explicitly where we parse timestamps
  t0006 & t5000: skip "far in the future" test when time_t is too limited
  t0006 & t5000: prepare for 64-bit timestamps
  ref-filter: avoid using `unsigned long` for catch-all data type
2017-05-16 11:51:59 +09:00
Jonathan Tan
fdb69d33c4 fetch-pack: always allow fetching of literal SHA1s
fetch-pack, when fetching a literal SHA-1 from a server that is not
configured with uploadpack.allowtipsha1inwant (or similar), always
returns an error message of the form "Server does not allow request for
unadvertised object %s". However, it is sometimes the case that such
object is advertised. This situation would occur, for example, if a user
or a script was provided a SHA-1 instead of a branch or tag name for
fetching, and wanted to invoke "git fetch" or "git fetch-pack" using
that SHA-1.

Teach fetch-pack to also check the SHA-1s of the refs in the received
ref advertisement if a literal SHA-1 was given by the user.

Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>
Reviewed-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-05-16 10:17:05 +09:00
brian m. carlson
c251c83df2 object: convert parse_object* to take struct object_id
Make parse_object, parse_object_or_die, and parse_object_buffer take a
pointer to struct object_id.  Remove the temporary variables inserted
earlier, since they are no longer necessary.  Transform all of the
callers using the following semantic patch:

@@
expression E1;
@@
- parse_object(E1.hash)
+ parse_object(&E1)

@@
expression E1;
@@
- parse_object(E1->hash)
+ parse_object(E1)

@@
expression E1, E2;
@@
- parse_object_or_die(E1.hash, E2)
+ parse_object_or_die(&E1, E2)

@@
expression E1, E2;
@@
- parse_object_or_die(E1->hash, E2)
+ parse_object_or_die(E1, E2)

@@
expression E1, E2, E3, E4, E5;
@@
- parse_object_buffer(E1.hash, E2, E3, E4, E5)
+ parse_object_buffer(&E1, E2, E3, E4, E5)

@@
expression E1, E2, E3, E4, E5;
@@
- parse_object_buffer(E1->hash, E2, E3, E4, E5)
+ parse_object_buffer(E1, E2, E3, E4, E5)

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-05-08 15:12:58 +09:00
brian m. carlson
bc83266abe Convert lookup_commit* to struct object_id
Convert lookup_commit, lookup_commit_or_die,
lookup_commit_reference, and lookup_commit_reference_gently to take
struct object_id arguments.

Introduce a temporary in parse_object buffer in order to convert this
function.  This is required since in order to convert parse_object and
parse_object_buffer, lookup_commit_reference_gently and
lookup_commit_or_die would need to be converted.  Not introducing a
temporary would therefore require that lookup_commit_or_die take a
struct object_id *, but lookup_commit would take unsigned char *,
leaving a confusing and hard-to-use interface.

parse_object_buffer will lose this temporary in a later patch.

This commit was created with manual changes to commit.c, commit.h, and
object.c, plus the following semantic patch:

@@
expression E1, E2;
@@
- lookup_commit_reference_gently(E1.hash, E2)
+ lookup_commit_reference_gently(&E1, E2)

@@
expression E1, E2;
@@
- lookup_commit_reference_gently(E1->hash, E2)
+ lookup_commit_reference_gently(E1, E2)

@@
expression E1;
@@
- lookup_commit_reference(E1.hash)
+ lookup_commit_reference(&E1)

@@
expression E1;
@@
- lookup_commit_reference(E1->hash)
+ lookup_commit_reference(E1)

@@
expression E1;
@@
- lookup_commit(E1.hash)
+ lookup_commit(&E1)

@@
expression E1;
@@
- lookup_commit(E1->hash)
+ lookup_commit(E1)

@@
expression E1, E2;
@@
- lookup_commit_or_die(E1.hash, E2)
+ lookup_commit_or_die(&E1, E2)

@@
expression E1, E2;
@@
- lookup_commit_or_die(E1->hash, E2)
+ lookup_commit_or_die(E1, E2)

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-05-08 15:12:57 +09:00
brian m. carlson
e92b848cb6 shallow: convert shallow registration functions to object_id
Convert register_shallow and unregister_shallow to take struct
object_id.  register_shallow is a caller of lookup_commit, which we will
convert later.  It doesn't make sense for the registration and
unregistration functions to have incompatible interfaces, so convert
them both.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-05-08 15:12:57 +09:00
brian m. carlson
1b283377b1 fetch-pack: convert to struct object_id
Convert all uses of unsigned char [20] to struct object_id.  Switch one
use of get_sha1_hex to parse_oid_hex to avoid the need for a constant.
This change is necessary in order to convert parse_object.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-05-02 10:46:41 +09:00
Johannes Schindelin
dddbad728c timestamp_t: a new data type for timestamps
Git's source code assumes that unsigned long is at least as precise as
time_t. Which is incorrect, and causes a lot of problems, in particular
where unsigned long is only 32-bit (notably on Windows, even in 64-bit
versions).

So let's just use a more appropriate data type instead. In preparation
for this, we introduce the new `timestamp_t` data type.

By necessity, this is a very, very large patch, as it has to replace all
timestamps' data type in one go.

As we will use a data type that is not necessarily identical to `time_t`,
we need to be very careful to use `time_t` whenever we interact with the
system functions, and `timestamp_t` everywhere else.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-04-27 13:07:39 +09:00
Junio C Hamano
5938454cbc Merge branch 'dt/xgethostname-nul-termination'
gethostname(2) may not NUL terminate the buffer if hostname does
not fit; unfortunately there is no easy way to see if our buffer
was too small, but at least this will make sure we will not end up
using garbage past the end of the buffer.

* dt/xgethostname-nul-termination:
  xgethostname: handle long hostnames
  use HOST_NAME_MAX to size buffers for gethostname(2)
2017-04-23 22:07:57 -07:00
Junio C Hamano
d2617eb984 Merge branch 'jt/fetch-pack-error-reporting'
"git fetch-pack" was not prepared to accept ERR packet that the
upload-pack can send with a human-readable error message.  It
showed the packet contents with ERR prefix, so there was no data
loss, but it was redundant to say "ERR" in an error message.

* jt/fetch-pack-error-reporting:
  fetch-pack: show clearer error message upon ERR
2017-04-23 22:07:53 -07:00
Johannes Schindelin
cb71f8bdb5 PRItime: introduce a new "printf format" for timestamps
Currently, Git's source code treats all timestamps as if they were
unsigned longs. Therefore, it is okay to write "%lu" when printing them.

There is a substantial problem with that, though: at least on Windows,
time_t is *larger* than unsigned long, and hence we will want to switch
away from the ill-specified `unsigned long` data type.

So let's introduce the pseudo format "PRItime" (currently simply being
defined to "lu") to make it easier to change the data type used for
timestamps.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-04-23 20:19:15 -07:00
David Turner
5781a9a270 xgethostname: handle long hostnames
If the full hostname doesn't fit in the buffer supplied to
gethostname, POSIX does not specify whether the buffer will be
null-terminated, so to be safe, we should do it ourselves.  Introduce
new function, xgethostname, which ensures that there is always a \0
at the end of the buffer.

Signed-off-by: David Turner <dturner@twosigma.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-04-18 19:58:04 -07:00
René Scharfe
da25bdb776 use HOST_NAME_MAX to size buffers for gethostname(2)
POSIX limits the length of host names to HOST_NAME_MAX.  Export the
fallback definition from daemon.c and use this constant to make all
buffers used with gethostname(2) big enough for any possible result
and a terminating NUL.

Inspired-by: David Turner <dturner@twosigma.com>
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: David Turner <dturner@twosigma.com>
Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-04-18 19:57:41 -07:00
Jonathan Tan
8e2c7bef03 fetch-pack: show clearer error message upon ERR
Currently, fetch-pack prints a confusing error message ("expected
ACK/NAK") when the server it's communicating with sends a pkt-line
starting with "ERR".  Replace it with a less confusing error message.

Also update the documentation describing the fetch-pack/upload-pack
protocol (pack-protocol.txt) to indicate that "ERR" can be sent in the
place of "ACK" or "NAK". In practice, this has been done for quite some
time by other Git implementations (e.g. JGit sends "want $id not valid")
and by Git itself (since commit bdb31ea: "upload-pack: report "not our
ref" to client", 2017-02-23) whenever a "want" line references an object
that it does not have. (This is uncommon, but can happen if a repository
is garbage-collected during a negotiation.)

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-04-17 18:51:28 -07:00
brian m. carlson
910650d2f8 Rename sha1_array to oid_array
Since this structure handles an array of object IDs, rename it to struct
oid_array.  Also rename the accessor functions and the initialization
constant.

This commit was produced mechanically by providing non-Documentation
files to the following Perl one-liners:

    perl -pi -E 's/struct sha1_array/struct oid_array/g'
    perl -pi -E 's/\bsha1_array_/oid_array_/g'
    perl -pi -E 's/SHA1_ARRAY_INIT/OID_ARRAY_INIT/g'

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-03-31 08:33:56 -07:00
brian m. carlson
98a72ddc12 Make sha1_array_append take a struct object_id *
Convert the callers to pass struct object_id by changing the function
declaration and definition and applying the following semantic patch:

@@
expression E1, E2;
@@
- sha1_array_append(E1, E2.hash)
+ sha1_array_append(E1, &E2)

@@
expression E1, E2;
@@
- sha1_array_append(E1, E2->hash)
+ sha1_array_append(E1, E2)

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-03-31 08:33:55 -07:00
brian m. carlson
ee3051bd23 sha1-array: convert internal storage for struct sha1_array to object_id
Make the internal storage for struct sha1_array use an array of struct
object_id internally.  Update the users of this struct which inspect its
internals.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-03-28 09:59:34 -07:00
Junio C Hamano
07198afbd1 Merge branch 'mm/fetch-show-error-message-on-unadvertised-object'
"git fetch" that requests a commit by object name, when the other
side does not allow such an request, failed without much
explanation.

* mm/fetch-show-error-message-on-unadvertised-object:
  fetch-pack: add specific error for fetching an unadvertised object
  fetch_refs_via_pack: call report_unmatched_refs
  fetch-pack: move code to report unmatched refs to a function
2017-03-14 15:23:18 -07:00
Matt McCutchen
d56583ded6 fetch-pack: add specific error for fetching an unadvertised object
Enhance filter_refs (which decides whether a request for an unadvertised
object should be sent to the server) to record a new match status on the
"struct ref" when a request is not allowed, and have
report_unmatched_refs check for this status and print a special error
message, "Server does not allow request for unadvertised object".

Signed-off-by: Matt McCutchen <matt@mattmccutchen.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-03-02 11:12:53 -08:00
Matt McCutchen
e860d96bf8 fetch-pack: move code to report unmatched refs to a function
Prepare to reuse this code in transport.c for "git fetch".

While we're here, internationalize the existing error message.

Signed-off-by: Matt McCutchen <matt@mattmccutchen.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-03-02 11:12:53 -08:00
Jeff King
41a078c60b fetch-pack: cache results of for_each_alternate_ref
We may run for_each_alternate_ref() twice, once in
find_common() and once in everything_local(). This operation
can be expensive, because it involves running a sub-process
which must freshly load all of the alternate's refs from
disk.

Let's cache and reuse the results between the two calls. We
can make some optimizations based on the particular use
pattern in fetch-pack to keep our memory usage down.

The first is that we only care about the sha1s, not the refs
themselves. So it's OK to store only the sha1s, and to
suppress duplicates. The natural fit would therefore be a
sha1_array.

However, sha1_array's de-duplication happens only after it
has read and sorted all entries. It still stores each
duplicate. For an alternate with a large number of refs
pointing to the same commits, this is a needless expense.

Instead, we'd prefer to eliminate duplicates before putting
them in the cache, which implies using a hash. We can
further note that fetch-pack will call parse_object() on
each alternate sha1. We can therefore keep our cache as a
set of pointers to "struct object". That gives us a place to
put our "already seen" bit with an optimized hash lookup.
And as a bonus, the object stores the sha1 for us, so
pointer-to-object is all we need.

There are two extra optimizations I didn't do here:

  - we actually store an array of pointer-to-object.
    Technically we could just walk the obj_hash table
    looking for entries with the ALTERNATE flag set (because
    our use case doesn't care about the order here).

    But that hash table may be mostly composed of
    non-ALTERNATE entries, so we'd waste time walking over
    them. So it would be a slight win in memory use, but a
    loss in CPU.

  - the items we pull out of the cache are actual "struct
    object"s, but then we feed "obj->sha1" to our
    sub-functions, which promptly call parse_object().

    This second parse is cheap, because it starts with
    lookup_object() and will bail immediately when it sees
    we've already parsed the object. We could save the extra
    hash lookup, but it would involve refactoring the
    functions we call. It may or may not be worth the
    trouble.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-02-08 15:39:55 -08:00
Jeff King
2429d63a46 for_each_alternate_ref: pass name/oid instead of ref struct
Breaking down the fields in the interface makes it easier to
change the backend of for_each_alternate_ref to something
that doesn't use "struct ref" internally.

The only field that callers actually look at is the oid,
anyway. The refname is kept in the interface as a plausible
thing for future code to want.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-02-08 15:39:55 -08:00
Ralf Thielow
dfbfb9f377 fetch-pack.c: correct command at the beginning of an error message
One error message in fetch-pack.c uses 'git fetch_pack' at the beginning
which is not a git command.  Use 'git fetch-pack' instead.

Signed-off-by: Ralf Thielow <ralf.thielow@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-11-11 13:28:39 -08:00
Junio C Hamano
a460ea4a3c Merge branch 'nd/shallow-deepen'
The existing "git fetch --depth=<n>" option was hard to use
correctly when making the history of an existing shallow clone
deeper.  A new option, "--deepen=<n>", has been added to make this
easier to use.  "git clone" also learned "--shallow-since=<date>"
and "--shallow-exclude=<tag>" options to make it easier to specify
"I am interested only in the recent N months worth of history" and
"Give me only the history since that version".

* nd/shallow-deepen: (27 commits)
  fetch, upload-pack: --deepen=N extends shallow boundary by N commits
  upload-pack: add get_reachable_list()
  upload-pack: split check_unreachable() in two, prep for get_reachable_list()
  t5500, t5539: tests for shallow depth excluding a ref
  clone: define shallow clone boundary with --shallow-exclude
  fetch: define shallow boundary with --shallow-exclude
  upload-pack: support define shallow boundary by excluding revisions
  refs: add expand_ref()
  t5500, t5539: tests for shallow depth since a specific date
  clone: define shallow clone boundary based on time with --shallow-since
  fetch: define shallow boundary with --shallow-since
  upload-pack: add deepen-since to cut shallow repos based on time
  shallow.c: implement a generic shallow boundary finder based on rev-list
  fetch-pack: use a separate flag for fetch in deepening mode
  fetch-pack.c: mark strings for translating
  fetch-pack: use a common function for verbose printing
  fetch-pack: use skip_prefix() instead of starts_with()
  upload-pack: move rev-list code out of check_non_tip()
  upload-pack: make check_non_tip() clean things up on error
  upload-pack: tighten number parsing at "deepen" lines
  ...
2016-10-10 14:03:50 -07:00
Junio C Hamano
b8688adb12 Merge branch 'rs/qsort'
We call "qsort(array, nelem, sizeof(array[0]), fn)", and most of
the time third parameter is redundant.  A new QSORT() macro lets us
omit it.

* rs/qsort:
  show-branch: use QSORT
  use QSORT, part 2
  coccicheck: use --all-includes by default
  remove unnecessary check before QSORT
  use QSORT
  add QSORT
2016-10-10 14:03:46 -07:00
René Scharfe
9ed0d8d6e6 use QSORT
Apply the semantic patch contrib/coccinelle/qsort.cocci to the code
base, replacing calls of qsort(3) with QSORT.  The resulting code is
shorter and supports empty arrays with NULL pointers.

Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-29 15:42:18 -07:00
Jonathan Tan
06b3d386e0 fetch-pack: do not reset in_vain on non-novel acks
The MAX_IN_VAIN mechanism was introduced in commit f061e5f ("fetch-pack:
give up after getting too many "ack continue"", 2006-05-24) to stop ref
negotiation if a number of consecutive "have"s have been sent with no
corresponding new acks. This is to stop the client from digging too deep
in an irrelevant side branch in vain without ever finding a common
ancestor. A use case (as described in that commit) is the scenario in
which the local repository has more roots than the remote repository.

However, during a negotiation in which stateless RPCs are used,
MAX_IN_VAIN will (almost) never trigger (in the more-roots scenario
above and others) because in each new request, the client has to inform
the server of objects it already has and knows the server has (to remind
the server of the state), which the server then acks.

Make fetch-pack only consider, as new acks for the purpose of
MAX_IN_VAIN, acks for objects for which the client has never received an
ack before in this session.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-23 12:37:45 -07:00
Jonathan Tan
da470981de fetch-pack: grow stateless RPC windows exponentially
When updating large repositories, the LARGE_FLUSH limit (that is, the
limit at which the window growth strategy switches from exponential to
linear) is reached quite quickly. Use a conservative exponential growth
strategy when that limit is reached instead (and increase LARGE_FLUSH so
that there is no regression in window size).

This optimization is only applied during stateless RPCs to avoid the
issue raised and fixed in commit 44d8dc54 (Fix potential local
deadlock during fetch-pack, 2011-03-29).

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-07-19 13:27:22 -07:00
Nguyễn Thái Ngọc Duy
cccf74e2da fetch, upload-pack: --deepen=N extends shallow boundary by N commits
In git-fetch, --depth argument is always relative with the latest
remote refs. This makes it a bit difficult to cover this use case,
where the user wants to make the shallow history, say 3 levels
deeper. It would work if remote refs have not moved yet, but nobody
can guarantee that, especially when that use case is performed a
couple months after the last clone or "git fetch --depth". Also,
modifying shallow boundary using --depth does not work well with
clones created by --since or --not.

This patch fixes that. A new argument --deepen=<N> will add <N> more (*)
parent commits to the current history regardless of where remote refs
are.

Have/Want negotiation is still respected. So if remote refs move, the
server will send two chunks: one between "have" and "want" and another
to extend shallow history. In theory, the client could send no "want"s
in order to get the second chunk only. But the protocol does not allow
that. Either you send no want lines, which means ls-remote; or you
have to send at least one want line that carries deep-relative to the
server..

The main work was done by Dongcan Jiang. I fixed it up here and there.
And of course all the bugs belong to me.

(*) We could even support --deepen=<N> where <N> is negative. In that
case we can cut some history from the shallow clone. This operation
(and --depth=<shorter depth>) does not require interaction with remote
side (and more complicated to implement as a result).

Helped-by: Duy Nguyen <pclouds@gmail.com>
Helped-by: Eric Sunshine <sunshine@sunshineco.com>
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Dongcan Jiang <dongcan.jiang@gmail.com>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-06-13 14:38:16 -07:00
Nguyễn Thái Ngọc Duy
a45a260086 fetch: define shallow boundary with --shallow-exclude
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-06-13 14:38:16 -07:00
Nguyễn Thái Ngọc Duy
508ea88226 fetch: define shallow boundary with --shallow-since
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-06-13 14:38:16 -07:00
Nguyễn Thái Ngọc Duy
79891cb90a fetch-pack: use a separate flag for fetch in deepening mode
The shallow repo could be deepened or shortened when then user gives
--depth. But in future that won't be the only way to deepen/shorten a
repo. Stop relying on args->depth in this mode. Future deepening
methods can simply set this flag on instead of updating all these if
expressions.

The new name "deepen" was chosen after the command to define shallow
boundary in pack protocol. New commands also follow this tradition.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-06-13 14:38:16 -07:00
Nguyễn Thái Ngọc Duy
1dd73e20d7 fetch-pack.c: mark strings for translating
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-06-13 14:38:16 -07:00
Nguyễn Thái Ngọc Duy
0d789a5bc1 fetch-pack: use a common function for verbose printing
This reduces the number of "if (verbose)" which makes it a bit easier
to read imo. It also makes it easier to redirect all these printouts,
to a file for example.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-06-13 14:38:16 -07:00
Jeff King
df85757244 fetch-pack: isolate sigpipe in demuxer thread
In commit 9ff18fa (fetch-pack: ignore SIGPIPE in sideband
demuxer, 2016-02-24), we started using sigchain_push() to
ignore SIGPIPE in the async demuxer thread. However, this is
rather clumsy, as it ignores SIGPIPE for the entire process,
including the main thread. At the time we didn't have any
per-thread signal support, but we now we do. Let's use it.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-04-20 13:33:56 -07:00
Jeff King
9ff18faf2f fetch-pack: ignore SIGPIPE in sideband demuxer
If the other side feeds us a bogus pack, index-pack (or
unpack-objects) may die early, before consuming all of its
input. As a result, the sideband demuxer may get SIGPIPE
(racily, depending on whether our data made it into the pipe
buffer or not). If this happens and we are compiled with
pthread support, it will take down the main thread, too.

This isn't the end of the world, as the main process will
just die() anyway when it sees index-pack failed. But it
does mean we don't get a chance to say "fatal: index-pack
failed" or similar. And it also means that we racily fail
t5504, as we sometimes die() and sometimes are killed by
SIGPIPE.

So let's ignore SIGPIPE while demuxing the sideband. We are
already careful to check the return value of write(), so we
won't waste time writing to a broken pipe. The caller will
notice the error return from the async thread, though in
practice we don't even get that far, as we die() as soon as
we see that index-pack failed.

The non-sideband case is already fine; we let index-pack
read straight from the socket, so there is no SIGPIPE at
all. Technically the non-threaded async case is also OK
without this (the forked async process gets SIGPIPE), but
it's not worth distinguishing from the threaded case here.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-02-25 13:51:47 -08:00