Merge branch 'jh/partial-clone-doc'
Doc updates.

* jh/partial-clone-doc:
  partial-clone: render design doc using asciidoc
commit 6bbd1034d8
@@ -76,6 +76,7 @@ TECH_DOCS += technical/long-running-process-protocol
 TECH_DOCS += technical/pack-format
 TECH_DOCS += technical/pack-heuristics
 TECH_DOCS += technical/pack-protocol
+TECH_DOCS += technical/partial-clone
 TECH_DOCS += technical/protocol-capabilities
 TECH_DOCS += technical/protocol-common
 TECH_DOCS += technical/protocol-v2
@@ -69,24 +69,24 @@ Design Details
 
 - A new pack-protocol capability "filter" is added to the fetch-pack and
   upload-pack negotiation.
-
-  This uses the existing capability discovery mechanism.
-  See "filter" in Documentation/technical/pack-protocol.txt.
++
+This uses the existing capability discovery mechanism.
+See "filter" in Documentation/technical/pack-protocol.txt.
 
 - Clients pass a "filter-spec" to clone and fetch which is passed to the
   server to request filtering during packfile construction.
-
-  There are various filters available to accommodate different situations.
-  See "--filter=<filter-spec>" in Documentation/rev-list-options.txt.
++
+There are various filters available to accommodate different situations.
+See "--filter=<filter-spec>" in Documentation/rev-list-options.txt.
 
 - On the server pack-objects applies the requested filter-spec as it
   creates "filtered" packfiles for the client.
-
-  These filtered packfiles are *incomplete* in the traditional sense because
-  they may contain objects that reference objects not contained in the
-  packfile and that the client doesn't already have. For example, the
-  filtered packfile may contain trees or tags that reference missing blobs
-  or commits that reference missing trees.
++
+These filtered packfiles are *incomplete* in the traditional sense because
+they may contain objects that reference objects not contained in the
+packfile and that the client doesn't already have. For example, the
+filtered packfile may contain trees or tags that reference missing blobs
+or commits that reference missing trees.
 
 - On the client these incomplete packfiles are marked as "promisor packfiles"
   and treated differently by various commands.
@@ -104,47 +104,47 @@ Handling Missing Objects
   to repository corruption. To differentiate these cases, the local
   repository specially indicates such filtered packfiles obtained from the
   promisor remote as "promisor packfiles".
-
-  These promisor packfiles consist of a "<name>.promisor" file with
-  arbitrary contents (like the "<name>.keep" files), in addition to
-  their "<name>.pack" and "<name>.idx" files.
++
+These promisor packfiles consist of a "<name>.promisor" file with
+arbitrary contents (like the "<name>.keep" files), in addition to
+their "<name>.pack" and "<name>.idx" files.
 
 - The local repository considers a "promisor object" to be an object that
   it knows (to the best of its ability) that the promisor remote has promised
   that it has, either because the local repository has that object in one of
   its promisor packfiles, or because another promisor object refers to it.
-
-  When Git encounters a missing object, Git can see if it a promisor object
-  and handle it appropriately. If not, Git can report a corruption.
-
-  This means that there is no need for the client to explicitly maintain an
-  expensive-to-modify list of missing objects.[a]
++
+When Git encounters a missing object, Git can see if it a promisor object
+and handle it appropriately. If not, Git can report a corruption.
++
+This means that there is no need for the client to explicitly maintain an
+expensive-to-modify list of missing objects.[a]
 
 - Since almost all Git code currently expects any referenced object to be
   present locally and because we do not want to force every command to do
   a dry-run first, a fallback mechanism is added to allow Git to attempt
   to dynamically fetch missing objects from the promisor remote.
-
-  When the normal object lookup fails to find an object, Git invokes
-  fetch-object to try to get the object from the server and then retry
-  the object lookup. This allows objects to be "faulted in" without
-  complicated prediction algorithms.
-
-  For efficiency reasons, no check as to whether the missing object is
-  actually a promisor object is performed.
-
-  Dynamic object fetching tends to be slow as objects are fetched one at
-  a time.
++
+When the normal object lookup fails to find an object, Git invokes
+fetch-object to try to get the object from the server and then retry
+the object lookup. This allows objects to be "faulted in" without
+complicated prediction algorithms.
++
+For efficiency reasons, no check as to whether the missing object is
+actually a promisor object is performed.
++
+Dynamic object fetching tends to be slow as objects are fetched one at
+a time.
 
 - `checkout` (and any other command using `unpack-trees`) has been taught
   to bulk pre-fetch all required missing blobs in a single batch.
 
 - `rev-list` has been taught to print missing objects.
-
-  This can be used by other commands to bulk prefetch objects.
-  For example, a "git log -p A..B" may internally want to first do
-  something like "git rev-list --objects --quiet --missing=print A..B"
-  and prefetch those objects in bulk.
++
+This can be used by other commands to bulk prefetch objects.
+For example, a "git log -p A..B" may internally want to first do
+something like "git rev-list --objects --quiet --missing=print A..B"
+and prefetch those objects in bulk.
 
 - `fsck` has been updated to be fully aware of promisor objects.
 
@@ -154,11 +154,11 @@ Handling Missing Objects
 - The global variable "fetch_if_missing" is used to control whether an
   object lookup will attempt to dynamically fetch a missing object or
   report an error.
-
-  We are not happy with this global variable and would like to remove it,
-  but that requires significant refactoring of the object code to pass an
-  additional flag. We hope that concurrent efforts to add an ODB API can
-  encompass this.
++
+We are not happy with this global variable and would like to remove it,
+but that requires significant refactoring of the object code to pass an
+additional flag. We hope that concurrent efforts to add an ODB API can
+encompass this.
 
 
 Fetching Missing Objects
@@ -168,10 +168,10 @@ Fetching Missing Objects
   transport_fetch_refs(), setting a new transport option
   TRANS_OPT_NO_DEPENDENTS to indicate that only the objects themselves are
   desired, not any object that they refer to.
-
-  Because some transports invoke fetch_pack() in the same process, fetch_pack()
-  has been updated to not use any object flags when the corresponding argument
-  (no_dependents) is set.
++
+Because some transports invoke fetch_pack() in the same process, fetch_pack()
+has been updated to not use any object flags when the corresponding argument
+(no_dependents) is set.
 
 - The local repository sends a request with the hashes of all requested
   objects as "want" lines, and does not perform any packfile negotiation.
@@ -187,13 +187,13 @@ Current Limitations
 
 - The remote used for a partial clone (or the first partial fetch
   following a regular clone) is marked as the "promisor remote".
-
-  We are currently limited to a single promisor remote and only that
-  remote may be used for subsequent partial fetches.
-
-  We accept this limitation because we believe initial users of this
-  feature will be using it on repositories with a strong single central
-  server.
++
+We are currently limited to a single promisor remote and only that
+remote may be used for subsequent partial fetches.
++
+We accept this limitation because we believe initial users of this
+feature will be using it on repositories with a strong single central
+server.
 
 - Dynamic object fetching will only ask the promisor remote for missing
   objects. We assume that the promisor remote has a complete view of the
@@ -221,13 +221,13 @@ Future Work
 - Allow more than one promisor remote and define a strategy for fetching
   missing objects from specific promisor remotes or of iterating over the
   set of promisor remotes until a missing object is found.
-
-  A user might want to have multiple geographically-close cache servers
-  for fetching missing blobs while continuing to do filtered `git-fetch`
-  commands from the central server, for example.
-
-  Or the user might want to work in a triangular work flow with multiple
-  promisor remotes that each have an incomplete view of the repository.
++
+A user might want to have multiple geographically-close cache servers
+for fetching missing blobs while continuing to do filtered `git-fetch`
+commands from the central server, for example.
++
+Or the user might want to work in a triangular work flow with multiple
+promisor remotes that each have an incomplete view of the repository.
 
 - Allow repack to work on promisor packfiles (while keeping them distinct
   from non-promisor packfiles).
@@ -238,25 +238,25 @@ Future Work
 - Investigate use of a long-running process to dynamically fetch a series
   of objects, such as proposed in [5,6] to reduce process startup and
   overhead costs.
-
-  It would be nice if pack protocol V2 could allow that long-running
-  process to make a series of requests over a single long-running
-  connection.
++
+It would be nice if pack protocol V2 could allow that long-running
+process to make a series of requests over a single long-running
+connection.
 
 - Investigate pack protocol V2 to avoid the info/refs broadcast on
   each connection with the server to dynamically fetch missing objects.
 
 - Investigate the need to handle loose promisor objects.
-
-  Objects in promisor packfiles are allowed to reference missing objects
-  that can be dynamically fetched from the server. An assumption was
-  made that loose objects are only created locally and therefore should
-  not reference a missing object. We may need to revisit that assumption
-  if, for example, we dynamically fetch a missing tree and store it as a
-  loose object rather than a single object packfile.
-
-  This does not necessarily mean we need to mark loose objects as promisor;
-  it may be sufficient to relax the object lookup or is-promisor functions.
++
+Objects in promisor packfiles are allowed to reference missing objects
+that can be dynamically fetched from the server. An assumption was
+made that loose objects are only created locally and therefore should
+not reference a missing object. We may need to revisit that assumption
+if, for example, we dynamically fetch a missing tree and store it as a
+loose object rather than a single object packfile.
++
+This does not necessarily mean we need to mark loose objects as promisor;
+it may be sufficient to relax the object lookup or is-promisor functions.
 
 
 Non-Tasks
@@ -265,13 +265,13 @@ Non-Tasks
 - Every time the subject of "demand loading blobs" comes up it seems
   that someone suggests that the server be allowed to "guess" and send
   additional objects that may be related to the requested objects.
-
-  No work has gone into actually doing that; we're just documenting that
-  it is a common suggestion. We're not sure how it would work and have
-  no plans to work on it.
-
-  It is valid for the server to send more objects than requested (even
-  for a dynamic object fetch), but we are not building on that.
++
+No work has gone into actually doing that; we're just documenting that
+it is a common suggestion. We're not sure how it would work and have
+no plans to work on it.
++
+It is valid for the server to send more objects than requested (even
+for a dynamic object fetch), but we are not building on that.
 
 
 Footnotes
@@ -282,43 +282,43 @@ Footnotes
     This would essentially be a sorted linear list of OIDs that the were
     omitted by the server during a clone or subsequent fetches.
 
-    This file would need to be loaded into memory on every object lookup.
-    It would need to be read, updated, and re-written (like the .git/index)
-    on every explicit "git fetch" command *and* on any dynamic object fetch.
+This file would need to be loaded into memory on every object lookup.
+It would need to be read, updated, and re-written (like the .git/index)
+on every explicit "git fetch" command *and* on any dynamic object fetch.
 
-    The cost to read, update, and write this file could add significant
-    overhead to every command if there are many missing objects. For example,
-    if there are 100M missing blobs, this file would be at least 2GiB on disk.
+The cost to read, update, and write this file could add significant
+overhead to every command if there are many missing objects. For example,
+if there are 100M missing blobs, this file would be at least 2GiB on disk.
 
-    With the "promisor" concept, we *infer* a missing object based upon the
-    type of packfile that references it.
+With the "promisor" concept, we *infer* a missing object based upon the
+type of packfile that references it.
 
 
 Related Links
 -------------
-[0] https://bugs.chromium.org/p/git/issues/detail?id=2
-    Chromium work item for: Partial Clone
+[0] https://crbug.com/git/2
+    Bug#2: Partial Clone
 
-[1] https://public-inbox.org/git/20170113155253.1644-1-benpeart@microsoft.com/
-    Subject: [RFC] Add support for downloading blobs on demand
+[1] https://public-inbox.org/git/20170113155253.1644-1-benpeart@microsoft.com/ +
+    Subject: [RFC] Add support for downloading blobs on demand +
     Date: Fri, 13 Jan 2017 10:52:53 -0500
 
-[2] https://public-inbox.org/git/cover.1506714999.git.jonathantanmy@google.com/
-    Subject: [PATCH 00/18] Partial clone (from clone to lazy fetch in 18 patches)
+[2] https://public-inbox.org/git/cover.1506714999.git.jonathantanmy@google.com/ +
+    Subject: [PATCH 00/18] Partial clone (from clone to lazy fetch in 18 patches) +
     Date: Fri, 29 Sep 2017 13:11:36 -0700
 
-[3] https://public-inbox.org/git/20170426221346.25337-1-jonathantanmy@google.com/
-    Subject: Proposal for missing blob support in Git repos
+[3] https://public-inbox.org/git/20170426221346.25337-1-jonathantanmy@google.com/ +
+    Subject: Proposal for missing blob support in Git repos +
     Date: Wed, 26 Apr 2017 15:13:46 -0700
 
-[4] https://public-inbox.org/git/1488999039-37631-1-git-send-email-git@jeffhostetler.com/
-    Subject: [PATCH 00/10] RFC Partial Clone and Fetch
+[4] https://public-inbox.org/git/1488999039-37631-1-git-send-email-git@jeffhostetler.com/ +
+    Subject: [PATCH 00/10] RFC Partial Clone and Fetch +
     Date: Wed, 8 Mar 2017 18:50:29 +0000
 
-[5] https://public-inbox.org/git/20170505152802.6724-1-benpeart@microsoft.com/
-    Subject: [PATCH v7 00/10] refactor the filter process code into a reusable module
+[5] https://public-inbox.org/git/20170505152802.6724-1-benpeart@microsoft.com/ +
+    Subject: [PATCH v7 00/10] refactor the filter process code into a reusable module +
     Date: Fri, 5 May 2017 11:27:52 -0400
 
-[6] https://public-inbox.org/git/20170714132651.170708-1-benpeart@microsoft.com/
-    Subject: [RFC/PATCH v2 0/1] Add support for downloading blobs on demand
+[6] https://public-inbox.org/git/20170714132651.170708-1-benpeart@microsoft.com/ +
+    Subject: [RFC/PATCH v2 0/1] Add support for downloading blobs on demand +
     Date: Fri, 14 Jul 2017 09:26:50 -0400
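
The "promisor object" rule in the patched design doc (an object counts as promised if it sits in a promisor packfile or if another promisor object refers to it) can be illustrated with a toy model. This is a minimal sketch and not Git's implementation: `ToyRepo`, its fields, and the strings it returns are hypothetical names chosen for illustration, and object IDs are plain strings. It models the classification step only. The doc notes that the dynamic-fetch path itself skips this check for efficiency.

```python
from dataclasses import dataclass, field


@dataclass
class ToyRepo:
    """Hypothetical in-memory stand-in for a partially cloned repository."""

    # OID -> OIDs it references, for objects that are present locally.
    objects: dict = field(default_factory=dict)
    # OIDs stored in packfiles that carry a "<name>.promisor" marker file.
    in_promisor_pack: set = field(default_factory=set)

    def promisor_objects(self) -> set:
        """Objects in promisor packfiles plus everything reachable from them
        through locally known references (the doc's recursive definition)."""
        promised = set()
        todo = list(self.in_promisor_pack)
        while todo:
            oid = todo.pop()
            if oid in promised:
                continue
            promised.add(oid)
            # Missing objects have no known references, so .get() yields ().
            todo.extend(self.objects.get(oid, ()))
        return promised

    def classify(self, oid: str) -> str:
        """Decide how a lookup of `oid` would be treated in this toy model."""
        if oid in self.objects:
            return "present"
        if oid in self.promisor_objects():
            return "fetch-from-promisor"  # expected gap left by the filter
        return "corruption"  # nobody promised this object


repo = ToyRepo(
    objects={
        "commit1": ["tree1"],
        "tree1": ["blob1", "blob2"],  # blob2 was filtered out by the server
        "blob1": [],
        "local_commit": ["tree_x"],   # created locally, tree_x is lost
    },
    in_promisor_pack={"commit1", "tree1", "blob1"},
)
```

Here `blob2` is missing but referenced from a tree in a promisor packfile, so the model treats it as fetchable, while `tree_x` is referenced only by a locally created object and is reported as corruption. No separate list of missing objects is kept; the answer is inferred from which packfile the referencing object lives in.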