git-commit-vandalism/Documentation/git-http-backend.txt

302 lines
11 KiB
Plaintext
Raw Normal View History

git-http-backend(1)
===================
NAME
----
git-http-backend - Server side implementation of Git over HTTP
SYNOPSIS
--------
[verse]
'git http-backend'
DESCRIPTION
-----------
A simple CGI program to serve the contents of a Git repository to Git
clients accessing the repository over http:// and https:// protocols.
The program supports clients fetching using both the smart HTTP protocol
and the backwards-compatible dumb HTTP protocol, as well as clients
pushing using the smart HTTP protocol. It also supports Git's
more-efficient "v2" protocol if properly configured; see the
discussion of `GIT_PROTOCOL` in the ENVIRONMENT section below.
It verifies that the directory has the magic file
"git-daemon-export-ok", and it will refuse to export any Git directory
that hasn't explicitly been marked for export this way (unless the
`GIT_HTTP_EXPORT_ALL` environmental variable is set).
By default, only the `upload-pack` service is enabled, which serves
'git fetch-pack' and 'git ls-remote' clients, which are invoked from
'git fetch', 'git pull', and 'git clone'. If the client is authenticated,
the `receive-pack` service is enabled, which serves 'git send-pack'
clients, which is invoked from 'git push'.
SERVICES
--------
These services can be enabled/disabled using the per-repository
configuration file:
http.getanyfile::
This serves Git clients older than version 1.6.6 that are unable to use the
upload pack service. When enabled, clients are able to read
any file within the repository, including objects that are
no longer reachable from a branch but are still present.
It is enabled by default, but a repository can disable it
by setting this configuration item to `false`.
http.uploadpack::
This serves 'git fetch-pack' and 'git ls-remote' clients.
It is enabled by default, but a repository can disable it
by setting this configuration item to `false`.
http.receivepack::
This serves 'git send-pack' clients, allowing push. It is
disabled by default for anonymous users, and enabled by
default for users authenticated by the web server. It can be
disabled by setting this item to `false`, or enabled for all
users, including anonymous users, by setting it to `true`.
URL TRANSLATION
---------------
To determine the location of the repository on disk, 'git http-backend'
concatenates the environment variables PATH_INFO, which is set
automatically by the web server, and GIT_PROJECT_ROOT, which must be set
manually in the web server configuration. If GIT_PROJECT_ROOT is not
set, 'git http-backend' reads PATH_TRANSLATED, which is also set
automatically by the web server.
EXAMPLES
--------
All of the following examples map `http://$hostname/git/foo/bar.git`
to `/var/www/git/foo/bar.git`.
Apache 2.x::
Ensure mod_cgi, mod_alias, and mod_env are enabled, set
GIT_PROJECT_ROOT (or DocumentRoot) appropriately, and
create a ScriptAlias to the CGI:
+
----------------------------------------------------------------
SetEnv GIT_PROJECT_ROOT /var/www/git
SetEnv GIT_HTTP_EXPORT_ALL
ScriptAlias /git/ /usr/libexec/git-core/git-http-backend/
# This is not strictly necessary using Apache and a modern version of
# git-http-backend, as the webserver will pass along the header in the
# environment as HTTP_GIT_PROTOCOL, and http-backend will copy that into
# GIT_PROTOCOL. But you may need this line (or something similar if you
# are using a different webserver), or if you want to support older Git
# versions that did not do that copying.
#
# Having the webserver set up GIT_PROTOCOL is perfectly fine even with
# modern versions (and will take precedence over HTTP_GIT_PROTOCOL,
# which means it can be used to override the client's request).
SetEnvIf Git-Protocol ".*" GIT_PROTOCOL=$0
----------------------------------------------------------------
+
To enable anonymous read access but authenticated write access,
require authorization for both the initial ref advertisement (which we
detect as a push via the service parameter in the query string), and the
receive-pack invocation itself:
+
----------------------------------------------------------------
RewriteCond %{QUERY_STRING} service=git-receive-pack [OR]
RewriteCond %{REQUEST_URI} /git-receive-pack$
RewriteRule ^/git/ - [E=AUTHREQUIRED:yes]
<LocationMatch "^/git/">
Order Deny,Allow
Deny from env=AUTHREQUIRED
AuthType Basic
AuthName "Git Access"
Require group committers
Satisfy Any
...
</LocationMatch>
----------------------------------------------------------------
+
If you do not have `mod_rewrite` available to match against the query
string, it is sufficient to just protect `git-receive-pack` itself,
like:
+
----------------------------------------------------------------
<LocationMatch "^/git/.*/git-receive-pack$">
AuthType Basic
AuthName "Git Access"
Require group committers
...
</LocationMatch>
----------------------------------------------------------------
+
doc/http-backend: clarify "half-auth" repo configuration When the http-backend is set up to allow anonymous read but authenticated write, the http-backend manual suggests catching only the "/git-receive-pack" POST of the packfile, not the initial "info/refs?service=git-receive-pack" GET in which we advertise refs. This does work and is secure, as we do not allow any write during the info/refs request, and the information in the ref advertisement is the same that you would get from a fetch. However, the configuration required by the server is slightly more complex. The default `http.receivepack` setting is to allow pushes if the webserver tells us that the user authenticated, and otherwise to return a 403 ("Forbidden"). That works fine if authentication is turned on completely; the initial request requires authentication, and http-backend realizes it is OK to do a push. But for this "half-auth" state, no authentication has occurred during the initial ref advertisement. The http-backend CGI therefore does not think that pushing should be enabled, and responds with a 403. The client cannot continue, even though the server would have allowed it to run if it had provided credentials. It would be much better if the server responded with a 401, asking for credentials during the initial contact. But git-http-backend does not know about the server's auth configuration (so a 401 would be confusing in the case of a true anonymous server). Unfortunately, configuring Apache to recognize the query string and apply the auth appropriately to receive-pack (but not upload-pack) initial requests is non-trivial. The site admin can work around this by just turning on http.receivepack explicitly in its repositories. Let's document this workaround. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-04-11 05:32:11 +02:00
In this mode, the server will not request authentication until the
client actually starts the object negotiation phase of the push, rather
than during the initial contact. For this reason, you must also enable
the `http.receivepack` config option in any repositories that should
accept a push. The default behavior, if `http.receivepack` is not set,
is to reject any pushes by unauthenticated users; the initial request
will therefore report `403 Forbidden` to the client, without even giving
an opportunity for authentication.
+
To require authentication for both reads and writes, use a Location
directive around the repository, or one of its parent directories:
+
----------------------------------------------------------------
<Location /git/private>
AuthType Basic
AuthName "Private Git Access"
Require group committers
...
</Location>
----------------------------------------------------------------
+
To serve gitweb at the same url, use a ScriptAliasMatch to only
those URLs that 'git http-backend' can handle, and forward the
rest to gitweb:
+
----------------------------------------------------------------
ScriptAliasMatch \
"(?x)^/git/(.*/(HEAD | \
info/refs | \
objects/(info/[^/]+ | \
[0-9a-f]{2}/[0-9a-f]{38} | \
pack/pack-[0-9a-f]{40}\.(pack|idx)) | \
git-(upload|receive)-pack))$" \
/usr/libexec/git-core/git-http-backend/$1
ScriptAlias /git/ /var/www/cgi-bin/gitweb.cgi/
----------------------------------------------------------------
+
To serve multiple repositories from different linkgit:gitnamespaces[7] in a
single repository:
+
----------------------------------------------------------------
SetEnvIf Request_URI "^/git/([^/]*)" GIT_NAMESPACE=$1
ScriptAliasMatch ^/git/[^/]*(.*) /usr/libexec/git-core/git-http-backend/storage.git$1
----------------------------------------------------------------
Accelerated static Apache 2.x::
Similar to the above, but Apache can be used to return static
files that are stored on disk. On many systems this may
be more efficient as Apache can ask the kernel to copy the
file contents from the file system directly to the network:
+
----------------------------------------------------------------
SetEnv GIT_PROJECT_ROOT /var/www/git
AliasMatch ^/git/(.*/objects/[0-9a-f]{2}/[0-9a-f]{38})$ /var/www/git/$1
AliasMatch ^/git/(.*/objects/pack/pack-[0-9a-f]{40}.(pack|idx))$ /var/www/git/$1
ScriptAlias /git/ /usr/libexec/git-core/git-http-backend/
----------------------------------------------------------------
+
This can be combined with the gitweb configuration:
+
----------------------------------------------------------------
SetEnv GIT_PROJECT_ROOT /var/www/git
AliasMatch ^/git/(.*/objects/[0-9a-f]{2}/[0-9a-f]{38})$ /var/www/git/$1
AliasMatch ^/git/(.*/objects/pack/pack-[0-9a-f]{40}.(pack|idx))$ /var/www/git/$1
ScriptAliasMatch \
"(?x)^/git/(.*/(HEAD | \
info/refs | \
objects/info/[^/]+ | \
git-(upload|receive)-pack))$" \
/usr/libexec/git-core/git-http-backend/$1
ScriptAlias /git/ /var/www/cgi-bin/gitweb.cgi/
----------------------------------------------------------------
Lighttpd::
Ensure that `mod_cgi`, `mod_alias`, `mod_auth`, `mod_setenv` are
loaded, then set `GIT_PROJECT_ROOT` appropriately and redirect
all requests to the CGI:
+
----------------------------------------------------------------
alias.url += ( "/git" => "/usr/lib/git-core/git-http-backend" )
$HTTP["url"] =~ "^/git" {
cgi.assign = ("" => "")
setenv.add-environment = (
"GIT_PROJECT_ROOT" => "/var/www/git",
"GIT_HTTP_EXPORT_ALL" => ""
)
}
----------------------------------------------------------------
+
To enable anonymous read access but authenticated write access:
+
----------------------------------------------------------------
$HTTP["querystring"] =~ "service=git-receive-pack" {
include "git-auth.conf"
}
$HTTP["url"] =~ "^/git/.*/git-receive-pack$" {
include "git-auth.conf"
}
----------------------------------------------------------------
+
where `git-auth.conf` looks something like:
+
----------------------------------------------------------------
auth.require = (
"/" => (
"method" => "basic",
"realm" => "Git Access",
"require" => "valid-user"
)
)
# ...and set up auth.backend here
----------------------------------------------------------------
+
To require authentication for both reads and writes:
+
----------------------------------------------------------------
$HTTP["url"] =~ "^/git/private" {
include "git-auth.conf"
}
----------------------------------------------------------------
ENVIRONMENT
-----------
'git http-backend' relies upon the `CGI` environment variables set
by the invoking web server, including:
* PATH_INFO (if GIT_PROJECT_ROOT is set, otherwise PATH_TRANSLATED)
* REMOTE_USER
* REMOTE_ADDR
* CONTENT_TYPE
* QUERY_STRING
* REQUEST_METHOD
The `GIT_HTTP_EXPORT_ALL` environmental variable may be passed to
'git-http-backend' to bypass the check for the "git-daemon-export-ok"
file in each repository before allowing export of that repository.
http-backend: spool ref negotiation requests to buffer When http-backend spawns "upload-pack" to do ref negotiation, it streams the http request body to upload-pack, who then streams the http response back to the client as it reads. In theory, git can go full-duplex; the client can consume our response while it is still sending the request. In practice, however, HTTP is a half-duplex protocol. Even if our client is ready to read and write simultaneously, we may have other HTTP infrastructure in the way, including the webserver that spawns our CGI, or any intermediate proxies. In at least one documented case[1], this leads to deadlock when trying a fetch over http. What happens is basically: 1. Apache proxies the request to the CGI, http-backend. 2. http-backend gzip-inflates the data and sends the result to upload-pack. 3. upload-pack acts on the data and generates output over the pipe back to Apache. Apache isn't reading because it's busy writing (step 1). This works fine most of the time, because the upload-pack output ends up in a system pipe buffer, and Apache reads it as soon as it finishes writing. But if both the request and the response exceed the system pipe buffer size, then we deadlock (Apache blocks writing to http-backend, http-backend blocks writing to upload-pack, and upload-pack blocks writing to Apache). We need to break the deadlock by spooling either the input or the output. In this case, it's ideal to spool the input, because Apache does not start reading either stdout _or_ stderr until we have consumed all of the input. So until we do so, we cannot even get an error message out to the client. The solution is fairly straight-forward: we read the request body into an in-memory buffer in http-backend, freeing up Apache, and then feed the data ourselves to upload-pack. But there are a few important things to note: 1. We limit the in-memory buffer to prevent an obvious denial-of-service attack. This is a new hard limit on requests, but it's unlikely to come into play. The default value is 10MB, which covers even the ridiculous 100,000-ref negotation in the included test (that actually caps out just over 5MB). But it's configurable on the off chance that you don't mind spending some extra memory to make even ridiculous requests work. 2. We must take care only to buffer when we have to. For pushes, the incoming packfile may be of arbitrary size, and we should connect the input directly to receive-pack. There's no deadlock problem here, though, because we do not produce any output until the whole packfile has been read. For upload-pack's initial ref advertisement, we similarly do not need to buffer. Even though we may generate a lot of output, there is no request body at all (i.e., it is a GET, not a POST). [1] http://article.gmane.org/gmane.comp.version-control.git/269020 Test-adapted-from: Dennis Kaarsemaker <dennis@kaarsemaker.net> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-05-20 09:37:09 +02:00
The `GIT_HTTP_MAX_REQUEST_BUFFER` environment variable (or the
`http.maxRequestBuffer` config variable) may be set to change the
largest ref negotiation request that git will handle during a fetch; any
fetch requiring a larger buffer will not succeed. This value should not
normally need to be changed, but may be helpful if you are fetching from
a repository with an extremely large number of refs. The value can be
specified with a unit (e.g., `100M` for 100 megabytes). The default is
10 megabytes.
Clients may probe for optional protocol capabilities (like the v2
protocol) using the `Git-Protocol` HTTP header. In order to support
these, the contents of that header must appear in the `GIT_PROTOCOL`
environment variable. Most webservers will pass this header to the CGI
via the `HTTP_GIT_PROTOCOL` variable, and `git-http-backend` will
automatically copy that to `GIT_PROTOCOL`. However, some webservers may
be more selective about which headers they'll pass, in which case they
need to be configured explicitly (see the mention of `Git-Protocol` in
the Apache config from the earlier EXAMPLES section).
The backend process sets GIT_COMMITTER_NAME to '$REMOTE_USER' and
GIT_COMMITTER_EMAIL to '$\{REMOTE_USER}@http.$\{REMOTE_ADDR\}',
ensuring that any reflogs created by 'git-receive-pack' contain some
identifying information of the remote user who performed the push.
All `CGI` environment variables are available to each of the hooks
invoked by the 'git-receive-pack'.
GIT
---
Part of the linkgit:git[1] suite