Commit Graph

14 Commits

Author SHA1 Message Date
Jeff King
e4c497a194 urlmatch: add underscore to URL_HOST_CHARS
When parsing a URL to normalize it, we allow hostnames to contain only
dot (".") or dash ("-"), plus brackets and colons for IPv6 literals.
This matches the old URL standard in RFC 1738, which says:

  host           = hostname | hostnumber
  hostname       = *[ domainlabel "." ] toplabel
  domainlabel    = alphadigit | alphadigit *[ alphadigit | "-" ] alphadigit

But this was later updated by RFC 3986, which is more liberal:

  host        = IP-literal / IPv4address / reg-name
  reg-name    = *( unreserved / pct-encoded / sub-delims )
  unreserved  = ALPHA / DIGIT / "-" / "." / "_" / "~"

While names with underscore in them are not common and possibly violate
some DNS rules, they do work in practice, and we will happily contact
them over http://, git://, or ssh://. It seems odd to ignore them for
purposes of URL matching, especially when the URL RFC seems to allow
them.

There shouldn't be any downside here. It's not a syntactically
significant character in a URL, so we won't be confused about parsing;
we'd have simply rejected such a URL previously (the test here checks
the url code directly, but the obvious user-visible effect would be
failing to match credential.http://foo_bar.example.com.helper, or
similar config in http.<url>.*).

Arguably we'd want to allow tilde ("~") here, too. There's likewise
probably no downside, but I didn't add it simply because it seems like
an even less likely character to appear in a hostname.

Reported-by: Alex Waite <alex@waite.eu>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-12 18:29:25 -07:00
Johannes Schindelin
12294990c9 credential: handle credential.<partial-URL>.<key> again
In the patches for CVE-2020-11008, the ability to specify credential
settings in the config for partial URLs got lost. For example, it used
to be possible to specify a credential helper for a specific protocol:

	[credential "https://"]
		helper = my-https-helper

Likewise, it used to be possible to configure settings for a specific
host, e.g.:

	[credential "dev.azure.com"]
		useHTTPPath = true

Let's reinstate this behavior.

While at it, increase the test coverage to document and verify the
behavior with a couple other categories of partial URLs.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-04-24 15:53:46 -07:00
brian m. carlson
46fd7b3900 credential: allow wildcard patterns when matching config
In some cases, a user will want to use a specific credential helper for
a wildcard pattern, such as https://*.corp.example.com.  We have code
that handles this already with the urlmatch code, so let's use that
instead of our custom code.

Since the urlmatch code is a superset of our current matching in terms
of capabilities, there shouldn't be any cases of things that matched
previously that don't match now.  However, in addition to wildcard
matching, we now use partial path matching, which can cause slightly
different behavior in the case that a helper applies to the prefix
(considering path components) of the remote URL.  While different, this
is probably the behavior people were wanting anyway.

Since we're using the urlmatch code, we need to encode the components
we've gotten into a URL to match, so add a function to percent-encode
data and format the URL with it.  We now also no longer need to the
custom code to match URLs, so let's remove it.

Additionally, the urlmatch code always looks for the best match, whereas
we want all matches for credential helpers to preserve existing
behavior.  Let's add an optional field, select_fn, that lets us control
which items we want (in this case, all of them) and default it to the
best-match code that already exists for other users.

Signed-off-by: brian m. carlson <bk2204@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-02-20 13:05:43 -08:00
René Scharfe
5053313562 urlmatch: use hex2chr() in append_normalized_escapes()
Simplify the code by using hex2chr() to convert and check for invalid
characters at the same time instead of doing that sequentially with
one table lookup for each.

Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-07-09 09:43:01 -07:00
Patrick Steinhardt
a272b9e70a urlmatch: allow globbing for the URL host part
The URL matching function computes for two URLs whether they match not.
The match is performed by splitting up the URL into different parts and
then doing an exact comparison with the to-be-matched URL.

The main user of `urlmatch` is the configuration subsystem. It allows to
set certain configurations based on the URL which is being connected to
via keys like `http.<url>.*`. A common use case for this is to set
proxies for only some remotes which match the given URL. Unfortunately,
having exact matches for all parts of the URL can become quite tedious
in some setups. Imagine for example a corporate network where there are
dozens or even hundreds of subdomains, which would have to be configured
individually.

Allow users to write an asterisk '*' in place of any 'host' or
'subdomain' label as part of the host name.  For example,
"http.https://*.example.com.proxy" sets "http.proxy" for all direct
subdomains of "https://example.com", e.g. "https://foo.example.com", but
not "https://foo.bar.example.com".

Signed-off-by: Patrick Steinhardt <patrick.steinhardt@elego.de>
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-02-01 13:22:50 -08:00
Patrick Steinhardt
af99049ca9 urlmatch: include host in urlmatch ranking
In order to be able to rank positive matches by `urlmatch`, we inspect
the path length and user part to decide whether a match is better than
another match. As all other parts are matched exactly between both URLs,
this is the correct thing to do right now.

In the future, though, we want to introduce wild cards for the domain
part. When doing this, it does not make sense anymore to only compare
the path lengths. Instead, we also want to compare the domain lengths to
determine which of both URLs matches the host part more closely.

Signed-off-by: Patrick Steinhardt <patrick.steinhardt@elego.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-02-01 13:22:46 -08:00
Patrick Steinhardt
3ec6e6e8a0 urlmatch: split host and port fields in struct url_info
The `url_info` structure contains information about a normalized URL
with the URL's components being represented by different fields. The
host and port part though are to be accessed by the same `host` field,
so that getting the host and/or port separately becomes more involved
than really necessary.

To make the port more readily accessible, split up the host and port
fields. Namely, the `host_len` will not include the port length anymore
and a new `port_off` field has been added which includes the offset to
the port, if available.

The only user of these fields is `url_normalize_1`. This change makes it
easier later on to treat host and port differently when introducing
globs for domains.

Signed-off-by: Patrick Steinhardt <patrick.steinhardt@elego.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-01-31 10:06:54 -08:00
Patrick Steinhardt
3e6a0e64a4 urlmatch: enable normalization of URLs with globs
The `url_normalize` function is used to validate and normalize URLs. As
such, it does not allow for some special characters to be part of the
URLs that are to be normalized. As we want to allow using globs in some
configuration keys making use of URLs, namely `http.<url>.<key>`, but
still normalize them, we need to somehow enable some additional allowed
characters.

To do this without having to change all callers of `url_normalize`,
where most do not actually want globbing at all, we split off another
function `url_normalize_1`. This function accepts an additional
parameter `allow_globs`, which is subsequently called by `url_normalize`
with `allow_globs=0`.

As of now, this function is not used with globbing enabled. A caller
will be added in the following commit.

Signed-off-by: Patrick Steinhardt <patrick.steinhardt@elego.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-01-31 10:06:54 -08:00
Junio C Hamano
667f7eb2ea urlmatch.c: make match_urls() static
No external callers exist.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-01-15 11:05:48 -08:00
Jeff King
50a71776ab isxdigit: cast input to unsigned char
Otherwise, callers must do so or risk triggering warnings
-Wchar-subscript (and rightfully so; a signed char might
cause us to use a bogus negative index into the
hexval_table).

While we are dropping the now-unnecessary casts from the
caller in urlmatch.c, we can get rid of similar casts in
actually parsing the hex by using the hexval() helper, which
implicitly casts to unsigned (but note that we cannot
implement isxdigit in terms of hexval(), as it also casts
its return value to unsigned).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-10-16 10:10:36 -07:00
Jeff King
cf4fff579e refactor skip_prefix to return a boolean
The skip_prefix() function returns a pointer to the content
past the prefix, or NULL if the prefix was not found. While
this is nice and simple, in practice it makes it hard to use
for two reasons:

  1. When you want to conditionally skip or keep the string
     as-is, you have to introduce a temporary variable.
     For example:

       tmp = skip_prefix(buf, "foo");
       if (tmp)
	       buf = tmp;

  2. It is verbose to check the outcome in a conditional, as
     you need extra parentheses to silence compiler
     warnings. For example:

       if ((cp = skip_prefix(buf, "foo"))
	       /* do something with cp */

Both of these make it harder to use for long if-chains, and
we tend to use starts_with() instead. However, the first line
of "do something" is often to then skip forward in buf past
the prefix, either using a magic constant or with an extra
strlen(3) (which is generally computed at compile time, but
means we are repeating ourselves).

This patch refactors skip_prefix() to return a simple boolean,
and to provide the pointer value as an out-parameter. If the
prefix is not found, the out-parameter is untouched. This
lets you write:

  if (skip_prefix(arg, "foo ", &arg))
	  do_foo(arg);
  else if (skip_prefix(arg, "bar ", &arg))
	  do_bar(arg);

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-06-20 10:44:43 -07:00
Thomas Rast
a7f0a0efa5 urlmatch.c: recompute pointer after append_normalized_escapes
When append_normalized_escapes is called, its internal strbuf_add* calls can
cause the strbuf's buf to be reallocated changing the value of the buf pointer.

Do not use the strbuf buf pointer from before any append_normalized_escapes
calls afterwards.  Instead recompute the needed pointer.

Signed-off-by: Thomas Rast <trast@inf.ethz.ch>
Signed-off-by: Kyle J. McKay <mackyle@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-09-12 15:27:01 -07:00
Junio C Hamano
836b6fb5a5 config: add generic callback wrapper to parse section.<url>.key
Existing configuration parsing functions (e.g. http_options() in
http.c) know how to parse two-level configuration variable names.
We would like to exploit them and parse something like this:

	[http]
		sslVerify = true
	[http "https://weak.example.com"]
		sslVerify = false

and pretend as if http.sslVerify were set to false when talking to
"https://weak.example.com/path".

Introduce `urlmatch_config_entry()` wrapper that:

 - is called with the target URL (e.g. "https://weak.example.com/path"),
   and the two-level variable parser (e.g. `http_options`);

 - uses `url_normalize()` and `match_urls()` to see if configuration
   data matches the target URL; and

 - calls the traditional two-level configuration variable parser
   only for the configuration data whose <url> part matches the
   target URL (and if there are multiple matches, only do so if the
   current match is a better match than the ones previously seen).

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-07-31 14:58:42 -07:00
Kyle J. McKay
3402a8dc48 config: add helper to normalize and match URLs
Some http.* configuration variables need to take values customized
for the URL we are talking to.  We may want to set http.sslVerify to
true in general but to false only for a certain site, for example,
with a configuration file like this:

	[http]
		sslVerify = true
	[http "https://weak.example.com"]
		sslVerify = false

and let the configuration machinery pick up the latter only when
talking to "https://weak.example.com".  The latter needs to kick in
not only when the URL is exactly "https://weak.example.com", but
also is anything that "match" it, e.g.

	https://weak.example.com/test
	https://me@weak.example.com/test

The <url> in the configuration key consists of the following parts,
and is considered a match to the URL we are attempting to access
under certain conditions:

  . Scheme (e.g., `https` in `https://example.com/`). This field
    must match exactly between the config key and the URL.

  . Host/domain name (e.g., `example.com` in `https://example.com/`).
    This field must match exactly between the config key and the URL.

  . Port number (e.g., `8080` in `http://example.com:8080/`).  This
    field must match exactly between the config key and the URL.
    Omitted port numbers are automatically converted to the correct
    default for the scheme before matching.

  . Path (e.g., `repo.git` in `https://example.com/repo.git`). The
    path field of the config key must match the path field of the
    URL either exactly or as a prefix of slash-delimited path
    elements.  A config key with path `foo/` matches URL path
    `foo/bar`.  A prefix can only match on a slash (`/`) boundary.
    Longer matches take precedence (so a config key with path
    `foo/bar` is a better match to URL path `foo/bar` than a config
    key with just path `foo/`).

  . User name (e.g., `me` in `https://me@example.com/repo.git`). If
    the config key has a user name, it must match the user name in
    the URL exactly. If the config key does not have a user name,
    that config key will match a URL with any user name (including
    none), but at a lower precedence than a config key with a user
    name.

Longer matches take precedence over shorter matches.

This step adds two helper functions `url_normalize()` and
`match_urls()` to help implement the above semantics. The
normalization rules are based on RFC 3986 and should result in any
two equivalent urls being a match.

Signed-off-by: Kyle J. McKay <mackyle@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-07-31 14:57:57 -07:00