From 1f2e05f0b794d9e4b1cf07d63c9efd1325893ecc Mon Sep 17 00:00:00 2001 From: Phillip Wood Date: Mon, 20 Mar 2023 16:10:00 +0000 Subject: [PATCH 1/4] wildmatch: fix exponential behavior MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit When dowild() cannot match a '*' or '/**/' wildcard then it must return WM_ABORT_TO_STARSTAR or WM_ABORT_ALL respectively. Failure to observe this results in unnecessary backtracking and the time taken for a failed match increases exponentially with the number of wildcards in the pattern [1]. Unfortunately in some instances dowild() returns WM_NOMATCH for a failed match resulting in long match times for patterns containing multiple wildcards as can be seen in the following benchmark. (Note that the timings in the Benchmark 1 are really measuring the time to execute test-tool rather than the time to match the pattern) Benchmark 1: t/helper/test-tool wildmatch wildmatch aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaab "*a" Time (mean ± σ): 22.8 ms ± 1.7 ms [User: 12.1 ms, System: 10.6 ms] Range (min … max): 19.4 ms … 26.9 ms 113 runs Warning: Ignoring non-zero exit code. Benchmark 2: t/helper/test-tool wildmatch wildmatch aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaab "*a*a*a*a*a*a*a*a*a" Time (mean ± σ): 5.244 s ± 0.228 s [User: 5.229 s, System: 0.010 s] Range (min … max): 4.969 s … 5.707 s 10 runs Warning: Ignoring non-zero exit code. Summary 't/helper/test-tool wildmatch wildmatch aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaab "*a"' ran 230.37 ± 20.04 times faster than 't/helper/test-tool wildmatch wildmatch aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaab "*a*a*a*a*a*a*a*a*a"' The security implications are limited as it only affects operations that are potentially DoS vectors. For example by creating a blob containing such a pattern a malicious user can exploit this behavior to use large amounts of CPU time on a remote server by pushing the blob and then creating a new clone with --filter=sparse:oid. However this filter type is usually disabled as it is known to consume large amounts of CPU time even without this bug. The WM_MATCH changed in the first hunk of this patch comes from the original implementation imported from rsync in 5230f605e1 (Import wildmatch from rsync, 2012-10-15). Compared to the others converted here it is fairly harmless as it only triggers at the end of the pattern and so will only cause a single unnecessary backtrack. The others introduced by 6f1a31f0aa (wildmatch: advance faster in + patterns, 2013-01-01) and 46983441ae (wildmatch: make a special case for "*/" with FNM_PATHNAME, 2013-01-01) are more pernicious and will cause exponential behavior. A new test is added to protect against future regressions. [1] https://research.swtch.com/glob Helped-by: Derrick Stolee Signed-off-by: Phillip Wood Signed-off-by: Junio C Hamano --- t/t3070-wildmatch.sh | 9 +++++++++ wildmatch.c | 12 ++++++++---- 2 files changed, 17 insertions(+), 4 deletions(-) diff --git a/t/t3070-wildmatch.sh b/t/t3070-wildmatch.sh index 5d871fde96..b91a7cb712 100755 --- a/t/t3070-wildmatch.sh +++ b/t/t3070-wildmatch.sh @@ -431,4 +431,13 @@ match 1 1 1 1 'a' '[B-a]' match 0 1 0 1 'z' '[Z-y]' match 1 1 1 1 'Z' '[Z-y]' +test_expect_success 'matching does not exhibit exponential behavior' ' + test-tool wildmatch wildmatch \ + aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaab \ + "*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a" & + pid=$! && + sleep 2 && + ! kill $! +' + test_done diff --git a/wildmatch.c b/wildmatch.c index 7e5a7ea1ea..06861bd8bc 100644 --- a/wildmatch.c +++ b/wildmatch.c @@ -114,7 +114,7 @@ static int dowild(const uchar *p, const uchar *text, unsigned int flags) * only if there are no more slash characters. */ if (!match_slash) { if (strchr((char *)text, '/')) - return WM_NOMATCH; + return WM_ABORT_TO_STARSTAR; } return WM_MATCH; } else if (!match_slash && *p == '/') { @@ -125,7 +125,7 @@ static int dowild(const uchar *p, const uchar *text, unsigned int flags) */ const char *slash = strchr((char*)text, '/'); if (!slash) - return WM_NOMATCH; + return WM_ABORT_ALL; text = (const uchar*)slash; /* the slash is consumed by the top-level for loop */ break; @@ -153,8 +153,12 @@ static int dowild(const uchar *p, const uchar *text, unsigned int flags) break; text++; } - if (t_ch != p_ch) - return WM_NOMATCH; + if (t_ch != p_ch) { + if (match_slash) + return WM_ABORT_ALL; + else + return WM_ABORT_TO_STARSTAR; + } } if ((matched = dowild(p, text, flags)) != WM_NOMATCH) { if (!match_slash || matched != WM_ABORT_TO_STARSTAR) From 81b26f8f2891f1a63d5dbf7c2d4209b8325062b6 Mon Sep 17 00:00:00 2001 From: Phillip Wood Date: Mon, 20 Mar 2023 16:10:01 +0000 Subject: [PATCH 2/4] wildmatch: avoid undefined behavior The code changed in this commit is designed to check if the pattern starts with "**/" or contains "/**/" (see 3a078dec33 (wildmatch: fix "**" special case, 2013-01-01)). Unfortunately when the pattern begins with "**/" `prev_p = p - 2` is evaluated when `p` points to the second "*" and so the subtraction is undefined according to section 6.5.6 of the C standard because the result does not point within the same object as `p`. Fix this by avoiding the subtraction unless it is well defined. Signed-off-by: Phillip Wood Signed-off-by: Junio C Hamano --- wildmatch.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/wildmatch.c b/wildmatch.c index 06861bd8bc..694d2f8e40 100644 --- a/wildmatch.c +++ b/wildmatch.c @@ -83,12 +83,12 @@ static int dowild(const uchar *p, const uchar *text, unsigned int flags) continue; case '*': if (*++p == '*') { - const uchar *prev_p = p - 2; + const uchar *prev_p = p; while (*++p == '*') {} if (!(flags & WM_PATHNAME)) /* without WM_PATHNAME, '*' == '**' */ match_slash = 1; - else if ((prev_p < pattern || *prev_p == '/') && + else if ((prev_p - pattern < 2 || *(prev_p - 2) == '/') && (*p == '\0' || *p == '/' || (p[0] == '\\' && p[1] == '/'))) { /* From 91b81b64e332da185d3ac8679a977c665c80914e Mon Sep 17 00:00:00 2001 From: Phillip Wood Date: Mon, 20 Mar 2023 16:10:02 +0000 Subject: [PATCH 3/4] wildmatch: hide internal return values WM_ABORT_ALL and WM_ABORT_TO_STARSTAR are used internally to limit backtracking when a match fails, they are not of interest to the caller and so should not be public. Suggested-by: Derrick Stolee Signed-off-by: Phillip Wood Signed-off-by: Junio C Hamano --- wildmatch.c | 7 ++++++- wildmatch.h | 2 -- 2 files changed, 6 insertions(+), 3 deletions(-) diff --git a/wildmatch.c b/wildmatch.c index 694d2f8e40..372aa6ea54 100644 --- a/wildmatch.c +++ b/wildmatch.c @@ -14,6 +14,10 @@ typedef unsigned char uchar; +/* Internal return values */ +#define WM_ABORT_ALL -1 +#define WM_ABORT_TO_STARSTAR -2 + /* What character marks an inverted character class? */ #define NEGATE_CLASS '!' #define NEGATE_CLASS2 '^' @@ -278,5 +282,6 @@ static int dowild(const uchar *p, const uchar *text, unsigned int flags) /* Match the "pattern" against the "text" string. */ int wildmatch(const char *pattern, const char *text, unsigned int flags) { - return dowild((const uchar*)pattern, (const uchar*)text, flags); + int res = dowild((const uchar*)pattern, (const uchar*)text, flags); + return res == WM_MATCH ? WM_MATCH : WM_NOMATCH; } diff --git a/wildmatch.h b/wildmatch.h index 5993696298..0c890cb56b 100644 --- a/wildmatch.h +++ b/wildmatch.h @@ -6,8 +6,6 @@ #define WM_NOMATCH 1 #define WM_MATCH 0 -#define WM_ABORT_ALL -1 -#define WM_ABORT_TO_STARSTAR -2 int wildmatch(const char *pattern, const char *text, unsigned int flags); #endif From 3dc0b7f0dcd10e9b10018e43d248d245b78bf9be Mon Sep 17 00:00:00 2001 From: Michael J Gruber Date: Fri, 24 Mar 2023 23:17:11 +0100 Subject: [PATCH 4/4] t3070: make chain lint tester happy 1f2e05f0b7 ("wildmatch: fix exponential behavior", 2023-03-20) introduced a new test with a background process. Backgrounding necessarily gives a result of 0, so that a seemingly broken && chain is not really broken. Adjust t3070 slightly so that our chain lint test recognizes the construct for what it is and does not raise a false positive. Signed-off-by: Michael J Gruber Signed-off-by: Junio C Hamano --- t/t3070-wildmatch.sh | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/t/t3070-wildmatch.sh b/t/t3070-wildmatch.sh index b91a7cb712..4dd42df38c 100755 --- a/t/t3070-wildmatch.sh +++ b/t/t3070-wildmatch.sh @@ -432,10 +432,12 @@ match 0 1 0 1 'z' '[Z-y]' match 1 1 1 1 'Z' '[Z-y]' test_expect_success 'matching does not exhibit exponential behavior' ' - test-tool wildmatch wildmatch \ - aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaab \ - "*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a" & - pid=$! && + { + test-tool wildmatch wildmatch \ + aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaab \ + "*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a" & + pid=$! + } && sleep 2 && ! kill $! '