git-commit-vandalism/refs.c

/*
* The backend-independent part of the reference module.
*/
#include "cache.h"
#include "alloc.h"
#include "config.h"
#include "environment.h"
#include "hashmap.h"
#include "gettext.h"
#include "hex.h"
#include "lockfile.h"
#include "iterator.h"
#include "refs.h"
#include "refs/refs-internal.h"
#include "run-command.h"
#include "hook.h"
#include "object-store.h"
#include "object.h"
#include "tag.h"
#include "submodule.h"
#include "worktree.h"
#include "strvec.h"
#include "repository.h"
#include "setup.h"
#include "sigchain.h"
#include "date.h"
#include "commit.h"
#include "wrapper.h"
/*
* List of all available backends
*/
static struct ref_storage_be *refs_backends = &refs_be_files;
static struct ref_storage_be *find_ref_storage_backend(const char *name)
{
struct ref_storage_be *be;
for (be = refs_backends; be; be = be->next)
if (!strcmp(be->name, name))
return be;
return NULL;
}
/*
* How to handle various characters in refnames:
* 0: An acceptable character for refs
* 1: End-of-component
* 2: ., look for a preceding . to reject .. in refs
* 3: {, look for a preceding @ to reject @{ in refs
* 4: A bad character: ASCII control characters, and
* ":", "?", "[", "\", "^", "~", SP, or TAB
* 5: *, reject unless REFNAME_REFSPEC_PATTERN is set
*/
static unsigned char refname_disposition[256] = {
1, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4,
4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4,
4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 5, 0, 0, 0, 2, 1,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 0, 0, 0, 0, 4,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 4, 0, 4, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 0, 0, 4, 4
};
struct ref_namespace_info ref_namespace[] = {
[NAMESPACE_HEAD] = {
.ref = "HEAD",
.decoration = DECORATION_REF_HEAD,
.exact = 1,
},
[NAMESPACE_BRANCHES] = {
.ref = "refs/heads/",
.decoration = DECORATION_REF_LOCAL,
},
[NAMESPACE_TAGS] = {
.ref = "refs/tags/",
.decoration = DECORATION_REF_TAG,
},
[NAMESPACE_REMOTE_REFS] = {
/*
* The default refspec for new remotes copies refs from
* refs/heads/ on the remote into refs/remotes/<remote>/.
* As such, "refs/remotes/" has special handling.
*/
.ref = "refs/remotes/",
.decoration = DECORATION_REF_REMOTE,
},
[NAMESPACE_STASH] = {
/*
* The single ref "refs/stash" stores the latest stash.
* Older stashes can be found in the reflog.
*/
.ref = "refs/stash",
.exact = 1,
.decoration = DECORATION_REF_STASH,
},
[NAMESPACE_REPLACE] = {
/*
* This namespace allows Git to act as if one object ID
* points to the content of another. Unlike the other
* ref namespaces, this one can be changed by the
* GIT_REPLACE_REF_BASE environment variable. This
* .ref value will be overwritten in setup_git_env().
*/
.ref = "refs/replace/",
.decoration = DECORATION_GRAFTED,
},
[NAMESPACE_NOTES] = {
/*
* The refs/notes/commit ref points to the tip of a
* parallel commit history that adds metadata to commits
* in the normal history. This ref can be overwritten
* by the core.notesRef config variable or the
* GIT_NOTES_REF environment variable.
*/
.ref = "refs/notes/commit",
.exact = 1,
},
[NAMESPACE_PREFETCH] = {
/*
* Prefetch refs are written by the background 'prefetch'
* maintenance task. They allow faster foreground fetches
* by advertising these previously-downloaded tips, while
* leaving refs/remotes/ untouched unless the user fetches
* explicitly.
*/
.ref = "refs/prefetch/",
},
[NAMESPACE_REWRITTEN] = {
/*
* Rewritten refs are used by the 'label' command in the
* sequencer. These are particularly useful during an
* interactive rebase that uses the 'merge' command.
*/
.ref = "refs/rewritten/",
},
};
void update_ref_namespace(enum ref_namespace namespace, char *ref)
{
struct ref_namespace_info *info = &ref_namespace[namespace];
if (info->ref_updated)
free(info->ref);
info->ref = ref;
info->ref_updated = 1;
}
/*
* Try to read one refname component from the front of refname.
* Return the length of the component found, or -1 if the component is
* not legal. A component is legal if it is something reasonable to
* have under ".git/refs/". It is rejected if:
*
* - it begins with ".", or
* - it has double dots "..", or
* - it has ASCII control characters, or
* - it has ":", "?", "[", "\", "^", "~", SP, or TAB anywhere, or
* - it has "*" anywhere unless REFNAME_REFSPEC_PATTERN is set, or
* - it ends with a "/", or
* - it ends with ".lock", or
* - it contains a "@{" portion
*
* When sanitized is not NULL, instead of rejecting the input refname
* as an error, try to come up with a usable replacement for the input
* refname in it.
*/
static int check_refname_component(const char *refname, int *flags,
struct strbuf *sanitized)
{
const char *cp;
char last = '\0';
size_t component_start = 0; /* garbage - not a reasonable initial value */
if (sanitized)
component_start = sanitized->len;
for (cp = refname; ; cp++) {
int ch = *cp & 255;
unsigned char disp = refname_disposition[ch];
if (sanitized && disp != 1)
strbuf_addch(sanitized, ch);
switch (disp) {
case 1:
goto out;
case 2:
if (last == '.') { /* Refname contains "..". */
if (sanitized)
/* collapse ".." to single "." */
strbuf_setlen(sanitized, sanitized->len - 1);
else
return -1;
}
break;
case 3:
if (last == '@') { /* Refname contains "@{". */
if (sanitized)
sanitized->buf[sanitized->len-1] = '-';
else
return -1;
}
break;
case 4:
/* forbidden char */
if (sanitized)
sanitized->buf[sanitized->len-1] = '-';
else
return -1;
break;
case 5:
if (!(*flags & REFNAME_REFSPEC_PATTERN)) {
/* refspec can't be a pattern */
if (sanitized)
sanitized->buf[sanitized->len-1] = '-';
else
return -1;
}
/*
* Unset the pattern flag so that we only accept
* a single asterisk for one side of refspec.
*/
*flags &= ~REFNAME_REFSPEC_PATTERN;
break;
}
last = ch;
}
out:
if (cp == refname)
return 0; /* Component has zero length. */
if (refname[0] == '.') { /* Component starts with '.'. */
if (sanitized)
sanitized->buf[component_start] = '-';
else
return -1;
}
if (cp - refname >= LOCK_SUFFIX_LEN &&
!memcmp(cp - LOCK_SUFFIX_LEN, LOCK_SUFFIX, LOCK_SUFFIX_LEN)) {
if (!sanitized)
return -1;
/* Refname ends with ".lock". */
while (strbuf_strip_suffix(sanitized, LOCK_SUFFIX)) {
/* try again in case we have .lock.lock */
}
}
return cp - refname;
}
static int check_or_sanitize_refname(const char *refname, int flags,
struct strbuf *sanitized)
{
int component_len, component_count = 0;
if (!strcmp(refname, "@")) {
/* Refname is a single character '@'. */
if (sanitized)
strbuf_addch(sanitized, '-');
else
return -1;
}
while (1) {
if (sanitized && sanitized->len)
strbuf_complete(sanitized, '/');
/* We are at the start of a path component. */
component_len = check_refname_component(refname, &flags,
sanitized);
if (sanitized && component_len == 0)
; /* OK, omit empty component */
else if (component_len <= 0)
return -1;
component_count++;
if (refname[component_len] == '\0')
break;
/* Skip to next component. */
refname += component_len + 1;
}
if (refname[component_len - 1] == '.') {
/* Refname ends with '.'. */
if (sanitized)
; /* omit ending dot */
else
return -1;
}
if (!(flags & REFNAME_ALLOW_ONELEVEL) && component_count < 2)
return -1; /* Refname has only one component. */
return 0;
}
int check_refname_format(const char *refname, int flags)
{
return check_or_sanitize_refname(refname, flags, NULL);
}
void sanitize_refname_component(const char *refname, struct strbuf *out)
{
if (check_or_sanitize_refname(refname, REFNAME_ALLOW_ONELEVEL, out))
BUG("sanitizing refname '%s' check returned error", refname);
}
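/*
 * Illustration only, not part of refs.c: a minimal standalone sketch of
 * the per-component rules listed in the comment before
 * check_refname_component(). The name toy_check_component() is
 * hypothetical. This sketch assumes no REFNAME_REFSPEC_PATTERN support
 * (so "*" is always rejected) and no sanitizing mode; it only mirrors
 * the return convention: component length, 0 for an empty component,
 * -1 for an illegal one.
 */

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper; a simplified stand-in, not Git's implementation. */
static int toy_check_component(const char *s)
{
	/* Characters rejected anywhere in a component ("*" included here). */
	const char *bad = ":?[\\^~ \t*";
	size_t len, i;

	assert(s != NULL);
	len = strcspn(s, "/"); /* a component ends at '/' or at NUL */

	if (len == 0)
		return 0; /* zero-length component */
	if (s[0] == '.')
		return -1; /* begins with "." */

	for (i = 0; i < len; i++) {
		unsigned char ch = (unsigned char)s[i];

		if (ch < 0x20 || ch == 0x7f || strchr(bad, ch))
			return -1; /* control or forbidden character */
		if (i > 0 && s[i - 1] == '.' && ch == '.')
			return -1; /* contains ".." */
		if (i > 0 && s[i - 1] == '@' && ch == '{')
			return -1; /* contains "@{" */
	}

	if (len >= 5 && !memcmp(s + len - 5, ".lock", 5))
		return -1; /* ends with ".lock" */

	return (int)len;
}
```

/*
 * In this sketch, toy_check_component("heads/master") yields 5 (the
 * length of "heads"), while "a..b" and "topic.lock" are rejected
 * with -1.
 */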
int refname_is_safe(const char *refname)
{
const char *rest;
if (skip_prefix(refname, "refs/", &rest)) {
char *buf;
int result;
size_t restlen = strlen(rest);
/* rest must not be empty, or start or end with "/" */
if (!restlen || *rest == '/' || rest[restlen - 1] == '/')
return 0;
/*
* Does the refname try to escape refs/?
* For example: refs/foo/../bar is safe but refs/foo/../../bar
* is not.
*/
buf = xmallocz(restlen);
result = !normalize_path_copy(buf, rest) && !strcmp(buf, rest);
free(buf);
return result;
}
do {
if (!isupper(*refname) && *refname != '_')
return 0;
refname++;
} while (*refname);
return 1;
}
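/*
 * Illustration only, not part of refs.c: a standalone sketch of the
 * fallback branch of refname_is_safe(), which accepts names outside
 * "refs/" only when every character is an uppercase letter or '_'
 * (think "HEAD" or "MERGE_HEAD"). The name toy_is_toplevel_safe() is
 * hypothetical, and the sketch deliberately leaves out the "refs/"
 * branch, which relies on Git's normalize_path_copy().
 */

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: 1 if name is non-empty and matches [A-Z_]+. */
static int toy_is_toplevel_safe(const char *name)
{
	assert(name != NULL);

	/* Like the do/while loop in refname_is_safe(), reject "". */
	if (!*name)
		return 0;

	for (; *name; name++) {
		/* Cast avoids undefined behavior for negative char values. */
		if (!isupper((unsigned char)*name) && *name != '_')
			return 0;
	}
	return 1;
}
```

/*
 * Under these assumptions, "HEAD" and "MERGE_HEAD" are accepted, while
 * "head", "FETCH-HEAD" and the empty string are not.
 */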
/*
* Return true if refname, which has the specified oid and flags, can
* be resolved to an object in the database. If the referred-to object
* does not exist, emit a warning and return false.
*/
int ref_resolves_to_object(const char *refname,
struct repository *repo,
const struct object_id *oid,
unsigned int flags)
{
if (flags & REF_ISBROKEN)
return 0;
if (!repo_has_object_file(repo, oid)) {
error(_("%s does not point to a valid object!"), refname);
return 0;
}
return 1;
}
char *refs_resolve_refdup(struct ref_store *refs,
const char *refname, int resolve_flags,
struct object_id *oid, int *flags)
{
const char *result;
result = refs_resolve_ref_unsafe(refs, refname, resolve_flags,
oid, flags);
return xstrdup_or_null(result);
}
char *resolve_refdup(const char *refname, int resolve_flags,
struct object_id *oid, int *flags)
{
return refs_resolve_refdup(get_main_ref_store(the_repository),
refname, resolve_flags,
oid, flags);
}
/* The argument to filter_refs */
struct ref_filter {
const char *pattern;
const char *prefix;
each_ref_fn *fn;
void *cb_data;
};
int read_ref_full(const char *refname, int resolve_flags, struct object_id *oid, int *flags)
{
struct ref_store *refs = get_main_ref_store(the_repository);
if (refs_resolve_ref_unsafe(refs, refname, resolve_flags,
oid, flags))
return 0;
return -1;
}
int read_ref(const char *refname, struct object_id *oid)
{
return read_ref_full(refname, RESOLVE_REF_READING, oid, NULL);
}
int refs_ref_exists(struct ref_store *refs, const char *refname)
{
return !!refs_resolve_ref_unsafe(refs, refname, RESOLVE_REF_READING,
NULL, NULL);
}
int ref_exists(const char *refname)
{
return refs_ref_exists(get_main_ref_store(the_repository), refname);
}
static int filter_refs(const char *refname, const struct object_id *oid,
int flags, void *data)
{
struct ref_filter *filter = (struct ref_filter *)data;
/* wildmatch() returns 0 on a match; skip refs that do not match. */
if (wildmatch(filter->pattern, refname, 0))
return 0;
if (filter->prefix)
skip_prefix(refname, filter->prefix, &refname);
return filter->fn(refname, oid, flags, filter->cb_data);
}
enum peel_status peel_object(const struct object_id *name, struct object_id *oid)
{
struct object *o = lookup_unknown_object(the_repository, name);
if (o->type == OBJ_NONE) {
int type = oid_object_info(the_repository, name, NULL);
if (type < 0 || !object_as_type(o, type, 0))
return PEEL_INVALID;
}
if (o->type != OBJ_TAG)
return PEEL_NON_TAG;
o = deref_tag_noverify(o);
if (!o)
return PEEL_INVALID;
oidcpy(oid, &o->oid);
return PEEL_PEELED;
}
struct warn_if_dangling_data {
FILE *fp;
const char *refname;
const struct string_list *refnames;
const char *msg_fmt;
};
static int warn_if_dangling_symref(const char *refname,
const struct object_id *oid UNUSED,
int flags, void *cb_data)
{
struct warn_if_dangling_data *d = cb_data;
const char *resolves_to;
if (!(flags & REF_ISSYMREF))
return 0;
resolves_to = resolve_ref_unsafe(refname, 0, NULL, NULL);
if (!resolves_to
|| (d->refname
? strcmp(resolves_to, d->refname)
: !string_list_has_string(d->refnames, resolves_to))) {
return 0;
}
fprintf(d->fp, d->msg_fmt, refname);
fputc('\n', d->fp);
return 0;
}
void warn_dangling_symref(FILE *fp, const char *msg_fmt, const char *refname)
{
struct warn_if_dangling_data data;
data.fp = fp;
data.refname = refname;
data.refnames = NULL;
data.msg_fmt = msg_fmt;
for_each_rawref(warn_if_dangling_symref, &data);
}
void warn_dangling_symrefs(FILE *fp, const char *msg_fmt, const struct string_list *refnames)
{
struct warn_if_dangling_data data;
data.fp = fp;
data.refname = NULL;
data.refnames = refnames;
data.msg_fmt = msg_fmt;
for_each_rawref(warn_if_dangling_symref, &data);
}
int refs_for_each_tag_ref(struct ref_store *refs, each_ref_fn fn, void *cb_data)
{
return refs_for_each_ref_in(refs, "refs/tags/", fn, cb_data);
}
int for_each_tag_ref(each_ref_fn fn, void *cb_data)
{
return refs_for_each_tag_ref(get_main_ref_store(the_repository), fn, cb_data);
}
int refs_for_each_branch_ref(struct ref_store *refs, each_ref_fn fn, void *cb_data)
{
return refs_for_each_ref_in(refs, "refs/heads/", fn, cb_data);
}
int for_each_branch_ref(each_ref_fn fn, void *cb_data)
{
return refs_for_each_branch_ref(get_main_ref_store(the_repository), fn, cb_data);
}
int refs_for_each_remote_ref(struct ref_store *refs, each_ref_fn fn, void *cb_data)
{
return refs_for_each_ref_in(refs, "refs/remotes/", fn, cb_data);
}
int for_each_remote_ref(each_ref_fn fn, void *cb_data)
{
return refs_for_each_remote_ref(get_main_ref_store(the_repository), fn, cb_data);
}
int head_ref_namespaced(each_ref_fn fn, void *cb_data)
{
struct strbuf buf = STRBUF_INIT;
int ret = 0;
struct object_id oid;
int flag;
strbuf_addf(&buf, "%sHEAD", get_git_namespace());
if (!read_ref_full(buf.buf, RESOLVE_REF_READING, &oid, &flag))
ret = fn(buf.buf, &oid, flag, cb_data);
strbuf_release(&buf);
return ret;
}
void normalize_glob_ref(struct string_list_item *item, const char *prefix,
const char *pattern)
{
struct strbuf normalized_pattern = STRBUF_INIT;
if (*pattern == '/')
BUG("pattern must not start with '/'");
if (prefix)
strbuf_addstr(&normalized_pattern, prefix);
else if (!starts_with(pattern, "refs/") &&
strcmp(pattern, "HEAD"))
strbuf_addstr(&normalized_pattern, "refs/");
/*
* NEEDSWORK: Special case other symrefs such as REBASE_HEAD,
* MERGE_HEAD, etc.
*/
strbuf_addstr(&normalized_pattern, pattern);
strbuf_strip_suffix(&normalized_pattern, "/");
item->string = strbuf_detach(&normalized_pattern, NULL);
item->util = has_glob_specials(pattern) ? NULL : item->string;
strbuf_release(&normalized_pattern);
}
int for_each_glob_ref_in(each_ref_fn fn, const char *pattern,
const char *prefix, void *cb_data)
{
struct strbuf real_pattern = STRBUF_INIT;
struct ref_filter filter;
int ret;
if (!prefix && !starts_with(pattern, "refs/"))
strbuf_addstr(&real_pattern, "refs/");
else if (prefix)
strbuf_addstr(&real_pattern, prefix);
strbuf_addstr(&real_pattern, pattern);
if (!has_glob_specials(pattern)) {
/* Append implied '/' '*' if not present. */
strbuf_complete(&real_pattern, '/');
/* No need to check for '*', there is none. */
strbuf_addch(&real_pattern, '*');
}
filter.pattern = real_pattern.buf;
filter.prefix = prefix;
filter.fn = fn;
filter.cb_data = cb_data;
ret = for_each_ref(filter_refs, &filter);
strbuf_release(&real_pattern);
return ret;
}
int for_each_glob_ref(each_ref_fn fn, const char *pattern, void *cb_data)
{
return for_each_glob_ref_in(fn, pattern, NULL, cb_data);
}
const char *prettify_refname(const char *name)
{
if (skip_prefix(name, "refs/heads/", &name) ||
skip_prefix(name, "refs/tags/", &name) ||
skip_prefix(name, "refs/remotes/", &name))
; /* nothing */
return name;
}
static const char *ref_rev_parse_rules[] = {
"%.*s",
"refs/%.*s",
"refs/tags/%.*s",
"refs/heads/%.*s",
"refs/remotes/%.*s",
"refs/remotes/%.*s/HEAD",
NULL
};
#define NUM_REV_PARSE_RULES (ARRAY_SIZE(ref_rev_parse_rules) - 1)
/*
* Is it possible that the caller meant full_name with abbrev_name?
* If so return a non-zero value to signal "yes"; the magnitude of
* the returned value gives the precedence used for disambiguation.
*
* If abbrev_name cannot mean full_name, return 0.
*/
int refname_match(const char *abbrev_name, const char *full_name)
{
const char **p;
const int abbrev_name_len = strlen(abbrev_name);
const int num_rules = NUM_REV_PARSE_RULES;
/* Earlier rules yield larger return values, i.e. higher precedence. */
for (p = ref_rev_parse_rules; *p; p++)
if (!strcmp(full_name, mkpath(*p, abbrev_name_len, abbrev_name)))
return &ref_rev_parse_rules[num_rules] - p;
return 0;
}
/*
* Given a 'prefix' expand it by the rules in 'ref_rev_parse_rules' and add
* the results to 'prefixes'
*/
void expand_ref_prefix(struct strvec *prefixes, const char *prefix)
{
const char **p;
int len = strlen(prefix);
for (p = ref_rev_parse_rules; *p; p++)
strvec_pushf(prefixes, *p, len, prefix);
}
static const char default_branch_name_advice[] = N_(
"Using '%s' as the name for the initial branch. This default branch name\n"
"is subject to change. To configure the initial branch name to use in all\n"
"of your new repositories, which will suppress this warning, call:\n"
"\n"
"\tgit config --global init.defaultBranch <name>\n"
"\n"
"Names commonly chosen instead of 'master' are 'main', 'trunk' and\n"
"'development'. The just-created branch can be renamed via this command:\n"
"\n"
"\tgit branch -m <name>\n"
);
char *repo_default_branch_name(struct repository *r, int quiet)
{
const char *config_key = "init.defaultbranch";
const char *config_display_key = "init.defaultBranch";
char *ret = NULL, *full_ref;
const char *env = getenv("GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME");
if (env && *env)
ret = xstrdup(env);
else if (repo_config_get_string(r, config_key, &ret) < 0)
die(_("could not retrieve `%s`"), config_display_key);
if (!ret) {
ret = xstrdup("master");
if (!quiet)
advise(_(default_branch_name_advice), ret);
}
full_ref = xstrfmt("refs/heads/%s", ret);
if (check_refname_format(full_ref, 0))
die(_("invalid branch name: %s = %s"), config_display_key, ret);
free(full_ref);
return ret;
}
const char *git_default_branch_name(int quiet)
{
static char *ret;
if (!ret)
ret = repo_default_branch_name(the_repository, quiet);
return ret;
}
/*
* *string and *len will only be substituted, and *string returned (for
* later free()ing) if the string passed in is a magic short-hand form
* to name a branch.
*/
static char *substitute_branch_name(struct repository *r,
const char **string, int *len,
int nonfatal_dangling_mark)
{
struct strbuf buf = STRBUF_INIT;
struct interpret_branch_name_options options = {
.nonfatal_dangling_mark = nonfatal_dangling_mark
};
int ret = repo_interpret_branch_name(r, *string, *len, &buf, &options);
if (ret == *len) {
size_t size;
*string = strbuf_detach(&buf, &size);
*len = size;
return (char *)*string;
}
return NULL;
}
int repo_dwim_ref(struct repository *r, const char *str, int len,
struct object_id *oid, char **ref, int nonfatal_dangling_mark)
{
char *last_branch = substitute_branch_name(r, &str, &len,
nonfatal_dangling_mark);
int refs_found = expand_ref(r, str, len, oid, ref);
free(last_branch);
return refs_found;
}
int expand_ref(struct repository *repo, const char *str, int len,
struct object_id *oid, char **ref)
{
const char **p, *r;
int refs_found = 0;
struct strbuf fullref = STRBUF_INIT;
*ref = NULL;
for (p = ref_rev_parse_rules; *p; p++) {
struct object_id oid_from_ref;
struct object_id *this_result;
int flag;
struct ref_store *refs = get_main_ref_store(repo);
this_result = refs_found ? &oid_from_ref : oid;
strbuf_reset(&fullref);
strbuf_addf(&fullref, *p, len, str);
r = refs_resolve_ref_unsafe(refs, fullref.buf,
RESOLVE_REF_READING,
this_result, &flag);
if (r) {
if (!refs_found++)
*ref = xstrdup(r);
if (!warn_ambiguous_refs)
break;
} else if ((flag & REF_ISSYMREF) && strcmp(fullref.buf, "HEAD")) {
warning(_("ignoring dangling symref %s"), fullref.buf);
} else if ((flag & REF_ISBROKEN) && strchr(fullref.buf, '/')) {
warning(_("ignoring broken ref %s"), fullref.buf);
}
}
strbuf_release(&fullref);
return refs_found;
}
int repo_dwim_log(struct repository *r, const char *str, int len,
struct object_id *oid, char **log)
{
struct ref_store *refs = get_main_ref_store(r);
char *last_branch = substitute_branch_name(r, &str, &len, 0);
const char **p;
int logs_found = 0;
struct strbuf path = STRBUF_INIT;
*log = NULL;
for (p = ref_rev_parse_rules; *p; p++) {
struct object_id hash;
const char *ref, *it;
strbuf_reset(&path);
strbuf_addf(&path, *p, len, str);
ref = refs_resolve_ref_unsafe(refs, path.buf,
RESOLVE_REF_READING,
oid ? &hash : NULL, NULL);
if (!ref)
continue;
if (refs_reflog_exists(refs, path.buf))
it = path.buf;
else if (strcmp(ref, path.buf) &&
refs_reflog_exists(refs, ref))
it = ref;
else
continue;
if (!logs_found++) {
*log = xstrdup(it);
if (oid)
oidcpy(oid, &hash);
}
if (!warn_ambiguous_refs)
break;
}
strbuf_release(&path);
free(last_branch);
return logs_found;
}
int dwim_log(const char *str, int len, struct object_id *oid, char **log)
{
return repo_dwim_log(the_repository, str, len, oid, log);
}
int is_per_worktree_ref(const char *refname)
{
return starts_with(refname, "refs/worktree/") ||
starts_with(refname, "refs/bisect/") ||
starts_with(refname, "refs/rewritten/");
}
static int is_pseudoref_syntax(const char *refname)
{
const char *c;
for (c = refname; *c; c++) {
if (!isupper(*c) && *c != '-' && *c != '_')
return 0;
}
/*
* HEAD is not a pseudoref, but it certainly uses the
* pseudoref syntax.
*/
return 1;
}
static int is_current_worktree_ref(const char *ref) {
return is_pseudoref_syntax(ref) || is_per_worktree_ref(ref);
}
enum ref_worktree_type parse_worktree_ref(const char *maybe_worktree_ref,
const char **worktree_name, int *worktree_name_length,
const char **bare_refname)
{
const char *name_dummy;
int name_length_dummy;
const char *ref_dummy;
if (!worktree_name)
worktree_name = &name_dummy;
if (!worktree_name_length)
worktree_name_length = &name_length_dummy;
if (!bare_refname)
bare_refname = &ref_dummy;
if (skip_prefix(maybe_worktree_ref, "worktrees/", bare_refname)) {
const char *slash = strchr(*bare_refname, '/');
*worktree_name = *bare_refname;
if (!slash) {
*worktree_name_length = strlen(*worktree_name);
/* This is an error condition, and the caller can tell because the bare_refname is "" */
*bare_refname = *worktree_name + *worktree_name_length;
return REF_WORKTREE_OTHER;
}
*worktree_name_length = slash - *bare_refname;
*bare_refname = slash + 1;
if (is_current_worktree_ref(*bare_refname))
return REF_WORKTREE_OTHER;
}
*worktree_name = NULL;
*worktree_name_length = 0;
if (skip_prefix(maybe_worktree_ref, "main-worktree/", bare_refname)
&& is_current_worktree_ref(*bare_refname))
return REF_WORKTREE_MAIN;
*bare_refname = maybe_worktree_ref;
if (is_current_worktree_ref(maybe_worktree_ref))
return REF_WORKTREE_CURRENT;
return REF_WORKTREE_SHARED;
}
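The "worktrees/<name>/<ref>" and "main-worktree/<ref>" naming described in the commit message can be illustrated with a small standalone sketch. Every name below (`sketch_ref_kind`, `sketch_split_worktree_ref`) is hypothetical and is not part of Git. Unlike `parse_worktree_ref()` above, the sketch does not check whether the remaining name is a per-worktree ref, and it treats a "worktrees/" prefix with no second slash as a plain ref rather than as an error case.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical classification for this sketch; not Git's enum ref_worktree_type. */
enum sketch_ref_kind {
	SKETCH_REF_PLAIN, /* no worktree prefix recognized */
	SKETCH_REF_MAIN,  /* "main-worktree/<ref>" */
	SKETCH_REF_OTHER  /* "worktrees/<name>/<ref>" */
};

/*
 * Split a ref name into an optional worktree name and the bare ref.
 * The worktree name is returned as a pointer/length pair into "ref";
 * nothing is copied.
 */
static enum sketch_ref_kind sketch_split_worktree_ref(const char *ref,
						      const char **name,
						      size_t *name_len,
						      const char **bare)
{
	static const char other[] = "worktrees/";
	static const char main_wt[] = "main-worktree/";

	assert(ref && name && name_len && bare);
	*name = NULL;
	*name_len = 0;

	if (!strncmp(ref, other, sizeof(other) - 1)) {
		const char *start = ref + sizeof(other) - 1;
		const char *slash = strchr(start, '/');

		if (slash) {
			*name = start;
			*name_len = (size_t)(slash - start);
			*bare = slash + 1;
			return SKETCH_REF_OTHER;
		}
		/* No second slash: this sketch falls through to "plain". */
	}
	if (!strncmp(ref, main_wt, sizeof(main_wt) - 1)) {
		*bare = ref + sizeof(main_wt) - 1;
		return SKETCH_REF_MAIN;
	}
	*bare = ref;
	return SKETCH_REF_PLAIN;
}
```

Calling it with "worktrees/foo/HEAD" yields the name "foo" (length 3) and the bare ref "HEAD"; returning a pointer/length pair mirrors how the function above avoids allocating a copy of the worktree name.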
long get_files_ref_lock_timeout_ms(void)
{
static int configured = 0;
/* The default timeout is 100 ms: */
static int timeout_ms = 100;
if (!configured) {
git_config_get_int("core.filesreflocktimeout", &timeout_ms);
configured = 1;
}
return timeout_ms;
}
int refs_delete_ref(struct ref_store *refs, const char *msg,
const char *refname,
const struct object_id *old_oid,
unsigned int flags)
{
struct ref_transaction *transaction;
struct strbuf err = STRBUF_INIT;
transaction = ref_store_transaction_begin(refs, &err);
if (!transaction ||
ref_transaction_delete(transaction, refname, old_oid,
flags, msg, &err) ||
ref_transaction_commit(transaction, &err)) {
error("%s", err.buf);
ref_transaction_free(transaction);
strbuf_release(&err);
return 1;
}
ref_transaction_free(transaction);
strbuf_release(&err);
return 0;
}
int delete_ref(const char *msg, const char *refname,
const struct object_id *old_oid, unsigned int flags)
{
return refs_delete_ref(get_main_ref_store(the_repository), msg, refname,
old_oid, flags);
}
reflog: cleanse messages in the refs.c layer Regarding reflog messages: - We expect that a reflog message consists of a single line. The file format used by the files backend may add a LF after the message as a delimiter, and output by commands like "git log -g" may complete such an incomplete line by adding a LF at the end, but philosophically, the terminating LF is not a part of the message. - We however allow callers of refs API to supply a random sequence of NUL terminated bytes. We cleanse caller-supplied message by squashing a run of whitespaces into a SP, and by trimming trailing whitespace, before storing the message. This is how we tolerate, instead of erring out, a message with LF in it (be it at the end, in the middle, or both). Currently, the cleansing of the reflog message is done by the files backend, before the log is written out. This is sufficient with the current code, as that is the only backend that writes reflogs. But new backends can be added that write reflogs, and we'd want the resulting log message we would read out of "log -g" the same no matter what backend is used, and moving the code to do so to the generic layer is a way to do so. An added benefit is that the "cleansing" function could be updated later, independent from individual backends, to e.g. allow multi-line log messages if we wanted to, and when that happens, it would help a lot to ensure we covered all bases if the cleansing function (which would be updated) is called from the generic layer. Side note: I am not interested in supporting multi-line reflog messages right at the moment (nobody is asking for it), but I envision that instead of the "squash a run of whitespaces into a SP and rtrim" cleansing, we can %urlencode problematic bytes in the message *AND* append a SP at the end, when a new version of Git that supports multi-line and/or verbatim reflog messages writes a reflog record. 
The reading side can detect the presence of SP at the end (which should have been rtrimmed out if it were written by existing versions of Git) as a signal that decoding %urlencode recovers the original reflog message. Signed-off-by: Han-Wen Nienhuys <hanwen@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-07-10 19:19:53 +02:00
static void copy_reflog_msg(struct strbuf *sb, const char *msg)
{
char c;
int wasspace = 1;
while ((c = *msg++)) {
if (wasspace && isspace(c))
continue;
wasspace = isspace(c);
if (wasspace)
c = ' ';
strbuf_addch(sb, c);
}
strbuf_rtrim(sb);
}
static char *normalize_reflog_message(const char *msg)
{
struct strbuf sb = STRBUF_INIT;
if (msg && *msg)
copy_reflog_msg(&sb, msg);
return strbuf_detach(&sb, NULL);
}
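The cleansing rule described in the commit message (squash each run of whitespace into a single SP, then trim trailing whitespace) can be tried outside of Git with the sketch below. It is only an approximation under stated assumptions: `sketch_cleanse_message` is a hypothetical name, and it writes into a caller-supplied fixed-size buffer (silently truncating if the buffer is too small) instead of using Git's `strbuf`.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy "src" into "dst" (capacity "dst_size", including the NUL),
 * collapsing each run of whitespace into one space and dropping
 * leading and trailing whitespace. Returns the resulting length.
 * Hypothetical helper for illustration; not Git's API.
 */
static size_t sketch_cleanse_message(char *dst, size_t dst_size,
				     const char *src)
{
	size_t len = 0;
	int was_space = 1; /* start "after whitespace" so leading blanks are dropped */

	assert(dst && src && dst_size > 0);
	for (; *src && len + 1 < dst_size; src++) {
		int is_space = isspace((unsigned char)*src);

		if (is_space && was_space)
			continue; /* skip the rest of a whitespace run */
		dst[len++] = is_space ? ' ' : *src;
		was_space = is_space;
	}
	/* Every whitespace byte was stored as ' ', so trimming ' ' suffices. */
	while (len > 0 && dst[len - 1] == ' ')
		len--;
	dst[len] = '\0';
	return len;
}
```

For example, the input "  commit:\tfix\n\n bug \n" comes out as "commit: fix bug", which is how a message containing LF characters ends up as a single line.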
int should_autocreate_reflog(const char *refname)
{
switch (log_all_ref_updates) {
case LOG_REFS_ALWAYS:
return 1;
case LOG_REFS_NORMAL:
return starts_with(refname, "refs/heads/") ||
starts_with(refname, "refs/remotes/") ||
starts_with(refname, "refs/notes/") ||
!strcmp(refname, "HEAD");
default:
return 0;
}
}
int is_branch(const char *refname)
{
return !strcmp(refname, "HEAD") || starts_with(refname, "refs/heads/");
}
struct read_ref_at_cb {
const char *refname;
timestamp_t at_time;
int cnt;
int reccnt;
struct object_id *oid;
int found_it;
struct object_id ooid;
struct object_id noid;
int tz;
timestamp_t date;
char **msg;
timestamp_t *cutoff_time;
int *cutoff_tz;
int *cutoff_cnt;
};
static void set_read_ref_cutoffs(struct read_ref_at_cb *cb,
timestamp_t timestamp, int tz, const char *message)
{
if (cb->msg)
*cb->msg = xstrdup(message);
if (cb->cutoff_time)
*cb->cutoff_time = timestamp;
if (cb->cutoff_tz)
*cb->cutoff_tz = tz;
if (cb->cutoff_cnt)
*cb->cutoff_cnt = cb->reccnt;
}
static int read_ref_at_ent(struct object_id *ooid, struct object_id *noid,
const char *email UNUSED,
timestamp_t timestamp, int tz,
const char *message, void *cb_data)
{
struct read_ref_at_cb *cb = cb_data;
int reached_count;
cb->tz = tz;
cb->date = timestamp;
/*
* It is not possible for cb->cnt == 0 on the first iteration because
* that special case is handled in read_ref_at().
*/
if (cb->cnt > 0)
cb->cnt--;
reached_count = cb->cnt == 0 && !is_null_oid(ooid);
if (timestamp <= cb->at_time || reached_count) {
set_read_ref_cutoffs(cb, timestamp, tz, message);
/*
* we have not yet updated cb->[n|o]oid so they still
* hold the values for the previous record.
*/
if (!is_null_oid(&cb->ooid) && !oideq(&cb->ooid, noid))
warning(_("log for ref %s has gap after %s"),
convert "enum date_mode" into a struct In preparation for adding date modes that may carry extra information beyond the mode itself, this patch converts the date_mode enum into a struct. Most of the conversion is fairly straightforward; we pass the struct as a pointer and dereference the type field where necessary. Locations that declare a date_mode can use a "{}" constructor. However, the tricky case is where we use the enum labels as constants, like: show_date(t, tz, DATE_NORMAL); Ideally we could say: show_date(t, tz, &{ DATE_NORMAL }); but of course C does not allow that. Likewise, we cannot cast the constant to a struct, because we need to pass an actual address. Our options are basically: 1. Manually add a "struct date_mode d = { DATE_NORMAL }" definition to each caller, and pass "&d". This makes the callers uglier, because they sometimes do not even have their own scope (e.g., they are inside a switch statement). 2. Provide a pre-made global "date_normal" struct that can be passed by address. We'd also need "date_rfc2822", "date_iso8601", and so forth. But at least the ugliness is defined in one place. 3. Provide a wrapper that generates the correct struct on the fly. The big downside is that we end up pointing to a single global, which makes our wrapper non-reentrant. But show_date is already not reentrant, so it does not matter. This patch implements 3, along with a minor macro to keep the size of the callers sane. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-06-25 18:55:02 +02:00
cb->refname, show_date(cb->date, cb->tz, DATE_MODE(RFC2822)));
if (reached_count)
oidcpy(cb->oid, ooid);
else if (!is_null_oid(&cb->ooid) || cb->date == cb->at_time)
oidcpy(cb->oid, noid);
else if (!oideq(noid, cb->oid))
warning(_("log for ref %s unexpectedly ended on %s"),
cb->refname, show_date(cb->date, cb->tz,
DATE_MODE(RFC2822)));
cb->found_it = 1;
}
cb->reccnt++;
oidcpy(&cb->ooid, ooid);
oidcpy(&cb->noid, noid);
return cb->found_it;
}
static int read_ref_at_ent_newest(struct object_id *ooid UNUSED,
struct object_id *noid,
const char *email UNUSED,
timestamp_t timestamp, int tz,
const char *message, void *cb_data)
{
struct read_ref_at_cb *cb = cb_data;
set_read_ref_cutoffs(cb, timestamp, tz, message);
oidcpy(cb->oid, noid);
/* We just want the first entry */
return 1;
}
static int read_ref_at_ent_oldest(struct object_id *ooid, struct object_id *noid,
const char *email UNUSED,
timestamp_t timestamp, int tz,
const char *message, void *cb_data)
{
struct read_ref_at_cb *cb = cb_data;
set_read_ref_cutoffs(cb, timestamp, tz, message);
oidcpy(cb->oid, ooid);
if (is_null_oid(cb->oid))
oidcpy(cb->oid, noid);
/* We just want the first entry */
return 1;
}
int read_ref_at(struct ref_store *refs, const char *refname,
unsigned int flags, timestamp_t at_time, int cnt,
struct object_id *oid, char **msg,
timestamp_t *cutoff_time, int *cutoff_tz, int *cutoff_cnt)
{
struct read_ref_at_cb cb;
memset(&cb, 0, sizeof(cb));
cb.refname = refname;
cb.at_time = at_time;
cb.cnt = cnt;
cb.msg = msg;
cb.cutoff_time = cutoff_time;
cb.cutoff_tz = cutoff_tz;
cb.cutoff_cnt = cutoff_cnt;
cb.oid = oid;
if (cb.cnt == 0) {
refs_for_each_reflog_ent_reverse(refs, refname, read_ref_at_ent_newest, &cb);
return 0;
}
refs_for_each_reflog_ent_reverse(refs, refname, read_ref_at_ent, &cb);
if (!cb.reccnt) {
if (flags & GET_OID_QUIETLY)
exit(128);
else
die(_("log for %s is empty"), refname);
}
if (cb.found_it)
return 0;
refs_for_each_reflog_ent(refs, refname, read_ref_at_ent_oldest, &cb);
return 1;
}
struct ref_transaction *ref_store_transaction_begin(struct ref_store *refs,
struct strbuf *err)
{
struct ref_transaction *tr;
assert(err);
CALLOC_ARRAY(tr, 1);
tr->ref_store = refs;
return tr;
}
struct ref_transaction *ref_transaction_begin(struct strbuf *err)
{
return ref_store_transaction_begin(get_main_ref_store(the_repository), err);
}
void ref_transaction_free(struct ref_transaction *transaction)
{
size_t i;
if (!transaction)
return;
switch (transaction->state) {
case REF_TRANSACTION_OPEN:
case REF_TRANSACTION_CLOSED:
/* OK */
break;
case REF_TRANSACTION_PREPARED:
BUG("free called on a prepared reference transaction");
break;
default:
BUG("unexpected reference transaction state");
break;
}
for (i = 0; i < transaction->nr; i++) {
free(transaction->updates[i]->msg);
free(transaction->updates[i]);
}
free(transaction->updates);
free(transaction);
}
struct ref_update *ref_transaction_add_update(
struct ref_transaction *transaction,
const char *refname, unsigned int flags,
const struct object_id *new_oid,
const struct object_id *old_oid,
const char *msg)
{
struct ref_update *update;
if (transaction->state != REF_TRANSACTION_OPEN)
BUG("update called for transaction that is not open");
FLEX_ALLOC_STR(update, refname, refname);
ALLOC_GROW(transaction->updates, transaction->nr + 1, transaction->alloc);
transaction->updates[transaction->nr++] = update;
update->flags = flags;
if (flags & REF_HAVE_NEW)
oidcpy(&update->new_oid, new_oid);
if (flags & REF_HAVE_OLD)
oidcpy(&update->old_oid, old_oid);
update->msg = normalize_reflog_message(msg);
return update;
}
int ref_transaction_update(struct ref_transaction *transaction,
const char *refname,
const struct object_id *new_oid,
const struct object_id *old_oid,
unsigned int flags, const char *msg,
struct strbuf *err)
{
assert(err);
if (!(flags & REF_SKIP_REFNAME_VERIFICATION) &&
((new_oid && !is_null_oid(new_oid)) ?
check_refname_format(refname, REFNAME_ALLOW_ONELEVEL) :
!refname_is_safe(refname))) {
strbuf_addf(err, _("refusing to update ref with bad name '%s'"),
refs.c: allow listing and deleting badly named refs We currently do not handle badly named refs well: $ cp .git/refs/heads/master .git/refs/heads/master.....@\*@\\. $ git branch fatal: Reference has invalid format: 'refs/heads/master.....@*@\.' $ git branch -D master.....@\*@\\. error: branch 'master.....@*@\.' not found. Users cannot recover from a badly named ref without manually finding and deleting the loose ref file or appropriate line in packed-refs. Making that easier will make it easier to tweak the ref naming rules in the future, for example to forbid shell metacharacters like '`' and '"', without putting people in a state that is hard to get out of. So allow "branch --list" to show these refs and allow "branch -d/-D" and "update-ref -d" to delete them. Other commands (for example to rename refs) will continue to not handle these refs but can be changed in later patches. Details: In resolving functions, refuse to resolve refs that don't pass the git-check-ref-format(1) check unless the new RESOLVE_REF_ALLOW_BAD_NAME flag is passed. Even with RESOLVE_REF_ALLOW_BAD_NAME, refuse to resolve refs that escape the refs/ directory and do not match the pattern [A-Z_]* (think "HEAD" and "MERGE_HEAD"). In locking functions, refuse to act on badly named refs unless they are being deleted and either are in the refs/ directory or match [A-Z_]*. Just like other invalid refs, flag resolved, badly named refs with the REF_ISBROKEN flag, treat them as resolving to null_sha1, and skip them in all iteration functions except for for_each_rawref. Flag badly named refs (but not symrefs pointing to badly named refs) with a REF_BAD_NAME flag to make it easier for future callers to notice and handle them specially. For example, in a later patch for-each-ref will use this flag to detect refs whose names can confuse callers parsing for-each-ref output. 
In the transaction API, refuse to create or update badly named refs, but allow deleting them (unless they try to escape refs/ and don't match [A-Z_]*). Signed-off-by: Ronnie Sahlberg <sahlberg@google.com> Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-09-03 20:45:43 +02:00
refname);
return -1;
}
if (flags & ~REF_TRANSACTION_UPDATE_ALLOWED_FLAGS)
BUG("illegal flags 0x%x passed to ref_transaction_update()", flags);
refs: work around gcc-11 warning with REF_HAVE_NEW Using gcc-11 (or 12) to compile refs.o with -O3 results in: In file included from hashmap.h:4, from cache.h:6, from refs.c:5: In function ‘oidcpy’, inlined from ‘ref_transaction_add_update’ at refs.c:1065:3, inlined from ‘ref_transaction_update’ at refs.c:1094:2, inlined from ‘ref_transaction_verify’ at refs.c:1132:9: hash.h:262:9: warning: argument 2 null where non-null expected [-Wnonnull] 262 | memcpy(dst->hash, src->hash, GIT_MAX_RAWSZ); | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ In file included from git-compat-util.h:177, from cache.h:4, from refs.c:5: refs.c: In function ‘ref_transaction_verify’: /usr/include/string.h:43:14: note: in a call to function ‘memcpy’ declared ‘nonnull’ 43 | extern void *memcpy (void *__restrict __dest, const void *__restrict __src, | ^~~~~~ That call to memcpy() is in a conditional block that requires REF_HAVE_NEW to be set. But in ref_transaction_update(), we make sure it isn't set coming in: if (flags & ~REF_TRANSACTION_UPDATE_ALLOWED_FLAGS) BUG("illegal flags 0x%x passed to ref_transaction_update()", flags); and then only set it if the variable isn't NULL: flags |= (new_oid ? REF_HAVE_NEW : 0) | (old_oid ? REF_HAVE_OLD : 0); So it should be impossible to reach that memcpy() with a NULL oid. But for whatever reason, gcc doesn't accept that hitting the BUG() means we won't go any further, even though it's marked with the noreturn attribute. And the conditional is correct; ALLOWED_FLAGS doesn't contain HAVE_NEW or HAVE_OLD, and you can even simplify it to check for those flags explicitly and the compiler still complains. We can work around this by just clearing the disallowed flags explicitly. This should be a noop because of the BUG() check, but it makes the compiler happy. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-11-19 22:28:30 +01:00
/*
* Clear flags outside the allowed set; this should be a noop because
* of the BUG() check above, but it works around a -Wnonnull warning
* with some versions of "gcc -O3".
*/
flags &= REF_TRANSACTION_UPDATE_ALLOWED_FLAGS;
flags |= (new_oid ? REF_HAVE_NEW : 0) | (old_oid ? REF_HAVE_OLD : 0);
ref_transaction_add_update(transaction, refname, flags,
new_oid, old_oid, msg);
return 0;
}
int ref_transaction_create(struct ref_transaction *transaction,
const char *refname,
const struct object_id *new_oid,
unsigned int flags, const char *msg,
struct strbuf *err)
{
clone: die() instead of BUG() on bad refs When cloning directly from a local repository, we load a list of refs based on scanning the $GIT_DIR/refs/ directory of the "server" repository. If files exist in that directory that do not parse as hexadecimal hashes, then the ref array used by write_remote_refs() ends up with some entries with null OIDs. This causes us to hit a BUG() statement in ref_transaction_create(): BUG: create called without valid new_oid This BUG() call used to be a die() until 033abf97f (Replace all die("BUG: ...") calls by BUG() ones, 2018-05-02). Before that, the die() was added by f04c5b552 (ref_transaction_create(): check that new_sha1 is valid, 2015-02-17). The original report for this bug [1] mentioned that this problem did not exist in Git 2.27.0. The failure bisects unsurprisingly to 968f12fda (refs: turn on GIT_REF_PARANOIA by default, 2021-09-24). When GIT_REF_PARANOIA is enabled, this case always fails as far back as I am able to successfully compile and test the Git codebase. [1] https://github.com/git-for-windows/git/issues/3781 There are two approaches to consider here. One would be to remove this BUG() statement in favor of returning with an error. There are only two callers to ref_transaction_create(), so this would have a limited impact. The other approach would be to add special casing in 'git clone' to avoid this faulty input to the method. While I originally started with changing 'git clone', I decided that modifying ref_transaction_create() was a more complete solution. This prevents failing with a BUG() statement when we already have a good way to report an error (including a reason for that error) within the method. Both callers properly check the return value and die() with the error message, so this is an appropriate direction. The added test helps check against a regression, but does check that our intended error message is handled correctly. 
Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-04-25 15:47:30 +02:00
if (!new_oid || is_null_oid(new_oid)) {
strbuf_addf(err, "'%s' has a null OID", refname);
return 1;
}
return ref_transaction_update(transaction, refname, new_oid,
null_oid(), flags, msg, err);
}
int ref_transaction_delete(struct ref_transaction *transaction,
const char *refname,
const struct object_id *old_oid,
unsigned int flags, const char *msg,
struct strbuf *err)
{
if (old_oid && is_null_oid(old_oid))
BUG("delete called with old_oid set to zeros");
return ref_transaction_update(transaction, refname,
null_oid(), old_oid,
flags, msg, err);
}
int ref_transaction_verify(struct ref_transaction *transaction,
const char *refname,
const struct object_id *old_oid,
unsigned int flags,
struct strbuf *err)
{
if (!old_oid)
BUG("verify called with old_oid set to NULL");
return ref_transaction_update(transaction, refname,
NULL, old_oid,
flags, NULL, err);
}
int refs_update_ref(struct ref_store *refs, const char *msg,
const char *refname, const struct object_id *new_oid,
const struct object_id *old_oid, unsigned int flags,
enum action_on_err onerr)
{
struct ref_transaction *t = NULL;
struct strbuf err = STRBUF_INIT;
int ret = 0;
t = ref_store_transaction_begin(refs, &err);
if (!t ||
ref_transaction_update(t, refname, new_oid, old_oid, flags, msg,
&err) ||
ref_transaction_commit(t, &err)) {
ret = 1;
ref_transaction_free(t);
}
if (ret) {
const char *str = _("update_ref failed for ref '%s': %s");
switch (onerr) {
case UPDATE_REFS_MSG_ON_ERR:
error(str, refname, err.buf);
break;
case UPDATE_REFS_DIE_ON_ERR:
die(str, refname, err.buf);
break;
case UPDATE_REFS_QUIET_ON_ERR:
break;
}
strbuf_release(&err);
return 1;
}
strbuf_release(&err);
if (t)
ref_transaction_free(t);
return 0;
}
int update_ref(const char *msg, const char *refname,
const struct object_id *new_oid,
const struct object_id *old_oid,
unsigned int flags, enum action_on_err onerr)
{
return refs_update_ref(get_main_ref_store(the_repository), msg, refname, new_oid,
old_oid, flags, onerr);
}
shorten_unambiguous_ref(): avoid sscanf() To shorten a fully qualified ref (e.g., taking "refs/heads/foo" to just "foo"), we munge the usual lookup rules ("refs/heads/%.*s", etc) to drop the ".*" modifier (so "refs/heads/%s"), and then use sscanf() to match that against the refname, pulling the "%s" content into a separate buffer. This has a few downsides: - sscanf("%s") reportedly misbehaves on macOS with some input and locale combinations, returning a partial or garbled string. See this thread: https://lore.kernel.org/git/CAGF3oAcCi+fG12j-1U0hcrWwkF5K_9WhOi6ZPHBzUUzfkrZDxA@mail.gmail.com/ - scanf's matching of "%s" is greedy. So the "refs/remotes/%s/HEAD" rule would never pull "origin" out of "refs/remotes/origin/HEAD". Instead it always produced "origin/HEAD", which is redundant with the "refs/remotes/%s" rule. - scanf in general is an error-prone interface. For example, scanning for "%s" will copy bytes into a destination string, which must have been correctly sized ahead of time to avoid a buffer overflow. In this case, the code is OK (the buffer is pessimistically sized to match the original string, which should give us a maximum). But in general, we do not want to encourage people to use scanf at all. So instead, let's note that our lookup rules are not arbitrary format strings, but all contain exactly one "%.*s" placeholder. We already rely on this, both for lookup (we feed the lookup format along with exactly one int/ptr combo to snprintf, etc) and for shortening (we munge "%.*s" to "%s", and then insist that sscanf() finds exactly one result). We can parse this manually by just matching the bytes that occur before and after the "%.*s" placeholder. While we have a few extra lines of parsing code, the result is arguably simpler, as we can skip the preprocessing step and its tricky memory management entirely. The in-code comments should explain the parsing strategy, but there's one subtle change here. 
The original code allocated a single buffer, and then overwrote it in each loop iteration, since that's the only option sscanf() gives us. But our parser can actually return a ptr/len combo for the matched string, which is all we need (since we just feed it back to the lookup rules with "%.*s"), and then copy it only when returning to the caller. There are a few new tests here, all using symbolic-ref (the code can be triggered in many ways, but symrefs are convenient in that we don't need to create a real ref, which avoids any complications from the filesystem munging the name): - the first covers the real-world case which misbehaved on macOS. Setting LC_ALL is required to trigger the problem there (since otherwise our tests use LC_ALL=C), and hopefully is at worst simply ignored on other systems (and doesn't cause libc to complain, etc, on systems without that locale). - the second covers the "origin/HEAD" case as discussed above, which is now fixed - the remainder are for "weird" cases that work both before and after this patch, but would be easy to get wrong with off-by-one problems in the parsing (and came out of discussions and earlier iterations of the patch that did get them wrong). - absent here are tests of boring, expected-to-work cases like "refs/heads/foo", etc. Those are covered all over the test suite both explicitly (for-each-ref's refname:short) and implicitly (in the output of git-status, etc). Reported-by: 孟子易 <mengziyi540841@gmail.com> Helped-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-02-15 16:16:21 +01:00
/*
 * Check that the string refname matches a rule of the form
 * "{prefix}%.*s{suffix}". So "foo/bar/baz" would match the rule
 * "foo/%.*s/baz", and return the string "bar".
 */
static const char *match_parse_rule(const char *refname, const char *rule,
				    size_t *len)
{
	/*
	 * Check that rule matches refname up to the first percent in the rule.
	 * We can bail immediately if not, but otherwise we leave "rule" at the
	 * %-placeholder, and "refname" at the start of the potential matched
	 * name.
	 */
	while (*rule != '%') {
		if (!*rule)
			BUG("rev-parse rule did not have percent");
		if (*refname++ != *rule++)
			return NULL;
	}

	/*
	 * Check that our "%" is the expected placeholder. This assumes there
	 * are no other percents (placeholder or quoted) in the string, but
	 * that is sufficient for our rev-parse rules.
	 */
	if (!skip_prefix(rule, "%.*s", &rule))
		return NULL;

	/*
	 * And now check that our suffix (if any) matches.
	 */
	if (!strip_suffix(refname, rule, len))
		return NULL;

	return refname; /* len set by strip_suffix() */
}
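
The matching strategy of match_parse_rule() can be tried outside of git with a small standalone sketch. This is not git's code: sketch_match_rule() is a hypothetical name, it uses only libc instead of git's skip_prefix() and strip_suffix() helpers, it locates the placeholder with strstr() rather than scanning for the first '%', and it returns NULL instead of calling BUG() when the rule has no "%.*s".

```c
#include <stddef.h>
#include <string.h>

/*
 * Standalone sketch of the prefix/placeholder/suffix match (hypothetical
 * helper, libc only). On success, returns a pointer into refname at the
 * start of the matched name and stores its length in *len.
 */
static const char *sketch_match_rule(const char *refname, const char *rule,
				     size_t *len)
{
	const char *placeholder = strstr(rule, "%.*s");
	const char *suffix;
	size_t prefix_len, suffix_len, rest_len;

	if (!placeholder)
		return NULL; /* assumption: the real code treats this as a BUG() */

	/* the bytes before the placeholder must match exactly */
	prefix_len = placeholder - rule;
	if (strncmp(refname, rule, prefix_len))
		return NULL;
	refname += prefix_len;

	/* the bytes after the placeholder must match the end of refname */
	suffix = placeholder + strlen("%.*s");
	suffix_len = strlen(suffix);
	rest_len = strlen(refname);
	if (rest_len < suffix_len ||
	    memcmp(refname + rest_len - suffix_len, suffix, suffix_len))
		return NULL;

	/* no copy is made; the caller gets a ptr/len pair */
	*len = rest_len - suffix_len;
	return refname;
}
```

With the rule "refs/remotes/%.*s/HEAD" and the refname "refs/remotes/origin/HEAD", this returns a pointer to "origin/HEAD" with a length of 6, i.e. just "origin", which is the case the greedy sscanf("%s") could not produce.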
char *refs_shorten_unambiguous_ref(struct ref_store *refs,
				   const char *refname, int strict)
{
	int i;
	struct strbuf resolved_buf = STRBUF_INIT;

	/* skip first rule, it will always match */
shorten_unambiguous_ref(): use NUM_REV_PARSE_RULES constant

The ref_rev_parse_rules[] array is terminated with a NULL entry, and we count it and store the result in the local nr_rules variable. But we don't need to do so; since the array is a constant, we can compute its size directly.

The original code probably didn't do that because it was written as part of for-each-ref, and saw the array only as a pointer. It was migrated in 7c2b3029df (make get_short_ref a public function, 2009-04-07) and could have been updated then, but that subtlety was not noticed.

We even have a constant that represents this value already, courtesy of 60650a48c0 (remote: make refspec follow the same disambiguation rule as local refs, 2018-08-01), though again, nobody noticed at the time that it could be used here, too.

The current count-up isn't a big deal, as we need to preprocess that array anyway. But it will become more cumbersome as we refactor the shortening code. So let's get rid of it and just use the constant everywhere.

Note that there are two things here that aren't just simple text replacements:

1. We also use nr_rules to see if a previous call has initialized the static pre-processing variables. We can just use the scanf_fmts pointer to do the same thing, as it is non-NULL only after we've done that initialization.

2. If nr_rules is zero after we've counted it up, we bail from the function. This code is unreachable, though, as the set of rules is hard-coded and non-empty. And that becomes even more apparent now that we are using the constant. So we can drop this conditional completely (and ironically, the code would have the same output if it _did_ trigger, as we'd simply skip the loop entirely and return the whole refname).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-02-15 16:16:18 +01:00
	for (i = NUM_REV_PARSE_RULES - 1; i > 0 ; --i) {
		int j;
		int rules_to_fail = i;
		const char *short_name;
shorten_unambiguous_ref(): avoid integer truncation

We parse the shortened name "foo" out of the full refname "refs/heads/foo", and then assign the result of strlen(short_name) to an int, which may truncate or wrap to negative.

In practice, this should never happen, as it requires a 2GB refname. And even somebody trying to do something malicious should at worst end up with a confused answer (we use the size only to feed back as a placeholder length to strbuf_addf() to see if there are any collisions in the lookup rules).

And it may even be impossible to trigger this, as we parse the string with sscanf(), and stdio formatting functions are not known for handling large strings well. I didn't test, but I wouldn't be surprised if sscanf() on many platforms simply reports no match here.

But even if it is not a problem in practice so far, it is worth fixing for two reasons:

1. We'll shortly be replacing the sscanf() call with a real parser which will handle arbitrary-sized strings.

2. Assigning strlen() to an int is an anti-pattern that requires people to look twice when auditing for real overflow problems.

So we'll make this a size_t. Unfortunately we still have to cast to int eventually for the strbuf_addf() call, but at least we can localize the cast there, and check that it will be valid. I used our new cast helper here, which will just bail completely. That should be OK, as anybody with a 2GB refname is up to no good, but if we really wanted to, we could detect it manually and just refuse to shorten the refname.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-02-15 16:16:14 +01:00
		size_t short_name_len;

		short_name = match_parse_rule(refname, ref_rev_parse_rules[i],
					      &short_name_len);
		if (!short_name)
			continue;

		/*
		 * in strict mode, all (except the matched one) rules
		 * must fail to resolve to a valid non-ambiguous ref
		 */
		if (strict)
			rules_to_fail = NUM_REV_PARSE_RULES;

		/*
		 * check if the short name resolves to a valid ref,
		 * but use only rules prior to the matched one
		 */
		for (j = 0; j < rules_to_fail; j++) {
			const char *rule = ref_rev_parse_rules[j];

			/* skip matched rule */
			if (i == j)
				continue;

			/*
			 * the short name is ambiguous, if it resolves
			 * (with this previous rule) to a valid ref;
			 * refs_ref_exists() returns non-zero if the ref exists
			 */
			strbuf_reset(&resolved_buf);
			strbuf_addf(&resolved_buf, rule,
				    cast_size_t_to_int(short_name_len),
				    short_name);
			if (refs_ref_exists(refs, resolved_buf.buf))
				break;
		}

		/*
		 * short name is non-ambiguous if all previous rules
		 * haven't resolved to a valid ref
		 */
		if (j == rules_to_fail) {
			strbuf_release(&resolved_buf);
			return xmemdupz(short_name, short_name_len);
		}
	}

	strbuf_release(&resolved_buf);
	return xstrdup(refname);
}
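The manual parsing described in the commit message above (match the bytes before and after the single "%.*s" placeholder, and hand back a pointer/length pair instead of copying) can be sketched roughly as follows. This is an illustrative sketch only, not git's actual helper: the name `match_rule_sketch` and its signature are assumptions made for this example, and it uses only the standard C library.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not git's real function): match `refname`
 * against `rule`, which is assumed to contain exactly one "%.*s"
 * placeholder. On success, return a pointer into `refname` where the
 * placeholder content starts and store its length in `*len`.
 * Return NULL if the literal prefix or suffix does not match.
 */
static const char *match_rule_sketch(const char *refname, const char *rule,
				     size_t *len)
{
	static const char placeholder[] = "%.*s";
	const char *percent = strstr(rule, placeholder);
	const char *suffix;
	size_t prefix_len, suffix_len, ref_len;

	/* Assumption: every rule carries the placeholder. */
	assert(percent != NULL);

	prefix_len = (size_t)(percent - rule);
	suffix = percent + strlen(placeholder);
	suffix_len = strlen(suffix);
	ref_len = strlen(refname);

	/* The refname must be long enough to hold both literal parts. */
	if (ref_len < prefix_len + suffix_len)
		return NULL;
	/* Bytes before the placeholder must match exactly. */
	if (memcmp(refname, rule, prefix_len))
		return NULL;
	/* Bytes after the placeholder must match the end of refname. */
	if (memcmp(refname + ref_len - suffix_len, suffix, suffix_len))
		return NULL;

	*len = ref_len - prefix_len - suffix_len;
	return refname + prefix_len;
}
```

Because the suffix is anchored at the end of the refname, a rule such as "refs/remotes/%.*s/HEAD" yields "origin" for "refs/remotes/origin/HEAD" in this sketch, which is the case the greedy "%s" of sscanf() could not handle. The caller copies the result only when it actually returns a short name.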
upload/receive-pack: allow hiding ref hierarchies

A repository may have refs that are only used for its internal bookkeeping purposes that should not be exposed to the others that come over the network.

Teach upload-pack to omit some refs from its initial advertisement by paying attention to the uploadpack.hiderefs multi-valued configuration variable. Do the same to receive-pack via the receive.hiderefs variable. As a convenient short-hand, allow using transfer.hiderefs to set the value to both of these variables.

Any ref that is under the hierarchies listed in the values of these variables is excluded from responses to requests made by "ls-remote", "fetch", etc. (for upload-pack) and "push" (for receive-pack).

Because these hidden refs do not count as OUR_REF, an attempt to fetch objects at the tip of them will be rejected, and because these refs do not get advertised, "git push :" will not see local branches that have the same name as them as "matching" ones to be sent.

An attempt to update/delete these hidden refs with an explicit refspec, e.g. "git push origin :refs/hidden/22", is rejected. This is not a new restriction. To the pusher, it would appear that there is no such ref, so its push request will conclude with "Now that I sent you all the data, it is time for you to update the refs. I saw that the ref did not exist when I started pushing, and I want the result to point at this commit". The receiving end will apply the compare-and-swap rule to this request and reject the push with "Well, your update request conflicts with somebody else; I see there is such a ref.", which is the right thing to do. Otherwise a push to a hidden ref will always be "the last one wins", which is not a good default.

Signed-off-by: Junio C Hamano <gitster@pobox.com>

2013-01-19 01:08:30 +01:00
char *shorten_unambiguous_ref(const char *refname, int strict)
{
	return refs_shorten_unambiguous_ref(get_main_ref_store(the_repository),
					    refname, strict);
}

int parse_hide_refs_config(const char *var, const char *value, const char *section,
			   struct string_list *hide_refs)
{
	const char *key;
	if (!strcmp("transfer.hiderefs", var) ||
	    (!parse_config_key(var, section, NULL, NULL, &key) &&
	     !strcmp(key, "hiderefs"))) {
		char *ref;
		int len;

		if (!value)
			return config_error_nonbool(var);
		ref = xstrdup(value);
		len = strlen(ref);
		while (len && ref[len - 1] == '/')
			ref[--len] = '\0';
		string_list_append_nodup(hide_refs, ref);
	}
	return 0;
}

int ref_is_hidden(const char *refname, const char *refname_full,
		  const struct string_list *hide_refs)
{
refs: support negative transfer.hideRefs

If you hide a hierarchy of refs using the transfer.hideRefs config, there is no way to later override that config to "unhide" it. This patch implements a "negative" hide which causes matches to immediately be marked as unhidden, even if another match would hide it. We take care to apply the matches in reverse-order from how they are fed to us by the config machinery, as that lets our usual "last one wins" config precedence work (and entries in .git/config, for example, will override /etc/gitconfig).

So you can now do:

    $ git config --system transfer.hideRefs refs/secret
    $ git config transfer.hideRefs '!refs/secret/not-so-secret'

to hide refs/secret in all repos, except for one public bit in one specific repo. Or you can even do:

    $ git clone \
        -u "git -c transfer.hiderefs="!refs/foo" upload-pack" \
        remote:repo.git

to clone remote:repo.git, overriding any hiding it has configured.

There are two alternatives that were considered and rejected:

1. A generic config mechanism for removing an item from a list, e.g., "[transfer] hideRefs -= refs/foo". This is nice because it could apply to other multi-valued config, as well. But it is not nearly as flexible. There is no way to say:

       [transfer]
       hideRefs = refs/secret
       hideRefs = refs/secret/not-so-secret

   Having explicit negative specifications means we can override previous entries, even if they are not the same literal string.

2. Adding another variable to override some parts of hideRefs (e.g., "exposeRefs"). This solves the problem from alternative (1), but it cannot easily obey the normal config precedence, because it would use two separate lists. For example:

       [transfer]
       hideRefs = refs/secret
       exposeRefs = refs/secret/not-so-secret
       hideRefs = refs/secret/not-so-secret/no-really-its-secret

   With two lists, we have to apply the "expose" rules first, and only then apply the "hide" rules. But that does not match what the above config intends.

   Of course we could internally parse that to a single list, respecting the ordering, which saves us having to invent the new "!" syntax. But using a single name communicates to the user that the ordering _is_ important. And "!" is well-known for negation, and should not appear at the beginning of a ref (it is actually valid in a ref-name, but all entries here should be fully-qualified, starting with "refs/").

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

2015-07-28 22:23:26 +02:00
	int i;
	for (i = hide_refs->nr - 1; i >= 0; i--) {
		const char *match = hide_refs->items[i].string;
		const char *subject;
		int neg = 0;
		const char *p;
		if (*match == '!') {
			neg = 1;
			match++;
		}

		if (*match == '^') {
			subject = refname_full;
			match++;
		} else {
			subject = refname;
		}

		/* refname can be NULL when namespaces are used. */
		if (subject &&
		    skip_prefix(subject, match, &p) &&
		    (!*p || *p == '/'))
			return !neg;
	}
	return 0;
}
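The matching rules that ref_is_hidden() applies (walk the patterns last to first, honor a leading "!" as negation and a leading "^" as "match against the full name", and require the match to end at a path-component boundary) can be illustrated with a standalone sketch. The function name `is_hidden_sketch` and the plain array of strings are assumptions made for this example; the real code uses git's string_list and skip_prefix().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for ref_is_hidden(): `patterns` holds the
 * hideRefs values in the order the config machinery supplied them.
 * Returns 1 if the ref is hidden, 0 otherwise.
 */
static int is_hidden_sketch(const char *refname, const char *refname_full,
			    const char *const *patterns, size_t nr)
{
	size_t i;

	assert(patterns != NULL || nr == 0);

	/* Walk backwards so that the last configured entry wins. */
	for (i = nr; i-- > 0; ) {
		const char *match = patterns[i];
		const char *subject;
		size_t match_len;
		int neg = 0;

		if (*match == '!') {
			neg = 1;
			match++;
		}
		if (*match == '^') {
			subject = refname_full;
			match++;
		} else {
			subject = refname;
		}

		/* refname may be NULL, as in the original code. */
		if (!subject)
			continue;

		match_len = strlen(match);
		if (strncmp(subject, match, match_len))
			continue;

		/* Only match whole path components. */
		if (subject[match_len] == '\0' || subject[match_len] == '/')
			return !neg;
	}
	return 0;
}
```

With the patterns "refs/secret" followed by "!refs/secret/not-so-secret", this sketch reports "refs/secret/a" as hidden and "refs/secret/not-so-secret" as visible, because the later negative entry is consulted first. A name such as "refs/secrets" is not hidden, since the match must stop at the end of the name or at a "/".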
const char *find_descendant_ref(const char *dirname,
				const struct string_list *extras,
				const struct string_list *skip)
{
	int pos;

	if (!extras)
		return NULL;

	/*
	 * Look at the place where dirname would be inserted into
	 * extras. If there is an entry at that position that starts
	 * with dirname (remember, dirname includes the trailing
	 * slash) and is not in skip, then we have a conflict.
	 */
	for (pos = string_list_find_insert_index(extras, dirname, 0);
	     pos < extras->nr; pos++) {
		const char *extra_refname = extras->items[pos].string;

		if (!starts_with(extra_refname, dirname))
			break;

		if (!skip || !string_list_has_string(skip, extra_refname))
			return extra_refname;
	}
	return NULL;
}
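The lookup in find_descendant_ref() relies on the extras list being sorted: every entry that starts with dirname sits in one contiguous run beginning at the position where dirname itself would be inserted. A minimal standalone sketch of that idea follows. The names `insert_index_sketch` and `find_descendant_sketch` are hypothetical, plain sorted arrays stand in for git's string_list, and the skip list is searched linearly here purely for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Lower bound: first index whose entry is not less than `key`. */
static size_t insert_index_sketch(const char *const *sorted, size_t nr,
				  const char *key)
{
	size_t lo = 0, hi = nr;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (strcmp(sorted[mid], key) < 0)
			lo = mid + 1;
		else
			hi = mid;
	}
	return lo;
}

/* Linear membership test; an assumption made to keep the sketch short. */
static int in_list_sketch(const char *const *list, size_t nr, const char *s)
{
	size_t i;

	for (i = 0; i < nr; i++)
		if (!strcmp(list[i], s))
			return 1;
	return 0;
}

/*
 * Return the first entry of `extras` (sorted with strcmp) that lies
 * under `dirname` and is not in `skip`, or NULL if there is none.
 * `dirname` must end with a slash.
 */
static const char *find_descendant_sketch(const char *dirname,
					  const char *const *extras, size_t nr,
					  const char *const *skip, size_t skip_nr)
{
	size_t dir_len = strlen(dirname);
	size_t pos;

	assert(dir_len > 0 && dirname[dir_len - 1] == '/');

	for (pos = insert_index_sketch(extras, nr, dirname); pos < nr; pos++) {
		/* Past the run of entries sharing the prefix: stop. */
		if (strncmp(extras[pos], dirname, dir_len))
			break;
		if (!in_list_sketch(skip, skip_nr, extras[pos]))
			return extras[pos];
	}
	return NULL;
}
```

The trailing slash in dirname matters: it keeps "refs/heads/foobar" from being treated as a descendant of "refs/heads/foo/", and it is also why the scan can stop at the first entry that lacks the prefix.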

int refs_head_ref(struct ref_store *refs, each_ref_fn fn, void *cb_data)
{
	struct object_id oid;
	int flag;

	if (refs_resolve_ref_unsafe(refs, "HEAD", RESOLVE_REF_READING,
refs API: remove "failure_errno" from refs_resolve_ref_unsafe()

Remove the now-unused "failure_errno" parameter from the refs_resolve_ref_unsafe() signature. In my recent 96f6623ada0 (Merge branch 'ab/refs-errno-cleanup', 2021-11-29) series we made all of its callers explicitly request the errno via an output parameter.

As that series shows, all but one caller ended up passing in a boilerplate "ignore_errno", since they only cared about whether the return value was NULL or not, i.e. if the ref could be resolved.

There was one small issue with that series, fixed with a follow-up in 31e39123695 (Merge branch 'ab/refs-errno-cleanup', 2022-01-14).

After those two there was one caller left in sequencer.c that used the "failure_errno", but as of the preceding commit it uses a boilerplate "ignore_errno" instead.

This leaves the public refs API without any use of "failure_errno" at all. We could still do with a bit of cleanup and generalization between refs.c and refs/files-backend.c before the "reftable" integration lands, but that's all internal to the reference code itself.

So let's remove this output parameter. Not only isn't it used now, but it's unlikely that we'll want it again in the future. We'd like to slowly move the refs API to a more file-backend independent way of communicating error codes; having it use a "failure_errno" was only the first step in that direction. If this or any other function needs to communicate what specifically is wrong with the requested "refname", it'll be better to have the function set some output enum of well-defined error states than piggy-back on "errno".

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

2022-01-26 15:37:01 +01:00
&oid, &flag))
return fn("HEAD", &oid, flag, cb_data);
return 0;
}
int head_ref(each_ref_fn fn, void *cb_data)
{
return refs_head_ref(get_main_ref_store(the_repository), fn, cb_data);
}
struct ref_iterator *refs_ref_iterator_begin(
struct ref_store *refs,
const char *prefix, int trim,
enum do_for_each_ref_flags flags)
{
struct ref_iterator *iter;
if (!(flags & DO_FOR_EACH_INCLUDE_BROKEN)) {
static int ref_paranoia = -1;
if (ref_paranoia < 0)
refs: turn on GIT_REF_PARANOIA by default The original point of the GIT_REF_PARANOIA flag was to include broken refs in iterations, so that possibly-destructive operations would not silently ignore them (and would generally instead try to operate on the oids and fail when the objects could not be accessed). We already turned this on by default for some dangerous operations, like "repack -ad" (where missing a reachability tip would mean dropping the associated history). But it was not on for general use, even though it could easily result in the spreading of corruption (e.g., imagine cloning a repository which simply omits some of its refs because their objects are missing; the result quietly succeeds even though you did not clone everything!). This patch turns on GIT_REF_PARANOIA by default. So a clone as mentioned above would actually fail (upload-pack tells us about the broken ref, and when we ask for the objects, pack-objects fails to deliver them). This may be inconvenient when working with a corrupted repository, but: - we are better off to err on the side of complaining about corruption, and then provide mechanisms for explicitly loosening safety. - this is only one type of corruption anyway. If we are missing any other objects in the history that _aren't_ ref tips, then we'd behave similarly (happily show the ref, but then barf when we started traversing). We retain the GIT_REF_PARANOIA variable, but simply default it to "1" instead of "0". That gives the user an escape hatch for loosening this when working with a corrupt repository. It won't work across a remote connection to upload-pack (because we can't necessarily set environment variables on the remote), but there the client has other options (e.g., choosing which refs to fetch). As a bonus, this also makes ref iteration faster in general (because we don't have to call has_object_file() for each ref), though probably not noticeably so in the general case. 
In a repo with a million refs, it shaved a few hundred milliseconds off of upload-pack's advertisement; that's noticeable, but most repos are not nearly that large. The possible downside here is that any operation which iterates refs but doesn't ever open their objects may now quietly claim to have X when the object is corrupted (e.g., "git rev-list new-branch --not --all" will treat a broken ref as uninteresting). But again, that's not really any different than corruption below the ref level. We might have refs/heads/old-branch as non-corrupt, but we are not actively checking that we have the entire reachable history. Or the pointed-to object could even be corrupted on-disk (but our "do we have it" check would still succeed). In that sense, this is merely bringing ref-corruption in line with general object corruption. One alternative implementation would be to actually check for broken refs, and then _immediately die_ if we see any. That would cause the "rev-list --not --all" case above to abort immediately. But in many ways that's the worst of all worlds: - it still spends time looking up the objects an extra time - it still doesn't catch corruption below the ref level - it's even more inconvenient; with the current implementation of GIT_REF_PARANOIA for something like upload-pack, we can make the advertisement and let the client choose a non-broken piece of history. If we bail as soon as we see a broken ref, they cannot even see the advertisement. The test changes here show some of the fallout. A non-destructive "git repack -adk" now fails by default (but we can override it). Deleting a broken ref now actually tells the hooks the correct "before" state, rather than a confusing null oid. Signed-off-by: Jeff King <peff@peff.net> Reviewed-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-24 20:46:13 +02:00
ref_paranoia = git_env_bool("GIT_REF_PARANOIA", 1);
if (ref_paranoia) {
flags |= DO_FOR_EACH_INCLUDE_BROKEN;
flags |= DO_FOR_EACH_OMIT_DANGLING_SYMREFS;
}
}
iter = refs->be->iterator_begin(refs, prefix, flags);
/*
* `iterator_begin()` already takes care of prefix, but we
* might need to do some trimming:
*/
if (trim)
iter = prefix_ref_iterator_begin(iter, "", trim);
/* Sanity check for subclasses: */
if (!iter->ordered)
BUG("reference iterator is not ordered");
return iter;
}
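The default of 1 passed to git_env_bool() above means ref paranoia stays on unless the environment explicitly turns it off. A minimal sketch of that "boolean environment variable with a default" pattern follows. It assumes two hypothetical helpers, `parse_bool_sketch()` and `env_bool_sketch()`, which are simplified stand-ins and not Git's own parser (Git's accepts more spellings and rejects invalid values):

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper, for illustration only: return def when value
 * is NULL (variable unset), 0 for a few "false" spellings, and 1 for
 * anything else.
 */
static int parse_bool_sketch(const char *value, int def)
{
	if (!value)
		return def;
	if (!*value || !strcmp(value, "0") || !strcmp(value, "false") ||
	    !strcmp(value, "no") || !strcmp(value, "off"))
		return 0;
	return 1;
}

/* Hypothetical helper: look the variable up, then parse it as above. */
static int env_bool_sketch(const char *name, int def)
{
	assert(name);
	return parse_bool_sketch(getenv(name), def);
}
```

With a default of 1, only an explicit setting such as `GIT_REF_PARANOIA=0` disables the behaviour; leaving the variable unset keeps it enabled.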
do_for_each_ref(): reimplement using reference iteration Use the reference iterator interface to implement do_for_each_ref(). Delete a bunch of code supporting the old for_each_ref() implementation. And now that do_for_each_ref() is generic code (it is no longer tied to the files backend), move it to refs.c. The implementation is via a new function, do_for_each_ref_iterator(), which takes a reference iterator as argument and calls a callback function for each of the references in the iterator. This change requires the current_ref performance hack for peel_ref() to be implemented via ref_iterator_peel() rather than peel_entry() because we don't have a ref_entry handy (it is hidden under three layers: file_ref_iterator, merge_ref_iterator, and cache_ref_iterator). So: * do_for_each_ref_iterator() records the active iterator in current_ref_iter while it is running. * peel_ref() checks whether current_ref_iter is pointing at the requested reference. If so, it asks the iterator to peel the reference (which it can do efficiently via its "peel" virtual function). For extra safety, we do the optimization only if the refname *addresses* are the same, not only if the refname *strings* are the same, to forestall possible mixups between refnames that come from different ref_iterators. Please note that this optimization of peel_ref() is only available when iterating via do_for_each_ref_iterator() (including all of the for_each_ref() functions, which call it indirectly). It would be complicated to implement a similar optimization when iterating directly using a reference iterator, because multiple reference iterators can be in use at the same time, with interleaved calls to ref_iterator_advance(). (In fact we do exactly that in merge_ref_iterator.) But that is not necessary. peel_ref() is only called while iterating over references. Callers who iterate using the for_each_ref() functions benefit from the optimization described above. 
Callers who iterate using reference iterators directly have access to the ref_iterator, so they can call ref_iterator_peel() themselves to get an analogous optimization in a more straightforward manner. If we rewrite all callers to use the reference iteration API, then we can remove the current_ref_iter hack permanently. Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-06-18 06:15:16 +02:00
/*
 * Call fn for each reference in the main ref store of repository r
 * whose refname begins with prefix. If trim is non-zero, then trim that
* many characters off the beginning of each refname before passing
* the refname to fn. flags can be DO_FOR_EACH_INCLUDE_BROKEN to
* include broken references in the iteration. If fn ever returns a
* non-zero value, stop the iteration and return that value;
* otherwise, return 0.
*/
static int do_for_each_repo_ref(struct repository *r, const char *prefix,
each_repo_ref_fn fn, int trim, int flags,
void *cb_data)
{
struct ref_iterator *iter;
struct ref_store *refs = get_main_ref_store(r);
if (!refs)
return 0;
iter = refs_ref_iterator_begin(refs, prefix, trim, flags);
return do_for_each_repo_ref_iterator(r, iter, fn, cb_data);
}
struct do_for_each_ref_help {
each_ref_fn *fn;
void *cb_data;
};
static int do_for_each_ref_helper(struct repository *r,
const char *refname,
const struct object_id *oid,
int flags,
void *cb_data)
{
struct do_for_each_ref_help *hp = cb_data;
return hp->fn(refname, oid, flags, hp->cb_data);
}
static int do_for_each_ref(struct ref_store *refs, const char *prefix,
each_ref_fn fn, int trim,
enum do_for_each_ref_flags flags, void *cb_data)
{
struct ref_iterator *iter;
struct do_for_each_ref_help hp = { fn, cb_data };
if (!refs)
return 0;
iter = refs_ref_iterator_begin(refs, prefix, trim, flags);
return do_for_each_repo_ref_iterator(the_repository, iter,
do_for_each_ref_helper, &hp);
}
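do_for_each_ref() bridges two callback signatures: it packs the caller's each_ref_fn and its data into a small struct and hands a helper with the wider, repository-aware signature to the iterator. A stand-alone sketch of that adapter pattern follows. All names in it (`simple_fn`, `wide_fn`, `struct adapter`, `iterate_sketch`, `for_each_name_sketch`, `count_cb`) are hypothetical simplifications, not Git's types:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical narrow callback: no context argument. */
typedef int (*simple_fn)(const char *name, void *cb_data);
/* Hypothetical wide callback: receives an extra context pointer. */
typedef int (*wide_fn)(void *ctx, const char *name, void *cb_data);

struct adapter {
	simple_fn fn;
	void *cb_data;
};

/* Wide-signature trampoline: drops ctx and forwards to the narrow fn. */
static int adapter_helper(void *ctx, const char *name, void *cb_data)
{
	struct adapter *ad = cb_data;

	(void)ctx;
	return ad->fn(name, ad->cb_data);
}

/* Stand-in iterator: calls fn per name, stops at the first non-zero return. */
static int iterate_sketch(void *ctx, const char **names, size_t nr,
			  wide_fn fn, void *cb_data)
{
	size_t i;

	for (i = 0; i < nr; i++) {
		int ret = fn(ctx, names[i], cb_data);
		if (ret)
			return ret;
	}
	return 0;
}

/* Narrow-signature entry point built on top of the wide iterator. */
static int for_each_name_sketch(const char **names, size_t nr,
				simple_fn fn, void *cb_data)
{
	struct adapter ad = { fn, cb_data };

	assert(fn);
	return iterate_sketch(NULL, names, nr, adapter_helper, &ad);
}

/* Example callback: count names, but abort with 7 on the name "stop". */
static int count_cb(const char *name, void *cb_data)
{
	int *count = cb_data;

	if (!strcmp(name, "stop"))
		return 7;
	(*count)++;
	return 0;
}
```

As in the real code, a non-zero return from the callback ends the iteration and becomes the return value of the entry point.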
int refs_for_each_ref(struct ref_store *refs, each_ref_fn fn, void *cb_data)
{
return do_for_each_ref(refs, "", fn, 0, 0, cb_data);
}
int for_each_ref(each_ref_fn fn, void *cb_data)
{
return refs_for_each_ref(get_main_ref_store(the_repository), fn, cb_data);
}
int refs_for_each_ref_in(struct ref_store *refs, const char *prefix,
each_ref_fn fn, void *cb_data)
{
return do_for_each_ref(refs, prefix, fn, strlen(prefix), 0, cb_data);
}
int for_each_ref_in(const char *prefix, each_ref_fn fn, void *cb_data)
{
return refs_for_each_ref_in(get_main_ref_store(the_repository), prefix, fn, cb_data);
}
int for_each_fullref_in(const char *prefix, each_ref_fn fn, void *cb_data)
{
return do_for_each_ref(get_main_ref_store(the_repository),
prefix, fn, 0, 0, cb_data);
}
int refs_for_each_fullref_in(struct ref_store *refs, const char *prefix,
each_ref_fn fn, void *cb_data)
{
return do_for_each_ref(refs, prefix, fn, 0, 0, cb_data);
}
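The only difference between the for_each_ref_in() and for_each_fullref_in() families is the trim argument: the former passes strlen(prefix), so callbacks see names relative to the prefix, while the latter passes 0 and callbacks see full refnames. A small sketch of the effect, assuming a hypothetical helper `visible_name_sketch()` that is not part of Git:

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical helper: return the name a callback would see for
 * refname when iterating under prefix with the given trim, or NULL
 * if refname does not start with prefix.
 */
static const char *visible_name_sketch(const char *refname,
				       const char *prefix, size_t trim)
{
	size_t plen = strlen(prefix);

	assert(trim <= plen);
	if (strncmp(refname, prefix, plen))
		return NULL;
	return refname + trim;
}
```

For example, with prefix "refs/heads/" and refname "refs/heads/main", a trim of strlen(prefix) yields "main", and a trim of 0 yields the unchanged "refs/heads/main".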
int for_each_replace_ref(struct repository *r, each_repo_ref_fn fn, void *cb_data)
{
const char *git_replace_ref_base = ref_namespace[NAMESPACE_REPLACE].ref;
return do_for_each_repo_ref(r, git_replace_ref_base, fn,
strlen(git_replace_ref_base),
DO_FOR_EACH_INCLUDE_BROKEN, cb_data);
}
int for_each_namespaced_ref(each_ref_fn fn, void *cb_data)
{
struct strbuf buf = STRBUF_INIT;
int ret;
strbuf_addf(&buf, "%srefs/", get_git_namespace());
ret = do_for_each_ref(get_main_ref_store(the_repository),
buf.buf, fn, 0, 0, cb_data);
strbuf_release(&buf);
return ret;
}
int refs_for_each_rawref(struct ref_store *refs, each_ref_fn fn, void *cb_data)
{
return do_for_each_ref(refs, "", fn, 0,
DO_FOR_EACH_INCLUDE_BROKEN, cb_data);
}
int for_each_rawref(each_ref_fn fn, void *cb_data)
{
return refs_for_each_rawref(get_main_ref_store(the_repository), fn, cb_data);
}
static int qsort_strcmp(const void *va, const void *vb)
{
const char *a = *(const char **)va;
const char *b = *(const char **)vb;
return strcmp(a, b);
}
static void find_longest_prefixes_1(struct string_list *out,
struct strbuf *prefix,
const char **patterns, size_t nr)
{
size_t i;
for (i = 0; i < nr; i++) {
char c = patterns[i][prefix->len];
if (!c || is_glob_special(c)) {
string_list_append(out, prefix->buf);
return;
}
}
i = 0;
while (i < nr) {
size_t end;
/*
* Set "end" to the index of the element _after_ the last one
* in our group.
*/
for (end = i + 1; end < nr; end++) {
if (patterns[i][prefix->len] != patterns[end][prefix->len])
break;
}
strbuf_addch(prefix, patterns[i][prefix->len]);
find_longest_prefixes_1(out, prefix, patterns + i, end - i);
strbuf_setlen(prefix, prefix->len - 1);
i = end;
}
}
static void find_longest_prefixes(struct string_list *out,
const char **patterns)
{
struct strvec sorted = STRVEC_INIT;
struct strbuf prefix = STRBUF_INIT;
strvec_pushv(&sorted, patterns);
QSORT(sorted.v, sorted.nr, qsort_strcmp);
find_longest_prefixes_1(out, &prefix, sorted.v, sorted.nr);
strvec_clear(&sorted);
strbuf_release(&prefix);
}
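find_longest_prefixes() sorts the patterns, then recursively groups them by their next character until some pattern in a group ends or reaches a glob character; the text accumulated so far is emitted as one prefix for that whole group. The following is a self-contained sketch of the same idea using only the C standard library. The fixed-size `struct prefix_list`, the limits `MAX_OUT` and `MAX_LEN`, and the helper names are assumptions made for illustration, and `is_glob_char()` is a simplified stand-in for is_glob_special():

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_OUT 16	/* hypothetical cap on emitted prefixes */
#define MAX_LEN 256	/* hypothetical cap on prefix length */

struct prefix_list {
	char items[MAX_OUT][MAX_LEN];
	size_t nr;
};

/* Simplified stand-in for is_glob_special(). */
static int is_glob_char(char c)
{
	return c == '*' || c == '?' || c == '[' || c == '\\';
}

static int cmp_str(const void *va, const void *vb)
{
	const char *a = *(const char *const *)va;
	const char *b = *(const char *const *)vb;
	return strcmp(a, b);
}

/* pats must be sorted and must all share the first len bytes, held in buf. */
static void longest_prefixes_1(struct prefix_list *out, char *buf, size_t len,
			       const char **pats, size_t nr)
{
	size_t i;

	for (i = 0; i < nr; i++) {
		char c = pats[i][len];
		if (!c || is_glob_char(c)) {
			/* A pattern ends or globs here: emit once for the group. */
			assert(out->nr < MAX_OUT);
			memcpy(out->items[out->nr], buf, len);
			out->items[out->nr][len] = '\0';
			out->nr++;
			return;
		}
	}

	i = 0;
	while (i < nr) {
		size_t end;

		/* "end" is the index just past the run sharing the next byte. */
		for (end = i + 1; end < nr; end++)
			if (pats[i][len] != pats[end][len])
				break;
		assert(len + 1 < MAX_LEN);
		buf[len] = pats[i][len];
		longest_prefixes_1(out, buf, len + 1, pats + i, end - i);
		i = end;
	}
}

static void longest_prefixes_sketch(struct prefix_list *out,
				    const char **pats, size_t nr)
{
	char buf[MAX_LEN];
	const char **sorted;

	out->nr = 0;
	if (!nr)
		return;
	sorted = malloc(nr * sizeof(*sorted));
	assert(sorted);
	memcpy(sorted, pats, nr * sizeof(*sorted));
	qsort(sorted, nr, sizeof(*sorted), cmp_str);
	longest_prefixes_1(out, buf, 0, sorted, nr);
	free(sorted);
}
```

Given the patterns "refs/tags/v1" and "refs/heads/*", the sketch emits "refs/heads/" and "refs/tags/v1". Given "refs/heads/main" and "refs/heads/*", it emits only "refs/heads/", because the glob already covers the more specific pattern.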
int refs_for_each_fullref_in_prefixes(struct ref_store *ref_store,
const char *namespace,
const char **patterns,
each_ref_fn fn, void *cb_data)
{
struct string_list prefixes = STRING_LIST_INIT_DUP;
struct string_list_item *prefix;
struct strbuf buf = STRBUF_INIT;
int ret = 0, namespace_len;
find_longest_prefixes(&prefixes, patterns);
if (namespace)
strbuf_addstr(&buf, namespace);
namespace_len = buf.len;
for_each_string_list_item(prefix, &prefixes) {
strbuf_addstr(&buf, prefix->string);
ret = refs_for_each_fullref_in(ref_store, buf.buf, fn, cb_data);
if (ret)
break;
strbuf_setlen(&buf, namespace_len);
}
string_list_clear(&prefixes, 0);
strbuf_release(&buf);
return ret;
}
static int refs_read_special_head(struct ref_store *ref_store,
const char *refname, struct object_id *oid,
struct strbuf *referent, unsigned int *type,
int *failure_errno)
{
struct strbuf full_path = STRBUF_INIT;
struct strbuf content = STRBUF_INIT;
int result = -1;
strbuf_addf(&full_path, "%s/%s", ref_store->gitdir, refname);
if (strbuf_read_file(&content, full_path.buf, 0) < 0)
goto done;
result = parse_loose_ref_contents(content.buf, oid, referent, type,
failure_errno);
done:
strbuf_release(&full_path);
strbuf_release(&content);
return result;
}
int refs_read_raw_ref(struct ref_store *ref_store, const char *refname,
struct object_id *oid, struct strbuf *referent,
unsigned int *type, int *failure_errno)
{
assert(failure_errno);
if (!strcmp(refname, "FETCH_HEAD") || !strcmp(refname, "MERGE_HEAD")) {
return refs_read_special_head(ref_store, refname, oid, referent,
type, failure_errno);
}
return ref_store->be->read_raw_ref(ref_store, refname, oid, referent,
type, failure_errno);
}
refs: add ability for backends to special-case reading of symbolic refs Reading of symbolic and non-symbolic references is currently treated the same in reference backends: we always call `refs_read_raw_ref()` and then decide based on the returned flags what type it is. This has one downside though: symbolic references may be treated different from normal references in a backend from normal references. The packed-refs backend for example doesn't even know about symbolic references, and as a result it is pointless to even ask it for one. There are cases where we really only care about whether a reference is symbolic or not, but don't care about whether it exists at all or may be a non-symbolic reference. But it is not possible to optimize for this case right now, and as a consequence we will always first check for a loose reference to exist, and if it doesn't, we'll query the packed-refs backend for a known-to-not-be-symbolic reference. This is inefficient and requires us to search all packed references even though we know to not care for the result at all. Introduce a new function `refs_read_symbolic_ref()` which allows us to fix this case. This function will only ever return symbolic references and can thus optimize for the scenario layed out above. By default, if the backend doesn't provide an implementation for it, we just use the old code path and fall back to `read_raw_ref()`. But in case the backend provides its own, more efficient implementation, we will use that one instead. Note that this function is explicitly designed to not distinguish between missing references and non-symbolic references. If it did, we'd be forced to always search the packed-refs backend to see whether the symbolic reference the user asked for really doesn't exist, or if it exists as a non-symbolic reference. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-03-01 10:33:46 +01:00
int refs_read_symbolic_ref(struct ref_store *ref_store, const char *refname,
struct strbuf *referent)
{
return ref_store->be->read_symbolic_ref(ref_store, refname, referent);
}
const char *refs_resolve_ref_unsafe(struct ref_store *refs,
const char *refname,
int resolve_flags,
struct object_id *oid,
int *flags)
{
static struct strbuf sb_refname = STRBUF_INIT;
struct object_id unused_oid;
int unused_flags;
int symref_count;
if (!oid)
oid = &unused_oid;
if (!flags)
flags = &unused_flags;
*flags = 0;
if (check_refname_format(refname, REFNAME_ALLOW_ONELEVEL)) {
if (!(resolve_flags & RESOLVE_REF_ALLOW_BAD_NAME) ||
!refname_is_safe(refname))
return NULL;
/*
* repo_dwim_ref() uses REF_ISBROKEN to distinguish between
* missing refs and refs that were present but invalid,
* to complain about the latter to stderr.
*
* We don't know whether the ref exists, so don't set
* REF_ISBROKEN yet.
*/
*flags |= REF_BAD_NAME;
}
for (symref_count = 0; symref_count < SYMREF_MAXDEPTH; symref_count++) {
unsigned int read_flags = 0;
int failure_errno;
if (refs_read_raw_ref(refs, refname, oid, &sb_refname,
&read_flags, &failure_errno)) {
*flags |= read_flags;
refs_resolve_ref_unsafe: handle d/f conflicts for writes

If our call to refs_read_raw_ref() fails, we check errno to see if the ref is simply missing, or if we encountered a more serious error. If it's just missing, then in "write" mode (i.e., when RESOLVE_REF_READING is not set), this is perfectly fine.

However, checking for ENOENT isn't sufficient to catch all missing-ref cases. In the filesystem backend, we may also see EISDIR when we try to resolve "a" and "a/b" exists. Likewise, we may see ENOTDIR if we try to resolve "a/b" and "a" exists. In both of those cases, we know that our resolved ref doesn't exist, but we return an error (rather than reporting the refname and returning a null sha1).

This has been broken for a long time, but nobody really noticed because the next step after resolving without the READING flag is usually to lock the ref and write it. But in both of those cases, the write will fail with the same errno due to the directory/file conflict. There are two cases where we can notice this, though:

  1. If we try to write "a" and there's a leftover directory already at "a", even though there is no ref "a/b". The actual write is smart enough to move the empty "a" out of the way.

     This is reasonably rare, if only because the writing code has to do an independent resolution before trying its write (because the actual update_ref() code handles this case fine). The notes-merge code does this, and before the fix in the prior commit t3308 erroneously expected this case to fail.

  2. When resolving symbolic refs, we typically do not use the READING flag because we want to resolve even symrefs that point to unborn refs. Even if those unborn refs could not actually be written because of d/f conflicts with existing refs.

     You can see this by asking "git symbolic-ref" to report the target of a symref pointing past a d/f conflict.

We can fix the problem by recognizing the other "missing" errnos and treating them like ENOENT. This should be safe to do even for callers who are then going to actually write the ref, because the actual writing process will fail if the d/f conflict is a real one (and t1404 checks these cases).

Arguably this should be the responsibility of the files-backend to normalize all "missing ref" errors into ENOENT (since something like EISDIR may not be meaningful at all to a database backend). However other callers of refs_read_raw_ref() may actually care about the distinction; putting this into resolve_ref() is the minimal fix for now.

The new tests in t1401 use git-symbolic-ref, which is the most direct way to check the resolution by itself. Interestingly we actually had a test that set up this case already, but we only used it to verify that the funny state could be overwritten, not that it could be resolved.

We also add a new test in t3200, as "branch -m" was the original motivation for looking into this. What happens is this:

  0. HEAD is pointing to branch "a"
  1. The user asks to rename "a" to "a/b".
  2. We create "a/b" and delete "a".
  3. We then try to update any worktree HEADs that point to the renamed ref (including the main repo HEAD). To do that, we have to resolve each HEAD. But now our HEAD is pointing at "a", and we get EISDIR due to the loose "a/b". As a result, we think there is no HEAD, and we do not update it. It now points to the bogus "a".

Interestingly this case used to work, but only accidentally. Before 31824d180d (branch: fix branch renaming not updating HEADs correctly, 2017-08-24), we'd update any HEAD which we couldn't resolve. That was wrong, but it papered over the fact that we were incorrectly failing to resolve HEAD. So while the bug demonstrated by the git-symbolic-ref is quite old, the regression to "branch -m" is recent.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-10-06 16:42:17 +02:00
/* In reading mode, refs must eventually resolve */
if (resolve_flags & RESOLVE_REF_READING)
return NULL;
/*
* Otherwise a missing ref is OK. But the files backend
* may show errors besides ENOENT if there are
* similarly-named refs.
*/
if (failure_errno != ENOENT &&
failure_errno != EISDIR &&
failure_errno != ENOTDIR)
return NULL;
oidclr(oid);
if (*flags & REF_BAD_NAME)
*flags |= REF_ISBROKEN;
return refname;
}
*flags |= read_flags;
if (!(read_flags & REF_ISSYMREF)) {
if (*flags & REF_BAD_NAME) {
oidclr(oid);
*flags |= REF_ISBROKEN;
}
return refname;
}
refname = sb_refname.buf;
if (resolve_flags & RESOLVE_REF_NO_RECURSE) {
oidclr(oid);
return refname;
}
if (check_refname_format(refname, REFNAME_ALLOW_ONELEVEL)) {
if (!(resolve_flags & RESOLVE_REF_ALLOW_BAD_NAME) ||
!refname_is_safe(refname))
return NULL;
*flags |= REF_ISBROKEN | REF_BAD_NAME;
}
}
return NULL;
}
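The resolution loop above can be illustrated in isolation. The following is a minimal sketch, not Git's implementation: it assumes a toy in-memory ref table, and every `toy_` name as well as `TOY_MAXDEPTH` is hypothetical. It shows the two ideas the loop relies on: following symbolic refs up to a fixed depth, and treating ENOENT, EISDIR and ENOTDIR alike as "the ref is simply missing".

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAXDEPTH 5 /* hypothetical stand-in for SYMREF_MAXDEPTH */

/* Hypothetical toy ref: "target" is non-NULL only for symbolic refs. */
struct toy_ref {
	const char *name;
	const char *target;
};

/* Errnos that the loop above treats as "the ref is simply missing". */
int toy_is_missing_errno(int err)
{
	return err == ENOENT || err == EISDIR || err == ENOTDIR;
}

static const struct toy_ref *toy_find(const struct toy_ref *tab, size_t n,
				      const char *name)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (!strcmp(tab[i].name, name))
			return &tab[i];
	return NULL;
}

/*
 * Follow symbolic refs until reaching a direct ref or a missing
 * (unborn) one, as in "write" mode. Returns the final refname, or
 * NULL if the chain is deeper than TOY_MAXDEPTH (e.g. a cycle).
 */
const char *toy_resolve(const struct toy_ref *tab, size_t n, const char *name)
{
	int depth;

	for (depth = 0; depth < TOY_MAXDEPTH; depth++) {
		const struct toy_ref *ref = toy_find(tab, n, name);

		if (!ref || !ref->target)
			return name;
		name = ref->target;
	}
	return NULL;
}
```

With a table in which "HEAD" points at "refs/heads/main", `toy_resolve()` returns "refs/heads/main"; a pair of refs pointing at each other exhausts the depth limit and yields NULL, mirroring the final `return NULL` above.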
/* backend functions */
int refs_init_db(struct strbuf *err)
{
struct ref_store *refs = get_main_ref_store(the_repository);
return refs->be->init_db(refs, err);
}
const char *resolve_ref_unsafe(const char *refname, int resolve_flags,
struct object_id *oid, int *flags)
{
return refs_resolve_ref_unsafe(get_main_ref_store(the_repository), refname,
resolve_flags, oid, flags);
}
int resolve_gitlink_ref(const char *submodule, const char *refname,
struct object_id *oid)
{
struct ref_store *refs;
int flags;
refs = get_submodule_ref_store(submodule);
if (!refs)
return -1;
if (!refs_resolve_ref_unsafe(refs, refname, 0, oid, &flags) ||
is_null_oid(oid))
return -1;
return 0;
}
struct ref_store_hash_entry
{
struct hashmap_entry ent;
struct ref_store *refs;
/* NUL-terminated identifier of the ref store: */
char name[FLEX_ARRAY];
};
static int ref_store_hash_cmp(const void *cmp_data UNUSED,
const struct hashmap_entry *eptr,
const struct hashmap_entry *entry_or_key,
const void *keydata)
{
const struct ref_store_hash_entry *e1, *e2;
const char *name;
e1 = container_of(eptr, const struct ref_store_hash_entry, ent);
e2 = container_of(entry_or_key, const struct ref_store_hash_entry, ent);
name = keydata ? keydata : e2->name;
return strcmp(e1->name, name);
}
static struct ref_store_hash_entry *alloc_ref_store_hash_entry(
const char *name, struct ref_store *refs)
{
struct ref_store_hash_entry *entry;
FLEX_ALLOC_STR(entry, name, name);
hashmap_entry_init(&entry->ent, strhash(name));
entry->refs = refs;
return entry;
}
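The entry above stores its name inline through a flexible array member, so one allocation holds both the struct and the string. A rough standalone sketch of that allocation pattern follows, using plain malloc() rather than Git's FLEX_ALLOC_STR macro; `toy_entry` and `toy_entry_new` are hypothetical names introduced only for illustration.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical entry: the name lives in the same allocation as the struct. */
struct toy_entry {
	void *payload;
	char name[]; /* C99 flexible array member, NUL-terminated */
};

/* Allocate an entry with room for a copy of "name"; NULL on failure. */
struct toy_entry *toy_entry_new(const char *name, void *payload)
{
	size_t len = strlen(name);
	struct toy_entry *e = malloc(sizeof(*e) + len + 1);

	if (!e)
		return NULL;
	e->payload = payload;
	memcpy(e->name, name, len + 1); /* copies the trailing NUL too */
	return e;
}
```

A single free() releases both the struct and its name, which is one reason this layout suits small, immutable keys.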
/* A hashmap of ref_stores, stored by submodule name: */
static struct hashmap submodule_ref_stores;
/* A hashmap of ref_stores, stored by worktree id: */
static struct hashmap worktree_ref_stores;
/*
* Look up a ref store by name. If that ref_store hasn't been
* registered yet, return NULL.
*/
static struct ref_store *lookup_ref_store_map(struct hashmap *map,
const char *name)
{
struct ref_store_hash_entry *entry;
unsigned int hash;
if (!map->tablesize)
/* It's initialized on demand in register_ref_store_map(). */
return NULL;
hash = strhash(name);
entry = hashmap_get_entry_from_hash(map, hash, name,
struct ref_store_hash_entry, ent);
return entry ? entry->refs : NULL;
}
/*
* Create, record, and return a ref_store instance for the specified
* gitdir.
*/
static struct ref_store *ref_store_init(struct repository *repo,
const char *gitdir,
unsigned int flags)
{
const char *be_name = "files";
struct ref_storage_be *be = find_ref_storage_backend(be_name);
struct ref_store *refs;
if (!be)
BUG("reference backend %s is unknown", be_name);
refs = be->init(repo, gitdir, flags);
return refs;
}
struct ref_store *get_main_ref_store(struct repository *r)
{
if (r->refs_private)
return r->refs_private;
if (!r->gitdir)
BUG("attempting to get main_ref_store outside of repository");
r->refs_private = ref_store_init(r, r->gitdir, REF_STORE_ALL_CAPS);
r->refs_private = maybe_debug_wrap_ref_store(r->gitdir, r->refs_private);
return r->refs_private;
}
/*
* Associate a ref store with a name. It is a fatal error to call this
* function twice for the same name.
*/
static void register_ref_store_map(struct hashmap *map,
const char *type,
struct ref_store *refs,
const char *name)
{
struct ref_store_hash_entry *entry;
if (!map->tablesize)
hashmap_init(map, ref_store_hash_cmp, NULL, 0);
entry = alloc_ref_store_hash_entry(name, refs);
if (hashmap_put(map, &entry->ent))
BUG("%s ref_store '%s' initialized twice", type, name);
}
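The lookup and registration functions above form a lazily initialized, register-once registry. The sketch below shows the same contract with a linked list instead of Git's hashmap; all `toy_` names are hypothetical, and where the real code calls BUG() on a duplicate registration, this sketch merely returns -1.

```c
#include <stdlib.h>
#include <string.h>

struct toy_node {
	struct toy_node *next;
	void *store;
	char name[]; /* NUL-terminated key */
};

/* Hypothetical registry; "head" stays NULL until the first registration. */
struct toy_registry {
	struct toy_node *head;
};

static struct toy_node *toy_find_node(const struct toy_registry *reg,
				      const char *name)
{
	struct toy_node *n;

	for (n = reg->head; n; n = n->next)
		if (!strcmp(n->name, name))
			return n;
	return NULL;
}

/* Return the store registered under "name", or NULL if there is none. */
void *toy_lookup(const struct toy_registry *reg, const char *name)
{
	struct toy_node *n = toy_find_node(reg, name);

	return n ? n->store : NULL;
}

/* Register "store" under "name"; -1 if the name is taken or on OOM. */
int toy_register(struct toy_registry *reg, const char *name, void *store)
{
	size_t len = strlen(name);
	struct toy_node *n;

	if (toy_find_node(reg, name))
		return -1;
	n = malloc(sizeof(*n) + len + 1);
	if (!n)
		return -1;
	memcpy(n->name, name, len + 1);
	n->store = store;
	n->next = reg->head;
	reg->head = n;
	return 0;
}
```

A zero-initialized `struct toy_registry` is a valid empty registry, which plays the role of the `!map->tablesize` check: a lookup before any registration simply finds nothing.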
struct ref_store *get_submodule_ref_store(const char *submodule)
{
struct strbuf submodule_sb = STRBUF_INIT;
struct ref_store *refs;
char *to_free = NULL;
size_t len;
struct repository *subrepo;
if (!submodule)
return NULL;
len = strlen(submodule);
while (len && is_dir_sep(submodule[len - 1]))
len--;
if (!len)
return NULL;
if (submodule[len])
/* We need to strip off one or more trailing slashes */
submodule = to_free = xmemdupz(submodule, len);
refs = lookup_ref_store_map(&submodule_ref_stores, submodule);
if (refs)
goto done;
strbuf_addstr(&submodule_sb, submodule);
if (!is_nonbare_repository_dir(&submodule_sb))
goto done;
if (submodule_to_gitdir(&submodule_sb, submodule))
goto done;
subrepo = xmalloc(sizeof(*subrepo));
/*
* NEEDSWORK: Make get_submodule_ref_store() work with arbitrary
* superprojects other than the_repository. This probably should be
* done by making it take a struct repository * parameter instead of a
* submodule path.
*/
if (repo_submodule_init(subrepo, the_repository, submodule,
null_oid())) {
free(subrepo);
goto done;
}
refs = ref_store_init(subrepo, submodule_sb.buf,
REF_STORE_READ | REF_STORE_ODB);
register_ref_store_map(&submodule_ref_stores, "submodule",
refs, submodule);
done:
strbuf_release(&submodule_sb);
free(to_free);
return refs;
}
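The first step of get_submodule_ref_store() normalizes the submodule path by dropping trailing directory separators before using it as a lookup key. A minimal sketch of that step is shown here, assuming '/' is the only separator (Git's is_dir_sep() may accept others depending on the platform); `toy_len_without_trailing_slashes` is a hypothetical helper name.

```c
#include <stddef.h>
#include <string.h>

/*
 * Return the length of "path" once trailing '/' characters are
 * ignored. A result of 0 means the path was empty or all slashes.
 */
size_t toy_len_without_trailing_slashes(const char *path)
{
	size_t len = strlen(path);

	while (len && path[len - 1] == '/')
		len--;
	return len;
}
```

A caller can compare the result with strlen(path) to decide whether a trimmed copy is needed, which corresponds to the `submodule[len]` check above.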
struct ref_store *get_worktree_ref_store(const struct worktree *wt)
{
struct ref_store *refs;
const char *id;
if (wt->is_current)
return get_main_ref_store(the_repository);
id = wt->id ? wt->id : "/";
refs = lookup_ref_store_map(&worktree_ref_stores, id);
if (refs)
return refs;
if (wt->id)
refs = ref_store_init(the_repository,
git_common_path("worktrees/%s", wt->id),
REF_STORE_ALL_CAPS);
else
refs = ref_store_init(the_repository,
get_git_common_dir(),
REF_STORE_ALL_CAPS);
if (refs)
register_ref_store_map(&worktree_ref_stores, "worktree",
refs, id);
return refs;
}
void base_ref_store_init(struct ref_store *refs, struct repository *repo,
const char *path, const struct ref_storage_be *be)
{
refs->be = be;
refs->repo = repo;
refs->gitdir = xstrdup(path);
}
/* backend functions */
int refs_pack_refs(struct ref_store *refs, unsigned int flags)
{
return refs->be->pack_refs(refs, flags);
}
refs: switch peel_ref() to peel_iterated_oid()

The peel_ref() interface is confusing and error-prone:

  - it's typically used by ref iteration callbacks that have both a refname and oid. But since they pass only the refname, we may load the ref value from the filesystem again. This is inefficient, but also means we are open to a race if somebody simultaneously updates the ref. E.g., this:

      int some_ref_cb(const char *refname, const struct object_id *oid, ...)
      {
              if (!peel_ref(refname, &peeled))
                      printf("%s peels to %s",
                             oid_to_hex(oid), oid_to_hex(&peeled));
      }

    could print nonsense. It is correct to say "refname peels to..." (you may see the "before" value or the "after" value, either of which is consistent), but mentioning both oids may be mixing before/after values.

    Worse, whether this is possible depends on whether the optimization to read from the current iterator value kicks in. So it is actually not possible with:

      for_each_ref(some_ref_cb);

    but it _is_ possible with:

      head_ref(some_ref_cb);

    which does not use the iterator mechanism (though in practice, HEAD should never peel to anything, so this may not be triggerable).

  - it must take a fully-qualified refname for the read_ref_full() code path to work. Yet we routinely pass it partial refnames from callbacks to for_each_tag_ref(), etc. This happens to work when iterating because there we do not call read_ref_full() at all, and only use the passed refname to check if it is the same as the iterator. But the requirements for the function parameters are quite unclear.

Instead of taking a refname, let's instead take an oid. That fixes both problems. It's a little funny for a "ref" function not to involve refs at all. The key thing is that it's optimizing under the hood based on having access to the ref iterator. So let's change the name to make it clear why you'd want this function versus just peel_object().

There are two other directions I considered but rejected:

  - we could pass the peel information into the each_ref_fn callback. However, we don't know if the caller actually wants it or not. For packed-refs, providing it is essentially free. But for loose refs, we actually have to peel the object, which would be wasteful in most cases. We could likewise pass in a flag to the callback indicating whether the peeled information is known, but that complicates those callbacks, as they then have to decide whether to manually peel themselves. Plus it requires changing the interface of every callback, whether they care about peeling or not, and there are many of them.

  - we could make a function to return the peeled value of the current iterated ref (computing it if necessary), and BUG() otherwise. I.e.:

      int peel_current_iterated_ref(struct object_id *out);

    Each of the current callers is an each_ref_fn callback, so they'd mostly be happy. But:

      - we use those callbacks with functions like head_ref(), which do not use the iteration code. So we'd need to handle the fallback case there, anyway.

      - it's possible that a caller would want to call into generic code that sometimes is used during iteration and sometimes not. This encapsulates the logic to do the fast thing when possible, and fallback when necessary.

The implementation is mostly obvious, but I want to call out a few things in the patch:

  - the test-tool coverage for peel_ref() is now meaningless, as it all collapses to a single peel_object() call (arguably they were pretty uninteresting before; the tricky part of that function is the fast-path we see during iteration, but these calls didn't trigger that). I've just dropped it entirely, though note that some other tests relied on the tags we created; I've moved that creation to the tests where it matters.

  - we no longer need to take a ref_store parameter, since we'd never look up a ref now. We do still rely on a global "current iterator" variable which _could_ be kept per-ref-store. But in practice this is only useful if there are multiple recursive iterations, at which point the more appropriate solution is probably a stack of iterators. No caller used the actual ref-store parameter anyway (they all call the wrapper that passes the_repository).

  - the original only kicked in the optimization when the "refname" pointer matched (i.e., not string comparison). We do likewise with the "oid" parameter here, but fall back to doing an actual oideq() call. This in theory lets us kick in the optimization more often, though in practice no current caller cares. It should never be wrong, though (peeling is a property of an object, so two refs pointing to the same object would peel identically).

  - the original took care not to touch the peeled out-parameter unless we found something to put in it. But no caller cares about this, and anyway, it is enforced by peel_object() itself (and even in the optimized iterator case, that's where we eventually end up). We can shorten the code and avoid an extra copy by just passing the out-parameter through the stack.

Signed-off-by: Jeff King <peff@peff.net>
Reviewed-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-01-20 20:44:43 +01:00
int peel_iterated_oid(const struct object_id *base, struct object_id *peeled)
{
if (current_ref_iter &&
(current_ref_iter->oid == base ||
oideq(current_ref_iter->oid, base)))
return ref_iterator_peel(current_ref_iter, peeled);
return peel_object(base, peeled) ? -1 : 0;
}
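The fast path above first compares pointers and only then falls back to comparing values. A minimal standalone sketch of that pattern follows. It assumes a hypothetical 20-byte `demo_oid` struct and hypothetical `demo_oideq()` and `demo_matches_current()` helpers; none of these are Git's real types or API.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for struct object_id; not Git's definition. */
struct demo_oid {
	unsigned char hash[20];
};

/* Hypothetical stand-in for oideq(): compares two ids by value. */
static int demo_oideq(const struct demo_oid *a, const struct demo_oid *b)
{
	return !memcmp(a->hash, b->hash, sizeof(a->hash));
}

/*
 * Return 1 if "base" names the same object as the iterator's current
 * id, 0 otherwise. "current" may be NULL when no iteration is active.
 * The pointer test is the cheap case; the value test catches callers
 * that hold a copy of the same id.
 */
static int demo_matches_current(const struct demo_oid *current,
				const struct demo_oid *base)
{
	assert(base);
	if (!current)
		return 0;
	return current == base || demo_oideq(current, base);
}
```

When the check fails, the caller would take the slow path, which in the real code is the plain peel_object() call.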
int refs_create_symref(struct ref_store *refs,
const char *ref_target,
const char *refs_heads_master,
const char *logmsg)
{
reflog: cleanse messages in the refs.c layer

Regarding reflog messages:

  - We expect that a reflog message consists of a single line. The
    file format used by the files backend may add a LF after the
    message as a delimiter, and output by commands like "git log -g"
    may complete such an incomplete line by adding a LF at the end,
    but philosophically, the terminating LF is not a part of the
    message.

  - We however allow callers of refs API to supply a random sequence
    of NUL terminated bytes. We cleanse caller-supplied message by
    squashing a run of whitespaces into a SP, and by trimming trailing
    whitespace, before storing the message. This is how we tolerate,
    instead of erring out, a message with LF in it (be it at the end,
    in the middle, or both).

Currently, the cleansing of the reflog message is done by the files
backend, before the log is written out. This is sufficient with the
current code, as that is the only backend that writes reflogs. But new
backends can be added that write reflogs, and we'd want the resulting
log message we would read out of "log -g" to be the same no matter
what backend is used, and moving the code to the generic layer is a
way to do so.

An added benefit is that the "cleansing" function could be updated
later, independent from individual backends, to e.g. allow multi-line
log messages if we wanted to, and when that happens, it would help a
lot to ensure we covered all bases if the cleansing function (which
would be updated) is called from the generic layer.

Side note: I am not interested in supporting multi-line reflog
messages right at the moment (nobody is asking for it), but I envision
that instead of the "squash a run of whitespaces into a SP and rtrim"
cleansing, we can %urlencode problematic bytes in the message *AND*
append a SP at the end, when a new version of Git that supports
multi-line and/or verbatim reflog messages writes a reflog record. The
reading side can detect the presence of SP at the end (which should
have been rtrimmed out if it were written by existing versions of Git)
as a signal that decoding %urlencode recovers the original reflog
message.

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-07-10 19:19:53 +02:00
char *msg;
int retval;
msg = normalize_reflog_message(logmsg);
retval = refs->be->create_symref(refs, ref_target, refs_heads_master,
msg);
free(msg);
return retval;
}
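The cleansing rule described in the commit message (squash each run of whitespace into one SP, then trim trailing whitespace) can be sketched in isolation. The helper below is illustrative only: `demo_normalize_message()` is a hypothetical name, it uses plain malloc() rather than Git's strbuf, and it is not the body of normalize_reflog_message().

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/*
 * Illustrative only: squash each run of whitespace into a single SP
 * and trim trailing whitespace. Returns a malloc'd string that the
 * caller must free, or NULL on allocation failure.
 */
static char *demo_normalize_message(const char *msg)
{
	size_t len = 0;
	char *out;

	assert(msg);
	/* The result is never longer than the input. */
	out = malloc(strlen(msg) + 1);
	if (!out)
		return NULL;

	while (*msg) {
		if (isspace((unsigned char)*msg)) {
			/* Skip the whole run and emit one space for it. */
			while (isspace((unsigned char)*msg))
				msg++;
			out[len++] = ' ';
		} else {
			out[len++] = *msg++;
		}
	}
	/* A trailing run leaves at most one space behind; drop it. */
	while (len && out[len - 1] == ' ')
		len--;
	out[len] = '\0';
	return out;
}
```

As in refs_create_symref() above, the caller owns the returned buffer and releases it once the backend call has returned.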
int create_symref(const char *ref_target, const char *refs_heads_master,
const char *logmsg)
{
return refs_create_symref(get_main_ref_store(the_repository), ref_target,
refs_heads_master, logmsg);
}
int ref_update_reject_duplicates(struct string_list *refnames,
struct strbuf *err)
{
size_t i, n = refnames->nr;
assert(err);
for (i = 1; i < n; i++) {
int cmp = strcmp(refnames->items[i - 1].string,
refnames->items[i].string);
if (!cmp) {
strbuf_addf(err,
_("multiple updates for ref '%s' not allowed"),
refnames->items[i].string);
return 1;
} else if (cmp > 0) {
BUG("ref_update_reject_duplicates() received unsorted list");
}
}
return 0;
}
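Because the list is sorted, duplicates can only be adjacent, so a single pass comparing each entry with its predecessor is enough. A minimal sketch of the same idea on a plain array of C strings follows; `demo_find_duplicate()` is a hypothetical helper and not part of Git.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Return the index of the first entry that equals its predecessor, or
 * -1 if every entry is unique. The input must already be sorted in
 * strcmp() order; an unsorted input trips the assertion, mirroring the
 * BUG() in the function above.
 */
static long demo_find_duplicate(const char **names, size_t n)
{
	size_t i;

	assert(names || !n);
	for (i = 1; i < n; i++) {
		int cmp = strcmp(names[i - 1], names[i]);

		if (!cmp)
			return (long)i;
		assert(cmp < 0); /* caller must pass a sorted list */
	}
	return -1;
}
```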
static int run_transaction_hook(struct ref_transaction *transaction,
const char *state)
{
struct child_process proc = CHILD_PROCESS_INIT;
struct strbuf buf = STRBUF_INIT;
refs: remove lookup cache for reference-transaction hook

When adding the reference-transaction hook, there were concerns about
the performance impact it may have on setups which do not make use of
the new hook at all. After all, it gets executed every time a reftx is
prepared, committed or aborted, which linearly scales with the number
of reference-transactions created per session. And as there are code
paths like `git push` which create a new transaction for each
reference to be updated, this may translate to calling `find_hook()`
quite a lot.

To address this concern, a cache was added with the intention to not
repeatedly do negative hook lookups. Turns out this cache caused a
regression, which was fixed via e5256c82e5 (refs: fix interleaving
hook calls with reference-transaction hook, 2020-08-07). In the
process of discussing the fix, we realized that the cache doesn't
really help even in the negative-lookup case. While performance tests
added to benchmark this did show a slight improvement in the 1% range,
this really doesn't warrant having a cache. Furthermore, it's quite
flaky, too. E.g. running it twice in succession produces the following
results:

    Test                         master            pks-reftx-hook-remove-cache
    --------------------------------------------------------------------------
    1400.2: update-ref           2.79(2.16+0.74)   2.73(2.12+0.71) -2.2%
    1400.3: update-ref --stdin   0.22(0.08+0.14)   0.21(0.08+0.12) -4.5%

    Test                         master            pks-reftx-hook-remove-cache
    --------------------------------------------------------------------------
    1400.2: update-ref           2.70(2.09+0.72)   2.74(2.13+0.71) +1.5%
    1400.3: update-ref --stdin   0.21(0.10+0.10)   0.21(0.08+0.13) +0.0%

One case notably absent from those benchmarks is a single executable
searching for the hook hundreds of times, which is exactly the case
for which the negative cache was added. p1400.2 will spawn a new
update-ref for each transaction and p1400.3 only has a single
reference-transaction for all reference updates. So this commit adds a
third benchmark, which performs a non-atomic push of a thousand
references. This will create a new reference transaction per
reference. But even for this case, the negative cache doesn't
consistently improve performance:

    Test                         master            pks-reftx-hook-remove-cache
    --------------------------------------------------------------------------
    1400.4: nonatomic push       6.63(6.50+0.13)   6.81(6.67+0.14) +2.7%
    1400.4: nonatomic push       6.35(6.21+0.14)   6.39(6.23+0.16) +0.6%
    1400.4: nonatomic push       6.43(6.31+0.13)   6.42(6.28+0.15) -0.2%

So let's just remove the cache altogether to simplify the code.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-08-25 12:35:24 +02:00
const char *hook;
int ret = 0, i;
hook = find_hook("reference-transaction");
if (!hook)
return ret;
strvec_pushl(&proc.args, hook, state, NULL);
proc.in = -1;
proc.stdout_to_stderr = 1;
proc.trace2_hook_name = "reference-transaction";
ret = start_command(&proc);
if (ret)
return ret;
sigchain_push(SIGPIPE, SIG_IGN);
for (i = 0; i < transaction->nr; i++) {
struct ref_update *update = transaction->updates[i];
strbuf_reset(&buf);
strbuf_addf(&buf, "%s %s %s\n",
oid_to_hex(&update->old_oid),
oid_to_hex(&update->new_oid),
update->refname);
if (write_in_full(proc.in, buf.buf, buf.len) < 0) {
if (errno != EPIPE) {
/* Don't leak errno outside this API */
errno = 0;
ret = -1;
}
break;
}
}
close(proc.in);
sigchain_pop(SIGPIPE);
strbuf_release(&buf);
ret |= finish_command(&proc);
return ret;
}
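Each queued update reaches the hook's standard input as one line of the form "<old-oid> <new-oid> <refname>" followed by LF, as built by the strbuf_addf() call above. The sketch below reproduces only that formatting step with snprintf() into a caller-supplied buffer. `demo_format_update()` is a hypothetical helper, and the ids passed to it are assumed to be hex strings already.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Format one hook input line into "buf". Returns the number of bytes
 * written (excluding the terminating NUL), or -1 if the line does not
 * fit or formatting fails.
 */
static int demo_format_update(char *buf, size_t size,
			      const char *old_hex, const char *new_hex,
			      const char *refname)
{
	int n;

	assert(buf && size);
	n = snprintf(buf, size, "%s %s %s\n", old_hex, new_hex, refname);
	if (n < 0 || (size_t)n >= size)
		return -1;
	return n;
}
```

A fixed buffer keeps the sketch short. The real function reuses a growable strbuf across iterations, so it has no length limit to check.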
int ref_transaction_prepare(struct ref_transaction *transaction,
struct strbuf *err)
{
struct ref_store *refs = transaction->ref_store;
int ret;
switch (transaction->state) {
case REF_TRANSACTION_OPEN:
/* Good. */
break;
case REF_TRANSACTION_PREPARED:
BUG("prepare called twice on reference transaction");
break;
case REF_TRANSACTION_CLOSED:
BUG("prepare called on a closed reference transaction");
break;
default:
BUG("unexpected reference transaction state");
break;
}
if (refs->repo->objects->odb->disable_ref_updates) {
strbuf_addstr(err,
_("ref updates forbidden inside quarantine environment"));
return -1;
}
ret = refs->be->transaction_prepare(refs, transaction, err);
if (ret)
return ret;
ret = run_transaction_hook(transaction, "prepared");
if (ret) {
ref_transaction_abort(transaction, err);
die(_("ref updates aborted by hook"));
}
return 0;
}
int ref_transaction_abort(struct ref_transaction *transaction,
struct strbuf *err)
{
struct ref_store *refs = transaction->ref_store;
int ret = 0;
switch (transaction->state) {
case REF_TRANSACTION_OPEN:
/* No need to abort explicitly. */
break;
case REF_TRANSACTION_PREPARED:
ret = refs->be->transaction_abort(refs, transaction, err);
break;
case REF_TRANSACTION_CLOSED:
BUG("abort called on a closed reference transaction");
break;
default:
BUG("unexpected reference transaction state");
break;
}
run_transaction_hook(transaction, "aborted");
ref_transaction_free(transaction);
return ret;
}
int ref_transaction_commit(struct ref_transaction *transaction,
struct strbuf *err)
{
struct ref_store *refs = transaction->ref_store;
int ret;
switch (transaction->state) {
case REF_TRANSACTION_OPEN:
/* Need to prepare first. */
ret = ref_transaction_prepare(transaction, err);
if (ret)
return ret;
break;
case REF_TRANSACTION_PREPARED:
/* Fall through to finish. */
break;
case REF_TRANSACTION_CLOSED:
BUG("commit called on a closed reference transaction");
break;
default:
BUG("unexpected reference transaction state");
break;
}
ret = refs->be->transaction_finish(refs, transaction, err);
if (!ret)
run_transaction_hook(transaction, "committed");
return ret;
}
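/*
 * Illustrative sketch, not part of git: ref_transaction_prepare(),
 * ref_transaction_abort() and ref_transaction_commit() above each
 * check the transaction state before acting. The standalone model
 * below uses hypothetical names (model_transaction, model_prepare,
 * ...) and returns -1 where the real code calls BUG(). In git the
 * state changes are made by the backend; the model folds them into
 * the functions themselves, and it does not model the hook calls.
 *
```c
// Simplified stand-in for the OPEN -> PREPARED -> CLOSED life cycle.
enum model_state { MODEL_OPEN, MODEL_PREPARED, MODEL_CLOSED };

struct model_transaction {
	enum model_state state;
};

// Only an open transaction may be prepared.
int model_prepare(struct model_transaction *t)
{
	if (t->state != MODEL_OPEN)
		return -1;
	t->state = MODEL_PREPARED;
	return 0;
}

// Committing an open transaction prepares it first, then closes it.
int model_commit(struct model_transaction *t)
{
	if (t->state == MODEL_CLOSED)
		return -1;
	if (t->state == MODEL_OPEN && model_prepare(t))
		return -1;
	t->state = MODEL_CLOSED;
	return 0;
}

// Aborting is allowed from OPEN or PREPARED, never from CLOSED.
int model_abort(struct model_transaction *t)
{
	if (t->state == MODEL_CLOSED)
		return -1;
	t->state = MODEL_CLOSED;
	return 0;
}
```
 *
 * In this model a caller may skip the explicit prepare step, since
 * committing an open transaction prepares it implicitly, which
 * mirrors the REF_TRANSACTION_OPEN case in ref_transaction_commit().
 */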
int refs_verify_refname_available(struct ref_store *refs,
const char *refname,
const struct string_list *extras,
const struct string_list *skip,
struct strbuf *err)
{
const char *slash;
const char *extra_refname;
struct strbuf dirname = STRBUF_INIT;
struct strbuf referent = STRBUF_INIT;
struct object_id oid;
unsigned int type;
struct ref_iterator *iter;
int ok;
int ret = -1;
/*
* For the sake of comments in this function, suppose that
* refname is "refs/foo/bar".
*/
assert(err);
strbuf_grow(&dirname, strlen(refname) + 1);
for (slash = strchr(refname, '/'); slash; slash = strchr(slash + 1, '/')) {
/*
* Just saying "Is a directory" when we e.g. can't
* lock some multi-level ref isn't very informative,
* the user won't be told *what* is a directory, so
* let's not use strerror() below.
*/
int ignore_errno;
/* Expand dirname to the new prefix, not including the trailing slash: */
strbuf_add(&dirname, refname + dirname.len, slash - refname - dirname.len);
/*
* We are still at a leading dir of the refname (e.g.,
* "refs/foo"); if there is a reference with that name,
* it is a conflict, *unless* it is in skip.
*/
if (skip && string_list_has_string(skip, dirname.buf))
continue;
if (!refs_read_raw_ref(refs, dirname.buf, &oid, &referent,
&type, &ignore_errno)) {
strbuf_addf(err, _("'%s' exists; cannot create '%s'"),
dirname.buf, refname);
goto cleanup;
}
if (extras && string_list_has_string(extras, dirname.buf)) {
strbuf_addf(err, _("cannot process '%s' and '%s' at the same time"),
refname, dirname.buf);
goto cleanup;
}
}
/*
* We are at the leaf of our refname (e.g., "refs/foo/bar").
* There is no point in searching for a reference with that
* name, because a refname isn't considered to conflict with
* itself. But we still need to check for references whose
* names are in the "refs/foo/bar/" namespace, because they
* *do* conflict.
*/
strbuf_addstr(&dirname, refname + dirname.len);
strbuf_addch(&dirname, '/');
iter = refs_ref_iterator_begin(refs, dirname.buf, 0,
DO_FOR_EACH_INCLUDE_BROKEN);
while ((ok = ref_iterator_advance(iter)) == ITER_OK) {
if (skip &&
string_list_has_string(skip, iter->refname))
continue;
strbuf_addf(err, _("'%s' exists; cannot create '%s'"),
iter->refname, refname);
ref_iterator_abort(iter);
goto cleanup;
}
if (ok != ITER_DONE)
BUG("error while iterating over references");
extra_refname = find_descendant_ref(dirname.buf, extras, skip);
if (extra_refname)
strbuf_addf(err, _("cannot process '%s' and '%s' at the same time"),
refname, extra_refname);
else
ret = 0;
cleanup:
strbuf_release(&referent);
strbuf_release(&dirname);
return ret;
}
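/*
 * Illustrative sketch, not part of git: the rule enforced by
 * refs_verify_refname_available() is that two distinct refnames
 * conflict when one is a leading directory of the other, e.g.
 * "refs/foo" and "refs/foo/bar". The standalone helpers below
 * (is_leading_dir and model_refnames_conflict are hypothetical
 * names) test that rule on plain strings only; they ignore the
 * ref store and the "extras" and "skip" lists.
 *
```c
#include <string.h>

// Return 1 if `dir` followed by '/' is a prefix of `name`.
static int is_leading_dir(const char *dir, const char *name)
{
	size_t len = strlen(dir);
	return !strncmp(dir, name, len) && name[len] == '/';
}

// Two refnames conflict when either is a leading directory of the other.
int model_refnames_conflict(const char *a, const char *b)
{
	return is_leading_dir(a, b) || is_leading_dir(b, a);
}
```
 *
 * Note that a refname does not conflict with itself, and that
 * "refs/foo" does not conflict with "refs/foobar" because the
 * match has to end at a '/' boundary.
 */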
int refs_for_each_reflog(struct ref_store *refs, each_ref_fn fn, void *cb_data)
{
struct ref_iterator *iter;
struct do_for_each_ref_help hp = { fn, cb_data };
iter = refs->be->reflog_iterator_begin(refs);
return do_for_each_repo_ref_iterator(the_repository, iter,
do_for_each_ref_helper, &hp);
}
int for_each_reflog(each_ref_fn fn, void *cb_data)
{
return refs_for_each_reflog(get_main_ref_store(the_repository), fn, cb_data);
}
int refs_for_each_reflog_ent_reverse(struct ref_store *refs,
const char *refname,
each_reflog_ent_fn fn,
void *cb_data)
{
return refs->be->for_each_reflog_ent_reverse(refs, refname,
fn, cb_data);
}
int for_each_reflog_ent_reverse(const char *refname, each_reflog_ent_fn fn,
void *cb_data)
{
return refs_for_each_reflog_ent_reverse(get_main_ref_store(the_repository),
refname, fn, cb_data);
}
int refs_for_each_reflog_ent(struct ref_store *refs, const char *refname,
each_reflog_ent_fn fn, void *cb_data)
{
return refs->be->for_each_reflog_ent(refs, refname, fn, cb_data);
}
int for_each_reflog_ent(const char *refname, each_reflog_ent_fn fn,
void *cb_data)
{
return refs_for_each_reflog_ent(get_main_ref_store(the_repository), refname,
fn, cb_data);
}
int refs_reflog_exists(struct ref_store *refs, const char *refname)
{
return refs->be->reflog_exists(refs, refname);
}
int reflog_exists(const char *refname)
{
return refs_reflog_exists(get_main_ref_store(the_repository), refname);
}
int refs_create_reflog(struct ref_store *refs, const char *refname,
struct strbuf *err)
{
return refs->be->create_reflog(refs, refname, err);
}
int safe_create_reflog(const char *refname, struct strbuf *err)
{
return refs_create_reflog(get_main_ref_store(the_repository), refname,
err);
}
int refs_delete_reflog(struct ref_store *refs, const char *refname)
{
return refs->be->delete_reflog(refs, refname);
}
int delete_reflog(const char *refname)
{
return refs_delete_reflog(get_main_ref_store(the_repository), refname);
}
int refs_reflog_expire(struct ref_store *refs,
const char *refname,
unsigned int flags,
reflog_expiry_prepare_fn prepare_fn,
reflog_expiry_should_prune_fn should_prune_fn,
reflog_expiry_cleanup_fn cleanup_fn,
void *policy_cb_data)
{
return refs->be->reflog_expire(refs, refname, flags,
prepare_fn, should_prune_fn,
cleanup_fn, policy_cb_data);
}
int reflog_expire(const char *refname,
unsigned int flags,
reflog_expiry_prepare_fn prepare_fn,
reflog_expiry_should_prune_fn should_prune_fn,
reflog_expiry_cleanup_fn cleanup_fn,
void *policy_cb_data)
{
return refs_reflog_expire(get_main_ref_store(the_repository),
refname, flags,
prepare_fn, should_prune_fn,
cleanup_fn, policy_cb_data);
}
int initial_ref_transaction_commit(struct ref_transaction *transaction,
struct strbuf *err)
{
struct ref_store *refs = transaction->ref_store;
return refs->be->initial_transaction_commit(refs, transaction, err);
}
void ref_transaction_for_each_queued_update(struct ref_transaction *transaction,
ref_transaction_for_each_queued_update_fn cb,
void *cb_data)
{
int i;
for (i = 0; i < transaction->nr; i++) {
struct ref_update *update = transaction->updates[i];
cb(update->refname,
(update->flags & REF_HAVE_OLD) ? &update->old_oid : NULL,
(update->flags & REF_HAVE_NEW) ? &update->new_oid : NULL,
cb_data);
}
}
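/*
 * Illustrative sketch, not part of git: the loop above hands the
 * callback a NULL pointer for any value the update does not carry.
 * The standalone model below replaces object IDs with plain ints;
 * every name in it (model_update, MODEL_HAVE_OLD, ...) and the flag
 * values are hypothetical and chosen only for this example.
 *
```c
#include <stddef.h>

#define MODEL_HAVE_NEW (1 << 0)
#define MODEL_HAVE_OLD (1 << 1)

struct model_update {
	const char *refname;
	unsigned int flags;
	int old_value;
	int new_value;
};

typedef void model_update_fn(const char *refname, const int *old_value,
			     const int *new_value, void *cb_data);

// Call cb once per update, passing NULL for values that are not set.
void model_for_each_update(const struct model_update *updates, size_t nr,
			   model_update_fn *cb, void *cb_data)
{
	size_t i;
	for (i = 0; i < nr; i++) {
		const struct model_update *u = &updates[i];
		cb(u->refname,
		   (u->flags & MODEL_HAVE_OLD) ? &u->old_value : NULL,
		   (u->flags & MODEL_HAVE_NEW) ? &u->new_value : NULL,
		   cb_data);
	}
}

// Example callback: count updates that carry no expected old value.
void model_count_unconditional(const char *refname, const int *old_value,
			       const int *new_value, void *cb_data)
{
	(void)refname;
	(void)new_value;
	if (!old_value)
		(*(int *)cb_data)++;
}
```
 *
 * A callback written this way has to check each pointer before
 * dereferencing it, since either value may be absent.
 */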
int refs_delete_refs(struct ref_store *refs, const char *logmsg,
struct string_list *refnames, unsigned int flags)
{
char *msg;
int retval;
msg = normalize_reflog_message(logmsg);
retval = refs->be->delete_refs(refs, msg, refnames, flags);
free(msg);
return retval;
}
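/*
 * Illustrative sketch, not part of git: refs_delete_refs() above,
 * like the rename and copy wrappers, passes the caller's message
 * through normalize_reflog_message(), which is defined elsewhere in
 * this file. The standalone function below (model_normalize_message
 * is a hypothetical name) assumes the cleanup consists of squashing
 * each run of whitespace into a single space and trimming trailing
 * whitespace; unlike the real function it uses plain malloc() and
 * requires a non-NULL argument.
 *
```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

// Returns a newly allocated, cleaned-up copy of msg; the caller frees it.
char *model_normalize_message(const char *msg)
{
	size_t len = 0;
	int wasspace = 1; // starting at 1 also drops leading whitespace
	char *out = malloc(strlen(msg) + 1);

	if (!out)
		return NULL;
	for (; *msg; msg++) {
		int space = isspace((unsigned char)*msg);
		if (space && wasspace)
			continue;
		out[len++] = space ? ' ' : *msg;
		wasspace = space;
	}
	// A whitespace run is stored as one ' ', so at most one can trail.
	if (len && out[len - 1] == ' ')
		len--;
	out[len] = '\0';
	return out;
}
```
 *
 * The result is always a single line, so a message containing LF
 * characters is tolerated rather than rejected.
 */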
int delete_refs(const char *msg, struct string_list *refnames,
unsigned int flags)
{
return refs_delete_refs(get_main_ref_store(the_repository), msg, refnames, flags);
}
int refs_rename_ref(struct ref_store *refs, const char *oldref,
const char *newref, const char *logmsg)
{
char *msg;
int retval;
msg = normalize_reflog_message(logmsg);
retval = refs->be->rename_ref(refs, oldref, newref, msg);
free(msg);
return retval;
}
int rename_ref(const char *oldref, const char *newref, const char *logmsg)
{
return refs_rename_ref(get_main_ref_store(the_repository), oldref, newref, logmsg);
}
int refs_copy_existing_ref(struct ref_store *refs, const char *oldref,
const char *newref, const char *logmsg)
{
char *msg;
int retval;
msg = normalize_reflog_message(logmsg);
retval = refs->be->copy_ref(refs, oldref, newref, msg);
free(msg);
return retval;
}
int copy_existing_ref(const char *oldref, const char *newref, const char *logmsg)
{
return refs_copy_existing_ref(get_main_ref_store(the_repository), oldref, newref, logmsg);
}