Currently, the API for iterating over references is via a family of
for_each_ref()-type functions that invoke a callback function for each
selected reference. All of these eventually call do_for_each_ref(),
which knows how to do one thing: iterate in parallel through two
ref_caches, one for loose and one for packed refs, giving loose
references precedence over packed refs. This is rather complicated code,
and is quite specialized to the files backend. It also requires callers
to encapsulate their work into a callback function, which often means
that they have to define and use a "cb_data" struct to manage their
context.
The current design is already bursting at the seams, and will become
even more awkward in the upcoming world of multiple reference storage
backends:
* Per-worktree vs. shared references are currently handled via a kludge
in git_path() rather than iterating over each part of the reference
namespace separately and merging the results. This kludge will cease
to work when we have multiple reference storage backends.
* The current scheme is inflexible. What if we sometimes want to bypass
the ref_cache, or use it only for packed or only for loose refs? What
if we want to store symbolic refs in one type of storage backend and
non-symbolic ones in another?
In the future, each reference backend will need to define its own way of
iterating over references. The crux of the problem with the current
design is that it is impossible to compose for_each_ref()-style
iterations, because the flow of control is owned by the for_each_ref()
function. There is nothing that a caller can do but iterate through all
references in a single burst, so there is no way for it to interleave
references from multiple backends and present the result to the rest of
the world as a single compound backend.
This commit introduces a new iteration primitive for references: a
ref_iterator. A ref_iterator is a polymorphic object that a reference
storage backend can be asked to instantiate. There are three functions
that can be applied to a ref_iterator:
* ref_iterator_advance(): move to the next reference in the iteration
* ref_iterator_abort(): end the iteration before it is exhausted
* ref_iterator_peel(): peel the reference currently being looked at
Iterating using a ref_iterator leaves the flow of control in the hands
of the caller, which means that ref_iterators from multiple
sources (e.g., loose and packed refs) can be composed and presented to
the world as a single compound ref_iterator.
It also means that the backend code for implementing reference iteration
will sometimes be more complicated. For example, the
cache_ref_iterator (which iterates over a ref_cache) can't use the C
stack to recurse; instead, it must manage its own stack internally as
explicit data structures. There is also a lot of boilerplate connected
with object-oriented programming in C.
Eventually, end-user callers will be able to be written in a more
natural way—managing their own flow of control rather than having to
work via callbacks. Since there will only be a few reference backends
but there are many consumers of this API, this is a good tradeoff.
More importantly, we gain composability, and especially the possibility
of writing interchangeable parts that can work with any ref_iterator.
For example, merge_ref_iterator implements a generic way of merging the
contents of any two ref_iterators. It is used to merge loose + packed
refs as part of the implementation of the files_ref_iterator. But it
will also be possible to use it to merge other pairs of reference
sources (e.g., per-worktree vs. shared refs).
Another example is prefix_ref_iterator, which can be used to trim a
prefix off the front of reference names before presenting them to the
caller (e.g., "refs/heads/master" -> "master").
In this patch, we introduce the iterator abstraction and many utilities,
and implement a reference iterator for the files ref storage backend.
(I've written several other obvious utilities, for example a generic way
to filter references being iterated over. These will probably be useful
in the future. But they are not needed for this patch series, so I am
not including them at this time.)
In a moment we will rewrite do_for_each_ref() to work via reference
iterators (allowing some special-purpose code to be discarded), and do
something similar for reflogs. In future patch series, we will expose
the ref_iterator abstraction in the public refs API so that callers can
use it directly.
Implementation note: I tried abstracting this a layer further to allow
generic iterators (over arbitrary types of objects) and generic
utilities like a generic merge_iterator. But the implementation in C was
very cumbersome, involving (in my opinion) too much boilerplate and too
much unsafe casting, some of which would have had to be done on the
caller side. However, I did put a few iterator-related constants in a
top-level header file, iterator.h, as they will be useful in a moment to
implement iteration over directory trees and possibly other types of
iterators in the future.
Signed-off-by: Ramsay Jones <ramsay@ramsayjones.plus.com>
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Extract new function ref_resolves_to_object() from
entry_resolves_to_object(). It can be used even if there is no ref_entry
at hand.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Free up the old name for a more general purpose.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If there is not a nonbare repository where a submodule is supposedly
located, then don't instantiate a ref_cache for it.
The analogous check can be removed from resolve_gitlink_ref().
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This will be useful for passing REF_NODEREF through.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In the context of the for_each_ref() functions, call the prefix that
references must start with "prefix". (In some places it was called
"base".) This is clearer, and also prevents confusion with another
planned use of the word "base".
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Now lock_ref_sha1_basic() is only called with flags==REF_NODEREF. So we
don't have to handle other cases anymore.
This enables several simplifications, the most interesting of which come
from the fact that ref_lock::orig_ref_name is now always the same as
ref_lock::ref_name:
* Remove ref_lock::orig_ref_name
* Remove local variable orig_refname from lock_ref_sha1_basic()
* ref_name can be initialize once and its value reused
* commit_ref_update() never has to write to the reflog for
lock->orig_ref_name
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
If a transaction includes a non-NODEREF update to a symbolic reference,
we don't have to look it up in lock_ref_for_update(). The reference will
be dereferenced anyway when the split-off update is processed.
This change requires that we store a backpointer from the split-off
update to its parent update, for two reasons:
* We still want to report the original reference name in error messages.
So if an error occurs when checking the split-off update's old_sha1,
walk the parent_update pointers back to find the original reference
name, and report that one.
* We still need to write the old_sha1 of the symref to its reflog. So
after we read the split-off update's reference value, walk the
parent_update pointers back and fill in their old_sha1 fields.
Aside from eliminating unnecessary reads, this change fixes a
subtle (though not very serious) race condition: in the old code, the
old_sha1 of the symref was resolved before the reference that it pointed
at was locked. So it was possible that the old_sha1 value logged to the
symref's reflog could be wrong if another process changed the downstream
reference before it was locked.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Before the previous patch, our first read of the reference happened
before the reference was locked, so we couldn't trust its value and had
to read it again. But now that our first read of the reference happens
after acquiring the lock, there is no need to read it a second time. So
move the read_ref_full() call into the (update->type & REF_ISSYMREF)
block.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Before committing ref updates, split symbolic ref updates into two
parts: an update to the underlying ref, and a log-only update to the
symbolic ref. This ensures that both references are locked correctly
during the transaction, including while their reflogs are updated.
Similarly, if the reference pointed to by HEAD is modified directly, add
a separate log-only update to HEAD, rather than leaving the job of
updating HEAD's reflog to commit_ref_update(). This change ensures that
HEAD is locked correctly while its reflog is being modified, as well as
being cheaper (HEAD only needs to be resolved once).
This makes use of a new function, lock_raw_ref(), which is analogous to
read_raw_ref(), but acquires a lock on the reference before reading it.
This change still has two problems:
* There are redundant read_ref_full() reference lookups.
* It is still possible to get incorrect reflogs for symbolic references
if there is a concurrent update by another process, since the old_oid
of a symref is determined before the lock on the pointed-to ref is
held.
Both problems will soon be fixed.
Signed-off-by: David Turner <dturner@twopensource.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
WIP
When renaming refs, don't dereference either the origin or the destination
before renaming.
The origin does not need to be dereferenced because it is presently
forbidden to rename symbolic refs.
Not dereferencing the destination fixes a bug where renaming on top of
a broken symref would use the pointed-to ref name for the moved
reflog.
Add a test for the reflog bug.
Signed-off-by: David Turner <dturner@twopensource.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
The refs infrastructure learns about log-only ref updates, which only
update the reflog. Later, we will use this to separate symbolic
reference resolution from ref updating.
Signed-off-by: David Turner <dturner@twopensource.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
It is nonsensical (and a little bit dangerous) to use REF_ISPRUNING
without REF_NODEREF. Forbid it explicitly. Change the one REF_ISPRUNING
caller to pass REF_NODEREF too.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
* Always start error messages with a lower-case letter.
* Always enclose reference names in single quotes.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
resolve_ref_unsafe() can cope with being called with NULL passed to its
flags argument. So lock_ref_sha1_basic() can just hand its own type
parameter through.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Among other things, document the (important!) requirement that input
refname be checked for safety before calling this function.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
This will hopefully reduce confusion with the "flags" arguments that are
used in many functions in this module as an input parameter to choose
how the function should operate.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
These microoptimizations don't make a significant difference in speed.
And they cause problems if somebody ever wants to modify the function to
add updates to a transaction as part of processing it, as will happen
shortly.
Make the same changes in initial_ref_transaction_commit().
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Even if there is an empty directory where we look for the loose version
of a reference, check for a packed reference before giving up. This
fixes the failing test that was introduced two commits ago.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Part of the bug revealed in the last commit is that resolve_ref_unsafe()
incorrectly returns EISDIR if it finds a directory in the place where it
is looking for a loose reference, even if the corresponding packed
reference exists. lock_ref_sha1_basic() notices the bogus EISDIR, and
use it as an indication that it should call remove_empty_directories()
and call resolve_ref_unsafe() again.
But resolve_ref_unsafe() shouldn't report EISDIR in this case. If we
would simply make that change, then remove_empty_directories() wouldn't
get called anymore, and the empty directory would get in the way when
commit_ref() calls commit_lock_file() to rename the lockfile into place.
So instead of relying on lock_ref_sha1_basic() to delete empty
directories, teach commit_ref(), just before calling commit_lock_file(),
to check whether a directory is in the way, and if so, try to delete it.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Code restructuring around the "refs" area to prepare for pluggable
refs backends.
* dt/pre-refs-backend: (24 commits)
refs: on symref reflog expire, lock symref not referrent
refs: move resolve_ref_unsafe into common code
show_head_ref(): check the result of resolve_ref_namespace()
check_aliased_update(): check that dst_name is non-NULL
checkout_paths(): remove unneeded flag variable
cmd_merge(): remove unneeded flag variable
fsck_head_link(): remove unneeded flag variable
read_raw_ref(): change flags parameter to unsigned int
files-backend: inline resolve_ref_1() into resolve_ref_unsafe()
read_raw_ref(): manage own scratch space
files-backend: break out ref reading
resolve_ref_1(): eliminate local variable "bad_name"
resolve_ref_1(): reorder code
resolve_ref_1(): eliminate local variable
resolve_ref_unsafe(): ensure flags is always set
resolve_ref_unsafe(): use for loop to count up to MAXDEPTH
resolve_missing_loose_ref(): simplify semantics
t1430: improve test coverage of deletion of badly-named refs
t1430: test for-each-ref in the presence of badly-named refs
t1430: don't rely on symbolic-ref for creating broken symrefs
...
When locking a symbolic ref to expire a reflog, lock the symbolic
ref (using REF_NODEREF) instead of its referent.
Add a test for this.
Signed-off-by: David Turner <dturner@twopensource.com>
Reviewed-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Now that resolve_ref_unsafe's only interaction with the backend is
through read_raw_ref, we can move it into the common code. Later,
we'll replace read_raw_ref with a backend function.
Signed-off-by: David Turner <dturner@twopensource.com>
Reviewed-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
read_raw_ref() is going to be part of the vtable for reference backends,
so clean up its interface to use "unsigned int flags" rather than "int
flags". Its caller still uses signed int for its flags arguments. But
changing that would touch a lot of code, so leave it for now.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: David Turner <dturner@twopensource.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
resolve_ref_unsafe() wasn't doing anything useful anymore.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: David Turner <dturner@twopensource.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Instead of creating scratch space in resolve_ref_unsafe() and passing
it down through resolve_ref_1 to read_raw_ref(), teach read_raw_ref()
to manage its own scratch space. This reduces coupling across the
functions at the cost of some extra allocations.
Also, when read_raw_ref() is implemented for different reference
backends, the other implementations might have different scratch
space requirements.
Note that we now preserve errno across the calls to strbuf_release(),
which calls free() and can thus theoretically overwrite errno.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: David Turner <dturner@twopensource.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Refactor resolve_ref_1 in terms of a new function read_raw_ref, which
is responsible for reading ref data from the ref storage.
Later, we will make read_raw_ref a pluggable backend function, and make
resolve_ref_unsafe common.
Signed-off-by: David Turner <dturner@twopensource.com>
Helped-by: Duy Nguyen <pclouds@gmail.com>
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We can use (*flags & REF_BAD_NAME) for that purpose.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: David Turner <dturner@twopensource.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There is no need to adjust *flags if we're just about to fail.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: David Turner <dturner@twopensource.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In place of `buf`, use `refname`, which is anyway a better description
of what is being pointed at.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: David Turner <dturner@twopensource.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If the caller passes flags==NULL, then set it to point at a local
scratch variable. This removes the need for a lot of "if (flags)"
guards in resolve_ref_1() and resolve_missing_loose_ref().
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: David Turner <dturner@twopensource.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The loop's there anyway; we might as well use it.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: David Turner <dturner@twopensource.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Make resolve_missing_loose_ref() only responsible for looking up a
packed reference, without worrying about whether we want to read or
write the reference and without setting errno on failure. Move the other
logic to the caller.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: David Turner <dturner@twopensource.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Make do_for_each_ref take a submodule as an argument instead of a
ref_cache. Since all for_each_*ref* functions are defined in terms of
do_for_each_ref, we can then move them into the common code.
Later, we can simply make do_for_each_ref into a backend function.
Signed-off-by: David Turner <dturner@twopensource.com>
Reviewed-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
These don't use any backend-specific functions. These were previously
defined in terms of the do_head_ref helper function, but since they
are otherwise identical, we don't need that function.
Signed-off-by: David Turner <dturner@twopensource.com>
Reviewed-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Emit an informative error when failed to hold lock of HEAD.
2233066e (refs: add a new function set_worktree_head_symref,
2016-03-27) added set_worktree_head_symref(), but this is missing a
call to unable_to_lock_message() after hold_lock_file_for_update()
fails, so it emits an empty error message:
% git branch -m oldname newname
error:
error: HEAD of working tree /path/to/wt is not updated
fatal: Branch renamed to newname, but HEAD is not updated!
Thanks to Eric Sunshine for pointing this out.
Signed-off-by: Kazuki Yamaguchi <k@rhe.jp>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a new function set_worktree_head_symref, to update HEAD symref for
the specified worktree.
To update HEAD of a linked working tree,
create_symref("worktrees/$work_tree/HEAD", "refs/heads/$branch", msg)
could be used. However when it comes to updating HEAD of the main
working tree, it is unusable because it uses $GIT_DIR for
worktree-specific symrefs (HEAD).
The new function takes git_dir (real directory) as an argument, and
updates HEAD of the working tree. This function will be used when
renaming a branch.
Signed-off-by: Kazuki Yamaguchi <k@rhe.jp>
Acked-by: David Turner <dturner@twopensource.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Update various codepaths to avoid manually-counted malloc().
* jk/tighten-alloc: (22 commits)
ewah: convert to REALLOC_ARRAY, etc
convert ewah/bitmap code to use xmalloc
diff_populate_gitlink: use a strbuf
transport_anonymize_url: use xstrfmt
git-compat-util: drop mempcpy compat code
sequencer: simplify memory allocation of get_message
test-path-utils: fix normalize_path_copy output buffer size
fetch-pack: simplify add_sought_entry
fast-import: simplify allocation in start_packfile
write_untracked_extension: use FLEX_ALLOC helper
prepare_{git,shell}_cmd: use argv_array
use st_add and st_mult for allocation size computation
convert trivial cases to FLEX_ARRAY macros
use xmallocz to avoid size arithmetic
convert trivial cases to ALLOC_ARRAY
convert manual allocations to argv_array
argv-array: add detach function
add helpers for allocating flex-array structs
harden REALLOC_ARRAY and xcalloc against size_t overflow
tree-diff: catch integer overflow in combine_diff_path allocation
...
Using FLEX_ARRAY macros reduces the amount of manual
computation size we have to do. It also ensures we don't
overflow size_t, and it makes sure we write the same number
of bytes that we allocated.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The underlying machinery used by "ls-files -o" and other commands
have been taught not to create empty submodule ref cache for a
directory that is not a submodule. This removes a ton of wasted
CPU cycles.
* jk/ref-cache-non-repository-optim:
resolve_gitlink_ref: ignore non-repository paths
clean: make is_git_repository a public function
When we want to look up a submodule ref, we use
get_ref_cache(path) to find or auto-create its ref cache.
But if we feed a path that isn't actually a git repository,
we blindly create the ref cache, and then may die deeper in
the code when we try to access it. This is a problem because
many callers speculatively feed us a path that looks vaguely
like a repository, and expect us to tell them when it is
not.
This patch teaches resolve_gitlink_ref to reject
non-repository paths without creating a ref_cache. This
avoids the die(), and also performs better if you have a
large number of these faux-submodule directories (because
the ref_cache lookup is linear, under the assumption that
there won't be a large number of submodules).
To accomplish this, we also break get_ref_cache into two
pieces: the lookup and auto-creation (the latter is lumped
into create_ref_cache). This lets us first cheaply ask our
cache "is it a submodule we know about?" If so, we can avoid
repeating our filesystem lookup. So lookups of real
submodules are not penalized; they examine the submodule's
.git directory only once.
The test in t3000 demonstrates a case where this improves
correctness (we used to just die). The new perf case in
p7300 shows off the speed improvement in an admittedly
pathological repository:
Test HEAD^ HEAD
----------------------------------------------------------------
7300.4: ls-files -o 66.97(66.15+0.87) 0.33(0.08+0.24) -99.5%
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We sometimes call lock_ref_sha1_basic with REF_NODEREF
to operate directly on a symbolic ref. This is used, for
example, to move to a detached HEAD, or when updating
the contents of HEAD via checkout or symbolic-ref.
However, the first step of the function is to resolve the
refname to get the "old" sha1, and we do so without telling
resolve_ref_unsafe() that we are only interested in the
symref. As a result, we may detect a problem there not with
the symref itself, but with something it points to.
The real-world example I found (and what is used in the test
suite) is a HEAD pointing to a ref that cannot exist,
because it would cause a directory/file conflict with other
existing refs. This situation is somewhat broken, of
course, as trying to _commit_ on that HEAD would fail. But
it's not explicitly forbidden, and we should be able to move
away from it. However, neither "git checkout" nor "git
symbolic-ref" can do so. We try to take the lock on HEAD,
which is pointing to a non-existent ref. We bail from
resolve_ref_unsafe() with errno set to EISDIR, and the lock
code thinks we are attempting to create a d/f conflict.
Of course we're not. The problem is that the lock code has
no idea what level we were at when we got EISDIR, so trying
to diagnose or remove empty directories for HEAD is not
useful.
To make things even more complicated, we only get EISDIR in
the loose-ref case. If the refs are packed, the resolution
may "succeed", giving us the pointed-to ref in "refname",
but a null oid. Later, we say "ah, the null oid means we are
creating; let's make sure there is room for it", but
mistakenly check against the _resolved_ refname, not the
original.
We can fix this by making two tweaks:
1. Call resolve_ref_unsafe() with RESOLVE_REF_NO_RECURSE
when REF_NODEREF is set. This means any errors
we get will be from the orig_refname, and we can act
accordingly.
We already do this in the REF_DELETING case, but we
should do it for update, too.
2. If we do get a "refname" return from
resolve_ref_unsafe(), even with RESOLVE_REF_NO_RECURSE
it may be the name of the ref pointed-to by a symref.
We already normalize this back to orig_refname before
taking the lockfile, but we need to do so before the
null_oid check.
While we're rearranging the REF_NODEREF handling, we can
also bump the initialization of lflags to the top of the
function, where we are setting up other flags. This saves us
from having yet another conditional block on REF_NODEREF
just to set it later.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Our basic strategy for taking a ref lock is:
1. Create $ref.lock to take the lock
2. Read the ref again while holding the lock (during which
time we know that nobody else can be updating it).
3. Compare the value we read to the expected "old_sha1"
The value we read in step (2) is returned to the caller via
the lock->old_oid field, who may use it for other purposes
(such as writing a reflog).
If we have no "old_sha1" (i.e., we are unconditionally
taking the lock), then we obviously must omit step 3. But we
_also_ omit step 2. This seems like a nice optimization, but
it means that the caller sees only whatever was left in
lock->old_oid from previous calls to resolve_ref_unsafe(),
which happened outside of the lock.
We can demonstrate this race pretty easily. Imagine you have
three commits, $one, $two, and $three. One script just flips
between $one and $two, without providing an old-sha1:
while true; do
git update-ref -m one refs/heads/foo $one
git update-ref -m two refs/heads/foo $two
done
Meanwhile, another script tries to set the value to $three,
also not using an old-sha1:
while true; do
git update-ref -m three refs/heads/foo $three
done
If these run simultaneously, we'll see a lot of lock
contention, but each of the writes will succeed some of the
time. The reflog may record movements between any of the
three refs, but we would expect it to provide a consistent
log: the "from" field of each log entry should be the same
as the "to" field of the previous one.
But if we check this:
perl -alne '
print "mismatch on line $."
if defined $last && $F[0] ne $last;
$last = $F[1];
' .git/logs/refs/heads/foo
we'll see many mismatches. Why?
Because sometimes, in the time between lock_ref_sha1_basic
filling lock->old_oid via resolve_ref_unsafe() and it taking
the lock, there may be a complete write by another process.
And the "from" field in our reflog entry will be wrong, and
will refer to an older value.
This is probably quite rare in practice. It requires writers
which do not provide an old-sha1 value, and it is a very
quick race. However, it is easy to fix: we simply perform
step (2), the read-under-lock, whether we have an old-sha1
or not. Then the value we hand back to the caller is always
atomic.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We generally hold a lock on the matching ref while writing
to its reflog; this prevents two simultaneous writers from
clobbering each other's reflog lines (it does not even have
to be two symref updates; because we don't hold the lock, we
could race with somebody writing to the pointed-to ref via
HEAD, for example).
We can fix this by writing the reflog before we commit the
lockfile. This runs the risk of writing the reflog but
failing the final rename(), but at least we now err on the
same side as the rest of the ref code.
Noticed-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Jeff King <peff@peff.net>
Reviewed-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The create_symref() function predates the existence of
"struct lock_file", let alone the more recent "struct
ref_lock". Instead, it just does its own manual dot-locking.
Besides being more code, this has a few downsides:
- if git is interrupted while holding the lock, we don't
clean up the lockfile
- we don't do the usual directory/filename conflict check.
So you can sometimes create a symref "refs/heads/foo/bar",
even if "refs/heads/foo" exists (namely, if the refs are
packed and we do not hit the d/f conflict in the
filesystem).
This patch refactors create_symref() to use the "struct
ref_lock" interface, which handles both of these things.
There are a few bonus cleanups that come along with it:
- we leaked ref_path in some error cases
- the symref contents were stored in a fixed-size buffer,
putting an artificial (albeit large) limitation on the
length of the refname. We now write through fprintf, and
handle refnames of any size.
- we called adjust_shared_perm only after the file was
renamed into place, creating a potential race with
readers in a shared repository. The lockfile code now
handles this when creating the lockfile, making it
atomic.
- the legacy prefer_symlink_refs path did not do any
locking at all. Admittedly, it is not atomic from a
reader's perspective (as it unlinks and re-creates the
symlink to overwrite), but at least it cannot conflict
with other writers now.
- the result of this patch is hopefully more readable. It
eliminates three goto labels. Two were for error checking
that is now simplified, and the third was to reach shared
code that has been pulled into its own function.
Signed-off-by: Jeff King <peff@peff.net>
Reviewed-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Once upon a time, create_symref() was used only to point
HEAD at a branch name, and the variable names reflect that
(e.g., calling the path git_HEAD). However, it is much more
generic these days (and has been for some time). Let's
update the variable names to make it easier to follow:
- `ref_target` is now just `refname`. This is closer to
the `ref` that is already in `cache.h`, but with the
extra twist that "name" makes it clear this is the name
and not a ref struct. Dropping "target" hopefully makes
it clear that we are talking about the symref itself,
not what it points to.
- `git_HEAD` is now `ref_path`; the on-disk path
corresponding to `ref`.
- `refs_heads_master` is now just `target`; i.e., what the
symref points at. This term also matches what is in
the symlink(2) manpage (at least on Linux).
- the buffer to hold the symref file's contents was simply
called `ref`. It's now `buf` (admittedly also generic,
but at least not actively introducing confusion with the
other variable holding the refname).
Signed-off-by: Jeff King <peff@peff.net>
Reviewed-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Create new function find_descendant_ref, to hold one of the ref
conflict checks used in verify_refname_available. Multiple backends
will need this function, so move it to the common code.
Also move rename_ref_available to the common code, because alternate
backends might need it and it has no files-backend-specific code.
Signed-off-by: David Turner <dturner@twopensource.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Jeff King <peff@peff.net>
Because HEAD and stash are per-worktree, every refs backend needs to
go through the files backend to write these refs.
So create a new function, files_log_ref_write, and add it to
refs/refs-internal.h. Later, we will use this to handle reflog updates
for per-worktree symbolic refs (HEAD).
Signed-off-by: David Turner <dturner@twopensource.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Jeff King <peff@peff.net>
As another step in the move to pluggable reference backends, move the
code that is specific to the filesystem-based reference backend (i.e.,
the current system of storing references as loose and packed files) into
a separate file, refs/files-backend.c.
Aside from a tiny bit of file header boilerplate, this commit only moves
a subset of the code verbatim from refs.c to the new file, as can easily
be verified using patience diff:
git diff --patience $commit^:refs.c $commit:refs.c
git diff --patience $commit^:refs.c $commit:refs/files-backend.c
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Jeff King <peff@peff.net>