Commit Graph

109 Commits

Author SHA1 Message Date
Jeff King
6294dcb49f lock_ref_sha1_basic: always fill old_oid while holding lock
Our basic strategy for taking a ref lock is:

  1. Create $ref.lock to take the lock

  2. Read the ref again while holding the lock (during which
     time we know that nobody else can be updating it).

  3. Compare the value we read to the expected "old_sha1"

The value we read in step (2) is returned via the
lock->old_oid field to the caller, who may use it for other
purposes (such as writing a reflog).

If we have no "old_sha1" (i.e., we are unconditionally
taking the lock), then we obviously must omit step 3. But we
_also_ omit step 2. This seems like a nice optimization, but
it means that the caller sees only whatever was left in
lock->old_oid from previous calls to resolve_ref_unsafe(),
which happened outside of the lock.

We can demonstrate this race pretty easily. Imagine you have
three commits, $one, $two, and $three. One script just flips
between $one and $two, without providing an old-sha1:

  while true; do
    git update-ref -m one refs/heads/foo $one
    git update-ref -m two refs/heads/foo $two
  done

Meanwhile, another script tries to set the value to $three,
also not using an old-sha1:

  while true; do
    git update-ref -m three refs/heads/foo $three
  done

If these run simultaneously, we'll see a lot of lock
contention, but each of the writes will succeed some of the
time. The reflog may record movements between any of the
three commits, but we would expect it to provide a consistent
log: the "from" field of each log entry should be the same
as the "to" field of the previous one.

But if we check this:

  perl -alne '
    print "mismatch on line $."
            if defined $last && $F[0] ne $last;
    $last = $F[1];
  ' .git/logs/refs/heads/foo

we'll see many mismatches. Why?

Because sometimes, between lock_ref_sha1_basic filling
lock->old_oid via resolve_ref_unsafe() and taking the lock,
another process completes an entire write. The "from" field
in our reflog entry is then wrong: it refers to an older
value.
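
The interleaving can be replayed deterministically, without any real concurrency. What follows is only a minimal sketch and not git's code: the scratch directory, the plain file standing in for refs/heads/foo, and the values "one" and "two" are all made up for illustration.

```shell
# Hypothetical replay of the race; a plain file stands in for the ref.
dir=$(mktemp -d)
ref="$dir/foo"
echo one >"$ref"

# Our process resolves the ref before it holds the lock.
stale=$(cat "$ref")

# Another process completes a whole update in the gap.
echo two >"$ref"

# Now we take the lock (with noclobber set, ">" fails if the lock file
# already exists) and read the ref again, this time under the lock.
( set -o noclobber; : >"$ref.lock" ) && fresh=$(cat "$ref")

echo "read before lock: $stale, read under lock: $fresh"
rm -rf "$dir"
```

A reflog entry built from the first read would name "one" as its "from" value, even though the ref held "two" at the moment the lock was taken.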

This is probably quite rare in practice. It requires writers
which do not provide an old-sha1 value, and it is a very
quick race. However, it is easy to fix: we simply perform
step (2), the read-under-lock, whether we have an old-sha1
or not. Then the value we hand back to the caller is always
the one read while holding the lock.
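
The fixed ordering might be sketched roughly as follows. This is an assumption-laden illustration, not lock_ref_sha1_basic itself: `locked_update` is a hypothetical helper, a plain file stands in for the ref, and the shell variable `old` plays the role of lock->old_oid.

```shell
# Hypothetical helper, not git's implementation.
# usage: locked_update <ref-file> <new-value> [<expected-old-value>]
locked_update() {
    ref=$1 new=$2

    # Step 1: take the lock. With noclobber set, ">" fails if the
    # lock file already exists.
    ( set -o noclobber; echo "$new" >"$ref.lock" ) 2>/dev/null || return 1

    # Step 2: always read the current value while holding the lock,
    # whether or not the caller gave an expected value.
    old=$(cat "$ref" 2>/dev/null) || old=

    # Step 3: compare only when an expected value was supplied.
    if [ $# -ge 3 ] && [ "$old" != "$3" ]; then
        rm -f "$ref.lock"
        return 1
    fi

    # Publish the new value; the rename also releases the lock.
    mv -f "$ref.lock" "$ref"
}

dir=$(mktemp -d)
echo one >"$dir/foo"
locked_update "$dir/foo" two && echo "was: $old"
locked_update "$dir/foo" three one ||
    echo "rejected, ref is still: $(cat "$dir/foo")"
```

The first call is unconditional, yet `old` is still filled in from the read under the lock; the second call is rejected because the value it expected is no longer current.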

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-01-13 09:05:30 -08:00
Jeff King
396da8f7a0 create_symref: write reflog while holding lock
We generally hold a lock on the matching ref while writing
to its reflog; this prevents two simultaneous writers from
clobbering each other's reflog lines. create_symref() does
not: it writes the reflog only after committing the
lockfile, when the lock is no longer held. The competing
writer does not even have to be another symref update; we
could race with somebody writing to the pointed-to ref via
HEAD, for example.

We can fix this by writing the reflog before we commit the
lockfile. This runs the risk of writing the reflog but
failing the final rename(), but at least we now err on the
same side as the rest of the ref code.
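
The ordering described above could look something like this. It is a sketch under stated assumptions rather than git's code: `set_symref` is a hypothetical helper, plain files stand in for the symref and its log, and the log line format is invented (real reflog entries record object IDs).

```shell
# Hypothetical helper, not git's create_symref().
# usage: set_symref <symref-file> <log-file> <target>
set_symref() {
    symref=$1 log=$2 target=$3

    # Take the lock and stage the new contents in the lock file.
    ( set -o noclobber; echo "ref: $target" >"$symref.lock" ) 2>/dev/null ||
        return 1

    old=$(cat "$symref" 2>/dev/null) || old=

    # Write the log entry first, while the lock is still held...
    echo "${old:-(none)} -> ref: $target" >>"$log" ||
        { rm -f "$symref.lock"; return 1; }

    # ...then commit the lock file. If this rename fails, the log entry
    # has already been written: the risk the message accepts.
    mv -f "$symref.lock" "$symref"
}

dir=$(mktemp -d)
set_symref "$dir/HEAD" "$dir/HEAD.log" refs/heads/master
set_symref "$dir/HEAD" "$dir/HEAD.log" refs/heads/topic
cat "$dir/HEAD.log"
```

Because the append happens before the rename, no other writer that respects the lock can slip a log line in between.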

Noticed-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Jeff King <peff@peff.net>
Reviewed-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-12-29 10:34:25 -08:00
Jeff King
370e5ad65e create_symref: use existing ref-lock code
The create_symref() function predates the existence of
"struct lock_file", let alone the more recent "struct
ref_lock". Instead, it just does its own manual dot-locking.
Besides being more code, this has a few downsides:

 - if git is interrupted while holding the lock, we don't
   clean up the lockfile

 - we don't do the usual directory/filename conflict check.
   So you can sometimes create a symref "refs/heads/foo/bar",
   even if "refs/heads/foo" exists (namely, if the refs are
   packed and we do not hit the d/f conflict in the
   filesystem).
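
For illustration, a directory/filename conflict check that does not depend on the filesystem might look like the sketch below. `refname_conflicts` is a hypothetical helper, not git's verify_refname_available(); the existing names are passed in explicitly, standing in for loose and packed refs together.

```shell
# Hypothetical helper, not git's implementation.
# usage: refname_conflicts <new-refname> <existing-refname>...
# Succeeds (returns 0) if <new-refname> would need an existing ref to be
# a directory, or would itself have to be a directory for an existing ref.
refname_conflicts() {
    new=$1
    shift
    for existing in "$@"; do
        case "$new" in
        "$existing"/*) return 0 ;;  # an existing ref is a path prefix of new
        esac
        case "$existing" in
        "$new"/*) return 0 ;;       # new is a path prefix of an existing ref
        esac
    done
    return 1
}

refname_conflicts refs/heads/foo/bar refs/heads/foo && echo "conflict"
refname_conflicts refs/heads/foobar refs/heads/foo || echo "no conflict"
```

Comparing names rather than probing the filesystem is what lets such a check catch conflicts with packed refs, which have no file or directory of their own.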

This patch refactors create_symref() to use the "struct
ref_lock" interface, which handles both of these things.
There are a few bonus cleanups that come along with it:

 - we leaked ref_path in some error cases

 - the symref contents were stored in a fixed-size buffer,
   putting an artificial (albeit large) limitation on the
   length of the refname. We now write through fprintf, and
   handle refnames of any size.

 - we called adjust_shared_perm only after the file was
   renamed into place, creating a potential race with
   readers in a shared repository. The lockfile code now
   handles this when creating the lockfile, making it
   atomic.

 - the legacy prefer_symlink_refs path did not do any
   locking at all. Admittedly, it is not atomic from a
   reader's perspective (as it unlinks and re-creates the
   symlink to overwrite), but at least it cannot conflict
   with other writers now.

 - the result of this patch is hopefully more readable. It
   eliminates three goto labels. Two were for error checking
   that is now simplified, and the third was to reach shared
   code that has been pulled into its own function.

Signed-off-by: Jeff King <peff@peff.net>
Reviewed-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-12-29 10:33:31 -08:00
Jeff King
b9badadd06 create_symref: modernize variable names
Once upon a time, create_symref() was used only to point
HEAD at a branch name, and the variable names reflect that
(e.g., calling the path git_HEAD). However, it is much more
generic these days (and has been for some time). Let's
update the variable names to make it easier to follow:

  - `ref_target` is now just `refname`. This is closer to
    the `ref` that is already in `cache.h`, but with the
    extra twist that "name" makes it clear this is the name
    and not a ref struct. Dropping "target" hopefully makes
    it clear that we are talking about the symref itself,
    not what it points to.

  - `git_HEAD` is now `ref_path`, the on-disk path
    corresponding to `refname`.

  - `refs_heads_master` is now just `target`; i.e., what the
    symref points at. This term also matches what is in
    the symlink(2) manpage (at least on Linux).

  - the buffer to hold the symref file's contents was simply
    called `ref`. It's now `buf` (admittedly also generic,
    but at least not actively introducing confusion with the
    other variable holding the refname).

Signed-off-by: Jeff King <peff@peff.net>
Reviewed-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-12-29 10:33:09 -08:00
Junio C Hamano
e0048d3e0d Merge branch 'sg/lock-file-commit-error'
Cosmetic improvement to lock-file error messages.

* sg/lock-file-commit-error:
  Make error message after failing commit_lock_file() less confusing
2015-12-11 10:40:55 -08:00
David Turner
0845122c39 refs: break out ref conflict checks
Create new function find_descendant_ref, to hold one of the ref
conflict checks used in verify_refname_available. Multiple backends
will need this function, so move it to the common code.

Also move rename_ref_available to the common code, because alternate
backends might need it and it has no files-backend-specific code.

Signed-off-by: David Turner <dturner@twopensource.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Jeff King <peff@peff.net>
2015-11-20 04:52:01 -05:00
David Turner
5f3c3a4e6f files_log_ref_write: new function
Because HEAD and stash are per-worktree, every refs backend needs to
go through the files backend to write these refs.

So create a new function, files_log_ref_write, and add it to
refs/refs-internal.h. Later, we will use this to handle reflog updates
for per-worktree symbolic refs (HEAD).

Signed-off-by: David Turner <dturner@twopensource.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Jeff King <peff@peff.net>
2015-11-20 04:52:01 -05:00
Michael Haggerty
7bd9bcf372 refs: split filesystem-based refs code into a new file
As another step in the move to pluggable reference backends, move the
code that is specific to the filesystem-based reference backend (i.e.,
the current system of storing references as loose and packed files) into
a separate file, refs/files-backend.c.

Aside from a tiny bit of file header boilerplate, this commit only moves
a subset of the code verbatim from refs.c to the new file, as can easily
be verified using patience diff:

    git diff --patience $commit^:refs.c $commit:refs.c
    git diff --patience $commit^:refs.c $commit:refs/files-backend.c

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Jeff King <peff@peff.net>
2015-11-20 04:52:01 -05:00
Michael Haggerty
4cb77009e1 refs/refs-internal.h: new header file
There are a number of constants, structs, and static functions defined
in refs.c and treated as private to the references module. But we want
to support multiple reference backends within the reference module,
and those backends will need access to some heretofore private
declarations.

We don't want those declarations to be visible to non-refs code, so we
don't want to move them to refs.h. Instead, add a new header file,
refs/refs-internal.h, that is intended to be included only from within
the refs module. Make some functions non-static and move some
declarations (and their corresponding docstrings) from refs.c to this
file.

In a moment we will add more content to the "refs" subdirectory.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Jeff King <peff@peff.net>
2015-11-20 04:52:01 -05:00