Instead, work directly with update->flags. This has the advantage that
the REF_DELETING bit, set in the first loop, can be read in the second
loop instead of having to be recomputed. Plus, it was potentially
confusing having both update->flags and flags, which sometimes had
different values.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
That was the last caller, so delete function write_ref_sha1().
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Most of what it does is unneeded from these call sites.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This is the first step towards separating the checking and writing of
the new reference value to committing the change.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
An earlier rewrite to use strbuf_getwholeline() instead of fgets(3)
to read packed-refs file revealed that the former is unacceptably
inefficient.
* jk/reading-packed-refs:
t1430: add another refs-escape test
read_packed_refs: avoid double-checking sane refs
strbuf_getwholeline: use getdelim if it is available
strbuf_getwholeline: avoid calling strbuf_grow
strbuf_addch: avoid calling strbuf_grow
config: use getc_unlocked when reading from file
strbuf_getwholeline: use getc_unlocked
git-compat-util: add fallbacks for unlocked stdio
strbuf_getwholeline: use getc macro
A replacement for contrib/workdir/git-new-workdir that does not
rely on symbolic links and make sharing of objects and refs safer
by making the borrowee and borrowers aware of each other.
* nd/multiple-work-trees: (41 commits)
prune --worktrees: fix expire vs worktree existence condition
t1501: fix test with split index
t2026: fix broken &&-chain
t2026 needs procondition SANITY
git-checkout.txt: a note about multiple checkout support for submodules
checkout: add --ignore-other-wortrees
checkout: pass whole struct to parse_branchname_arg instead of individual flags
git-common-dir: make "modules/" per-working-directory directory
checkout: do not fail if target is an empty directory
t2025: add a test to make sure grafts is working from a linked checkout
checkout: don't require a work tree when checking out into a new one
git_path(): keep "info/sparse-checkout" per work-tree
count-objects: report unused files in $GIT_DIR/worktrees/...
gc: support prune --worktrees
gc: factor out gc.pruneexpire parsing code
gc: style change -- no SP before closing parenthesis
checkout: clean up half-prepared directories in --to mode
checkout: reject if the branch is already checked out elsewhere
prune: strategies for linked checkouts
checkout: support checking out into a new working directory
...
Now that lock_ref_sha1_basic() gives us back its error messages via a
strbuf, incorporate its error message into our error message rather
than emitting two separate error messages.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
While we are in the area, let's remove a superfluous definite article
from the error message that is emitted when the reference cannot be
locked. This improves how it reads and makes it a bit shorter.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Now that lock_ref_sha1_basic() gives us back its error messages via a
strbuf, incorporate its error message into our error message rather
than emitting one error messages to stderr immediately and returning a
second to our caller.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Now that lock_ref_sha1_basic() gives us back its error messages via a
strbuf, incorporate its error message into our error message rather
than emitting two separate error messages.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
If there is a failure to lock a reference that is likely caused by a
D/F conflict (e.g., trying to lock "refs/foo/bar" when reference
"refs/foo" already exists), invoke verify_refname_available() to try
to generate a more helpful error message.
That function might not detect an error. For example, some
non-reference file might be blocking the deletion of an
otherwise-empty directory tree, or there might be a race with another
process that just deleted the offending reference. In such cases,
generate the strerror-based error message like before.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
For now, change the callers to spew the error to stderr like before.
But soon we will change them to incorporate the reason for the failure
into their own error messages.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
It shouldn't be spewing errors directly to stderr.
For now, change its callers to spew the errors to stderr.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Rename is_refname_available() to verify_refname_available() and change
its return value from 1 for success to 0 for success, to be consistent
with our error-handling convention. In a moment it will also get a
"struct strbuf *err" parameter.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
If two references that D/F conflict (e.g., "refs/foo" and
"refs/foo/bar") are created in a single transaction, the old code
discovered the problem only after the "commit" phase of
ref_transaction_commit() had already begun. This could leave some
references updated and others not, which violates the promise of
atomicity.
Instead, check for such conflicts during the "locking" phase:
* Teach is_refname_available() to take an "extras" parameter that can
contain extra reference names with which the specified refname must
not conflict.
* Change lock_ref_sha1_basic() to take an "extras" parameter, which it
passes through to is_refname_available().
* Change ref_transaction_commit() to pass "affected_refnames" to
lock_ref_sha1_basic() as its "extras" argument.
This change fixes a test case in t1404.
This code is a bit stricter than it needs to be. We could conceivably
allow reference "refs/foo/bar" to be created in the same transaction
as "refs/foo" is deleted (or vice versa). But that would be
complicated to implement, because it is not possible to lock
"refs/foo/bar" while "refs/foo" exists as a loose reference, but on
the other hand we don't want to delete some references before adding
others (because that could leave a gap during which required objects
are unreachable). There is also a complication that reflog files'
paths can conflict.
Any less-strict implementation would probably require tricks like the
packing of all references before the start of the real transaction, or
the use of temporary intermediate reference names.
So for now let's accept too-strict checks. Some reference update
transactions will be rejected unnecessarily, but they will be rejected
in their entirety rather than leaving the repository in an
intermediate state, as would happen now.
Please note that there is still one kind of D/F conflict that is *not*
handled correctly. If two processes are running at the same time, and
one tries to create "refs/foo" at the same time that the other tries
to create "refs/foo/bar", then they can race with each other. Both
processes can obtain their respective locks ("refs/foo.lock" and
"refs/foo/bar.lock"), proceed to the "commit" phase of
ref_transaction_commit(), and then the slower process will discover
that it cannot rename its lockfile into place (after possibly having
committed changes to other references). There appears to be no way to
fix this race without changing the locking policy, which in turn would
require a change to *all* Git clients.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Detect duplicates by storing the reference names in a string_list and
sorting that, instead of sorting the ref_updates directly.
* In a moment the string_list will be used for another purpose, too.
* This removes the need for the custom comparison function
ref_update_compare().
* This means that we can carry out the updates in the order that the
user specified them instead of reordering them. This might be handy
someday if, we want to permit multiple updates to a single reference
as long as they are compatible with each other.
Note: we can't use string_list_remove_duplicates() to check for
duplicates, because we need to know the name of the reference that
appeared multiple times, to be used in the error message.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
In the first loop (over prefixes of refname), use dirname to keep
track of the current prefix. This is not an improvement in itself, but
in a moment we will start using dirname for a role where a
NUL-terminated string is needed.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Now that we don't need a ref_entry to pass to
report_refname_conflict(), it is sufficient to store the refname of
the conflicting reference.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
It wasn't pulling its weight. And we are about to need code similar to
this where no ref_entry is available and with more diverse error
messages. Rather than try to generalize the function, just inline it.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
It wasn't pulling its weight. And in a moment we will need similar
tests that take a refname rather than a ref_entry as parameter, which
would have made entry_matches() even less useful.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
This change wouldn't be worth it by itself, but in a moment we will
use the strbuf for more string juggling.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
The function had a "dir" parameter that was shadowed by a local "dir"
variable within a code block. Use the former in place of the latter.
(This is consistent with "dir"'s use elsewhere in the function.)
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Change the comments to a running example of running the function with
refname set to "refs/foo/bar". Add some more explanation of the logic.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
The 'lock_fd' is the same as 'lk->fd'. No need to store it twice so remove
it.
No functional changes intended.
Signed-off-by: Stefan Beller <sbeller@google.com>
Reviewed-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Prior to d0f810f (refs.c: allow listing and deleting badly
named refs, 2014-09-03), read_packed_refs would barf on any
malformed refnames by virtue of calling create_ref_entry
with the "check" parameter set to 1. That commit loosened
our reading so that we call check_refname_format ourselves
and just set a REF_BAD_NAME flag.
We then call create_ref_entry with the check parameter set
to 0. That function learned to do an extra safety check even
when the check parameter is 0, so that we don't load any
dangerous refnames (like "../../../etc/passwd"). This is
implemented by calling refname_is_safe() in
create_ref_entry().
However, we can observe that refname_is_safe() can only be
true if check_refname_format() also failed. So in the common
case of a sanely named ref, we perform _both_ checks, even
though we know that the latter will never trigger. This has
a noticeable performance impact when the packed-refs file is
large.
Let's drop the refname_is_safe check from create_ref_entry(),
and make it the responsibility of the caller. Of the three
callers that pass a check parameter of "0", two will have
just called check_refname_format(), and can check the
refname-safety only when it fails. The third case,
pack_if_possible_fn, is copying from an existing ref entry,
which must have previously passed our safety check.
With this patch, running "git rev-parse refs/heads/does-not-exist"
on a repo with a large (1.6GB) packed-refs file went from:
real 0m6.768s
user 0m6.340s
sys 0m0.432s
to:
real 0m5.703s
user 0m5.276s
sys 0m0.432s
for a wall-clock speedup of 15%.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git prune" used to largely ignore broken refs when deciding which
objects are still being used, which could spread an existing small
damage and make it a larger one.
* jk/prune-with-corrupt-refs:
refs.c: drop curate_packed_refs
repack: turn on "ref paranoia" when doing a destructive repack
prune: turn on ref_paranoia flag
refs: introduce a "ref paranoia" flag
t5312: test object deletion code paths in a corrupted repository
When we delete a ref, we have to rewrite the entire
packed-refs file. We take this opportunity to "curate" the
packed-refs file and drop any entries that are crufty or
broken.
Dropping broken entries (e.g., with bogus names, or ones
that point to missing objects) is actively a bad idea, as it
means that we lose any notion that the data was there in the
first place. Aside from the general hackiness that we might
lose any information about ref "foo" while deleting an
unrelated ref "bar", this may seriously hamper any attempts
by the user at recovering from the corruption in "foo".
They will lose the sha1 and name of "foo"; the exact pointer
may still be useful even if they recover missing objects
from a different copy of the repository. But worse, once the
ref is gone, there is no trace of the corruption. A
follow-up "git prune" may delete objects, even though it
would otherwise bail when seeing corruption.
We could just drop the "broken" bits from
curate_packed_refs, and continue to drop the "crufty" bits:
refs whose loose counterpart exists in the filesystem. This
is not wrong to do, and it does have the advantage that we
may write out a slightly smaller packed-refs file. But it
has two disadvantages:
1. It is a potential source of races or mistakes with
respect to these refs that are otherwise unrelated to
the operation. To my knowledge, there aren't any active
problems in this area, but it seems like an unnecessary
risk.
2. We have to spend time looking up the matching loose
refs for every item in the packed-refs file. If you
have a large number of packed refs that do not change,
that outweighs the benefit from writing out a smaller
packed-refs file (it doesn't get smaller, and you do a
bunch of directory traversal to find that out).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Most operations that iterate over refs are happy to ignore
broken cruft. However, some operations should be performed
with knowledge of these broken refs, because it is better
for the operation to choke on a missing object than it is to
silently pretend that the ref did not exist (e.g., if we are
computing the set of reachable tips in order to prune
objects).
These processes could just call for_each_rawref, except that
ref iteration is often hidden behind other interfaces. For
instance, for a destructive "repack -ad", we would have to
inform "pack-objects" that we are destructive, and then it
would in turn have to tell the revision code that our
"--all" should include broken refs.
It's much simpler to just set a global for "dangerous"
operations that includes broken refs in all iterations.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Various issues around "reflog expire", e.g. using --updateref when
expiring a reflog for a symbolic reference, have been corrected
and/or made saner.
* mh/expire-updateref-fixes:
reflog_expire(): never update a reference to null_sha1
reflog_expire(): ignore --updateref for symbolic references
reflog: improve and update documentation
struct ref_lock: delete the force_write member
lock_ref_sha1_basic(): do not set force_write for missing references
write_ref_sha1(): move write elision test to callers
write_ref_sha1(): remove check for lock == NULL
Currently, if --updateref is specified and the very last reflog entry
is expired or deleted, the reference's value is set to 0{40}. This is
an invalid state of the repository, and breaks, for example, "git
fsck" and "git for-each-ref".
The only place we use --updateref in our own code is when dropping
stash entries. In that code, the very next step is to check if the
reflog has been made empty, and if so, delete the "refs/stash"
reference entirely. Thus that code path ultimately leaves the
repository in a valid state.
But we don't want to the repository in an invalid state even
temporarily, and we don't want to leave an invalid state if other
callers of "git reflog expire|delete --updateref" don't think to do
the extra cleanup step.
So, if "git reflog expire|delete" leaves no more entries in the
reflog, just leave the reference unchanged.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Reviewed-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If we are expiring reflog entries for a symbolic reference, then how
should --updateref be handled if the newest reflog entry is expired?
Option 1: Update the referred-to reference. (This is what the current
code does.) This doesn't make sense, because the referred-to reference
has its own reflog, which hasn't been rewritten.
Option 2: Update the symbolic reference itself (as in, REF_NODEREF).
This would convert the symbolic reference into a non-symbolic
reference (e.g., detaching HEAD), which is surely not what a user
would expect.
Option 3: Error out. This is plausible, but it would make the
following usage impossible:
git reflog expire ... --updateref --all
Option 4: Ignore --updateref for symbolic references.
We choose to implement option 4.
Note: another problem in this code will be fixed in a moment.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Reviewed-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Instead, compute the value when it is needed.
Signed-off-by: Stefan Beller <sbeller@google.com>
Edited-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Reviewed-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If a reference is missing, its SHA-1 will be null_sha1, which can't
possibly match a new value that ref_transaction_commit() is trying to
update it to. So there is no need to set force_write in this scenario.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Reviewed-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
write_ref_sha1() previously skipped the write if the reference already
had the desired value, unless lock->force_write was set. Instead,
perform that test at the callers.
Two of the callers (in rename_ref()) unconditionally set force_write
just before calling write_ref_sha1(), so they don't need the extra
check at all. Nor do they need to set force_write anymore.
The last caller, in ref_transaction_commit(), still needs the test.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Reviewed-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
None of the callers pass NULL to this function, and there doesn't seem
to be any usefulness to allowing them to do so.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Reviewed-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git blame HEAD -- missing" failed to correctly say "HEAD" when it
tried to say "No such path 'missing' in HEAD".
* jk/blame-commit-label:
blame.c: fix garbled error message
use xstrdup_or_null to replace ternary conditionals
builtin/commit.c: use xstrdup_or_null instead of envdup
builtin/apply.c: use xstrdup_or_null instead of null_strdup
git-compat-util: add xstrdup_or_null helper
Add a docstring for update_ref(), emphasizing its similarity to
ref_transaction_update(). Rename its parameters to match those of
ref_transaction_update().
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Reviewed-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If NULL is passed to ref_transaction_update()'s new_sha1 parameter,
then just verify old_sha1 (under lock) without trying to change the
new value of the reference.
Use this functionality to add a new function ref_transaction_verify(),
which checks the current value of the reference under lock but doesn't
change it.
Use ref_transaction_verify() in the implementation of "git update-ref
--stdin"'s "verify" command to avoid the awkward need to "update" the
reference to its existing value.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Reviewed-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It makes no sense to delete a reference that is already known not to
exist.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Reviewed-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Creating a reference requires a new_sha1 that is not NULL and not
null_sha1.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Reviewed-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Instead, verify the reference's old value if and only if old_sha1 is
non-NULL.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Reviewed-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Instead, verify the reference's old value if and only if old_sha1 is
non-NULL.
ref_transaction_delete() will get the same treatment in a moment.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Reviewed-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Instead of having a separate have_old field, record this boolean value
as a bit in the "flags" field.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Change the following functions' "flags" arguments from "int" to
"unsigned int":
* ref_transaction_update()
* ref_transaction_create()
* ref_transaction_delete()
* update_ref()
* delete_ref()
* lock_ref_sha1_basic()
Also change the "flags" member in "struct ref_update" to unsigned.
Suggested-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There is no reason to "reserve" a gap between the public and private
flags values.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It is only used internally now. Document it a little bit better, too.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Reviewed-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Restructure "reflog expire" to fit the reflogs better with the
recently updated ref API.
Looked reasonable (except that some shortlog entries stood out like
a sore thumb).
* mh/reflog-expire: (24 commits)
refs.c: let fprintf handle the formatting
refs.c: don't expose the internal struct ref_lock in the header file
lock_any_ref_for_update(): inline function
refs.c: remove unlock_ref/close_ref/commit_ref from the refs api
reflog_expire(): new function in the reference API
expire_reflog(): treat the policy callback data as opaque
Move newlog and last_kept_sha1 to "struct expire_reflog_cb"
expire_reflog(): move rewrite to flags argument
expire_reflog(): move verbose to flags argument
expire_reflog(): pass flags through to expire_reflog_ent()
struct expire_reflog_cb: a new callback data type
Rename expire_reflog_cb to expire_reflog_policy_cb
expire_reflog(): move updateref to flags argument
expire_reflog(): move dry_run to flags argument
expire_reflog(): add a "flags" argument
expire_reflog(): extract two policy-related functions
Extract function should_expire_reflog_ent()
expire_reflog(): use a lock_file for rewriting the reflog file
expire_reflog(): return early if the reference has no reflog
expire_reflog(): rename "ref" parameter to "refname"
...
"git blame HEAD -- missing" failed to correctly say "HEAD" when it
tried to say "No such path 'missing' in HEAD".
* jk/blame-commit-label:
blame.c: fix garbled error message
use xstrdup_or_null to replace ternary conditionals
builtin/commit.c: use xstrdup_or_null instead of envdup
builtin/apply.c: use xstrdup_or_null instead of null_strdup
git-compat-util: add xstrdup_or_null helper
* mh/reflog-expire: (24 commits)
refs.c: let fprintf handle the formatting
refs.c: don't expose the internal struct ref_lock in the header file
lock_any_ref_for_update(): inline function
refs.c: remove unlock_ref/close_ref/commit_ref from the refs api
reflog_expire(): new function in the reference API
expire_reflog(): treat the policy callback data as opaque
Move newlog and last_kept_sha1 to "struct expire_reflog_cb"
expire_reflog(): move rewrite to flags argument
expire_reflog(): move verbose to flags argument
expire_reflog(): pass flags through to expire_reflog_ent()
struct expire_reflog_cb: a new callback data type
Rename expire_reflog_cb to expire_reflog_policy_cb
expire_reflog(): move updateref to flags argument
expire_reflog(): move dry_run to flags argument
expire_reflog(): add a "flags" argument
expire_reflog(): extract two policy-related functions
Extract function should_expire_reflog_ent()
expire_reflog(): use a lock_file for rewriting the reflog file
expire_reflog(): return early if the reference has no reflog
expire_reflog(): rename "ref" parameter to "refname"
...
This replaces "x ? xstrdup(x) : NULL" with xstrdup_or_null(x).
The change is fairly mechanical, with the exception of
resolve_refdup, which can eliminate a temporary variable.
There are still a few hits grepping for "?.*xstrdup", but
these are of slightly different forms and cannot be
converted (e.g., "x ? xstrdup(x->foo) : NULL").
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Don't just reset, but release the resource held by the local
variable that is about to go out of scope.
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Git did not correctly read an overlong refname from a packed refs
file.
* jk/read-packed-refs-without-path-max:
read_packed_refs: use skip_prefix instead of static array
read_packed_refs: pass strbuf to parse_ref_line
read_packed_refs: use a strbuf for reading lines
The code that reads the reflog from the newer to the older entries
did not handle an entry that crosses a boundary of block it uses to
read them correctly.
* jk/for-each-reflog-ent-reverse:
for_each_reflog_ent_reverse: turn leftover check into assertion
for_each_reflog_ent_reverse: fix newlines on block boundaries
"git remote update --prune" to drop many refs has been optimized.
* mh/simplify-repack-without-refs:
sort_string_list(): rename to string_list_sort()
prune_remote(): iterate using for_each_string_list_item()
prune_remote(): rename local variable
repack_without_refs(): make the refnames argument a string_list
prune_remote(): sort delete_refs_list references en masse
prune_remote(): initialize both delete_refs lists in a single loop
prune_remote(): exit early if there are no stale references
Instead of calculating whether to put a plus or minus sign, offload
the responsibilty to the fprintf function.
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Now the struct ref_lock is used completely internally, so let's
remove it from the header file.
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Inline the function at its one remaining caller (which is within
refs.c) and remove it.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
unlock|close|commit_ref can be made static since there are no more external
callers.
Signed-off-by: Ronnie Sahlberg <sahlberg@google.com>
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Move expire_reflog() into refs.c and rename it to reflog_expire().
Turn the three policy functions into function pointers that are passed
into reflog_expire(). Add function prototypes and documentation to
refs.h.
[jc: squashed in $gmane/261582, drop "extern" in function definition]
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Reviewed-by: Stefan Beller <sbeller@google.com>
Tweaked-by: Ramsay Jones
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Break out the code to create the string and writing it to the file
descriptor from log_ref_write and add it into a dedicated function
log_ref_write_fd. It is a nice unit of work.
For now this is only used from log_ref_write, but in the future it
might have other callers.
Signed-off-by: Ronnie Sahlberg <sahlberg@google.com>
Signed-off-by: Stefan Beller <sbeller@google.com>
Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We want to recognize the packed-refs header and skip to the
"traits" part of the line. We currently do it by feeding
sizeof() a static const array to strncmp. However, it's a
bit simpler to just skip_prefix, which expresses the
intention more directly, and without remembering to account
for the NUL-terminator in each sizeof() call.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Now that we have a strbuf in read_packed_refs, we can pass
it straight to the line parser, which saves us an extra
strlen.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Current code uses a fixed PATH_MAX-sized buffer for reading
packed-refs lines. This is a reasonable guess, in the sense
that git generally cannot work with refs larger than
PATH_MAX. However, there are a few cases where it is not
great:
1. Some systems may have a low value of PATH_MAX, but can
actually handle larger paths in practice. Fixing this
code path probably isn't enough to make them work
completely with long refs, but it is a step in the
right direction.
2. We use fgets, which will happily give us half a line on
the first read, and then the rest of the line on the
second. This is probably OK in practice, because our
refline parser is careful enough to look for the
trailing newline on the first line. The second line may
look like a peeled line to us, but since "^" is illegal
in refnames, it is not likely to come up.
Still, it does not hurt to be more careful.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Our loop should always process all lines, even if we hit the
beginning of the file. We have a conditional after the loop
ends to double-check that there is nothing left and to
process it. But this should never happen, and is a sign of a
logic bug in the loop. Let's turn it into a BUG assertion.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When we read a reflog file in reverse, we read whole chunks
of BUFSIZ bytes, then loop over the buffer, parsing any
lines we find. We find the beginning of each line by looking
for the newline from the previous line. If we don't find
one, we know that we are either at the beginning of
the file, or that we have to read another block.
In the latter case, we stuff away what we have into a
strbuf, read another block, and continue our parse. But we
missed one case here. If we did find a newline, and it is at
the beginning of the block, we must also stuff that newline
into the strbuf, as it belongs to the block we are about to
read.
The minimal fix here would be to add this special case to
the conditional that checks whether we found a newline.
But we can make the flow a little clearer by rearranging a
bit: we first handle lines that we are going to show, and
then at the end of each loop, stuff away any leftovers if
necessary. That lets us fold this special-case in with the
more common "we ended in the middle of a line" case.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Ronnie Sahlberg <sahlberg@google.com>
Signed-off-by: Stefan Beller <sbeller@google.com>
Reviewed-by: Michael Haggerty <mhagger@alum.mit.edu>
Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The ref_transaction_update function can already be used to create refs by
passing null_sha1 as the old_sha1 parameter. Simplify by replacing
transaction_create with a thin wrapper.
Signed-off-by: Ronnie Sahlberg <sahlberg@google.com>
Signed-off-by: Stefan Beller <sbeller@google.com>
Reviewed-by: Michael Haggerty <mhagger@alum.mit.edu>
Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In the previous patch, git_snpath() is modified to allocate a new
strbuf buffer because vsnpath() needs that. But that makes it
awkward because git_snpath() receives a pre-allocated buffer from
outside and has to copy data back. Rename it to strbuf_git_path()
and make it receive strbuf directly.
Using git_path() in update_refs_for_switch() which used to call
git_snpath() is safe because that function and all of its callers do
not keep any pointer to the round-robin buffer pool allocated by
get_pathname().
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Before the previous commit, get_pathname returns an array of PATH_MAX
length. Even if git_path() and similar functions does not use the
whole array, git_path() caller can, in theory.
After the commit, get_pathname() may return a buffer that has just
enough room for the returned string and git_path() caller should never
write beyond that.
Make git_path(), mkpath() and git_path_submodule() return a const
buffer to make sure callers do not write in it at all.
This could have been part of the previous commit, but the "const"
conversion is too much distraction from the core changes in path.c.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Most of the callers have string_lists available already, whereas two
of them had to read data out of a string_list into an array of strings
just to call this function. So change repack_without_refs() to take
the list of refnames to omit as a string_list, and change the callers
accordingly.
Suggested-by: Ronnie Sahlberg <sahlberg@google.com>
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
lock_ref_sha1_basic is inconsistent about when it calls
die() and when it returns NULL to signal an error. This is
annoying to any callers that want to recover from a locking
error.
This seems to be mostly historical accident. It was added in
4bd18c4 (Improve abstraction of ref lock/write.,
2006-05-17), which returned an error in all cases except
calling safe_create_leading_directories, in which case it
died. Later, 40aaae8 (Better error message when we are
unable to lock the index file, 2006-08-12) asked
hold_lock_file_for_update to die for us, leaving the
resolve_ref code-path the only one which returned NULL.
We tried to correct that in 5cc3cef (lock_ref_sha1(): do not
sometimes error() and sometimes die()., 2006-09-30),
by converting all of the die() calls into returns. But we
missed the "die" flag passed to the lock code, leaving us
inconsistent. This state persisted until e5c223e
(lock_ref_sha1_basic(): if locking fails with ENOENT, retry,
2014-01-18). Because of its retry scheme, it does not ask
the lock code to die, but instead manually dies with
unable_to_lock_die().
We can make this consistent with the other return paths by
converting this to use unable_to_lock_message(), and
returning NULL. This is safe to do because all callers
already needed to check the return value of the function,
since it could fail (and return NULL) for other reasons.
[jk: Added excessive history explanation]
Signed-off-by: Ronnie Sahlberg <sahlberg@google.com>
Signed-off-by: Jeff King <peff@peff.net>
Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Corner-case bugfixes for "git fetch" around reflog handling.
* jk/fetch-reflog-df-conflict:
ignore stale directories when checking reflog existence
fetch: load all default config at startup
When we update a ref, we have two rules for whether or not
we actually update the reflog:
1. If the reflog already exists, we will always append to
it.
2. If log_all_ref_updates is set, we will create a new
reflog file if necessary.
We do the existence check by trying to open the reflog file,
either with or without O_CREAT (depending on log_all_ref_updates).
If it fails, then we check errno to see what happened.
If we were not using O_CREAT and we got ENOENT, the file
doesn't exist, and we return success (there isn't a reflog
already, and we were not told to make a new one).
If we get EISDIR, then there is likely a stale directory
that needs to be removed (e.g., there used to be "foo/bar",
it was deleted, and the directory "foo" was left. Now we
want to create the ref "foo"). If O_CREAT is set, then we
catch this case, try to remove the directory, and retry our
open. So far so good.
But if we get EISDIR and O_CREAT is not set, then we treat
this as any other error, which is not right. Like ENOENT,
EISDIR is an indication that we do not have a reflog, and we
should silently return success (we were not told to create
it). Instead, the current code reports this as an error, and
we fail to update the ref at all.
Note that this is relatively unlikely to happen, as you
would have to have had reflogs turned on, and then later
turned them off (it could also happen due to a bug in fetch,
but that was fixed in the previous commit). However, it's
quite easy to fix: we just need to treat EISDIR like ENOENT
for the non-O_CREAT case, and silently return (note that
this early return means we can also simplify the O_CREAT
case).
Our new tests cover both cases (O_CREAT and non-O_CREAT).
The first one already worked, of course.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When removal of a loose or packed ref fails, bail out instead of
trying to finish the transaction. This way, a single error message
can be printed (instead of multiple messages being concatenated by
mistake) and the operator can try to solve the underlying problem
before there is a chance to muck things up even more.
In particular, when git fails to remove a ref, git goes on to try to
delete the reflog. Exiting early lets us keep the reflog.
When git succeeds in deleting a ref A and fails to remove a ref B, it
goes on to try to delete both reflogs. It would be better to just
remove the reflog for A, but that would be a more invasive change.
Failing early means we keep both reflogs, which puts the operator in a
good position to understand the problem and recover.
A long term goal is to avoid these problems altogether and roll back
the transaction on failure. That kind of transactionality will have
to wait for a later series (the plan for which is to make all
destructive work happen in a single update of the packed-refs file).
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Reviewed-by: Ronnie Sahlberg <sahlberg@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Some functions that take a strbuf argument to append an error treat
!err as an indication that the message should be suppressed (e.g.,
ref_update_reject_duplicates). Others write the message to stderr on
!err (e.g., repack_without_refs). Others crash (e.g.,
ref_transaction_update).
Some of these behaviors are for historical reasons and others were
accidents. Luckily no callers pass err == NULL any more. Simplify
by consistently requiring the strbuf argument.
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Reviewed-by: Ronnie Sahlberg <sahlberg@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We currently do not handle badly named refs well:
$ cp .git/refs/heads/master .git/refs/heads/master.....@\*@\\.
$ git branch
fatal: Reference has invalid format: 'refs/heads/master.....@*@\.'
$ git branch -D master.....@\*@\\.
error: branch 'master.....@*@\.' not found.
Users cannot recover from a badly named ref without manually finding
and deleting the loose ref file or appropriate line in packed-refs.
Making that easier will make it easier to tweak the ref naming rules
in the future, for example to forbid shell metacharacters like '`'
and '"', without putting people in a state that is hard to get out of.
So allow "branch --list" to show these refs and allow "branch -d/-D"
and "update-ref -d" to delete them. Other commands (for example to
rename refs) will continue to not handle these refs but can be changed
in later patches.
Details:
In resolving functions, refuse to resolve refs that don't pass the
git-check-ref-format(1) check unless the new RESOLVE_REF_ALLOW_BAD_NAME
flag is passed. Even with RESOLVE_REF_ALLOW_BAD_NAME, refuse to
resolve refs that escape the refs/ directory and do not match the
pattern [A-Z_]* (think "HEAD" and "MERGE_HEAD").
In locking functions, refuse to act on badly named refs unless they
are being deleted and either are in the refs/ directory or match [A-Z_]*.
Just like other invalid refs, flag resolved, badly named refs with the
REF_ISBROKEN flag, treat them as resolving to null_sha1, and skip them
in all iteration functions except for for_each_rawref.
Flag badly named refs (but not symrefs pointing to badly named refs)
with a REF_BAD_NAME flag to make it easier for future callers to
notice and handle them specially. For example, in a later patch
for-each-ref will use this flag to detect refs whose names can confuse
callers parsing for-each-ref output.
In the transaction API, refuse to create or update badly named refs,
but allow deleting them (unless they try to escape refs/ and don't match
[A-Z_]*).
Signed-off-by: Ronnie Sahlberg <sahlberg@google.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since v1.7.9-rc1~10^2 (write_head_info(): handle "extra refs" locally,
2012-01-06), this trick to keep track of ".have" refs that are only
valid on the wire and not on the filesystem is not needed any more.
Simplify by removing support for the REFNAME_DOT_COMPONENT flag.
This means we'll be slightly stricter with invalid refs found in a
packed-refs file or during clone. read_loose_refs() already checks
for and skips refnames with .components so it is not affected.
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Reviewed-by: Ronnie Sahlberg <sahlberg@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If a repository gets in a broken state with too much symref nesting,
it cannot be repaired with "git branch -d":
$ git symbolic-ref refs/heads/nonsense refs/heads/nonsense
$ git branch -d nonsense
error: branch 'nonsense' not found.
Worse, "git update-ref --no-deref -d" doesn't work for such repairs
either:
$ git update-ref -d refs/heads/nonsense
error: unable to resolve reference refs/heads/nonsense: Too many levels of symbolic links
Fix both by teaching resolve_ref_unsafe a new RESOLVE_REF_NO_RECURSE
flag and passing it when appropriate.
Callers can still read the value of a symref (for example to print a
message about it) with that flag set --- resolve_ref_unsafe will
resolve one level of symrefs and stop there.
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Reviewed-by: Ronnie Sahlberg <sahlberg@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
resolve_ref_unsafe takes a boolean argument for reading (a nonexistent ref
resolves successfully for writing but not for reading). Change this to be
a flags field instead, and pass the new constant RESOLVE_REF_READING when
we want this behaviour.
While at it, swap two of the arguments in the function to put output
arguments at the end. As a nice side effect, this ensures that we can
catch callers that were unaware of the new API so they can be audited.
Give the wrapper functions resolve_refdup and read_ref_full the same
treatment for consistency.
Signed-off-by: Ronnie Sahlberg <sahlberg@google.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
No external users call write_ref_sha1 any more so let's declare it static.
Signed-off-by: Ronnie Sahlberg <sahlberg@google.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In _commit, ENOTDIR can happen in the call to lock_ref_sha1_basic, either
when we lstat the new refname or if the name checking function reports that
the same type of conflict happened. In both cases, it means that we can not
create the new ref due to a name conflict.
Start defining specific return codes for _commit. TRANSACTION_NAME_CONFLICT
refers to a failure to create a ref due to a name conflict with another ref.
TRANSACTION_GENERIC_ERROR is for all other errors.
When "git fetch" is creating refs, name conflicts differ from other errors in
that they are likely to be resolved by running "git remote prune <remote>".
"git fetch" currently inspects errno to decide whether to give that advice.
Once it switches to the transaction API, it can check for
TRANSACTION_NAME_CONFLICT instead.
Signed-off-by: Ronnie Sahlberg <sahlberg@google.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Change is_refname_available to take a list of strings to exclude when
checking for conflicts instead of just one single name. We can already
exclude a single name for the sake of renames. This generalizes that support.
ref_transaction_commit already tracks a set of refs that are being deleted
in an array. This array is then used to exclude refs from being written to
the packed-refs file. At some stage we will want to change this array to a
struct string_list and then we can pass it to is_refname_available via the
call to lock_ref_sha1_basic. That will allow us to perform transactions
that perform multiple renames as long as there are no conflicts within the
starting or ending state.
For example, that would allow a single transaction that contains two
renames that are both individually conflicting:
m -> n/n
n -> m/m
No functional change intended yet.
Signed-off-by: Ronnie Sahlberg <sahlberg@google.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Skip using the lock_any_ref_for_update wrapper and call lock_ref_sha1_basic
directly from the commit function.
Signed-off-by: Ronnie Sahlberg <sahlberg@google.com>
Reviewed-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Move the check for check_refname_format from lock_any_ref_for_update to
lock_ref_sha1_basic. At some later stage we will get rid of
lock_any_ref_for_update completely. This has no visible impact to callers
except for the inability to lock badly named refs, which is not possible
today already for other reasons.(*)
Keep lock_any_ref_for_update as a no-op wrapper. It is the public facing
version of this interface and keeping it as a separate function will make
it easier to experiment with the internal lock_ref_sha1_basic signature.
(*) For example, if lock_ref_sha1_basic checks the refname format and
refuses to lock badly named refs, it will not be possible to delete
such refs because the first step of deletion is to lock the ref. We
currently already fail in that case because these refs are not recognized
to exist:
$ cp .git/refs/heads/master .git/refs/heads/echo...\*\*
$ git branch -D .git/refs/heads/echo...\*\*
error: branch '.git/refs/heads/echo...**' not found.
This has been broken for a while. Later patches in the series will start
repairing the handling of badly named refs.
Signed-off-by: Ronnie Sahlberg <sahlberg@google.com>
Reviewed-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We call read_ref_full with a pointer to flags from rename_ref but since
we never actually use the returned flags we can just pass NULL here instead.
Signed-off-by: Ronnie Sahlberg <sahlberg@google.com>
Reviewed-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Change the ref transaction API so that we pass the reflog message to the
create/delete/update functions instead of to ref_transaction_commit.
This allows different reflog messages for each ref update in a multi-ref
transaction.
Signed-off-by: Ronnie Sahlberg <sahlberg@google.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add an err argument to delete_ref_loose so that we can pass a descriptive
error string back to the caller. Pass the err argument from transaction
commit to this function so that transaction users will have a nice error
string if the transaction failed due to delete_ref_loose.
Signed-off-by: Ronnie Sahlberg <sahlberg@google.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
lock_ref_sha1_basic is used to lock refs that sit directly in the .git
dir such as HEAD and MERGE_HEAD in addition to the more ordinary refs
under "refs/". Remove the note claiming otherwise.
Signed-off-by: Ronnie Sahlberg <sahlberg@google.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Simplify the function warn_if_unremovable slightly. Additionally, change
behaviour slightly. If we failed to remove the object because the object
does not exist, we can still return success back to the caller since none of
the callers depend on "fail if the file did not exist".
Signed-off-by: Ronnie Sahlberg <sahlberg@google.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Move the interface declaration for the functions in lockfile.c from
cache.h to a new file, lockfile.h. Add #includes where necessary (and
remove some redundant includes of cache.h by files that already
include builtin.h).
Move the documentation of the lock_file state diagram from lockfile.c
to the new header file.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a function to return the path of the file that is locked by a
lock_file object. This reduces the knowledge that callers have to have
about the lock_file layout.
Suggested-by: Ronnie Sahlberg <sahlberg@google.com>
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This makes it harder to misread the name as LOCK_NODE_REF.
Suggested-by: Torsten Bögershausen <tboegi@web.de>
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
For now, we still make sure to allocate at least PATH_MAX characters
for the strbuf because resolve_symlink() doesn't know how to expand
the space for its return value. (That will be fixed in a moment.)
Another alternative would be to just use a strbuf as scratch space in
lock_file() but then store a pointer to the naked string in struct
lock_file. But lock_file objects are often reused. By reusing the
same strbuf, we can avoid having to reallocate the string most times
when a lock_file object is reused.
Helped-by: Torsten Bögershausen <tboegi@web.de>
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It's bad manners. Especially since there could be a signal during the
call to unlink_or_warn(), in which case the signal handler will see
the wrong filename and delete the reference file, leaving the lockfile
behind.
So make our own copy to work with.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There are a few places that use these values, so define constants for
them.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This function is used for other things besides the index, so rename it
accordingly.
Suggested-by: Jeff King <peff@peff.net>
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Reviewed-by: Ronnie Sahlberg <sahlberg@google.com>
Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"rev-parse --verify --quiet $name" is meant to quietly exit with a
non-zero status when $name is not a valid object name, but still
gave error messages in some cases.
* da/rev-parse-verify-quiet:
stash: prefer --quiet over shell redirection of the standard error stream
refs: make rev-parse --quiet actually quiet
t1503: use test_must_be_empty
Documentation: a note about stdout for git rev-parse --verify --quiet
Optimize the check to see if a ref $F can be created by making sure
no existing ref has $F/ as its prefix, which especially matters in
a repository with a large number of existing refs.
* jk/faster-name-conflicts:
refs: speed up is_refname_available
Optimize the code path to write out the packed-refs file, which
especially matters in a repository with a large number of refs.
* jk/write-packed-refs-via-stdio:
refs: write packed_refs file using stdio
When a reflog is deleted, e.g. when "git stash" clears its stashes,
"git rev-parse --verify --quiet" dies:
fatal: Log for refs/stash is empty.
The reason is that the get_sha1() code path does not allow us
to suppress this message.
Pass the flags bitfield through get_sha1_with_context() so that
read_ref_at() can suppress the message.
Use get_sha1_with_context1() instead of get_sha1() in rev-parse
so that the --quiet flag is honored.
Signed-off-by: David Aguilar <davvid@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Our filesystem ref storage does not allow D/F conflicts; so
if "refs/heads/a/b" exists, we do not allow "refs/heads/a"
to exist (and vice versa). This falls out naturally for
loose refs, where the filesystem enforces the condition. But
for packed-refs, we have to make the check ourselves.
We do so by iterating over the entire packed-refs namespace
and checking whether each name creates a conflict. If you
have a very large number of refs, this is quite inefficient,
as you end up doing a large number of comparisons with
uninteresting bits of the ref tree (e.g., we know that all
of "refs/tags" is uninteresting in the example above, yet we
check each entry in it).
Instead, let's take advantage of the fact that we have the
packed refs stored as a trie of ref_entry structs. We can
find each component of the proposed refname as we walk
through the trie, checking for D/F conflicts as we go. For a
refname of depth N (i.e., 4 in the above example), we only
have to visit N nodes. And at each visit, we can binary
search the M names at that level, for a total complexity of
O(N lg M). ("M" is different at each level, of course, but
we can take the worst-case "M" as a bound).
In a pathological case of fetching 30,000 fresh refs into a
repository with 8.5 million refs, this dropped the time to
run "git fetch" from tens of minutes to ~30s.
This may also help smaller cases in which we check against
loose refs (which we do when renaming a ref), as we may
avoid a disk access for unrelated loose directories.
Note that the tests we add appear at first glance to be
redundant with what is already in t3210. However, the early
tests are not robust; they are run with reflogs turned on,
meaning that we are not actually testing
is_refname_available at all! The operations will still fail
because the reflogs will hit D/F conflicts in the
filesystem. To get a true test, we must turn off reflogs
(but we don't want to do so for the entire script, because
the point of turning them on was to cover some other cases).
Reviewed-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
After "pack-refs --prune" packed refs at the top-level, it failed
to prune them.
* jk/prune-top-level-refs-after-packing:
pack-refs: prune top-level refs like "refs/foo"
The second batch of the transactional ref update series.
* rs/ref-transaction-1: (22 commits)
update-ref --stdin: pass transaction around explicitly
update-ref --stdin: narrow scope of err strbuf
refs.c: make delete_ref use a transaction
refs.c: make prune_ref use a transaction to delete the ref
refs.c: remove lock_ref_sha1
refs.c: remove the update_ref_write function
refs.c: remove the update_ref_lock function
refs.c: make lock_ref_sha1 static
walker.c: use ref transaction for ref updates
fast-import.c: use a ref transaction when dumping tags
receive-pack.c: use a reference transaction for updating the refs
refs.c: change update_ref to use a transaction
branch.c: use ref transaction for all ref updates
fast-import.c: change update_branch to use ref transactions
sequencer.c: use ref transactions for all ref updates
commit.c: use ref transactions for updates
replace.c: use the ref transaction functions for updates
tag.c: use ref transactions when doing updates
refs.c: add transaction.status and track OPEN/CLOSED
refs.c: make ref_transaction_begin take an err argument
...
We write each line of a new packed-refs file individually
using a write() syscall (and sometimes 2, if the ref is
peeled). Since each line is only about 50-100 bytes long,
this creates a lot of system call overhead.
We can instead open a stdio handle around our descriptor and
use fprintf to write to it. The extra buffering is not a
problem for us, because nobody will read our new packed-refs
file until we call commit_lock_file (by which point we have
flushed everything).
On a pathological repository with 8.5 million refs, this
dropped the time to run `git pack-refs` from 20s to 6s.
Signed-off-by: Jeff King <peff@peff.net>
Reviewed-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Change delete_ref to use a ref transaction for the deletion. At the same time
since we no longer have any callers of repack_without_ref we can now delete
this function.
Change delete_ref to return 0 on success and 1 on failure instead of the
previous 0 on success either 1 or -1 on failure.
Signed-off-by: Ronnie Sahlberg <sahlberg@google.com>
Reviewed-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Change prune_ref to delete the ref using a ref transaction. To do this we also
need to add a new flag REF_ISPRUNING that will tell the transaction that we
do not want to delete this ref from the packed refs. This flag is private to
refs.c and not exposed to external callers.
Signed-off-by: Ronnie Sahlberg <sahlberg@google.com>
Reviewed-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
lock_ref_sha1 was only called from one place in refs.c and only provided
a check that the refname was sane before adding back the initial "refs/"
part of the ref path name, the initial "refs/" that this caller had already
stripped off before calling lock_ref_sha1.
Signed-off-by: Ronnie Sahlberg <sahlberg@google.com>
Reviewed-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since we only call update_ref_write from a single place and we only call it
with onerr==QUIET_ON_ERR we can just as well get rid of it and just call
write_ref_sha1 directly. This changes the return status for _commit from
1 to -1 on failures when writing to the ref. Eventually we will want
_commit to start returning more detailed error conditions than the current
simple success/failure.
Signed-off-by: Ronnie Sahlberg <sahlberg@google.com>
Reviewed-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since we now only call update_ref_lock with onerr==QUIET_ON_ERR we no longer
need this function and can replace it with just calling lock_any_ref_for_update
directly.
Signed-off-by: Ronnie Sahlberg <sahlberg@google.com>
Reviewed-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
No external callers reference lock_ref_sha1 any more so let's declare it
static.
Signed-off-by: Ronnie Sahlberg <sahlberg@google.com>
Reviewed-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Change the update_ref helper function to use a ref transaction internally.
Signed-off-by: Ronnie Sahlberg <sahlberg@google.com>
Reviewed-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Track the state of a transaction in a new state field. Check the field for
sanity, i.e. that state must be OPEN when _commit/_create/_delete or
_update is called or else die(BUG:...)
Signed-off-by: Ronnie Sahlberg <sahlberg@google.com>
Reviewed-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add an err argument to _begin so that on non-fatal failures in future ref
backends we can report a nice error back to the caller.
While _begin can currently never fail for other reasons than OOM, in which
case we die() anyway, we may add other types of backends in the future.
For example, a hypothetical MySQL backend could fail in _begin with
"Can not connect to MySQL server. No route to host".
Signed-off-by: Ronnie Sahlberg <sahlberg@google.com>
Reviewed-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Change ref_transaction_delete() to do basic error checking and return
non-zero on error. Update all callers to check the return for
ref_transaction_delete(). There are currently no conditions in _delete that
will return error but there will be in the future. Add an err argument that
will be updated on failure.
Signed-off-by: Ronnie Sahlberg <sahlberg@google.com>
Reviewed-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Do basic error checking in ref_transaction_create() and make it return
non-zero on error. Update all callers to check the result of
ref_transaction_create(). There are currently no conditions in _create that
will return error but there will be in the future. Add an err argument that
will be updated on failure.
Signed-off-by: Ronnie Sahlberg <sahlberg@google.com>
Reviewed-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
After we have packed all refs, we prune any loose refs that
correspond to what we packed. We do so by first taking a
lock with lock_ref_sha1, and then deleting the loose ref
file.
However, lock_ref_sha1 will refuse to take a lock on any
refs that exist at the top-level of the "refs/" directory,
and we skip pruning the ref. This is almost certainly not
what we want to happen here. The criteria to be pruned
should not differ from that to be packed; if a ref makes it
to prune_ref, it's because we want it both packed and
pruned (if there are refs you do not want to be packed, they
should be omitted much earlier by pack_ref_is_possible,
which we do in this case if --all is not given).
We can fix this by switching to lock_any_ref_for_update.
This behaves exactly the same with the exception of this
top-level check.
Signed-off-by: Jeff King <peff@peff.net>
Reviewed-by: Michael Haggerty <mhagger@alum.mit.edu>
Reviewed-by: Ronnie Sahlberg <sahlberg@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When we call lookup_commit, lookup_tree, etc, the logic goes
something like:
1. Look for an existing object struct. If we don't have
one, allocate and return a new one.
2. Double check that any object we have is the expected
type (and complain and return NULL otherwise).
3. Convert an object with type OBJ_NONE (from a prior
call to lookup_unknown_object) to the expected type.
We can encapsulate steps 2 and 3 in a helper function which
checks whether we have the expected object type, converts
OBJ_NONE as appropriate, and returns the object.
Not only does this shorten the code, but it also provides
one central location for converting OBJ_NONE objects into
objects of other types. Future patches will use that to
enforce type-specific invariants.
Since this is a refactoring, we would want it to behave
exactly as the current code. It takes a little reasoning to
see that this is the case:
- for lookup_{commit,tree,etc} functions, we are just
pulling steps 2 and 3 into a function that does the same
thing.
- for the call in peel_object, we currently only do step 3
(but we want to consolidate it with the others, as
mentioned above). However, step 2 is a noop here, as the
surrounding conditional makes sure we have OBJ_NONE
(which we want to keep to avoid an extraneous call to
sha1_object_info).
- for the call in lookup_commit_reference_gently, we are
currently doing step 2 but not step 3. However, step 3
is a noop here. The object we got will have just come
from deref_tag, which must have figured out the type for
each object in order to know when to stop peeling.
Therefore the type will never be OBJ_NONE.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Make sure all in-core commit objects are assigned a unique number
so that they can be annotated using the commit-slab API.
* jk/alloc-commit-id:
diff-tree: avoid lookup_unknown_object
object_as_type: set commit index
alloc: factor out commit index
add object_as_type helper for casting objects
parse_object_buffer: do not set object type
move setting of object->type to alloc_* functions
alloc: write out allocator definitions
alloc.c: remove the alloc_raw_commit_node() function
Early part of the "ref transaction" topic.
* rs/ref-transaction-0:
refs.c: change ref_transaction_update() to do error checking and return status
refs.c: remove the onerr argument to ref_transaction_commit
update-ref: use err argument to get error from ref_transaction_commit
refs.c: make update_ref_write update a strbuf on failure
refs.c: make ref_update_reject_duplicates take a strbuf argument for errors
refs.c: log_ref_write should try to return meaningful errno
refs.c: make resolve_ref_unsafe set errno to something meaningful on error
refs.c: commit_packed_refs to return a meaningful errno on failure
refs.c: make remove_empty_directories always set errno to something sane
refs.c: verify_lock should set errno to something meaningful
refs.c: make sure log_ref_setup returns a meaningful errno
refs.c: add an err argument to repack_without_refs
lockfile.c: make lock_file return a meaningful errno on failurei
lockfile.c: add a new public function unable_to_lock_message
refs.c: add a strbuf argument to ref_transaction_commit for error logging
refs.c: allow passing NULL to ref_transaction_free
refs.c: constify the sha arguments for ref_transaction_create|delete|update
refs.c: ref_transaction_commit should not free the transaction
refs.c: remove ref_transaction_rollback
Both refs.c and fsck.c have their own private copies of the is_branch function.
Delete the is_branch function from fsck.c and make the version in refs.c
public.
Signed-off-by: Ronnie Sahlberg <sahlberg@google.com>
Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* jk/strip-suffix:
prepare_packed_git_one: refactor duplicate-pack check
verify-pack: use strbuf_strip_suffix
strbuf: implement strbuf_strip_suffix
index-pack: use strip_suffix to avoid magic numbers
use strip_suffix instead of ends_with in simple cases
replace has_extension with ends_with
implement ends_with via strip_suffix
add strip_suffix function
sha1_file: replace PATH_MAX buffer with strbuf in prepare_packed_git_one()
Update ref_transaction_update() do some basic error checking and return
non-zero on error. Update all callers to check ref_transaction_update() for
error. There are currently no conditions in _update that will return error but
there will be in the future. Add an err argument that will be updated on
failure. In future patches we will start doing both locking and checking
for name conflicts in _update instead of _commit at which time this function
will start returning errors for these conditions.
Also check for BUGs during update and die(BUG:...) if we are calling
_update with have_old but the old_sha1 pointer is NULL.
Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Ronnie Sahlberg <sahlberg@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Acked-by: Michael Haggerty <mhagger@alum.mit.edu>
Since all callers now use QUIET_ON_ERR we no longer need to provide an onerr
argument any more. Remove the onerr argument from the ref_transaction_commit
signature.
Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Ronnie Sahlberg <sahlberg@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Acked-by: Michael Haggerty <mhagger@alum.mit.edu>
Change update_ref_write to also update an error strbuf on failure.
This makes the error available to ref_transaction_commit callers if the
transaction failed due to update_ref_sha1/write_ref_sha1 failures.
Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Ronnie Sahlberg <sahlberg@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Acked-by: Michael Haggerty <mhagger@alum.mit.edu>
Make ref_update_reject_duplicates return any error that occurs through a
new strbuf argument. This means that when a transaction commit fails in
this function we will now be able to pass a helpful error message back to the
caller.
Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Ronnie Sahlberg <sahlberg@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Acked-by: Michael Haggerty <mhagger@alum.mit.edu>
Making errno from write_ref_sha1() meaningful, which should fix
* a bug in "git checkout -b" where it prints strerror(errno)
despite errno possibly being zero or clobbered
* a bug in "git fetch"'s s_update_ref, which trusts the result of an
errno == ENOTDIR check to detect D/F conflicts
Signed-off-by: Ronnie Sahlberg <sahlberg@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Acked-by: Michael Haggerty <mhagger@alum.mit.edu>
Making errno when returning from resolve_ref_unsafe() meaningful,
which should fix
* a bug in lock_ref_sha1_basic, where it assumes EISDIR
means it failed due to a directory being in the way
Signed-off-by: Ronnie Sahlberg <sahlberg@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Acked-by: Michael Haggerty <mhagger@alum.mit.edu>
Making errno when returning from commit_packed_refs() meaningful,
which should fix
* a bug in "git clone" where it prints strerror(errno) based on
errno, despite errno possibly being zero and potentially having
been clobbered by that point
* the same kind of bug in "git pack-refs"
and prepares for repack_without_refs() to get a meaningful
error message when commit_packed_refs() fails without falling into
the same bug.
Signed-off-by: Ronnie Sahlberg <sahlberg@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Acked-by: Michael Haggerty <mhagger@alum.mit.edu>
Making errno when returning from remove_empty_directories() more
obviously meaningful, which should provide some peace of mind for
people auditing lock_ref_sha1_basic.
Signed-off-by: Ronnie Sahlberg <sahlberg@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Acked-by: Michael Haggerty <mhagger@alum.mit.edu>
Making errno when returning from verify_lock() meaningful, which
should almost but not completely fix
* a bug in "git fetch"'s s_update_ref, which trusts the result of an
errno == ENOTDIR check to detect D/F conflicts
ENOTDIR makes sense as a sign that a file was in the way of a
directory we wanted to create. Should "git fetch" also look for
ENOTEMPTY or EEXIST to catch cases where a directory was in the way
of a file to be created?
Signed-off-by: Ronnie Sahlberg <sahlberg@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Acked-by: Michael Haggerty <mhagger@alum.mit.edu>
Making errno when returning from log_ref_setup() meaningful,
Signed-off-by: Ronnie Sahlberg <sahlberg@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Acked-by: Michael Haggerty <mhagger@alum.mit.edu>
Update repack_without_refs to take an err argument and update it if there
is a failure. Pass the err variable from ref_transaction_commit to this
function so that callers can print a meaningful error message if _commit
fails due to this function.
Signed-off-by: Ronnie Sahlberg <sahlberg@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Acked-by: Michael Haggerty <mhagger@alum.mit.edu>
Making errno when returning from lock_file() meaningful, which should
fix
* an existing almost-bug in lock_ref_sha1_basic where it assumes
errno==ENOENT is meaningful and could waste some work on retries
* an existing bug in repack_without_refs where it prints
strerror(errno) and picks advice based on errno, despite errno
potentially being zero and potentially having been clobbered by
that point
Signed-off-by: Ronnie Sahlberg <sahlberg@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Acked-by: Michael Haggerty <mhagger@alum.mit.edu>
Add a strbuf argument to _commit so that we can pass an error string back to
the caller. So that we can do error logging from the caller instead of from
_commit.
Longer term plan is to first convert all callers to use onerr==QUIET_ON_ERR
and craft any log messages from the callers themselves and finally remove the
onerr argument completely.
Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Ronnie Sahlberg <sahlberg@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Acked-by: Michael Haggerty <mhagger@alum.mit.edu>
Allow ref_transaction_free(NULL) as a no-op. This makes ref_transaction_free
easier to use and more similar to plain 'free'.
In particular, it lets us rollback unconditionally as part of cleanup code
after setting 'transaction = NULL' if a transaction has been committed or
rolled back already.
Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Ronnie Sahlberg <sahlberg@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Acked-by: Michael Haggerty <mhagger@alum.mit.edu>
ref_transaction_create|delete|update has no need to modify the sha1
arguments passed to it so it should use const unsigned char* instead
of unsigned char*.
Some functions, such as fast_forward_to(), already have its old/new
sha1 arguments as consts. This function will at some point need to
use ref_transaction_update() in which case this change is required.
Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Ronnie Sahlberg <sahlberg@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Acked-by: Michael Haggerty <mhagger@alum.mit.edu>
Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Ronnie Sahlberg <sahlberg@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Acked-by: Michael Haggerty <mhagger@alum.mit.edu>
We do not yet need both a rollback and a free function for transactions.
Remove ref_transaction_rollback and use ref_transaction_free instead.
At a later stage we may reintroduce a rollback function if we want to start
adding reusable transactions and similar.
Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Ronnie Sahlberg <sahlberg@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Acked-by: Michael Haggerty <mhagger@alum.mit.edu>
When we call lookup_commit, lookup_tree, etc, the logic goes
something like:
1. Look for an existing object struct. If we don't have
one, allocate and return a new one.
2. Double check that any object we have is the expected
type (and complain and return NULL otherwise).
3. Convert an object with type OBJ_NONE (from a prior
call to lookup_unknown_object) to the expected type.
We can encapsulate steps 2 and 3 in a helper function which
checks whether we have the expected object type, converts
OBJ_NONE as appropriate, and returns the object.
Not only does this shorten the code, but it also provides
one central location for converting OBJ_NONE objects into
objects of other types. Future patches will use that to
enforce type-specific invariants.
Since this is a refactoring, we would want it to behave
exactly as the current code. It takes a little reasoning to
see that this is the case:
- for lookup_{commit,tree,etc} functions, we are just
pulling steps 2 and 3 into a function that does the same
thing.
- for the call in peel_object, we currently only do step 3
(but we want to consolidate it with the others, as
mentioned above). However, step 2 is a noop here, as the
surrounding conditional makes sure we have OBJ_NONE
(which we want to keep to avoid an extraneous call to
sha1_object_info).
- for the call in lookup_commit_reference_gently, we are
currently doing step 2 but not step 3. However, step 3
is a noop here. The object we got will have just come
from deref_tag, which must have figured out the type for
each object in order to know when to stop peeling.
Therefore the type will never be OBJ_NONE.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Fixes to a topic that is already in 'master'.
* dt/refs-check-refname-component-sse-fix:
refs: fix valgrind suppression file
refs.c: handle REFNAME_REFSPEC_PATTERN at end of page
When a ref crosses a memory page boundary, we restart the parsing
at the beginning with the bytewise code. Pass the original flags
to that code, rather than the current flags.
Reported-By: Øyvind A. Holm <sunny@sunbase.org>
Signed-off-by: David Turner <dturner@twitter.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
These two are almost the same function, with the exception
that has_extension only matches if there is content before
the suffix. So ends_with(".exe", ".exe") is true, but
has_extension would not be.
This distinction does not matter to any of the callers,
though, and we can just replace uses of has_extension with
ends_with. We prefer the "ends_with" name because it is more
generic, and there is nothing about the function that
requires it to be used for file extensions.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Optimize check_refname_component using SSE2 on x86_64.
git rev-parse HEAD is a good test-case for this, since it does almost
nothing except parse refs. For one particular repo with about 60k
refs, almost all packed, the timings are:
Look up table: 29 ms
SSE2: 23 ms
This cuts about 20% off of the runtime.
Ondřej Bílka <neleai@seznam.cz> suggested an SSE2 approach to the
substring searches, which netted a speed boost over the SSE4.2 code I
had initially written.
Signed-off-by: David Turner <dturner@twitter.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git remote rm" and "git remote prune" can involve removing many
refs at once, which is not a very efficient thing to do when very
many refs exist in the packed-refs file.
* jl/remote-rm-prune:
remote prune: optimize "dangling symref" check/warning
remote: repack packed-refs once when deleting multiple refs
remote rm: delete remote configuration as the last
In a repository with many refs, check_refname_component can be a major
contributor to the runtime of some git commands. One such command is
git rev-parse HEAD
Timings for one particular repo, with about 60k refs, almost all
packed, are:
Old: 35 ms
New: 29 ms
Many other commands which read refs are also sped up.
Signed-off-by: David Turner <dturner@twitter.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
read_ref_at has its own parsing of the reflog file for no really good reason
so lets change this to use the existing reflog iterators. This removes one
instance where we manually unmarshall the reflog file format.
Remove the now redundant ref_msg function.
Log messages for errors are changed slightly. We no longer print the file
name for the reflog, instead we refer to it as 'Log for ref <refname>'.
This might be a minor useability regression, but I don't really think so, since
experienced users would know where the log is anyway and inexperienced users
would not know what to do about/how to repair 'Log ... has gap ...' anyway.
Adapt the t1400 test to handle the change in log messages.
Signed-off-by: Ronnie Sahlberg <sahlberg@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When 'git remote prune' was used to delete many refs in a repository
with many refs, a lot of time was spent checking for (now) dangling
symbolic refs pointing to the deleted ref, since warn_dangling_symref()
was once per deleted ref to check all other refs in the repository.
Avoid this using the new warn_dangling_symrefs() function which
makes one pass over all refs and checks for all the deleted refs in
one go, after they have all been deleted.
Signed-off-by: Jens Lindström <jl@opera.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When 'git remote rm' or 'git remote prune' were used in a repository
with many refs, and needed to delete many remote-tracking refs, a lot
of time was spent deleting those refs since for each deleted ref,
repack_without_refs() was called to rewrite packed-refs without just
that deleted ref.
To avoid this, call repack_without_refs() first to repack without all
the refs that will be deleted, before calling delete_ref() to delete
each one completely. The call to repack_without_ref() in delete_ref()
then becomes a no-op, since packed-refs already won't contain any of
the deleted refs.
Signed-off-by: Jens Lindström <jl@opera.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add two new functions, reflog_exists and delete_reflog, to hide the internal
reflog implementation (that they are files under .git/logs/...) from callers.
Update checkout.c to use these functions in update_refs_for_switch instead of
building pathnames and calling out to file access functions. Update reflog.c
to use these to check if the reflog exists. Now there are still many places
in reflog.c where we are still leaking the reflog storage implementation but
this at least reduces the number of such dependencies by one. Finally
change two places in refs.c itself to use the new function to check if a ref
exists or not isntead of build-path-and-stat(). Now, this is strictly not all
that important since these are in parts of refs that are implementing the
actual file storage backend but on the other hand it will not hurt either.
Signed-off-by: Ronnie Sahlberg <sahlberg@google.com>
Acked-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Now that we free the transaction when we are done, there is no need to
make a copy of transaction->updates before working with it.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It used to be that ref_transaction_commit() allocated a temporary
array to hold the types of references while it is working. Instead,
add a type field to ref_update that ref_transaction_commit() can use
as its scratch space.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Now that we manage ref_update objects internally, we can use them to
hold some of the scratch space we need when actually carrying out the
updates. Store the (struct ref_lock *) there.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Use temporary variables in the for-loop blocks to simplify expressions
in the rest of the loop.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This is consistent with the usual nomenclature.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It has been superseded by reference transactions. This also means
that struct ref_update can become private.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Build out the API for dealing with a bunch of reference checks and
changes within a transaction. Define an opaque ref_transaction type
that is managed entirely within refs.c. Introduce functions for
beginning a transaction, adding updates to a transaction, and
committing/rolling back a transaction.
This API will soon replace update_refs().
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The old signature of update_refs() required a
(const struct ref_update **) for its updates_orig argument. The
"const" is presumably there to promise that the function will not
modify the contents of the structures.
But this declaration does not permit the function to be called with a
(struct ref_update **), which is perfectly legitimate. C's type
system is not powerful enough to express what we'd like. So remove
the first "const" from the declaration.
On the other hand, the function *can* promise not to modify the
pointers within the array that is passed to it without inconveniencing
its callers. So add a "const" that has that effect, making the final
declaration
(struct ref_update * const *).
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Given that these constants are only being used when updating
references, it is inappropriate to give them such generic names as
"DIE_ON_ERR". So prefix their names with "UPDATE_REFS_".
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We invented hashcpy() to keep the abstraction of "object name"
behind it. Use it instead of calling memcpy() with hard-coded
20-byte length when moving object names between pieces of memory.
Leave ppc/sha1.c as-is, because the function is about the SHA-1 hash
algorithm whose output is and will always be 20 bytes.
Helped-by: Michael Haggerty <mhagger@alum.mit.edu>
Helped-by: Duy Nguyen <pclouds@gmail.com>
Signed-off-by: Sun He <sunheehnus@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Make it clear that we don't use fnmatch() anymore.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Code clean-up and protection against concurrent write access to the
ref namespace.
* mh/safe-create-leading-directories:
rename_tmp_log(): on SCLD_VANISHED, retry
rename_tmp_log(): limit the number of remote_empty_directories() attempts
rename_tmp_log(): handle a possible mkdir/rmdir race
rename_ref(): extract function rename_tmp_log()
remove_dir_recurse(): handle disappearing files and directories
remove_dir_recurse(): tighten condition for removing unreadable dir
lock_ref_sha1_basic(): if locking fails with ENOENT, retry
lock_ref_sha1_basic(): on SCLD_VANISHED, retry
safe_create_leading_directories(): add new error value SCLD_VANISHED
cmd_init_db(): when creating directories, handle errors conservatively
safe_create_leading_directories(): introduce enum for return values
safe_create_leading_directories(): always restore slash at end of loop
safe_create_leading_directories(): split on first of multiple slashes
safe_create_leading_directories(): rename local variable
safe_create_leading_directories(): add explicit "slash" pointer
safe_create_leading_directories(): reduce scope of local variable
safe_create_leading_directories(): fix format of "if" chaining
If safe_create_leading_directories() fails because a file along the
path unexpectedly vanished, try again from the beginning. Try at most
4 times.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This doesn't seem to be a likely error, but we've got the counter
anyway, so we might as well use it for an added bit of safety.
Please note that the first call to rename() is optimistic, and it is
normal for it to fail if there is a directory in the way. So bump the
total number of allowed attempts to 4, to be sure that we can still
have at least 3 retries in the case of a race.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If a directory vanishes while renaming the temporary reflog file,
retry (up to 3 times). This could happen if another process deletes
the directory created by safe_create_leading_directories() just before
we rename the file into the directory.
As far as I can tell, this race could not occur internal to git. The
only time that a directory under $GIT_DIR/logs is deleted is if room
has to be made for a log file for a reference with the same name;
for example, in the following sequence:
git branch foo/bar # Creates file .git/logs/refs/heads/foo/bar
git branch -d foo/bar # Deletes file but leaves .git/logs/refs/heads/foo/
git branch foo # Deletes .git/logs/refs/heads/foo/
But the only reason the last command deletes the directory is because
it wants to create a file with the same name. So if another process
(e.g.,
git branch foo/baz
) wants to create that directory, one of the two is doomed to failure
anyway because of a D/F conflict.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If hold_lock_file_for_update() fails with errno==ENOENT, it might be
because somebody else (for example, a pack-refs process) has just
deleted one of the lockfile's ancestor directories. So if this
condition is detected, try again (up to 3 times).
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If safe_create_leading_directories() fails because a file along the
path unexpectedly vanished, try again (up to 3 times).
This can occur if another process is deleting directories at the same
time as we are trying to make them. For example, "git pack-refs
--all" tries to delete the loose refs and any empty directories that
are left behind. If a pack-refs process is running, then it might
delete a directory that we need to put a new loose reference in.
If safe_create_leading_directories() thinks this might have happened,
then take its advice and try again (maximum three attempts).
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We used to use two separate rules for the normal ref resolution
dwimming and dwimming done to decide which remote ref to grab. The
third parameter to refname_match() selected which rules to use.
When these two rules were harmonized in
2011-11-04 dd621df9cd refs DWIMmery: use the same rule for both "git fetch" and others
, ref_fetch_rules was #defined to avoid potential breakages for
in-flight topics.
It is now safe to remove the backwards-compatibility code, so remove
refname_match()'s third parameter, make ref_rev_parse_rules private to
refs.c, and remove ref_fetch_rules entirely.
Suggested-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* mh/shorten-unambigous-ref:
shorten_unambiguous_ref(): tighten up pointer arithmetic
gen_scanf_fmt(): delete function and use snprintf() instead
shorten_unambiguous_ref(): introduce a new local variable
As long as we're being pathologically stingy with mallocs, we might as
well do the math right and save 6 (!) bytes.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
To replace "%.*s" with "%s", all we have to do is use snprintf()
to interpolate "%s" into the pattern.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When filling the scanf_fmts array, use a separate variable to keep
track of the offset to avoid clobbering total_len (which we will need
in the next commit).
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Leaving only the function definitions and declarations so that any
new topic in flight can still make use of the old functions, replace
existing uses of the prefixcmp() and suffixcmp() with new API
functions.
The change can be recreated by mechanically applying this:
$ git grep -l -e prefixcmp -e suffixcmp -- \*.c |
grep -v strbuf\\.c |
xargs perl -pi -e '
s|!prefixcmp\(|starts_with\(|g;
s|prefixcmp\(|!starts_with\(|g;
s|!suffixcmp\(|ends_with\(|g;
s|suffixcmp\(|!ends_with\(|g;
'
on the result of preparatory changes in this series.
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The function 'invalidate_ref_cache' was introduced in 79c7ca5 (2011-10-17,
invalidate_ref_cache(): rename function from invalidate_cached_refs())
by a rename and elevated to be publicly usable in 8be8bde (2011-10-17,
invalidate_ref_cache(): expose this function in the refs API)
However it is not used anymore, as 8bf90dc (2011-10-17, write_ref_sha1():
only invalidate the loose ref cache) and (much) later 506a760 (2013-04-22,
refs: change how packed refs are deleted) removed any calls to this
function. So it seems as if we don't need that function any more,
good bye!
Signed-off-by: Stefan Beller <stefanbeller@googlemail.com>
Acked-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In shorten_unambiguous_ref, we build and cache a reverse-map of the
rev-parse rules like this:
static char **scanf_fmts;
static int nr_rules;
if (!nr_rules) {
for (; ref_rev_parse_rules[nr_rules]; nr_rules++)
... generate scanf_fmts ...
}
where ref_rev_parse_rules is terminated with a NULL pointer.
Compiling with "gcc -O2 -Wall" does not cause any problems, but
compiling with "-O3 -Wall" generates:
$ make CFLAGS='-O3 -Wall' refs.o
refs.c: In function ‘shorten_unambiguous_ref’:
refs.c:3379:29: warning: array subscript is above array bounds [-Warray-bounds]
for (; ref_rev_parse_rules[nr_rules]; nr_rules++)
Curiously, we can silence this by explicitly nr_rules to 0
in the beginning of the loop, even though the compiler
should be able to tell that we follow this code path only
when nr_rules is already 0.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A call to update_ref_lock() passes '0' to the 'int *type_p' parameter.
Noticed by sparse. ("Using plain integer as NULL pointer")
Signed-off-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Instead of typing four capital letters "HEAD", you can say "@" now,
e.g. "git log @".
* fc/at-head:
Add new @ shortcut for HEAD
sha1-name: pass len argument to interpret_branch_name()
Give "update-refs" a "--stdin" option to read multiple update
requests and perform them in an all-or-none fashion.
* bk/refs-multi-update:
update-ref: add test cases covering --stdin signature
update-ref: support multiple simultaneous updates
refs: add update_refs for multiple simultaneous updates
refs: add function to repack without multiple refs
refs: factor delete_ref loose ref step into a helper
refs: factor update_ref steps into helpers
refs: report ref type from lock_any_ref_for_update
reset: rename update_refs to reset_refs
Typing 'HEAD' is tedious, especially when we can use '@' instead.
The reason for choosing '@' is that it follows naturally from the
ref@op syntax (e.g. HEAD@{u}), except we have no ref, and no
operation, and when we don't have those, it makes sens to assume
'HEAD'.
So now we can use 'git show @~1', and all that goody goodness.
Until now '@' was a valid name, but it conflicts with this idea, so
let's make it invalid. Probably very few people, if any, used this name.
Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Allow a safer "rewind of the remote tip" push than blind "--force",
by requiring that the overwritten remote ref to be unchanged since
the new history to replace it was prepared.
The machinery is more or less ready. The "--force" option is again
the big red button to override any safety, thanks to J6t's sanity
(the original round allowed --lockref to defeat --force).
The logic to choose the default implemented here is fragile
(e.g. "git fetch" after seeing a failure will update the
remote-tracking branch and will make the next "push" pass,
defeating the safety pretty easily). It is suitable only for the
simplest workflows, and it may hurt users more than it helps them.
* jc/push-cas:
push: teach --force-with-lease to smart-http transport
send-pack: fix parsing of --force-with-lease option
t5540/5541: smart-http does not support "--force-with-lease"
t5533: test "push --force-with-lease"
push --force-with-lease: tie it all together
push --force-with-lease: implement logic to populate old_sha1_expect[]
remote.c: add command line option parser for "--force-with-lease"
builtin/push.c: use OPT_BOOL, not OPT_BOOLEAN
cache.h: move remote/connect API out of it
Add 'struct ref_update' to encode the information needed to update or
delete a ref (name, new sha1, optional old sha1, no-deref flag). Add
function 'update_refs' accepting an array of updates to perform. First
sort the input array to order locks consistently everywhere and reject
multiple updates to the same ref. Then acquire locks on all refs with
verified old values. Then update or delete all refs accordingly. Fail
if any one lock cannot be obtained or any one old value does not match.
Though the refs themselves cannot be modified together in a single
atomic transaction, this function does enable some useful semantics.
For example, a caller may create a new branch starting from the head of
another branch and rewind the original branch at the same time. This
transfers ownership of commits between branches without risk of losing
commits added to the original branch by a concurrent process, or risk of
a concurrent process creating the new branch first.
Signed-off-by: Brad King <brad.king@kitware.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Generalize repack_without_ref as repack_without_refs to support a list
of refs and implement the former in terms of the latter.
Signed-off-by: Brad King <brad.king@kitware.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Factor loose ref deletion into helper function delete_ref_loose to allow
later use elsewhere.
Signed-off-by: Brad King <brad.king@kitware.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Factor the lock and write steps and error handling into helper functions
update_ref_lock and update_ref_write to allow later use elsewhere.
Expose lock_any_ref_for_update's type_p to update_ref_lock callers.
While at it, drop "static" from the local "lock" variable as it is not
necessary to keep across invocations.
Signed-off-by: Brad King <brad.king@kitware.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This is useful to make sure we don't step outside the boundaries of what
we are interpreting at the moment. For example while interpreting
foobar@{u}~1, the job of interpret_branch_name() ends right before ~1,
but there's no way to figure that out inside the function, unless the
len argument is passed.
So let's do that.
Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Expose lock_ref_sha1_basic's type_p argument to callers of
lock_any_ref_for_update. Update all call sites to ignore it by passing
NULL for now.
Signed-off-by: Brad King <brad.king@kitware.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This reverts commit cdfd94837b, as it
does not just apply to "@" (and forms with modifiers like @{u}
applied to it), but also affects e.g. "refs/heads/@/foo", which it
shouldn't.
The basic idea of giving a short-hand might be good, and the topic
can be retried later, but let's revert to avoid affecting existing
use cases for now for the upcoming release.
Fix a NULL-pointer dereference during nested iterations over
references (for example, when replace references are being used).
* mh/packed-refs-do-one-ref-recursion:
do_one_ref(): save and restore value of current_ref
If do_one_ref() is called recursively, then the inner call should not
permanently overwrite the value stored in current_ref by the outer
call. Aside from the tiny optimization loss, peel_ref() expects the
value of current_ref not to change across a call to peel_entry(). But
in the presence of replace references that assumption could be
violated by a recursive call to do_one_ref:
do_for_each_entry()
do_one_ref()
builtin/describe.c:get_name()
peel_ref()
peel_entry()
peel_object ()
deref_tag_noverify()
parse_object()
lookup_replace_object()
do_lookup_replace_object()
prepare_replace_object()
do_for_each_ref()
do_for_each_entry()
do_for_each_entry_in_dir()
do_one_ref()
The inner call to do_one_ref() was unconditionally setting current_ref
to NULL when it was done, causing peel_ref() to perform an invalid
memory access.
So change do_one_ref() to save the old value of current_ref before
overwriting it, and restore the old value afterward rather than
setting it to NULL.
Reported-by: Mantas Mikulėnas <grawity@gmail.com>
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The definition of "struct ref" in "cache.h", a header file so
central to the system, always confused me. This structure is not
about the local ref used by sha1-name API to name local objects.
It is what refspecs are expanded into, after finding out what refs
the other side has, to define what refs are updated after object
transfer succeeds to what values. It belongs to "remote.h" together
with "struct refspec".
While we are at it, also move the types and functions related to the
Git transport connection to a new header file connect.h
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git pack-refs" that races with new ref creation or deletion have
been susceptible to lossage of refs under right conditions, which
has been tightened up.
* mh/ref-races:
for_each_ref: load all loose refs before packed refs
get_packed_ref_cache: reload packed-refs file when it changes
add a stat_validity struct
Extract a struct stat_data from cache_entry
packed_ref_cache: increment refcount when locked
do_for_each_entry(): increment the packed refs cache refcount
refs: manage lifetime of packed refs cache via reference counting
refs: implement simple transactions for the packed-refs file
refs: wrap the packed refs cache in a level of indirection
pack_refs(): split creation of packed refs and entry writing
repack_without_ref(): split list curation and entry writing
Now that we keep track of the packed-refs file metadata, we can detect
when the packed-refs file has been modified since we last read it, and
we do so automatically every time that get_packed_ref_cache() is
called. So there is no need to invalidate the cache automatically
when lock_packed_refs() is called; usually the old copy will still be
valid.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If we are iterating through the refs using for_each_ref (or
any of its sister functions), we can get into a race
condition with a simultaneous "pack-refs --prune" that looks
like this:
0. We have a large number of loose refs, and a few packed
refs. refs/heads/z/foo is loose, with no matching entry
in the packed-refs file.
1. Process A starts iterating through the refs. It loads
the packed-refs file from disk, then starts lazily
traversing through the loose ref directories.
2. Process B, running "pack-refs --prune", writes out the
new packed-refs file. It then deletes the newly packed
refs, including refs/heads/z/foo.
3. Meanwhile, process A has finally gotten to
refs/heads/z (it traverses alphabetically). It
descends, but finds nothing there. It checks its
cached view of the packed-refs file, but it does not
mention anything in "refs/heads/z/" at all (it predates
the new file written by B in step 2).
The traversal completes successfully without mentioning
refs/heads/z/foo at all (the name, of course, isn't
important; but the more refs you have and the farther down
the alphabetical list a ref is, the more likely it is to hit
the race). If refs/heads/z/foo did exist in the packed refs
file at state 0, we would see an entry for it, but it would
show whatever sha1 the ref had the last time it was packed
(which could be an arbitrarily long time ago).
This can be especially dangerous when process A is "git
prune", as it means our set of reachable tips will be
incomplete, and we may erroneously prune objects reachable
from that tip (the same thing can happen if "repack -ad" is
used, as it simply drops unreachable objects that are
packed).
This patch solves it by loading all of the loose refs for
our traversal into our in-memory cache, and then refreshing
the packed-refs cache. Because a pack-refs writer will
always put the new packed-refs file into place before
starting the prune, we know that any loose refs we fail to
see will either truly be missing, or will have already been
put in the packed-refs file by the time we refresh.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Once we read the packed-refs file into memory, we cache it
to save work on future ref lookups. However, our cache may
be out of date with respect to what is on disk if another
process is simultaneously packing the refs. Normally it
is acceptable for us to be a little out of date, since there
is no guarantee whether we read the file before or after the
simultaneous update. However, there is an important special
case: our packed-refs file must be up to date with respect
to any loose refs we read. Otherwise, we risk the following
race condition:
0. There exists a loose ref refs/heads/master.
1. Process A starts and looks up the ref "master". It
first checks $GIT_DIR/master, which does not exist. It
then loads (and caches) the packed-refs file to see if
"master" exists in it, which it does not.
2. Meanwhile, process B runs "pack-refs --all --prune". It
creates a new packed-refs file which contains
refs/heads/master, and removes the loose copy at
$GIT_DIR/refs/heads/master.
3. Process A continues its lookup, and eventually tries
$GIT_DIR/refs/heads/master. It sees that the loose ref
is missing, and falls back to the packed-refs file. But
it examines its cached version, which does not have
refs/heads/master. After trying a few other prefixes,
it reports master as a non-existent ref.
There are many variants (e.g., step 1 may involve process A
looking up another ref entirely, so even a fully qualified
refname can fail). One of the most interesting ones is if
"refs/heads/master" is already packed. In that case process
A will not see it as missing, but rather will report
whatever value happened to be in the packed-refs file before
process B repacked (which might be an arbitrarily old
value).
We can fix this by making sure we reload the packed-refs
file from disk after looking at any loose refs. That's
unacceptably slow, so we can check its stat()-validity as a
proxy, and read it only when it appears to have changed.
Reading the packed-refs file after performing any loose-ref
system calls is sufficient because we know the ordering of
the pack-refs process: it always makes sure the newly
written packed-refs file is installed into place before
pruning any loose refs. As long as those operations by B
appear in their executed order to process A, by the time A
sees the missing loose ref, the new packed-refs file must be
in place.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Increment the packed_ref_cache reference count while it is locked to
prevent its being freed.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This function calls a user-supplied callback function which could do
something that causes the packed refs cache to be invalidated. So
acquire a reference count on the data structure to prevent our copy
from being freed while we are iterating over it.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In struct packed_ref_cache, keep a count of the number of users of the
data structure. Only free the packed ref cache when the reference
count goes to zero rather than when the packed ref cache is cleared.
This mechanism will be used to prevent the cache data structure from
being freed while it is being iterated over.
So far, only the reference in struct ref_cache::packed is counted;
other users will be adjusted in separate commits.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Handle simple transactions for the packed-refs file at the
packed_ref_cache level via new functions lock_packed_refs(),
commit_packed_refs(), and rollback_packed_refs().
Only allow the packed ref cache to be modified (via add_packed_ref())
while the packed refs file is locked.
Change clone to add the new references within a transaction.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As we know, we can solve any problem in this manner. In this case,
the problem is to avoid freeing a packed refs cache while somebody is
using it. So add a level of indirection as a prelude to
reference-counting the packed refs cache.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Split pack_refs() into multiple passes:
* Iterate over loose refs. For each one that can be turned into a
packed ref, create a corresponding entry in the packed refs cache.
* Write the packed refs to the packed-refs file.
This change isolates the mutation of the packed-refs file to a single
place.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The repack_without_ref() function first removes the deleted ref from
the internal packed-refs list, then writes the packed-refs list to
disk, omitting any broken or stale entries. This patch splits that
second step into multiple passes:
* collect the list of refnames that should be deleted from packed_refs
* delete those refnames from the cache
* write the remainder to the packed-refs file
The purpose of this change is to make the "write the remainder" part
reusable.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We read loose references in two steps. The code is roughly:
lstat()
if error ENOENT:
loose ref is missing; look for corresponding packed ref
else if S_ISLNK:
readlink()
if error:
report failure
else if S_ISDIR:
report failure
else
open()
if error:
report failure
read()
The problem is that the first filesystem call, to lstat(), is not
atomic with the second filesystem call, to readlink() or open().
Therefore it is possible for another process to change the file
between our two calls, for example:
* If the other process deletes the file, our second call will fail
with ENOENT, which we *should* interpret as "loose ref is missing;
look for corresponding packed ref". This can arise if the other
process is pack-refs; it might have just written a new packed-refs
file containing the old contents of the reference then deleted the
loose ref.
* If the other process changes a symlink into a plain file, our call
to readlink() will fail with EINVAL, which we *should* respond to by
trying to open() and read() the file.
The old code treats the reference as missing in both of these cases,
which is incorrect.
So instead, handle errors more selectively: if the result of
readline()/open() is a failure that is inconsistent with the result of
the previous lstat(), then something is fishy. In this case jump back
and start over again with a fresh call to lstat().
One race is still possible and undetected: another process could
change the file from a regular file into a symlink between the call to
lstat and the call to open(). The open() call would silently follow
the symlink and not know that something is wrong. This situation
could be detected in two ways:
* On systems that support O_NOFOLLOW, pass that option to the open().
* On other systems, call fstat() on the fd returned by open() and make
sure that it agrees with the stat info from the original lstat().
However, we don't use symlinks anymore, so this situation is unlikely.
Moreover, it doesn't appear that treating a symlink as a regular file
would have grave consequences; after all, this is exactly how the code
handles non-relative symlinks. So this commit leaves that race
unaddressed.
Note that this solves only the part of the race within
resolve_ref_unsafe. In the situation described above, we may still be
depending on a cached view of the packed-refs file; that race will be
dealt with in a future patch.
This problem was reported and diagnosed by Jeff King <peff@peff.net>,
and this solution is derived from his patch.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Reviewed-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There is only one "break" statement within the loop, which jumps to
the code after the loop that handles the case of a file that holds a
SHA-1. So move that code from below the loop into the if statement
where the break was previously located. This makes the logic flow
more local.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Reviewed-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The nesting was getting a bit out of hand, and it's about to get
worse.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Reviewed-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Update reading and updating packed-refs file, correcting corner case
bugs.
* mh/packed-refs-various: (33 commits)
refs: handle the main ref_cache specially
refs: change do_for_each_*() functions to take ref_cache arguments
pack_one_ref(): do some cheap tests before a more expensive one
pack_one_ref(): use write_packed_entry() to do the writing
pack_one_ref(): use function peel_entry()
refs: inline function do_not_prune()
pack_refs(): change to use do_for_each_entry()
refs: use same lock_file object for both ref-packing functions
pack_one_ref(): rename "path" parameter to "refname"
pack-refs: merge code from pack-refs.{c,h} into refs.{c,h}
pack-refs: rename handle_one_ref() to pack_one_ref()
refs: extract a function write_packed_entry()
repack_without_ref(): write peeled refs in the rewritten file
t3211: demonstrate loss of peeled refs if a packed ref is deleted
refs: change how packed refs are deleted
search_ref_dir(): return an index rather than a pointer
repack_without_ref(): silence errors for dangling packed refs
t3210: test for spurious error messages for dangling packed refs
refs: change the internal reference-iteration API
refs: extract a function peel_entry()
...
Typing 'HEAD' is tedious, especially when we can use '@' instead.
The reason for choosing '@' is that it follows naturally from the
ref@op syntax (e.g. HEAD@{u}), except we have no ref, and no
operation, and when we don't have those, it makes sens to assume
'HEAD'.
So now we can use 'git show @~1', and all that goody goodness.
Until now '@' was a valid name, but it conflicts with this idea, so
let's make it invalid. Probably very few people, if any, used this name.
Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Hold the ref_cache instance for the main repository in a dedicated,
statically-allocated instance to avoid the need for a function call
and a linked-list traversal when it is needed.
Suggested by: Heiko Voigt <hvoigt@hvoigt.net>
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Change the callers convert submodule names into ref_cache pointers.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Change pack_refs() to work with a file descriptor instead of a FILE*
(making the file-locking code less awkward) and use
write_packed_entry() to do the writing.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Change pack_one_ref() to call peel_entry() rather than using its own
code for peeling references. Aside from sharing code, this lets it
take advantage of the optimization introduced by 6c4a060d7d.
Please note that we *could* use any peeled values that happen to
already be stored in the ref_entries, which would avoid some object
lookups for references that were already packed. But doing so would
also propagate any peeling errors across runs of "git pack-refs" and
give no way to recover from such errors. And "git pack-refs" isn't
run often enough that the performance cost is a problem. So instead,
add a new option to peel_entry() to force the entry to be re-peeled,
and call it with that option set.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Function do_not_prune() was redundantly checking REF_ISSYMREF, which
was already tested at the top of pack_one_ref(), so remove that check.
And the rest was trivial, so inline the function.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
pack_refs() was not using any of the extra features of for_each_ref(),
so change it to use do_for_each_entry(). This also gives it access to
the ref_entry and in particular its peeled field, which will be taken
advantage of in the next commit.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Use a single struct lock_file for both pack_refs() and
repack_without_ref().
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Make this function conform to the naming convention established in
65385ef7d4 for the rest of the refs.c file.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
pack-refs.c doesn't contain much code, and the code it does contain is
closely related to reference handling. Moreover, there is some
duplication between pack_refs() and repack_without_ref(). Therefore,
merge pack-refs.c into refs.c and pack-refs.h into refs.h.
The code duplication will be addressed in future commits.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Extract the I/O code from the "business logic" in repack_ref_fn().
Later there will be another caller for this function.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When a reference that existed in the packed-refs file is deleted, the
packed-refs file must be rewritten. Previously, the file was
rewritten without any peeled refs, even if the file contained peeled
refs when it was read. This was not a bug, because the packed-refs
file header didn't claim that the file contained peeled values. But
it had a performance cost, because the repository would lose the
benefit of having precomputed peeled references until pack-refs was
run again.
Teach repack_without_ref() to write peeled refs to the packed-refs
file (regardless of whether they were present in the old version of
the file).
This means that if the old version of the packed-refs file was not
fully peeled, then repack_without_ref() will have to peel references.
To avoid the expense of reading lots of loose references, we take two
shortcuts relative to pack-refs:
* If the peeled value of a reference is already known (i.e., because
it was read from the old version of the packed-refs file), then
output that peeled value again without any checks. This is the
usual code path and should avoid any noticeable overhead. (This is
different than pack-refs, which always re-peels references.)
* We don't verify that the packed ref is still current. It could be
that a packed references is overridden by a loose reference, in
which case the packed ref is no longer needed and might even refer
to an object that has been garbage collected. But we don't check;
instead, we just try to peel all references. If peeling is
successful, the peeled value is written out (even though it might
not be needed any more); if not, then the reference is silently
omitted from the output.
The extra overhead of peeling references in repack_without_ref()
should only be incurred the first time the packed-refs file is written
by a version of Git that knows about the "fully-peeled" attribute.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a function remove_ref(), which removes a single entry from a
reference cache.
Use this function to reimplement repack_without_ref(). The old
version iterated over all refs, packing all of them except for the one
to be deleted, then discarded the entire packed reference cache. The
new version deletes the doomed reference from the cache *before*
iterating.
This has two advantages:
* the code for writing packed-refs becomes simpler, because it doesn't
have to exclude one of the references.
* it is no longer necessary to discard the packed refs cache after
deleting a reference: symbolic refs cannot be packed, so packed
references cannot depend on each other, so the rest of the packed
refs cache remains valid after a reference is deleted.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Change search_ref_dir() to return the index of the sought entry (or -1
on error) rather than a pointer to the entry. This will make it more
natural to use the function for removing an entry from the list.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Stop emitting an error message when deleting a packed reference if we
find another dangling packed reference that is overridden by a loose
reference. See the previous commit for a longer explanation of the
issue.
We have to be careful to make sure that the invalid packed reference
really *is* overridden by a loose reference; otherwise what we have
found is repository corruption, which we *should* report.
Please note that this approach is vulnerable to a race condition
similar to the race conditions already known to affect packed
references [1]:
* Process 1 tries to peel packed reference X as part of deleting
another packed reference. It discovers that X does not refer to a
valid object (because the object that it referred to has been
garbage collected).
* Process 2 tries to delete reference X. It starts by deleting the
loose reference X.
* Process 1 checks whether there is a loose reference X. There is not
(it has just been deleted by process 2), so process 1 reports a
spurious error "X does not point to a valid object!"
The worst case seems relatively harmless, and the fix is identical to
the fix that will be needed for the other race conditions (namely
holding a lock on the packed-refs file during *all* reference
deletions), so we leave the cleaning up of all of them as a future
project.
[1] http://thread.gmane.org/gmane.comp.version-control.git/211956
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Establish an internal API for iterating over references, which gives
the callback functions direct access to the ref_entry structure
describing the reference. (Do not change the iteration API that is
exposed outside of the module.)
Define a new internal callback signature
int each_ref_entry_fn(struct ref_entry *entry, void *cb_data)
Change do_for_each_ref_in_dir() and do_for_each_ref_in_dirs() to
accept each_ref_entry_fn callbacks, and rename them to
do_for_each_entry_in_dir() and do_for_each_entry_in_dirs(),
respectively. Adapt their callers accordingly.
Add a new function do_for_each_entry() analogous to do_for_each_ref()
but using the new callback style.
Change do_one_ref() into an each_ref_entry_fn that does some
bookkeeping and then calls a wrapped each_ref_fn.
Reimplement do_for_each_ref() in terms of do_for_each_entry(), using
do_one_ref() as an adapter.
Please note that the responsibility for setting current_ref remains in
do_one_ref(), which means that current_ref is *not* set when iterating
over references via the new internal API. This is not a disadvantage,
because current_ref is not needed by callers of the internal API (they
receive a pointer to the current ref_entry anyway). But more
importantly, this change prevents peel_ref() from returning invalid
results in the following scenario:
When iterating via the external API, the iteration always includes
both packed and loose references, and in particular never presents a
packed ref if there is a loose ref with the same name. The internal
API, on the other hand, gives the option to iterate over only the
packed references. During such an iteration, there is no check
whether the packed ref might be hidden by a loose ref of the same
name. But until now the packed ref was recorded in current_ref during
the iteration. So if peel_ref() were called with the reference name
corresponding to current ref, it would return the peeled version of
the packed ref even though there might be a loose ref that peels to a
different value. This scenario doesn't currently occur in the code,
but fix it to prevent things from breaking in a very confusing way in
the future.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Peel the entry, and as a side effect store the peeled value in the
entry. Use this function from two places in peel_ref(); a third
caller will be added soon.
Please note that this change can lead to ref_entries for unpacked refs
being peeled. This has no practical benefit but is harmless.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The old version was inconsistent: when a reference was
REF_KNOWS_PEELED but with a null peeled value, it returned non-zero
for the current reference but zero for other references. Change the
behavior for non-current references to match that of current_ref,
which is what callers expect. Document the behavior.
Current callers only call peel_ref() from within a for_each_ref-style
iteration and only for the current ref; therefore, the buggy code path
was never reached.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Instead of just returning a success/failure bit, return an enumeration
value that explains the reason for any failure. This will come in
handy shortly.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It is a nice, logical unit of work, and putting it in a function
removes the need to use a goto in peel_ref(). Soon it will also have
other uses.
The algorithm is unchanged.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It is a nice unit of work and soon will be needed from multiple
locations.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Instead of copying the reference's SHA1 into a caller-supplied
variable, just return the ref_entry itself (or NULL if there is no
such entry). This change will allow the function to be used from
elsewhere.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There is no way to drop out of the while loop. This code has been
dead since 432ad41e.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Document the bits that can appear in the "flags" parameter passed to
an each_ref_function and/or in the ref_entry::flag field.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
An internal function used to implement "git checkout @{-1}" was
hard to use correctly.
* jc/reflog-reverse-walk:
refs.c: fix fread error handling
reflog: add for_each_reflog_ent_reverse() API
for_each_recent_reflog_ent(): simplify opening of a reflog file
for_each_reflog_ent(): extract a helper to process a single entry
Not that we do not actively encourage having annotated tags outside
refs/tags/ hierarchy, but they were not advertised correctly to the
ls-remote and fetch with recent version of Git.
* jk/fully-peeled-packed-ref:
pack-refs: add fully-peeled trait
pack-refs: write peeled entry for non-tags
use parse_object_or_die instead of die("bad object")
avoid segfaults on parse_object failure
fread returns the number of items read, with no special error return.
Commit 98f85ff (reflog: add for_each_reflog_ent_reverse() API -
2013-03-08) introduced a call to fread which checks for an error with
"nread < 0" which is tautological since nread is unsigned. The correct
check in this case (which tries to read a single item) is "nread != 1".
Signed-off-by: John Keeping <john@keeping.me.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Older versions of pack-refs did not write peel lines for
refs outside of refs/tags. This meant that on reading the
pack-refs file, we might set the REF_KNOWS_PEELED flag for
such a ref, even though we do not know anything about its
peeled value.
The previous commit updated the writer to always peel, no
matter what the ref is. That means that packed-refs files
written by newer versions of git are fine to be read by both
old and new versions of git. However, we still have the
problem of reading packed-refs files written by older
versions of git, or by other implementations which have not
yet learned the same trick.
The simplest fix would be to always unset the
REF_KNOWS_PEELED flag for refs outside of refs/tags that do
not have a peel line (if it has a peel line, we know it is
valid, but we cannot assume a missing peel line means
anything). But that loses an important optimization, as
upload-pack should not need to load the object pointed to by
refs/heads/foo to determine that it is not a tag.
Instead, we add a "fully-peeled" trait to the packed-refs
file. If it is set, we know that we can trust a missing peel
line to mean that a ref cannot be peeled. Otherwise, we fall
back to assuming nothing.
[commit message and tests by Jeff King <peff@peff.net>]
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git checkout -" is a short-hand for "git checkout @{-1}" and the
"@{nth}" notation for a negative number is to find nth previous
checkout in the reflog of the HEAD to determine the name of the
branch the user was on. We would want to find the nth most recent
reflog entry that matches "checkout: moving from X to Y" for this.
Unfortunately, reflog is implemented as an append-only file, and the
API to iterate over its entries, for_each_reflog_ent(), reads the
file in order, giving the entries from the oldest to newer. For the
purpose of finding nth most recent one, this API forces us to record
the last n entries in a rotating buffer and give the result out only
after we read everything. To optimize for a common case of finding
the nth most recent one for a small value of n, we also have a side
API for_each_recent_reflog_ent() that starts reading near the end of
the file, but it still has to read the entries in the "wrong" order.
The implementation of understanding @{-1} uses this interface.
This all becomes unnecessary if we add an API to let us iterate over
reflog entries in the reverse order, from the newest to older.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Split the logic that takes a single line of reflog entry in a
strbuf, parses the message, and calls the callback function out of
the loop into a separate helper function.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Allow the server side to redact the refs/ namespace it shows to the
client.
Will merge to 'master'.
* jc/hidden-refs:
upload/receive-pack: allow hiding ref hierarchies
upload-pack: simplify request validation
upload-pack: share more code
A repository may have refs that are only used for its internal
bookkeeping purposes that should not be exposed to the others that
come over the network.
Teach upload-pack to omit some refs from its initial advertisement
by paying attention to the uploadpack.hiderefs multi-valued
configuration variable. Do the same to receive-pack via the
receive.hiderefs variable. As a convenient short-hand, allow using
transfer.hiderefs to set the value to both of these variables.
Any ref that is under the hierarchies listed on the value of these
variable is excluded from responses to requests made by "ls-remote",
"fetch", etc. (for upload-pack) and "push" (for receive-pack).
Because these hidden refs do not count as OUR_REF, an attempt to
fetch objects at the tip of them will be rejected, and because these
refs do not get advertised, "git push :" will not see local branches
that have the same name as them as "matching" ones to be sent.
An attempt to update/delete these hidden refs with an explicit
refspec, e.g. "git push origin :refs/hidden/22", is rejected. This
is not a new restriction. To the pusher, it would appear that there
is no such ref, so its push request will conclude with "Now that I
sent you all the data, it is time for you to update the refs. I saw
that the ref did not exist when I started pushing, and I want the
result to point at this commit". The receiving end will apply the
compare-and-swap rule to this request and rejects the push with
"Well, your update request conflicts with somebody else; I see there
is such a ref.", which is the right thing to do. Otherwise a push to
a hidden ref will always be "the last one wins", which is not a good
default.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Simplify ref_entry_cmp_sslice() by using strncmp() to compare the
length-limited key and a NUL-terminated entry. While we're at it,
retain the const attribute of the input pointers.
Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git pack-refs" that ran in parallel to another process that created
new refs had a nasty race.
* jk/repack-ref-racefix:
refs: do not use cached refs in repack_without_ref
When we delete a ref that is packed, we rewrite the whole
packed-refs file and simply omit the ref that no longer
exists. However, we base the rewrite on whatever happens to
be in our refs cache, not what is necessarily on disk. That
opens us up to a race condition if another process is
simultaneously packing the refs, as we will overwrite their
newly-made pack-refs file with our potentially stale data,
losing commits.
You can demonstrate the race like this:
# setup some repositories
git init --bare parent &&
(cd parent && git config core.logallrefupdates true) &&
git clone parent child &&
(cd child && git commit --allow-empty -m base)
# in one terminal, repack the refs repeatedly
cd parent &&
while true; do
git pack-refs --all
done
# in another terminal, simultaneously push updates to
# master, and create and delete an unrelated ref
cd child &&
while true; do
git push origin HEAD:newbranch &&
git commit --allow-empty -m foo
us=`git rev-parse master` &&
git push origin master &&
git push origin :newbranch &&
them=`git --git-dir=../parent rev-parse master` &&
if test "$them" != "$us"; then
echo >&2 "$them" != "$us"
exit 1
fi
done
In many cases the two processes will conflict over locking
the packed-refs file, and the deletion of newbranch will
simply fail. But eventually you will hit the race, which
happens like this:
1. We push a new commit to master. It is already packed
(from the looping pack-refs call). We write the new
value (let us call it B) to $GIT_DIR/refs/heads/master,
but the old value (call it A) remains in the
packed-refs file.
2. We push the deletion of newbranch, spawning a
receive-pack process. Receive-pack advertises all refs
to the client, causing it to iterate over each ref; it
caches the packed refs in memory, which points at the
stale value A.
3. Meanwhile, a separate pack-refs process is running. It
runs to completion, updating the packed-refs file to
point master at B, and deleting $GIT_DIR/refs/heads/master
which also pointed at B.
4. Back in the receive-pack process, we get the
instruction to delete :newbranch. We take a lock on
packed-refs (which works, as the other pack-refs
process has already finished). We then rewrite the
contents using the cached refs, which contain the stale
value A.
The resulting packed-refs file points master once again at
A. The loose ref which would override it to point at B was
deleted (rightfully) in step 3. As a result, master now
points at A. The only trace that B ever existed in the
parent is in the reflog: the final entry will show master
moving from A to B, even though the ref still points at A
(so you can detect this race after the fact, because the
next reflog entry will move from A to C).
We can fix this by invalidating the packed-refs cache after
we have taken the lock. This means that we will re-read the
packed-refs file, and since we have the lock, we will be
sure that what we read will be atomically up-to-date when we
write (it may be out of date with respect to loose refs, but
that is OK, as loose refs take precedence).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"update-ref -d --deref SYM" to delete a ref through a symbolic ref
that points to it did not remove it correctly.
* jh/update-ref-d-through-symref:
Fix failure to delete a packed ref through a symref
t1400-update-ref: Add test verifying bug with symrefs in delete_ref()
When "update-ref -d --no-deref SYM" tried to delete a symbolic ref
SYM, it incorrectly locked the underlying reference pointed by SYM,
not the symbolic ref itself.
* rs/lock-correct-ref-during-delete:
refs: lock symref that is to be deleted, not its target
When deleting a ref through a symref (e.g. using 'git update-ref -d HEAD'
to delete refs/heads/master), we would remove the loose ref, but a packed
version of the same ref would remain, the end result being that instead of
deleting refs/heads/master we would appear to reset it to its state as of
the last repack.
This patch fixes the issue, by making sure we pass the correct ref name
when invoking repack_without_ref() from within delete_ref().
Signed-off-by: Johan Herland <johan@herland.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When delete_ref is called on a symref then it locks its target and then
either deletes the target or the symref, depending on whether the flag
REF_NODEREF was set in the parameter delopt.
Instead, simply pass the flag to lock_ref_sha1_basic, which will then
either lock the target or the symref, and delete the locked ref.
This reimplements part of eca35a25 (Fix git branch -m for symrefs.).
Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The point of peel_ref is to dereference tags; if the base
object is not a tag, then we can return early without even
loading the object into memory.
This patch accomplishes that by checking sha1_object_info
for the type. For a packed object, we can get away with just
looking in the pack index. For a loose object, we only need
to inflate the first couple of header bytes.
This is a bit of a gamble; if we do find a tag object, then
we will end up loading the content anyway, and the extra
lookup will have been wasteful. However, if it is not a tag
object, then we save loading the object entirely. Depending
on the ratio of non-tags to tags in the input, this can be a
minor win or minor loss.
However, it does give us one potential major win: if a ref
points to a large blob (e.g., via an unannotated tag), then
we can avoid looking at it entirely.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The idea of the peel_ref function is to dereference tag
objects recursively until we hit a non-tag, and return the
sha1. Conceptually, it should return 0 if it is successful
(and fill in the sha1), or -1 if there was nothing to peel.
However, the current behavior is much more confusing. For a
regular loose ref, the behavior is as described above. But
there is an optimization to reuse the peeled-ref value for a
ref that came from a packed-refs file. If we have such a
ref, we return its peeled value, even if that peeled value
is null (indicating that we know the ref definitely does
_not_ peel).
It might seem like such information is useful to the caller,
who would then know not to bother loading and trying to peel
the object. Except that they should not bother loading and
trying to peel the object _anyway_, because that fallback is
already handled by peel_ref. In other words, the whole point
of calling this function is that it handles those details
internally, and you either get a sha1, or you know that it
is not peel-able.
This patch catches the null sha1 case internally and
converts it into a -1 return value (i.e., there is nothing
to peel). This simplifies callers, which do not need to
bother checking themselves.
Two callers are worth noting:
- in pack-objects, a comment indicates that there is a
difference between non-peelable tags and unannotated
tags. But that is not the case (before or after this
patch). Whether you get a null sha1 has to do with
internal details of how peel_ref operated.
- in show-ref, if peel_ref returns a failure, the caller
tries to decide whether to try peeling manually based on
whether the REF_ISPACKED flag is set. But this doesn't
make any sense. If the flag is set, that does not
necessarily mean the ref came from a packed-refs file
with the "peeled" extension. But it doesn't matter,
because even if it didn't, there's no point in trying to
peel it ourselves, as peel_ref would already have done
so. In other words, the fallback peeling is guaranteed
to fail.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When we are asked to peel a ref to a sha1, we internally call
deref_tag, which will recursively parse each tagged object
until we reach a non-tag. This has the benefit that we will
verify our ability to load and parse the pointed-to object.
However, there is a performance downside: we may not need to
load that object at all (e.g., if we are listing peeled
simply listing peeled refs), or it may be a large object
that should follow a streaming code path (e.g., an annotated
tag of a large blob).
It makes more sense for peel_ref to choose the fast thing
rather than performing the extra check, for two reasons:
1. We will already sometimes short-circuit the tag parsing
in favor of a peeled entry from a packed-refs file. So
we are already favoring speed in some cases, and it is
not wise for a caller to rely on peel_ref to detect
corruption.
2. We already silently ignore much larger corruptions,
like a ref that points to a non-existent object, or a
tag object that exists but is corrupted.
2. peel_ref is not the right place to check for such a
database corruption. It is returning only the sha1
anyway, not the actual object. Any callers which use
that sha1 to load an object will soon discover the
corruption anyway, so we are really just pushing back
the discovery to later in the program.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Avoid unnecessary temporary allocations while looking for matching refs
inside refs API.
By René Scharfe (3) and Junio C Hamano (1)
* rs/refs-string-slice:
refs: do not create ref_entry when searching
refs: use strings directly in find_containing_dir()
refs: convert parameter of create_dir_entry() to length-limited string
refs: convert parameter of search_ref_dir() to length-limited string
The code to lazily read loose refs unnecessarily read the refs in a
subhierarchy by mistake when we free the data for the subhierarchy.
By Michael Haggerty
* mh/ref-api-lazy-loose:
free_ref_entry(): do not trigger reading of loose refs
The old code allowed many references to be efficiently added to a
single directory, because it just appended the references to the
containing directory unsorted without doing any searching (and
therefore without requiring any intermediate sorting). But the old
code was inefficient when a large number of subdirectories were added
to a directory, because the directory always had to be searched to see
if the new subdirectory already existed, and this search required the
directory to be sorted first. The same was repeated for every new
subdirectory, so the time scaled like O(N^2), where N is the number of
subdirectories within a single directory.
In practice, references are often added to the ref_cache in
lexicographic order, for example when reading the packed-refs file.
So build some intelligence into add_entry_to_dir() to optimize for the
case of references and/or subdirectories being added in lexicographic
order: if the existing entries were already sorted, and the new entry
comes after the last existing entry, then adjust ref_dir::sorted to
reflect the fact that the ref_dir is still sorted.
Thanks to Peff for pointing out the performance regression that
inspired this change.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The search_ref_dir() function is about looking up an existing ref_entry in
a sorted array of ref_entry stored in dir->entries, but it still allocates
a new ref_entry and frees it before returning. This is only because the
call to bsearch(3) was coded in a suboptimal way. Unlike the comparison
function given to qsort(3), the first parameter to its comparison function
does not need to point at an object that is shaped like an element in the
array.
Introduce a new comparison function that takes a counted string as the key
and an element in an array of ref_entry and give it to bsearch(), so that
we do not have to allocate a new ref_entry that we will never return to
the caller anyway.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Convert the parameter subdirname of search_for_subdir() to a
length-limted string and then simply pass the interesting slice of the
refname from find_containing_dir(), thereby avoiding to duplicate the
string.
Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Do not call get_ref_dir() from within free_ref_entry(), because that
triggers the reading of loose refs, only for them to be freed
immediately.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Refs API is updated to lazily read sub-hierarchies of refs/ namespace,
so that we do not have to grab everything from the filesystem when we
are only interested in listing branches, for example.
By Michael Haggerty (17) and Junio C Hamano (1)
* mh/ref-api-lazy-loose:
refs: fix find_containing_dir() regression
refs: read loose references lazily
read_loose_refs(): eliminate ref_cache argument
struct ref_dir: store a reference to the enclosing ref_cache
search_for_subdir(): return (ref_dir *) instead of (ref_entry *)
get_ref_dir(): add function for getting a ref_dir from a ref_entry
read_loose_refs(): rename function from get_ref_dir()
refs: wrap top-level ref_dirs in ref_entries
find_containing_dir(): use strbuf in implementation of this function
bisect: copy filename string obtained from git_path()
do_for_each_reflog(): use a strbuf to hold logfile name
do_for_each_reflog(): return early on error
get_ref_dir(): take the containing directory as argument
refs.c: extract function search_for_subdir()
get_ref_dir(): require that the dirname argument ends in '/'
get_ref_dir(): rename "base" parameter to "dirname"
get_ref_dir(): use a strbuf to hold refname
get_ref_dir(): return early if directory cannot be read
The function used to return NULL when asked to find the containing
directory for a ref that does not exist, allowing the caller to
omit iteration altogether. But a misconversion in an earlier change
"refs.c: extract function search_for_subdir()" started returning the
top-level directory entry, forcing callers to walk everything.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Instead of reading the whole directory of loose references the first
time any are needed, only read them on demand, one directory at a
time.
Use a new ref_entry flag bit REF_INCOMPLETE to indicate that the entry
represents a REF_DIR that hasn't been read yet. Whenever any entries
from such a directory are needed, read all of the loose references
from that directory.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The ref_cache can now be read from the ref_dir.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This means that a directory ref_entry contains all of the information
needed by read_loose_refs().
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
That is what all the callers want, so give it to them.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Convert all accesses of a ref_dir within a ref_entry to use this
function. This function will later be responsible for reading loose
references from disk on demand.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The new name better describes the function's purpose, and also makes
the old name available for a more suitable purpose.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Make it turtles all the way down. This affects the loose and packed
fields of ref_cache instances.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This simplifies the bookkeeping and allows an (artificial) restriction
on refname component length to be removed.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Previously, the "dir" argument to get_ref_dir() was a pointer to the
top-level ref_dir. Change the function to expect a pointer to the
ref_dir corresponding to dirname. This allows entries to be added
directly to dir, without having to recurse through the reference trie
each time (i.e., we can use add_entry_to_dir() instead of add_ref()).
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This removes some conditional code and makes it consistent with the
way that direntry names are stored. Please note that this function is
never used on the top-level .git directory; it is always called for
directories at level .git/refs or deeper.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>