Change typedef each_ref_fn to take a "const struct object_id *oid"
parameter instead of "const unsigned char *sha1".
To aid this transition, implement an adapter that can be used to wrap
old-style functions matching the old typedef, which is now called
"each_ref_sha1_fn"), and make such functions callable via the new
interface. This requires the old function and its cb_data to be
wrapped in a "struct each_ref_fn_sha1_adapter", and that object to be
used as the cb_data for an adapter function, each_ref_fn_adapter().
This is an enormous diff, but most of it consists of simple,
mechanical changes to the sites that call any of the "for_each_ref"
family of functions. Subsequent to this change, the call sites can be
rewritten one by one to use the new interface.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In everything_local(), we used to assign the current ref's value
found in ref->old_sha1 to ref->new_sha1 when we already have all the
necessary objects to complete the history leading to that
commit. This copying was broken at 49bb805e (Do not ask for
objects known to be complete., 2005-10-19) and ever since we
instead stuffed a random bytes in ref->new_sha1 here. No
code complained or failed due to this breakage.
It turns out that no code path that comes after this
assignment even looks at ref->new_sha1 at all.
- The only caller of everything_local(), do_fetch_pack(),
returns this list of refs, whose element has bogus
new_sha1 values, to its caller. It does not look at the
elements itself, but does pass them to find_common, which
looks only at the name and old_sha1 fields.
- The only caller of do_fetch_pack(), fetch_pack(), returns this
list to its caller. It does not look at the elements nor act on
them.
- One of the two callers of fetch_pack() is cmd_fetch_pack(), the
top-level that implements "git fetch-pack". The only thing it
looks at in the elements of the returned ref list is the old_sha1
and name fields.
- The other caller of fetch_pack() is fetch_refs_via_pack() in the
transport layer, which is a helper that implements "git fetch".
It only cares about whether the returned list is empty (i.e.
failed to fetch anything).
Just drop the bogus assignment, that is not even necessary. The
remote-tracking refs are updated based on a different list and not
using the ref list being manipulated by this code path; the caller
do_fetch_pack() created a copy of that real ref list and passed the
copy down to this function, and modifying the elements here does not
affect anything.
Noticed-by: Kyle J. McKay <mackyle@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If the server supports allow_tip_sha1_in_want, we add any
unmatched raw-sha1 entries in our "sought" list of refs to
the list of refs we will ask the other side for. We do so by
inserting the original "struct ref" directly into our list,
rather than making a copy. This has several problems.
The most minor problem is that one cannot ever free the
resulting list; it contains structs that are copies of the
remote refs (made earlier by fetch_pack) along with sought
refs that are referenced elsewhere.
But more importantly that we set the ref->next pointer to
NULL, chopping off the remainder of any existing list that
the ref was a part of. We get the set of "sought" refs in
an array rather than a linked list, but that array is often
in turn generated from a list. The test modification in
t5516 demonstrates this. Rather than fetching just an exact
sha1, we fetch that sha1 plus another ref:
- we build a linked list of refs to fetch when do_fetch
calls get_ref_map; the exact sha1 is first, followed by
the named ref ("refs/heads/extra" in this case).
- we pass that linked list to transport_fetch_ref, which
squashes it into an array of pointers
- that array goes to fetch_pack, which calls filter_ref.
There we generate the want list from a mix of what the
remote side has advertised, and the "sought" entry for
the exact sha1. We set the sought entry's "next" pointer
to NULL.
- after we return from transport_fetch_refs, we then try
to update the refs by following the linked list. But our
list is now truncated, and we do not update
refs/heads/extra at all.
We can fix this by making a copy of the ref. There's nothing
that fetch_pack does to it that must be reflected in the
original "sought" list (and indeed, if that were the case we
would have a serious bug, because it is only exact-sha1
entries which are treated this way).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If the server supports allow_tip_sha1_in_want, then
fetch-pack's filter_refs function tries to check whether a
ref is a request for a straight sha1 by running:
if (get_sha1_hex(ref->name, ref->old_sha1))
...
I.e., we are using get_sha1_hex to ask "is this ref name a
sha1?". If it is true, then the contents of ref->old_sha1
will end up unchanged. But if it is false, then get_sha1_hex
makes no guarantees about what it has written. With a ref
name like "abcdefoo", we would overwrite 3 bytes of
ref->old_sha1 before realizing that it was not a sha1.
This is likely not a problem in practice, as anything in
refs->name (besides a sha1) will start with "refs/", meaning
that we would notice on the first character that there is a
problem. Still, we are making assumptions about the state
left in the output when get_sha1_hex returns an error (e.g.,
it could start from the end of the string, or error check
the values only once they were placed in the output). It's
better to be defensive.
We could just check that we have exactly 40 characters of
sha1. But let's be even more careful and make sure that we
have a 40-char hex refname that matches what is in old_sha1.
This is perhaps overly defensive, but spells out our
assumptions clearly.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Move the interface declaration for the functions in lockfile.c from
cache.h to a new file, lockfile.h. Add #includes where necessary (and
remove some redundant includes of cache.h by files that already
include builtin.h).
Move the documentation of the lock_file state diagram from lockfile.c
to the new header file.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Most struct child_process variables are cleared using memset first after
declaration. Provide a macro, CHILD_PROCESS_INIT, that can be used to
initialize them statically instead. That's shorter, doesn't require a
function call and is slightly more readable (especially given that we
already have STRBUF_INIT, ARGV_ARRAY_INIT etc.).
Helped-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Use `git_config_get_*()` family instead of `git_config()` to take advantage of
the config-set API which provides a cleaner control flow.
Signed-off-by: Tanay Abhra <tanayabh@gmail.com>
Reviewed-by: Matthieu Moy <Matthieu.Moy@imag.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* jk/skip-prefix:
http-push: refactor parsing of remote object names
imap-send: use skip_prefix instead of using magic numbers
use skip_prefix to avoid repeated calculations
git: avoid magic number with skip_prefix
fetch-pack: refactor parsing in get_ack
fast-import: refactor parsing of spaces
stat_opt: check extra strlen call
daemon: use skip_prefix to avoid magic numbers
fast-import: use skip_prefix for parsing input
use skip_prefix to avoid repeating strings
use skip_prefix to avoid magic numbers
transport-helper: avoid reading past end-of-string
fast-import: fix read of uninitialized argv memory
apply: use skip_prefix instead of raw addition
refactor skip_prefix to return a boolean
avoid using skip_prefix as a boolean
daemon: mark some strings as const
parse_diff_color_slot: drop ofs parameter
There are several uses of the magic number "line+45" when
parsing ACK lines from the server, and it's rather unclear
why 45 is the correct number. We can make this more clear by
keeping a running pointer as we parse, using skip_prefix to
jump past the first "ACK ", then adding 40 to jump past
get_sha1_hex (which is still magical, but hopefully 40 is
less magical to readers of git code).
Note that this actually puts us at line+44. The original
required some character between the sha1 and further ACK
flags (it is supposed to be a space, but we never enforced
that). We start our search for flags at line+44, which
meanas we are slightly more liberal than the old code.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It's a common idiom to match a prefix and then skip past it
with a magic number, like:
if (starts_with(foo, "bar"))
foo += 3;
This is easy to get wrong, since you have to count the
prefix string yourself, and there's no compiler check if the
string changes. We can use skip_prefix to avoid the magic
numbers here.
Note that some of these conversions could be much shorter.
For example:
if (starts_with(arg, "--foo=")) {
bar = arg + 6;
continue;
}
could become:
if (skip_prefix(arg, "--foo=", &bar))
continue;
However, I have left it as:
if (skip_prefix(arg, "--foo=", &v)) {
bar = v;
continue;
}
to visually match nearby cases which need to actually
process the string. Like:
if (skip_prefix(arg, "--foo=", &v)) {
bar = atoi(v);
continue;
}
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Convert three cases of checking for a constant prefix using memcmp() to
starts_with(). This way there is no need for magic string length
constants and we avoid running over the end of the string should it be
shorter than the prefix.
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Attempts to show where a single-strand-of-pearls break in "git log"
output.
* nd/log-show-linear-break:
log: add --show-linear-break to help see non-linear history
object.h: centralize object flag allocation
While the field "flags" is mainly used by the revision walker, it is
also used in many other places. Centralize the whole flag allocation to
one place for a better overview (and easier to move flags if we have
too).
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Serving objects from a shallow repository needs to write a
new file to hold the temporary shallow boundaries but it was not
cleaned when we exit due to die() or a signal.
* jk/shallow-update-fix:
shallow: verify shallow file after taking lock
shallow: automatically clean up shallow tempfiles
shallow: use stat_validity to check for up-to-date file
We sometimes write tempfiles of the form "shallow_XXXXXX"
during fetch/push operations with shallow repositories.
Under normal circumstances, we clean up the result when we
are done. However, we do no take steps to clean up after
ourselves when we exit due to die() or signal death.
This patch teaches the tempfile creation code to register
handlers to clean up after ourselves. To handle this, we
change the ownership semantics of the filename returned by
setup_temporary_shallow. It now keeps a copy of the filename
itself, and returns only a const pointer to it.
We can also do away with explicit tempfile removal in the
callers. They all exit not long after finishing with the
file, so they can rely on the auto-cleanup, simplifying the
code.
Note that we keep things simple and maintain only a single
filename to be cleaned. This is sufficient for the current
caller, but we future-proof it with a die("BUG").
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In smart http, upload-pack adds new shallow lines at the beginning of
each rpc response. Only shallow lines from the first rpc call are
useful. After that they are thrown away. It's designed this way
because upload-pack is stateless and has no idea when its shallow
lines are helpful or not.
So after refs are negotiated with multi_ack_detailed and the server
thinks it learned enough, it sends "ACK obj-id ready", terminates the
rpc call and waits for the final rpc round. The client sends "done".
The server sends another response, which also has shallow lines at
the beginning, and the last "ACK obj-id" line.
When no-done is active, the last round is cut out, the server sends
"ACK obj-id ready" and "ACK obj-id" in the same rpc
response. fetch-pack is updated to recognize this and not send
"done". However it still tries to consume shallow lines, which are
never sent.
Update the code, make sure to skip consuming shallow lines when
no-done is enabled.
Reported-by: Jeff King <peff@peff.net>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git clone" would fail to clone from a repository that has a ref
directly under "refs/", e.g. "refs/stash", because different
validation paths do different things on such a refname. Loosen the
client side's validation to allow such a ref.
* jk/allow-fetch-onelevel-refname:
fetch-pack: do not filter out one-level refs
Fetching from a shallow-cloned repository used to be forbidden,
primarily because the codepaths involved were not carefully vetted
and we did not bother supporting such usage. This attempts to allow
object transfer out of a shallow-cloned repository in a controlled
way (i.e. the receiver become a shallow repository with truncated
history).
* nd/shallow-clone: (31 commits)
t5537: fix incorrect expectation in test case 10
shallow: remove unused code
send-pack.c: mark a file-local function static
git-clone.txt: remove shallow clone limitations
prune: clean .git/shallow after pruning objects
clone: use git protocol for cloning shallow repo locally
send-pack: support pushing from a shallow clone via http
receive-pack: support pushing to a shallow clone via http
smart-http: support shallow fetch/clone
remote-curl: pass ref SHA-1 to fetch-pack as well
send-pack: support pushing to a shallow clone
receive-pack: allow pushes that update .git/shallow
connected.c: add new variant that runs with --shallow-file
add GIT_SHALLOW_FILE to propagate --shallow-file to subprocesses
receive/send-pack: support pushing from a shallow clone
receive-pack: reorder some code in unpack()
fetch: add --update-shallow to accept refs that update .git/shallow
upload-pack: make sure deepening preserves shallow roots
fetch: support fetching from a shallow repository
clone: support remote shallow repository
...
Currently fetching a one-level ref like "refs/foo" does not
work consistently. The outer "git fetch" program filters the
list of refs, checking each against check_refname_format.
Then it feeds the result to do_fetch_pack to actually
negotiate the haves/wants and get the pack. The fetch-pack
code does its own filter, and it behaves differently.
The fetch-pack filter looks for refs in "refs/", and then
feeds everything _after_ the slash (i.e., just "foo") into
check_refname_format. But check_refname_format is not
designed to look at a partial refname. It complains that the
ref has only one component, thinking it is at the root
(i.e., alongside "HEAD"), when in reality we just fed it a
partial refname.
As a result, we omit a ref like "refs/foo" from the pack
request, even though "git fetch" then tries to store the
resulting ref. If we happen to get the object anyway (e.g.,
because the ref is contained in another ref we are
fetching), then the fetch succeeds. But if it is a unique
object, we fail when trying to update "refs/foo".
We can fix this by just passing the whole refname into
check_refname_format; we know the part we were omitting is
"refs/", which is acceptable in a refname. This at least
makes the checks consistent with each other.
This problem happens most commonly with "refs/stash", which
is the only one-level ref in wide use. However, our test
does not use "refs/stash", as we may later want to restrict
it specifically (not because it is one-level, but because
of the semantics of stashes).
We may also want to do away with the multiple levels of
filtering (which can cause problems when they are out of
sync), or even forbid one-level refs entirely. However,
those decisions can come later; this fixes the most
immediate problem, which is the mismatch between the two.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commit 58babfff ("shallow.c: the 8 steps to select new commits for
.git/shallow", 05-12-2013) added a function to implement step 5 of
the quoted eight steps, namely 'remove_nonexistent_ours_in_pack()'.
This function implements an optional optimization step in the new
shallow commit selection algorithm. However, this function has no
callers. (The commented out call sites would need to change, in
order to provide information required by the function.)
Signed-off-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk>
Acked-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Remove a few duplicate implementations of prefix/suffix comparison
functions, and rename them to starts_with and ends_with.
* cc/starts-n-ends-with:
replace {pre,suf}fixcmp() with {starts,ends}_with()
strbuf: introduce starts_with() and ends_with()
builtin/remote: remove postfixcmp() and use suffixcmp() instead
environment: normalize use of prefixcmp() by removing " != 0"
The same steps are done as in when --update-shallow is not given. The
only difference is we now add all shallow commits in "ours" and
"theirs" to .git/shallow (aka "step 8").
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This patch just put together pieces from the 8 steps patch. We stop at
step 7 and reject refs that require new shallow commits.
Note that, by rejecting refs that require new shallow commits, we
leave dangling objects in the repo, which become "object islands" by
the next "git fetch" of the same source.
If the first fetch our "ours" set is zero and we do practically
nothing at step 7, "ours" is full at the next fetch and we may need to
walk through commits for reachability test. Room for improvement.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Cloning from a shallow repository does not follow the "8 steps for new
.git/shallow" because if it does we need to get through step 6 for all
refs. That means commit walking down to the bottom.
Instead the rule to create .git/shallow is simpler and, more
importantly, cheap: if a shallow commit is found in the pack, it's
probably used (i.e. reachable from some refs), so we add it. Others
are dropped.
One may notice this method seems flawed by the word "probably". A
shallow commit may not be reachable from any refs at all if it's
attached to an object island (a group of objects that are not
reachable by any refs).
If that object island is not complete, a new fetch request may send
more objects to connect it to some ref. At that time, because we
incorrectly installed the shallow commit in this island, the user will
not see anything after that commit (fsck is still ok). This is not
desired.
Given that object islands are rare (C Git never sends such islands for
security reasons) and do not really harm the repository integrity, a
tradeoff is made to surprise the user occasionally but work faster
everyday.
A new option --strict could be added later that follows exactly the 8
steps. "git prune" can also learn to remove dangling objects _and_ the
shallow commits that are attached to them from .git/shallow.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Leaving only the function definitions and declarations so that any
new topic in flight can still make use of the old functions, replace
existing uses of the prefixcmp() and suffixcmp() with new API
functions.
The change can be recreated by mechanically applying this:
$ git grep -l -e prefixcmp -e suffixcmp -- \*.c |
grep -v strbuf\\.c |
xargs perl -pi -e '
s|!prefixcmp\(|starts_with\(|g;
s|prefixcmp\(|!starts_with\(|g;
s|!suffixcmp\(|ends_with\(|g;
s|suffixcmp\(|!ends_with\(|g;
'
on the result of preparatory changes in this series.
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* jk/robustify-parse-commit:
checkout: do not die when leaving broken detached HEAD
use parse_commit_or_die instead of custom message
use parse_commit_or_die instead of segfaulting
assume parse_commit checks for NULL commit
assume parse_commit checks commit->object.parsed
log_tree_diff: die when we fail to parse a commit
The codepath that send_pack() calls pack_objects() mistakenly
closed the same file descriptor twice, leading to potentially
closing a wrong file descriptor that was opened in the meantime.
* jl/pack-transfer-avoid-double-close:
Clear fd after closing to avoid double-close error
The parse_commit function will check the "parsed" flag of
the object and do nothing if it is set. There is no need
for callers to check the flag themselves, and doing so only
clutters the code.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When there is no sufficient overlap between old and new history
during a "git fetch" into a shallow repository, objects that the
sending side knows the receiving end has were unnecessarily sent.
* nd/fetch-into-shallow:
Add testcase for needless objects during a shallow fetch
list-objects: mark more commits as edges in mark_edges_uninteresting
list-objects: reduce one argument in mark_edges_uninteresting
upload-pack: delegate rev walking in shallow fetch to pack-objects
shallow: add setup_temporary_shallow()
shallow: only add shallow graft points to new shallow file
move setup_alternate_shallow and write_shallow_commits to shallow.c
In send_pack(), clear the fd passed to pack_objects() by setting
it to -1, since pack_objects() closes the fd (via a call to
run_command()). Likewise, in get_pack(), clear the fd passed to
run_command().
Not doing so risks having git_transport_push(), caller of
send_pack(), closing the fd again, possibly incorrectly closing
some other open file; or similarly with fetch_refs_from_pack(),
indirect caller of get_pack().
Signed-off-by: Jens Lindström <jl@opera.com>
Acked-by: Jeff King <peff@peff.net>
Acked-by: Duy Nguyen <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When there is no sufficient overlap between old and new history
during a fetch into a shallow repository, we unnecessarily sent
objects the sending side knows the receiving end has.
* nd/fetch-into-shallow:
Add testcase for needless objects during a shallow fetch
list-objects: mark more commits as edges in mark_edges_uninteresting
list-objects: reduce one argument in mark_edges_uninteresting
upload-pack: delegate rev walking in shallow fetch to pack-objects
shallow: add setup_temporary_shallow()
shallow: only add shallow graft points to new shallow file
move setup_alternate_shallow and write_shallow_commits to shallow.c
When --shallow-file is added to the command line, it has to be
before the subcommand name, the first argument won't be the command
name any more. Stop assuming that and keep track of the command name
explicitly.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Allow a safer "rewind of the remote tip" push than blind "--force",
by requiring that the overwritten remote ref to be unchanged since
the new history to replace it was prepared.
The machinery is more or less ready. The "--force" option is again
the big red button to override any safety, thanks to J6t's sanity
(the original round allowed --lockref to defeat --force).
The logic to choose the default implemented here is fragile
(e.g. "git fetch" after seeing a failure will update the
remote-tracking branch and will make the next "push" pass,
defeating the safety pretty easily). It is suitable only for the
simplest workflows, and it may hurt users more than it helps them.
* jc/push-cas:
push: teach --force-with-lease to smart-http transport
send-pack: fix parsing of --force-with-lease option
t5540/5541: smart-http does not support "--force-with-lease"
t5533: test "push --force-with-lease"
push --force-with-lease: tie it all together
push --force-with-lease: implement logic to populate old_sha1_expect[]
remote.c: add command line option parser for "--force-with-lease"
builtin/push.c: use OPT_BOOL, not OPT_BOOLEAN
cache.h: move remote/connect API out of it
The recent "short-cut clone connectivity check" topic broke a shallow
repository when a fetch operation tries to auto-follow tags.
* nd/fetch-pack-shallow-fix:
fetch-pack: do not remove .git/shallow file when --depth is not specified
The recent "short-cut clone connectivity check" topic broke a
shallow repository when a fetch operation tries to auto-follow tags.
* nd/fetch-pack-shallow-fix:
fetch-pack: do not remove .git/shallow file when --depth is not specified
fetch_pack() can remove .git/shallow file when a shallow repository
becomes a full one again. This behavior is triggered incorrectly when
tags are also fetched because fetch_pack() will be called twice. At
the first fetch_pack() call:
- shallow_lock is set up
- alternate_shallow_file points to shallow_lock.filename, which is
"shallow.lock"
- commit_lock_file is called, which sets shallow_lock.filename to "".
alternate_shallow_file also becomes "" because it points to the
same memory.
At the second call, setup_alternate_shallow() is not called and
alternate_shallow_file remains "". It's mistaken as unshallow case and
.git/shallow is removed. The end result is a broken repository.
Fix this by always initializing alternate_shallow_file when
fetch_pack() is called. As an extra measure, check if args->depth > 0
before commit/rollback shallow file.
Reported-by: Kacper Kornet <kornet@camk.edu.pl>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The definition of "struct ref" in "cache.h", a header file so
central to the system, always confused me. This structure is not
about the local ref used by sha1-name API to name local objects.
It is what refspecs are expanded into, after finding out what refs
the other side has, to define what refs are updated after object
transfer succeeds to what values. It belongs to "remote.h" together
with "struct refspec".
While we are at it, also move the types and functions related to the
Git transport connection to a new header file connect.h
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When we call find_common to start finding common ancestors
with the remote side of a fetch, the first thing we do is
insert the tip of each ref into our rev_list linked list. We
keep the list sorted the whole time with
commit_list_insert_by_date, which means our insertion ends
up doing O(n^2) timestamp comparisons.
We could teach rev_list_push to use an unsorted list, and
then sort it once after we have added each ref. However, in
get_rev, we process the list by popping commits off the
front and adding parents back in timestamp-sorted order. So
that procedure would still operate on the large list.
Instead, we can replace the linked list with a heap-based
priority queue, which can do O(log n) insertion, making the
whole insertion procedure O(n log n).
As a result of switching to the prio_queue struct, we fix
two minor bugs:
1. When we "pop" a commit in get_rev, and when we clear
the rev_list in find_common, we do not take care to
free the "struct commit_list", and just leak its
memory. With the prio_queue implementation, the memory
management is handled for us.
2. In get_rev, we look at the head commit of the list,
possibly push its parents onto the list, and then "pop"
the front of the list off, assuming it is the same
element that we just peeked at. This is typically going
to be the case, but would not be in the face of clock
skew: the parents are inserted by date, and could
potentially be inserted at the head of the list if they
have a timestamp newer than their descendent. In this
case, we would accidentally pop the parent, and never
process it at all.
The new implementation pulls the commit off of the
queue as we examine it, and so does not suffer from
this problem.
With this patch, a fetch of a single commit into a
repository with 50,000 refs went from:
real 0m7.984s
user 0m7.852s
sys 0m0.120s
to:
real 0m2.017s
user 0m1.884s
sys 0m0.124s
Before this patch, a larger case with 370K refs still had
not completed after tens of minutes; with this patch, it
completes in about 12 seconds.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We insert the commit pointed to by each ref one-by-one into
the "complete" commit_list using insert_by_date. Because
each insertion is O(n), we end up with O(n^2) behavior.
This typically doesn't matter, because the number of refs is
reasonably small. And even if there are a lot of refs, they
often point to a smaller set of objects (in which case the
optimization in commit ea5f220 keeps our "n" small).
However, in pathological repositories (hundreds of thousands
of refs, each pointing to a unique commit), this quadratic
behavior can make a difference. Since we do not care about
the list order until we have finished building it, we can
simply keep it unsorted during the insertion phase, then
sort it afterwards.
On a repository like the one described above, this dropped
the time to do a no-op fetch from 2.0s to 1.7s. On normal
repositories, it probably does not matter at all, but it
does not hurt to protect ourselves from pathological cases.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In order to make sure the cloned repository is good, we run "rev-list
--objects --not --all $new_refs" on the repository. This is expensive
on large repositories. This patch attempts to mitigate the impact in
this special case.
In the "good" clone case, we only have one pack. If all of the
following are met, we can be sure that all objects reachable from the
new refs exist, which is the intention of running "rev-list ...":
- all refs point to an object in the pack
- there are no dangling pointers in any object in the pack
- no objects in the pack point to objects outside the pack
The second and third checks can be done with the help of index-pack as
a slight variation of --strict check (which introduces a new condition
for the shortcut: pack transfer must be used and the number of objects
large enough to call index-pack). The first is checked in
check_everything_connected after we get an "ok" from index-pack.
"index-pack + new checks" is still faster than the current "index-pack
+ rev-list", which is the whole point of this patch. If any of the
conditions fail, we fall back to the good old but expensive "rev-list
..". In that case it's even more expensive because we have to pay for
the new checks in index-pack. But that should only happen when the
other side is either buggy or malicious.
Cloning linux-2.6 over file://
before after
real 3m25.693s 2m53.050s
user 5m2.037s 4m42.396s
sys 0m13.750s 0m16.574s
A more realistic test with ssh:// over wireless
before after
real 11m26.629s 10m4.213s
user 5m43.196s 5m19.444s
sys 0m35.812s 0m37.630s
This shortcut is not applied to shallow clones, partly because shallow
clones should have no more objects than a usual fetch and the cost of
rev-list is acceptable, partly to avoid dealing with corner cases when
grafting is involved.
This shortcut does not apply to unpack-objects code path either
because the number of objects must be small in order to trigger that
code path.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
index-pack --strict looks up and follows parent commits. If shallow
information is not ready by the time index-pack is run, index-pack may
be led to non-existent objects. Make fetch-pack save shallow file to
disk before invoking index-pack.
git learns new global option --shallow-file to pass on the alternate
shallow file path. Undocumented (and not even support --shallow-file=
syntax) because it's unlikely to be used again elsewhere.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Clean up pkt-line API, implementation and its callers to make them
more robust.
* jk/pkt-line-cleanup:
do not use GIT_TRACE_PACKET=3 in tests
remote-curl: always parse incoming refs
remote-curl: move ref-parsing code up in file
remote-curl: pass buffer straight to get_remote_heads
teach get_remote_heads to read from a memory buffer
pkt-line: share buffer/descriptor reading implementation
pkt-line: provide a LARGE_PACKET_MAX static buffer
pkt-line: move LARGE_PACKET_MAX definition from sideband
pkt-line: teach packet_read_line to chomp newlines
pkt-line: provide a generic reading function with options
pkt-line: drop safe_write function
pkt-line: move a misplaced comment
write_or_die: raise SIGPIPE when we get EPIPE
upload-archive: use argv_array to store client arguments
upload-archive: do not copy repo name
send-pack: prefer prefixcmp over memcmp in receive_status
fetch-pack: fix out-of-bounds buffer offset in get_ack
upload-pack: remove packet debugging harness
upload-pack: do not add duplicate objects to shallow list
upload-pack: use get_sha1_hex to parse "shallow" lines
Allows requests to fetch objects at any tip of refs (including
hidden ones). It seems that there may be use cases even outside
Gerrit (e.g. $gmane/215701).
* jc/fetch-raw-sha1:
fetch: fetch objects by their exact SHA-1 object names
upload-pack: optionally allow fetching from the tips of hidden refs
fetch: use struct ref to represent refs to be fetched
parse_fetch_refspec(): clarify the codeflow a bit
Most of the callers of packet_read_line just read into a
static 1000-byte buffer (callers which handle arbitrary
binary data already use LARGE_PACKET_MAX). This works fine
in practice, because:
1. The only variable-sized data in these lines is a ref
name, and refs tend to be a lot shorter than 1000
characters.
2. When sending ref lines, git-core always limits itself
to 1000 byte packets.
However, the only limit given in the protocol specification
in Documentation/technical/protocol-common.txt is
LARGE_PACKET_MAX; the 1000 byte limit is mentioned only in
pack-protocol.txt, and then only describing what we write,
not as a specific limit for readers.
This patch lets us bump the 1000-byte limit to
LARGE_PACKET_MAX. Even though git-core will never write a
packet where this makes a difference, there are two good
reasons to do this:
1. Other git implementations may have followed
protocol-common.txt and used a larger maximum size. We
don't bump into it in practice because it would involve
very long ref names.
2. We may want to increase the 1000-byte limit one day.
Since packets are transferred before any capabilities,
it's difficult to do this in a backwards-compatible
way. But if we bump the size of buffer the readers can
handle, eventually older versions of git will be
obsolete enough that we can justify bumping the
writers, as well. We don't have plans to do this
anytime soon, but there is no reason not to start the
clock ticking now.
Just bumping all of the reading bufs to LARGE_PACKET_MAX
would waste memory. Instead, since most readers just read
into a temporary buffer anyway, let's provide a single
static buffer that all callers can use. We can further wrap
this detail away by having the packet_read_line wrapper just
use the buffer transparently and return a pointer to the
static storage. That covers most of the cases, and the
remaining ones already read into their own LARGE_PACKET_MAX
buffers.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The packets sent during ref negotiation are all terminated
by newline; even though the code to chomp these newlines is
short, we end up doing it in a lot of places.
This patch teaches packet_read_line to auto-chomp the
trailing newline; this lets us get rid of a lot of inline
chomping code.
As a result, some call-sites which are not reading
line-oriented data (e.g., when reading chunks of packfiles
alongside sideband) transition away from packet_read_line to
the generic packet_read interface. This patch converts all
of the existing callsites.
Since the function signature of packet_read_line does not
change (but its behavior does), there is a possibility of
new callsites being introduced in later commits, silently
introducing an incompatibility. However, since a later
patch in this series will change the signature, such a
commit would have to be merged directly into this commit,
not to the tip of the series; we can therefore ignore the
issue.
This is an internal cleanup and should produce no change of
behavior in the normal case. However, there is one corner
case to note. Callers of packet_read_line have never been
able to tell the difference between a flush packet ("0000")
and an empty packet ("0004"), as both cause packet_read_line
to return a length of 0. Readers treat them identically,
even though Documentation/technical/protocol-common.txt says
we must not; it also says that implementations should not
send an empty pkt-line.
By stripping out the newline before the result gets to the
caller, we will now treat the newline-only packet ("0005\n")
the same as an empty packet, which in turn gets treated like
a flush packet. In practice this doesn't matter, as neither
empty nor newline-only packets are part of git's protocols
(at least not for the line-oriented bits, and readers who
are not expecting line-oriented packets will be calling
packet_read directly, anyway). But even if we do decide to
care about the distinction later, it is orthogonal to this
patch. The right place to tighten would be to stop treating
empty packets as flush packets, and this change does not
make doing so any harder.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>