When we find a blob at "a/b/c", we currently pass this to
our show_object_fn callbacks as two components: "a/b/" and
"c". Callbacks which want the full value then call
path_name(), which concatenates the two. But this is an
inefficient interface; the path is a strbuf, and we could
simply append "c" to it temporarily, then roll back the
length, without creating a new copy.
So we could improve this by teaching the callsites of
path_name() this trick (and there are only 3). But we can
also notice that no callback actually cares about the
broken-down representation, and simply pass each callback
the full path "a/b/c" as a string. The callback code becomes
even simpler, then, as we do not have to worry about freeing
an allocated buffer, nor rolling back our modification to
the strbuf.
This is theoretically less efficient, as some callbacks
would not bother to format the final path component. But in
practice this is not measurable. Since we use the same
strbuf over and over, our work to grow it is amortized, and
we really only pay to memcpy a few bytes.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In the previous commit, we left name_path as a thin wrapper
around a strbuf. This patch drops it entirely. As a result,
every show_object_fn callback needs to be adjusted. However,
none of their code needs to be changed at all, because the
only use was to pass it to path_name(), which now handles
the bare strbuf.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "struct name_path" data is examined in only two places:
we generate it in process_tree(), and we convert it to a
single string in path_name(). Everyone else just passes it
through to those functions.
We can further note that process_tree() already keeps a
single strbuf with the leading tree path, for use with
tree_entry_interesting().
Instead of building a separate name_path linked list, let's
just use the one we already build in "base". This reduces
the amount of code (especially tricky code in path_name()
which did not check for integer overflows caused by deep
or large pathnames).
It is also more efficient in some instances. Any time we
were using tree_entry_interesting, we were building up the
strbuf anyway, so this is an immediate and obvious win
there. In cases where we were not, we trade off storing
"pathname/" in a strbuf on the heap for each level of the
path, instead of two pointers and an int on the stack (with
one pointer into the tree object). On a 64-bit system, the
latter is 20 bytes; so if path components are less than that
on average, this has lower peak memory usage. In practice
it probably doesn't matter either way; we are already
holding in memory all of the tree objects leading up to each
pathname, and for normal-depth pathnames, we are only
talking about hundreds of bytes.
This patch leaves "struct name_path" as a thin wrapper
around the strbuf, to avoid disrupting callbacks. We should
fix them, but leaving it out makes this diff easier to view.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When "git rev-list" shows an object with its associated path
name, it does so by walking the name_path linked list and
printing each component (stopping at any embedded NULs or
newlines).
We'd like to eventually get rid of name_path entirely in
favor of a single buffer, and dropping this custom printing
code is part of that. As a first step, let's use path_name()
to format the list into a single buffer, and print that.
This is strictly less efficient than the original, but it's
a temporary step in the refactoring; our end game will be to
get the fully formatted name in the first place.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Recent "git prune" traverses young unreachable objects to safekeep
old objects in the reachability chain from them, which sometimes
caused error messages that are unnecessarily alarming.
* jk/squelch-missing-link-warning-for-unreachable:
suppress errors on missing UNINTERESTING links
silence broken link warnings with revs->ignore_missing_links
add quieter versions of parse_{tree,commit}
When we are traversing commit parents along the
UNINTERESTING side of a revision walk, we do not care if
the parent turns out to be missing. That lets us limit
traversals using unreachable and possibly incomplete
sections of history. However, we do still print error
messages about the missing commits; this patch suppresses
the error, as well.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We set revs->ignore_missing_links to instruct the
revision-walking machinery that we know the history graph
may be incomplete. For example, we use it when walking
unreachable but recent objects; we want to add what we can,
but it's OK if the history is incomplete.
However, we still print error messages for the missing
objects, which can be confusing. This is not an error, but
just a normal situation when transitioning from a repository
last pruned by an older git (which can leave broken segments
of history) to a more recent one (where we try to preserve
whole reachable segments).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git rev-list --objects $old --not --all" to see if everything that
is reachable from $old is already connected to the existing refs
was very inefficient.
* jk/still-interesting:
limit_list: avoid quadratic behavior from still_interesting
When we are limiting a rev-list traversal due to
UNINTERESTING refs, we have to walk down the tips (both
interesting and uninteresting) to find where they intersect.
We keep a queue of commits to examine, pop commits off
the queue one by one, and potentially add their parents. The
size of the queue will naturally fluctuate based on the
"width" of the history graph; i.e., the number of
simultaneous lines of development. But for the most part it
will stay in the same ballpark as the initial number of tips
we fed, shrinking over time (as we hit common ancestors of
the tips). So roughly speaking, if we start with `N` tips,
we'll spend much of the time with a queue around `N` items.
For each UNINTERESTING commit we pop, we call
still_interesting to check whether marking its parents as
UNINTERESTING has made the whole queue uninteresting (in
which case we can quit early). Because the queue is stored
as a linked list, this is `O(N)`, where `N` is the number of
items in the queue. So processing a queue with `N` commits
marked UNINTERESTING (and one or more interesting commits)
will take `O(N^2)`.
If you feed a lot of positive tips, this isn't a problem.
They aren't UNINTERESTING, so they don't incur the
still_interesting check. It also isn't a problem if you
traverse from an interesting tip to some UNINTERESTING
bases. We order the queue by recency, so the interesting
commits stay at the front of the queue as we walk down them.
The linear check can exit early as soon as it sees one
interesting commit left in the queue.
But if you want to know whether an older commit is reachable
from a set of newer tips, we end up processing in the
opposite direction: from the UNINTERESTING ones down to the
interesting one. This may happen when we call:
git rev-list $commits --not --all
in check_everything_connected after a fetch. If we fetched
something much older than most of our refs, and if we have a
large number of refs, the traversal cost is dominated by the
quadratic behavior.
These commands simulate the connectivity check of such a
fetch, when you have `$n` distinct refs in the receiver:
# positive ref is 100,000 commits deep
git rev-list --all | head -100000 | tail -1 >input
# huge number of more recent negative refs
git rev-list --all | head -$n | sed s/^/^/ >>input
time git rev-list --stdin <input
Here are timings for various `n` on the linux.git
repository. The `n=1` case provides a baseline for just
walking the commits, which lets us see the still_interesting
overhead. The times marked with `+` subtract that baseline
to show just the extra time growth due to the large number
of refs. The `x` numbers show the slowdown of the adjusted
time versus the prior trial.
n | before | after
--------------------------------------------------------
1 | 0.991s | 0.848s
10000 | 1.120s (+0.129s) | 0.885s (+0.037s)
20000 | 1.451s (+0.460s, 3.5x) | 0.923s (+0.075s, 2.0x)
40000 | 2.731s (+1.740s, 3.8x) | 0.994s (+0.146s, 1.9x)
80000 | 8.235s (+7.244s, 4.2x) | 1.123s (+0.275s, 1.9x)
Each trial doubles `n`, so you can see the quadratic (`4x`)
behavior before this patch. Afterwards, we have a roughly
linear relationship.
The implementation is fairly straightforward. Whenever we do
the linear search, we cache the interesting commit we find,
and next time check it before doing another linear search.
If that commit is removed from the list or becomes
UNINTERESTING itself, then we fall back to the linear
search. This is very similar to the trick used by fce87ae
(Fix quadratic performance in rewrite_one., 2008-07-12).
I considered and rejected several possible alternatives:
1. Keep a count of UNINTERESTING commits in the queue.
This requires managing the count not only when removing
an item from the queue, but also when marking an item
as UNINTERESTING. That requires touching the other
functions which mark commits, and would require knowing
quickly which commits are in the queue (lookup in the
queue is linear, so we would need an auxiliary
structure or to also maintain an IN_QUEUE flag in each
commit object).
2. Keep a separate list of interesting commits. Drop items
from it when they are dropped from the queue, or if
they become UNINTERESTING. This again suffers from
extra complexity to maintain the list, not to mention
CPU and memory.
3. Use a better data structure for the queue. This is
something that could help the fix in fce87ae, because
we order the queue by recency, and it is about
inserting quickly in recency order. So a normal
priority queue would help there. But here, we cannot
disturb the order of the queue, which makes things
harder. We really do need an auxiliary index to track
the flag we care about, which is basically option (2)
above.
The "cache" trick is simple, and the numbers above show that
it works well in practice. This is because the length of
time it takes to find an interesting commit is proportional
to the length of time it will remain cached (i.e., if we
have to walk a long way to find it, it also means we have to
pop a lot of elements in the queue until we get rid of it
and have to find another interesting commit).
The worst case is still quadratic, though. We could have `N`
uninteresting commits at the front of the queue, followed by
`N` interesting commits, where commit `i` has parent `i+N`.
When we pop commit `i`, we will notice that the parent of
the next commit, `i+1+N` is still interesting and cache it.
But then handling commit `i+1`, we will mark its parent
`i+1+N` uninteresting, and immediately invalidate our cache.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git log --graph --no-walk A B..." is a otcnflicting request that
asks nonsense; no-walk tells us show discrete points in the
history, while graph asks to draw connections between these
discrete points. Forbid the combination.
* dj/log-graph-with-no-walk:
revision: forbid combining --graph and --no-walk
"git rev-list --bisect --first-parent" does not work (yet) and can
even cause SEGV; forbid it. "git log --bisect --first-parent"
would not be useful until "git bisect --first-parent" materializes,
so it is also forbidden for now.
* kd/rev-list-bisect-first-parent:
rev-list: refuse --first-parent combined with --bisect
rev-list --bisect is used by git bisect, but never together with
--first-parent. Because rev-list --bisect together with --first-parent
is not handled currently, and even leads to segfaults, refuse to use
both options together.
Because this is not supported, it makes little sense to use git log
--bisect --first parent either, because refs/heads/bad is not limited to
the first parent chain.
Helped-by: Junio C. Hamano <gitster@pobox.com>
Helped-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Kevin Daudt <me@ikke.info>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Because "--graph" is about connected history while --no-walk is
about discrete points, it does not make sense to allow these two
options at the same time. [1]
This change makes a few calls to "show --graph" fail in t4052, but
asking to show one commit with graph is a nonsensical thing to do.
Thus, tests on "show --graph" in t4052 have been removed [2,3].
Same tests on "show" without --graph option have already been tested
in 4052.
3 testcases have been added to test this patch.
[1]: http://article.gmane.org/gmane.comp.version-control.git/216083
[2]: http://article.gmane.org/gmane.comp.version-control.git/264950
[3]: http://article.gmane.org/gmane.comp.version-control.git/265107
Helped-By: Eric Sunshine <sunshine@sunshineco.com>
Helped-By: René Scharfe <l.s.r@web.de>
Helped-By: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Dongcan Jiang <dongcan.jiang@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Mark file-local symbols as "static", and drop functions that nobody
uses.
* jc/unused-symbols:
shallow.c: make check_shallow_file_for_update() static
remote.c: make clear_cas_option() static
urlmatch.c: make match_urls() static
revision.c: make save_parents() and free_saved_parents() static
line-log.c: make line_log_data_init() static
pack-bitmap.c: make pack_bitmap_filename() static
prompt.c: remove git_getpass() nobody uses
http.c: make finish_active_slot() and handle_curl_result() static
"git log --invert-grep --grep=WIP" will show only commits that do
not have the string "WIP" in their messages.
* cj/log-invert-grep:
log: teach --invert-grep option
"git log --grep=<string>" shows only commits with messages that
match the given string, but sometimes it is useful to be able to
show only commits that do *not* have certain messages (e.g. "show
me ones that are not FIXUP commits").
Originally, we had the invert-grep flag in grep_opt, but because
"git grep --invert-grep" does not make sense except in conjunction
with "--files-with-matches", which is already covered by
"--files-without-matches", it was moved it to revisions structure.
To have the flag there expresses the function to the feature better.
When the newly inserted two tests run, the history would have commits
with messages "initial", "second", "third", "fourth", "fifth", "sixth"
and "Second", committed in this order. The commits that does not match
either "th" or "Sec" is "second" and "initial". For the case insensitive
case only "initial" matches.
Signed-off-by: Christoph Junghans <ottxor@gentoo.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Earlier we made "rev-list --object-edge" more aggressively list the
objects at the edge commits, in order to reduce number of objects
fetched into a shallow repository, but the change affected cases
other than "fetching into a shallow repository" and made it
unusably slow (e.g. fetching into a normal repository should not
have to suffer the overhead from extra processing). Limit it to a
more specific case by introducing --objects-edge-aggressive, a new
option to rev-list.
* bc/fetch-thin-less-aggressive-in-normal-repository:
pack-objects: use --objects-edge-aggressive for shallow repos
rev-list: add an option to mark fewer edges as uninteresting
Documentation: add missing article in rev-list-options.txt
The get_merge_bases*() API was easy to misuse by careless
copy&paste coders, leaving object flags tainted in the commits that
needed to be traversed.
* jc/merge-bases:
get_merge_bases(): always clean-up object flags
bisect: clean flags after checking merge bases
In commit fbd4a70 (list-objects: mark more commits as edges in
mark_edges_uninteresting - 2013-08-16), we marked an increasing number
of edges uninteresting. This change, and the subsequent change to make
this conditional on --objects-edge, are used by --thin to make much
smaller packs for shallow clones.
Unfortunately, they cause a significant performance regression when
pushing non-shallow clones with lots of refs (23.322 seconds vs.
4.785 seconds with 22400 refs). Add an option to git rev-list,
--objects-edge-aggressive, that preserves this more aggressive behavior,
while leaving --objects-edge to provide more performant behavior.
Preserve the current behavior for the moment by using the aggressive
option.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The callers of get_merge_bases() can choose to leave object flags
used during the merge-base traversal by passing cleanup=0 as a
parameter, but in practice a very few callers can afford to do so
(namely, "git merge-base"), as they need to compute merge base in
preparation for other processing of their own and they need to see
the object without contaminate flags.
Change the function signature of get_merge_bases_many() and
get_merge_bases() to drop the cleanup parameter, so that the
majority of the callers do not have to say ", 1" at the end.
Give a new get_merge_bases_many_dirty() API to support only a few
callers that know they do not need to spend cycles cleaning up the
object flags.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There is currently no easy way to ask the revision traversal
machinery to include objects reachable from the index (e.g.,
blobs and trees that have not yet been committed). This
patch adds an option to do so.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When we call traverse_commit_list, we may have trees and
blobs in the pending array. As we process these, we pass the
"name" field from the pending entry as the path of the
object within the tree (which then becomes the root path if
we recurse into subtrees).
When we set up the traversal in prepare_revision_walk,
though, the "name" field of any pending trees and blobs is
likely to be the ref at which we found the object. We would
not want to make this part of the path (e.g., doing so would
make "git rev-list --objects v2.6.11-tree" in linux.git show
paths like "v2.6.11-tree/Makefile", which is nonsensical).
Therefore prepare_revision_walk sets the name field of each
pending tree and blobs to the empty string.
However, this leaves no room for a caller who does know the
correct path of a pending object to propagate that
information to the revision walker. We can fix this by
making two related changes:
1. Use the "path" field as the path instead of the "name"
field in traverse_commit_list. If the path is not set,
default to "" (which is what we always ended up with in
the current code, because of prepare_revision_walk).
2. In prepare_revision_walk, make a complete copy of the
entry. This makes the path field available to the
walker (if there is one), solving our problem.
Leaving the name field intact is now OK, as we do not
use it as a path due to point (1) above (and we can use
it to make more meaningful error messages if we want).
We also make the original "mode" field available to the
walker, though it does not actually use it.
Note that we still re-add the pending objects and free the
old ones (so we may strdup the path and name only to free
the old ones). This could be made more efficient by simply
copying the object_array entries that we are keeping.
However, that would require more restructuring of the code,
and is not done here.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We want to add all reflog entries as tips for finding
reachable objects. The revision machinery can already do
this (to support "rev-list --reflog"); we can reuse that
code.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When we enter prepare_revision_walk, we have zero or more
entries in our "pending" array. We disconnect that array
from the rev_info, and then process each entry:
1. If the entry is a commit and the --source option is in
effect, we keep a pointer to the object name.
2. Otherwise, we re-add the item to the pending list with
a blank name.
We then throw away the old array by freeing the array
itself, but do not touch the "name" field of each entry. For
any items of type (2), we leak the memory associated with
the name. This commit fixes that by calling object_array_clear,
which handles the cleanup for us.
That breaks (1), though, because it depends on the memory
pointed to by the name to last forever. We can solve that by
making a copy of the name. This is slightly less efficient,
but it shouldn't matter in practice, as we do it only for
the tip commits of the traversal.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The API to allocate the structure to keep track of commit
decoration was cumbersome to use, inviting lazy code to
overallocate memory.
* jk/name-decoration-alloc:
log-tree: use FLEX_ARRAY in name_decoration
log-tree: make name_decoration hash static
log-tree: make add_name_decoration a public function
In the previous commit, we made add_name_decoration global
so that adders would not have to access the hash directly.
We now make the hash itself static so that callers _have_ to
add through our function, making sure that all additions go
through a single point. To do this, we have to add one more
accessor function: a way to lookup entries in the hash.
Since the only caller doesn't actually look at the returned
value, but rather only asks whether there is a decoration or
not, we could provide only a boolean "has_name_decoration".
That would allow us to make "struct name_decoration" local
to log-tree, as well.
However, it's unlikely to cause any maintainability harm
making the actual data public, and this interface is more
flexible if we need to look at decorations from other parts
of the code in the future.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Once upon a time, we parsed pretty options by looking for
"--pretty" at the start of the string, and then feeding the
rest (including an "=") to get_commit_format. Later, commit
48ded91 (log --pretty: do not accept bogus "--prettyshort",
2008-05-25) split this into a separate check for "--pretty"
versus "--pretty=".
However, when parsing "--pretty", we still passed "arg+8" to
get_commit_format. This is useless, since it will always
point to the NUL terminator at the end of the string. We can
simply pass NULL instead; both parameters are treated the
same by get_commit_format.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A handful of code paths had to read the commit object more than
once when showing header fields that are usually not parsed. The
internal data structure to keep track of the contents of the commit
object has been updated to reduce the need for this double-reading,
and to allow the caller find the length of the object.
* jk/commit-buffer-length:
reuse cached commit buffer when parsing signatures
commit: record buffer length in cache
commit: convert commit->buffer to a slab
commit-slab: provide a static initializer
use get_commit_buffer everywhere
convert logmsg_reencode to get_commit_buffer
use get_commit_buffer to avoid duplicate code
use get_cached_commit_buffer where appropriate
provide helpers to access the commit buffer
provide a helper to set the commit buffer
provide a helper to free commit buffer
sequencer: use logmsg_reencode in get_message
logmsg_reencode: return const buffer
do not create "struct commit" with xcalloc
commit: push commit_index update into alloc_commit_node
alloc: include any-object allocations in alloc_report
replace dangerous uses of strbuf_attach
commit_tree: take a pointer/len pair rather than a const strbuf
Move "commit->buffer" out of the in-core commit object and keep
track of their lengths. Use this to optimize the code paths to
validate GPG signatures in commit objects.
* jk/commit-buffer-length:
reuse cached commit buffer when parsing signatures
commit: record buffer length in cache
commit: convert commit->buffer to a slab
commit-slab: provide a static initializer
use get_commit_buffer everywhere
convert logmsg_reencode to get_commit_buffer
use get_commit_buffer to avoid duplicate code
use get_cached_commit_buffer where appropriate
provide helpers to access the commit buffer
provide a helper to set the commit buffer
provide a helper to free commit buffer
sequencer: use logmsg_reencode in get_message
logmsg_reencode: return const buffer
do not create "struct commit" with xcalloc
commit: push commit_index update into alloc_commit_node
alloc: include any-object allocations in alloc_report
replace dangerous uses of strbuf_attach
commit_tree: take a pointer/len pair rather than a const strbuf
"git log --exclude=<glob> --all | git shortlog" worked as expected,
but "git shortlog --exclude=<glob> --all", which is supposed to be
identical to the above pipeline, was not accepted at the command
line argument parser level.
* jc/shortlog-ref-exclude:
shortlog: allow --exclude=<glob> to be passed
"git log -2master" is a common typo that shows two commits starting
from whichever random branch that is not 'master' that happens to
be checked out currently.
* jc/revision-dash-count-parsing:
revision: parse "git log -<count>" more carefully
"git log -2master" is a common typo that shows two commits starting
from whichever random branch that is not 'master' that happens to
be checked out currently.
* jc/revision-dash-count-parsing:
revision: parse "git log -<count>" more carefully
Like the callsites in the previous commit, logmsg_reencode
already falls back to read_sha1_file when necessary.
However, I split its conversion out into its own commit
because it's a bit more complex.
We return either:
1. The original commit->buffer
2. A newly allocated buffer from read_sha1_file
3. A reencoded buffer (based on either 1 or 2 above).
while trying to do as few extra reads/allocations as
possible. Callers currently free the result with
logmsg_free, but we can simplify this by pointing them
straight to unuse_commit_buffer. This is a slight layering
violation, in that we may be passing a buffer from (3).
However, since the end result is to free() anything except
(1), which is unlikely to change, and because this makes the
interface much simpler, it's a reasonable bending of the
rules.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The return value from logmsg_reencode may be either a newly
allocated buffer or a pointer to the existing commit->buffer.
We would not want the caller to accidentally free() or
modify the latter, so let's mark it as const. We can cast
away the constness in logmsg_free, but only once we have
determined that it is a free-able buffer.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This mistyped command line simply ignores "master" and ends up
showing two commits from the current HEAD:
$ git log -2master
because we feed "2master" to atoi() without making sure that the
whole string is parsed as an integer.
Use the strtol_i() helper function instead.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"log --exclude=<glob> --all | shortlog" worked as expected, but
"shortlog --exclude=<glob> --all" was not accepted at the command
line argument parser level.
* jc/shortlog-ref-exclude:
shortlog: allow --exclude=<glob> to be passed
These two commands are supposed to be equivalent:
$ git log --exclude=refs/notes/\* --all --no-merges --since=2.days |
git shortlog
$ git shortlog --exclude=refs/notes/\* --all --no-merges --since=2.days
However, the latter does not understand the ref-exclusion command
line option, even though other options understood by "log", such as
"--all" and "--no-merges", are understood.
This was because e7b432c5 (revision: introduce --exclude=<glob> to
tame wildcards, 2013-08-30) did not wire the new option fully to the
machinery. A new option understood by handle_revision_pseudo_opt()
must be told to handle_revision_opt() as well.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When pack-objects is computing the reachability bitmap to
serve a fetch request, it can erroneously die() if some of
the UNINTERESTING objects are not present. Upload-pack
throws away HAVE lines from the client for objects we do not
have, but we may have a tip object without all of its
ancestors (e.g., if the tip is no longer reachable and was
new enough to survive a `git prune`, but some of its
reachable objects did get pruned).
In the non-bitmap case, we do a revision walk with the HAVE
objects marked as UNINTERESTING. The revision walker
explicitly ignores errors in accessing UNINTERESTING commits
to handle this case (and we do not bother looking at
UNINTERESTING trees or blobs at all).
When we have bitmaps, however, the process is quite
different. The bitmap index for a pack-objects run is
calculated in two separate steps:
First, we perform an extensive walk from all the HAVEs to
find the full set of objects reachable from them. This walk
is usually optimized away because we are expected to hit an
object with a bitmap during the traversal, which allows us
to terminate early.
Secondly, we perform an extensive walk from all the WANTs,
which usually also terminates early because we hit a commit
with an existing bitmap.
Once we have the resulting bitmaps from the two walks, we
AND-NOT them together to obtain the resulting set of objects
we need to pack.
When we are walking the HAVE objects, the revision walker
does not know that we are walking it only to mark the
results as uninteresting. We strip out the UNINTERESTING flag,
because those objects _are_ interesting to us during the
first walk. We want to keep going to get a complete set of
reachable objects if we can.
We need some way to tell the revision walker that it's OK to
silently truncate the HAVE walk, just like it does for the
UNINTERESTING case. This patch introduces a new
`ignore_missing_links` flag to the `rev_info` struct, which
we set only for the HAVE walk.
It also adds tests to cover UNINTERESTING objects missing
from several positions: a missing blob, a missing tree, and
a missing parent commit. The missing blob already worked (as
we do not care about its contents at all), but the other two
cases caused us to die().
Note that there are a few cases we do not need to test:
1. We do not need to test a missing tree, with the blob
still present. Without the tree that refers to it, we
would not know that the blob is relevant to our walk.
2. We do not need to test a tip commit that is missing.
Upload-pack omits these for us (and in fact, we
complain even in the non-bitmap case if it fails to do
so).
Reported-by: Siddharth Agarwal <sid0@fb.com>
Signed-off-by: Vicent Marti <tanoku@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Attempts to show where a single-strand-of-pearls break in "git log"
output.
* nd/log-show-linear-break:
log: add --show-linear-break to help see non-linear history
object.h: centralize object flag allocation
Option explanation is in rev-list-options.txt. The interaction with -z
is left undecided.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* jk/warn-on-object-refname-ambiguity:
rev-list: disable object/refname ambiguity check with --stdin
cat-file: restore warn_on_object_refname_ambiguity flag
cat-file: fix a minor memory leak in batch_objects
cat-file: refactor error handling of batch_objects
We started using wildmatch() in place of fnmatch(3); complete the
process and stop using fnmatch(3).
* nd/no-more-fnmatch:
actually remove compat fnmatch source code
stop using fnmatch (either native or compat)
Revert "test-wildmatch: add "perf" command to compare wildmatch and fnmatch"
use wildmatch() directly without fnmatch() wrapper
This is the "rev-list" analogue to 25fba78 (cat-file:
disable object/refname ambiguity check for batch mode,
2013-07-12). Like cat-file, "rev-list --stdin" may read a
large number of sha1 object names, and the warning check
introduces a significant slow-down.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Borrow the bitmap index into packfiles from JGit to speed up
enumeration of objects involved in a commit range without having to
fully traverse the history.
* jk/pack-bitmap: (26 commits)
ewah: unconditionally ntohll ewah data
ewah: support platforms that require aligned reads
read-cache: use get_be32 instead of hand-rolled ntoh_l
block-sha1: factor out get_be and put_be wrappers
do not discard revindex when re-preparing packfiles
pack-bitmap: implement optional name_hash cache
t/perf: add tests for pack bitmaps
t: add basic bitmap functionality tests
count-objects: recognize .bitmap in garbage-checking
repack: consider bitmaps when performing repacks
repack: handle optional files created by pack-objects
repack: turn exts array into array-of-struct
repack: stop using magic number for ARRAY_SIZE(exts)
pack-objects: implement bitmap writing
rev-list: add bitmap mode to speed up object lists
pack-objects: use bitmaps when packing objects
pack-objects: split add_object_entry
pack-bitmap: add support for bitmap indexes
documentation: add documentation for the bitmap format
ewah: compressed bitmap implementation
...
* ks/tree-diff-walk:
tree-walk: finally switch over tree descriptors to contain a pre-parsed entry
revision: convert to using diff_tree_sha1()
line-log: convert to using diff_tree_sha1()
tree-diff: convert diff_root_tree_sha1() to just call diff_tree_sha1 with old=NULL
tree-diff: allow diff_tree_sha1 to accept NULL sha1
This helps reduce the number of match_pathspec_depth() call sites and
show how match_pathspec_depth() is used.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>