The origin, entry, and scoreboard structures are core to the blame
interface and need to be exposed for blame functionality.
Signed-off-by: Jeff Smith <whydoubt@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Create function that populates a blame_entry and prepends it to a list.
Signed-off-by: Jeff Smith <whydoubt@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Create function that completes setting up blame_scoreboard structure.
Signed-off-by: Jeff Smith <whydoubt@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Create function that initializes blame_scoreboard to default values.
Signed-off-by: Jeff Smith <whydoubt@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Either prepare_initial or prepare_final is used to determine which
commit is marked as 'final'. Call the underlying methods directly to
make this more clear.
Signed-off-by: Jeff Smith <whydoubt@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The new method's interface is marginally cleaner than blame_sort, and
will avoid the need to expose the compare_blame_final method.
Signed-off-by: Jeff Smith <whydoubt@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Allow the interface user to decide how to handle a progress update.
Signed-off-by: Jeff Smith <whydoubt@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When iterating over references, reference priming is used to make sure
that loose references are read into the ref-cache before packed
references, to avoid races. It used to be that the prefix passed to
reference iterators almost always ended in `/`, for example
`refs/heads/`. In that case, the priming code would read all loose
references under `find_containing_dir("refs/heads/")`, which is
"refs/heads/". That's just what we want.
But now that `ref-filter` knows how to pass refname prefixes to
`for_each_fullref_in()`, the prefix might come from user input; for
example,
git for-each-ref refs/heads
Since the argument doesn't include a trailing slash, the reference
iteration code would prime all of the loose references under
`find_containing_dir("refs/heads")`, which is "refs/". Thus we would
unnecessarily read tags, remote-tracking references, etc., when the
user is only interested in branches.
It is a bit awkward to get around this problem. We can't just append a
slash to the argument, because we don't know ab initio whether an
argument like `refs/tags/release` corresponds to a single tag or to a
directory containing tags.
Moreover, until now a `prefix_ref_iterator` was used to make the final
decision about which references fall within the prefix (the
`cache_ref_iterator` only did a rough cut). This is also inefficient,
because the `prefix_ref_iterator` can't know, for example, that while
you are in a subdirectory that is completely within the prefix, you
don't have to do the prefix check.
So:
* Move the responsibility for doing the prefix check directly to
`cache_ref_iterator`. This means that `cache_ref_iterator_begin()`
never has to wrap its return value in a `prefix_ref_iterator`.
* Teach `cache_ref_iterator_begin()` (and `prime_ref_dir()`) to be
stricter about what they iterate over and what directories they
prime.
* Teach `cache_ref_iterator` to keep track of whether the current
`cache_ref_iterator_level` is fully within the prefix. If so, skip
the prefix checks entirely.
The main benefit of these optimizations is for loose references, since
packed references are always read all at once.
Note that after this change, `prefix_ref_iterator` is only ever used
for its trimming feature and not for its "prefix" feature. But I'm not
ripping out the latter yet, because it might be useful for another
patch series that I'm working on.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Allow the interface user to decide how to handle a failed sanity check,
whether that be to output with the current state or to do nothing.
Signed-off-by: Jeff Smith <whydoubt@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The no_whole_file_rename flag is used in parts of blame that are being
moved to libgit, and should be accessible via the scoreboard structure.
Signed-off-by: Jeff Smith <whydoubt@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The xdl_opts flags are used in parts of blame that are being moved to
libgit, and should be accessible via the scoreboard structure.
Signed-off-by: Jeff Smith <whydoubt@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The show_root flag is used in parts of blame that are being moved to
libgit, and should be accessible via the scoreboard structure.
Signed-off-by: Jeff Smith <whydoubt@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The reverse flag is used in parts of blame that are being moved to
libgit, and should be accessible via the scoreboard structure.
Signed-off-by: Jeff Smith <whydoubt@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The argument from --contents is used in parts of blame that are being
moved to libgit, and should be accessible via the scoreboard structure.
Signed-off-by: Jeff Smith <whydoubt@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Copy and move score thresholds are used in parts of blame that are being
moved to libgit, and should be accessible via the scoreboard structure.
Signed-off-by: Jeff Smith <whydoubt@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Statistic counters are used in parts of blame that are being moved to
libgit, and should be accessible via the scoreboard structure.
Signed-off-by: Jeff Smith <whydoubt@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Functions that will be publicly exposed should have names that better
reflect what they are a part of.
Signed-off-by: Jeff Smith <whydoubt@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Functions that will be publicly exposed should have names that better
reflect what they are a part of.
Signed-off-by: Jeff Smith <whydoubt@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Functions that will be publicly exposed should have names that better
reflect what they are a part of.
Signed-off-by: Jeff Smith <whydoubt@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Functions related to blame_origin that will be publicly exposed should
have names that better reflect what they are a part of.
Signed-off-by: Jeff Smith <whydoubt@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The scoreboard structure is core to the blame interface. Since
scoreboard will become more exposed, rename it to blame_scoreboard to
clarify what it is a part of.
Signed-off-by: Jeff Smith <whydoubt@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The origin structure is core to the blame interface. Since origin will
become more exposed, rename it to blame_origin to clarify what it is a
part of.
Signed-off-by: Jeff Smith <whydoubt@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
textconv_object is used in places other than blame.c and should be moved
to a more appropriate location. Other textconv related functions are
located in diff.c so that seems as good a place as any.
Signed-off-by: Jeff Smith <whydoubt@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
With commit 21666f1 ("convert object type handling from a string to a
number", 2007-02-26), there was no longer a need for blame.c to include
blob.h but it was not removed.
Signed-off-by: Jeff Smith <whydoubt@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When we diff a blob against a working tree file like:
git diff HEAD:Makefile Makefile
we always use the working tree filename for both sides of
the diff. In most cases that's fine, as the two would be the
same anyway, as above. And until recently, we used the
"name" for the blob, not the path, which would have the
messy "HEAD:" on the beginning.
But when they don't match, like:
git diff HEAD:old_path new_path
it makes sense to show both names.
This patch uses the blob's path field if it's available, and
otherwise falls back to using the filename (in preference to
the blob's name, which is likely to be garbage like a raw
sha1).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There's a subtle distinction between "name" and "path" for a
blob that we resolve: the name is what the user told us on
the command line, and the path is what we traversed when
finding the blob within a tree (if we did so).
When we diff blobs directly, we use "name", but "path" is
more likely to be useful to the user (it will find the
correct .gitattributes, and give them a saner diff header).
We still have to fall back to using the name for some cases
(i.e., any blob reference that isn't of the form tree:path).
That's the best we can do in such a case.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The stuff_change() function makes diff_filespecs out of
blobs. The term we generally use for filespecs is "path",
not "name", so let's be consistent here. That will make
things less confusing when the next patch starts caring
about the path/name distinction inside the pending object
array.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When diffing blobs directly, git-diff picks the blobs out of
the rev_info's pending array and copies the relevant bits to
a custom "struct blobinfo". But the pending array entry
already has all of this information (and more, which we'll
use in future patches). Let's just pass the original entry
instead.
In practice, these two blobs are probably adjacent in the
revs->pending array, and we could just pass the whole array.
But the current code is careful to pick each blob out
separately and put it into another array, so we'll continue
to do so and make our own array-of-pointers.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If the revision parser sees an argument like tree:path, we
parse it down to the correct blob (or tree), but throw away
the "path" portion. Let's ask get_sha1_with_context() to
record it, and pass it along in the pending array.
This will let programs like git-diff which rely on the
revision-parser show more accurate paths.
Note that the implementation is a little tricky; we have to
make sure we free oc.path in all code paths. For handle_dotdot(),
we can piggy-back on the existing cleanup-wrapper pattern.
The real work happens in handle_dotdot_1(), but the
handle_dotdot() wrapper makes sure that the path is freed no
matter how we exit the function (and for that reason we make
sure that the object_context struct is zero'd, so if we fail
to even get to the get_sha1_with_context() call, we just end
up calling free(NULL)).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "a..b" revision syntax was designed to handle commits,
so it doesn't bother to record any mode we find while
traversing a "tree:path" endpoint. These days "git diff" can
diff blobs using either "a:path..b:path" (with dots) or
"a:path b:path" (without), but the two behave
inconsistently, as the with-dots version fails to notice the
mode.
Let's teach the dot-dot range parser to record modes; it
doesn't cost us anything, and it makes this case work.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The git-diff command can directly compare two blobs (or a
blob and a file), but we don't test this at all. Let's add
some basic tests that reveal a few problems.
There are basically four interesting inputs:
1. sha1 against sha1 (where diff has no information beyond
the contents)
2. tree:path against tree:path (where it can get
information via get_sha1_with_context)
3. Same as (2), but using the ".." range syntax
4. tree:path against a filename
And beyond generating a sane diff, we care about a few
little bits: which paths they show in the diff header, and
whether they correctly pick up a mode change.
They should all be able to show a mode except for (1),
though note that case (3) is currently broken.
For the headers, we would ideally show the path within the
tree if we have it, making:
git diff a:path b:path
look the same as:
git diff a b -- path
We can't always do that (e.g., in the direct sha1/sha1 diff,
we have no path to show), in which case we should fall back
to the name that resolved to the blob (which is nonsense
from the repository's perspective, but is the best we can
do).
Aside from the fallback case in (1), none of the cases get
this right. Cases (2) and (3) always show the full
tree:path, even though we should be able to know just the
"path" portion.
Case (4) picks up the filename path, but assigns it to
_both_ sides of the diff. So this works for:
git diff tree:path path
but not for:
git diff tree:other_path path
The appropriate tests are marked to expect failure.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When a sha1 lookup returns the tree path via "struct
object_context", it just copies it into a fixed-size buffer.
This means the result can be truncated, and it means our
"struct object_context" consumes a lot of stack space.
Instead, let's allocate a string on the heap. Because most
callers don't care about this information, we'll avoid doing
it by default (so they don't all have to start calling
free() on the result). There are basically two options for
the caller to signal to us that it's interested:
1. By setting a pointer to storage in the object_context.
2. By passing a flag in another parameter.
Doing (1) would match the way that sha1_object_info_extended()
works. But it would mean that every caller would have to
initialize the object_context, which they don't currently
have to do.
This patch does (2), and adds a new bit to the function's
flags field. All of the callers that look at the "path"
field are updated to pass the new flag.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The get_sha1_with_context() function zeroes out the
oc->symlink_path strbuf, but doesn't use strbuf_init() to
set up the usual invariants (like pointing to the slopbuf).
We don't actually write to the oc->symlink_path strbuf
unless we call get_tree_entry_follow_symlinks(), and that
function does initialize it. However, readers may still look
at the zero'd strbuf.
In practice this isn't a triggerable bug. The only caller
that looks at it only does so when the mode we found is 0.
This doesn't happen for non-tree-entries (where we return
S_IFINVALID). A broken tree entry could have a mode of 0,
but canon_mode() quietly rewrites that into S_IFGITLINK.
So the "0" mode should only come up when we did indeed find
a symlink.
This is mostly just an accident of how the code happens to
work, though. Let's future-proof ourselves to make sure the
strbuf is properly initialized for all calls (it's only a
few struct member assignments, not a heap allocation).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
An early version of the patch to add object_context used the
name object_resolve_context. This was later shortened to
just object_context, but the "orc" variable name stuck in a
few places. Let's use "oc", which is used elsewhere in the
code.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The handle_revision_arg function is rather long, and a big
chunk of it is handling the range operators. Let's pull that
out to a separate helper. While we're doing so, we can clean
up a few of the rough edges that made the flow hard to
follow:
- instead of manually restoring *dotdot (that we overwrote
with a NUL), do the real work in a sub-helper, which
makes it clear that the munge/restore lines are a
matched pair
- eliminate a goto which wasn't actually used for control
flow, but only to avoid duplicating a few lines
(instead, those lines are pushed into another helper
function)
- use early returns instead of deep nesting
- consistently name all variables for the left-hand side
of the range as "a" (rather than "this" or "from") and
the right-hand side as "b" (rather than "next", or using
the unadorned "sha1" or "flags" from the main function).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since 003c84f6d (specifying ranges: we did not mean to make
".." an empty set, 2011-05-02), we treat the argument ".."
specially. We detect it by noticing that both sides of the
range are empty, and that this is a non-symmetric two-dot
range.
While correct, this makes the code overly complicated. We
can just detect ".." up front before we try to do further
parsing. This avoids having to de-munge the NUL from dotdot,
and lets us eliminate an extra const array (which we needed
only to do direct pointer comparisons).
It also removes the one code path from the range-parsing
conditional that requires us to return -1. That will make it
simpler to pull the dotdot parsing out into its own
function.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The handle_revision_arg() function has a "dotdot" variable
that it uses to find a ".." or "..." in the argument. If we
don't find one, we look for other marks, like "^!". But we
just keep re-using the "dotdot" variable, which is
confusing.
Let's introduce a separate "mark" variable that can be used
for these other marks. They still reuse the same variable,
but at least the name is no longer actively misleading.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "dotdot" range parser avoids calling
lookup_commit_reference() if we are directly fed two
commits. But its casts are unnecessarily complex; that
function will just return a commit we pass into it.
Just calling the function all the time is much simpler, and
doesn't do any significant extra work (the object is already
parsed, and deref_tag() on a non-tag is a noop; we do incur
one extra lookup_object() call, but that's fairly trivial).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When we are parsing a range like "a..b", we write a
temporary NUL over the first ".", so that we can access the
names "a" and "b" as C strings. But our restoration of the
original "." is done at inconsistent times, which can lead
to confusing results.
For most calls, we restore the "." after we resolve the
names, but before we call verify_non_filename(). This means
that when we later call add_pending_object(), the name for
the left-hand "a" has been re-expanded to "a..b". You can
see this with:
git log --source a...b
where "b" will be correctly marked with "b", but "a" will be
marked with "a...b". Likewise with "a..b" (though you need
to use --boundary to even see "a" at all in that case).
To top off the confusion, when the REVARG_CANNOT_BE_FILENAME
flag is set, we skip the non-filename check, and leave the
NUL in place.
That means we do report the correct name for "a" in the
pending array. But some code paths try to show the whole
"a...b" name in error messages, and these erroneously show
only "a" instead of "a...b". E.g.:
$ git cherry-pick HEAD:foo...HEAD:foo
error: object d95f3ad14dee633a758d2e331151e950dd13e4ed is a blob, not a commit
error: object d95f3ad14dee633a758d2e331151e950dd13e4ed is a blob, not a commit
fatal: Invalid symmetric difference expression HEAD:foo
(That last message should be "HEAD:foo...HEAD:foo"; I used
cherry-pick because it passes the CANNOT_BE_FILENAME flag).
As an interesting side note, cherry-pick actually looks at
and re-resolves the arguments from the pending->name fields.
So it would have been visibly broken by the first bug, but
the effect was canceled out by the second one.
This patch makes the whole function consistent by re-writing
the NUL immediately after calling verify_non_filename(), and
then restoring the "." as appropriate in some error-printing
and early-return code paths.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There is an implicit assumption that a directory containing only
untracked and ignored paths should itself be considered untracked. This
makes sense in use cases where we're asking if a directory should be
added to the git database, but not when we're asking if a directory can
be safely removed from the working tree; as a result, clean -d would
assume that an "untracked" directory containing ignored paths could be
deleted, even though doing so would also remove the ignored paths.
To get around this, we teach clean -d to collect ignored paths and skip
an untracked directory if it contained an ignored path, instead just
removing the untracked contents thereof. To achieve this, cmd_clean()
has to collect all untracked contents of untracked directories, in
addition to all ignored paths, to determine which untracked dirs must be
skipped (because they contain ignored paths) and which ones should *not*
be skipped.
For this purpose, correct_untracked_entries() is introduced to prune a
given dir_struct of untracked entries containing ignored paths and those
untracked entries encompassed by the untracked entries which are not
pruned away.
A memory leak is also fixed in cmd_clean().
This also fixes the known breakage in t7300, since clean -d now skips
untracked directories containing ignored paths.
Signed-off-by: Samuel Lijin <sxlijin@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Fix a duplicate mention of --contains in the SYNOPSIS to mention
--no-contains.
This fixes an error introduced in my commit ac3f5a3468 ("ref-filter:
add --no-contains option to tag/branch/for-each-ref", 2017-03-24).
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
On Windows the environment variable PATH contains a semicolon-separated
list of directories to search for, in order, when looking for the
location of a binary to run. get_path_split() parses it and returns an
array of string copies, which is iterated by path_lookup(), which in
turn passes each entry to lookup_prog().
Change lookup_prog() to take the directory name as a length-limited
string instead of as a NUL-terminated one and parse PATH directly in
path_lookup(). This avoids memory allocations, simplifying the code.
Helped-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Acked-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Taking git-compat-util.h's cue (which uses an inline function to back
is_dir_sep()), let's use an inline function to back also the Windows
version of is_dir_sep(). This avoids problems when calling the function
with arguments that do more than just provide a single character, e.g.
incrementing a pointer. Example:
is_dir_sep(*p++)
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When we are matching refnames against a pattern, then we know that the
beginning of any refname that can match the pattern has to match the
part of the pattern up to the first glob character. For example, if
the pattern is `refs/heads/foo*bar`, then it can only match a
reference that has the prefix `refs/heads/foo`.
So pass that prefix to `for_each_fullref_in()`. This lets the ref code
avoid passing us the full set of refs, and in some cases avoid reading
them in the first place.
Note that this applies only when the `match_as_path` flag is set
(i.e., when `for-each-ref` is the caller), as the matching rules for
git-branch and git-tag are subtly different.
This could be generalized to the case of multiple patterns, but (a) it
probably doesn't come up that often, and (b) it is more awkward to
deal with multiple patterns (e.g., the patterns might not be
disjoint). So, since this is just an optimization, punt on the case of
multiple patterns.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Only one caller was using it, so move the check to that caller.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Instead of handling `GIT_REF_PARANOIA` in
`files_ref_iterator_begin()`, handle it in
`refs_ref_iterator_begin()`, where it will cover all reference stores.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The old code ignored any errors encountered when trying to fopen the
"packed-refs" file, treating all such failures as if the file didn't
exist. But it could be that there is some other error opening the
file (e.g., permissions problems), and we don't want to silently
ignore such problems. So report any failures that are not due to
ENOENT.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Teach `read_packed_refs()` to also
* Allocate and initialize the new `packed_ref_cache`
* Open and close the `packed-refs` file
* Update the `validity` field of the new object
This decreases the coupling between `packed_refs_cache` and
`files_ref_store` by a little bit.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>