Commit Graph

16 Commits

Author SHA1 Message Date
Derrick Stolee
4925adb4da object-name: diagnose trees in index properly
When running 'git show :<path>' where '<path>' is a directory, then
there is a subtle difference between a full checkout and a sparse
checkout. The error message from diagnose_invalid_index_path() reports
whether the path is on disk or not. The full checkout will have the
directory on disk, but the path will not be in the index. The sparse
checkout could have the directory not exist, specifically when that
directory is outside of the sparse-checkout cone.

In the case of a sparse index, we have yet another state: the path can
be a sparse directory in the index. In this case, the error message from
diagnose_invalid_index_path() would erroneously say "path '<path>' is in
the index, but not at stage 0", which is false.

Add special casing around sparse directory entries so we get to the
correct error message. This requires two checks in order to get parity
with the normal sparse-checkout case.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-04-26 13:56:39 -07:00
Derrick Stolee
561287d342 object-name: reject trees found in the index
The get_oid_with_context_1() method is used when parsing revision
arguments. One particular case is to take a ":<path>" string and search
the index for the given path.

In the case of a sparse index, this might find a sparse directory entry,
in which case the contained object is a tree. In the case of a full
index, this search within the index would fail.

In order to maintain identical return state as in a full index, inspect
the discovered cache entry to see if it is a sparse directory and reject
it. This requires being careful around the only_to_die option to be sure
we die only at the correct time.

This changes the behavior of 'git show :<sparse-dir>', but does not
bring it entirely into alignment with a full index case. It specifically
hits the wrong error message within diagnose_invalid_index_path(). That
error message will be corrected in a future change.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-04-26 13:56:38 -07:00
Junio C Hamano
0a01df08c0 Merge branch 'ab/date-mode-release'
Plug (some) memory leaks around parse_date_format().

* ab/date-mode-release:
  date API: add and use a date_mode_release()
  date API: add basic API docs
  date API: provide and use a DATE_MODE_INIT
  date API: create a date.h, split from cache.h
  cache.h: remove always unused show_date_human() declaration
2022-02-25 15:47:36 -08:00
Junio C Hamano
268e6b8d4d Merge branch 'ab/ambiguous-object-name'
Error output given in response to an ambiguous object name has been
improved.

* ab/ambiguous-object-name:
  object-name: re-use "struct strbuf" in show_ambiguous_object()
  object-name: iterate ambiguous objects before showing header
  object-name: show date for ambiguous tag objects
  object-name: make ambiguous object output translatable
  object-name: explicitly handle bad tags in show_ambiguous_object()
  object-name: explicitly handle OBJ_BAD in show_ambiguous_object()
  object-name tests: add tests for ambiguous object blind spots
2022-02-25 15:47:33 -08:00
Ævar Arnfjörð Bjarmason
88c7b4c3c8 date API: create a date.h, split from cache.h
Move the declaration of the date.c functions from cache.h, and adjust
the relevant users to include the new date.h header.

The show_ident_date() function belonged in pretty.h (it's defined in
pretty.c), its two users outside of pretty.c didn't strictly need to
include pretty.h, as they get it indirectly, but let's add it to them
anyway.

Similarly, the change to "builtin/{fast-import,show-branch,tag}.c"
isn't needed as far as the compiler is concerned, but since they all
use the "DATE_MODE()" macro we now define in date.h, let's have them
include it.

We could simply include this new header in "cache.h", but as this
change shows these functions weren't common enough to warrant
including in it in the first place. By moving them out of cache.h
changes to this API will no longer cause a (mostly) full re-build of
the project when "make" is run.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-02-16 09:40:00 -08:00
Ævar Arnfjörð Bjarmason
3a73c1dfaf object-name: re-use "struct strbuf" in show_ambiguous_object()
Reduce the allocations done by show_ambiguous_object() by moving the
"desc" strbuf into the "struct ambiguous_output" introduced in the
preceding commit.

This doesn't matter for optimization purposes, but since we're
accumulating a "struct strbuf advice" anyway let's follow that pattern
and add a "struct strbuf sb", we can then strbuf_reset() it rather
than calling strbuf_release() for each call to
show_ambiguous_object().

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-01-27 12:09:34 -08:00
Ævar Arnfjörð Bjarmason
d2ef3cb7e2 object-name: iterate ambiguous objects before showing header
Change the "The candidates are" header that's shown for ambiguous
objects to be shown after we've iterated over all of the objects.

If we get any errors while doing so we don't want to split up the the
header and the list as a result. The two will now be printed together,
as shown in the updated testcase.

As we're accumulating the lines into as "struct strbuf" before
emitting them we need to add a trailing newline to the call in
show_ambiguous_object(). This and the change from "The candidates
are:" to "The candidates are:\n%s" helps to give translators more
context.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-01-27 12:09:34 -08:00
Ævar Arnfjörð Bjarmason
851b3d7671 object-name: show date for ambiguous tag objects
Make the ambiguous tag object output nicer in the case of tag objects
such as ebf3c04b26 (Git 2.32, 2021-06-06) by including the date in
the "tagger" header. I.e.:

    $ git rev-parse b7e68
    error: short object ID b7e68 is ambiguous
    hint: The candidates are:
    hint:   b7e68c41d92 tag 2021-06-06 - v2.32.0
    hint:   b7e68ae18e0 commit 2019-12-23 - bisect: use the standard 'if (!var)' way to check for 0
    hint:   b7e68f6b413 tree
    hint:   b7e68490b97 blob
    b7e68
    [...]

Before this we'd emit a "tag" line without a date, e.g.:

    hint:   b7e68c41d92 tag v2.32.0

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-01-27 12:09:33 -08:00
Ævar Arnfjörð Bjarmason
ba5e8a0eb8 object-name: make ambiguous object output translatable
Change the output of show_ambiguous_object() added in [1] and last
tweaked in [2] and the preceding commit to be more friendly to
translators.

By being able to customize the "<SP><SP>%s\n" format we're even ready
for RTL languages, who'd presumably like to change that to
"%s<SP><SP>\n".

In the case of the existing "tag [tag could not be parsed]" output
we'll now instead emit "[bad tag, could not parse it]". This is
consistent with the "[bad object]" output. Rephrasing the message like
this is possible because we're not unconditionally adding the
type_name() at the beginning.

1. 1ffa26c461 (get_short_sha1: list ambiguous objects on error,
   2016-09-26)
2. 5cc044e025 (get_short_oid: sort ambiguous objects by type,
   then SHA-1, 2018-05-10)

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Josh Steadmon <steadmon@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-01-27 12:09:33 -08:00
Ævar Arnfjörð Bjarmason
667a560be7 object-name: explicitly handle bad tags in show_ambiguous_object()
Follow-up the handling of OBJ_BAD in the preceding commit and
explicitly handle those cases where parse_tag() fails, or we don't end
up with a non-NULL pointer in in tag->tag.

If we run into such a tag we'd previously be silent about it. We
really should also be handling these batter in parse_tag_buffer() by
being more eager to emit an error(), instead of silently aborting with
"return -1;".

One example of such a tag is the one that's tested for in
"t3800-mktag.sh", where the code takes the "size <
the_hash_algo->hexsz + 24" branch.

But in lieu of earlier missing "error" output let's show the user
something to indicate why we're not showing a tag message in these
cases, now instead of showing:

    hint:   deadbeef tag

We'll instead display:

    hint:   deadbeef tag [tag could not be parsed]

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-01-27 12:09:33 -08:00
Ævar Arnfjörð Bjarmason
6780e68040 object-name: explicitly handle OBJ_BAD in show_ambiguous_object()
Amend the "unknown type" handling in the code that displays the
ambiguous object list to assert() that we're either going to get the
"real" object types we can pass to type_name(), or a -1 (OBJ_BAD)
return value from oid_object_info().

See [1] for the current output, and [1] for the commit that added the
"unknown type" handling.

We are never going to get an "unknown type" in the sense of custom
types crafted with "hash-object --literally", since we're not using
the OBJECT_INFO_ALLOW_UNKNOWN_TYPE flag.

If we manage to otherwise unpack such an object without errors we'll
die() in parse_loose_header_extended() called by sort_ambiguous()
before we get to show_ambiguous_object(), as is asserted by the test
added in the preceding commit.

So saying "unknown type" here was always misleading, we really meant
to say that we had a failure parsing the object at all, i.e. that we
had repository corruption. If the problem is only that it's type is
unknown we won't reach this code.

So let's emit a generic "[bad object]" instead. As our tests added in
the preceding commit show, we'll have emitted various "error" output
already in those cases.

We should do better in the truly "unknown type" cases, which we'd need
to handle if we were passing down the OBJECT_INFO_ALLOW_UNKNOWN_TYPE
flag. But let's leave that for some future improvement. In a
subsequent commit I'll improve the output we do show, and not having
to handle the "unknown type" (as in OBJECT_INFO_ALLOW_UNKNOWN_TYPE)
simplifies that change.

1. 5cc044e025 (get_short_oid: sort ambiguous objects by type,
   then SHA-1, 2018-05-10)
2. 1ffa26c461 (get_short_sha1: list ambiguous objects on error,
   2016-09-26)

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-01-27 12:09:33 -08:00
Ævar Arnfjörð Bjarmason
245b948815 cat-file: use GET_OID_ONLY_TO_DIE in --(textconv|filters)
Change the cat_one_file() logic that calls get_oid_with_context()
under --textconv and --filters to use the GET_OID_ONLY_TO_DIE flag,
thus improving the error messaging emitted when e.g. <path> is missing
but <rev> is not.

To service the "cat-file" use-case we need to introduce a new
"GET_OID_REQUIRE_PATH" flag, otherwise it would exit early as soon as
a valid "HEAD" was resolved, but in the "cat-file" case being changed
we always need a valid revision and path.

This arguably makes the "<bad rev>:<bad path>" and "<bad
rev>:<good (in HEAD) path>" use cases worse, as we won't quote the
<path> component at the user anymore, but let's just use the existing
logic "git log" et al use for now. We can improve the messaging for
those cases as a follow-up for all callers.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-12-30 13:05:29 -08:00
Ævar Arnfjörð Bjarmason
9ce6000cb7 object-name.c: don't have GET_OID_ONLY_TO_DIE imply *_QUIETLY
Stop having GET_OID_ONLY_TO_DIE imply GET_OID_QUIETLY in
get_oid_with_context_1().

The *_DIE flag was added in 33bd598c39 (sha1_name.c: teach lookup
context to get_sha1_with_context(), 2012-07-02), and then later
tweaked in 7243ffdd78 (get_sha1: avoid repeating ourselves via
ONLY_TO_DIE, 2016-09-26).

Everything in that commit makes sense, but only for callers that
expect to fail in an initial call to get_oid_with_context_1(), e.g. as
"git show 0017" does via handle_revision_arg(), and then would like to
call get_oid_with_context_1() again via this
maybe_die_on_misspelt_object_name() function.

In the subsequent commit we'll add a new caller that expects to call
this only once, but who would still like to have all the error
messaging that GET_OID_ONLY_TO_DIE gives it, in addition to any
regular errors.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-12-30 13:05:29 -08:00
Ben Boeckel
ed9bff0817 advice: remove read uses of most global advice_ variables
In c4a09cc9cc (Merge branch 'hw/advise-ng', 2020-03-25), a new API for
accessing advice variables was introduced and deprecated `advice_config`
in favor of a new array, `advice_setting`.

This patch ports all but two uses which read the status of the global
`advice_` variables over to the new `advice_enabled` API. We'll deal
with advice_add_embedded_repo and advice_graft_file_deprecated
separately.

Signed-off-by: Ben Boeckel <mathstuf@gmail.com>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-08-25 12:07:52 -07:00
Eric Wong
92d8ed8ac1 oidtree: a crit-bit tree for odb_loose_cache
This saves 8K per `struct object_directory', meaning it saves
around 800MB in my case involving 100K alternates (half or more
of those alternates are unlikely to hold loose objects).

This is implemented in two parts: a generic, allocation-free
`cbtree' and the `oidtree' wrapper on top of it.  The latter
provides allocation using alloc_state as a memory pool to
improve locality and reduce free(3) overhead.

Unlike oid-array, the crit-bit tree does not require sorting.
Performance is bound by the key length, for oidtree that is
fixed at sizeof(struct object_id).  There's no need to have
256 oidtrees to mitigate the O(n log n) overhead like we did
with oid-array.

Being a prefix trie, it is natively suited for expanding short
object IDs via prefix-limited iteration in
`find_short_object_filename'.

On my busy workstation, p4205 performance seems to be roughly
unchanged (+/-8%).  Startup with 100K total alternates with no
loose objects seems around 10-20% faster on a hot cache.
(800MB in memory savings means more memory for the kernel FS
cache).

The generic cbtree implementation does impose some extra
overhead for oidtree in that it uses memcmp(3) on
"struct object_id" so it wastes cycles comparing 12 extra bytes
on SHA-1 repositories.  I've not yet explored reducing this
overhead, but I expect there are many places in our code base
where we'd want to investigate this.

More information on crit-bit trees: https://cr.yp.to/critbit.html

Signed-off-by: Eric Wong <e@80x24.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-07-07 21:28:04 -07:00
Martin Ågren
1e6771e504 object-name.c: rename from sha1-name.c
Generalize the last remnants of "sha" and "sha1" in this file and rename
it to reflect that we're not just able to handle SHA-1 these days.

We need to update one test to check for an updated error string.

Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Reviewed-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-01-04 13:01:55 -08:00