Commit Graph

65119 Commits

Author SHA1 Message Date
Jeff King
5c5b29b459 cat-file: disable refs/replace with --batch-all-objects
When we're enumerating all objects in the object database, it doesn't
make sense to respect refs/replace. The point of this option is to
enumerate all of the objects in the database at a low level. By
definition we'd already show the replacement object's contents (under
its real oid), and showing those contents under another oid is almost
certainly working against what the user is trying to do.

Note that you could make the same argument for something like:

  git show-index <foo.idx |
  awk '{print $2}' |
  git cat-file --batch

but there we can't know in cat-file exactly what the user intended,
because we don't know the source of the input. They could be trying to
do low-level debugging, or they could be doing something more high-level
(e.g., imagine a porcelain built around cat-file for its object
accesses). So in those cases, we'll have to rely on the user specifying
"git --no-replace-objects" to tell us what to do.

One _could_ make an argument that "cat-file --batch" is sufficiently
low-level plumbing that it should not respect replace-objects at all
(and the caller should do any replacement if they want it).  But we have
been doing so for some time. The history is a little tangled:

  - looking back as far as v1.6.6, we would not respect replace refs for
    --batch-check, but would for --batch (because the former used
    sha1_object_info(), and the replace mechanism only affected actual
    object reads)

  - this discrepancy was made even weirder by 98e2092b50 (cat-file:
    teach --batch to stream blob objects, 2013-07-10), where we always
    output the header using the --batch-check code, and then printed the
    object separately. This could lead to "cat-file --batch" dying (when
    it notices the size or type changed for a non-blob object) or even
    producing bogus output (in streaming mode, we didn't notice that we
    wrote the wrong number of bytes).

  - that persisted until 1f7117ef7a (sha1_file: perform object
    replacement in sha1_object_info_extended(), 2013-12-11), which then
    respected replace refs for both forms.

So it has worked reliably this way for over 7 years, and we should make
sure it continues to do so. That could also be an argument that
--batch-all-objects should not change behavior (which this patch is
doing), but I really consider the current behavior to be an unintended
bug. It's a side effect of how the code is implemented (feeding the oids
back into oid_object_info() rather than looking at what we found while
reading the loose and packed object storage).

The implementation is straight-forward: we just disable the global
read_replace_refs flag when we're in --batch-all-objects mode. It would
perhaps be a little cleaner to change the flag we pass to
oid_object_info_extended(), but that's not enough. We also read objects
via read_object_file() and stream_blob_to_fd(). The former could switch
to its _extended() form, but the streaming code has no mechanism for
disabling replace refs. Setting the global flag works, and as a bonus,
it's impossible to have any "oops, we're sometimes replacing the object
and sometimes not" bugs in the output (like the ones caused by
98e2092b50 above).

The tests here cover the regular-input and --batch-all-objects cases,
for both --batch-check and --batch. There is a test in t6050 that covers
the regular-input case with --batch already, but this new one goes much
further in actually verifying the output (plus covering --batch-check
explicitly). This is perhaps a little overkill and the tests would be
simpler just covering --batch-check, but I wanted to make sure we're
checking that --batch output is consistent between the header and the
content. The global-flag technique used here makes that easy to get
right, but this is future-proofing us against regressions.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-08 15:45:14 -07:00
Jeff King
c3660cfb03 cat-file: mention --unordered along with --batch-all-objects
The note on ordering for --batch-all-objects was written when that was
the only possible ordering. These days we have --unordered, too, so
let's point to it.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-08 15:45:14 -07:00
Jeff King
e879295b20 t1006: clean up broken objects
A few of the tests create intentionally broken objects with broken
types. Let's clean them up after we're done with them, so that later
tests don't get confused (we hadn't noticed because this only affects
tests which use --batch-all-objects, but I'm about to add more).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-08 15:45:14 -07:00
Jonathan Tan
71ef66d740 submodule: trace adding submodule ODB as alternate
Submodule ODBs are never added as alternates during the execution of the
test suite, but there may be a rare interaction that the test suite does
not have coverage of. Add a trace message when this happens, so that
users who trace their commands can notice such occurrences.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-08 15:06:06 -07:00
Jonathan Tan
13a2f620b2 submodule: pass repo to check_has_commit()
Pass the repo explicitly when calling check_has_commit() to avoid
relying on add_submodule_odb(). With this commit and the parent commit,
the last remaining tests no longer rely on add_submodule_odb(), so mark
these tests accordingly.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-08 15:06:06 -07:00
Jonathan Tan
eef71904ff object-file: only register submodule ODB if needed
In a35e03dee0 ("submodule: lazily add submodule ODBs as alternates",
2021-09-08), Git was taught to add all known submodule ODBs as
alternates when attempting to read an object that doesn't exist, as a
fallback for when a submodule object is read as if it were in
the_repository. However, this behavior wasn't restricted to happen only
when reading from the_repository. Fix this.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-08 15:06:06 -07:00
Jonathan Tan
155b517d5c merge-{ort,recursive}: remove add_submodule_odb()
After the parent commit and some of its ancestors, the only place
commits are being accessed through alternates is in the user-facing
message formatting code. Fix those, and remove the add_submodule_odb()
calls.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-08 15:06:06 -07:00
Jonathan Tan
8788195c88 refs: peeling non-the_repository iterators is BUG
There is currently no support for peeling the current ref of an iterator
iterating over a non-the_repository ref store, and none is needed. Thus,
for now, BUG() if that happens.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-08 15:06:06 -07:00
Jonathan Tan
9bc45a2802 refs: teach arbitrary repo support to iterators
Note that should_pack_ref() is called when writing refs, which is only
supported for the_repository, hence the_repository is hardcoded there.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-08 15:06:05 -07:00
Jonathan Tan
34224e14d6 refs: plumb repo into ref stores
In preparation for the next 2 patches that adds (partial) support for
arbitrary repositories to ref iterators, plumb a repository into all ref
stores. There are no changes to program logic.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-08 15:06:05 -07:00
Hamza Mahfooz
6a5c337922 pretty: colorize pattern matches in commit messages
The "git log" command limits its output to the commits that contain strings
matched by a pattern when the "--grep=<pattern>" option is used, but unlike
output from "git grep -e <pattern>", the matches are not highlighted,
making them harder to spot.

Teach the pretty-printer code to highlight matches from the
"--grep=<pattern>", "--author=<pattern>" and "--committer=<pattern>"
options (to view the last one, you may have to ask for --pretty=fuller).

Also, it must be noted that we are effectively greping the content twice
(because it would be a hassle to rework the existing matching code to do
a /g match and then pass it all down to the coloring code), however it only
slows down "git log --author=^H" on this repository by around 1-2%
(compared to v2.33.0), so it should be a small enough slow down to justify
the addition of the feature.

Signed-off-by: Hamza Mahfooz <someguy@effective-light.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-08 14:19:14 -07:00
Ævar Arnfjörð Bjarmason
d342834529 parse-options: change OPT_{SHORT,UNSET} to an enum
Change the comparisons against OPT_SHORT and OPT_UNSET to an enum
which keeps track of how a given option got parsed. The case of "0"
was an implicit OPT_LONG, so let's add an explicit label for it.

Due to the xor in 0f1930c587 (parse-options: allow positivation of
options starting, with no-, 2012-02-25) the code already relied on
this being set back to 0. To avoid refactoring the logic involved in
that let's just start the enum at "0" instead of the usual "1<<0" (1),
but BUG() out if we don't have one of our expected flags.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-08 14:13:11 -07:00
Ævar Arnfjörð Bjarmason
62f2ffc563 parse-options tests: test optname() output
There were no tests for checking the specific output that we'll
generate in optname(), let's add some. That output was added back in
4a59fd1312 (Add a simple option parser., 2007-10-15).

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-08 14:13:11 -07:00
Ævar Arnfjörð Bjarmason
28794ec72e parse-options.[ch]: make opt{bug,name}() "static"
Change these two functions to "static", the last user of "optname()"
outside of parse-options.c itself went away in the preceding commit,
for the reasons noted in 9440b831ad (parse-options: replace
opterror() with optname(), 2018-11-10) we shouldn't be adding any more
users of it.

The "optbug()" function was never used outside of parse-options.c, but
was made non-static in 1f275b7c4c (parse-options: export opterr,
optbug, 2011-08-11). I think the only external user of optname() was
the commit-graph.c caller added in 09e0327f57 (builtin/commit-graph.c:
introduce '--max-new-filters=<n>', 2020-09-18).

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-08 14:13:11 -07:00
Ævar Arnfjörð Bjarmason
13d9fcec29 commit-graph: stop using optname()
Stop using optname() in builtin/commit-graph.c to emit an error with
the --max-new-filters option. This changes code added in 809e0327f5
(builtin/commit-graph.c: introduce '--max-new-filters=<n>',
2020-09-18).

See 9440b831ad (parse-options: replace opterror() with optname(),
2018-11-10) for why using optname() like this is considered bad,
i.e. it's assembling human-readable output piecemeal, and the "option
`X'" at the start can't be translated.

It didn't matter in this case, but this code was also buggy in its use
of "opt->flags" to optname(), that function expects flags, but not
*those* flags.

Let's pass "max-new-filters" to the new error because the option name
isn't translatable, and because we can re-use a translation added in
f7e68a0878 (parse-options: check empty value in OPT_INTEGER and
OPT_ABBREV, 2019-05-29).

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-08 14:13:11 -07:00
Ævar Arnfjörð Bjarmason
3c2047a711 parse-options.c: move optname() earlier in the file
In preparation for making "optname" a static function move it above
its first user in parse-options.c.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-08 14:13:11 -07:00
Ævar Arnfjörð Bjarmason
7bf7f0ba05 parse-options.h: make the "flags" in "struct option" an enum
Change the "flags" members of "struct option" to refer to their
corresponding "enum" defined earlier in the file.

The benefit of changing this to an enum isn't as great as with some
"enum parse_opt_type" as we'll always check this as a bitfield, so we
can't rely on the compiler checking "case" arms for us. But let's do
it for consistency with the rest of the file.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-08 14:13:11 -07:00
Ævar Arnfjörð Bjarmason
1b887353d7 parse-options.c: use exhaustive "case" arms for "enum parse_opt_result"
Change the "default" case in parse_options() that handles the return
value of parse_options_step() to simply have a "case" arm for
PARSE_OPT_UNKNOWN, instead of leaving it to a comment. This means the
compiler can warn us about any missing case arms.

This adjusts code added in ff43ec3e2d (parse-opt: create
parse_options_step., 2008-06-23), given its age it may pre-date the
existence (or widespread use) of this coding style, which we've since
adopted more widely.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-08 14:13:11 -07:00
Ævar Arnfjörð Bjarmason
352e761388 parse-options.[ch]: consistently use "enum parse_opt_result"
Use the "enum parse_opt_result" instead of an "int flags" as the
return value of the applicable functions in parse-options.c.

This will help catch future bugs, such as the missing "case" arms in
the two existing users of the API in "blame.c" and "shortlog.c". A
third caller in 309be813c9 (update-index: migrate to parse-options
API, 2010-12-01) was already checking for these.

As can be seen when trying to sort through the deluge of warnings
produced when compiling this with CC=g++ (mostly unrelated to this
change) we're not consistently using "enum parse_opt_result" even now,
i.e. we'll return error() and "return 0;". See f41179f16b
(parse-options: avoid magic return codes, 2019-01-27) for a commit
which started changing some of that.

I'm not doing any more of that exhaustive migration here, and it's
probably not worthwhile past the point of being able to check "enum
parse_opt_result" in switch().

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-08 14:13:11 -07:00
Ævar Arnfjörð Bjarmason
3f9ab7ccde parse-options.[ch]: consistently use "enum parse_opt_flags"
Use the "enum parse_opt_flags" instead of an "int flags" as arguments
to the various functions in parse-options.c.

Even though this is an enum bitfield there's there's a benefit to
doing this when it comes to the wider C ecosystem. E.g. the GNU
debugger (gdb) will helpfully detect and print out meaningful enum
labels in this case. Here's the output before and after when breaking
in "parse_options()" after invoking "git stash show":

Before:

    (gdb) p flags
    $1 = 9

After:

    (gdb) p flags
    $1 = (PARSE_OPT_KEEP_DASHDASH | PARSE_OPT_KEEP_UNKNOWN)

Of course as noted in[1] there's a limit to this smartness,
i.e. manually setting it with unrelated enum labels won't be
caught. There are some third-party extensions to do more exhaustive
checking[2], perhaps we'll be able to make use of them sooner than
later.

We've also got prior art using this pattern in the codebase. See
e.g. "enum bloom_filter_computed" added in 312cff5207 (bloom: split
'get_bloom_filter()' in two, 2020-09-16) and the "permitted" enum
added in ce910287e7 (add -p: fix checking of user input, 2020-08-17).

1. https://lore.kernel.org/git/87mtnvvj3c.fsf@evledraar.gmail.com/
2. https://github.com/sinelaw/elfs-clang-plugins/blob/master/enums_conversion/README.md

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-08 14:13:11 -07:00
Bagas Sanjaya
8c32856133 blame: document --color-* options
Commit cdc2d5f11f (builtin/blame: dim uninteresting metadata lines,
2018-04-23) and 25d5f52901 (builtin/blame: highlight recently changed
lines, 2018-04-23) introduce --color-lines and --color-by-age options to
git blame, respectively. While both options are mentioned in usage help,
they aren't documented in git-blame(1). Document them.

Co-authored-by: Dr. Matthias St. Pierre <m.st.pierre@ncp-e.com>
Signed-off-by: Dr. Matthias St. Pierre <m.st.pierre@ncp-e.com>
Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-08 14:05:43 -07:00
Johannes Sixt
350b87cd65 userdiff-cpp: tighten word regex
Generally, word regex can be written such that they match tokens
liberally and need not model the actual syntax because it can be assumed
that the regex will only be applied to syntactically correct text.

The regex for cpp (C/C++) is too liberal, though. It regards these
sequences as single tokens:

   1+2
   1.5-e+2+f

and the following amalgams as one token:

   .l      as in str.length
   .f      as in str.find
   .e      as in str.erase

Tighten the regex in the following way:

- Accept + and - only in one position in the exponent. + and - are no
  longer regarded as the sign of a number and are treated by the
  catcher-all that is not visible in the driver's regex.

- Accept a leading decimal point only when it is followed by a digit.

For readability, factor hex- and binary numbers into an own term.

As a drive-by, this fixes that floating point numbers such as 12E5
(with upper-case E) were split into two tokens.

Signed-off-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-08 13:04:07 -07:00
Johannes Sixt
3e063de46e t4034: add tests showing problematic cpp tokenizations
The word regex is too loose and matches long streaks of characters
that should actually be separate tokens.  Add these problematic test
cases. Separate the lines with text that will remain identical in the
pre- and post-image so that the diff algorithm will not lump removals
and additions of consecutive lines together. This makes the expected
output easier to read.

Signed-off-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-08 13:04:07 -07:00
Johannes Sixt
1cf93847c1 t4034/cpp: actually test that operator tokens are not split
8d96e7288f (t4034: bulk verify builtin word regex sanity, 2010-12-18)
added many tests with the intent to verify that operators consisting of
more than one symbol are kept together. These are tested by probing a
transition from, e.g., a!=b to x!=y, which results in the word-diff

  [-a-]{+x+}!=[-b-]{+y+}

But that proves only that the letters and operators are separate tokens.
To prove that != is an unseparable token, we have to probe a transition
from, e.g., a=b to a!=b having a word-diff

  a[-=-]{+!=+}b

that proves that the ! is not separate from the =.

In the post-image, add to or remove from operators a character that
turns it into another valid operator.

Change the identifiers used around operators such that the diff
algorithm does not have an incentive to match, e.g., a<b in one spot
in the pre-image with a<b elsewhere in the post-image.

Adjust the expected output to match the new differences. Notice that
there are some undesirable tokenizations around e, ., and -.  This will
be addressed in a later change.

Signed-off-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-08 13:04:07 -07:00
Han-Wen Nienhuys
d860c86ba5 Add "test-tool dump-reftable" command.
This command dumps individual tables or a stack of of tables.

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-08 10:45:48 -07:00
Han-Wen Nienhuys
e793168364 reftable: add dump utility
provide a command-line utility for inspecting individual tables, and
inspecting a complete ref database

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Helped-by: Carlo Marcelo Arenas Belón <carenas@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-08 10:45:48 -07:00
Han-Wen Nienhuys
e48d427268 reftable: implement stack, a mutable database of reftable files.
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-08 10:45:48 -07:00
Han-Wen Nienhuys
acb533440f reftable: implement refname validation
The packed/loose format has restrictions on refnames: a and a/b cannot
coexist. This limitation does not apply to reftable per se, but must be
maintained for interoperability. This code adds validation routines to
abort transactions that are trying to add invalid names.

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-08 10:45:48 -07:00
Han-Wen Nienhuys
1ae2b8cda8 reftable: add merged table view
This adds an abstract, read-only interface to the ref database.

This primitive is used to construct the read view of the ref database
(the read view is constructed by merging several *.ref files). It also
provides the mechanism to provide a unified view of the refs in the main
repository and the per-worktree refs.

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-08 10:45:48 -07:00
Han-Wen Nienhuys
3b34f636df reftable: add a heap-based priority queue for reftable records
This is needed to create a merged view multiple reftables

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-08 10:45:48 -07:00
Han-Wen Nienhuys
ffc97f1a9e reftable: reftable file level tests
With support for reading and writing files in place, we can construct files (in
memory) and attempt to read them back.

Because some sections of the format are optional (eg. indices, log entries), we
have to exercise this code using multiple sizes of input data

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-08 10:45:48 -07:00
Han-Wen Nienhuys
46bc0e731a reftable: read reftable files
This supports reading a single reftable file.

The commit introduces an abstract iterator type, which captures the usecases
both of reading individual refs, and iterating over a segment of the ref
namespace.

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-08 10:45:48 -07:00
Han-Wen Nienhuys
17df8dbeba reftable: generic interface to tables
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-08 10:45:48 -07:00
Han-Wen Nienhuys
f14bd71934 reftable: write reftable files
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-08 10:45:48 -07:00
Han-Wen Nienhuys
35425d1034 reftable: a generic binary tree implementation
The reftable format includes support for an (OID => ref) map. This map can speed
up visibility and reachability checks. In particular, various operations along
the fetch/push path within Gerrit have ben sped up by using this structure.

The map is constructed with help of a binary tree. Object IDs are hashes, so
they are uniformly distributed. Hence, the tree does not attempt forced
rebalancing.

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-08 10:45:48 -07:00
Han-Wen Nienhuys
e581fd7231 reftable: reading/writing blocks
The reftable format is structured as a sequence of block. Within a block,
records are prefix compressed, with an index of offsets for fully expand keys to
enable binary search within blocks.

This commit provides the logic to read and write these blocks.

Helped-by: Carlo Marcelo Arenas Belón <carenas@gmail.com>
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-08 10:45:48 -07:00
Han-Wen Nienhuys
a322920d0b Provide zlib's uncompress2 from compat/zlib-compat.c
This will be needed for reading reflog blocks in reftable.

Helped-by: Carlo Marcelo Arenas Belón <carenas@gmail.com>
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-08 10:45:48 -07:00
Han-Wen Nienhuys
e303bf22f9 reftable: (de)serialization for the polymorphic record type.
The reftable format is structured as a sequence of blocks, and each block
contains a sequence of prefix-compressed key-value records. There are 4 types of
records, and they have similarities in how they must be handled. This is
achieved by introducing a polymorphic 'record' type that encapsulates ref, log,
index and object records.

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-08 10:45:48 -07:00
Han-Wen Nienhuys
1214aa841b reftable: add blocksource, an abstraction for random access reads
The reftable format is usually used with files for storage. However, we abstract
away this using the blocksource data structure. This has two advantages:

* log blocks are zlib compressed, and handling them is simplified if we can
  discard byte segments from within the block layer.

* for unittests, it is useful to read and write in-memory. The blocksource
  allows us to abstract the data away from on-disk files.

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-08 10:45:48 -07:00
Han-Wen Nienhuys
ef8a6c6268 reftable: utility functions
This commit provides basic utility classes for the reftable library.

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Helped-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-08 10:45:48 -07:00
Han-Wen Nienhuys
890044708d reftable: add error related functionality
The reftable/ directory is structured as a library, so it cannot
crash on misuse. Instead, it returns an error code.

In addition to signaling errors, the error code can be used to signal
conditions from lower levels of the library to be handled by higher
levels of the library. For example, in a transaction we might
legitimately write an empty reftable file, but in that case, we want to
shortcut the transaction.

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-08 10:45:48 -07:00
Han-Wen Nienhuys
27f7ed2a07 reftable: add LICENSE
The objective of this code is to be usable as a C library, so it can be reused
in libgit2.

This is currently using a BSD license as it is the liberal license I could find,
but this could be changed to whatever fits the stated goal above.

This code is currently imported from github.com/hanwen/reftable. Once this code
lands in git.git, the C code will be removed from github.com/hanwen/reftable,
and the git.git code will be the source of truth.

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-08 10:45:48 -07:00
René Scharfe
c90cfc225b test-mergesort: use repeatable random numbers
Use MINSTD to generate pseudo-random numbers consistently instead of
using rand(3), whose output can vary from system to system, and reset
its seed before filling in the test values.  This gives repeatable
results across versions and systems, which simplifies sharing and
comparing of results between developers.

Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-08 10:04:56 -07:00
René Scharfe
c8ad9d04c6 read-cache: let verify_path() reject trailing dir separators again
6e773527b6 (sparse-index: convert from full to sparse, 2021-03-30) made
verify_path() accept trailing directory separators for directories,
which is necessary for sparse directory entries.  This clemency causes
"git stash" to stumble over sub-repositories, though, and there may be
more unintended side-effects.

Avoid them by restoring the old verify_path() behavior and accepting
trailing directory separators only in places that are supposed to handle
sparse directory entries.

Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-07 17:52:26 -07:00
René Scharfe
2a1ae649a4 read-cache: add verify_path_internal()
Turn verify_path() into an internal function that distinguishes between
valid paths and those with trailing directory separators and rename it
to verify_path_internal().  Provide a wrapper with the old behavior
under the old name.  No functional change intended.  The new function
will be used in the next patch.

Suggested-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-07 17:49:39 -07:00
René Scharfe
fc5e90b848 t3905: show failure to ignore sub-repo
"git stash" used to ignore sub-repositories until 6e773527b6
(sparse-index: convert from full to sparse, 2021-03-30).  Add a test
that demonstrates this regression.

Reported-by: Robert Leftwich <robert@gitpod.io>
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-07 17:47:39 -07:00
Ævar Arnfjörð Bjarmason
465028e0e2 merge: add missing strbuf_release()
We strbuf_reset() this "struct strbuf" in a loop earlier, but never
freed it. Plugs a memory leak that's been here ever since this code
got introduced in 1c7b76be7d (Build in merge, 2008-07-07).

This takes us from 68 failed tests in "t7600-merge.sh" to 59 under
SANITIZE=leak, and makes "t7604-merge-custom-message.sh" pass!

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-07 15:40:16 -07:00
Ævar Arnfjörð Bjarmason
272f0a574d ls-files: add missing string_list_clear()
Fix a memory leak that's been here ever since 72aeb18772 (clean.c,
ls-files.c: respect encapsulation of exclude_list_groups, 2013-01-16),
we dup'd the argument in option_parse_exclude(), but never freed the
string_list.

This makes almost all of t3001-ls-files-others-exclude.sh pass (it had
a lot of failures before). Let's mark it as passing with
TEST_PASSES_SANITIZE_LEAK=true, and then exclude the tests that still
failed with a !SANITIZE_LEAK prerequisite check until we fix those
leaks. We can still see the failed tests under
GIT_TEST_FAIL_PREREQS=true.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-07 15:40:15 -07:00
Ævar Arnfjörð Bjarmason
eab4ac6a23 ls-files: fix a trivial dir_clear() leak
Fix an edge case that was missed when the dir_clear() call was added
in eceba53214 (dir: fix problematic API to avoid memory leaks,
2020-08-18), we need to also clean up when we're about to exit with
non-zero.

That commit says, on the topic of the dir_clear() API and UNLEAK():

    [...]two of them clearly thought about leaks since they had an
    UNLEAK(dir) directive, which to me suggests that the method to
    free the data was too unclear.

I think that 0e5bba53af (add UNLEAK annotation for reducing leak
false positives, 2017-09-08) which added the UNLEAK() makes it clear
that that wasn't the case, rather it was the desire to avoid the
complexity of freeing the memory at the end of the program.

This does add a bit of complexity, but I think it's worth it to just
fix these leaks when it's easy in built-ins. It allows them to serve
as canaries for underlying APIs that shouldn't be leaking, it
encourages us to make those freeing APIs nicer for all their users,
and it prevents other leaking regressions by being able to mark the
entire test as TEST_PASSES_SANITIZE_LEAK=true.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-07 15:40:15 -07:00
Ævar Arnfjörð Bjarmason
6ad66ab45e tests: fix test-oid-array leak, test in SANITIZE=leak
Fix a trivial memory leak present ever since 38d905bf58 (sha1-array:
add test-sha1-array and basic tests, 2014-10-01), now that that's
fixed we can test this under GIT_TEST_PASSING_SANITIZE_LEAK=true.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-07 15:40:15 -07:00