Commit Graph

65089 Commits

Author SHA1 Message Date
Johannes Altmanninger
56c4d7f6a9 Documentation/git-status: document porcelain status T (typechange)
As reported in [1], T is missing from the description of porcelain
status letters in git-status(1) (whereas T *is* documented in
git-diff-files(1) and friends). Document T right after M (modified)
because the two are very similar.

[1] https://github.com/fish-shell/fish-shell/issues/8311

Signed-off-by: Johannes Altmanninger <aclopte@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-04 13:07:18 -07:00
Johannes Altmanninger
55e7f52b40 Documentation/diff-format: state in which cases porcelain status is T
Porcelain status letter T is documented as "type of the file", which
is technically correct but not enough information for users that are
not so familiar with this term from systems programming. In particular,
given that the only supported file types are "regular file", "symbolic
link" and "submodule", the term "file type" is surely opaque to the
many(?) users who are not aware that symbolic links can be tracked -
I thought that a "chmod +x" could cause the T status (wrong, it's M).

Explicitly document the three file types so users know if/how they
want to handle this.

Signed-off-by: Johannes Altmanninger <aclopte@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-04 13:07:18 -07:00
Johannes Altmanninger
1566cdd4ae Documentation/git-status: remove impossible porcelain status DR and DC
Commit 176ea74793 ("wt-status.c: handle worktree renames", 2017-12-27)
made a porcelain status like .R or .C possible. They occur only when
the source file is added to the index and the destination file is
added with --intent-to-add.

They also documented DR, but that status is impossible.  The index
change D means that the source file does not exist in the index.
The worktree change R/C states that the file has been renamed/copied
since the index, but that's impossible if it did not exist there.

Reported-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Johannes Altmanninger <aclopte@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-04 13:07:18 -07:00
René Scharfe
100c2da2d3 p3400: stop using tac(1)
b3dfeebb92 (rebase: avoid computing unnecessary patch IDs, 2016-07-29)
added a perf test that calls tac(1) from GNU core utilities.  Support
systems without it by reversing the generated list using sort -nr
instead.  sort(1) with options -n and -r is already used in other tests.

Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-03 22:07:21 -07:00
Junio C Hamano
0785eb7698 The tenth batch
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-03 21:49:43 -07:00
Junio C Hamano
3a757d0369 Merge branch 'ah/connect-parse-feature-v0-fix'
Protocol v0 clients can get stuck parsing a malformed feature line.

* ah/connect-parse-feature-v0-fix:
  connect: also update offset for features without values
2021-10-03 21:49:21 -07:00
Junio C Hamano
93cccedb8f Merge branch 'ab/make-compdb-fix'
Build update.

* ab/make-compdb-fix:
  Makefile: pass -Wno-pendantic under GENERATE_COMPILATION_DATABASE=yes
2021-10-03 21:49:21 -07:00
Junio C Hamano
09bde81f29 Merge branch 'bs/difftool-msg-tweak'
Message tweak.

* bs/difftool-msg-tweak:
  difftool: fix word spacing in the usage strings
2021-10-03 21:49:21 -07:00
Junio C Hamano
068966d2e8 Merge branch 'rs/close-pack-leakfix'
Leakfix.

* rs/close-pack-leakfix:
  packfile: release bad_objects in close_pack()
2021-10-03 21:49:20 -07:00
Junio C Hamano
0a2b53c2f2 Merge branch 'ab/bundle-remove-verbose-option'
Doc update.

* ab/bundle-remove-verbose-option:
  bundle: remove ignored & undocumented "--verbose" flag
2021-10-03 21:49:20 -07:00
Junio C Hamano
65351024eb Merge branch 'ab/auto-depend-with-pedantic'
Improve build procedure for developers.

* ab/auto-depend-with-pedantic:
  Makefile: make COMPUTE_HEADER_DEPENDENCIES=auto work with DEVOPTS=pedantic
2021-10-03 21:49:20 -07:00
Junio C Hamano
2498121cd6 Merge branch 'ab/make-clean-depend-dirs'
"make clean" has been updated to remove leftover .depend/
directories, even when it is not told to use them to compute header
dependencies.

* ab/make-clean-depend-dirs:
  Makefile: clean .depend dirs under COMPUTE_HEADER_DEPENDENCIES != yes
2021-10-03 21:49:19 -07:00
Junio C Hamano
cbb1ae05d5 Merge branch 'ds/perf-test-built-path-fix'
Perf test fix.

* ds/perf-test-built-path-fix:
  t/perf/run: fix bin-wrappers computation
2021-10-03 21:49:19 -07:00
Junio C Hamano
58e2bc452b Merge branch 'jk/http-redact-fix'
Sensitive data in the HTTP trace were supposed to be redacted, but
we failed to do so in HTTP/2 requests.

* jk/http-redact-fix:
  http: match headers case-insensitively when redacting
2021-10-03 21:49:19 -07:00
Junio C Hamano
976d3f00d6 Merge branch 'bs/ls-files-opt-help-text-update'
Help text for "ls-files" options have been updated.

* bs/ls-files-opt-help-text-update:
  ls-files: use imperative mood for -X and -z option description
2021-10-03 21:49:19 -07:00
Junio C Hamano
6a4f5dadd3 Merge branch 'da/difftool-dir-diff-symlink-fix'
"git difftool --dir-diff" mishandled symbolic links.

* da/difftool-dir-diff-symlink-fix:
  difftool: fix symlink-file writing in dir-diff mode
2021-10-03 21:49:19 -07:00
Junio C Hamano
92382d14cd Merge branch 'hn/refs-errno-cleanup'
Futz with the way 'errno' is relied on in the refs API to carry the
failure modes up the call chain.

* hn/refs-errno-cleanup:
  refs: make errno output explicit for read_raw_ref_fn
  refs/files-backend: stop setting errno from lock_ref_oid_basic
  refs: remove EINVAL errno output from specification of read_raw_ref_fn
  refs file backend: move raceproof_create_file() here
2021-10-03 21:49:18 -07:00
Junio C Hamano
842d45d293 Merge branch 'ab/refs-files-cleanup'
Continued work on top of the hn/refs-errno-cleanup topic.

* ab/refs-files-cleanup:
  refs/files: remove unused "errno != ENOTDIR" condition
  refs/files: remove unused "errno == EISDIR" code
  refs/files: remove unused "oid" in lock_ref_oid_basic()
  refs API: remove OID argument to reflog_expire()
  reflog expire: don't lock reflogs using previously seen OID
  refs/files: add a comment about refs_reflog_exists() call
  refs: make repo_dwim_log() accept a NULL oid
  refs/debug: re-indent argument list for "prepare"
  refs/files: remove unused "skip" in lock_raw_ref() too
  refs/files: remove unused "extras/skip" in lock_ref_oid_basic()
  refs: drop unused "flags" parameter to lock_ref_oid_basic()
  refs/files: remove unused REF_DELETING in lock_ref_oid_basic()
  refs/packet: add missing BUG() invocations to reflog callbacks
2021-10-03 21:49:18 -07:00
Junio C Hamano
1030daecda Merge branch 'cb/cvsserver'
"git cvsserver" had a long-standing bug in its authentication code,
which has finally been corrected (it is unclear and is a separate
question if anybody is seriously using it, though).

* cb/cvsserver:
  Documentation: cleanup git-cvsserver
  git-cvsserver: protect against NULL in crypt(3)
  git-cvsserver: use crypt correctly to compare password hashes
2021-10-03 21:49:17 -07:00
Junio C Hamano
df9c83bdf7 Merge branch 'jx/ci-l10n'
CI help for l10n.

* jx/ci-l10n:
  ci: new github-action for git-l10n code review
2021-10-03 21:49:17 -07:00
Junio C Hamano
ac162a606b Merge branch 'jk/clone-unborn-head-in-bare'
"git clone" from a repository whose HEAD is unborn into a bare
repository didn't follow the branch name the other side used, which
is corrected.

* jk/clone-unborn-head-in-bare:
  clone: handle unborn branch in bare repos
2021-10-03 21:49:17 -07:00
Junio C Hamano
4a6fd7d3c7 Merge branch 'en/stash-df-fix'
"git stash", where the tentative change involves changing a
directory to a file (or vice versa), was confused, which has been
corrected.

* en/stash-df-fix:
  stash: restore untracked files AFTER restoring tracked files
  stash: avoid feeding directories to update-index
  t3903: document a pair of directory/file bugs
2021-10-03 21:49:16 -07:00
Taylor Blau
324efc90d1 builtin/repack.c: pass --refs-snapshot when writing bitmaps
To prevent the race described in an earlier patch, generate and pass a
reference snapshot to the multi-pack bitmap code, if we are writing one
from `git repack`.

This patch is mostly limited to creating a temporary file, and then
calling for_each_ref(). Except we try to minimize duplicates, since
doing so can drastically reduce the size in network-of-forks style
repositories. In the kernel's fork network (the repository containing
all objects from the kernel and all its forks), deduplicating the
references drops the snapshot size from 934 MB to just 12 MB.

But since we're handling duplicates in this way, we have to make sure
that we preferred references (those listed in pack.preferBitmapTips)
before non-preferred ones (to avoid recording an object which is pointed
at by a preferred tip as non-preferred).

We accomplish this by doing separate passes over the references: first
visiting each prefix in pack.preferBitmapTips, and then over the rest of
the references.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-01 16:40:09 -07:00
Bagas Sanjaya
273c9c5777 bisect--helper: add space between colon and following sentence
Add missing space between colon sentence (`bisect-run failed:`) and the
following sentence (`git bisect--helper --bisect-state`).

Fixes: d1bbbe45df (bisect--helper: reimplement `bisect_run` shell
function in C, 2021-09-13)
Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-01 15:47:53 -07:00
Bagas Sanjaya
38c356aad6 blame: describe default output format
While there is descriptions for porcelain and incremental output
formats, the default format isn't described. Describe that format for
the sake of completeness.

Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-01 15:44:32 -07:00
Ævar Arnfjörð Bjarmason
96e41f58fe fsck: report invalid object type-path combinations
Improve the error that's emitted in cases where we find a loose object
we parse, but which isn't at the location we expect it to be.

Before this change we'd prefix the error with a not-a-OID derived from
the path at which the object was found, due to an emergent behavior in
how we'd end up with an "OID" in these codepaths.

Now we'll instead say what object we hashed, and what path it was
found at. Before this patch series e.g.:

    $ git hash-object --stdin -w -t blob </dev/null
    e69de29bb2
    $ mv objects/e6/ objects/e7

Would emit ("[...]" used to abbreviate the OIDs):

    git fsck
    error: hash mismatch for ./objects/e7/9d[...] (expected e79d[...])
    error: e79d[...]: object corrupt or missing: ./objects/e7/9d[...]

Now we'll instead emit:

    error: e69d[...]: hash-path mismatch, found at: ./objects/e7/9d[...]

Furthermore, we'll do the right thing when the object type and its
location are bad. I.e. this case:

    $ git hash-object --stdin -w -t garbage --literally </dev/null
    8315a83d2acc4c174aed59430f9a9c4ed926440f
    $ mv objects/83 objects/84

As noted in an earlier commits we'd simply die early in those cases,
until preceding commits fixed the hard die on invalid object type:

    $ git fsck
    fatal: invalid object type

Now we'll instead emit sensible error messages:

    $ git fsck
    error: 8315[...]: hash-path mismatch, found at: ./objects/84/15[...]
    error: 8315[...]: object is of unknown type 'garbage': ./objects/84/15[...]

In both fsck.c and object-file.c we're using null_oid as a sentinel
value for checking whether we got far enough to be certain that the
issue was indeed this OID mismatch.

We need to add the "object corrupt or missing" special-case to deal
with cases where read_loose_object() will return an error before
completing check_object_signature(), e.g. if we have an error in
unpack_loose_rest() because we find garbage after the valid gzip
content:

    $ git hash-object --stdin -w -t blob </dev/null
    e69de29bb2
    $ chmod 755 objects/e6/9de29bb2d1d6434b8b29ae775ad8c2e48c5391
    $ echo garbage >>objects/e6/9de29bb2d1d6434b8b29ae775ad8c2e48c5391
    $ git fsck
    error: garbage at end of loose object 'e69d[...]'
    error: unable to unpack contents of ./objects/e6/9d[...]
    error: e69d[...]: object corrupt or missing: ./objects/e6/9d[...]

There is currently some weird messaging in the edge case when the two
are combined, i.e. because we're not explicitly passing along an error
state about this specific scenario from check_stream_oid() via
read_loose_object() we'll end up printing the null OID if an object is
of an unknown type *and* it can't be unpacked by zlib, e.g.:

    $ git hash-object --stdin -w -t garbage --literally </dev/null
    8315a83d2acc4c174aed59430f9a9c4ed926440f
    $ chmod 755 objects/83/15a83d2acc4c174aed59430f9a9c4ed926440f
    $ echo garbage >>objects/83/15a83d2acc4c174aed59430f9a9c4ed926440f
    $ /usr/bin/git fsck
    fatal: invalid object type
    $ ~/g/git/git fsck
    error: garbage at end of loose object '8315a83d2acc4c174aed59430f9a9c4ed926440f'
    error: unable to unpack contents of ./objects/83/15a83d2acc4c174aed59430f9a9c4ed926440f
    error: 8315a83d2acc4c174aed59430f9a9c4ed926440f: object corrupt or missing: ./objects/83/15a83d2acc4c174aed59430f9a9c4ed926440f
    error: 0000000000000000000000000000000000000000: object is of unknown type 'garbage': ./objects/83/15a83d2acc4c174aed59430f9a9c4ed926440f
    [...]

I think it's OK to leave that for future improvements, which would
involve enum-ifying more error state as we've done with "enum
unpack_loose_header_result" in preceding commits. In these
increasingly more obscure cases the worst that can happen is that
we'll get slightly nonsensical or inapplicable error messages.

There's other such potential edge cases, all of which might produce
some confusing messaging, but still be handled correctly as far as
passing along errors goes. E.g. if check_object_signature() returns
and oideq(real_oid, null_oid()) is true, which could happen if it
returns -1 due to the read_istream() call having failed.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-01 15:06:01 -07:00
Ævar Arnfjörð Bjarmason
31deb28f5e fsck: don't hard die on invalid object types
Change the error fsck emits on invalid object types, such as:

    $ git hash-object --stdin -w -t garbage --literally </dev/null
    <OID>

From the very ungraceful error of:

    $ git fsck
    fatal: invalid object type
    $

To:

    $ git fsck
    error: <OID>: object is of unknown type 'garbage': <OID_PATH>
    [ other fsck output ]

We'll still exit with non-zero, but now we'll finish the rest of the
traversal. The tests that's being added here asserts that we'll still
complain about other fsck issues (e.g. an unrelated dangling blob).

To do this we need to pass down the "OBJECT_INFO_ALLOW_UNKNOWN_TYPE"
flag from read_loose_object() through to parse_loose_header(). Since
the read_loose_object() function is only used in builtin/fsck.c we can
simply change it to accept a "struct object_info" (which contains the
OBJECT_INFO_ALLOW_UNKNOWN_TYPE in its flags). See
f6371f9210 (sha1_file: add read_loose_object() function, 2017-01-13)
for the introduction of read_loose_object().

Since we'll need a "struct strbuf" to hold the "type_name" let's pass
it to the for_each_loose_file_in_objdir() callback to avoid allocating
a new one for each loose object in the iteration. It also makes the
memory management simpler than sticking it in fsck_loose() itself, as
we'll only need to strbuf_reset() it, with no need to do a
strbuf_release() before each "return".

Before this commit we'd never check the "type" if read_loose_object()
failed, but now we do. We therefore need to initialize it to OBJ_NONE
to be able to tell the difference between e.g. its
unpack_loose_header() having failed, and us getting past that and into
parse_loose_header().

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-01 15:06:01 -07:00
Ævar Arnfjörð Bjarmason
dccb32bf01 object-file.c: stop dying in parse_loose_header()
Make parse_loose_header() return error codes and data instead of
invoking die() by itself.

For now we'll move the relevant die() call to loose_object_info() and
read_loose_object() to keep this change smaller. In a subsequent
commit we'll make read_loose_object() return an error code instead of
dying. We should also address the "allow_unknown" case (should be
moved to builtin/cat-file.c), but for now I'll be leaving it.

For making parse_loose_header() not die() change its prototype to
accept a "struct object_info *" instead of the "unsigned long *sizep"
it accepted before. Its callers can now check the populated populated
"oi->typep".

Because of this we don't need to pass in the "unsigned int flags"
which we used for OBJECT_INFO_ALLOW_UNKNOWN_TYPE, we can instead do
that check in loose_object_info().

This also refactors some confusing control flow around the "status"
variable. In some cases we set it to the return value of "error()",
i.e. -1, and later checked if "status < 0" was true.

Since 93cff9a978 (sha1_loose_object_info: return error for corrupted
objects, 2017-04-01) the return value of loose_object_info() (then
named sha1_loose_object_info()) had been a "status" variable that be
any negative value, as we were expecting to return the "enum
object_type".

The only negative type happens to be OBJ_BAD, but the code still
assumed that more might be added. This was then used later in
e.g. c84a1f3ed4 (sha1_file: refactor read_object, 2017-06-21). Now
that parse_loose_header() will return 0 on success instead of the
type (which it'll stick into the "struct object_info") we don't need
to conflate these two cases in its callers.

Since parse_loose_header() doesn't need to return an arbitrary
"status" we only need to treat its "ret < 0" specially, but can
idiomatically overwrite it with our own error() return. This along
with having made unpack_loose_header() return an "enum
unpack_loose_header_result" in an earlier commit means that we can
move the previously nested if/else cases mostly into the "ULHR_OK"
branch of the "switch" statement.

We should be less silent if we reach that "status = -1" branch, which
happens if we've got trailing garbage in loose objects, see
f6371f9210 (sha1_file: add read_loose_object() function, 2017-01-13)
for a better way to handle it. For now let's punt on it, a subsequent
commit will address that edge case.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-01 15:06:00 -07:00
Ævar Arnfjörð Bjarmason
5848fb11ac object-file.c: return ULHR_TOO_LONG on "header too long"
Split up the return code for "header too long" from the generic
negative return value unpack_loose_header() returns, and report via
error() if we exceed MAX_HEADER_LEN.

As a test added earlier in this series in t1006-cat-file.sh shows
we'll correctly emit zlib errors from zlib.c already in this case, so
we have no need to carry those return codes further down the
stack. Let's instead just return ULHR_TOO_LONG saying we ran into the
MAX_HEADER_LEN limit, or other negative values for "unable to unpack
<OID> header".

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-01 15:06:00 -07:00
Ævar Arnfjörð Bjarmason
3b6a8db3b0 object-file.c: use "enum" return type for unpack_loose_header()
In a preceding commit we changed and documented unpack_loose_header()
from its previous behavior of returning any negative value or zero, to
only -1 or 0.

Let's add an "enum unpack_loose_header_result" type and use it for
these return values, and have the compiler assert that we're
exhaustively covering all of them.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-01 15:06:00 -07:00
Ævar Arnfjörð Bjarmason
01cab97679 object-file.c: simplify unpack_loose_short_header()
Combine the unpack_loose_short_header(),
unpack_loose_header_to_strbuf() and unpack_loose_header() functions
into one.

The unpack_loose_header_to_strbuf() function was added in
46f034483e (sha1_file: support reading from a loose object of unknown
type, 2015-05-03).

Its code was mostly copy/pasted between it and both of
unpack_loose_header() and unpack_loose_short_header(). We now have a
single unpack_loose_header() function which accepts an optional
"struct strbuf *" instead.

I think the remaining unpack_loose_header() function could be further
simplified, we're carrying some complexity just to be able to emit a
garbage type longer than MAX_HEADER_LEN, we could alternatively just
say "we found a garbage type <first 32 bytes>..." instead. But let's
leave the current behavior in place for now.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-01 15:06:00 -07:00
Ævar Arnfjörð Bjarmason
ddb3474b66 object-file.c: make parse_loose_header_extended() public
Make the parse_loose_header_extended() function public and remove the
parse_loose_header() wrapper. The only direct user of it outside of
object-file.c itself was in streaming.c, that caller can simply pass
the required "struct object-info *" instead.

This change is being done in preparation for teaching
read_loose_object() to accept a flag to pass to
parse_loose_header(). It isn't strictly necessary for that change, we
could simply use parse_loose_header_extended() there, but will leave
the API in a better end state.

It would be a better end-state to have already moved the declaration
of these functions to object-store.h to avoid the forward declaration
of "struct object_info" in cache.h, but let's leave that cleanup for
some other time.

1. https://lore.kernel.org/git/patch-v6-09.22-5b9278e7bb4-20210907T104559Z-avarab@gmail.com/

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-01 15:06:00 -07:00
Ævar Arnfjörð Bjarmason
bfff2c4833 object-file.c: return -1, not "status" from unpack_loose_header()
Return a -1 when git_inflate() fails instead of whatever Z_* status
we'd get from zlib.c. This makes no difference to any error we report,
but makes it more obvious that we don't care about the specific zlib
error codes here.

See d21f842690 (unpack_sha1_header(): detect malformed object header,
2016-09-25) for the commit that added the "return status" code. As far
as I can tell there was never a real reason (e.g. different reporting)
for carrying down the "status" as opposed to "-1".

At the time that d21f842690 was written there was a corresponding
"ret < Z_OK" check right after the unpack_sha1_header() call (the
"unpack_sha1_header()" function was later rename to our current
"unpack_loose_header()").

However, that check was removed in c84a1f3ed4 (sha1_file: refactor
read_object, 2017-06-21) without changing the corresponding return
code.

So let's do the minor cleanup of also changing this function to return
a -1.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-01 15:06:00 -07:00
Ævar Arnfjörð Bjarmason
74ad250a1c object-file.c: don't set "typep" when returning non-zero
When the loose_object_info() function returns an error stop faking up
the "oi->typep" to OBJ_BAD. Let the return value of the function
itself suffice. This code cleanup simplifies subsequent changes.

That we set this at all is a relic from the past. Before
052fe5eaca (sha1_loose_object_info: make type lookup optional,
2013-07-12) we would always return the type_from_string(type) via the
parse_sha1_header() function, or -1 (i.e. OBJ_BAD) if we couldn't
parse it.

Then in a combination of 46f034483e (sha1_file: support reading from
a loose object of unknown type, 2015-05-03) and
b3ea7dd32d (sha1_loose_object_info: handle errors from
unpack_sha1_rest, 2017-10-05) our API drifted even further towards
conflating the two again.

Having read the code paths involved carefully I think this is OK. We
are just about to return -1, and we have only one caller:
do_oid_object_info_extended(). That function will in turn go on to
return -1 when we return -1 here.

This might be introducing a subtle bug where a caller of
oid_object_info_extended() would inspect its "typep" and expect a
meaningful value if the function returned -1.

Such a problem would not occur for its simpler oid_object_info()
sister function. That one always returns the "enum object_type", which
in the case of -1 would be the OBJ_BAD.

Having read the code for all the callers of these functions I don't
believe any such bug is being introduced here, and in any case we'd
likely already have such a bug for the "sizep" member (although
blindly checking "typep" first would be a more common case).

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-01 15:06:00 -07:00
Ævar Arnfjörð Bjarmason
dd45a56246 cat-file tests: test for current --allow-unknown-type behavior
Add more tests for the current --allow-unknown-type behavior. As noted
in [1] I don't think much of this makes sense, but let's test for it
as-is so we can see if the behavior changes in the future.

1. https://lore.kernel.org/git/87r1i4qf4h.fsf@evledraar.gmail.com/

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-01 15:06:00 -07:00
Ævar Arnfjörð Bjarmason
7e7d220d9d cat-file tests: add corrupt loose object test
Fix a blindspot in the tests for "cat-file" (and by proxy, the guts of
object-file.c) by testing that when we can't decode a loose object
with zlib we'll emit an error from zlib.c.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-01 15:05:59 -07:00
Ævar Arnfjörð Bjarmason
59b8283d55 cat-file tests: test for missing/bogus object with -t, -s and -p
When we look up a missing object with cat_one_file() what error we
print out currently depends on whether we'll error out early in
get_oid_with_context(), or if we'll get an error later from
oid_object_info_extended().

The --allow-unknown-type flag then changes whether we pass the
"OBJECT_INFO_ALLOW_UNKNOWN_TYPE" flag to get_oid_with_context() or
not.

The "-p" flag is yet another special-case in printing the same output
on the deadbeef OID as we'd emit on the deadbeef_short OID for the
"-s" and "-t" options, it also doesn't support the
"--allow-unknown-type" flag at all.

Let's test the combination of the two sets of [-t, -s, -p] and
[--{no-}allow-unknown-type] (the --no-allow-unknown-type is implicit
in not supplying it), as well as a [missing,bogus] object pair.

This extends tests added in 3e370f9faf (t1006: add tests for git
cat-file --allow-unknown-type, 2015-05-03).

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-01 15:05:59 -07:00
Ævar Arnfjörð Bjarmason
70e4a57762 cat-file tests: move bogus_* variable declarations earlier
Change the short/long bogus bogus object type variables into a form
where the two sets can be used concurrently. This'll be used by
subsequently added tests.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-01 15:05:59 -07:00
Ævar Arnfjörð Bjarmason
a5ed333121 fsck tests: test for garbage appended to a loose object
There wasn't any output tests for this scenario, let's ensure that we
don't regress on it in the changes that come after this.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-01 15:05:59 -07:00
Ævar Arnfjörð Bjarmason
42cd635b21 fsck tests: test current hash/type mismatch behavior
If fsck we move an object around between .git/objects/?? directories
to simulate a hash mismatch "git fsck" will currently hard die() in
object-file.c. This behavior will be fixed in subsequent commits, but
let's test for it as-is for now.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-01 15:05:59 -07:00
Ævar Arnfjörð Bjarmason
f7a0dba7a2 fsck tests: refactor one test to use a sub-repo
Refactor one of the fsck tests to use a throwaway repository. It's a
pervasive pattern in t1450-fsck.sh to spend a lot of effort on the
teardown of a tests so we're not leaving corrupt content for the next
test.

We can instead use the pattern of creating a named sub-repository,
then we don't have to worry about cleaning up after ourselves, nobody
will care what state the broken "hash-mismatch" repository is after
this test runs.

See [1] for related discussion on various "modern" test patterns that
can be used to avoid verbosity and increase reliability.

1. https://lore.kernel.org/git/87y27veeyj.fsf@evledraar.gmail.com/

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-01 15:05:59 -07:00
Ævar Arnfjörð Bjarmason
093fffdfbe fsck tests: add test for fsck-ing an unknown type
Fix a blindspot in the fsck tests by checking what we do when we
encounter an unknown "garbage" type produced with hash-object's
--literally option.

This behavior needs to be improved, which'll be done in subsequent
patches, but for now let's test for the current behavior.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-01 15:05:59 -07:00
Ævar Arnfjörð Bjarmason
59580685be config.h: remove unused git_config_get_untracked_cache() declaration
This function was removed in ad0fb65999 (repo-settings: parse
core.untrackedCache, 2019-08-13), but not its corresponding *.h entry.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-01 14:39:46 -07:00
Ævar Arnfjörð Bjarmason
067e73c8ae log-tree.h: remove unused function declarations
The init_log_tree_opt() and log_tree_opt_parse() functions were
removed in cd2bdc5309 (Common option parsing for "git log --diff" and
friends, 2006-04-14), but not their corresponding *.h declaration.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-01 14:39:46 -07:00
Ævar Arnfjörð Bjarmason
1fd2aa543d grep.h: remove unused grep_threads_ok() declaration
This function was removed in 0579f91dd7 (grep: enable threading with
-p and -W using lazy attribute lookup, 2011-12-12), but not its
corresponding *.h declaration.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-01 14:39:46 -07:00
Ævar Arnfjörð Bjarmason
f787ebd51c builtin.h: remove cmd_tar_tree() declaration
The cmd_tar_tree() function itself was removed in
925ceccf05 (tar-tree: remove deprecated command, 2013-11-10).

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-01 14:39:46 -07:00
Ævar Arnfjörð Bjarmason
0000e81811 builtin/remote.c: add and use SHOW_INFO_INIT
In the preceding commit we introduced REF_STATES_INIT, but did not
change the "struct show_info" to have a corresponding
initializer. Let's do that, and make it use "REF_STATES_INIT" and
"STRING_LIST_INIT_DUP", doing that requires changing "list" and
"states" away from being pointers.

The resulting end-state is simpler since we omit the local "info_list"
and "states" variables in show() as well as the memset().

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-01 14:22:51 -07:00
Ævar Arnfjörð Bjarmason
0bc7787ca9 builtin/remote.c: add and use a REF_STATES_INIT
Use a new REF_STATES_INIT designated initializer instead of assigning
to the "strdup_strings" member of the previously memzero()'d version
of this struct.

The pattern of assigning to "strdup_strings" dates back to
211c89682e (Make git-remote a builtin, 2008-02-29) (when it was
"strdup_paths"), i.e. long before we used anything like our current
established *_INIT patterns consistently.

Then in e61e0cc6b7 (builtin-remote: teach show to display remote
HEAD, 2009-02-25) and e5dcbfd9ab (builtin-remote: new show output
style for push refspecs, 2009-02-25) we added some more of these.

As it turns out we only initialized this struct three times, all the
other uses were of pointers to those initialized structs. So let's
initialize it in those three places, skip the memset(), and pass those
structs down appropriately.

This would be a behavior change if we had codepaths that relied say on
implicitly having had "new_refs" initialized to STRING_LIST_INIT_NODUP
with the memset(), but only set the "strdup_strings" on some other
struct, but then called string_list_append() on "new_refs". There
isn't any such codepath, all of the late assignments to
"strdup_strings" assigned to those structs that we'd use for those
codepaths.

So just initializing them all up-front makes for easier to understand
code, i.e. in the pre-image it looked as though we had that tricky
edge case, but we didn't.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-01 14:22:51 -07:00
Ævar Arnfjörð Bjarmason
73ee449bbf urlmatch.[ch]: add and use URLMATCH_CONFIG_INIT
Change the initialization pattern of "struct urlmatch_config" to use
an *_INIT macro and designated initializers. Right now there's no
other "struct" member of "struct urlmatch_config" which would require
its own *_INIT, but it's good practice not to assume that. Let's also
change this to a designated initializer while we're at it.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-01 14:22:51 -07:00
René Scharfe
afc72b5d3a mergesort: use ranks stack
The bottom-up mergesort implementation needs to skip through sublists a
lot.  A recursive version could avoid that, but would require log2(n)
stack frames.  Explicitly manage a stack of sorted sublists of various
lengths instead to avoid fast-forwarding while also keeping a lid on
memory usage.

While this patch was developed independently, a ranks stack is also used
in https://github.com/mono/mono/blob/master/mono/eglib/sort.frag.h by
the Mono project.

The idea is to keep slots for log2(n_max) sorted sublists, one for each
power of 2.  Such a construct can accommodate lists of any length up to
n_max.  Since there is a known maximum number of items (effectively
SIZE_MAX), we can preallocate the whole rank stack.

We add items one by one, which is akin to incrementing a binary number.
Make use of that by keeping track of the number of items and check bits
in it instead of checking for NULL in the rank stack when checking if a
sublist of a certain rank exists, in order to avoid memory accesses.

The first item can go into the empty first slot as a sublist of length
2^0.  The second one needs to be merged with the previous sublist and
the result goes into the empty second slot as a sublist of length 2^1.
The third one goes into vacated first slot and so on.  At the end we
merge all the sublists to get the result.

The new version still performs a stable sort by making sure to put items
seen earlier first when the compare function indicates equality.  That's
done by preferring items from sublists with a higher rank.

The new merge function also tries to minimize the number of operations.
Like blame.c::blame_merge(), the function doesn't set the next pointer
if it already points to the right item, and it exits when it reaches the
end of one of the two sublists that it's given.  The old code couldn't
do the latter because it kept all items in a single list.

The number of comparisons stays the same, though.  Here's example output
of "test-tool mergesort test" for the rand distributions with the most
number of comparisons with the ranks stack:

   $ t/helper/test-tool mergesort test | awk '
       NR > 1 && $1 != "rand" {next}
       $7 > max[$3] {max[$3] = $7; line[$3] = $0}
       END {for (n in line) print line[n]}
   '

distribut mode                    n        m get_next set_next  compare verdict
rand      copy                  100       32      669      420      569 OK
rand      dither               1023       64     9997     5396     8974 OK
rand      dither               1024      512    10007     6159     8983 OK
rand      dither               1025      256    10993     5988     9968 OK

Here are the differences to the results without this patch:

distribut mode                    n        m get_next set_next  compare
rand      copy                  100       32     -515     -280        0
rand      dither               1023       64    -6376    -4834        0
rand      dither               1024      512    -6377    -4081        0
rand      dither               1025      256    -7461    -5287        0

The numbers of get_next and set_next calls are reduced significantly.

NB: These winners are different than the ones shown in the patch that
introduced the unriffle mode because the addition of the unriffle_skewed
mode in between changed the consumption of rand() values.

Here are the distributions with the most comparisons overall with the
ranks stack:

   $ t/helper/test-tool mergesort test | awk '
       $7 > max[$3] {max[$3] = $7; line[$3] = $0}
       END {for (n in line) print line[n]}
   '

distribut mode                    n        m get_next set_next  compare verdict
sawtooth  unriffle_skewed       100      128      689      632      589 OK
sawtooth  unriffle_skewed      1023     1024    10230    10220     9207 OK
sawtooth  unriffle             1024     1024    10241    10240     9217 OK
sawtooth  unriffle_skewed      1025     2048    11266    10242    10241 OK

And here the differences to before:

distribut mode                    n        m get_next set_next  compare
sawtooth  unriffle_skewed       100      128     -495      -68        0
sawtooth  unriffle_skewed      1023     1024    -6143      -10        0
sawtooth  unriffle             1024     1024    -6143        0        0
sawtooth  unriffle_skewed      1025     2048    -7188    -1033        0

We get a similar reduction of get_next calls here, but only a slight
reduction of set_next calls, if at all.

And here are the results of p0071-sort.sh before:

0071.12: llist_mergesort() unsorted    0.36(0.33+0.01)
0071.14: llist_mergesort() sorted      0.15(0.13+0.01)
0071.16: llist_mergesort() reversed    0.16(0.14+0.01)

... and here the ones with this patch:

0071.12: llist_mergesort() unsorted    0.24(0.22+0.01)
0071.14: llist_mergesort() sorted      0.12(0.10+0.01)
0071.16: llist_mergesort() reversed    0.12(0.10+0.01)

NB: We can't use t/perf/run to compare revisions in one run because it
uses the test-tool from the worktree, not from the revisions being
tested.

Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-01 12:43:09 -07:00