Commit Graph

64823 Commits

Author SHA1 Message Date
Elijah Newren
0e29222e0c Documentation: call out commands that nuke untracked files/directories
Some commands have traditionally also removed untracked files (or
directories) that were in the way of a tracked file we needed.  Document
these cases.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-27 13:38:37 -07:00
Elijah Newren
94b7f1563a Comment important codepaths regarding nuking untracked files/dirs
In the last few commits we focused on code in unpack-trees.c that
mistakenly removed untracked files or directories.  There may be more of
those, but in this commit we change our focus: callers of toplevel
commands that are expected to remove untracked files or directories.

As noted previously, we have toplevel commands that are expected to
delete untracked files such as 'read-tree --reset', 'reset --hard', and
'checkout --force'.  However, that does not mean that other highlevel
commands that happen to call these other commands thought about or
conveyed to users the possibility that untracked files could be removed.
Audit the code for such callsites, and add comments near existing
callsites to mention whether these are safe or not.

My auditing is somewhat incomplete, though; it skipped several cases:
  * git-rebase--preserve-merges.sh: is in the process of being
    deprecated/removed, so I won't leave a note that there are
    likely more bugs in that script.
  * contrib/git-new-workdir: why is the -f flag being used in a new
    empty directory??  It shouldn't hurt, but it seems useless.
  * git-p4.py: Don't see why -f is needed for a new dir (maybe it's
    not and is just superfluous), but I'm not at all familiar with
    the p4 stuff
  * git-archimport.perl: Don't care; arch is long since dead
  * git-cvs*.perl: Don't care; cvs is long since dead

Also, the reset --hard in builtin/worktree.c looks safe, due to only
running in an empty directory.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-27 13:38:37 -07:00
Elijah Newren
56d06fe4aa unpack-trees: avoid nuking untracked dir in way of locally deleted file
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-27 13:38:37 -07:00
Elijah Newren
1fdd51aa13 unpack-trees: avoid nuking untracked dir in way of unmerged file
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-27 13:38:37 -07:00
Elijah Newren
480d3d6bf9 Change unpack_trees' 'reset' flag into an enum
Traditionally, unpack_trees_options->reset was used to signal that it
was okay to delete any untracked files in the way.  This was used by
`git read-tree --reset`, but then started appearing in other places as
well.  However, many of the other uses should not be deleting untracked
files in the way.  Change this value to an enum so that a value of 1
(i.e. "true") can be split into two:
   UNPACK_RESET_PROTECT_UNTRACKED,
   UNPACK_RESET_OVERWRITE_UNTRACKED
In order to catch accidental misuses (i.e. where folks call it the way
they traditionally used to), define the special enum value of
   UNPACK_RESET_INVALID = 1
which will trigger a BUG().

Modify existing callers so that
   read-tree --reset
   reset --hard
   checkout --force
continue using the UNPACK_RESET_OVERWRITE_UNTRACKED logic, while other
callers, including
   am
   checkout without --force
   stash  (though currently dead code; reset always had a value of 0)
   numerous callers from rebase/sequencer to reset_head()
will use the new UNPACK_RESET_PROTECT_UNTRACKED value.

Also, note that it has been reported that 'git checkout <treeish>
<pathspec>' currently also allows overwriting untracked files[1].  That
case should also be fixed, but it does not use unpack_trees() and thus
is outside the scope of the current changes.

[1] https://lore.kernel.org/git/15dad590-087e-5a48-9238-5d2826950506@gmail.com/

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-27 13:38:37 -07:00
Elijah Newren
1b5f37334a Remove ignored files by default when they are in the way
Change several commands to remove ignored files by default when they are
in the way.  Since some commands (checkout, merge) take a
--no-overwrite-ignore option to allow the user to configure this, and it
may make sense to add that option to more commands (and in the case of
merge, actually plumb that configuration option through to more of the
backends than just the fast-forwarding special case), add little
comments about where such flags would be used.

Incidentally, this fixes a test failure in t7112.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-27 13:38:37 -07:00
Elijah Newren
c42e0b6409 unpack-trees: make dir an internal-only struct
Avoid accidental misuse or confusion over ownership by clearly making
unpack_trees_options.dir an internal-only variable.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-27 13:38:37 -07:00
Elijah Newren
04988c8d18 unpack-trees: introduce preserve_ignored to unpack_trees_options
Currently, every caller of unpack_trees() that wants to ensure ignored
files are overwritten by default needs to:
   * allocate unpack_trees_options.dir
   * flip the DIR_SHOW_IGNORED flag in unpack_trees_options.dir->flags
   * call setup_standard_excludes
AND then after the call to unpack_trees() needs to
   * call dir_clear()
   * deallocate unpack_trees_options.dir
That's a fair amount of boilerplate, and every caller uses identical
code.  Make this easier by instead introducing a new boolean value where
the default value (0) does what we want so that new callers of
unpack_trees() automatically get the appropriate behavior.  And move all
the handling of unpack_trees_options.dir into unpack_trees() itself.

While preserve_ignored = 0 is the behavior we feel is the appropriate
default, we defer fixing commands to use the appropriate default until a
later commit.  So, this commit introduces several locations where we
manually set preserve_ignored=1.  This makes it clear where code paths
were previously preserving ignored files when they should not have been;
a future commit will flip these to instead use a value of 0 to get the
behavior we want.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-27 13:38:37 -07:00
Elijah Newren
491a7575f1 read-tree, merge-recursive: overwrite ignored files by default
This fixes a long-standing patchwork of ignored files handling in
read-tree and merge-recursive, called out and suggested by Junio long
ago.  Quoting from commit dcf0c16ef1 ("core.excludesfile clean-up"
2007-11-16):

    git-read-tree takes --exclude-per-directory=<gitignore>,
    not because the flexibility was needed.  Again, this was
    because the option predates the standardization of the ignore
    files.

    ...

    On the other hand, I think it makes perfect sense to fix
    git-read-tree, git-merge-recursive and git-clean to follow the
    same rule as other commands.  I do not think of a valid use case
    to give an exclude-per-directory that is nonstandard to
    read-tree command, outside a "negative" test in the t1004 test
    script.

    This patch is the first step to untangle this mess.

    The next step would be to teach read-tree, merge-recursive and
    clean (in C) to use setup_standard_excludes().

History shows each of these were partially or fully fixed:

  * clean was taught the new trick in 1617adc7a0 ("Teach git clean to
    use setup_standard_excludes()", 2007-11-14).

  * read-tree was primarily used by checkout & merge scripts.  checkout
    and merge later became builtins and were both fixed to use the new
    setup_standard_excludes() handling in fc001b526c ("checkout,merge:
    loosen overwriting untracked file check based on info/exclude",
    2011-11-27).  So the primary users were fixed, though read-tree
    itself was not.

  * merge-recursive has now been replaced as the default merge backend
    by merge-ort.  merge-ort fixed this by using
    setup_standard_excludes() starting early in its implementation; see
    commit 6681ce5cf6 ("merge-ort: add implementation of checkout()",
    2020-12-13), largely due to its design depending on checkout() and
    thus being influenced by the checkout code.  However,
    merge-recursive itself was not fixed here, in part because its
    design meant it had difficulty differentiating between untracked
    files, ignored files, leftover tracked files that haven't been
    removed yet due to order of processing files, and files written by
    itself due to collisions).

Make the conversion more complete by now handling read-tree and
handling at least the unpack_trees() portion of merge-recursive.  While
merge-recursive is on its way out, fixing the unpack_trees() portion is
easy and facilitates some of the later changes in this series.  Note
that fixing read-tree makes the --exclude-per-directory option to
read-tree useless, so we remove it from the documentation (though we
continue to accept it if passed).

The read-tree changes happen to fix a bug in t1013.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-27 13:38:37 -07:00
Elijah Newren
c512d27e78 checkout, read-tree: fix leak of unpack_trees_options.dir
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-27 13:38:37 -07:00
Carlo Marcelo Arenas Belón
2d84c4ed57 lazyload.h: use an even more generic function pointer than FARPROC
gcc will helpfully raise a -Wcast-function-type warning when casting
between functions that might have incompatible return types
(ex: GetUserNameExW returns bool which is only half the size of the
return type from FARPROC which is long long), so create a new type that
could be used as a completely generic function pointer and cast through
it instead.

Additionaly remove the -Wno-incompatible-pointer-types temporary
flag added in 27e0c3c (win32: allow building with pedantic mode
enabled, 2021-09-03), as it will be no longer needed.

Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-27 13:13:58 -07:00
Jeff King
67985e4e4a refs: drop "broken" flag from for_each_fullref_in()
No callers pass in anything but "0" here. Likewise to our sibling
functions. Note that some of them ferry along the flag, but none of
their callers pass anything but "0" either.

Nor is anybody likely to change that. Callers which really want to see
all of the raw refs use for_each_rawref(). And anybody interested in
iterating a subset of the refs will likely be happy to use the
now-default behavior of showing broken refs, but omitting dangling
symlinks.

So we can get rid of this whole feature.

Signed-off-by: Jeff King <peff@peff.net>
Reviewed-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-27 12:36:45 -07:00
Jeff King
2d653c5036 ref-filter: drop broken-ref code entirely
Now that none of our callers passes the INCLUDE_BROKEN flag, we can drop
it entirely, along with the code to plumb it through to the
for_each_fullref_in() functions.

Signed-off-by: Jeff King <peff@peff.net>
Reviewed-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-27 12:36:45 -07:00
Jeff King
1763334caf ref-filter: stop setting FILTER_REFS_INCLUDE_BROKEN
Of the ref-filter callers, for-each-ref and git-branch both set the
INCLUDE_BROKEN flag (but git-tag does not, which is a weird
inconsistency).  But now that GIT_REF_PARANOIA is on by default, that
produces almost the same outcome for all three.

The one exception is that GIT_REF_PARANOIA will omit dangling symrefs.
That's a better behavior for these tools, as they would never include
such a symref in the main output anyway (they can't, as it doesn't point
to an object). Instead they issue a warning to stderr. But that warning
is somewhat useless; a dangling symref is a perfectly reasonable thing
to have in your repository, and is not a sign of corruption. It's much
friendlier to just quietly ignore it.

And in terms of robustness, the warning gains us little. It does not
impact the exit code of either tool. So while the warning _might_ clue
in a user that they have an unexpected broken symref, it would not help
any kind of scripted use.

This patch converts for-each-ref and git-branch to stop using the
INCLUDE_BROKEN flag. That gives them more reasonable behavior, and
harmonizes them with git-tag.

We have to change one test to adapt to the situation. t1430 tries to
trigger all of the REF_ISBROKEN behaviors from the underlying ref code.
It uses for-each-ref to do so (because there isn't any other mechanism).
That will no longer issue a warning about the symref which points to an
invalid name, as it's considered dangling (and we can instead be sure
that it's _not_ mentioned on stderr). Note that we do still complain
about the illegally named "broken..symref"; its problem is not that it's
dangling, but the name of the symref itself is illegal.

Signed-off-by: Jeff King <peff@peff.net>
Reviewed-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-27 12:36:45 -07:00
Jeff King
5d1f5b8cd4 repack, prune: drop GIT_REF_PARANOIA settings
Now that GIT_REF_PARANOIA is the default, we don't need to selectively
enable it for destructive operations. In fact, it's harmful to do so,
because it overrides any GIT_REF_PARANOIA=0 setting that the user may
have provided (because they're trying to work around some corruption).

With these uses gone, we can further clean up the ref_paranoia global,
and make it a static variable inside the refs code.

Signed-off-by: Jeff King <peff@peff.net>
Reviewed-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-27 12:36:45 -07:00
Jeff King
968f12fdac refs: turn on GIT_REF_PARANOIA by default
The original point of the GIT_REF_PARANOIA flag was to include broken
refs in iterations, so that possibly-destructive operations would not
silently ignore them (and would generally instead try to operate on the
oids and fail when the objects could not be accessed).

We already turned this on by default for some dangerous operations, like
"repack -ad" (where missing a reachability tip would mean dropping the
associated history). But it was not on for general use, even though it
could easily result in the spreading of corruption (e.g., imagine
cloning a repository which simply omits some of its refs because
their objects are missing; the result quietly succeeds even though you
did not clone everything!).

This patch turns on GIT_REF_PARANOIA by default. So a clone as mentioned
above would actually fail (upload-pack tells us about the broken ref,
and when we ask for the objects, pack-objects fails to deliver them).
This may be inconvenient when working with a corrupted repository, but:

  - we are better off to err on the side of complaining about
    corruption, and then provide mechanisms for explicitly loosening
    safety.

  - this is only one type of corruption anyway. If we are missing any
    other objects in the history that _aren't_ ref tips, then we'd
    behave similarly (happily show the ref, but then barf when we
    started traversing).

We retain the GIT_REF_PARANOIA variable, but simply default it to "1"
instead of "0". That gives the user an escape hatch for loosening this
when working with a corrupt repository. It won't work across a remote
connection to upload-pack (because we can't necessarily set environment
variables on the remote), but there the client has other options (e.g.,
choosing which refs to fetch).

As a bonus, this also makes ref iteration faster in general (because we
don't have to call has_object_file() for each ref), though probably not
noticeably so in the general case. In a repo with a million refs, it
shaved a few hundred milliseconds off of upload-pack's advertisement;
that's noticeable, but most repos are not nearly that large.

The possible downside here is that any operation which iterates refs but
doesn't ever open their objects may now quietly claim to have X when the
object is corrupted (e.g., "git rev-list new-branch --not --all" will
treat a broken ref as uninteresting). But again, that's not really any
different than corruption below the ref level. We might have
refs/heads/old-branch as non-corrupt, but we are not actively checking
that we have the entire reachable history. Or the pointed-to object
could even be corrupted on-disk (but our "do we have it" check would
still succeed). In that sense, this is merely bringing ref-corruption in
line with general object corruption.

One alternative implementation would be to actually check for broken
refs, and then _immediately die_ if we see any. That would cause the
"rev-list --not --all" case above to abort immediately. But in many ways
that's the worst of all worlds:

  - it still spends time looking up the objects an extra time

  - it still doesn't catch corruption below the ref level

  - it's even more inconvenient; with the current implementation of
    GIT_REF_PARANOIA for something like upload-pack, we can make
    the advertisement and let the client choose a non-broken piece of
    history. If we bail as soon as we see a broken ref, they cannot even
    see the advertisement.

The test changes here show some of the fallout. A non-destructive "git
repack -adk" now fails by default (but we can override it). Deleting a
broken ref now actually tells the hooks the correct "before" state,
rather than a confusing null oid.

Signed-off-by: Jeff King <peff@peff.net>
Reviewed-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-27 12:36:45 -07:00
Jeff King
6d751be4b6 refs: omit dangling symrefs when using GIT_REF_PARANOIA
Dangling symrefs aren't actually a corruption problem. It's perfectly
fine for refs/remotes/origin/HEAD to point to an unborn branch. And in
particular, if you are trying to establish reachability, a symref that
points nowhere doesn't matter either way. Any ref it could point to will
be examined during the rest of the traversal.

It's possible that a symref pointing nowhere _could_ be a sign that the
ref it was meant to point to was deleted accidentally (e.g., via
corruption). But there is no particular reason to think that is true for
any given case, and in the meantime, GIT_REF_PARANOIA kicking in
automatically for some operations means they'll fail unnecessarily.

So let's loosen it just a bit. The new test in t5312 shows off an
example that is safe, but currently fails (and no longer does after this
patch).

Note that we don't do anything if the caller explicitly asked for
DO_FOR_EACH_INCLUDE_BROKEN. In that case they may be looking for
dangling symrefs themselves, and setting GIT_REF_PARANOIA should not
_loosen_ things from what the caller asked for.

Signed-off-by: Jeff King <peff@peff.net>
Reviewed-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-27 12:36:45 -07:00
Jeff King
8dccb2244c refs: add DO_FOR_EACH_OMIT_DANGLING_SYMREFS flag
When the DO_FOR_EACH_INCLUDE_BROKEN flag is used, we include both actual
corrupt refs (illegal names, missing objects), but also symrefs that
point to nothing. This latter is not really a corruption, but just
something that may happen normally. For example, the symref at
refs/remotes/origin/HEAD may point to a tracking branch which is later
deleted. (The local HEAD may also be unborn, of course, but we do not
access it through ref iteration).

Most callers of for_each_ref() etc, do not care. They don't pass
INCLUDE_BROKEN, so don't see it at all. But for those which do pass it,
this somewhat-normal state causes extra warnings (e.g., from
for-each-ref) or even aborts operations (destructive repacks with
GIT_REF_PARANOIA set).

This patch just introduces the flag and the mechanism; there are no
callers yet (and hence no tests). Two things to note on the
implementation:

  - we actually skip any symref that does not resolve to a ref. This
    includes ones which point to an invalidly-named ref. You could argue
    this is a more serious breakage than simple dangling. But the
    overall effect is the same (we could not follow the symref), as well
    as the impact on things like REF_PARANOIA (either way, a symref we
    can't follow won't impact reachability, because we'll see the ref
    itself during iteration). The underlying resolution function doesn't
    distinguish these two cases (they both get REF_ISBROKEN).

  - we change the iterator in refs/files-backend.c where we check
    INCLUDE_BROKEN. There's a matching spot in refs/packed-backend.c,
    but we don't know need to do anything there. The packed backend does
    not support symrefs at all.

The resulting set of flags might be a bit easier to follow if we broke
this down into "INCLUDE_CORRUPT_REFS" and "INCLUDE_DANGLING_SYMREFS".
But there are a few reasons not do so:

  - adding a new OMIT_DANGLING_SYMREFS flag lets us leave existing
    callers intact, without changing their behavior (and some of them
    really do want to see the dangling symrefs; e.g., t5505 has a test
    which expects us to report when a symref becomes dangling)

  - they're not actually independent. You cannot say "include dangling
    symrefs" without also including refs whose objects are not
    reachable, because dangling symrefs by definition do not have an
    object. We could tweak the implementation to distinguish this, but
    in practice nobody wants to ask for that. Adding the OMIT flag keeps
    the implementation simple and makes sure we don't regress the
    current behavior.

Signed-off-by: Jeff King <peff@peff.net>
Reviewed-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-27 12:36:45 -07:00
Jeff King
9aab952e85 refs-internal.h: reorganize DO_FOR_EACH_* flag documentation
The documentation for the DO_FOR_EACH_* flags is sprinkled over the
refs-internal.h file. We define the two flags in one spot, and then
describe them in more detail far away from there, in the definitions of
refs_ref_iterator_begin() and ref_iterator_advance_fn().

Let's try to organize this a bit better:

  - convert the #defines to an enum. This makes it clear that they are
    related, and that the enum shows the complete set of flags.

  - combine all descriptions for each flag in a single spot, next to the
    flag's definition

  - use the enum rather than a bare int for functions which take the
    flags. This helps readers realize which flags can be used.

  - clarify the mention of flags for ref_iterator_advance_fn(). It does
    not take flags itself, but is meant to depend on ones set up
    earlier.

Signed-off-by: Jeff King <peff@peff.net>
Reviewed-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-27 12:36:45 -07:00
Jeff King
bf708add2e refs-internal.h: move DO_FOR_EACH_* flags next to each other
There are currently two DO_FOR_EACH_* flags, which must not have their
bits overlap. Yet they're defined hundreds of lines apart. Let's move
them next to each other to make it clear that they are related and are a
complete set (which matters if you are adding a new flag and would like
to know what the next available bit is).

Signed-off-by: Jeff King <peff@peff.net>
Reviewed-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-27 12:36:45 -07:00
Jeff King
5b062e1f79 t5312: be more assertive about command failure
When repacking or pruning in a corrupted repository, our tests in t5312
argue that it is OK to complete the operation or bail, as long as we
don't actually delete the objects pointed to by the corruption.

This isn't a wrong line of reasoning, but the tests are a bit permissive
by using test_might_fail. The fact is that we _do_ bail currently, and
if we ever stopped doing so, that would be worthy of a human
investigating. So let's switch these to test_must_fail.

Signed-off-by: Jeff King <peff@peff.net>
Reviewed-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-27 12:36:45 -07:00
Jeff King
078eecbcbe t5312: test non-destructive repack
In t5312, we create a state with a broken ref, and then make sure that
destructive repacks don't silently ignore the breakage (where a
destructive repack is one that might drop objects). But we don't check
the behavior of non-destructive repacks at all (i.e., ones where we'd
keep unreachable objects).

So let's add a test to confirm the current behavior, which is that
they are allowed (i.e., ignoring the breakage and considering any
objects it points to as unreachable). This may change in the future, but
we'd like for the test suite to alert us to that fact.

Signed-off-by: Jeff King <peff@peff.net>
Reviewed-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-27 12:36:45 -07:00
Jeff King
f805844676 t5312: create bogus ref as necessary
Some tests in t5312 create an illegally-named ref, and then see how
various operations handle it. But between those operations, we also do
some more setup (e.g., repacking), and we are subtly depending on how
those setup steps react to the illegal ref.

To future-proof us against those behaviors changing, let's instead
create and clean up our bogus ref on demand in the tests that need it.

This has two small extra advantages:

 - the tests are more stand-alone; we do not need an extra test to clean
   up the ref before moving on to other parts of the script

 - the creation and cleanup is together in one helper function. Because
   these depend on touching the refs in the filesystem directly, they
   may need to be tweaked for a world with alternate backends (they have
   not been noticed so far in the reftable work because with a non-file
   backend the tests don't fail; they simply become uninteresting noops
   because the broken ref isn't read at all).

Signed-off-by: Jeff King <peff@peff.net>
Reviewed-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-27 12:36:44 -07:00
Jeff King
2ac0cbc9b0 t5312: drop "verbose" helper
t5312 has several uses of the "verbose" helper, as described in
8ad1652418 (t5304: use helper to report failure of "test foo = bar",
2014-10-10). Back then the "-x" trace option for tests was new, and was
not as pleasant to use (e.g., some tests failed under "-x", we did not
support BASH_XTRACEFD, etc).

These days it is clear that "-x" is the preferred way to get extra
output, and we don't need to mark up individual tests. Let's get rid of
the uses of "verbose" here, as one step toward eradicating it totally.

Signed-off-by: Jeff King <peff@peff.net>
Reviewed-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-27 12:36:44 -07:00
Jeff King
da5e0c6a00 t5600: provide detached HEAD for corruption failures
When checking how git-clone behaves when it fails, we stimulate some
failures by trying to do a clone from a local repository whose objects
have been removed. Because these clones use local optimizations, there's
a subtle dependency in how the corruption is handled on the sending
side.

If upload-pack does not show us the broken refs (which it does not
currently), then we see only HEAD (which is itself broken), and clone
that as a detached HEAD. When we try to write the ref, we notice that we
never got the object and bail.

But if upload-pack _does_ show us the broken refs (which it may in a
future patch), then we'll realize that HEAD is a symref and just write
that. You'd think we'd fail when writing out the refs themselves, but we
don't; we do a bulk write and skip the connectivity check because of our
--local optimizations. For the non-bare case, we do notice the problem
when we try to checkout. But for a bare repository, we unexpectedly
complete the clone successfully!

At first glance this may seem like a bug. But the whole point of those
local optimizations is to give up some safety for speed. If you want to
be careful, you should be using "--no-local", which would notice that
the pack did not transfer sufficient objects. We could do that in these
tests, but part of the point is for them to fail at specific moments
(and indeed, we have a later test that checks for transport failure).

However, we can make this less subtle and future-proof it against
changes on the upload-pack side by just having an explicit detached
HEAD in the corrupted repo. Now we'll fail as expected during the ref
write if any ref _or_ HEAD is corrupt, whether we're --bare or not.

Signed-off-by: Jeff King <peff@peff.net>
Reviewed-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-27 12:36:44 -07:00
Jeff King
e9de7a52a5 t5516: don't use HEAD ref for invalid ref-deletion tests
A few tests in t5516 want to assert that we can delete a corrupted ref
whose pointed-to object is missing. They do so by using the "main"
branch, which is also pointed to by HEAD.

This does work, but only because of a subtle assumption about the
implementation. We do not block the deletion because of the invalid ref,
but we _also_ do not notice that the deleted branch is pointed to by
HEAD. And so the safety rule of "do not allow HEAD to be deleted in a
non-bare repository" does not kick in, and the test passes.

Let's instead use a non-HEAD branch. That still tests what we care about
here (deleting a corrupt ref), but without implicitly depending on our
failure to notice that we're deleting HEAD. That will future proof the
test against that behavior changing.

Signed-off-by: Jeff King <peff@peff.net>
Reviewed-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-27 12:36:44 -07:00
Jeff King
b4724242fa t7900: clean up some more broken refs
The "incremental-repack task" test replaces the object directory with a
known state. As a result, some of our refs point to objects that are not
included in that state.

Commit 3cf5f221be (t7900: clean up some broken refs, 2021-01-19) cleaned
up some of those (that were causing warnings to stderr from the
maintenance process). But there are a few more that were missed. These
aren't hurting anything for now, but it's certainly an unexpected state
to leave the test repository in, and it will become a problem if repack
ever gets more picky about broken refs.

Let's clean up those additional refs (which are all in refs/remotes,
with nothing there that isn't broken), and add an extra "for-each-ref"
call to assert that we've got everything.

Signed-off-by: Jeff King <peff@peff.net>
Reviewed-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-27 12:36:44 -07:00
Ævar Arnfjörð Bjarmason
3e8084f188 http: check CURLE_SSL_PINNEDPUBKEYNOTMATCH when emitting errors
Change the error shown when a http.pinnedPubKey doesn't match to point
the http.pinnedPubKey variable added in aeff8a6121 (http: implement
public key pinning, 2016-02-15), e.g.:

    git -c http.pinnedPubKey=sha256/someNonMatchingKey ls-remote https://github.com/git/git.git
    fatal: unable to access 'https://github.com/git/git.git/' with http.pinnedPubkey configuration: SSL: public key does not match pinned public key!

Before this we'd emit the exact same thing without the " with
http.pinnedPubkey configuration". The advantage of doing this is that
we're going to get a translated message (everything after the ":" is
hardcoded in English in libcurl), and we've got a reference to the
git-specific configuration variable that's causing the error.

Unfortunately we can't test this easily, as there are no tests that
require https:// in the test suite, and t/lib-httpd.sh doesn't know
how to set up such tests. See [1] for the start of a discussion about
what it would take to have divergent "t/lib-httpd/apache.conf" test
setups. #leftoverbits

1. https://lore.kernel.org/git/YUonS1uoZlZEt+Yd@coredump.intra.peff.net/

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-27 10:58:07 -07:00
Andrzej Hunt
44d2aec6e8 connect: also update offset for features without values
parse_feature_value() takes an offset, and uses it to seek past the
point in features_list that we've already seen. However if the feature
being searched for does not specify a value, the offset is not
updated. Therefore if we call parse_feature_value() in a loop on a
value-less feature, we'll keep on parsing the same feature over and over
again. This usually isn't an issue: there's no point in using
next_server_feature_value() to search for repeated instances of the same
capability unless that capability typically specifies a value - but a
broken server could send a response that omits the value for a feature
even when we are expecting a value.

Therefore we add an offset update calculation for the no-value case,
which helps ensure that loops using next_server_feature_value() will
always terminate.

next_server_feature_value(), and the offset calculation, were first
added in 2.28 in 2c6a403d96 (connect: add function to parse multiple
v1 capability values, 2020-05-25).

Thanks to Peff for authoring the test.

Co-authored-by: Jeff King <peff@peff.net>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Andrzej Hunt <andrzej@ahunt.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-27 10:34:41 -07:00
Ævar Arnfjörð Bjarmason
cfe853e66b hook-list.h: add a generated list of hooks, like config-list.h
Make githooks(5) the source of truth for what hooks git supports, and
punt out early on hooks we don't know about in find_hook(). This
ensures that the documentation and the C code's idea about existing
hooks doesn't diverge.

We still have Perl and Python code running its own hooks, but that'll
be addressed by Emily Shaffer's upcoming "git hook run" command.

This resolves a long-standing TODO item in bugreport.c of there being
no centralized listing of hooks, and fixes a bug with the bugreport
listing only knowing about 1/4 of the p4 hooks. It didn't know about
the recent "reference-transaction" hook either.

We could make the find_hook() function die() or BUG() out if the new
known_hook() returned 0, but let's make it return NULL just as it does
when it can't find a hook of a known type. Making it die() is overly
anal, and unlikely to be what we need in catching stupid typos in the
name of some new hook hardcoded in git.git's sources. By making this
be tolerant of unknown hook names, changes in a later series to make
"git hook run" run arbitrary user-configured hook names will be easier
to implement.

I have not been able to directly test the CMake change being made
here. Since 4c2c38e800 (ci: modification of main.yml to use cmake for
vs-build job, 2020-06-26) some of the Windows CI has a hard dependency
on CMake, this change works there, and is to my eyes an obviously
correct use of a pattern established in previous CMake changes,
namely:

 - 061c2240b1 (Introduce CMake support for configuring Git,
    2020-06-12)
 - 709df95b78 (help: move list_config_help to builtin/help,
    2020-04-16)
 - 976aaedca0 (msvc: add a Makefile target to pre-generate the Visual
   Studio solution, 2019-07-29)

The LC_ALL=C is needed because at least in my locale the dash ("-") is
ignored for the purposes of sorting, which results in a different
order. I'm not aware of anything in git that has a hard dependency on
the order, but e.g. the bugreport output would end up using whatever
locale was in effect when git was compiled.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Helped-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-27 09:44:54 -07:00
Ævar Arnfjörð Bjarmason
07a348e746 hook.c users: use "hook_exists()" instead of "find_hook()"
Use the new hook_exists() function instead of find_hook() where the
latter was called in boolean contexts. This make subsequent changes in
a series where we further refactor the hook API clearer, as we won't
conflate wanting to get the path of the hook with checking for its
existence.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-27 09:44:54 -07:00
Emily Shaffer
330155ed8a hook.c: add a hook_exists() wrapper and use it in bugreport.c
Add a boolean version of the find_hook() function for those callers
who are only interested in checking whether the hook exists, not what
the path to it is.

Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-27 09:44:54 -07:00
Ævar Arnfjörð Bjarmason
5e3aba33da hook.[ch]: move find_hook() from run-command.c to hook.c
Move the find_hook() function from run-command.c to a new hook.c
library. This change establishes a stub library that's pretty
pointless right now, but will see much wider use with Emily Shaffer's
upcoming "configuration-based hooks" series.

Eventually all the hook related code will live in hook.[ch]. Let's
start that process by moving the simple find_hook() function over
as-is.

Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-27 09:44:54 -07:00
Johannes Sixt
d2c470f9bc lazyload.h: fix warnings about mismatching function pointer types
Here, GCC warns about every use of the INIT_PROC_ADDR macro, for example:

In file included from compat/mingw.c:8:
compat/mingw.c: In function 'mingw_strftime':
compat/win32/lazyload.h:38:12: warning: assignment to
   'size_t (*)(char *, size_t,  const char *, const struct tm *)'
   {aka 'long long unsigned int (*)(char *, long long unsigned int,
      const char *, const struct tm *)'} from incompatible pointer type
   'FARPROC' {aka 'long long int (*)()'} [-Wincompatible-pointer-types]
   38 |  (function = get_proc_addr(&proc_addr_##function))
      |            ^
compat/mingw.c:1014:6: note: in expansion of macro 'INIT_PROC_ADDR'
 1014 |  if (INIT_PROC_ADDR(strftime))
      |      ^~~~~~~~~~~~~~

(message wrapped for convenience). Insert a cast to keep the compiler
happy. A cast is fine in these cases because they are generic function
pointer values that have been looked up in a DLL.

Helped-by: Carlo Marcelo Arenas Belón <carenas@gmail.com>
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-27 09:31:59 -07:00
Derrick Stolee
ca267aee15 t3705: test that 'sparse_entry' is unstaged
The tests in t3705-add-sparse-checkout.sh check to see how 'git add'
behaves with paths outside the sparse-checkout definition. These
currently check to see if a given warning is present but not that the
index is not updated with the sparse entries. Add a new
'test_sparse_entry_unstaged' helper to be sure 'git add' is behaving
correctly.

We need to modify setup_sparse_entry to actually commit the sparse_entry
file so it exists at HEAD and as an entry in the index, but its exact
contents are not staged in the index.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-24 11:43:56 -07:00
Elijah Newren
446cc5544a t2500: add various tests for nuking untracked files
Noting that unpack_trees treats reset=1 & update=1 as license to nuke
untracked files, I looked for code paths that use this combination and
tried to generate testcases which demonstrated unintentional loss of
untracked files and directories.  I found several.

I also include testcases for `git reset --{hard,merge,keep}`.  A hard
reset is perhaps the most direct test of unpack_tree's reset=1 behavior,
but we cannot make `git reset --hard` preserve untracked files without
some migration work.

Also, the two commands `checkout --force` (because of the --force) and
`read-tree --reset` (because it's plumbing and we need to keep it
backward compatible) were left out as we expect those to continue
removing untracked files and directories.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-24 09:24:25 -07:00
René Scharfe
8c6b4332b4 packfile: release bad_objects in close_pack()
Unusable entries of a damaged pack file are recorded in the oidset
bad_objects.  Release it when we're done with the pack.

This doesn't affect intact packs because an empty oidset requires
no allocation.

Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-24 09:22:46 -07:00
Phillip Wood
2b88fe0603 rebase: fix todo-list rereading
54fd3243da ("rebase -i: reread the todo list if `exec` touched it",
2017-04-26) sought to reread the todo list after running an exec
command only if it had been changed. To accomplish this it checks the
stat data of the todo list after running an exec command to see if it
has changed. Unfortunately there are two problems, firstly the
implementation is buggy we actually reread the list after each exec
which is quadratic in the number of commit lookups and secondly the
design is predicated on using nanosecond time stamps which are not the
default.

The implementation bug stems from the fact that we write a new todo
list to disk before running each command but do not update the stat
data to reflect this[1].

The design problem is that it is possible for the user to edit the
todo list without changing its size or inode which means we have to
rely on the mtime to tell us if it has changed. Unfortunately unless
git is built with USE_NSEC it is possible for the original and edited
list to share the same mtime.

Ideally "git rebase --edit-todo" would set a flag that we would then
check in sequencer.c. Unfortunately this is approach will not work as
there are scripts in the wild that write to the todo list directly
without running "git rebase --edit-todo". Instead of relying on stat
data this patch simply reads the possibly edited todo list and
compares it to the original with memcmp(). This is much faster than
reparsing the todo list each time. This patch reduces the time to run

   git rebase -r -xtrue v2.32.0~100 v2.32.0

which runs 419 exec commands by 6.6%. For comparison fixing the
implementation bug in stat based approach reduces the time by a
further 1.4% and is indistinguishable from never rereading the todo
list.

[1] https://lore.kernel.org/git/20191125131833.GD23183@szeder.dev/

Reported-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-24 08:56:28 -07:00
Phillip Wood
dfa8bae5a2 sequencer.c: factor out a function
This code is heavily indented and obscures the high level logic within
the loop. Let's move it to its own function before modifying it in the
next commit. Note that there is a subtle change in behavior if the
todo list cannot be reread. Previously todo_list->current was
incremented before returning, now it returns immediately.

Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-24 08:56:27 -07:00
Eric Wong
0d0d8d8a11 doc/technical: update note about core.multiPackIndex
MIDX files are used by default since commit 18e449f86b
(midx: enable core.multiPackIndex by default, 2020-09-25)

Helped-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Eric Wong <e@80x24.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-24 08:39:53 -07:00
Ævar Arnfjörð Bjarmason
f53df0bdf6 Makefile: remove an out-of-date comment
This comment added in dfea575017 (Makefile: lazily compute header
dependencies, 2010-01-26) has been out of date since
92b88eba9f (Makefile: use `git ls-files` to list header files, if
possible, 2019-03-04), when we did exactly what it tells us not to do
and added $(GENERATED_H) to $(OBJECTS) dependencies.

The rest of it was also somewhere between inaccurate and outdated,
since as of b8ba629264 (Makefile: fold MISC_H into LIB_H, 2012-06-20)
it's not followed by a list of header files, that got moved earlier in
the file into LIB_H in 60d24dd255 (Makefile: fold XDIFF_H and VCSSVN_H
into LIB_H, 2012-07-06).

Let's just remove it entirely, to the extent that we have anything
useful to say here the comment on the
"USE_COMPUTED_HEADER_DEPENDENCIES" variable a few lines above this
change does the job for us.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-23 15:06:47 -07:00
Ævar Arnfjörð Bjarmason
7c81295382 Makefile: don't perform "mv $@+ $@" dance for $(GENERATED_H)
Change the "cmd.sh > $@+ && mv $@+ $@" pattern used for generating the
config-list.h and command-list.h to just "cmd.sh >$@". This was needed
as a guard to ensure that we don't have an empty file if the script
failed, but since 7b76d6bf22 (Makefile: add and use the
".DELETE_ON_ERROR" flag, 2021-06-29) GNU make ensures that doesn't
happen.

There's still a lot of other places in the Makefile where we
needlessly use this pattern, but I'm just changing these because I'm
about to add a new $(GENERATED_H) target, let's have them all look and
act the same way.

Even with ".DELETE_ON_ERROR" there is still a point to using the "mv
$@+ $@" pattern in some cases, e.g. to ensure that you have a working
binary during recompilation (see [1] for the start of a long
discussion about that), but that doesn't apply here. Nothing external
uses $(GENERATED_H) directly, it's only ever used in the context of
the Makefile's own dependency (re-)generation.

1. https://lore.kernel.org/git/8735t93h0u.fsf@evledraar.gmail.com/

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-23 15:06:47 -07:00
Ævar Arnfjörð Bjarmason
7c3c0a99cc Makefile: stop hardcoding {command,config}-list.h
Change various places that hardcode the names of these two files to
refer to either $(GENERATED_H), or to a new generated-hdrs
target. That target is consistent with the *-objs targets I recently
added in 029bac01a8 (Makefile: add {program,xdiff,test,git,fuzz}-objs
& objects targets, 2021-02-23).

A subsequent commit will add a new generated hook-list.h. By doing
this refactoring we'll only need to add the new file to the
GENERATED_H variable, not EXCEPT_HDRS, the vcbuild/README etc.

Hardcoding command-list.h there seems to have been a case of
copy/paste programming in 976aaedca0 (msvc: add a Makefile target to
pre-generate the Visual Studio solution, 2019-07-29). The
config-list.h was added later in 709df95b78 (help: move
list_config_help to builtin/help, 2020-04-16).

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-23 15:06:47 -07:00
Ævar Arnfjörð Bjarmason
ea47e59fe3 Makefile: mark "check" target as .PHONY
Fix a bug in 44c9e8594e (Fix up header file dependencies and add
sparse checking rules, 2005-07-03), we never marked the phony "check"
target as such.

Perhaps we should just remove it, since as of a combination of
912f9980d2 (Makefile: help people who run 'make check' by mistake,
2008-11-11) 0bcd9ae85d (sparse: Fix errors due to missing
target-specific variables, 2011-04-21) we've been suggesting the user
run "make sparse" directly.

But under that mode it still does something, as well as directing the
user to run "make test" under non-sparse. So let's punt that and
narrowly fix the PHONY bug.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-23 15:06:47 -07:00
Ævar Arnfjörð Bjarmason
f188160be9 bundle: remove ignored & undocumented "--verbose" flag
In 73c3253d75 (bundle: framework for options before bundle file,
2019-11-10) the "git bundle" command was refactored to use
parse_options(). In that refactoring it started understanding the
"--verbose" flag before the subcommand, e.g.:

    git bundle --verbose verify --quiet

However, nothing ever did anything with this "verbose" variable, and
the change wasn't documented. It appears to have been something that
escaped the lab, and wasn't flagged by reviewers at the time. Let's
just remove it.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Reviewed-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-23 15:03:48 -07:00
Junio C Hamano
ddb1055343 The eighth batch
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-23 13:45:03 -07:00
Junio C Hamano
b1b065ee35 Merge branch 'rs/use-xopen-in-index-pack'
Code clean-up.

* rs/use-xopen-in-index-pack:
  index-pack: use xopen in init_thread
2021-09-23 13:44:50 -07:00
Junio C Hamano
d1e376d2f9 Merge branch 'kz/revindex-comment-fix'
Header comment fix.

* kz/revindex-comment-fix:
  pack-revindex.h: correct the time complexity descriptions
2021-09-23 13:44:49 -07:00
Junio C Hamano
50eb005eb3 Merge branch 'cb/plug-leaks-in-alloca-emu-users'
Leakfix.

* cb/plug-leaks-in-alloca-emu-users:
  t0000: avoid masking git exit value through pipes
  tree-diff: fix leak when not HAVE_ALLOCA_H
2021-09-23 13:44:49 -07:00
Junio C Hamano
f7511fdfbd Merge branch 'jt/submodule-name-to-gitdir'
Code refactoring.

* jt/submodule-name-to-gitdir:
  submodule: extract path to submodule gitdir func
2021-09-23 13:44:49 -07:00