Commit Graph

67677 Commits

Author SHA1 Message Date
Lessley Dennington
f2fc531585 osx-keychain: fix compiler warning
Update git-credential-osxkeychain.c to remove 'format string is not a string
literal (potentially insecure)' compiler warning by treating the string as
an argument.
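
The kind of change described above can be sketched as follows. This is a minimal illustration, not the code from git-credential-osxkeychain.c; the helper name format_message() is hypothetical.

```c
#include <stdio.h>

/*
 * Hypothetical helper (not taken from git-credential-osxkeychain.c):
 * copy a message that is not a string literal into a buffer.
 */
static int format_message(char *out, size_t outlen, const char *msg)
{
	/*
	 * Risky form: snprintf(out, outlen, msg);
	 * Any '%' inside msg would be treated as a conversion
	 * specifier, which is what the compiler warning is about.
	 *
	 * Safe form: the format is a literal, msg is only an argument.
	 */
	return snprintf(out, outlen, "%s", msg);
}
```

With the literal "%s" format, a message that itself contains "%s" is copied verbatim instead of being interpreted.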

Signed-off-by: Lessley Dennington <lessleydennington@gmail.com>
Acked-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-07-19 11:25:15 -07:00
Celeste Liu
cc391fc886 contrib/rerere-train: avoid useless gpg sign in training
Users may have configured "git merge" to always require GPG
signing the resulting commits. We are not running "git merge" to
re-create merge commits, but merely to replay merge conflicts,
and we will immediately discard the resulting commits; there
is no point in signing them.

Override such configuration that forces useless signing from the
command line with the "--no-gpg-sign" option.

Signed-off-by: Celeste Liu <coelacanthus@outlook.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-07-19 11:24:08 -07:00
Derrick Stolee
068fa54c00 midx: reduce memory pressure while writing bitmaps
We noticed that some 'git multi-pack-index write --bitmap' processes
were running with very high memory. It turns out that a lot of this
memory is required to store a list of every object in the written
multi-pack-index, with a second copy that has additional information
used for the bitmap writing logic.

Using 'valgrind --tool=massif' before this change, the following chart
shows how memory load increased and was maintained throughout the
process:

    GB
4.102^                                                             ::
     |              @  @::@@::@@::::::::@::::::@@:#:::::::::::::@@:: :
     |         :::::@@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: :
     |      :::: :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: :
     |    :::: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: :
     |    : :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: :
     |    : :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: :
     |   :: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: :
     |   :: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: :
     |   :: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: :
     |   :: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: :
     |   :: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: :
     |   :: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: :
     |   :: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: :
     | @ :: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: :
     | @ :: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: :
     | @::: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: :
     | @::: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: :
     | @::: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: :
     | @::: :: : :: @@:@: @ ::@ ::: ::::@: ::: @@:#:::::: :: : :@ :: :
   0 +--------------------------------------------------------------->

It turns out that the 'struct write_midx_context' data is persisting
through the life of the process, including the 'entries' array. This
array is used last inside find_commits_for_midx_bitmap() within
write_midx_bitmap(). If we free (and nullify) the array at that point,
we can free a decent chunk of memory before the bitmap logic adds more
to the memory footprint.

Here is the massif memory load chart after this change:

    GB
3.111^      #
     |      #                              :::::::::::@::::::::::::::@
     |      #        ::::::::::::::::::::::::: : :: : @:: ::::: :: ::@
     |     @#  :::::::::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@
     |     @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@
     |     @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@
     |     @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@
     |     @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@
     |     @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@
     |     @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@
     |     @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@
     |     @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@
     |     @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@
     |     @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@
     |     @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@
     |     @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@
     |  :::@#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@
     |  :: @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@
     |  :: @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@
     |  :: @#::: ::: :::::: :::: :: : :::::::: : :: : @:: ::::: :: ::@
   0 +--------------------------------------------------------------->

The previous change introduced a refactoring of write_midx_bitmap() to
make it more clear how much of the 'struct write_midx_context' instance
is needed at different parts of the process. In addition, the following
defensive programming measures were put in place:

 1. Using FREE_AND_NULL() we will at least get a segfault from reading a
    NULL pointer instead of a use-after-free.

 2. 'entries_nr' is also set to zero to make any loop that would iterate
    over the entries be trivial.

 3. Add significant comments in write_midx_internal() to add warnings
    for future authors who might accidentally add references to this
    cleared memory.
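
The first two measures can be sketched as below. This is an illustration under assumptions: 'struct demo_context' is a hypothetical, much simplified stand-in for 'struct write_midx_context', and the macro is modeled on the usual free-then-clear shape of FREE_AND_NULL().

```c
#include <stdint.h>
#include <stdlib.h>

/* Free the pointer, then clear it, so a later read hits NULL. */
#define FREE_AND_NULL(p) do { free(p); (p) = NULL; } while (0)

/* Hypothetical stand-in for the real context struct. */
struct demo_context {
	uint32_t *entries;
	size_t entries_nr;
};

/* Release the array once it is no longer needed. */
static void release_entries(struct demo_context *ctx)
{
	FREE_AND_NULL(ctx->entries);
	ctx->entries_nr = 0;	/* loops over the entries become no-ops */
}

/* Example of a loop that stays safe after release_entries(). */
static uint64_t sum_entries(const struct demo_context *ctx)
{
	uint64_t sum = 0;
	size_t i;

	for (i = 0; i < ctx->entries_nr; i++)
		sum += ctx->entries[i];
	return sum;
}
```

After release_entries(), sum_entries() iterates zero times, and any direct dereference of 'entries' would fail loudly on NULL rather than silently read freed memory.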

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-07-19 08:38:17 -07:00
Derrick Stolee
90b2bb710d midx: extract bitmap write setup
The write_midx_bitmap() method is a long method that does a lot of
steps. It requires the write_midx_context struct for use in
prepare_midx_packing_data() and find_commits_for_midx_bitmap(), but
after that only needs the pack_order array.

This is a messy, but completely non-functional refactoring. The code is
only being moved around to reduce visibility of the write_midx_context
during the longest part of computing reachability bitmaps.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-07-19 08:38:17 -07:00
Derrick Stolee
5766524956 pack-bitmap-write: use const for hashes
The next change will use a const array when calling this method. There
is no need for the non-const version, so let's do this cleanup quickly.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-07-19 08:38:17 -07:00
Junio C Hamano
71a8fab31b The fourth batch
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-07-18 13:31:58 -07:00
Junio C Hamano
afbe62d84c Merge branch 'sg/multi-pack-index-parse-options-fix'
The way "git multi-pack-index" uses the parse-options API has been improved.

* sg/multi-pack-index-parse-options-fix:
  multi-pack-index: simplify handling of unknown --options
2022-07-18 13:31:58 -07:00
Junio C Hamano
4af2138417 Merge branch 'bc/nettle-sha256'
Support for libnettle as SHA256 implementation has been added.

* bc/nettle-sha256:
  sha256: add support for Nettle
2022-07-18 13:31:58 -07:00
Junio C Hamano
ba69ae876b Merge branch 'jd/gpg-interface-trust-level-string'
The code to convert between GPG trust level strings and internal
constants we use to represent them has been cleaned up.

* jd/gpg-interface-trust-level-string:
  gpg-interface: add function for converting trust level to string
2022-07-18 13:31:57 -07:00
Junio C Hamano
7f8d098b1b Merge branch 'ab/cocci-unused'
Add Coccinelle rules to detect the pattern of initializing and then
finalizing a structure without using it in between at all, which
happens after code restructuring and which compilers fail to
recognize as an unused variable.

* ab/cocci-unused:
  cocci: generalize "unused" rule to cover more than "strbuf"
  cocci: add and apply a rule to find "unused" strbufs
  cocci: have "coccicheck{,-pending}" depend on "coccicheck-test"
  cocci: add a "coccicheck-test" target and test *.cocci rules
  Makefile & .gitignore: ignore & clean "git.res", not "*.res"
  Makefile: remove mandatory "spatch" arguments from SPATCH_FLAGS
2022-07-18 13:31:57 -07:00
Junio C Hamano
6d003858e5 Merge branch 'gc/submodule-use-super-prefix'
Another step to rewrite more parts of "git submodule" in C.

* gc/submodule-use-super-prefix:
  submodule--helper: remove display path helper
  submodule--helper update: use --super-prefix
  submodule--helper: remove unused SUPPORT_SUPER_PREFIX flags
  submodule--helper: use correct display path helper
  submodule--helper: don't recreate recursive prefix
  submodule--helper update: use display path helper
  submodule--helper tests: add missing "display path" coverage
2022-07-18 13:31:56 -07:00
Junio C Hamano
e3349f2888 Merge branch 'en/merge-dual-dir-renames-fix'
Fixes a long-standing corner case bug around directory renames in
the merge-ort strategy.

* en/merge-dual-dir-renames-fix:
  merge-ort: fix issue with dual rename and add/add conflict
  merge-ort: shuffle the computation and cleanup of potential collisions
  merge-ort: make a separate function for freeing struct collisions
  merge-ort: small cleanups of check_for_directory_rename
  t6423: add tests of dual directory rename plus add/add conflict
2022-07-18 13:31:56 -07:00
Junio C Hamano
3d3874d537 Merge branch 'ab/test-without-templates'
Tweak tests so that they still work when the "git init" template
did not create .git/info directory.

* ab/test-without-templates:
  tests: don't assume a .git/info for .git/info/sparse-checkout
  tests: don't assume a .git/info for .git/info/exclude
  tests: don't assume a .git/info for .git/info/refs
  tests: don't assume a .git/info for .git/info/attributes
  tests: don't assume a .git/info for .git/info/grafts
  tests: don't depend on template-created .git/branches
  t0008: don't rely on default ".git/info/exclude"
2022-07-18 13:31:55 -07:00
Junio C Hamano
48e88a4862 Merge branch 'ab/build-gitweb'
Teach "make all" to build gitweb as well.

* ab/build-gitweb:
  gitweb/Makefile: add a "NO_GITWEB" parameter
  Makefile: build 'gitweb' in the default target
  gitweb/Makefile: include in top-level Makefile
  gitweb: remove "test" and "test-installed" targets
  gitweb/Makefile: prepare to merge into top-level Makefile
  gitweb/Makefile: clear up and de-duplicate the gitweb.{css,js} vars
  gitweb/Makefile: add a $(GITWEB_ALL) variable
  gitweb/Makefile: define all .PHONY prerequisites inline
2022-07-18 13:31:55 -07:00
Junio C Hamano
f63ac61fbf Merge branch 'ab/test-tool-leakfix'
Plug various memory leaks in test-tool commands.

* ab/test-tool-leakfix:
  test-tool delta: fix a memory leak
  test-tool ref-store: fix a memory leak
  test-tool bloom: fix memory leaks
  test-tool json-writer: fix memory leaks
  test-tool regex: call regfree(), fix memory leaks
  test-tool urlmatch-normalization: fix a memory leak
  test-tool {dump,scrap}-cache-tree: fix memory leaks
  test-tool path-utils: fix a memory leak
  test-tool test-hash: fix a memory leak
2022-07-18 13:31:54 -07:00
Junio C Hamano
44357f64f6 Merge branch 'ab/leakfix'
Plug various memory leaks.

* ab/leakfix:
  pull: fix a "struct oid_array" memory leak
  cat-file: fix a common "struct object_context" memory leak
  gc: fix a memory leak
  checkout: avoid "struct unpack_trees_options" leak
  merge-file: fix memory leaks on error path
  merge-file: refactor for subsequent memory leak fix
  cat-file: fix a memory leak in --batch-command mode
  revert: free "struct replay_opts" members
  submodule.c: free() memory from xgetcwd()
  clone: fix memory leak in wanted_peer_refs()
  check-ref-format: fix trivial memory leak
2022-07-18 13:31:54 -07:00
Junio C Hamano
f01315ef7d Merge branch 'jc/builtin-mv-move-array'
Apply Coccinelle rule to turn raw memmove() into MOVE_ARRAY() cpp
macro, which would improve maintainability and readability.

* jc/builtin-mv-move-array:
  builtin/mv.c: use the MOVE_ARRAY() macro instead of memmove()
2022-07-18 13:31:53 -07:00
Junio C Hamano
2c1439231a Merge branch 'fr/vimdiff-layout-fix'
Recent update to vimdiff layout code has been made more robust
against different end-user vim settings.

* fr/vimdiff-layout-fix:
  vimdiff: make layout engine more robust against user vim settings
2022-07-18 13:31:53 -07:00
Siddharth Asthana
ec031da9f9 cat-file: add mailmap support
git-cat-file is used by tools like GitLab to get commit and tag contents
that are then displayed to users. This content, which has author,
committer or tagger information, could benefit from passing through the
mailmap mechanism before being sent or displayed.

This patch adds --[no-]use-mailmap command line option to the git
cat-file command. It also adds --[no-]mailmap option as an alias to
--[no-]use-mailmap.

This patch also introduces new test cases to test the mailmap mechanism in
git cat-file command.

Mentored-by: Christian Couder <christian.couder@gmail.com>
Mentored-by: John Cai <johncai86@gmail.com>
Helped-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Helped-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Signed-off-by: Siddharth Asthana <siddharthasthana31@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-07-18 12:55:53 -07:00
Siddharth Asthana
66a8a95315 ident: rename commit_rewrite_person() to apply_mailmap_to_header()
commit_rewrite_person() takes a commit buffer and replaces the idents
in the header with their canonical versions using the mailmap mechanism.
The name "commit_rewrite_person()" is misleading as it doesn't convey
what kind of rewrite we are going to do to the buffer. It also doesn't
clearly mention that the function will limit itself to the header part
of the buffer. The new name, "apply_mailmap_to_header()", expresses the
functionality of the function pretty clearly.

We intend to use apply_mailmap_to_header() in git-cat-file to replace
idents in the headers of commit and tag object buffers. So, we will be
extending this function to take tag object buffers as well and replace
idents on the tagger header using the mailmap mechanism.

Mentored-by: Christian Couder <christian.couder@gmail.com>
Mentored-by: John Cai <johncai86@gmail.com>
Signed-off-by: Siddharth Asthana <siddharthasthana31@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-07-18 12:55:53 -07:00
Siddharth Asthana
dc88e349a2 ident: move commit_rewrite_person() to ident.c
commit_rewrite_person() and rewrite_ident_line() are static functions
defined in revision.c.

Their usages are as follows:
- commit_rewrite_person() takes a commit buffer and replaces the author
  and committer idents with their canonical versions using the mailmap
  mechanism
- rewrite_ident_line() takes author/committer header lines from the
  commit buffer and replaces the idents with their canonical versions
  using the mailmap mechanism.

This patch moves commit_rewrite_person() and rewrite_ident_line() to
ident.c which contains many other functions related to idents like
split_ident_line(). By moving commit_rewrite_person() to ident.c, we
also intend to use it in git-cat-file to replace committer and author
idents from the headers to their canonical versions using the mailmap
mechanism. The function is moved as is for now to make it clear that
there are no other changes, but it will be renamed in a following
commit.

Mentored-by: Christian Couder <christian.couder@gmail.com>
Mentored-by: John Cai <johncai86@gmail.com>
Signed-off-by: Siddharth Asthana <siddharthasthana31@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-07-18 12:55:53 -07:00
Siddharth Asthana
e9c1b0e38c revision: improve commit_rewrite_person()
The function, commit_rewrite_person(), is designed to find and replace
an ident string in the header part, and the way it avoids a random
occurrence of "author A U Thor <author@example.com>" in the text is by
insisting that "author" appear at the beginning of a line by passing
"\nauthor " as "what".

The implementation also doesn't make any effort to limit itself to the
commit header by locating the blank line that appears after the header
part and stopping the search there. Also, the interface forces the
caller to make multiple calls if it wants to rewrite idents on multiple
headers. It shouldn't be the case.

To support the existing caller better, update commit_rewrite_person()
to:
- Make a single pass in the input buffer to locate headers named
  "author" and "committer" and replace idents on them.
- Stop at the end of the header, ensuring that nothing in the body of
  the commit object is modified.
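
The single-pass, header-only scan described by the two points above can be sketched as follows. This is a hypothetical illustration, not the implementation in this patch: count_ident_headers() only counts the matching header lines where a real version would rewrite the idents.

```c
#include <string.h>

/*
 * Hypothetical sketch: walk a commit buffer line by line, look at
 * "author" and "committer" headers, and stop at the blank line that
 * ends the header so the body is never touched.
 */
static int count_ident_headers(const char *buf)
{
	int found = 0;

	/* A line that starts with '\n' is the blank line after the header. */
	while (*buf && *buf != '\n') {
		const char *eol = strchr(buf, '\n');

		if (!strncmp(buf, "author ", 7) ||
		    !strncmp(buf, "committer ", 10))
			found++;	/* a real version would rewrite the ident here */

		if (!eol)
			break;		/* unterminated last line */
		buf = eol + 1;
	}
	return found;
}
```

Because the loop ends at the first empty line, a line in the commit body that happens to begin with "author " is never examined.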

The return type of the function commit_rewrite_person() has also been
changed from int to void. This has been done because the caller of the
function doesn't do anything with the return value of the function.

By simplifying the interface of the commit_rewrite_person(), we also
intend to expose it as a public function. We will also be renaming the
function in a future commit to a different name which clearly tells that
the function replaces idents in the header of the commit buffer.

Mentored-by: Christian Couder <christian.couder@gmail.com>
Mentored-by: John Cai <johncai86@gmail.com>
Helped-by: Đoàn Trần Công Danh <congdanhqx@gmail.com>
Helped-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Signed-off-by: Siddharth Asthana <siddharthasthana31@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-07-18 12:55:53 -07:00
Teng Long
5dcee7c705 pack-bitmap.c: continue looping when first MIDX bitmap is found
In "open_midx_bitmap()", we loop over the MIDX(es) in the repo, but as
soon as the first one has been found, we break out of the loop with a
direct "return".

But actually, it's better to continue the loop until we have visited
both the MIDX in our repository, as well as any alternates (along with
_their_ alternates, recursively).

The reason for this is that there may be more than one MIDX file in
a repo. The "multi_pack_index" struct is actually designed as a singly
linked list, and if a MIDX file has already been opened successfully,
then the other MIDX files will be skipped with the warning
"ignoring extra bitmap file." written to the output.

Link to the community discussion:

  https://public-inbox.org/git/YjzCTLLDCby+kJrZ@nand.local/

Helped-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Teng Long <dyroneteng@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-07-18 11:20:52 -07:00
Teng Long
9005eb021a pack-bitmap.c: using error() instead of silently returning -1
In "open_pack_bitmap_1()" and "open_midx_bitmap_1()", it's better to
return error() instead of "-1" when some unexpected error occurs like
"stat bitmap file failed", "bitmap header is invalid" or "checksum
mismatch", etc.

There are places where we do not make this replacement, such as when
the bitmap does not exist (a repository without a bitmap is allowed) or
when another bitmap has already been opened (in which case it should
be a warning rather than an error).
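
The "return error() instead of -1" idiom can be sketched as below. Both functions are hypothetical: demo_error() is a stand-in for an error()-style helper that reports a message and returns -1, and check_header() is an invented caller, not a function from pack-bitmap.c.

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical error()-style helper: report the problem, return -1. */
static int demo_error(const char *fmt, ...)
{
	va_list ap;

	fputs("error: ", stderr);
	va_start(ap, fmt);
	vfprintf(stderr, fmt, ap);
	va_end(ap);
	fputc('\n', stderr);
	return -1;
}

/* Hypothetical caller: the failure is reported where it is detected. */
static int check_header(const char *magic)
{
	if (strcmp(magic, "BITM"))
		return demo_error("bitmap header is invalid: '%s'", magic);
	return 0;
}
```

The caller still sees -1 on failure, so existing return-value checks keep working, but the user now gets a message explaining what went wrong.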

Signed-off-by: Teng Long <dyroneteng@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-07-18 11:20:52 -07:00
Teng Long
6411cc08f3 pack-bitmap.c: do not ignore error when opening a bitmap file
Calls to git_open() to open the pack bitmap file and
multi-pack bitmap file do not report any error when they
fail.  These files are optional and it is not an error if
open failed due to ENOENT, but we shouldn't be ignoring
other kinds of errors.
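
A minimal sketch of that policy, assuming a POSIX system. It uses plain open() rather than git_open(), and the helper name open_optional() is hypothetical.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: open an optional file. A missing file
 * (ENOENT) is silently accepted; any other failure is reported.
 * Returns the file descriptor, or -1 if the file could not be opened.
 */
static int open_optional(const char *path)
{
	int fd = open(path, O_RDONLY);

	if (fd < 0 && errno != ENOENT)
		fprintf(stderr, "warning: cannot open '%s': %s\n",
			path, strerror(errno));
	return fd;
}
```

A permission problem or an I/O error would therefore be visible to the user, while a repository that simply has no bitmap stays quiet.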

Signed-off-by: Teng Long <dyroneteng@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-07-18 11:20:52 -07:00
Teng Long
349c26ff29 pack-bitmap.c: rename "idx_name" to "bitmap_name"
In "open_pack_bitmap_1()" and "open_midx_bitmap_1()" we use
a var named "idx_name" to represent the bitmap filename which
is computed by "midx_bitmap_filename()" or "pack_bitmap_filename()"
before we open it.

This "idx_name" naming may cause some confusion, because it might
lead us to think of ".idx" or "multi-pack-index" files. Although a
bitmap can essentially be understood as a kind of index, let's make
this name a little more accurate here.

Signed-off-by: Teng Long <dyroneteng@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-07-18 11:20:52 -07:00
Teng Long
9975975d7f pack-bitmap.c: mark more strings for translations
In pack-bitmap.c, some printed texts are translated, some are not.
Let's support the translations of the bitmap related output.

Signed-off-by: Teng Long <dyroneteng@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-07-18 11:20:52 -07:00
Teng Long
baf20c39a7 pack-bitmap.c: fix formatting of error messages
There are some text output issues in 'pack-bitmap.c'; they exist in
die(), error() etc. These include issues with capitalization of the
first letter, newlines, error() used instead of BUG(), and substitutions
that don't have quotes around them.

Signed-off-by: Teng Long <dyroneteng@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-07-18 11:20:52 -07:00
Victoria Dye
72d3a5da32 scalar: convert README.md into a technical design doc
Adapt the content from 'contrib/scalar/README.md' into a design document in
'Documentation/technical/'. In addition to reformatting for asciidoc,
elaborate on the background, purpose, and design choices that went into
Scalar.

Most of this document will persist in the 'Documentation/technical/' after
Scalar has been moved out of 'contrib/' and into the root of Git. Until that
time, it will also contain a temporary "Roadmap" section detailing the
remaining series needed to finish the initial version of Scalar. The section
will be removed once Scalar is moved to the repo root, but in the meantime
serves as a guide for readers to keep up with progress on the feature.

Signed-off-by: Victoria Dye <vdye@github.com>
Acked-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-07-18 11:03:56 -07:00
Victoria Dye
f22c95db53 scalar: reword command documentation to clarify purpose
Rephrase documentation to describe scalar as a "large repo management tool"
rather than an "opinionated management tool". The new description is
intended to more directly reflect the utility of scalar to better guide
users in preparation for scalar being built and installed as part of Git.

Signed-off-by: Victoria Dye <vdye@github.com>
Acked-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-07-18 11:03:56 -07:00
Martin Ågren
a700395eaf t4200: drop irrelevant code
While setting up an unresolved merge for `git rerere`, we run `git
rev-parse` and `git fmt-merge-msg` to create a variable `$fifth` and a
commit-message file `msg`, which we then never actually use. This has
been like that since these tests were added in 672d1b789b ("rerere:
migrate to parse-options API", 2010-08-05). This does exercise `git
rev-parse` and `git fmt-merge-msg`, but doesn't contribute to testing
`git rerere`. Drop these lines.

Reported-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-07-18 11:01:54 -07:00
Ævar Arnfjörð Bjarmason
3a251bac0d trace2: only include "fsync" events if we git_fsync()
Fix the overly verbose trace2 logging added in 9a4987677d (trace2:
add stats for fsync operations, 2022-03-30) (first released with
v2.36.0).

Since that change every single "git" command invocation has included
these "data" events, even though we'll only make use of these with
core.fsyncMethod=batch, and even then only have non-zero values if
we're writing object data to disk. See c0f4752ed2 (core.fsyncmethod:
batched disk flushes for loose-objects, 2022-04-04) for that feature.

As we're needing to indent the trace2_data_intmax() lines let's
introduce helper variables to ensure that our resulting lines (which
were already too long) don't exceed the recommendations of the
CodingGuidelines. Doing that requires either wrapping them twice, or
introducing short throwaway variable names, let's do the latter.

The result was that e.g. "git version" would previously emit a total
of 6 trace2 events with the GIT_TRACE2_EVENT target (version, start,
cmd_ancestry, cmd_name, exit, atexit), but afterwards would emit
8. We'd emit 2 "data" events before the "exit" event.

The reason we didn't catch this was that the trace2 unit tests added
in a15860dca3 (trace2: t/helper/test-trace2, t0210.sh, t0211.sh,
t0212.sh, 2019-02-22) would omit any "data" events that weren't the
ones it cared about. Before this change to the C code 6/7 of our
"t/t0212-trace2-event.sh" tests would fail if this change was applied
to "t/t0212/parse_events.perl".

Let's make the trace2 testing more strict, and further append any new
event types we don't know about in "t/t0212/parse_events.perl". Since
we only invoke the "test-tool trace2" there's no guarantee that we'll
catch other overly verbose events in the future, but we'll at least
notice if we start emitting new events that are issued every time we
log anything with trace2's JSON target.

We exclude the "data_json" event type, as we would otherwise fail on
both "win test" and "win+VS test" CI due to the logging added in
353d3d77f4 (trace2: collect Windows-specific process information,
2019-02-22). It looks like that logging should really be using
trace2_cmd_ancestry() instead, which was introduced later in
2f732bf15e (tr2: log parent process name, 2021-07-21), but let's
leave it for now.

The fix-up to aaf81223f4 (unpack-objects: use stream_loose_object()
to unpack large objects, 2022-06-11) is needed because we're changing
the behavior of these events as discussed above. Since we'd always
emit a "hardware-flush" event the test added in aaf81223f4 wasn't
testing anything except that this trace2 data was unconditionally
logged. Even if "core.fsyncMethod" wasn't set to "batch" we'd pass the
test.

Now we'll check the expected number of "writeout" vs. "flush" calls
under "core.fsyncMethod=batch", but note that this doesn't actually
test if we carried out the sync using that method. On a platform where
we'd have to fall back to fsync(), each of those "writeout" would
really be a "flush" (i.e. a full fsync()).

But in this case what we're testing is that the logic in
"unpack-objects" behaves as expected, not the OS-specific question of
whether we actually were able to use the "bulk" method.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-07-18 09:41:57 -07:00
Martin Ågren
ae436f283c config/core.txt: fix minor issues for core.sparseCheckoutCone
The sparse checkout feature can be used in "cone mode" or "non-cone
mode". In this one instance in the documentation, we refer to the latter
as "non cone mode" with whitespace rather than a hyphen. Align this with
the rest of our documentation.

A few words later in the same paragraph, there's mention of "a more
flexible patterns". Drop that leading "a" to fix the grammar.

Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Acked-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-07-18 09:39:20 -07:00
SZEDER Gábor
a10f6e2bda index-format.txt: remove outdated list of supported extensions
The first section of 'Documentation/technical/index-format.txt'
mentions that "Git currently supports cache tree and resolve undo
extensions", but then goes on, and in the "Extensions" section
describes not only these two, but six other extensions [1].

Remove this sentence, as it's misleading about the status of all those
other extensions.

Alternatively we could keep that sentence and update the list of
extensions, but that might well lead to a recurring issue, because
apparently this list is never updated when a new index extension is
added.

[1] Split index, untracked cache, FS monitor cache, end of index
    entry, index entry offset table and sparse directory entries.

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-07-18 09:24:43 -07:00
René Scharfe
0f1eb7d6e9 mergesort: remove llist_mergesort()
Now that all of its callers are gone, remove llist_mergesort().

Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-07-17 15:20:39 -07:00
René Scharfe
9b9f5f6217 packfile: use DEFINE_LIST_SORT
Build a typed sort function for packed_git lists using DEFINE_LIST_SORT
instead of calling llist_mergesort().  This gets rid of the next pointer
accessor functions and their calling overhead at the cost of slightly
increased object text size.

Before:
__TEXT	__DATA	__OBJC	others	dec	hex
20218	320	0	110936	131474	20192	packfile.o

With this patch:
__TEXT	__DATA	__OBJC	others	dec	hex
20430	320	0	112619	133369	208f9	packfile.o

Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-07-17 15:20:39 -07:00
René Scharfe
6fc9fec07b fetch-pack: use DEFINE_LIST_SORT
Build a static typed ref sorting function using DEFINE_LIST_SORT along
with a typed comparison function near its only two callers instead of
having an exported version that calls llist_mergesort().  This gets rid
of the next pointer accessor functions and their calling overhead at the
cost of a slightly increased object text size.

Before:
__TEXT	__DATA	__OBJC	others	dec	hex
23231	389	0	113689	137309	2185d	fetch-pack.o
29158	80	0	146864	176102	2afe6	remote.o

With this patch:
__TEXT	__DATA	__OBJC	others	dec	hex
23591	389	0	117759	141739	229ab	fetch-pack.o
29070	80	0	145718	174868	2ab14	remote.o

Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-07-17 15:20:39 -07:00
René Scharfe
c0fb5774a6 commit: use DEFINE_LIST_SORT
Use DEFINE_LIST_SORT to build a typed sort function for commit_list
entries instead of calling llist_mergesort().  This gets rid of the next
pointer accessor functions and their calling overhead at the cost of a
slightly increased object text size.

Before:
__TEXT	__DATA	__OBJC	others	dec	hex
18795	92	0	104654	123541	1e295	commit.o

With this patch:
__TEXT	__DATA	__OBJC	others	dec	hex
18963	92	0	106094	125149	1e8dd	commit.o

Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-07-17 15:20:39 -07:00
René Scharfe
47c30f7daa blame: use DEFINE_LIST_SORT
Build a typed sort function for blame entries using DEFINE_LIST_SORT
instead of calling llist_mergesort().  This gets rid of the next pointer
accessor functions and their calling overhead at the cost of a slightly
increased object text size.

Before:
__TEXT	__DATA	__OBJC	others	dec	hex
24621	56	0	147515	172192	2a0a0	blame.o

With this patch:
__TEXT	__DATA	__OBJC	others	dec	hex
25229	56	0	151702	176987	2b35b	blame.o

Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-07-17 15:20:38 -07:00
René Scharfe
b378c2ff1e test-mergesort: use DEFINE_LIST_SORT
Build a typed sort function for the mergesort performance test tool
using DEFINE_LIST_SORT instead of calling llist_mergesort().  This gets
rid of the next pointer accessor functions and improves the performance
at the cost of a slightly higher object text size.

Before:
0071.12: llist_mergesort() unsorted    0.24(0.22+0.01)
0071.14: llist_mergesort() sorted      0.12(0.10+0.01)
0071.16: llist_mergesort() reversed    0.12(0.10+0.01)

__TEXT	__DATA	__OBJC	others	dec	hex
6407	276	0	24701	31384	7a98	t/helper/test-mergesort.o

With this patch:
0071.12: DEFINE_LIST_SORT unsorted     0.22(0.21+0.01)
0071.14: DEFINE_LIST_SORT sorted       0.11(0.10+0.01)
0071.16: DEFINE_LIST_SORT reversed     0.11(0.10+0.01)

__TEXT	__DATA	__OBJC	others	dec	hex
6615	276	0	25832	32723	7fd3	t/helper/test-mergesort.o

Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-07-17 15:20:38 -07:00
René Scharfe
f00a039839 test-mergesort: use DEFINE_LIST_SORT_DEBUG
Define a typed sort function using DEFINE_LIST_SORT_DEBUG for the
mergesort sanity check instead of using llist_mergesort().  This gets
rid of the next pointer accessor functions and improves the performance
at the cost of slightly bigger object text.

Before:
Benchmark 1: t/helper/test-tool mergesort test
  Time (mean ± σ):     108.4 ms ±   0.2 ms    [User: 106.7 ms, System: 1.2 ms]
  Range (min … max):   108.0 ms … 108.8 ms    27 runs

__TEXT	__DATA	__OBJC	others	dec	hex
6251	276	0	23172	29699	7403	t/helper/test-mergesort.o

With this patch:
Benchmark 1: t/helper/test-tool mergesort test
  Time (mean ± σ):      94.0 ms ±   0.2 ms    [User: 92.4 ms, System: 1.1 ms]
  Range (min … max):    93.7 ms …  94.5 ms    31 runs

__TEXT	__DATA	__OBJC	others	dec	hex
6407	276	0	24701	31384	7a98	t/helper/test-mergesort.o

Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-07-17 15:20:38 -07:00
René Scharfe
318051eaeb mergesort: add macros for typed sort of linked lists
Add the macros DECLARE_LIST_SORT and DEFINE_LIST_SORT for building
type-specific functions for sorting linked lists.  The generated
function expects a typed comparison function.

The programmer provides full type information (no void pointers).  This
allows the compiler to check whether the comparison function matches the
list type.  It can also inline the "next" pointer accessor functions and
even the comparison function to get rid of the calling overhead.

Also provide a DECLARE_LIST_SORT_DEBUG macro that allows executing
custom code whenever the accessor functions are used.  It's intended to
be used by test-mergesort, which counts these operations.
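
To illustrate the idea of generating a type-specific list sort from a macro, here is a minimal sketch. It is not the macro added by this commit: the name SKETCH_DEFINE_LIST_SORT, the struct item type, item_cmp() and sort_items() are all hypothetical, and the sketch uses a simple top-down merge sort rather than whatever algorithm mergesort.h implements. It only shows how full type information lets the compiler check the comparison function and access the "next" member directly.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical macro (an assumption for illustration, not Git's
 * DEFINE_LIST_SORT): generates a typed, stable merge sort for a
 * singly linked list whose link member is called next_member.
 */
#define SKETCH_DEFINE_LIST_SORT(name, elemtype, next_member) \
static elemtype *name##_merge(elemtype *a, elemtype *b, \
		int (*cmp)(const elemtype *, const elemtype *)) \
{ \
	elemtype *head = NULL, **tail = &head; \
	while (a && b) { \
		/* "<=" takes from the left list on ties, for stability. */ \
		if (cmp(a, b) <= 0) { \
			*tail = a; \
			a = a->next_member; \
		} else { \
			*tail = b; \
			b = b->next_member; \
		} \
		tail = &(*tail)->next_member; \
	} \
	*tail = a ? a : b; \
	return head; \
} \
static void name(elemtype **list, \
		int (*cmp)(const elemtype *, const elemtype *)) \
{ \
	elemtype *slow, *fast, *second; \
	assert(list && cmp); \
	slow = *list; \
	if (!slow || !slow->next_member) \
		return; \
	/* Find the middle with a slow and a fast pointer, then split. */ \
	fast = slow->next_member; \
	while (fast && fast->next_member) { \
		slow = slow->next_member; \
		fast = fast->next_member->next_member; \
	} \
	second = slow->next_member; \
	slow->next_member = NULL; \
	name(list, cmp); \
	name(&second, cmp); \
	*list = name##_merge(*list, second, cmp); \
}

/* Hypothetical list type and typed comparison function. */
struct item {
	int key;
	struct item *next;
};

static int item_cmp(const struct item *a, const struct item *b)
{
	return (a->key > b->key) - (a->key < b->key);
}

/* Expands to sort_items_merge() and sort_items(). */
SKETCH_DEFINE_LIST_SORT(sort_items, struct item, next)
```

A caller would pass the address of its list head, e.g. sort_items(&head, item_cmp). Passing a comparison function written for a different element type is then a compile-time diagnostic instead of a silent void pointer mismatch.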

Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-07-17 15:20:38 -07:00
René Scharfe
848afebe56 mergesort: tighten merge loop
llist_merge() has special inner loops for taking elements from either of
the two lists to merge.  That helps to consistently prefer one over the
other, for stability.  Merge the loops, swap the lists when the other
one has the next element for the result and keep track of which one to
prefer on equality.  This results in shorter code and object text:

Before:
__TEXT	__DATA	__OBJC	others	dec	hex
412	0	0	3441	3853	f0d	mergesort.o

With this patch:
__TEXT	__DATA	__OBJC	others	dec	hex
352	0	0	3516	3868	f1c	mergesort.o

Performance doesn't get worse:

Before:
0071.12: llist_mergesort() unsorted    0.24(0.22+0.01)
0071.14: llist_mergesort() sorted      0.12(0.10+0.01)
0071.16: llist_mergesort() reversed    0.12(0.10+0.01)

Benchmark 1: t/helper/test-tool mergesort test
  Time (mean ± σ):     109.2 ms ±   0.2 ms    [User: 107.5 ms, System: 1.1 ms]
  Range (min … max):   108.9 ms … 109.6 ms    27 runs

With this patch:
0071.12: llist_mergesort() unsorted    0.24(0.22+0.01)
0071.14: llist_mergesort() sorted      0.12(0.10+0.01)
0071.16: llist_mergesort() reversed    0.12(0.10+0.01)

Benchmark 1: t/helper/test-tool mergesort test
  Time (mean ± σ):     108.4 ms ±   0.2 ms    [User: 106.7 ms, System: 1.2 ms]
  Range (min … max):   108.0 ms … 108.8 ms    27 runs
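
The single merged loop described in this commit message can be sketched roughly as follows. This is a typed illustration under assumptions, not the patched llist_merge(): struct node, cmp_key() and merge_sketch() are hypothetical names, and the real function works on void pointers with accessor callbacks. The flag prefer_list records whether the list currently being consumed is the one that wins ties.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical element type for the sketch. */
struct node {
	int key;
	struct node *next;
};

static int cmp_key(const struct node *a, const struct node *b)
{
	return (a->key > b->key) - (a->key < b->key);
}

/*
 * Stable merge with one inner loop.  Both inputs must be non-empty
 * sorted lists; on equal keys, elements of `list` come first.
 */
static struct node *merge_sketch(struct node *list, struct node *other)
{
	struct node *result, *tail, *tmp;
	int prefer_list;

	assert(list && other);

	/* 1 while `list` is the input that wins ties, 0 otherwise. */
	prefer_list = cmp_key(list, other) <= 0;
	if (!prefer_list) {
		tmp = list;
		list = other;
		other = tmp;
	}
	result = list;

	for (;;) {
		/*
		 * Keep taking from `list` while it provides the next
		 * element: "<= 0" if it wins ties, "< 0" if it does not.
		 */
		do {
			tail = list;
			list = list->next;
			if (!list) {
				tail->next = other;
				return result;
			}
		} while (cmp_key(list, other) < prefer_list);

		/* The other list has the next element: link it and swap. */
		tail->next = other;
		prefer_list ^= 1;
		tmp = list;
		list = other;
		other = tmp;
	}
}
```

Swapping the two list pointers means only one copy of the inner loop is needed, and the tie-breaking direction flips together with the swap, which is what keeps the merge stable.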

Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-07-17 15:20:38 -07:00
René Scharfe
7a3775eeb4 mergesort: unify ranks loops
llist_mergesort() has a loop for adding a new element to the ranks array
and another one for rolling up said array into a single sorted list at
the end.  We can merge them, so that adding the last element rolls up
the whole array.  Handle the empty list before the main loop now because
list can't be NULL anymore inside the loop.
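
As a rough sketch of the unified loop, assuming a ranks array in which slot i holds either nothing or a sorted run of 2^i elements: struct node, merge() and sort_sketch() below are hypothetical names, and the typed element access stands in for the accessor callbacks that llist_mergesort() actually uses.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical element type for the sketch. */
struct node {
	int key;
	struct node *next;
};

/* Plain stable merge of two non-empty sorted lists; ties favor `a`. */
static struct node *merge(struct node *a, struct node *b)
{
	struct node *head = NULL, **tail = &head;

	assert(a && b);
	while (a && b) {
		if (a->key <= b->key) {
			*tail = a;
			a = a->next;
		} else {
			*tail = b;
			b = b->next;
		}
		tail = &(*tail)->next;
	}
	*tail = a ? a : b;
	return head;
}

static struct node *sort_sketch(struct node *list)
{
	/* ranks[i] is either unused or a sorted run of 2^i elements. */
	struct node *ranks[sizeof(size_t) * CHAR_BIT];
	/* Number of elements parked in ranks; bit i set means ranks[i] is used. */
	size_t n = 0;

	/* The empty list is handled up front, so `list` is never NULL below. */
	if (!list)
		return NULL;

	for (;;) {
		size_t i, m;
		struct node *next = list->next;

		list->next = NULL;
		for (i = 0, m = n;; i++, m >>= 1) {
			if (m & 1)
				list = merge(ranks[i], list);
			else if (next)
				break;		/* free slot: park the run here */
			else if (!m)
				return list;	/* last element: all runs rolled up */
		}
		n++;
		ranks[i] = list;
		list = next;
	}
}
```

When the last element is added, the inner loop no longer stops at the first free slot but keeps walking up the array until no occupied slot remains, so the final roll-up needs no separate loop.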

The result is shorter code and significantly less object text:

main:
__TEXT	__DATA	__OBJC	others	dec	hex
652	0	0	4651	5303	14b7	mergesort.o

With this patch:
__TEXT	__DATA	__OBJC	others	dec	hex
412	0	0	3441	3853	f0d	mergesort.o

Why is the change so big?  The reduction is amplified by llist_merge()
being inlined both before and after.

Performance stays basically the same:

main:
0071.12: llist_mergesort() unsorted    0.24(0.22+0.01)
0071.14: llist_mergesort() sorted      0.12(0.10+0.01)
0071.16: llist_mergesort() reversed    0.12(0.10+0.01)

Benchmark 1: t/helper/test-tool mergesort test
  Time (mean ± σ):     109.0 ms ±   0.3 ms    [User: 107.4 ms, System: 1.1 ms]
  Range (min … max):   108.7 ms … 109.6 ms    27 runs

With this patch:
0071.12: llist_mergesort() unsorted    0.24(0.22+0.01)
0071.14: llist_mergesort() sorted      0.12(0.10+0.01)
0071.16: llist_mergesort() reversed    0.12(0.10+0.01)

Benchmark 1: t/helper/test-tool mergesort test
  Time (mean ± σ):     109.2 ms ±   0.2 ms    [User: 107.5 ms, System: 1.1 ms]
  Range (min … max):   108.9 ms … 109.6 ms    27 runs

Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-07-17 15:20:38 -07:00
Manuel Boni
07aed58017 config.txt: document include, includeIf
Git config's tab completion does not yet know about the "include"
and "includeIf" sections, nor the related "path" variable.

Add a description for these two sections in
'Documentation/config/includeif.txt', which points to git-config's
documentation, specifically the "Includes" and "Conditional Includes"
subsections.

As a side effect, tab completion can successfully complete the
'include', 'includeIf', and 'include.add' expressions.
This effect is tested by two new ad-hoc tests.
Variable completion only works for "include" for now.

Credit for the ideas behind this patch goes to
Ævar Arnfjörð Bjarmason.

Helped-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Manuel Boni <ziosombrero@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-07-17 14:23:42 -07:00
Taylor Blau
9550f6c16a commit-graph: fix corrupt upgrade from generation v1 to v2
The previous commit demonstrates a bug where a commit-graph using
generation v2 could enter a state where one of the GDA2 values has its
most-significant bit set (indicating that its value should be read from
the extended offset table in the GDO2 chunk) without having a GDO2 chunk
to read from.

This results in the following error message being displayed to the
caller:

    fatal: commit-graph requires overflow generation data but has none

This bug arises in the following scenario:

  - We decide to write a commit-graph using generation number v2, and
    decide (correctly) that no GDO2 chunk is necessary (e.g., because
    all of the committer date offsets are no larger than 2^31-1).

  - The v2 generation numbers are stored in the `->generation` member of
    the commit slab holding `struct commit_graph_data`'s.

  - Later on, `load_commit_graph_info()` is called, overwriting the
    v2 generation data in the aforementioned slab with any existing v1
    generation data.

Then, when the commit-graph code goes to write the GDA2 chunk via
`write_graph_chunk_generation_data()`, we use the overwritten generation
v1 data in a place where we expect to use a v2 generation number:

    offset = commit_graph_data_at(c)->generation - c->date;

...because `commit_graph_data_at(c)->generation` used to hold the v2
generation data, but it was overwritten to contain the v1 generation
number via `load_commit_graph_info()`.

If the `offset` computation above overflows the v2 generation number
max, then `write_graph_chunk_generation_data()` will update its count of
large offsets and write the marker accordingly:

    if (offset > GENERATION_NUMBER_V2_OFFSET_MAX) {
        offset = CORRECTED_COMMIT_DATE_OFFSET_OVERFLOW | num_generation_data_overflows;
        num_generation_data_overflows++;
    }

and reads will look for the GDO2 chunk containing the overflowing v2
generation number, *after* the commit-graph code decided that no such
chunk was necessary.
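
The arithmetic behind this failure can be reproduced in isolation. The sketch below is an illustration under assumptions, not the commit-graph writer: gda2_entry() is a hypothetical helper, the two SKETCH_ constants are assumed to mirror GENERATION_NUMBER_V2_OFFSET_MAX and CORRECTED_COMMIT_DATE_OFFSET_OVERFLOW, and 64-bit unsigned integers stand in for the timestamp type. With a v2 generation number (a corrected commit date, never smaller than the committer date) the offset is small. With a v1 generation number (a small topological level) the unsigned subtraction wraps around and trips the overflow branch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed values; check the commit-graph sources for the real ones. */
#define SKETCH_OFFSET_MAX	((1ULL << 31) - 1)
#define SKETCH_OFFSET_OVERFLOW	(1ULL << 31)

/*
 * Hypothetical helper: computes the 32-bit value stored for one commit
 * and counts how many entries were redirected to the overflow table.
 */
static uint32_t gda2_entry(uint64_t generation, uint64_t date,
			   uint32_t *num_overflows)
{
	uint64_t offset;

	assert(num_overflows != NULL);

	/* Unsigned arithmetic: wraps to a huge value if generation < date. */
	offset = generation - date;
	if (offset > SKETCH_OFFSET_MAX) {
		/* Most-significant bit set: "look in the overflow table". */
		offset = SKETCH_OFFSET_OVERFLOW | *num_overflows;
		(*num_overflows)++;
	}
	return (uint32_t)offset;
}
```

For example, gda2_entry(1600000005, 1600000000, &count) yields 5 and leaves count untouched, while gda2_entry(2, 1600000000, &count) yields a value with the top bit set and increments count, even though the writer had already decided that no overflow chunk was needed.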

The main problem is that the slab containing `struct commit_graph_data`
has a dual purpose. It is used to hold data that we are about to write
to disk while generating a commit-graph, as well as hold data that was
read from an existing commit-graph.

When the two mix, namely when the result of reading the commit-graph has
a side-effect that mixes poorly with an in-progress commit-graph write,
we end up with corrupt data.

A complete fix might be to introduce a new slab that is used exclusively
for writing, and gate access between the two slabs based on context
provided by the caller (e.g., whether this computation is part of a
"read" or "write" operation).

But a more minimal fix addresses the only known path which overwrites
the slab data, which is `compute_bloom_filters()` ->
`get_or_compute_bloom_filter()` -> `load_commit_graph_info()` ->
`fill_commit_graph_info()` by avoiding the last call which clobbers the
data altogether.

This path only needs to learn the graph position of a given commit so
that it can be used in `load_bloom_filter_from_graph()`. By replacing
the last steps of the above with one that records the graph position
into a temporary variable which is then used to load the existing Bloom
data, we eliminate the clobbering, removing the corruption.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-07-15 16:51:39 -07:00
Taylor Blau
7805360b7a commit-graph: introduce repo_find_commit_pos_in_graph()
Low-level callers in systems that are adjacent to the commit-graph (like
the changed-path Bloom filter code) could benefit from being able to
call a function like `parse_commit_in_graph()` without modifying the
corresponding commit slab data.

This is useful in contexts where that slab data is being used to prepare
for an upcoming commit-graph write, where Git must be careful to avoid
clobbering any of that data during a read operation.

Introduce a low-level variant of `parse_commit_in_graph()` which returns
the graph position of a given commit only, without modifying any of the
slab data.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-07-15 16:51:39 -07:00
Taylor Blau
2dd804cd12 t5318: demonstrate commit-graph generation v2 corruption
When upgrading a commit-graph using generation v1 to one using
generation v2, it is possible to force Git into a corrupt state where it
(incorrectly) believes that a GDO2 chunk is necessary, *after* deciding
not to write one.

This makes subsequent reads using the commit-graph produce the following
error message:

    fatal: commit-graph requires overflow generation data but has none

Demonstrate this bug by increasing our test coverage to include a
minimal example of upgrading a commit-graph from generation v1 to v2.
The only notable components of this test are:

  - The committer date of the commit is chosen carefully so that the
    offset underflows when computed using a v1 generation number, but
    would not overflow when using v2 generation numbers.

  - The upgrade to generation number v2 must read in the v1 generation
    numbers, which we can do by passing `--changed-paths`, which will
    force the commit-graph internals to call `fill_commit_graph_info()`.

A future patch will squash this bug.

Reported-by: Jeff King <peff@peff.net>
Reproduced-by: Will Chandler <wfc@wfchandler.org>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-07-15 16:51:38 -07:00
René Scharfe
ae25974de3 mingw: avoid mktemp() in mkstemp() implementation
The implementation of mkstemp() for MinGW uses mktemp() and open()
without the flag O_EXCL, which is racy.  It's not a security problem
for now because all of its callers only create files within the
repository (incl. worktrees).  Replace it with a call to our more
secure internal function, git_mkstemp_mode(), to prevent possible
future issues.
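
For context on why O_EXCL matters here: generating a name first and opening it afterwards leaves a window in which another process can create the same path. The sketch below shows the race-free pattern on a POSIX system. It is not git_mkstemp_mode() and makes no claim about how that function is written; mkstemp_sketch() is a hypothetical helper, and rand() is used only for brevity where real code would want a better source of randomness.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper: replaces the trailing "XXXXXX" of `tmpl` with
 * random characters and creates the file atomically.  Returns an open
 * file descriptor, or -1 with errno set.
 */
static int mkstemp_sketch(char *tmpl, mode_t mode)
{
	static const char letters[] = "abcdefghijklmnopqrstuvwxyz0123456789";
	size_t len;
	char *suffix;
	int attempt, i, fd;

	assert(tmpl != NULL);
	len = strlen(tmpl);
	if (len < 6 || strcmp(tmpl + len - 6, "XXXXXX")) {
		errno = EINVAL;
		return -1;
	}
	suffix = tmpl + len - 6;

	for (attempt = 0; attempt < 100; attempt++) {
		for (i = 0; i < 6; i++)
			suffix[i] = letters[rand() % (sizeof(letters) - 1)];
		/*
		 * O_CREAT | O_EXCL makes open() fail with EEXIST if the
		 * path already exists, so checking and creating happen in
		 * one step and nobody can slip in between.
		 */
		fd = open(tmpl, O_CREAT | O_EXCL | O_RDWR, mode);
		if (fd >= 0)
			return fd;
		if (errno != EEXIST)
			return -1;
	}
	errno = EEXIST;
	return -1;
}
```

A caller would pass a writable buffer such as char path[] = "/tmp/example_XXXXXX" and close and unlink the file when done.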

Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-07-14 22:45:05 -07:00
Taylor Blau
a92d8523ce commit-graph: pass repo_settings instead of repository
The parse_commit_graph() function takes a 'struct repository *' pointer,
but it only ever accesses config settings (either directly or through
the .settings field of the repo struct). Move all relevant config
settings into the repo_settings struct, and update parse_commit_graph()
and its existing callers so that it takes 'struct repo_settings *'
instead.

Callers of parse_commit_graph() will now need to call
prepare_repo_settings() themselves, or initialize a 'struct
repo_settings' directly.

Prior to ab14d0676c (commit-graph: pass a 'struct repository *' in more
places, 2020-09-09), parsing a commit-graph was a pure function
depending only on the contents of the commit-graph itself. Commit
ab14d0676c introduced a dependency on a `struct repository` pointer, and
later commits such as b66d84756f (commit-graph: respect
'commitGraph.readChangedPaths', 2020-09-09) added dependencies on config
settings, which were accessed through the `settings` field of the
repository pointer. This field was initialized via a call to
`prepare_repo_settings()`.

Additionally, this fixes an issue in fuzz-commit-graph: In 44c7e62
(repo-settings:prepare_repo_settings only in git repos, 2021-12-06),
prepare_repo_settings() was changed to issue a BUG() if it is called by a
process whose CWD is not a Git repository.

The combination of commits mentioned above broke fuzz-commit-graph,
which attempts to parse arbitrary fuzzing-engine-provided bytes as a
commit graph file. Prior to this change, parse_commit_graph() called
prepare_repo_settings(), but since we run the fuzz tests without a valid
repository, we are hitting the BUG() from 44c7e62 for every test case.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Josh Steadmon <steadmon@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-07-14 15:42:17 -07:00