git-commit-vandalism

Author	SHA1	Message	Date
Siddharth Asthana	e9c1b0e38c	revision: improve commit_rewrite_person() The function, commit_rewrite_person(), is designed to find and replace an ident string in the header part, and the way it avoids a random occurrence of "author A U Thor <author@example.com" in the text is by insisting "author" to appear at the beginning of line by passing "\nauthor " as "what". The implementation also doesn't make any effort to limit itself to the commit header by locating the blank line that appears after the header part and stopping the search there. Also, the interface forces the caller to make multiple calls if it wants to rewrite idents on multiple headers. It shouldn't be the case. To support the existing caller better, update commit_rewrite_person() to: - Make a single pass in the input buffer to locate headers named "author" and "committer" and replace idents on them. - Stop at the end of the header, ensuring that nothing in the body of the commit object is modified. The return type of the function commit_rewrite_person() has also been changed from int to void. This has been done because the caller of the function doesn't do anything with the return value of the function. By simplifying the interface of the commit_rewrite_person(), we also intend to expose it as a public function. We will also be renaming the function in a future commit to a different name which clearly tells that the function replaces idents in the header of the commit buffer. Mentored-by: Christian Couder <christian.couder@gmail.com> Mentored-by: John Cai <johncai86@gmail.com> Helped-by: Đoàn Trần Công Danh <congdanhqx@gmail.com> Helped-by: Johannes Schindelin <Johannes.Schindelin@gmx.de> Signed-off-by: Siddharth Asthana <siddharthasthana31@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-07-18 12:55:53 -07:00
Teng Long	5dcee7c705	pack-bitmap.c: continue looping when first MIDX bitmap is found In "open_midx_bitmap()", we do a loop with the MIDX(es) in repo, when the first one has been found, then will break out by a "return" directly. But actually, it's better to continue the loop until we have visited both the MIDX in our repository, as well as any alternates (along with _their_ alternates, recursively). The reason for this is, there may exist more than one MIDX file in a repo. The "multi_pack_index" struct is actually designed as a singly linked list, and if a MIDX file has been already opened successfully, then the other MIDX files will be skipped and left with a warning "ignoring extra bitmap file." to the output. The discussion link of community: https://public-inbox.org/git/YjzCTLLDCby+kJrZ@nand.local/ Helped-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Teng Long <dyroneteng@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-07-18 11:20:52 -07:00
Teng Long	9005eb021a	pack-bitmap.c: using error() instead of silently returning -1 In "open_pack_bitmap_1()" and "open_midx_bitmap_1()", it's better to return error() instead of "-1" when some unexpected error occurs like "stat bitmap file failed", "bitmap header is invalid" or "checksum mismatch", etc. There are places where we do not replace, such as when the bitmap does not exist (no bitmap in repository is allowed) or when another bitmap has already been opened (in which case it should be a warning rather than an error). Signed-off-by: Teng Long <dyroneteng@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-07-18 11:20:52 -07:00
Teng Long	6411cc08f3	pack-bitmap.c: do not ignore error when opening a bitmap file Calls to git_open() to open the pack bitmap file and multi-pack bitmap file do not report any error when they fail. These files are optional and it is not an error if open failed due to ENOENT, but we shouldn't be ignoring other kinds of errors. Signed-off-by: Teng Long <dyroneteng@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-07-18 11:20:52 -07:00
Teng Long	349c26ff29	pack-bitmap.c: rename "idx_name" to "bitmap_name" In "open_pack_bitmap_1()" and "open_midx_bitmap_1()" we use a var named "idx_name" to represent the bitmap filename which is computed by "midx_bitmap_filename()" or "pack_bitmap_filename()" before we open it. There may bring some confusion in this "idx_name" naming, which might lead us to think of ".idx "or" multi-pack-index" files, although bitmap is essentially can be understood as a kind of index, let's define this name a little more accurate here. Signed-off-by: Teng Long <dyroneteng@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-07-18 11:20:52 -07:00
Teng Long	9975975d7f	pack-bitmap.c: mark more strings for translations In pack-bitmap.c, some printed texts are translated, some are not. Let's support the translations of the bitmap related output. Signed-off-by: Teng Long <dyroneteng@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-07-18 11:20:52 -07:00
Teng Long	baf20c39a7	pack-bitmap.c: fix formatting of error messages There are some text output issues in 'pack-bitmap.c', they exist in die(), error() etc. This includes issues with capitalization the first letter, newlines, error() instead of BUG(), and substitution that don't have quotes around them. Signed-off-by: Teng Long <dyroneteng@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-07-18 11:20:52 -07:00
Victoria Dye	72d3a5da32	scalar: convert README.md into a technical design doc Adapt the content from 'contrib/scalar/README.md' into a design document in 'Documentation/technical/'. In addition to reformatting for asciidoc, elaborate on the background, purpose, and design choices that went into Scalar. Most of this document will persist in the 'Documentation/technical/' after Scalar has been moved out of 'contrib/' and into the root of Git. Until that time, it will also contain a temporary "Roadmap" section detailing the remaining series needed to finish the initial version of Scalar. The section will be removed once Scalar is moved to the repo root, but in the meantime serves as a guide for readers to keep up with progress on the feature. Signed-off-by: Victoria Dye <vdye@github.com> Acked-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-07-18 11:03:56 -07:00
Victoria Dye	f22c95db53	scalar: reword command documentation to clarify purpose Rephrase documentation to describe scalar as a "large repo management tool" rather than an "opinionated management tool". The new description is intended to more directly reflect the utility of scalar to better guide users in preparation for scalar being built and installed as part of Git. Signed-off-by: Victoria Dye <vdye@github.com> Acked-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-07-18 11:03:56 -07:00
Martin Ågren	a700395eaf	t4200: drop irrelevant code While setting up an unresolved merge for `git rerere`, we run `git rev-parse` and `git fmt-merge-msg` to create a variable `$fifth` and a commit-message file `msg`, which we then never actually use. This has been like that since these tests were added in `672d1b789b` ("rerere: migrate to parse-options API", 2010-08-05). This does exercise `git rev-parse` and `git fmt-merge-msg`, but doesn't contribute to testing `git rerere`. Drop these lines. Reported-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Martin Ågren <martin.agren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-07-18 11:01:54 -07:00
Ævar Arnfjörð Bjarmason	3a251bac0d	trace2: only include "fsync" events if we git_fsync() Fix the overly verbose trace2 logging added in `9a4987677d` (trace2: add stats for fsync operations, 2022-03-30) (first released with v2.36.0). Since that change every single "git" command invocation has included these "data" events, even though we'll only make use of these with core.fsyncMethod=batch, and even then only have non-zero values if we're writing object data to disk. See `c0f4752ed2` (core.fsyncmethod: batched disk flushes for loose-objects, 2022-04-04) for that feature. As we're needing to indent the trace2_data_intmax() lines let's introduce helper variables to ensure that our resulting lines (which were already too) don't exceed the recommendations of the CodingGuidelines. Doing that requires either wrapping them twice, or introducing short throwaway variable names, let's do the latter. The result was that e.g. "git version" would previously emit a total of 6 trace2 events with the GIT_TRACE2_EVENT target (version, start, cmd_ancestry, cmd_name, exit, atexit), but afterwards would emit 8. We'd emit 2 "data" events before the "exit" event. The reason we didn't catch this was that the trace2 unit tests added in `a15860dca3` (trace2: t/helper/test-trace2, t0210.sh, t0211.sh, t0212.sh, 2019-02-22) would omit any "data" events that weren't the ones it cared about. Before this change to the C code 6/7 of our "t/t0212-trace2-event.sh" tests would fail if this change was applied to "t/t0212/parse_events.perl". Let's make the trace2 testing more strict, and further append any new events types we don't know about in "t/t0212/parse_events.perl". Since we only invoke the "test-tool trace2" there's no guarantee that we'll catch other overly verbose events in the future, but we'll at least notice if we start emitting new events that are issues every time we log anything with trace2's JSON target. We exclude the "data_json" event type, we'd otherwise would fail on both "win test" and "win+VS test" CI due to the logging added in `353d3d77f4` (trace2: collect Windows-specific process information, 2019-02-22). It looks like that logging should really be using trace2_cmd_ancestry() instead, which was introduced later in `2f732bf15e` (tr2: log parent process name, 2021-07-21), but let's leave it for now. The fix-up to `aaf81223f4` (unpack-objects: use stream_loose_object() to unpack large objects, 2022-06-11) is needed because we're changing the behavior of these events as discussed above. Since we'd always emit a "hardware-flush" event the test added in `aaf81223f4` wasn't testing anything except that this trace2 data was unconditionally logged. Even if "core.fsyncMethod" wasn't set to "batch" we'd pass the test. Now we'll check the expected number of "writeout" v.s. "flush" calls under "core.fsyncMethod=batch", but note that this doesn't actually test if we carried out the sync using that method, on a platform where we'd have to fall back to fsync() each of those "writeout" would really be a "flush" (i.e. a full fsync()). But in this case what we're testing is that the logic in "unpack-objects" behaves as expected, not the OS-specific question of whether we actually were able to use the "bulk" method. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-07-18 09:41:57 -07:00
Martin Ågren	ae436f283c	config/core.txt: fix minor issues for `core.sparseCheckoutCone` The sparse checkout feature can be used in "cone mode" or "non-cone mode". In this one instance in the documentation, we refer to the latter as "non cone mode" with whitespace rather than a hyphen. Align this with the rest of our documentation. A few words later in the same paragraph, there's mention of "a more flexible patterns". Drop that leading "a" to fix the grammar. Signed-off-by: Martin Ågren <martin.agren@gmail.com> Acked-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-07-18 09:39:20 -07:00
SZEDER Gábor	a10f6e2bda	index-format.txt: remove outdated list of supported extensions The first section of 'Documentation/technical/index-format.txt' mentions that "Git currently supports cache tree and resolve undo extensions", but then goes on, and in the "Extensions" section describes not only these two, but six other extensions [1]. Remove this sentence, as it's misleading about the status of all those other extensions. Alternatively we could keep that sentence and update the list of extensions, but that might well lead to a recurring issue, because apparently this list is never updated when a new index extension is added. [1] Split index, untracked cache, FS monitor cache, end of index entry, index entry offset table and sparse directory entries. Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-07-18 09:24:43 -07:00
René Scharfe	0f1eb7d6e9	mergesort: remove llist_mergesort() Now that all of its callers are gone, remove llist_mergesort(). Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-07-17 15:20:39 -07:00
René Scharfe	9b9f5f6217	packfile: use DEFINE_LIST_SORT Build a typed sort function for packed_git lists using DEFINE_LIST_SORT instead of calling llist_mergesort(). This gets rid of the next pointer accessor functions and their calling overhead at the cost of slightly increased object text size. Before: __TEXT __DATA __OBJC others dec hex 20218 320 0 110936 131474 20192 packfile.o With this patch: __TEXT __DATA __OBJC others dec hex 20430 320 0 112619 133369 208f9 packfile.o Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-07-17 15:20:39 -07:00
René Scharfe	6fc9fec07b	fetch-pack: use DEFINE_LIST_SORT Build a static typed ref sorting function using DEFINE_LIST_SORT along with a typed comparison function near its only two callers instead of having an exported version that calls llist_mergesort(). This gets rid of the next pointer accessor functions and their calling overhead at the cost of a slightly increased object text size. Before: __TEXT __DATA __OBJC others dec hex 23231 389 0 113689 137309 2185d fetch-pack.o 29158 80 0 146864 176102 2afe6 remote.o With this patch: __TEXT __DATA __OBJC others dec hex 23591 389 0 117759 141739 229ab fetch-pack.o 29070 80 0 145718 174868 2ab14 remote.o Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-07-17 15:20:39 -07:00
René Scharfe	c0fb5774a6	commit: use DEFINE_LIST_SORT Use DEFINE_LIST_SORT to build a typed sort function for commit_list entries instead of calling llist_mergesort(). This gets rid of the next pointer accessor functions and their calling overhead at the cost of a slightly increased object text size. Before: __TEXT __DATA __OBJC others dec hex 18795 92 0 104654 123541 1e295 commit.o With this patch: __TEXT __DATA __OBJC others dec hex 18963 92 0 106094 125149 1e8dd commit.o Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-07-17 15:20:39 -07:00
René Scharfe	47c30f7daa	blame: use DEFINE_LIST_SORT Build a typed sort function for blame entries using DEFINE_LIST_SORT instead of calling llist_mergesort(). This gets rid of the next pointer accessor functions and their calling overhead at the cost of a slightly increased object text size. Before: __TEXT __DATA __OBJC others dec hex 24621 56 0 147515 172192 2a0a0 blame.o With this patch: __TEXT __DATA __OBJC others dec hex 25229 56 0 151702 176987 2b35b blame.o Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-07-17 15:20:38 -07:00
René Scharfe	b378c2ff1e	test-mergesort: use DEFINE_LIST_SORT Build a typed sort function for the mergesort performance test tool using DEFINE_LIST_SORT instead of calling llist_mergesort(). This gets rid of the next pointer accessor functions and improves the performance at the cost of a slightly higher object text size. Before: 0071.12: llist_mergesort() unsorted 0.24(0.22+0.01) 0071.14: llist_mergesort() sorted 0.12(0.10+0.01) 0071.16: llist_mergesort() reversed 0.12(0.10+0.01) __TEXT __DATA __OBJC others dec hex 6407 276 0 24701 31384 7a98 t/helper/test-mergesort.o With this patch: 0071.12: DEFINE_LIST_SORT unsorted 0.22(0.21+0.01) 0071.14: DEFINE_LIST_SORT sorted 0.11(0.10+0.01) 0071.16: DEFINE_LIST_SORT reversed 0.11(0.10+0.01) __TEXT __DATA __OBJC others dec hex 6615 276 0 25832 32723 7fd3 t/helper/test-mergesort.o Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-07-17 15:20:38 -07:00
René Scharfe	f00a039839	test-mergesort: use DEFINE_LIST_SORT_DEBUG Define a typed sort function using DEFINE_LIST_SORT_DEBUG for the mergesort sanity check instead of using llist_mergesort(). This gets rid of the next pointer accessor functions and improves the performance at the cost of slightly bigger object text. Before: Benchmark 1: t/helper/test-tool mergesort test Time (mean ± σ): 108.4 ms ± 0.2 ms [User: 106.7 ms, System: 1.2 ms] Range (min … max): 108.0 ms … 108.8 ms 27 runs __TEXT __DATA __OBJC others dec hex 6251 276 0 23172 29699 7403 t/helper/test-mergesort.o With this patch: Benchmark 1: t/helper/test-tool mergesort test Time (mean ± σ): 94.0 ms ± 0.2 ms [User: 92.4 ms, System: 1.1 ms] Range (min … max): 93.7 ms … 94.5 ms 31 runs __TEXT __DATA __OBJC others dec hex 6407 276 0 24701 31384 7a98 t/helper/test-mergesort.o Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-07-17 15:20:38 -07:00
René Scharfe	318051eaeb	mergesort: add macros for typed sort of linked lists Add the macros DECLARE_LIST_SORT and DEFINE_LIST_SORT for building type-specific functions for sorting linked lists. The generated function expects a typed comparison function. The programmer provides full type information (no void pointers). This allows the compiler to check whether the comparison function matches the list type. It can also inline the "next" pointer accessor functions and even the comparison function to get rid of the calling overhead. Also provide a DECLARE_LIST_SORT_DEBUG macro that allows executing custom code whenever the accessor functions are used. It's intended to be used by test-mergesort, which counts these operations. Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-07-17 15:20:38 -07:00
René Scharfe	848afebe56	mergesort: tighten merge loop llist_merge() has special inner loops for taking elements from either of the two lists to merge. That helps consistently preferring one over the other, for stability. Merge the loops, swap the lists when the other one has the next element for the result and keep track on which one to prefer on equality. This results in shorter code and object text: Before: __TEXT __DATA __OBJC others dec hex 412 0 0 3441 3853 f0d mergesort.o With this patch: __TEXT __DATA __OBJC others dec hex 352 0 0 3516 3868 f1c mergesort.o Performance doesn't get worse: Before: 0071.12: llist_mergesort() unsorted 0.24(0.22+0.01) 0071.14: llist_mergesort() sorted 0.12(0.10+0.01) 0071.16: llist_mergesort() reversed 0.12(0.10+0.01) Benchmark 1: t/helper/test-tool mergesort test Time (mean ± σ): 109.2 ms ± 0.2 ms [User: 107.5 ms, System: 1.1 ms] Range (min … max): 108.9 ms … 109.6 ms 27 runs With this patch: 0071.12: llist_mergesort() unsorted 0.24(0.22+0.01) 0071.14: llist_mergesort() sorted 0.12(0.10+0.01) 0071.16: llist_mergesort() reversed 0.12(0.10+0.01) Benchmark 1: t/helper/test-tool mergesort test Time (mean ± σ): 108.4 ms ± 0.2 ms [User: 106.7 ms, System: 1.2 ms] Range (min … max): 108.0 ms … 108.8 ms 27 runs Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-07-17 15:20:38 -07:00
René Scharfe	7a3775eeb4	mergesort: unify ranks loops llist_mergesort() has a loop for adding a new element to the ranks array and another one for rolling up said array into a single sorted list at the end. We can merge them, so that adding the last element rolls up the whole array. Handle the empty list before the main loop now because list can't be NULL anymore inside the loop. The result is shorter code and significantly less object text: main: __TEXT __DATA __OBJC others dec hex 652 0 0 4651 5303 14b7 mergesort.o With this patch: __TEXT __DATA __OBJC others dec hex 412 0 0 3441 3853 f0d mergesort.o Why is the change so big? The reduction is amplified by llist_merge() being inlined both before and after. Performance stays basically the same: main: 0071.12: llist_mergesort() unsorted 0.24(0.22+0.01) 0071.14: llist_mergesort() sorted 0.12(0.10+0.01) 0071.16: llist_mergesort() reversed 0.12(0.10+0.01) Benchmark 1: t/helper/test-tool mergesort test Time (mean ± σ): 109.0 ms ± 0.3 ms [User: 107.4 ms, System: 1.1 ms] Range (min … max): 108.7 ms … 109.6 ms 27 runs With this patch: 0071.12: llist_mergesort() unsorted 0.24(0.22+0.01) 0071.14: llist_mergesort() sorted 0.12(0.10+0.01) 0071.16: llist_mergesort() reversed 0.12(0.10+0.01) Benchmark 1: t/helper/test-tool mergesort test Time (mean ± σ): 109.2 ms ± 0.2 ms [User: 107.5 ms, System: 1.1 ms] Range (min … max): 108.9 ms … 109.6 ms 27 runs Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-07-17 15:20:38 -07:00
Manuel Boni	07aed58017	config.txt: document include, includeIf Git config's tab completion does not yet know about the "include" and "includeIf" sections, nor the related "path" variable. Add a description for these two sections in 'Documentation/config/includeif.txt', which points to git-config's documentation, specifically the "Includes" and "Conditional Includes" subsections. As a side effect, tab completion can successfully complete the 'include', 'includeIf', and 'include.add' expressions. This effect is tested by two new ad-hoc tests. Variable completion only works for "include" for now. Credit for the ideas behind this patch goes to Ævar Arnfjörð Bjarmason. Helped-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Manuel Boni <ziosombrero@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-07-17 14:23:42 -07:00
Taylor Blau	9550f6c16a	commit-graph: fix corrupt upgrade from generation v1 to v2 The previous commit demonstrates a bug where a commit-graph using generation v2 could enter a state where one of the GDA2 values has its most-significant bit set (indicating that its value should be read from the extended offset table in the GDO2 chunk) without having a GDO2 chunk to read from. This results in the following error message being displayed to the caller: fatal: commit-graph requires overflow generation data but has none This bug arises in the following scenario: - We decide to write a commit-graph using generation number v2, and decide (correctly) that no GDO2 chunk is necessary (e.g., because all of the commiter date offsets are no larger than 2^31-1). - The v2 generation numbers are stored in the `->generation` member of the commit slab holding `struct commit_graph_data`'s. - Later on, `load_commit_graph_info()` is called, overwriting the v2 generation data in the aforementioned slab with any existing v1 generation data. Then, when the commit-graph code goes to write the GDA2 chunk via `write_graph_chunk_generation_data()`, we use the overwritten generation v1 data in a place where we expect to use a v2 generation number: offset = commit_graph_data_at(c)->generation - c->date; ...because `commit_graph_data_at(c)->generation` used to hold the v2 generation data, but it was overwritten to contain the v1 generation number via `load_commit_graph_info()`. If the `offset` computation above overflows the v2 generation number max, then `write_graph_chunk_generation_data()` will update its count of large offsets and write the marker accordingly: if (offset > GENERATION_NUMBER_V2_OFFSET_MAX) { offset = CORRECTED_COMMIT_DATE_OFFSET_OVERFLOW \| num_generation_data_overflows; num_generation_data_overflows++; } and reads will look for the GDO2 chunk containing the overflowing v2 generation number, after the commit-graph code decided that no such chunk was necessary. The main problem is that the slab containing `struct commit_graph_data` has a dual purpose. It is used to hold data that we are about to write to disk while generating a commit-graph, as well as hold data that was read from an existing commit-graph. When the two mix, namely when the result of reading the commit-graph has a side-effect that mixes poorly with an in-progress commit-graph write, we end up with corrupt data. A complete fix might be to introduce a new slab that is used exclusively for writing, and gate access between the two slabs based on context provided by the caller (e.g., whether this computation is part of a "read" or "write" operation). But a more minimal fix addresses the only known path which overwrites the slab data, which is `compute_bloom_filters()` -> `get_or_compute_bloom_filter()` -> `load_commit_graph_info()` -> `fill_commit_graph_info()` by avoiding the last call which clobbers the data altogether. This path only needs to learn the graph position of a given commit so that it can be used in `load_bloom_filter_from_graph()`. By replacing the last steps of the above with one that records the graph position into a temporary variable which is then used to load the existing Bloom data, we eliminate the clobbering, removing the corruption. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-07-15 16:51:39 -07:00
Taylor Blau	7805360b7a	commit-graph: introduce `repo_find_commit_pos_in_graph()` Low-level callers in systems that are adjacent to the commit-graph (like the changed-path Bloom filter code) could benefit from being able to call a function like `parse_commit_in_graph()` without modifying the corresponding commit slab data. This is useful in contexts where that slab data is being used to prepare for an upcoming commit-graph write, where Git must be careful to avoid clobbering any of that data during a read operation. Introduce a low-level variant of `parse_commit_in_graph()` which returns the graph position of a given commit only, without modifying any of the slab data. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-07-15 16:51:39 -07:00
Taylor Blau	2dd804cd12	t5318: demonstrate commit-graph generation v2 corruption When upgrading a commit-graph using generation v1 to one using generation v2, it is possible to force Git into a corrupt state where it (incorrectly) believes that a GDO2 chunk is necessary, after deciding not to write one. This makes subsequent reads using the commit-graph produce the following error message: fatal: commit-graph requires overflow generation data but has none Demonstrate this bug by increasing our test coverage to include a minimal example of upgrading a commit-graph from generation v1 to v2. The only notable components of this test are: - The committer date of the commit is chosen carefully so that the offset underflows when computed using a v1 generation number, but would not overflow when using v2 generation numbers. - The upgrade to generation number v2 must read in the v1 generation numbers, which we can do by passing `--changed-paths`, which will force the commit-graph internals to call `fill_commit_graph_info()`. A future patch will squash this bug. Reported-by: Jeff King <peff@peff.net> Reproduced-by: Will Chandler <wfc@wfchandler.org> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-07-15 16:51:38 -07:00
René Scharfe	ae25974de3	mingw: avoid mktemp() in mkstemp() implementation The implementation of mkstemp() for MinGW uses mktemp() and open() without the flag O_EXCL, which is racy. It's not a security problem for now because all of its callers only create files within the repository (incl. worktrees). Replace it with a call to our more secure internal function, git_mkstemp_mode(), to prevent possible future issues. Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-07-14 22:45:05 -07:00
Taylor Blau	a92d8523ce	commit-graph: pass repo_settings instead of repository The parse_commit_graph() function takes a 'struct repository ' pointer, but it only ever accesses config settings (either directly or through the .settings field of the repo struct). Move all relevant config settings into the repo_settings struct, and update parse_commit_graph() and its existing callers so that it takes 'struct repo_settings ' instead. Callers of parse_commit_graph() will now need to call prepare_repo_settings() themselves, or initialize a 'struct repo_settings' directly. Prior to `ab14d0676c` (commit-graph: pass a 'struct repository *' in more places, 2020-09-09), parsing a commit-graph was a pure function depending only on the contents of the commit-graph itself. Commit `ab14d0676c` introduced a dependency on a `struct repository` pointer, and later commits such as `b66d84756f` (commit-graph: respect 'commitGraph.readChangedPaths', 2020-09-09) added dependencies on config settings, which were accessed through the `settings` field of the repository pointer. This field was initialized via a call to `prepare_repo_settings()`. Additionally, this fixes an issue in fuzz-commit-graph: In `44c7e62` (2021-12-06, repo-settings:prepare_repo_settings only in git repos), prepare_repo_settings was changed to issue a BUG() if it is called by a process whose CWD is not a Git repository. The combination of commits mentioned above broke fuzz-commit-graph, which attempts to parse arbitrary fuzzing-engine-provided bytes as a commit graph file. Prior to this change, parse_commit_graph() called prepare_repo_settings(), but since we run the fuzz tests without a valid repository, we are hitting the BUG() from `44c7e62` for every test case. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Josh Steadmon <steadmon@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-07-14 15:42:17 -07:00
Glen Choo	8d1a744820	setup.c: create `safe.bareRepository` There is a known social engineering attack that takes advantage of the fact that a working tree can include an entire bare repository, including a config file. A user could run a Git command inside the bare repository thinking that the config file of the 'outer' repository would be used, but in reality, the bare repository's config file (which is attacker-controlled) is used, which may result in arbitrary code execution. See [1] for a fuller description and deeper discussion. A simple mitigation is to forbid bare repositories unless specified via `--git-dir` or `GIT_DIR`. In environments that don't use bare repositories, this would be minimally disruptive. Create a config variable, `safe.bareRepository`, that tells Git whether or not to die() when working with a bare repository. This config is an enum of: - "all": allow all bare repositories (this is the default) - "explicit": only allow bare repositories specified via --git-dir or GIT_DIR. If we want to protect users from such attacks by default, neither value will suffice - "all" provides no protection, but "explicit" is impractical for bare repository users. A more usable default would be to allow only non-embedded bare repositories ([2] contains one such proposal), but detecting if a repository is embedded is potentially non-trivial, so this work is not implemented in this series. [1]: https://lore.kernel.org/git/kl6lsfqpygsj.fsf@chooglen-macbookpro.roam.corp.google.com [2]: https://lore.kernel.org/git/5b969c5e-e802-c447-ad25-6acc0b784582@github.com Signed-off-by: Glen Choo <chooglen@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-07-14 15:08:29 -07:00
Glen Choo	6061601d9f	safe.directory: use git_protected_config() Use git_protected_config() to read `safe.directory` instead of read_very_early_config(), making it 'protected configuration only'. As a result, `safe.directory` now respects "-c", so update the tests and docs accordingly. It used to ignore "-c" due to how it was implemented, not because of security or correctness concerns [1]. [1] https://lore.kernel.org/git/xmqqlevabcsu.fsf@gitster.g/ Signed-off-by: Glen Choo <chooglen@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-07-14 15:08:29 -07:00
Glen Choo	5b3c650777	config: learn `git_protected_config()` `uploadpack.packObjectsHook` is the only 'protected configuration only' variable today, but we've noted that `safe.directory` and the upcoming `safe.bareRepository` should also be 'protected configuration only'. So, for consistency, we'd like to have a single implementation for protected configuration. The primary constraints are: 1. Reading from protected configuration should be fast. Nearly all "git" commands inside a bare repository will read both `safe.directory` and `safe.bareRepository`, so we cannot afford to be slow. 2. Protected configuration must be readable when the gitdir is not known. `safe.directory` and `safe.bareRepository` both affect repository discovery and the gitdir is not known at that point [1]. The chosen implementation in this commit is to read protected configuration and cache the values in a global configset. This is similar to the caching behavior we get with the_repository->config. Introduce git_protected_config(), which reads protected configuration and caches them in the global configset protected_config. Then, refactor `uploadpack.packObjectsHook` to use git_protected_config(). The protected configuration functions are named similarly to their non-protected counterparts, e.g. git_protected_config_check_init() vs git_config_check_init(). In light of constraint 1, this implementation can still be improved. git_protected_config() iterates through every variable in protected_config, which is wasteful, but it makes the conversion simple because it matches existing patterns. We will likely implement constant time lookup functions for protected configuration in a future series (such functions already exist for non-protected configuration, i.e. repo_config_get_*()). An alternative that avoids introducing another configset is to continue to read all config using git_config(), but only accept values that have the correct config scope [2]. This technically fulfills constraint 2, because git_config() simply ignores the local and worktree config when the gitdir is not known. However, this would read incomplete config into the_repository->config, which would need to be reset when the gitdir is known and git_config() needs to read the local and worktree config. Resetting the_repository->config might be reasonable while we only have these 'protected configuration only' variables, but it's not clear whether this extends well to future variables. [1] In this case, we do have a candidate gitdir though, so with a little refactoring, it might be possible to provide a gitdir. [2] This is how `uploadpack.packObjectsHook` was implemented prior to this commit. Signed-off-by: Glen Choo <chooglen@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-07-14 15:08:29 -07:00
Glen Choo	779ea9303a	Documentation: define protected configuration For security reasons, there are config variables that are only trusted when they are specified in certain configuration scopes, which are sometimes referred to on-list as 'protected configuration' [1]. A future commit will introduce another such variable, so let's define our terms so that we can have consistent documentation and implementation. In our documentation, define 'protected configuration' as the system, global and command config scopes. As a shorthand, I will refer to variables that are only respected in protected configuration as 'protected configuration only', but this term is not used in the documentation. This definition of protected configuration is based on whether or not Git can reasonably protect the user by ignoring the configuration scope: - System, global and command line config are considered protected because an attacker who has control over any of those can do plenty of harm without Git, so we gain very little by ignoring those scopes. - On the other hand, local (and similarly, worktree) config are not considered protected because it is relatively easy for an attacker to control local config, e.g.: - On some shared user environments, a non-admin attacker can create a repository high up the directory hierarchy (e.g. C:\.git on Windows), and a user may accidentally use it when their PS1 automatically invokes "git" commands. `safe.directory` prevents attacks of this form by making sure that the user intended to use the shared repository. It obviously shouldn't be read from the repository, because that would end up trusting the repository that Git was supposed to reject. - "git upload-pack" is expected to run in repositories that may not be controlled by the user. We cannot ignore all config in that repository (because "git upload-pack" would fail), but we can limit the risks by ignoring `uploadpack.packObjectsHook`. Only `uploadpack.packObjectsHook` is 'protected configuration only'. The following variables are intentionally excluded: - `safe.directory` should be 'protected configuration only', but it does not technically fit the definition because it is not respected in the "command" scope. A future commit will fix this. - `trace2.` happens to read the same scopes as `safe.directory` because they share an implementation. However, this is not for security reasons; it is because we want to start tracing so early that repository-level config and "-c" are not available [2]. This requirement is unique to `trace2.`, so it does not makes sense for protected configuration to be subject to the same constraints. [1] For example, https://lore.kernel.org/git/6af83767-576b-75c4-c778-0284344a8fe7@github.com/ [2] https://lore.kernel.org/git/a0c89d0d-669e-bf56-25d2-cbb09b012e70@jeffhostetler.com/ Signed-off-by: Glen Choo <chooglen@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-07-14 15:08:29 -07:00
Glen Choo	5f5af3735d	Documentation/git-config.txt: add SCOPES section In a subsequent commit, we will introduce "protected configuration", which is easiest to describe in terms of configuration scopes (i.e. it's the union of the 'system', 'global', and 'command' scopes). This description is fine for ML discussions, but it's inadequate for end users because we don't provide a good description of "configuration scopes" in the public docs. `145d59f482` (config: add '--show-scope' to print the scope of a config value, 2020-02-10) introduced the word "scope" to our public docs, but that only enumerates the scopes and assumes the user can figure out what those values mean. Add a SCOPES section to Documentation/git-config.txt that describes the configuration scopes, their corresponding CLI options, and mentions that some configuration options are only respected in certain scopes. Then, use the word "scope" to simplify the FILES section and change some confusing wording. Signed-off-by: Glen Choo <chooglen@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-07-14 15:08:29 -07:00
Junio C Hamano	9dd64cb4d3	The third batch Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-07-14 15:04:00 -07:00
Junio C Hamano	361cbe6d6d	Merge branch 'ab/submodule-cleanup' Further preparation to turn git-submodule.sh into a builtin. * ab/submodule-cleanup: git-sh-setup.sh: remove "say" function, change last users git-submodule.sh: use "$quiet", not "$GIT_QUIET" submodule--helper: eliminate internal "--update" option submodule--helper: understand --checkout, --merge and --rebase synonyms submodule--helper: report "submodule" as our name in some "-h" output submodule--helper: rename "absorb-git-dirs" to "absorbgitdirs" submodule update: remove "-v" option submodule--helper: have --require-init imply --init git-submodule.sh: remove unused top-level "--branch" argument git-submodule.sh: make the "$cached" variable a boolean git-submodule.sh: remove unused $prefix variable git-submodule.sh: remove unused sanitize_submodule_env()	2022-07-14 15:04:00 -07:00
Junio C Hamano	0455aad1e3	Merge branch 'sy/mv-out-of-cone' "git mv A B" in a sparsely populated working tree can be asked to move a path between directories that are "in cone" (i.e. expected to be materialized in the working tree) and "out of cone" (i.e. expected to be hidden). The handling of such cases has been improved. * sy/mv-out-of-cone: mv: add check_dir_in_index() and solve general dir check issue mv: use flags mode for update_mode mv: check if <destination> exists in index to handle overwriting mv: check if out-of-cone file exists in index with SKIP_WORKTREE bit mv: decouple if/else-if checks using goto mv: update sparsity after moving from out-of-cone to in-cone t1092: mv directory from out-of-cone to in-cone t7002: add tests for moving out-of-cone file/directory	2022-07-14 15:04:00 -07:00
Junio C Hamano	73b9ef6ab1	Merge branch 'hx/unpack-streaming' Allow large objects read from a packstream to be streamed into a loose object file straight, without having to keep it in-core as a whole. * hx/unpack-streaming: unpack-objects: use stream_loose_object() to unpack large objects core doc: modernize core.bigFileThreshold documentation object-file.c: add "stream_loose_object()" to handle large object object-file.c: factor out deflate part of write_loose_object() object-file.c: refactor write_loose_object() to several steps unpack-objects: low memory footprint for get_data() in dry_run mode	2022-07-14 15:03:59 -07:00
Junio C Hamano	be733e1200	Merge branch 'en/merge-tree' "git merge-tree" learned a new mode where it takes two commits and computes a tree that would result in the merge commit, if the histories leading to these two commits were to be merged. * en/merge-tree: git-merge-tree.txt: add a section on potentional usage mistakes merge-tree: add a --allow-unrelated-histories flag merge-tree: allow `ls-files -u` style info to be NUL terminated merge-ort: optionally produce machine-readable output merge-ort: store more specific conflict information merge-ort: make `path_messages` a strmap to a string_list merge-ort: store messages in a list, not in a single strbuf merge-tree: provide easy access to `ls-files -u` style info merge-tree: provide a list of which files have conflicts merge-ort: remove command-line-centric submodule message from merge-ort merge-ort: provide a merge_get_conflicted_files() helper function merge-tree: support including merge messages in output merge-ort: split out a separate display_update_messages() function merge-tree: implement real merges merge-tree: add option parsing and initial shell for real merge function merge-tree: move logic for existing merge into new function merge-tree: rename merge_trees() to trivial_merge_trees()	2022-07-14 15:03:59 -07:00
Junio C Hamano	dc6315e1fc	Merge branch 'gg/worktree-from-the-above' In a non-bare repository, the behavior of Git when the core.worktree configuration variable points at a directory that has a repository as its subdirectory, regressed in Git 2.27 days. * gg/worktree-from-the-above: dir: minor refactoring / clean-up dir: traverse into repository	2022-07-14 15:03:58 -07:00
Johannes Schindelin	df534dcbaa	shortlog: use a stable sort When sorting the output of `git shortlog` by count, a list of authors in alphabetical order is then sorted by contribution count. Obviously, the idea is to maintain the alphabetical order for items with identical contribution count. At the moment, this job is performed by `qsort()`. As that function is not guaranteed to implement a stable sort algorithm, this can lead to inconsistent and/or surprising behavior: items with identical contribution count could lose their alphabetical sub-order. The `qsort()` in MS Visual C's runtime does _not_ implement a stable sort algorithm, and under certain circumstances this even causes a test failure in t4201.21 "shortlog can match multiple groups", where two authors both are listed with 2 contributions, and are listed in inverse alphabetical order. Let's instead use the stable sort provided by `git_stable_qsort()` to avoid this inconsistency. This is a companion to `2049b8dc65` (diffcore_rename(): use a stable sort, 2019-09-30). Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-07-14 11:24:11 -07:00
Johannes Schindelin	ccc7b5148b	mergetool(vimdiff): allow paths to contain spaces again In `0041797449` (vimdiff: new implementation with layout support, 2022-03-30), we introduced a completely new implementation of the `vimdiff` backend for `git mergetool`. In this implementation, we no longer call `vim` directly but we accumulate in the variable `FINAL_CMD` an arbitrary number of commands for `vim` to execute, which necessitates the use of `eval` to split the commands properly into multiple command-line arguments. That same `eval` command also needs to pass the paths to `vim`, and while it looks as if they are quoted correctly, that quoting only reaches the `eval` instruction and is lost after that, therefore paths that contain whitespace characters (or other characters that are interpreted by the POSIX shell) are handled incorrectly. This is a simple reproducer: git init -b main bam-merge-fail cd bam-merge-fail echo a>"a file.txt" git add "a file.txt" git commit -m "added 'a file.txt'" echo b>"a file.txt" git add "a file.txt" git commit -m "diverged b 'a file.txt'" git checkout -b c HEAD~ echo c>"a file.txt" git add "a file.txt" git commit -m "diverged c 'a file.txt'" git checkout main git merge c git mergetool --tool=vimdiff With Git v2.37.0/v2.37.1, this will open 7 buffers, not four, and not display the correct contents at all. To fix this, let's not expand the variables containing the path parameters before passing them to the `eval` command, but let that command expand the variables instead. This fixes https://github.com/git-for-windows/git/issues/3945 Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-07-14 10:37:44 -07:00
Matheus Tavares	611c7785e8	checkout: fix two bugs on the final count of updated entries At the end of `git checkout <pathspec>`, we get a message informing how many entries were updated in the working tree. However, this number can be inaccurate for two reasons: 1) Delayed entries currently get counted twice. 2) Failed entries are included in the count. The first problem happens because the counter is first incremented before inserting the entry in the delayed checkout queue, and once again when finish_delayed_checkout() calls checkout_entry(). And the second happens because the counter is incremented too early in checkout_entry(), before the entry was in fact checked out. Fix that by moving the count increment further down in the call stack and removing the duplicate increment on delayed entries. Note that we have to keep a per-entry reference for the counter (both on parallel checkout and delayed checkout) because not all entries are always accumulated at the same counter. See checkout_worktree(), at builtin/checkout.c for an example. Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-07-14 10:19:28 -07:00
Matheus Tavares	11d14dee43	checkout: show bug about failed entries being included in final report After checkout, git usually reports how many entries were updated at that operation. However, because we count the entries too soon during the checkout process, we may actually include entries that do not get properly checked out in the end. This can lead to an inaccurate final report if the user expects it to show only the successful updates. This will be fixed in the next commit, but for now let's document it with a test that cover all checkout modes. Note that `test_checkout_workers` have to be slightly adjusted in order to use the construct `test_checkout_workers ... test_must_fail git checkout`. The function runs the command given to it with an assignment prefix to set the GIT_TRACE2 variable. However, this this assignment has an undefined behavior when the command is a shell function (like `test_must_fail`). As POSIX specifies: If the command name is a function that is not a standard utility implemented as a function, variable assignments shall affect the current execution environment during the execution of the function. It is unspecified: - Whether or not the variable assignments persist after the completion of the function - Whether or not the variables gain the export attribute during the execution of the function Thus, in order to make sure the GIT_TRACE2 value gets visible to the git command executed by `test_must_fail`, export the variable and run git in a subshell. [1]: https://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html (Vol. 3: Shell and Utilities, Section 2.9.1: Simple Commands) Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-07-14 10:19:27 -07:00
Matheus Tavares	ed602c3f44	checkout: document bug where delayed checkout counts entries twice At the end of a `git checkout <pathspec>` operation, git reports how many paths were checked out with a message like "Updated N paths from the index". However, entries that end up on the delayed checkout queue (as requested by a long-running process filter) get counted twice, producing a wrong number in the final report. We will fix this bug in an upcoming commit. For now, only document/demonstrate it with a test_expect_failure. Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-07-14 10:19:27 -07:00
Johannes Schindelin	7253f7ca9f	tests: fix incorrect --write-junit-xml code In `78d5e4cfb4` (tests: refactor --write-junit-xml code, 2022-05-21), this developer refactored the `--write-junit-xml` code a bit, including the part where the current test case's title was used in a `set` invocation, but failed to account for the fact that some test cases' titles start with a long option, which the `set` misinterprets as being intended for parsing. Let's fix this by using the `set -- <...>` form. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-07-14 10:02:06 -07:00
Junio C Hamano	4e2a4d1dd4	The second batch Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-07-13 14:54:56 -07:00
Junio C Hamano	fba8e7fa2d	Merge branch 'ds/git-rebase-doc-markup' References to commands-to-be-typed-literally in "git rebase" documentation mark-up have been corrected. * ds/git-rebase-doc-markup: git-rebase.txt: use back-ticks consistently	2022-07-13 14:54:56 -07:00
Junio C Hamano	9a13943ef4	Merge branch 'tk/rev-parse-doc-clarify-at-u' Doc update. * tk/rev-parse-doc-clarify-at-u: rev-parse: documentation adjustment - mention remote tracking with @{u}	2022-07-13 14:54:55 -07:00
Junio C Hamano	8c4f65e0bf	Merge branch 'cl/grep-max-count' "git grep -m<max-hits>" is a way to limit the hits shown per file. * cl/grep-max-count: grep: add --max-count command line option	2022-07-13 14:54:55 -07:00

... 8 9 10 11 12 ...

67856 Commits