git-commit-vandalism

Author	SHA1	Message	Date
Junio C Hamano	c72f2febae	Merge branch 'ab/coding-guidelines-c99' into maint-2.38 Update CodingGuidelines to clarify what features to use and avoid in C99. * ab/coding-guidelines-c99: CodingGuidelines: recommend against unportable C99 struct syntax CodingGuidelines: mention C99 features we can't use CodingGuidelines: allow declaring variables in for loops CodingGuidelines: mention dynamic C99 initializer elements CodingGuidelines: update for C99	2022-10-25 17:11:32 -07:00
Junio C Hamano	3882a0d3ad	Documentation: add lint-fsck-msgids During the initial development of the fsck-msgids.txt feature, it has become apparent that it is very much error prone to make sure the description in the documentation file are sorted and correctly match what is in the fsck.h header file. Add a quick-and-dirty Perl script and doc-lint target to sanity check that the fsck-msgids.txt is consistent with the error type list in the fsck.h header file. Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-10-25 15:44:19 -07:00
John Cai	f6534dbda4	fsck: document msg-id The documentation lacks mention of specific <msg-id> that are supported. While git-help --config will display a list of these options, often developers' first instinct is to consult the git docs to find valid config values. Add a list of fsck error messages, and link to it from the git-fsck documentation. Signed-off-by: John Cai <johncai86@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-10-25 15:44:18 -07:00
Junio C Hamano	7edfb883ab	fsck: remove the unused MISSING_TREE_OBJECT This error type has never been used since it was introduced at `159e7b08` (fsck: detect gitmodules files, 2018-05-02). Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-10-25 15:44:18 -07:00
John Cai	51691fed06	fsck: remove the unused BAD_TAG_OBJECT `2175a0c6` (fsck: stop checking tag->tagged, 2019-10-18) stopped checking the tagged object referred to by a tag object, which is what the error message BAD_TAG_OBJECT was for. Since then the BAD_TAG_OBJECT message is no longer used anywhere. Remove the BAD_TAG_OBJECT msg-id. Signed-off-by: John Cai <johncai86@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-10-25 15:44:18 -07:00
Taylor Blau	f1c0e3946e	apply: reject patches larger than ~1 GiB The apply code is not prepared to handle extremely large files. It uses "int" in some places, and "unsigned long" in others. This combination leads to unfortunate problems when switching between the two types. Using "int" prevents us from handling large files, since large offsets will wrap around and spill into small negative values, which can result in wrong behavior (like accessing the patch buffer with a negative offset). Converting from "unsigned long" to "int" also has truncation problems even on LLP64 platforms where "long" is the same size as "int", since the former is unsigned but the latter is not. To avoid potential overflow and truncation issues in `git apply`, apply similar treatment as in `dcd1742e56` (xdiff: reject files larger than ~1GB, 2015-09-24), where the xdiff code was taught to reject large files for similar reasons. The maximum size was chosen somewhat arbitrarily, but picking a value just shy of a gigabyte allows us to double it without overflowing 2^31-1 (after which point our value would wrap around to a negative number). To give ourselves a bit of extra margin, the maximum patch size is a MiB smaller than a full GiB, which gives us some slop in case we allocate "(records + 1) * sizeof(int)" or similar. Luckily, the security implications of these conversion issues are relatively uninteresting, because a victim needs to be convinced to apply a malicious patch. Reported-by: 정재우 <thebound7@gmail.com> Suggested-by: Johannes Schindelin <Johannes.Schindelin@gmx.de> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-10-25 15:21:17 -07:00
Julia Ramer	a294443fa1	embargoed releases: also describe the git-security list and the process With the recent turnover on the git-security list, questions came up how things are usually run. Rather than answering questions individually, extend Git's existing documentation about security vulnerabilities to describe the git-security mailing list, how things are run on that list, and what to expect throughout the process from the time a security bug is reported all the way to the time when a fix is released. Helped-by: Junio C Hamano <gitster@pobox.com> Helped-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Julia Ramer <gitprplr@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-10-24 16:03:59 -07:00
Jerry Zhang	0d32ae8d7f	builtin: patch-id: remove unused diff-tree prefix The last git version that had "diff-tree" in the header text of "git diff-tree" output was v1.3.0 from 2006. The header text was changed from "diff-tree" to "commit" in `91539833` ("Log message printout cleanups"). Given how long ago this change was made, it is highly unlikely that anyone is still feeding in outputs from that git version. Remove the handling of the "diff-tree" prefix and document the source of the other prefixes so that the overall functionality is more clear. Signed-off-by: Jerry Zhang <Jerry@skydio.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-10-24 15:44:20 -07:00
Jerry Zhang	2871f4d447	builtin: patch-id: add --verbatim as a command mode There are situations where the user might not want the default setting where patch-id strips all whitespace. They might be working in a language where white space is syntactically important, or they might have CI testing that enforces strict whitespace linting. In these cases, a whitespace change would result in the patch fundamentally changing, and thus deserving of a different id. Add a new mode that is exclusive of --stable and --unstable called --verbatim. It also corresponds to the config patchid.verbatim = true. In this mode, the stable algorithm is used and whitespace is not stripped from the patch text. Users of --unstable mainly care about compatibility with old git versions, which unstripping the whitespace would break. Thus there isn't a usecase for the combination of --verbatim and --unstable, and we don't expose this so as to not add maintainence burden. Signed-off-by: Jerry Zhang <jerry@skydio.com> fixes https://github.com/Skydio/revup/issues/2 Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-10-24 15:44:20 -07:00
Jerry Zhang	93105aba6c	patch-id: fix patch-id for mode changes Currently patch-id as used in rebase and cherry-pick does not account for file modes if the file is modified. One consequence of this is that if you have a local patch that changes modes, but upstream has applied an outdated version of the patch that doesn't include that mode change, "git rebase" will drop your local version of the patch along with your mode changes. It also means that internal patch-id doesn't produce the same output as the builtin, which does account for mode changes due to them being part of diff output. Fix by adding mode to the patch-id if it has changed, in the same format that would be produced by diff, so that it is compatible with builtin patch-id. Signed-off-by: Jerry Zhang <Jerry@skydio.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-10-24 15:44:20 -07:00
Jerry Zhang	0df19eb9d9	builtin: patch-id: fix patch-id with binary diffs "git patch-id" currently doesn't produce correct output if the incoming diff has any binary files. Add logic to get_one_patchid to handle the different possible styles of binary diff. This attempts to keep resulting patch-ids identical to what would be produced by the counterpart logic in diff.c, that is it produces the id by hashing the a and b oids in succession. In general we handle binary diffs by first caching the object ids from the "index" line and using those if we then find an indication that the diff is binary. The input could contain patches generated with "git diff --binary". This currently breaks the parse logic and results in multiple patch-ids output for a single commit. Here we have to skip the contents of the patch itself since those do not go into the patch id. --binary implies --full-index so the object ids are always available. When the diff is generated with --full-index there is no patch content to skip over. When a diff is generated without --full-index or --binary, it will contain abbreviated object ids. This will still result in a sufficiently unique patch-id when hashed, but does not match internal patch id output. We'll call this ok for now as we already need specialized arguments to diff in order to match internal patch id (namely -U3). Signed-off-by: Jerry Zhang <Jerry@skydio.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-10-24 15:44:19 -07:00
Jerry Zhang	51276c1832	patch-id: use stable patch-id for rebases Git doesn't persist patch-ids during the rebase process, so there is no need to specifically invoke the unstable variant. Use the stable logic for all internal patch-id calculations to minimize the number of code paths and improve test coverage. Signed-off-by: Jerry Zhang <jerry@skydio.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-10-24 15:44:19 -07:00
Jerry Zhang	0570be79ea	patch-id: fix stable patch id for binary / header-only Patch-ids for binary patches are found by hashing the object ids of the before and after objects in succession. However in the --stable case, there is a bug where hunks are not flushed for binary and header-only patch ids, which would always result in a patch-id of 0000. The --unstable case is currently correct. Reorder the logic to branch into 3 cases for populating the patch body: header-only which populates nothing, binary which populates the object ids, and normal which populates the text diff. All branches will end up flushing the hunk. Don't populate the ---a/ and +++b/ lines for binary diffs, to correspond to those lines not being present in the "git diff" text output. This is necessary because we advertise that the patch-id calculated internally and used in format-patch is the same that what the builtin "git patch-id" would produce when piped from a diff. Update the test to run on both binary and normal files. Signed-off-by: Jerry Zhang <jerry@skydio.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-10-24 15:44:19 -07:00
Taylor Blau	7b11234e3b	shortlog: implement `--group=committer` in terms of `--group=<format>` In the same spirit as the previous commit, reimplement `--group=committer` as a special case of `--group=<format>`, too. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-10-24 14:48:05 -07:00
Taylor Blau	9c10d4ff24	shortlog: implement `--group=author` in terms of `--group=<format>` Instead of handling SHORTLOG_GROUP_AUTHOR separately, reimplement it as a special case of the new `--group=<format>` mode, where the author mode is a shorthand for `--group='%aN <%aE>'. Note that we still need to keep the SHORTLOG_GROUP_AUTHOR enum since it has a different meaning in `read_from_stdin()`, where it is still used for a different purpose. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-10-24 14:48:05 -07:00
Taylor Blau	10538e2a62	shortlog: extract `shortlog_finish_setup()` Extract a function which finishes setting up the shortlog struct for use. The caller in `make_cover_letter()` does not care about trailer sorting, so it isn't strictly necessary to add a call there in this patch. But the next patch will add additional functionality to the new `shortlog_finish_setup()` function, which the caller in `make_cover_letter()` will care about. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-10-24 14:48:05 -07:00
Taylor Blau	3dc95e09e1	shortlog: support arbitrary commit format `--group`s In addition to generating a shortlog based on committer, author, or the identity in one or more specified trailers, it can be useful to generate a shortlog based on an arbitrary commit format. This can be used, for example, to generate a distribution of commit activity over time, like so: $ git shortlog --group='%cd' --date='format:%Y-%m' -s v2.37.0.. 117 2022-06 274 2022-07 324 2022-08 263 2022-09 7 2022-10 Arbitrary commit formats can be used. In fact, `git shortlog`'s default behavior (to count by commit authors) can be emulated as follows: $ git shortlog --group='%aN <%aE>' ... and future patches will make the default behavior (as well as `--committer`, and `--group=trailer:<trailer>`) special cases of the more flexible `--group` option. Note also that the SHORTLOG_GROUP_FORMAT enum value is used only to designate that `--group:<format>` is in use when in stdin mode to declare that the combination is invalid. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-10-24 14:48:05 -07:00
Taylor Blau	b017d3dae9	shortlog: extract `--group` fragment for translation The subsequent commit will add another unhandled case in `read_from_stdin()` which will want to use the same message as with `--group=trailer`. Extract the "--group=trailer" part from this message so the same translation key can be used for both cases. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-10-24 14:48:05 -07:00
Taylor Blau	0b293df964	shortlog: make trailer insertion a noop when appropriate When there are no trailers to insert, it is natural that insert_records_from_trailers() should return without having done any work. But instead we guard this call unnecessarily by first checking whether `log->groups` has the `SHORTLOG_GROUP_TRAILER` bit set. Prepare to match a similar pattern in the future where a function which inserts records of a certain type does no work when no specifiers matching that type are given. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-10-24 14:48:05 -07:00
Jeff King	251554c269	shortlog: accept `--date`-related options Prepare for a future patch which will introduce arbitrary pretty formats via the `--group` argument. To allow additional customizability (for example, to support something like `git shortlog -s --group='%aD' --date='format:%Y-%m' ...` (which groups commits by the datestring 'YYYY-mm' according to author date), we must store off the `--date` parsed from calling `parse_revision_opt()`. Note that this also affects custom output `--format` strings in `git shortlog`. Though this is a behavior change, this is arguably fixing a long-standing bug (ie., that `--format` strings are not affected by `--date` specifiers as they should be). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-10-24 14:48:05 -07:00
Jeff Hostetler	81071626ba	trace2: add global counter mechanism Add global counters mechanism to Trace2. The Trace2 counters mechanism adds the ability to create a set of global counter variables and an API to increment them efficiently. Counters can optionally report per-thread usage in addition to the sum across all threads. Counter events are emitted to the Trace2 logs when a thread exits and at process exit. Counters are an alternative to `data` and `data_json` events. Counters are useful when you want to measure something across the life of the process, when you don't want per-measurement events for performance reasons, when the data does not fit conveniently within a region, or when your control flow does not easily let you write the final total. For example, you might use this to report the number of calls to unzip() or the number of de-delta steps during a checkout. Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-10-24 12:45:26 -07:00
Jeff Hostetler	8ad575646c	trace2: add stopwatch timers Add stopwatch timer mechanism to Trace2. Timers are an alternative to Trace2 Regions. Regions are useful for measuring the time spent in various computation phases, such as the time to read the index, time to scan for unstaged files, time to scan for untracked files, and etc. However, regions are not appropriate in all places. For example, during a checkout, it would be very inefficient to use regions to measure the total time spent inflating objects from the ODB from across the entire lifetime of the process; a per-unzip() region would flood the output and significantly slow the command; and some form of post-processing would be requried to compute the time spent in unzip(). Timers can be used to measure a series of timer intervals and emit a single summary event (at thread and/or process exit). Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-10-24 12:45:26 -07:00
Jeff Hostetler	24a4c45da9	trace2: convert ctx.thread_name from strbuf to pointer Convert the `tr2tls_thread_ctx.thread_name` field from a `strbuf` to a "const char*" pointer. The `thread_name` field is a constant string that is constructed when the context is created. Using a (non-const) `strbuf` structure for it caused some confusion in the past because it implied that someone could rename a thread after it was created. That usage was not intended. Change it to a const pointer to make the intent more clear. Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-10-24 12:45:26 -07:00
Jeff Hostetler	3124793604	trace2: improve thread-name documentation in the thread-context Improve the documentation of the tr2tls_thread_ctx.thread_name field and its relation to the tr2tls_thread_ctx.thread_id field. Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-10-24 12:45:25 -07:00
Jeff Hostetler	a70839cf36	trace2: rename the thread_name argument to trace2_thread_start Rename the `thread_name` argument in `tr2tls_create_self()` and `trace2_thread_start()` to be `thread_base_name` to make it clearer that the passed argument is a component used in the construction of the actual `struct tr2tls_thread_ctx.thread_name` variable. The base name will be used along with the thread id to create a unique thread name. This commit does not change how the `thread_name` field is allocated or stored within the `tr2tls_thread_ctx` structure. Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-10-24 12:45:25 -07:00
Jeff Hostetler	8e8c5ad27a	api-trace2.txt: elminate section describing the public trace2 API Eliminate the mostly obsolete `Public API` sub-section from the `Trace2 API` section in the documentation. Strengthen the referral to `trace2.h`. Most of the technical information in this sub-section was moved to `trace2.h` in `6c51cb525d` (trace2: move doc to trace2.h, 2019-11-17) to be adjacent to the function prototypes. The remaining text wasn't that useful by itself. Furthermore, the text would need a bit of overhaul to add routines that do not immediately generate a message, such as stopwatch timers. So it seemed simpler to just get rid of it. Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-10-24 12:45:25 -07:00
Jeff Hostetler	5bbb925137	tr2tls: clarify TLS terminology Reduce or eliminate use of the term "TLS" in the Trace2 code. The term "TLS" has two popular meanings: "thread-local storage" and "transport layer security". In the Trace2 source, the term is associated with the former. There was concern on the mailing list about it refering to the latter. Update the source and documentation to eliminate the use of the "TLS" term or replace it with the phrase "thread-local storage" to reduce ambiguity. Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-10-24 12:45:25 -07:00
Jeff Hostetler	545ddca0c3	trace2: use size_t alloc,nr_open_regions in tr2tls_thread_ctx Use "size_t" rather than "int" for the "alloc" and "nr_open_regions" fields in the "tr2tls_thread_ctx". These are used by ALLOC_GROW(). Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-10-24 12:45:25 -07:00
René Scharfe	cdc3db33ce	submodule: use strvec_pushf() for --super-prefix absorb_git_dir_into_superproject() uses a strbuf and strvec_pushl() to build and add the --super-prefix option and its argument. Use a single strvec_pushf() call to add the stuck form instead, which reduces the code size and avoids a strbuf allocation and release. The same is already done in submodule_reset_index() and submodule_move_head(). Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-10-23 14:07:32 -07:00
Jeff King	9b3fadfd06	t7700: annotate cruft-pack failure with ok=sigpipe One of our tests intentionally causes the cruft-pack generation phase of repack to fail, in order to stimulate an exit from repack at the desired moment. It does so by feeding a bogus option argument to pack-objects. This is a simple and reliable way to get pack-objects to fail, but it has one downside: pack-objects will die before reading its stdin, which means the caller repack may racily get SIGPIPE writing to it. For the purposes of this test, that's OK. We are checking whether repack cleans up already-created .tmp files, and it will do so whether it exits or dies by signal (because the tempfile API hooks both). But we have to tell test_must_fail that either outcome is OK, or it complains about the signal. Arguably this is a workaround (compared to fixing repack), as repack dying to SIGPIPE means that it loses the opportunity to give a more detailed message. But we don't actually write such a message anyway; we rely on pack-objects to have written something useful to stderr, and it does. In either case (signal or exit), that is the main thing the user will see. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-10-23 11:08:45 -07:00
Elijah Newren	ec1edbcb56	merge-tree: support multiple batched merges with --stdin Add an option, --stdin, to merge-tree which will accept lines of input with two branches to merge per line, and which will perform all the merges and give output for each in turn. This option implies -z, and modifies the output to also include a merge status since the exit code of the program can no longer convey that information now that multiple merges are involved. This could be useful, for example, by Git hosting providers. When one branch is updated, one may want to check whether all code reviews targetting that branch can still cleanly merge. Avoiding the overhead of starting up a separate process for each of those code reviews might provide significant savings in a repository with many code reviews. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-10-22 22:21:26 -07:00
Elijah Newren	a9f5bb83e0	merge-tree: update documentation for differences in -z output The Informational Messages was updated in `de90581141` ("merge-ort: optionally produce machine-readable output", 2022-06-18) to provide more detailed and machine parseable output when `-z` is passed, but the Documentation was not updated to reflect these changes. Update it now. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-10-22 22:21:24 -07:00
Jeff King	20da61f25f	Git.pm: trust rev-parse to find bare repositories When initializing a repository object, we run "git rev-parse --git-dir" to let the C version of Git find the correct directory. But curiously, if this fails we don't automatically say "not a git repository". Instead, we do our own pure-perl check to see if we're in a bare repository. This makes little sense, as rev-parse will report both bare and non-bare directories. This logic comes from `d5c7721d58` (Git.pm: Add support for subdirectories inside of working copies, 2006-06-24), but I don't see any reason given why we can't just rely on rev-parse. Worse, because we treat any non-error response from rev-parse as a non-bare repository, we'll erroneously set the object's WorkingCopy, even in a bare repository. But it gets worse. Since `8959555cee` (setup_git_directory(): add an owner check for the top-level directory, 2022-03-02), it's actively wrong (and dangerous). The perl code doesn't implement the same ownership checks. And worse, after "finding" the bare repository, it sets GIT_DIR in the environment, which tells any subsequent Git commands that we've confirmed the directory is OK, and to trust us. I.e., it re-opens the vulnerability plugged by `8959555cee` when using Git.pm's repository discovery code. We can fix this by just relying on rev-parse to tell us when we're not in a repository, which fixes the vulnerability. Furthermore, we'll ask its --is-bare-repository function to tell us if we're bare or not, and rely on that. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-10-22 16:39:48 -07:00
Elijah Newren	2b86c10084	merge-ort: fix bug with dir rename vs change dir to symlink When changing a directory to a symlink on one side of history, and renaming the parent of that directory to a different directory name on the other side, e.g. with this kind of setup: Base commit: Has a file named dir/subdir/file Side1: Rename dir/ -> renamed-dir/ Side2: delete dir/subdir/file, add dir/subdir as symlink Then merge-ort was running into an assertion failure: git: merge-ort.c:2622: apply_directory_rename_modifications: Assertion `ci->dirmask == 0' failed merge-recursive did not have as obvious an issue handling this case, likely because we never fixed it to handle the case from commit `902c521a35` ("t6423: more involved directory rename test", 2020-10-15) where we need to be careful about nested renames when a directory rename occurs (dir/ -> renamed-dir/ implies dir/subdir/ -> renamed-dir/subdir/). However, merge-recursive does have multiple problems with this testcase: * Incorrect stages for the file: merge-recursive omits the stage in the index corresponding to the base stage, making `git status` report "added by us" for renamed-dir/subdir/file instead of the expected "deleted by them". * Poor directory/file conflict handling: For the renamed-dir/subdir symlink, instead of reporting a file/directory conflict as expected, it reports "Error: Refusing to lose untracked file at renamed-dir/subdir". This is a lie because there is no untracked file at that location. It then does the normal suboptimal merge-recursive thing of having the symlink be tracked in the index at a location where it can't be written due to D/F conflicts (namely, renamed-dir/subdir), but writes it to the working tree at a different location as a new untracked file (namely, renamed-dir/subdir~B^0) Technically, these problems don't prevent the user from resolving the merge if they can figure out to ignore the confusion, but because both pieces of output are quite confusing I don't want to modify the test to claim the recursive also passes it even if it doesn't have the bug that ort did. So, fix the bug in ort by splitting the conflict_info for "dir/subdir" into two, one for the directory part, one for the file (i.e. symlink) part, since the symlink is being renamed by directory rename detection. The directory part is needed for proper nesting, since there are still conflict_info fields for files underneath it (though those are marked as is_null, they are still present until the entries are processed, and the entry processing wants every non-toplevel entry to have a parent directory). Reported-by: Stefano Rivera <stefano@rivera.za.net> Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-10-22 16:10:33 -07:00
Jeff King	193430717a	repack: drop remove_temporary_files() After we've successfully finished the repack, we call remove_temporary_files(), which looks for and removes any files matching ".tmp-$$-pack-", where $$ is the pid of the current process. But this is pointless. If we make it this far in the process, we've already renamed these tempfiles into place, and there is nothing left to delete. Nor is there a point in trying to call it to clean up when we _aren't_ successful. It's not safe for using in a signal handler, and the previous commit already handed that job over to the tempfile API. It might seem like it would be useful to clean up stray .tmp files left by other invocations of git-repack. But it won't clean those files; it only matches ones with its pid, and leaves the rest. Fortunately, those are cleaned up naturally by successive calls to git-repack; we'll consider .tmp-.pack the same as normal packfiles, so "repack -ad", etc, will roll up their contents and eventually delete them. The one case that could matter is if pack-objects generates an extension we don't know about, like ".tmp-pack-$$-$hash.some-new-ext". The current code will quietly delete such a file, while after this patch we'd leave it in place. In practice this doesn't happen, and would be indicative of a bug. Leaving the file as cruft is arguably a better behavior, as it means somebody is more likely to eventually notice and fix the bug. If we really wanted to be paranoid, we could scan for and warn about such files, but that seems like overkill. There's nothing to test with regard to the removal of this function. It was doing nothing, so the behavior should be the same. However, we can verify (and protect) our assumption that "repack -ad" will eventually remove stray files by adding a test for that. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-10-21 18:03:52 -07:00
Jeff King	9cf10d8786	repack: use tempfiles for signal cleanup When git-repack exits due to a signal, it tries to clean up by calling its remove_temporary_files() function, which walks through the packs dir looking for ".tmp-$$-pack-" files to delete (where "$$" is the pid of the current process). The biggest problem here is that remove_temporary_files() is not safe to call in a signal handler. It uses opendir(), which isn't on the POSIX async-signal-safe list. The details will be platform-specific, but a likely issue is that it needs to allocate memory; if we receive a signal while inside malloc(), etc, we'll conflict on the allocator lock and deadlock with ourselves. We can fix this by just cleaning up the files directly, without walking the directory. We already know the complete list of .tmp- files that were generated, because we recorded them via populate_pack_exts(). When we find files there, we can use register_tempfile() to record the filenames. If we receive a signal, then the tempfile API will clean them up for us, and it's async-safe and pretty battle-tested. Note that this is slightly racier than the existing scheme. We don't record the filenames until pack-objects tells us the hash over stdout. So during the period between it generating the file and reporting the hash, we'd fail to clean up. However, that period is very small. During most of the pack generation process pack-objects is using its own internal tempfiles. It's only at the very end that it moves them into the names git-repack expects, and then it immediately reports the name to us. Given that cleanup like this is best effort (after all, we may get SIGKILL), this level of race is acceptable. When we register the tempfiles, we'll record them locally and use the result to call rename_tempfile(), rather than renaming by hand. This isn't strictly necessary, as once we've renamed the files they're gone, and the tempfile API's cleanup unlink() would simply become a pointless noop. But managing the lifetimes of the tempfile objects is the cleanest thing to do, and the tempfile pointers naturally fill the same role as the old booleans. This patch also fixes another small problem. We only hook signals, and don't set up an atexit handler. So if we see an error that causes us to die(), we'll leave the .tmp-* files in place. But since the tempfile API handles this for us, this is now fixed for free. The new test covers this by stimulating a failure of pack-objects when generating a cruft pack. Before this patch, the .tmp-* file for the main pack would have been left, but now we correctly clean it up. Two small subtleties on the implementation: - in the renaming loop, we can stop re-constructing fname_old; we only use it when we have a tempfile to rename, so we can just ask the tempfile for its path (which, barring bugs, should be identical) - when renaming fails, our error message mentions fname_old. But since a failed rename_tempfile() invalidates the tempfile struct, we'll lose access to that string. Instead, let's mention the destination filename, which is what most other callers do. Reported-by: Jan Pokorný <poki@fnusa.cz> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-10-21 18:03:52 -07:00
Jeff King	a4880b20cc	repack: expand error message for missing pack files If pack-objects tells us it generated pack $hash, we expect to find .tmp-$$-pack-$hash.pack, .idx, .rev, and so on. Some of these files are optional, but others are not. For the required ones, we'll bail with an error if any of them is missing. The error message is just "missing required file", which is a bit vague. We should be more clear that it is not the user's fault, but rather that the sub-pgoram we called is not operating as expected. In practice, nobody should ever see this message, as it would generally only be caused by a bug in Git. It probably doesn't make sense to convert this to a BUG(), though, as there are other (unlikely) possibilities, such as somebody else racily deleting the files, filesystem errors causing stat() to fail, and so on. A nice side effect here is that we stop relying on fname_old in this code path, which will let us deal with it only in the first part of the conditional. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-10-21 18:03:52 -07:00
Jeff King	b639606fd0	repack: populate extension bits incrementally After generating the main pack and then any additional cruft packs, we iterate over the "names" list (which contains hashes of packs generated by pack-objects), and call populate_pack_exts() for each. There's one small problem with this. In repack_promisor_objects(), we may add entries to "names" and call populate_pack_exts() for them. Calling it again is mostly just wasteful, as we'll stat() the filename with each possible extension, get the same result, and just overwrite our bits. So we could drop the call there, and leave the final loop to populate all of the bits. But instead, this patch does the reverse: drops the final loop, and teaches the other two sites to populate the bits as they add entries. This makes the code easier to reason about, as you never have to worry about when the util field is valid; it is always valid for each entry. It also serves my ulterior purpose: recording the generated filenames as soon as possible will make it easier for a future patch to use them for cleaning up from a failed operation. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-10-21 18:03:52 -07:00
Jeff King	d3d9c51973	repack: convert "names" util bitfield to array We keep a string_list "names" containing the hashes of packs generated on our behalf by pack-objects. The util field of each item is treated as a bitfield that tells us which extensions (.pack, .idx, .rev, etc) are present for each name. Let's switch this to allocating a real array. That will give us room in a future patch to store more data than just a single bit per extension. And it makes the code a little easier to read, as we avoid casting back and forth between uintptr_t and a void pointer. Since the only thing we're storing is an array, we could just allocate it directly. But instead I've put it into a named struct here. That further increases readability around the casts, and in particular helps differentiate us from other string_lists in the same file which use their util field differently. E.g., the existing_*_packs lists still do bit-twiddling, but their bits have different meaning than the ones in "names". This makes it hard to grep around the code to see how the util fields are used; now you can look for "generated_pack_data". Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-10-21 18:03:52 -07:00
Junio C Hamano	ce8529b2bb	diff: leave NEEDWORK notes in show_stats() function The previous step made an attempt to correctly compute display columns allocated and padded different parts of diffstat output. There are at least two known codepaths in the function that still mixes up display widths and byte length that need to be fixed. Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-10-21 15:02:31 -07:00
Philippe Blain	1762382ab1	subtree: fix split after annotated tag was squashed merged The previous commit fixed a failure in 'git subtree merge --squash' when the previous squash-merge merged an annotated tag of the subtree repository which is missing locally. The same failure happens in 'git subtree split', either directly or when called by 'git subtree push', under the same circumstances: 'cmd_split' invokes 'find_existing_splits', which loops through previous commits and invokes 'git rev-parse' (via 'process_subtree_split_trailer') on the value of any 'git subtree-split' trailer it finds. This fails if this value is the hash of an annotated tag which is missing locally. Add a new optional argument 'repository' to 'cmd_split' and 'find_existing_splits', and invoke 'cmd_split' with that argument from 'cmd_push'. This allows 'process_subtree_split_trailer' to try to fetch the missing tag from the 'repository' if it's not available locally, mirroring the new behaviour of 'git subtree pull' and 'git subtree merge'. Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-10-21 13:51:06 -07:00
Philippe Blain	0d330673d4	subtree: fix squash merging after annotated tag was squashed merged When 'git subtree merge --squash $ref' is invoked, either directly or through 'git subtree pull --squash $repo $ref', the code looks for the latest squash merge of the subtree in order to create the new merge commit as a child of the previous squash merge. This search is done in function 'process_subtree_split_trailer', invoked by 'find_latest_squash', which looks for the most recent commit with a 'git-subtree-split' trailer; that trailer's value is the object name in the subtree repository of the ref that was last squash-merged. The function verifies that this object is present locally with 'git rev-parse', and aborts if it's not. The hash referenced by the 'git-subtree-split' trailer is guaranteed to correspond to a commit since it is the result of running 'git rev-parse -q --verify "$1^{commit}"' on the first argument of 'cmd_merge' (this corresponds to 'rev' in 'cmd_merge' which is passed through to 'new_squash_commit' and 'squash_msg'). But this is only the case since `e4f8baa88a` (subtree: parse revs in individual cmd_ functions, 2021-04-27), which went into Git 2.32. Before that commit, 'cmd_merge' verified the revision it was given using 'git rev-parse --revs-only "$@"'. Such an invocation, when fed the name of an annotated tag, would return the hash of the tag, not of the commit referenced by the tag. This leads to a failure in 'find_latest_squash' when squash-merging if the most recent squash-merge merged an annotated tag of the subtree repository, using a pre-2.32 version of 'git subtree', unless that previous annotated tag is present locally (which is not usually the case). We can fix this by fetching the object directly by its hash in 'process_subtree_split_trailer' when 'git rev-parse' fails, but in order to do so we need to know the name or URL of the subtree repository. This is not possible in general for 'git subtree merge', but is easy when it is invoked through 'git subtree pull' since in that case the subtree repository is passed by the user at the command line. Allow the 'git subtree pull' scenario to work out-of-the-box by adding an optional 'repository' argument to functions 'cmd_merge', 'find_latest_squash' and 'process_subtree_split_trailer', and invoke 'cmd_merge' with that 'repository' argument in 'cmd_pull'. If 'repository' is absent in 'process_subtree_split_trailer', instruct the user to try fetching the missing object directly. Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-10-21 13:51:06 -07:00
Philippe Blain	f10d31cf2d	subtree: process 'git-subtree-split' trailer in separate function Both functions 'find_latest_squash' (called by 'git subtree merge --squash' and 'git subtree split --rejoin') and 'find_existing_splits' (called by git 'subtree split') loop through commits that have a 'git-subtree-dir' trailer, and then process the 'git-subtree-mainline' and 'git-subtree-split' trailers for those commits. The processing done for the 'git-subtree-split' trailer is simple: we check if the object exists with 'rev-parse' and set the variable 'sub' to the object name, or we die if the object does not exist. In a future commit we will add more steps to the processing of this trailer in order to make the code more robust. To reduce code duplication, move the processing of the 'git-subtree-split' trailer to a dedicated function, 'process_subtree_split_trailer'. Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-10-21 13:51:06 -07:00
Philippe Blain	7990142eb1	subtree: use named variables instead of "$@" in cmd_pull 'cmd_pull' already checks that only two arguments are given, 'repository' and 'ref'. Define variables with these names instead of using the positional parameter $2 and "$@". This will allow a subsequent commit to pass 'repository' to 'cmd_merge'. Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-10-21 13:51:06 -07:00
Philippe Blain	34ab458cb1	subtree: define a variable before its first use in 'find_latest_squash' The function 'find_latest_squash' takes a single argument, 'dir', but a debug statement uses this variable before it takes its value from $1. This statement thus gets the value of 'dir' from the calling function, which currently is the same as the 'dir' argument, so it works but it is confusing. Move the definition of 'dir' before its first use. Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-10-21 13:51:05 -07:00
Philippe Blain	5626a9e2a9	subtree: prefix die messages with 'fatal' Just as was done in `0008d12284` (submodule: prefix die messages with 'fatal', 2021-07-10) for 'git-submodule.sh', make the 'die' messages outputed by 'git-subtree.sh' more in line with the rest of the code base by prefixing them with "fatal: ", and do not capitalize their first letter. Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-10-21 13:51:05 -07:00
Philippe Blain	2e94339fdc	subtree: add 'die_incompatible_opt' function to reduce duplication `9a3e3ca2ba` (subtree: be stricter about validating flags, 2021-04-27) added validation code to check that options given to 'git subtree <cmd>' made sense with the command being used. Refactor these checks by adding a 'die_incompatible_opt' function to reduce code duplication. Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-10-21 13:51:05 -07:00
Philippe Blain	a50fcc13dd	subtree: use 'git rev-parse --verify [--quiet]' for better error messages There are three occurences of 'git rev-parse <rev>' in 'git-subtree.sh' where the command expects a revision and the script dies or exits if the revision can't be found. In that case, the error message from 'git rev-parse' is: $ git rev-parse <bad rev> <bad rev> fatal: ambiguous argument '<bad rev>': unknown revision or path not in the working tree. Use '--' to separate paths from revisions, like this: 'git <command> [<revision>...] -- [<file>...]' This is a little confusing to the user, since this error message is outputed by 'git subtree'. At these points in the script, we know that we are looking for a single revision, so be explicit by using '--verify', resulting in a little better error message: $ git rev-parse --verify <bad rev> fatal: Needed a single revision In the two occurences where we 'die' if 'git rev-parse' fails, 'git subtree' outputs "could not rev-parse split hash $b from commit $sq", so we actually do not need the supplementary error message from 'git rev-parse'; add '--quiet' to silence it. In the third occurence, we 'exit', so keep the error message from 'git rev-parse'. Note that this messsage is still suboptimal since it can be understood to mean that 'git rev-parse' did not receive a single revision as argument, which is not the case here: the command did receive a single revision, but the revision is not resolvable to an available object. The alternative would be to use '--' after the revision, as suggested by the first error message, resulting in a clearer error message: $ git rev-parse <bad rev> -- fatal: bad revision '<bad rev>' Unfortunately we can't use that syntax because in the more common case of the revision resolving to a known object, the command outputs the object's hash, a newline, and the dashdash, which breaks the 'git subtree' script. Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-10-21 13:51:05 -07:00
Philippe Blain	455f0adf57	test-lib-functions: mark 'test_commit' variables as 'local' Some variables in 'test_commit' have names that are common enough that it is very likely that test authors might use them in a test. If they do so and use 'test_commit' between setting such a variable and using it, the variable value from 'test_commit' will leak back into the test and most likely break it. Prevent that by marking all variables in 'test_commit' as 'local'. This allow a subsequent commit to use a 'tag' variable. Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-10-21 13:51:05 -07:00
SZEDER Gábor	3dc6b4e027	Documentation/build-docdep.perl: generate sorted output To make sure that our manpages are rebuilt when any of the included source files change and only the affected manpages are rebuilt, 'build-docdep.perl' scans our documentation source files for include directives, and outputs 'make' dependencies to be included by 'Documentation/Makefile'. This script relies on Perl's hash data structures, and generates its output while iterating over them, and since hashes in Perl are very much unordered, the output varies greatly from run to run, both the order of targets and the order of dependencies of each target. This lack of ordering doesn't matter for 'make', because it cares neither about the order of targets in a Makefile nor about the order of a target's dependencies. However, it does matter to developers looking into build issues potentially involving these generated dependencies, as it's rather hard to tell whether there are any relevant (i.e. not order-only) changes among the dependencies compared to the previous run. So let's make 'build-docdep.perl's output stable and ordered by sorting the keys of the hashes before iterating over them. Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-10-21 11:39:38 -07:00

... 2 3 4 5 6 ...

68633 Commits