Commit Graph

52760 Commits

Author SHA1 Message Date
Jonathan Tan
a7e67c11b8 clone: check connectivity even if clone is partial
The commit that introduced the partial clone feature - 548719fbdc
("clone: partial clone", 2017-12-08) - excluded connectivity checks
for partial clones, but this also meant that a clone could succeed yet
not have all objects either present or promised. Specifically, when
cloning with --filter=blob:none from a repository that has a tag
pointing to a blob, and that blob is not sent in the packfile, the
clone will still succeed even though the blob is not referenced by any
tree in the packfile (and so is not promised either).

Turn on connectivity checks for partial clone.
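
For illustration, the resulting behaviour can be sketched as below; the names are
simplified and iterate_received_oids() is a hypothetical stand-in for clone's object
iterator, not the actual builtin/clone.c code.

    /*
     * Illustrative sketch only.  check_connected() and
     * CHECK_CONNECTED_INIT come from connected.h; previously the call
     * was skipped whenever a partial-clone filter was in use.
     */
    static void check_clone_connectivity(void *cb_data)
    {
            struct check_connected_options opt = CHECK_CONNECTED_INIT;

            if (check_connected(iterate_received_oids, cb_data, &opt))
                    die(_("remote did not send all necessary objects"));
    }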

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-09 12:37:38 -07:00
Jonathan Tan
a0c9016abd upload-pack: send refs' objects despite "filter"
A filter line in a request to upload-pack filters out objects regardless
of whether they are directly referenced by a "want" line or not. This
means that cloning with "--filter=blob:none" (or another filter that
excludes blobs) from a repository with at least one ref pointing to a
blob (for example, the Git repository itself) results in output like the
following:

    error: missing object referenced by 'refs/tags/junio-gpg-pub'

and if that particular blob is not referenced by a fetched tree, the
resulting clone fails fsck because there is no object from the remote to
vouch that the missing object is a promisor object.

Update both the protocol and the upload-pack implementation to include
all explicitly specified "want" objects in the packfile regardless of
the filter specification.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-09 12:37:38 -07:00
brian m. carlson
fa29f36d99 docs: correct RFC specifying email line length
The git send-email documentation specifies RFC 2821 (the SMTP RFC) as
providing line length limits, but the specification that restricts line
length to 998 octets is RFC 2822 (the email message format RFC).  Since
RFC 2822 has been obsoleted by RFC 5322, update the text to refer to RFC
5322 instead of RFC 2821.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-09 10:55:12 -07:00
brian m. carlson
e67a228cd8 send-email: automatically determine transfer-encoding
git send-email, when invoked without a --transfer-encoding option, sends
8bit data without a MIME version or a transfer encoding.  This has
several downsides.

First, unless the transfer encoding is specified, it defaults to 7bit,
meaning that non-ASCII data isn't allowed.  Second, if lines longer than
998 bytes are used, we will send a message that is invalid according to
RFC 5322.  The --validate option, which is the default, catches this
issue, but it isn't clear to many people how to resolve this.

To solve these issues, default the transfer encoding to "auto", so that
we explicitly specify 8bit encoding when lines don't exceed 998 bytes
and quoted-printable otherwise.  This means that we now always emit
Content-Transfer-Encoding and MIME-Version headers, so remove the
conditionals from this portion of the code.

It is unlikely that the unconditional inclusion of these two headers
will affect the deliverability of messages in anything but a positive
way, since MIME is already widespread and well understood by most email
programs.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-09 10:55:12 -07:00
brian m. carlson
f2d06fb13f send-email: accept long lines with suitable transfer encoding
With --validate (which is the default), we warn about lines exceeding
998 characters due to the limits specified in RFC 5322.  However, if
we're using a suitable transfer encoding (quoted-printable or base64),
we're guaranteed not to have lines exceeding 76 characters, so there's
no need to fail in this case.  The auto transfer encoding handles this
specific case, so accept it as well.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-09 10:55:12 -07:00
brian m. carlson
7a36987fff send-email: add an auto option for transfer encoding
For most patches, using a transfer encoding of 8bit provides good
compatibility with most servers and makes it as easy as possible to view
patches.  However, there are some patches for which 8bit is not a valid
encoding: RFC 5322 specifies that a message must not have lines
exceeding 998 octets.

Add a transfer encoding value, auto, which indicates that a patch should
use 8bit where allowed and quoted-printable otherwise.  Choose
quoted-printable instead of base64, since base64-encoded plain text is
treated as suspicious by some spam filters.
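
The decision rule itself is simple.  A minimal sketch of the same logic in C
(purely illustrative -- git-send-email is a Perl script, so this is not the
actual implementation):

    #include <stddef.h>

    /* Use 8bit when every line fits within RFC 5322's 998-octet limit,
     * and fall back to quoted-printable otherwise. */
    static const char *choose_transfer_encoding(const char *body)
    {
            size_t len = 0;
            const char *p;

            for (p = body; *p; p++) {
                    if (*p == '\n')
                            len = 0;
                    else if (++len > 998)
                            return "quoted-printable";
            }
            return "8bit";
    }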

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-09 10:55:12 -07:00
Kana Natsuno
1ab631647e userdiff: support new keywords in PHP hunk header
Recent versions of PHP support interfaces, traits, abstract classes and
final classes.  Fix the PHP hunk header regexp to support all of these
keywords.
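
For illustration, an extended pattern along these lines would cover the new
keywords (shown in the userdiff.c PATTERNS() style; the exact regex in the
patch may differ):

    /* Illustrative only -- not necessarily the exact pattern applied. */
    PATTERNS("php",
             "^[\t ]*(((public|protected|private|static)[\t ]+)*function.*)$\n"
             "^[\t ]*((((final|abstract)[\t ]+)?class|interface|trait).*)$",
             /* -- */
             "[a-zA-Z_][a-zA-Z0-9_]*"
             "|[-+0-9.e]+|0[xXbB]?[0-9a-fA-F]+"
             "|[-+*/<>%&^|=!.]=|--|\\+\\+|<<=?|>>=?|===|&&|\\|\\||::|->"),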

Signed-off-by: Kana Natsuno <dev@whileimautomaton.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-06 14:59:28 -07:00
Kana Natsuno
9992fbd7a1 t4018: add missing test cases for PHP
A later patch changes the built-in PHP pattern. These test cases
demonstrate aspects of the pattern that we do not want to change.

Signed-off-by: Kana Natsuno <dev@whileimautomaton.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-06 14:56:42 -07:00
Elijah Newren
327ac9cb9d t6036: add lots of detail for directory/file conflicts in recursive case
There was a discussion of problematic directory/file conflicts with
virtual merge bases on the mailing list years ago at
  https://public-inbox.org/git/AANLkTimwUQafGDrjxWrfU9uY1uKoFLJhxYs=vssOPqdf@mail.gmail.com/
Some of the corresponding tests made it into this testsuite; however,
the more problematic one didn't, and there are others that showcase the
problems even more.  Add a very lengthy explanation, some of it from that
email, describing the tradeoffs in picking a recursive merge-base when
you're dealing with an add/add directory/file conflict.

The solution picked years ago is relatively good, but there is the
potential to do even better, assuming we're willing to pay a certain
performance cost.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-06 14:45:26 -07:00
Beat Bolli
6aaded5509 builtin/config: work around an unsized array forward declaration
As reported here[0], Microsoft Visual Studio 2017.2 and "gcc -pedantic"
don't understand the forward declaration of an unsized static array.
They insist on an array size:

    d:\git\src\builtin\config.c(70,46): error C2133: 'builtin_config_options': unknown size

The thread [1] explains that this is due to the single-pass nature of
old compilers.

To work around this error, introduce a forward-declared function,
usage_builtin_config(), that uses the array builtin_config_options only
after it has been defined.

Also use this function in all other places where usage_with_options() is
called with the same arguments.

[0]: https://github.com/git-for-windows/git/issues/1735
[1]: https://groups.google.com/forum/#!topic/comp.lang.c.moderated/bmiF2xMz51U

Fixes https://github.com/git-for-windows/git/issues/1735
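
A minimal sketch of the workaround (option definitions trimmed; simplified
from builtin/config.c):

    static const char *const builtin_config_usage[] = {
            N_("git config [<options>]"),
            NULL
    };

    /* Forward-declare a function instead of the unsized array. */
    static NORETURN void usage_builtin_config(void);

    /* ... callers that previously invoked
     *     usage_with_options(builtin_config_usage, builtin_config_options)
     * now call usage_builtin_config() instead ... */

    static struct option builtin_config_options[] = {
            /* ... option definitions elided ... */
            OPT_END(),
    };

    static NORETURN void usage_builtin_config(void)
    {
            usage_with_options(builtin_config_usage, builtin_config_options);
    }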

Reported-By: Karen Huang (via GitHub)
Signed-off-by: Beat Bolli <dev+git@drbeat.li>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-06 12:31:53 -07:00
Tobias Klauser
2e9957525e git-rebase--preserve-merges: fix formatting of todo help message
Part of the todo help message in git-rebase--preserve-merges.sh is
unnecessarily indented, making the message look weird.  Remove the
extra lines and trailing indent.

This was a minor regression introduced by d48f97aa ("rebase:
reindent function git_rebase__interactive", 2018-03-23) in the 2.18
timeframe.  The same issue exists in "rebase -i", but it is being
addressed separately as part of the rewrite of the subcommand into C.

Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Reviewed-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-06 12:09:27 -07:00
Jeff King
5e834a4f39 t5500: prettify non-commit tag tests
We don't need to use backslash continuation, as the "&&"
already provides continuation (and happily soaks up empty
lines between commands).

We can also expand the multi-line printf into a
here-document, which lets us use line breaks more naturally
(and avoids another continuation that required us to break
the natural indentation).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-06 10:52:02 -07:00
Mike Hommey
9d14ecf39d fast-import: do not call diff_delta() with empty buffer
We know diff_delta() returns NULL, saying "no good delta exists for
it", when fed an empty data.  Check the length of the data in the
caller to avoid such a call.

This incidentally reduces the number of attempted deltifications we
see in the final statistics.
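
A hedged sketch of the guard (variable names approximate, simplified from
the store_object() call site):

    /* Only attempt deltification when both the delta base and the new
     * data are non-empty; diff_delta() returns NULL for empty input
     * anyway. */
    unsigned long deltalen = 0;
    void *delta = NULL;

    if (last && last->data.len && dat->len)
            delta = diff_delta(last->data.buf, last->data.len,
                               dat->buf, dat->len, &deltalen, 0);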

Signed-off-by: Mike Hommey <mh@glandium.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-06 09:46:12 -07:00
Taylor Blau
c707ded332 grep.c: extract show_line_header()
The grep code invokes show_line() to display the contents of a matched
or context line in its output. Part of that work is printing a line
header that includes information about the match, such as its kind and
its line and column numbers.

To prepare for the addition of an option to print only the matching
component(s) of a non-context line, we must prepare for the possibility
that a single line may contain multiple matching parts, and thus will
need multiple headers printed for a single line.

Extracting show_line_header allows us to do just that. In the subsequent
commit, it will be used within the colorization loop to print out only
the matching parts of a line, optionally with LFs delimiting
sub-matches.
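
The extracted helper's shape, as a hedged sketch (signature and body are
illustrative, not the exact grep.c code):

    static void show_line_header(struct grep_opt *opt, const char *name,
                                 unsigned lno, ssize_t cno, char sign)
    {
            /*
             * Emit the "<path><sign>", "<lineno><sign>" and
             * "<colno><sign>" prefix for one match, so that show_line()
             * can call it once per matching part of a line instead of
             * once per line.
             */
    }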

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-03 15:10:30 -07:00
Jonathan Tan
3390e42adb fetch-pack: support negotiation tip whitelist
During negotiation, fetch-pack eventually reports as "have" lines all
commits reachable from all refs. Allow the user to restrict the commits
sent in this way by providing a whitelist of tips; only the tips
themselves and their ancestors will be sent.

Both globs and single objects are supported.

This feature is only supported for protocols that support connect or
stateless-connect (such as HTTP with protocol v2).

This will speed up negotiation when the repository has multiple
relatively independent branches (for example, when a repository
interacts with multiple repositories, such as with linux-next [1] and
torvalds/linux [2]), and the user knows which local branch is likely to
have commits in common with the upstream branch they are fetching.

[1] https://kernel.googlesource.com/pub/scm/linux/kernel/git/next/linux-next/
[2] https://kernel.googlesource.com/pub/scm/linux/kernel/git/torvalds/linux/
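
A hedged sketch of the tip-marking step (insert_negotiation_tip() is a
hypothetical stand-in; the real fetch-pack.c code differs in detail):

    /* When a whitelist is given, seed the "have" negotiation only from
     * those tips; otherwise fall back to walking every ref as before. */
    static void mark_tips(struct oid_array *negotiation_tips, void *cb_data)
    {
            int i;

            if (!negotiation_tips) {
                    for_each_ref(rev_list_insert_ref_oid, cb_data);
                    return;
            }

            for (i = 0; i < negotiation_tips->nr; i++)
                    insert_negotiation_tip(&negotiation_tips->oid[i], cb_data);
    }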

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-03 15:00:41 -07:00
Jonathan Tan
cf1e7c0770 fetch-pack: write shallow, then check connectivity
When fetching, connectivity is checked after the shallow file is
updated. There are 2 issues with this: (1) the connectivity check is
only performed up to ancestors of existing refs (which is not thorough
enough if we were deepening an existing ref in the first place), and (2)
there is no rollback of the shallow file if the connectivity check
fails.

To solve (1), update the connectivity check to check the ancestry chain
completely in the case of a deepening fetch by refraining from passing
"--not --all" when invoking rev-list in connected.c.

To solve (2), have fetch_pack() perform its own connectivity check
before updating the shallow file. To support existing use cases in which
"git fetch-pack" is used to download objects without much regard as to
the connectivity of the resulting objects with respect to the existing
repository, the connectivity check is only done if necessary (that is,
the fetch is not a clone, and the fetch involves shallow/deepen
functionality). "git fetch" still performs its own connectivity check,
preserving correctness but sometimes performing redundant work. This
redundancy is mitigated by the fact that fetch_pack() reports if it has
performed a connectivity check itself, and if the transport supports
connect or stateless-connect, it will bubble up that report so that "git
fetch" knows not to perform the connectivity check in such a case.

This was noticed when a user tried to deepen an existing repository by
fetching with --no-shallow from a server that did not send all necessary
objects - the connectivity check as run by "git fetch" succeeded, but a
subsequent "git fsck" failed.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-03 14:57:44 -07:00
Jeff King
e674eb2528 ref-filter: avoid backend filtering with --ignore-case
When for-each-ref is used with --ignore-case, we expect
match_name_as_path() to do a case-insensitive match. But
there's an extra layer of filtering that happens before we
even get there. Since commit cfe004a5a9 (ref-filter: limit
traversal to prefix, 2017-05-22), we feed the prefix to the
ref backend so that it can optimize the ref iteration.

There's no mechanism for us to tell the backend we're matching
case-insensitively.  Nor is there likely to be one anytime soon,
since the packed backend relies on binary-searching the sorted list
of refs. Let's just punt on this case. The extra filtering is an
optimization that we simply can't do. We'll still give the correct
answer via the filtering in match_name_as_path().
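
A hedged sketch of the punt (simplified; the surrounding helper names in
ref-filter.c differ):

    if (filter->ignore_case) {
            /*
             * The prefix optimization relies on the refs being sorted
             * case-sensitively, so it cannot be used here; iterate over
             * everything and let match_name_as_path() do the
             * case-insensitive filtering afterwards.
             */
            return for_each_fullref_in("", cb, cb_data, broken);
    }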

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-03 14:49:37 -07:00
Aleksandr Makarov
639ab5efa1 for-each-ref: consistently pass WM_IGNORECASE flag
The match_name_as_path() function learned to set
WM_IGNORECASE in the "flags" field when the user passed
--ignore-case. But it forgot to actually pass the flags to
wildmatch()!

As a result, the --ignore-case feature has been broken since
it was added in 3bb16a8bf2 (tag, branch, for-each-ref: add
--ignore-case for sorting and filtering, 2016-12-04). We
didn't notice because we added tests only for git-branch and
git-tag, whereas git-for-each-ref has slightly different matching
rules and thus uses a different function (the related
match_pattern() does it correctly).

Incidentally, this also caused clang's scan-build to
complain about the code; the assignment to "flags" was dead
code.
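
The fix itself is a one-liner; a simplified sketch of the before and after
in match_name_as_path():

    int flags = WM_PATHNAME;

    if (filter->ignore_case)
            flags |= WM_IGNORECASE;

    /* before: the computed flags were silently dropped */
    if (!wildmatch(p, refname, WM_PATHNAME))
            return 1;

    /* after: */
    if (!wildmatch(p, refname, flags))
            return 1;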

Note that we can't flip the test in t6300 to expect_success
yet. There's another bug, which will be dealt with in the
next patch.

Commit-message-by: Jeff King <peff@peff.net>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-03 14:49:15 -07:00
Jeff King
ee0f3e22c6 t6300: add a test for --ignore-case
The --ignore-case option was added by 3bb16a8bf2 (tag,
branch, for-each-ref: add --ignore-case for sorting and
filtering, 2016-12-04), but it was never tested. And indeed,
it does not work due to multiple bugs (which will be fixed
in subsequent patches).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-03 14:49:13 -07:00
Elijah Newren
651f7f3a1b t6042: add testcase covering long chains of rename conflicts
Each rename is a lego: the source side could be connected to a delete or
another rename, and the destination side could be connected to a rename or a
conflicting add.  Previous tests combined these to get e.g.
rename/rename(1to2)/add/add, rename/rename(2to1)/delete/delete, and
rename/add/delete.  But we can also build bigger chains of conflicts.  Add a
testcase demonstrating this.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-03 14:47:47 -07:00
Elijah Newren
eee73388f2 t6042: add testcase covering rename/rename(2to1)/delete/delete conflict
If either side of a rename/rename(2to1) conflict is itself also involved
in a rename/delete conflict, then the conflict is a little more complex;
we can even have what I'd call a rename/rename(2to1)/delete/delete
conflict.  (In some ways, this is similar to a rename/rename(1to2)/add/add
conflict, as added in commit 3672c97148 ("merge-recursive: Fix working
copy handling for rename/rename/add/add", 2011-08-11)).  Add a testcase
for such a conflict.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-03 14:47:44 -07:00
Elijah Newren
11d9ade10e t6042: add testcase covering rename/add/delete conflict type
If a file is renamed on one side of history, and the other side of history
both deletes the original file and adds a new unrelated file in the way of
the rename, then we have what I call a rename/add/delete conflict.  Add a
testcase covering this scenario.

Reported-by: Robert Dailey <rcdailey.lists@gmail.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-03 14:47:42 -07:00
Elijah Newren
451a3abc26 t6036: add a failed conflict detection case with conflicting types
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-03 14:43:43 -07:00
Elijah Newren
a79968bed1 t6036: add a failed conflict detection case with submodule add/add
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-03 14:43:43 -07:00
Elijah Newren
d4d1718080 t6036: add a failed conflict detection case with submodule modify/modify
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-03 14:43:42 -07:00
Elijah Newren
81f5a2ce7b t6036: add a failed conflict detection case with symlink add/add
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-03 14:43:42 -07:00
Elijah Newren
c6d3dd5daf t6036: add a failed conflict detection case with symlink modify/modify
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-03 14:43:42 -07:00
Elijah Newren
58f4d1b961 t6044: verify that merges expected to abort actually abort
t6044 has lots of tests for verifying that merge will abort as expected
when there are changes staged before the merge starts.  However, it only
checked for non-zero exit code, which could mean that the merge ran to
completion with conflicts.  Check that the merge was actually correctly
aborted, i.e. that .git/MERGE_HEAD is not present.

This changes one of the tests from expect_success to expect_failure.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-03 13:13:18 -07:00
Elijah Newren
1b9fbefbe0 index_has_changes(): avoid assuming operating on the_index
Modify index_has_changes() to take a struct index_state * instead of just
operating on the_index.  This is only a partial conversion, though,
because we call do_diff_cache() which implicitly assumes work is to be
done on the_index.  Ongoing work is being done elsewhere to do the
remainder of the conversion, and thus is not duplicated here.  Instead,
a simple check is put in place until that work is complete.
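
A hedged sketch of the resulting shape (simplified):

    int index_has_changes(struct index_state *istate, struct strbuf *sb)
    {
            /*
             * do_diff_cache() still implicitly works on the_index, so
             * guard against other indexes until the remaining conversion
             * is complete.
             */
            if (istate != &the_index)
                    BUG("index_has_changes() not yet able to work with a non-the_index");

            /* ... run the cached diff, append changed paths to sb, and
             * return whether any change was found ... */
            return 0;
    }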

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-03 13:13:18 -07:00
Elijah Newren
cffbfad50d read-cache.c: move index_has_changes() from merge.c
Since index_has_changes() is an index-related function, move it to
read-cache.c, only modifying it to avoid uses of the active_cache and
active_nr macros.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-03 13:13:17 -07:00
Eric Sunshine
e7eb15faca t7201: drop pointless "exit 0" at end of subshell
This test employs a for-loop inside a subshell and correctly aborts the
loop and fails the test overall (via "exit 1") if any iteration of the
for-loop fails. Otherwise, it exits the subshell with an explicit but
entirely unnecessary "exit 0", presumably to indicate that all
iterations of the loop succeeded. The &&-chain is broken between the
for-loop and the "exit 0". Rather than fixing the &&-chain, just drop
the pointless "exit 0".

Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-03 12:38:05 -07:00
Eric Sunshine
f1e1239811 t6036: fix broken "merge fails but has appropriate contents" tests
These tests reference non-existent object "c" when they really mean to
be referencing "C", however, these errors went unnoticed due to a broken
&&-chain later in the tests. Fix these errors, as well as the broken
&&-chains behind which they hid.

Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-03 12:38:05 -07:00
Eric Sunshine
431f4a26b5 t5505: modernize and simplify hard-to-digest test
This test uses a subshell within a subshell but is formatted in such a
way as to suggest that the inner subshell is a sibling rather than a
child, which makes it difficult to digest the test's structure and
intent.

Worse, the inner subshell performs cleanup of actions from earlier in
the test; however, a failure between the initial actions and the cleanup
will prevent the cleanup from taking place.

Fix these problems by modernizing and simplifying the test and by using
test_when_finished() for the cleanup action.

Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-03 12:38:05 -07:00
Eric Sunshine
fb23bd7af2 t5406: use write_script() instead of birthing shell script manually
Take advantage of write_script() to abstract-away details of shell
script creation, thus allowing the reader to focus on script content.
Readability benefits, particularly in this case, since the script body
was buried in a noisy one-liner subshell responsible for emitting
boilerplate and body.

Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-03 12:38:04 -07:00
Eric Sunshine
fbd6ef273e t5405: use test_must_fail() instead of checking exit code manually
This test expects "git push" to fail, thus it manually inverts that
local expected failure into a successful exit code for the test overall.
In doing so, it intentionally breaks the &&-chain. Modernize by
replacing manual exit code management with test_must_fail() and a normal
&&-chain.

Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-03 12:38:04 -07:00
Eric Sunshine
e5d7e9f516 t/lib-submodule-update: fix "absorbing" test
This test has been dysfunctional since it was added by 259f3ee296
(lib-submodule-update.sh: define tests for recursing into submodules,
2017-03-14), however, the problem went unnoticed due to a broken
&&-chain.

The test wants to verify that replacing a submodule containing a .git
directory will absorb the .git directory into the .git/modules/ of the
superproject, and then replace the working tree content appropriate to
the superproject. It is, therefore, incorrect to check if the
submodule content still exists since the submodule will have been
replaced by the content of the superproject.

Fix this by removing the submodule content check, which also happens
to be the line that broke the &&-chain.

While at it, fix broken &&-chains in a couple neighboring tests.

Helped-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-03 12:38:04 -07:00
Eric Sunshine
02779185d5 t: drop unnecessary terminating semicolon in subshell
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-03 12:38:04 -07:00
Eric Sunshine
ed6c994af4 t: use sane_unset() rather than 'unset' with broken &&-chain
These tests intentionally break the &&-chain after using 'unset' since
they don't know if 'unset' will succeed or fail and don't want a local
'unset' failure to fail the test overall. We can do better by using
sane_unset(), which can be linked into the &&-chain as usual.

Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-03 12:38:04 -07:00
Eric Sunshine
0590ff26c4 t: use test_write_lines() instead of series of 'echo' commands
These tests employ a noisy subshell (with missing &&-chain) to feed
input into Git commands or files:

    (echo a; echo b; echo c) | git some-command ...

Simplify by taking advantage of test_write_lines():

    test_write_lines a b c | git some-command ...

Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-03 12:38:04 -07:00
Eric Sunshine
8327974859 t: use test_might_fail() instead of manipulating exit code manually
These tests manually coerce the exit code of invoked commands to
"success" when they don't care if the command succeeds or fails since
failure of those commands should not cause the test to fail overall.
In doing so, they intentionally break the &&-chain. Modernize by
replacing manual exit code management with test_might_fail() and a
normal &&-chain.

Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-03 12:38:04 -07:00
Jameson Miller
8616a2d0cb block alloc: add validations around cache_entry lifecycle
Add an option (controlled by an environment variable) to perform extra
validations on mem_pool-allocated cache entries. When set:

  1) Invalidate cache_entry memory when discarding cache_entry.

  2) When discarding index_state struct, verify that all cache_entries
     were allocated from expected mem_pool.

  3) When discarding mem_pools, invalidate mem_pool memory.

This should provide extra checks that mem_pools and their allocated
cache_entries are being used as expected.
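
A hedged sketch of how the switch is wired up (the environment variable name
is quoted from memory of this series and should be treated as an assumption):

    /* Cached lookup of the validation switch; git_env_bool() is the
     * usual helper for boolean environment variables. */
    static int should_validate_cache_entries(void)
    {
            static int validate_index_cache_entries = -1;

            if (validate_index_cache_entries < 0)
                    validate_index_cache_entries =
                            git_env_bool("GIT_TEST_VALIDATE_INDEX_CACHE_ENTRIES", 0);

            return validate_index_cache_entries;
    }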

Signed-off-by: Jameson Miller <jamill@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-03 10:58:27 -07:00
Jameson Miller
8e72d67529 block alloc: allocate cache entries from mem_pool
When reading large indexes from disk, a portion of the time is
dominated by malloc() calls. This can be mitigated by allocating a
large block of memory and managing it ourselves via memory pools.

This change moves the cache entry allocation to be on top of memory
pools.

Design:

The index_state struct will gain a notion of an associated memory_pool
from which cache_entries will be allocated. When reading in the
index from disk, we have information on the number of entries and
their size, which can guide us in deciding how large our initial
memory allocation should be. When an index is discarded, the
associated memory_pool will be discarded as well - so the lifetime of
a cache_entry is tied to the lifetime of the index_state that it was
allocated for.

In the case of a split index, the following rules are followed.  First,
some terminology is defined:

Terminology:
  - 'the_index': represents the logical view of the index

  - 'split_index': represents the "base" cache entries. Read from the
    split index file.

'the_index' can reference a single split_index, as well as
cache_entries from the split_index. `the_index` will be discarded
before the `split_index` is.  This means that when we are allocating
cache_entries in the presence of a split index, we need to allocate
the entries from the `split_index`'s memory pool.  This allows us to
follow the pattern that `the_index` can reference cache_entries from
the `split_index`, and that the cache_entries will not be freed while
they are still being referenced.

Managing transient cache_entry structs:
Cache entries are usually allocated for an index, but this is not always
the case. Cache entries are sometimes allocated because this is the
type that the existing checkout_entry function works with. Because of
this, the existing code needs to handle cache entries associated with an
index / memory pool, and those that only exist transiently. Several
strategies were contemplated around how to handle this:

Chosen approach:
An extra field was added to the cache_entry type to track whether the
cache_entry was allocated from a memory pool or not. This is currently
an int field, as there are no more available bits in the existing
ce_flags bit field. If / when more bits are needed, this new field can
be turned into a proper bit field.

Alternatives:

1) Do not include any information about how the cache_entry was
allocated. Calling code would be responsible for tracking whether the
cache_entry needed to be freed or not.
  Pro: No extra memory overhead to track this state
  Con: Extra complexity in callers to handle this correctly.

The extra complexity and burden to not regress this behavior in the
future was more than we wanted.

2) cache_entry would gain knowledge about which mem_pool allocated it
  Pro: Could (potentially) do extra logic to know when a mem_pool no
       longer had references to any cache_entry
  Con: cache_entry would grow heavier by a pointer, instead of int

We didn't see a tangible benefit to this approach

3) Do not add any extra information to a cache_entry, but when freeing a
   cache entry, check if the memory exists in a region managed by existing
   mem_pools.
  Pro: No extra memory overhead to track state
  Con: Extra computation is performed when freeing cache entries

We decided that tracking and iterating over known memory pool regions
was less desirable than adding an extra field to track this state.
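
A hedged sketch of the chosen approach (field and helper names are
approximate):

    /* Allocate a cache_entry out of the index's memory pool and record
     * that fact in the entry, so that a single discard function can
     * tell pooled and transient entries apart.  The flag is a plain int
     * because ce_flags has no spare bits. */
    static struct cache_entry *mem_pool__ce_calloc(struct mem_pool *pool,
                                                   size_t name_len)
    {
            struct cache_entry *ce;

            ce = mem_pool_calloc(pool, 1, cache_entry_size(name_len));
            ce->mem_pool_allocated = 1;
            return ce;
    }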

Signed-off-by: Jameson Miller <jamill@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-03 10:58:27 -07:00
Jameson Miller
0e58301d81 mem-pool: fill out functionality
Add functions for:

    - combining two memory pools

    - determining if a memory address is within the range managed by a
      memory pool

These functions will be used by future commits.
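
A hedged sketch of the address-range check (simplified; mp_block field names
as in mem-pool.h):

    /* Return 1 if 'mem' lies inside any block managed by this pool. */
    int mem_pool_contains(struct mem_pool *mem_pool, void *mem)
    {
            struct mp_block *p;

            for (p = mem_pool->mp_block; p; p = p->next_block)
                    if ((void *)p->space <= mem && mem < (void *)p->end)
                            return 1;

            return 0;
    }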

Signed-off-by: Jameson Miller <jamill@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-03 10:58:27 -07:00
Jameson Miller
158dfeff3d mem-pool: add life cycle management functions
Add initialization and discard functions to the mem_pool type. As the
memory allocated by mem_pool can now be freed, we also track the large
allocations.

If there are existing mp_blocks in the mem_pool's linked list of
mp_blocks, then the mp_block for a large allocation is inserted behind
the head block, because only the head mp_block is considered when
searching for available space. This results in the following desirable
properties:

1) The mp_block allocated for the large request will not be included in
the search for available space in future requests; the large mp_block
is sized for the specific request and does not contain any spare space.

2) The head mp_block will not be bumped from consideration for future
memory requests just because a request for a large chunk of memory came
in.

These changes are in preparation for a future commit that will create
and discard memory pools.
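
A hedged sketch of the new entry points, and of how an oversized block is
threaded in behind the head (details approximate):

    void mem_pool_init(struct mem_pool **mem_pool, size_t initial_size);
    void mem_pool_discard(struct mem_pool *mem_pool);

    /* Give a large request its own block, but insert it *behind* the
     * current head so that the head remains the block searched for
     * free space. */
    static struct mp_block *mem_pool_alloc_block(struct mem_pool *pool,
                                                 size_t block_alloc,
                                                 struct mp_block *insert_after)
    {
            struct mp_block *p =
                    xmalloc(st_add(sizeof(struct mp_block), block_alloc));

            p->next_free = (char *)p->space;
            p->end = p->next_free + block_alloc;

            if (insert_after) {
                    p->next_block = insert_after->next_block;
                    insert_after->next_block = p;
            } else {
                    p->next_block = pool->mp_block;
                    pool->mp_block = p;
            }
            return p;
    }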

Signed-off-by: Jameson Miller <jamill@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-03 10:58:27 -07:00
Jameson Miller
8fb8e3f636 mem-pool: only search head block for available space
Instead of searching all memory blocks for available space to fulfill
a memory request, only search the head block. If the head block does
not have space, assume that the previous blocks would most likely not
be able to fulfill the request either. This could potentially lead to
more memory fragmentation, but it also avoids searching memory blocks
that probably will not be able to fulfill the request.

This pattern will benefit consumers that are able to generate a good
estimate for how much memory will be needed, or if they are performing
fixed sized allocations, so that once a block is exhausted it will
never be able to fulfill a future request.
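
A hedged sketch of the allocation path after this change (alignment handling
omitted; block sizing simplified):

    void *mem_pool_alloc(struct mem_pool *mem_pool, size_t len)
    {
            struct mp_block *p = NULL;
            void *r;

            /* only the head block is ever considered */
            if (mem_pool->mp_block &&
                mem_pool->mp_block->end - mem_pool->mp_block->next_free >= len)
                    p = mem_pool->mp_block;

            if (!p)
                    p = mem_pool_alloc_block(mem_pool, len, NULL);

            r = p->next_free;
            p->next_free += len;
            return r;
    }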

Signed-off-by: Jameson Miller <jamill@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-03 10:58:27 -07:00
Jameson Miller
a849735bfb block alloc: add lifecycle APIs for cache_entry structs
It has been observed that the time spent loading an index with a large
number of entries is partly dominated by malloc() calls. This change
is in preparation for using memory pools to reduce the number of
malloc() calls made to allocate cache entries when loading an index.

Add an API to allocate and discard cache entries, abstracting the
details of managing the memory backing the cache entries. This commit
does not actually change how memory is managed; that will be done in a
later commit in the series.

This change makes the distinction between cache entries that are
associated with an index and cache entries that are not associated with
an index. A main use of cache entries is with an index, and we can
optimize the memory management around this. We still have other cases
where a cache entry is not persisted with an index, and so we need to
handle the "transient" use case as well.

To keep the cognitive overhead of managing cache entries low, there
will only be a single discard function. This means there must be enough
information kept with each cache entry so that we know how to discard
it.

A summary of the main functions in the API is:

make_cache_entry: create cache entry for use in an index. Uses specified
                  parameters to populate cache_entry fields.

make_empty_cache_entry: Create an empty cache entry for use in an index.
                        Returns cache entry with empty fields.

make_transient_cache_entry: create cache entry that is not used in an
                            index. Uses specified parameters to populate
                            cache_entry fields.

make_empty_transient_cache_entry: create cache entry that is not used in
                                  an index. Returns cache entry with
                                  empty fields.

discard_cache_entry: A single function that knows how to discard a cache
                     entry regardless of how it was allocated.
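
As a hedged summary, the prototypes look roughly like this (parameter lists
approximate):

    struct cache_entry *make_cache_entry(struct index_state *istate,
                                         unsigned int mode,
                                         const struct object_id *oid,
                                         const char *path, int stage,
                                         unsigned int refresh_options);
    struct cache_entry *make_empty_cache_entry(struct index_state *istate,
                                               size_t name_len);
    struct cache_entry *make_transient_cache_entry(unsigned int mode,
                                                   const struct object_id *oid,
                                                   const char *path,
                                                   int stage);
    struct cache_entry *make_empty_transient_cache_entry(size_t name_len);
    void discard_cache_entry(struct cache_entry *ce);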

Signed-off-by: Jameson Miller <jamill@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-03 10:58:27 -07:00
Jameson Miller
825ed4d9a0 read-cache: teach make_cache_entry to take object_id
Teach the make_cache_entry function to take an object_id instead of a SHA-1.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-03 10:58:15 -07:00
Jameson Miller
768d796506 read-cache: teach refresh_cache_entry to take istate
Refactor refresh_cache_entry() to work on a specific index, instead of
implicitly using the_index. This is in preparation for making the
make_cache_entry function apply to a specific index.

Signed-off-by: Jameson Miller <jamill@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-03 10:58:15 -07:00
Ramsay Jones
fb16287719 fsck: check skiplist for object in fsck_blob()
Since commit ed8b10f631 ("fsck: check .gitmodules content", 2018-05-02),
fsck will issue an error message for '.gitmodules' content that cannot
be parsed correctly. This is the case, even when the corresponding blob
object has been included on the skiplist. For example, using the cgit
repository, we see the following:

  $ git fsck
  Checking object directories: 100% (256/256), done.
  error: bad config line 5 in blob .gitmodules
  error in blob 51dd1eff1edc663674df9ab85d2786a40f7ae3a5: gitmodulesParse: could not parse gitmodules blob
  Checking objects: 100% (6626/6626), done.
  $

  $ git config fsck.skiplist '.git/skip'
  $ echo 51dd1eff1edc663674df9ab85d2786a40f7ae3a5 >.git/skip
  $

  $ git fsck
  Checking object directories: 100% (256/256), done.
  error: bad config line 5 in blob .gitmodules
  Checking objects: 100% (6626/6626), done.
  $

Note that the error message issued by the config parser is still
present, despite adding the object-id of the blob to the skiplist.

One solution would be to provide a means of suppressing the messages
issued by the config parser. However, given that (logically) we are
asking fsck to ignore this object, a simpler approach is to just not
call the config parser if the object is to be skipped. Add a check to
the 'fsck_blob()' processing function, to determine if the object is
on the skiplist and, if so, exit the function early.
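
A hedged sketch of the early return (the skiplist field and lookup helper
shown here are approximate):

    static int fsck_blob(struct blob *blob, const char *buf,
                         unsigned long size, struct fsck_options *options)
    {
            if (!oidset_contains(&gitmodules_found, &blob->object.oid))
                    return 0;       /* not a .gitmodules blob */

            if (oid_array_lookup(&options->skiplist, &blob->object.oid) >= 0)
                    return 0;       /* skiplisted: do not parse it */

            /* ... parse the .gitmodules content as before ... */
            return 0;
    }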

Signed-off-by: Ramsay Jones <ramsay@ramsayjones.plus.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-03 09:49:44 -07:00
Jeff King
de6bd9e3ea fsck: silence stderr when parsing .gitmodules
If there's a parsing error we'll already report it via the
usual fsck report() function (or not, if the user has asked
to skip this object or warning type). The error message from
the config parser just adds confusion. Let's suppress it.

Note that we didn't test this case at all, so I've added
coverage in t7415. We may end up toning down or removing
this fsck check in the future. So take this test as checking
what happens now with a focus on stderr, and not any
ironclad guarantee that we must detect and report parse
failures in the future.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-03 09:36:41 -07:00