git-commit-vandalism

Author	SHA1	Message	Date
Junio C Hamano	b69bed22c5	Merge branch 'jk/log-cherry-pick-duplicate-patches' When more than one commit with the same patch ID appears on one side, "git log --cherry-pick A...B" did not exclude them all when a commit with the same patch ID appears on the other side. Now it does. * jk/log-cherry-pick-duplicate-patches: patch-ids: handle duplicate hashmap entries	2021-01-25 14:19:19 -08:00
Jeff King	c9e3a4e76d	patch-ids: handle duplicate hashmap entries This fixes a bug introduced in `dfb7a1b4d0` (patch-ids: stop using a hand-rolled hashmap implementation, 2016-07-29) in which git rev-list --cherry-pick A...B will fail to suppress commits reachable from A even if a commit with matching patch-id appears in B. Around the time of that commit, the algorithm for "--cherry-pick" looked something like this: 0. Traverse all of the commits, marking them as being on the left or right side of the symmetric difference. 1. Iterate over the left-hand commits, inserting a patch-id struct for each into a hashmap, and pointing commit->util to the patch-id struct. 2. Iterate over the right-hand commits, checking which are present in the hashmap. If so, we exclude the commit from the output _and_ we mark the patch-id as "seen". 3. Iterate again over the left-hand commits, checking whether commit->util->seen is set; if so, exclude them from the output. At the end, we'll have eliminated commits from both sides that have a matching patch-id on the other side. But there's a subtle assumption here: for any given patch-id, we must have exactly one struct representing it. If two commits from A both have the same patch-id and we allow duplicates in the hashmap, then we run into a problem: a. In step 1, we insert two patch-id structs into the hashmap. b. In step 2, our lookups will find only one of these structs, so only one "seen" flag is marked. c. In step 3, one of the commits in A will have its commit->util->seen set, but the other will not. We'll erroneously output the latter. Prior to `dfb7a1b4d0`, our hashmap did not allow duplicates. Afterwards, it used hashmap_add(), which explicitly does allow duplicates. At that point, the solution would have been easy: when we are about to add a duplicate, skip doing so and return the existing entry which matches. But it gets more complicated. In `683f17ec44` (patch-ids: replace the seen indicator with a commit pointer, 2016-07-29), our step 3 goes away entirely. Instead, in step 2, when the right-hand side finds a matching patch_id from the left-hand side, we can directly mark the left-hand patch_id->commit to be omitted. Solving that would be easy, too; there's a one-to-many relationship of patch-ids to commits, so we just need to keep a list. But there's more. Commit `b3dfeebb92` (rebase: avoid computing unnecessary patch IDs, 2016-07-29) built on that by lazily computing the full patch-ids. So we don't even know when adding to the hashmap whether two commits truly have the same id. We'd have to tentatively assign them a list, and then possibly split them apart (possibly into N new structs) at the moment we compute the real patch-ids. This could work, but it's complicated and error-prone. Instead, let's accept that we may store duplicates, and teach the lookup side to be more clever. Rather than asking for a single matching patch-id, it will need to iterate over all matching patch-ids. This does mean examining every entry in a single hash bucket, but the worst-case for a hash lookup was already doing that. We'll keep the hashmap details out of the caller by providing a simple iteration interface. We can retain the simple has_commit_patch_id() interface for the other callers, but we'll simplify its return value into an integer, rather than returning the patch_id struct. That way they won't be tempted to look at the "commit" field of the return value without iterating. Reported-by: Arnaud Morin <arnaud.morin@gmail.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-12 11:13:32 -08:00
Johannes Schindelin	1550bb6ed0	t6[0-3]: adjust the references to the default branch name "main" Carefully excluding t6300, which sees independent development elsewhere at the time of writing, we use `main` as the default branch name in t6[0-3]. This trick was performed via $ (cd t && sed -i -e 's/master/main/g' -e 's/MASTER/MAIN/g' \ -e 's/Master/Main/g' -- t6[0-3].sh && git checkout HEAD -- t6300\) This allows us to define `GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main` for those tests. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-11-19 15:44:18 -08:00
Johannes Schindelin	334afbc76f	tests: mark tests relying on the current default for `init.defaultBranch` In addition to the manual adjustment to let the `linux-gcc` CI job run the test suite with `master` and then with `main`, this patch makes sure that GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME is set in all test scripts that currently rely on the initial branch name being `master by default. To determine which test scripts to mark up, the first step was to force-set the default branch name to `master` in - all test scripts that contain the keyword `master`, - t4211, which expects `t/t4211/history.export` with a hard-coded ref to initialize the default branch, - t5560 because it sources `t/t556x_common` which uses `master`, - t8002 and t8012 because both source `t/annotate-tests.sh` which also uses `master`) This trick was performed by this command: $ sed -i '/^ \. \.\/$test-lib\\|lib-\(bash\\|cvs\\|git-svn$\\|gitweb-lib\)\.sh$/i\ GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=master\ export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME\ ' $(git grep -l master t/t[0-9].sh) \ t/t4211.sh t/t5560.sh t/t8002.sh t/t8012.sh After that, careful, manual inspection revealed that some of the test scripts containing the needle `master` do not actually rely on a specific default branch name: either they mention `master` only in a comment, or they initialize that branch specificially, or they do not actually refer to the current default branch. Therefore, the aforementioned modification was undone in those test scripts thusly: $ git checkout HEAD -- \ t/t0027-auto-crlf.sh t/t0060-path-utils.sh \ t/t1011-read-tree-sparse-checkout.sh \ t/t1305-config-include.sh t/t1309-early-config.sh \ t/t1402-check-ref-format.sh t/t1450-fsck.sh \ t/t2024-checkout-dwim.sh \ t/t2106-update-index-assume-unchanged.sh \ t/t3040-subprojects-basic.sh t/t3301-notes.sh \ t/t3308-notes-merge.sh t/t3423-rebase-reword.sh \ t/t3436-rebase-more-options.sh \ t/t4015-diff-whitespace.sh t/t4257-am-interactive.sh \ t/t5323-pack-redundant.sh t/t5401-update-hooks.sh \ t/t5511-refspec.sh t/t5526-fetch-submodules.sh \ t/t5529-push-errors.sh t/t5530-upload-pack-error.sh \ t/t5548-push-porcelain.sh \ t/t5552-skipping-fetch-negotiator.sh \ t/t5572-pull-submodule.sh t/t5608-clone-2gb.sh \ t/t5614-clone-submodules-shallow.sh \ t/t7508-status.sh t/t7606-merge-custom.sh \ t/t9302-fast-import-unpack-limit.sh We excluded one set of test scripts in these commands, though: the range of `git p4` tests. The reason? `git p4` stores the (foreign) remote branch in the branch called `p4/master`, which is obviously not the default branch. Manual analysis revealed that only five of these tests actually require a specific default branch name to pass; They were modified thusly: $ sed -i '/^ \. \.\/lib-git-p4\.sh$/i\ GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=master\ export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME\ ' t/t980[0167].sh t/t9811*.sh Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-11-19 15:44:17 -08:00
Stefan Beller	9c5b2fab30	tests: fix diff order arguments in test_cmp Fix the argument order for test_cmp. When given the expected result first the diff shows the actual output with '+' and the expectation with '-', which is the convention for our tests. Signed-off-by: Stefan Beller <sbeller@google.com> Reviewed-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-10-07 10:56:08 +09:00
Jacob Keller	96415b49dc	name-rev: add support to exclude refs by pattern match Extend git-name-rev to support excluding refs which match shell patterns using --exclude. These patterns can be used to limit the scope of refs by excluding any ref that matches one of the --exclude patterns. A ref will only be used for naming when it matches at least one --refs pattern but does not match any of the --exclude patterns. Thus, --exclude patterns are given precedence over --refs patterns. For example, suppose you wish to name a series of commits based on an official release tag of the form "v" but excluding any pre-release tags which match "rc". You can use the following to do so: git name-rev --refs="v" --exclude="rc" --all Add tests and update Documentation for this change. Signed-off-by: Jacob Keller <jacob.keller@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-01-23 18:33:17 -08:00
Jacob Keller	290be6674a	name-rev: extend --refs to accept multiple patterns Teach git name-rev to take multiple --refs stored as a string list of patterns. The list of patterns will be matched inclusively, and each ref only needs to match one pattern to be included. A ref will only be excluded if it does not match any of the given patterns. Additionally, if any of the patterns would allow abbreviation, then we will abbreviate the ref, even if another pattern is more strict and would not have allowed abbreviation on its own. Add tests and documentation for this change. The tests expected output is dynamically generated. This is in order to avoid hard-coding a commit object name in the test results (as the expected output is to simply leave the commit object unnamed). Signed-off-by: Jacob Keller <jacob.keller@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2017-01-23 18:33:17 -08:00
Kevin Willford	b3dfeebb92	rebase: avoid computing unnecessary patch IDs The `rebase` family of Git commands avoid applying patches that were already integrated upstream. They do that by using the revision walking option that computes the patch IDs of the two sides of the rebase (local-only patches vs upstream-only ones) and skipping those local patches whose patch ID matches one of the upstream ones. In many cases, this causes unnecessary churn, as already the set of paths touched by a given commit would suffice to determine that an upstream patch has no local equivalent. This hurts performance in particular when there are a lot of upstream patches, and/or large ones. Therefore, let's introduce the concept of a "diff-header-only" patch ID, compare those first, and only evaluate the "full" patch ID lazily. Please note that in contrast to the "full" patch IDs, those "diff-header-only" patch IDs are prone to collide with one another, as adjacent commits frequently touch the very same files. Hence we now have to be careful to allow multiple hash entries with the same hash. We accomplish that by using the hashmap_add() function that does not even test for hash collisions. This also allows us to evaluate the full patch ID lazily, i.e. only when we found commits with matching diff-header-only patch IDs. We add a performance test that demonstrates ~1-6% improvement. In practice this will depend on various factors such as how many upstream changes and how big those changes are along with whether file system caches are cold or warm. As Git's test suite has no way of catching performance regressions, we also add a regression test that verifies that the full patch ID computation is skipped when the diff-header-only computation suffices. Signed-off-by: Kevin Willford <kcwillford@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-08-11 14:39:16 -07:00
Michael J Gruber	b388e14b89	rev-list --count: separate count for --cherry-mark When --count is used with --cherry-mark, omit the patch equivalent commits from the count for left and right commits and print the count of equivalent commits separately. Signed-off-by: Michael J Gruber <git@drmicha.warpmail.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2011-04-26 13:13:20 -07:00
Michael J Gruber	fe3b59e595	t6007: test rev-list --cherry Signed-off-by: Michael J Gruber <git@drmicha.warpmail.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2011-03-09 13:50:55 -08:00
Michael J Gruber	cb56e3093a	rev-list: documentation and test for --cherry-mark Signed-off-by: Michael J Gruber <git@drmicha.warpmail.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2011-03-09 13:50:54 -08:00
Michael J Gruber	59c8afdf47	rev-list: documentation and test for --left/right-only Signed-off-by: Michael J Gruber <git@drmicha.warpmail.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2011-02-21 16:34:05 -08:00
Michael J Gruber	e0b9c34f7e	t6007: Make sure we test --cherry-pick Test 5 wants to test --cherry-pick but limits by pathspec in such a way that there are no commits on the left side of the range. Add a test without "--cherry-pick" which displays this, and add two more commits and another test which tests what we're after. This also shortens the last test. Signed-off-by: Michael J Gruber <git@drmicha.warpmail.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2011-02-21 16:33:08 -08:00
Thomas Rast	f69c501832	rev-list: introduce --count option Add a --count option that, instead of actually listing the commits, merely counts them. This is mostly geared towards script use, and to this end it acts specially when used with --left-right: it outputs the left and right counts separately. Previously, scripts would have to run a shell loop or small inline script over to achieve the same. (Without --left-right, a simple \|wc -l does the job.) Signed-off-by: Thomas Rast <trast@student.ethz.ch> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2010-06-12 09:39:06 -07:00
Johannes Schindelin	023756f4eb	revision walker: --cherry-pick is a limited operation We used to rely on the fact that cherry-pick would trigger the code path to set limited = 1 in handle_commit(), when an uninteresting commit was encountered. However, when cherry picking between two independent branches, i.e. when there are no merge bases, and there is only linear development (which can happen when you cvsimport a fork of a project), no uninteresting commit will be encountered. So set limited = 1 when --cherry-pick was asked for. Noticed by Martin Bähr. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2007-09-15 16:34:11 -07:00
Johannes Schindelin	36d56de649	Fix --cherry-pick with given paths If you say --cherry-pick, you do not want to see patches which are in the upstream. If you specify paths with that, what you usually expect is that only those parts of the patches are looked at which actually touch the given paths. With this patch, that expectation is met. Noticed by Sam Vilain. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2007-07-11 14:59:31 -07:00

16 Commits