git-commit-vandalism

Author	SHA1	Message	Date
Derrick Stolee	561b583749	t6012: make rev-list tests more interesting As we are working to rewrite some of the revision-walk machinery, there could easily be some interesting interactions between the options that force topological constraints (--topo-order, --date-order, and --author-date-order) along with specifying a path. Add extra tests to t6012-rev-list-simplify.sh to add coverage of these interactions. To ensure interesting things occur, alter the repo data shape to have different orders depending on topo-, date-, or author-date-order. When testing using GIT_TEST_COMMIT_GRAPH, this assists in covering the new logic for topo-order walks using generation numbers. The extra tests can be added indepently. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-11-02 12:14:22 +09:00
Derrick Stolee	b45424181e	revision.c: generation-based topo-order algorithm The current --topo-order algorithm requires walking all reachable commits up front, topo-sorting them, all before outputting the first value. This patch introduces a new algorithm which uses stored generation numbers to incrementally walk in topo-order, outputting commits as we go. This can dramatically reduce the computation time to write a fixed number of commits, such as when limiting with "-n <N>" or filling the first page of a pager. When running a command like 'git rev-list --topo-order HEAD', Git performed the following steps: 1. Run limit_list(), which parses all reachable commits, adds them to a linked list, and distributes UNINTERESTING flags. If all unprocessed commits are UNINTERESTING, then it may terminate without walking all reachable commits. This does not occur if we do not specify UNINTERESTING commits. 2. Run sort_in_topological_order(), which is an implementation of Kahn's algorithm. It first iterates through the entire set of important commits and computes the in-degree of each (plus one, as we use 'zero' as a special value here). Then, we walk the commits in priority order, adding them to the priority queue if and only if their in-degree is one. As we remove commits from this priority queue, we decrement the in-degree of their parents. 3. While we are peeling commits for output, get_revision_1() uses pop_commit on the full list of commits computed by sort_in_topological_order(). In the new algorithm, these three steps correspond to three different commit walks. We run these walks simultaneously, and advance each only as far as necessary to satisfy the requirements of the 'higher order' walk. We know when we can pause each walk by using generation numbers from the commit- graph feature. Recall that the generation number of a commit satisfies: * If the commit has at least one parent, then the generation number is one more than the maximum generation number among its parents. * If the commit has no parent, then the generation number is one. There are two special generation numbers: * GENERATION_NUMBER_INFINITY: this value is 0xffffffff and indicates that the commit is not stored in the commit-graph and the generation number was not previously calculated. * GENERATION_NUMBER_ZERO: this value (0) is a special indicator to say that the commit-graph was generated by a version of Git that does not compute generation numbers (such as v2.18.0). Since we use generation_numbers_enabled() before using the new algorithm, we do not need to worry about GENERATION_NUMBER_ZERO. However, the existence of GENERATION_NUMBER_INFINITY implies the following weaker statement than the usual we expect from generation numbers: If A and B are commits with generation numbers gen(A) and gen(B) and gen(A) < gen(B), then A cannot reach B. Thus, we will walk in each of our stages until the "maximum unexpanded generation number" is strictly lower than the generation number of a commit we are about to use. The walks are as follows: 1. EXPLORE: using the explore_queue priority queue (ordered by maximizing the generation number), parse each reachable commit until all commits in the queue have generation number strictly lower than needed. During this walk, update the UNINTERESTING flags as necessary. 2. INDEGREE: using the indegree_queue priority queue (ordered by maximizing the generation number), add one to the in- degree of each parent for each commit that is walked. Since we walk in order of decreasing generation number, we know that discovering an in-degree value of 0 means the value for that commit was not initialized, so should be initialized to two. (Recall that in-degree value "1" is what we use to say a commit is ready for output.) As we iterate the parents of a commit during this walk, ensure the EXPLORE walk has walked beyond their generation numbers. 3. TOPO: using the topo_queue priority queue (ordered based on the sort_order given, which could be commit-date, author- date, or typical topo-order which treats the queue as a LIFO stack), remove a commit from the queue and decrement the in-degree of each parent. If a parent has an in-degree of one, then we add it to the topo_queue. Before we decrement the in-degree, however, ensure the INDEGREE walk has walked beyond that generation number. The implementations of these walks are in the following methods: * explore_walk_step and explore_to_depth * indegree_walk_step and compute_indegrees_to_depth * next_topo_commit and expand_topo_walk These methods have some patterns that may seem strange at first, but they are probably carry-overs from their equivalents in limit_list and sort_in_topological_order. One thing that is missing from this implementation is a proper way to stop walking when the entire queue is UNINTERESTING, so this implementation is not enabled by comparisions, such as in 'git rev-list --topo-order A..B'. This can be updated in the future. In my local testing, I used the following Git commands on the Linux repository in three modes: HEAD~1 with no commit-graph, HEAD~1 with a commit-graph, and HEAD with a commit-graph. This allows comparing the benefits we get from parsing commits from the commit-graph and then again the benefits we get by restricting the set of commits we walk. Test: git rev-list --topo-order -100 HEAD HEAD~1, no commit-graph: 6.80 s HEAD~1, w/ commit-graph: 0.77 s HEAD, w/ commit-graph: 0.02 s Test: git rev-list --topo-order -100 HEAD -- tools HEAD~1, no commit-graph: 9.63 s HEAD~1, w/ commit-graph: 6.06 s HEAD, w/ commit-graph: 0.06 s This speedup is due to a few things. First, the new generation- number-enabled algorithm walks commits on order of the number of results output (subject to some branching structure expectations). Since we limit to 100 results, we are running a query similar to filling a single page of results. Second, when specifying a path, we must parse the root tree object for each commit we walk. The previous benefits from the commit-graph are entirely from reading the commit-graph instead of parsing commits. Since we need to parse trees for the same number of commits as before, we slow down significantly from the non-path-based query. For the test above, I specifically selected a path that is changed frequently, including by merge commits. A less-frequently-changed path (such as 'README') has similar end-to-end time since we need to walk the same number of commits (before determining we do not have 100 hits). However, get the benefit that the output is presented to the user as it is discovered, much the same as a normal 'git log' command (no '--topo-order'). This is an improved user experience, even if the command has the same runtime. Helped-by: Jeff King <peff@peff.net> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-11-02 12:14:22 +09:00
Derrick Stolee	5284fc5cc9	commit/revisions: bookkeeping before refactoring There are a few things that need to move around a little before making a big refactoring in the topo-order logic: 1. We need access to record_author_date() and compare_commits_by_author_date() in revision.c. These are used currently by sort_in_topological_order() in commit.c. 2. Moving these methods to commit.h requires adding an author_date_slab declaration to commit.h. Consumers will need their own implementation. 3. The add_parents_to_list() method in revision.c performs logic around the UNINTERESTING flag and other special cases depending on the struct rev_info. Allow this method to ignore a NULL 'list' parameter, as we will not be populating the list for our walk. Also rename the method to the slightly more generic name process_parents() to make clear that this method does more than add to a list (and no list is required anymore). Helped-by: Jeff King <peff@peff.net> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-11-02 12:14:22 +09:00
Derrick Stolee	f0d9cc4196	revision.c: begin refactoring --topo-order logic When running 'git rev-list --topo-order' and its kin, the topo_order setting in struct rev_info implies the limited setting. This means that the following things happen during prepare_revision_walk(): * revs->limited implies we run limit_list() to walk the entire reachable set. There are some short-cuts here, such as if we perform a range query like 'git rev-list COMPARE..HEAD' and we can stop limit_list() when all queued commits are uninteresting. * revs->topo_order implies we run sort_in_topological_order(). See the implementation of that method in commit.c. It implies that the full set of commits to order is in the given commit_list. These two methods imply that a 'git rev-list --topo-order HEAD' command must walk the entire reachable set of commits _twice_ before returning a single result. If we have a commit-graph file with generation numbers computed, then there is a better way. This patch introduces some necessary logic redirection when we are in this situation. In v2.18.0, the commit-graph file contains zero-valued bytes in the positions where the generation number is stored in v2.19.0 and later. Thus, we use generation_numbers_enabled() to check if the commit-graph is available and has non-zero generation numbers. When setting revs->limited only because revs->topo_order is true, only do so if generation numbers are not available. There is no reason to use the new logic as it will behave similarly when all generation numbers are INFINITY or ZERO. In prepare_revision_walk(), if we have revs->topo_order but not revs->limited, then we trigger the new logic. It breaks the logic into three pieces, to fit with the existing framework: 1. init_topo_walk() fills a new struct topo_walk_info in the rev_info struct. We use the presence of this struct as a signal to use the new methods during our walk. In this patch, this method simply calls limit_list() and sort_in_topological_order(). In the future, this method will set up a new data structure to perform that logic in-line. 2. next_topo_commit() provides get_revision_1() with the next topo- ordered commit in the list. Currently, this simply pops the commit from revs->commits. 3. expand_topo_walk() provides get_revision_1() with a way to signal walking beyond the latest commit. Currently, this calls add_parents_to_list() exactly like the old logic. While this commit presents method redirection for performing the exact same logic as before, it allows the next commit to focus only on the new logic. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-11-02 12:14:22 +09:00
Derrick Stolee	d6b40712dc	test-reach: add rev-list tests The rev-list command is critical to Git's functionality. Ensure it works in the three commit-graph environments constructed in t6600-test-reach.sh. Here are a few important types of rev-list operations: * Basic: git rev-list --topo-order HEAD * Range: git rev-list --topo-order compare..HEAD * Ancestry: git rev-list --topo-order --ancestry-path compare..HEAD * Symmetric Difference: git rev-list --topo-order compare...HEAD Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-11-02 12:14:22 +09:00
Derrick Stolee	4b47a9a85e	test-reach: add run_three_modes method The 'test_three_modes' method assumes we are using the 'test-tool reach' command for our test. However, we may want to use the data shape of our commit graph and the three modes (no commit-graph, full commit-graph, partial commit-graph) for other git commands. Split test_three_modes to be a simple translation on a more general run_three_modes method that executes the given command and tests the actual output to the expected output. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-11-02 12:14:22 +09:00
Derrick Stolee	aca4240f6a	prio-queue: add 'peek' operation When consuming a priority queue, it can be convenient to inspect the next object that will be dequeued without actually dequeueing it. Our existing library did not have such a 'peek' operation, so add it as prio_queue_peek(). Add a reference-level comparison in t/helper/test-prio-queue.c so this method is exercised by t0009-prio-queue.sh. Further, add a test that checks the behavior when the compare function is NULL (i.e. the queue becomes a stack). Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-11-02 12:14:21 +09:00
SZEDER Gábor	0f0c51181d	travis-ci: install packages in 'ci/install-dependencies.sh' Ever since we started using Travis CI, we specified the list of packages to install in '.travis.yml' via the APT addon. While running our builds on Travis CI's container-based infrastructure we didn't have another choice, because that environment didn't support 'sudo', and thus we didn't have permission to install packages ourselves. With the switch to the VM-based infrastructure in the previous patch we do get a working 'sudo', so we can install packages by running 'sudo apt-get -y install ...' as well. Let's make use of this and install necessary packages in 'ci/install-dependencies.sh', so all the dependencies (i.e. both packages and "non-packages" (P4 and Git-LFS)) are handled in the same file. Install gcc-8 only in the 'linux-gcc' build job; so far it has been unnecessarily installed in the 'linux-clang' build job as well. Print the versions of P4 and Git-LFS conditionally, i.e. only when they have been installed; with this change even the static analysis and documentation build jobs start using 'ci/install-dependencies.sh' to install packages, and neither of these two build jobs depend on and thus install those. This change will presumably be beneficial for the upcoming Azure Pipelines integration [1]: preliminary versions of that patch series run a couple of 'apt-get' commands to install the necessary packages before running 'ci/install-dependencies.sh', but with this patch it will be sufficient to run only 'ci/install-dependencies.sh'. [1] https://public-inbox.org/git/1a22efe849d6da79f2c639c62a1483361a130238.1539598316.git.gitgitgadget@gmail.com/ Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-11-02 11:28:19 +09:00
Johannes Schindelin	11aad46432	tests: optionally skip `git rebase -p` tests The `--preserve-merges` mode of the `rebase` command is slated to be deprecated soon, as the more powerful `--rebase-merges` mode is available now, and the latter was designed with the express intent to address the shortcomings of `--preserve-merges`' design (e.g. the inability to reorder commits in an interactive rebase). As such, we will eventually even remove the `--preserve-merges` support, and along with it, its tests. In preparation for this, and also to allow the Windows phase of our automated tests to save some well-needed time when running the test suite, this commit introduces a new prerequisite REBASE_P, which can be forced to being unmet by setting the environment variable `GIT_TEST_SKIP_REBASE_P` to any non-empty string. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-11-02 11:27:30 +09:00
Johannes Schindelin	5ee4ef8bda	t3418: decouple test cases from a previous `rebase -p` test case It is in general a good idea for regression test cases to be as independent of each other as possible (with the one exception of an initial `setup` test case, which is only a test case in Git's test suite because it does not have a notion of a fixture or setup). This patch addresses one particular instance of this principle being violated: a few test cases in t3418-rebase-continue.sh depend on a side effect of a test case that verifies a specific `rebase -p` behavior. The later test cases should, however, still succeed even if the `rebase -p` test case is skipped. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-11-02 11:27:30 +09:00
Johannes Schindelin	6c8fbae619	t3404: decouple some test cases from outcomes of previous test cases Originally, the `--preserve-merges` option of the `git rebase` command piggy-backed on top of the `--interactive` feature. For that reason, the early test cases were added to the very same test script that contains the `git rebase -i` tests: `t3404-rebase-interactive.sh`. However, since `c42abfe785` (rebase: introduce a dedicated backend for --preserve-merges, 2018-05-28), the `--preserve-merges` feature got its own backend, in preparation for converting the rest of the `--interactive` code to built-in code, written in C rather than shell. The reason why the `--preserve-merges` feature was not converted at the same time is that we have something much better now: `--rebase-merges`. That option intends to supersede `--preserve-merges`, and we will probably deprecate the latter soon. Once `--preserve-merges` has been deprecated for a good amount of time, it will be time to remove it, and along with it, its tests. In preparation for that, let's make the rest of the test cases in `t3404-rebase-interactive.sh` independent of the test cases dedicated to `--preserve-merges`. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-11-02 11:27:30 +09:00
Junio C Hamano	d582ea202b	Eighth batch for 2.20 Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-11-02 11:04:59 +09:00
Junio C Hamano	b17ca8f94d	rebase: apply cocci patch Favor oideq() over !oidcmp() when checking for equality. Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-11-02 11:04:59 +09:00
Junio C Hamano	85fcf1cbb6	Merge branch 'js/rebase-i-shortopt' "git rebase -i" learned to take 'b' as the short form of 'break' option in the todo list. * js/rebase-i-shortopt: rebase -i: recognize short commands without arguments	2018-11-02 11:04:59 +09:00
Junio C Hamano	789b1f7042	Merge branch 'js/rebase-i-break' "git rebase -i" learned a new insn, 'break', that the user can insert in the to-do list. Upon hitting it, the command returns control back to the user. * js/rebase-i-break: rebase -i: introduce the 'break' command rebase -i: clarify what happens on a failed `exec`	2018-11-02 11:04:58 +09:00
Junio C Hamano	b78c5fe96c	Merge branch 'js/rebase-autostash-fix' "git rebase" that has recently been rewritten in C had a few issues in its "--autstash" feature, which have been corrected. * js/rebase-autostash-fix: rebase --autostash: fix issue with dirty submodules rebase --autostash: demonstrate a problem with dirty submodules rebase (autostash): use an explicit OID to apply the stash rebase (autostash): store the full OID in <state-dir>/autostash rebase (autostash): avoid duplicate call to state_dir_path()	2018-11-02 11:04:58 +09:00
Junio C Hamano	9d322282e4	Merge branch 'cb/printf-empty-format' Build fix for a topic in flight. * cb/printf-empty-format: sequencer: cleanup for gcc warning in non developer mode	2018-11-02 11:04:57 +09:00
Junio C Hamano	7ce32f72e3	Merge branch 'jc/rebase-in-c-5-test-typofix' Typofix. * jc/rebase-in-c-5-test-typofix: rebase: fix typoes in error messages	2018-11-02 11:04:57 +09:00
Junio C Hamano	ee5e90434e	Merge branch 'pk/rebase-in-c-6-final' The final step of rewriting "rebase -i" in C. * pk/rebase-in-c-6-final: rebase: default to using the builtin rebase	2018-11-02 11:04:56 +09:00
Junio C Hamano	4b3517ee9d	Merge branch 'js/rebase-in-c-5.5-work-with-rebase-i-in-c' "rebase" that has been rewritten learns the new calling convention used by "rebase -i" that was rewritten in C, tying the loose end between two GSoC topics that stomped on each other's toes. * js/rebase-in-c-5.5-work-with-rebase-i-in-c: builtin rebase: prepare for builtin rebase -i	2018-11-02 11:04:56 +09:00
Junio C Hamano	fd1a9e903f	Merge branch 'pk/rebase-in-c-5-test' Rewrite "git rebase" in C. * pk/rebase-in-c-5-test: builtin rebase: error out on incompatible option/mode combinations builtin rebase: use no-op editor when interactive is "implied" builtin rebase: show progress when connected to a terminal builtin rebase: fast-forward to onto if it is a proper descendant builtin rebase: optionally pass custom reflogs to reset_head() builtin rebase: optionally auto-detect the upstream	2018-11-02 11:04:55 +09:00
Junio C Hamano	0293121717	Merge branch 'pk/rebase-in-c-4-opts' Rewrite "git rebase" in C. * pk/rebase-in-c-4-opts: builtin rebase: support --root builtin rebase: add support for custom merge strategies builtin rebase: support `fork-point` option merge-base --fork-point: extract libified function builtin rebase: support --rebase-merges[=[no-]rebase-cousins] builtin rebase: support `--allow-empty-message` option builtin rebase: support `--exec` builtin rebase: support `--autostash` option builtin rebase: support `-C` and `--whitespace=<type>` builtin rebase: support `--gpg-sign` option builtin rebase: support `--autosquash` builtin rebase: support `keep-empty` option builtin rebase: support `ignore-date` option builtin rebase: support `ignore-whitespace` option builtin rebase: support --committer-date-is-author-date builtin rebase: support --rerere-autoupdate builtin rebase: support --signoff builtin rebase: allow selecting the rebase "backend"	2018-11-02 11:04:55 +09:00
Junio C Hamano	39f7331545	Merge branch 'pk/rebase-in-c-3-acts' Rewrite "git rebase" in C. * pk/rebase-in-c-3-acts: builtin rebase: stop if `git am` is in progress builtin rebase: actions require a rebase in progress builtin rebase: support --edit-todo and --show-current-patch builtin rebase: support --quit builtin rebase: support --abort builtin rebase: support --skip builtin rebase: support --continue	2018-11-02 11:04:54 +09:00
Junio C Hamano	e0720a3867	Merge branch 'pk/rebase-in-c-2-basic' Rewrite "git rebase" in C. * pk/rebase-in-c-2-basic: builtin rebase: support `git rebase <upstream> <switch-to>` builtin rebase: only store fully-qualified refs in `options.head_name` builtin rebase: start a new rebase only if none is in progress builtin rebase: support --force-rebase builtin rebase: try to fast forward when possible builtin rebase: require a clean worktree builtin rebase: support the `verbose` and `diffstat` options builtin rebase: support --quiet builtin rebase: handle the pre-rebase hook and --no-verify builtin rebase: support `git rebase --onto A...B` builtin rebase: support --onto	2018-11-02 11:04:53 +09:00
Junio C Hamano	b49ef560ed	Merge branch 'ag/rebase-i-in-c' Rewrite of the remaining "rebase -i" machinery in C. * ag/rebase-i-in-c: rebase -i: move rebase--helper modes to rebase--interactive rebase -i: remove git-rebase--interactive.sh rebase--interactive2: rewrite the submodes of interactive rebase in C rebase -i: implement the main part of interactive rebase as a builtin rebase -i: rewrite init_basic_state() in C rebase -i: rewrite write_basic_state() in C rebase -i: rewrite the rest of init_revisions_and_shortrevisions() in C rebase -i: implement the logic to initialize $revisions in C rebase -i: remove unused modes and functions rebase -i: rewrite complete_action() in C t3404: todo list with commented-out commands only aborts sequencer: change the way skip_unnecessary_picks() returns its result sequencer: refactor append_todo_help() to write its message to a buffer rebase -i: rewrite checkout_onto() in C rebase -i: rewrite setup_reflog_action() in C sequencer: add a new function to silence a command, except if it fails rebase -i: rewrite the edit-todo functionality in C editor: add a function to launch the sequence editor rebase -i: rewrite append_todo_help() in C sequencer: make three functions and an enum from sequencer.c public	2018-11-02 11:04:53 +09:00
Junio C Hamano	5ae50845d8	Merge branch 'pk/rebase-in-c' Rewrite of the "rebase" machinery in C. * pk/rebase-in-c: builtin/rebase: support running "git rebase <upstream>" rebase: refactor common shell functions into their own file rebase: start implementing it as a builtin	2018-11-02 11:04:52 +09:00
Minh Nguyen	232b63c440	l10n: vi.po: fix typo in pack-objects Signed-off-by: Tran Ngoc Quan <vnwildman@gmail.com>	2018-11-01 13:26:08 +07:00
Junio C Hamano	e198b3a740	fetch: replace string-list used as a look-up table with a hashmap In find_non_local_tags() helper function (used to implement the "follow tags"), we use string_list_has_string() on two string lists as a way to see if a refname has already been processed, etc. All this code predates more modern in-core lookup API like hashmap; replace them with two hashmaps and one string list---the hashmaps are used for look-ups and the string list is to keep the order of items in the returned result stable (which is the only single thing hashmap does worse than lookups on string-list). Similarly, get_ref_map() uses another string-list as a look-up table for existing refs. Replace it with a hashmap. Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-11-01 14:37:13 +09:00
Andreas Gruenbacher	5221048092	rev-parse: clear --exclude list after 'git rev-parse --all' Commit [1] added the --exclude option to revision.c. The --all, --branches, --tags, --remotes, and --glob options clear the exclude list. Shortly therafter, commit [2] added the same to 'git rev-parse', but without clearing the exclude list for the --all option. [1] `e7b432c52` ("revision: introduce --exclude=<glob> to tame wildcards", 2013-08-30) [2] `9dc01bf06` ("rev-parse: introduce --exclude=<glob> to tame wildcards", 2013-11-01) Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-11-01 14:36:36 +09:00
Jonathan Tan	5400b2a2d9	fetch-pack: be more precise in parsing v2 response Each section in a protocol v2 response is followed by either a DELIM packet (indicating more sections to follow) or a FLUSH packet (indicating none to follow). But when parsing the "acknowledgments" section, do_fetch_pack_v2() is liberal in accepting both, but determines whether to continue reading or not based solely on the contents of the "acknowledgments" section, not on whether DELIM or FLUSH was read. There is no issue with a protocol-compliant server, but can result in confusing error messages when communicating with a server that serves unexpected additional sections. Consider a server that sends "new-section" after "acknowledgments": - client writes request - client reads the "acknowledgments" section which contains no "ready", then DELIM - since there was no "ready", client needs to continue negotiation, and writes request - client reads "new-section", and reports to the end user "expected 'acknowledgments', received 'new-section'" For the person debugging the involved Git implementation(s), the error message is confusing in that "new-section" was not received in response to the latest request, but to the first one. One solution is to always continue reading after DELIM, but in this case, we can do better. We know from the protocol that "ready" means at least the packfile section is coming (hence, DELIM) and that no "ready" means that no sections are to follow (hence, FLUSH). So teach process_acks() to enforce this. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-11-01 14:30:22 +09:00
Phillip Wood	4d010a757c	sequencer: use read_author_script() Use the new function added in the last commit to read the author script, updating read_env_script() and read_author_ident(). We now have a single code path that reads the author script for am and all flavors of rebase. This changes the behavior of read_env_script() as previously it would set any environment variables that were in the author-script file. Now it is an error if the file contains other variables or any of GIT_AUTHOR_NAME, GIT_AUTHOR_EMAIL and GIT_AUTHOR_DATE are missing. This is what am and the non interactive version of rebase have been doing for several years so hopefully it will not cause a problem for interactive rebase users. The advantage is that we are reusing existing code from am which uses sq_dequote() to properly dequote variables. This fixes potential problems with user edited scripts as read_env_script() which did not track quotes properly. This commit also removes the fallback code for checking for a broken author script after git is upgraded when a rebase is stopped. Now that the parsing uses sq_dequote() it will reliably return an error if the quoting is broken and the user will have to abort the rebase and restart. This isn't ideal but it's a corner case and the detection of the broken quoting could be confused by user edited author scripts. Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-11-01 12:08:06 +09:00
Phillip Wood	bcd33ec25f	add read_author_script() to libgit Add read_author_script() to sequencer.c based on the implementation in builtin/am.c and update read_am_author_script() to use read_author_script(). The sequencer code that reads the author script will be updated in the next commit. Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-11-01 12:08:06 +09:00
Phillip Wood	a75d351388	am: rename read_author_script() Rename read_author_script() in preparation for adding a shared read_author_script() function to libgit. Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-11-01 12:08:06 +09:00
Phillip Wood	442c36bd08	am: improve author-script error reporting If there are errors in a user edited author-script there was no indication of what was wrong. This commit adds some specific error messages depending on the problem. It also relaxes the requirement that the variables appear in a specific order in the file to match the behavior of 'rebase --interactive'. Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-11-01 12:08:06 +09:00
Phillip Wood	28224c2359	am: don't die in read_author_script() The caller is already prepared to handle errors returned from this function so there is no need for it to die if it cannot read the file. Suggested-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-11-01 12:08:06 +09:00
Antonio Ospite	2b1257e463	t/helper: add test-submodule-nested-repo-config Add a test tool to exercise config_from_gitmodules(), in particular for the case of nested submodules. Add also a test to document that reading the submoudles config of nested submodules does not work yet when the .gitmodules file is not in the working tree but it still in the index. This is because the git API does not always make it possible access the object store of an arbitrary repository (see get_oid() usage in config_from_gitmodules()). When this git limitation gets fixed the aforementioned use case will be supported too. Signed-off-by: Antonio Ospite <ao2@ao2.it> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-10-31 15:01:30 +09:00
Antonio Ospite	76e9bdc437	submodule: support reading .gitmodules when it's not in the working tree When the .gitmodules file is not available in the working tree, try using the content from the index and from the current branch. This covers the case when the file is part of the repository but for some reason it is not checked out, for example because of a sparse checkout. This makes it possible to use at least the 'git submodule' commands which read the gitmodules configuration file without fully populating the working tree. Writing to .gitmodules will still require that the file is checked out, so check for that before calling config_set_in_gitmodules_file_gently. Add a similar check also in git-submodule.sh::cmd_add() to anticipate the eventual failure of the "git submodule add" command when .gitmodules is not safely writeable; this prevents the command from leaving the repository in a spurious state (e.g. the submodule repository was cloned but .gitmodules was not updated because config_set_in_gitmodules_file_gently failed). Moreover, since config_from_gitmodules() now accesses the global object store, it is necessary to protect all code paths which call the function against concurrent access to the global object store. Currently this only happens in builtin/grep.c::grep_submodules(), so call grep_read_lock() before invoking code involving config_from_gitmodules(). Finally, add t7418-submodule-sparse-gitmodules.sh to verify that reading from .gitmodules succeeds and that writing to it fails when the file is not checked out. NOTE: there is one rare case where this new feature does not work properly yet: nested submodules without .gitmodules in their working tree. This has been documented with a warning and a test_expect_failure item in t7814, and in this case the current behavior is not altered: no config is read. Signed-off-by: Antonio Ospite <ao2@ao2.it> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-10-31 15:01:30 +09:00
Jeff King	0afbe3e806	read_istream_pack_non_delta(): document input handling Twice now we have scratched our heads about why the loose streaming code needs the protection added by `692f0bc7ae` (avoid infinite loop in read_istream_loose, 2013-03-25), but the similar code in its pack counterpart does not. The short answer is that use_pack() will die before it lets us run out of bytes. Note that this could mean reading garbage (including the trailing hash) from the packfile in some cases of corruption, but that's OK. zlib will notice and complain (and if not, certainly the end result will not match the object hash we expect). Let's leave a comment this time to document our findings. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-10-31 14:32:30 +09:00
Jeff King	6a139cdd74	ls-remote: pass heads/tags prefixes to transport Unlike its arbitrary text patterns, the --heads and --tags options to ls-remote are true prefixes. We can pass this information to the transport code. If the v2 protocol is in use, that will reduce the size of the ref advertisement. Note that the test added here succeeds both before and after the patch. This is an optimization, not a bug-fix; it's just making sure we didn't break anything. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-10-31 13:40:11 +09:00
Jeff King	631f0f8c4b	ls-remote: do not send ref prefixes for patterns Since `b4be74105f` (ls-remote: pass ref prefixes when requesting a remote's refs, 2018-03-15), "ls-remote foo" will pass "refs/heads/foo", "refs/tags/foo", etc to the transport code in an attempt to let the other side reduce the size of its advertisement. Unfortunately this is not correct, as ls-remote patterns do not follow the usual ref lookup rules, and are in fact tail-matched. So we could find "refs/heads/foo" or "refs/heads/a/much/deeper/foo" or even "refs/another/hierarchy/foo". Since we can't pass a prefix and there's not yet a v2 extension for matching wildcards, we must disable this feature to keep the same behavior as v1. Reported-by: Jon Simons <jon@jonsimons.org> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-10-31 13:40:09 +09:00
Junio C Hamano	18ad13e5b2	Adjust for 2.19.x series * jk/detect-truncated-zlib-input cat-file: handle streaming failures consistently check_stream_sha1(): handle input underflow t1450: check large blob in trailing-garbage test	2018-10-31 13:12:12 +09:00
Jeff King	98f425b453	cat-file: handle streaming failures consistently There are three ways to convince cat-file to stream a blob: - cat-file -p $blob - cat-file blob $blob - echo $batch \| cat-file --batch In the first two, we simply exit with the error code of streaw_blob_to_fd(). That means that an error will cause us to exit with "-1" (which we try to avoid) without printing any kind of error message (which is confusing to the user). Instead, let's match the third case, which calls die() on an error. Unfortunately we cannot be more specific, as stream_blob_to_fd() does not tell us whether the problem was on reading (e.g., a corrupt object) or on writing (e.g., ENOSPC). That might be an opportunity for future work, but for now we will at least exit with a sane message and exit code. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-10-31 13:05:26 +09:00
Jeff King	ccdc4819d5	check_stream_sha1(): handle input underflow This commit fixes an infinite loop when fscking large truncated loose objects. The check_stream_sha1() function takes an mmap'd loose object buffer and streams 4k of output at a time, checking its sha1. The loop quits when we've output enough bytes (we know the size from the object header), or when zlib tells us anything except Z_OK or Z_BUF_ERROR. The latter is expected because zlib may run out of room in our 4k buffer, and that is how it tells us to process the output and loop again. But Z_BUF_ERROR also covers another case: one in which zlib cannot make forward progress because it needs more _input_. This should never happen in this loop, because though we're streaming the output, we have the entire deflated input available in the mmap'd buffer. But since we don't check this case, we'll just loop infinitely if we do see a truncated object, thinking that zlib is asking for more output space. It's tempting to fix this by checking stream->avail_in as part of the loop condition (and quitting if all of our bytes have been consumed). But that assumes that once zlib has consumed the input, there is nothing left to do. That's not necessarily the case: it may have read our input into its internal state, but still have bytes to output. Instead, let's continue on Z_BUF_ERROR only when we see the case we're expecting: the previous round filled our output buffer completely. If it didn't (and we still saw Z_BUF_ERROR), we know something is wrong and should break out of the loop. The bug comes from commit `f6371f9210` (sha1_file: add read_loose_object() function, 2017-01-13), which reimplemented some of the existing loose object functions. So it's worth checking if this bug was inherited from any of those. The answers seems to be no. The two obvious candidates are both OK: 1. unpack_sha1_rest(); this doesn't need to loop on Z_BUF_ERROR at all, since it allocates the expected output buffer in advance (which we can't do since we're explicitly streaming here) 2. check_object_signature(); the streaming path relies on the istream interface, which uses read_istream_loose() for this case. That function uses a similar "is our output buffer full" check with Z_BUF_ERROR (which is where I stole it from for this patch!) Reported-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Helped-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-10-31 13:05:26 +09:00
Jeff King	5632baf238	t1450: check large blob in trailing-garbage test Commit `cce044df7f` (fsck: detect trailing garbage in all object types, 2017-01-13) added two tests of trailing garbage in a loose object file: one with a commit and one with a blob. The point of having two is that blobs would follow a different code path that streamed the contents, instead of loading it into a buffer as usual. At the time, merely being a blob was enough to trigger the streaming code path. But since `7ac4f3a007` (fsck: actually fsck blob data, 2018-05-02), we now only stream blobs that are actually large. So since then, the streaming code path is not tested at all for this case. We can restore the original intent of the test by tweaking core.bigFileThreshold to make our small blob seem large. There's no easy way to externally verify that we followed the streaming code path, but I did check before/after using a temporary debug statement. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-10-31 12:53:44 +09:00
Johannes Schindelin	ff8978d533	mingw: fix isatty() after dup2() Since `a9b8a09c3c` (mingw: replace isatty() hack, 2016-12-22), we handle isatty() by special-casing the stdin/stdout/stderr file descriptors, caching the return value. However, we missed the case where dup2() overrides the respective file descriptor. That poses a problem e.g. where the `show` builtin asks for a pager very early, the `setup_pager()` function sets the pager depending on the return value of `isatty()` and then redirects stdout. Subsequently, `cmd_log_init_finish()` calls `setup_pager()` again. What should happen now is that `isatty()` reports that stdout is not a TTY and consequently stdout should be left alone. Let's override dup2() to handle this appropriately. This fixes https://github.com/git-for-windows/git/issues/1077 Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-10-31 12:48:59 +09:00
Johannes Schindelin	0e218f91c2	mingw: unset PERL5LIB by default Git for Windows ships with its own Perl interpreter, and insists on using it, so it will most likely wreak havoc if PERL5LIB is set before launching Git. Let's just unset that environment variables when spawning processes. To make this feature extensible (and overrideable), there is a new config setting `core.unsetenvvars` that allows specifying a comma-separated list of names to unset before spawning processes. Reported by Gabriel Fuhrmann. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-10-31 12:46:32 +09:00
Johannes Schindelin	bdfbb0ea93	config: move Windows-specific config settings into compat/mingw.c Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-10-31 12:46:27 +09:00
Johannes Schindelin	70fc5793df	config: allow for platform-specific core.* config settings In the Git for Windows project, we have ample precendent for config settings that apply to Windows, and to Windows only. Let's formalize this concept by introducing a platform_core_config() function that can be #define'd in a platform-specific manner. This will allow us to contain platform-specific code better, as the corresponding variables no longer need to be exported so that they can be defined in environment.c and be set in config.c Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-10-31 12:46:21 +09:00
Johannes Schindelin	409670f34e	config: rename `dummy` parameter to `cb` in git_default_config() This is the convention elsewhere (and prepares for the case where we may need to pass callback data). Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-10-31 12:45:30 +09:00
Johannes Schindelin	fe21c6b285	mingw: reencode environment variables on the fly (UTF-16 <-> UTF-8) On Windows, the authoritative environment is encoded in UTF-16. In Git for Windows, we convert that to UTF-8 (because UTF-16 is such a foreign idea to Git that its source code is unprepared for it). Previously, out of performance concerns, we converted the entire environment to UTF-8 in one fell swoop at the beginning, and upon putenv() and run_command() converted it back. Having a private copy of the environment comes with its own perils: when a library used by Git's source code tries to modify the environment, it does not really work (in Git for Windows' case, libcurl, see https://github.com/git-for-windows/git/compare/bcad1e6d58^...bcad1e6d58^2 for a glimpse of the issues). Hence, it makes our environment handling substantially more robust if we switch to on-the-fly-conversion in `getenv()`/`putenv()` calls. Based on an initial version in the MSVC context by Jeff Hostetler, this patch makes it so. Surprisingly, this has a positive effect on speed: at the time when the current code was written, we tested the performance, and there were so many `getenv()` calls that it seemed better to convert everything in one go. In the meantime, though, Git has obviously been cleaned up a bit with regards to `getenv()` calls so that the Git processes spawned by the test suite use an average of only 40 `getenv()`/`putenv()` calls over the process lifetime. Speaking of the entire test suite: the total time spent in the re-encoding in the current code takes about 32.4 seconds (out of 113 minutes runtime), whereas the code introduced in this patch takes only about 8.2 seconds in total. Not much, but it proves that we need not be concerned about the performance impact introduced by this patch. Helped-by: Jeff Hostetler <jeffhost@microsoft.com> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-10-31 11:43:14 +09:00

... 7 8 9 10 11 ...

54177 Commits