git-commit-vandalism

Author	SHA1	Message	Date
Jeff King	56d5dde752	shortlog: parse trailer idents Trailers don't necessarily contain name/email identity values, so shortlog has so far treated them as opaque strings. However, since many trailers do contain identities, it's useful to treat them as such when they can be parsed. That lets "-e" work as usual, as well as mailmap. When they can't be parsed, we'll continue with the old behavior of treating them as a single string (there's no new test for that here, since the existing tests cover a trailer like this). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-09-27 12:21:05 -07:00
Jeff King	87abb96222	shortlog: rename parse_stdin_ident() This function is actually useful for parsing any identity, whether from stdin or not. We'll need it for handling trailers. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-09-27 12:21:05 -07:00
Jeff King	f17b0b99bf	shortlog: de-duplicate trailer values The current documentation is vague about what happens with --group=trailer:signed-off-by when we see a commit with: Signed-off-by: One Signed-off-by: Two Signed-off-by: One We clearly should credit both "One" and "Two", but should "One" get credited twice? The current code does so, but mostly because that was the easiest thing to do. It's probably more useful to count each commit at most once. This will become especially important when we allow values from multiple sources in a future patch. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-09-27 12:21:05 -07:00
Jeff King	47beb37bc6	shortlog: match commit trailers with --group If a project uses commit trailers, this patch lets you use shortlog to see who is performing each action. For example, running: git shortlog -ns --group=trailer:reviewed-by in git.git shows who has reviewed. You can even use a custom format to see things like who has helped whom: git shortlog --format="...helped %an (%ad)" \ --group=trailer:helped-by Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-09-27 12:21:05 -07:00
Jeff King	f0939a0eb1	trailer: add interface for iterating over commit trailers The trailer code knows how to parse out the trailers and re-format them, but there's no easy way to iterate over the trailers (you can use trailer_info, but you have to then do a bunch of extra parsing). Let's add an iteration interface that makes this easy to do. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-09-27 12:21:05 -07:00
Jeff King	92338c450b	shortlog: add grouping option In preparation for adding more grouping types, let's refactor the committer/author grouping code and add a user-facing option that binds them together. In particular: - the main option is now "--group", to make it clear that the various group types are mutually exclusive. The "--committer" option is an alias for "--group=committer". - we keep an enum rather than a binary flag, to prepare for more values - we prefer switch statements to ternary assignment, since other group types will need more custom code Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-09-27 12:21:05 -07:00
Johannes Schindelin	f33f2d3d54	t9902: avoid using the branch name `master` The completion tests used that name unnecessarily, and it is a non-inclusive term, so let's avoid using it here. Since three of the touched test cases make use of the fact that two of the branch names (`master` and `maint`) start with the same letter (or even with the same two letters), we choose to replace the use of `master` by a name that also has that property: `main`. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-09-26 17:03:29 -07:00
Johannes Schindelin	b6211b89eb	tests: avoid variations of the `master` branch name The term `master` has a loaded history that serves as a constant reminder of racial injustice. The Git project has no desire to perpetuate this and already started avoiding it. The test suite uses variations of this name for branches other than the default one. Apart from t3200, where we just addressed this in the previous commit, those instances can be renamed in an automated manner because they do not require any changes outside of the test script, so let's do that. Seeing as the touched branches have very little (if anything) to do with the default branch, we choose to use a completely separate naming scheme: `topic_<number>` (it cannot be `topic-<number>` because t5515 uses the `test_oid` machinery with the term, and that machinery uses shell variables internally, whose names cannot contain dashes). This trick was performed by this (GNU) sed invocation: $ sed -i 's/master$[a-z0-9]$/topic_\1/g' t/t*.sh Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-09-26 17:03:29 -07:00
René Scharfe	5336d50696	ref-filter: plug memory leak in reach_filter() `21bf933928` (ref-filter: allow merged and no-merged filters, 2020-09-15) added an early return to reach_filter(). Avoid leaking the memory of a then unused array by postponing its allocation until we know we need it. Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-09-26 15:39:49 -07:00
Ákos Uzonyi	0bc18daa2f	completion: complete refs after 'git restore -s' Currently only the long version (--source=) supports completion. Add completion support to the short (-s) option too. Signed-off-by: Ákos Uzonyi <uzonyi.akos@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-09-26 15:30:05 -07:00
Ákos Uzonyi	c09d1280f7	completion: use "prev" variable instead of introducing "prevword" In both _git_checkout and _git_switch a new "prevword" variable were introduced, however the "prev" variable already contains the last word. The "prevword" variable is replaced with "prev", and the case is moved to the beginning of the function, like it's done in many other places (e.g. _git_commit). Also the indentaion of the case is fixed. Signed-off-by: Ákos Uzonyi <uzonyi.akos@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-09-26 15:30:03 -07:00
Junio C Hamano	9bc233ae1c	Seventeenth batch Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-09-25 15:25:42 -07:00
Junio C Hamano	0335915690	Merge branch 'jk/diff-highlight-blank-match-fix' "diff-highlight" (in contrib/) had a logic to flush its output upon seeing a blank line but the way it detected a blank line was broken. * jk/diff-highlight-blank-match-fix: diff-highlight: correctly match blank lines for flush	2020-09-25 15:25:42 -07:00
Junio C Hamano	b5847b9fab	Merge branch 'hx/push-atomic-with-cert' "git push" that wants to be atomic and wants to send push certificate learned not to prepare and sign the push certificate when it fails the local check (hence due to atomicity it is known that no certificate is needed). * hx/push-atomic-with-cert: send-pack: run GPG after atomic push checking	2020-09-25 15:25:41 -07:00
Junio C Hamano	407d914521	Merge branch 'rs/misc-cleanups' Code cleanup. * rs/misc-cleanups: pack-write: use hashwrite_be32() in write_idx_file()	2020-09-25 15:25:41 -07:00
Junio C Hamano	9f4588d72b	Merge branch 'ld/p4-unshelve-fix' The "unshelve" subcommand of "git p4" used incorrectly used commit^N where it meant to say commit~N to name the Nth generation ancestor, which has been corrected. * ld/p4-unshelve-fix: git-p4: use HEAD~$n to find parent commit for unshelve git-p4 unshelve: adding a commit breaks git-p4 unshelve	2020-09-25 15:25:40 -07:00
Junio C Hamano	6c430a647c	Merge branch 'jx/proc-receive-hook' "git receive-pack" that accepts requests by "git push" learned to outsource most of the ref updates to the new "proc-receive" hook. * jx/proc-receive-hook: doc: add documentation for the proc-receive hook transport: parse report options for tracking refs t5411: test updates of remote-tracking branches receive-pack: new config receive.procReceiveRefs doc: add document for capability report-status-v2 New capability "report-status-v2" for git-push receive-pack: feed report options to post-receive receive-pack: add new proc-receive hook t5411: add basic test cases for proc-receive hook transport: not report a non-head push as a branch	2020-09-25 15:25:39 -07:00
Junio C Hamano	48794acc50	Merge branch 'ds/maintenance-part-1' A "git gc"'s big brother has been introduced to take care of more repository maintenance tasks, not limited to the object database cleaning. * ds/maintenance-part-1: maintenance: add trace2 regions for task execution maintenance: add auto condition for commit-graph task maintenance: use pointers to check --auto maintenance: create maintenance.<task>.enabled config maintenance: take a lock on the objects directory maintenance: add --task option maintenance: add commit-graph task maintenance: initialize task array maintenance: replace run_auto_gc() maintenance: add --quiet option maintenance: create basic maintenance runner	2020-09-25 15:25:38 -07:00
Junio C Hamano	0512eabd91	sequencer: stop abbreviating stopped-sha file The object name written to this file is not exposed to end-users and the only reader of this file immediately expands it back to a full object name. Stop abbreviating while writing, and expect a full object name while reading, which simplifies the code a bit. Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-09-25 14:11:12 -07:00
Junio C Hamano	9f0be82123	t1506: rev-parse A..B and A...B Because these constructs can be used to parse user input to be passed to rev-list --objects, e.g. range=$(git rev-parse v1.0..v2.0) && git rev-list --objects $range \| git pack-objects --stdin the endpoints (v1.0 and v2.0 in the example) are shown without peeling them to underlying commits, even when they are annotated tags. Make sure it stays that way. While at it, ensure "rev-parse A...B" also keeps the endpoints A and B unpeeled, even though the negative side (i.e. the merge-base between A and B) has to become a commit. Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-09-25 14:09:17 -07:00
Jeff King	eb049759fb	protocol: re-enable v2 protocol by default Protocol v2 became the default in v2.26.0 via `684ceae32d` (fetch: default to protocol version 2, 2019-12-23). More widespread use turned up a regression in negotiation. That was fixed in v2.27.0 via `4fa3f00abb` (fetch-pack: in protocol v2, in_vain only after ACK, 2020-04-27), but we also reverted the default to v0 as a precuation in `11c7f2a30b` (Revert "fetch: default to protocol version 2", 2020-04-22). In v2.28.0, we re-enabled it for experimental users with `3697caf4b9` (config: let feature.experimental imply protocol.version=2, 2020-05-20) and haven't heard any complaints. v2.28 has only been out for 2 months, but I'd generally expect people turning on feature.experimental to also stay pretty up-to-date. So we're not likely to collect much more data by waiting. In addition, we have no further reports from people running v2.26.0, and of course some people have been setting protocol.version manually for ages. Let's move forward with v2 as the default again. It's possible there are still lurking bugs, but we won't know until it gets more widespread use. And we can find and squash them just like any other bug at this point. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-09-25 11:40:42 -07:00
Derrick Stolee	2fec604f8d	maintenance: add start/stop subcommands Add new subcommands to 'git maintenance' that start or stop background maintenance using 'cron', when available. This integration is as simple as I could make it, barring some implementation complications. The schedule is laid out as follows: 0 1-23 * * * $cmd maintenance run --schedule=hourly 0 0 * * 1-6 $cmd maintenance run --schedule=daily 0 0 * * 0 $cmd maintenance run --schedule=weekly where $cmd is a properly-qualified 'git for-each-repo' execution: $cmd=$path/git --exec-path=$path for-each-repo --config=maintenance.repo where $path points to the location of the Git executable running 'git maintenance start'. This is critical for systems with multiple versions of Git. Specifically, macOS has a system version at '/usr/bin/git' while the version that users can install resides at '/usr/local/bin/git' (symlinked to '/usr/local/libexec/git-core/git'). This will also use your locally-built version if you build and run this in your development environment without installing first. This conditional schedule avoids having cron launch multiple 'git for-each-repo' commands in parallel. Such parallel commands would likely lead to the 'hourly' and 'daily' tasks competing over the object database lock. This could lead to to some tasks never being run! Since the --schedule=<frequency> argument will run all tasks with _at least_ the given frequency, the daily runs will also run the hourly tasks. Similarly, the weekly runs will also run the daily and hourly tasks. The GIT_TEST_CRONTAB environment variable is not intended for users to edit, but instead as a way to mock the 'crontab [-l]' command. This variable is set in test-lib.sh to avoid a future test from accidentally running anything with the cron integration from modifying the user's schedule. We use GIT_TEST_CRONTAB='test-tool crontab <file>' in our tests to check how the schedule is modified in 'git maintenance (start\|stop)' commands. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-09-25 10:59:44 -07:00
Derrick Stolee	0c18b70081	maintenance: add [un]register subcommands In preparation for launching background maintenance from the 'git maintenance' builtin, create register/unregister subcommands. These commands update the new 'maintenance.repos' config option in the global config so the background maintenance job knows which repositories to maintain. These commands allow users to add a repository to the background maintenance list without disrupting the actual maintenance mechanism. For example, a user can run 'git maintenance register' when no background maintenance is running and it will not start the background maintenance. A later update to start running background maintenance will then pick up this repository automatically. The opposite example is that a user can run 'git maintenance unregister' to remove the current repository from background maintenance without halting maintenance for other repositories. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-09-25 10:59:44 -07:00
Derrick Stolee	4950b2a2b5	for-each-repo: run subcommands on configured repos It can be helpful to store a list of repositories in global or system config and then iterate Git commands on that list. Create a new builtin that makes this process simple for experts. We will use this builtin to run scheduled maintenance on all configured repositories in a future change. The test is very simple, but does highlight that the "--" argument is optional. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-09-25 10:59:44 -07:00
Derrick Stolee	b08ff1fee0	maintenance: add --schedule option and config Maintenance currently triggers when certain data-size thresholds are met, such as number of pack-files or loose objects. Users may want to run certain maintenance tasks based on frequency instead. For example, a user may want to perform a 'prefetch' task every hour, or 'gc' task every day. To help these users, update the 'git maintenance run' command to include a '--schedule=<frequency>' option. The allowed frequencies are 'hourly', 'daily', and 'weekly'. These values are also allowed in a new config value 'maintenance.<task>.schedule'. The 'git maintenance run --schedule=<frequency>' checks the '.schedule' config value for each enabled task to see if the configured frequency is at least as frequent as the frequency from the '--schedule' argument. We use the following order, for full clarity: 'hourly' > 'daily' > 'weekly' Use new 'enum schedule_priority' to track these values numerically. The following cron table would run the scheduled tasks with the correct frequencies: 0 1-23 * * git -C <repo> maintenance run --schedule=hourly 0 0 * * 1-6 git -C <repo> maintenance run --schedule=daily 0 0 * * 0 git -C <repo> maintenance run --schedule=weekly This cron schedule will run --schedule=hourly every hour except at midnight. This avoids a concurrent run with the --schedule=daily that runs at midnight every day except the first day of the week. This avoids a concurrent run with the --schedule=weekly that runs at midnight on the first day of the week. Since --schedule=daily also runs the 'hourly' tasks and --schedule=weekly runs the 'hourly' and 'daily' tasks, we will still see all tasks run with the proper frequencies. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-09-25 10:59:44 -07:00
Derrick Stolee	1942d48380	maintenance: optionally skip --auto process Some commands run 'git maintenance run --auto --[no-]quiet' after doing their normal work, as a way to keep repositories clean as they are used. Currently, users who do not want this maintenance to occur would set the 'gc.auto' config option to 0 to avoid the 'gc' task from running. However, this does not stop the extra process invocation. On Windows, this extra process invocation can be more expensive than necessary. Allow users to drop this extra process by setting 'maintenance.auto' to 'false'. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-09-25 10:59:44 -07:00
Derrick Stolee	e841a79a13	maintenance: add incremental-repack auto condition The incremental-repack task updates the multi-pack-index by deleting pack- files that have been replaced with new packs, then repacking a batch of small pack-files into a larger pack-file. This incremental repack is faster than rewriting all object data, but is slower than some other maintenance activities. The 'maintenance.incremental-repack.auto' config option specifies how many pack-files should exist outside of the multi-pack-index before running the step. These pack-files could be created by 'git fetch' commands or by the loose-objects task. The default value is 10. Setting the option to zero disables the task with the '--auto' option, and a negative value makes the task run every time. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-09-25 10:53:05 -07:00
Derrick Stolee	a13e3d0ec8	maintenance: auto-size incremental-repack batch When repacking during the 'incremental-repack' task, we use the --batch-size option in 'git multi-pack-index repack'. The initial setting used --batch-size=0 to repack everything into a single pack-file. This is not sustainable for a large repository. The amount of work required is also likely to use too many system resources for a background job. Update the 'incremental-repack' task by dynamically computing a --batch-size option based on the current pack-file structure. The dynamic default size is computed with this idea in mind for a client repository that was cloned from a very large remote: there is likely one "big" pack-file that was created at clone time. Thus, do not try repacking it as it is likely packed efficiently by the server. Instead, we select the second-largest pack-file, and create a batch size that is one larger than that pack-file. If there are three or more pack-files, then this guarantees that at least two will be combined into a new pack-file. Of course, this means that the second-largest pack-file size is likely to grow over time and may eventually surpass the initially-cloned pack-file. Recall that the pack-file batch is selected in a greedy manner: the packs are considered from oldest to newest and are selected if they have size smaller than the batch size until the total selected size is larger than the batch size. Thus, that oldest "clone" pack will be first to repack after the new data creates a pack larger than that. We also want to place some limits on how large these pack-files become, in order to bound the amount of time spent repacking. A maximum batch-size of two gigabytes means that large repositories will never be packed into a single pack-file using this job, but also that repack is rather expensive. This is a trade-off that is valuable to have if the maintenance is being run automatically or in the background. Users who truly want to optimize for space and performance (and are willing to pay the upfront cost of a full repack) can use the 'gc' task to do so. Create a test for this two gigabyte limit by creating an EXPENSIVE test that generates two pack-files of roughly 2.5 gigabytes in size, then performs an incremental repack. Check that the --batch-size argument in the subcommand uses the hard-coded maximum. Helped-by: Chris Torek <chris.torek@gmail.com> Reported-by: Son Luong Ngoc <sluongng@gmail.com> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-09-25 10:53:05 -07:00
Derrick Stolee	52fe41ff1c	maintenance: add incremental-repack task The previous change cleaned up loose objects using the 'loose-objects' that can be run safely in the background. Add a similar job that performs similar cleanups for pack-files. One issue with running 'git repack' is that it is designed to repack all pack-files into a single pack-file. While this is the most space-efficient way to store object data, it is not time or memory efficient. This becomes extremely important if the repo is so large that a user struggles to store two copies of the pack on their disk. Instead, perform an "incremental" repack by collecting a few small pack-files into a new pack-file. The multi-pack-index facilitates this process ever since 'git multi-pack-index expire' was added in `19575c7` (multi-pack-index: implement 'expire' subcommand, 2019-06-10) and 'git multi-pack-index repack' was added in `ce1e4a1` (midx: implement midx_repack(), 2019-06-10). The 'incremental-repack' task runs the following steps: 1. 'git multi-pack-index write' creates a multi-pack-index file if one did not exist, and otherwise will update the multi-pack-index with any new pack-files that appeared since the last write. This is particularly relevant with the background fetch job. When the multi-pack-index sees two copies of the same object, it stores the offset data into the newer pack-file. This means that some old pack-files could become "unreferenced" which I will use to mean "a pack-file that is in the pack-file list of the multi-pack-index but none of the objects in the multi-pack-index reference a location inside that pack-file." 2. 'git multi-pack-index expire' deletes any unreferenced pack-files and updaes the multi-pack-index to drop those pack-files from the list. This is safe to do as concurrent Git processes will see the multi-pack-index and not open those packs when looking for object contents. (Similar to the 'loose-objects' job, there are some Git commands that open pack-files regardless of the multi-pack-index, but they are rarely used. Further, a user that self-selects to use background operations would likely refrain from using those commands.) 3. 'git multi-pack-index repack --bacth-size=<size>' collects a set of pack-files that are listed in the multi-pack-index and creates a new pack-file containing the objects whose offsets are listed by the multi-pack-index to be in those objects. The set of pack- files is selected greedily by sorting the pack-files by modified time and adding a pack-file to the set if its "expected size" is smaller than the batch size until the total expected size of the selected pack-files is at least the batch size. The "expected size" is calculated by taking the size of the pack-file divided by the number of objects in the pack-file and multiplied by the number of objects from the multi-pack-index with offset in that pack-file. The expected size approximates how much data from that pack-file will contribute to the resulting pack-file size. The intention is that the resulting pack-file will be close in size to the provided batch size. The next run of the incremental-repack task will delete these repacked pack-files during the 'expire' step. In this version, the batch size is set to "0" which ignores the size restrictions when selecting the pack-files. It instead selects all pack-files and repacks all packed objects into a single pack-file. This will be updated in the next change, but it requires doing some calculations that are better isolated to a separate change. These steps are based on a similar background maintenance step in Scalar (and VFS for Git) [1]. This was incredibly effective for users of the Windows OS repository. After using the same VFS for Git repository for over a year, some users had _thousands_ of pack-files that combined to up to 250 GB of data. We noticed a few users were running into the open file descriptor limits (due in part to a bug in the multi-pack-index fixed by `af96fe3` (midx: add packs to packed_git linked list, 2019-04-29). These pack-files were mostly small since they contained the commits and trees that were pushed to the origin in a given hour. The GVFS protocol includes a "prefetch" step that asks for pre-computed pack- files containing commits and trees by timestamp. These pack-files were grouped into "daily" pack-files once a day for up to 30 days. If a user did not request prefetch packs for over 30 days, then they would get the entire history of commits and trees in a new, large pack-file. This led to a large number of pack-files that had poor delta compression. By running this pack-file maintenance step once per day, these repos with thousands of packs spanning 200+ GB dropped to dozens of pack- files spanning 30-50 GB. This was done all without removing objects from the system and using a constant batch size of two gigabytes. Once the work was done to reduce the pack-files to small sizes, the batch size of two gigabytes means that not every run triggers a repack operation, so the following run will not expire a pack-file. This has kept these repos in a "clean" state. [1] https://github.com/microsoft/scalar/blob/master/Scalar.Common/Maintenance/PackfileMaintenanceStep.cs Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-09-25 10:53:04 -07:00
Derrick Stolee	efdd2f0d4c	midx: use start_delayed_progress() Now that the multi-pack-index may be written as part of auto maintenance at the end of a command, reduce the progress output when the operations are quick. Use start_delayed_progress() instead of start_progress(). Update t5319-multi-pack-index.sh to use GIT_PROGRESS_DELAY=0 now that the progress indicators are conditional. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-09-25 10:53:04 -07:00
Derrick Stolee	18e449f86b	midx: enable core.multiPackIndex by default The core.multiPackIndex setting has been around since `c4d25228eb` (config: create core.multiPackIndex setting, 2018-07-12), but has been disabled by default. If a user wishes to use the multi-pack-index feature, then they must enable this config and run 'git multi-pack-index write'. The multi-pack-index feature is relatively stable now, so make the config option true by default. For users that do not use a multi-pack-index, the only extra cost will be a file lookup to see if a multi-pack-index file exists (once per process, per object directory). Also, this config option will be referenced by an upcoming "incremental-repack" task in the maintenance builtin, so move the config option into the repository settings struct. Note that if GIT_TEST_MULTI_PACK_INDEX=1, then we want to ignore the config option and treat core.multiPackIndex as enabled. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-09-25 10:53:04 -07:00
Derrick Stolee	3e220e6069	maintenance: create auto condition for loose-objects The loose-objects task deletes loose objects that already exist in a pack-file, then place the remaining loose objects into a new pack-file. If this step runs all the time, then we risk creating pack-files with very few objects with every 'git commit' process. To prevent overwhelming the packs directory with small pack-files, place a minimum number of objects to justify the task. The 'maintenance.loose-objects.auto' config option specifies a minimum number of loose objects to justify the task to run under the '--auto' option. This defaults to 100 loose objects. Setting the value to zero will prevent the step from running under '--auto' while a negative value will force it to run every time. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-09-25 10:53:04 -07:00
Derrick Stolee	252cfb7cb8	maintenance: add loose-objects task One goal of background maintenance jobs is to allow a user to disable auto-gc (gc.auto=0) but keep their repository in a clean state. Without any cleanup, loose objects will clutter the object database and slow operations. In addition, the loose objects will take up extra space because they are not stored with deltas against similar objects. Create a 'loose-objects' task for the 'git maintenance run' command. This helps clean up loose objects without disrupting concurrent Git commands using the following sequence of events: 1. Run 'git prune-packed' to delete any loose objects that exist in a pack-file. Concurrent commands will prefer the packed version of the object to the loose version. (Of course, there are exceptions for commands that specifically care about the location of an object. These are rare for a user to run on purpose, and we hope a user that has selected background maintenance will not be trying to do foreground maintenance.) 2. Run 'git pack-objects' on a batch of loose objects. These objects are grouped by scanning the loose object directories in lexicographic order until listing all loose objects -or- reaching 50,000 objects. This is more than enough if the loose objects are created only by a user doing normal development. We noticed users with _millions_ of loose objects because VFS for Git downloads blobs on-demand when a file read operation requires populating a virtual file. This step is based on a similar step in Scalar [1] and VFS for Git. [1] https://github.com/microsoft/scalar/blob/master/Scalar.Common/Maintenance/LooseObjectsStep.cs Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-09-25 10:53:04 -07:00
Derrick Stolee	28cb5e66dd	maintenance: add prefetch task When working with very large repositories, an incremental 'git fetch' command can download a large amount of data. If there are many other users pushing to a common repo, then this data can rival the initial pack-file size of a 'git clone' of a medium-size repo. Users may want to keep the data on their local repos as close as possible to the data on the remote repos by fetching periodically in the background. This can break up a large daily fetch into several smaller hourly fetches. The task is called "prefetch" because it is work done in advance of a foreground fetch to make that 'git fetch' command much faster. However, if we simply ran 'git fetch <remote>' in the background, then the user running a foreground 'git fetch <remote>' would lose some important feedback when a new branch appears or an existing branch updates. This is especially true if a remote branch is force-updated and this isn't noticed by the user because it occurred in the background. Further, the functionality of 'git push --force-with-lease' becomes suspect. When running 'git fetch <remote> <options>' in the background, use the following options for careful updating: 1. --no-tags prevents getting a new tag when a user wants to see the new tags appear in their foreground fetches. 2. --refmap= removes the configured refspec which usually updates refs/remotes/<remote>/* with the refs advertised by the remote. While this looks confusing, this was documented and tested by `b40a50264a` (fetch: document and test --refmap="", 2020-01-21), including this sentence in the documentation: Providing an empty `<refspec>` to the `--refmap` option causes Git to ignore the configured refspecs and rely entirely on the refspecs supplied as command-line arguments. 3. By adding a new refspec "+refs/heads/:refs/prefetch/<remote>/" we can ensure that we actually load the new values somewhere in our refspace while not updating refs/heads or refs/remotes. By storing these refs here, the commit-graph job will update the commit-graph with the commits from these hidden refs. 4. --prune will delete the refs/prefetch/<remote> refs that no longer appear on the remote. 5. --no-write-fetch-head prevents updating FETCH_HEAD. We've been using this step as a critical background job in Scalar [1] (and VFS for Git). This solved a pain point that was showing up in user reports: fetching was a pain! Users do not like waiting to download the data that was created while they were away from their machines. After implementing background fetch, the foreground fetch commands sped up significantly because they mostly just update refs and download a small amount of new data. The effect is especially dramatic when paried with --no-show-forced-udpates (through fetch.showForcedUpdates=false). [1] https://github.com/microsoft/scalar/blob/master/Scalar.Common/Maintenance/FetchStep.cs Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-09-25 10:53:04 -07:00
Johannes Schindelin	3eccc7b99d	cmake: ignore files generated by CMake as run in Visual Studio As of recent Visual Studio versions, CMake support is built-in: https://docs.microsoft.com/en-us/cpp/build/cmake-projects-in-visual-studio?view=vs-2019 All that needs to be done is to open the worktree as a folder, and Visual Studio will find the `CMakeLists.txt` file and automatically generate the project files. Let's ignore the entirety of those generated files. Helped-by: Đoàn Trần Công Danh <congdanhqx@gmail.com> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-09-25 10:50:44 -07:00
Jeff King	45d93eb824	shortlog: change "author" variables to "ident" We already match "committer", and we're about to start matching more things. Let's use a more neutral variable to avoid confusion. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-09-25 10:47:50 -07:00
Christian Couder	73c6de06af	bisect: don't use invalid oid as rev when starting In `06f5608c14` (bisect--helper: `bisect_start` shell function partially in C, 2019-01-02), we changed the following shell code: - rev=$(git rev-parse -q --verify "$arg^{commit}") \|\| { - test $has_double_dash -eq 1 && - die "$(eval_gettext "'\$arg' does not appear to be a valid revision")" - break - } - revs="$revs $rev" into: + char *commit_id = xstrfmt("%s^{commit}", arg); + if (get_oid(commit_id, &oid) && has_double_dash) + die(_("'%s' does not appear to be a valid " + "revision"), arg); + + string_list_append(&revs, oid_to_hex(&oid)); + free(commit_id); In case of an invalid "arg" when "has_double_dash" is false, the old code would "break" out of the argument loop. In the new C code though, `oid_to_hex(&oid)` is unconditonally appended to "revs". This is wrong first because "oid" is junk as `get_oid(commit_id, &oid)` failed and second because it doesn't break out of the argument loop. Not breaking out of the argument loop means that "arg" is then not treated as a path restriction (which is wrong). Signed-off-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-09-25 09:57:48 -07:00
Alex Henrie	54200cef86	pull: don't warn if pull.ff has been set A user who understands enough to set pull.ff does not need additional instructions. Signed-off-by: Alex Henrie <alexhenrie24@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-09-24 23:04:27 -07:00
Junio C Hamano	610e2b9240	blame: validate and peel the object names on the ignore list The command reads list of object names to place on the ignore list either from the command line or from a file, but they are not checked with their object type (those read from the file are not even checked for object existence). Extend the oidset_parse_file() API and allow it to take a callback that can be used to die (e.g. when an inappropriate input is read) or modify the object name read (e.g. when a tag pointing at a commit is read, and the caller wants a commit object name), and use it in the code that handles ignore list. Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-09-24 22:20:58 -07:00
Junio C Hamano	f58931c8d6	t8013: minimum preparatory clean-up The closing sq for each test piece should be placed at the beginning of line. Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-09-24 22:20:57 -07:00
Thomas Guyot-Sionnest	ff0c7fa8cb	diff: fix modified lines stats with --stat and --numstat Only skip diffstats when both oids are valid and identical. This check was causing both false-positives (files included in diffstats with no actual changes (0 lines modified) and false-negatives (showing 0 lines modified in stats when files had actually changed). Also replaced same_contents with may_differ to avoid confusion. Signed-off-by: Thomas Guyot-Sionnest <tguyot@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-09-24 12:31:45 -07:00
Jeff King	176380fd11	Revert "fast-export: use local array to store anonymized oid" This reverts commit `f39ad38410`. That commit was trying to silence a type-punning warning on older versions of gcc. However, its analysis was all wrong. I didn't notice that we _were_ in fact type-punning because there are two versions of put_be32(): one that uses casts and unaligned loads, and another that uses bitshifts. I looked at the latter, but on my platform we were defaulting to the former. However, as of the previous commit, we'll always use the bitshift version. So we can drop this hackery to avoid the warning, making the code slightly cleaner. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-09-24 12:30:11 -07:00
Jeff King	c578e29ba0	bswap.h: drop unaligned loads Our put_be32() routine and its variants (get_be32(), put_be64(), etc) has two implementations: on some platforms we cast memory in place and use nothl()/htonl(), which can cause unaligned memory access. And on others, we pick out the individual bytes using bitshifts. This introduces extra complexity, and sometimes causes compilers to generate warnings about type-punning. And it's not clear there's any performance advantage. This split goes back to `660231aa97` (block-sha1: support for architectures with memory alignment restrictions, 2009-08-12). The unaligned versions were part of the original block-sha1 code in `d7c208a92e` (Add new optimized C 'block-sha1' routines, 2009-08-05), which says it is: Based on the mozilla SHA1 routine, but doing the input data accesses a word at a time and with 'htonl()' instead of loading bytes and shifting. Back then, Linus provided timings versus the mozilla code which showed a 27% improvement: https://lore.kernel.org/git/alpine.LFD.2.01.0908051545000.3390@localhost.localdomain/ However, the unaligned loads were either not the useful part of that speedup, or perhaps compilers and processors have changed since then. Here are times for computing the sha1 of 4GB of random data, with and without -DNO_UNALIGNED_LOADS (and BLK_SHA1=1, of course). This is with gcc 10, -O2, and the processor is a Core i9-9880H. [stock] Benchmark #1: t/helper/test-tool sha1 <foo.rand Time (mean ± σ): 6.638 s ± 0.081 s [User: 6.269 s, System: 0.368 s] Range (min … max): 6.550 s … 6.841 s 10 runs [-DNO_UNALIGNED_LOADS] Benchmark #1: t/helper/test-tool sha1 <foo.rand Time (mean ± σ): 6.418 s ± 0.015 s [User: 6.058 s, System: 0.360 s] Range (min … max): 6.394 s … 6.447 s 10 runs And here's the same test run on an AMD A8-7600, using gcc 8. [stock] Benchmark #1: t/helper/test-tool sha1 <foo.rand Time (mean ± σ): 11.721 s ± 0.113 s [User: 10.761 s, System: 0.951 s] Range (min … max): 11.509 s … 11.861 s 10 runs [-DNO_UNALIGNED_LOADS] Benchmark #1: t/helper/test-tool sha1 <foo.rand Time (mean ± σ): 11.744 s ± 0.066 s [User: 10.807 s, System: 0.928 s] Range (min … max): 11.637 s … 11.863 s 10 runs So the unaligned loads don't seem to help much, and actually make things worse. It's possible there are platforms where they provide more benefit, but: - the non-x86 platforms for which we use this code are old and obscure (powerpc and s390). - the main caller that cares about performance is block-sha1. But these days it is rarely used anyway, in favor of sha1dc (which is already much slower, and nobody seems to have cared that much). Let's just drop unaligned versions entirely in the name of simplicity. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-09-24 12:30:09 -07:00
Pranit Bauva	517ecb3161	bisect--helper: reimplement `bisect_next` and `bisect_auto_next` shell functions in C Reimplement the `bisect_next()` and the `bisect_auto_next()` shell functions in C and add the subcommands to `git bisect--helper` to call them from git-bisect.sh . bisect_auto_next() function returns an enum bisect_error type as whole `git bisect` can exit with an error code when bisect_next() does. Return an error when `bisect_next()` fails, that fix a bug on shell script version. Using `--bisect-next` and `--bisect-auto-next` subcommands is a temporary measure to port shell function to C so as to use the existing test suite. As more functions are ported, `--bisect-auto-next` subcommand will be retired and will be called by some other methods. Mentored-by: Lars Schneider <larsxschneider@gmail.com> Mentored-by: Christian Couder <chriscool@tuxfamily.org> Mentored-by: Johannes Schindelin <Johannes.Schindelin@gmx.de> Signed-off-by: Pranit Bauva <pranit.bauva@gmail.com> Signed-off-by: Tanushree Tumane <tanushreetumane@gmail.com> Signed-off-by: Miriam Rubio <mirucam@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-09-24 12:06:30 -07:00
Miriam Rubio	c7a7f48f4f	bisect: call 'clear_commit_marks_all()' in 'bisect_next_all()' As there can be other revision walks after bisect_next_all(), let's add a call to a function to clear all the marks at the end of bisect_next_all(). Mentored-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Miriam Rubio <mirucam@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-09-24 12:06:30 -07:00
Pranit Bauva	09535f056b	bisect--helper: reimplement `bisect_autostart` shell function in C Reimplement the `bisect_autostart()` shell function in C and add the C implementation from `bisect_next()` which was previously left uncovered. Add `--bisect-autostart` subcommand to be called from git-bisect.sh. Using `--bisect-autostart` subcommand is a temporary measure to port the shell function to C so as to use the existing test suite. As more functions are ported, this subcommand will be retired and bisect_autostart() will be called directly by `bisect_state()`. Change behavior of shell script that returned success when user aborted the bisection. Mentored-by: Lars Schneider <larsxschneider@gmail.com> Mentored-by: Christian Couder <chriscool@tuxfamily.org> Mentored-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Pranit Bauva <pranit.bauva@gmail.com> Signed-off-by: Tanushree Tumane <tanushreetumane@gmail.com> Signed-off-by: Miriam Rubio <mirucam@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-09-24 12:06:30 -07:00
Denton Liu	d8d3d632f4	hooks--update.sample: use hash-agnostic zero OID The update sample hook has the zero OID hardcoded as 40 zeros. However, with the introduction of SHA-256 support, this assumption no longer holds true. Replace the hardcoded $z40 with a call to git hash-object --stdin </dev/null \| tr '[0-9a-f]' '0' so the sample hook becomes hash-agnostic. Signed-off-by: Denton Liu <liu.denton@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-09-23 09:31:45 -07:00
Denton Liu	8c7e505950	hooks--pre-push.sample: use hash-agnostic zero OID The pre-push sample hook has the zero OID hardcoded as 40 zeros. However, with the introduction of SHA-256 support, this assumption no longer holds true. Replace the hardcoded $z40 with a call to git hash-object --stdin </dev/null \| tr '[0-9a-f]' '0' so the sample hook becomes hash-agnostic. Signed-off-by: Denton Liu <liu.denton@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-09-23 09:31:45 -07:00
Denton Liu	6a117da6e5	hooks--pre-push.sample: modernize script The preferred form for a command substitution is $() over ``. Use this form for the command substitution in the sample hook. The preferred form for conditional tests is to use `test` over []. Replace [] with `test`. Finally, replace all instances of "sha" with "oid". Signed-off-by: Denton Liu <liu.denton@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-09-23 09:31:45 -07:00
Junio C Hamano	e1cfff6765	Sixteenth batch Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-09-22 12:36:34 -07:00

... 9 10 11 12 13 ...

61161 Commits