git-commit-vandalism

Author	SHA1	Message	Date
Junio C Hamano	52b8c8c716	Merge branch 'ds/maintenance-part-2' "git maintenance", an extended big brother of "git gc", continues to evolve. * ds/maintenance-part-2: maintenance: add incremental-repack auto condition maintenance: auto-size incremental-repack batch maintenance: add incremental-repack task midx: use start_delayed_progress() midx: enable core.multiPackIndex by default maintenance: create auto condition for loose-objects maintenance: add loose-objects task maintenance: add prefetch task	2020-10-27 15:09:47 -07:00
Junio C Hamano	26bb5437f6	Merge branch 'rs/worktree-list-show-locked' "git worktree list" now shows if each worktree is locked. This possibly may open us to show other kinds of states in the future. * rs/worktree-list-show-locked: worktree: teach `list` to annotate locked worktree	2020-10-27 15:09:47 -07:00
Junio C Hamano	ae84e924da	Merge branch 'rs/tighten-callers-of-deref-tag' Code clean-up. * rs/tighten-callers-of-deref-tag: line-log: handle deref_tag() returning NULL blame: handle deref_tag() returning NULL grep: handle deref_tag() returning NULL	2020-10-27 15:09:46 -07:00
Junio C Hamano	f34687dc81	Merge branch 'jk/committer-date-is-author-date-fix' In 2.29, "--committer-date-is-author-date" option of "rebase" and "am" subcommands lost the e-mail address by mistake, which has been corrected. * jk/committer-date-is-author-date-fix: rebase: fix broken email with --committer-date-is-author-date am: fix broken email with --committer-date-is-author-date t3436: check --committer-date-is-author-date result more carefully	2020-10-26 14:59:58 -07:00
Jeff King	16b0bb99ea	am: fix broken email with --committer-date-is-author-date Commit `e8cbe2118a` (am: stop exporting GIT_COMMITTER_DATE, 2020-08-17) rewrote the code for setting the committer date to use fmt_ident(), rather than setting an environment variable and letting commit_tree() handle it. But it introduced two bugs: - we use the author email string instead of the committer email - when parsing the committer ident, we used the wrong variable to compute the length of the email, resulting in it always being a zero-length string This commit fixes both, which causes our test of this option via the rebase "apply" backend to now succeed. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-10-23 08:25:19 -07:00
René Scharfe	db7d07f610	blame: handle deref_tag() returning NULL Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-10-12 12:25:14 -07:00
René Scharfe	e30b1525fb	grep: handle deref_tag() returning NULL deref_tag() can return NULL. Exit gracefully in that case instead of blindly dereferencing the return value. .name shouldn't ever be NULL, but grep_object() handles that case explicitly, so let's be defensive here as well and show the broken object's ID if it happens to lack a name after all. Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-10-12 12:25:14 -07:00
Rafael Silva	c57b3367be	worktree: teach `list` to annotate locked worktree The "git worktree list" shows the absolute path to the working tree, the commit that is checked out and the name of the branch. It is not immediately obvious which of the worktrees, if any, are locked. "git worktree remove" refuses to remove a locked worktree with an error message. If "git worktree list" told which worktrees are locked in its output, the user would not even attempt to remove such a worktree, or would realize that "git worktree remove -f -f <path>" is required. Teach "git worktree list" to append "locked" to its output. The output from the command becomes like so: $ git worktree list /path/to/main abc123 [master] /path/to/worktree 456def (detached HEAD) /path/to/locked-worktree 123abc (detached HEAD) locked Helped-by: Junio C Hamano <gitster@pobox.com> Helped-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Rafael Silva <rafaeloliveira.cs@gmail.com> Reviewed-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-10-12 12:24:29 -07:00
Junio C Hamano	d620daaa34	Merge branch 'ja/misc-doc-fixes' Doc fixes. * ja/misc-doc-fixes: doc: fix the bnf like style of some commands doc: git-remote fix ups doc: use linkgit macro where needed. git-bisect-lk2009: make continuation of list indented	2020-10-08 21:53:26 -07:00
Junio C Hamano	c7ac8c0a7c	Merge branch 'jk/index-pack-hotfixes' Hotfix and clean-up for the jt/threaded-index-pack topic that has graduated to v2.29-rc0. * jk/index-pack-hotfixes: index-pack: make get_base_data() comment clearer index-pack: drop type_cas mutex index-pack: restore "resolving deltas" progress meter	2020-10-08 21:53:26 -07:00
Jean-Noël Avila	9f443f5531	doc: fix the bnf like style of some commands In command line options, variables are entered between < and > Signed-off-by: Jean-Noël Avila <jn.avila@free.fr> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-10-08 14:01:19 -07:00
Jonathan Tan	ec6a8f9705	index-pack: make get_base_data() comment clearer A comment mentions that we may free cached delta bases via find_unresolved_deltas(), but that function went away in `f08cbf60fe` (index-pack: make quantum of work smaller, 2020-09-08). Since we need to rewrite that comment anyway, make the entire comment clearer. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-10-07 13:32:27 -07:00
Jeff King	bebe171947	index-pack: drop type_cas mutex The type_cas lock lost all of its callers in `f08cbf60fe` (index-pack: make quantum of work smaller, 2020-09-08), so we can safely delete it. The compiler didn't alert us that the variable became unused, because we still call pthread_mutex_init() and pthread_mutex_destroy() on it. It's worth considering also whether that commit was in error to remove the use of the lock. Why don't we need it now, if we did before, as described in `ab791dd138` (index-pack: fix race condition with duplicate bases, 2014-08-29)? I think the answer is that we now look at and assign the child_obj->real_type field in the main thread while holding the work_lock(). So we don't have to worry about racing with the worker threads. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-10-07 11:51:26 -07:00
Jeff King	cea69151a4	index-pack: restore "resolving deltas" progress meter Commit `f08cbf60fe` (index-pack: make quantum of work smaller, 2020-09-08) refactored the main loop in threaded_second_pass(), but also deleted the call to display_progress() at the top of the loop. This means that users typically see no progress at all during the delta resolution phase (and for large repositories, Git appears to hang). This looks like an accident that was unrelated to the intended change of that commit, since we continue to update nr_resolved_deltas in resolve_delta(). Let's restore the call to get that progress back. We'll also add a test that confirms we generate the expected progress. This isn't perfect, as it wouldn't catch a bug where progress was delayed to the end. That was probably possible to trigger when receiving a thin pack, because we'd eventually call display_progress() from fix_unresolved_deltas(), but only once after doing all the work. However, since our test case generates a complete pack, it reliably demonstrates this particular bug and its fix. And we can't do better without making the test racy. Signed-off-by: Jeff King <peff@peff.net> Acked-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-10-07 11:50:09 -07:00
Junio C Hamano	5f8c70a148	Merge branch 'jk/format-auto-base-when-able' "git format-patch" learns to take "whenAble" as a possible value for the format.useAutoBase configuration variable to become no-op when the automatically computed base does not make sense. * jk/format-auto-base-when-able: format-patch: teach format.useAutoBase "whenAble" option	2020-10-05 14:01:55 -07:00
Junio C Hamano	8e3ec76a20	Merge branch 'jk/refspecs-negative' "git fetch" and "git push" support negative refspecs. * jk/refspecs-negative: refspec: add support for negative refspecs	2020-10-05 14:01:54 -07:00
Junio C Hamano	e68f0a4e57	Merge branch 'jt/keep-partial-clone-filter-upon-lazy-fetch' The lazy fetching done internally to make missing objects available in a partial clone incorrectly made permanent damage to the partial clone filter in the repository, which has been corrected. * jt/keep-partial-clone-filter-upon-lazy-fetch: fetch: do not override partial clone filter promisor-remote: remove unused variable	2020-10-05 14:01:53 -07:00
Junio C Hamano	19dd352d03	Merge branch 'jk/unused' Code cleanup. * jk/unused: dir.c: drop unused "untracked" from treat_path_fast() sequencer: handle ignore_footer when parsing trailers test-advise: check argument count with argc instead of argv sparse-checkout: fill in some options boilerplate sequencer: drop repository argument from run_git_commit() push: drop unused repo argument to do_push() assert PARSE_OPT_NONEG in parse-options callbacks env--helper: write to opt->value in parseopt helper drop unused argc parameters convert: drop unused crlf_action from check_global_conv_flags_eol()	2020-10-05 14:01:52 -07:00
Junio C Hamano	34415c76c8	Merge branch 'so/combine-diff-simplify' Code simplification. * so/combine-diff-simplify: diff: get rid of redundant 'dense' argument	2020-10-05 14:01:51 -07:00
Junio C Hamano	58138d3f26	Merge branch 'js/default-branch-name-part-2' Update the tests to drop word 'master' from them. * js/default-branch-name-part-2: t9902: avoid using the branch name `master` tests: avoid variations of the `master` branch name t3200: avoid variations of the `master` branch name fast-export: avoid using unnecessary language in a code comment t/test-terminal: avoid non-inclusive language	2020-10-05 14:01:50 -07:00
Junio C Hamano	2fa8aacc72	Merge branch 'jk/shortlog-group-by-trailer' "git shortlog" has been taught to group commits by the contents of the trailer lines, like "Reviewed-by:", "Coauthored-by:", etc. * jk/shortlog-group-by-trailer: shortlog: allow multiple groups to be specified shortlog: parse trailer idents shortlog: rename parse_stdin_ident() shortlog: de-duplicate trailer values shortlog: match commit trailers with --group trailer: add interface for iterating over commit trailers shortlog: add grouping option shortlog: change "author" variables to "ident"	2020-10-04 12:49:14 -07:00
Junio C Hamano	f4cc68cbd0	Merge branch 'mr/bisect-in-c-2' Rewrite of the "git bisect" script in C continues. * mr/bisect-in-c-2: bisect--helper: reimplement `bisect_next` and `bisect_auto_next` shell functions in C bisect: call 'clear_commit_marks_all()' in 'bisect_next_all()' bisect--helper: reimplement `bisect_autostart` shell function in C bisect--helper: introduce new `write_in_file()` function bisect--helper: use '-res' in 'cmd_bisect__helper' return bisect--helper: BUG() in cmd_*() on invalid subcommand	2020-10-04 12:49:08 -07:00
Junio C Hamano	03a01824a4	Merge branch 'cc/bisect-start-fix' "git bisect start X Y", when X and Y are not valid committish object names, should take X and Y as pathspec, but didn't. * cc/bisect-start-fix: bisect: don't use invalid oid as rev when starting	2020-10-04 12:49:08 -07:00
Junio C Hamano	230ff3e997	Merge branch 'jc/blame-ignore-fix' "git blame --ignore-rev/--ignore-revs-file" failed to validate their input are valid revision, and failed to take into account that the user may want to give an annotated tag instead of a commit, which has been corrected. * jc/blame-ignore-fix: blame: validate and peel the object names on the ignore list t8013: minimum preparatory clean-up	2020-10-04 12:49:07 -07:00
Junio C Hamano	86cca370e1	Merge branch 'jk/drop-unaligned-loads' Compilation fix around type punning. * jk/drop-unaligned-loads: Revert "fast-export: use local array to store anonymized oid" bswap.h: drop unaligned loads	2020-10-04 12:49:06 -07:00
Jacob Keller	7efba5fa39	format-patch: teach format.useAutoBase "whenAble" option The format.useAutoBase configuration option exists to allow users to enable '--base=auto' for format-patch by default. This can sometimes lead to poor workflow, due to unexpected failures when attempting to format an ancient patch: $ git format-patch -1 <an old commit> fatal: base commit shouldn't be in revision list This can be very confusing, as it is not necessarily immediately obvious that the user requested a --base (since this was in the configuration, not on the command line). We do want --base=auto to fail when it cannot provide a suitable base, as it would be equally confusing if a formatted patch did not include the base information when it was requested. Teach format.useAutoBase a new mode, "whenAble". This mode will cause format-patch to attempt to include a base commit when it can. However, if no valid base commit can be found, then format-patch will continue formatting the patch without a base commit. In order to avoid making yet another branch name unusable with --base, do not teach --base=whenAble or --base=whenable. Instead, refactor the base_commit option to use a callback, and rely on the global configuration variable auto_base. This does mean that a user cannot request this optional base commit generation from the command line. However, this is likely not too valuable. If the user requests base information manually, they will be immediately informed of the failure to acquire a suitable base commit. This allows the user to make an informed choice about whether to continue the format. Add tests to cover the new mode of operation for --base. Signed-off-by: Jacob Keller <jacob.keller@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-10-01 15:22:10 -07:00
Jacob Keller	c0192df630	refspec: add support for negative refspecs Both fetch and push support pattern refspecs which allow fetching or pushing references that match a specific pattern. Because these patterns are globs, they have somewhat limited ability to express more complex situations. For example, suppose you wish to fetch all branches from a remote except for a specific one. To allow this, you must setup a set of refspecs which match only the branches you want. Because refspecs are either explicit name matches, or simple globs, many patterns cannot be expressed. Add support for a new type of refspec, referred to as "negative" refspecs. These are prefixed with a '^' and mean "exclude any ref matching this refspec". They can only have one "side" which always refers to the source. During a fetch, this refers to the name of the ref on the remote. During a push, this refers to the name of the ref on the local side. With negative refspecs, users can express more complex patterns. For example: git fetch origin refs/heads/:refs/remotes/origin/ ^refs/heads/dontwant will fetch all branches on origin into remotes/origin, but will exclude fetching the branch named dontwant. Refspecs today are commutative, meaning that order doesn't expressly matter. Rather than forcing an implied order, negative refspecs will always be applied last. That is, in order to match, a ref must match at least one positive refspec, and match none of the negative refspecs. This is similar to how negative pathspecs work. Signed-off-by: Jacob Keller <jacob.keller@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-09-30 14:52:00 -07:00
Jeff King	75d3bee157	sparse-checkout: fill in some options boilerplate The sparse-checkout passes along argv and argc to its sub-command helper functions. Many of these sub-commands do not yet take any command-line options, and ignore those parameters. Let's instead add empty option lists and make sure we call parse_options(). That will give a useful error message for something like: git sparse-checkout list --nonsense which currently just silently ignores the unknown option. As a bonus, it also silences some -Wunused-parameter warnings. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-09-30 12:53:48 -07:00
Jeff King	5b9427e0ac	push: drop unused repo argument to do_push() We stopped using the "repo" argument in `8e4c8af058` (push: disallow --all and refspecs when remote.<name>.mirror is set, 2019-09-02), which moved the pushremote handling to its caller. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-09-30 12:53:47 -07:00
Jeff King	8d2aa8dfac	assert PARSE_OPT_NONEG in parse-options callbacks In the spirit of `517fe807d6` (assert NOARG/NONEG behavior of parse-options callbacks, 2018-11-05), let's cover some parse-options callbacks which expect to be used with PARSE_OPT_NONEG but don't explicitly assert that this is the case. These callbacks are all used correctly in the current code, but this will help document their expectations and future-proof the code. As a bonus, it also silences -Wunused-parameters (these were added since the initial sweep of `517fe807d6`, and we can't yet turn on -Wunused-parameters to remind people because it has too many existing false positives). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-09-30 12:53:47 -07:00
Jeff King	424e28fcad	env--helper: write to opt->value in parseopt helper We use OPT_CALLBACK_F() to call the option_parse_type() callback, passing it the address of "cmdmode" as the value to write to. But the callback doesn't look at opt->value at all, and instead writes to a global variable. This works out because that's the same global variable we happen to pass in, but it's rather confusing. Let's use the passed-in value instead. We'll also make "cmdmode" a local variable of the main function, ensuring we can't make the same mistake again. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-09-30 12:53:47 -07:00
Jeff King	e885a84f1b	drop unused argc parameters Many functions take an argv/argc pair, but never actually look at argc. This makes it useless at best (we use the NULL sentinel in argv to find the end of the array), and misleading at worst (what happens if the argc count does not match the argv NULL?). In each of these instances, the argv NULL does match the argc count, so there are no bugs here. But let's tighten the interfaces to make it harder to get wrong (and to reduce some -Wunused-parameter complaints). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-09-30 12:53:47 -07:00
Junio C Hamano	299deeac8a	Merge branch 'ah/pull' Earlier we taught "git pull" to warn when the user does not say the histories need to be merged, rebased or accepts only fast- forwarding, but the warning triggered for those who have set the pull.ff configuration variable. * ah/pull: pull: don't warn if pull.ff has been set	2020-09-29 14:01:22 -07:00
Junio C Hamano	b28919c7bc	Merge branch 'bc/clone-with-git-default-hash-fix' "git clone" that clones from SHA-1 repository, while GIT_DEFAULT_HASH set to use SHA-256 already, resulted in an unusable repository that half-claims to be SHA-256 repository with SHA-1 objects and refs. This has been corrected. * bc/clone-with-git-default-hash-fix: builtin/clone: avoid failure with GIT_DEFAULT_HASH	2020-09-29 14:01:20 -07:00
Junio C Hamano	288ed98bf7	Merge branch 'tb/bloom-improvements' "git commit-graph write" learned to limit the number of bloom filters that are computed from scratch with the --max-new-filters option. * tb/bloom-improvements: commit-graph: introduce 'commitGraph.maxNewFilters' builtin/commit-graph.c: introduce '--max-new-filters=<n>' commit-graph: rename 'split_commit_graph_opts' bloom: encode out-of-bounds filters as non-empty bloom/diff: properly short-circuit on max_changes bloom: use provided 'struct bloom_filter_settings' bloom: split 'get_bloom_filter()' in two commit-graph.c: store maximum changed paths commit-graph: respect 'commitGraph.readChangedPaths' t/helper/test-read-graph.c: prepare repo settings commit-graph: pass a 'struct repository *' in more places t4216: use an '&&'-chain commit-graph: introduce 'get_bloom_filter_settings()'	2020-09-29 14:01:20 -07:00
Sergey Organov	d01141de5a	diff: get rid of redundant 'dense' argument Get rid of 'dense' argument that is redundant for every function that has 'struct rev_info *rev' argument as well, as the value of 'dense' passed is always taken from 'rev->dense_combined_merges' field. The only place where this was not the case is in 'submodule.c' where 'diff_tree_combined_merge()' was called with '1' for 'dense' argument. However, at that call the 'revs' instance used is local to the function, and we now just set 'revs->dense_combined_merges' to 1 in this local instance. Signed-off-by: Sergey Organov <sorganov@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-09-29 11:54:53 -07:00
Jonathan Tan	23547c4051	fetch: do not override partial clone filter When a fetch with the --filter argument is made, the configured default filter is set even if one already exists. This change was made in `5e46139376` ("builtin/fetch: remove unique promisor remote limitation", 2019-06-25) - in particular, changing from: * If this is the FIRST partial-fetch request, we enable partial * on this repo and remember the given filter-spec as the default * for subsequent fetches to this remote. to: * If this is a partial-fetch request, we enable partial on * this repo if not already enabled and remember the given * filter-spec as the default for subsequent fetches to this * remote. (The given filter-spec is "remembered" even if there is already an existing one.) This is problematic whenever a lazy fetch is made, because lazy fetches are made using "git fetch --filter=blob:none", but this will also happen if the user invokes "git fetch --filter=<filter>" manually. Therefore, restore the behavior prior to `5e46139376`, which writes a filter-spec only if the current fetch request is the first partial-fetch one (for that remote). Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-09-28 16:11:59 -07:00
Jeff King	63d24fa0b0	shortlog: allow multiple groups to be specified Now that shortlog supports reading from trailers, it can be useful to combine counts from multiple trailers, or between trailers and authors. This can be done manually by post-processing the output from multiple runs, but it's non-trivial to make sure that each name/commit pair is counted only once. This patch teaches shortlog to accept multiple --group options on the command line, and pull data from all of them. That makes it possible to run: git shortlog -ns --group=author --group=trailer:co-authored-by to get a shortlog that counts authors and co-authors equally. The implementation is mostly straightforward. The "group" enum becomes a bitfield, and the trailer key becomes a list. I didn't bother implementing the multi-group semantics for reading from stdin. It would be possible to do, but the existing matching code makes it awkward, and I doubt anybody cares. The duplicate suppression we used for trailers now covers authors and committers as well (though in non-trailer single-group mode we can skip the hash insertion and lookup, since we only see one value per commit). There is one subtlety: we now care about the case when no group bit is set (in which case we default to showing the author). The caller in builtin/log.c needs to be adapted to ask explicitly for authors, rather than relying on shortlog_init(). It would be possible with some gymnastics to make this keep working as-is, but it's not worth it for a single caller. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-09-27 12:21:05 -07:00
Jeff King	56d5dde752	shortlog: parse trailer idents Trailers don't necessarily contain name/email identity values, so shortlog has so far treated them as opaque strings. However, since many trailers do contain identities, it's useful to treat them as such when they can be parsed. That lets "-e" work as usual, as well as mailmap. When they can't be parsed, we'll continue with the old behavior of treating them as a single string (there's no new test for that here, since the existing tests cover a trailer like this). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-09-27 12:21:05 -07:00
Jeff King	87abb96222	shortlog: rename parse_stdin_ident() This function is actually useful for parsing any identity, whether from stdin or not. We'll need it for handling trailers. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-09-27 12:21:05 -07:00
Jeff King	f17b0b99bf	shortlog: de-duplicate trailer values The current documentation is vague about what happens with --group=trailer:signed-off-by when we see a commit with: Signed-off-by: One Signed-off-by: Two Signed-off-by: One We clearly should credit both "One" and "Two", but should "One" get credited twice? The current code does so, but mostly because that was the easiest thing to do. It's probably more useful to count each commit at most once. This will become especially important when we allow values from multiple sources in a future patch. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-09-27 12:21:05 -07:00
Jeff King	47beb37bc6	shortlog: match commit trailers with --group If a project uses commit trailers, this patch lets you use shortlog to see who is performing each action. For example, running: git shortlog -ns --group=trailer:reviewed-by in git.git shows who has reviewed. You can even use a custom format to see things like who has helped whom: git shortlog --format="...helped %an (%ad)" \ --group=trailer:helped-by Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-09-27 12:21:05 -07:00
Jeff King	92338c450b	shortlog: add grouping option In preparation for adding more grouping types, let's refactor the committer/author grouping code and add a user-facing option that binds them together. In particular: - the main option is now "--group", to make it clear that the various group types are mutually exclusive. The "--committer" option is an alias for "--group=committer". - we keep an enum rather than a binary flag, to prepare for more values - we prefer switch statements to ternary assignment, since other group types will need more custom code Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-09-27 12:21:05 -07:00
Junio C Hamano	6c430a647c	Merge branch 'jx/proc-receive-hook' "git receive-pack" that accepts requests by "git push" learned to outsource most of the ref updates to the new "proc-receive" hook. * jx/proc-receive-hook: doc: add documentation for the proc-receive hook transport: parse report options for tracking refs t5411: test updates of remote-tracking branches receive-pack: new config receive.procReceiveRefs doc: add document for capability report-status-v2 New capability "report-status-v2" for git-push receive-pack: feed report options to post-receive receive-pack: add new proc-receive hook t5411: add basic test cases for proc-receive hook transport: not report a non-head push as a branch	2020-09-25 15:25:39 -07:00
Junio C Hamano	48794acc50	Merge branch 'ds/maintenance-part-1' A "git gc"'s big brother has been introduced to take care of more repository maintenance tasks, not limited to the object database cleaning. * ds/maintenance-part-1: maintenance: add trace2 regions for task execution maintenance: add auto condition for commit-graph task maintenance: use pointers to check --auto maintenance: create maintenance.<task>.enabled config maintenance: take a lock on the objects directory maintenance: add --task option maintenance: add commit-graph task maintenance: initialize task array maintenance: replace run_auto_gc() maintenance: add --quiet option maintenance: create basic maintenance runner	2020-09-25 15:25:38 -07:00
Derrick Stolee	e841a79a13	maintenance: add incremental-repack auto condition The incremental-repack task updates the multi-pack-index by deleting pack- files that have been replaced with new packs, then repacking a batch of small pack-files into a larger pack-file. This incremental repack is faster than rewriting all object data, but is slower than some other maintenance activities. The 'maintenance.incremental-repack.auto' config option specifies how many pack-files should exist outside of the multi-pack-index before running the step. These pack-files could be created by 'git fetch' commands or by the loose-objects task. The default value is 10. Setting the option to zero disables the task with the '--auto' option, and a negative value makes the task run every time. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-09-25 10:53:05 -07:00
Derrick Stolee	a13e3d0ec8	maintenance: auto-size incremental-repack batch When repacking during the 'incremental-repack' task, we use the --batch-size option in 'git multi-pack-index repack'. The initial setting used --batch-size=0 to repack everything into a single pack-file. This is not sustainable for a large repository. The amount of work required is also likely to use too many system resources for a background job. Update the 'incremental-repack' task by dynamically computing a --batch-size option based on the current pack-file structure. The dynamic default size is computed with this idea in mind for a client repository that was cloned from a very large remote: there is likely one "big" pack-file that was created at clone time. Thus, do not try repacking it as it is likely packed efficiently by the server. Instead, we select the second-largest pack-file, and create a batch size that is one larger than that pack-file. If there are three or more pack-files, then this guarantees that at least two will be combined into a new pack-file. Of course, this means that the second-largest pack-file size is likely to grow over time and may eventually surpass the initially-cloned pack-file. Recall that the pack-file batch is selected in a greedy manner: the packs are considered from oldest to newest and are selected if they have size smaller than the batch size until the total selected size is larger than the batch size. Thus, that oldest "clone" pack will be first to repack after the new data creates a pack larger than that. We also want to place some limits on how large these pack-files become, in order to bound the amount of time spent repacking. A maximum batch-size of two gigabytes means that large repositories will never be packed into a single pack-file using this job, but also that repack is rather expensive. This is a trade-off that is valuable to have if the maintenance is being run automatically or in the background. Users who truly want to optimize for space and performance (and are willing to pay the upfront cost of a full repack) can use the 'gc' task to do so. Create a test for this two gigabyte limit by creating an EXPENSIVE test that generates two pack-files of roughly 2.5 gigabytes in size, then performs an incremental repack. Check that the --batch-size argument in the subcommand uses the hard-coded maximum. Helped-by: Chris Torek <chris.torek@gmail.com> Reported-by: Son Luong Ngoc <sluongng@gmail.com> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-09-25 10:53:05 -07:00
Derrick Stolee	52fe41ff1c	maintenance: add incremental-repack task The previous change cleaned up loose objects using the 'loose-objects' that can be run safely in the background. Add a similar job that performs similar cleanups for pack-files. One issue with running 'git repack' is that it is designed to repack all pack-files into a single pack-file. While this is the most space-efficient way to store object data, it is not time or memory efficient. This becomes extremely important if the repo is so large that a user struggles to store two copies of the pack on their disk. Instead, perform an "incremental" repack by collecting a few small pack-files into a new pack-file. The multi-pack-index facilitates this process ever since 'git multi-pack-index expire' was added in `19575c7` (multi-pack-index: implement 'expire' subcommand, 2019-06-10) and 'git multi-pack-index repack' was added in `ce1e4a1` (midx: implement midx_repack(), 2019-06-10). The 'incremental-repack' task runs the following steps: 1. 'git multi-pack-index write' creates a multi-pack-index file if one did not exist, and otherwise will update the multi-pack-index with any new pack-files that appeared since the last write. This is particularly relevant with the background fetch job. When the multi-pack-index sees two copies of the same object, it stores the offset data into the newer pack-file. This means that some old pack-files could become "unreferenced" which I will use to mean "a pack-file that is in the pack-file list of the multi-pack-index but none of the objects in the multi-pack-index reference a location inside that pack-file." 2. 'git multi-pack-index expire' deletes any unreferenced pack-files and updaes the multi-pack-index to drop those pack-files from the list. This is safe to do as concurrent Git processes will see the multi-pack-index and not open those packs when looking for object contents. (Similar to the 'loose-objects' job, there are some Git commands that open pack-files regardless of the multi-pack-index, but they are rarely used. Further, a user that self-selects to use background operations would likely refrain from using those commands.) 3. 'git multi-pack-index repack --bacth-size=<size>' collects a set of pack-files that are listed in the multi-pack-index and creates a new pack-file containing the objects whose offsets are listed by the multi-pack-index to be in those objects. The set of pack- files is selected greedily by sorting the pack-files by modified time and adding a pack-file to the set if its "expected size" is smaller than the batch size until the total expected size of the selected pack-files is at least the batch size. The "expected size" is calculated by taking the size of the pack-file divided by the number of objects in the pack-file and multiplied by the number of objects from the multi-pack-index with offset in that pack-file. The expected size approximates how much data from that pack-file will contribute to the resulting pack-file size. The intention is that the resulting pack-file will be close in size to the provided batch size. The next run of the incremental-repack task will delete these repacked pack-files during the 'expire' step. In this version, the batch size is set to "0" which ignores the size restrictions when selecting the pack-files. It instead selects all pack-files and repacks all packed objects into a single pack-file. This will be updated in the next change, but it requires doing some calculations that are better isolated to a separate change. These steps are based on a similar background maintenance step in Scalar (and VFS for Git) [1]. This was incredibly effective for users of the Windows OS repository. After using the same VFS for Git repository for over a year, some users had _thousands_ of pack-files that combined to up to 250 GB of data. We noticed a few users were running into the open file descriptor limits (due in part to a bug in the multi-pack-index fixed by `af96fe3` (midx: add packs to packed_git linked list, 2019-04-29). These pack-files were mostly small since they contained the commits and trees that were pushed to the origin in a given hour. The GVFS protocol includes a "prefetch" step that asks for pre-computed pack- files containing commits and trees by timestamp. These pack-files were grouped into "daily" pack-files once a day for up to 30 days. If a user did not request prefetch packs for over 30 days, then they would get the entire history of commits and trees in a new, large pack-file. This led to a large number of pack-files that had poor delta compression. By running this pack-file maintenance step once per day, these repos with thousands of packs spanning 200+ GB dropped to dozens of pack- files spanning 30-50 GB. This was done all without removing objects from the system and using a constant batch size of two gigabytes. Once the work was done to reduce the pack-files to small sizes, the batch size of two gigabytes means that not every run triggers a repack operation, so the following run will not expire a pack-file. This has kept these repos in a "clean" state. [1] https://github.com/microsoft/scalar/blob/master/Scalar.Common/Maintenance/PackfileMaintenanceStep.cs Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-09-25 10:53:04 -07:00
Derrick Stolee	3e220e6069	maintenance: create auto condition for loose-objects The loose-objects task deletes loose objects that already exist in a pack-file, then place the remaining loose objects into a new pack-file. If this step runs all the time, then we risk creating pack-files with very few objects with every 'git commit' process. To prevent overwhelming the packs directory with small pack-files, place a minimum number of objects to justify the task. The 'maintenance.loose-objects.auto' config option specifies a minimum number of loose objects to justify the task to run under the '--auto' option. This defaults to 100 loose objects. Setting the value to zero will prevent the step from running under '--auto' while a negative value will force it to run every time. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-09-25 10:53:04 -07:00
Derrick Stolee	252cfb7cb8	maintenance: add loose-objects task One goal of background maintenance jobs is to allow a user to disable auto-gc (gc.auto=0) but keep their repository in a clean state. Without any cleanup, loose objects will clutter the object database and slow operations. In addition, the loose objects will take up extra space because they are not stored with deltas against similar objects. Create a 'loose-objects' task for the 'git maintenance run' command. This helps clean up loose objects without disrupting concurrent Git commands using the following sequence of events: 1. Run 'git prune-packed' to delete any loose objects that exist in a pack-file. Concurrent commands will prefer the packed version of the object to the loose version. (Of course, there are exceptions for commands that specifically care about the location of an object. These are rare for a user to run on purpose, and we hope a user that has selected background maintenance will not be trying to do foreground maintenance.) 2. Run 'git pack-objects' on a batch of loose objects. These objects are grouped by scanning the loose object directories in lexicographic order until listing all loose objects -or- reaching 50,000 objects. This is more than enough if the loose objects are created only by a user doing normal development. We noticed users with _millions_ of loose objects because VFS for Git downloads blobs on-demand when a file read operation requires populating a virtual file. This step is based on a similar step in Scalar [1] and VFS for Git. [1] https://github.com/microsoft/scalar/blob/master/Scalar.Common/Maintenance/LooseObjectsStep.cs Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-09-25 10:53:04 -07:00

1 2 3 4 5 ...

8798 Commits