git-commit-vandalism

Author	SHA1	Message	Date
René Scharfe	ddd0bfac7c	receive-pack: use FLEX_ALLOC_MEM in queue_command() Use the macro FLEX_ALLOC_MEM instead of open-coding it. This shortens and simplifies the code a bit. Signed-off-by: Rene Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-08-13 19:49:30 -07:00
Junio C Hamano	dd610aeda6	Merge branch 'kw/patch-ids-optim' When "git rebase" tries to compare set of changes on the updated upstream and our own branch, it computes patch-id for all of these changes and attempts to find matches. This has been optimized by lazily computing the full patch-id (which is expensive) to be compared only for changes that touch the same set of paths. * kw/patch-ids-optim: rebase: avoid computing unnecessary patch IDs patch-ids: add flag to create the diff patch id using header only data patch-ids: replace the seen indicator with a commit pointer patch-ids: stop using a hand-rolled hashmap implementation	2016-08-12 09:47:39 -07:00
Junio C Hamano	2c44b7a53b	Merge branch 'js/mv-dir-to-new-directory' "git mv dir non-existing-dir/" did not work in some environments the same way as existing mainstream platforms. The code now moves "dir" to "non-existing-dir", without relying on rename("A", "B/") that strips the trailing slash of '/'. * js/mv-dir-to-new-directory: git mv: do not keep slash in `git mv dir non-existing-dir/`	2016-08-12 09:47:37 -07:00
Junio C Hamano	0a315befa7	Merge branch 'rs/use-strbuf-add-unique-abbrev' A small code clean-up. * rs/use-strbuf-add-unique-abbrev: use strbuf_add_unique_abbrev() for adding short hashes	2016-08-12 09:47:37 -07:00
Junio C Hamano	b32d7c524b	Merge branch 'rs/merge-add-strategies-simplification' A small code clean-up. * rs/merge-add-strategies-simplification: merge: use string_list_split() in add_strategies()	2016-08-12 09:47:36 -07:00
Junio C Hamano	18f3ce8841	Merge branch 'rs/child-process-init' A small code clean-up. * rs/child-process-init: use CHILD_PROCESS_INIT to initialize automatic variables	2016-08-12 09:47:36 -07:00
Junio C Hamano	2f9c615efb	Merge branch 'sb/submodule-clone-retry' Fix-up to an error codepath in a topic already in 'master'. * sb/submodule-clone-retry: submodule--helper: use parallel processor correctly	2016-08-12 09:47:34 -07:00
Junio C Hamano	f4fd627661	Merge branch 'jk/reset-ident-time-per-commit' into maint Not-so-recent rewrite of "git am" that started making internal calls into the commit machinery had an unintended regression, in that no matter how many seconds it took to apply many patches, the resulting committer timestamp for the resulting commits were all the same. * jk/reset-ident-time-per-commit: am: reset cached ident date for each patch	2016-08-12 09:16:56 -07:00
Kevin Willford	b3dfeebb92	rebase: avoid computing unnecessary patch IDs The `rebase` family of Git commands avoid applying patches that were already integrated upstream. They do that by using the revision walking option that computes the patch IDs of the two sides of the rebase (local-only patches vs upstream-only ones) and skipping those local patches whose patch ID matches one of the upstream ones. In many cases, this causes unnecessary churn, as already the set of paths touched by a given commit would suffice to determine that an upstream patch has no local equivalent. This hurts performance in particular when there are a lot of upstream patches, and/or large ones. Therefore, let's introduce the concept of a "diff-header-only" patch ID, compare those first, and only evaluate the "full" patch ID lazily. Please note that in contrast to the "full" patch IDs, those "diff-header-only" patch IDs are prone to collide with one another, as adjacent commits frequently touch the very same files. Hence we now have to be careful to allow multiple hash entries with the same hash. We accomplish that by using the hashmap_add() function that does not even test for hash collisions. This also allows us to evaluate the full patch ID lazily, i.e. only when we found commits with matching diff-header-only patch IDs. We add a performance test that demonstrates ~1-6% improvement. In practice this will depend on various factors such as how many upstream changes and how big those changes are along with whether file system caches are cold or warm. As Git's test suite has no way of catching performance regressions, we also add a regression test that verifies that the full patch ID computation is skipped when the diff-header-only computation suffices. Signed-off-by: Kevin Willford <kcwillford@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-08-11 14:39:16 -07:00
Jeff King	07e7dbf0db	gc: default aggressive depth to 50 This commit message is long and has lots of background and numbers. The summary is: the current default of 250 doesn't save much space, and costs CPU. It's not a good tradeoff. Read on for details. The "--aggressive" flag to git-gc does three things: 1. use "-f" to throw out existing deltas and recompute from scratch 2. use "--window=250" to look harder for deltas 3. use "--depth=250" to make longer delta chains Items (1) and (2) are good matches for an "aggressive" repack. They ask the repack to do more computation work in the hopes of getting a better pack. You pay the costs during the repack, and other operations see only the benefit. Item (3) is not so clear. Allowing longer chains means fewer restrictions on the deltas, which means potentially finding better ones and saving some space. But it also means that operations which access the deltas have to follow longer chains, which affects their performance. So it's a tradeoff, and it's not clear that the tradeoff is even a good one. The existing "250" numbers for "--aggressive" come originally from this thread: http://public-inbox.org/git/alpine.LFD.0.9999.0712060803430.13796@woody.linux-foundation.org/ where Linus says: So when I said "--depth=250 --window=250", I chose those numbers more as an example of extremely aggressive packing, and I'm not at all sure that the end result is necessarily wonderfully usable. It's going to save disk space (and network bandwidth - the delta's will be re-used for the network protocol too!), but there are definitely downsides too, and using long delta chains may simply not be worth it in practice. There are some numbers in that thread, but they're mostly focused on the improved window size, and measure the improvement from --depth=250 and --window=250 together. E.g.: http://public-inbox.org/git/9e4733910712062006l651571f3w7f76ce64c6650dff@mail.gmail.com/ talks about the improved run-time of "git-blame", which comes from the reduced pack size. But most of that reduction is coming from --window=250, whereas most of the extra costs come from --depth=250. There's a link in that thread showing that increasing the depth beyond 50 doesn't seem to help much with the size: https://vcscompare.blogspot.com/2008/06/git-repack-parameters.html but again, no discussion of the timing impact. In an earlier thread from Ted Ts'o which discussed setting the non-aggressive default (from 10 to 50): http://public-inbox.org/git/20070509134958.GA21489%40thunk.org/ we have more numbers, with the conclusion that going past 50 does not help size much, and hurts the speed of normal operations. So from that, we might guess that 50 is actually a sweet spot, even for aggressive, if we interpret aggressive to "spend time now to make a better pack". It is not clear that "--depth=250" is actually a better pack. It may be slightly _smaller_, but it carries a run-time penalty. Here are some more recent timings I did to verify that. They show three things: - the size of the resulting pack (so disk saved to store, bandwidth saved on clones/fetches) - the cost of "rev-list --objects --all", which shows the effect of the delta chains on trees (commits typically don't delta, and the command doesn't touch the blobs at all) - the cost of "log -Sfoo", which will additionally access each blob All cases were repacked with "git repack -adf --depth=$d --window=250" (so basically, what would happen if we tweaked the "gc --aggressive" default depth). The timings are all wall-clock best-of-3. The machine itself has plenty of RAM compared to the repositories (which is probably typical of most workstations these days), so we're really measuring CPU usage, as the whole thing will be in disk cache after the first run. The core.deltaBaseCacheLimit is at its default of 96MiB. It's possible that tweaking it would have some impact on the tests, as some of them (especially "log -S" on a large repo) are likely to overflow that. But bumping that carries a run-time memory cost, so for these tests, I focused on what we could do just with the on-disk pack tradeoffs. Each test is done for four depths: 250 (the current value), 50 (the current default that tested well previously), 100 (to show something on the larger side, which previous tests showed was not a good tradeoff), and 10 (the very old default, which previous tests showed was worse than 50). Here are the numbers for linux.git: depth \| size \| % \| rev-list \| % \| log -Sfoo \| % -------+-------+-------+----------+--------+-----------+------- 250 \| 967MB \| n/a \| 48.159s \| n/a \| 378.088 \| n/a 100 \| 971MB \| +0.4% \| 41.471s \| -13.9% \| 342.060 \| -9.5% 50 \| 979MB \| +1.2% \| 37.778s \| -21.6% \| 311.040s \| -17.7% 10 \| 1.1GB \| +6.6% \| 32.518s \| -32.5% \| 279.890s \| -25.9% and for git.git: depth \| size \| % \| rev-list \| % \| log -Sfoo \| % -------+-------+-------+----------+--------+-----------+------- 250 \| 48MB \| n/a \| 2.215s \| n/a \| 20.922s \| n/a 100 \| 49MB \| +0.5% \| 2.140s \| -3.4% \| 17.736s \| -15.2% 50 \| 49MB \| +1.7% \| 2.099s \| -5.2% \| 15.418s \| -26.3% 10 \| 53MB \| +9.3% \| 2.001s \| -9.7% \| 12.677s \| -39.4% You can see that that the CPU savings for regular operations improves as we decrease the depth. The savings are less for "rev-list" on a smaller repository than they are for blob-accessing operations, or even rev-list on a larger repository. This may mean that a larger delta cache would help (though setting core.deltaBaseCacheLimit by itself doesn't). But we can also see that the space savings are not that great as the depth goes higher. Saving 5-10% between 10 and 50 is probably worth the CPU tradeoff. Saving 1% to go from 50 to 100, or another 0.5% to go from 100 to 250 is probably not. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-08-11 11:53:19 -07:00
Junio C Hamano	11b53957ac	Merge branch 'sb/submodule-update-dot-branch' A few updates to "git submodule update". Use of "\| wc -l" break with BSD variant of 'wc'. * sb/submodule-update-dot-branch: t7406: fix breakage on OSX submodule update: allow '.' for branch value submodule--helper: add remote-branch helper submodule-config: keep configured branch around submodule--helper: fix usage string for relative-path submodule update: narrow scope of local variable submodule update: respect depth in subsequent fetches t7406: future proof tests with hard coded depth	2016-08-10 12:33:20 -07:00
Junio C Hamano	1a5f1a3f25	Merge branch 'js/am-3-merge-recursive-direct' "git am -3" calls "git merge-recursive" when it needs to fall back to a three-way merge; this call has been turned into an internal subroutine call instead of spawning a separate subprocess. * js/am-3-merge-recursive-direct: merge-recursive: flush output buffer even when erroring out merge_trees(): ensure that the callers release output buffer merge-recursive: offer an option to retain the output in 'obuf' merge-recursive: write the commit title in one go merge-recursive: flush output buffer before printing error messages am -3: use merge_recursive() directly again merge-recursive: switch to returning errors instead of dying merge-recursive: handle return values indicating errors merge-recursive: allow write_tree_from_memory() to error out merge-recursive: avoid returning a wholesale struct merge_recursive: abort properly upon errors prepare the builtins for a libified merge_recursive() merge-recursive: clarify code in was_tracked() die(_("BUG")): avoid translating bug messages die("bug"): report bugs consistently t5520: verify that `pull --rebase` shows the helpful advice when failing	2016-08-10 12:33:20 -07:00
Junio C Hamano	7a3ea66633	Merge branch 'js/commit-slab-decl-fix' * js/commit-slab-decl-fix: commit-slab.h: avoid duplicated global static variables config.c: avoid duplicated global static variables	2016-08-10 12:33:20 -07:00
Junio C Hamano	db40a62239	Merge branch 'jt/format-patch-from-config' "git format-patch" learned format.from configuration variable to specify the default settings for its "--from" option. * jt/format-patch-from-config: format-patch: format.from gives the default for --from	2016-08-10 12:33:18 -07:00
Junio C Hamano	24fbe00490	Merge branch 'jk/reset-ident-time-per-commit' Not-so-recent rewrite of "git am" that started making internal calls into the commit machinery had an unintended regression, in that no matter how many seconds it took to apply many patches, the resulting committer timestamp for the resulting commits were all the same. * jk/reset-ident-time-per-commit: am: reset cached ident date for each patch	2016-08-10 12:33:17 -07:00
Junio C Hamano	574a31b5b7	Merge branch 'rs/use-strbuf-addstr' into maint * rs/use-strbuf-addstr: use strbuf_addstr() instead of strbuf_addf() with "%s" use strbuf_addstr() for adding constant strings to a strbuf	2016-08-10 11:55:34 -07:00
Junio C Hamano	d9d7ab3b1d	Merge branch 'os/no-verify-skips-commit-msg-too' into maint "git commit --help" said "--no-verify" is only about skipping the pre-commit hook, and failed to say that it also skipped the commit-msg hook. * os/no-verify-skips-commit-msg-too: commit: describe that --no-verify skips the commit-msg hook in the help text	2016-08-10 11:55:25 -07:00
Junio C Hamano	b7fb136bf6	Merge branch 'rs/rm-strbuf-optim' into maint The use of strbuf in "git rm" to build filename to remove was a bit suboptimal, which has been fixed. * rs/rm-strbuf-optim: rm: reuse strbuf for all remove_dir_recursively() calls	2016-08-10 11:55:24 -07:00
Junio C Hamano	60b84ba26c	Merge branch 'jk/parse-options-concat' into maint Users of the parse_options_concat() API function need to allocate extra slots in advance and fill them with OPT_END() when they want to decide the set of supported options dynamically, which makes the code error-prone and hard to read. This has been corrected by tweaking the API to allocate and return a new copy of "struct option" array. * jk/parse-options-concat: parse_options: allocate a new array when concatenating	2016-08-10 11:55:24 -07:00
Stefan Beller	2201ee09b5	submodule--helper: use parallel processor correctly When developing another patch series I had a temporary state in which git-clone would segfault, when the call was prepared in prepare_to_clone_next_submodule. This lead to the call failing, i.e. in `update_clone_task_finished` the task was scheduled to be tried again. The second call to prepare_to_clone_next_submodule would return 0, as the segfaulted clone did create the .git file already, such that was not considered to need to be cloned again. I was seeing the "BUG: ce was a submodule before?\n" message, which was the correct behavior at the time as my local code was buggy. When trying to debug this failure, I tried to use printing messages into the strbuf that is passed around, but these messages were never printed as the die(..) doesn't flush the `err` strbuf. When implementing the die() in `665b35ecc` (2016-06-09, "submodule--helper: initial clone learns retry logic"), I considered this condition to be a severe condition, which should lead to an immediate abort as we do not trust ourselves any more. However the queued messages in `err` are valuable so let's not toss them out by immediately dying, but a graceful return. Another thing to note: The error message itself was misleading. A return value of 0 doesn't indicate the passed in `ce` is not a submodule any more, but just that we do not consider cloning it any more. Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-08-09 14:54:16 -07:00
Johannes Sixt	dc29ddebb9	config.c: avoid duplicated global static variables Repeating the definition of a static variable seems to be valid in C. Nevertheless, it is bad style because it can cause confusion, definitely when it becomes necessary to change the type. `d64ec16` (git config: reorganize to use parseopt, 2009-02-21) added two static variables near the top of the file config.c without removing the definitions of the two variables that occurs later in the file. The two variables were needed earlier in the file in the newly introduced parseopt structure. These references were removed later in `d0e08d6` (config: fix parsing of "git config --get-color some.key -1", 2014-11-20). Remove the redundant, younger, definitions near the top of the file and keep the original definitions that occur later. Signed-off-by: Johannes Sixt <j6t@kdbg.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-08-09 10:19:24 -07:00
Junio C Hamano	19492555ca	Merge branch 'jk/parseopt-string-list' A small memory leak in the command line parsing of "git blame" has been plugged. * jk/parseopt-string-list: blame: drop strdup of string literal	2016-08-08 14:48:44 -07:00
Junio C Hamano	940622bc8b	Merge branch 'rs/use-strbuf-addstr' * rs/use-strbuf-addstr: use strbuf_addstr() instead of strbuf_addf() with "%s" use strbuf_addstr() for adding constant strings to a strbuf	2016-08-08 14:48:41 -07:00
Junio C Hamano	78849622ec	Merge branch 'jk/pack-objects-optim' "git pack-objects" has a few options that tell it not to pack objects found in certain packfiles, which require it to scan .idx files of all available packs. The codepaths involved in these operations have been optimized for a common case of not having any non-local pack and/or any .kept pack. * jk/pack-objects-optim: pack-objects: compute local/ignore_pack_keep early pack-objects: break out of want_object loop early find_pack_entry: replace last_found_pack with MRU cache add generic most-recently-used list sha1_file: drop free_pack_by_name t/perf: add tests for many-pack scenarios	2016-08-08 14:48:39 -07:00
Junio C Hamano	768ededa9c	Merge branch 'va/i18n' More i18n marking. * va/i18n: i18n: config: unfold error messages marked for translation i18n: notes: mark comment for translation	2016-08-08 14:48:38 -07:00
Junio C Hamano	0d3279962a	Merge branch 'jk/reflog-date' The reflog output format is documented better, and a new format --date=unix to report the seconds-since-epoch (without timezone) has been added. * jk/reflog-date: date: clarify --date=raw description date: add "unix" format date: document and test "raw-local" mode doc/pretty-formats: explain shortening of %gd doc/pretty-formats: describe index/time formats for %gd doc/rev-list-options: explain "-g" output formats doc/rev-list-options: clarify "commit@{Nth}" for "-g" option	2016-08-08 14:48:37 -07:00
Junio C Hamano	a220e2bbbf	Merge branch 'pb/commit-editmsg-path' into maint Code clean-up. * pb/commit-editmsg-path: builtin/commit.c: memoize git-path for COMMIT_EDITMSG	2016-08-08 14:21:38 -07:00
Junio C Hamano	aa9136a87e	Merge branch 'nd/pack-ofs-4gb-limit' into maint "git pack-objects" and "git index-pack" mostly operate with off_t when talking about the offset of objects in a packfile, but there were a handful of places that used "unsigned long" to hold that value, leading to an unintended truncation. * nd/pack-ofs-4gb-limit: fsck: use streaming interface for large blobs in pack pack-objects: do not truncate result in-pack object size on 32-bit systems index-pack: correct "offset" type in unpack_entry_data() index-pack: report correct bad object offsets even if they are large index-pack: correct "len" type in unpack_data() sha1_file.c: use type off_t* for object_info->disk_sizep pack-objects: pass length to check_pack_crc() without truncation	2016-08-08 14:21:36 -07:00
Junio C Hamano	327b3f8459	Merge branch 'mh/blame-worktree' into maint "git blame file" allowed the lineage of lines in the uncommitted, unadded contents of "file" to be inspected, but it refused when "file" did not appear in the current commit. When "file" was created by renaming an existing file (but the change has not been committed), this restriction was unnecessarily tight. * mh/blame-worktree: t/t8003-blame-corner-cases.sh: Use here documents blame: allow to blame paths freshly added to the index	2016-08-08 14:21:32 -07:00
Johannes Schindelin	189d035e67	git mv: do not keep slash in `git mv dir non-existing-dir/` When calling `rename("dir", "non-existing-dir/")` on Linux, it silently succeeds, stripping the trailing slash of the second argument. This is all good and dandy but this behavior disagrees with the specs at http://pubs.opengroup.org/onlinepubs/9699919799/functions/rename.html that state clearly regarding the 2nd parameter (called `new`): If the `new` argument does not resolve to an existing directory entry for a file of type directory and the `new` argument contains at least one non- <slash> character and ends with one or more trailing <slash> characters after all symbolic links have been processed, `rename()` shall fail. Of course, we would like `git mv dir non-existing-dir/` to succeed (and rename the directory "dir" to "non-existing-dir"). Let's be extra careful to remove the trailing slash in that case. This lets t7001-mv.sh pass in Bash on Windows. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-08-08 10:43:20 -07:00
René Scharfe	1eb47f167d	use strbuf_add_unique_abbrev() for adding short hashes Call strbuf_add_unique_abbrev() to add abbreviated hashes to strbufs instead of taking detours through find_unique_abbrev() and its static buffer. This is shorter and a bit more efficient. Signed-off-by: Rene Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-08-06 10:33:57 -07:00
René Scharfe	02a8cfa478	merge: use string_list_split() in add_strategies() Call string_list_split() for cutting a space separated list into pieces instead of reimplementing it based on struct strategy. The attr member of struct strategy was not used split_merge_strategies(); it was a pure string operation. Also be nice and clean up once we're done splitting; the old code didn't bother freeing any of the allocated memory. Signed-off-by: Rene Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-08-05 15:11:06 -07:00
René Scharfe	542aa25d97	use CHILD_PROCESS_INIT to initialize automatic variables Initialize struct child_process variables already when they're defined. That's shorter and saves a function call. Signed-off-by: Rene Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-08-05 15:10:05 -07:00
René Scharfe	bc57b9c0cc	use strbuf_addstr() instead of strbuf_addf() with "%s" Call strbuf_addstr() for adding a simple string to a strbuf instead of using the heavier strbuf_addf(). This is shorter and documents the intent more clearly. Signed-off-by: Rene Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-08-05 15:09:25 -07:00
Junio C Hamano	1e9a4856fb	Merge branch 'sb/submodule-clone-retry' An earlier tweak to make "submodule update" retry a failing clone of submodules was buggy and caused segfault, which has been fixed. * sb/submodule-clone-retry: submodule-helper: fix indexing in clone retry error reporting path git-submodule: forward exit code of git-submodule--helper more faithfully	2016-08-04 14:39:17 -07:00
Stefan Beller	4d7bc52b17	submodule update: allow '.' for branch value Gerrit has a "superproject subscription" feature[1], that triggers a commit in a superproject that is subscribed to its submodules. Conceptually this Gerrit feature can be done on the client side with Git via (except for raciness, error handling etc): while [ true ]; do git -C <superproject> submodule update --remote --force git -C <superproject> commit -a -m "Update submodules" git -C <superproject> push done for each branch in the superproject. To ease the configuration in Gerrit a special value of "." has been introduced for the submodule.<name>.branch to mean the same branch as the superproject[2], such that you can create a new branch on both superproject and the submodule and this feature continues to work on that new branch. Now we find projects in the wild with such a .gitmodules file. The .gitmodules used in these Gerrit projects do not conform to Gits understanding of how .gitmodules should look like. This teaches Git to deal gracefully with this syntax as well. The redefinition of "." does no harm to existing projects unaware of this change, as "." is an invalid branch name in Git, so we do not expect such projects to exist. Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-08-03 16:13:22 -07:00
Stefan Beller	92bbe7ccf1	submodule--helper: add remote-branch helper In a later patch we want to enhance the logic for the branch selection. Rewrite the current logic to be in C, so we can directly use C when we enhance the logic. Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-08-03 16:11:35 -07:00
Junio C Hamano	a58a8e3f71	Merge branch 'jk/push-progress' "git push" and "git clone" learned to give better progress meters to the end user who is waiting on the terminal. * jk/push-progress: receive-pack: send keepalives during quiet periods receive-pack: turn on connectivity progress receive-pack: relay connectivity errors to sideband receive-pack: turn on index-pack resolving progress index-pack: add flag for showing delta-resolution progress clone: use a real progress meter for connectivity check check_connected: add progress flag check_connected: relay errors to alternate descriptor check_everything_connected: use a struct with named options check_everything_connected: convert to argv_array rev-list: add optional progress reporting check_everything_connected: always pass --quiet to rev-list	2016-08-03 15:10:28 -07:00
Junio C Hamano	d083d420b7	Merge branch 'jk/parse-options-concat' Users of the parse_options_concat() API function need to allocate extra slots in advance and fill them with OPT_END() when they want to decide the set of supported options dynamically, which makes the code error-prone and hard to read. This has been corrected by tweaking the API to allocate and return a new copy of "struct option" array. * jk/parse-options-concat: parse_options: allocate a new array when concatenating	2016-08-03 15:10:25 -07:00
Junio C Hamano	cf27c7996e	Merge branch 'sb/push-options' "git push" learned to accept and pass extra options to the receiving end so that hooks can read and react to them. * sb/push-options: add a test for push options push: accept push options receive-pack: implement advertising and receiving push options push options: {pre,post}-receive hook learns about push options	2016-08-03 15:10:24 -07:00
Junio C Hamano	a35031240b	Merge branch 'os/no-verify-skips-commit-msg-too' "git commit --help" said "--no-verify" is only about skipping the pre-commit hook, and failed to say that it also skipped the commit-msg hook. * os/no-verify-skips-commit-msg-too: commit: describe that --no-verify skips the commit-msg hook in the help text	2016-08-03 15:10:22 -07:00
Eric Sunshine	aa59e14b23	blame: drop strdup of string literal This strdup was added as part of `58dbfa2` (blame: accept multiple -L ranges, 2013-08-06) to be consistent with parse_opt_string_list(), which appends to the same list. But as of `7a7a517` (parse_opt_string_list: stop allocating new strings, 2016-06-13), we should stop using strdup (to match parse_opt_string_list, and for all the reasons described in that commit; namely that it does nothing useful and causes us to leak the memory). Signed-off-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-08-03 08:52:46 -07:00
Jeff King	4d9c7e6f45	am: reset cached ident date for each patch When we compute the date to go in author/committer lines of commits, or tagger lines of tags, we get the current date once and then cache it for the rest of the program. This is a good thing in some cases, like "git commit", because it means we do not racily assign different times to the author/committer fields of a single commit object. But as more programs start to make many commits in a single process (e.g., the recently builtin "git am"), it means that you'll get long strings of commits with identical committer timestamps (whereas before, we invoked "git commit" many times and got true timestamps). This patch addresses it by letting callers reset the cached time, which means they'll get a fresh time on their next call to git_committer_info() or git_author_info(). The first caller to do so is "git am", which resets the time for each patch it applies. It would be nice if we could just do this automatically before filling in the ident fields of commit and tag objects. Unfortunately, it's hard to know where a particular logical operation begins and ends. For instance, if commit_tree_extended() were to call reset_ident_date() before getting the committer/author ident, that doesn't quite work; sometimes the author info is passed in to us as a parameter, and it may or may not have come from a previous call to ident_default_date(). So in those cases, we lose the property that the committer and the author timestamp always match. You could similarly put a date-reset at the end of commit_tree_extended(). That actually works in the current code base, but it's fragile. It makes the assumption that after commit_tree_extended() finishes, the caller has no other operations that would logically want to fall into the same timestamp. So instead we provide the tool to easily do the reset, and let the high-level callers use it to annotate their own logical operations. There's no automated test, because it would be inherently racy (it depends on whether the program takes multiple seconds to run). But you can see the effect with something like: # make a fake 100-patch series top=$(git rev-parse HEAD) bottom=$(git rev-list --first-parent -100 HEAD \| tail -n 1) git log --format=email --reverse --first-parent \ --binary -m -p $bottom..$top >patch # now apply it; this presumably takes multiple seconds git checkout --detach $bottom git am <patch # now count the number of distinct committer times; # prior to this patch, there would only be one, but # now we'd typically see several. git log --format=%ct $bottom.. \| sort -u Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Helped-by: Paul Tan <pyokagan@gmail.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-08-01 14:49:41 -07:00
Stefan Beller	2de26ae1dc	submodule--helper: fix usage string for relative-path Internally we call the underscore version of relative_path, but externally we present an API with no underscores. Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-08-01 14:41:53 -07:00
René Scharfe	02962d3684	use strbuf_addstr() for adding constant strings to a strbuf Replace uses of strbuf_addf() for adding strings with more lightweight strbuf_addstr() calls. In http-push.c it becomes easier to see what's going on without having to verfiy that the definition of PROPFIND_ALL_REQUEST doesn't contain any format specifiers. Signed-off-by: Rene Scharfe <l.s.r@web.de> Reviewed-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-08-01 13:42:10 -07:00
Josh Triplett	6bc6b6c0dc	format-patch: format.from gives the default for --from This helps users who would prefer format-patch to default to --from, and makes it easier to change the default in the future. Signed-off-by: Josh Triplett <josh@joshtriplett.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-08-01 13:13:02 -07:00
Johannes Schindelin	548009c0d5	merge_trees(): ensure that the callers release output buffer The recursive merge machinery accumulates its output in an output buffer, to be flushed at the end of merge_recursive(). At this point, we forgot to release the output buffer. When calling merge_trees() (i.e. the non-recursive part of the recursive merge) directly, the output buffer is never flushed because the caller may be merge_recursive() which wants to flush the output itself. For the same reason, merge_trees() cannot release the output buffer: it may still be needed. Forgetting to release the output buffer did not matter much when running git-checkout, or git-merge-recursive, because we exited after the operation anyway. Ever since cherry-pick learned to pick a commit range, however, this memory leak had the potential of becoming a problem. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-08-01 11:45:30 -07:00
Jeff King	56dfeb6263	pack-objects: compute local/ignore_pack_keep early In want_object_in_pack(), we can exit early from our loop if neither "local" nor "ignore_pack_keep" are set. If they are, however, we must examine each pack to see if it has the object and is non-local or has a ".keep". It's quite common for there to be no non-local or .keep packs at all, in which case we know ahead of time that looking further will be pointless. We can pre-compute this by simply iterating over the list of packs ahead of time, and dropping the flags if there are no packs that could match. Another similar strategy would be to modify the loop in want_object_in_pack() to notice that we have already found the object once, and that we are looping only to check for "local" and "keep" attributes. If a pack has neither of those, we can skip the call to find_pack_entry_one(), which is the expensive part of the loop. This has two advantages: - it isn't all-or-nothing; we still get some improvement when there's a small number of kept or non-local packs, and a large number of non-kept local packs - it eliminates any possible race where we add new non-local or kept packs after our initial scan. In practice, I don't think this race matters; we already cache the packed_git information, so somebody who adds a new pack or .keep file after we've started will not be noticed at all, unless we happen to need to call reprepare_packed_git() because a lookup fails. In other words, we're already racy, and the race is not a big deal (losing the race means we might include an object in the pack that would not otherwise be, which is an acceptable outcome). However, it also has a disadvantage: we still loop over the rest of the packs for each object to check their flags. This is much less expensive than doing the object lookup, but still not free. So if we wanted to implement that strategy to cover the non-all-or-nothing cases, we could do so in addition to this one (so you get the most speedup in the all-or-nothing case, and the best we can do in the other cases). But given that the all-or-nothing case is likely the most common, it is probably not worth the trouble, and we can revisit this later if evidence points otherwise. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-07-29 11:05:08 -07:00
Jeff King	cd37996795	pack-objects: break out of want_object loop early When pack-objects collects the list of objects to pack (either from stdin, or via its internal rev-list), it filters each one through want_object_in_pack(). This function loops through each existing packfile, looking for the object. When we find it, we mark the pack/offset combo for later use. However, we can't just return "yes, we want it" at that point. If --honor-pack-keep is in effect, we must keep looking to find it in _all_ packs, to make sure none of them has a .keep. Likewise, if --local is in effect, we must make sure it is not present in any non-local pack. As a result, the sum effort of these calls is effectively O(nr_objects * nr_packs). In an ordinary repository, we have only a handful of packs, and this doesn't make a big difference. But in pathological cases, it can slow the counting phase to a crawl. This patch notices the case that we have neither "--local" nor "--honor-pack-keep" in effect and breaks out of the loop early, after finding the first instance. Note that our worst case is still "objects * packs" (i.e., we might find each object in the last pack we look in), but in practice we will often break out early. On an "average" repo, my git.git with 8 packs, this shows a modest 2% (a few dozen milliseconds) improvement in the counting-objects phase of "git pack-objects --all <foo" (hackily instrumented by sticking exit(0) right after list_objects). But in a much more pathological case, it makes a bigger difference. I ran the same command on a real-world example with ~9 million objects across 1300 packs. The counting time dropped from 413s to 45s, an improvement of about 89%. Note that this patch won't do anything by itself for a normal "git gc", as it uses both --honor-pack-keep and --local. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-07-29 11:05:07 -07:00
Junio C Hamano	c81d283675	Merge branch 'dk/blame-move-no-reason-for-1-line-context' into maint "git blame -M" missed a single line that was moved within the file. * dk/blame-move-no-reason-for-1-line-context: blame: require 0 context lines while finding moved lines with -M	2016-07-28 11:26:01 -07:00
Junio C Hamano	174f9e622f	Merge branch 'js/am-call-theirs-theirs-in-fallback-3way' into maint One part of "git am" had an oddball helper function that called stuff from outside "his" as opposed to calling what we have "ours", which was not gender-neutral and also inconsistent with the rest of the system where outside stuff is usuall called "theirs" in contrast to "ours". * js/am-call-theirs-theirs-in-fallback-3way: am: counteract gender bias	2016-07-28 11:25:59 -07:00
Junio C Hamano	87be95b6f9	Merge branch 'ew/gc-auto-pack-limit-fix' into maint "gc.autoPackLimit" when set to 1 should not trigger a repacking when there is only one pack, but the code counted poorly and did so. * ew/gc-auto-pack-limit-fix: gc: fix off-by-one error with gc.autoPackLimit	2016-07-28 11:25:56 -07:00
Junio C Hamano	c12c71fabb	Merge branch 'nd/ita-cleanup' into maint Git does not know what the contents in the index should be for a path added with "git add -N" yet, so "git grep --cached" should not show hits (or show lack of hits, with -L) in such a path, but that logic does not apply to "git grep", i.e. searching in the working tree files. But we did so by mistake, which has been corrected. * nd/ita-cleanup: grep: fix grepping for "intent to add" files t7810-grep.sh: fix a whitespace inconsistency t7810-grep.sh: fix duplicated test name	2016-07-28 11:25:51 -07:00
Junio C Hamano	4966b58f3e	Merge branch 'js/find-commit-subject-ignore-leading-blanks' into maint A helper function that takes the contents of a commit object and finds its subject line did not ignore leading blank lines, as is commonly done by other codepaths. Make it ignore leading blank lines to match. * js/find-commit-subject-ignore-leading-blanks: reset --hard: skip blank lines when reporting the commit subject sequencer: use skip_blank_lines() to find the commit subject commit -C: skip blank lines at the beginning of the message commit.c: make find_commit_subject() more robust pretty: make the skip_blank_lines() function public	2016-07-28 11:25:50 -07:00
Junio C Hamano	ad2d777604	Merge branch 'nd/pack-ofs-4gb-limit' "git pack-objects" and "git index-pack" mostly operate with off_t when talking about the offset of objects in a packfile, but there were a handful of places that used "unsigned long" to hold that value, leading to an unintended truncation. * nd/pack-ofs-4gb-limit: fsck: use streaming interface for large blobs in pack pack-objects: do not truncate result in-pack object size on 32-bit systems index-pack: correct "offset" type in unpack_entry_data() index-pack: report correct bad object offsets even if they are large index-pack: correct "len" type in unpack_data() sha1_file.c: use type off_t* for object_info->disk_sizep pack-objects: pass length to check_pack_crc() without truncation	2016-07-28 10:34:42 -07:00
Junio C Hamano	2c608e0f7c	Merge branch 'nd/worktree-lock' "git worktree prune" protected worktrees that are marked as "locked" by creating a file in a known location. "git worktree" command learned a dedicated command pair to create and remove such a file, so that the users do not have to do this with editor. * nd/worktree-lock: worktree.c: find_worktree() search by path suffix worktree: add "unlock" command worktree: add "lock" command worktree.c: add is_worktree_locked() worktree.c: add is_main_worktree() worktree.c: add find_worktree()	2016-07-28 10:34:42 -07:00
Vasco Almeida	996ee6d27a	i18n: notes: mark comment for translation Mark comment displayed when editing a note for translation. Signed-off-by: Vasco Almeida <vascomalmeida@sapo.pt> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-07-28 09:09:18 -07:00
Jeff King	642833db78	date: add "unix" format We already have "--date=raw", which is a Unix epoch timestamp plus a contextual timezone (either the author's or the local). But one may not care about the timezone and just want the epoch timestamp by itself. It's not hard to parse the two apart, but if you are using a pretty-print format, you may want git to show the "finished" form that the user will see. We can accomodate this by adding a new date format, "unix", which is basically "raw" without the timezone. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-07-27 14:15:51 -07:00
Orgad Shaneh	def480fe99	commit: describe that --no-verify skips the commit-msg hook in the help text This brings the short help in line with the documentation. Signed-off-by: Orgad Shaneh <orgads@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-07-26 13:44:55 -07:00
Johannes Schindelin	3f338f43b0	am -3: use merge_recursive() directly again Last October, we had to change this code to run `git merge-recursive` in a child process: git-am wants to print some helpful advice when the merge failed, but the code in question was not prepared to return, it die()d instead. We are finally at a point when the code is prepared to return errors, and can avoid the child process again. This reverts commit `c63d4b2` (am -3: do not let failed merge from completing the error codepath, 2015-10-09), with the necessary changes to adjust for the fact that Git's source code changed in the meantime (such as: using OIDs instead of hashes in the recursive merge, and a removed gender bias). Note: the code now calls merge_recursive_generic() again. Unlike merge_trees() and merge_recursive(), this function returns 0 upon success, as most of Git's functions. Therefore, the error value -1 naturally is handled correctly, and we do not have to take care of it specifically. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-07-26 11:13:44 -07:00
Johannes Schindelin	f241ff0d0a	prepare the builtins for a libified merge_recursive() Previously, callers of merge_trees() or merge_recursive() expected that code to die() with an error message. This used to be okay because we called those commands from scripts, and had a chance to print out a message in case the command failed fatally (read: with exit code 128). As scripting incurs its own set of problems (portability, speed, idiosyncrasies of different shells, limited data structures leading to inefficient code), we are converting more and more of these scripts into builtins, using library functions directly. We already tried to use merge_recursive() directly in the builtin git-am, for example. Unfortunately, we had to roll it back temporarily because some of the code in merge-recursive.c still deemed it okay to call die(), when the builtin am code really wanted to print out a useful advice after the merge failed fatally. In the next commits, we want to fix that. The code touched by this commit expected merge_trees() to die() with some useful message when there is an error condition, but merge_trees() is going to be improved by converting all die() calls to return error() instead (i.e. return value -1 after printing out the message as before), so that the caller can react more flexibly. This is a step to prepare for the version of merge_trees() that no longer dies, even if we just imitate the previous behavior by calling exit(128): this is what callers of e.g. `git merge` have come to expect. Note that the callers of the sequencer (revert and cherry-pick) already fail fast even for the return value -1; The only difference is that they now get a chance to say "<command> failed". A caller of merge_trees() might want handle error messages themselves (or even suppress them). As this patch is already complex enough, we leave that change for a later patch. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-07-26 11:13:44 -07:00
Johannes Schindelin	ef1177d18e	die("bug"): report bugs consistently The vast majority of error messages in Git's source code which report a bug use the convention to prefix the message with "BUG:". As part of cleaning up merge-recursive to stop die()ing except in case of detected bugs, let's just make the remainder of the bug reports consistent with the de facto rule. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-07-26 11:13:44 -07:00
Junio C Hamano	37e9c7f5e1	Merge branch 'mh/blame-worktree' "git blame file" allowed the lineage of lines in the uncommitted, unadded contents of "file" to be inspected, but it refused when "file" did not appear in the current commit. When "file" was created by renaming an existing file (but the change has not been committed), this restriction was unnecessarily tight. * mh/blame-worktree: t/t8003-blame-corner-cases.sh: Use here documents blame: allow to blame paths freshly added to the index	2016-07-25 14:13:45 -07:00
Junio C Hamano	9db3979784	Merge branch 'js/fsck-name-object' When "git fsck" reports a broken link (e.g. a tree object contains a blob that does not exist), both containing object and the object that is referred to were reported with their 40-hex object names. The command learned the "--name-objects" option to show the path to the containing object from existing refs (e.g. "HEAD~24^2:file.txt"). * js/fsck-name-object: fsck: optionally show more helpful info for broken links fsck: give the error function a chance to see the fsck_options fsck_walk(): optionally name objects on the go fsck: refactor how to describe objects	2016-07-25 14:13:44 -07:00
Junio C Hamano	03f25e85d9	Merge branch 'rs/rm-strbuf-optim' The use of strbuf in "git rm" to build filename to remove was a bit suboptimal, which has been fixed. * rs/rm-strbuf-optim: rm: reuse strbuf for all remove_dir_recursively() calls	2016-07-25 14:13:36 -07:00
Junio C Hamano	87492cb24d	Merge branch 'mh/ref-iterators' The API to iterate over all the refs (i.e. for_each_ref(), etc.) has been revamped. * mh/ref-iterators: for_each_reflog(): reimplement using iterators dir_iterator: new API for iterating over a directory tree for_each_reflog(): don't abort for bad references do_for_each_ref(): reimplement using reference iteration refs: introduce an iterator interface ref_resolves_to_object(): new function entry_resolves_to_object(): rename function from ref_resolves_to_object() get_ref_cache(): only create an instance if there is a submodule remote rm: handle symbolic refs correctly delete_refs(): add a flags argument refs: use name "prefix" consistently do_for_each_ref(): move docstring to the header file refs: remove unnecessary "extern" keywords	2016-07-25 14:13:33 -07:00
Junio C Hamano	6b34ce90a7	Merge branch 'mh/split-under-lock' Further preparatory work on the refs API before the pluggable backend series can land. * mh/split-under-lock: (33 commits) lock_ref_sha1_basic(): only handle REF_NODEREF mode commit_ref_update(): remove the flags parameter lock_ref_for_update(): don't resolve symrefs lock_ref_for_update(): don't re-read non-symbolic references refs: resolve symbolic refs first ref_transaction_update(): check refname_is_safe() at a minimum unlock_ref(): move definition higher in the file lock_ref_for_update(): new function add_update(): initialize the whole ref_update verify_refname_available(): adjust constness in declaration refs: don't dereference on rename refs: allow log-only updates delete_branches(): use resolve_refdup() ref_transaction_commit(): correctly report close_ref() failure ref_transaction_create(): disallow recursive pruning refs: make error messages more consistent lock_ref_sha1_basic(): remove unneeded local variable read_raw_ref(): move docstring to header file read_raw_ref(): improve docstring read_raw_ref(): rename symref argument to referent ...	2016-07-25 14:13:32 -07:00
Johannes Sixt	eb09121b74	submodule-helper: fix indexing in clone retry error reporting path 'git submodule--helper update-clone' has logic to retry failed clones a second time. For this purpose, there is a list of submodules to clone, and a second list that is filled with the submodules to retry. Within these lists, the submodules are identified by an index as if both lists were just appended. This works nicely except when the second clone attempt fails as well. To report an error, the identifying index must be adjusted by an offset so that it can be used as an index into the second list. However, the calculation uses the logical total length of the lists so that the result always points one past the end of the second list. Pick the correct index. Signed-off-by: Johannes Sixt <j6t@kdbg.org> Acked-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-07-22 13:43:53 -07:00
Jeff King	83558686ce	receive-pack: send keepalives during quiet periods After a client has sent us the complete pack, we may spend some time processing the data and running hooks. If the client asked us to be quiet, receive-pack won't send any progress data during the index-pack or connectivity-check steps. And hooks may or may not produce their own progress output. In these cases, the network connection is totally silent from both ends. Git itself doesn't care about this (it will wait forever), but other parts of the system (e.g., firewalls, load-balancers, etc) might hang up the connection. So we'd like to send some sort of keepalive to let the network and the client side know that we're still alive and processing. We can use the same trick we did in `05e9515` (upload-pack: send keepalive packets during pack computation, 2013-09-08). Namely, we will send an empty sideband data packet every `N` seconds that we do not relay any stderr data over the sideband channel. As with `05e9515`, this means that we won't bother sending keepalives when there's actual progress data, but will kick in when it has been disabled (or if there is a lull in the progress data). The concept is simple, but the details are subtle enough that they need discussing here. Before the client sends us the pack, we don't want to do any keepalives. We'll have sent our ref advertisement, and we're waiting for them to send us the pack (and tell us that they support sidebands at all). While we're receiving the pack from the client (or waiting for it to start), there's no need for keepalives; it's up to them to keep the connection active by sending data. Moreover, it would be wrong for us to do so. When we are the server in the smart-http protocol, we must treat our connection as half-duplex. So any keepalives we send while receiving the pack would potentially be buffered by the webserver. Not only does this make them useless (since they would not be delivered in a timely manner), but it could actually cause a deadlock if we fill up the buffer with keepalives. (It wouldn't be wrong to send keepalives in this phase for a full-duplex connection like ssh; it's simply pointless, as it is the client's responsibility to speak). As soon as we've gotten all of the pack data, then the client is waiting for us to speak, and we should start keepalives immediately. From here until the end of the connection, we send one any time we are not otherwise sending data. But there's a catch. Receive-pack doesn't know the moment we've gotten all the data. It passes the descriptor to index-pack, who reads all of the data, and then starts resolving the deltas. We have to communicate that back. To make this work, we instruct the sideband muxer to enable keepalives in three phases: 1. In the beginning, not at all. 2. While reading from index-pack, wait for a signal indicating end-of-input, and then start them. 3. Afterwards, always. The signal from index-pack in phase 2 has to come over the stderr channel which the muxer is reading. We can't use an extra pipe because the portable run-command interface only gives us stderr and stdout. Stdout is already used to pass the .keep filename back to receive-pack. We could also send a signal there, but then we would find out about it in the main thread. And the keepalive needs to be done by the async muxer thread (since it's the one writing sideband data back to the client). And we can't reliably signal the async thread from the main thread, because the async code sometimes uses threads and sometimes uses forked processes. Therefore the signal must come over the stderr channel, where it may be interspersed with other random human-readable messages from index-pack. This patch makes the signal a single NUL byte. This is easy to parse, should not appear in any normal stderr output, and we don't have to worry about any timing issues (like seeing half the signal bytes in one read(), and half in a subsequent one). This is a bit ugly, but it's simple to code and should work reliably. Another option would be to stop using an async thread for muxing entirely, and just poll() both stderr and stdout of index-pack from the main thread. This would work for index-pack (because we aren't doing anything useful in the main thread while it runs anyway). But it would make the connectivity check and the hook muxers much more complicated, as they need to simultaneously feed the sub-programs while reading their stderr. The index-pack phase is the only one that needs this signaling, so it could simply behave differently than the other two. That would mean having two separate implementations of copy_to_sideband (and the keepalive code), though. And it still doesn't get rid of the signaling; it just means we can write a nicer message like "END_OF_INPUT" or something on stdout, since we don't have to worry about separating it from the stderr cruft. One final note: this signaling trick is only done with index-pack, not with unpack-objects. There's no point in doing it for the latter, because by definition it only kicks in for a small number of objects, where keepalives are not as useful (and this conveniently lets us avoid duplicating the implementation). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-07-20 12:11:11 -07:00
Jeff King	6b4cd2f827	receive-pack: turn on connectivity progress When we receive a large push, the server side of the connection may spend a lot of time (30s or more for a full push of linux.git) walking the object graph without producing any output. Let's give the user some indication that we're actually working. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-07-20 12:11:11 -07:00
Jeff King	d415092ac4	receive-pack: relay connectivity errors to sideband If the connectivity check encounters a problem when receiving a push, the error output goes to receive-pack's stderr, whose destination depends on the protocol used (ssh tends to send it to the user, though without a "remote" prefix; http will generally eat it in the server's error log). The information should consistently go back to the user, as there is a reasonable chance their client is buggy and generating a bad pack. We can do so by muxing it over the sideband as we do with other sub-process stderr. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-07-20 12:11:10 -07:00
Jeff King	d06303bb9a	receive-pack: turn on index-pack resolving progress When we receive a large push, the server side may have to spend a lot of CPU processing the incoming packfile. During the "receiving" phase, we are typically network bound, and the client is writing its own progress to the user. But during the delta resolution phase, we may spend minutes (e.g., for a full push of linux.git) without making any indication to the user that the connection has not hung. Let's ask index-pack to produce progress output for this phase (unless the client asked us to be quiet, of course). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-07-20 12:11:10 -07:00
Jeff King	e376f17fd1	index-pack: add flag for showing delta-resolution progress The index-pack command has two progress meters: one for "receiving objects", and one for "resolving deltas". You get neither by default, or both with "-v". But for a push through receive-pack, we would want only the "resolving deltas" phase, _not_ the "receiving objects" progress. There are two reasons for this. One is simply that existing clients are already printing "writing objects" progress at the same time. Arguably "receiving" from the far end is more useful, because it tells you what has actually gotten there, as opposed to what might be stuck in a buffer somewhere between the client and server. But that would require a protocol extension to tell clients not to print their progress. Possible, but complexity for little gain. The second reason is much more important. In a full-duplex connection like git-over-ssh, we can print progress while the pack is incoming, and it will immediately get to the client. But for a half-duplex connection like git-over-http, we should not say anything until we have received the full request. Anything we write is subject to being stuck in a buffer by the webserver. Worse, we can end up in a deadlock if that buffer fills up. So our best bet is to avoid writing anything that isn't a small fixed size until we've received the full pack. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-07-20 12:11:10 -07:00
Jeff King	38e590ea12	clone: use a real progress meter for connectivity check Because the initial connectivity check for a cloned repository can be slow, `0781aa4` (clone: let the user know when check_everything_connected is run, 2013-05-03) added a "fake" progress meter; we simply say "Checking connectivity" when it starts, and "done" at the end, with nothing between. Since check_connected() now knows how to do a real progress meter, we can drop our fake one and use that one instead. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-07-20 12:11:09 -07:00
Jeff King	7043c7071c	check_everything_connected: use a struct with named options The number of variants of check_everything_connected has grown over the years, so that the "real" function takes several possibly-zero, possibly-NULL arguments. We hid the complexity behind some wrapper functions, but this doesn't scale well when we want to add new options. If we add more wrapper variants to handle the new options, then we can get a combinatorial explosion when those options might be used together (right now nobody wants to use both "shallow" and "transport" together, so we get by with just a few wrappers). If instead we add new parameters to each function, each of which can have a default value, then callers who want the defaults end up with confusing invocations like: check_everything_connected(fn, 0, data, -1, 0, NULL); where it is unclear which parameter is which (and every caller needs updated when we add new options). Instead, let's add a struct to hold all of the optional parameters. This is a little more verbose for the callers (who have to declare the struct and fill it in), but it makes their code much easier to follow, because every option is named as it is set (and unused options do not have to be mentioned at all). Note that we could also stick the iteration function and its callback data into the option struct, too. But since those are required for each call, by avoiding doing so, we can let very simple callers just pass "NULL" for the options and not worry about the struct at all. While we're touching each site, let's also rename the function to check_connected(). The existing name was quite long, and not all of the wrappers even used the full name. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-07-20 12:10:53 -07:00
Jeff King	434ea3cdad	rev-list: add optional progress reporting It's easy to ask rev-list to do a traversal that may takes many seconds (e.g., by calling "--objects --all"). In theory you can monitor its progress by the output you get to stdout, but this isn't always easy. Some operations, like "--count", don't make any output until the end. And some callers, like check_everything_connected(), are using it just for the error-checking of the traversal, and throw away stdout entirely. This patch adds a "--progress" option which can be used to give some eye-candy for a user waiting for a long traversal. This is just a rev-list option and not a regular traversal option, because it needs cooperation from the callbacks in builtin/rev-list.c to do the actual count. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-07-20 12:10:44 -07:00
Junio C Hamano	3d55eea805	Merge branch 'js/am-call-theirs-theirs-in-fallback-3way' One part of "git am" had an oddball helper function that called stuff from outside "his" as opposed to calling what we have "ours", which was not gender-neutral and also inconsistent with the rest of the system where outside stuff is usuall called "theirs" in contrast to "ours". * js/am-call-theirs-theirs-in-fallback-3way: am: counteract gender bias	2016-07-19 13:22:23 -07:00
Junio C Hamano	2b6456b808	Merge branch 'jk/write-file' General code clean-up around a helper function to write a single-liner to a file. * jk/write-file: branch: use write_file_buf instead of write_file use write_file_buf where applicable write_file: add format attribute write_file: add pointer+len variant write_file: use xopen write_file: drop "gently" form branch: use non-gentle write_file for branch description am: ignore return value of write_file() config: fix bogus fd check when setting up default config	2016-07-19 13:22:23 -07:00
Junio C Hamano	96e08010ee	Merge branch 'jk/printf-format' Code clean-up to avoid using a variable string that compilers may feel untrustable as printf-style format given to write_file() helper function. * jk/printf-format: commit.c: remove print_commit_list() avoid using sha1_to_hex output as printf format walker: let walker_say take arbitrary formats	2016-07-19 13:22:22 -07:00
Junio C Hamano	566fdaf611	Merge branch 'nd/fetch-ref-summary' Improve the look of the way "git fetch" reports what happened to each ref that was fetched. * nd/fetch-ref-summary: fetch: reduce duplicate in ref update status lines with placeholder fetch: align all "remote -> local" output fetch: change flag code for displaying tag update and deleted ref fetch: refactor ref update status formatting code git-fetch.txt: document fetch output	2016-07-19 13:22:21 -07:00
Junio C Hamano	a63d31b4d3	Merge branch 'bc/cocci' Conversion from unsigned char sha1[20] to struct object_id continues. * bc/cocci: diff: convert prep_temp_blob() to struct object_id merge-recursive: convert merge_recursive_generic() to object_id merge-recursive: convert leaf functions to use struct object_id merge-recursive: convert struct merge_file_info to object_id merge-recursive: convert struct stage_data to use object_id diff: rename struct diff_filespec's sha1_valid member diff: convert struct diff_filespec to struct object_id coccinelle: apply object_id Coccinelle transformations coccinelle: convert hashcpy() with null_sha1 to hashclr() contrib/coccinelle: add basic Coccinelle transforms hex: add oid_to_hex_r()	2016-07-19 13:22:16 -07:00
Junio C Hamano	63641fb071	Merge branch 'js/log-to-diffopt-file' The commands in the "log/diff" family have had an FILE* pointer in the data structure they pass around for a long time, but some codepaths used to always write to the standard output. As a preparatory step to make "git format-patch" available to the internal callers, these codepaths have been updated to consistently write into that FILE* instead. * js/log-to-diffopt-file: mingw: fix the shortlog --output=<file> test diff: do not color output when --color=auto and --output=<file> is given t4211: ensure that log respects --output=<file> shortlog: respect the --output=<file> setting format-patch: use stdout directly format-patch: avoid freopen() format-patch: explicitly switch off color when writing to files shortlog: support outputting to streams other than stdout graph: respect the diffopt.file setting line-log: respect diffopt's configured output file stream log-tree: respect diffopt's configured output file stream log: prepare log/log-tree to reuse the diffopt.close_file attribute	2016-07-19 13:22:15 -07:00
Junio C Hamano	7418a6b1a0	Merge branch 'dk/blame-move-no-reason-for-1-line-context' "git blame -M" missed a single line that was moved within the file. * dk/blame-move-no-reason-for-1-line-context: blame: require 0 context lines while finding moved lines with -M	2016-07-19 13:22:13 -07:00
Johannes Schindelin	90cf590f53	fsck: optionally show more helpful info for broken links When reporting broken links between commits/trees/blobs, it would be quite helpful at times if the user would be told how the object is supposed to be reachable. With the new --name-objects option, git-fsck will try to do exactly that: name the objects in a way that shows how they are reachable. For example, when some reflog got corrupted and a blob is missing that should not be, the user might want to remove the corresponding reflog entry. This option helps them find that entry: `git fsck` will now report something like this: broken link from tree b5eb6ff... (refs/stash@{<date>}~37:) to blob ec5cf80... Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-07-18 15:15:59 -07:00
Mike Hommey	3b75ee9327	blame: allow to blame paths freshly added to the index When blaming files, changes in the work tree are taken into account and displayed as being "Not Committed Yet". However, when blaming a file that is not known to the current HEAD, git blame fails with `no such path 'foo' in HEAD`, even when the file was git add'ed. Allowing such a blame is useful when the new file added to the index (not yet committed) was created by renaming an existing file. It also is useful when the new file was created from pieces already in HEAD, moved or copied from other files and blaming with copy detection (i.e. "-C"). Signed-off-by: Mike Hommey <mh@glandium.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-07-18 14:33:38 -07:00
Johannes Schindelin	1cd772cc41	fsck: give the error function a chance to see the fsck_options We will need this in the next commit, where fsck will be taught to optionally name the objects when reporting issues about them. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-07-18 11:35:00 -07:00
Johannes Schindelin	993a21b0a0	fsck: refactor how to describe objects In many places, we refer to objects via their SHA-1s. Let's abstract that into a function. For the moment, it does nothing else than what we did previously: print out the 40-digit hex string. But that will change over the course of the next patches. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-07-18 11:35:00 -07:00
Stefan Beller	f6a4e61fbb	push: accept push options This implements everything that is required on the client side to make use of push options from the porcelain push command. Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-07-14 15:50:41 -07:00
Stefan Beller	c714e45f87	receive-pack: implement advertising and receiving push options The pre/post receive hook may be interested in more information from the user. This information can be transmitted when both client and server support the "push-options" capability, which when used is a phase directly after update commands ended by a flush pkt. Similar to the atomic option, the server capability can be disabled via the `receive.advertisePushOptions` config variable. While documenting this, fix a nit in the `receive.advertiseAtomic` wording. Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-07-14 15:50:40 -07:00
Stefan Beller	77a9745d19	push options: {pre,post}-receive hook learns about push options The environment variable GIT_PUSH_OPTION_COUNT is set to the number of push options sent, and GIT_PUSH_OPTION_{0,1,..} is set to the transmitted option. The code is not executed as the push options are set to NULL, nor is the new capability advertised. There was some discussion back and forth how to present these push options to the user as there are some ways to do it: Keep all options in one environment variable ============================================ + easiest way to implement in Git - This would make things hard to parse correctly in the hook. Put the options in files instead, filenames are in GIT_PUSH_OPTION_FILES ====================================== + After a discussion about environment variables and shells, we may not want to put user data into an environment variable (see [1] for example). + We could transmit binaries, i.e. we're not bound to C strings as we are when using environment variables to the user. + Maybe easier to parse than constructing environment variable names GIT_PUSH_OPTION_{0,1,..} yourself - cleanup of the temporary files is hard to do reliably - we have race conditions with multiple clients pushing, hence we'd need to use mkstemp. That's not too bad, but still. Use environment variables, but restrict to key/value pairs ========================================================== (When the user pushes a push option `foo=bar`, we'd GIT_PUSH_OPTION_foo=bar) + very easy to parse for a simple model of push options - it's not sufficient for more elaborate models, e.g. it doesn't allow doubles (e.g. cc=reviewer@email) Present the options in different environment variables ====================================================== (This is implemented) * harder to parse as a user, but we have a sample hook for that. - doesn't allow binary files + allows the same option twice, i.e. is not restrictive about options, except for binary files. + doesn't clutter a remote directory with (possibly stale) temporary files As we first want to focus on getting simple strings to work reliably, we go with the last option for now. If we want to do transmission of binaries later, we can just attach a 'side-channel', e.g. "any push option that contains a '\0' is put into a file instead of the environment variable and we'd have new GIT_PUSH_OPTION_FILES, GIT_PUSH_OPTION_FILENAME_{0,1,..} environment variables". [1] 'Shellshock' https://lwn.net/Articles/614218/ Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-07-14 15:50:17 -07:00
Junio C Hamano	42bd66816b	Merge branch 'nd/ita-cleanup' Git does not know what the contents in the index should be for a path added with "git add -N" yet, so "git grep --cached" should not show hits (or show lack of hits, with -L) in such a path, but that logic does not apply to "git grep", i.e. searching in the working tree files. But we did so by mistake, which has been corrected. * nd/ita-cleanup: grep: fix grepping for "intent to add" files t7810-grep.sh: fix a whitespace inconsistency t7810-grep.sh: fix duplicated test name	2016-07-13 11:24:18 -07:00
Junio C Hamano	97865e83c7	Merge branch 'ew/gc-auto-pack-limit-fix' "gc.autoPackLimit" when set to 1 should not trigger a repacking when there is only one pack, but the code counted poorly and did so. * ew/gc-auto-pack-limit-fix: gc: fix off-by-one error with gc.autoPackLimit	2016-07-13 11:24:12 -07:00
Junio C Hamano	2703572b3a	Merge branch 'va/i18n-even-more' More markings of messages for i18n, with updates to various tests to pass GETTEXT_POISON tests. One patch from the original submission dropped due to conflicts with jk/upload-pack-hook, which is still in flux. * va/i18n-even-more: (38 commits) t5541: become resilient to GETTEXT_POISON i18n: branch: mark comment when editing branch description for translation i18n: unmark die messages for translation i18n: submodule: escape shell variables inside eval_gettext i18n: submodule: join strings marked for translation i18n: init-db: join message pieces i18n: remote: allow translations to reorder message i18n: remote: mark URL fallback text for translation i18n: standardise messages i18n: sequencer: add period to error message i18n: merge: change command option help to lowercase i18n: merge: mark messages for translation i18n: notes: mark options for translation i18n: notes: mark strings for translation i18n: transport-helper.c: change N_() call to _() i18n: bisect: mark strings for translation t5523: use test_i18ngrep for negation t4153: fix negated test_i18ngrep call t9003: become resilient to GETTEXT_POISON tests: unpack-trees: update to use test_i18n* functions ...	2016-07-13 11:24:10 -07:00
Nguyễn Thái Ngọc Duy	ec9d224903	fsck: use streaming interface for large blobs in pack For blobs, we want to make sure the on-disk data is not corrupted (i.e. can be inflated and produce the expected SHA-1). Blob content is opaque, there's nothing else inside to check for. For really large blobs, we may want to avoid unpacking the entire blob in memory, just to check whether it produces the same SHA-1. On 32-bit systems, we may not have enough virtual address space for such memory allocation. And even on 64-bit where it's not a problem, allocating a lot more memory could result in kicking other parts of systems to swap file, generating lots of I/O and slowing everything down. For this particular operation, not unpacking the blob and letting check_sha1_signature, which supports streaming interface, do the job is sufficient. check_sha1_signature() is not shown in the diff, unfortunately. But if will be called when "data_valid && !data" is false. We will call the callback function "fn" with NULL as "data". The only callback of this function is fsck_obj_buffer(), which does not touch "data" at all if it's a blob. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-07-13 09:15:29 -07:00
Nguyễn Thái Ngọc Duy	af92a645d3	pack-objects: do not truncate result in-pack object size on 32-bit systems A typical diff will not show what's going on and you need to see full functions. The core code is like this, at the end of of write_one() e->idx.offset = offset; size = write_object(f, e, offset); if (!size) { e->idx.offset = recursing; return WRITE_ONE_BREAK; } written_list[nr_written++] = &e->idx; /* make sure off_t is sufficiently large not to wrap / if (signed_add_overflows(offset, size)) die("pack too large for current definition of off_t"); *offset += size; Here we can see that the in-pack object size is returned by write_object (or indirectly by write_reuse_object). And it's used to calculate object offsets, which end up in the pack index file, generated at the end. If "size" overflows (on 32-bit sytems, unsigned long is 32-bit while off_t can be 64-bit), we got wrong offsets and produce incorrect .idx file, which may make it look like the .pack file is corrupted. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-07-13 09:15:17 -07:00
Nguyễn Thái Ngọc Duy	da49a7da3a	index-pack: correct "offset" type in unpack_entry_data() unpack_entry_data() receives an off_t value from unpack_raw_entry(), which could be larger than unsigned long on 32-bit systems with large file support. Correct the type so truncation does not happen. This only affects bad object reporting though. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-07-13 09:15:08 -07:00
Nguyễn Thái Ngọc Duy	fd3e67474c	index-pack: report correct bad object offsets even if they are large Use the right type for offsets in this case, off_t, which makes a difference on 32-bit systems with large file support, and change formatting code accordingly. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-07-13 09:14:47 -07:00
Nguyễn Thái Ngọc Duy	7171a0b0cf	index-pack: correct "len" type in unpack_data() On 32-bit systems with large file support, one entry could be larger than 4GB and overflow "len". Correct it so we can unpack a full entry. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-07-13 09:14:38 -07:00
Nguyễn Thái Ngọc Duy	166df26f28	sha1_file.c: use type off_t* for object_info->disk_sizep This field, filled by sha1_object_info() contains the on-disk size of an object, which could go over 4GB limit of unsigned long on 32-bit systems. Use off_t for it instead and update all callers. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-07-13 09:14:20 -07:00
René Scharfe	deb8e15a19	rm: reuse strbuf for all remove_dir_recursively() calls Don't throw the memory allocated for remove_dir_recursively() away after a single call, use it for the other entries as well instead. Signed-off-by: Rene Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-07-12 15:09:21 -07:00

1 2 3 4 5 ...

4985 Commits