git-commit-vandalism

Author	SHA1	Message	Date
Stefan Beller	92bbe7ccf1	submodule--helper: add remote-branch helper In a later patch we want to enhance the logic for the branch selection. Rewrite the current logic to be in C, so we can directly use C when we enhance the logic. Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-08-03 16:11:35 -07:00
Junio C Hamano	80460f513e	Ninth batch of topics for 2.10 Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-08-03 15:13:16 -07:00
Junio C Hamano	767da54bf8	Merge branch 'jk/diff-do-not-reuse-wtf-needs-cleaning' There is an optimization used in "git diff $treeA $treeB" to borrow an already checked-out copy in the working tree when it is known to be the same as the blob being compared, expecting that open/mmap of such a file is faster than reading it from the object store, which involves inflating and applying delta. This however kicked in even when the checked-out copy needs to go through the convert-to-git conversion (including the clean filter), which defeats the whole point of the optimization. The optimization has been disabled when the conversion is necessary. * jk/diff-do-not-reuse-wtf-needs-cleaning: diff: do not reuse worktree files that need "clean" conversion	2016-08-03 15:10:29 -07:00
Junio C Hamano	f4fa8a9b18	Merge branch 'rs/submodule-config-code-cleanup' Code cleanup. * rs/submodule-config-code-cleanup: submodule-config: fix test binary crashing when no arguments given submodule-config: combine early return code into one goto submodule-config: passing name reference for .gitmodule blobs submodule-config: use explicit empty string instead of strbuf in config_from()	2016-08-03 15:10:28 -07:00
Junio C Hamano	a58a8e3f71	Merge branch 'jk/push-progress' "git push" and "git clone" learned to give better progress meters to the end user who is waiting on the terminal. * jk/push-progress: receive-pack: send keepalives during quiet periods receive-pack: turn on connectivity progress receive-pack: relay connectivity errors to sideband receive-pack: turn on index-pack resolving progress index-pack: add flag for showing delta-resolution progress clone: use a real progress meter for connectivity check check_connected: add progress flag check_connected: relay errors to alternate descriptor check_everything_connected: use a struct with named options check_everything_connected: convert to argv_array rev-list: add optional progress reporting check_everything_connected: always pass --quiet to rev-list	2016-08-03 15:10:28 -07:00
Junio C Hamano	67b3a5d4c0	Merge branch 'jt/fetch-large-handshake-window-on-http' "git fetch" exchanges batched have/ack messages between the sender and the receiver, initially doubling every time and then falling back to enlarge the window size linearly. The "smart http" transport, being an half-duplex protocol, outgrows the preset limit too quickly and becomes inefficient when interacting with a large repository. The internal mechanism learned to grow the window size more aggressively when working with the "smart http" transport. * jt/fetch-large-handshake-window-on-http: fetch-pack: grow stateless RPC windows exponentially	2016-08-03 15:10:27 -07:00
Junio C Hamano	5569c01be8	Merge branch 'jk/git-jump' "git jump" script (in contrib/) has been updated a bit. * jk/git-jump: contrib/git-jump: fix typo in README contrib/git-jump: add whitespace-checking mode contrib/git-jump: fix greedy regex when matching hunks	2016-08-03 15:10:27 -07:00
Junio C Hamano	5a2f4d3eef	Merge branch 'mm/status-suggest-merge-abort' "git status" learned to suggest "merge --abort" during a conflicted merge, just like it already suggests "rebase --abort" during a conflicted rebase. * mm/status-suggest-merge-abort: status: suggest 'git merge --abort' when appropriate	2016-08-03 15:10:26 -07:00
Junio C Hamano	d083d420b7	Merge branch 'jk/parse-options-concat' Users of the parse_options_concat() API function need to allocate extra slots in advance and fill them with OPT_END() when they want to decide the set of supported options dynamically, which makes the code error-prone and hard to read. This has been corrected by tweaking the API to allocate and return a new copy of "struct option" array. * jk/parse-options-concat: parse_options: allocate a new array when concatenating	2016-08-03 15:10:25 -07:00
Junio C Hamano	cf27c7996e	Merge branch 'sb/push-options' "git push" learned to accept and pass extra options to the receiving end so that hooks can read and react to them. * sb/push-options: add a test for push options push: accept push options receive-pack: implement advertising and receiving push options push options: {pre,post}-receive hook learns about push options	2016-08-03 15:10:24 -07:00
Junio C Hamano	4067a45438	Merge branch 'ew/http-walker' Dumb http transport on the client side has been optimized. * ew/http-walker: list: avoid incompatibility with *BSD sys/queue.h http-walker: reduce O(n) ops with doubly-linked list http: avoid disconnecting on 404s for loose objects http-walker: remove unused parameter from fetch_object	2016-08-03 15:10:24 -07:00
Junio C Hamano	6395499a68	Merge branch 'pm/build-persistent-https-with-recent-go' The build procedure for "git persistent-https" helper (in contrib/) has been updated so that it can be built with more recent versions of Go. * pm/build-persistent-https-with-recent-go: contrib/persistent-https: use Git version for build label contrib/persistent-https: update ldflags syntax for Go 1.7+	2016-08-03 15:10:23 -07:00
Junio C Hamano	c0728edfb6	Merge branch 'da/subtree-2.9-regression' "git merge" in Git v2.9 was taught to forbid merging an unrelated lines of history by default, but that is exactly the kind of thing the "--rejoin" mode of "git subtree" (in contrib/) wants to do. "git subtree" has been taught to use the "--allow-unrelated-histories" option to override the default. * da/subtree-2.9-regression: subtree: fix "git subtree split --rejoin" t7900-subtree.sh: fix quoting and broken && chains	2016-08-03 15:10:22 -07:00
Junio C Hamano	a35031240b	Merge branch 'os/no-verify-skips-commit-msg-too' "git commit --help" said "--no-verify" is only about skipping the pre-commit hook, and failed to say that it also skipped the commit-msg hook. * os/no-verify-skips-commit-msg-too: commit: describe that --no-verify skips the commit-msg hook in the help text	2016-08-03 15:10:22 -07:00
Joey Hess	52db4b0467	clarify %f documentation It's natural to expect %f to be an actual file on disk; help avoid that mistake. Signed-off-by: Joey Hess <joeyh@joeyh.name> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-08-03 10:10:35 -07:00
Johannes Schindelin	04e0869876	import-tars: support hard links Previously, we simply treated hard links as if they were plain files with size 0, ignoring the link type "1" and hence the link target. What we should do instead, of course, is to use the link target to get at the import mark for the contents, even if we cannot recreate the hard link per se, as Git has no concept of hard links. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-08-03 09:46:11 -07:00
Stefan Beller	f6fb30a01d	gitmodules: document shallow recommendation Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-08-03 08:53:52 -07:00
Eric Sunshine	aa59e14b23	blame: drop strdup of string literal This strdup was added as part of `58dbfa2` (blame: accept multiple -L ranges, 2013-08-06) to be consistent with parse_opt_string_list(), which appends to the same list. But as of `7a7a517` (parse_opt_string_list: stop allocating new strings, 2016-06-13), we should stop using strdup (to match parse_opt_string_list, and for all the reasons described in that commit; namely that it does nothing useful and causes us to leak the memory). Signed-off-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-08-03 08:52:46 -07:00
Johannes Sixt	54956df9bc	t4130: work around Windows limitation On Windows, it is already pretty expensive to try to recreate the stat() data that Git assumes is cheap to obtain. To make things halfway decent in performance, we even have to skip emulating the inode and to determine the number of hard links. This is not a huge problem, usually, as either the size or the mtime or the ctime are tell-tale enough to say when a file has changed, and even if not, those changes are typically made after the index file was written, triggering a rehashing of the files' contents. The t4130-apply-criss-cross-rename test case, however, requires the inode to determine that files of equal size were swapped, as renaming files does not update their mtime. Every once in a while, t4130 fails on Windows because of this missing piece. Equal file sizes are not crucial for the test cases, however. Hence, generate files with different sizes so that there is some property that the swapped files can be discovered reliably even on Windows. Helped-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Johannes Sixt <j6t@kdbg.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-08-03 08:47:38 -07:00
Junio C Hamano	54ba5a1a16	hashmap: clarify that hashmap_entry can safely be discarded The API documentation said that the hashmap_entry structure to be embedded in the caller's structure is to be treated as opaque, which left the reader wondering if it can safely be discarded when it no longer is necessary. If the hashmap_entry structure had references to external resources such as allocated memory or an open file descriptor, merely free(3)ing the containing structure (when the caller's structure is on the heap) or letting it go out of scope (when it is on the stack) would end up leaking the external resource. Document that there is no need for hashmap_entry_clear() that corresponds to hashmap_entry_init() to give the API users a little bit of peace of mind. Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-08-02 14:34:17 -07:00
Jeff King	4d9c7e6f45	am: reset cached ident date for each patch When we compute the date to go in author/committer lines of commits, or tagger lines of tags, we get the current date once and then cache it for the rest of the program. This is a good thing in some cases, like "git commit", because it means we do not racily assign different times to the author/committer fields of a single commit object. But as more programs start to make many commits in a single process (e.g., the recently builtin "git am"), it means that you'll get long strings of commits with identical committer timestamps (whereas before, we invoked "git commit" many times and got true timestamps). This patch addresses it by letting callers reset the cached time, which means they'll get a fresh time on their next call to git_committer_info() or git_author_info(). The first caller to do so is "git am", which resets the time for each patch it applies. It would be nice if we could just do this automatically before filling in the ident fields of commit and tag objects. Unfortunately, it's hard to know where a particular logical operation begins and ends. For instance, if commit_tree_extended() were to call reset_ident_date() before getting the committer/author ident, that doesn't quite work; sometimes the author info is passed in to us as a parameter, and it may or may not have come from a previous call to ident_default_date(). So in those cases, we lose the property that the committer and the author timestamp always match. You could similarly put a date-reset at the end of commit_tree_extended(). That actually works in the current code base, but it's fragile. It makes the assumption that after commit_tree_extended() finishes, the caller has no other operations that would logically want to fall into the same timestamp. So instead we provide the tool to easily do the reset, and let the high-level callers use it to annotate their own logical operations. There's no automated test, because it would be inherently racy (it depends on whether the program takes multiple seconds to run). But you can see the effect with something like: # make a fake 100-patch series top=$(git rev-parse HEAD) bottom=$(git rev-list --first-parent -100 HEAD \| tail -n 1) git log --format=email --reverse --first-parent \ --binary -m -p $bottom..$top >patch # now apply it; this presumably takes multiple seconds git checkout --detach $bottom git am <patch # now count the number of distinct committer times; # prior to this patch, there would only be one, but # now we'd typically see several. git log --format=%ct $bottom.. \| sort -u Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Helped-by: Paul Tan <pyokagan@gmail.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-08-01 14:49:41 -07:00
Stefan Beller	b5944f3476	submodule-config: keep configured branch around The branch field will be used in a later patch by `submodule update`. Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-08-01 14:42:07 -07:00
Stefan Beller	2de26ae1dc	submodule--helper: fix usage string for relative-path Internally we call the underscore version of relative_path, but externally we present an API with no underscores. Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-08-01 14:41:53 -07:00
Stefan Beller	341238ebc4	submodule update: narrow scope of local variable Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-08-01 14:41:51 -07:00
Stefan Beller	6cbf454a2e	submodule update: respect depth in subsequent fetches When depth is given the user may have a reasonable expectation that any remote operation is using the given depth. Add a test to demonstrate we still get the desired sha1 even if the depth is too short to include the actual commit. Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-08-01 14:41:02 -07:00
Stefan Beller	d4470c5a46	t7406: future proof tests with hard coded depth The prior hard coded depth was chosen to be exactly the length from the recorded gitlink to the tip of the remote, so if you add more commits to the remote before, this test will not test its intention any more. Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-08-01 14:40:56 -07:00
Ingo Brückl	766cdc4147	t3700: add a test_mode_in_index helper function The case statement to check the file mode of a staged file appears a number of times. Simplify the test by utilizing a test_mode_in_index helper function. Signed-off-by: Ingo BrÃ¼ckl <ib@wupperonline.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-08-01 14:25:30 -07:00
Ingo Brückl	b38ab197c2	t3700: merge two tests into one Depending on the underlying platform a chmod may be a noop. Although it wouldn't harm the result of the '--chmod=-x' test, there is a more robust way to make sure the --chmod option works both ways. Merge the two separate tests for the --chmod option into one, checking both permissions on the same file. Signed-off-by: Ingo BrÃ¼ckl <ib@wupperonline.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-08-01 14:20:53 -07:00
Ingo Brückl	c0fa44d8f1	t3700: remove unwanted leftover files before running new tests When an earlier test that has prerequisite is skipped, files used by later tests may be left in the working tree in an unexpected state. For example, a test runs this sequence: echo foo >xfoo1 && chmod 755 xfoo1 to create an executable file xfoo1, expecting that xfoo1 does not exist before it runs in the test sequence. However, the absence of this file depends on "git reset --hard" done in an earlier test, that is skipped when SANITY prerequisite is not met, and worse yet, xfoo1 originally is created as a symbolic link, which means the chmod does not affect the modes of xfoo1 as this test expects. Fix this by starting the test with "rm -f xfoo1" to make sure the file is created from scratch, and do the same to other similar tests. Signed-off-by: Ingo BrÃ¼ckl <ib@wupperonline.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-08-01 14:20:51 -07:00
René Scharfe	50492f7b38	pass constants as first argument to st_mult() The result of st_mult() is the same no matter the order of its arguments. It invokes the macro unsigned_mult_overflows(), which divides the second parameter by the first one. Pass constants first to allow that division to be done already at compile time. Signed-off-by: Rene Scharfe <l.s.r@web.de> Reviewed-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-08-01 14:01:03 -07:00
René Scharfe	02962d3684	use strbuf_addstr() for adding constant strings to a strbuf Replace uses of strbuf_addf() for adding strings with more lightweight strbuf_addstr() calls. In http-push.c it becomes easier to see what's going on without having to verfiy that the definition of PROPFIND_ALL_REQUEST doesn't contain any format specifiers. Signed-off-by: Rene Scharfe <l.s.r@web.de> Reviewed-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-08-01 13:42:10 -07:00
Josh Triplett	6bc6b6c0dc	format-patch: format.from gives the default for --from This helps users who would prefer format-patch to default to --from, and makes it easier to change the default in the future. Signed-off-by: Josh Triplett <josh@joshtriplett.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-08-01 13:13:02 -07:00
Andreas Brauchli	77947bbe24	gitweb: escape link body in format_ref_marker Fix a case where an html link can be generated from unescaped input resulting in invalid strict xhtml or potentially injected code. An overview of a repo with a tag "1.0.0&0.0.1" would previously result in an unescaped ampersand in the link body. Signed-off-by: Andreas Brauchli <a.brauchli@elementarea.net> Acked-by: Jakub Narębski <jnareb@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-08-01 12:55:40 -07:00
Johannes Schindelin	6999bc7074	merge-recursive: flush output buffer even when erroring out Ever since `66a155b` (Enable output buffering in merge-recursive., 2007-01-14), we had a problem: When the merge failed in a fatal way, all regular output was swallowed because we called die() and did not get a chance to drain the output buffers. To fix this, several modifications were necessary: - we needed to stop die()ing, to give callers a chance to do something when an error occurred (in this case, flush the output buffers), - we needed to delay printing the error message so that the caller can print the buffered output before that, and - we needed to make sure that the output buffers are flushed even when the return value indicates an error. The first two changes were introduced through earlier commits in this patch series, and this commit addresses the third one. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-08-01 11:45:30 -07:00
Johannes Schindelin	548009c0d5	merge_trees(): ensure that the callers release output buffer The recursive merge machinery accumulates its output in an output buffer, to be flushed at the end of merge_recursive(). At this point, we forgot to release the output buffer. When calling merge_trees() (i.e. the non-recursive part of the recursive merge) directly, the output buffer is never flushed because the caller may be merge_recursive() which wants to flush the output itself. For the same reason, merge_trees() cannot release the output buffer: it may still be needed. Forgetting to release the output buffer did not matter much when running git-checkout, or git-merge-recursive, because we exited after the operation anyway. Ever since cherry-pick learned to pick a commit range, however, this memory leak had the potential of becoming a problem. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-08-01 11:45:30 -07:00
Johannes Schindelin	f1e2426b28	merge-recursive: offer an option to retain the output in 'obuf' Since `66a155b` (Enable output buffering in merge-recursive., 2007-01-14), we already accumulate the output in a buffer. The idea was to avoid interfering with the progress output that goes to stderr, which is unbuffered, when we write to stdout, which is buffered. We extend that buffering to allow the caller to handle the output (possibly suppressing it). This will help us when extending the sequencer to do rebase -i's brunt work: it does not want the picks to print anything by default but instead determine itself whether to print the output or not. Note that we also redirect the error messages into the output buffer when the caller asked not to flush the output buffer, for two reasons: 1) to retain the correct output order, and 2) to allow the caller to suppress all output. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-08-01 11:45:30 -07:00
Johannes Schindelin	dde75cb056	merge-recursive: write the commit title in one go In `66a155b` (Enable output buffering in merge-recursive., 2007-01-14), we changed the code such that it prints the output in one go, to avoid interfering with the progress output. Let's make sure that the same holds true when outputting the commit title: previously, we used several printf() statements to stdout and assumed that stdout's buffer is large enough to hold the entire commit title. Apart from making that speculation unnecessary, we change the code to add the message to the output buffer before flushing for another reason: the next commit will introduce a new level of output buffering, where the caller can request the output not to be flushed, but to be retained for further processing. This latter feature will be needed when teaching the sequencer to do rebase -i's brunt work: it wants to control the output of the cherry-picks (i.e. recursive merges). Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-08-01 11:45:28 -07:00
Johannes Schindelin	bc9204d4ef	merge-recursive: flush output buffer before printing error messages The data structure passed to the recursive merge machinery has a feature where the caller can ask for the output to be buffered into a strbuf, by setting the field 'buffer_output'. Previously, we died without flushing, losing accumulated output. With this patch, we show the output first, and only then print the error message. Currently, the only user of that buffering is merge_recursive() itself, to avoid the progress output to interfere. In the next patches, we will introduce a new buffer_output mode that forces merge_recursive() to retain the output buffer for further processing by the caller. If the caller asked for that, we will then also write the error messages into the output buffer. This is necessary to give the caller more control not only how to react in case of errors but also control how/if to display the error messages. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-08-01 11:45:27 -07:00
Kevin Willford	3e8e32c32e	patch-ids: add flag to create the diff patch id using header only data This will allow a diff patch id to be created using only the header data so that the contents of the file will not have to be loaded. Signed-off-by: Kevin Willford <kcwillford@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-07-29 14:10:01 -07:00
Kevin Willford	683f17ec44	patch-ids: replace the seen indicator with a commit pointer The cherry_pick_list was looping through the original side checking the seen indicator and setting the cherry_flag on the commit. If we save off the commit in the patch_id we can set the cherry_flag on the correct commit when running through the other side when a patch_id match is found. Signed-off-by: Kevin Willford <kcwillford@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-07-29 13:23:03 -07:00
Kevin Willford	dfb7a1b4d0	patch-ids: stop using a hand-rolled hashmap implementation This change will use the hashmap from the hashmap.h to keep track of the patch_ids that have been encountered instead of using an internal implementation. This simplifies the implementation of the patch ids. Signed-off-by: Kevin Willford <kcwillford@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-07-29 13:23:03 -07:00
Jeff King	56dfeb6263	pack-objects: compute local/ignore_pack_keep early In want_object_in_pack(), we can exit early from our loop if neither "local" nor "ignore_pack_keep" are set. If they are, however, we must examine each pack to see if it has the object and is non-local or has a ".keep". It's quite common for there to be no non-local or .keep packs at all, in which case we know ahead of time that looking further will be pointless. We can pre-compute this by simply iterating over the list of packs ahead of time, and dropping the flags if there are no packs that could match. Another similar strategy would be to modify the loop in want_object_in_pack() to notice that we have already found the object once, and that we are looping only to check for "local" and "keep" attributes. If a pack has neither of those, we can skip the call to find_pack_entry_one(), which is the expensive part of the loop. This has two advantages: - it isn't all-or-nothing; we still get some improvement when there's a small number of kept or non-local packs, and a large number of non-kept local packs - it eliminates any possible race where we add new non-local or kept packs after our initial scan. In practice, I don't think this race matters; we already cache the packed_git information, so somebody who adds a new pack or .keep file after we've started will not be noticed at all, unless we happen to need to call reprepare_packed_git() because a lookup fails. In other words, we're already racy, and the race is not a big deal (losing the race means we might include an object in the pack that would not otherwise be, which is an acceptable outcome). However, it also has a disadvantage: we still loop over the rest of the packs for each object to check their flags. This is much less expensive than doing the object lookup, but still not free. So if we wanted to implement that strategy to cover the non-all-or-nothing cases, we could do so in addition to this one (so you get the most speedup in the all-or-nothing case, and the best we can do in the other cases). But given that the all-or-nothing case is likely the most common, it is probably not worth the trouble, and we can revisit this later if evidence points otherwise. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-07-29 11:05:08 -07:00
Jeff King	cd37996795	pack-objects: break out of want_object loop early When pack-objects collects the list of objects to pack (either from stdin, or via its internal rev-list), it filters each one through want_object_in_pack(). This function loops through each existing packfile, looking for the object. When we find it, we mark the pack/offset combo for later use. However, we can't just return "yes, we want it" at that point. If --honor-pack-keep is in effect, we must keep looking to find it in _all_ packs, to make sure none of them has a .keep. Likewise, if --local is in effect, we must make sure it is not present in any non-local pack. As a result, the sum effort of these calls is effectively O(nr_objects * nr_packs). In an ordinary repository, we have only a handful of packs, and this doesn't make a big difference. But in pathological cases, it can slow the counting phase to a crawl. This patch notices the case that we have neither "--local" nor "--honor-pack-keep" in effect and breaks out of the loop early, after finding the first instance. Note that our worst case is still "objects * packs" (i.e., we might find each object in the last pack we look in), but in practice we will often break out early. On an "average" repo, my git.git with 8 packs, this shows a modest 2% (a few dozen milliseconds) improvement in the counting-objects phase of "git pack-objects --all <foo" (hackily instrumented by sticking exit(0) right after list_objects). But in a much more pathological case, it makes a bigger difference. I ran the same command on a real-world example with ~9 million objects across 1300 packs. The counting time dropped from 413s to 45s, an improvement of about 89%. Note that this patch won't do anything by itself for a normal "git gc", as it uses both --honor-pack-keep and --local. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-07-29 11:05:07 -07:00
Jeff King	a73cdd21c4	find_pack_entry: replace last_found_pack with MRU cache Each pack has an index for looking up entries in O(log n) time, but if we have multiple packs, we have to scan through them linearly. This can produce a measurable overhead for some operations. We dealt with this long ago in `f7c22cc` (always start looking up objects in the last used pack first, 2007-05-30), which keeps what is essentially a 1-element most-recently-used cache. In theory, we should be able to do better by keeping a similar but longer cache, that is the same length as the pack-list itself. Since we now have a convenient generic MRU structure, we can plug it in and measure. Here are the numbers for running p5303 against linux.git: Test HEAD^ HEAD ------------------------------------------------------------------------ 5303.3: rev-list (1) 31.56(31.28+0.27) 31.30(31.08+0.20) -0.8% 5303.4: repack (1) 40.62(39.35+2.36) 40.60(39.27+2.44) -0.0% 5303.6: rev-list (50) 31.31(31.06+0.23) 31.23(31.00+0.22) -0.3% 5303.7: repack (50) 58.65(69.12+1.94) 58.27(68.64+2.05) -0.6% 5303.9: rev-list (1000) 38.74(38.40+0.33) 31.87(31.62+0.24) -17.7% 5303.10: repack (1000) 367.20(441.80+4.62) 342.00(414.04+3.72) -6.9% The main numbers of interest here are the rev-list ones (since that is exercising the normal object lookup code path). The single-pack case shouldn't improve at all; the 260ms speedup there is just part of the run-to-run noise (but it's important to note that we didn't make anything worse with the overhead of maintaining our cache). In the 50-pack case, we see similar results. There may be a slight improvement, but it's mostly within the noise. The 1000-pack case does show a big improvement, though. That carries over to the repack case, as well. Even though we haven't touched its pack-search loop yet, it does still do a lot of normal object lookups (e.g., for the internal revision walk), and so improves. As a point of reference, I also ran the 1000-pack test against a version of HEAD^ with the last_found_pack optimization disabled. It takes ~60s, so that gives an indication of how much even the single-element cache is helping. For comparison, here's a smaller repository, git.git: Test HEAD^ HEAD --------------------------------------------------------------------- 5303.3: rev-list (1) 1.56(1.54+0.01) 1.54(1.51+0.02) -1.3% 5303.4: repack (1) 1.84(1.80+0.10) 1.82(1.80+0.09) -1.1% 5303.6: rev-list (50) 1.58(1.55+0.02) 1.59(1.57+0.01) +0.6% 5303.7: repack (50) 2.50(3.18+0.04) 2.50(3.14+0.04) +0.0% 5303.9: rev-list (1000) 2.76(2.71+0.04) 2.24(2.21+0.02) -18.8% 5303.10: repack (1000) 13.21(19.56+0.25) 11.66(18.01+0.21) -11.7% You can see that the percentage improvement is similar. That's because the lookup we are optimizing is roughly O(nr_objects * nr_packs). Since the number of packs is constant in both tests, we'd expect the improvement to be linear in the number of objects. But the whole process is also linear in the number of objects, so the improvement is a constant factor. The exact improvement does also depend on the contents of the packs. In p5303, the extra packs all have 5 first-parent commits in them, which is a reasonable simulation of a pushed-to repository. But it also means that only 250 first-parent commits are in those packs (compared to almost 50,000 total in linux.git), and the rest are in the huge "base" pack. So once we start looking at history in taht big pack, that's where we'll find most everything, and even the 1-element cache gets close to 100% cache hits. You could almost certainly show better numbers with a more pathological case (e.g., distributing the objects more evenly across the packs). But that's simply not that realistic a scenario, so it makes more sense to focus on these numbers. The implementation itself is a straightforward application of the MRU code. We provide an MRU-ordered list of packs that shadows the packed_git list. This is easy to do because we only create and revise the pack list in one place. The "reprepare" code path actually drops the whole MRU and replaces it for simplicity. It would be more efficient to just add new entries, but there's not much point in optimizing here; repreparing happens rarely, and only after doing a lot of other expensive work. The key things to keep optimized are traversal (which is just a normal linked list, albeit with one extra level of indirection over the regular packed_git list), and marking (which is a constant number of pointer assignments, though slightly more than the old last_found_pack was; it doesn't seem to create a measurable slowdown, though). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-07-29 11:05:07 -07:00
Jeff King	002f206faf	add generic most-recently-used list There are a few places in Git that would benefit from a fast most-recently-used cache (e.g., the list of packs, which we search linearly but would like to order based on locality). This patch introduces a generic list that can be used to store arbitrary pointers in most-recently-used order. The implementation is just a doubly-linked list, where "marking" an item as used moves it to the front of the list. Insertion and marking are O(1), and iteration is O(n). There's no lookup support provided; if you need fast lookups, you are better off with a different data structure in the first place. There is also no deletion support. This would not be hard to do, but it's not necessary for handling pack structs, which are created and never removed. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-07-29 11:05:07 -07:00
Jeff King	3157c880f6	sha1_file: drop free_pack_by_name The point of this function is to drop an entry from the "packed_git" cache that points to a file we might be overwriting, because our contents may not be the same (and hence the only caller was pack-objects as it moved a temporary packfile into place). In older versions of git, this could happen because the names of packfiles were derived from the set of objects they contained, not the actual bits on disk. But since `1190a1a` (pack-objects: name pack files after trailer hash, 2013-12-05), the name reflects the actual bits on disk, and any two packfiles with the same name can be used interchangeably. Dropping this function not only saves a few lines of code, it makes the lifetime of "struct packed_git" much easier to reason about: namely, we now do not ever free these structs. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-07-29 11:05:06 -07:00
Jeff King	77023ea3c3	t/perf: add tests for many-pack scenarios Git's pack storage does efficient (log n) lookups in a single packfile's index, but if we have multiple packfiles, we have to linearly search each for a given object. This patch introduces some timing tests for cases where we have a large number of packs, so that we can measure any improvements we make in the following patches. The main thing we want to time is object lookup. To do this, we measure "git rev-list --objects --all", which does a fairly large number of object lookups (essentially one per object in the repository). However, we also measure the time to do a full repack, which is interesting for two reasons. One is that in addition to the usual pack lookup, it has its own linear iteration over the list of packs. And two is that because it it is the tool one uses to go from an inefficient many-pack situation back to a single pack, we care about its performance not only at marginal numbers of packs, but at the extreme cases (e.g., if you somehow end up with 5,000 packs, it is the only way to get back to 1 pack, so we need to make sure it performs well). We measure the performance of each command in three scenarios: 1 pack, 50 packs, and 1,000 packs. The 1-pack case is a baseline; any optimizations we do to handle multiple packs cannot possibly perform better than this. The 50-pack case is as far as Git should generally allow your repository to go, if you have auto-gc enabled with the default settings. So this represents the maximum performance improvement we would expect under normal circumstances. The 1,000-pack case is hopefully rare, though I have seen it in the wild where automatic maintenance was broken for some time (and the repository continued to receive pushes). This represents cases where we care less about general performance, but want to make sure that a full repack command does not take excessively long. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-07-29 11:05:06 -07:00
Junio C Hamano	f8f7adce9f	Sync with maint * maint: Some fixes for 2.9.3	2016-07-28 14:21:18 -07:00
Junio C Hamano	8213178c86	Eighth batch of topics for 2.10 Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-07-28 14:20:48 -07:00
Junio C Hamano	2a96d39824	t9100: portability fix Do not say "export VAR=VAL"; "VAR=VAL && export VAR" is always more portable. Signed-off-by: Junio C Hamano <gitster@pobox.com>	2016-07-28 14:20:13 -07:00

... 4 5 6 7 8 ...

44180 Commits