git-commit-vandalism

Author	SHA1	Message	Date
Thomas Ackermann	145e073b84	user-manual: improve html and pdf formatting Use asciidoc style 'article' instead of 'book' and change asciidoc title level. This removes blank first page and superfluous "Part I" page (there is no "Part II") in pdf output. Also pdf size is decreased by this from 77 to 67 pages. In html output this removes unnecessary sub-tocs and chapter numbering. Signed-off-by: Thomas Ackermann <th.acker@arcor.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-01-06 11:30:17 -08:00
Sebastian Schuberth	c6127fa3e2	builtin/help.c: speed up is_git_command() by checking for builtin commands first Since `2dce956` is_git_command() is a bit slow as it does file I/O in the call to list_commands_in_dir(). Avoid the file I/O by adding an early check for the builtin commands. Signed-off-by: Sebastian Schuberth <sschuberth@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-01-06 11:26:31 -08:00
Sebastian Schuberth	a3c5263438	builtin/help.c: call load_command_list() only when it is needed This avoids list_commands_in_dir() being called when not needed which is quite slow due to file I/O in order to list matching files in a directory. Signed-off-by: Sebastian Schuberth <sschuberth@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-01-06 11:26:10 -08:00
Sebastian Schuberth	3f784a4dcb	git.c: consistently use the term "builtin" instead of "internal command" Signed-off-by: Sebastian Schuberth <sschuberth@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-01-06 11:25:50 -08:00
Junio C Hamano	932f7e4769	Merge branch 'maint' * maint: Documentation/gitmodules: Only 'update' and 'url' are required l10n: de.po: fix translation of 'prefix'	2014-01-06 10:39:07 -08:00
Michael Haggerty	18d37e860d	safe_create_leading_directories(): add new error value SCLD_VANISHED Add a new possible error result that can be returned by safe_create_leading_directories() and safe_create_leading_directories_const(): SCLD_VANISHED. This value indicates that a file or directory on the path existed at one point (either it already existed or the function created it), but then it disappeared. This probably indicates that another process deleted the directory while we were working. If SCLD_VANISHED is returned, the caller might want to retry the function call, as there is a chance that a new attempt will succeed. Why doesn't safe_create_leading_directories() do the retrying internally? Because an empty directory isn't really ever safe until it holds a file. So even if safe_create_leading_directories() were absolutely sure that the directory existed before it returned, there would be no guarantee that the directory still existed when the caller tried to write something in it. Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-01-06 09:34:22 -08:00
Michael Haggerty	f3565c0ca5	cmd_init_db(): when creating directories, handle errors conservatively safe_create_leading_directories_const() returns a non-zero value on error. The old code at this calling site recognized a couple of particular error values, and treated all other return values as success. Instead, be more conservative: recognize the errors we are interested in, but treat any other nonzero values as failures. This is more robust in case somebody adds another possible return value without telling us. Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-01-06 09:34:22 -08:00
Michael Haggerty	0be0521b23	safe_create_leading_directories(): introduce enum for return values Instead of returning magic integer values (which a couple of callers go to the trouble of distinguishing), return values from an enum. Add a docstring. Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-01-06 09:34:21 -08:00
Michael Haggerty	9e6f885d14	safe_create_leading_directories(): always restore slash at end of loop Always restore the slash that we scribbled over at the end of the loop, rather than also fixing it up at each premature exit from the loop. This makes it harder to forget to do the cleanup as new paths are added to the code. Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-01-06 09:34:21 -08:00
Michael Haggerty	bf10cf70ad	safe_create_leading_directories(): split on first of multiple slashes If the input path has multiple slashes between path components (e.g., "foo//bar"), then the old code was breaking the path at the last slash, not the first one. So in the above example, the second slash was overwritten with NUL, resulting in the parent directory being sought as "foo/". When stat() is called on "foo/", it fails with ENOTDIR if "foo" exists but is not a directory. This caused the wrong path to be taken in the subsequent logic. So instead, split path components at the first intercomponent slash rather than the last one. Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-01-06 09:34:20 -08:00
Michael Haggerty	26c8ae2a57	safe_create_leading_directories(): rename local variable Rename "pos" to "next_component", because now it always points at the next component of the path name that has to be processed. Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-01-06 09:34:20 -08:00
Michael Haggerty	831651fde8	safe_create_leading_directories(): add explicit "slash" pointer Keep track of the position of the slash character independently of "pos", thereby making the purpose of each variable clearer and working towards other upcoming changes. Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-01-06 09:34:19 -08:00
Michael Haggerty	f05023324c	safe_create_leading_directories(): reduce scope of local variable This makes it more obvious that values of "st" don't persist across loop iterations. Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-01-06 09:34:19 -08:00
Michael Haggerty	53a3972171	safe_create_leading_directories(): fix format of "if" chaining Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-01-06 09:34:19 -08:00
Ramkumar Ramachandra	c39a2f1178	completion: fix remote.pushdefault When attempting to complete $ git config remote.push<TAB> 'pushdefault' doesn't come up. This is because "$cur" is matched with "remote.*" and a list of remotes are completed. Add 'pushdefault' as a candidate for completion too, using __gitcomp_nl_append (). Signed-off-by: Ramkumar Ramachandra <artagnon@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-01-06 09:17:25 -08:00
Ramkumar Ramachandra	422553df49	completion: fix branch.autosetup(merge\|rebase) When attempting to complete $ git config branch.auto<TAB> 'autosetupmerge' and 'autosetuprebase' don't come up. This is because "$cur" is matched with "branch.*" and a list of branches are completed. Add 'autosetupmerge', 'autosetuprebase' as candidates for completion too, using __gitcomp_nl_append (). Signed-off-by: Ramkumar Ramachandra <artagnon@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-01-06 09:17:05 -08:00
Ramkumar Ramachandra	f33c2c0f9e	completion: introduce __gitcomp_nl_append () There are situations where multiple classes of completions possible. For example branch.<TAB> should try to complete branch.master. branch.autosetupmerge branch.autosetuprebase The first candidate has the suffix ".", and the second/ third candidates have the suffix " ". To facilitate completions of this kind, create a variation of __gitcomp_nl () that appends to the existing list of completion candidates, COMPREPLY. Signed-off-by: Ramkumar Ramachandra <artagnon@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-01-06 09:14:48 -08:00
Ramkumar Ramachandra	d028b8906a	zsh completion: find matching custom bash completion If zsh completion is being read from a location that is different from system-wide default, it is likely that the user is trying to use a custom version, perhaps closer to the bleeding edge, installed in her own directory. We will more likely to find the matching bash completion script in the same directory than in those system default places. Signed-off-by: Ramkumar Ramachandra <artagnon@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-01-06 09:14:29 -08:00
Junio C Hamano	c90d3dbe7d	Merge branch 'maint' of git://github.com/git-l10n/git-po into maint * 'maint' of git://github.com/git-l10n/git-po: l10n: de.po: fix translation of 'prefix'	2014-01-06 09:10:09 -08:00
W. Trevor King	43fda9455c	Documentation/gitmodules: Only 'update' and 'url' are required Descriptions for all the settings fell under the initial "Each submodule section also contains the following required keys:". The example shows sections with just 'path' and 'url' entries, which are indeed required, but we should still make the required/optional distinction explicit to clarify that the rest of them are optional. Signed-off-by: W. Trevor King <wking@tremily.us> Reviewed-by: Heiko Voigt <hvoigt@hvoigt.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-01-06 09:08:18 -08:00
Ramsay Jones	feefdf62c1	shallow: remove unused code Commit `58babfff` ("shallow.c: the 8 steps to select new commits for .git/shallow", 05-12-2013) added a function to implement step 5 of the quoted eight steps, namely 'remove_nonexistent_ours_in_pack()'. This function implements an optional optimization step in the new shallow commit selection algorithm. However, this function has no callers. (The commented out call sites would need to change, in order to provide information required by the function.) Signed-off-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk> Acked-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-01-06 09:05:40 -08:00
Ramsay Jones	16a2743cd0	send-pack.c: mark a file-local function static Commit `f2c681cf` ("send-pack: support pushing from a shallow clone via http", 05-12-2013) adds the 'advertise_shallow_grafts_buf' function as an external symbol. Noticed by sparse. ("'advertise_shallow_grafts_buf' was not declared. Should it be static?") Signed-off-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-01-06 08:26:36 -08:00
Vasily Makarov	6bc76725ea	get_octopus_merge_bases(): cleanup redundant variable pptr is needless. Some related code got cleaned as well. Signed-off-by: Vasily Makarov <einmalfel@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-01-03 10:29:03 -08:00
Tom Miller	10a6cc8890	fetch --prune: Run prune before fetching When we have a remote-tracking branch named "frotz/nitfol" from a previous fetch, and the upstream now has a branch named "frotz", fetch would fail to remove "frotz/nitfol" with a "git fetch --prune" from the upstream. git would inform the user to use "git remote prune" to fix the problem. Change the way "fetch --prune" works by moving the pruning operation before the fetching operation. This way, instead of warning the user of a conflict, it autmatically fixes it. Signed-off-by: Tom Miller <jackerran@gmail.com> Tested-by: Thomas Rast <tr@thomasrast.ch> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-01-03 10:18:40 -08:00
Tom Miller	4b3b33a747	fetch --prune: always print header url If "fetch --prune" is run with no new refs to fetch, but it has refs to prune. Then, the header url is not printed as it would if there were new refs to fetch. Output before this patch: $ git fetch --prune remote-with-no-new-refs x [deleted] (none) -> origin/world Output after this patch: $ git fetch --prune remote-with-no-new-refs From https://github.com/git/git x [deleted] (none) -> origin/test Signed-off-by: Tom Miller <jackerran@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-01-03 10:13:39 -08:00
Ralf Thielow	cb0553651d	l10n: de.po: fix translation of 'prefix' The word 'prefix' is currently translated as 'Prefix' which is not a German word. It should be translated as 'Präfix'. Signed-off-by: Ralf Thielow <ralf.thielow@gmail.com>	2014-01-03 18:21:38 +01:00
Kyle J. McKay	ed7eda8b38	gc: notice gc processes run by other users Since `64a99eb4` git gc refuses to run without the --force option if another gc process on the same repository is already running. However, if the repository is shared and user A runs git gc on the repository and while that gc is still running user B runs git gc on the same repository the gc process run by user A will not be noticed and the gc run by user B will go ahead and run. The problem is that the kill(pid, 0) test fails with an EPERM error since user B is not allowed to signal processes owned by user A (unless user B is root). Update the test to recognize an EPERM error as meaning the process exists and another gc should not be run (unless --force is given). Signed-off-by: Kyle J. McKay <mackyle@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-01-02 16:15:29 -08:00
Jeff King	738a8beac4	t0000: drop "known breakage" test Having a simulated "known breakage" test means that the test suite will always tell us there is a bug to be fixed, even though it is only simulated. The right way to test this is in a sub-test, that can also check that we provide the correct exit status and output. Fortunately, we already have such a test (added much later by `5ebf89e`). We could arguably get rid of the simulated success test immediately above, as well, as it is also redundant with the tests added in `5ebf89e`. However, it does not have the annoying behavior of the "known breakage" test. It may also be easier to debug if the test suite is truly broken, since it is not a test-within-a-test, as the later tests are. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-01-02 14:43:12 -08:00
Jeff King	a63c12c9be	t0000: simplify HARNESS_ACTIVE hack Commit `517cd55` set HARNESS_ACTIVE unconditionally in sub-tests, because that value affects the output of "--verbose". t0000 needs stable output from its sub-tests, and we may or may not be running under a TAP harness. That commit made the decision to always set the variable, since it has another useful side effect, which is suppressing writes to t/test-results by the sub-tests (which would just pollute the real results). Since the last commit, though, the sub-tests have their own test-results directories, so this is no longer an issue. We can now update a few comments that are no longer accurate nor necessary. We can also revisit the choice of HARNESS_ACTIVE. Since we must choose one value for stability, it's probably saner to have it off. This means that future patches could test things like the test-results writing, or the "--quiet" option, which is currently ignored when run under a harness. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-01-02 14:43:11 -08:00
Jeff King	6883047071	t0000: set TEST_OUTPUT_DIRECTORY for sub-tests Running t0000 produces more trash directories than expected and does not clean up after itself: $ ./t0000-basic.sh [...] $ ls -d trash\ directory.* trash directory.failing-cleanup trash directory.mixed-results1 trash directory.mixed-results2 trash directory.partial-pass trash directory.test-verbose trash directory.test-verbose-only-2 These scratch areas for sub-tests should be under the t0000 trash directory, but because TEST_OUTPUT_DIRECTORY defaults to TEST_DIRECTORY, which is exported to help sub-tests find test-lib.sh, the sub-test trash directories are created under the toplevel t/ directory instead. Because some of the sub-tests simulate failures, their trash directories are kept around. Fix it by explicitly setting TEST_OUTPUT_DIRECTORY appropriately for sub-tests. An alternative fix would be to pass the --root parameter that only specifies where to put the trash directories, which would also work. However, using TEST_OUTPUT_DIRECTORY is more futureproof in case tests want to write more output in addition to the test-results/ (which are already suppressed in sub-tests using the HARNESS_ACTIVE setting) and trash directories. This fixes a regression introduced by `38b074d` (t/test-lib.sh: fix TRASH_DIRECTORY handling, 2013-04-14). Before that commit, the TEST_OUTPUT_DIRECTORY setting was not respected consistently so most tests did their work in a "trash" subdirectory of the current directory instead of the output dir. Signed-off-by: Jeff King <peff@peff.net> Clarified-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-01-02 14:40:03 -08:00
Jeff King	afbf5ca507	use distinct username/password for http auth tests The httpd server we set up to test git's http client code knows about a single account, in which both the username and password are "user@host" (the unusual use of the "@" here is to verify that we handle the character correctly when URL escaped). This means that we may miss a certain class of errors in which the username and password are mixed up internally by git. We can make our tests more robust by having distinct values for the username and password. In addition to tweaking the server passwd file and the client URL, we must teach the "askpass" harness to accept multiple values. As a bonus, this makes the setup of some tests more obvious; when we are expecting git to ask only about the password, we can seed the username askpass response with a bogus value. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-01-02 10:25:03 -08:00
Jeff King	e1c1a324fc	Revert "prompt: clean up strbuf usage" This reverts commit `31b49d9b65`. That commit taught do_askpass to hand ownership of our buffer back to the caller rather than simply return a pointer into our internal strbuf. What it failed to notice, though, was that our internal strbuf is static, because we are trying to emulate the getpass() interface. By handing off ownership, we created a memory leak that cannot be solved. Sometimes git_prompt returns a static buffer from getpass() (or our smarter git_terminal_prompt wrapper), and sometimes it returns an allocated string from do_askpass. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-01-02 10:21:40 -08:00
Benny Siegert	92164af978	Add MirBSD support to the build system. Add an entry into the table of supported OSes. Do not set _XOPEN_SOURCE (contrary to OpenBSD) because that disables the u_short and u_long typedefs, which are used unconditionally in various other header files. Signed-off-by: Benny Siegert <bsiegert@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2014-01-02 10:19:14 -08:00
Christian Couder	663a8566be	replace info: rename 'full' to 'long' and clarify in-code symbols Enum names SHORT/MEDIUM/FULL were too broad to be descriptive. And they clashed with built-in symbols on platforms like Windows. Clarify by giving them REPLACE_FORMAT_ prefix. Rename 'full' format in "git replace --format=<name>" to 'long', to match others (i.e. 'short' and 'medium'). Signed-off-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-12-30 12:33:11 -08:00
Junio C Hamano	44484662d8	Merge branch 'maint' * maint: for-each-ref: remove unused variable	2013-12-30 12:27:01 -08:00
Ramkumar Ramachandra	b9cf14d43b	for-each-ref: remove unused variable No code ever used this symbol since the command was introduced at `9f613ddd` (Add git-for-each-ref: helper for language bindings, 2006-09-15). Signed-off-by: Ramkumar Ramachandra <artagnon@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-12-30 12:23:51 -08:00
Vicent Marti	ae4f07fbcc	pack-bitmap: implement optional name_hash cache When we use pack bitmaps rather than walking the object graph, we end up with the list of objects to include in the packfile, but we do not know the path at which any tree or blob objects would be found. In a recently packed repository, this is fine. A fetch would use the paths only as a heuristic in the delta compression phase, and a fully packed repository should not need to do much delta compression. As time passes, though, we may acquire more objects on top of our large bitmapped pack. If clients fetch frequently, then they never even look at the bitmapped history, and all works as usual. However, a client who has not fetched since the last bitmap repack will have "have" tips in the bitmapped history, but "want" newer objects. The bitmaps themselves degrade gracefully in this circumstance. We manually walk the more recent bits of history, and then use bitmaps when we hit them. But we would also like to perform delta compression between the newer objects and the bitmapped objects (both to delta against what we know the user already has, but also between "new" and "old" objects that the user is fetching). The lack of pathnames makes our delta heuristics much less effective. This patch adds an optional cache of the 32-bit name_hash values to the end of the bitmap file. If present, a reader can use it to match bitmapped and non-bitmapped names during delta compression. Here are perf results for p5310: Test origin/master HEAD^ HEAD ------------------------------------------------------------------------------------------------- 5310.2: repack to disk 36.81(37.82+1.43) 47.70(48.74+1.41) +29.6% 47.75(48.70+1.51) +29.7% 5310.3: simulated clone 30.78(29.70+2.14) 1.08(0.97+0.10) -96.5% 1.07(0.94+0.12) -96.5% 5310.4: simulated fetch 3.16(6.10+0.08) 3.54(10.65+0.06) +12.0% 1.70(3.07+0.06) -46.2% 5310.6: partial bitmap 36.76(43.19+1.81) 6.71(11.25+0.76) -81.7% 4.08(6.26+0.46) -88.9% You can see that the time spent on an incremental fetch goes down, as our delta heuristics are able to do their work. And we save time on the partial bitmap clone for the same reason. Signed-off-by: Vicent Marti <tanoku@gmail.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-12-30 12:19:23 -08:00
Jeff King	bbcefa1f3f	t/perf: add tests for pack bitmaps This adds a few basic perf tests for the pack bitmap code to show off its improvements. The tests are: 1. How long does it take to do a repack (it gets slower with bitmaps, since we have to do extra work)? 2. How long does it take to do a clone (it gets faster with bitmaps)? 3. How does a small fetch perform when we've just repacked? 4. How does a clone perform when we haven't repacked since a week of pushes? Here are results against linux.git: Test origin/master this tree ----------------------------------------------------------------------- 5310.2: repack to disk 33.64(32.64+2.04) 67.67(66.75+1.84) +101.2% 5310.3: simulated clone 30.49(29.47+2.05) 1.20(1.10+0.10) -96.1% 5310.4: simulated fetch 3.49(6.79+0.06) 5.57(22.35+0.07) +59.6% 5310.6: partial bitmap 36.70(43.87+1.81) 8.18(21.92+0.73) -77.7% You can see that we do take longer to repack, but we do way better for further clones. A small fetch performs a bit worse, as we spend way more time on delta compression (note the heavy user CPU time, as we have 8 threads) due to the lack of name hashes for the bitmapped objects. The final test shows how the bitmaps degrade over time between packs. There's still a significant speedup over the non-bitmap case, but we don't do quite as well (we have to spend time accessing the "new" objects the old fashioned way, including delta compression). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-12-30 12:19:23 -08:00
Jeff King	212f2ffbf0	t: add basic bitmap functionality tests Now that we can read and write bitmaps, we can exercise them with some basic functionality tests. These tests aren't particularly useful for seeing the benefit, as the test repo is too small for it to make a difference. However, we can at least check that using bitmaps does not break anything. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-12-30 12:19:23 -08:00
Nguyễn Thái Ngọc Duy	d3d3e4c490	count-objects: recognize .bitmap in garbage-checking Count-objects will report any "garbage" files in the packs directory, including files whose extensions it does not know (case 1), and files whose matching ".pack" file is missing (case 2). Without having learned about ".bitmap" files, the current code reports all such files as garbage (case 1), even if their pack exists. Instead, they should be treated as case 2. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-12-30 12:19:23 -08:00
Vicent Marti	5cf2741c5a	repack: consider bitmaps when performing repacks Since `pack-objects` will write a `.bitmap` file next to the `.pack` and `.idx` files, this commit teaches `git-repack` to consider the new bitmap indexes (if they exist) when performing repack operations. This implies moving old bitmap indexes out of the way if we are repacking a repository that already has them, and moving the newly generated bitmap indexes into the `objects/pack` directory, next to their corresponding packfiles. Since `git repack` is now capable of handling these `.bitmap` files, a normal `git gc` run on a repository that has `pack.writebitmaps` set to true in its config file will generate bitmap indexes as part of the garbage collection process. Alternatively, `git repack` can be called with the `-b` switch to explicitly generate bitmap indexes if you are experimenting and don't want them on all the time. Signed-off-by: Vicent Marti <tanoku@gmail.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-12-30 12:19:23 -08:00
Jeff King	b77fcd1edc	repack: handle optional files created by pack-objects We ask pack-objects to pack to a set of temporary files, and then rename them into place. Some files that pack-objects creates may be optional (like a .bitmap file), in which case we would not want to call rename(). We already call stat() and make the chmod optional if the file cannot be accessed. We could simply skip the rename step in this case, but that would be a minor regression in noticing problems with non-optional files (like the .pack and .idx files). Instead, we can now annotate extensions as optional, and skip them if they don't exist (and otherwise rely on rename() to barf). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-12-30 12:19:23 -08:00
Jeff King	42a02d8529	repack: turn exts array into array-of-struct This is slightly more verbose, but will let us annotate the extensions with further options in future commits. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-12-30 12:19:23 -08:00
Jeff King	b328c2166e	repack: stop using magic number for ARRAY_SIZE(exts) We have a static array of extensions, but hardcode the size of the array in our loops. Let's pull out this magic number, which will make it easier to change. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-12-30 12:19:23 -08:00
Vicent Marti	7cc8f97108	pack-objects: implement bitmap writing This commit extends more the functionality of `pack-objects` by allowing it to write out a `.bitmap` index next to any written packs, together with the `.idx` index that currently gets written. If bitmap writing is enabled for a given repository (either by calling `pack-objects` with the `--write-bitmap-index` flag or by having `pack.writebitmaps` set to `true` in the config) and pack-objects is writing a packfile that would normally be indexed (i.e. not piping to stdout), we will attempt to write the corresponding bitmap index for the packfile. Bitmap index writing happens after the packfile and its index has been successfully written to disk (`finish_tmp_packfile`). The process is performed in several steps: 1. `bitmap_writer_set_checksum`: this call stores the partial checksum for the packfile being written; the checksum will be written in the resulting bitmap index to verify its integrity 2. `bitmap_writer_build_type_index`: this call uses the array of `struct object_entry` that has just been sorted when writing out the actual packfile index to disk to generate 4 type-index bitmaps (one for each object type). These bitmaps have their nth bit set if the given object is of the bitmap's type. E.g. the nth bit of the Commits bitmap will be 1 if the nth object in the packfile index is a commit. This is a very cheap operation because the bitmap writing code has access to the metadata stored in the `struct object_entry` array, and hence the real type for each object in the packfile. 3. `bitmap_writer_reuse_bitmaps`: if there exists an existing bitmap index for one of the packfiles we're trying to repack, this call will efficiently rebuild the existing bitmaps so they can be reused on the new index. All the existing bitmaps will be stored in a `reuse` hash table, and the commit selection phase will prioritize these when selecting, as they can be written directly to the new index without having to perform a revision walk to fill the bitmap. This can greatly speed up the repack of a repository that already has bitmaps. 4. `bitmap_writer_select_commits`: if bitmap writing is enabled for a given `pack-objects` run, the sequence of commits generated during the Counting Objects phase will be stored in an array. We then use that array to build up the list of selected commits. Writing a bitmap in the index for each object in the repository would be cost-prohibitive, so we use a simple heuristic to pick the commits that will be indexed with bitmaps. The current heuristics are a simplified version of JGit's original implementation. We select a higher density of commits depending on their age: the 100 most recent commits are always selected, after that we pick 1 commit of each 100, and the gap increases as the commits grow older. On top of that, we make sure that every single branch that has not been merged (all the tips that would be required from a clone) gets their own bitmap, and when selecting commits between a gap, we tend to prioritize the commit with the most parents. Do note that there is no right/wrong way to perform commit selection; different selection algorithms will result in different commits being selected, but there's no such thing as "missing a commit". The bitmap walker algorithm implemented in `prepare_bitmap_walk` is able to adapt to missing bitmaps by performing manual walks that complete the bitmap: the ideal selection algorithm, however, would select the commits that are more likely to be used as roots for a walk in the future (e.g. the tips of each branch, and so on) to ensure a bitmap for them is always available. 5. `bitmap_writer_build`: this is the computationally expensive part of bitmap generation. Based on the list of commits that were selected in the previous step, we perform several incremental walks to generate the bitmap for each commit. The walks begin from the oldest commit, and are built up incrementally for each branch. E.g. consider this dag where A, B, C, D, E, F are the selected commits, and a, b, c, e are a chunk of simplified history that will not receive bitmaps. A---a---B--b--C--c--D \ E--e--F We start by building the bitmap for A, using A as the root for a revision walk and marking all the objects that are reachable until the walk is over. Once this bitmap is stored, we reuse the bitmap walker to perform the walk for B, assuming that once we reach A again, the walk will be terminated because A has already been SEEN on the previous walk. This process is repeated for C, and D, but when we try to generate the bitmaps for E, we can reuse neither the current walk nor the bitmap we have generated so far. What we do now is resetting both the walk and clearing the bitmap, and performing the walk from scratch using E as the origin. This new walk, however, does not need to be completed. Once we hit B, we can lookup the bitmap we have already stored for that commit and OR it with the existing bitmap we've composed so far, allowing us to limit the walk early. After all the bitmaps have been generated, another iteration through the list of commits is performed to find the best XOR offsets for compression before writing them to disk. Because of the incremental nature of these bitmaps, XORing one of them with its predecesor results in a minimal "bitmap delta" most of the time. We can write this delta to the on-disk bitmap index, and then re-compose the original bitmaps by XORing them again when loaded. This is a phase very similar to pack-object's `find_delta` (using bitmaps instead of objects, of course), except the heuristics have been greatly simplified: we only check the 10 bitmaps before any given one to find best compressing one. This gives good results in practice, because there is locality in the ordering of the objects (and therefore bitmaps) in the packfile. 6. `bitmap_writer_finish`: the last step in the process is serializing to disk all the bitmap data that has been generated in the two previous steps. The bitmap is written to a tmp file and then moved atomically to its final destination, using the same process as `pack-write.c:write_idx_file`. Signed-off-by: Vicent Marti <tanoku@gmail.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-12-30 12:19:22 -08:00
Vicent Marti	aa32939fea	rev-list: add bitmap mode to speed up object lists The bitmap reachability index used to speed up the counting objects phase during `pack-objects` can also be used to optimize a normal rev-list if the only thing required are the SHA1s of the objects during the list (i.e., not the path names at which trees and blobs were found). Calling `git rev-list --objects --use-bitmap-index [committish]` will perform an object iteration based on a bitmap result instead of actually walking the object graph. These are some example timings for `torvalds/linux` (warm cache, best-of-five): $ time git rev-list --objects master > /dev/null real 0m34.191s user 0m33.904s sys 0m0.268s $ time git rev-list --objects --use-bitmap-index master > /dev/null real 0m1.041s user 0m0.976s sys 0m0.064s Likewise, using `git rev-list --count --use-bitmap-index` will speed up the counting operation by building the resulting bitmap and performing a fast popcount (number of bits set on the bitmap) on the result. Here are some sample timings of different ways to count commits in `torvalds/linux`: $ time git rev-list master \| wc -l 399882 real 0m6.524s user 0m6.060s sys 0m3.284s $ time git rev-list --count master 399882 real 0m4.318s user 0m4.236s sys 0m0.076s $ time git rev-list --use-bitmap-index --count master 399882 real 0m0.217s user 0m0.176s sys 0m0.040s This also respects negative refs, so you can use it to count a slice of history: $ time git rev-list --count v3.0..master 144843 real 0m1.971s user 0m1.932s sys 0m0.036s $ time git rev-list --use-bitmap-index --count v3.0..master real 0m0.280s user 0m0.220s sys 0m0.056s Though note that the closer the endpoints, the less it helps. In the traversal case, we have fewer commits to cross, so we take less time. But the bitmap time is dominated by generating the pack revindex, which is constant with respect to the refs given. Note that you cannot yet get a fast --left-right count of a symmetric difference (e.g., "--count --left-right master...topic"). The slow part of that walk actually happens during the merge-base determination when we parse "master...topic". Even though a count does not actually need to know the real merge base (it only needs to take the symmetric difference of the bitmaps), the revision code would require some refactoring to handle this case. Additionally, a `--test-bitmap` flag has been added that will perform the same rev-list manually (i.e. using a normal revwalk) and using bitmaps, and verify that the results are the same. This can be used to exercise the bitmap code, and also to verify that the contents of the .bitmap file are sane. Signed-off-by: Vicent Marti <tanoku@gmail.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-12-30 12:19:22 -08:00
Vicent Marti	6b8fda2db1	pack-objects: use bitmaps when packing objects In this patch, we use the bitmap API to perform the `Counting Objects` phase in pack-objects, rather than a traditional walk through the object graph. For a reasonably-packed large repo, the time to fetch and clone is often dominated by the full-object revision walk during the Counting Objects phase. Using bitmaps can reduce the CPU time required on the server (and therefore start sending the actual pack data with less delay). For bitmaps to be used, the following must be true: 1. We must be packing to stdout (as a normal `pack-objects` from `upload-pack` would do). 2. There must be a .bitmap index containing at least one of the "have" objects that the client is asking for. 3. Bitmaps must be enabled (they are enabled by default, but can be disabled by setting `pack.usebitmaps` to false, or by using `--no-use-bitmap-index` on the command-line). If any of these is not true, we fall back to doing a normal walk of the object graph. Here are some sample timings from a full pack of `torvalds/linux` (i.e. something very similar to what would be generated for a clone of the repository) that show the speedup produced by various methods: [existing graph traversal] $ time git pack-objects --all --stdout --no-use-bitmap-index \ </dev/null >/dev/null Counting objects: 3237103, done. Compressing objects: 100% (508752/508752), done. Total 3237103 (delta 2699584), reused 3237103 (delta 2699584) real 0m44.111s user 0m42.396s sys 0m3.544s [bitmaps only, without partial pack reuse; note that pack reuse is automatic, so timing this required a patch to disable it] $ time git pack-objects --all --stdout </dev/null >/dev/null Counting objects: 3237103, done. Compressing objects: 100% (508752/508752), done. Total 3237103 (delta 2699584), reused 3237103 (delta 2699584) real 0m5.413s user 0m5.604s sys 0m1.804s [bitmaps with pack reuse (what you get with this patch)] $ time git pack-objects --all --stdout </dev/null >/dev/null Reusing existing pack: 3237103, done. Total 3237103 (delta 0), reused 0 (delta 0) real 0m1.636s user 0m1.460s sys 0m0.172s Signed-off-by: Vicent Marti <tanoku@gmail.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-12-30 12:19:22 -08:00
Jeff King	ce2bc42456	pack-objects: split add_object_entry This function actually does three things: 1. Check whether we've already added the object to our packing list. 2. Check whether the object meets our criteria for adding. 3. Actually add the object to our packing list. It's a little hard to see these three phases, because they happen linearly in the rather long function. Instead, this patch breaks them up into three separate helper functions. The result is a little easier to follow, though it unfortunately suffers from some optimization interdependencies between the stages (e.g., during step 3 we use the packing list index from step 1 and the packfile information from step 2). More importantly, though, the various parts can be composed differently, as they will be in the next patch. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-12-30 12:19:22 -08:00
Vicent Marti	fff42755ef	pack-bitmap: add support for bitmap indexes A bitmap index is a `.bitmap` file that can be found inside `$GIT_DIR/objects/pack/`, next to its corresponding packfile, and contains precalculated reachability information for selected commits. The full specification of the format for these bitmap indexes can be found in `Documentation/technical/bitmap-format.txt`. For a given commit SHA1, if it happens to be available in the bitmap index, its bitmap will represent every single object that is reachable from the commit itself. The nth bit in the bitmap is the nth object in the packfile; if it's set to 1, the object is reachable. By using the bitmaps available in the index, this commit implements several new functions: - `prepare_bitmap_git` - `prepare_bitmap_walk` - `traverse_bitmap_commit_list` - `reuse_partial_packfile_from_bitmap` The `prepare_bitmap_walk` function tries to build a bitmap of all the objects that can be reached from the commit roots of a given `rev_info` struct by using the following algorithm: - If all the interesting commits for a revision walk are available in the index, the resulting reachability bitmap is the bitwise OR of all the individual bitmaps. - When the full set of WANTs is not available in the index, we perform a partial revision walk using the commits that don't have bitmaps as roots, and limiting the revision walk as soon as we reach a commit that has a corresponding bitmap. The earlier OR'ed bitmap with all the indexed commits can now be completed as this walk progresses, so the end result is the full reachability list. - For revision walks with a HAVEs set (a set of commits that are deemed uninteresting), first we perform the same method as for the WANTs, but using our HAVEs as roots, in order to obtain a full reachability bitmap of all the uninteresting commits. This bitmap then can be used to: a) limit the subsequent walk when building the WANTs bitmap b) finding the final set of interesting commits by performing an AND-NOT of the WANTs and the HAVEs. If `prepare_bitmap_walk` runs successfully, the resulting bitmap is stored and the equivalent of a `traverse_commit_list` call can be performed by using `traverse_bitmap_commit_list`; the bitmap version of this call yields the objects straight from the packfile index (without having to look them up or parse them) and hence is several orders of magnitude faster. As an extra optimization, when `prepare_bitmap_walk` succeeds, the `reuse_partial_packfile_from_bitmap` call can be attempted: it will find the amount of objects at the beginning of the on-disk packfile that can be reused as-is, and return an offset into the packfile. The source packfile can then be loaded and the bytes up to `offset` can be written directly to the result without having to consider the entires inside the packfile individually. If the `prepare_bitmap_walk` call fails (e.g. because no bitmap files are available), the `rev_info` struct is left untouched, and can be used to perform a manual rev-walk using `traverse_commit_list`. Hence, this new set of functions are a generic API that allows to perform the equivalent of git rev-list --objects [roots...] [^uninteresting...] for any set of commits, even if they don't have specific bitmaps generated for them. In further patches, we'll use this bitmap traversal optimization to speed up the `pack-objects` and `rev-list` commands. Signed-off-by: Vicent Marti <tanoku@gmail.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-12-30 12:19:22 -08:00
Vicent Marti	0d4455a3ab	documentation: add documentation for the bitmap format This is the technical documentation for the JGit-compatible Bitmap v1 on-disk format. Signed-off-by: Vicent Marti <tanoku@gmail.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-12-30 12:19:22 -08:00

... 19 20 21 22 23 ...

36553 Commits