git-commit-vandalism

Author	SHA1	Message	Date
Richard Hartmann	b1d5a570fc	templates: Reformat pre-commit hook's message Now that we're using heredoc, the message can span the full 80 chars. Signed-off-by: Richard Hartmann <richih.mailinglist@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-07-15 09:52:52 -07:00
Richard Hartmann	27b6e17a6d	templates: Use heredoc in pre-commit hook This way, it is easier to see how the text we give the end users would look like, and it will allow us to use (near) full width of the source file. Signed-off-by: Richard Hartmann <richih.mailinglist@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-07-15 09:51:16 -07:00
Stefan Beller	d3c9cf32ca	diff.c: Do not initialize a variable, which gets reassigned anyway. Signed-off-by: Stefan Beller <stefanbeller@googlemail.com> Reviewed-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-07-15 09:45:21 -07:00
Stefan Beller	70a0cc9e5c	commit: Fix a memory leak in determine_author_info The date variable is assigned new memory via xmemdupz and 2 lines later it is assigned new memory again via xmalloc, but the first assignment is never freed nor used. Signed-off-by: Stefan Beller <stefanbeller@googlemail.com> Reviewed-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-07-15 09:45:21 -07:00
Stefan Beller	5d9cfa29d2	daemon.c:handle: Remove unneeded check for null pointer. addr doesn't need to be checked at that line as it is already accessed 7 lines before in the if (addr->sa_family). Signed-off-by: Stefan Beller <stefanbeller@googlemail.com> Reviewed-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-07-15 09:45:11 -07:00
Junio C Hamano	5333f2afc4	Revert "git-clone.txt: remove the restriction on pushing from a shallow clone" This reverts commit `dacd2bcc41`. "It fails reliably without corrupting the receiving repository when it should fail" may be better than the situation before the receiving end was hardened recently, but the fact that sometimes the push does not go through still remains. It is better to advise the users that they cannot push from a shallow repository as a limitation before they decide to use (or not to use) a shallow clone. Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-07-15 08:35:32 -07:00
Junio C Hamano	bd23794552	mailmap: style fixes Wrap overlong lines and format the multi-line comments to match our coding style. Signed-off-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-07-15 08:23:39 -07:00
Eric Sunshine	fbfba7ade0	mailmap: debug: avoid passing NULL to fprintf() '%s' conversion specification POSIX does not state the behavior of '%s' conversion when passed a NULL pointer. Some implementations interpolate literal "(null)"; others may crash. Callers of debug_mm() often pass NULL as indication of either a missing name or email address. Instead, let's always supply a proper string pointer, and make it a bit more descriptive: "(none)" Signed-off-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-07-15 08:20:32 -07:00
Eric Sunshine	a8002a5f0e	mailmap: debug: eliminate -Wformat field precision type warning The compiler complains that '*' in fprintf() format "%.*s" should have type int, but we pass size_t. Fix this. Signed-off-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-07-15 08:20:11 -07:00
Eric Sunshine	0939a242fe	mailmap: debug: fix malformed fprintf() format conversion specification Resolve segmentation fault due to size_t variable being consumed by '%s'. Signed-off-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-07-15 08:19:56 -07:00
Eric Sunshine	c10be0c6ac	mailmap: debug: fix out-of-order fprintf() arguments Resolve segmentation fault due to arguments passed in wrong order. Signed-off-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-07-15 08:18:04 -07:00
Junio C Hamano	97e751be79	mailmap: do not downcase mailmap entries The email addresses in the records read from the .mailmap file are downcased very early, and then used to match against e-mail addresses in the input. Because we do use case insensitive version of string list to manage these entries, there is no need to do this, and worse yet, downcasing the rewritten/canonical e-mail read from the .mailmap file loses information. Stop doing that, and also make the string list used to keep multiple names for an mailmap entry case insensitive (the code that uses the list, lookup_prefix(), expects a case insensitive match). Signed-off-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-07-15 08:17:20 -07:00
Eric Sunshine	3aff56ddbe	t4203: demonstrate loss of uppercase characters in canonical email The email addresses read from .mailmap are downcased before being inserted into the mailmap data structure, which undesirably loses information. It is impossible, for instance, to map <first.last@host> to <First.Last@host>. Demonstrate this problem. Signed-off-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-07-15 08:16:32 -07:00
Junio C Hamano	8c3811510e	mailmap: do not lose single-letter names In parse_name_and_email() function, there is this line: *name = (nstart < nend ? nstart : NULL); When the function is given a buffer "A <A@example.org> <old@x.z>", nstart scans from the beginning of the buffer, skipping whitespaces (there isn't any, so nstart points at the buffer), while nend starts from one byte before the first '<' and skips whitespaces backwards and stops at the first non-whitespace (i.e. it hits "A" at the beginning of the buffer). nstart == nend in this case for a single-letter name, and an off-by-one error makes it fail to pick up the name, which makes the entry equivalent to <A@example.org> <old@x.z> without the name. Signed-off-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-07-15 08:16:00 -07:00
Eric Sunshine	109025b4e1	t4203: demonstrate loss of single-character name in mailmap entry A bug in mailmap.c:parse_name_and_email() causes it to overlook the single-character name in "A <user@host>" and parse it only as "<user@host>". Demonstrate this problem. Signed-off-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-07-15 08:09:04 -07:00
Stefan Beller	f4f49e2258	.mailmap: Combine more (email, name) to individual persons I got more responses from people regarding the .mailmap file. All added persons gave permission to add them to the .mailmap file. It's mostly email mappings again. However we also have Nick Stokoe, who contributed as Nick Woolley. He changed his name, but kept the email. Signed-off-by: Stefan Beller <stefanbeller@googlemail.com> Reviewed-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-07-15 07:41:53 -07:00
Eric Sunshine	cb5c9521f1	t4203: test check-mailmap command invocation Test the command-line interface of check-mailmap. (Actual .mailmap functionality is already covered by existing tests.) Signed-off-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-07-13 10:20:28 -07:00
Eric Sunshine	226ad3482a	builtin: add git-check-mailmap command Introduce command check-mailmap, similar to check-attr and check-ignore, which allows direct testing of .mailmap configuration. As plumbing accessible to scripts and other porcelain, check-mailmap publishes the stable, well-tested .mailmap functionality employed by built-in Git commands. Consequently, script authors need not re-implement .mailmap functionality manually, thus avoiding potential quirks and behavioral differences. Signed-off-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-07-13 10:19:37 -07:00
Stefan Beller	94b410bba8	.mailmap: Map email addresses to names People change email addresses quite often and sometimes forget to add their entry to the mailmap file. I have contacted lots of people, whose name occurs multiple times in the short log having different email addresses. The entries in the mailmap of this patch are either confirmed by them or are trivial. Trivial means different capitalisation of the domain (@MIT.EDU and @mit.edu) or the domain was localhost, (none) or @local. Additionally to adding (name, email) mappings to the .mailmap file, it has also been sorted ("LC_ALL=C /usr/bin/sort", byte-value sort). While the most changes happen at the email addresses, we also have a name change in here. Karl Hasselström is now known as Karl Wiberg due to marriage. Congratulations! To find out whom to contact I used the following small script: #!/bin/bash git shortlog -sne |awk '{ NF--; $1=""; print }' |sort |uniq -d > mailmapdoubles while read line ; do # remove leading whitespace trimmed=$(echo $line | sed -e 's/^ //g' -e 's/ $//g') echo "git shortlog -sne | grep \""$trimmed"\"" done < mailmapdoubles > mailmapdoubles2 sh mailmapdoubles2 rm mailmapdoubles rm mailmapdoubles2 Also interesting for similar tasks are these snippets: # Finding out duplicates by comparing email addresses: git shortlog -sne |awk '{ print $NF }' |sort |uniq -d # Finding out duplicates by comparing names: git shortlog -sne |awk '{ NF--; $1=""; print }' |sort |uniq -d Signed-off-by: Stefan Beller <stefanbeller@googlemail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-07-12 12:53:02 -07:00
Junio C Hamano	0da7a53a76	Update draft release notes for 1.8.4 Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-07-12 12:04:44 -07:00
Junio C Hamano	fb1c85d2e9	Merge branch 'jc/remote-http-argv-array' * jc/remote-http-argv-array: remote-http: use argv-array	2013-07-12 12:04:19 -07:00
Junio C Hamano	d5a3897f94	Merge branch 'rs/pickaxe-simplify' * rs/pickaxe-simplify: diffcore-pickaxe: simplify has_changes and contains	2013-07-12 12:04:17 -07:00
Junio C Hamano	533a05f63a	Merge branch 'tr/test-lint-no-export-assignment-in-shell' * tr/test-lint-no-export-assignment-in-shell: test-lint: detect 'export FOO=bar' t9902: fix 'test A == B' to use = operator	2013-07-12 12:04:16 -07:00
Junio C Hamano	624ec4f99d	Merge branch 'rr/name-rev-stdin-doc' * rr/name-rev-stdin-doc: name-rev doc: rewrite --stdin paragraph	2013-07-12 12:04:14 -07:00
Junio C Hamano	6492deafdd	Merge branch 'ft/diff-rename-default-score-is-half' * ft/diff-rename-default-score-is-half: diff-options: document default similarity index	2013-07-12 12:04:13 -07:00
Junio C Hamano	f1e03522dd	Merge branch 'ml/cygwin-does-not-have-fifo' * ml/cygwin-does-not-have-fifo: test-lib.sh - cygwin does not have usable FIFOs	2013-07-12 12:04:10 -07:00
Junio C Hamano	784bdd61ae	Merge branch 'tf/gitweb-extra-breadcrumbs' A Gitweb installation that is part of a larger site can optionally show extra links that point at levels higher than the Gitweb pages themselves in the link hierarchy of pages. * tf/gitweb-extra-breadcrumbs: gitweb: allow extra breadcrumbs to prefix the trail	2013-07-12 12:04:09 -07:00
Junio C Hamano	778e4b8903	Merge branch 'ms/remote-tracking-branches-in-doc' * ms/remote-tracking-branches-in-doc: Change "remote tracking" to "remote-tracking"	2013-07-12 12:04:07 -07:00
Junio C Hamano	5b307e95e8	Merge branch 'jk/pull-to-integrate' * jk/pull-to-integrate: pull: change the description to "integrate" changes push: avoid suggesting "merging" remote changes	2013-07-12 12:04:06 -07:00
Junio C Hamano	e70aee5c86	Merge branch 'jk/maint-config-multi-order' * jk/maint-config-multi-order: git-config(1): clarify precedence of multiple values	2013-07-12 12:04:04 -07:00
Junio C Hamano	8a6482227c	Merge branch 'as/log-output-encoding-in-user-format' "log --format=" did not honor i18n.logoutputencoding configuration and this attempts to fix it. * as/log-output-encoding-in-user-format: t4205 (log-pretty-formats): avoid using `sed` t6006 (rev-list-format): add tests for "%b" and "%s" for the case i18n.commitEncoding is not set t4205, t6006, t7102: make functions better readable t4205 (log-pretty-formats): revert back single quotes t4041, t4205, t6006, t7102: use iso8859-1 rather than iso-8859-1 t4205: replace .\+ with ..* in sed commands pretty: --format output should honor logOutputEncoding pretty: Add failing tests: --format output should honor logOutputEncoding t4205 (log-pretty-formats): don't hardcode SHA-1 in expected outputs t7102 (reset): don't hardcode SHA-1 in expected outputs t6006 (rev-list-format): don't hardcode SHA-1 in expected outputs	2013-07-12 12:04:01 -07:00
Nguyễn Thái Ngọc Duy	dacd2bcc41	git-clone.txt: remove the restriction on pushing from a shallow clone The document says one cannot push from a shallow clone. But that is not true (maybe it was at some point in the past). The client does not stop such a push nor does it give any indication to the receiver that this is a shallow push. If the receiver accepts it, it's in. Since `52fed6e` (receive-pack: check connectivity before concluding "git push" - 2011-09-02), receive-pack is prepared to deal with broken push, a shallow push can't cause any corruption. Update the document to reflect that. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-07-12 12:03:28 -07:00
Thomas Rast	a77f106c78	run-command: dup_devnull(): guard against syscalls failing dup_devnull() did not check the return values of open() and dup2(). Fix this omission. Signed-off-by: Thomas Rast <trast@inf.ethz.ch> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-07-12 10:30:09 -07:00
Dale R. Worley	a2cb86c152	git_mkstemps: correctly test return value of open() open() returns -1 on failure, and indeed 0 is a possible success value if the user closed stdin in our process. Fix the test. Signed-off-by: Thomas Rast <trast@inf.ethz.ch> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-07-12 10:30:08 -07:00
Jeff King	23c339c0f2	sha1_object_info_extended: pass object_info to helpers We take in a "struct object_info" which contains pointers to storage for items the caller cares about. But then rather than pass the whole object to the low-level loose/packed helper functions, we pass the individual pointers. Let's pass the whole struct instead, which will make adding more items later easier. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-07-12 10:29:27 -07:00
Jeff King	5b0864070e	sha1_object_info_extended: make type calculation optional Each caller of sha1_object_info_extended sets up an object_info struct to tell the function which elements of the object it wants to get. Until now, getting the type of the object has always been required (and it is returned via the return type rather than a pointer in object_info). This can involve actually opening a loose object file to determine its type, or following delta chains to determine a packed file's base type. These effects produce a measurable slow-down when doing a "cat-file --batch-check" that does not include %(objecttype). This patch adds a "typep" query to struct object_info, so that it can be optionally queried just like size and disk_size. As a result, the return type of the function is no longer the object type, but rather 0/-1 for success/error. As there are only three callers total, we just fix up each caller rather than keep a compatibility wrapper: 1. The simpler sha1_object_info wrapper continues to always ask for and return the type field. 2. The istream_source function wants to know the type, and so always asks for it. 3. The cat-file batch code asks for the type only when %(objecttype) is part of the format string. On linux.git, the best-of-five for running: $ git rev-list --objects --all >objects $ time git cat-file --batch-check='%(objectsize:disk)' on a fully packed repository goes from: real 0m8.680s user 0m8.160s sys 0m0.512s to: real 0m7.205s user 0m6.580s sys 0m0.608s Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-07-12 10:16:36 -07:00
Jeff King	412916ee13	packed_object_info: make type lookup optional Currently, packed_object_info can save some work by not calculating the size or disk_size of the object if the caller is not interested. However, it always calculates the true object type, whether the caller cares or not, and only optionally returns the easy-to-get "representation type". Let's swap these types. The function will now return the representation type (or OBJ_BAD on failure), and will only optionally fill in the true type. There should be no behavior change yet, as the only caller, sha1_object_info_extended, will always feed it a type pointer. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-07-12 10:14:06 -07:00
Jeff King	90191d37ab	packed_object_info: hoist delta type resolution to helper To calculate the type of a packed object, we must walk down its delta chain until we hit a true base object with a real type. Most of the code in packed_object_info is for handling this case. Let's hoist it out into a separate helper function, which will make it easier to make the type-lookup optional in the future (and keep our indentation level sane). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-07-12 10:13:23 -07:00
Jeff King	052fe5eaca	sha1_loose_object_info: make type lookup optional Until recently, the only items to request from sha1_object_info_extended were type and size. This meant that we always had to open a loose object file to determine one or the other. But with the addition of the disk_size query, it's possible that we can fulfill the query without even opening the object file at all. However, since the function interface always returns the type, we have no way of knowing whether the caller cares about it or not. This patch only modified sha1_loose_object_info to make type lookup optional using an out-parameter, similar to the way the size is handled (and the return value is "0" or "-1" for success or error, respectively). There should be no functional change yet, though, as sha1_object_info_extended, the only caller, will always ask for a type. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-07-12 10:10:04 -07:00
Jeff King	f2f57e31f6	sha1_object_info_extended: rename "status" to "type" The value we get from each low-level object_info function (e.g., loose, packed) is actually the object type (or -1 for error). Let's explicitly call it "type", which will make further refactorings easier to read. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-07-12 10:10:03 -07:00
Jeff King	25fba78d36	cat-file: disable object/refname ambiguity check for batch mode A common use of "cat-file --batch-check" is to feed a list of objects from "rev-list --objects" or a similar command. In this instance, all of our input objects are 40-byte sha1 ids. However, cat-file has always allowed arbitrary revision specifiers, and feeds the result to get_sha1(). Fortunately, get_sha1() recognizes a 40-byte sha1 before doing any hard work trying to look up refs, meaning this scenario should end up spending very little time converting the input into an object sha1. However, since `798c35f` (get_sha1: warn about full or short object names that look like refs, 2013-05-29), when we encounter this case, we spend the extra effort to do a refname lookup anyway, just to print a warning. This is further exacerbated by `ca91993` (get_packed_ref_cache: reload packed-refs file when it changes, 2013-06-20), which makes individual ref lookup more expensive by requiring a stat() of the packed-refs file for each missing ref. With no patches, this is the time it takes to run: $ git rev-list --objects --all >objects $ time git cat-file --batch-check='%(objectname)' <objects on the linux.git repository: real 1m13.494s user 0m25.924s sys 0m47.532s If we revert `ca91993`, the packed-refs up-to-date check, it gets a little better: real 0m54.697s user 0m21.692s sys 0m32.916s but we are still spending quite a bit of time on ref lookup (and we would not want to revert that patch, anyway, which has correctness issues). If we revert `798c35f`, disabling the warning entirely, we get a much more reasonable time: real 0m7.452s user 0m6.836s sys 0m0.608s This patch does the moral equivalent of this final case (and gets similar speedups). We introduce a global flag that callers of get_sha1() can use to avoid paying the price for the warning. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-07-12 10:09:56 -07:00
Junio C Hamano	ee6e5843c1	Merge branch 'nd/warn-ambiguous-object-name' into jk/cat-file-batch-optim * nd/warn-ambiguous-object-name: get_sha1: warn about full or short object names that look like refs	2013-07-12 10:09:50 -07:00
Heiko Voigt	b2dc09455a	do not die when error in config parsing of buf occurs If a config parsing error in a file occurs we can die and let the user fix the issue. This is different for the buf parsing function since it can be used to parse blobs of .gitmodules files. If a parsing error occurs here we should proceed since otherwise a database containing such an error in a single revision could be rendered unusable. Signed-off-by: Heiko Voigt <hvoigt@hvoigt.net> Acked-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-07-12 09:34:58 -07:00
Heiko Voigt	1bc888193e	teach config --blob option to parse config from database This can be used to read configuration values directly from git's database. For example it is useful for reading to be checked out .gitmodules files directly from the database. Signed-off-by: Heiko Voigt <hvoigt@hvoigt.net> Acked-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-07-12 09:34:57 -07:00
Heiko Voigt	4d8dd1494e	config: make parsing stack struct independent from actual data source To simplify adding other sources we extract all functions needed for parsing into a list of callbacks. We implement those callbacks for the current file parsing. A new source can implement its own set of callbacks. Instead of storing the concrete FILE pointer for parsing we store a void pointer. A new source can use this to store its custom data. Signed-off-by: Heiko Voigt <hvoigt@hvoigt.net> Acked-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-07-12 09:34:57 -07:00
Heiko Voigt	dbb9a81255	config: drop cf validity check in get_next_char() The global variable cf is set with an initialized value in all codepaths before calling this function. The complete call graph looks like this: git_config_from_file -> do_config_from -> git_parse_file -> get_next_char -> get_value -> get_next_char -> parse_value -> get_next_char -> get_base_var -> get_next_char -> get_extended_base_var -> get_next_char The variable is initialized in do_config_from. Signed-off-by: Heiko Voigt <hvoigt@hvoigt.net> Acked-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-07-12 09:34:57 -07:00
Heiko Voigt	ca4b5de28b	config: factor out config file stack management Because a config callback may start parsing a new file, the global context regarding the current config file is stored as a stack. Currently we only need to manage that stack from git_config_from_file. Let's factor it out to allow new sources of config data. Signed-off-by: Heiko Voigt <hvoigt@hvoigt.net> Acked-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-07-12 09:34:57 -07:00
Jeff King	4783e7ea83	t0008: avoid SIGPIPE race condition on fifo To test check-ignore's --stdin feature, we use two fifos to send and receive data. We carefully keep a descriptor to its input open so that it does not receive EOF between input lines. However, we do not do the same for its output. That means there is a potential race condition in which check-ignore has opened the output pipe once (when we read the first line), and then writes the second line before we have re-opened the pipe. In that case, check-ignore gets a SIGPIPE and dies. The outer shell then tries to open the output fifo but blocks indefinitely, because there is no writer. We can fix it by keeping a descriptor open through the whole procedure. This should also help if check-ignore dies for any other reason (we would already have opened the fifo and would therefore not block, but just get EOF on read). However, we are technically still susceptible to check-ignore dying early, before we have opened the fifo. This is an unlikely race and shouldn't generally happen in practice, though, so we can hopefully ignore it. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-07-12 09:24:29 -07:00
Jeff King	8b8dfd5132	pack-revindex: radix-sort the revindex The pack revindex stores the offsets of the objects in the pack in sorted order, allowing us to easily find the on-disk size of each object. To compute it, we populate an array with the offsets from the sha1-sorted idx file, and then use qsort to order it by offsets. That does O(n log n) offset comparisons, and profiling shows that we spend most of our time in cmp_offset. However, since we are sorting on a simple off_t, we can use numeric sorts that perform better. A radix sort can run in O(kn), where k is the number of "digits" in our number. For a 64-bit off_t, using 16-bit "digits" gives us k=4. On the linux.git repo, with about 3M objects to sort, this yields a 400% speedup. Here are the best-of-five numbers for running echo HEAD | git cat-file --batch-check="%(objectsize:disk)" on a fully packed repository, which is dominated by time spent building the pack revindex: before after real 0m0.834s 0m0.204s user 0m0.788s 0m0.164s sys 0m0.040s 0m0.036s This matches our algorithmic expectations. log(3M) is ~21.5, so a traditional sort is ~21.5n. Our radix sort runs in kn, where k is the number of radix digits. In the worst case, this is k=4 for a 64-bit off_t, but we can quit early when the largest value to be sorted is smaller. For any repository under 4G, k=2. Our algorithm makes two passes over the list per radix digit, so we end up with 4n. That should yield ~5.3x speedup. We see 4x here; the difference is probably due to the extra bucket book-keeping the radix sort has to do. On a smaller repo, the difference is less impressive, as log(n) is smaller. For git.git, with 173K objects (but still k=2), we see a 2.7x improvement: before after real 0m0.046s 0m0.017s user 0m0.036s 0m0.012s sys 0m0.008s 0m0.000s On even tinier repos (e.g., a few hundred objects), the speedup goes away entirely, as the small advantage of the radix sort gets erased by the book-keeping costs (and at those sizes, the cost to generate the rev-index gets lost in the noise anyway). Signed-off-by: Jeff King <peff@peff.net> Reviewed-by: Brandon Casey <drafnel@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-07-12 09:20:54 -07:00
Jeff King	012b32bb46	pack-revindex: use unsigned to store number of objects A packfile may have up to 2^32-1 objects in it, so the "right" data type to use is uint32_t. We currently use a signed int, which means that we may behave incorrectly for packfiles with more than 2^31-1 objects on 32-bit systems. Nobody has noticed because having 2^31 objects is pretty insane. The linux.git repo has on the order of 2^22 objects, which is hundreds of times smaller than necessary to trigger the bug. Let's bump this up to an "unsigned". On 32-bit systems, this gives us the correct data-type, and on 64-bit systems, it is probably more efficient to use the native "unsigned" than a true uint32_t. While we're at it, we can fix the binary search not to overflow in such a case if our unsigned is 32 bits. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2013-07-12 09:18:42 -07:00
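
The two pack-revindex entries above (8b8dfd5132 and 012b32bb46) describe a radix sort over pack offsets with 16-bit "digits" and a binary search midpoint that does not overflow. The following is only a minimal sketch of those two ideas in C, not the code from pack-revindex.c: the names radix_sort_offsets and midpoint are hypothetical, offsets are modelled as plain uint64_t values rather than revindex entries, and the caller is assumed to supply a scratch array of the same length.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DIGIT_BITS 16
#define BUCKETS (1u << DIGIT_BITS)

/*
 * Hypothetical helper: sort n offsets in ascending order, least
 * significant 16-bit digit first. tmp is caller-provided scratch
 * space with room for n entries.
 */
static void radix_sort_offsets(uint64_t *offsets, uint64_t *tmp, size_t n)
{
	static size_t pos[BUCKETS];
	uint64_t *from = offsets, *to = tmp;
	uint64_t max = 0;
	unsigned shift;
	size_t i;

	assert(n == 0 || (offsets && tmp));

	/* The largest value decides how many digits we must process. */
	for (i = 0; i < n; i++)
		if (offsets[i] > max)
			max = offsets[i];

	/* Quit early once no remaining digit can be non-zero. */
	for (shift = 0; shift < 64 && (max >> shift) != 0; shift += DIGIT_BITS) {
		uint64_t *swap;
		size_t sum = 0;

		/* Pass 1: count how many entries fall into each bucket. */
		memset(pos, 0, sizeof(pos));
		for (i = 0; i < n; i++)
			pos[(from[i] >> shift) & (BUCKETS - 1)]++;

		/* Turn the counts into the start index of each bucket. */
		for (i = 0; i < BUCKETS; i++) {
			size_t count = pos[i];
			pos[i] = sum;
			sum += count;
		}

		/* Pass 2: stable scatter into the other array. */
		for (i = 0; i < n; i++)
			to[pos[(from[i] >> shift) & (BUCKETS - 1)]++] = from[i];

		swap = from;
		from = to;
		to = swap;
	}

	/* After an odd number of passes the result lives in tmp. */
	if (from != offsets)
		memcpy(offsets, from, n * sizeof(*offsets));
}

/*
 * Midpoint for a binary search over unsigned indices, assuming
 * lo <= hi. Writing (lo + hi) / 2 could wrap around when both
 * values are near the top of the unsigned range.
 */
static unsigned midpoint(unsigned lo, unsigned hi)
{
	return lo + (hi - lo) / 2;
}
```

Each digit costs two passes over the list (count, then scatter), which is where the "4n" estimate for k=2 in the commit message comes from. The early exit on the largest value corresponds to the remark that repositories under 4G need only two digits.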


34309 Commits