git-commit-vandalism

Author	SHA1	Message	Date
Linus Torvalds	e869e113c8	block-sha1: Use '(B&C)+(D&(B^C))' instead of '(B&C)|(D&(B|C))' in round 3 It's an equivalent expression, but the '+' gives us some freedom in instruction selection (for example, we can use 'lea' rather than 'add'), and associates with the other additions around it to give some minor scheduling freedom. Suggested-by: linux@horizon.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2009-08-06 13:56:45 -07:00
Linus Torvalds	ab14c823df	block-sha1: macroize the rounds a bit further Avoid repeating the shared parts of the different rounds by adding a macro layer or two. It was already more cpp than C. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2009-08-06 13:56:45 -07:00
Linus Torvalds	7b5075fcfb	block-sha1: re-use the temporary array as we calculate the SHA1 The mozilla-SHA1 code did this 80-word array for the 80 iterations. But the SHA1 state is really just 512 bits, and you can actually keep it in a kind of "circular queue" of just 16 words instead. This requires us to do the xor updates as we go along (rather than as a pre-phase), but that's really what we want to do anyway. This gets me really close to the OpenSSL performance on my Nehalem. Look ma, all C code (ok, there's the rol/ror hack, but that one doesn't strictly even matter on my Nehalem, it's just a local optimization). Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2009-08-06 13:56:45 -07:00
Linus Torvalds	139e3456ec	block-sha1: make the 'ntohl()' part of the first SHA1 loop This helps a teeny bit. But what I -really- want to do is to avoid the whole 80-array loop, and do the xor updates as I go along.. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2009-08-06 13:56:45 -07:00
Junio C Hamano	fd536d3439	block-sha1: minor fixups Bert Wesarg noticed non-x86 version of SHA_ROT() had a typo. Also spell in-line assembly as __asm__(), otherwise I seem to get error: implicit declaration of function 'asm' from my compiler. Signed-off-by: Junio C Hamano <gitster@pobox.com>	2009-08-06 13:56:45 -07:00
Linus Torvalds	b8e48a89b8	block-sha1: try to use rol/ror appropriately Use the one with the smaller constant. It _can_ generate slightly smaller code (a constant of 1 is special), but perhaps more importantly it's possibly faster on any uarch that does a rotate with a loop. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2009-08-06 13:56:45 -07:00
Junio C Hamano	b26a9d5089	block-sha1: undo ctx->size change Undo the change I picked up from the mailing list discussion suggested by Nico, not because it is wrong, but it will be done at the end of the follow-up series. Signed-off-by: Junio C Hamano <gitster@pobox.com>	2009-08-06 13:56:19 -07:00
Linus Torvalds	d7c208a92e	Add new optimized C 'block-sha1' routines Based on the mozilla SHA1 routine, but doing the input data accesses a word at a time and with 'htonl()' instead of loading bytes and shifting. It requires an architecture that is ok with unaligned 32-bit loads and a fast htonl(). Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2009-08-05 19:28:21 -07:00

8 Commits