Our put_be32() routine and its variants (get_be32(), put_be64(), etc)
has two implementations: on some platforms we cast memory in place and
use nothl()/htonl(), which can cause unaligned memory access. And on
others, we pick out the individual bytes using bitshifts.
This introduces extra complexity, and sometimes causes compilers to
generate warnings about type-punning. And it's not clear there's any
performance advantage.
This split goes back to 660231aa97 (block-sha1: support for
architectures with memory alignment restrictions, 2009-08-12). The
unaligned versions were part of the original block-sha1 code in
d7c208a92e (Add new optimized C 'block-sha1' routines, 2009-08-05),
which says it is:
Based on the mozilla SHA1 routine, but doing the input data accesses a
word at a time and with 'htonl()' instead of loading bytes and shifting.
Back then, Linus provided timings versus the mozilla code which showed a
27% improvement:
https://lore.kernel.org/git/alpine.LFD.2.01.0908051545000.3390@localhost.localdomain/
However, the unaligned loads were either not the useful part of that
speedup, or perhaps compilers and processors have changed since then.
Here are times for computing the sha1 of 4GB of random data, with and
without -DNO_UNALIGNED_LOADS (and BLK_SHA1=1, of course). This is with
gcc 10, -O2, and the processor is a Core i9-9880H.
[stock]
Benchmark #1: t/helper/test-tool sha1 <foo.rand
Time (mean ± σ): 6.638 s ± 0.081 s [User: 6.269 s, System: 0.368 s]
Range (min … max): 6.550 s … 6.841 s 10 runs
[-DNO_UNALIGNED_LOADS]
Benchmark #1: t/helper/test-tool sha1 <foo.rand
Time (mean ± σ): 6.418 s ± 0.015 s [User: 6.058 s, System: 0.360 s]
Range (min … max): 6.394 s … 6.447 s 10 runs
And here's the same test run on an AMD A8-7600, using gcc 8.
[stock]
Benchmark #1: t/helper/test-tool sha1 <foo.rand
Time (mean ± σ): 11.721 s ± 0.113 s [User: 10.761 s, System: 0.951 s]
Range (min … max): 11.509 s … 11.861 s 10 runs
[-DNO_UNALIGNED_LOADS]
Benchmark #1: t/helper/test-tool sha1 <foo.rand
Time (mean ± σ): 11.744 s ± 0.066 s [User: 10.807 s, System: 0.928 s]
Range (min … max): 11.637 s … 11.863 s 10 runs
So the unaligned loads don't seem to help much, and actually make things
worse. It's possible there are platforms where they provide more
benefit, but:
- the non-x86 platforms for which we use this code are old and obscure
(powerpc and s390).
- the main caller that cares about performance is block-sha1. But
these days it is rarely used anyway, in favor of sha1dc (which is
already much slower, and nobody seems to have cared that much).
Let's just drop unaligned versions entirely in the name of simplicity.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Our compat/bswap.h lacks the usual preprocessor guards against multiple
inclusion. This usually isn't an issue since it only gets included from
git-compat-util.h, which has its own guards. But it would produce
redeclaration errors if any file included it separately.
Our hdr-check target would complain about this, except that it currently
skips items in compat/ entirely.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a new get_be64 macro to enable 64 bit endian conversions on memory
that may or may not be aligned.
Signed-off-by: Ben Peart <benpeart@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Simplify the implementation and allow callers to use expressions with
side-effects by turning the macros get_be16, get_be32 and put_be32 into
inline functions.
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The pointer p is dereferenced and we get an unsigned char. Before
shifting it's automatically promoted to int. Left-shifting a signed
32-bit value bigger than 127 by 24 places is undefined. Explicitly
convert to a 32-bit unsigned type to avoid undefined behaviour if
the highest bit is set.
Found with Clang's UBSan.
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The byte-swapping code automatically decides, based on the
platform, whether it is sensible to cast and do a potentially
unaligned ntohl(), or to pick individual bytes out of an
array.
It can be handy to override this decision, though, when
turning on compiler flags that will complain about unaligned
loads (such as -fsanitize=undefined). This patch adds a
macro check to make this possible.
There's no nice Makefile knob here; this is for prodding at
Git's internals, and anybody using it can set
"-DNO_UNALIGNED_LOADS" in the same place they are setting up
"-fsanitize".
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There is no /usr/include/endian.h equivalent on z/OS, but the
compiler will define macros to indicate endianness on host and
target hardware. This adds a test for these macros as a last
resort for determining byte order.
Signed-off-by: David Michael <fedora.dm0@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The changes to make detection of endianness more portable had a bug
that breaks on (at least) Solaris x86.
The bug appears to be a simple copy/paste typo. It checks for
_BIG_ENDIAN and not _LITTLE_ENDIAN for both the case where we would
decide the system is big endian and little endian. Instead, the
second test should be for _LITTLE_ENDIAN and not _BIG_ENDIAN.
Two fixes were possible:
1. Change the negation order of the conditions in the second test.
2. Reverse the order of the conditions in the second test.
Use the second option so that the condition we expect is always a
positive check.
Signed-off-by: Ben Walton <bdwalton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The previous commit swaps the order we check the macros defined by
the compiler and the system headers from the original. Since the
order of check should not matter (i.e. it is insane to define both
__BIG_ENDIAN and friends and BIG_ENDIAN and friends and in a
conflicting way), it is the most conservative thing to do not to
change it.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commit d60c49c (read-cache.c: allow unaligned mapping of the
index file, 2012-04-03) introduced helpers to access
unaligned data. However, we already have get_be32, which has
a few advantages:
1. It's already written, so we avoid duplication.
2. It's probably faster, since it does the endian
conversion and the alignment fix at the same time.
3. The get_be32 code is well-tested, having been in
block-sha1 for a long time. By contrast, our custom
helpers were probably almost never used, since the user
needed to manually define a macro to enable them.
We have to add a get_be16 implementation to the existing
get_be32, but that is very simple to do.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The BLK_SHA1 code has optimized wrappers for doing endian
conversions on memory that may not be aligned. Let's pull
them out so that we can use them elsewhere, especially the
time-tested list of platforms that prefer each strategy.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The POSIX standard doesn't currently define a `ntohll`/`htonll`
function pair to perform network-to-host and host-to-network
swaps of 64-bit data. These 64-bit swaps are necessary for the on-disk
storage of EWAH bitmaps if they are not in native byte order.
Many thanks to Ramsay Jones <ramsay@ramsay1.demon.co.uk> and
Torsten Bögershausen <tboegi@web.de> for cygwin/mingw/msvc
portability fixes.
Signed-off-by: Vicent Marti <tanoku@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Without this change, gcc -pedantic warns:
cache.h: In function 'ce_to_dtype':
cache.h:270:21: warning: ISO C forbids braced-groups within expressions [-pedantic]
An inline function is more readable anyway.
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since commit 0fcabdeb52, compat/bswap.h
redefined htonl and ntohl to bswap32 not only if bswap32 has been
defined earlier in compat/bswap.h (which is done only on selected
platforms), but also if bswap32 has been defined anywhere else. This
broke Git at least for NetBSD systems running on big-endian machines
(where ntohl and htonl should, of course, be NOOPs), since NetBSD
defines a bswap32 macro in the system headers.
So, we now undefine any previously defined bswap32 in compat/bswap.h
before defining our own.
Signed-off-by: Holger Weiß <holger@zedat.fu-berlin.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There are some places in git where a long is passed to htonl/ntohl. llvm
doesn't support matching operands of different bitwidths intentionally.
This patch fixes the build with llvm-gcc (and clang) on x86_64.
Signed-off-by: Benjamin Kramer <benny.kra@googlemail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When compiling with MSVC on x86-compatible, use an intrinsic for byte swapping.
In contrast to the GCC path, we do not prefer inline assembly here as it is not
supported for the x64 platform.
Signed-off-by: Sebastian Schuberth <sschuberth@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
commit 51ea551 ("make sure byte swapping is optimal for git"
2009-08-18) introduced a "sane definition for ntohl()/htonl()"
for use on some GNU C platforms. Unfortunately, for some of
these platforms, this results in the introduction of a problem
which is essentially the reverse of a problem that commit 6e1c234
("Fix some warnings (on cygwin) to allow -Werror" 2008-07-3) was
intended to fix.
In particular, on platforms where the uint32_t type is defined
to be unsigned long, the return type of the new ntohl()/htonl()
is causing gcc to issue printf format warnings, such as:
warning: long unsigned int format, unsigned int arg (arg 3)
(nine such warnings, covering six different files). The earlier
commit (6e1c234) needed to suppress these same warnings, except
that the types were in the opposite direction; namely the format
specifier ("%u") was 'unsigned int' and the argument type (ie the
return type of ntohl()) was 'long unsigned int' (aka uint32_t).
In order to suppress these warnings, the earlier commit used the
(C99) PRIu32 format specifier, since the definition of this macro
is suitable for use with the uint32_t type on that platform.
This worked because the return type of the (original) platform
ntohl()/htonl() functions was uint32_t.
In order to suppress these warnings, we change the return type of
the new byte swapping functions in the compat/bswap.h header file
from 'unsigned int' to uint32_t.
Signed-off-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk>
Acked-by: Nicolas Pitre <nico@fluxnic.net>
Signed-off-by: Jeff King <peff@peff.net>
We rely on ntohl() and htonl() to perform byte swapping in many places.
However, some platforms have libraries providing really poor
implementations of those which might cause significant performance
issues, especially with the block-sha1 code.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>