From: alpine-mips-patches
Date: Fri, 14 Dec 2018 10:07:39 +0000
Subject: [alpine-aports] [PATCH] community/xxhash: fix 20x speed degradation on x86*, upgrade to 0.6.5
To: alpine-aports@lists.alpinelinux.org

Yes, it is 20 times slower on x86* than it should be. Because
XXH_FORCE_ALIGN_CHECK defaults to 0 on x86, xxhash.c always uses the
"safe" memcpy()-based methods for unaligned memory access (XXH_readXX),
regardless of input alignment. This ends up with real memcpy() calls in
the hot path (even with -O2).

The bug affects Alpine x86* for both aligned and unaligned inputs. It is
present not just in edge but at least in 3.8 too, i.e. it was not
introduced in 0.6.5. Other architectures are severely affected for
unaligned inputs only.

The fix relaxes the condition that selects XXH_FORCE_MEMORY_ACCESS=1.
The XXH_readXX methods based on __attribute__((__packed__)) are then used
with any compiler that defines __GNUC__, on every architecture except
ARMv6 (which is covered by its own, earlier case). This is safe and fast
because the compiler will do one of the following:
- use direct memory access instructions on capable architectures such as
  aarch64, armv7, ppc64le, s390x and x86*, regardless of input alignment;
- or use the relatively fast LWL/LWR instructions on mips* with unaligned
  input;
- or use byte loads/stores combined with shifts/ORs on armel with
  unaligned input, which is still faster than a memcpy() call.

All aports that use xxhash.c are likely affected.
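The C sketch below makes the two XXH_readXX strategies described above concrete. It is written under stated assumptions and is not the actual xxhash.c code. The names read32_memcpy, read32_packed, read32_dispatch and SKETCH_FORCE_ALIGN_CHECK are hypothetical; they only model the shape of the memcpy()-based reader, the __attribute__((__packed__))-based reader, and the effect of the align check being disabled. A GCC-compatible compiler is assumed for the packed attribute.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for XXH_FORCE_ALIGN_CHECK; 0 mirrors the x86
 * default described in the commit message. */
#define SKETCH_FORCE_ALIGN_CHECK 0

/* "Safe" reader: copy the bytes with memcpy(). Correct for any
 * alignment; whether this becomes a single load or a real library call
 * depends on the compiler and the build flags. */
static uint32_t read32_memcpy(const void *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Packed reader (GNU C extension): the attribute tells the compiler the
 * member may be misaligned, so it emits whatever unaligned-safe load
 * sequence the target supports. */
typedef union {
    uint32_t u32;
} __attribute__((__packed__)) unaligned_u32;

static uint32_t read32_packed(const void *p)
{
    return ((const unaligned_u32 *)p)->u32;
}

/* With the align check disabled, even 4-byte-aligned input takes the
 * memcpy() path; only a non-zero check would enable the plain load. */
static uint32_t read32_dispatch(const void *p)
{
    if (SKETCH_FORCE_ALIGN_CHECK && ((uintptr_t)p & 3u) == 0)
        return *(const uint32_t *)p; /* aligned fast path (unused here) */
    return read32_memcpy(p);
}
```

All three readers return the same value for any offset into a buffer; only the generated code differs. Compiling the sketch with `gcc -O2 -S` and inspecting the assembly is one way to see whether a memcpy call survives on a given toolchain and set of flags.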
Among the affected aports, community/zstd suffers too, though less
severely (~15% difference for "zstd -t" on a big archive), and main/lz4
runs at half speed on the basic compression levels.

Other aport changes:
- modernize (drop the "|| return 1" suffixes);
- enable check(): it is short and fast, so it is suitable for slow
  builders too.

The Python part is left intact, although a newer version exists.
---
 community/xxhash/APKBUILD                     | 32 +++++++++++--------
 ...ft-XXH_FORCE_MEMORY_ACCESS-condition.patch | 14 ++++++++
 2 files changed, 33 insertions(+), 13 deletions(-)
 create mode 100644 community/xxhash/lift-XXH_FORCE_MEMORY_ACCESS-condition.patch

diff --git a/community/xxhash/APKBUILD b/community/xxhash/APKBUILD
index a25c6d7256..781d6254a9 100644
--- a/community/xxhash/APKBUILD
+++ b/community/xxhash/APKBUILD
@@ -2,10 +2,10 @@
 # Maintainer: Stuart Cardall
 pkgname=xxhash
 _pkgname=xxHash
-pkgver=0.6.2
+pkgver=0.6.5
 _pypkg=python-xxhash
 _pyver=0.6.1
-pkgrel=1
+pkgrel=0
 pkgdesc="Extremely fast non-cryptographic hash algorithm"
 url="http://www.xxhash.com"
 arch="all"
@@ -15,6 +15,7 @@ makedepends="python2-dev python3-dev py-setuptools"
 subpackages="$pkgname-dev $pkgname-doc py2-$pkgname:_py2 py3-$pkgname:_py3"
 source="$_pkgname-$pkgver.tar.gz::https://github.com/Cyan4973/$_pkgname/archive/v$pkgver.tar.gz
 	$_pypkg-$pkgver.tar.gz::https://github.com/ifduyue/$_pypkg/archive/v$_pyver.tar.gz
+	lift-XXH_FORCE_MEMORY_ACCESS-condition.patch
 	"
 builddir="$srcdir/"$_pkgname-$pkgver
 pybuilddir="$srcdir/"$_pypkg-$_pyver
@@ -22,14 +23,18 @@ pybuilddir="$srcdir/"$_pypkg-$_pyver
 
 build() {
 	cd "$builddir"
 	sed -i 's|--leak-check=yes|-v --leak-check=full --show-leak-kinds=all|' Makefile
-# make test || return 1
-	make xxhsum || return 1
+	make CPPFLAGS= xxhsum
 	cd "$pybuilddir"
 	ln -s "$srcdir"/$_pkgname-$pkgver/xxhash.c ./xxhash/xxhash.c
 	ln -s "$srcdir"/$_pkgname-$pkgver/xxhash.h ./xxhash/xxhash.h
-	python2 setup.py build || return 1
-	python3 setup.py build || return 1
+	python2 setup.py build
+	python3 setup.py build
+}
+
+check() {
+	cd "$builddir"
+	make check
 }
 
 package() {
@@ -38,11 +43,11 @@ package() {
 	mkdir -p "$pkgdir"/usr/include/xxhash
 	mkdir -p "$pkgdir"/usr/share/man/man1
 	mkdir -p "$pkgdir"/usr/share/doc/xxhash
-	install -m755 xxhsum "$pkgdir"/usr/bin || return 1
-	install -m644 xxhsum.1 "$pkgdir"/usr/share/man/man1 || return 1
-	install -m644 LICENSE "$pkgdir"/usr/share/doc/xxhash || return 1
-	install -m644 xxhash.h "$pkgdir"/usr/include/xxhash || return 1
-	install -m644 xxhash.c "$pkgdir"/usr/include/xxhash || return 1
+	install -m755 xxhsum "$pkgdir"/usr/bin
+	install -m644 xxhsum.1 "$pkgdir"/usr/share/man/man1
+	install -m644 LICENSE "$pkgdir"/usr/share/doc/xxhash
+	install -m644 xxhash.h "$pkgdir"/usr/include/xxhash
+	install -m644 xxhash.c "$pkgdir"/usr/include/xxhash
 }
 
 _py2() {
@@ -63,5 +68,6 @@ _py() {
 	$python setup.py install --prefix=/usr --root="$subpkgdir"
 }
 
-sha512sums="1e8017f78baf5747f739d6ab0c6c3ce51e4ddf53bd0aced3e2495fceefea23b408e395ff2f38681ad54e8588525fa12c13b08c1ff5fabf1df75044525c15e781  xxHash-0.6.2.tar.gz
-72a99d744ccaac830e9789053acb9728b2da457c7841e2aae96e9748450f09366b9830f6d92b62ac494e938f43c1fea7910c9d5257824ae33c1fe48f199ed9cc  python-xxhash-0.6.2.tar.gz"
+sha512sums="085643b52e091ac0eedd54c4459220b3643d825ca71a11e952d00ea2041c570ff57d8553d0378f34e038ca9ee3b40d2048ed02d44d5aff1fbfcbf5e642487ba0  xxHash-0.6.5.tar.gz
+72a99d744ccaac830e9789053acb9728b2da457c7841e2aae96e9748450f09366b9830f6d92b62ac494e938f43c1fea7910c9d5257824ae33c1fe48f199ed9cc  python-xxhash-0.6.5.tar.gz
+5503fc4177bbbc8ebac3c921be1a560b7197d1e66cb94064013fa5df750c6659520bb8ddec689b2b3ccb51cec3088508c7dce4bc2cf8c6127053d96e39cd7e6e  lift-XXH_FORCE_MEMORY_ACCESS-condition.patch"
diff --git a/community/xxhash/lift-XXH_FORCE_MEMORY_ACCESS-condition.patch b/community/xxhash/lift-XXH_FORCE_MEMORY_ACCESS-condition.patch
new file mode 100644
index 0000000000..581a44777e
--- /dev/null
+++ b/community/xxhash/lift-XXH_FORCE_MEMORY_ACCESS-condition.patch
@@ -0,0 +1,14 @@
+--- a/xxhash.c
++++ b/xxhash.c
+@@ -54,10 +54,7 @@
+ || defined(__ARM_ARCH_6K__) || defined(__ARM_ARCH_6Z__) \
+ || defined(__ARM_ARCH_6ZK__) || defined(__ARM_ARCH_6T2__) )
+ # define XXH_FORCE_MEMORY_ACCESS 2
+-# elif (defined(__INTEL_COMPILER) && !defined(_WIN32)) || \
+- (defined(__GNUC__) && ( defined(__ARM_ARCH_7__) || defined(__ARM_ARCH_7A__) \
+- || defined(__ARM_ARCH_7R__) || defined(__ARM_ARCH_7M__) \
+- || defined(__ARM_ARCH_7S__) ))
++# elif (defined(__INTEL_COMPILER) && !defined(_WIN32)) || defined(__GNUC__)
+ # define XXH_FORCE_MEMORY_ACCESS 1
+ # endif
+ #endif
-- 
2.19.2
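The selection cascade that the patch leaves in xxhash.c can be mirrored in a small standalone C sketch. This shows which access mode a given toolchain would pick. Several pieces are assumptions, not upstream code:

- The macro SKETCH_MEMORY_ACCESS and the helpers sketch_memory_access_mode and sketch_memory_access_name are hypothetical.
- The opening `#if` line (with __ARM_ARCH_6__ and __ARM_ARCH_6J__) lies outside the patch context, so its exact form here is assumed.
- The explicit `#else` fallback to 0 models the memcpy()-based default that the commit message describes for x86 before the fix.

```c
/* Hypothetical mirror of the patched XXH_FORCE_MEMORY_ACCESS cascade.
 * The first ARMv6 line is assumed; it is not shown in the patch. */
#ifndef SKETCH_MEMORY_ACCESS /* may be overridden with -D on the command line */
#  if defined(__GNUC__) && ( defined(__ARM_ARCH_6__) || defined(__ARM_ARCH_6J__) \
      || defined(__ARM_ARCH_6K__) || defined(__ARM_ARCH_6Z__) \
      || defined(__ARM_ARCH_6ZK__) || defined(__ARM_ARCH_6T2__) )
#    define SKETCH_MEMORY_ACCESS 2 /* ARMv6: its own, earlier case */
#  elif (defined(__INTEL_COMPILER) && !defined(_WIN32)) || defined(__GNUC__)
#    define SKETCH_MEMORY_ACCESS 1 /* any __GNUC__ compiler: packed reads */
#  else
#    define SKETCH_MEMORY_ACCESS 0 /* assumed fallback: memcpy() reads */
#  endif
#endif

/* Returns the mode selected at compile time. */
static int sketch_memory_access_mode(void)
{
    return SKETCH_MEMORY_ACCESS;
}

/* Human-readable label for a mode (labels are illustrative only). */
static const char *sketch_memory_access_name(int mode)
{
    switch (mode) {
    case 2:
        return "direct";
    case 1:
        return "packed";
    default:
        return "memcpy";
    }
}
```

Compiled with GCC or Clang (which also defines __GNUC__) on x86_64, sketch_memory_access_mode() returns 1. Under the pre-patch condition, __GNUC__ only selected mode 1 together with an ARMv7 macro, so the same build would have fallen through to the memcpy() mode.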