From: Oliver Smith
To: ~alpine/devel@lists.alpinelinux.org
Subject: Re: Distro optimization flags
Date: Thu, 13 Aug 2020 15:39:12 +0200
References: <1593625212.dirkptm3b0.none.ref@localhost> <1593625212.dirkptm3b0.none@localhost>
In-Reply-To: <1593625212.dirkptm3b0.none@localhost>
MIME-Version: 1.0
Content-Type:
text/plain; charset=utf-8
Content-Language: en-US
Content-Transfer-Encoding: 8bit

Hi all,

while I can't look into this in detail right now, I'd like to share a
data point. I just switched the CI job of a Python program from Debian
stretch to Alpine 3.12 and found that the test suite takes almost 8x the
time now (~8 min instead of ~1 min).

https://gitlab.com/postmarketOS/build.postmarketos.org/-/commit/bc3567ce2216226e78f0e31a9da22f3049f94c64

I wonder if compiling Python with different flags already makes a big
difference, maybe I'll try it out at some point.

Best regards,
Oliver

Alex Xu (Hello71):
> Recently there was some discussion on #alpine-devel about optimization
> flags. I think it's worth looking at this issue more closely.
>
> === Rationale ===
>
> -Os is much slower than -O2. I recompiled gcc 9.3.0-r3 from head with
> arguments {3} from below and tested compiling Linux 5.7.7 allnoconfig from
> tmpfs on edge with make -j4. On my Intel laptop, edge gcc takes about 45
> seconds, O2 gcc takes 39 seconds, and Debian sid takes only 30 seconds.
> On my Ryzen desktop, edge takes 38 seconds, O2 takes 33 seconds, and
> Debian takes only 22 seconds. In other words, O2 is about a 15% speedup,
> and LTO is another 30-50% on top of that.
>
> https://lore.kernel.org/lkml/20110323211415.GA8791@elte.hu/ from 2011
> says that the kernel ran 'hackbench 15' 10% faster using -O2.
>
> http://web.archive.org/web/20200408145313/https://rv8.io/bench from 2017
> appears to say that rv8 ran about 25% faster using -O2 compared to -Os.
>
> === Drawbacks ===
>
> Obviously, the main issue with this change is increased code size.
> However, this issue is likely less severe than presented at [1],
> because:
>
> 1. libtracker and some other packages had wrong APKBUILDs that didn't
>    strip libs. I think -O2 causes slightly larger debug tables to be
>    generated.
>    I have submitted merge requests to fix the packages I
>    have found, and we may fix abuild to not require special ordering of
>    subpackages in these cases.
>
> 2. It is possible to use a more limited -O2, which does not cause as
>    much code ballooning. I got this idea from [2], which is a bad idea
>    to do in a specific package but seems reasonable system-wide. The
>    -O2 sub-flags disabled this way give a small improvement on old
>    Intel processors, but actually slow down AMD processors and
>    significantly increase code size.
>
> 3. LTO is roughly as powerful at reducing code size as O2 is at
>    increasing it.
>
> I checked the size of attica (example from [1]) with these configurations.
> Column 1 is package size, column 2 is installed size as reported by apk,
> and column 3+ is the CFLAGS/CXXFLAGS.
>
> {1} 165461 585728 -Os
> {2} 225285 823296 -O2
> {3} 198665 757760 -O2 -fno-align-jumps -fno-align-functions -fno-align-loops -fno-align-labels -fno-prefetch-loop-arrays -freorder-blocks-algorithm=simple
> {4} 175413 614400 -O2 -flto -fno-align-jumps -fno-align-functions -fno-align-loops -fno-align-labels -fno-prefetch-loop-arrays -freorder-blocks-algorithm=simple
> {5} 176036 675840 -O2 -fno-asynchronous-unwind-tables -fno-align-jumps -fno-align-functions -fno-align-loops -fno-align-labels -fno-prefetch-loop-arrays -freorder-blocks-algorithm=simple
> {6} 154055 540672 -O2 -flto -fno-asynchronous-unwind-tables -fno-align-jumps -fno-align-functions -fno-align-loops -fno-align-labels -fno-prefetch-loop-arrays -freorder-blocks-algorithm=simple
>
> gcc size is harder to measure here, as I built gcc without most
> languages. The size of usr/libexec/gcc increased from 43076k excluding
> cc1obj and d21 to 49144k excluding cc1plus. However, the latter number
> may not be accurate, as for some reason my attica -Os is a different
> size from the edge attica.
>
> === Analysis ===
>
> Unfortunately, it doesn't seem safe to set -fno-asynchronous-unwind-tables
> globally.
> I provide it here only as a reference (and because I did the
> benchmark before looking up exactly what the flag does).
>
> LTO is a can of worms that I think is definitely worth opening at some
> point, but should wait at least until both musl 1.2 and gcc 10 are done,
> which I gather will take some time. Additionally, it is somewhat
> orthogonal to -Ox. So, the question now is whether a 10-25% increase in
> performance justifies a 15-30% increase in code size.
>
> There is also a third option: we can use -O2 in some common CPU-heavy
> programs and libraries, such as gcc and openssl. Alpine already uses
> default optimization for musl, which I think works out to -Os for most
> components and -O3 for performance-sensitive areas. It would be great if
> all packages could do this, but it also sounds like way too much work to
> patch every single package (and probably PGO is the right answer there
> anyways).
>
> There are also probably other compile flags that we should be looking
> at, such as security flags, or linker flags (-Wl,--hash-style=gnu,
> -Wl,-O, etc). However, I didn't investigate those at this time.
>
> === Other distros ===
>
> Although I didn't do much research, I think other distros did not
> carefully select their optimization flags (as opposed to security
> flags). Most mainstream distros seem to basically use whatever gcc gives
> them for -O2. Clear Linux seems to set everything to MAXIMUM
> OPTIMIZATION. Gentoo recommends -O2 -march=native -pipe and punts the
> decision to the user. OpenWRT uses -Os, which can be overridden
> per-target, although I couldn't find any targets overriding the
> optimization flags.
>
> === Limitations ===
>
> These benchmarks are obviously very limited. However, I don't want to go
> down the path of extensive benchmarks just to find people coming out of
> the woodwork and complaining that a 20% increase in code size (i.e.
> excluding scripts, docs, FS overhead, etc) overflows their hard drives.
>
> Additionally, whoever desperately needs that extra few dozen megabytes
> should be using squashfs or zstd apk, so the uncompressed/gzip numbers
> are not that useful.
>
> === Conclusions ===
>
> Personally, I think a 15% speedup is very much worth a 15% increase in
> the small portion of my storage used for storing programs. I definitely
> think that the optimization level for gcc itself should be changed, and
> building it with LTO should be fixed/implemented as soon as possible. I
> certainly hope that nobody is installing gcc on their minimal IoT
> systems or whatever that cannot spare 10 MB of space. (Also, those
> people are wasting space already on Obj-C and D support.)
>
> In my opinion, anybody that doesn't want to use an extra few dozen
> megabytes of space either should care more about the extra power
> consumption, or should be using a custom OpenWRT or Buildroot anyways,
> where they can customize everything.
>
> [1] https://lists.alpinelinux.org/~alpine/devel/%3C2896c13070c508a49cbaa72c8fb7f34ea947358b.camel%40cogitri.dev%3E
> [2] https://github.com/richfelker/mallocng-draft/commit/a9187f0387dcbb77f1f7e4d7774602fd394fb27b
>
> Cheers,
> Alex.
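
For anyone who wants to check the percentages against the raw numbers in
the quoted message, here is a minimal sketch that recomputes them. The
sizes and build times are copied from the attica table and the kernel
build timings above; the dictionary keys and the function names
(pct_change, speedup_pct, size_growth) are made up for illustration and
do not come from abuild or any other tool.

```python
# Figures copied from the quoted message; all names are illustrative only.

# attica: configuration -> (package size, installed size), in bytes
ATTICA = {
    "{1}": (165461, 585728),  # -Os
    "{2}": (225285, 823296),  # -O2
    "{3}": (198665, 757760),  # limited -O2
    "{4}": (175413, 614400),  # limited -O2 with -flto
    "{5}": (176036, 675840),  # limited -O2, no async unwind tables
    "{6}": (154055, 540672),  # limited -O2 with -flto, no async unwind tables
}

# Linux 5.7.7 allnoconfig build time in seconds, per machine and compiler
BUILD_SECONDS = {
    "Intel laptop": {"edge": 45, "O2": 39, "Debian sid": 30},
    "Ryzen desktop": {"edge": 38, "O2": 33, "Debian sid": 22},
}


def pct_change(new, base):
    """Relative change of new versus base, in percent."""
    return (new - base) / base * 100.0


def speedup_pct(slow_seconds, fast_seconds):
    """How much more work per second the faster run does, in percent."""
    return (slow_seconds / fast_seconds - 1.0) * 100.0


def size_growth(config, baseline="{1}"):
    """Package and installed size change of config versus the baseline."""
    pkg, inst = ATTICA[config]
    base_pkg, base_inst = ATTICA[baseline]
    return pct_change(pkg, base_pkg), pct_change(inst, base_inst)


if __name__ == "__main__":
    for config in ATTICA:
        pkg, inst = size_growth(config)
        print(f"{config}: package {pkg:+.1f}%, installed {inst:+.1f}%")
    for machine, t in BUILD_SECONDS.items():
        o2 = speedup_pct(t["edge"], t["O2"])
        sid = speedup_pct(t["O2"], t["Debian sid"])
        print(f"{machine}: O2 vs edge {o2:.0f}%, Debian sid vs O2 {sid:.0f}%")
```

With these inputs, plain -O2 comes out at roughly +36% package size and
+41% installed size over -Os, the limited -O2 of {3} at roughly +20% and
+29%, and {6} ends up smaller than -Os. The timings reproduce the "about
15%" and "30-50%" figures when expressed as throughput gains.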