For discussion of Alpine Linux development and developer support

[alpine-devel] force compile flag for musl?

Natanael Copa
Message ID
<20171025164614.14c4c57e@ncopa-desktop.copa.dup.pw>
Sender timestamp
1508942774
DKIM signature
missing
Patch: +2 -0
Hi,


I wonder what you think about overriding the -Os compile flag for musl,
and hardcode it to -O2.

I think this makes sense since the functions in libc are used so often
that we want better performance, even at the cost of a slightly bigger
binary.

This means that we override whatever the user has set CFLAGS to
in /etc/abuild.conf.

We already do this with zlib.

What do you think?

diff --git a/main/musl/APKBUILD b/main/musl/APKBUILD
index 1938bbb3ca..193002186d 100644
--- a/main/musl/APKBUILD
+++ b/main/musl/APKBUILD
@@ -54,6 +54,8 @@ build() {
        fi
 
        # note: not autotools
+       # force -O2 compile flag for better performance
+       CFLAGS="-O2" \
        LDFLAGS="$LDFLAGS -Wl,-soname,libc.musl-${CARCH}.so.1" \
        ./configure \
                --build=$CBUILD \

-nc


---
Unsubscribe:  alpine-devel+unsubscribe@lists.alpinelinux.org
Help:         alpine-devel+help@lists.alpinelinux.org
---
A. Wilcox
Message ID
<59F0E348.6020807@adelielinux.org>
In-Reply-To
<20171025164614.14c4c57e@ncopa-desktop.copa.dup.pw>
Sender timestamp
1508959048
DKIM signature
missing
On 25/10/17 09:46, Natanael Copa wrote:
> Hi,
> 
> 
> I wonder what you think about overriding the -Os compile flag for 
> musl, and hardcode it to -O2.


Possibly.  Are there any benchmarks available, maybe using libc-test
or such?


> What do you think?
> 
> +       # force -O2 compile flag for better performance
> +       CFLAGS="-O2" \


No.  Stuff in abuild.conf needs to be preserved and -O2 tacked on the
end (GCC will only honour the last -O flag passed, so this is what you want).

This is INCREDIBLY important to us at Adélie because, for instance,
ppc64 BE requires -fno-inline-small-functions due to a GCC bug; otherwise
it will cause ABI issues with long double.  We also use -march / -mcpu,
which would be discarded here as well, making it slower on some
platforms (-O2 won't help as much on plain x86_32 as it would on
x86_32 with -march=pentium4, for instance).

I agree with the idea of using -O2 but not with the implementation of
blowing away all other CFLAGS.
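A minimal Python sketch of why appending matters, assuming a hypothetical abuild.conf with `CFLAGS="-Os -fomit-frame-pointer -march=pentium4"`. It models only the rule stated above (the last -O option on the command line is the one GCC honours); the helper names are made up for illustration, and this is not how GCC itself parses its options.

```python
import shlex


def effective_opt_level(cflags: str) -> str:
    """Return the -O option that takes effect: the last one given.

    With no -O option at all, GCC's default is -O0.
    """
    level = "-O0"
    for flag in shlex.split(cflags):
        if flag.startswith("-O"):
            level = flag
    return level


def other_flags(cflags: str) -> list:
    """Everything except -O options, e.g. -march/-mcpu or bug workarounds."""
    return [f for f in shlex.split(cflags) if not f.startswith("-O")]


# Hypothetical value from /etc/abuild.conf (illustration only).
user_cflags = "-Os -fomit-frame-pointer -march=pentium4"

replaced = "-O2"                  # what the proposed patch does
appended = user_cflags + " -O2"   # preserve abuild.conf, tack -O2 on the end

for name, cflags in (("replaced", replaced), ("appended", appended)):
    print(name, effective_opt_level(cflags), other_flags(cflags))
```

Both variants end up at -O2, but only the appended one still carries -march=pentium4 (or a workaround such as -fno-inline-small-functions).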

Best,
--arw

-- 
A. Wilcox (awilfox)
Project Lead, Adélie Linux
http://adelielinux.org
Przemysław Pawełczyk
Message ID
<15f542f8688.cf7ee9da1704.8790472834519953285@zoho.com>
In-Reply-To
<20171025164614.14c4c57e@ncopa-desktop.copa.dup.pw>
Sender timestamp
1508945937
DKIM signature
missing
---- On Wed, 25 Oct 2017 16:46:14 +0200 Natanael Copa <ncopa@alpinelinux.org> wrote ---- 
> I wonder what you think about overriding the -Os compile flag for musl, 
> and hardcode it to -O2. 

I would be very careful with such changes.

There is a misconception that the higher the optimization level, the
faster the generated code.  That is the general -Olevel idea, but not
what is seen in practice.  Gains (or losses) from higher optimization
levels vary between archs and obviously depend on the code that is
being optimized.

Smaller code, besides being smaller, is also more cache-friendly, so -Os
can be faster than -O2 and often is.  OTOH higher optimization levels
for x86-64 usually tend to give better results than on other archs.

There is no rule.  It all depends on:
- source code,
- compiler,
- platform.

>  
> I think this makes sense since the functions in libc are used so often
> that we want better performance, even at the cost of a slightly bigger
> binary.

This makes sense if we really get better performance with -O2 on all
platforms AL supports.  And to be able to confirm that, it has to be
measured.
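One way to do that measurement is to time the same workload repeatedly against each build and look at the spread, not just a single run. A minimal sketch with Python's standard `timeit.repeat`; the `measure` helper and the json.dumps workload are placeholders, and the idea is to run it once on a system with a -Os musl and once with a -O2 musl (on Alpine the interpreter is linked against musl).

```python
import statistics
import timeit


def measure(stmt: str, setup: str = "pass", number: int = 1000,
            repeat: int = 7) -> dict:
    """Run `stmt` `number` times per sample, take `repeat` samples, and
    summarise the per-sample times in seconds."""
    samples = timeit.repeat(stmt, setup=setup, number=number, repeat=repeat)
    return {
        "min": min(samples),
        "mean": statistics.mean(samples),
        # statistics.stdev needs at least two samples
        "stdev": statistics.stdev(samples) if len(samples) > 1 else 0.0,
    }


if __name__ == "__main__":
    # Placeholder workload; substitute whatever libc-heavy code you care about.
    result = measure("json.dumps(list(range(10000)))", setup="import json",
                     number=100, repeat=5)
    print(result)
```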

>  
> This means that we override whatever the user has set CFLAGS to
> in /etc/abuild.conf
>  
> We already do this with zlib. 

zlib is a different beast, because it's computational software.  It's
much more natural to see gains from higher -Olevel in those kinds of apps.

>  
> What do you think? 

There were similar changes in aports for various applications over
recent months, but I haven't seen even one proof behind them.

Performance improvements are important, and they may come from simply
bumping optimization level, but it should be verified, not blindly
assumed.

Regards,
Przemek





William Pitcock
Message ID
<CA+T2pCH2JffrB_V_AUym6982Xxx9jLRDVO8ECVKrH4Z31zso3A@mail.gmail.com>
In-Reply-To
<59F0E348.6020807@adelielinux.org>
Sender timestamp
1508974600
DKIM signature
missing
Hi,

On Wed, Oct 25, 2017 at 2:17 PM, A. Wilcox <awilfox@adelielinux.org> wrote:
> On 25/10/17 09:46, Natanael Copa wrote:
>> Hi,
>>
>>
>> I wonder what you think about overriding the -Os compile flag for
>> musl, and hardcode it to -O2.
>
>
> Possibly.  Are there any benchmarks available, maybe using libc-test
> or such?
>
>
>> What do you think?
>>
>> +       # force -O2 compile flag for better performance
>> +       CFLAGS="-O2" \
>
>
> No.  Stuff in abuild.conf needs to be preserved and -O2 tacked on the
> end (GCC will only honour the last -O flag passed, so this is what you want).
>
> This is INCREDIBLY important to us at Adélie because, for instance,
> ppc64 BE requires -fno-inline-small-functions due to a GCC bug; otherwise
> it will cause ABI issues with long double.  We also use -march / -mcpu,
> which would be discarded here as well, making it slower on some
> platforms (-O2 won't help as much on plain x86_32 as it would on
> x86_32 with -march=pentium4, for instance).

Maybe we should fix the specfiles to bring in
-fno-inline-small-functions on ppc64/ppc64le?

This seems like something that could hurt if you run into it on a ppc64 machine.

William


A. Wilcox
Message ID
<59F1317C.8090706@adelielinux.org>
In-Reply-To
<CA+T2pCH2JffrB_V_AUym6982Xxx9jLRDVO8ECVKrH4Z31zso3A@mail.gmail.com>
Sender timestamp
1508979068
DKIM signature
missing
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

On 25/10/17 18:36, William Pitcock wrote:
> Hi,
> 
> On Wed, Oct 25, 2017 at 2:17 PM, A. Wilcox
> <awilfox@adelielinux.org> wrote:
>> On 25/10/17 09:46, Natanael Copa wrote: This is INCREDIBLY
>> important to us at Adélie because for instance ppc64 BE requires
>> -fno-inline-small-functions due to GCC bug elsewise it will cause
>> ABI issues with long double.
> 
> Maybe we should fix the specfiles to bring in 
> -fno-inline-small-functions on ppc64/ppc64le?
> 
> This seems like something that could hurt if you run into it on a
> ppc64 machine.
> 
> William


I plan on doing more regression testing with different versions and
builds of GCC (at least 5.x, 6.3, and 7.2 in addition to 6.4) to see
if this is a regression or where the bug was introduced.  It would be
much better to find the root cause of this breakage and include a
patch to gcc than to work around it with a specfile.

Also, I have so far not run into any issues with ppc64 BE with
- -fno-inline-small-functions but that could be masking a more subtle
compiler bug.  It's all quite scary, to be blunt... but such is life
with very untested code... don't think anyone has ever used
- -mabi=elfv2 with ppc64 before.

Best,
- --arw


- -- 
A. Wilcox (awilfox)
Project Lead, Adélie Linux
http://adelielinux.org
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2

iQIcBAEBCAAGBQJZ8TF4AAoJEMspy1GSK50UP8oP/ixtkeSXyULt4FENU3JUbXbr
Rh0dBfuy3QHrG7Xhn3Yu/DgYUwAOvbOUzXheuPQo2Yiw8HZ8jbFEZCOzWV4SRFov
XT/DmXx4VF5BBYd05s8Nn0YYQh4t1FhKnzf2ksaiJCI47cx6KprpNbHX5YqDoQue
MntjLdb+1O/EYVIHclvf99lCkajr1VakLC+PJEF0WGpvDhDaTG4q+wLbexsGqgsN
rhKkGxHOfwf9czbZ9W2tij9CHdejb/GdtCnw18JdL15k8cSiexPg0e1Ze6xWcSiK
OSB+wehMU6y3V7NqGealPaeMWOCqAtBDOtALmR+iz6X/n8NNoGobYpenVt8DM6Jv
mfaGAPmcaX4l7diDTW5ly1PhRtUG3LqK32z9Q8R3FBDZ14U48pciXrNbPFSL402q
uk1EDA1Wz76AijRSRa/FyN0Z1DdA8r0ZOzlBueyxCgF/2AfbCJjJeLUZOH+vmkz9
3DB/irbqSvMLrXiDAYbOP+8fk0+sUCanJhJ7WJ00yuFm4Z065DGFE08k0vKTl3gO
UpwnGA91PCSvybbmYyTSs3yiJJjOuIm46R9dcpWWFnRZONu7xZpTCNheUnwoa3Zn
gqmgqho5nYas1H218/D+SV5oohIJ+DFKBt8oYCkYHebux/esR+V9O43X4jlH91Zl
v8pyh2YDYXW2buvE+zUF
=DaQS
-----END PGP SIGNATURE-----


Natanael Copa
Message ID
<20171025214603.00169caf@ncopa-desktop.copa.dup.pw>
In-Reply-To
<15f542f8688.cf7ee9da1704.8790472834519953285@zoho.com>
Sender timestamp
1508960763
DKIM signature
missing
On Wed, 25 Oct 2017 17:38:57 +0200
Przemysław Pawełczyk <przemoc@zoho.com> wrote:

> ---- On Wed, 25 Oct 2017 16:46:14 +0200 Natanael Copa <ncopa@alpinelinux.org> wrote ---- 
> > I wonder what you think about overriding the -Os compile flag for musl, 
> > and hardcode it to -O2.   
> 
> I would be very careful with such changes.
> 
> There is a misconception that the higher the optimization level, the
> faster the generated code.  That is the general -Olevel idea, but not
> what is seen in practice.  Gains (or losses) from higher optimization
> levels vary between archs and obviously depend on the code that is
> being optimized.
> 
> Smaller code, besides being smaller, is also more cache-friendly, so -Os
> can be faster than -O2 and often is.  OTOH higher optimization levels
> for x86-64 usually tend to give better results than on other archs.
> 
> There is no rule.  It all depends on:
> - source code,
> - compiler,
> - platform.

As I understand it, -Os and -O2 are basically the same thing, except
that -O2 enables some additional alignment optimizations.

> > I think this makes sense since the functions in libc are used so
> > often that we want better performance, even at the cost of a
> > slightly bigger binary.
> 
> This makes sense if we really get better performance with -O2 on all
> platforms AL supports.  And to be able to confirm that, it has to be
> measured.

You are right, of course.

I did a quick test with the code that made me think of -O2 in the first place. It is from here:
https://superuser.com/questions/1219609/why-is-the-alpine-docker-image-over-50-slower-than-the-ubuntu-image

BENCHMARK="import timeit; print(timeit.timeit('import json; json.dumps(list(range(10000)))', number=5000))"

# with -O2

$ for i in $(seq 0 6); do python3 -c "$BENCHMARK"; done
8.284425613994244
8.354899992002174
8.359624709992204
8.392496702988865
8.3223694319895
8.285188248992199
8.294311116012977

# with -Os

$ for i in $(seq 0 6); do python3 -c "$BENCHMARK"; done
8.267152725020424
8.260578833986074
8.232940819987562
8.224149581015809
8.27815192801063
8.31035227200482
8.29849099801504

So it looks like, for this specific workload, -Os is actually slightly
faster.
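To put "slightly faster" into numbers, a small Python sketch that summarises the 14 timings posted above (copied verbatim) with the standard `statistics` module. It says nothing about other workloads or architectures.

```python
import statistics

# Seconds per run of the json benchmark, copied from the results above.
o2_samples = [8.284425613994244, 8.354899992002174, 8.359624709992204,
              8.392496702988865, 8.3223694319895, 8.285188248992199,
              8.294311116012977]
os_samples = [8.267152725020424, 8.260578833986074, 8.232940819987562,
              8.224149581015809, 8.27815192801063, 8.31035227200482,
              8.29849099801504]


def summarise(samples):
    """Return (mean, sample standard deviation) in seconds."""
    return statistics.mean(samples), statistics.stdev(samples)


mean_o2, sd_o2 = summarise(o2_samples)
mean_os, sd_os = summarise(os_samples)
# Relative slowdown of -O2 versus -Os, in percent.
slowdown_pct = (mean_o2 - mean_os) / mean_os * 100

print(f"-O2: mean {mean_o2:.3f} s, stdev {sd_o2:.3f} s")
print(f"-Os: mean {mean_os:.3f} s, stdev {sd_os:.3f} s")
print(f"-O2 slower by {slowdown_pct:.1f}%")
```

With these numbers the -O2 build comes out roughly 0.7% slower on average, a gap of about 0.06 s that is comparable to the run-to-run spread of each series, so it is a small effect on one workload rather than a general verdict.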

It just highlights the point that you need to actually measure before
trying to optimize.


> >  
> > This means that we override whatever the user has set CFLAGS to
> > in /etc/abuild.conf
> >  
> > We already do this with zlib.   
> 
> zlib is a different beast, because it's computational software.  It's
> much more natural to see gains from higher -Olevel in those kinds of
> apps.

I'm glad to hear that I'm not totally stupid :)

> 
> >  
> > What do you think?   
> 
> There were similar changes in aports for various applications over
> recent months, but I haven't seen even one proof behind them.
> 
> Performance improvements are important, and they may come from simply
> bumping optimization level, but it should be verified, not blindly
> assumed.

Yeah, let's keep -Os and only change it if it's tested to have any impact.


Jakub Jirutka
Message ID
<DC7CCC66-BC27-4659-8B26-280496BC7879@jirutka.cz>
In-Reply-To
<15f542f8688.cf7ee9da1704.8790472834519953285@zoho.com>
Sender timestamp
1508960983
DKIM signature
missing
Uh, I accidentally sent my response only to specific people and not to the ML. :( So here it is again; sorry for the mess.


>> --- a/main/musl/APKBUILD 
>> +++ b/main/musl/APKBUILD 
>> @@ -54,6 +54,8 @@ build() { 
>>       fi 
>> 
>>       # note: not autotools 
>> +       # force -O2 compile flag for better performance 
>> +       CFLAGS="-O2" \ 
>>       LDFLAGS="$LDFLAGS -Wl,-soname,libc.musl-${CARCH}.so.1" \ 
>>       ./configure \ 
>>               --build=$CBUILD \ 

This is IMO not correct: CFLAGS does not define just -Os but also other flags (-fomit-frame-pointer). Here you replace all the default flags with -O2.
I suggest using `CFLAGS="${CFLAGS/-Os/-O2}"` instead; that’s what I have already used in some abuilds.
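A small Python model of what that substitution does compared with the other two approaches discussed in this thread (overwriting CFLAGS, and appending -O2). `str.replace(old, new, 1)` mirrors the shell's `${var/pattern/string}`, which replaces only the first match; that expansion is a bash-style extension rather than POSIX sh, though Jakub reports using it in abuilds. The CFLAGS values below are hypothetical.

```python
def substitute(cflags: str) -> str:
    """Model of ${CFLAGS/-Os/-O2}: swap the first literal '-Os' for '-O2'."""
    return cflags.replace("-Os", "-O2", 1)


def append(cflags: str) -> str:
    """Model of CFLAGS="$CFLAGS -O2": the trailing -O2 wins in GCC."""
    return f"{cflags} -O2".strip()


def overwrite(cflags: str) -> str:
    """Model of the original patch, CFLAGS="-O2": the input is ignored."""
    return "-O2"


# Hypothetical inputs: Alpine-style defaults, and a user who picked -O3.
for cflags in ("-Os -fomit-frame-pointer", "-O3 -march=native"):
    for fn in (substitute, append, overwrite):
        print(f"{fn.__name__:10} {cflags!r:30} -> {fn(cflags)!r}")
```

The first input shows why substitution and appending both keep -fomit-frame-pointer while overwriting drops it. The second shows where they differ: substitution leaves a user's -O3 alone, whereas appending forces -O2 because it comes last.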

> OTOH higher optimization levels for x86-64 usually tend to give better results than on other archs.

x86_64 is the most used platform, especially when you need performance (and can’t afford IBM’s proprietary architectures). I’d bet that this applies even to Alpine, but I don’t have any numbers.


> There were similar changes in aports for various applications over recent months, but I haven’t seen even one proof behind them.
> 
> Performance improvements are important, and they may come from simply bumping optimization level, but it should be verified, not blindly assumed.

Technically you’re right; it’d indeed be nice to have some real proof. Unfortunately, it’s quite hard to make a good benchmark, and probably none of us has time to do that. :(

However, there’s another reason to prefer -O2: it’s the default optimization level used by almost all upstream projects and even downstream (other distros). So it’s the most tested variant, and if a project cares about performance, it typically optimizes for -O2 (or -O3). But I must mention that I’m really not an expert in this field; I can only say what I see used most, not whether it really makes sense technically.

I’m the one who changed -Os to -O2 in _some_ specific abuilds where performance is important, e.g. x264, opus, qemu, postgresql… I actually tried to find some proof for PostgreSQL, but the only good performance comparison I found [1] compares just -O2, -O3, -O4, -march=native and -flto, not -Os.


Just a note: we already compile many aports with -O2, and most of you don’t even know about it and/or don’t care. These are almost all aports built with CMake. CMake by default doesn’t log what it is really executing unless you set `-DCMAKE_VERBOSE_MAKEFILE=ON` (that’s why I usually enable it). If you enable verbose mode, you will see that -Os is passed to gcc, but it’s followed by -O2 added by CMake. That’s what `-DCMAKE_BUILD_TYPE=Release` does (among other things); it’s a profile that predefines various flags including -O, and it has higher priority than CFLAGS from the environment. If you want -Os, there’s a built-in profile MinSizeRel. You can look at which aports use this flag… ;)

However, as I have discovered over time, MinSizeRel is not always usable. Many projects have very bad CMakeLists: they fully or partially ignore CMAKE_BUILD_TYPE or foolishly assume that the only release profile is Release, so the build sometimes even fails with MinSizeRel.


So, to sum it up: -O2 is the default and most used optimization level in most upstream projects and even other distributions, and we already use it for many aports, whether you’re aware of it or not. So I’d support changing the default to -O2, at least for x86_64, and changing it to -Os in specific abuilds where it makes sense (especially small static binaries).

As a (partial) academic person, I must agree with Przemysław about benchmarking vs. believing, but it’s unfortunately not realistic, at least not for all aports. It’d be really great and useful if someone could measure the difference at least for very core components like musl libc.


I’d like to ask Skarnet and Shiz for their opinion and expertise on this topic.


Jakub

[1]: https://blog.pgaddict.com/posts/compiler-optimization-vs-postgresql

Natanael Copa
Message ID
<20171025215422.3c9ddcfd@ncopa-desktop.copa.dup.pw>
In-Reply-To
<59F0E348.6020807@adelielinux.org>
Sender timestamp
1508961262
DKIM signature
missing
On Wed, 25 Oct 2017 14:17:28 -0500
"A. Wilcox" <awilfox@adelielinux.org> wrote:

> On 25/10/17 09:46, Natanael Copa wrote:
> > Hi,
> > 
> > 
> > I wonder what you think about overriding the -Os compile flag for 
> > musl, and hardcode it to -O2.  
> 
> 
> Possibly.  Are there any benchmarks available, maybe using libc-test
> or such?
> 
> 
> > What do you think?
> > 
> > +       # force -O2 compile flag for better performance
> > +       CFLAGS="-O2" \
> 
> 
> No.  Stuff in abuild.conf needs to be preserved and -O2 tacked on the
> end (GCC will only honour the last -O flag passed, so this is what you want).
> 
> This is INCREDIBLY important to us at Adélie because, for instance,
> ppc64 BE requires -fno-inline-small-functions due to a GCC bug; otherwise
> it will cause ABI issues with long double.  We also use -march / -mcpu,
> which would be discarded here as well, making it slower on some
> platforms (-O2 won't help as much on plain x86_32 as it would on
> x86_32 with -march=pentium4, for instance).

Thanks. This is useful information.
 
> I agree with the idea of using -O2 but not with the implementation of
> blowing away all other CFLAGS.

I was thinking about this, yes. I think something like:

  CFLAGS="$CFLAGS -O2"

would be better. Alternatively, we could have an optional performance override, something like:

  CFLAGS="${CFLAGS_OPT_PERFORMANCE:-$CFLAGS}"

But I think we just keep it simple as it is, because I measured it and
-O2 was actually slightly slower for the specific use case.
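The `${NAME:-default}` expansion in the optional-override idea uses the override when it is set and non-empty, and falls back to `$CFLAGS` otherwise. A minimal Python model of that selection, with the environment as a plain dict; `CFLAGS_OPT_PERFORMANCE` is only the hypothetical variable from this proposal, not an existing abuild setting.

```python
def pick_cflags(env: dict) -> str:
    """Model of CFLAGS="${CFLAGS_OPT_PERFORMANCE:-$CFLAGS}".

    ':-' falls back when the variable is unset *or* empty, which is why
    `or` is used here rather than dict.get's default argument.
    An unset CFLAGS expands to the empty string.
    """
    return env.get("CFLAGS_OPT_PERFORMANCE") or env.get("CFLAGS", "")


print(pick_cflags({"CFLAGS": "-Os -fomit-frame-pointer"}))
print(pick_cflags({"CFLAGS": "-Os", "CFLAGS_OPT_PERFORMANCE": "-O2 -march=native"}))
print(pick_cflags({"CFLAGS": "-Os", "CFLAGS_OPT_PERFORMANCE": ""}))
```

As written, the override replaces CFLAGS entirely rather than adding to it, so it would need to repeat any -march/-mcpu flags or workarounds the builder relies on.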

Thanks!

-nc
