~alpine/devel

[RFC] Enable -fno-plt for x86 and x86_64

Alex Xu (Hello71) <alex_y_xu@yahoo.ca>
## Summary

-fno-plt is a gcc flag that disables the use of PLT stubs for function
calls. I suggest we enable it for x86 and x86_64 only. On aarch64 it
bloats code size substantially, and (as far as I could tell) it is
silently ignored on arm32, mips64, s390x, riscv64, and ppc64le. mips64
and riscv64 have -mno-plt instead, but it significantly increases code
size and instruction count.

## Benefits

Each PLT thunk adds at least 32 bytes to the binary size on x86 and
x86_64. The PLT also occupies valuable instruction cache space and, on
x86, indirect branch predictor space: the actual function call occupies
one slot, and the PLT stub's indirect jump occupies another.

I did a quick test, and building apk-tools on x86_64 with -fno-plt (plus 
the standard abuild.conf flags) reduces package size from 119480 to 
115686 (-3794) and install size from 253648 to 245384 (-8264).
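
To put those numbers in proportion, here is a small sketch that recomputes
the deltas and expresses them as a percentage of the original size. Only
the four byte counts come from the measurement above; the `SIZES` table
and the `savings` helper are names made up for this illustration. Both
reductions come out at roughly 3%.

```python
# Sizes in bytes, copied from the apk-tools x86_64 measurement above.
# The dictionary layout and helper name are illustrative only.
SIZES = {
    "package": (119480, 115686),
    "installed": (253648, 245384),
}


def savings(before, after):
    """Return (bytes saved, percent saved relative to the original size)."""
    saved = before - after
    return saved, 100.0 * saved / before


for name, (before, after) in SIZES.items():
    saved, pct = savings(before, after)
    print(f"{name}: -{saved} bytes ({pct:.2f}%)")
```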

## Downsides

As far as I know, the main reason why -fno-plt is not used by default in
gcc is that it is incompatible with lazy binding. This is not relevant
to Alpine Linux, as musl does not support lazy binding anyway.

There is another, less documented issue: -fno-plt increases code size 
for non-interposable ELF function calls on x86 and x86_64 from 5 bytes 
to 6 bytes. On aarch64, calls double from 8 bytes to 16 bytes, plus 8 
more bytes per function to get the GOT address. On rv64gc, -mno-plt 
inflates calls from 4 bytes to 10 bytes. Note that these numbers are 
with -fPIE; -fno-PIE numbers differ.
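
As a rough way to weigh this cost against the per-thunk saving from the
Benefits section, the following back-of-the-envelope model may help. It
is only a sketch: it assumes every affected call site grows by exactly
the one byte quoted above, takes the 32-byte thunk figure at face value,
and ignores alignment padding, GOT entries, and relocations. The function
names and the example counts are hypothetical, not measurements.

```python
# Per-item figures quoted in this message for x86 and x86_64.
PLT_THUNK_BYTES = 32        # lower bound per removed PLT thunk
CALL_GROWTH_BYTES = 6 - 5   # each affected call grows from 5 to 6 bytes


def net_bytes_saved(thunks_removed, affected_call_sites):
    """Estimated net size change; positive means the binary shrinks."""
    saved = thunks_removed * PLT_THUNK_BYTES
    added = affected_call_sites * CALL_GROWTH_BYTES
    return saved - added


def break_even_calls_per_thunk():
    """Affected call sites per removed thunk at which the model nets zero."""
    return PLT_THUNK_BYTES // CALL_GROWTH_BYTES


# Hypothetical binary: 100 PLT thunks removed, 1500 calls that grow.
print(net_bytes_saved(100, 1500))
print(break_even_calls_per_thunk())
```

Under these assumptions, -fno-plt is a net win on x86 and x86_64 unless a
binary averages more than 32 affected call sites per removed thunk.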

Certain packages may not compile or run properly with -fno-plt, but
there are likely to be few of them. See the next section for more
information.

## Other distros

Arch Linux, which supports only x86_64, has enabled -fno-plt by default
since 2017: https://github.com/archlinux/svntogit-packages/commit/72de4d337e02e5626598038b801517e47988b4c8.

It is disabled for xorg, glibc, valgrind, and openjdk:
https://github.com/archlinux/svntogit-packages/search?q=%22-fno-plt%22

Gentoo does not use -fno-plt, but a third-party overlay enables it
along with other optimization flags such as LTO. Few issues have been
reported with -fno-plt, mostly around the usual suspects that do
sketchy things, such as wine, glibc, and xorg-server:
https://github.com/InBetweenNames/gentooLTO/blob/master/sys-config/ltoize/files/package.cflags/no-plt.conf

I strongly suspect the remaining packages are false positives, since
exceptions are applied for basically anybody who reports an issue, and
they are almost never removed.

OpenWRT has enabled -fno-plt since 2016: 
https://github.com/openwrt/openwrt/commit/fb713ddd4dd49fb60ee4ab732071abf2c3ad5fc5.

As far as I could tell, Debian, Fedora, and openSUSE do not use 
-fno-plt by default. I believe for Debian, this is because BIND_NOW is 
still not enabled by default. For Fedora and openSUSE, I think binary 
size is not a priority.

## Conclusion

I think -fno-plt has moderate benefits and minimal costs for x86 and
x86_64. There is some risk of breaking packages, but I think the risk
is minimal as long as the known-broken packages (xorg, glibc, valgrind,
and openjdk) are fully tested with -fno-plt and have it disabled if
necessary, and the flag overall is given enough testing time in edge
before a stable release.

I am primarily concerned with and have primarily analyzed x86 and 
x86_64, but am not opposed to enabling it also for other architectures 
if someone does better analysis and finds a better cost-benefit ratio.