Date: Sat, 20 Nov 2021 14:34:56 +0100
To: ~alpine/devel@lists.alpinelinux.org
Subject: Thoughts on self-hosting compilers in Alpine
From: Sören Tempel

Hello,

The recent issue with building GHC [1] led to an interesting discussion on IRC regarding the general handling of self-hosting compilers in Alpine. In this ML thread, I want to further discuss a proposal for improving the current packaging of self-hosting compilers in Alpine.

## Current Situation

Alpine actually packages a lot of self-hosting compilers (e.g. community/rust, community/go, community/ghc, testing/cyclone, …). Most of these depend on a previous version of the compiler (as available in the Alpine repositories) for building. For example, community/go provides a virtual package called "go-bootstrap" and itself depends on this virtual package.

## Problem Statement

This leads to three problems:

1. Trust Chain Transparency: Unless you want to go all the way back to an initial non-self-hosting version of a self-hosting compiler, building self-hosting compilers practically requires pre-existing binaries at some point. For example, our initial version of GHC was cross-compiled from the binary provided by Ubuntu using Docker [2]. However, with self-hosting compilers, you need to trust not just the previous version of the compiler but the entire chain of versions used [3]. With our present setup, this chain is obscured and (without looking at the Git history) it is unclear that GHC, for example, was initially bootstrapped from the Ubuntu binary.

2. Dynamic Libraries: If the previous version of the compiler is dynamically linked against libraries packaged by Alpine, that previous version (as required for the build process) will stop working once these libraries are upgraded to an ABI-incompatible version. This is the root cause of the GHC problem mentioned above: when libffi (against which GHC links dynamically) was upgraded from libffi.so.7 to libffi.so.8, the version of GHC available in the repositories became defunct and could not be used to rebuild GHC against libffi.so.8.

3. Builder Setup: When setting up builders for a new Alpine release (as we did a few weeks ago for 3.15), manual intervention is necessary to install an initial version of every packaged self-hosting compiler from Alpine Edge.

## Proposal

I propose improving the packaging of self-hosting compilers by providing two packages for each self-hosting compiler in aports.git:

1. ${repo}/${self_hosting_compiler}-stage0, which builds and packages ${self_hosting_compiler} without requiring a previous version of this compiler to be available in the Alpine repositories. This package should basically follow the bootstrap path recommended by upstream, which differs from compiler to compiler.
   For GHC, upstream recommends using the provided binaries (the ghc-stage0 package would thus package those). For Go, upstream recommends bootstrapping from Go 1.4 (which is still written in C) for most architectures.

2. ${repo}/${self_hosting_compiler}, which is initially (i.e. each time a new builder is set up) compiled using ${self_hosting_compiler}-stage0 but from that point onward is built using the previous version available in the Alpine repositories.

This could be implemented by having both ${self_hosting_compiler} and ${self_hosting_compiler}-stage0 provide a virtual package called ${self_hosting_compiler}-bootstrap. The latter would be assigned the lower $provider_priority (see APKBUILD(5)) to ensure that it is only used if the former isn't available.

This approach would make the trust chain for self-hosting compilers more transparent and would ease setting up new builders. It should also allow us to deal with soname rebuilds of dynamic libraries by rebootstrapping the compiler using -stage0 in these cases.

## Discussion

For many self-hosting compilers, the -stage0 package would likely depend on upstream binaries. In this regard, I want to stress that, even in the current situation, we already trust pre-existing binaries (e.g. the initial Ubuntu GHC binary for our GHC package). Furthermore, there is an existing effort on bootstrappable builds which attempts to minimize the number of pre-existing binaries required for bootstrapping self-hosting compilers [4]. For example, for Rust, people are working on an implementation of the Rust programming language in C++ (mrustc) [5]. Presently, it does not seem to be possible to build the most recent Rust version with mrustc. However, it would technically be possible to use mrustc for community/rust-stage0 as soon as mrustc is reliably capable of initially bootstrapping our packaged version of community/rust.

Thoughts on this proposal?
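To make the proposed fallback concrete, here is a minimal Python sketch of how the two providers of ${self_hosting_compiler}-bootstrap would be chosen. It is a simplified model under stated assumptions, not apk-tools' actual solver: the package names follow the naming scheme proposed above, the priority values 100 and 10 are hypothetical (the proposal only requires -stage0 to have the lower one), and ghc-stage0 is assumed, hypothetically, not to depend on libffi.so.7. It covers the three cases discussed above: a normal rebuild, a rebuild after the libffi soname bump, and a freshly set up builder.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Package:
    """Simplified model of a repository package (not apk-tools' data model)."""
    name: str
    provides: frozenset[str]                      # virtual packages provided
    provider_priority: int                        # higher value wins among providers
    needs_sonames: frozenset[str] = frozenset()   # shared libraries it links against


def installable(pkg: Package, available_sonames: set[str]) -> bool:
    # A dynamically linked compiler is only usable if every soname it
    # needs is still shipped in the repositories.
    return pkg.needs_sonames <= available_sonames


def pick_provider(virtual: str, repo: list[Package], available_sonames: set[str]) -> Package:
    # Keep only installable providers of the virtual package, then
    # prefer the one with the highest provider_priority.
    candidates = [p for p in repo if virtual in p.provides and installable(p, available_sonames)]
    if not candidates:
        raise LookupError(f"nothing provides {virtual!r}")
    return max(candidates, key=lambda p: p.provider_priority)


# Hypothetical priorities; ghc-stage0 is assumed to need no libffi soname.
ghc = Package("ghc", frozenset({"ghc-bootstrap"}), 100, frozenset({"libffi.so.7"}))
ghc_stage0 = Package("ghc-stage0", frozenset({"ghc-bootstrap"}), 10)
repo = [ghc, ghc_stage0]

before = pick_provider("ghc-bootstrap", repo, {"libffi.so.7"})  # normal rebuild
after = pick_provider("ghc-bootstrap", repo, {"libffi.so.8"})   # after the soname bump
fresh = pick_provider("ghc-bootstrap", [ghc_stage0], set())     # new builder, no ghc yet
print(before.name, after.name, fresh.name)  # ghc ghc-stage0 ghc-stage0
```

The key design choice in this model is filtering out uninstallable providers before maximizing priority: the higher-priority ghc wins whenever it is usable, and -stage0 is picked automatically only when ghc is missing (new builder) or broken by an ABI change (soname bump), which is the behaviour the proposal relies on.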
## Further Reading

* Ken Thompson's Turing Award lecture "Reflections on Trusting Trust": https://doi.org/10.1145/358198.358210
* The Bootstrappable Builds project: https://bootstrappable.org/

[1]: https://lists.alpinelinux.org/~alpine/devel/%3C20211021133615.32f08070%40ncopa-desktop.lan%3E
[2]: https://gitlab.alpinelinux.org/alpine/aports/-/commit/8488e8747aa7cb275882157b8a4a53c274c71927
[3]: https://doi.org/10.1145/358198.358210
[4]: https://bootstrappable.org/
[5]: https://github.com/thepowersgang/mrustc