Date: Sat, 20 Nov 2021 11:47:30 -0600 (CST)
From: Ariadne Conill
To: Sören Tempel
cc: ~alpine/devel@lists.alpinelinux.org
Subject: Re: Thoughts on self-hosting compilers in Alpine
In-Reply-To: <33KG0XO61I4IL.2Z7RTAZ5J3SY6@8pit.net>
Message-ID: <9c3ab565-3489-62e4-c15e-a7af3c2ef569@dereferenced.org>
Hi,

On Sat, 20 Nov 2021, Sören Tempel wrote:

> Hello,
>
> The recent issue with building GHC [1] led to an interesting discussion
> on IRC regarding the general handling of self-hosting compilers in
> Alpine. In this ML thread, I want to further discuss a proposal for
> improving the current packaging of self-hosting compilers in Alpine.
>
> ## Current Situation
>
> Alpine actually packages a lot of self-hosting compilers (e.g.
> community/rust, community/go, community/ghc, testing/cyclone, …). Most
> of these depend on a previous version of the compiler (as available in
> the Alpine repositories) for building. For example, community/go
> provides a virtual package called "go-bootstrap" and itself depends on
> this virtual package.

This also applies to gcc and clang.

> ## Problem Statement
>
> This leads to three problems:
>
> 1. Trust Chain Transparency: Unless you want to go all the way back to
> an initial non-self-hosting version of a self-hosting compiler, building
> self-hosting compilers practically requires pre-existing binaries at
> some point. For example, our initial version of GHC was cross-compiled
> from the binary provided by Ubuntu using Docker [2]. However, with
> self-hosting compilers, you need to trust not just the previous version
> of the compiler but the entire chain of versions used [3]. With our
> present setup, this chain is obscured and (without looking at the Git
> history) it is unclear that GHC (for example) was initially
> bootstrapped from the Ubuntu binary.

Right.

> 2. Dynamic Libraries: If the previous version of the compiler is linked
> dynamically against libraries packaged by Alpine, the previous
> version of the compiler (as required for the build process) will stop
> working if these libraries are upgraded to a non-ABI-compatible
> version.
> This is the root cause of the GHC problem mentioned above: When
> libffi (against which GHC links dynamically) was upgraded from
> libffi.so.7 to libffi.so.8, the version of GHC available in the
> repositories became defunct and could not be used to rebuild GHC
> against libffi.so.8.

The solution here is to version key system libraries (e.g. a libffi3.3
package) so that the old soname stays installable. This isn't done as
much as it needs to be. Every time I bring this up, I am told that we
don't need to do something we need to be doing. This proposal
(rebootstrapping) would be far more expensive than just doing the right
thing.

> 3. Builder setup: When setting up builders for a new Alpine release
> (e.g. as we did a few weeks ago for 3.15), manual intervention is
> necessary to install an initial version of every packaged self-hosting
> compiler from Alpine Edge.

An alternative would be to provide a bootstrap repo containing only the
bootstrap compilers and their dependencies, and no other packages from
edge.

> ## Proposal
>
> I propose improving packaging of self-hosting compilers by providing two
> packages for each self-hosting compiler in aports.git:
>
> 1. ${repo}/${self_hosting_compiler}-stage0 which builds and packages
> $compiler without requiring a previous version of this compiler to be
> available in the Alpine repository. This package should basically
> follow the bootstrap path recommended by upstream, which depends on the
> specific compiler. For GHC, upstream recommends using provided binaries
> (the ghc-stage0 package would thus package those). For Go, upstream
> recommends bootstrapping from Go 1.4 (which is still written in C) for
> most architectures.

What happens with architectures that were added *late* to the bootstrap
path, such as ppc64le? A stage0 would not exist for these (Go 1.4, for
instance, has no ppc64le port).

> 2. ${repo}/${self_hosting_compiler} which is initially (i.e.
> each time a new builder is set up) compiled using
> ${self_hosting_compiler}-stage0 but from that point onward built from
> the previous version available in the Alpine repositories.
>
> This could be implemented by having both ${self_hosting_compiler} and
> ${self_hosting_compiler}-stage0 provide a virtual package called
> ${self_hosting_compiler}-bootstrap. The latter would be assigned the
> lower $provider_priority (see APKBUILD(5)) to ensure it is only used if
> the former isn't available.
>
> This approach would make the trust chain for self-hosting compilers
> more transparent, it would ease setting up new builders, and it should
> also allow us to deal with soname rebuilds for dynamic libraries by
> rebootstrapping the compiler using -stage0 in these cases.

As noted above, rebootstrapping the whole chain of compilers would be
highly expensive for languages like Rust. Each step would take hours.

> ## Discussion
>
> For many self-hosting compilers, the -stage0 package would likely
> depend on upstream binaries. In this regard I want to stress that, even
> with the current situation, we already trust pre-existing binaries
> (e.g. the initial Ubuntu GHC binary for our GHC package). Furthermore,
> there is an existing effort on bootstrappable builds which attempts to
> minimize the number of pre-existing binaries required for bootstrapping
> self-hosting compilers [4].
>
> For example, for Rust, people are working on providing an
> implementation of the Rust programming language in C++ (mrustc) [5].
> Presently, it does not seem to be possible to build the most recent
> Rust version with mrustc. However, technically it would be possible to
> use mrustc for community/rust-stage0 as soon as mrustc is reliably
> capable of initially bootstrapping our packaged version of
> community/rust.

This proposal is not workable for Rust: mrustc cannot serve as stage0 on
several architectures (at least riscv64 and s390x), because Rust 1.29
did not have musl support for them.
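To make the Dynamic Libraries problem and the versioned-library alternative concrete, here is a toy model in Python. It is not how apk-tools resolves dependencies; the package names and the `so:` dependency strings (loosely mirroring apk's soname dependencies) are illustrative assumptions. It only checks whether every soname a previously built compiler needs is still provided by something installable.

```python
from __future__ import annotations

# Toy model: which binaries lose a shared library they need when a
# library package moves to a new soname. Illustrative only; this is
# not apk-tools' dependency resolver.


def provided_sonames(providers: dict[str, set[str]]) -> set[str]:
    """Union of all sonames provided by the given packages."""
    result: set[str] = set()
    for sonames in providers.values():
        result |= sonames
    return result


def broken_consumers(needs: dict[str, set[str]],
                     providers: dict[str, set[str]]) -> set[str]:
    """Packages needing at least one soname that nothing provides."""
    available = provided_sonames(providers)
    return {pkg for pkg, sonames in needs.items() if not sonames <= available}


# The previously built compiler binary was linked against libffi.so.7.
needs = {"ghc": {"so:libffi.so.7"}}

# After the upgrade, only the new soname is provided...
after_upgrade = {"libffi": {"so:libffi.so.8"}}
print(broken_consumers(needs, after_upgrade))  # {'ghc'}

# ...unless a versioned package (hypothetical name) keeps the old one.
with_compat = {"libffi": {"so:libffi.so.8"}, "libffi3.3": {"so:libffi.so.7"}}
print(broken_consumers(needs, with_compat))    # set()
```

The point of the second case is that keeping the old soname installable lets the existing compiler keep working, with no rebootstrap needed.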
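The `-bootstrap` virtual package from the quoted proposal can be sketched as a simple provider pick: among the available packages that provide the virtual name, take the one with the highest priority. This mirrors only the behaviour the proposal describes for `$provider_priority`; it is not apk's solver, and the `Package` type and the priority numbers are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Package:
    """Hypothetical stand-in for a built package (not an apk type)."""
    name: str
    provides: frozenset[str]
    provider_priority: int = 0


def pick_provider(virtual: str, available: list[Package]) -> Package | None:
    """Return the highest-priority available provider of `virtual`, if any."""
    candidates = [p for p in available if virtual in p.provides]
    if not candidates:
        return None
    return max(candidates, key=lambda p: p.provider_priority)


go = Package("go", frozenset({"go-bootstrap"}), provider_priority=100)
go_stage0 = Package("go-stage0", frozenset({"go-bootstrap"}), provider_priority=10)

# Established builder: the previous go from the repositories wins.
print(pick_provider("go-bootstrap", [go, go_stage0]).name)  # go
# Fresh builder where go is not available yet: fall back to stage0.
print(pick_provider("go-bootstrap", [go_stage0]).name)      # go-stage0
```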
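As a rough illustration of why a full rebootstrap is expensive for Rust, assume each release 1.N has to be built by 1.(N-1), starting from the 1.29 that mrustc produces. The number of sequential rustc builds is then just the version difference. The target version and the hours-per-build figure below are hypothetical parameters, not measurements. Printing the chain also shows the Trust Chain Transparency point: every entry is one more link that has to be trusted.

```python
def rebootstrap_chain(start_minor: int, target_minor: int) -> list[str]:
    """Rust versions built in order, each with the one before it.

    Assumes (for this sketch) that 1.N can only be built by 1.(N-1);
    `start_minor` is the version the stage0 (e.g. mrustc) produces.
    """
    if target_minor < start_minor:
        raise ValueError("target must not be older than the stage0 output")
    return [f"1.{n}" for n in range(start_minor + 1, target_minor + 1)]


HOURS_PER_BUILD = 3  # hypothetical figure, not a measurement

chain = rebootstrap_chain(29, 56)  # 56 is an example target only
print(len(chain), chain[0], chain[-1])                     # 27 1.30 1.56
print(len(chain) * HOURS_PER_BUILD, "hours per architecture")  # 81 hours per architecture
```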
Ariadne