Date: Thu, 16 Jan 2020 12:15:17 +0100
From: Natanael Copa
To: ~alpine/devel@lists.alpinelinux.org
Subject: Re: new package format and repository layout changes
Message-ID: <20200116121517.0a050f85@ncopa-desktop.copa.dup.pw>
In-Reply-To: <20191230145542.1a7ca9cf@vostro>
References: <20191230145542.1a7ca9cf@vostro>

On Mon, 30 Dec 2019 14:55:42 +0200
Timo Teras wrote:

> Hi all,
>
> I am currently going through the list of data that goes to a package
> and a repository index, as well as the installed db. And trying to
> draft the first schema version of what data goes where. So I am now
> having some issues I'd like to discuss here.
>
> 1. Repository pinning
>
> There's one fundamental issue in the current installed-db that causes
> pain - especially when we do strong signing. This is how the package
> pinning is done (the "@edge" tagging to enable specific repositories
> for specific dependencies only).
>
> Main problem is that the package origin repository needs to be tracked
> for detecting pinning changes and it's not in the package meta data
> currently. The current workaround is to have the origin tag in
> installed-db which means there's data that cannot be signed ahead of
> time. There's also some other subtle issues.

What are the pinning changes you need to detect? Do you have any
example of the problem?

> My thinking is to start putting the repository meta data (distribution
> name, branch, component etc.) in the package. This way the package
> origin is known and signed.

We already have `origin`, which records which aport the package came
from:

  $ apk search --origin lua5.2-apk
  apk-tools-2.10.4-r3

I think having the build time origin is useful, and I would like it to
also record whether the package was built from `main/` or `community/`.
I think signing the build time origin is ok.

However, the install time origin is a different story. We need to be
able to collect a subset of a repository and generate a new index. We
do that for our release media, where we `apk fetch` a list of packages
and their dependencies and store those in apks/ on the ISO image. I
think it is useful to be able to do so without needing to re-sign the
packages, even if we need to sign the index.

>
> 2. repositories list format
>
> If the above happens, we might need to do some changes how the tags are
> specified.
>
> There was some discussion earlier if we should support more debian
> style definition of listing the distro repositories. E.g.
> http://dl-cdn.alpinelinux.org/alpine edge main community
>
> Where the first word is the base URL (or perhaps even some $MIRROR
> variable). The second word the distribution branch. And remaining words
> would be the list of enabled repositories.

I think this is useful. The use case is for setting up repositories for
build time dependencies.
We currently have a .rootbld-repositories[1] in our aports tree that
defines the dependencies for each repository. That way we can define
that when building packages in `community`, we also need to use the
`main` repository for dependencies, but not testing. But we may want to
use the mirror from the system (/etc/apk/repositories), but not the
rest of the info there.

I'm also thinking that we could have a list of mirrors so we could
fetch in parallel from different mirrors (which of course gives
interesting problems if the mirrors are out of sync).

Separating mirror, distribution/release branch and repository is
generally a good idea, I think.

> I think the package naming could then be:
> $base_url/$branch/$repo/$arch/$pkgname-$pkgrel.$uniqueid.$arch.apk
> and automatically constructed from the package metadata.
>
> (Also wondering if the $uniqueid should be just random generated uuid,
> or some sort of hash calculated from the package metadata and
> contents. The requirement is that it can be used to identify if two
> packages are the same or not.)

I'm only skeptical of the uniqueid part here. It is useful to be able to
wget https://.../apk-tools-static.apk and extract that. Same with
busybox-static. But I guess that's a special case for apk-tools and
busybox, and I guess we can solve that differently.

Another problem with the unique id is that abuild currently needs to
know whether a package is built or not before the package is built
(and the unique id is not generated yet). Currently it checks if the
package exists locally:

  if [ ! -f "$REPODEST/$repo/${subpkgarch/noarch/$CARCH}/$subpkgname-$pkgver-r$pkgrel.apk" ]; then

from https://gitlab.alpinelinux.org/alpine/abuild/blob/master/abuild.in#L1968

This is so running `abuild -r` a second time gives:

  $ abuild -r
  >>> apk-tools: Package is up to date

In other words abuild needs to be able to calculate the unique id
before the package is built.
Otherwise abuild will always rebuild everything, which is kind of
annoying when building 300 packages due to an ABI breakage.

I guess that can be solved by consulting the index, but then we need
tooling for that.

> 3. 'noarch' handling
>
> When implementing the above, I would finally like to properly
> implement the 'noarch'. Currently the sources set 'noarch' and build
> subpackage properly. But they are put to the target architecture's
> storage and when creating index the arch is rewritten to the target
> arch always. The plan is to start creating real 'noarch' repository
> and put the built package there. I'm wonder if we'd put separate
> index there, or include the noarch packages also in the target arch
> index.

The biggest challenge with proper 'noarch' handling is that it
requires coordination between the builders of the different arches.

If a build time dependency is noarch, should the builder build it, or
should it wait for some other builder to build it?

If the builders of all arches build it, which builder should upload it
to the shared `noarch` repository?

If we give the responsibility for noarch to a specific arch builder
(let's say x86_64), what do we do if there is an arch package (let's say
on aarch64) that depends on a mixed arch/noarch aport that is disabled
for the noarch builder's arch?

How do we detect early if the noarch flag is wrongly set? For example, a
package could have only generated C headers that are arch dependent
(like linux-headers). Or a -doc package could be noarch while its man
page is generated at build time with different options depending on
arch.

What about packages that have both arch dependent and noarch dependent

Also, if we store the noarch packages in a different directory in the
locally built repo, we get the problem of detecting whether a package
is built or not when the aport generates both arch and noarch
subpackages. With the unique id (as mentioned above) we now need to
consult 2 different indexes to know if a package needs to be rebuilt or
not.
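
To make that last point concrete, here is a rough sketch of what the
"is it already built?" check could look like with a split layout. This
is not how abuild works today. It assumes a hypothetical
`$REPODEST/$repo/noarch` directory next to `$REPODEST/$repo/$CARCH`,
keeps the current `$subpkgname-$pkgver-r$pkgrel.apk` file names (so no
unique id), and the function name `all_built` and its argument
convention are made up for illustration.

```shell
# Hypothetical helper, not part of abuild: succeed only if every
# subpackage of an aport already exists in the local repository.
# Assumes a split layout with a separate noarch directory.
# usage: all_built <repodir> <carch> <pkgver-rN> <subpkgname>:<arch>...
# where <arch> is either "noarch" or the target arch.
all_built() {
	repodir=$1 carch=$2 ver=$3
	shift 3
	for spec in "$@"; do
		name=${spec%%:*}
		arch=${spec##*:}
		# noarch subpackages live in their own directory
		case "$arch" in
			noarch) dir="$repodir/noarch" ;;
			*) dir="$repodir/$carch" ;;
		esac
		# one missing subpackage means the aport needs a rebuild
		[ -f "$dir/$name-$ver.apk" ] || return 1
	done
	return 0
}
```

Even in this simplified form the check has to look in two places, and
with a unique id in the file name the `-f` test would have to be
replaced by a lookup in two indexes.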
IMHO, separating noarch creates some complicated problems, so I question
whether the value it brings outweighs the cost.

I think it makes sense to first have 100% reproducibly built noarch
packages, and a new coordinated build infrastructure in place, before
we finally fix the 'noarch' handling. The current build infra is too
stupid and simple.

> 4. version handling
>
> Sort of unrelated, but something I'd like to also bring up once again.
> Since now that if we do proper distribution / branch tracking. And the
> package downgrades happen at times. I'm wondering if we should make
> the package version "informative" only. And use the build_time to
> decide which package is the "preferred version". In most cases it is
> the latest built package from the repository we want to be using.
>
> Alternative would be introduce some sort of concept similar to
> debian/pacman package "epoch".
>
> Though another way to look at it that the buildtime is the
> automatically generated epoch number. :)

I don't think that buildtime (or even git commit time) is a good source
for deciding which version is to be preferred. I am afraid that it will
force us to think about the order in which we build packages.

For example, let's say we have package foo-4.5 in the community repo and
the next gen foo-5.0 in testing, and a user has both the community and
testing repos enabled. Now there is a security issue, so foo-5.1 and
foo-4.6 are released. Which version the user ends up with now depends
on the order in which those are fixed. If a developer pushes
testing/foo-5.1 first, then the user will end up with
community/foo-4.6, because it has a newer build timestamp.

I sort of like Ariadne's idea of repository weighting though.

Thanks!

>
> --
>
> Any thoughts, comments or concerns regarding the above planned
> changes?
>
> Thanks,
> Timo
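
PS: the foo example from the version handling section can be played
through with a few lines of shell. This is only an illustration of the
"newest build time wins" rule, not anything apk does. The input format
(`<build_time> <repo>/<pkg>`), the timestamps 100 and 200, and the
function name `pick_by_buildtime` are all made up.

```shell
# Hypothetical illustration: each candidate is "<build_time> <repo>/<pkg>".
# Picking the newest build time ignores the version number entirely.
pick_by_buildtime() {
	sort -n -k1,1 | tail -n 1 | cut -d' ' -f2
}

# testing/foo-5.1 pushed first, community/foo-4.6 pushed afterwards
printf '%s\n' '100 testing/foo-5.1' '200 community/foo-4.6' | pick_by_buildtime

# the same two fixes, pushed in the opposite order
printf '%s\n' '100 community/foo-4.6' '200 testing/foo-5.1' | pick_by_buildtime
```

The first call prints community/foo-4.6 and the second prints
testing/foo-5.1, so the result depends only on which fix was pushed
last.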