Date: Fri, 17 Jan 2020 03:29:33 +0200
From: Timo Teras
To: Natanael Copa
Cc: ~alpine/devel@lists.alpinelinux.org
Subject: Re: new package format and repository layout changes
Message-ID: <20200117032933.421dfe8b@vostro>
In-Reply-To: <20200116121517.0a050f85@ncopa-desktop.copa.dup.pw>
References: <20191230145542.1a7ca9cf@vostro> <20200116121517.0a050f85@ncopa-desktop.copa.dup.pw>

On Thu, 16 Jan 2020 12:15:17 +0100 Natanael Copa wrote:

> On Mon, 30 Dec 2019 14:55:42 +0200
> Timo Teras wrote:
>
> > I am currently going through the list of data that goes to a package
> > and a repository index, as well as the installed db. And trying to
> > draft the first schema version of what data goes where. So I am now
> > having some issues I'd like to discuss here.
> >
> > 1. Repository pinning
> >
> > There's one fundamental issue in the current installed-db that
> > causes pain - especially when we do strong signing. This is how the
> > package pinning is done (the "@edge" tagging to enable specific
> > repositories for specific dependencies only).
> >
> > Main problem is that the package origin repository needs to be
> > tracked for detecting pinning changes and it's not in the package
> > meta data currently. The current workaround is to have the origin
> > tag in installed-db which means there's data that cannot be signed
> > ahead of time. There's also some other subtle issues.
>
> What are the pinning changes you need to detect? Do you have any
> example of the problem?

So the problem in general is that if you have edge's main + community
untagged, and @testing tagged, then the following happens:

1. install package from @testing
2. that package gets updated in our repo, and the old one gets removed
3. the index is updated

After this we have a system with a package from @testing. To be able to
recalculate the system consistency, we need to be able to verify that
the currently installed package came from @testing. If this is not
possible, apk would currently try to enforce upgrading to the updated
version, even if something else was just added (the updated tagged
package would be preferred).

This becomes even more important on a run-from-ram system, when
recalculating on reboot what can be installed.
In the above case the package would not be installed if the tag cannot
be validated.

Currently, the installed database contains the repository tag as-is
where it was installed. This is inserted by apk at install time. For
run-from-ram systems this is also stored in the cache repository index.

But the above system is fragile, and can fail in certain scenarios too.

The point is that having the @tag syntax in repositories that is
locally configurable is problematic. My idea was to perhaps keep the
@tag syntax but bind it to something that comes from the index and is
present in the packages.

> > My thinking is to start putting the repository meta data
> > (distribution name, branch, component etc.) in the package. This
> > way the package origin is known and signed.
>
> We already have `origin` which has the information about which aport
> the package came from:
>
> $ apk search --origin lua5.2-apk
> apk-tools-2.10.4-r3
>
> I think have the build time origin is useful and I would like to have
> the information if if was build from `main/` or `community/` in there.
> I think signing the build time origin is ok.
>
> However, the install time origin is a different story. We need to be
> able to collect a subset of a repository and generate a new index. We
> dot that for our release media where we `apk fetch` a list of packages
> and their dependencies and store those in apks/ on the ISO image.
>
> I think it is useful to be able to do so without need to re-sign the
> packages, even if we need to sign the index.

Yes. This is different. The media is a partial mirror of the repository.
The plan is to be able to clone partial mirrors easily.

So to clarify, the mirror name, or local media partial mirrors would
just work. They would have the full original index and the selection of
packages present that are specified. We can add tooling to do
"apk mkmirror " or similar.
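
To illustrate the partial mirror idea (full original index kept as-is,
only the selected packages and their dependencies copied), here is a
rough Python sketch. The index layout (package name mapped to a list of
dependency names), the function name and the toy package names are all
made up for illustration; this is not how apk stores or resolves
anything.

```python
def partial_mirror(index, wanted):
    """Return the unchanged index and the sorted list of packages to copy.

    `index` maps a package name to the names it depends on (hypothetical
    layout). The index itself is never filtered, only the package set is.
    """
    selected = set()
    stack = list(wanted)
    while stack:
        name = stack.pop()
        if name in selected:
            continue
        if name not in index:
            raise KeyError(f"{name} is not listed in the index")
        selected.add(name)
        # Pull in dependencies so the mirror is installable on its own.
        stack.extend(index[name])
    return index, sorted(selected)


# Toy index with invented package names.
index = {
    "foo": ["libfoo", "libbar"],
    "libfoo": ["libbar"],
    "libbar": [],
    "unrelated": ["libbar"],
}
full_index, to_copy = partial_mirror(index, ["foo"])
print(to_copy)
```

Since the index is left untouched, it would not need to be re-signed;
packages listed there but missing from the mirror are simply not
present locally.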
What I want to avoid is that someone takes just the .apk file from
edge/main, puts it into his alpine mirror of v3.14-stable, and comes
telling us it's broken.

There's already a lot of confusion about what the limitations of tagged
repositories are, and about the "compatibility" of mixing edge with
stable releases. This probably needs a better disclaimer somewhere.

The other question is: if we build packages on our builders, do we want
to allow them to be rebranded as "new-cool-linux(tm)" without re-signing
them? I think no. So in this sense I think signing also the repository
name makes sense.

But taking a step back, there are two pieces of meta data we may or may
not be interested in:

1. the aports source repository / package name where it's built from.
   The source package is in the origin field already. We may or may
   not be interested in the aports name.

2. the build environment. Mostly this would be set by the builder to
   declare what base distribution was used to build it. Though,
   tracking all of it might be burdensome. So perhaps it's simpler to
   just state that this package was built for "alpine / edge /
   community".

The above would also help the caching support. I believe many already
use (and I'd like to officially design this to be done correctly) a
shared apk cache between containers. There are subtle conditions when
this might work now. But I'd basically like to make the apk cache a
local alpine mirror that has one or more distros/releases/arches in the
same hierarchy. So the cache dir would have the same
distro/release/repository structure under it. To make the above work
even when installing individual packages, the package should contain
the information...

So for me putting this information into the package makes a lot of
sense from the functional, security and usability side of things.

If you have a use case where "rebranding" a package would be needed
without modifying/re-signing the package, I'd like to learn more about
it.
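
As a rough illustration of the cache layout idea above, a Python sketch
of deriving the cache location from metadata carried in the package.
The field names, the cache root and the arch directory level are
assumptions made for this example only; none of this is a decided
format.

```python
from pathlib import PurePosixPath


def cache_path(cache_root, meta):
    """Build distro/release/repository/arch/<file> under the cache root.

    `meta` is a hypothetical dict of the repository metadata that would
    be stored (and signed) inside the package.
    """
    filename = f'{meta["name"]}-{meta["version"]}.apk'
    return PurePosixPath(
        cache_root,
        meta["distro"],
        meta["release"],
        meta["repository"],
        meta["arch"],
        filename,
    )


meta = {
    "distro": "alpine",
    "release": "edge",
    "repository": "community",
    "arch": "x86_64",
    "name": "foo",          # invented package name
    "version": "1.2.3-r0",  # invented version
}
path = cache_path("/var/cache/apk", meta)
print(path)
```

Because everything needed for the path comes from the package itself,
this would also work when a single .apk is installed without any index
at hand.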
The generic "make partial mirror locally, on boot media, on cache" case
would be supported out of the box: copy the full index, copy only the
wanted packages. apk would have the ability to go to the next mirror /
media source if the index-listed package is not there.

One more option to consider is to put the repository tag origin in the
installed db and the overlay somehow. But it needs to be removed from
the cache. In this case the signatures would not cover the tag info on
regular install. Perhaps we can then do a secure variant by having a
local signing key for the overlay/install-db, and trust comes from the
kernel command line / initramfs / TPM.

> > 2. repositories list format
> >
> > If the above happens, we might need to do some changes how the tags
> > are specified.
> >
> > There was some discussion earlier if we should support more debian
> > style definition of listing the distro repositories. E.g.
> > http://dl-cdn.alpinelinux.org/alpine edge main community
> >
> > Where the first word is the base URL (or perhaps even some $MIRROR
> > variable). The second word the distribution branch. And remaining
> > words would be the list of enabled repositories.
>
> I think this is useful. The use case is for setting up repositories
> for build time dependencies. we currently have a
> .rootbld-repositories[1] in our aports tree that defines the
> dependencies for each repository. That way we can define that when
> building packages in `community`, we also need to use `main`
> repository for dependencies, but not testing. But we may want use the
> mirror from the system. (/etc/apk/repositories) but not the rest of
> the info there.
>
> I'm also thinking that we could have a list of mirrors so we could
> fetch in parallel from different mirrors. (which ofc gives interesting
> problems if the mirrors are out of sync)
>
> Separating mirror, distribution/release branch and repository is a
> general good idea I think.

Probably we need to keep the syntax backwards compatible.
Would just some variable substitution syntax like $MIRROR be enough?

> > I think the package naming could then be:
> > $base_url/$branch/$repo/$arch/$pkgname-$pkgrel.$uniqueid.$arch.apk
> > and automatically constructed from the package metadata.
> >
> > (Also wondering if the $uniqueid should be just random generated
> > uuid, or some sort of hash calculated from the package metadata and
> > contents. The requirement is that it can be used to identify if two
> > packages are the same or not.)
>
> I'm only skeptic to the uniqeid part here. It is useful to be able to
> wget https://.../apk-tools-static.apk and extract that. Same with
> busybox-static. But i guess thats a special case for apk-tools and
> busybox, and I guess we can solve that differently.

The wget | tar pipeline would no longer work. You'd need apk to extract,
or convert to tar first anyway. I suggest we provide the raw static
executable in the future.

> Another problem with the unique id is that abuild currently needs to
> know if a package is build or not, before the package is build (and
> the unique id is not generated). Currently it checks if package
> exists locally:
>
> if [ ! -f
> "$REPODEST/$repo/${subpkgarch/noarch/$CARCH}/$subpkgname-$pkgver-r$pkgrel.apk"
> ]; then
>
> from
> https://gitlab.alpinelinux.org/alpine/abuild/blob/master/abuild.in#L1968
>
> This is so running `abuild -r` a second time gives:
>
> $ abuild -r
> >>> apk-tools: Package is up to date
>
> In other words abuild needs to be able to calculate the unique id
> before the package is built. Otherwise abuild will always rebuild
> everything, which is kind of annoying when building 300 packages due
> to an ABI breakage.
>
> I guess that can be solved by consulting the index, but then we need
> tooling for that.

Yes, this would need to be solved in some other way. Querying whether a
package with the specific $pkgver-r$pkgrel version exists should be
good enough.
Perhaps even adding a way to query the package's timestamp, so that it
could be used to test if the APKBUILD was modified afterwards without a
pkgrel bump.

> > 3. 'noarch' handling
> >
> > When implementing the above, I would finally like to properly
> > implement the 'noarch'. Currently the sources set 'noarch' and build
> > subpackage properly. But they are put to the target architecture's
> > storage and when creating index the arch is rewritten to the target
> > arch always. The plan is to start creating real 'noarch' repository
> > and put the built package there. I'm wonder if we'd put separate
> > index there, or include the noarch packages also in the target arch
> > index.
>
> The biggest challenge with a proper 'noarch' handling is that it
> requires coordination of the builders of different arches. If a build
> time dependency is noarch, should the builder build it or should it
> wait for some other builder to build it?
>
> If builders of all arches builds it, which builder should upload it to
> the shared `noarch` repository?
>
> If we give the responsibility for noarch to a specific arch builder
> (lets say x86_64), what do we do if there is a arch package (lets say
> aarch64) that depends on a mixed arch/noarch aport that is disabled
> for the noarch builders arch?

I'd probably make it the responsibility of the first valid builder
architecture. Or designate x86_64 if 'all' is used.

> How do we early detect if noarch flag is wrongly set? For example a
> package could have only generated C headers that are arch dependent
> (like linux-headers). Or if -doc package is noarch and the man-page is
> generated at build time, and there are different options depending on
> arch.

We need better lint / CI handling for this. But that's the packager's
error. Those happen and we cannot automate all of those checks. Of
course it's good if we catch more of these. But there are many other
ways to mess up the package too.
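
Going back to which builder would own the noarch output: the rule
suggested above (first valid builder architecture, x86_64 when 'all' is
used) could look roughly like the Python sketch below. The function
name and the builder set are hypothetical, "first valid" is read here
as "first arch listed by the aport that has a builder", and negated
entries such as "!s390x" are not handled.

```python
DEFAULT_NOARCH_BUILDER = "x86_64"


def pick_noarch_builder(arch, builders):
    """Pick the arch whose builder uploads the noarch packages.

    `arch` is the aport's arch string, either "all" or a space
    separated list; `builders` is the set of arches that have a builder.
    """
    if arch == "all":
        return DEFAULT_NOARCH_BUILDER
    for candidate in arch.split():
        if candidate in builders:
            # First listed arch that can actually build the aport.
            return candidate
    return None  # nobody can build it


builders = {"x86_64", "aarch64", "armv7"}  # hypothetical builder set
print(pick_noarch_builder("all", builders))
print(pick_noarch_builder("aarch64 armv7", builders))
```

With this reading, an aport that is disabled on x86_64 falls through to
the first arch it is enabled for, which covers the aarch64 example in
the question.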
> What about packages that has both arch dependeant and noarch
> dependant

But yes, there could be some sync issues. Especially if the 'noarch' is
a dependency of the next package to be built.

> Also if we store the noarch in different directory in locally built
> repo, we get the problem of detecting if package is built or not if
> the aport generates both arch and noarch subpackages. With unique id
> (as mentioned above) we now need to consult 2 different indexes to
> know if a package needs to be rebuilt or not.

Correct. We'd use two indexes.

> IMHO, separating noarch creates some complicated problems so I
> question if the value it brings outweigths the cost.

Hmm... I think the biggest complication is the packages that have
noarch subpkgs. One option is to deprecate that. Do the pure noarch
packages (no subpackage is arch dependent) only as noarch. And
everything else would need to be built target specific. This would
greatly simplify the problem I think.

> I think it makes sense to first have 100% reproducibility built noarch
> packages, and a new coordinated build infra structure in place before
> we finally fix the 'noarch' handling. The current build infra is too
> stupid and simple.

Agreed. Doing noarch requires infra updates too.

> > 4. version handling
> >
> > Sort of unrelated, but something I'd like to also bring up once
> > again. Since now that if we do proper distribution / branch
> > tracking. And the package downgrades happen at times. I'm wondering
> > if we should make the package version "informative" only. And use
> > the build_time to decide which package is the "preferred version".
> > In most cases it is the latest built package from the repository we
> > want to be using.
> >
> > Alternative would be introduce some sort of concept similar to
> > debian/pacman package "epoch".
> >
> > Though another way to look at it that the buildtime is the
> > automatically generated epoch number.
> > :)
>
> I don't think that buildtime (or even git commit time) is a good
> source for deciding what version is to be preferred. I am afraid that
> it will make us think in what order we build packages.
>
> For example, lets say we have package foo-4.5 in community repo and
> the next gen foo-5.0 in testing, and user has both community and
> testing repos enabled. Now there is a security issue so foo-5.1 and
> foo-4.6 is released. Which version the user ends up with now depends
> of which order those are fixed. If developer push testing/foo-5.1
> first then will user end up with the community/foo-4.6 due to it has
> a newer build time stamp.
>
> I sort of like Ariadne's idea of repository weighting though.

Right. So one thing to consider is to deprecate tagged repository
support, and add repository weighting to decide which comes first? Or
do we need to introduce some new knob to do that?

Timo