Date: Thu, 23 Jan 2020 18:29:16 +0200
From: Timo Teras
To: "Ariadne Conill"
Cc: ~alpine/devel@lists.alpinelinux.org
Subject: Re: Lets talk about apk-tools 3, and apk-tools in 2020 in general
Message-ID: <20200123182916.0cb90a09@vostro>
In-Reply-To: <1c4796e0cda2248c2de159d4d467421c@dereferenced.org>
References: <1c4796e0cda2248c2de159d4d467421c@dereferenced.org>

Hi,

On Thu, 23 Jan 2020 15:13:43 +0000
"Ariadne Conill" wrote:

> I am writing this email today to discuss the proposed changes to
> apk-tools, in the context of trying to include all potential
> stakeholders so that we can have a discussion about the future
> of apk-tools, capture the conclusions made and drive them forward
> in the sense of actionable changes.

Did this go to some other recipients in addition to alpine-devel? I
hope all interested parties would be subscribed here. Or should we
set up a separate apk-tools mailing list?

> Timo announced back in December that he was pursuing some new
> development on apk-tools, which would become the apk-tools 3
> branch. The proposed changes are bold and forward-thinking,
> intended to allow apk-tools to scale to the growth we expect
> Alpine and other APK-based distributions to have in this decade.
>
> To be clear: absolutely nothing is set in stone. The apk-tools
> 3 tree may be published and no distribution including Alpine may
> actually use it.
> As many people have, off-the-record, talking
> amongst themselves raised concerns about the scope and depth of
> the proposed apk-tools 3 changes, I believe it important to
> step back and have a conversation that identifies all stakeholders,
> so that we may understand the full requirements and usage cases
> for apk-tools. This will allow us to ensure that apk-tools 3
> is a success for everyone involved.

This is rather frustrating to hear. I've communicated these plans and
asked for feedback openly on the mailing list, in several tickets, on
IRC, and even privately from a few people.

Please, I hope that all who have concerns would raise them here, or
include me in the private conversations.

While I have strong arguments to drive the suggested changes forward,
I'm willing to talk and reason about them. I also understand that
"big change" = "doubt". But if it's only fear/uncertainty/doubt,
we'll be happy to explain further. If there are real technical,
practical or other issues we have not seen yet, please, please bring
them up on the list so they can be addressed in the design.

> In order to make actionable decisions, I believe it prudent to
> approach this with a little bit of background and discussion of
> the pros and cons of the proposed apk-tools changes, so that we
> can come to a conclusion as to what we want to do in order to
> move forward.

Yes, I tried to explain much of it in the original mail, but probably
missed parts of it. So it is good to address this.

> First, some background: there are two primary kinds of data that
> apk-tools manipulates: the package databases (installed db and
> indices) and packages themselves. Package databases are
> presently stored in a compressed tar stream, as are packages.
> Tar streams are good for packages, but as presently used by
> APK, not very good for databases, because the APKINDEX.tar.gz
> and friends only contain a couple of files instead of storing
> the object tree directly in the tar stream.

Yes, much of this is based on the original 10+ year old design. The
requirements and targets of that design were quite different at the
time.

> What Timo is proposing in the v3.0-wip branch is to replace the tar
> streams with a unified container format that is sufficient for storing
> both packages and databases. This would also change the way
> data is stored in the database so that the database is serialized
> directly into the container. However, it is important to
> realize that we could accomplish that same serialization,
> including mmap-based random access, with tar streams.

Not sure if I follow this. Are you suggesting keeping packages as tar
with the database blob there? If yes, this is something I considered
but rejected, mostly because there would be a lot of metadata
duplication that could cause further compatibility or security issues.

> There are some pros to the approach taken in the v3.0-wip tree:
>
> * A truly unified database and package format means that we
>   ultimately have less code to audit and maintain.

And to have a smaller attack surface. That is, to do signature
verification at the earliest possible level, before any large-scale
parsing is done. This design alone would have protected against the
CVEs we have seen in apk history.

> * mmap-based random access will significantly improve
>   performance, especially for embedded systems.

The mmap itself could be implemented on uncompressed text files, but
that would not really solve the performance issues. The majority of
the performance problems come from parsing the text.

There are also additional motivations:
 * The formats are designed so that the installed db will be a
   collection of fragments of the package databases. This allows much
   stronger auditing of the system.
 * To get rid of the SHA-1 based "package identity".

> There are also some cons to this approach:
>
> * Changing the format in such a radical way brings significant
>   risk.
>   The tar streams code has already been audited and a
>   few CVEs have been fixed over the years. Throwing that out
>   means we start over again, possibly reintroducing variations
>   of bugs we have already fixed. Many stakeholders have said
>   privately that they would rather not have exposure to this
>   risk and would prefer a more conservative approach.
>
> * Compression of data will have to happen *inside* the container
>   for mmap-based random access to work efficiently.

Not necessarily. The idea is that the on-disk index and databases
would not be compressed. The http(s) index would likely be compressed,
but decompressed during download.

The packages could be, and are still planned to be, compressed (that
is, which compression algorithm to use, if any, would be a parameter).
We don't need random access to the package file.

> * Building on the last point, exposure of the container in a way
>   that allows it to be used for mmap-based random access makes
>   it a desirable target for tampering. The current signature
>   verification scheme of signing only the control section will
>   be insufficient here, as an attacker could trivially generate
>   a modified container that explicitly attacks the parsing code.
>   Work will most certainly need to be done in the area of tamper
>   resistance before people will be enthusiastic about mmaping
>   data they fetched from the internet. At the very least,
>   use of HTTPS for all package fetches will become a hard
>   requirement, while the current format is tamper-resistant
>   and its tamper resistance has been improved over the past
>   decade.

The index would be mmapped. Packages probably not.

> * Usage of a unified container format for package data and
>   database data removes transparency from the current package
>   format. Right now, an APK package can be manipulated with
>   the tar command if a user wishes to know its contents. Using
>   the package manager is not even required.

Whether this is a pro or a con is also debatable.
The one creating it needs to know the format details, and craft the
tar with special properties. This has been a complaint, and I have
received only positive feedback on the plans to introduce "make
package" functionality in apk. We've also had compatibility issues in
the past due to different tar implementations producing different tar
files.

> There are other changes that people are concerned about, such
> as being able to compose new repositories from pre-existing
> ones. While those are important discussions to have as well,
> we are not really discussing them here, as those concerns can
> easily be overcome.

Yes, this issue was raised and discussed. While I still have some
reservations about it, I do understand the need for it, and have been
working on updating the design to keep this feature.

> Overall governance of the apk-tools project itself is also not
> necessarily being discussed here, while we need to have that
> discussion as there are many non-Alpine stakeholders at this point,
> we can do that later.

This might be something we need to address. It has been raised that
apk-tools should not be connected to Alpine since there are non-Alpine
users, and potentially more so in the future.

As for the codebase, it's mostly written by me, git stats saying:
   811  Timo Teräs
   197  Natanael Copa
    44  William Pitcock
    22  Jakub Jirutka
and various small contributions from a number of others.

From my point of view, I'm currently the benevolent dictator of the
project. Though I know there have been times I've been neglecting it a
bit, so thank you to those who have had write access and helped to
maintain the codebase.

> Ultimately, from the perspective of apk-tools maintenance,
> we need to come to a conclusion on how to improve the
> scalability of the package manager as our consumers
> (distributions like Alpine, Adelie, Abyss and possibly soon
> Yocto and other opkg consumers who are looking at switching)
> are faced with growing package indices. The motivation is
> to improve the package manager so that it can work well
> with the requirements we believe distributions will have in
> this decade.
>
> In 2010, Alpine had a few thousand packages. Nowadays, we
> have almost 20k. That is starting to approach the size of
> distributions like Debian, which have roughly 50-60k. It
> is clear that we need to re-evaluate some scalability
> choices we are quickly outgrowing. Timo should be applauded
> for starting that process.
>
> I look forward to hearing everyone's thoughts on this, so
> we can decide how to move forward for this development
> cycle and beyond!

Yes, we are looking for everyone's feedback.

Timo