~alpine/devel

Let's talk about apk-tools 3, and apk-tools in 2020 in general

Message-ID: <1c4796e0cda2248c2de159d4d467421c@dereferenced.org>

Hello,

I am writing this email today to discuss the proposed changes to
apk-tools, in the context of trying to include all potential
stakeholders so that we can have a discussion about the future
of apk-tools, capture the conclusions made and drive them forward
in the sense of actionable changes.

Timo announced back in December that he was pursuing some new
development on apk-tools, which would become the apk-tools 3
branch.  The proposed changes are bold and forward-thinking,
intended to allow apk-tools to scale to the growth we expect
Alpine and other APK-based distributions to have in this decade.

To be clear: absolutely nothing is set in stone.  The apk-tools
3 tree may be published and no distribution including Alpine may
actually use it.  As many people have raised concerns, off the
record and amongst themselves, about the scope and depth of
the proposed apk-tools 3 changes, I believe it is important to
step back and have a conversation that identifies all stakeholders,
so that we may understand the full requirements and usage cases
for apk-tools.  This will allow us to ensure that apk-tools 3
is a success for everyone involved.

In order to make actionable decisions, I believe it prudent to
approach this with a little bit of background and discussion of
the pros and cons of the proposed apk-tools changes, so that we
can come to a conclusion as to what we want to do in order to
move forward.

First, some background: there are two primary kinds of data that
apk-tools manipulates: the package databases (installed db and
indices) and packages themselves.  Package databases are
presently stored in a compressed tar stream, as are packages.
Tar streams are good for packages, but as presently used by
APK, not very good for databases, because the APKINDEX.tar.gz
and friends only contain a couple of files instead of storing
the object tree directly in the tar stream.  What Timo is
proposing in the v3.0-wip branch is to replace the tar streams
with a unified container format that is sufficient for storing
both packages and databases.  This would also change the way
data is stored in the database so that the database is serialized
directly into the container.  However, it is important to
realize that we could accomplish that same serialization,
including mmap-based random access, with tar streams.

There are some pros to the approach taken in the v3.0-wip tree:

* A truly unified database and package format means that we
  ultimately have less code to audit and maintain.

* mmap-based random access will significantly improve
  performance, especially for embedded systems.

There are also some cons to this approach:

* Changing the format in such a radical way brings significant
  risk.  The tar streams code has already been audited and a
  few CVEs have been fixed over the years.  Throwing that out
  means we start over again, possibly reintroducing variations
  of bugs we have already fixed.  Many stakeholders have said
  privately that they would rather not have exposure to this
  risk and would prefer a more conservative approach.

* Compression of data will have to happen *inside* the container
  for mmap-based random access to work efficiently.

* Building on the last point, exposure of the container in a way
  that allows it to be used for mmap-based random access makes
  it a desirable target for tampering.  The current signature
  verification scheme of signing only the control section will
  be insufficient here, as an attacker could trivially generate
  a modified container that explicitly attacks the parsing code.
  Work will most certainly need to be done in the area of tamper
  resistance before people will be enthusiastic about mmaping
  data they fetched from the internet.  At the very least,
  use of HTTPS for all package fetches will become a hard
  requirement, while the current format is tamper-resistant
  and its tamper resistance has been improved over the past
  decade.

* Usage of a unified container format for package data and
  database data removes transparency from the current package
  format.  Right now, an APK package can be manipulated with
  the tar command if a user wishes to know its contents.  Using
  the package manager is not even required.
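
To illustrate the transparency point: a gzip-compressed tar stream can be inspected with stock tools or a few lines of standard-library code. The sketch below builds a small stand-in archive in memory and lists it with Python's `tarfile` module. The member names and contents are made up for illustration; this is not a real .apk and makes no attempt to reproduce apk's actual on-disk layout.

```python
import io
import tarfile

def build_toy_archive() -> bytes:
    """Build a tiny gzip-compressed tar in memory (a stand-in, not a real .apk)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        members = [
            (".PKGINFO", b"pkgname = hello\npkgver = 1.0-r0\n"),  # illustrative content
            ("usr/bin/hello", b"#!/bin/sh\necho hello\n"),
        ]
        for name, data in members:
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()

def list_members(blob: bytes) -> list:
    """Return the member names of a gzip-compressed tar held in memory."""
    with tarfile.open(fileobj=io.BytesIO(blob), mode="r:gz") as tar:
        return tar.getnames()

if __name__ == "__main__":
    for name in list_members(build_toy_archive()):
        print(name)
```

Nothing here depends on a package manager, which is the property being described; it also performs no signature checking of any kind.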

There are other changes that people are concerned about, such
as being able to compose new repositories from pre-existing
ones.  While those are important discussions to have as well,
we are not really discussing them here, as those concerns can
easily be overcome.  Overall governance of the apk-tools
project itself is also not necessarily being discussed here,
while we need to have that discussion as there are many
non-Alpine stakeholders at this point, we can do that later.

Ultimately, from the perspective of apk-tools maintenance,
we need to come to a conclusion on how to improve the
scalability of the package manager as our consumers
(distributions like Alpine, Adelie, Abyss and possibly soon
Yocto and other opkg consumers who are looking at switching)
are faced with growing package indices.  The motivation is
to improve the package manager so that it can work well
with the requirements we believe distributions will have in
this decade.

In 2010, Alpine had a few thousand packages.  Nowadays, we
have almost 20k.  That is starting to approach the size of
distributions like Debian, which have roughly 50-60k.  It
is clear that we need to re-evaluate some scalability
choices we are quickly outgrowing.  Timo should be applauded
for starting that process.

I look forward to hearing everyone's thoughts on this, so
we can decide how to move forward for this development
cycle and beyond!

Ariadne

Message-ID: <C03A5J4BZ2TN.2VO1P2UDDOI6Q@homura>
In-Reply-To: <1c4796e0cda2248c2de159d4d467421c@dereferenced.org>

Thanks for writing this up, Ariadne. I agree with everything you said.
I'd like to call attention to a few points:

On Thu Jan 23, 2020 at 3:13 PM, Ariadne Conill wrote:
> * Changing the format in such a radical way brings significant
> risk. The tar streams code has already been audited and a
> few CVEs have been fixed over the years. Throwing that out
> means we start over again, possibly reintroducing variations
> of bugs we have already fixed. Many stakeholders have said
> privately that they would rather not have exposure to this
> risk and would prefer a more conservative approach.

We ought to be careful to avoid fixing things which aren't broken,
noting that unoptimal != broken. I do NOT see APKv3 as an opportunity to
scratch a bunch of itches. Let's keep the changes as small as possible
and avoid throwing out lots of code, or introducing lots of new code.

We should structure these changes as slow, incremental improvements,
where one change can be built and shipped and then fully battle tested
before the next one lands. apk is used in many production systems today
and has production-tier expectations of stability. The importance of a
conservative approach is probably the most relevant take-away from this
discussion.

> * Usage of a unified container format for package data and
> database data removes transparency from the current package
> format. Right now, an APK package can be manipulated with
> the tar command if a user wishes to know its contents. Using
> the package manager is not even required.

Just to lend validation to this point: I use /bin/tar to study apk files
all the time. Tar beats homegrown 10 times out of 10.

> There are other changes that people are concerned about, such
> as being able to compose new repositories from pre-existing
> ones. While those are important discussions to have as well,
> we are not really discussing them here, as those concerns can
> easily be overcome. Overall governance of the apk-tools
> project itself is also not necessarily being discussed here,
> while we need to have that discussion as there are many
> non-Alpine stakeholders at this point, we can do that later.

I was one of the people bringing these concerns forward. I think that
tightening the scope and focusing on reuse of existing designs and code
makes this easier - if we can avoid overhauling the container format
then we don't have to bikeshed about what goes into it.

Finally, in terms of general principles, I want to emphasize that we
shouldn't turn ourselves into paperclip maximizers. Performance isn't
the singular metric we need to optimize - maintainability, reliability,
and usability are all _more_ important. Any optimizations which cannot
be proven to directly improve known bottlenecks in apk are a NACK from
me. apk is already one of the fastest, if not _the_ fastest, package
managers around.

From: Timo Teras <timo.teras@iki.fi>
Message-ID: <20200123182916.0cb90a09@vostro>
In-Reply-To: <1c4796e0cda2248c2de159d4d467421c@dereferenced.org>

Hi,

On Thu, 23 Jan 2020 15:13:43 +0000
"Ariadne Conill" <ariadne@dereferenced.org> wrote:

> I am writing this email today to discuss the proposed changes to
> apk-tools, in the context of trying to include all potential
> stakeholders so that we can have a discussion about the future
> of apk-tools, capture the conclusions made and drive them forward
> in the sense of actionable changes.

Did this go to some other recipients in addition to alpine-devel? I
hope all interested parties would be subscribed here. Or should we
set up a separate apk-tools mailing list?

> Timo announced back in December that he was pursuing some new
> development on apk-tools, which would become the apk-tools 3
> branch.  The proposed changes are bold and forward-thinking,
> intended to allow apk-tools to scale to the growth we expect
> Alpine and other APK-based distributions to have in this decade.
> 
> To be clear: absolutely nothing is set in stone.  The apk-tools
> 3 tree may be published and no distribution including Alpine may
> actually use it.  As many people have raised concerns, off the
> record and amongst themselves, about the scope and depth of
> the proposed apk-tools 3 changes, I believe it is important to
> step back and have a conversation that identifies all stakeholders,
> so that we may understand the full requirements and usage cases
> for apk-tools.  This will allow us to ensure that apk-tools 3
> is a success for everyone involved.

This is rather frustrating to hear. I've communicated these plans and
asked for feedback openly on the mailing list, in several tickets, on IRC,
and even privately from a few people.

Please, I hope that all who have concerns would raise them here, or
include me in the private conversations.

While I have strong arguments to drive the suggested changes forward,
I'm willing to talk and reason about them.

I understand also that "big change" = "doubt". But if it's only
fear/uncertainty/doubt, we'll be happy to explain further. If there's
real technical, practical or other issues we have not seen yet, please,
please bring them up on the list so those can be addressed in the
design.

> In order to make actionable decisions, I believe it prudent to
> approach this with a little bit of background and discussion of
> the pros and cons of the proposed apk-tools changes, so that we
> can come to a conclusion as to what we want to do in order to
> move forward.

Yes, I tried to explain much of it in the original mail, but probably
missed parts of it. So it is good to address it here.

> First, some background: there are two primary kinds of data that
> apk-tools manipulates: the package databases (installed db and
> indices) and packages themselves.  Package databases are
> presently stored in a compressed tar stream, as are packages.
> Tar streams are good for packages, but as presently used by
> APK, not very good for databases, because the APKINDEX.tar.gz
> and friends only contain a couple of files instead of storing
> the object tree directly in the tar stream.

Yes, much of this is based on the original 10+ year old design. The
requirements and target of that design were quite different at the time.

> What Timo is proposing in the v3.0-wip branch is to replace the tar
> streams with a unified container format that is sufficient for storing
> both packages and databases.  This would also change the way
> data is stored in the database so that the database is serialized
> directly into the container.  However, it is important to
> realize that we could accomplish that same serialization,
> including mmap-based random access, with tar streams.

Not sure if I follow this. Are you suggesting keeping packages as tar
with the database blob there? If yes, this is something I considered
but rejected, mostly because there would be a lot of metadata
duplication that could cause further compatibility or security issues.

> There are some pros to the approach taken in the v3.0-wip tree:
> 
> * A truly unified database and package format means that we
>   ultimately have less code to audit and maintain.

And to have a smaller attack surface. That is, to do signature
verification at the earliest possible level, before any large scale
parsing is done. This design alone would have protected against the
CVEs we have seen in apk history.
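
A minimal sketch of the "verify first, parse second" ordering described above. It is only an illustration of the ordering, under loud assumptions: an HMAC from the Python standard library stands in for a real public-key signature, JSON stands in for the container payload, and the names `seal` and `open_verified` are hypothetical. None of this reflects the actual v3.0-wip format.

```python
import hashlib
import hmac
import json

TAG_LEN = hashlib.sha256().digest_size  # 32 bytes

def seal(key: bytes, payload: bytes) -> bytes:
    """Prefix the payload with an authentication tag (stand-in for a signature)."""
    tag = hmac.new(key, payload, hashlib.sha256).digest()
    return tag + payload

def open_verified(key: bytes, blob: bytes) -> dict:
    """Check the tag over the raw bytes before any parser touches the payload."""
    if len(blob) < TAG_LEN:
        raise ValueError("blob too short to contain a tag")
    tag, payload = blob[:TAG_LEN], blob[TAG_LEN:]
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("authentication failed; payload was not parsed")
    # Only authenticated bytes ever reach the (toy) parser.
    return json.loads(payload)

if __name__ == "__main__":
    key = b"demo-key"
    blob = seal(key, json.dumps({"name": "musl"}).encode())
    print(open_verified(key, blob))
```

The design point is that a tampered blob is rejected by a small, fixed-size check, so the more complex parsing code never sees attacker-controlled input.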

> * mmap-based random access will significantly improve
>   performance, especially for embedded systems.

The mmap itself could be implemented on uncompressed text files, but
that would not really solve the performance issues. The majority of the
performance problems come from parsing the text.
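
To make the distinction concrete, here is a rough sketch of what "random access without parsing" can look like: fixed-width binary records, sorted by name, looked up by binary search directly on an mmapped file. The record layout (`RECORD`) and the function names are invented for this example and are not the proposed index format.

```python
import mmap
import os
import struct
import tempfile

# Hypothetical layout: 32-byte NUL-padded name + 32-bit little-endian value.
RECORD = struct.Struct("<32sI")

def write_index(path, entries):
    """Write records sorted by name so they can be binary-searched in place."""
    with open(path, "wb") as f:
        for name, value in sorted(entries.items()):
            raw = name.encode()
            if len(raw) > 32:
                raise ValueError("name too long for this toy layout")
            f.write(RECORD.pack(raw, value))

def lookup(path, name):
    """Binary search over the mmapped records; no text parsing, no full read."""
    if os.path.getsize(path) == 0:
        return None  # mmap cannot map an empty file
    key = name.encode()
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
        lo, hi = 0, len(m) // RECORD.size
        while lo < hi:
            mid = (lo + hi) // 2
            raw, value = RECORD.unpack_from(m, mid * RECORD.size)
            current = raw.rstrip(b"\0")
            if current == key:
                return value
            if current < key:
                lo = mid + 1
            else:
                hi = mid
    return None

if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "index.bin")
    write_index(path, {"musl": 3, "busybox": 1, "zlib": 7})
    print(lookup(path, "musl"))
```

A lookup touches only a logarithmic number of records, whereas a text index has to be scanned and tokenized before anything can be found in it.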

There are also additional motivations:

* The formats are designed so that the installed db will be a collection
  of fragments of the package databases. This allows a much stronger
  audit of the system.

* To get rid of the SHA-1 based "package identity".

> There are also some cons to this approach:
> 
> * Changing the format in such a radical way brings significant
>   risk.  The tar streams code has already been audited and a
>   few CVEs have been fixed over the years.  Throwing that out
>   means we start over again, possibly reintroducing variations
>   of bugs we have already fixed.  Many stakeholders have said
>   privately that they would rather not have exposure to this
>   risk and would prefer a more conservative approach.
> 
> * Compression of data will have to happen *inside* the container
>   for mmap-based random access to work efficiently.

Not necessarily. The idea is that on-disk index and databases would not
be compressed. The http(s) index would likely be compressed, but
uncompressed during download.

The packages could, and are still planned to, be compressed (that is,
the compression algorithm, if any, would be a parameter). We don't
need random access to the package file.
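
As a sketch of "compressed on the wire, uncompressed on disk": the snippet below decompresses a gzip stream incrementally as chunks arrive, so the stored result is plain data that could later be mmapped. The chunking simulates a download; no real HTTP client is involved, and the sample index lines are only illustrative.

```python
import gzip
import zlib

def iter_chunks(blob: bytes, size: int):
    """Simulate network reads by yielding the blob in small pieces."""
    for i in range(0, len(blob), size):
        yield blob[i:i + size]

def decompress_stream(chunks):
    """Incrementally inflate a gzip stream; MAX_WBITS | 16 selects gzip framing."""
    inflater = zlib.decompressobj(zlib.MAX_WBITS | 16)
    for chunk in chunks:
        out = inflater.decompress(chunk)
        if out:
            yield out
    tail = inflater.flush()
    if tail:
        yield tail

if __name__ == "__main__":
    plain = b"P:musl\nV:1.1.24-r0\n\n" * 100  # illustrative index-like text
    wire = gzip.compress(plain)
    stored = b"".join(decompress_stream(iter_chunks(wire, 64)))
    print(len(wire), "bytes on the wire ->", len(stored), "bytes on disk")
```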

> * Building on the last point, exposure of the container in a way
>   that allows it to be used for mmap-based random access makes
>   it a desirable target for tampering.  The current signature
>   verification scheme of signing only the control section will
>   be insufficient here, as an attacker could trivially generate
>   a modified container that explicitly attacks the parsing code.
>   Work will most certainly need to be done in the area of tamper
>   resistance before people will be enthusiastic about mmaping
>   data they fetched from the internet.  At the very least,
>   use of HTTPS for all package fetches will become a hard
>   requirement, while the current format is tamper-resistant
>   and its tamper resistance has been improved over the past
>   decade.

The index would be mmapped. Packages probably not.

> * Usage of a unified container format for package data and
>   database data removes transparency from the current package
>   format.  Right now, an APK package can be manipulated with
>   the tar command if a user wishes to know its contents.  Using
>   the package manager is not even required.

It is debatable whether this is a pro or a con. Whoever creates the
package needs to know the format details and craft the tar with special
properties. This has been a complaint, and I have received only positive
feedback on the plans to introduce "make package" functionality in apk.

We've also had compatibility issues due to different tar implementations
producing different tar files in the past.

> There are other changes that people are concerned about, such
> as being able to compose new repositories from pre-existing
> ones.  While those are important discussions to have as well,
> we are not really discussing them here, as those concerns can
> easily be overcome. 

Yes, this issue was raised and discussed. While I still have some
reservations about this, I do also understand the need for it, and
have been working on updating the design to keep this feature.

> Overall governance of the apk-tools project itself is also not
> necessarily being discussed here, while we need to have that
> discussion as there are many non-Alpine stakeholders at this point,
> we can do that later.

This might be something we need to address. It has been raised that
apk-tools should not be connected to Alpine since there are non-Alpine
users, and potentially more so in the future.

As for the codebase, it's mostly written by me, with git stats saying:
   811	Timo Teräs
   197	Natanael Copa
    44	William Pitcock
    22	Jakub Jirutka

and various small contributions from a number of others.

From my point of view, I'm currently the benevolent dictator on the
project. Though I know there have been times when I've neglected it a
bit, so thank you to those who have had write access and helped to
maintain the codebase.

> Ultimately, from the perspective of apk-tools maintenance,
> we need to come to a conclusion on how to improve the
> scalability of the package manager as our consumers
> (distributions like Alpine, Adelie, Abyss and possibly soon
> Yocto and other opkg consumers who are looking at switching)
> are faced with growing package indices.  The motivation is
> to improve the package manager so that it can work well
> with the requirements we believe distributions will have in
> this decade.
> 
> In 2010, Alpine had a few thousand packages.  Nowadays, we
> have almost 20k.  That is starting to approach the size of
> distributions like Debian, which have roughly 50-60k.  It
> is clear that we need to re-evaluate some scalability
> choices we are quickly outgrowing.  Timo should be applauded
> for starting that process.
> 
> I look forward to hearing everyone's thoughts on this, so
> we can decide how to move forward for this development
> cycle and beyond!

Yes, we are looking for everyone's feedback.

Timo

From: Timo Teras <timo.teras@iki.fi>
Message-ID: <20200123184934.0009c19a@vostro>
In-Reply-To: <C03A5J4BZ2TN.2VO1P2UDDOI6Q@homura>

On Thu, 23 Jan 2020 10:36:10 -0500
"Drew DeVault" <sir@cmpwn.com> wrote:

> Thanks for writing this up, Ariadne. I agree with everything you said.
> I'd like to call attention to a few points:
> 
> On Thu Jan 23, 2020 at 3:13 PM, Ariadne Conill wrote:
> > * Changing the format in such a radical way brings significant
> > risk. The tar streams code has already been audited and a
> > few CVEs have been fixed over the years. Throwing that out
> > means we start over again, possibly reintroducing variations
> > of bugs we have already fixed. Many stakeholders have said
> > privately that they would rather not have exposure to this
> > risk and would prefer a more conservative approach.  
> 
> We ought to be careful to avoid fixing things which aren't broken,
> noting that unoptimal != broken. I do NOT see APKv3 as an opportunity
> to scratch a bunch of itches. Let's keep the changes as small as
> possible and avoid throwing out lots of code, or introducing lots of
> new code.

The current index format needs to go. It has become broken by design.
Rather than add a band-aid, I'd like to fix it properly at this point. It
may need a lot of code (and a lot of additional tests in the testsuite).
That's software development.

> We should structure these changes as slow, incremental improvements,
> where one change can be built and shipped and then fully battle tested
> before the next one lands. apk is used in many production systems
> today and has production-tier expectations of stability. The
> importance of a conservative approach is probably the most relevant
> take-away from this discussion.

One option to consider then is to allow a longer transition time,
perhaps building both package/index formats for a while so that the new
formats can prove themselves over time. Of course all non-Alpine users are
free to make their own transition schedule, or choose not to upgrade,
or to migrate to something else they prefer instead.

> > * Usage of a unified container format for package data and
> > database data removes transparency from the current package
> > format. Right now, an APK package can be manipulated with
> > the tar command if a user wishes to know its contents. Using
> > the package manager is not even required.  
> 
> Just to lend validation to this point: I use /bin/tar to study apk files
> all the time. Tar beats homegrown 10 times out of 10.

We can do an "apk2tar" or similar applet to still give you tars. The
benefit here is that it would check signatures etc. Just using
tar on a raw .apk is not secure, as it does not check the signatures.

> > There are other changes that people are concerned about, such
> > as being able to compose new repositories from pre-existing
> > ones. While those are important discussions to have as well,
> > we are not really discussing them here, as those concerns can
> > easily be overcome. Overall governance of the apk-tools
> > project itself is also not necessarily being discussed here,
> > while we need to have that discussion as there are many
> > non-Alpine stakeholders at this point, we can do that later.  
> 
> I was one of the people bringing these concerns forward. I think that
> tightening the scope and focusing on reuse of existing designs and
> code makes this easier - if we can avoid overhauling the container
> format then we don't have to bikeshed about what goes into it.

This is something we can talk about. But I'm not sure what benefit it
gives. Because even if we keep TAR format, APK will not honor the tar
headers anymore - it would use the database blob for all file metadata.
And the metadata needs to go to the database so we can get strong secure
auditing via system database.

The main problem is that in TAR files the file metadata, directory
structure and file checksums are scattered and not easily signable.
This needs to change to support the increased security requirements we
have for apk.
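
One way to picture "metadata gathered in one signable place" is a canonical manifest: every path, its ownership and mode, and a content checksum are serialized deterministically, and a single digest (or signature) then covers all of it. The sketch below is a toy under stated assumptions; the field names, the use of JSON, and SHA-256 are choices made for this example, not the apk-tools design.

```python
import hashlib
import json

def build_manifest(files) -> bytes:
    """files maps path -> (mode, uid, gid, content bytes); output is canonical."""
    entries = [
        {
            "path": path,
            "mode": mode,
            "uid": uid,
            "gid": gid,
            "sha256": hashlib.sha256(data).hexdigest(),
        }
        for path, (mode, uid, gid, data) in sorted(files.items())
    ]
    # Sorted entries, sorted keys and fixed separators give a stable encoding.
    return json.dumps(entries, sort_keys=True, separators=(",", ":")).encode()

def manifest_digest(files) -> str:
    """A single digest covering all metadata and all file checksums."""
    return hashlib.sha256(build_manifest(files)).hexdigest()

if __name__ == "__main__":
    tree = {
        "bin/sh": (0o755, 0, 0, b"shell"),
        "etc/passwd": (0o644, 0, 0, b"root:x:0:0::/root:/bin/sh\n"),
    }
    print(manifest_digest(tree))
```

Because the digest covers modes and ownership as well as contents, changing any one of them changes the value that would be signed.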

> Finally, in terms of general principles, I want to emphasize that we
> shouldn't turn ourselves into paperclip maximizers. Performance isn't
> the singular metric we need to optimize - maintainability,
> reliability, and usability are all _more_ important. Any
> optimizations which cannot be proven to directly improve known
> bottlenecks in apk are a NACK from me. apk is already one of the
> fastest, if not _the_ fastest, package managers around.

We have bugs saying otherwise. And I personally think the
index parsing performance is not good enough on embedded. But when it
comes to package installation speed, it's very good currently. The
suggestion to drop the TAR format is not due to performance, but due
to security concerns.

Timo

From: Laurent Bercot <ska-devel@skarnet.org>
Message-ID: <em8cd27d7f-8ccd-4b31-bac7-bb20fc37b428@elzian>
In-Reply-To: <1c4796e0cda2248c2de159d4d467421c@dereferenced.org>

  I realize this is probably superficial compared to what you are talking
about here (it's a functional request where you are having an operational
discussion), but I'm writing it here for the record.
  From a packager's point of view, there are *two* things I want a package
manager to do which are missing from apk as it currently exists. So if
apk is undergoing a significant rewrite, I may as well submit the idea.

  1. A package manager should be able to handle several versions of
the same package being present on the machine at the same time. I should
be able to install util-linux-2.33 as well as util-linux-2.35, and use
programs from either version as I choose, knowing that there is one
default version, and using other versions may require a little more
effort on my part. The default version should be upgraded when the
package is upgraded, but it should also be possible to pin a default
version.
  For non-FHS installations, the functionality is easily achieved via
symlinks: /pkg/util-linux/default is a symlink to /pkg/util-linux/2.35
for instance, and /sbin/fdisk is a symlink to
/pkg/util-linux/default/sbin/fdisk. Details are easily workable.
  For FHS installations, it is more difficult to achieve, but some other
distributions have an "alternatives" feature that still makes it possible
to have several versions of the same software on the machine, at the cost
of being a little more annoying for the user to run a non-default version
of a program.
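
The non-FHS symlink layout described above can be sketched in a few lines. This is only an illustration run inside a temporary directory: the helper names are hypothetical, the "programs" are placeholder text files, and it assumes a platform where unprivileged symlinks work (for example Linux).

```python
import os
import tempfile

def install_version(root, pkg, version, files):
    """Unpack one version of a package under <root>/pkg/<pkg>/<version>/."""
    for rel, data in files.items():
        path = os.path.join(root, "pkg", pkg, version, rel)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "w") as f:
            f.write(data)

def set_default(root, pkg, version):
    """Point <root>/pkg/<pkg>/default at a version by swapping a symlink."""
    link = os.path.join(root, "pkg", pkg, "default")
    tmp = link + ".tmp"
    if os.path.lexists(tmp):
        os.remove(tmp)
    os.symlink(version, tmp)   # relative target, e.g. default -> 2.35
    os.replace(tmp, link)      # rename over the old link

def expose(root, pkg, rel):
    """Create <root>/<rel> as a symlink through the 'default' link."""
    target = os.path.join(root, "pkg", pkg, "default", rel)
    link = os.path.join(root, rel)
    os.makedirs(os.path.dirname(link), exist_ok=True)
    if os.path.lexists(link):
        os.remove(link)
    os.symlink(target, link)

if __name__ == "__main__":
    root = tempfile.mkdtemp()
    install_version(root, "util-linux", "2.33", {"sbin/fdisk": "fdisk 2.33"})
    install_version(root, "util-linux", "2.35", {"sbin/fdisk": "fdisk 2.35"})
    set_default(root, "util-linux", "2.35")
    expose(root, "util-linux", "sbin/fdisk")
    print(open(os.path.join(root, "sbin", "fdisk")).read())
```

Pinning a default then amounts to not moving the `default` link on upgrade, while the non-default version stays reachable under its versioned path.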

  2. Building a package should not require root privileges, or hacks, no
matter what goes into the package. In other words: please axe fakeroot in
abuild. This would, among other things, make abuild work with static
programs (as well as make abuild easier to bootstrap).
  fakeroot is used because the tar utility does not allow a non-root user
to encode arbitrary uids/gids, or certain special files such as device
nodes, in the archive. This could be avoided by having a better archive
creation tool, that takes a file hierarchy to encode as is *and also* a
metadata file, listing special permissions to modify in-archive. If you
are going to change the archive format, this is the perfect opportunity
to add the functionality, and make package creation a lot more robust.
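
As a rough sketch of the "hierarchy plus metadata file" idea, the following uses Python's `tarfile` module to write ownership, modes, and a device node into an archive as plain header data, so no root privileges or fakeroot are needed to create it. The `metadata` mapping stands in for the proposed metadata file; its shape and the function name are assumptions made for this example, and this is not how abuild works today.

```python
import io
import tarfile

ALLOWED = {"mode", "uid", "gid", "uname", "gname"}

def build_archive(files, metadata, devices=()):
    """files: path -> bytes; metadata: path -> header overrides;
    devices: iterable of (path, major, minor) character devices."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name in sorted(files):
            data = files[name]
            info = tarfile.TarInfo(name)
            info.size = len(data)
            info.mode = 0o644
            for attr, value in metadata.get(name, {}).items():
                if attr not in ALLOWED:
                    raise ValueError("unsupported override: " + attr)
                setattr(info, attr, value)  # recorded in the header only
            tar.addfile(info, io.BytesIO(data))
        for name, major, minor in devices:
            info = tarfile.TarInfo(name)
            info.type = tarfile.CHRTYPE     # character device entry, no mknod call
            info.mode = 0o666
            info.devmajor, info.devminor = major, minor
            tar.addfile(info)
    return buf.getvalue()

if __name__ == "__main__":
    blob = build_archive(
        {"usr/bin/tool": b"binary", "var/lib/app/data": b"state"},
        {"usr/bin/tool": {"mode": 0o4755},
         "var/lib/app/data": {"uid": 100, "gid": 101}},
        devices=[("dev/null", 1, 3)],
    )
    with tarfile.open(fileobj=io.BytesIO(blob)) as tar:
        for member in tar:
            print(member.name, oct(member.mode), member.uid, member.gid)
```

Root is only required later, when an archive like this is extracted onto a real filesystem with those owners and device nodes.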

  Thank you for considering these feature requests,

--
  Laurent

Message-ID: <C03GA221CL95.Y6GUV7YWEKMQ@homura>
In-Reply-To: <em8cd27d7f-8ccd-4b31-bac7-bb20fc37b428@elzian>

Please take this to the GitLab issues, these comments are unrelated to
the discussion at hand. We've already got a big enough topic to discuss
here without going into unrelated tangents.

Message-ID: <20200124101059.33d2822f@ncopa-desktop.copa.dup.pw>
In-Reply-To: <em8cd27d7f-8ccd-4b31-bac7-bb20fc37b428@elzian>

On Thu, 23 Jan 2020 20:23:07 +0000
"Laurent Bercot" <ska-devel@skarnet.org> wrote:

>   2. Building a package should not require root privileges, or hacks, no
> matter what goes into the package. In other words: please axe fakeroot in
> abuild. This would, among other things, make abuild work with static
> programs (as well as make abuild easier to bootstrap).
>   fakeroot is used because the tar utility does not allow a non-root user
> to encode arbitrary uids/gids, or certain special files such as device
> nodes, in the archive. This could be avoided by having a better archive
> creation tool, that takes a file hierarchy to encode as is *and also* a
> metadata file, listing special permissions to modify in-archive. If you
> are going to change the archive format, this is the perfect opportunity
> to add the functionality, and make package creation a lot more robust.

I think this is a good point and I support this. Getting rid of
fakeroot is highly wanted.

I believe this is also one of the motivating factors for why we want to
write the package creation tooling in the apk-tools project namespace.

Thank you for your input!

-nc

Message-ID: <20200124120354.1d650b80@ncopa-desktop.copa.dup.pw>
In-Reply-To: <C03A5J4BZ2TN.2VO1P2UDDOI6Q@homura>

On Thu, 23 Jan 2020 10:36:10 -0500
"Drew DeVault" <sir@cmpwn.com> wrote:

> Thanks for writing this up, Ariadne. I agree with everything you said.
> I'd like to call attention to a few points:
> 
> On Thu Jan 23, 2020 at 3:13 PM, Ariadne Conill wrote:
> > * Changing the format in such a radical way brings significant
> > risk. The tar streams code has already been audited and a
> > few CVEs have been fixed over the years. Throwing that out
> > means we start over again, possibly reintroducing variations
> > of bugs we have already fixed. Many stakeholders have said
> > privately that they would rather not have exposure to this
> > risk and would prefer a more conservative approach.
> 
> We ought to be careful to avoid fixing things which aren't broken,
> noting that unoptimal != broken. I do NOT see APKv3 as an opportunity to
> scratch a bunch of itches. Let's keep the changes as small as possible
> and avoid throwing out lots of code, or introducing lots of new code.
> 
> We should structure these changes as slow, incremental improvements,
> where one change can be built and shipped and then fully battle tested
> before the next one lands. apk is used in many production systems today
> and has production-tier expectations of stability. The importance of a
> conservative approach is probably the most relevant take-away from this
> discussion.

I think these are good points. Slow, incremental changes are preferable.

However, I think we need to make some breaking, non-backwards-compatible
changes, like changing the index format. What I don't want is multiple
breaking changes, so we should try to identify which changes are
breaking, and try to do all of those in one shot.

For example, we don't want to change the index and package formats multiple
times.

-nc

Message-ID: <20200124122102.5a92a498@ncopa-desktop.copa.dup.pw>
In-Reply-To: <20200123184934.0009c19a@vostro>

On Thu, 23 Jan 2020 18:49:34 +0200
Timo Teras <timo.teras@iki.fi> wrote:

> On Thu, 23 Jan 2020 10:36:10 -0500
> "Drew DeVault" <sir@cmpwn.com> wrote:
> 
> > Thanks for writing this up, Ariadne. I agree with everything you said.
> > I'd like to call attention to a few points:
> > 
> > On Thu Jan 23, 2020 at 3:13 PM, Ariadne Conill wrote:  
> > > * Changing the format in such a radical way brings significant
> > > risk. The tar streams code has already been audited and a
> > > few CVEs have been fixed over the years. Throwing that out
> > > means we start over again, possibly reintroducing variations
> > > of bugs we have already fixed. Many stakeholders have said
> > > privately that they would rather not have exposure to this
> > > risk and would prefer a more conservative approach.    
> > 
> > We ought to be careful to avoid fixing things which aren't broken,
> > noting that unoptimal != broken. I do NOT see APKv3 as an opportunity
> > to scratch a bunch of itches. Let's keep the changes as small as
> > possible and avoid throwing out lots of code, or introducing lots of
> > new code.  
> 
> The current index format needs to go. It has become broken by design.
> Rather than add bandaid. I'd like to fix it properly at this point. It
> may need lot of code (and a lot of additional tests in the testsuite).
> That's software development.
> 
> > We should structure these changes as slow, incremental improvements,
> > where one change can be built and shipped and then fully battle tested
> > before the next one lands. apk is used in many production systems
> > today and has production-tier expectations of stability. The
> > importance of a conservative approach is probably the most relevant
> > take-away from this discussion.  
> 
> One option to consider then is to add increased transition time.
> Perhaps building both package/index formats for a time to allow the new
> formats to gain time proven merits. Of course all non-Alpine users are
> free to make their own transition schedule, or choose to not upgrade,
> or to migrate something else they prefer instead.

I guess the biggest challenge is when we change the package format and the
edge builders start building the new format. How do we do that? Should we
rebuild everything? Or should we have the old tar format mixed with the
new? Do we generate both old and new?

When we set up the builders for the new stable release we will build
everything from scratch, so at one point we will have a stable release
with all packages in the new format. The question is what we do with edge.

I don't think we need to answer this now, but it is something we need
to plan for. (Which I think was Ariadne's point here.)

> > > * Usage of a unified container format for package data and
> > > database data removes transparency from the current package
> > > format. Right now, an APK package can be manipulated with
> > > the tar command if a user wishes to know its contents. Using
> > > the package manager is not even required.    
> > 
> > Just to lend validation to this point: I use /bin/tar to study apk files
> > all the time. Tar beats homegrown 10 times out of 10.  
> 
> We can do an "apk2tar" or similar applet to still give you tars. The
> benefit here is that the above would check signatures etc. Just using
> tar on raw .apk is not secure as it does not check the signatures.

I admit that I like the raw tar format. It is very handy, but I
understand why it needs to go away (it took me a while). We will need a
convenient multiplatform way to extract or list the contents of
apk files.

A tool that can do a raw extraction, like `apk2tar` or extract-apk or
whatever, will be sufficient. I think it would be nice if it was a
single file, either statically compiled (including for macOS and maybe
Windows), or it could be a single-file Python script or similar.

In any case, I think it's a problem that can be solved.

... 

> We have bugs saying otherwise. And I personally think the
> index parsing performance is not enough on embedded. But what comes to
> package installation speeds it's very good currently. The suggestion to
> dropping TAR format is not due to performance, but due to security
> concerns.

Also, as Ariadne pointed out, even if it's fast enough for now, we need
it to scale for the future, and I think it is good that we address this
before it becomes a problem on more than embedded.

Thank you for working on this!

-nc