~alpine/devel

apk-tools plans

Timo Teras <timo.teras@iki.fi>
Message-ID: <20191203180717.0016af8a@vostro>

Hi Alpiners,

I am writing to let you know about, and to discuss here, some items
related to apk-tools development.

For many years already, we have had discussions on IRC about certain
apk-tools related design changes and feature improvements. I have
already started working on some of them twice or so, yet there has
always been something else that came up. But I'm finally committing to
doing these. Hopefully, we will get some of the changes done by late
February / early March, and get the next spring Alpine release to use
them.

As background: it has become obvious that the Alpine package base has
grown a lot, and low-end embedded systems are starting to struggle
with the number of packages in our DB. There are also a number of bugs
filed related to this. The majority of the apk-tools startup time goes
to just uncompressing and parsing the index files and the installed db.

Another thing I really want to improve is the security of 'apk audit'
and the system integrity checking. The concept is to create and store
signed file manifests in the DB that can be used to establish strong
trust in a system. (The current 'apk audit' was designed for 'lbu
commit'.)
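
A minimal sketch of the manifest idea, assuming a manifest is simply a
map from file path to SHA-256 digest. The function names
`build_manifest` and `audit` are hypothetical, and the signing step is
left out entirely; this only shows how a trusted manifest lets an audit
classify files as modified, missing, or unexpected:

```python
import hashlib
import os
import tempfile


def build_manifest(root):
    """Map each file's path (relative to root) to its SHA-256 hex digest."""
    manifest = {}
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            with open(full, "rb") as f:
                digest = hashlib.sha256(f.read()).hexdigest()
            manifest[os.path.relpath(full, root)] = digest
    return manifest


def audit(root, manifest):
    """Return (modified, missing, unexpected) paths, each sorted."""
    current = build_manifest(root)
    modified = sorted(p for p in manifest
                      if p in current and current[p] != manifest[p])
    missing = sorted(p for p in manifest if p not in current)
    unexpected = sorted(p for p in current if p not in manifest)
    return modified, missing, unexpected


# Demo on a throwaway directory with made-up file names.
with tempfile.TemporaryDirectory() as root:
    for name, data in [("a.conf", b"one"), ("b.conf", b"two")]:
        with open(os.path.join(root, name), "wb") as f:
            f.write(data)
    trusted = build_manifest(root)  # in the real design this would be signed

    with open(os.path.join(root, "a.conf"), "wb") as f:
        f.write(b"tampered")
    os.unlink(os.path.join(root, "b.conf"))
    with open(os.path.join(root, "c.conf"), "wb") as f:
        f.write(b"new")

    report = audit(root, trusted)
```

The audit is only as trustworthy as the manifest itself, which is why
the manifest would need to be signed and the signature checked before
the comparison.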

Slightly related is also changing the file formats so that signature
checking can be done first without much parsing to make the attack
surface smaller. (The old design was motivated by restrictions from
the original shell script based Alpine package manager; the signatures
were not considered first-class citizens back then.)

In connection with this work, I'm also hoping to make additional minor
improvements, such as implementing some multicore support and
handling duplicate files in simultaneously installed packages better.

Unfortunately the above issues cannot be fixed easily without changing
to binary file formats.

The primary target for me is to redo the apk package and index formats
as binary formats for the next Alpine release. We can discuss the exact
details in this or another thread in the coming weeks. The main design
goals are security and speed. The idea is that the index will be
mmap()able, and that the structures from an .apk can be copied directly
into the installed-db.

For later features there are ideas such as supporting a TPM- or
hostkey-signed system DB (e.g. 'world', the set of config options,
trusted package keys, etc.), so that the whole system state would be
signed and verifiable.

Because packages and indexes will be binary formats in the future, the
plan is to build the feature to create these files into apk. So some of
the related functionality from 'abuild' would be moved to apk. This also
simplifies things if apk-tools is used in other distributions.

Please feel free to comment or ask additional questions regarding
the concept or roadmap. I'll publish more technical details of the
formats later, and will try to publish my work-in-progress git tree of
apk sooner rather than later.

For those needing/wanting to parse the old style text databases, we
could add applets to dump current state in the old style format.

Cheers,
Timo
Timo Teras <timo.teras@iki.fi>
Message-ID: <20191229181150.7a0dcace@vostro>
In-Reply-To: <20191203180717.0016af8a@vostro>

Hi,

On Tue, 3 Dec 2019 18:07:17 +0200
Timo Teras <timo.teras@iki.fi> wrote:

> Another thing, I really want to improve is the security of 'apk audit'
> and the system integrity checking. The concept is to create and store
> signed file manifests in the DB that can be used to establish strong
> trust in a system. (The current 'apk audit' was designed for 'lbu
> commit'.)
> 
> Slightly related is also changing the file formats so that signature
> checking can be done first without much parsing to make the attack
> surface smaller. (The old design was motivated by restrictions from
> the original shell script based Alpine package manager; the signatures
> were not considered as first class citizen back then.)
>[snip]
> Unfortunately the above changes cannot be fixed easily without
> changing to binary file formats.
> 
> The primary target for me is to redo the binary apk package and index
> formats for next Alpine release. We can discuss the exact details in
> this or another thread later in the coming weeks. Main design being
> security and speed. The idea is that index will be mmap:able; and the
> structures from .apk can be directly copied to installed-db.

So I've made quite a bit of progress with this. The work-in-progress
branch is now available at:
https://gitlab.alpinelinux.org/alpine/apk-tools/tree/v3.0-wip

This is still early preview code. The intent at this point is to show
how I'm planning the file format to look and how the signing is to
work. There are still loose ends and the "schemas" are still volatile.

So far the code contains the basic encoding and decoding of the file
format as well as signing and verification of it. It's made pretty
generic, and the signing can already do RSA/ECC/etc. based on what type
of private key is given.

From a technical point of view, the format is first a container layer,
basically Tag-Length-Value blobs. The main blocks are to be the
"database", "signatures", and for packages the "files" section.
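
To illustrate what a Tag-Length-Value container looks like in general,
here is a minimal sketch. The tag numbers, the 32-bit little-endian
header layout, and the function names are all assumptions made for the
example; they are not taken from the v3.0-wip branch:

```python
import struct

# Hypothetical tag numbers; the real format may use different values.
TAG_DATABASE = 1
TAG_SIGNATURES = 2
TAG_FILES = 3

# Assumed header: 32-bit little-endian tag, then 32-bit length.
HEADER = struct.Struct("<II")


def encode_block(tag, value):
    """Serialize one block as tag, length, value."""
    return HEADER.pack(tag, len(value)) + value


def decode_blocks(buf):
    """Yield (tag, value) pairs, rejecting truncated input."""
    view = memoryview(buf)
    offset = 0
    while offset < len(view):
        if len(view) - offset < HEADER.size:
            raise ValueError("truncated block header")
        tag, length = HEADER.unpack_from(view, offset)
        offset += HEADER.size
        if len(view) - offset < length:
            raise ValueError("truncated block value")
        yield tag, bytes(view[offset:offset + length])
        offset += length


container = (
    encode_block(TAG_DATABASE, b"pkginfo")
    + encode_block(TAG_SIGNATURES, b"sig")
    + encode_block(TAG_FILES, b"file data")
)
blocks = list(decode_blocks(container))
```

A reader that does not understand a tag can skip the block using only
the length field, which is what keeps the container layer simple to
walk before any of the contents are interpreted.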

The "database" section mostly resembles the FlatBuffers format.
Basically it's a hierarchical object tree. The intent is to have enough
information to make deep copies without the schema, but to pretty-print
it you'd need the schema. This trade-off was chosen to keep the
repetitive field type encodings out, while still allowing some generic
functionality to be written without schema knowledge.

As said, the main motivation for this work is to allow mmap() access
and fairly trivial signature verification code. Additionally, the
format is designed so that a package's signed data dump can be copied
trivially into the installed database. So there will be support for
copying "signed database" blobs into another database. This is the key
to a strong audit trail: the installed-db is just a copy of these
blobs. And all of this is to be done without much parsing, so that
accesses are fast enough.
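
As a rough illustration of why an mmap()able layout avoids the parsing
cost, here is a sketch that reads one record straight out of a mapped
file. The fixed-size record layout (16-byte name, 32-bit size), the
helper names, and the sample values are invented for the example and
say nothing about the actual schema:

```python
import mmap
import os
import struct
import tempfile

# Hypothetical fixed-size record: 16-byte name, 32-bit installed size.
RECORD = struct.Struct("<16sI")


def write_index(path, entries):
    """Write the records back to back, with no compression or framing."""
    with open(path, "wb") as f:
        for name, size in entries:
            f.write(RECORD.pack(name.encode(), size))


def lookup(path, n):
    """Read record n directly from the mapping; other records are not touched."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            raw_name, size = RECORD.unpack_from(m, n * RECORD.size)
            return raw_name.rstrip(b"\0").decode(), size


with tempfile.TemporaryDirectory() as d:
    index_path = os.path.join(d, "index.bin")
    # Made-up sample data.
    write_index(index_path, [("musl", 600000),
                             ("busybox", 900000),
                             ("apk-tools", 250000)])
    record = lookup(index_path, 2)
```

The lookup cost does not depend on how many records precede the one
requested, in contrast to a compressed text index that has to be
inflated and parsed from the start.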

I will be working on finishing the schemas of the "package", "index"
and "installed database" formats next, and will then start working on
the new-format tools to create packages.

The plan is to move the intelligence to construct packages and manage
repositories from 'abuild' to 'apk-tools'. So in the future 'abuild'
would just call 'apk mkpkg' or similar with a description file of what
needs to go in. This should be helpful if other distributions choose
to use apk but want to integrate it into their build scripts.

Additionally, we are hoping to put all the repository management code
into apk-tools. There it will be simpler to implement features such
as "keep superseded packages in the repository for at least X days
before deleting them".

Feedback welcome. Though it is better to concentrate on the
architecture and the overview of how things work, instead of
nitpicking style and/or minor issues in the commits, as it's all still
pretty volatile.

Timo

Message-ID: <CAAOiGNz4ZqpCVX6GODPta84nYDvuc1J83VE1_mYTbeKkQooEvw@mail.gmail.com>
In-Reply-To: <20191229181150.7a0dcace@vostro>

Hello,

On Sun, Dec 29, 2019 at 10:13 AM Timo Teras <timo.teras@iki.fi> wrote:
>
> Hi,
>
> On Tue, 3 Dec 2019 18:07:17 +0200
> Timo Teras <timo.teras@iki.fi> wrote:
>
> > Another thing, I really want to improve is the security of 'apk audit'
> > and the system integrity checking. The concept is to create and store
> > signed file manifests in the DB that can be used to establish strong
> > trust in a system. (The current 'apk audit' was designed for 'lbu
> > commit'.)
> >
> > Slightly related is also changing the file formats so that signature
> > checking can be done first without much parsing to make the attack
> > surface smaller. (The old design was motivated by restrictions from
> > the original shell script based Alpine package manager; the signatures
> > were not considered as first class citizen back then.)
> >[snip]
> > Unfortunately the above changes cannot be fixed easily without
> > changing to binary file formats.
> >
> > The primary target for me is to redo the binary apk package and index
> > formats for next Alpine release. We can discuss the exact details in
> > this or another thread later in the coming weeks. Main design being
> > security and speed. The idea is that index will be mmap:able; and the
> > structures from .apk can be directly copied to installed-db.
>
> So I've made quite a bit progress with this. The work-in-progress
> branch is now available at:
> https://gitlab.alpinelinux.org/alpine/apk-tools/tree/v3.0-wip
>
> This is still early preview code. The intent is to show at this point
> on how I'm planning the file format to be like and how the signing is
> to work. There's still open ends and the "schemas" are still volatile.
>
> So far the code contains the basic encoding and decoding of the file
> format as well as signing and verification of it. It's made pretty
> generic and the signing can already do RSA/ECC/etc based on what type of
> private key is given.

This is a great improvement over what we have in 2.10, and I can't
wait for it to land in Alpine.  However, I think we should continue to
support apk-tools 2.10 for the spring release, just to make sure the
new design is fully proven, as it is a complete redesign of things.

> From technical point of view the format is first a container layer
> basically Tag-Length-Value blobs. The main blocks are to be the
> "database", "signatures", and for packages the "files" section.
>
> The "database" section is mostly resembling flat buffers format.
> Basically it's a hierarchical object tree. The intent is to have enough
> information to make deep copies without schema, but to pretty-print it
> you'd need the schema. This is a trade of chosen to keep the repetitive
> field type encodings out, but to allow some generic functionality to be
> written without schema knowledge.
>
> As said, the main motivate for this work is to allow mmap() access, and
> fairly trivial signature verification code. Additionally the format is
> designed so that the packages' signed data dump can be just trivially
> copied into the installed database. So there will be support to copy
> "signed database" blobs to be inside another database. This is the key
> to strong audit trail so that installed-db is just copy of these blobs.
> And to do all this without much parsing so that accesses are fast
> enough.
>
> I will be working to finish up the schemas of the "package", "index"
> and "installed database" formats next. And then start working on the
> new format tools to create packages.
>
> Plan is to move from 'abuild' to 'apk-tools' the intelligence to
> construct packages, and manage repositories. So in future 'abuild'
> would just call 'apk mkpkg' or similar with a description file what
> needs to go in. This should be helpful if some other distributions
> choose to use apk but want to integrate it to their build scripts.

I suspect that this will be useful to distributions outside the
Alpine/abuild ecosystem. There are a few distributions that want to
use APK, but want to build their own packages with their own tools.

> Additionally, we are hoping to put all the repository management code
> to apk-tools. There it will be more simpler to implement features such
> as "keep superceded packages in repository for at least X days before
> deleting them".
>
> Feedback welcome. Though, better to concentrate to the architectural
> and overview of how things - instead of nitpicking style and/or minor
> issues on the commit as it's still pretty volatile.

I'll go over it all this week, but I think it pretty much is in
agreement with what we have discussed in IRC in the past.

Ariadne

Message-ID: <20200116151947.63f7ade8@ncopa-desktop.copa.dup.pw>
In-Reply-To: <20191229181150.7a0dcace@vostro>

Hi!

On Sun, 29 Dec 2019 18:11:50 +0200
Timo Teras <timo.teras@iki.fi> wrote:

> Hi,
> 
> On Tue, 3 Dec 2019 18:07:17 +0200
> Timo Teras <timo.teras@iki.fi> wrote:
> 
> > Another thing, I really want to improve is the security of 'apk audit'
> > and the system integrity checking. The concept is to create and store
> > signed file manifests in the DB that can be used to establish strong
> > trust in a system. (The current 'apk audit' was designed for 'lbu
> > commit'.)
> > 
> > Slightly related is also changing the file formats so that signature
> > checking can be done first without much parsing to make the attack
> > surface smaller. (The old design was motivated by restrictions from
> > the original shell script based Alpine package manager; the signatures
> > were not considered as first class citizen back then.)
> >[snip]
> > Unfortunately the above changes cannot be fixed easily without
> > changing to binary file formats.
> > 
> > The primary target for me is to redo the binary apk package and index
> > formats for next Alpine release. We can discuss the exact details in
> > this or another thread later in the coming weeks. Main design being
> > security and speed. The idea is that index will be mmap:able; and the
> > structures from .apk can be directly copied to installed-db.  
> 
> So I've made quite a bit progress with this. The work-in-progress
> branch is now available at:
> https://gitlab.alpinelinux.org/alpine/apk-tools/tree/v3.0-wip
> 
> This is still early preview code. The intent is to show at this point
> on how I'm planning the file format to be like and how the signing is
> to work. There's still open ends and the "schemas" are still volatile.
> 
> So far the code contains the basic encoding and decoding of the file
> format as well as signing and verification of it. It's made pretty
> generic and the signing can already do RSA/ECC/etc based on what type of
> private key is given.

Speaking of signing and verification: one of the features that I
appreciate in the current design is that we calculate the checksum
in flight, while waiting for the network (or disk) IO.

I recently learned about BLAKE3 [1], which I find very interesting. As
I understand it, it supports streaming, which means that it can detect
a hash mismatch before it has received all the data. And it is *fast*.

[1]: https://www.infoq.com/news/2020/01/blake3-fast-crypto-hash/
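
A small sketch of what "checksum in flight" means in practice: the
digest is updated chunk by chunk as data arrives, so no second pass over
the data is needed. BLAKE3 is not in the Python standard library, so
BLAKE2b is used here purely as a stand-in, and `copy_and_hash` is a
made-up helper name:

```python
import hashlib
import io


def copy_and_hash(src, dst, chunk_size=64 * 1024):
    """Copy src to dst chunk by chunk, updating the digest as data arrives."""
    h = hashlib.blake2b()  # stand-in only; not BLAKE3
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        h.update(chunk)
        dst.write(chunk)
    return h.hexdigest()


# In-memory streams stand in for the network socket and the disk.
payload = b"x" * 200_000
sink = io.BytesIO()
digest = copy_and_hash(io.BytesIO(payload), sink)
```

Note that a plain incremental hash like this only gives a verdict once
the last chunk has been read. Detecting a mismatch earlier would need a
scheme that verifies each chunk against a tree of hashes, which this
sketch does not attempt.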

> From technical point of view the format is first a container layer
> basically Tag-Length-Value blobs. The main blocks are to be the
> "database", "signatures", and for packages the "files" section.
> 
> The "database" section is mostly resembling flat buffers format.
> Basically it's a hierarchical object tree. The intent is to have enough
> information to make deep copies without schema, but to pretty-print it
> you'd need the schema. This is a trade of chosen to keep the repetitive
> field type encodings out, but to allow some generic functionality to be
> written without schema knowledge.
> 
> As said, the main motivate for this work is to allow mmap() access, and
> fairly trivial signature verification code. Additionally the format is
> designed so that the packages' signed data dump can be just trivially
> copied into the installed database. So there will be support to copy
> "signed database" blobs to be inside another database. This is the key
> to strong audit trail so that installed-db is just copy of these blobs.
> And to do all this without much parsing so that accesses are fast
> enough.

This means that the index should not be stored compressed locally.
Maybe we could decompress it while fetching it from the network.

> I will be working to finish up the schemas of the "package", "index"
> and "installed database" formats next. And then start working on the
> new format tools to create packages.
> 
> Plan is to move from 'abuild' to 'apk-tools' the intelligence to
> construct packages, and manage repositories. So in future 'abuild'
> would just call 'apk mkpkg' or similar with a description file what
> needs to go in. This should be helpful if some other distributions
> choose to use apk but want to integrate it to their build scripts.

This is very good. I guess it makes sense to split the mkpkg
functionality out into a separate binary, so we can separate the
build-time tools from the run-time ones and keep the runtime smaller.


> Additionally, we are hoping to put all the repository management code
> to apk-tools. There it will be more simpler to implement features such
> as "keep superceded packages in repository for at least X days before
> deleting them".

This would be awesome. Or keep N versions of packages from each aport
(origin). That way we could have current and previous versions. Useful
for rolling back kernel updates for example.
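
A sketch of how the two retention rules could combine: always keep the
newest N builds per origin, and only delete an older build once it has
been superseded for at least X days. The dictionary keys, the function
name `deletable`, and the sample versions and dates are all made up for
the example:

```python
from collections import defaultdict
from datetime import datetime, timedelta


def deletable(packages, now, keep_versions=2, min_age=timedelta(days=7)):
    """Return the files that the retention policy allows to be deleted."""
    by_origin = defaultdict(list)
    for pkg in packages:
        by_origin[pkg["origin"]].append(pkg)

    result = []
    for builds in by_origin.values():
        builds.sort(key=lambda p: p["built"], reverse=True)  # newest first
        for i, pkg in enumerate(builds):
            if i < keep_versions:
                continue  # always keep the newest N builds
            # A build counts as superseded from the moment the next
            # newer build of the same origin appeared.
            superseded_at = builds[i - 1]["built"]
            if now - superseded_at >= min_age:
                result.append(pkg["file"])
    return result


# Made-up sample data.
repo = [
    {"origin": "linux-lts", "file": "linux-lts-5.4.5-r0.apk",
     "built": datetime(2019, 12, 20)},
    {"origin": "linux-lts", "file": "linux-lts-5.4.8-r0.apk",
     "built": datetime(2020, 1, 5)},
    {"origin": "linux-lts", "file": "linux-lts-5.4.10-r0.apk",
     "built": datetime(2020, 1, 12)},
    {"origin": "linux-lts", "file": "linux-lts-5.4.12-r0.apk",
     "built": datetime(2020, 1, 15)},
]
to_delete = deletable(repo, now=datetime(2020, 1, 16))
```

Here the 5.4.8 build is outside the newest two but was superseded only
four days before `now`, so it is kept; the 5.4.5 build was superseded
eleven days before and may be removed.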
 
> Feedback welcome. Though, better to concentrate to the architectural
> and overview of how things - instead of nitpicking style and/or minor
> issues on the commit as it's still pretty volatile.

I think it has been mentioned before, but it would be nice if we
could have two operational install modes:

- quick: in-the-air extraction/verification of packages (current style)
- safe: store all packages locally and verify before trying to extract them

Safe mode is useful when the network connection is unreliable. In case
of a network error it could continue where it left off last time,
rather than trying to fetch it all from scratch. Something like
`apk upgrade --fetch-first` or `apk upgrade --safe`.
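
A sketch of the fetch phase of such a "safe" mode, showing how a run
can resume after a network error. Everything here is an assumption for
illustration: `fetch_first` is a made-up function, the `fetch` callable
stands in for the network, and a SHA-256 digest stands in for the real
signature check:

```python
import hashlib
import os
import tempfile


def fetch_first(wanted, cache_dir, fetch):
    """Fill the cache and verify every file before anything is extracted.

    wanted maps file name -> expected SHA-256 hex digest. Files already
    cached with a matching digest are skipped, so an interrupted run
    resumes where it stopped. Returns the names fetched in this run.
    """
    fetched = []
    for name, digest in wanted.items():
        path = os.path.join(cache_dir, name)
        if os.path.exists(path):
            with open(path, "rb") as f:
                if hashlib.sha256(f.read()).hexdigest() == digest:
                    continue  # already fetched and still valid
        data = fetch(name)
        if hashlib.sha256(data).hexdigest() != digest:
            raise ValueError(f"verification failed for {name}")
        with open(path, "wb") as f:
            f.write(data)
        fetched.append(name)
    return fetched


# Made-up "remote" repository and a fetcher that fails on one package.
remote = {"a.apk": b"aaa", "b.apk": b"bbb"}
wanted = {n: hashlib.sha256(d).hexdigest() for n, d in remote.items()}


def flaky(name):
    if name == "b.apk":
        raise ConnectionError("network down")
    return remote[name]


with tempfile.TemporaryDirectory() as cache:
    try:
        fetch_first(wanted, cache, flaky)
    except ConnectionError:
        pass  # first run is interrupted after a.apk
    second_run = fetch_first(wanted, cache, remote.__getitem__)
```

Extraction would start only after `fetch_first` returns without error,
so a failed download or a failed check never leaves a half-upgraded
system behind.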


I also think it may be a good time to look over the options and try to
make things more consistent. For example, `apk info` operates on both
locally installed packages and the cached index. Maybe we could have
one applet that only works on installed packages (for example, to list
contents) and a separate one that only queries the cached index(es).
Or maybe a flag for it.

Also, `apk info` uses `--subcmd`, while `apk cache` takes its
subcommand without double dashes: `apk cache subcmd`.

Some of the tools are designed for scripting, while some are designed
for human-friendly output. It would be nice to have a common flag for
that. Personally I prefer script-friendly output by default, with a
flag for human-friendly output, like `--pretty`, which could be the
default if stdout is a tty or similar.
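
A sketch of how such a default could be decided. The `--no-pretty`
counterpart, the tool name, and the helper names are hypothetical
additions for the example, and the package versions are sample data:

```python
import argparse
import io
import sys


def build_parser():
    p = argparse.ArgumentParser(prog="apk-sketch")  # hypothetical tool name
    # None means "not specified", so the tty check decides.
    p.add_argument("--pretty", dest="pretty", action="store_true",
                   default=None)
    p.add_argument("--no-pretty", dest="pretty", action="store_false",
                   default=None)
    return p


def want_pretty(args, stream=sys.stdout):
    """An explicit flag wins; otherwise be pretty only on a terminal."""
    if args.pretty is not None:
        return args.pretty
    return stream.isatty()


def render(packages, pretty):
    if pretty:
        width = max(len(name) for name, _ in packages)
        return "\n".join(f"{name:<{width}}  {ver}" for name, ver in packages)
    # Script-friendly: one tab-separated record per line.
    return "\n".join(f"{name}\t{ver}" for name, ver in packages)


# A StringIO is not a tty, so this behaves like output into a pipe.
args = build_parser().parse_args([])
script_mode = want_pretty(args, io.StringIO())
forced = want_pretty(build_parser().parse_args(["--pretty"]), io.StringIO())
listing = render([("musl", "1.1.24"), ("apk-tools", "2.10.4")], script_mode)
```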

Thank you for working on this! I find it very exciting!

> 
> Timo

Message-ID: <BZXA7BCRF3I4.1S1I48FFFZ1KD@homura>
In-Reply-To: <20200116151947.63f7ade8@ncopa-desktop.copa.dup.pw>

On Thu Jan 16, 2020 at 3:19 PM, Natanael Copa wrote:
> I think it as been mentioned before but I think it would be nice if we
> could have 2 operational install modes:
>
> - quick: in-the-air extraction/verification of packages (current style)
> - safe: store all packages locally and verify before trying to extract
> them
>
> safe mode is useful when the network connection is unreliable. In case
> of network error it could continue where it left off last time, rather
> than try fetch it all from scratch. Something like
> `apk upgrade --fetch-first` or `apk upgrade --safe`.

In my opinion, the "quick" mode is so unsafe that I find it really
objectionable to have at all. It's not just an unreliable network to
consider - what if a signature check fails partway through and the
upgrade aborts half-finished? What if a man in the middle orchestrates
this situation? "Quick" mode also makes it difficult to reckon if
there'll be sufficient disk space in advance - today apk just breaks
your system if it hits ENOSPC.

Message-ID: <20200116154250.2b1d1fda@ncopa-desktop.copa.dup.pw>
In-Reply-To: <BZXA7BCRF3I4.1S1I48FFFZ1KD@homura>

On Thu, 16 Jan 2020 09:22:30 -0500
"Drew DeVault" <sir@cmpwn.com> wrote:

> On Thu Jan 16, 2020 at 3:19 PM, Natanael Copa wrote:
> > I think it as been mentioned before but I think it would be nice if we
> > could have 2 operational install modes:
> >
> > - quick: in-the-air extraction/verification of packages (current style)
> > - safe: store all packages locally and verify before trying to extract
> > them
> >
> > safe mode is useful when the network connection is unreliable. In case
> > of network error it could continue where it left off last time, rather
> > than try fetch it all from scratch. Something like
> > `apk upgrade --fetch-first` or `apk upgrade --safe`.  
> 
> In my opinion, the "quick" mode is so unsafe that I find it really
> objectionable to have at all. It's not just an unreliable network to
> consider - what if a signature check fails partway through and the
> upgrade aborts half-finished? What if an man in the middle orchestrates
> this situation? "Quick" mode also makes it difficult to reckon if
> there'll be sufficient disk space in advance - today apk just breaks
> your system if it hits ENOSPC.

That is how current apk works, and it is part of the reason why it is
significantly faster than other package managers. In general, when the
other package managers are done fetching the packages and are about
to start unpacking them, apk is already done with it all.

What apk does is extract each file into a temp file without execute
permissions until the signature is verified. If the signature matches,
it renames all the files in one go; on failure it deletes them.
Renaming and setting permissions are very quick operations, while
reading and writing all the data once again is not.
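
The pattern described above can be sketched roughly as follows. This is
not apk's code: the `.staged` suffix and the function name are made up,
and a SHA-256 over the file contents stands in for the real signature
check:

```python
import hashlib
import os
import tempfile


def install_files(root, files, expected_digest):
    """Write files under temporary names, then commit or roll back.

    files maps relative path -> (content, final mode).
    """
    h = hashlib.sha256()
    staged = []  # (temp path, final path, final mode)
    for rel, (content, mode) in files.items():
        final = os.path.join(root, rel)
        os.makedirs(os.path.dirname(final), exist_ok=True)
        tmp = final + ".staged"  # hypothetical temp suffix
        # Mode 0600: not executable while still unverified.
        fd = os.open(tmp, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
        with os.fdopen(fd, "wb") as f:
            f.write(content)
        h.update(content)  # checksum computed while writing
        staged.append((tmp, final, mode))

    if h.hexdigest() != expected_digest:
        for tmp, _, _ in staged:
            os.unlink(tmp)  # verification failed: discard everything
        return False

    for tmp, final, mode in staged:
        os.chmod(tmp, mode)     # cheap metadata operations only,
        os.replace(tmp, final)  # no data is read or written again
    return True


script = b"#!/bin/sh\necho hi\n"
with tempfile.TemporaryDirectory() as root:
    files = {"usr/bin/hello": (script, 0o755)}
    rejected = install_files(root, files, "0" * 64)
    left_after_reject = os.listdir(os.path.join(root, "usr/bin"))
    accepted = install_files(root, files, hashlib.sha256(script).hexdigest())
    left_after_accept = os.listdir(os.path.join(root, "usr/bin"))
```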

Disk usage is available in the index so it can be calculated before
anything is downloaded.
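
A sketch of such a pre-flight check, assuming each index entry carries
an installed size in bytes. The `installed_size` key, the function
name, and the 5% safety margin are assumptions for the example:

```python
import shutil


def enough_space(index_entries, target_dir, margin=0.05):
    """Return True if the summed installed sizes fit, with some headroom."""
    needed = sum(entry["installed_size"] for entry in index_entries)
    free = shutil.disk_usage(target_dir).free
    return needed * (1 + margin) <= free


small_ok = enough_space([{"installed_size": 1}], ".")
huge_ok = enough_space([{"installed_size": 2 ** 62}], ".")
```

For an upgrade, the space freed by the packages being replaced would
also have to be taken into account; this sketch only covers the simple
case of a fresh install.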

But you are right to be worried. CVE-2018-1000849 is proof of that.

-nc

Message-ID: <BZXBE25LLK3N.BU92AYRUAJ17@homura>
In-Reply-To: <20200116154250.2b1d1fda@ncopa-desktop.copa.dup.pw>

On Thu Jan 16, 2020 at 3:42 PM, Natanael Copa wrote:
> That is how current apk works, and is part of reason why it is
> significantly faster than other package managers. In general, when the
> other package managers are done with fetching the package and is about
> to start unpack them, apk is already done with it all.

"Move fast and break things" :)

> What apk does is that it extracts each file into a temp file without
> execute permissions until the signature is verified. If signature
> matches it renames all the files in one go, or deletes on failure.
> Rename and set permissions are very quick operation, while read/write
> all the data once again is not.

That makes more sense.
Timo Teras <timo.teras@iki.fi>
Message-ID: <20200117034111.2afdcaec@vostro>
In-Reply-To: <20200116154250.2b1d1fda@ncopa-desktop.copa.dup.pw>

On Thu, 16 Jan 2020 15:42:50 +0100
Natanael Copa <ncopa@alpinelinux.org> wrote:

> On Thu, 16 Jan 2020 09:22:30 -0500
> "Drew DeVault" <sir@cmpwn.com> wrote:
> 
> > On Thu Jan 16, 2020 at 3:19 PM, Natanael Copa wrote:  
> > > I think it as been mentioned before but I think it would be nice
> > > if we could have 2 operational install modes:
> > >
> > > - quick: in-the-air extraction/verification of packages (current
> > > style)
> > > - safe: store all packages locally and verify before trying to
> > > extract them
> > >
> > > safe mode is useful when the network connection is unreliable. In
> > > case of network error it could continue where it left off last
> > > time, rather than try fetch it all from scratch. Something like
> > > `apk upgrade --fetch-first` or `apk upgrade --safe`.    
> > 
> > In my opinion, the "quick" mode is so unsafe that I find it really
> > objectionable to have at all. It's not just an unreliable network to
> > consider - what if a signature check fails partway through and the
> > upgrade aborts half-finished? What if an man in the middle
> > orchestrates this situation? "Quick" mode also makes it difficult
> > to reckon if there'll be sufficient disk space in advance - today
> > apk just breaks your system if it hits ENOSPC.  

True. This is something we want to fix in the "fast" version as well:
to support rollback to the original state regardless of what happens
and where. I believe this was mentioned in one of the earlier mails, or
in one of the tickets.

> That is how current apk works, and is part of reason why it is
> significantly faster than other package managers. In general, when the
> other package managers are done with fetching the package and is about
> to start unpack them, apk is already done with it all.
> 
> What apk does is that it extracts each file into a temp file without
> execute permissions until the signature is verified. If signature
> matches it renames all the files in one go, or deletes on failure.
> Rename and set permissions are very quick operation, while read/write
> all the data once again is not.
> 
> Disk usage is available in the index so it can be calculated before
> anything is downloaded.
> 
> But you are right to be worried. CVE-2018-1000849 is a proof of that.

That is also partially due to the tar format and perhaps even how we do
signing on it. Those were implemented more than a decade ago as an
"optional" feature for apk.

Times have changed. We are looking to make security and performance
first-class citizens in the file formats. This is driving the whole
development spurt.

Timo