~alpine/devel

new package format and repository layout changes

Timo Teras <timo.teras@iki.fi>
Message-ID: <20191230145542.1a7ca9cf@vostro>

Hi all,

I am currently going through the list of data that goes into a package
and a repository index, as well as the installed db, and trying to
draft the first schema version of what data goes where. I now have
some issues I'd like to discuss here.

1. Repository pinning

There's one fundamental issue in the current installed-db that causes
pain - especially when we do strong signing. This is how the package
pinning is done (the "@edge" tagging to enable specific repositories
for specific dependencies only).

The main problem is that the package origin repository needs to be tracked
for detecting pinning changes, and it's not in the package metadata
currently. The current workaround is to have the origin tag in the
installed-db, which means there's data that cannot be signed ahead of
time. There are also some other subtle issues.

My thinking is to start putting the repository meta data (distribution
name, branch, component etc.) in the package. This way the package
origin is known and signed.

2. repositories list format

If the above happens, we might need to make some changes to how the tags
are specified.

There was some discussion earlier about whether we should support a more
Debian-style definition for listing the distro repositories. E.g.
  http://dl-cdn.alpinelinux.org/alpine edge main community

Here the first word is the base URL (or perhaps even some $MIRROR
variable), the second word is the distribution branch, and the remaining
words are the list of enabled repositories.
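
As a rough illustration of the proposed line format (a sketch only, not apk-tools code; the function name `parse_repositories_line` is made up for this example), the line could be expanded into one URL per enabled repository like this:

```python
def parse_repositories_line(line):
    """Expand '<base_url> <branch> <repo> [<repo>...]' into repository URLs."""
    words = line.split()
    if len(words) < 3:
        raise ValueError("expected: <base_url> <branch> <repo> [<repo>...]")
    base_url, branch, repos = words[0].rstrip("/"), words[1], words[2:]
    # One URL per enabled repository, all sharing the same mirror and branch.
    return [f"{base_url}/{branch}/{repo}" for repo in repos]


urls = parse_repositories_line(
    "http://dl-cdn.alpinelinux.org/alpine edge main community"
)
for url in urls:
    print(url)
```

Keeping the mirror, branch and repository list as separate words means each of them can be changed without touching the others.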

I think the package naming could then be:
  $base_url/$branch/$repo/$arch/$pkgname-$pkgrel.$uniqueid.$arch.apk
and automatically constructed from the package metadata.

(Also wondering if the $uniqueid should be just a randomly generated UUID,
or some sort of hash calculated from the package metadata and contents.
The requirement is that it can be used to identify whether two packages are
the same or not.)
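
A minimal sketch of the hash-based variant, assuming SHA-256 truncated to 8 hex characters (both the algorithm and the length are assumptions for illustration, as are the function names). The path follows the naming template above literally:

```python
import hashlib


def unique_id(metadata: bytes, contents: bytes, length: int = 8) -> str:
    """Derive a short identifier from package metadata and contents."""
    digest = hashlib.sha256()
    # Length-prefix each part so (b"ab", b"c") and (b"a", b"bc") differ.
    for part in (metadata, contents):
        digest.update(len(part).to_bytes(8, "big"))
        digest.update(part)
    return digest.hexdigest()[:length]


def package_path(base_url, branch, repo, arch, pkgname, pkgrel, uid):
    """Build $base_url/$branch/$repo/$arch/$pkgname-$pkgrel.$uniqueid.$arch.apk."""
    return f"{base_url}/{branch}/{repo}/{arch}/{pkgname}-{pkgrel}.{uid}.{arch}.apk"


uid = unique_id(b"pkgname=foo", b"file contents")
print(package_path("http://example.invalid/alpine", "edge", "main",
                   "x86_64", "foo", "r0", uid))
```

Unlike a random UUID, a content-derived id is stable across rebuilds that produce identical output, at the cost of a small collision risk from truncation.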

3. 'noarch' handling

When implementing the above, I would finally like to properly implement
'noarch'. Currently the sources set 'noarch' and build the subpackage
properly, but the packages are put into the target architecture's storage,
and when creating the index the arch is always rewritten to the target arch.
The plan is to start creating a real 'noarch' repository and put the
built packages there. I'm wondering if we'd put a separate index there, or
also include the noarch packages in the target arch index.

4. version handling

Sort of unrelated, but something I'd like to bring up once again.
Since we would now do proper distribution / branch tracking, and
package downgrades happen at times, I'm wondering if we should make the
package version "informative" only and use the build_time to decide
which package is the "preferred version". In most cases it is the
latest built package from the repository that we want to be using.

An alternative would be to introduce some sort of concept similar to
the debian/pacman package "epoch".

Though another way to look at it is that the buildtime is the
automatically generated epoch number. :)
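
To make the idea concrete, here is a small sketch (the `Candidate` type and its field names are hypothetical, not the real index schema) where the newest build wins regardless of the version string:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    version: str      # informative only in this scheme
    build_time: int   # seconds since the Unix epoch


def preferred(candidates):
    """Return the most recently built candidate."""
    return max(candidates, key=lambda c: c.build_time)


newer_version = Candidate("foo", "2.0-r0", build_time=1_577_000_000)
reverted = Candidate("foo", "1.9-r1", build_time=1_577_700_000)
print(preferred([newer_version, reverted]).version)
```

Here the reverted 1.9-r1 build is selected because it was built later, which is how a downgrade would propagate without an explicit epoch field.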

--

Any thoughts, comments or concerns regarding the above planned changes?

Thanks,
Timo
Rasmus Thomsen <oss@cogitri.dev>
Message-ID: <11500e3dc7d7da1a3d6acf4e1d855906630f6c31.camel@cogitri.dev>
In-Reply-To: <20191230145542.1a7ca9cf@vostro>

Hello,

> 4. version handling
> 
> Sort of unrelated, but something I'd like to also bring up once
> again.
> Since now that if we do proper distribution / branch tracking. And
> the
> package downgrades happen at times. I'm wondering if we should make
> the
> package version "informative" only. And use the build_time to decide
> which package is the "preferred version". In most cases it is the
> latest built package from the repository we want to be using.
> 
> Alternative would be introduce some sort of concept similar to
> debian/pacman package "epoch".
> 
> Though another way to look at it that the buildtime is the
> automatically generated epoch number. :)

Having the buildtime as automatically generated epoch number sounds
great to me, I don't see any situation where we'd have a package with a
'fresher' buildtime that isn't meant to replace the previous version of
that package and that'd make downgrades work automagically.

Regards,

Rasmus

Message-ID: <20191230175446.GD846450@alpha>
In-Reply-To: <11500e3dc7d7da1a3d6acf4e1d855906630f6c31.camel@cogitri.dev>

On Mon, Dec 30, 2019 at 02:12:25PM +0000, Rasmus Thomsen wrote:
> Hello,
> 
> > 4. version handling
> > 
> > Sort of unrelated, but something I'd like to also bring up once
> > again.
> > Since now that if we do proper distribution / branch tracking. And
> > the
> > package downgrades happen at times. I'm wondering if we should make
> > the
> > package version "informative" only. And use the build_time to decide
> > which package is the "preferred version". In most cases it is the
> > latest built package from the repository we want to be using.
> > 
> > Alternative would be introduce some sort of concept similar to
> > debian/pacman package "epoch".
> > 
> > Though another way to look at it that the buildtime is the
> > automatically generated epoch number. :)
> 
> Having the buildtime as automatically generated epoch number sounds
> great to me, I don't see any situation where we'd have a package with a
> 'fresher' buildtime that isn't meant to replace the previous version of
> that package and that'd make downgrades work automagically.
> 
> Regards,
> 
> Rasmus
> 

Wouldn't build-time timestamps interfere with reproducible builds? A
better option would be to use commit timestamps, so that it's at least
stable.
Rasmus Thomsen <oss@cogitri.dev>
Message-ID: <34cd93c2ddfb985c29b3b74862c9e71945a34954.camel@cogitri.dev>
In-Reply-To: <20191230175446.GD846450@alpha>

On Mon, 2019-12-30 at 18:54 +0100, Kevin Daudt wrote:
> On Mon, Dec 30, 2019 at 02:12:25PM +0000, Rasmus Thomsen wrote:
> > Hello,
> > 
> > > 4. version handling
> > > 
> > > Sort of unrelated, but something I'd like to also bring up once
> > > again.
> > > Since now that if we do proper distribution / branch tracking.
> > > And
> > > the
> > > package downgrades happen at times. I'm wondering if we should
> > > make
> > > the
> > > package version "informative" only. And use the build_time to
> > > decide
> > > which package is the "preferred version". In most cases it is the
> > > latest built package from the repository we want to be using.
> > > 
> > > Alternative would be introduce some sort of concept similar to
> > > debian/pacman package "epoch".
> > > 
> > > Though another way to look at it that the buildtime is the
> > > automatically generated epoch number. :)
> > 
> > Having the buildtime as automatically generated epoch number sounds
> > great to me, I don't see any situation where we'd have a package
> > with a
> > 'fresher' buildtime that isn't meant to replace the previous
> > version of
> > that package and that'd make downgrades work automagically.
> > 
> > Regards,
> > 
> > Rasmus
> > 
> 
> Wouldn't build-time timestamps interfere with reproducible builds? A
> better option would be to use commit timestamps, so that it's at
> least
> stable.

Hm, the contents of the package would still be reproducible, so I'm
not sure if that'd actually be much of a problem, but using the
commit timestamps sounds good too - hadn't thought about that! :)

Regards,

Rasmus

Message-ID: <BZIY70GY8LED.6O9B0FCA816L@homura>
In-Reply-To: <34cd93c2ddfb985c29b3b74862c9e71945a34954.camel@cogitri.dev>

I'm not sure where in the thread this was originally mentioned, but -1
to signing the repository name - i.e. main, community, edge, etc.

The source of the package is unimportant if its content can be verified
with the signature. The ability to freely move signed packages between
repos without re-signing them is desirable to me. Note as well that we
do not sign the name of the mirror the package came from, despite it
arguably qualifying as some kind of metadata about the package.

Message-ID: <20191230180430.GE846450@alpha>
In-Reply-To: <34cd93c2ddfb985c29b3b74862c9e71945a34954.camel@cogitri.dev>

On Mon, Dec 30, 2019 at 05:59:17PM +0000, Rasmus Thomsen wrote:
> On Mon, 2019-12-30 at 18:54 +0100, Kevin Daudt wrote:
> > On Mon, Dec 30, 2019 at 02:12:25PM +0000, Rasmus Thomsen wrote:
> > > Hello,
> > > 
> > > > 4. version handling
> > > > 
> > > > Sort of unrelated, but something I'd like to also bring up once
> > > > again.
> > > > Since now that if we do proper distribution / branch tracking.
> > > > And
> > > > the
> > > > package downgrades happen at times. I'm wondering if we should
> > > > make
> > > > the
> > > > package version "informative" only. And use the build_time to
> > > > decide
> > > > which package is the "preferred version". In most cases it is the
> > > > latest built package from the repository we want to be using.
> > > > 
> > > > Alternative would be introduce some sort of concept similar to
> > > > debian/pacman package "epoch".
> > > > 
> > > > Though another way to look at it that the buildtime is the
> > > > automatically generated epoch number. :)
> > > 
> > > Having the buildtime as automatically generated epoch number sounds
> > > great to me, I don't see any situation where we'd have a package
> > > with a
> > > 'fresher' buildtime that isn't meant to replace the previous
> > > version of
> > > that package and that'd make downgrades work automagically.
> > > 
> > > Regards,
> > > 
> > > Rasmus
> > > 
> > 
> > Wouldn't build-time timestamps interfere with reproducible builds? A
> > better option would be to use commit timestamps, so that it's at
> > least
> > stable.
> 
> Hm, the contents of the package would still be reproducible, so I'm
> not sure if that'd actually be much of a problem, but using the
> commit timestamps sounds good too - hadn't thought about that! :)
> 
> Regards,
> 
> Rasmus
> 

Reproducible builds are about the entire package, not just about the
contents. abuild was already adjusted to account for this, although some
changes had to be reverted due to issues.

Kevin

Message-ID: <CAAOiGNz+TN0nvWMB5YLnvFVk4jEGxN3s8VbnzJNGV_UU8FNSfw@mail.gmail.com>
In-Reply-To: <20191230145542.1a7ca9cf@vostro>

Hello,

On Mon, Dec 30, 2019 at 6:56 AM Timo Teras <timo.teras@iki.fi> wrote:
>
> Hi all,
>
> I am currently going through the list of data that goes to a package
> and a repository index, as well as the installed db. And trying to
> draft the first schema version of what data goes where. So I am now
> having some issues I'd like to discuss here.
>
> 1. Repository pinning
>
> There's one fundamental issue in the current installed-db that causes
> pain - especially when we do strong signing. This is how the package
> pinning is done (the "@edge" tagging to enable specific repositories
> for specific dependencies only).
>
> Main problem is that the package origin repository needs to be tracked
> for detecting pinning changes and it's not in the package meta data
> currently. The current workaround is to have the origin tag in
> installed-db which means there's data that cannot be signed ahead of
> time. There's also some other subtle issues.
>
> My thinking is to start putting the repository meta data (distribution
> name, branch, component etc.) in the package. This way the package
> origin is known and signed.

The problem with this is that it makes it difficult to pack a new repo
with apk fetch.  I would prefer to retain that capability, as I need
it for work.  While it is possible that we could repack the fetched
packages with new metadata, it seems wasteful to me.

> 2. repositories list format
>
> If the above happens, we might need to do some changes how the tags are
> specified.
>
> There was some discussion earlier if we should support more debian
> style definition of listing the distro repositories. E.g.
>   http://dl-cdn.alpinelinux.org/alpine edge main community

I think this is only worth it if we add support for multiple types of
repository (for example deb-src).  Otherwise, we should keep things
simple.

>
> Where the first word is the base URL (or perhaps even some $MIRROR
> variable). The second word the distribution branch. And remaining words
> would be the list of enabled repositories.
>
> I think the package naming could then be:
>   $base_url/$branch/$repo/$arch/$pkgname-$pkgrel.$uniqueid.$arch.apk
> and automatically constructed from the package metadata.

I don't follow.  There's no guarantee that a generated URI will
actually point to a package that still exists.  Packages are added and
removed from repos all the time.

> (Also wondering if the $uniqueid should be just random generated uuid,
> or some sort of hash calculated from the package metadata and contents.
> The requirement is that it can be used to identify if two packages are
> the same or not.)

If we really need a uniqueid (I'm skeptical), then I would suggest
using a truncated hash for it, or perhaps simply a CRC32 or similar.

> 3. 'noarch' handling
>
> When implementing the above, I would finally like to properly implement
> the 'noarch'. Currently the sources set 'noarch' and build subpackage
> properly. But they are put to the target architecture's storage and
> when creating index the arch is rewritten to the target arch always.
> The plan is to start creating real 'noarch' repository and put the
> built package there. I'm wondering if we'd put separate index there, or
> include the noarch packages also in the target arch index.

We should use a separate index for noarch.  I would prefer to see it
work in a way where we could move Alpine to being a fully multi-arch
distro in the future, where we can use qemu-user to run binaries for
other archs.  This mostly solves cross-compiling in a clean way, too.

> 4. version handling
>
> Sort of unrelated, but something I'd like to also bring up once again.
> Since now that if we do proper distribution / branch tracking. And the
> package downgrades happen at times. I'm wondering if we should make the
> package version "informative" only. And use the build_time to decide
> which package is the "preferred version". In most cases it is the
> latest built package from the repository we want to be using.

If a package (or admin) declares a specific version dependency, we
should prefer it unless we supply --available.  A problem with using
build_time as the automatic preference occurs in the case where a user
mixes repositories without correctly pinning them.  In the present
case, the highest version matching the declared dependencies will
always be preferred.  If we switch to build_time, this case may result
in versions being mixed if a security update happens in an old release
that isn't simply a version bump.

> Alternative would be introduce some sort of concept similar to
> debian/pacman package "epoch".
>
> Though another way to look at it that the buildtime is the
> automatically generated epoch number. :)

I suspect the epoch concept is actually *not* the right way to go,
which is why I have never been in favor of it.  I suspect what we need
to do is have repository weighting, and use the weightings to
determine which version should be selected instead of blindly taking
the highest version.  That way we can say edge has highest preference,
but 3.11 has moderate preference, when calculating the upgrade
transaction(s).
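
A sketch of what such weighting could look like (the weights, the `select_candidate` name and the tuple-based version representation are all assumptions for illustration; real apk version comparison is more involved):

```python
def select_candidate(candidates, repo_weights):
    """Pick the candidate from the highest-weighted repository.

    candidates: list of (repo_name, version_tuple) pairs.
    Ties on weight fall back to the highest version.
    """
    return max(candidates, key=lambda c: (repo_weights.get(c[0], 0), c[1]))


# edge has the highest preference, 3.11 a moderate one.
repo_weights = {"edge": 100, "v3.11": 50}
candidates = [("v3.11", (2, 0, 1)), ("edge", (1, 9, 0))]
chosen = select_candidate(candidates, repo_weights)
print(chosen)
```

With these weights the edge package is chosen even though the 3.11 one carries a higher version, and the version only matters among candidates from equally weighted repositories.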

Ariadne
Timo Teras <timo.teras@iki.fi>
Message-ID: <20200102222207.6808c028@vostro.lan>
In-Reply-To: <20191230180430.GE846450@alpha>

On Mon, 30 Dec 2019 19:04:30 +0100
Kevin Daudt <kdaudt@alpinelinux.org> wrote:

> On Mon, Dec 30, 2019 at 05:59:17PM +0000, Rasmus Thomsen wrote:
> > On Mon, 2019-12-30 at 18:54 +0100, Kevin Daudt wrote:  
> > > On Mon, Dec 30, 2019 at 02:12:25PM +0000, Rasmus Thomsen wrote:  
> > > > Hello,
> > > >   
> > > > > 4. version handling
> > > > > 
> > > > > Sort of unrelated, but something I'd like to also bring up
> > > > > once again.
> > > > > Since now that if we do proper distribution / branch tracking.
> > > > > And
> > > > > the
> > > > > package downgrades happen at times. I'm wondering if we should
> > > > > make
> > > > > the
> > > > > package version "informative" only. And use the build_time to
> > > > > decide
> > > > > which package is the "preferred version". In most cases it is
> > > > > the latest built package from the repository we want to be
> > > > > using.
> > > > > 
> > > > > Alternative would be introduce some sort of concept similar to
> > > > > debian/pacman package "epoch".
> > > > > 
> > > > > Though another way to look at it that the buildtime is the
> > > > > automatically generated epoch number. :)  
> > > > 
> > > > Having the buildtime as automatically generated epoch number
> > > > sounds great to me, I don't see any situation where we'd have a
> > > > package with a
> > > > 'fresher' buildtime that isn't meant to replace the previous
> > > > version of
> > > > that package and that'd make downgrades work automagically.
> > > 
> > > Wouldn't build-time timestamps interfere with reproducible
> > > builds? A better option would be to use commit timestamps, so
> > > that it's at least
> > > stable.  
> > 
> > Hm, the contents of the package would still be reproducible, so I'm
> > not sure if that'd actually be much of a problem, but using the
> > commit timestamps sounds good too - hadn't thought about that! :)
> 
> Reproducible builds are about the entire package, not just about the
> contents. abuild was already adjusted to account for this, although
> some changes had to be reverted due to issues.

That would probably then mean that we don't include build time in the
meta data either. Or use $SOURCE_DATE_EPOCH.
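
For reference, the usual way a build tool honours SOURCE_DATE_EPOCH is roughly the following (a sketch only; the function name `build_timestamp` is made up here and this is not abuild code):

```python
import os
import time


def build_timestamp(environ=os.environ):
    """Return SOURCE_DATE_EPOCH if set, otherwise the current time."""
    value = environ.get("SOURCE_DATE_EPOCH")
    if value is not None:
        # Reproducible: the same value on every rebuild of the same source.
        return int(value)
    # Not reproducible: changes on every build.
    return int(time.time())


print(build_timestamp({"SOURCE_DATE_EPOCH": "1577717742"}))
```

If the variable is derived from the commit timestamp, the recorded time stays stable across rebuilds of the same commit.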

This probably implies that we need to take care with the signatures too.
E.g. ECDSA often has by default a random element in the signature
(there's a deterministic mode now too, but it's not always the default).

Timo
Timo Teras <timo.teras@iki.fi>
Message-ID: <20200102222839.67345997@vostro.lan>
In-Reply-To: <BZIY70GY8LED.6O9B0FCA816L@homura>

On Mon, 30 Dec 2019 13:00:34 -0500
"Drew DeVault" <sir@cmpwn.com> wrote:

> I'm not sure where in the thread this was originally mentioned, but -1
> to signing the repository name - i.e. main, community, edge, etc.
> 
> The source of the package is unimportant if its content can be
> verified with the signature. The ability to freely move signed
> packages between repos without re-signing them is desirable to me.
> Note as well that we do not sign the name of the mirror the package
> came from, despite arguably qualifying as some kind of metadata about
> the package.

The mirror is irrelevant for signing from my point of view.

But I'd rather not allow moving an Alpine edge main package to stable main
without a rebuild (or at least re-signing). So that's a different thing.

Though, I understand there might be a need to move packages like this
in certain scenarios. Is this just for the purpose of redistributing the
packages in a new partial mirror, or alternate branding?

I would be interested to hear more about the use case to learn if we
could introduce a new feature to suit this requirement.

Timo

Message-ID: <BZLL8CSB7JHD.2FJF7E8L20TI@homura>
In-Reply-To: <20200102222839.67345997@vostro.lan>

Some use cases would include installing packages from a local cache,
downloading them out of band and feeding them to apk when resolving
system problems, pulling a package out of edge/another release and
sticking it into your own repo to make it available to a particular
system (for example, I sometimes pull packages from edge/testing over to
mirror.sr.ht).

I would also argue this from a philosophical standpoint, because the
signature verifies that the data is authentic, not where it comes from.
Timo Teras <timo.teras@iki.fi>
Message-ID: <20200102224643.1f2ace2e@vostro.lan>
In-Reply-To: <CAAOiGNz+TN0nvWMB5YLnvFVk4jEGxN3s8VbnzJNGV_UU8FNSfw@mail.gmail.com>

On Mon, 30 Dec 2019 14:47:38 -0600
Ariadne Conill <ariadne@dereferenced.org> wrote:

> On Mon, Dec 30, 2019 at 6:56 AM Timo Teras <timo.teras@iki.fi> wrote:
> >
> > Hi all,
> >
> > I am currently going through the list of data that goes to a package
> > and a repository index, as well as the installed db. And trying to
> > draft the first schema version of what data goes where. So I am now
> > having some issues I'd like to discuss here.
> >
> > 1. Repository pinning
> >
> > There's one fundamental issue in the current installed-db that
> > causes pain - especially when we do strong signing. This is how the
> > package pinning is done (the "@edge" tagging to enable specific
> > repositories for specific dependencies only).
> >
> > Main problem is that the package origin repository needs to be
> > tracked for detecting pinning changes and it's not in the package
> > meta data currently. The current workaround is to have the origin
> > tag in installed-db which means there's data that cannot be signed
> > ahead of time. There's also some other subtle issues.
> >
> > My thinking is to start putting the repository meta data
> > (distribution name, branch, component etc.) in the package. This
> > way the package origin is known and signed.  
> 
> The problem with this is that it makes it difficult to pack a new repo
> with apk fetch.  I would prefer to retain that capability, as I need
> it for work.  While it is possible that we could repack the fetched
> packages with new metadata, it seems wasteful to me.

Is this the same issue as what Drew mentioned? I'm rather curious about
this. Perhaps we should instead allow building mixed repos like this?

How about re-signing the packages? You need a key for the new repo
anyway...

But I am actually thinking now that from both the security and maintenance
side it's better to ship the repository as part of the package. We can
then see in what environment the package was built.

Also, there are issues with supporting "apk add http://path/to/pkg.apk",
and for that to work the package needs to contain the repository data.
Though, if we are going more and more in the "follow the repository"
direction, supporting this might not be feasible.

> > 2. repositories list format
> >
> > If the above happens, we might need to do some changes how the tags
> > are specified.
> >
> > There was some discussion earlier if we should support more debian
> > style definition of listing the distro repositories. E.g.
> >   http://dl-cdn.alpinelinux.org/alpine edge main community  
> 
> I think this is only worth it if we add support for multiple types of
> repository (for example deb-src).  Otherwise, we should keep things
> simple.

Noted.

> > Where the first word is the base URL (or perhaps even some $MIRROR
> > variable). The second word the distribution branch. And remaining
> > words would be the list of enabled repositories.
> >
> > I think the package naming could then be:
> >   $base_url/$branch/$repo/$arch/$pkgname-$pkgrel.$uniqueid.$arch.apk
> > and automatically constructed from the package metadata.  
> 
> I don't follow.  There's no guarantee that a generated URI will
> actually point to a package that still exists.  Packages are added and
> removed from repos all the time.

That was mostly to indicate that I'm planning to change the visible
package naming scheme.

And yes, the repo will still get updated. Perhaps we add some options to
keep packages in the repo for a grace period before deletion - so using
a stale index would still work during that grace period.

> > (Also wondering if the $uniqueid should be just random generated
> > uuid, or some sort of hash calculated from the package metadata and
> > contents. The requirement is that it can be used to identify if two
> > packages are the same or not.)  
> 
> If we really need a uniqueid (I'm skeptical), then I would suggest
> using a truncated hash for it, or perhaps simply a CRC32 or similar.

Yes, I suppose that makes sense. Especially if considering reproducible
builds.

> > 3. 'noarch' handling
> >
> > When implementing the above, I would finally like to properly
> > implement the 'noarch'. Currently the sources set 'noarch' and
> > build subpackage properly. But they are put to the target
> > architecture's storage and when creating index the arch is
> > rewritten to the target arch always. The plan is to start creating
> > real 'noarch' repository and put the built package there. I'm
> > wondering if we'd put separate index there, or include the noarch
> > packages also in the target arch index.  
> 
> We should use a separate index for noarch.  I would prefer to see it
> work in a way where we could move Alpine to being a fully multi-arch
> distro in the future, where we can use qemu-user to run binaries for
> other archs.  This mostly solves cross-compiling in a clean way, too.

That means a few more downloads, but I guess that's acceptable. It
probably simplifies things on the repository management side too.

Reminds me, I still have not decided whether the new format will be
fixed to one endianness, or whether I'll try to generate a target
endianness file. In the latter case we'd need two copies of noarch: the
little and the big endian one.
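
The trade-off can be shown with a 32-bit field (a sketch with made-up helper names; the actual on-disk format is not decided here):

```python
import struct


def encode_u32_fixed(value):
    """Fixed-endianness format: always little endian, one file for all targets."""
    return struct.pack("<I", value)


def encode_u32_for_target(value, big_endian):
    """Target-endianness format: byte order follows the target architecture."""
    return struct.pack(">I" if big_endian else "<I", value)


print(encode_u32_fixed(1).hex())
print(encode_u32_for_target(1, big_endian=False).hex())
print(encode_u32_for_target(1, big_endian=True).hex())
```

With a fixed byte order the same noarch file serves every target; with a target-dependent byte order the little and big endian encodings differ, so two copies would be needed.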

> > 4. version handling
> >
> > Sort of unrelated, but something I'd like to also bring up once
> > again. Since now that if we do proper distribution / branch
> > tracking. And the package downgrades happen at times. I'm wondering
> > if we should make the package version "informative" only. And use
> > the build_time to decide which package is the "preferred version".
> > In most cases it is the latest built package from the repository we
> > want to be using.  
> 
> If a package (or admin) declares a specific version dependency, we
> should prefer it unless we supply --available.  A problem with using
> build_time as the automatic preference occurs in the case where a user
> mixes repositories without correctly pinning them.  In the present
> case, the highest version matching the declared dependencies will
> always be preferred.  If we switch to build_time, this case may result
> in versions being mixed if a security update happens in an old release
> that isn't simply a version bump.

The intent was not to override a versioned world dependency or pinning.
But yes, it would be a problem to add multiple different repositories
without a pinning tag.

I wonder if we should make pinning an explicit requirement when mixing
branches.

Allowing things to be mixed makes people assume that it's supported. And
we've already got several bug reports on this. It's becoming a FAQ that
yes, pinning might work between edge/latest-stable for a while, but
it's more intended for the testing repo. If the ABI of a library changes
during development, you can no longer mix the produced packages.

> > Alternative would be introduce some sort of concept similar to
> > debian/pacman package "epoch".
> >
> > Though another way to look at it that the buildtime is the
> > automatically generated epoch number. :)  
> 
> I suspect the epoch concept is actually *not* the right way to go,
> which is why I have never been in favor of it.  I suspect what we need
> to do is have repository weighting, and use the weightings to
> determine which version should be selected instead of blindly taking
> the highest version.  That way we can say edge has highest preference,
> but 3.11 has moderate preference, when calculating the upgrade
> transaction(s).

Ok, this is another interesting approach. We could embed metadata about
the repository's preference in the repository index. Perhaps even in the
packages.

Perhaps we first need to formulate a little better how we want the
repository and package preference to work, before we go and figure out
what to put in the packages.

Timo

Message-ID: <20200116121517.0a050f85@ncopa-desktop.copa.dup.pw>
In-Reply-To: <20191230145542.1a7ca9cf@vostro>

On Mon, 30 Dec 2019 14:55:42 +0200
Timo Teras <timo.teras@iki.fi> wrote:

> Hi all,
> 
> I am currently going through the list of data that goes to a package
> and a repository index, as well as the installed db. And trying to
> draft the first schema version of what data goes where. So I am now
> having some issues I'd like to discuss here.
> 
> 1. Repository pinning
> 
> There's one fundamental issue in the current installed-db that causes
> pain - especially when we do strong signing. This is how the package
> pinning is done (the "@edge" tagging to enable specific repositories
> for specific dependencies only).
> 
> Main problem is that the package origin repository needs to be tracked
> for detecting pinning changes and it's not in the package meta data
> currently. The current workaround is to have the origin tag in
> installed-db which means there's data that cannot be signed ahead of
> time. There's also some other subtle issues.

What are the pinning changes you need to detect? Do you have any
example of the problem?
 
> My thinking is to start putting the repository meta data (distribution
> name, branch, component etc.) in the package. This way the package
> origin is known and signed.

We already have `origin` which has the information about which aport
the package came from:

  $ apk search --origin lua5.2-apk
  apk-tools-2.10.4-r3

I think having the build time origin is useful, and I would like to have
the information on whether it was built from `main/` or `community/` in
there. I think signing the build time origin is ok.

However, the install time origin is a different story. We need to be
able to collect a subset of a repository and generate a new index. We
do that for our release media, where we `apk fetch` a list of packages
and their dependencies and store those in apks/ on the ISO image.

I think it is useful to be able to do so without needing to re-sign the
packages, even if we need to sign the index.

> 
> 2. repositories list format
> 
> If the above happens, we might need to do some changes how the tags are
> specified.
> 
> There was some discussion earlier if we should support more debian
> style definition of listing the distro repositories. E.g.
>   http://dl-cdn.alpinelinux.org/alpine edge main community
> 
> Where the first word is the base URL (or perhaps even some $MIRROR
> variable). The second word the distribution branch. And remaining words
> would be the list of enabled repositories.

I think this is useful. The use case is for setting up repositories for
build time dependencies. We currently have a .rootbld-repositories[1] in our
aports tree that defines the dependencies for each repository. That way
we can define that when building packages in `community`, we also need
to use the `main` repository for dependencies, but not testing. But we may
want to use the mirror from the system (/etc/apk/repositories), but not
the rest of the info there.

I'm also thinking that we could have a list of mirrors so we could
fetch in parallel from different mirrors. (which ofc gives interesting
problems if the mirrors are out of sync)

Separating mirror, distribution/release branch and repository is
generally a good idea, I think.

> I think the package naming could then be:
>   $base_url/$branch/$repo/$arch/$pkgname-$pkgrel.$uniqueid.$arch.apk
> and automatically constructed from the package metadata.
> 
> (Also wondering if the $uniqueid should be just random generated uuid,
> or some sort of hash calculated from the package metadata and
> contents. The requirement is that it can be used to identify if two
> packages are the same or not.)

I'm only skeptical about the uniqueid part here. It is useful to be able
to wget https://.../apk-tools-static.apk and extract that. Same with
busybox-static. But I guess that's a special case for apk-tools and
busybox, and I guess we can solve that differently.

Another problem with the unique id is that abuild currently needs to
know if a package is built or not, before the package is built (and
before the unique id is generated). Currently it checks if the package
exists locally:

  if [ ! -f "$REPODEST/$repo/${subpkgarch/noarch/$CARCH}/$subpkgname-$pkgver-r$pkgrel.apk" ]; then

from https://gitlab.alpinelinux.org/alpine/abuild/blob/master/abuild.in#L1968

This is so running `abuild -r` a second time gives:

  $ abuild -r
  >>> apk-tools: Package is up to date

In other words abuild needs to be able to calculate the unique id before
the package is built. Otherwise abuild will always rebuild everything,
which is kind of annoying when building 300 packages due to an ABI
breakage.

I guess that can be solved by consulting the index, but then we need
tooling for that.
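
A rough sketch of what "consulting the index" could look like, assuming
the index can be read as a list of records with name, version and unique
id. The record layout, the busybox version and the ids below are made up
for illustration and are not the real index format.

```python
def is_built(index, name, version):
    """True if the index lists this exact name and version, whatever its unique id."""
    return any(e["name"] == name and e["version"] == version for e in index)


# Hypothetical index contents; the unique ids are invented.
index = [
    {"name": "apk-tools", "version": "2.10.4-r3", "uniqueid": "1f3a9c"},
    {"name": "busybox", "version": "1.31.1-r9", "uniqueid": "77b0e2"},
]

for name, version in [("apk-tools", "2.10.4-r3"), ("apk-tools", "2.10.4-r4")]:
    state = "up to date" if is_built(index, name, version) else "needs build"
    print(f"{name}-{version}: {state}")
```

The point is only that the lookup key is name plus version, so the
unique id does not need to be known before the build.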

> 3. 'noarch' handling
> 
> When implementing the above, I would finally like to properly
> implement the 'noarch'. Currently the sources set 'noarch' and build
> subpackage properly. But they are put to the target architecture's
> storage and when creating index the arch is rewritten to the target
> arch always. The plan is to start creating real 'noarch' repository
> and put the built package there. I'm wonder if we'd put separate
> index there, or include the noarch packages also in the target arch
> index.

The biggest challenge with a proper 'noarch' handling is that it
requires coordination of the builders of different arches. If a build
time dependency is noarch, should the builder build it or should it
wait for some other builder to build it?

If builders of all arches build it, which builder should upload it to
the shared `noarch` repository?

If we give the responsibility for noarch to a specific arch builder
(let's say x86_64), what do we do if there is an arch package (let's say
aarch64) that depends on a mixed arch/noarch aport that is disabled
for the noarch builder's arch?

How do we detect early that the noarch flag is wrongly set? For example
a package could have only generated C headers that are arch dependent
(like linux-headers). Or a -doc package could be noarch while the man
page is generated at build time, with different options depending on
arch.

What about packages that have both arch dependent and noarch dependent

Also, if we store noarch packages in a different directory in the
locally built repo, we get the problem of detecting whether a package is
built or not when the aport generates both arch and noarch subpackages.
With unique id (as mentioned above) we now need to consult 2 different
indexes to know if a package needs to be rebuilt or not.

IMHO, separating noarch creates some complicated problems so I question
if the value it brings outweighs the cost.

I think it makes sense to first have 100% reproducibly built noarch
packages, and a new coordinated build infrastructure in place before
we finally fix the 'noarch' handling. The current build infra is too
stupid and simple.

> 4. version handling
> 
> Sort of unrelated, but something I'd like to also bring up once again.
> Since now that if we do proper distribution / branch tracking. And the
> package downgrades happen at times. I'm wondering if we should make
> the package version "informative" only. And use the build_time to
> decide which package is the "preferred version". In most cases it is
> the latest built package from the repository we want to be using.
> 
> Alternative would be introduce some sort of concept similar to
> debian/pacman package "epoch".
> 
> Though another way to look at it that the buildtime is the
> automatically generated epoch number. :)

I don't think that buildtime (or even git commit time) is a good source
for deciding what version is to be preferred. I am afraid that it will
force us to think about the order in which we build packages.

For example, let's say we have package foo-4.5 in the community repo and
the next gen foo-5.0 in testing, and the user has both community and
testing repos enabled. Now there is a security issue, so foo-5.1 and
foo-4.6 are released. Which version the user ends up with now depends on
the order in which those are fixed. If the developer pushes
testing/foo-5.1 first, then the user will end up with community/foo-4.6
because it has a newer build timestamp.
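
The scenario can be replayed in a few lines of Python. This is only an
illustration of the "newest build wins" rule under discussion; the
record layout and the build times are invented.

```python
from datetime import datetime

# testing/foo-5.1 is pushed first, community/foo-4.6 a few hours later.
candidates = [
    {"repo": "testing", "name": "foo", "version": "5.1",
     "build_time": datetime(2020, 1, 10, 9, 0)},
    {"repo": "community", "name": "foo", "version": "4.6",
     "build_time": datetime(2020, 1, 10, 15, 0)},
]


def prefer_by_build_time(pkgs):
    """Pick the most recently built candidate, ignoring the version string."""
    return max(pkgs, key=lambda p: p["build_time"])


chosen = prefer_by_build_time(candidates)
print(chosen["repo"], chosen["version"])  # community 4.6
```

Swapping the two build times flips the result to testing/foo-5.1, which
is exactly the order dependence described above.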

I sort of like Ariadne's idea of repository weighting though.

Thanks!

> 
> --
> 
> Any thoughts, comments or concerns regarding the above planned
> changes?
> 
> Thanks,
> Timo
Timo Teras <timo.teras@iki.fi>
Details
Message ID
<20200117032933.421dfe8b@vostro>
In-Reply-To
<20200116121517.0a050f85@ncopa-desktop.copa.dup.pw> (view parent)
On Thu, 16 Jan 2020 12:15:17 +0100
Natanael Copa <ncopa@alpinelinux.org> wrote:

> On Mon, 30 Dec 2019 14:55:42 +0200
> Timo Teras <timo.teras@iki.fi> wrote:
> 
> > I am currently going through the list of data that goes to a package
> > and a repository index, as well as the installed db. And trying to
> > draft the first schema version of what data goes where. So I am now
> > having some issues I'd like to discuss here.
> > 
> > 1. Repository pinning
> > 
> > There's one fundamental issue in the current installed-db that
> > causes pain - especially when we do strong signing. This is how the
> > package pinning is done (the "@edge" tagging to enable specific
> > repositories for specific dependencies only).
> > 
> > Main problem is that the package origin repository needs to be
> > tracked for detecting pinning changes and it's not in the package
> > meta data currently. The current workaround is to have the origin
> > tag in installed-db which means there's data that cannot be signed
> > ahead of time. There's also some other subtle issues.  
> 
> What are the pinning changes you need to detect? Do you have any
> example of the problem?

So the general problem arises when you have edge's main + community
untagged, and @testing tagged.

Then the following happens:
 1. install package from @testing
 2. that package gets updated in our repo, and the old one gets removed
 3. the index is updated

After this we have a system with a package from @testing. To be able to
recalculate the system consistency, we need to be able to verify that
the currently installed package came from @testing. If this is not
possible, apk would currently try to enforce upgrading to the updated
version, even if something else was just added (the updated tagged
package would be preferred).

This becomes even more important on run-from-ram systems, which
recalculate on reboot what can be installed. In the above case the
package would not be installed if the tag cannot be validated.

Currently, the installed database contains the repository tag as it was
when the package was installed. This is inserted by apk at install time.
For run-from-ram systems this is also stored in the cache repository
index.

But the above system is fragile, and can fail in certain scenarios too.

The point is that having the @tag syntax in repositories that is
locally configurable is problematic. My idea was to perhaps keep the
@tag syntax but bind it to something that comes from the index and is
present in the packages.

> > My thinking is to start putting the repository meta data
> > (distribution name, branch, component etc.) in the package. This
> > way the package origin is known and signed.  
> 
> We already have `origin` which has the information about which aport
> the package came from:
> 
>   $ apk search --origin lua5.2-apk
>   apk-tools-2.10.4-r3
> 
> I think have the build time origin is useful and I would like to have
> the information if if was build from `main/` or `community/` in there.
> I think signing the build time origin is ok.
> 
> However, the install time origin is a different story. We need to be
> able to collect a subset of a repository and generate a new index. We
> dot that for our release media where we `apk fetch` a list of packages
> and their dependencies and store those in apks/ on the ISO image.
> 
> I think it is useful to be able to do so without need to re-sign the
> packages, even if we need to sign the index.

Yes. This is different. The media is a partial mirror of the
repository. The plan is to be able to clone partial mirrors easily.

So to clarify, the mirror name, or local media partial mirrors would
just work. They would have the full original index and the selection of
packages present that are specified. We can add tooling to do "apk
mkmirror <list-of-packages>" or similar.

What I want to avoid is that someone takes just the .apk file from
edge/main, puts it into their alpine mirror of v3.14-stable, and then
comes back saying it's broken. There's already a lot of confusion about
what the limitations of tagged repositories are, and about the
"compatibility" of mixing edge with stable releases. This probably needs
a better disclaimer somewhere.

The other question is: if we build a package on our builders, do we
want to allow it to be rebranded as "new-cool-linux(tm)" without
re-signing it? I think not. So in this sense I think signing the
repository name as well makes sense.

But taking a step back, there are two pieces of meta data we may or may
not be interested in:
 1. the aports source repository / package name where it's built from.
 The source package is in the origin field already. We may or may not
 be interested in the aports name.

 2. the build environment. Mostly this would be set by the builder to
 declare what base distribution was used to build it. Though, tracking
 all of it might be burdensome. So perhaps it's simpler to just state
 that this package was built for "alpine / edge / community".

The above would also help the caching support. I believe many already
use (and I'd like to officially design this to be done correctly) a
shared apk cache between containers. There are subtle conditions under
which this might work now. But I'd basically like to make the apk cache
a local alpine mirror that has one or more distros/releases/arches in
the same hierarchy. So the cache dir would have the same
distro/release/repository structure under it.

To make the above work even when installing individual packages, the
package should contain the information...
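
As a sketch of why the package itself would need to carry this
information: with it, the place of a single .apk in such a cache can be
derived from the package metadata alone. The field names, the cache root
and the example package are assumptions for illustration, not a settled
schema.

```python
from pathlib import PurePosixPath


def cache_path(cache_root, meta):
    """Build distro/release/repo/arch/file below cache_root from package metadata."""
    filename = f"{meta['name']}-{meta['version']}.apk"
    return PurePosixPath(
        cache_root, meta["distro"], meta["release"], meta["repo"],
        meta["arch"], filename,
    )


# Hypothetical metadata as it might be read from one package.
meta = {
    "distro": "alpine", "release": "edge", "repo": "community",
    "arch": "x86_64", "name": "foo", "version": "4.6-r0",
}
print(cache_path("/var/cache/apk", meta))
```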

So for me putting this information into the package makes a lot of sense
from the functional, security and usability side of things.

If you have a use case where "rebranding" a package is needed without
modifying/re-signing it, I'd like to learn more about it. The generic
"make partial mirror locally, on boot media, on cache" case would be
supported out of the box: copy the full index, copy only the wanted
packages. apk would have the ability to go to the next mirror / media
source if a package listed in the index is not there.
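
A minimal sketch of that fallback, assuming each source is modelled
simply as a name plus the set of files it actually holds. The source
names and file lists are invented; this is not how apk works today.

```python
def locate_package(filename, sources):
    """Return the name of the first source that actually holds the file, else None."""
    for name, available in sources:
        if filename in available:
            return name
    return None


# Partial mirrors share the full index but carry only some of the packages.
sources = [
    ("boot-media", {"busybox-1.31.1-r9.apk"}),
    ("local-cache", {"apk-tools-2.10.4-r3.apk"}),
    ("remote-mirror", {"busybox-1.31.1-r9.apk", "apk-tools-2.10.4-r3.apk",
                       "foo-4.6-r0.apk"}),
]

for wanted in ["busybox-1.31.1-r9.apk", "foo-4.6-r0.apk", "bar-1.0-r0.apk"]:
    print(wanted, "->", locate_package(wanted, sources))
```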

One more option to consider is to put the repository tag origin in
installed db and the overlay somehow. But it needs to be removed from
the cache. In this case the signatures would not cover the tag info on
regular install. Perhaps we can then do a secure variant by having a
local signing key for the overlay/install-db, where trust comes from the
kernel command line / initramfs / TPM.

> > 2. repositories list format
> > 
> > If the above happens, we might need to do some changes how the tags
> > are specified.
> > 
> > There was some discussion earlier if we should support more debian
> > style definition of listing the distro repositories. E.g.
> >   http://dl-cdn.alpinelinux.org/alpine edge main community
> > 
> > Where the first word is the base URL (or perhaps even some $MIRROR
> > variable). The second word the distribution branch. And remaining
> > words would be the list of enabled repositories.  
> 
> I think this is useful. The use case is for setting up repositories
> for build time dependencies. we currently have a
> .rootbld-repositories[1] in our aports tree that defines the
> dependencies for each repository. That way we can define that when
> building packages in `community`, we also need to use `main`
> repository for dependencies, but not testing. But we may want use the
> mirror from the system. (/etc/apk/repositories) but not the rest of
> the info there.
> 
> I'm also thinking that we could have a list of mirrors so we could
> fetch in parallel from different mirrors. (which ofc gives interesting
> problems if the mirrors are out of sync)
> 
> Separating mirror, distribution/release branch and repository is a
> general good idea I think.

Probably we need to keep the syntax backwards compatible. Would just
some variable substitution syntax such as $MIRROR be enough?
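
A sketch of what backwards-compatible substitution could look like,
using Python's `string.Template` purely as a stand-in. The variable
name MIRROR comes from the discussion; the function name and the idea of
leaving unknown variables untouched are assumptions.

```python
from string import Template


def expand_line(line, variables):
    """Substitute $NAME / ${NAME}; lines without variables pass through unchanged."""
    # safe_substitute leaves unknown variables in place instead of raising.
    return Template(line).safe_substitute(variables)


variables = {"MIRROR": "http://dl-cdn.alpinelinux.org/alpine"}
lines = [
    "http://dl-cdn.alpinelinux.org/alpine/edge/main",  # existing syntax
    "$MIRROR/edge/community",                          # with substitution
    "@testing $MIRROR/edge/testing",                   # tagged repository
]
for line in lines:
    print(expand_line(line, variables))
```

Existing repositories files contain no variables, so they would expand
to themselves.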

> > I think the package naming could then be:
> >   $base_url/$branch/$repo/$arch/$pkgname-$pkgrel.$uniqueid.$arch.apk
> > and automatically constructed from the package metadata.
> > 
> > (Also wondering if the $uniqueid should be just random generated
> > uuid, or some sort of hash calculated from the package metadata and
> > contents. The requirement is that it can be used to identify if two
> > packages are the same or not.)  
> 
> I'm only skeptic to the uniqeid part here. It is useful to be able to
> wget https://.../apk-tools-static.apk and extract that. Same with
> busybox-static. But i guess thats a special case for apk-tools and
> busybox, and I guess we can solve that differently.

The wget | tar pipeline would no longer work. You'd need apk to extract,
or convert to tar first anyway. I suggest we provide the raw static
executable in the future.

> Another problem with the unique id is that abuild currently needs to
> know if a package is build or not, before the package is build (and
> the unique id is not generated). Currently it checks if package
> exists locally:
> 
>   if [ ! -f
> "$REPODEST/$repo/${subpkgarch/noarch/$CARCH}/$subpkgname-$pkgver-r$pkgrel.apk"
> ]; then
> 
> from
> https://gitlab.alpinelinux.org/alpine/abuild/blob/master/abuild.in#L1968
> 
> This is so running `abuild -r` a second time gives:
> 
>   $ abuild -r
>   >>> apk-tools: Package is up to date  
> 
> In other words abuild needs to be able to calculate the unique id
> before the package is built. Otherwise abuild will always rebuild
> everything, which is kind of annoying when building 300 packages due
> to an ABI breakage.
> 
> I guess that can be solved by consulting the index, but then we need
> tooling for that.

Yes, this would need to be fixed in some different way. Querying whether
the package with the specific $pkgver-r$pkgrel version exists should be
good enough. Perhaps we could even add a way to query the package's
timestamp, so that it could be used to test whether the APKBUILD was
modified afterwards without a pkgrel bump.
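
Roughly, such a query could feed a check like the following. This is a
sketch only: the record layout and the plain integer timestamps are
assumptions, and no such query exists in the tooling yet.

```python
def needs_rebuild(index, name, version, apkbuild_mtime):
    """Rebuild if the version is missing, or if APKBUILD changed after the build."""
    for entry in index:
        if entry["name"] == name and entry["version"] == version:
            # Same $pkgver-r$pkgrel exists: compare against its build time.
            return apkbuild_mtime > entry["build_time"]
    return True


# Hypothetical index record; build_time is seconds since the epoch.
index = [{"name": "apk-tools", "version": "2.10.4-r3", "build_time": 1579000000}]

print(needs_rebuild(index, "apk-tools", "2.10.4-r3", 1578990000))  # False
print(needs_rebuild(index, "apk-tools", "2.10.4-r3", 1579100000))  # True
print(needs_rebuild(index, "apk-tools", "2.10.4-r4", 1578990000))  # True
```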

> > 3. 'noarch' handling
> > 
> > When implementing the above, I would finally like to properly
> > implement the 'noarch'. Currently the sources set 'noarch' and build
> > subpackage properly. But they are put to the target architecture's
> > storage and when creating index the arch is rewritten to the target
> > arch always. The plan is to start creating real 'noarch' repository
> > and put the built package there. I'm wonder if we'd put separate
> > index there, or include the noarch packages also in the target arch
> > index.  
> 
> The biggest challenge with a proper 'noarch' handling is that it
> requires coordination of the builders of different arches. If a build
> time dependency is noarch, should the builder build it or should it
> wait for some other builder to build it?
> 
> If builders of all arches builds it, which builder should upload it to
> the shared `noarch` repository?
> 
> If we give the responsibility for noarch to a specific arch builder
> (lets say x86_64), what do we do if there is a arch package (lets say
> aarch64) that depends on  a mixed arch/noarch aport that is disabled
> for the noarch builders arch?

I'd probably make it the responsibility of the first valid builder
architecture. Or designate x86_64 if 'all' is used.
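
One possible reading of that rule, as a sketch. The builder list, the
function name and the interpretation of "first valid" as "first in the
builder list" are all assumptions; the `!arch` exclusion covers the case
where the aport is disabled on the designated arch.

```python
BUILDERS = ["x86_64", "aarch64", "armv7"]  # hypothetical builder list


def noarch_builder(arch_field, builders=BUILDERS, designated="x86_64"):
    """Pick the arch whose builder is responsible for the noarch output."""
    words = arch_field.split()
    excluded = {w[1:] for w in words if w.startswith("!")}
    listed = [w for w in words if not w.startswith("!")]
    uses_all = "all" in listed
    valid = [b for b in builders
             if (uses_all or b in listed) and b not in excluded]
    if not valid:
        raise ValueError("no builder can build this aport")
    if uses_all and designated in valid:
        return designated
    return valid[0]


for field in ["all", "all !x86_64", "aarch64 armv7"]:
    print(repr(field), "->", noarch_builder(field))
```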

> How do we early detect if noarch flag is wrongly set? For example a
> package could have only generated C headers that are arch dependent
> (like linux-headers). Or if -doc package is noarch and the man-page is
> generated at build time, and there are different options depending on
> arch.

We need better lint / CI handling for this. But that's a packager's error.
Those happen and we cannot automate all of those checks. Of course it's
good if we catch more of these. But there are many other ways to mess
up the package too.

> What about packages that has both arch dependeant and noarch
> dependant 

But yes, there could be some sync issues. Especially if the 'noarch' is
a dependency of the next package to be built.

> Also if we store the noarch in different directory in locally built
> repo, we get the problem of detecting if package is built or not if
> the aport generates both arch and noarch subpackages. With unique id
> (as mentioned above) we now need to consult 2 different indexes to
> know if a package needs to be rebuilt or not.

Correct. We'd use two indexes.

> IMHO, separating noarch creates some complicated problems so I
> question if the value it brings outweigths the cost.

Hmm... I think the biggest complication is the packages that have
noarch subpkgs. One option is to deprecate that: build only the pure
noarch packages (no subpackage is arch dependent) as noarch, and build
everything else target specific. This would greatly simplify the
problem I think.
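
Stated as code, the simplified rule would be something like the sketch
below. The function name and the list-of-arch-strings input are
illustrative assumptions.

```python
def repo_arch(subpackage_arches, target_arch):
    """Pure noarch aports go to 'noarch'; anything mixed is built per target."""
    if not subpackage_arches:
        raise ValueError("aport has no (sub)packages")
    if all(arch == "noarch" for arch in subpackage_arches):
        return "noarch"
    return target_arch


print(repo_arch(["noarch", "noarch"], "aarch64"))  # noarch
print(repo_arch(["all", "noarch"], "aarch64"))     # aarch64
```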

> I think it makes sense to first have 100% reproducibility built noarch
> packages, and a new coordinated build infra structure in place before
> we finally fix the 'noarch' handling. The current build infra is too
> stupid and simple.

Agreed. Doing noarch requires infra updates too.

> 
> > 4. version handling
> > 
> > Sort of unrelated, but something I'd like to also bring up once
> > again. Since now that if we do proper distribution / branch
> > tracking. And the package downgrades happen at times. I'm wondering
> > if we should make the package version "informative" only. And use
> > the build_time to decide which package is the "preferred version".
> > In most cases it is the latest built package from the repository we
> > want to be using.
> > 
> > Alternative would be introduce some sort of concept similar to
> > debian/pacman package "epoch".
> > 
> > Though another way to look at it that the buildtime is the
> > automatically generated epoch number. :)  
> 
> I don't think that buildtime (or even git commit time) is a good
> source for deciding what version is to be preferred. I am afraid that
> it will make us think in what order we build packages.
> 
> For example, lets say we have package foo-4.5 in community repo and
> the next gen foo-5.0 in testing, and user has both community and
> testing repos enabled. Now there is a security issue so foo-5.1 and
> foo-4.6 is released. Which version the user ends up with now depends
> of which order those are fixed. If developer push testing/foo-5.1
> first then will user end up with the community/foo-4.6 due to it has
> a newer build time stamp.
> 
> I sort of like Ariadne's idea of repository weighting though.

Right.

So one thing to consider is to deprecate tagged repository support, and
add repository weighting to decide which comes first?

Or do we need to introduce some new knob to do that?
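
For discussion, a sketch of what weighting could mean for the foo
example. The details of the weighting idea are not spelled out in this
thread, so the weights, names and the "highest weight wins" rule below
are purely hypothetical.

```python
# Hypothetical weights: a higher number means the repository is preferred.
REPO_WEIGHT = {"main": 300, "community": 200, "testing": 100}


def prefer_by_weight(candidates, weights=REPO_WEIGHT):
    """Pick the candidate from the highest-weighted repository."""
    return max(candidates, key=lambda p: weights[p["repo"]])


candidates = [
    {"repo": "testing", "name": "foo", "version": "5.1"},
    {"repo": "community", "name": "foo", "version": "4.6"},
]
chosen = prefer_by_weight(candidates)
print(chosen["repo"], chosen["version"])  # community 4.6
```

Unlike the build-time rule, the result here does not depend on which of
the two packages was built or pushed first.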

Timo

repo pinning, whether to include repository name in pkg [was Re: new package format and repository layout changes]

Timo Teras <timo.teras@iki.fi>
Details
Message ID
<20200117093110.13bfdc9f@vostro.lan>
In-Reply-To
<20200117032933.421dfe8b@vostro> (view parent)
On Fri, 17 Jan 2020 03:29:33 +0200
Timo Teras <timo.teras@iki.fi> wrote:

> So the problem in general is that if you have edge's main + community
> untagged, and @testing tagged.
> 
> Then the following happens:
>  1. install package from @testing
>  2. that package gets updated in our repo, and the old one get's
> removed 3. the index is updated
> 
> After this we have system with package from @testing. To be able to
> recalculate the system consistency, we need to be able to verify that
> the currently installed package came from @testing. If this is not
> possible apk would currently try to enforce upgrading to the updated
> version, even if something else was just added (the updated tagged
> package would be preferred).
> 
> This becomes even more important on run-from-ram system. When on
> reboot recalculating what can be installed. On above instance the
> package would no be installed if the tag cannot be validated.
> 
> Currently, the installed database contains the repository tag as-is
> where it was installed. This is inserted by apk on install time. For
> run-from-ram systems this is also stored in the cache repository
> index. 
> 
> But the above system is fragile. And can fail in certain scenarios
> too.
> 
> The point is that having the @tag syntax in repositories that is
> locally configurable is problematic. My idea was to perhaps keep the
> @tag syntax but bind it to something that comes from the index and is
> present in the packages.

I've been thinking about this further.

The original reason this was really, really needed was so that
run-from-ram setups can properly boot from cache when the package is no
longer in any index.

Another option to solve this would be the following scheme:
 - no repository information in the package
 - no storing of the tag info to installed-db nor cache
 - always resolve repo tags at run time
 - if a package is installed, assume it has been in a repository with
   the proper tag, and thus ignore any tag checking for it.
   (caveat: if someone modifies 'world' manually and adds the tag, that
   is not picked up properly)
 - for run-from-ram systems, store a separate list in the overlay
   that contains the exact set of packages the system is running.

   caveat: after an upgrade, one would need to do lbu commit to store
   the new system setup. Though, this might in fact be the preferred
   behavior, since it would also fix a known issue where doing "apk
   update" but not "apk upgrade" and then rebooting might break the
   boot.
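
The tag-handling part of this scheme could be sketched as follows. This
is a toy model and not the solver: the data shapes and the function name
are assumptions, and versions are compared as plain strings.

```python
def is_allowed(name, version, repo_tag, world_pins, installed):
    """Decide whether a candidate may be selected under the scheme above.

    repo_tag is the tag of the repository offering the candidate, or
    None for an untagged repository.
    """
    if installed.get(name) == version:
        # Already installed: assume the tag was valid at install time.
        return True
    if repo_tag is None:
        return True
    # Packages from a tagged repository need a matching pin in 'world'.
    return world_pins.get(name) == repo_tag


world_pins = {"foo": "testing"}   # as if world contained foo@testing
installed = {"bar": "1.0-r0"}     # bar's repository has since disappeared

print(is_allowed("foo", "5.1-r0", "testing", world_pins, installed))  # True
print(is_allowed("bar", "1.0-r0", "testing", world_pins, installed))  # True
print(is_allowed("baz", "2.0-r0", "testing", world_pins, installed))  # False
```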

One more kludgy thing is the virtual packages. Those probably need to
be treated similarly to 'world', e.g. as an etc/apk/vpkg config file or
similar.

Having said all this, I am still somewhat concerned and thinking that
putting the repository name into the package might be a useful thing.
But perhaps it should be the originally-built-from-repository and not
the index name.

Do any of you share my concerns that the repo name should be signed?

Timo

Re: repo pinning, whether to include repository name in pkg [was Re: new package format and repository layout changes]

Details
Message ID
<BZY4HPQ7P5IO.32ODJBJGIGRTN@homura>
In-Reply-To
<20200117093110.13bfdc9f@vostro.lan> (view parent)
On Fri Jan 17, 2020 at 9:31 AM, Timo Teras wrote:
> Having said all this. I am still somewhat concerned and thinking that
> putting repository name to the package might be useful thing. But
> perhaps in should be the originally-built-from-repository and not the
> index name.
>
> Does any of you share my concerns that the repo name should be signed?

Still NACK on signing the repo name. Signed data should be autonomous of
its original source, so long as it's signed it doesn't matter how it got
to you.

The package should be tagged in world, so if that tag is unavailable
perhaps we can just print a warning on apk operations listing packages
which are tagged for nonexistent repos.

I'm also wondering if it would be wise for us to write a solver spec
before doing many more changes to it.

Re: repo pinning, whether to include repository name in pkg [was Re: new package format and repository layout changes]

Timo Teras <timo.teras@iki.fi>
Details
Message ID
<20200118001927.3492f70d@vostro.lan>
In-Reply-To
<BZY4HPQ7P5IO.32ODJBJGIGRTN@homura> (view parent)
On Fri, 17 Jan 2020 09:06:38 -0500
"Drew DeVault" <sir@cmpwn.com> wrote:

> On Fri Jan 17, 2020 at 9:31 AM, Timo Teras wrote:
> > Having said all this. I am still somewhat concerned and thinking
> > that putting repository name to the package might be useful thing.
> > But perhaps in should be the originally-built-from-repository and
> > not the index name.
> >
> > Does any of you share my concerns that the repo name should be
> > signed?  
> 
> Still NACK on signing the repo name. Signed data should be autonomous
> of its original source, so long as it's signed it doesn't matter how
> it got to you.

Would you be able to give some reasoning, arguments or use-cases why
you think this is the correct approach?

Thanks,
Timo

Re: repo pinning, whether to include repository name in pkg [was Re: new package format and repository layout changes]

Details
Message ID
<BZYFLKB40RWB.2OYQP2D818H8G@homura>
In-Reply-To
<20200118001927.3492f70d@vostro.lan> (view parent)
On Sat Jan 18, 2020 at 12:19 AM, Timo Teras wrote:
> > Still NACK on signing the repo name. Signed data should be autonomous
> > of its original source, so long as it's signed it doesn't matter how
> > it got to you.
>
> Would you be able to give some reasoning, arguments or use-cases why
> you think this is the correct approach?

The whole point of cryptographic signing is to be able to move packages
over an untrusted medium without ill effect. Should we also sign the
mirror URL? I don't think so. What if someone wants to stand up a new
mirror, do they really need us to intervene and agree to set up a key
for them?

Consider for example that I run Alpine CI on builds.sr.ht. What if I
want to cache downloaded packages on the LAN for faster builds by adding
a "magic" repo?

These kinds of use-cases ought to be supported. If the package contents
are signed by a trusted key, it's legit. Doesn't matter where it came
from.

Re: repo pinning, whether to include repository name in pkg [was Re: new package format and repository layout changes]

Timo Teras <timo.teras@iki.fi>
Details
Message ID
<20200118011336.1b420be4@vostro.lan>
In-Reply-To
<BZYFLKB40RWB.2OYQP2D818H8G@homura> (view parent)
On Fri, 17 Jan 2020 17:48:52 -0500
"Drew DeVault" <sir@cmpwn.com> wrote:

> On Sat Jan 18, 2020 at 12:19 AM, Timo Teras wrote:
> > > Still NACK on signing the repo name. Signed data should be
> > > autonomous of its original source, so long as it's signed it
> > > doesn't matter how it got to you.  
> >
> > Would you be able to give some reasoning, arguments or use-cases why
> > you think this is the correct approach?  
> 
> The whole point of cryptographic signing is to be able to move
> packages over an untrusted medium without ill effect. Should we also
> sign the mirror URL? I don't think so. What if someone wants to stand
> up a new mirror, do they really need us to intervene and agree to set
> up a key for them?
> 
> Consider for example that I run Alpine CI on builds.sr.ht. What if I
> want to cache downloaded packages on the LAN for faster builds by
> adding a "magic" repo?
> 
> These kinds of use-cases ought to be supported. If the package
> contents are signed by a trusted key, it's legit. Doesn't matter
> where it came from.

The above is not restricted in my suggestion. The signature is *not*
over the URL.

What I proposed putting in it, is the distro name and repository
portions only. E.g. the string "alpine/edge/community" or similar.

Doing caches and mirrors would still be supported as expected.

Are there any more detailed requirements you might have?

And do you think it should be possible to clone alpine/edge/community
and rename it to "mynewlinux/x.y-stable/main" without re-signing?

Timo

Re: repo pinning, whether to include repository name in pkg [was Re: new package format and repository layout changes]

Details
Message ID
<185d5d5ee06c85855c43c3386bae7e90@dereferenced.org>
In-Reply-To
<20200118001927.3492f70d@vostro.lan> (view parent)
Hello,

January 17, 2020 4:19 PM, "Timo Teras" <timo.teras@iki.fi> wrote:

> On Fri, 17 Jan 2020 09:06:38 -0500
> "Drew DeVault" <sir@cmpwn.com> wrote:
> 
>> On Fri Jan 17, 2020 at 9:31 AM, Timo Teras wrote:
>> Having said all this. I am still somewhat concerned and thinking
>> that putting repository name to the package might be useful thing.
>> But perhaps in should be the originally-built-from-repository and
>> not the index name.
>> 
>> Does any of you share my concerns that the repo name should be
>> signed?
>> 
>> Still NACK on signing the repo name. Signed data should be autonomous
>> of its original source, so long as it's signed it doesn't matter how
>> it got to you.
> 
> Would you be able to give some reasoning, arguments or use-cases why
> you think this is the correct approach?

Downstream of Alpine, we use apk fetch to compose repositories for
customers which contain the exact set of packages we provide support
for.  These package sets are not aligned with the repository split
that upstream Alpine uses.  It would be desirable to retain this
functionality without having to re-sign the packages.

While breaking this functionality would only require some minor
rework of our scripts (to re-sign the packages), it also breaks the
ability to audit the supply chain: our customer cannot verify that
their package has actually originated from Alpine if we re-sign it
at present.  Accordingly, in any case where we have to rewrite the
control section of the package, it would be desirable to be able to
include a signed copy of the previous control section, to ensure that
the supply chain audit-ability requirement is preserved.

Being able to compose new repositories from existing ones *and*
preserve the original signatures is, unfortunately, for various
reasons, a hard requirement for us.

Thanks,
Ariadne

Re: repo pinning, whether to include repository name in pkg [was Re: new package format and repository layout changes]

Details
Message ID
<BZYG4U9UHDDI.Q01LV0A35QJY@homura>
In-Reply-To
<20200118011336.1b420be4@vostro.lan> (view parent)
On Sat Jan 18, 2020 at 1:13 AM, Timo Teras wrote:
> Are there any more detailed requirements you might have?

No, at this point I'm mostly objecting on philosophical grounds and a
suspicion that this is going to restrict us more than it helps us in the
future.

> And do you think it should be possible to clone alpine/edge/community
> and rename it to "mynewlinux/x.y-stable/main" without resigning?

Sure, why not? What if I want to construct a new repo out of packages
cherry-picked from elsewhere? What if someone wants to support an
alternative release model for Alpine?

Re: repo pinning, whether to include repository name in pkg [was Re: new package format and repository layout changes]

Timo Teras <timo.teras@iki.fi>
Details
Message ID
<20200119130141.33011217@vostro>
In-Reply-To
<185d5d5ee06c85855c43c3386bae7e90@dereferenced.org> (view parent)
On Fri, 17 Jan 2020 23:15:08 +0000
"Ariadne Conill" <ariadne@dereferenced.org> wrote:

> January 17, 2020 4:19 PM, "Timo Teras" <timo.teras@iki.fi> wrote:
> 
> > On Fri, 17 Jan 2020 09:06:38 -0500
> > "Drew DeVault" <sir@cmpwn.com> wrote:
> >   
> >> On Fri Jan 17, 2020 at 9:31 AM, Timo Teras wrote:
> >> > Having said all this. I am still somewhat concerned and thinking
> >> > that putting repository name to the package might be a useful thing.
> >> > But perhaps it should be the originally-built-from-repository and
> >> > not the index name.
> >> >
> >> > Does any of you share my concerns that the repo name should be
> >> > signed?
> >>
> >> Still NACK on signing the repo name. Signed data should be
> >> autonomous of its original source, so long as it's signed it
> >> doesn't matter how it got to you.
> > 
> > Would you be able to give some reasoning, arguments or use-cases why
> > you think this is the correct approach?  
> 
> Downstream of Alpine, we use apk fetch to compose repositories for
> customers which contain the exact set of packages we provide support
> for.  These package sets are not aligned with the repository split
> that upstream Alpine uses.  It would be desirable to retain this
> functionality without having to resign the packages.
> 
> While breaking this functionality would only require some minor
> rework of our scripts (to resign the packages), it also breaks the
> ability to audit the supply chain: our customer cannot verify that
> their package has actually originated from Alpine if we resign it
> at present.  Accordingly, it would be desirable in any case that
> we have to rewrite the control section of the package to be able to
> include a signed copy of the previous control section, to ensure that
> the supply chain audit-ability requirement is preserved.
> 
> Being able to compose new repositories from existing ones *and*
> preserve the original signatures is, unfortunately, for various
> reasons, a hard requirement for us.

Thanks for this. I suppose this makes sense.

My next question, then, is what kind of signing policy you are using
or would like to have.

Obviously there are both the Alpine key and your key in place.

So, what would be the desired policy:

a) sign packages with both keys, check only your key? (the Alpine
signature is more for sideband verification)

b) sign packages with both keys, have apk check both signatures?

c) sign packages with the Alpine key, index with your key; have clients
trust both?

d) sign index with your key; include a trust-delegation signature for
the Alpine key in the index, so that packages are signed with the Alpine
key only and the client has only your key?

e) something else? what?
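
The difference between options (a) to (d) can be sketched as a set of
acceptance rules. This is only a rough model under stated assumptions:
the key names "alpine" and "vendor", the `Package` and `Index` types and
the `delegates_to` field are hypothetical, and signature checking is
reduced to set membership with no actual cryptography.

```python
from dataclasses import dataclass

ALPINE, VENDOR = "alpine", "vendor"  # hypothetical key names
BOTH = frozenset({ALPINE, VENDOR})


@dataclass(frozen=True)
class Package:
    name: str
    signed_by: frozenset  # keys with a valid signature on the package


@dataclass(frozen=True)
class Index:
    signed_by: frozenset  # keys that signed the index
    delegates_to: frozenset = frozenset()  # keys the index vouches for (d)


def accepts(policy: str, pkg: Package, index: Index,
            client_keys: frozenset) -> bool:
    """Return True if a client holding client_keys would accept pkg."""
    if policy == "a":  # dual-signed package, client checks vendor key only
        return VENDOR in client_keys and VENDOR in pkg.signed_by
    if policy == "b":  # dual-signed package, client checks both signatures
        return BOTH <= client_keys and BOTH <= pkg.signed_by
    if policy == "c":  # Alpine-signed package, vendor-signed index
        return (BOTH <= client_keys and ALPINE in pkg.signed_by
                and VENDOR in index.signed_by)
    if policy == "d":  # vendor-signed index delegates trust to Alpine key
        return (VENDOR in client_keys and VENDOR in index.signed_by
                and ALPINE in index.delegates_to
                and ALPINE in pkg.signed_by)
    raise ValueError(f"unknown policy: {policy}")


# An untouched upstream package in a vendor-composed repository.
upstream_pkg = Package("busybox", frozenset({ALPINE}))
vendor_index = Index(signed_by=frozenset({VENDOR}),
                     delegates_to=frozenset({ALPINE}))

results = {p: accepts(p, upstream_pkg, vendor_index, BOTH) for p in "abcd"}
vendor_only_client = accepts("d", upstream_pkg, vendor_index,
                             frozenset({VENDOR}))
print(results, vendor_only_client)
```

In this model an unmodified upstream package is accepted only under (c)
and (d); (a) and (b) would need your signature added to every package.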

Cheers,
Timo

Re: repo pinning, whether to include repository name in pkg [was Re: new package format and repository layout changes]

Message ID
<25c9d7abc6500086acc511bf121b76db@dereferenced.org>
In-Reply-To
<20200119130141.33011217@vostro>
Hello,

January 19, 2020 5:01 AM, "Timo Teras" <timo.teras@iki.fi> wrote:

> On Fri, 17 Jan 2020 23:15:08 +0000
> "Ariadne Conill" <ariadne@dereferenced.org> wrote:
> 
> > January 17, 2020 4:19 PM, "Timo Teras" <timo.teras@iki.fi> wrote:
> >
> > > On Fri, 17 Jan 2020 09:06:38 -0500
> > > "Drew DeVault" <sir@cmpwn.com> wrote:
> > >
> > >> On Fri Jan 17, 2020 at 9:31 AM, Timo Teras wrote:
> > >> > Having said all this. I am still somewhat concerned and thinking
> > >> > that putting repository name to the package might be a useful thing.
> > >> > But perhaps it should be the originally-built-from-repository and
> > >> > not the index name.
> > >> >
> > >> > Does any of you share my concerns that the repo name should be
> > >> > signed?
> > >>
> > >> Still NACK on signing the repo name. Signed data should be
> > >> autonomous of its original source, so long as it's signed it
> > >> doesn't matter how it got to you.
> > >
> > > Would you be able to give some reasoning, arguments or use-cases why
> > > you think this is the correct approach?
> >
> > Downstream of Alpine, we use apk fetch to compose repositories for
> > customers which contain the exact set of packages we provide support
> > for. These package sets are not aligned with the repository split
> > that upstream Alpine uses. It would be desirable to retain this
> > functionality without having to resign the packages.
> >
> > While breaking this functionality would only require some minor
> > rework of our scripts (to resign the packages), it also breaks the
> > ability to audit the supply chain: our customer cannot verify that
> > their package has actually originated from Alpine if we resign it
> > at present. Accordingly, it would be desirable in any case that
> > we have to rewrite the control section of the package to be able to
> > include a signed copy of the previous control section, to ensure that
> > the supply chain audit-ability requirement is preserved.
> >
> > Being able to compose new repositories from existing ones *and*
> > preserve the original signatures is, unfortunately, for various
> > reasons, a hard requirement for us.
> 
> Thanks for this. I suppose this makes sense.
> 
> My next question, then, is what kind of signing policy you are using
> or would like to have.
> 
> Obviously there are both the Alpine key and your key in place.
> 
> So, what would be the desired policy:
> 
> a) sign packages with both keys, check only your key? (the Alpine
> signature is more for sideband verification)
> 
> b) sign packages with both keys, have apk check both signatures?
> 
> c) sign packages with the Alpine key, index with your key; have clients
> trust both?
> 
> d) sign index with your key; include a trust-delegation signature for
> the Alpine key in the index, so that packages are signed with the Alpine
> key only and the client has only your key?
> 
> e) something else? what?

The ideal outcome would be (c).  I don't wish to repack packages
unless absolutely necessary.  Being able to sign the Alpine key
with our own does sound nice, but we presently work around
this by including our keys and the Alpine keys in /etc/apk/keys,
so it is not really that important.

Ariadne