For discussion of Alpine Linux development and developer support

[alpine-devel] Report from Reproducible builds summit 2018

Natanael Copa
Details
Message ID
<20181217133328.4dd1ef26@ncopa-desktop.copa.dup.pw>
Sender timestamp
1545050008
Hi,

I attended the Reproducible Builds[1] summit in Paris last week and
wanted to give a short report on what I learned there and share some
thoughts on reproducible builds for Alpine.

I went to the summit because I think we should make it a long-term goal
to make Alpine reproducibly built, and I wanted to learn from people
with experience what to expect and to make a plan for how Alpine can
get there.

The summit in Paris was nicely organized with zero PowerPoint
presentations. Instead, we were divided into smaller groups and had a
number of group discussions and work sessions, where everyone was
encouraged to participate.

The notes from the sessions are here:
https://pad.riseup.net/p/reproduciblebuilds4-agenda

I tried to get discussions around bootstrapping rust, and how to deal
with golang packaging, but people didn't seem to be too interested in
that.

Some takeaway points for Alpine:

* We need a way to make older packages available, so that it is
  possible to rebuild the exact same install (or Docker image) later.
  Different distros solve this in different ways. I was told Fedora
  has some archive where they save all older packages. I was told
  Debian uses some sort of (filesystem?) snapshot archive. I have a
  couple of ideas for how we could provide this.

* In order to make Alpine reproducibly built, it would be good to have
  a third party rebuild all of our packages and compare them with the
  official packages. kpcyrd from Arch Linux worked on adding Alpine to
  https://tests.reproducible-builds.org and promised to follow up on
  that.

* There are various tools that can compare different binaries to figure
  out what differs and why. I started to work on packaging diffoscope
  for Alpine, but bumped into various failures in the test suite. One
  was a bug in libmagic from file(1), and this is now fixed. There were
  two other failures, and with some help from the diffoscope developers
  they are also fixed now.

* The work done by SUSE shows that most packages will likely not need
  any patching. I was given a number: roughly 500 of 10000 packages
  needed patching for SUSE. Bernhard from SUSE has also documented
  various common issues[2] (with suggested fixes). He also has a
  tool[3] to monitor package versions from different distros, similar
  to release-monitoring.org. Alpine has been added.

I think we should try to focus on the v3.9 release now. Once v3.9 is
out I would like to discuss how we can make Alpine reproducibly built.
Just mentioning some points before I forget:

* We may need to store the exact versions and/or hashes of the
  dependencies used when a package was built. I am not sure where we
  want to store this. Maybe in the APKINDEX?

* We embed the signature in the .apk, which means it's not possible to
  re-create the exact same .apk without having access to the private
  key. I'm not sure how to deal with that.

* I learned about this thing called IPFS[4], which may be worth taking
  a closer look at.

Now, let's get v3.9 out...

-nc

[1]: https://reproducible-builds.org/events/paris2018/
[2]: https://github.com/bmwiedemann/theunreproduciblepackage
[3]: https://maintainer.zq1.de/
[4]: https://ipfs.io


---
Unsubscribe:  alpine-devel+unsubscribe@lists.alpinelinux.org
Help:         alpine-devel+help@lists.alpinelinux.org
---
Chloe Kudryavtsev
Details
Message ID
<1a664e98-3f41-5503-60af-98865c0b785f@toastin.space>
In-Reply-To
<20181217133328.4dd1ef26@ncopa-desktop.copa.dup.pw> (view parent)
Sender timestamp
1545106061
On 12/17/18 7:33 AM, Natanael Copa wrote:
> * we may need to store the exact versions and/or hashes of the
>    dependencies used when a package was built. I am not sure where we
>    want store this. Maybe in the APKINDEX?

I think this is a good idea. Mostly a note in regards to the next comment.

> * we embed the signature in the .apk, which means its not possible to
>    re-create the exact same .apk without having access to the private
>    key. I'm not sure how to deal with that.

I do not believe we need to allow for that.
Since we want to store exact versions/hashes of dependencies in the
.apk, I believe we can also store a hash of the resulting tree,
pre-signature (meaning we sign the hash as well).
This hash should be visible using apk(1), to allow people to
programmatically verify that two .apks are the same internally. Signing
it also guarantees the integrity of the hash on mirrors.


---
Oliver Smith
Details
Message ID
<b8bbde6c-90b0-4c96-59e5-8475ae655bb7@bitmessage.ch>
In-Reply-To
<20181217133328.4dd1ef26@ncopa-desktop.copa.dup.pw> (view parent)
Sender timestamp
1545121620
Hello Natanael and ML,

I'm glad to read about this, thank you for this writeup!

I looked into reproducible builds myself last year and even had a proof
of concept with a few packages. The tooling can't be re-used, as it was
based on pmbootstrap from postmarketOS, not Alpine's abuild directly.
But maybe I can help with some insights or contribute otherwise.

Natanael Copa:
> * we may need to store the exact versions and/or hashes of the
>   dependencies used when a package was built. I am not sure where we
>   want store this. Maybe in the APKINDEX?

I had created a .buildinfo.json file, where I placed all dependencies
that were installed at build time, with their versions. That file
was placed next to the main apk (so no extra buildinfo file for
subpackages) in the binary repository directory. Storing the hashes
would be even better. I chose JSON, as it's trivial to parse with
Python, but since Alpine's build tools are lightweight and do not depend
on Python, using another format probably makes more sense. The idea for
this file was based on Debian's buildinfo file, which is described here:

https://wiki.debian.org/ReproducibleBuilds/BuildinfoFiles

The APKINDEX is generated from the apk files, so we would need to have
the information elsewhere already, right?
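
To make the buildinfo idea concrete, here is a minimal sketch of such a
file, extended with the hashes mentioned above. The field names
("origin", "build_depends", "sha256") and both function names are
hypothetical; they are not taken from pmbootstrap, abuild or Debian's
format. It assumes the caller already has the bytes of every installed
dependency's .apk.

```python
import hashlib
import json


def make_buildinfo(origin, version, installed):
    """Return a buildinfo dict for one package origin.

    `installed` maps each dependency name to a tuple of
    (version, raw bytes of the .apk it was installed from).
    All field names below are illustrative, not an existing format.
    """
    deps = [
        {
            "name": name,
            "version": dep_version,
            "sha256": hashlib.sha256(apk_bytes).hexdigest(),
        }
        for name, (dep_version, apk_bytes) in sorted(installed.items())
    ]
    return {"origin": origin, "version": version, "build_depends": deps}


def dump_buildinfo(info):
    # sort_keys and a fixed indent keep the serialized file itself stable,
    # so two builders recording the same environment emit identical bytes.
    return json.dumps(info, sort_keys=True, indent=2) + "\n"
```

Sorting the dependencies and the keys is the main design choice here:
the buildinfo file should not itself become a source of differences
between two otherwise identical builds.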

> * we embed the signature in the .apk, which means its not possible to
>   re-create the exact same .apk without having access to the private
>   key. I'm not sure how to deal with that.

My cheap workaround for that was: just make all files inside the .apk
file reproducible, not the apk itself. It would be better to have the
entire apk reproducible of course, but to do that, we would need to
store the signature elsewhere (e.g. create a .sig file for each .apk).

Having an extra signature file might also make it easier to allow
multiple entities to sign an apk, e.g. after an independent rebuild.
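
The "compare the files inside, not the .apk itself" workaround could
look roughly like the sketch below. It is an illustration only, not the
pmbootstrap code: it assumes both inputs are ordinary (optionally
gzipped) tar archives already extracted from the packages, and it
compares file contents only, ignoring modes, owners and timestamps. The
function names are made up for this example.

```python
import hashlib
import io
import tarfile


def content_digests(tar_bytes):
    """Map each regular file in a (possibly gzipped) tar to its SHA-256."""
    digests = {}
    with tarfile.open(fileobj=io.BytesIO(tar_bytes), mode="r:*") as tar:
        for member in tar:
            if member.isfile():
                data = tar.extractfile(member).read()
                digests[member.name] = hashlib.sha256(data).hexdigest()
    return digests


def differing_files(tar_a, tar_b):
    """Return sorted names whose content differs or exists on one side only."""
    a, b = content_digests(tar_a), content_digests(tar_b)
    return sorted(n for n in a.keys() | b.keys() if a.get(n) != b.get(n))
```

An empty result means the two archives carry the same file contents,
even if the archives themselves are not byte-identical.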

Regards,
Oliver



---
Max Rees
Details
Message ID
<20181230225258.GB9101@sachiel>
In-Reply-To
<1a664e98-3f41-5503-60af-98865c0b785f@toastin.space> (view parent)
Sender timestamp
1546210379
On Dec 17 11:07 PM, Chloe Kudryavtsev wrote:
> On 12/17/18 7:33 AM, Natanael Copa wrote:
> > * we may need to store the exact versions and/or hashes of the
> >    dependencies used when a package was built. I am not sure where we
> >    want store this. Maybe in the APKINDEX?
> 
> I think this is a good idea. Mostly a note in regards to the next comment.
> 
> > * we embed the signature in the .apk, which means its not possible to
> >    re-create the exact same .apk without having access to the private
> >    key. I'm not sure how to deal with that.
> 
> I do not believe we need to allow for that.
> Since we want to store exact versions/hashes of dependencies in the .apk, I
> believe we can also store a hash of the resulting tree, pre-signature
> (meaning we sign the hash as well).
> This hash should be visible using apk(1), to allow people to
> programmatically verify that two .apks are the same internally, and
> guarantees the integrity of the hash in mirrors.

[apologies to Chloe - I forgot to list-reply on the first draft of this
message]

The "datahash" field of the .PKGINFO file should be able to serve this
purpose - it's the SHA256 checksum of the data.tar.gz file (i.e. the
actual tree contents), and since it's located in control.tar.gz it's
signed as part of the existing .apk file creation process. I agree that
apk(1) or perhaps a standalone utility should make it easier to
get the datahash of an .apk file.

As long as data.tar.gz is created reproducibly, then the datahash should
end up being the same.
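
As a rough sketch of the standalone utility idea: if we assume an .apk
is a plain concatenation of gzip streams with data.tar.gz as the last
one (signature and control streams before it), the datahash can be
recomputed without apk(1). The function names are hypothetical, and the
whole file is read into memory for simplicity.

```python
import hashlib
import zlib


def split_gzip_streams(blob):
    """Split concatenated gzip members into a list of raw compressed chunks."""
    streams = []
    while blob:
        decomp = zlib.decompressobj(31)  # wbits=31: expect a gzip header
        decomp.decompress(blob)
        if not decomp.eof:
            raise ValueError("truncated gzip stream")
        # Whatever the decompressor did not consume belongs to later streams.
        consumed = len(blob) - len(decomp.unused_data)
        streams.append(blob[:consumed])
        blob = blob[consumed:]
    return streams


def datahash(apk_bytes):
    """SHA-256 of the last gzip stream, assumed to be data.tar.gz."""
    return hashlib.sha256(split_gzip_streams(apk_bytes)[-1]).hexdigest()
```

Because only the last stream is hashed, the result does not depend on
the signature, so a rebuild signed with a different key can still be
compared against the official package.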

Max


---
Max Rees
Details
Message ID
<20181231031754.GC9101@sachiel>
In-Reply-To
<b8bbde6c-90b0-4c96-59e5-8475ae655bb7@bitmessage.ch> (view parent)
Sender timestamp
1546226276
On Dec 18 08:27 AM, Oliver Smith wrote:
> Hello Natanael and ML,
> 
> I'm glad to read about this, thank you for this writeup!
> 
> I've looked into reproducible builds myself last year, even had a proof
> of concept with a few packages. The tooling can't be re-used, as it was
> based on pmbootstrap from postmarketOS, not Alpine's abuild directly.
> But maybe I can help with some insights or contribute otherwise.
> 
> Natanael Copa:
> > * we may need to store the exact versions and/or hashes of the
> >   dependencies used when a package was built. I am not sure where we
> >   want store this. Maybe in the APKINDEX?
> 
> I had created a .buildinfo.json file, where I placed all dependencies
> that were installed at the build time, with their versions. That file
> was placed next to the main apk (so no extra buildinfo file for
> subpackages) in the binary repository directory. Storing the hashes
> would be even better. I chose JSON, as it's trivial to parse that with
> Python, but since Alpine's build tools are lightweight and do not depend
> on Python, using another format probably makes more sense. The idea for
> this file was based on Debian's buildinfo file, that is described here:
> 
> https://wiki.debian.org/ReproducibleBuilds/BuildinfoFiles

Debian buildinfo seems like a source of good ideas. However I think
that, as you stated, the format should remain simple in order to remain
compatible with abuild. I've been working on abuildd and I think my
current rudimentary strategy would be to dump the contents of
`apk info -v` or similar to a file (probably including the APKINDEX "C:"
field as well) for each package origin (thus not subpackages). This'll
be possible since abuildd is based on rootbld and thus only build-base +
the (recursive) dependencies of the package will be installed for each
build. This information will naturally be quite verbose and thus it
probably shouldn't be put in the APKINDEX - perhaps in a separate git
repo?
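
A possible shape for that dump, sketched in Python for illustration
only (abuildd itself is not shown here). It assumes each line of the
listing looks like `<name>-<pkgver>-r<pkgrel>` with no hyphen inside
pkgver, and it does not capture the "C:" checksum field. The function
names and the output layout are made up for this example.

```python
import re

# Assumed line shape: <name>-<pkgver>-r<pkgrel>, with no hyphen in pkgver.
_LINE = re.compile(r"^(?P<name>.+)-(?P<version>[^-]+-r\d+)$")


def parse_installed(listing):
    """Parse `apk info -v`-style text into a sorted list of (name, version)."""
    packages = []
    for line in listing.splitlines():
        line = line.strip()
        if not line:
            continue
        match = _LINE.match(line)
        if match is None:
            raise ValueError(f"unrecognised line: {line!r}")
        packages.append((match["name"], match["version"]))
    return sorted(packages)


def format_manifest(packages):
    """One 'name version' pair per line, in sorted order, for easy diffing."""
    return "".join(f"{name} {version}\n" for name, version in sorted(packages))
```

Keeping one sorted line per package would make such files diff cleanly
if they end up in a separate git repository.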

> > * we embed the signature in the .apk, which means its not possible to
> >   re-create the exact same .apk without having access to the private
> >   key. I'm not sure how to deal with that.
> 
> My cheap workaround for that was: just make all files inside the .apk
> file reproducible, not the apk itself. It would be better to have the
> entire apk reproducible of course, but to do that, we would need to
> store the signature elsewhere (e.g. create a .sig file for each .apk).

This is also the solution I arrived at - the data.tar.gz should be
reproducible exactly (same datahash), and then we can compare most of
the .PKGINFO modulo the builddate, datetime comment, and packager.
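
The .PKGINFO comparison could be sketched as follows. This assumes the
file consists of `key = value` lines plus `#` comments (the datetime
comment among them); the set of ignored fields comes from the list
above, and the function names are hypothetical.

```python
# Fields expected to differ between independent rebuilds.
VOLATILE_FIELDS = frozenset({"builddate", "packager"})


def parse_pkginfo(text):
    """Parse 'key = value' lines in file order, skipping '#' comments."""
    fields = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # the datetime comment is dropped here
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"not a key = value line: {line!r}")
        fields.append((key.strip(), value.strip()))
    return fields


def pkginfo_equivalent(a, b, ignore=VOLATILE_FIELDS):
    """True if two .PKGINFO texts match once the volatile fields are removed."""
    def keep(text):
        return [(k, v) for k, v in parse_pkginfo(text) if k not in ignore]
    return keep(a) == keep(b)
```

Field order is preserved on purpose, so a rebuild that emits the same
fields in a different order is still reported as a difference.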

Max

