Re: [alpine-devel] RFC: Fixing license field in APKBUILDs (or a bit more)

From: William Pitcock <nenolod_at_dereferenced.org>
Date: Wed, 31 Jan 2018 12:26:26 -0600

Hello,

On Mon, Jan 29, 2018 at 4:23 PM, Przemysław Pawełczyk <przemoc_at_zoho.com> wrote:
> Preface
> -------
>
> It is kind of a follow-up to the previous thread started a month ago:
>
> License naming in APKBUILD - SPDX License List
>
> Please check it if you haven't already.
>
>
> Intro
> -----
>
> Conversion from simplistic and imprecise license naming that was used
> before in Alpine Linux (e.g. GPL, GPL2, BSD, etc.) to slightly more
> verbose but also more precise and standardized license naming will
> undoubtedly make quality of Alpine Linux packages higher.
>
> SPDX license identifiers are already getting adoption in many
> open-source circles. I believe that Alpine Linux did a good thing by
> deciding to use SPDX over half a year ago. Unluckily, or maybe luckily,
> the conversion didn't truly follow through back then. There were some changes
> here and there, but nothing of greater scale to really nail all existing
> packages. I wrote "luckily", because at the end of 2017 SPDX License
> List got new version 3, which has some changes compared to version 2.x.
>
> I believe, as I already wrote in previous thread, that we should stick
> to this new version, and most likely to its updates too, when they are
> ready, as I doubt they will be disruptive.
>
> One unfortunate thing about sticking with version 3 of the list is that
> one of the distros reusing Alpine Linux as its base, Adelie Linux, is
> apparently fixed on an older version of the SPDX License List, so changes
> already done and upcoming may not be entirely welcomed by them,
> but I hope we'll be able to resolve all problems eventually and the Alpine
> Linux and Adelie Linux relationship will remain good and healthy.

Adelie strongly prefers to use SPDX 2.

We have already done some amount of license audit (e.g. for the subset
of Alpine packages we ship), which has been using SPDX 2 identifiers.
If we switch to SPDX 3 identifiers, we will have to start over, as
they will need to be reverified.
In addition, all packages that we are planning to upstream (KDE)
presently use the SPDX 2 identifiers.
We also have already done a lot of work to incorporate SPDX 2 into our
standard packaging procedures, and a few contributors complained that
SPDX 3 identifiers are "annoying" and "mental bandwidth wasting."

A possible compromise would be to allow either SPDX 2 or SPDX 3
identifiers, based on the maintainer's preference: SPDX 3 deprecates
but does not remove the SPDX 2 identifiers; in other words SPDX 3 is a
superset of SPDX 2. Put differently, any tool which works with SPDX 3
identifiers has to work with SPDX 2 identifiers as well.
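
To illustrate the compatibility point, here is a minimal Python sketch of
a normalizer that accepts either style. The mapping table is only a small
subset of the GNU identifiers that SPDX License List 3.0 deprecated, and
the name normalize_license_id is hypothetical; it is not part of abuild
or of any SPDX tool.

```python
# Hypothetical helper: map deprecated SPDX 2 style identifiers to their
# SPDX 3 replacements. The table is a small illustrative subset only.
DEPRECATED_TO_CURRENT = {
    "GPL-2.0": "GPL-2.0-only",
    "GPL-2.0+": "GPL-2.0-or-later",
    "GPL-3.0": "GPL-3.0-only",
    "GPL-3.0+": "GPL-3.0-or-later",
    "LGPL-2.1": "LGPL-2.1-only",
    "LGPL-2.1+": "LGPL-2.1-or-later",
}


def normalize_license_id(identifier):
    """Return the SPDX 3 form of an identifier, leaving others untouched."""
    return DEPRECATED_TO_CURRENT.get(identifier, identifier)


if __name__ == "__main__":
    for name in ("GPL-2.0+", "GPL-2.0-or-later", "MIT"):
        print(name, "->", normalize_license_id(name))
```

With something like this, a maintainer could keep writing either form
and the tooling would compare them on equal terms.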

> It will be great achievement if we'll manage to correctly define all
> licenses of available packages before releasing Alpine Linux 3.8.
> There are roughly 3 months for that. It's not much for 4000+ packages!
> It's most likely even not enough, but we won't know without trying!

I am alright with delaying the Alpine 3.8 release to get this done; it is
very important.
Overall, this change is also needed in Adelie in order to freeze, as
we need to be able to give our supported customers clear answers on
what their legal situation is.
We would like to contribute the results from our audit upstream, but
we have already been working on this using SPDX 2 identifiers.

>
>
> Present
> -------
>
> Some changes in license fields are already happening, but we need to
> pause for a moment and look how they're done right now, or at least how
> they were done so far.
>
> Roughly 2 kinds of activities happened in aports repository since
> 2017-12-30 regarding license field in various APKBUILDs:
> - individual changes,
> - massive changes.
>
> Massive change was only one, already mentioned in previous thread, and
> as Jakub stated in his commit 63f5e7d29565 himself, "no verification has
> been done if the specified license information is correct!"
> Therefore all packages being part of this massive change will need to be
> investigated anyway.
>
> Individual changes number in the dozens to date. They're hopefully
> correct. They seem like casual changes "I read that mail, so I'll fix
> this APKBUILD", and they're appreciated, but they're not good enough in
> the big picture. I'll explain it soon.
>
>
> Problems
> --------
>
> How these efforts could be improved and what needs to be changed to be
> able to do it properly, i.e. actually fix license fields and not only
> replace them from one group of letters to other groups of letters and
> pretend we're done?
>
> Let's mention the problems we're facing now.
>
> 0. Lack of organized work.
>
> 1. Lack of trackability.
>
> The sheer number of packages in Alpine Linux makes the casual change
> approach impractical. A corrected license field in one APKBUILD is
> indistinguishable from another one that hasn't been scrutinized yet,
> which is unacceptable.
>
> 2. Lack of verifiability.
>
> That may sound harsh, but I think that one pair of eyes per package is
> not enough. Why? Because providing wrong license information is
> worse than not providing it at all, therefore such information must
> be verified by others.
>
> 3. Lack of subpackage licenses.
>
> Well, they're theoretically possible already in APKBUILDs. You have
> to redefine license variable in subpackage function. It is very
> rarely done, though, and it's kind of understandable why it is like
> that considering inconvenience of redefining variables.
>
> Let me give you an example. Let's look at LZ4 library.
> Its README.md file states "LZ4 library is provided as open-source
> software using BSD 2-Clause license." So BSD-2-Clause, easy, right?
> Checking README file is not enough. LICENSE file gives better image,
> because you can read there "all other files [not in the lib dir] use
> a GPLv2 license, unless explicitly stated otherwise". But if you'll
> look into source code of test and CLI tools, you'll find that it's
> not GPL-2.0-only, as one could presume, but actually GPL-2.0-or-later
> (and I think this is the reason why SPDX decided to abandon GPL-2.0
> and GPL-2.0+ naming style, as the first one is too similar to casual
> GPLv2, which can mean both in practice).
> Test tools usually aren't shipped in packages, so that wouldn't be a
> problem, but CLI tools are shipped. So lz4 package should have
> GPL-2.0-or-later license only, while lz4-libs should have
> BSD-2-Clause license only.
>
> 4. Lack of non-space license separator.
>
> Space is not good enough, because complex licenses can contain space.
> Example: LGPL-2.1-only WITH Nokia-Qt-exception-1.1
>
> SPDX power doesn't come only from its wide license list, but from the
> fact that people behind it actually thought about it and came with
> license expressions, so not only exceptions can be expressed, like in
> the given example, but also dual-licensing, etc.
>
> So you may ask, why there is a need for some separator if there are
> these expressions? I'm not an expert in this field, but I believe
> there is a difference between multiple-licensed source file
> (depending on conjunctive or disjunctive character of licensing,
> you'll use AND or OR operators, e.g. Apache-2.0 AND MIT, GPL-2.0-only
> OR MIT) and having different licenses for different source files
> that are all part of one final product. If half of program's source
> code is licensed under MIT, and other half is licensed under
> Apache-2.0, in my opinion you shouldn't describe it as MIT AND
> Apache-2.0 or MIT OR Apache-2.0, as both descriptions are misleading.
> The only way I see to describe it would be: MIT<separator>Apache-2.0.
> The separator definitely feels like "and", but it's different than
> AND and I think it's better to preserve such distinction.
>
> 5. Support for non-SPDX licenses.
>
> SPDX License List, including license exceptions, is quite broad, but
> there may be still some custom licenses, that aren't widely used and
> therefore weren't recognized by SPDX so far, but are used in some of
> packages available in Alpine Linux. Putting license="custom" is not
> a solution. Leaving license field empty and introducing !spdx option
> (*) is also bad, because project may use mix of SPDX and non-SPDX
> licenses.
>
> (*) I'm assuming that in future there will be support in abuild for
> checking license field whether licenses mentioned in it conform
> to SPDX names; Carlo together with Natanael already did some work
> toward that, which is appreciated, but with this message I hope
> it becomes clear that PoC presented so far is not good enough and
> ultimately some dedicated library/tool may be needed to properly
> deal with that, because parsing in shell script may not
> necessarily be an easy and sane way.
>
> 6. Lack of reusability.
>
> This part may interest Alpine Linux community the least, but if there
> are efforts related to documenting open-source world, it's better if
> they're done in a manner that is easy to be reused by others.
> APKBUILD format may look nifty, being in fact busybox's ash script,
> but it gives not only nice possibilities (that can be abused), but
> also many limitations, like poor data types, lack of nested structs,
> etc.
>
>
> Solutions?
> ----------
>
> I was thinking for a considerable time about it and my ideas actually
> changed through this process and I would like to share them with you and
> hear your feedback. First I'll address mentioned problems.
>
> 1. APKBUILD with fixed licenses needs some kind of marking.
> In my last mail I suggested adding !license option to practically all
> APKBUILDs, so after fixing the license, option would be removed and
> that's how we could differentiate APKBUILDs that already passed
> license inspection. But I'm not fond of this idea anymore, as I'm no
> longer sure that options field is the right place for such stuff.
> (Also license inspection should not overlook new packages that were
> added this year and supposedly already with good license info,
> because license inspection should happen independently of standard
> reviews happening for new aports that land in testing. My point is
> to always try to have correct license for new packages, but don't
> stress it too much before release of Alpine Linux 3.8, because it
> will be kind of transitory period and we can become much more strict
> later, and promotion from testing to community or main should be always
> preceded with thorough license inspection anyway.)

We could put another metadata field in, such as:

# X-License-Verifier: Name <e-mail>

The person who has audited and fixed the license on the package would
add that header to sign off on it. This solves point 2 below, as
well.
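
As a rough illustration of how such headers could be counted, here is a
small Python sketch. It assumes the comment form shown above; the
function name license_verifiers and the sample names and addresses are
made up for the example, and abuild has no such feature today.

```python
import re

# Assumed header form, taken from the proposal above:
#   # X-License-Verifier: Name <e-mail>
VERIFIER_RE = re.compile(r"^#\s*X-License-Verifier:\s*(.+?)\s*<([^<>]+)>\s*$")


def license_verifiers(apkbuild_text):
    """Return (name, email) pairs for every verifier header found."""
    verifiers = []
    for line in apkbuild_text.splitlines():
        match = VERIFIER_RE.match(line)
        if match:
            verifiers.append((match.group(1), match.group(2)))
    return verifiers


# Hypothetical APKBUILD fragment; the people and addresses are invented.
EXAMPLE = """\
# Maintainer: Jane Doe <jane@example.org>
# X-License-Verifier: Jane Doe <jane@example.org>
# X-License-Verifier: John Roe <john@example.org>
pkgname=lz4
license="GPL-2.0-or-later"
"""

if __name__ == "__main__":
    found = license_verifiers(EXAMPLE)
    print(len(found), "verifier(s):", found)
```

A check like this would make it easy to list packages that have not yet
reached whatever number of sign-offs we agree on.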

> 2. License verification needs to be recorded, so people won't be
> rechecking stuff that has already reached some threshold (I think
> that 3 people sounds good for starters) and whenever mistake is
> found, previous reviews must be invalidated.
> Git commit messages alone aren't good enough for that, because you
> won't be able to invalidate them.

I believe Alpine needs a "legal" working group which handles auditing
the distribution for license compliance, as well as determining
whether or not custom licenses meet the OSI redistribution guidelines.

The license audit would be a good initial project for such a working group.

> 3. APKBUILD format needs to be somehow changed, extended or replaced.
> I believe it's a topic worth discussing, but possibly in some
> separate RFC thread.

I think it is better to push for restructuring APKBUILD for the 3.9
cycle at the earliest.

> I don't want to dwell on it too much here now, but I think that
> introducing another file, e.g. APKBUILD.meta, for structured data in
> human-readable format (like JSON, YAML, etc.) that would take all
> variables from APKBUILD and be able to put them in some hierarchy,
> would make package info more manageable and more maintainable.
> Shell scripts are quite unfortunate to work with as data storage
> containers. So APKBUILD after such extraction wouldn't have any
> variables, or at least no package-related variables, and would
> contain only functions necessary to describe building and packaging.
> There may be need for some kind of mechanism exposing information
> stored in APKBUILD.meta for APKBUILD, but in most cases it shouldn't
> be really needed and abuild would simply need to learn reading such
> additional file.
>
> Instead of creating separate file, it could be embedded into
> one big variable, but that could be more error prone, because of
> lacking proper syntax check, etc.
>
> Anyway, any smaller or bigger revolutions regarding APKBUILD (& co)
> won't happen soon (or sadly, may not happen at all, because I can
> foresee great opposition for such changes), but the bigger and more
> widely-used Alpine Linux becomes, the harder it is to improve some
> older decisions, so it's better to approach it earlier than never.
>
> 4. License expressions can be separated with comma for instance.
> It seems like a natural choice, and for better appearance such commas
> could be followed by a space.

This seems fine.
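
For what it is worth, splitting on commas is trivial to implement. A
minimal Python sketch follows; it relies on my understanding that the
SPDX license expression syntax itself never uses commas, and the name
split_license_field is hypothetical.

```python
def split_license_field(field):
    """Split a license field on commas into individual SPDX expressions.

    Spaces inside an expression (e.g. around WITH, AND, OR) are kept;
    only whitespace around each comma-separated item is trimmed.
    """
    return [part.strip() for part in field.split(",") if part.strip()]


if __name__ == "__main__":
    field = "LGPL-2.1-only WITH Nokia-Qt-exception-1.1, MIT"
    for expression in split_license_field(field):
        print(expression)
```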

> 5. Non-SPDX licenses need some kind of unique naming.
> That will allow to spot if there is more than one usage of such
> license. Then we can try to request a license added to the SPDX
> License List. Anyway, we need to track all non-SPDX licenses seen in
> packages and introduce some temporary identifiers for them that must
> be clearly discernible from SPDX identifiers. I think that putting
> non-SPDX identifiers in angle brackets, e.g. <Alpine-1.0>, which are
> commonly used for placeholders, should do the job, yet still make it
> possible to easily parse them and discern even if they were part of
> multiple-license expression.

This also seems fine.
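
Angle brackets are also easy to pick out mechanically. Here is a short
Python sketch, assuming the <Name> convention from the quoted proposal;
the character set allowed inside the brackets and the function name
non_spdx_identifiers are my own assumptions.

```python
import re

# Assumed convention from the proposal: non-SPDX identifiers are wrapped
# in angle brackets, e.g. <Alpine-1.0>. The allowed characters (letters,
# digits, ".", "+", "-") are an assumption made for this sketch.
NON_SPDX_RE = re.compile(r"<([A-Za-z0-9.+-]+)>")


def non_spdx_identifiers(expression):
    """Return the bracketed (non-SPDX) identifiers in a license expression."""
    return NON_SPDX_RE.findall(expression)


if __name__ == "__main__":
    print(non_spdx_identifiers("MIT AND <Alpine-1.0>"))
    print(non_spdx_identifiers("GPL-2.0-or-later"))
```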

> 6. As I wrote earlier, shell scripts are poor solutions for data
> storage, therefore I think canonical information regarding licenses
> shouldn't be put in aports, but in a completely new repository with
> flat hierarchy of software projects. No, I'm not proposing removing
> license field from APKBUILDs, but to make these fields populated or
> fixed in aports with the help of some scripts (that aren't written
> yet, but should be easy to do for 99% of cases) using data from this
> new upcoming repository, on a regular basis - weekly or every two
> weeks sounds rational.
>
> Having dedicated repository (I'll call it spdxify for now) for gathering
> data about licenses used by various software projects seems like the
> best way to move forward.
>
> It will reduce noise in aports, allowing to import fixed licenses in
> batches and will avoid adding additional stuff to APKBUILD just to track the
> progress. aports is also a moving target, so working outside of it will
> get rid of many collisions that would be inevitable otherwise.
>
> I think that spdxify repository layout could look like:
>
> +- lz4
> |  +- 0NAME         -- official name of the project
> |  +- 0REPO         -- official repository       \ at least one of these
> |  +- 0SRC          -- official tarball location / should be present
> |  +- licenses      -- license expressions covering main
> |  |                   software product (library in this case);
> |  |                   one license expression per line
> |  +- licenses-cli  -- license expressions covering supplementary
> |  |                   software products (CLI tools in this case)
> |  |                   if they differ from main ones
> |  |                   one license expression per line
> |  +- licenses-doc  -- license expressions covering documentation
> |  |                   if they differ from main ones
> |  .
> |  .  (perhaps more licenses* files)
> |  .
> |  |
> |  +- reviewers     -- ISO 8601 date and reviewer's full name
> |                      per line
> .
>
> Hierarchy should be flat, because there is no need for favoritism,
> what is in testing today in Alpine Linux can be in community a few
> weeks later, and I think that reflecting Alpine Linux hierarchy
> wouldn't be beneficial here, leading to noise like mentioned moves.
>
> 0NAME, 0REPO, 0SRC are files that will make information contained in
> the repository useful in a standalone manner, i.e. without access to
> aports. There can be same named project that will need having
> different directory names (obviously), so it's important to be able
> to tell what actual project is referred to in given directory.
> First come, first served should work fine, and new colliding project
> names would get a suffix _N, where N denotes N-th collision.
>
> There will be at least one licenses file for each project, and more
> if there are many products of its building/installing that may not
> necessarily be bundled together. Each licenses file should have one
> SPDX license expression per line, and first line should contain the most
> prominent one license expression if there are many in the project.
>
> Integral part of the whole idea is the concept of reviewers.
> A reviewer is the person who clones the repository or downloads the most
> recent tarball of a software project, inspects whether the licenses found
> there match what the licenses* files state, and fixes any
> mistakes. If there are mistakes, then old entries in reviewers file are
> removed before adding new one, but if there are no mistakes, then new
> reviewer is simply appended. Each reviewer's name should be preceded
> with the date (in ISO 8601) when review has been finished.
>
> Inside such repository there should be also .scripts folder with
> simple shell scripts to ease some tasks, like adding entry to
> reviewers files (based on user.name from git's config) followed by a
> git commit with automatic message, finding software with particular
> number of reviewers or not yet reviewed by you, etc.
>
> Outside of this repository we will need mentioned earlier Alpine
> Linux-specific scripts that will aid converting what's in licenses files
> into license field of APKBUILD files, and some mapping file for
> non-obvious cases (obvious cases are when package is named exactly the
> same in aports as in spdxify and there is only one licenses file), e.g.:
>
> lz4 lz4:cli
> lz4:libs lz4
>
> In such mapping combining more than one licenses* file into one
> license field will be also possible.
>
> That's roughly how I see it. I'm sure I didn't cover all the corners,
> but you should get some picture after reading this wall of text.
>
> I don't have all these scripts written yet and spdxify repository has
> not been created yet either. I plan to "snapshot" aports state very
> soon (hopefully on 2018-01-30 or 2018-01-31) and use packages in main/
> as the base to create first set of software projects that will need to
> be inspected. There are over 2000 packages in main, so I plan to split
> it into batches of ~500, which means [a-g]* packages from main will be
> the first ones. I don't even plan on importing existing license fields
> from APKBUILDs, because I think it may be harmful and more error-prone.
> It's better to start from scratch our license journey and not be biased
> by what was already put in some APKBUILDs (I've seen some mistakes in
> the past and I'm afraid there may be still many more of them).
>
> I haven't started working on all that yet, because I wanted to get some
> feedback whether people see value in such organized approach toward
> fixing license matters in Alpine Linux (that may actually also benefit
> other distributions in future) or not.

I prefer to have a COPYRIGHT file alongside the APKBUILD which
describes each part of the package and its license situation.
I don't think a separate license repo is necessary.

> Final notes
> -----------
>
> It may look like I take licensing very seriously. Some may argue that
> maybe even too seriously. Common view is that only functionality is
> important and as long as the job can be done it doesn't matter what is
> the license behind the tool used for it. It may be true for most users,
> but others interested in utilizing Alpine Linux for their products,
> services and/or solutions, may not always have this nice freedom of
> choice.

It is important to Adelie, too. For end users we want to raise
awareness of what legal rights they have regarding their
software. For customers who are engaging us for a support contract,
we need to be able to make their lawyers happy.

> Fixing current license mess will show that Alpine Linux cares about
> quality in yet another department, and I believe it can be beneficial
> to its overall image, but also to all users and developers being part
> of this great community, by raising awareness that licenses do matter.

Agreed.

William


Received on Wed Jan 31 2018 - 12:26:26 GMT