This is a kind of follow-up to the previous thread, started a month ago:
License naming in APKBUILD - SPDX License List
Please check it if you haven't already.
Converting the simplistic and imprecise license naming that was used
before in Alpine Linux (e.g. GPL, GPL2, BSD, etc.) to slightly more
verbose but also more precise and standardized license naming will
undoubtedly raise the quality of Alpine Linux packages.
SPDX license identifiers are already being adopted in many
open-source circles. I believe that Alpine Linux did a good thing by
deciding to use SPDX over half a year ago. Unluckily, or maybe luckily,
the conversion didn't truly follow through back then. There were some
changes here and there, but nothing on a scale large enough to really
nail all existing packages. I wrote "luckily" because at the end of
2017 the SPDX License List got a new version 3, which has some changes
compared to version 2.x. I believe, as I already wrote in the previous
thread, that we should stick to this new version, and most likely to
its updates too, when they are ready, as I doubt they will be
disruptive.
One unfortunate thing about sticking with version 3 of the list is that
one of the distros using Alpine Linux as its base, Adelie Linux, is
apparently fixed on an older version of the SPDX License List, so the
changes already done and those upcoming may not be entirely welcomed by
them, but I hope we'll be able to resolve all problems eventually and
the relationship between Alpine Linux and Adelie Linux will remain good
and healthy.
It will be a great achievement if we manage to correctly define the
licenses of all available packages before releasing Alpine Linux 3.8.
There are roughly 3 months for that. It's not much for 4000+ packages!
It's most likely not even enough, but we won't know without trying!
Some changes in license fields are already happening, but we need to
pause for a moment and look at how they're being done right now, or at
least how they have been done so far.
Roughly two kinds of activity have happened in the aports repository
since 2017-12-30 regarding the license field in various APKBUILDs:
- individual changes,
- massive changes.
There was only one massive change, already mentioned in the previous
thread, and as Jakub himself stated in his commit 63f5e7d29565, "no
verification has been done if the specified license information is
correct!"
Therefore all packages that were part of this massive change will need
to be verified.
There have been dozens of individual changes to date. They're hopefully
correct. They look like casual changes ("I read that mail, so I'll fix
this APKBUILD"), and they're appreciated, but they're not good enough in
the big picture. I'll explain why shortly.
How could these efforts be improved, and what needs to change so that we
can do this properly, i.e. actually fix license fields and not merely
replace one group of letters with another and pretend we're done?
Let's list the problems we're facing now.
0. Lack of organized work.
1. Lack of trackability.
The sheer number of packages in Alpine Linux makes the casual-change
approach impractical. A corrected license field in one APKBUILD is
indistinguishable from another one that hasn't been scrutinized yet,
which is unacceptable.
2. Lack of verifiability.
That may sound harsh, but I think that one pair of eyes per package is
not enough. Why? Because providing wrong license information is
worse than not providing it at all, therefore such information must
be verified by others.
3. Lack of subpackage licenses.
Well, they're theoretically possible already in APKBUILDs. You have
to redefine the license variable in the subpackage function. It is
very rarely done, though, and that's kind of understandable
considering the inconvenience of redefining variables.
Let me give you an example. Let's look at the LZ4 library.
Its README.md file states "LZ4 library is provided as open-source
software using BSD 2-Clause license." So BSD-2-Clause, easy, right?
Checking the README file is not enough. The LICENSE file gives a better
picture, because there you can read "all other files [not in the lib
dir] use a GPLv2 license, unless explicitly stated otherwise". But if
you look into the source code of the test and CLI tools, you'll find
that it's not GPL-2.0-only, as one could presume, but actually
GPL-2.0-or-later (and I think this is the reason why SPDX decided to
abandon the GPL-2.0 and GPL-2.0+ naming style, as the first one is too
similar to the casual GPLv2, which can mean either in practice).
Test tools usually aren't shipped in packages, so that wouldn't be a
problem, but CLI tools are shipped. So the lz4 package should have the
GPL-2.0-or-later license only, while lz4-libs should have the
BSD-2-Clause license only.
4. Lack of non-space license separator.
A space is not good enough, because complex licenses can contain spaces.
Example: LGPL-2.1-only WITH Nokia-Qt-exception-1.1
The power of SPDX doesn't come only from its wide license list, but
also from the fact that the people behind it actually thought about it
and came up with license expressions, so not only exceptions can be
expressed, like in the given example, but also dual-licensing, etc.
So you may ask, why is there a need for a separator if there are
these expressions? I'm not an expert in this field, but I believe
there is a difference between a multiple-licensed source file
(depending on the conjunctive or disjunctive character of the
licensing, you'll use the AND or OR operator, e.g. Apache-2.0 AND MIT,
GPL-2.0-only OR MIT) and having different licenses for different source
files that are all part of one final product. If half of a program's
source code is licensed under MIT, and the other half is licensed under
Apache-2.0, in my opinion you shouldn't describe it as MIT AND
Apache-2.0 or MIT OR Apache-2.0, as both descriptions are misleading.
The only way I see to describe it would be: MIT<separator>Apache-2.0.
The separator definitely feels like "and", but it's different from
AND and I think it's better to preserve that distinction.
5. Support for non-SPDX licenses.
The SPDX License List, including license exceptions, is quite broad,
but there may still be some custom licenses that aren't widely used and
therefore haven't been recognized by SPDX so far, but are used in some
of the packages available in Alpine Linux. Putting license="custom" is
not a solution. Leaving the license field empty and introducing a !spdx
option (*) is also bad, because a project may use a mix of SPDX and
non-SPDX licenses.
(*) I'm assuming that in the future there will be support in abuild for
checking whether the licenses mentioned in the license field conform
to SPDX names; Carlo together with Natanael already did some work
toward that, which is appreciated, but with this message I hope
it becomes clear that the PoC presented so far is not good enough and
that ultimately some dedicated library/tool may be needed to deal
with this properly, because parsing in a shell script is not
necessarily an easy and sane way to do it.
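To give a rough idea of what such a check involves, here is a minimal Python sketch (not the abuild PoC mentioned above, and not a complete SPDX parser). The function name and the tiny identifier sets are assumptions made for illustration only; a real tool would load the full SPDX License List and exception list.

```python
import re

# Illustrative subsets only; a real checker would load the complete
# SPDX License List and its list of exceptions.
KNOWN_LICENSES = {
    "MIT", "Apache-2.0", "BSD-2-Clause",
    "GPL-2.0-only", "GPL-2.0-or-later", "LGPL-2.1-only",
}
KNOWN_EXCEPTIONS = {"Nokia-Qt-exception-1.1"}
OPERATORS = {"AND", "OR", "WITH"}


def unknown_identifiers(expression):
    """Return the identifiers in an expression that are not recognized.

    An identifier directly after WITH is looked up among exceptions,
    every other identifier among licenses. (Hypothetical helper.)
    """
    tokens = re.findall(r"[()]|[^\s()]+", expression)
    unknown = []
    previous = None
    for token in tokens:
        if token in ("(", ")") or token in OPERATORS:
            previous = token
            continue
        allowed = KNOWN_EXCEPTIONS if previous == "WITH" else KNOWN_LICENSES
        if token not in allowed:
            unknown.append(token)
        previous = token
    return unknown
```

This only checks names; it does not verify that the expression is well formed (balanced parentheses, operators in valid positions), which is part of why a dedicated library may be the saner route.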
6. Lack of reusability.
This part may interest the Alpine Linux community the least, but if
there are efforts related to documenting the open-source world, it's
better if they're done in a manner that is easy for others to reuse.
The APKBUILD format may look nifty, being in fact a busybox ash script,
but it brings not only nice possibilities (that can be abused), but
also many limitations, like poor data types, lack of nested structs,
etc.
I thought about this for a considerable time, my ideas actually changed
along the way, and I would like to share them with you and hear your
feedback. First I'll address the problems mentioned above.
1. APKBUILD with fixed licenses needs some kind of marking.
In my last mail I suggested adding a !license option to practically
all APKBUILDs, so that after fixing the license the option would be
removed, and that's how we could distinguish APKBUILDs that had
already passed license inspection. But I'm not fond of this idea
anymore, as I'm no longer sure that the options field is the right
place for such stuff.
(Also, license inspection should not overlook new packages that were
added this year, supposedly already with good license info,
because license inspection should happen independently of the standard
reviews happening for new aports that land in testing. My point is
to always try to have the correct license for new packages, but not to
stress about it too much before the release of Alpine Linux 3.8,
because it will be a kind of transitional period and we can become
much more strict later, and promotion from testing to community or
main should always be preceded by a thorough license inspection
anyway.)
2. License verification needs to be recorded, so people won't be
rechecking stuff that has already reached some threshold (I think
that 3 people sounds good for starters), and whenever a mistake is
found, previous reviews must be invalidated.
Git commit messages alone aren't good enough for that, because you
can't invalidate them.
3. APKBUILD format needs to be somehow changed, extended or replaced.
I believe it's a topic worth discussing, but possibly in some
separate RFC thread.
I don't want to dwell on it too much here, but I think that
introducing another file, e.g. APKBUILD.meta, holding structured data
in a human-readable format (like JSON, YAML, etc.), which would take
all variables from the APKBUILD and be able to arrange them in some
hierarchy, would make package info more manageable and maintainable.
Shell scripts are quite unfortunate to work with as data storage
containers. So after such extraction the APKBUILD wouldn't have any
variables, or at least no package-related variables, and would
contain only the functions necessary to describe building and
packaging. There may be a need for some kind of mechanism exposing the
information stored in APKBUILD.meta to the APKBUILD, but in most cases
it shouldn't really be needed, and abuild would simply need to learn
to read such a file.
Instead of creating a separate file, the data could be embedded into
one big variable, but that could be more error-prone because of
the lack of proper syntax checking, etc.
Anyway, any smaller or bigger revolutions regarding APKBUILD (& co)
won't happen soon (or sadly, may not happen at all, because I can
foresee great opposition to such changes), but the bigger and more
widely used Alpine Linux becomes, the harder it is to revisit
older decisions, so it's better to approach it sooner than never.
4. License expressions can be separated with a comma, for instance.
It seems like a natural choice, and for better appearance such commas
could be followed by a space.
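A minimal Python sketch of how a comma-separated license field could be split into individual license expressions. The function name is hypothetical; the sketch relies only on the fact that SPDX license expressions themselves contain no commas.

```python
def split_license_field(field):
    """Split a license field into license expressions.

    Expressions are separated by commas, optionally followed by
    spaces. Spaces inside an expression (around AND, OR, WITH) are
    left untouched. (Hypothetical helper.)
    """
    return [part.strip() for part in field.split(",") if part.strip()]


field = "LGPL-2.1-only WITH Nokia-Qt-exception-1.1, MIT"
expressions = split_license_field(field)
```

Splitting the same field on spaces instead would break the first expression into three unrelated tokens, which is exactly the problem with the current separator.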
5. Non-SPDX licenses need some kind of unique naming.
That will allow us to spot whether such a license is used more than
once. Then we can try to request that the license be added to the SPDX
License List. Anyway, we need to track all non-SPDX licenses seen in
packages and introduce temporary identifiers for them that must
be clearly discernible from SPDX identifiers. I think that putting
non-SPDX identifiers in angle brackets, e.g. <Alpine-1.0>, which are
commonly used for placeholders, should do the job, yet still make it
possible to easily parse them and discern them even if they were part
of a license expression.
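As a rough illustration of how easy angle-bracketed identifiers are to pick out, here is a small Python sketch. The function name and the exact pattern are assumptions; the proposal itself only fixes the angle-bracket convention.

```python
import re

# Anything between < and > without whitespace or nested brackets is
# treated as a temporary non-SPDX identifier (assumed convention).
NON_SPDX = re.compile(r"<([^<>\s]+)>")


def non_spdx_identifiers(expression):
    """Return the temporary non-SPDX identifiers used in an expression."""
    return NON_SPDX.findall(expression)
```

Collecting the results over all packages would give the list of custom licenses that appear more than once and are candidates for submission to the SPDX License List.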
6. As I wrote earlier, shell scripts are a poor solution for data
storage, therefore I think the canonical information regarding licenses
shouldn't be kept in aports, but in a completely new repository with
a flat hierarchy of software projects. No, I'm not proposing to remove
the license field from APKBUILDs, but to have these fields populated or
fixed in aports with the help of some scripts (that aren't written
yet, but should be easy to do for 99% of cases) using data from this
new upcoming repository, on a regular basis - weekly or every two
weeks sounds reasonable.
Having a dedicated repository (I'll call it spdxify for now) for
gathering data about licenses used by various software projects seems
like the best way to move forward.
It will reduce noise in aports by allowing fixed licenses to be
imported in batches, and it will avoid adding extra stuff to APKBUILDs
just to track progress. aports is also a moving target, so working
outside of it will get rid of many collisions that would be inevitable
otherwise.
I think that spdxify repository layout could look like:
|  +- 0NAME         -- official name of the project
|  +- 0REPO         -- official repository       \ at least one of these
|  +- 0SRC          -- official tarball location / should be present
|  +- licenses      -- license expressions covering main
|  |                   software product (library in this case);
|  |                   one license expression per line
|  +- licenses-cli  -- license expressions covering supplementary
|  |                   software products (CLI tools in this case)
|  |                   if they differ from main ones;
|  |                   one license expression per line
|  +- licenses-doc  -- license expressions covering documentation
|  |                   if they differ from main ones
|  .                   (perhaps more licenses* files)
|  +- reviewers     -- ISO 8601 date and reviewer's full name
|                      per line
The hierarchy should be flat, because there is no need for favoritism:
what is in testing today in Alpine Linux can be in community a few
weeks later, and I think that mirroring the Alpine Linux hierarchy
wouldn't be beneficial here, leading to noise like the moves just
mentioned.
0NAME, 0REPO, 0SRC are files that will make the information contained
in the repository useful in a standalone manner, i.e. without access to
aports. There can be projects with the same name, which will
(obviously) need different directory names, so it's important to be
able to tell which actual project a given directory refers to.
First come, first served should work fine, and new colliding project
names would get a suffix _N, where N denotes the N-th collision.
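The naming rule can be sketched in a few lines of Python. This is only one reading of it, assuming the first project keeps the bare name and the first collision gets _1; the function name is hypothetical.

```python
def directory_name(project, taken):
    """Pick a directory name for a project, first come, first served.

    `taken` is the set of directory names already in use. The first
    project keeps the bare name; the N-th colliding project gets the
    suffix _N. (Hypothetical helper.)
    """
    if project not in taken:
        return project
    n = 1
    while f"{project}_{n}" in taken:
        n += 1
    return f"{project}_{n}"
```

For example, with "lz4" already present, a second unrelated project of the same name would land in "lz4_1", and the 0NAME, 0REPO and 0SRC files would tell the two apart.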
There will be at least one licenses file for each project, and more
if building/installing it yields several products that may not
necessarily be bundled together. Each licenses file should have one
SPDX license expression per line, and the first line should contain the
most prominent license expression if there are many in the project.
An integral part of the whole idea is the concept of reviewers.
A reviewer is a person who clones the repository or downloads the most
recent tarball of a software project, inspects whether the licenses
found there match what the licenses* files state, and fixes any
mistakes. If there are mistakes, then the old entries in the reviewers
file are removed before adding the new one, but if there are no
mistakes, then the new reviewer is simply appended. Each reviewer's
name should be preceded by the date (in ISO 8601) when the review was
finished.
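The bookkeeping rule for the reviewers file can be sketched as follows in Python. Function names and the sample reviewer names are hypothetical, and the threshold of 3 is the starting value suggested earlier in this message, not a settled number.

```python
def record_review(entries, date, name, found_mistakes):
    """Return the new contents of a reviewers file as a list of lines.

    `entries` holds the existing lines, each "YYYY-MM-DD Full Name".
    If the reviewer had to fix mistakes, earlier reviews are
    invalidated and only the new entry remains; otherwise the new
    entry is appended. (Hypothetical helper.)
    """
    new_entry = f"{date} {name}"
    if found_mistakes:
        return [new_entry]
    return entries + [new_entry]


def is_settled(entries, threshold=3):
    """True once enough independent reviews have accumulated."""
    return len(entries) >= threshold


entries = []
entries = record_review(entries, "2018-02-01", "Jane Doe", False)
entries = record_review(entries, "2018-02-03", "John Roe", True)
```

After the second call only John Roe's entry remains, because his review found mistakes and therefore invalidated the earlier one.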
Inside such a repository there should also be a .scripts folder with
simple shell scripts to ease some tasks, like adding an entry to
reviewers files (based on user.name from git's config) followed by a
git commit with an automatic message, finding software with a particular
number of reviewers or not yet reviewed by you, etc.
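The query helpers could look roughly like this. The sketch is in Python rather than shell purely for illustration, the function names are hypothetical, and it assumes the layout described above: one directory per project, each with an optional reviewers file holding "date full-name" lines.

```python
from pathlib import Path


def read_reviewers(project_dir):
    """Return reviewer names listed in a project's reviewers file."""
    path = Path(project_dir) / "reviewers"
    if not path.exists():
        return []
    names = []
    for line in path.read_text().splitlines():
        line = line.strip()
        if line:
            # Each line is an ISO 8601 date, a space, then the full name.
            _date, _, name = line.partition(" ")
            names.append(name.strip())
    return names


def project_dirs(root):
    """Yield project directories, skipping hidden ones like .scripts."""
    for entry in sorted(Path(root).iterdir()):
        if entry.is_dir() and not entry.name.startswith("."):
            yield entry


def projects_with_review_count(root, count):
    """Names of projects that have exactly `count` reviewers."""
    return [p.name for p in project_dirs(root)
            if len(read_reviewers(p)) == count]


def projects_not_reviewed_by(root, reviewer):
    """Names of projects the given reviewer has not reviewed yet."""
    return [p.name for p in project_dirs(root)
            if reviewer not in read_reviewers(p)]
```

The reviewer name passed to the second query would typically come from git's user.name, as with the script that adds entries.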
Outside of this repository we will need the Alpine Linux-specific
scripts mentioned earlier, which will help convert what's in licenses
files into the license field of APKBUILD files, and some mapping file
for non-obvious cases (obvious cases are when the package is named
exactly the same in aports as in spdxify and there is only one licenses
file), e.g.:
In such a mapping, combining more than one licenses* file into one
license field will also be possible.
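To make the conversion step more concrete, here is a minimal Python sketch. The mapping structure, the function name and the sample data are all hypothetical (the data reuses the LZ4 example from earlier in this message), and expressions are joined with the comma separator proposed above.

```python
# Hypothetical in-memory view of one spdxify project: licenses* file
# name -> list of license expressions (one per line in the file).
LZ4_PROJECT = {
    "licenses": ["BSD-2-Clause"],
    "licenses-cli": ["GPL-2.0-or-later"],
}

# Hypothetical mapping for non-obvious cases:
# aports package name -> (spdxify project, licenses* files to combine).
MAPPING = {
    "lz4": ("lz4", ["licenses-cli"]),
    "lz4-libs": ("lz4", ["licenses"]),
}


def license_field(licenses_files, selected):
    """Combine the selected licenses* files into one license field.

    Expressions keep their order of appearance, duplicates are
    dropped, and the result is joined with ", ".
    """
    expressions = []
    for file_name in selected:
        for expression in licenses_files[file_name]:
            if expression not in expressions:
                expressions.append(expression)
    return ", ".join(expressions)


_project, files = MAPPING["lz4-libs"]
libs_field = license_field(LZ4_PROJECT, files)
combined = license_field(LZ4_PROJECT, ["licenses", "licenses-cli"])
```

The last line shows the case of combining more than one licenses* file into a single field.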
That's roughly how I see it. I'm sure I didn't cover all the corners,
but you should get some picture after reading this wall of text.
I don't have all these scripts written yet and the spdxify repository
has not been created yet either. I plan to "snapshot" the aports state
very soon (hopefully on 2018-01-30 or 2018-01-31) and use the packages
in main/ as the base to create the first set of software projects that
will need to be inspected. There are over 2000 packages in main, so I
plan to split it into batches of ~500, which means [a-g]* packages from
main will be the first ones. I don't even plan on importing existing
license fields from APKBUILDs, because I think it may be harmful and
more error-prone. It's better to start our license journey from scratch
and not be biased by what was already put in some APKBUILDs (I've seen
some mistakes in the past and I'm afraid there may still be many more
of them).
I haven't started working on all that yet, because I wanted to get some
feedback on whether people see value in such an organized approach to
fixing license matters in Alpine Linux (which may actually also benefit
other distributions in the future) or not.
It may look like I take licensing very seriously. Some may argue that
maybe even too seriously. A common view is that only functionality is
important, and as long as the job gets done it doesn't matter what
license is behind the tool used for it. It may be true for most users,
but others interested in utilizing Alpine Linux for their products,
services and/or solutions, may not always have this nice freedom of
Fixing the current license mess will show that Alpine Linux cares about
quality in yet another department, and I believe it can be beneficial
to its overall image, but also to all users and developers who are part
of this great community, by raising awareness that licenses do matter.
Received on Mon Jan 29 2018 - 23:23:32 UTC