For discussion of Alpine Linux development and developer support

[alpine-devel] RFC: Fixing license field in APKBUILDs (or a bit more)

Przemysław Pawełczyk
Message ID
<1614404acdb.edd44d4135179.3833405768989653606@zoho.com>
Sender timestamp
1517264612
Preface
-------

This is a follow-up to the previous thread, started a month ago:

    License naming in APKBUILD - SPDX License List

Please check it if you haven't already.


Intro
-----

Converting from the simplistic and imprecise license naming used so far
in Alpine Linux (e.g. GPL, GPL2, BSD, etc.) to slightly more verbose,
but also more precise and standardized, license naming will undoubtedly
raise the quality of Alpine Linux packages.

SPDX license identifiers are already being adopted in many open-source
circles.  I believe that Alpine Linux did a good thing by deciding to
use SPDX over half a year ago.  Unluckily, or maybe luckily, the
conversion didn't truly follow through back then.  There were some
changes here and there, but nothing on a scale large enough to cover all
existing packages.  I wrote "luckily" because at the end of 2017 the
SPDX License List got a new version 3, which has some changes compared
to version 2.x.

I believe, as I already wrote in the previous thread, that we should
stick to this new version, and most likely to its updates too once they
are ready, as I doubt they will be disruptive.

One unfortunate thing about sticking with version 3 of the list is that
one of the distros using Alpine Linux as its base, Adelie Linux, is
apparently fixed on an older version of the SPDX License List, so the
changes already made and those still to come may not be entirely
welcome there.  I hope we'll be able to resolve all problems eventually
and that the relationship between Alpine Linux and Adelie Linux will
remain good and healthy.

It will be a great achievement if we manage to correctly define the
licenses of all available packages before releasing Alpine Linux 3.8.
There are roughly 3 months for that.  That's not much for 4000+
packages!  It's most likely not even enough, but we won't know without
trying!


Present
-------

Some changes to license fields are already happening, but we need to
pause for a moment and look at how they're being done right now, or at
least how they have been done so far.

Roughly 2 kinds of activity have happened in the aports repository since
2017-12-30 regarding the license field in various APKBUILDs:
- individual changes,
- mass changes.

There was only one mass change, already mentioned in the previous
thread, and as Jakub himself stated in his commit 63f5e7d29565, "no
verification has been done if the specified license information is
correct!"  Therefore all packages that were part of this mass change
will need to be investigated anyway.

There have been dozens of individual changes to date.  They're hopefully
correct.  They look like casual changes along the lines of "I read that
mail, so I'll fix this APKBUILD", and they're appreciated, but they're
not good enough in the big picture.  I'll explain why shortly.


Problems
--------

How could these efforts be improved, and what needs to change so that we
can do this properly, i.e. actually fix license fields rather than
replace one group of letters with another and pretend we're done?

Let's list the problems we're facing now.

0. Lack of organized work.

1. Lack of trackability.

   The sheer number of packages in Alpine Linux makes the casual-change
   approach impractical.  A corrected license field in one APKBUILD is
   indistinguishable from one that hasn't been scrutinized yet, which
   is unacceptable.

2. Lack of verifiability.

   That may sound harsh, but I think that one pair of eyes per package
   is not enough.  Why?  Because providing wrong license information is
   worse than not providing it at all, therefore such information must
   be verified by others.

3. Lack of subpackage licenses.

   Well, they're theoretically possible in APKBUILDs already.  You have
   to redefine the license variable in the subpackage function.  It is
   very rarely done, though, which is understandable considering how
   inconvenient redefining variables is.

   Let me give you an example.  Let's look at the LZ4 library.
   Its README.md file states "LZ4 library is provided as open-source
   software using BSD 2-Clause license."  So BSD-2-Clause, easy, right?
   Checking the README file is not enough.  The LICENSE file gives a
   better picture, because there you can read "all other files [not in
   the lib dir] use a GPLv2 license, unless explicitly stated
   otherwise".  But if you look into the source code of the test and
   CLI tools, you'll find that it's not GPL-2.0-only, as one could
   presume, but actually GPL-2.0-or-later (and I think this is the
   reason why SPDX decided to abandon the GPL-2.0 and GPL-2.0+ naming
   style, as the first one is too similar to the casual GPLv2, which
   can mean both in practice).
   Test tools usually aren't shipped in packages, so that wouldn't be a
   problem, but CLI tools are shipped.  So the lz4 package should have
   the GPL-2.0-or-later license only, while lz4-libs should have the
   BSD-2-Clause license only.

4. Lack of non-space license separator.

   A space is not good enough, because complex license expressions can
   contain spaces.
   Example: LGPL-2.1-only WITH Nokia-Qt-exception-1.1

   The power of SPDX doesn't come only from its wide license list, but
   also from the fact that the people behind it actually thought it
   through and came up with license expressions, so not only exceptions
   can be expressed, as in the given example, but also dual-licensing,
   etc.

   So you may ask: why is there a need for a separator if there are
   these expressions?  I'm not an expert in this field, but I believe
   there is a difference between a multiple-licensed source file
   (depending on the conjunctive or disjunctive character of the
   licensing, you'll use the AND or OR operator, e.g. Apache-2.0 AND
   MIT, GPL-2.0-only OR MIT) and having different licenses for
   different source files that are all part of one final product.  If
   half of a program's source code is licensed under MIT, and the other
   half is licensed under Apache-2.0, in my opinion you shouldn't
   describe it as MIT AND Apache-2.0 or MIT OR Apache-2.0, as both
   descriptions are misleading.  The only way I see to describe it
   would be: MIT<separator>Apache-2.0.  The separator definitely feels
   like "and", but it's different from AND, and I think it's better to
   preserve that distinction.

5. Support for non-SPDX licenses.

   The SPDX License List, including license exceptions, is quite broad,
   but there may still be some custom licenses that aren't widely used,
   and therefore haven't been recognized by SPDX so far, yet are used
   in some of the packages available in Alpine Linux.  Putting
   license="custom" is not a solution.  Leaving the license field empty
   and introducing a !spdx option (*) is also bad, because a project
   may use a mix of SPDX and non-SPDX licenses.

   (*) I'm assuming that in the future abuild will support checking
       whether the licenses mentioned in the license field conform to
       SPDX names; Carlo together with Natanael already did some work
       toward that, which is appreciated, but with this message I hope
       it becomes clear that the PoC presented so far is not good
       enough and that ultimately some dedicated library/tool may be
       needed to deal with this properly, because parsing in a shell
       script may not necessarily be an easy and sane way.

6. Lack of reusability.

   This part may interest the Alpine Linux community the least, but if
   there are efforts related to documenting the open-source world, it's
   better if they're done in a manner that is easy for others to reuse.
   The APKBUILD format may look nifty, being in fact a busybox ash
   script, but it brings not only nice possibilities (that can be
   abused), but also many limitations, like poor data types, lack of
   nested structs, etc.


Solutions?
----------

I have been thinking about this for a considerable time, and my ideas
changed along the way.  I would like to share them with you and hear
your feedback.  First I'll address the problems mentioned above.

1. An APKBUILD with a fixed license needs some kind of marking.
   In my last mail I suggested adding a !license option to practically
   all APKBUILDs, so that after fixing the license the option would be
   removed, and that's how we could tell apart APKBUILDs that have
   already passed license inspection.  But I'm not fond of this idea
   anymore, as I'm no longer sure that the options field is the right
   place for such stuff.
   (Also, license inspection should not overlook new packages that were
   added this year, supposedly already with good license info, because
   license inspection should happen independently of the standard
   reviews happening for new aports that land in testing.  My point is
   to always try to have a correct license for new packages, but not to
   stress about it too much before the release of Alpine Linux 3.8,
   because it will be a kind of transitional period and we can become
   much more strict later, and promotion from testing to community or
   main should always be preceded by a thorough license inspection
   anyway.)

2. License verification needs to be recorded, so that people won't be
   rechecking stuff that has already reached some threshold (I think
   that 3 people sounds good for starters), and whenever a mistake is
   found, previous reviews must be invalidated.
   Git commit messages alone aren't good enough for that, because you
   won't be able to invalidate them.

3. The APKBUILD format needs to be somehow changed, extended or
   replaced.  I believe it's a topic worth discussing, but possibly in
   a separate RFC thread.

   I don't want to dwell on it too much here, but I think that
   introducing another file, e.g. APKBUILD.meta, holding structured
   data in a human-readable format (like JSON, YAML, etc.), which would
   take over all variables from APKBUILD and be able to put them in
   some hierarchy, would make package info more manageable and more
   maintainable.  Shell scripts are quite unfortunate to work with as
   data storage containers.  After such an extraction, APKBUILD
   wouldn't have any variables, or at least no package-related
   variables, and would contain only the functions necessary to
   describe building and packaging.  There may be a need for some kind
   of mechanism exposing the information stored in APKBUILD.meta to
   APKBUILD, but in most cases it shouldn't really be needed, and
   abuild would simply need to learn to read the additional file.

   Instead of creating a separate file, the data could be embedded into
   one big variable, but that could be more error-prone, because of the
   lack of a proper syntax check, etc.

   Anyway, any smaller or bigger revolutions regarding APKBUILD (& co)
   won't happen soon (or sadly, may not happen at all, because I can
   foresee great opposition to such changes), but the bigger and more
   widely used Alpine Linux becomes, the harder it is to revisit older
   decisions, so it's better to approach it sooner rather than later.

4. License expressions can be separated with a comma, for instance.
   It seems like a natural choice, and for better appearance such commas
   could be followed by a space.

5. Non-SPDX licenses need some kind of unique naming.
   That will make it possible to spot when such a license is used more
   than once.  Then we can try to request that the license be added to
   the SPDX License List.  Anyway, we need to track all non-SPDX
   licenses seen in packages and introduce temporary identifiers for
   them that must be clearly discernible from SPDX identifiers.  I
   think that putting non-SPDX identifiers in angle brackets, which are
   commonly used for placeholders, e.g. <Alpine-1.0>, should do the
   job, while still making it possible to easily parse and discern them
   even when they are part of a multiple-license expression.

6. As I wrote earlier, shell scripts are a poor solution for data
   storage, therefore I think the canonical information regarding
   licenses shouldn't be put in aports, but in a completely new
   repository with a flat hierarchy of software projects.  No, I'm not
   proposing removing the license field from APKBUILDs, but having
   these fields populated or fixed in aports with the help of some
   scripts (that aren't written yet, but should be easy to do for 99%
   of cases) using data from this new repository, on a regular basis;
   weekly or every two weeks sounds reasonable.
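
As an aside, points 4 and 5 could be handled by a very small parser.
The following is only a sketch, assuming the comma separator and the
angle-bracket marking proposed above; the function names are made up
for illustration and nothing like this exists in abuild today.

```python
import re

# Hypothetical helpers: the comma separator and the <...> marking are the
# proposals from points 4 and 5, not an existing abuild feature.
NON_SPDX = re.compile(r"<([^<>\s]+)>")


def split_license_field(field):
    """Split a license field into license expressions on commas."""
    return [part.strip() for part in field.split(",") if part.strip()]


def non_spdx_ids(expression):
    """Return the identifiers written in angle brackets in one expression."""
    return NON_SPDX.findall(expression)


field = "LGPL-2.1-only WITH Nokia-Qt-exception-1.1, MIT OR <Alpine-1.0>"
expressions = split_license_field(field)
for expression in expressions:
    print(expression, non_spdx_ids(expression))
```

Splitting the first expression on whitespace instead would break it
into three pieces, which is the problem described in point 4 of the
problem list.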

Having a dedicated repository (I'll call it spdxify for now) for
gathering data about the licenses used by various software projects
seems like the best way to move forward.

It will reduce noise in aports, allow fixed licenses to be imported in
batches, and avoid adding extra stuff to APKBUILDs just to track
progress.  aports is also a moving target, so working outside of it
will get rid of many collisions that would be inevitable otherwise.

I think that spdxify repository layout could look like:

    +- lz4
    |  +- 0NAME    -- official name of the project
    |  +- 0REPO    -- official repository        \ at least one of these
    |  +- 0SRC     -- official tarball location  / should be present
    |  +- licenses     -- license expressions covering main
    |  |                  software product (library in this case);
    |  |                  one license expression per line
    |  +- licenses-cli -- license expressions covering supplementary
    |  |                  software products (CLI tools in this case)
    |  |                  if they differ from main ones
    |  |                  one license expression per line
    |  +- licenses-doc -- license expressions covering documentation
    |  |                  if they differ from main ones
    |  .                  
    |  .                  (perhaps more licenses* files)
    |  .
    |  |
    |  +- reviewers    -- ISO 8601 date and reviewer's full name
    |                     per line
    .

The hierarchy should be flat, because there is no need for favoritism:
what is in testing in Alpine Linux today can be in community a few
weeks later, and I think that mirroring the Alpine Linux hierarchy
wouldn't be beneficial here, as it would lead to noise such as the
moves just mentioned.

0NAME, 0REPO and 0SRC are files that will make the information
contained in the repository useful on its own, i.e. without access to
aports.  There can be projects with the same name, which will
(obviously) need different directory names, so it's important to be
able to tell which actual project a given directory refers to.
First come, first served should work fine, and new colliding project
names would get a suffix _N, where N denotes the N-th collision.
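
The naming rule could be sketched as follows.  This is only an
illustration, assuming that N starts at 1 for the first collision; the
function name is hypothetical.

```python
def project_dir_name(name, taken):
    """Return `name`, or `name_N` for the N-th collision with `taken`."""
    if name not in taken:
        return name
    n = 1
    # Look for the first suffix that is still free.
    while f"{name}_{n}" in taken:
        n += 1
    return f"{name}_{n}"


taken = {"lz4", "foo", "foo_1"}
print(project_dir_name("zlib", taken))
print(project_dir_name("lz4", taken))
print(project_dir_name("foo", taken))
```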

There will be at least one licenses file for each project, and more if
building/installing it yields several products that may not necessarily
be bundled together.  Each licenses file should have one SPDX license
expression per line, and the first line should contain the most
prominent license expression if there are many in the project.

An integral part of the whole idea is the concept of reviewers.
A reviewer is a person who clones the repository or downloads the most
recent tarball of a software project, inspects whether the licenses
found there match what the licenses* files state, and fixes any
mistakes.  If there are mistakes, the old entries in the reviewers file
are removed before the new one is added; if there are no mistakes, the
new reviewer is simply appended.  Each reviewer's name should be
preceded by the date (in ISO 8601) when the review was finished.
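
The bookkeeping for a reviewers file could look roughly like this.  It
is a sketch only: the function names are hypothetical, the entry format
"YYYY-MM-DD Full Name" follows the layout above, and the threshold of 3
is the number suggested in point 2.

```python
from datetime import date


def record_review(entries, reviewer, found_mistakes, today=None):
    """Return the new list of reviewers entries after one review.

    entries: existing "YYYY-MM-DD Full Name" lines.
    found_mistakes: True if the reviewer had to fix a licenses* file,
    which invalidates every earlier review.
    """
    today = today or date.today()
    entry = f"{today.isoformat()} {reviewer}"
    kept = [] if found_mistakes else list(entries)
    return kept + [entry]


def is_settled(entries, threshold=3):
    """True once enough reviewers have confirmed the current state."""
    return len(entries) >= threshold


entries = ["2018-02-01 Alice Example", "2018-02-02 Bob Example"]
confirmed = record_review(entries, "Carol Example", False, date(2018, 2, 3))
corrected = record_review(entries, "Carol Example", True, date(2018, 2, 3))
```

Here confirmed keeps the two earlier entries and appends a third, while
corrected contains only the new entry.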

Inside such a repository there should also be a .scripts folder with
simple shell scripts to ease some tasks, like adding an entry to a
reviewers file (based on user.name from git's config) followed by a
git commit with an automatic message, finding software with a
particular number of reviewers or not yet reviewed by you, etc.

Outside of this repository we will need the Alpine Linux-specific
scripts mentioned earlier, which will help convert what's in the
licenses files into the license field of APKBUILD files, plus a mapping
file for non-obvious cases (obvious cases are when a package is named
exactly the same in aports as in spdxify and there is only one licenses
file), e.g.:

    lz4       lz4:cli
    lz4:libs  lz4

Such a mapping will also make it possible to combine more than one
licenses* file into one license field.
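
One possible reading of the mapping file, sketched under assumptions:
the left column names an aports package (optionally with a :subpackage
suffix), and each further column names an spdxify project (optionally
with the suffix of a licenses-* file).  The function names and the
in-memory stand-in for the repository are hypothetical.

```python
def parse_mapping(text):
    """Map 'package' or 'package:subpackage' to (project, file) pairs."""
    mapping = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or incomplete lines
        target, sources = fields[0], fields[1:]
        pairs = []
        for source in sources:
            project, _, suffix = source.partition(":")
            filename = f"licenses-{suffix}" if suffix else "licenses"
            pairs.append((project, filename))
        mapping[target] = pairs
    return mapping


def license_field(pairs, read_expressions):
    """Join expressions from all mapped files, dropping duplicates."""
    seen = []
    for project, filename in pairs:
        for expression in read_expressions(project, filename):
            if expression not in seen:
                seen.append(expression)
    return ", ".join(seen)


# In-memory stand-in for the spdxify tree, using the LZ4 findings above.
REPO = {
    ("lz4", "licenses"): ["BSD-2-Clause"],
    ("lz4", "licenses-cli"): ["GPL-2.0-or-later"],
}
MAPPING = parse_mapping("lz4       lz4:cli\nlz4:libs  lz4\n")


def read(project, filename):
    return REPO[(project, filename)]


print(license_field(MAPPING["lz4"], read))
print(license_field(MAPPING["lz4:libs"], read))
```

Listing several sources on one line would combine their expressions
into a single comma-separated license field.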

That's roughly how I see it.  I'm sure I didn't cover every corner,
but you should get the picture after reading this wall of text.

I don't have all these scripts written yet, and the spdxify repository
has not been created yet either.  I plan to "snapshot" the aports state
very soon (hopefully on 2018-01-30 or 2018-01-31) and use the packages
in main/ as the base for the first set of software projects that will
need to be inspected.  There are over 2000 packages in main, so I plan
to split them into batches of ~500, which means the [a-g]* packages
from main will be the first ones.  I don't even plan on importing
existing license fields from APKBUILDs, because I think it may be
harmful and more error-prone.  It's better to start our license journey
from scratch and not be biased by what was already put in some
APKBUILDs (I've seen some mistakes in the past and I'm afraid there may
still be many more of them).

I haven't started working on any of this yet, because I wanted to get
some feedback on whether people see value in such an organized approach
to fixing license matters in Alpine Linux (which may also benefit
other distributions in the future) or not.


Final notes
-----------

It may look like I take licensing very seriously.  Some may argue that
maybe even too seriously.  The common view is that only functionality
matters, and as long as the job gets done, the license behind the tool
used for it doesn't matter.  That may be true for most users, but
others interested in using Alpine Linux for their products, services
and/or solutions may not always have this freedom of choice.

Fixing the current license mess will show that Alpine Linux cares about
quality in yet another department, and I believe it can be beneficial
to its overall image, but also to all users and developers who are part
of this great community, by raising awareness that licenses do matter.


Regards,
Przemek



---
Unsubscribe:  alpine-devel+unsubscribe@lists.alpinelinux.org
Help:         alpine-devel+help@lists.alpinelinux.org
---
William Pitcock
Message ID
<CA+T2pCGAS+ehDKb60JQR8J6ixASGfjfOzQQe4A-UCgHFg=1K2A@mail.gmail.com>
In-Reply-To
<1614404acdb.edd44d4135179.3833405768989653606@zoho.com>
Sender timestamp
1517423186
Hello,

On Mon, Jan 29, 2018 at 4:23 PM, Przemysław Pawełczyk <przemoc@zoho.com> wrote:
> One unfortunate thing about sticking with version 3 of the list is that
> one of distros reusing Alpine Linux as its base, Adelie Linux, is
> apparently fixed on older version of SPDX License List, so already done
> and upcoming changes may be not truly welcomed by them to some extent,
> but I hope we'll be able to resolve all problems eventually and Alpine
> Linux and Adelie Linux relationship will remain good and healthy.

Adelie strongly prefers to use SPDX 2.

We have already done some amount of license audit (e.g. for the subset
of Alpine packages we ship), which has been using SPDX 2 identifiers.
If we switch to SPDX 3 identifiers, we will have to start over, as
they will need to be reverified.
In addition, all packages that we are planning to upstream (KDE)
presently use the SPDX 2 identifiers.
We have also already done a lot of work to incorporate SPDX 2 into our
standard packaging procedures, and a few contributors have complained
that SPDX 3 identifiers are "annoying" and "mental bandwidth wasting."

A possible compromise would be to allow either SPDX 2 or SPDX 3
identifiers, based on the maintainer's preference: SPDX 3 deprecates
but does not remove the SPDX 2 identifiers; in other words SPDX 3 is a
superset of SPDX 2.  Put differently, any tool which works with SPDX 3
identifiers has to work with SPDX 2 identifiers as well.
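
A tool following this compromise could accept both spellings and
normalize them internally.  A minimal sketch, assuming an illustrative
(not exhaustive) subset of the identifiers deprecated in version 3 of
the SPDX License List; the function name is hypothetical and the
authoritative list is the one published by SPDX.

```python
import re

# Illustrative subset only; not the complete list of deprecated names.
DEPRECATED = {
    "GPL-2.0": "GPL-2.0-only",
    "GPL-2.0+": "GPL-2.0-or-later",
    "GPL-3.0": "GPL-3.0-only",
    "GPL-3.0+": "GPL-3.0-or-later",
    "LGPL-2.1": "LGPL-2.1-only",
    "LGPL-2.1+": "LGPL-2.1-or-later",
}

# A token is a run of identifier characters, including a trailing "+".
TOKEN = re.compile(r"[A-Za-z0-9.+-]+")


def normalize_expression(expression):
    """Rewrite deprecated identifiers, leaving everything else as-is."""
    return TOKEN.sub(
        lambda m: DEPRECATED.get(m.group(0), m.group(0)), expression
    )


print(normalize_expression("GPL-2.0+ OR MIT"))
print(normalize_expression("(GPL-2.0 AND LGPL-2.1+)"))
```

Identifiers that are already in the newer form, operators such as OR
and WITH, and parentheses pass through unchanged.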

> It will be great achievement if we'll manage to correctly define all
> licenses of available packages before releasing Alpine Linux 3.8.
> There are roughly 3 months for that.  It's not much for 4000+ packages!
> It's most likely even not enough, but we won't know without trying!

I am all right with delaying the Alpine 3.8 release to get this done; it
is very important.
Overall, this change is also needed in Adelie in order to freeze, as
we need to be able to give our supported customers clear answers on
what their legal situation is.
We would like to contribute the results from our audit upstream, but
we have already been working on this using SPDX 2 identifiers.

> Solutions?
> ----------
>
> I was thinking for a considerable time about it and my ideas actually
> changed through this process and I would like to share them with you and
> hear your feedback.  First I'll address mentioned problems.
>
> 1. APKBUILD with fixed licenses needs some kind of marking.
>    In my last mail I suggested adding !license option to practically all
>    APKBUILDs, so after fixing the license, option would be removed and
>    that's how we could differentiate APKBUILDs that already passed
>    license inspection.  But I'm not fond of this idea anymore, as I'm no
>    longer sure that options field is the right place for such stuff.
>    (Also license inspection should not overlook new packages that were
>    added this year and supposedly already with good license info,
>    because license inspection should happen independently of standard
>    reviews happening for new aports that land in testing.  My point is
>    to always try to have correct license for new packages, but don't
>    stress it too much before release of Alpine Linux 3.8, because it
>    will be kind of transitory period and we can become much more strict
>    later, and promotion from testing to community or main should be always
>    preceded with thorough license inspection anyway.)

We could put another metadata field in, such as:

# X-License-Verifier: Name <e-mail>

The person who has audited and fixed the license on the package would
add that header to sign off on it.  This solves point 2 below, as
well.
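
Such a header could be read back mechanically.  A rough sketch,
assuming the header is written as a comment line exactly in the form
shown above; the header is only a suggestion made in this thread, not
an existing abuild convention, and the names and addresses below are
placeholders.

```python
import re

# Matches "# X-License-Verifier: Name <e-mail>" on a line of its own.
VERIFIER = re.compile(
    r"^#[ \t]*X-License-Verifier:[ \t]*(.+?)[ \t]*<([^<>\s]+)>[ \t]*$",
    re.MULTILINE,
)


def license_verifiers(apkbuild_text):
    """Return (name, e-mail) pairs for every verifier header found."""
    return VERIFIER.findall(apkbuild_text)


SAMPLE = """\
# Maintainer: Jane Doe <jane@example.org>
# X-License-Verifier: John Roe <john@example.org>
pkgname=lz4
license="GPL-2.0-or-later"
"""

print(license_verifiers(SAMPLE))
```

Counting the returned pairs would give the number of sign-offs for a
package.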

> 2. License verification needs to be recorded, so people won't be
>    rechecking stuff that has already reached some threshold (I think
>    that 3 people sounds good for starters) and whenever mistake is
>    found, previous reviews must be invalidated.
>    Git commit messages alone aren't good enough for that, because you
>    won't be able to invalidate them.

I believe Alpine needs a "legal" working group which handles auditing
the distribution for license compliance, as well as determining
whether or not custom licenses meet the OSI redistribution guidelines.

The license audit would be a good initial project for such a working group.

> 3. APKBUILD format needs to be somehow changed, extended or replaced.
>    I believe it's a topic worth discussing, but possibly in some
>    separate RFC thread.

I think it is better to push for restructuring APKBUILD for the 3.9
cycle at the earliest.

>    I don't want to dwell on it too much here now, but I think that
>    introducing another file, e.g. APKBUILD.meta, for structured data in
>    human-readable format (like JSON, YAML, etc.) that would take all
>    variables from APKBUILD and be able to put them in some hierarchy,
>    would make package info more manageable and more maintainable.
>    Shell scripts are quite unfortunate to work with as data storage
>    containers.  So APKBUILD after such extraction wouldn't have any
>    variables, or at least no package-related variables, and would
>    contain only functions necessary to describe building and packaging.
>    There may be need for some kind of mechanism exposing information
>    stored in APKBUILD.meta for APKBUILD, but in most cases it shouldn't
>    be really needed and abuild would simply need to learn reading such
>    additional file.
>
>    Instead of creating separate file, it could be embedded into
>    one big variable, but that could be more error prone, because of
>    lacking proper syntax check, etc.
>
>    Anyway, any smaller or bigger revolutions regarding APKBUILD (& co)
>    won't happen soon (or sadly, may not happen at all, because I can
>    foresee great opposition for such changes), but the bigger and more
>    widely-used Alpine Linux becomes, the harder it is to improve some
>    older decisions, so it's better to approach it earlier than never.
>
> 4. License expressions can be separated with comma for instance.
>    It seems like a natural choice, and for better appearance such commas
>    could be followed by a space.

This seems fine.
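
A minimal sketch of how a tool could read such a field, assuming commas
separate whole license expressions and that whitespace around each comma
is insignificant (SPDX license expressions themselves contain no commas).
The function name split_license_field is hypothetical and not part of
abuild.

```python
def split_license_field(field: str) -> list[str]:
    """Split a comma-separated license field into license expressions.

    Assumption: commas only ever separate expressions, so a plain split
    is enough. Empty pieces (e.g. from a trailing comma) are dropped.
    """
    return [part.strip() for part in field.split(",") if part.strip()]


print(split_license_field("GPL-2.0-or-later, MIT"))
# ['GPL-2.0-or-later', 'MIT']
```

With or without the space after the comma, the result is the same list.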

> 5. Non-SPDX licenses need some kind of unique naming.
>    That will allow to spot if there is more than one usage of such
>    license.  Then we can try to request a license added to the SPDX
>    License List.  Anyway, we need to track all non-SPDX licenses seen in
>    packages and introduce some temporary identifiers for them that must
>    be clearly discernible from SPDX identifiers.  I think that putting
>    non-SPDX identifiers in angle brackets, e.g. <Alpine-1.0>, which are
>    commonly used for placeholders, should do the job, yet still make it
>    possible to easily parse them and discern even if they were part of
>    multiple-license expression.

This also seems fine.
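
As a rough illustration of why the angle brackets keep parsing easy, here
is a sketch that pulls the non-SPDX placeholders out of a license
expression.  The allowed character set inside the brackets and the name
non_spdx_ids are my assumptions; the RFC only gives <Alpine-1.0> as an
example.

```python
import re

# Assumption: a placeholder is an identifier made of letters, digits,
# '.', '+' and '-' wrapped in angle brackets, e.g. <Alpine-1.0>.
PLACEHOLDER = re.compile(r"<([A-Za-z0-9.+-]+)>")


def non_spdx_ids(expression: str) -> list[str]:
    """Return the bracketed (non-SPDX) identifiers found in an expression."""
    return PLACEHOLDER.findall(expression)


print(non_spdx_ids("MIT AND (<Alpine-1.0> OR GPL-2.0-only)"))
# ['Alpine-1.0']
```

Because real SPDX identifiers never contain angle brackets, anything this
returns can be tracked as a candidate for submission to the SPDX License
List.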

> 6. As I wrote earlier, shell scripts are poor solutions for data
>    storage, therefore I think canonical information regarding licenses
>    shouldn't be put in aports, but in a completely new repository with
>    flat hierarchy of software projects.  No, I'm not proposing removing
>    license field from APKBUILDs, but to make these fields populated or
>    fixed in aports with the help of some scripts (that aren't written
>    yet, but should be easy to do for 99% of cases) using data from this
>    new upcoming repository, on a regular basis - weekly or every two
>    weeks sounds rational.
>
> Having dedicated repository (I'll call it spdxify for now) for gathering
> data about licenses used by various software projects seems like the
> best way to move forward.
>
> It will reduce noise in aports, allowing to import fixed licenses in
> batches and will avoid adding additional stuff to APKBUILD just to track the
> progress.  aports is also a moving target, so working outside of it will
> get rid of many collisions that would be inevitable otherwise.
>
> I think that spdxify repository layout could look like:
>
>     +- lz4
>     |  +- 0NAME    -- official name of the project
>     |  +- 0REPO    -- official repository        \ at least one of these
>     |  +- 0SRC     -- official tarball location  / should be present
>     |  +- licenses     -- license expressions covering main
>     |  |                  software product (library in this case);
>     |  |                  one license expression per line
>     |  +- licenses-cli -- license expressions covering supplementary
>     |  |                  software products (CLI tools in this case)
>     |  |                  if they differ from main ones
>     |  |                  one license expression per line
>     |  +- licenses-doc -- license expressions covering documentation
>     |  |                  if they differ from main ones
>     |  .
>     |  .                  (perhaps more licenses* files)
>     |  .
>     |  |
>     |  +- reviewers    -- ISO 8601 date and reviewer's full name
>     |                     per line
>     .
>
> Hierarchy should be flat, because there is no need for favoritism:
> what is in testing today in Alpine Linux can be in community a few
> weeks later, and I think that mirroring the Alpine Linux hierarchy
> wouldn't be beneficial here, as such moves would only add noise.
>
> 0NAME, 0REPO, 0SRC are files that will make information contained in
> the repository useful in a standalone manner, i.e. without access to
> aports.  There can be identically named projects that will need
> different directory names (obviously), so it's important to be able
> to tell which actual project a given directory refers to.
> First come, first served should work fine, and new colliding project
> names would get a suffix _N, where N denotes the N-th collision.
>
> There will be at least one licenses file for each project, and more
> if there are many products of its building/installing that may not
> necessarily be bundled together.  Each licenses file should have one
> SPDX license expression per line, and first line should contain the most
> prominent one license expression if there are many in the project.
>
> An integral part of the whole idea is the concept of reviewers.
> A reviewer is a person who clones the repository or downloads the most
> recent tarball of a software project, inspects whether the licenses
> found there match what the licenses* files state, and fixes any
> mistakes.  If there are mistakes, then old entries in the reviewers
> file are removed before adding the new one, but if there are no
> mistakes, then the new reviewer is simply appended.  Each reviewer's
> name should be preceded by the date (in ISO 8601) when the review was
> finished.
>
> Inside such repository there should be also .scripts folder with
> simple shell scripts to ease some tasks, like adding entry to
> reviewers files (based on user.name from git's config) followed by a
> git commit with automatic message, finding software with particular
> number of reviewers or not yet reviewed by you, etc.
>
> Outside of this repository we will need the previously mentioned Alpine
> Linux-specific scripts that will aid converting what's in licenses files
> into the license field of APKBUILD files, and some mapping file for
> non-obvious cases (obvious cases are when a package is named exactly the
> same in aports as in spdxify and there is only one licenses file), e.g.:
>
>     lz4       lz4:cli
>     lz4:libs  lz4
>
> In such mapping combining more than one licenses* file into one
> license field will be also possible.
>
> That's roughly how I see it.  I'm sure I didn't cover all the corners,
> but you should get some picture after reading this wall of text.
>
> I don't have all these scripts written yet and spdxify repository has
> not been created yet either.  I plan to "snapshot" aports state very
> soon (hopefully on 2018-01-30 or 2018-01-31) and use packages in main/
> as the base to create first set of software projects that will need to
> be inspected.  There are over 2000 packages in main, so I plan to split
> it into batches of ~500, which means [a-g]* packages from main will be
> the first ones.  I don't even plan on importing existing license fields
> from APKBUILDs, because I think it may be harmful and more error-prone.
> It's better to start from scratch our license journey and not be biased
> by what was already put in some APKBUILDs (I've seen some mistakes in
> the past and I'm afraid there may be still many more of them).
>
> I haven't started working on all that yet, because I wanted to get some
> feedback whether people see value in such organized approach toward
> fixing license matters in Alpine Linux (that may actually also benefit
> other distributions in future) or not.

I prefer to have a COPYRIGHT file alongside the APKBUILD which
describes each part of the package and its license situation.
I don't think a separate license repo is necessary.
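
For anyone weighing the quoted proposal anyway, the reviewers file it
describes (an ISO 8601 date followed by the reviewer's full name, one
entry per line) could be read roughly as below.  This is only a sketch:
it assumes the date-only form YYYY-MM-DD and whitespace between date and
name, and parse_reviewers is a hypothetical helper, since neither spdxify
nor its scripts exist yet.

```python
from datetime import date


def parse_reviewers(text: str) -> list[tuple[date, str]]:
    """Parse 'YYYY-MM-DD Full Name' lines into (date, name) pairs."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        stamp, name = line.split(None, 1)  # date first, rest is the name
        entries.append((date.fromisoformat(stamp), name))
    return entries


sample = "2018-01-30 Jane Doe\n2018-02-02 John Q. Public\n"
for when, who in parse_reviewers(sample):
    print(when.isoformat(), who)
```

The same record could just as well live in a per-package COPYRIGHT file;
the parsing would not change.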

> Final notes
> -----------
>
> It may look like I take licensing very seriously.  Some may argue that
> maybe even too seriously.  Common view is that only functionality is
> important and as long as the job can be done it doesn't matter what is
> the license behind the tool used for it.  It may be true for most users,
> but others interested in utilizing Alpine Linux for their products,
> services and/or solutions, may not always have this nice freedom of
> choice.

It is important to Adelie, too.  For end users, we want to raise
awareness of what they are legally permitted to do with their
software.  For customers who are engaging us for a support contract,
we need to be able to make their lawyers happy.

> Fixing current license mess will show that Alpine Linux cares about
> quality in yet another department, and I believe it can be beneficial
> to its overall image, but also to all users and developers being part
> of this great community, by raising awareness that licenses do matter.

Agreed.

William


---
Unsubscribe:  alpine-devel+unsubscribe@lists.alpinelinux.org
Help:         alpine-devel+help@lists.alpinelinux.org
---
William Pitcock
Details
Message ID
<CA+T2pCFFNCc+tk-tsjHnRzsEB3X_CNbRRysA2qf5PtvU0MGp=g@mail.gmail.com>
In-Reply-To
<CA+T2pCGAS+ehDKb60JQR8J6ixASGfjfOzQQe4A-UCgHFg=1K2A@mail.gmail.com> (view parent)
Sender timestamp
1517427328
Hello,

On Wed, Jan 31, 2018 at 12:26 PM, William Pitcock
<nenolod@dereferenced.org> wrote:
> Hello,
>
> On Mon, Jan 29, 2018 at 4:23 PM, Przemysław Pawełczyk <przemoc@zoho.com> wrote:
>> Preface
>> -------
>>
>> It is kind of a follow up to the previous thread started a month ago:
>>
>>     License naming in APKBUILD - SPDX License List
>>
>> Please check it if you haven't already.
>>
>>
>> Intro
>> -----
>>
>> Conversion from simplistic and imprecise license naming that was used
>> before in Alpine Linux (e.g. GPL, GPL2, BSD, etc.) to slightly more
>> verbose but also more precise and standardized license naming will
>> undoubtedly make quality of Alpine Linux packages higher.
>>
>> SPDX license identifiers are already getting adoption in many
>> open-source circles.  I believe that Alpine Linux did a good thing by
>> deciding to use SPDX over half a year ago.  Unluckily, or maybe luckily,
>> conversion didn't truly follow through back then.  There were some changes
>> here and there, but nothing of greater scale to really nail all existing
>> packages.  I wrote "luckily", because at the end of 2017 SPDX License
>> List got new version 3, which has some changes compared to version 2.x.
>>
>> I believe, as I already wrote in previous thread, that we should stick
>> to this new version, and most likely to its updates too, when they will
>> be ready, as I doubt they will be disruptive.
>>
>> One unfortunate thing about sticking with version 3 of the list is that
>> one of distros reusing Alpine Linux as its base, Adelie Linux, is
>> apparently fixed on older version of SPDX License List, so already done
>> and upcoming changes may be not truly welcomed by them to some extent,
>> but I hope we'll be able to resolve all problems eventually and Alpine
>> Linux and Adelie Linux relationship will remain good and healthy.
>
> Adelie strongly prefers to use SPDX 2.
>
> We have already done some amount of license audit (e.g. for the subset
> of Alpine packages we ship), which has been using SPDX 2 identifiers.
> If we switch to SPDX 3 identifiers, we will have to start over, as
> they will need to be reverified.
> In addition, all packages that we are planning to upstream (KDE)
> presently use the SPDX 2 identifiers.
> We also have already done a lot of work to incorporate SPDX 2 into our
> standard packaging procedures, and a few contributors complained that
> SPDX 3 identifiers are "annoying" and "mental bandwidth wasting."
>
> A possible compromise would be to allow either SPDX 2 or SPDX 3
> identifiers, based on the maintainer's preference: SPDX 3 deprecates
> but does not remove the SPDX 2 identifiers; in other words SPDX 3 is a
> superset of SPDX 2.  Put differently, any tool which works with SPDX 3
> identifiers has to work with SPDX 2 identifiers as well.

After discussing with jirutka, we came to the conclusion that SPDX 2
shorthand identifiers are fine as long as they are not vague.  For
example "GPL-2.0+" is equally valid to "GPL-2.0-or-later".  This
resolves the main gripe that Adelie has with SPDX 3.

William


---
A. Wilcox
Details
Message ID
<ea1c7968-4ccf-edfe-95fc-a7861977647d@adelielinux.org>
In-Reply-To
<20180131194540.GA21821@alpine.my.domain> (view parent)
Sender timestamp
1517431287
On 01/31/18 13:45, Cág wrote:
> William Pitcock wrote:
>  
>> After discussing with jirutka, we came to the conclusion that SPDX 2
>> shorthand identifiers are fine as long as they are not vague.  For
>> example "GPL-2.0+" is equally valid to "GPL-2.0-or-later".  This
>> resolves the main gripe that Adelie has with SPDX 3.
> 
> Dear,
> 
> I've just worked with the sc package that happens to be in the public
> domain. SPDX doesn't mention how to spell it. Should it be "public
> domain", "Public Domain", "Public-Domain" or something else?
> 
> Thanks
> 

Preface: IANAL, but I have been studying open source legal matters for
over a decade.

sc is, in my non-professional but fairly well-educated opinion, not
legally packageable.

- There is no license specified.  None of the source files actually
state a license.  While the README states "This is a much modified
version of the public domain spread sheet sc", it does *not* state that
this distribution is still in the public domain.

- Further, the ending of the README:

> Since some people are wary of using a program that has no guarantee,
> I've decided to provide the following guarantee:
>
>  It is a well-known fact that any non-trivial program has bugs.  If
>  you haven't found them, you just haven't stumbled upon the proper
>  combinations of actions that will cause the bugs to manifest them-
>  selves.  Since sc stands for "Spreadsheet Calculator", and since a
>  spreadsheet calculator is by definition a non-trivial program, sc is
>  guaranteed to have bugs.

is not a real license, and does not specify what the user can and cannot
do with the program.  It is simply a tongue-in-cheek guarantee that
there are bugs.

In short, there is no actual license for this software, and it has not
been dedicated to the public domain.  Maybe you can contact upstream and
ask them to use either:

* CC0 (a fairly legal public domain dedication)
* Unlicense (another fairly legal public domain dedication, specific to
software)
* WTFPL (fits with the tongue-in-cheek manner of the guarantee)

Let me also just make everyone aware that not all jurisdictions
recognise a public domain as even existing, which is why a simple
statement is not enough.  CC0 and Unlicense (and to a point, WTFPL)
make explicit what you can do with the software even if your
jurisdiction does not recognise PD.

Best,
--arw


-- 
A. Wilcox (awilfox)
Project Lead, Adélie Linux
http://adelielinux.org
Cág
Details
Message ID
<20180131194540.GA21821@alpine.my.domain>
In-Reply-To
<CA+T2pCFFNCc+tk-tsjHnRzsEB3X_CNbRRysA2qf5PtvU0MGp=g@mail.gmail.com> (view parent)
Sender timestamp
1517427940
William Pitcock wrote:
 
> After discussing with jirutka, we came to the conclusion that SPDX 2
> shorthand identifiers are fine as long as they are not vague.  For
> example "GPL-2.0+" is equally valid to "GPL-2.0-or-later".  This
> resolves the main gripe that Adelie has with SPDX 3.

Dear,

I've just worked with the sc package that happens to be in the public
domain. SPDX doesn't mention how to spell it. Should it be "public
domain", "Public Domain", "Public-Domain" or something else?

Thanks

-- 
caóc



---
V.Krishn
Details
Message ID
<4c90173d-2a9d-63b8-8027-304d85154fb4@gmail.com>
In-Reply-To
<CA+T2pCFFNCc+tk-tsjHnRzsEB3X_CNbRRysA2qf5PtvU0MGp=g@mail.gmail.com> (view parent)
Sender timestamp
1517941267
On 02/01/2018 01:05 AM, William Pitcock wrote:
> After discussing with jirutka, we came to the conclusion that SPDX 2
> shorthand identifiers are fine as long as they are not vague.  For
> example "GPL-2.0+" is equally valid to "GPL-2.0-or-later".  This
> resolves the main gripe that Adelie has with SPDX 3.

Maybe introduce a file in aports (e.g. main/busybox/apkbuild.meta)
which also keeps aliases in JSON format (e.g. that "GPL-2.0+" is
equivalent to "GPL-2.0-or-later") along with other metadata,
or a single file, e.g. aports/.meta or aports/.meta/aliases.
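
To make the idea concrete, something along these lines is what I have in
mind.  This is only a sketch: the JSON layout and the "license_aliases"
key are invented for illustration, as are the helper names, and only the
"GPL-2.0+" pair comes from this thread.

```python
import json

# Hypothetical content of aports/.meta/aliases (layout is an assumption).
ALIASES_JSON = """
{
  "license_aliases": {
    "GPL-2.0+": "GPL-2.0-or-later"
  }
}
"""


def load_aliases(text: str) -> dict[str, str]:
    """Read the alias table from the JSON document."""
    return json.loads(text)["license_aliases"]


def resolve(identifier: str, aliases: dict[str, str]) -> str:
    """Return the canonical identifier, or the input if it has no alias."""
    return aliases.get(identifier, identifier)


aliases = load_aliases(ALIASES_JSON)
print(resolve("GPL-2.0+", aliases))  # GPL-2.0-or-later
print(resolve("MIT", aliases))       # MIT
```

Maintainers could then keep writing whichever form they prefer, and tools
would compare identifiers only after resolving them.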

-- 
Regards,
V.Krishn


---