From: Przemysław Pawełczyk
To: "alpine-devel"
Date: Mon, 29 Jan 2018 23:23:32 +0100
Subject: [alpine-devel] RFC: Fixing license field in APKBUILDs (or a bit more)
Message-ID: <1614404acdb.edd44d4135179.3833405768989653606@zoho.com>

Preface
-------

It is kind of a follow-up to the previous thread started a month ago:

License naming in APKBUILD - SPDX License List

Please check it if you haven't already.

Intro
-----

Conversion from the simplistic and imprecise license naming that was used before in Alpine Linux (e.g. GPL, GPL2, BSD, etc.) to slightly more verbose, but also more precise and standardized, license naming will undoubtedly raise the quality of Alpine Linux packages. SPDX license identifiers are already gaining adoption in many open-source circles. I believe that Alpine Linux did a good thing by deciding to use SPDX over half a year ago. Unluckily, or maybe luckily, the actual conversion didn't follow back then.
There were some changes here and there, but nothing on a scale that would really nail all existing packages. I wrote "luckily", because at the end of 2017 the SPDX License List got a new version 3, which has some changes compared to version 2.x. I believe, as I already wrote in the previous thread, that we should stick to this new version, and most likely to its updates too when they are ready, as I doubt they will be disruptive.

One unfortunate thing about sticking with version 3 of the list is that one of the distros reusing Alpine Linux as its base, Adelie Linux, is apparently fixed on an older version of the SPDX License List, so the changes already done and the upcoming ones may not be truly welcomed by them to some extent. But I hope we'll be able to resolve all problems eventually, and the relationship between Alpine Linux and Adelie Linux will remain good and healthy.

It will be a great achievement if we manage to correctly define the licenses of all available packages before releasing Alpine Linux 3.8. There are roughly 3 months for that. It's not much for 4000+ packages! It's most likely not even enough, but we won't know without trying!

Present
-------

Some changes in license fields are already happening, but we need to pause for a moment and look at how they're done right now, or at least how they were done so far.

Roughly two kinds of activities have happened in the aports repository since 2017-12-30 regarding the license field in various APKBUILDs:

- individual changes,
- massive changes.

There was only one massive change, already mentioned in the previous thread, and as Jakub himself stated in his commit 63f5e7d29565, "no verification has been done if the specified license information is correct!" Therefore all packages that were part of this massive change will need to be investigated anyway.

There have been a few dozen individual changes to date. They're hopefully correct. They seem like casual changes ("I read that mail, so I'll fix this APKBUILD"), and they're appreciated, but they're not good enough in the big picture.
I'll explain why soon.

Problems
--------

How could these efforts be improved, and what needs to change to do it properly, i.e. to actually fix license fields and not only replace one group of letters with another and pretend we're done? Let's list the problems we're facing now.

0. Lack of organized work.

1. Lack of trackability.

The sheer number of packages in Alpine Linux makes the casual-change approach impractical. A corrected license field in one APKBUILD is indistinguishable from one that hasn't been scrutinized yet, which is unacceptable.

2. Lack of verifiability.

That may sound harsh, but I think that one pair of eyes per package is not enough. Why? Because providing wrong license information is worse than not providing it at all, therefore such information must be verified by others.

3. Lack of subpackage licenses.

Well, they're theoretically possible already in APKBUILDs: you have to redefine the license variable in the subpackage function. It is very rarely done, though, and it's kind of understandable why, considering the inconvenience of redefining variables.

Let me give you an example. Let's look at the LZ4 library. Its README.md file states "LZ4 library is provided as open-source software using BSD 2-Clause license." So BSD-2-Clause, easy, right? Checking the README file is not enough. The LICENSE file gives a better picture, because you can read there "all other files [not in the lib dir] use a GPLv2 license, unless explicitly stated otherwise". But if you look into the source code of the test and CLI tools, you'll find that it's not GPL-2.0-only, as one could presume, but actually GPL-2.0-or-later (and I think this is the reason why SPDX decided to abandon the GPL-2.0 and GPL-2.0+ naming style, as the first one is too similar to the casual "GPLv2", which in practice can mean either). Test tools usually aren't shipped in packages, so that wouldn't be a problem, but CLI tools are shipped.
So the lz4 package should have only the GPL-2.0-or-later license, while lz4-libs should have only BSD-2-Clause.

4. Lack of a non-space license separator.

Space is not good enough, because complex license expressions can contain spaces. Example:

LGPL-2.1-only WITH Nokia-Qt-exception-1.1

SPDX's power doesn't come only from its wide license list, but also from the fact that the people behind it actually thought things through and came up with license expressions, so not only exceptions can be expressed, like in the example above, but also dual licensing, etc.

So you may ask: why is there a need for a separator if there are these expressions? I'm not an expert in this field, but I believe there is a difference between a multiple-licensed source file (depending on the conjunctive or disjunctive character of the licensing, you'll use the AND or OR operator, e.g. Apache-2.0 AND MIT, GPL-2.0-only OR MIT) and different source files having different licenses while all being part of one final product. If half of a program's source code is licensed under MIT, and the other half under Apache-2.0, in my opinion you shouldn't describe it as MIT AND Apache-2.0 or MIT OR Apache-2.0, as both descriptions are misleading. The only way I see to describe it would be:

MITApache-2.0

The separator definitely feels like "and", but it's different from AND, and I think it's better to preserve that distinction.

5. Support for non-SPDX licenses.

The SPDX License List, including license exceptions, is quite broad, but there may still be some custom licenses that aren't widely used and therefore haven't been recognized by SPDX so far, yet are used in some of the packages available in Alpine Linux. Putting license="custom" is not a solution. Leaving the license field empty and introducing a !spdx option (*) is also bad, because a project may use a mix of SPDX and non-SPDX licenses.
(*) I'm assuming that in the future abuild will support checking whether the licenses mentioned in the license field conform to SPDX names. Carlo, together with Natanael, already did some work toward that, which is appreciated, but with this message I hope it becomes clear that the PoC presented so far is not good enough, and ultimately some dedicated library/tool may be needed to deal with it properly, because parsing in a shell script may not necessarily be an easy and sane way.

6. Lack of reusability.

This part may interest the Alpine Linux community the least, but if there are efforts related to documenting the open-source world, it's better if they're done in a manner that is easy for others to reuse. The APKBUILD format may look nifty, being in fact a busybox ash script, but it gives not only nice possibilities (that can be abused), but also many limitations, like poor data types, lack of nested structs, etc.

Solutions?
----------

I was thinking about it for a considerable time, and my ideas actually changed through this process. I would like to share them with you and hear your feedback. First I'll address the problems mentioned above.

1. APKBUILDs with fixed licenses need some kind of marking.

In my last mail I suggested adding a !license option to practically all APKBUILDs, so after fixing the license the option would be removed, and that's how we could differentiate APKBUILDs that had already passed license inspection. But I'm not fond of this idea anymore, as I'm no longer sure that the options field is the right place for such stuff. (Also, license inspection should not overlook new packages that were added this year, supposedly already with good license info, because license inspection should happen independently of the standard reviews happening for new aports that land in testing.
My point is to always try to have the correct license for new packages, but not to stress it too much before the release of Alpine Linux 3.8, because it will be kind of a transitory period and we can become much more strict later; and promotion from testing to community or main should always be preceded by a thorough license inspection anyway.)

2. License verification needs to be recorded, so people won't be rechecking stuff that has already reached some threshold (I think that 3 people sounds good for starters), and whenever a mistake is found, previous reviews must be invalidated. Git commit messages alone aren't good enough for that, because you won't be able to invalidate them.

3. The APKBUILD format needs to be somehow changed, extended or replaced.

I believe it's a topic worth discussing, but possibly in a separate RFC thread. I don't want to dwell on it too much here, but I think that introducing another file, e.g. APKBUILD.meta, for structured data in a human-readable format (like JSON, YAML, etc.) that would take all variables from APKBUILD and be able to put them in some hierarchy, would make package info more manageable and more maintainable. Shell scripts are quite unfortunate to work with as data storage containers.

So an APKBUILD after such extraction wouldn't have any variables, or at least no package-related variables, and would contain only the functions necessary to describe building and packaging. There may be a need for some kind of mechanism exposing the information stored in APKBUILD.meta to APKBUILD, but in most cases it shouldn't really be needed, and abuild would simply need to learn to read such an additional file. Instead of creating a separate file, the data could be embedded into one big variable, but that could be more error-prone, because of the lack of proper syntax checking, etc.
Anyway, no smaller or bigger revolution regarding APKBUILD (& co) will happen soon (or, sadly, it may not happen at all, because I can foresee great opposition to such changes), but the bigger and more widely used Alpine Linux becomes, the harder it is to improve some older decisions, so it's better to approach it early than never.

4. License expressions can be separated with a comma, for instance.

It seems like a natural choice, and for better appearance such commas could be followed by a space.

5. Non-SPDX licenses need some kind of unique naming.

That will allow us to spot whether a given license is used more than once. Then we can try to request that the license be added to the SPDX License List. Anyway, we need to track all non-SPDX licenses seen in packages and introduce some temporary identifiers for them that must be clearly discernible from SPDX identifiers. I think that putting non-SPDX identifiers in angle brackets, e.g. , which are commonly used for placeholders, should do the job, while still making it possible to easily parse them and tell them apart even if they are part of a multiple-license expression.

6. As I wrote earlier, shell scripts are poor solutions for data storage, therefore I think canonical information regarding licenses shouldn't be put in aports, but in a completely new repository with a flat hierarchy of software projects.

No, I'm not proposing removing the license field from APKBUILDs, but populating or fixing these fields in aports on a regular basis (weekly or every two weeks sounds rational) with the help of some scripts (that aren't written yet, but should be easy to do for 99% of cases) using data from this new repository.

Having a dedicated repository (I'll call it spdxify for now) for gathering data about the licenses used by various software projects seems like the best way to move forward. It will reduce noise in aports, allow importing fixed licenses in batches, and avoid adding additional stuff to APKBUILDs just to track the progress.
aports is also a moving target, so working outside of it will get rid of many collisions that would be inevitable otherwise.

I think that the spdxify repository layout could look like this:

+- lz4
|  +- 0NAME         -- official name of the project
|  +- 0REPO         -- official repository       \ at least one of these
|  +- 0SRC          -- official tarball location / should be present
|  +- licenses      -- license expressions covering the main
|  |                   software product (library in this case);
|  |                   one license expression per line
|  +- licenses-cli  -- license expressions covering supplementary
|  |                   software products (CLI tools in this case)
|  |                   if they differ from the main ones;
|  |                   one license expression per line
|  +- licenses-doc  -- license expressions covering documentation
|  |                   if they differ from the main ones
|  .
|  .                    (perhaps more licenses* files)
|  .
|  |
|  +- reviewers     -- ISO 8601 date and reviewer's full name
|                      per line
.

The hierarchy should be flat, because there is no need for favoritism: what is in testing in Alpine Linux today can be in community a few weeks later, and I think that reflecting the Alpine Linux hierarchy wouldn't be beneficial here, leading to noise like the moves mentioned.

0NAME, 0REPO and 0SRC are files that will make the information contained in the repository useful in a standalone manner, i.e. without access to aports. There can be projects with the same name that will need different directory names (obviously), so it's important to be able to tell which actual project a given directory refers to. First come, first served should work fine, and new colliding project names would get a suffix _N, where N denotes the N-th collision.

There will be at least one licenses file for each project, and more if building/installing it produces several products that are not necessarily bundled together. Each licenses file should have one SPDX license expression per line, and the first line should contain the most prominent license expression if there are many in the project.
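The layout rules above are simple enough to check mechanically. Below is a minimal Python sketch of such a check for a hypothetical spdxify checkout (the repository doesn't exist yet, and every function name here is made up). It assumes that 0NAME is mandatory, which the layout doesn't state explicitly, and that a non-SPDX identifier is written as a single angle-bracketed token without spaces. It picks a directory name with the _N collision suffix, reads a licenses* file as one expression per line, verifies that 0NAME and at least one of 0REPO/0SRC are present, and pulls out angle-bracketed identifiers so that repeated use of the same custom license could be spotted.

```python
import re
from pathlib import Path

# Hypothetical helpers for the proposed (not yet existing) spdxify layout.

# Assumption: non-SPDX identifiers look like <something>, with no spaces inside.
NON_SPDX = re.compile(r"<[^<>\s]+>")


def project_dir_name(root: Path, name: str) -> str:
    """First come, first served: the N-th colliding project gets suffix _N."""
    if not (root / name).exists():
        return name
    n = 1
    while (root / f"{name}_{n}").exists():
        n += 1
    return f"{name}_{n}"


def read_licenses(path: Path) -> list[str]:
    """One license expression per non-empty line; the first is the most prominent."""
    return [line.strip() for line in path.read_text().splitlines() if line.strip()]


def non_spdx_ids(expression: str) -> list[str]:
    """Return the angle-bracketed (non-SPDX) identifiers used in an expression."""
    return NON_SPDX.findall(expression)


def check_project(project: Path) -> list[str]:
    """Return the problems found in one project directory (empty list if none)."""
    problems = []
    if not (project / "0NAME").is_file():  # assumed mandatory
        problems.append("missing 0NAME")
    if not ((project / "0REPO").is_file() or (project / "0SRC").is_file()):
        problems.append("need at least one of 0REPO and 0SRC")
    license_files = sorted(project.glob("licenses*"))
    if not license_files:
        problems.append("no licenses* file")
    for f in license_files:
        if not read_licenses(f):
            problems.append(f"{f.name} is empty")
    return problems
```

Running check_project over every top-level directory would give a quick sanity report, and collecting non_spdx_ids across all licenses* files would show which custom licenses occur in more than one project.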
An integral part of the whole idea is the concept of reviewers. A reviewer is a person who clones the repository or downloads the most recent tarball of a software project, inspects whether the licenses found there match what the licenses* files state, and fixes any mistakes. If there are mistakes, the old entries in the reviewers file are removed before the new one is added; if there are no mistakes, the new reviewer is simply appended. Each reviewer's name should be preceded by the date (in ISO 8601) when the review was finished.

Inside such a repository there should also be a .scripts folder with simple shell scripts to ease some tasks, like adding an entry to the reviewers file (based on user.name from git's config) followed by a git commit with an automatic message, finding software with a particular number of reviewers or not yet reviewed by you, etc.

Outside of this repository we will need the Alpine Linux-specific scripts mentioned earlier, which will aid converting what's in the licenses files into the license field of APKBUILD files, and some mapping file for non-obvious cases (obvious cases are when a package is named exactly the same in aports as in spdxify and there is only one licenses file), e.g.:

lz4       lz4:cli
lz4:libs  lz4

In such a mapping, combining more than one licenses* file into one license field will also be possible.

That's roughly how I see it. I'm sure I didn't cover all the corners, but you should get some picture after reading this wall of text.

I don't have all these scripts written yet, and the spdxify repository has not been created yet either. I plan to "snapshot" the aports state very soon (hopefully on 2018-01-30 or 2018-01-31) and use the packages in main/ as the base to create the first set of software projects that will need to be inspected. There are over 2000 packages in main, so I plan to split them into batches of ~500, which means [a-g]* packages from main will be the first ones.
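To make the reviewers rule and the mapping file a bit more concrete, here is a minimal Python sketch; the proposal itself plans shell scripts in .scripts, so this is only an illustration, and all function names are hypothetical. It assumes that each reviewers line is the ISO 8601 date, a space, and the full name; that reviews are counted per distinct name against the threshold of 3; that in the mapping file the first column is an aports package (with lz4:libs standing for the lz4-libs subpackage) followed by one or more spdxify sources, where lz4:cli means the licenses-cli file of lz4; and that combined expressions are joined with a comma and a space, as suggested in point 4 of Solutions.

```python
from datetime import date

# Hypothetical sketch; the proposal itself plans shell scripts in .scripts/.

THRESHOLD = 3  # "3 people sounds good for starters"


def reviewers_of(lines: list[str]) -> set[str]:
    """Each line is '<ISO 8601 date> <full name>'; return the distinct names."""
    return {line.split(" ", 1)[1] for line in lines if " " in line}


def record_review(lines: list[str], name: str, day: date,
                  found_mistakes: bool) -> list[str]:
    """Return the new contents of a reviewers file after one finished review."""
    entry = f"{day.isoformat()} {name}"
    if found_mistakes:
        return [entry]  # licenses were fixed, so earlier reviews are invalidated
    return lines + [entry]


def needs_review(lines: list[str], name: str) -> bool:
    """True if the project is below the threshold and not yet reviewed by `name`."""
    names = reviewers_of(lines)
    return len(names) < THRESHOLD and name not in names


def parse_mapping(text: str) -> dict[str, list[str]]:
    """Map an aports package (or pkg:subpkg) to one or more spdxify sources."""
    mapping = {}
    for line in text.splitlines():
        fields = line.split()
        if fields:
            mapping[fields[0]] = fields[1:]
    return mapping


def licenses_file(source: str) -> str:
    """'lz4' -> 'lz4/licenses', 'lz4:cli' -> 'lz4/licenses-cli' (assumed convention)."""
    project, _, suffix = source.partition(":")
    return f"{project}/licenses-{suffix}" if suffix else f"{project}/licenses"


def license_field(sources: list[str], licenses: dict[str, list[str]]) -> str:
    """Combine the expressions of all sources, comma-separated, without duplicates."""
    combined: list[str] = []
    for source in sources:
        for expr in licenses[source]:
            if expr not in combined:
                combined.append(expr)
    return ", ".join(combined)
```

With the lz4 mapping above, the lz4 package would get GPL-2.0-or-later from licenses-cli and lz4-libs would get BSD-2-Clause from licenses, matching the split described in the Problems section.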
I don't even plan on importing existing license fields from APKBUILDs, because I think it may be harmful and more error-prone. It's better to start our license journey from scratch and not be biased by what was already put in some APKBUILDs (I've seen some mistakes in the past and I'm afraid there may still be many more of them).

I haven't started working on all that yet, because I wanted to get some feedback on whether people see value in such an organized approach toward fixing license matters in Alpine Linux (which may actually also benefit other distributions in the future) or not.

Final notes
-----------

It may look like I take licensing very seriously. Some may argue that maybe even too seriously. The common view is that only functionality is important, and as long as the job can be done it doesn't matter what license is behind the tool used for it. It may be true for most users, but others interested in utilizing Alpine Linux for their products, services and/or solutions may not always have this nice freedom of choice.

Fixing the current license mess will show that Alpine Linux cares about quality in yet another department, and I believe it can be beneficial to its overall image, but also to all users and developers who are part of this great community, by raising awareness that licenses do matter.

Regards,
Przemek