
[alpine-devel] [RFC] Language packs

Natanael Copa
Message ID
<20110902111413.2398fdaf@ncopa-desktop.nor.wtbts.net>
Sender timestamp
1314954853
Hi,

I would like to split out language support packages.

Mainly because in many situations you don't want language support
at all (routers for example).

In other cases you might want support for one or two languages but not
all (error messages for squid? desktops?)

The very nice apk-tools feature "install_if" will allow us to do things
like: 'apk add lang-no' to install Norwegian language(s) and locales
for all applications providing such support. That is actually already
working in current abuild, even if no APKBUILD uses it. You can just
set 'linguas="no es"' and it will automatically create
$pkgname-lang-no and $pkgname-lang-es with proper install_if.
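
As a rough illustration of the naming scheme just described (this is not
abuild's actual code), the sketch below expands a linguas list into
subpackage names. The function name lang_subpackages is made up for this
example, and it only prints names; it does not set install_if.

```shell
#!/bin/sh
# Hypothetical helper, not part of abuild: print one
# "$pkgname-lang-$code" name per entry in the linguas list.
lang_subpackages() {
    pkgname=$1
    linguas=$2
    # $linguas is left unquoted on purpose so it splits on whitespace.
    for code in $linguas; do
        printf '%s-lang-%s\n' "$pkgname" "$code"
    done
}

lang_subpackages squid "no es"
```

For the squid example this prints squid-lang-no and squid-lang-es, one per
line.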

However, I don't think we want subpackages for every single language
since there are potentially thousands of languages, so we want to group
them in some way. And we want to be able to group them automatically with
a script, so each maintainer doesn't need to deal with 30+ different
languages for each package.

The big question is how do we group them?

I see the following options:

1. we group by ISO 639-1 (2 char codes). 
http://en.wikipedia.org/wiki/List_of_ISO_639-1_codes
This means we get packages like: squid-lang-no (which might hold no.nb
and no.nn for the 2 different Norwegian languages). $pkgname-lang-en
would hold both en.GB and en.US. *-lang-es would hold Spanish for both
Spain and South America. I think this is the easiest way to do it, but
this might cause the number of packages in our repos to double (or
triple? I don't know).

2. we group by first letter. For example $pkgname-lang-e holds English
(en) and Spanish (es) and all other languages starting with 'e'. This
will give us fewer groups and is easy to implement from scripts. If you
want Finnish packages, then you apk add lang-f and you will get French
too even if you don't need it.

3. we group by language family.
http://en.wikipedia.org/wiki/List_of_language_families
This means we need some kind of database that can put the language dirs
in correct group. To install your language you will need to know what
family it is in and install it. For Finnish and Swedish it would be
something like: apk add lang-uralic lang-indo-european

4. we group by continent.
With this it's easier for the end user to pick the language. For Norwegian
and Finnish it would be: apk add lang-europe. You would also get
Hungarian and German. This means lots of work figuring out which
continent to put each language in and creating a database. Where do we
put Spanish? lang-southamerica or lang-europe? French? lang-europe or
lang-africa?

5. you get all or nothing.
With this we have a single subpackage for each package with all
available languages. It means basically that you can enable all
languages or none with something like: 'apk add language-support'. You
could also add all languages for a specific package: 'apk add
$pkgname-lang'

6. we don't bother splitting out language packs at all. Just ship them with
the main package and don't worry about the wasted space for unused
languages. This basically means: Do nothing. Let it be like it is now.
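
To make the difference between options #1 and #2 concrete, here is a small
sketch of how a script could derive the group name from a locale directory
name. It assumes directory names of the form ll, ll_CC or ll@modifier (for
example en_GB or sr@latin); the function names group_by_code and
group_by_letter are invented for this example.

```shell
#!/bin/sh
# Hypothetical helpers: derive a group name from a locale directory name.

# Option 1: strip the region/modifier suffix, e.g. en_GB -> en, sr@latin -> sr.
group_by_code() {
    printf '%s\n' "$1" | sed 's/[_@.].*//'
}

# Option 2: keep only the first letter, e.g. fi -> f, fr -> f.
group_by_letter() {
    printf '%s\n' "$1" | cut -c1
}

for dir in en_GB en_US es fi fr; do
    printf '%s: option 1 -> lang-%s, option 2 -> lang-%s\n' \
        "$dir" "$(group_by_code "$dir")" "$(group_by_letter "$dir")"
done
```

With option 1, en_GB and en_US both land in lang-en; with option 2, fi and
fr both land in lang-f, which is the Finnish/French overlap mentioned
above.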

I think that the only practical solution would be either option #1 or
option #5.

What do you think?

Do we have other alternatives?

Thanks!

-nc


---
Unsubscribe:  alpine-devel+unsubscribe@lists.alpinelinux.org
Help:         alpine-devel+help@lists.alpinelinux.org
---
Cameron Banta
Message ID
<CABVwkxWz8L1xrFfPe0iDv3ptJQDQJQFDPWKhQRJ7C16ucbuV9w@mail.gmail.com>
In-Reply-To
<20110902113245.700ba8e8@ncopa-desktop.nor.wtbts.net> (view parent)
Sender timestamp
1314976811
On Fri, Sep 2, 2011 at 04:32, Natanael Copa <ncopa@alpinelinux.org> wrote:

> On Fri, 2 Sep 2011 11:14:13 +0200
> Natanael Copa <ncopa@alpinelinux.org> wrote:
>
> > 1. we group by ISO 639-1 (2 char codes).
> > http://en.wikipedia.org/wiki/List_of_ISO_639-1_codes
> > This means we get packages like: squid-lang-no (which might hold no.nb
> > and no.nn for the 2 different Norwegian languages). $pkgname-lang-en
> > would hold both en.GB and en.US. *-lang-es would hold Spanish for both
> > Spain and South America. I think this is the easiest way to do it, but
> > this might cause the number of packages in our repos to double (or
> > triple? I don't know).
>
> I just ran a test for the packages that are built in my current dev
> vserver:
>
> ~/aports/main $ ls -d */pkg/*/usr/share/locale/[a-z]* | sed
> 's:\(/[a-z][a-z]\)[_ @^/].*:\1:' | sort | uniq | wc -l
> 5943
>
> Option #1 means that we will get at least 5943 additional
> packages in the repository if we split everything that potentially could
> be split. (125 different packages in the test above)
>
>
What about creating language repos? So there would be main, testing,
lang/en, lang/no, etc. Then if you want a language, you just add it to your
/etc/apk/repositories.

This would keep the extra packages from cluttering the main repos - at the
cost of more complicated repos.

-Cameron
Natanael Copa
Message ID
<20110902113245.700ba8e8@ncopa-desktop.nor.wtbts.net>
In-Reply-To
<20110902111413.2398fdaf@ncopa-desktop.nor.wtbts.net> (view parent)
Sender timestamp
1314955965
On Fri, 2 Sep 2011 11:14:13 +0200
Natanael Copa <ncopa@alpinelinux.org> wrote:
 
> 1. we group by ISO 639-1 (2 char codes). 
> http://en.wikipedia.org/wiki/List_of_ISO_639-1_codes
> This means we get packages like: squid-lang-no (which might hold no.nb
> and no.nn for the 2 different Norwegian languages). $pkgname-lang-en
> would hold both en.GB and en.US. *-lang-es would hold Spanish for both
> Spain and South America. I think this is the easiest way to do it, but
> this might cause the number of packages in our repos to double (or
> triple? I don't know).

I just ran a test for the packages that are built in my current dev vserver:

~/aports/main $ ls -d */pkg/*/usr/share/locale/[a-z]* | sed 's:\(/[a-z][a-z]\)[_ @^/].*:\1:' | sort | uniq | wc -l
5943

Option #1 means that we will get at least 5943 additional
packages in the repository if we split everything that potentially could
be split. (125 different packages in the test above)
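
For readers who want to see what the pipeline counts, here is a small
sketch that runs the same sed expression over a few sample paths instead of
a real aports tree. The sample paths and the function names sample_paths
and count_lang_packages are made up for illustration.

```shell
#!/bin/sh
# Sample paths standing in for the output of
# "ls -d */pkg/*/usr/share/locale/[a-z]*"; the entries are only examples.
sample_paths() {
    cat <<'EOF'
gimp/pkg/gimp/usr/share/locale/en_GB
gimp/pkg/gimp/usr/share/locale/en_CA
gimp/pkg/gimp/usr/share/locale/nb
pidgin/pkg/pidgin/usr/share/locale/sr@latin
pidgin/pkg/pidgin/usr/share/locale/sr
EOF
}

# Same sed expression as in the message: cut each path right after the
# two-letter code when a region or modifier follows it, then count the
# distinct (package, language) pairs.
count_lang_packages() {
    sed 's:\(/[a-z][a-z]\)[_ @^/].*:\1:' | sort | uniq | wc -l
}

sample_paths | count_lang_packages
```

The five sample paths collapse to three pairs (gimp/en, gimp/nb,
pidgin/sr), so the count is 3. Each pair would become one
$pkgname-lang-XX subpackage under option #1.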

We could also say something like: don't bother splitting if the locales are less than 10MB or something.

That would give us only these (sizes in KB) instead of the 125 packages above:

11596   gnumeric/pkg/gnumeric/usr/share/locale
12688   iso-codes/pkg/iso-codes/usr/share/locale (we probably don't want to split this)
15160   nautilus/pkg/nautilus/usr/share/locale
19068   inkscape/pkg/inkscape/usr/share/locale
19160   gtk+3.0/pkg/gtk+3.0/usr/share/locale
20376   gtk+2.0/pkg/gtk+2.0/usr/share/locale
22736   pidgin/pkg/pidgin/usr/share/locale
30484   gimp/pkg/gimp/usr/share/locale

An interesting thing is that gtk+2.0 in total is:
 26072 gtk+2.0/pkg/gtk+2.0

It means that ~6MB is code and ~20MB is localization. I see a
possibility to save some serious space here...
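
A minimal sketch of the size threshold idea, assuming the numbers above are
du-style sizes in KB. The gtk+2.0 and gimp rows are taken from the list
above; the "example" row is invented purely to show a package falling below
the threshold, and the function names sizes and worth_splitting are made up
for this sketch.

```shell
#!/bin/sh
# "size-in-KB path" lines; the last row is a hypothetical small package,
# not a real measurement.
sizes() {
    cat <<'EOF'
20376 gtk+2.0/pkg/gtk+2.0/usr/share/locale
30484 gimp/pkg/gimp/usr/share/locale
512 example/pkg/example/usr/share/locale
EOF
}

# Keep only locale dirs of at least min_kb kilobytes (10MB = 10240KB).
worth_splitting() {
    awk -v min_kb="$1" '$1 >= min_kb'
}

sizes | worth_splitting 10240

# gtk+2.0: total package size minus locale size, both in KB.
total_kb=26072
locale_kb=20376
echo "non-locale: $((total_kb - locale_kb)) KB"
```

On a real tree the input would come from something like
"du -sk */pkg/*/usr/share/locale" instead of the sizes function. The last
line prints 5696 KB, which is the "~6MB is code" figure above.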

-nc

