From: "Alex Butler" <alex@ewinkle.com>
To: alpine-user@lists.alpinelinux.org
Subject: RE: [alpine-user] Alpine limit on file descriptors?
Date: Thu, 16 Aug 2018 16:00:04 +0100
Message-ID: <00d701d43571$d18e5690$74ab03b0$@ewinkle.com>
In-Reply-To: <055e01d433f3$76e95660$64bc0320$@ewinkle.com>

Just to update. We've managed to find the issue: the musl-libc semaphore implementation defaults to a limit of 256 semaphores per process, which was insufficient when we're spawning quite a few sub-processes and queues from one application. In particular, the sem_open() function uses

#define _POSIX_SEM_NSEMS_MAX 256

as the limit (see git.musl-libc.org/cgit/musl/tree/include/limits.h#n63).

My colleague Toby found and patched musl-libc to increase the limit to 4096 in our Alpine build, which now runs the required number of processes and queues using the Python multiprocessing library. It now runs like a dream.

Full details here:

https://forums.resin.io/t/alpine-image-with-custom-musl-libc-settings/3746

Cheers,

Alex

From: Alex Butler
Sent: 14 August 2018 18:23
To: alpine-user@lists.alpinelinux.org
Subject: [alpine-user] Alpine limit on file descriptors?

We've been having some issues with what looks like some kind of limitation on the maximum number of file descriptors (or a /dev/shm semaphore limitation). Our application is in Python and uses the standard Python multiprocessing library to create processes and associated queues for communication (typically creating between 20 and 200 processes at start-up depending on hardware configuration). It runs fine on Raspbian/Debian with any number of processes we choose (within reason!) and runs fine under Alpine when we run with low numbers of processes.

However, it always barfs for larger numbers of processes under Alpine, suggesting (from the reported OSError) that it is running out of file descriptors. [Might be a red herring but might be related to the use of OS semaphore management in /dev/shm. Just not sure!]
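One way to check whether file descriptors really are the constraint is to compare the number of descriptors the process holds with its RLIMIT_NOFILE limit at the point of failure. The sketch below is illustrative only and is not part of the attached test script; the helper name fd_usage is made up here, and it assumes a Linux system with /proc mounted.

```python
import os
import resource


def fd_usage():
    """Return (open_fds, soft_limit, hard_limit) for the current process.

    Hypothetical helper for illustration; assumes Linux with /proc mounted.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    # /proc/self/fd has one entry per descriptor open in this process.
    # The count includes the descriptor listdir() itself uses to read
    # the directory, so it may be one higher than the steady-state value.
    open_fds = len(os.listdir("/proc/self/fd"))
    return open_fds, soft, hard


if __name__ == "__main__":
    open_fds, soft, hard = fd_usage()
    print("open fds: %d, soft limit: %s, hard limit: %s" % (open_fds, soft, hard))
```

If the open count is nowhere near the soft limit when the OSError is raised, then Errno 24 is probably coming from some other resource limit rather than from the descriptor table.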
Anyway, after trying quite a few things we've narrowed it down to failing in every stock flavour of Alpine we've tried (x64, Raspberry Pi etc.), while it just doesn't happen at all in the different flavours of Raspbian/Debian/Ubuntu etc.

Is there some Alpine setting/limit which we haven't yet found which sets the maximum number of file descriptors (or some other subtle Alpine difference)? We've tried all the "obvious" Linux file descriptor changes like ulimit, sysctl type changes etc.

To help recreate this we've created a simple Python script (attached).

Under Alpine (Raspberry Pi) it fails after the 85th process pair. If MAX_PAIRS is set to 85 it works fine, i.e. no exceptions. Put in anything bigger for MAX_PAIRS and we always get the following error message at the 86th:

---
data for 83 was [83001, 83002, 83003, 83004, 83005, 83006, 83007, 83008, 83009]
data for 84 was [84001, 84002, 84003, 84004, 84005, 84006, 84007, 84008, 84009]
data for 85 was [85001, 85002, 85003, 85004, 85005, 85006, 85007, 85008, 85009]
Traceback (most recent call last):
  File "queue_test.py", line 41, in <module>
    q = Queue()
  File "/usr/lib/python2.7/multiprocessing/__init__.py", line 218, in Queue
    return Queue(maxsize)
  File "/usr/lib/python2.7/multiprocessing/queues.py", line 68, in __init__
    self._wlock = Lock()
  File "/usr/lib/python2.7/multiprocessing/synchronize.py", line 147, in __init__
    SemLock.__init__(self, SEMAPHORE, 1, 1)
  File "/usr/lib/python2.7/multiprocessing/synchronize.py", line 75, in __init__
    sl = self._semlock = _multiprocessing.SemLock(kind, value, maxvalue)
OSError: [Errno 24] No file descriptors available
---

As I said, on other Linux distros this code runs fine. We'd _really_ like to use Alpine for a variety of obvious reasons. It's not obvious what is going on, and not being able to run multiprocessing to the level of parallelism we need might be a deal-breaker.
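The failure point in the traceback is consistent with a 256-semaphore limit. The arithmetic below is a sketch under two assumptions that are not confirmed by the attached script: that each multiprocessing Queue on POSIX holds three semaphores (_rlock, _wlock and the bounded _sem, as in CPython's queues.py), and that the test creates one Queue per pair and no other semaphores. The function name is made up for illustration.

```python
# Assumption: a multiprocessing.Queue on POSIX creates three semaphores
# (_rlock, _wlock and a BoundedSemaphore), as in CPython's queues.py.
SEMS_PER_QUEUE = 3


def queues_within_limit(limit, sems_per_queue=SEMS_PER_QUEUE):
    """Return (complete_queues, leftover_semaphores) that fit under `limit`."""
    return divmod(limit, sems_per_queue)


if __name__ == "__main__":
    for limit in (256, 4096):
        queues, leftover = queues_within_limit(limit)
        print("limit %d: %d complete queues, %d semaphore(s) left over"
              % (limit, queues, leftover))
```

Under those assumptions a limit of 256 leaves room for 85 complete queues plus one more semaphore, so the 86th Queue would get its first lock and then fail on the second, which matches the traceback stopping at self._wlock = Lock().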
Incidentally, at MAX_PAIRS = 85 (when the test code runs fine), doing a "lsof | wc -l" reveals about 29991 file descriptors (~30k).

I've attached a copy of the test Python code for ease of replication. We just run it as root using "/usr/bin/python queue_test.py".

Any help or suggestions as to what might be going on gratefully received!

Cheers,

Alex Butler
UK
