Subject: Re: [alpine-devel] udev replacement on Alpine Linux
From: Jude Nelson
To: Laurent Bercot <ska-devel@skarnet.org>
Cc: alpine-devel@lists.alpinelinux.org
Date: Sat, 16 Jan 2016 12:48:10 -0500

Hi Laurent, apologies for the delay,

On Thu, Jan 14, 2016 at 6:36 AM, Laurent Bercot <ska-devel@skarnet.org> wrote:
> On 14/01/2016 06:55, Jude Nelson wrote:
>> I think you're close. The gist of it is that vdev needs to supply a
>> lot more information than the kernel gives it. In particular, its
>> helper programs go on to query the properties and status of each
>> device (this often requires root privileges, i.e. via privileged
>> ioctl()s), and vdev gathers the information into a (much larger)
>> event packet and stores it in a directory tree under /dev for
>> subsequent query by less-privileged programs.

> I see.
> I think this is exactly what could be made modular. I've heard
> people say they were reluctant to use vdev because it's not KISS, and
> I suspect the ioctl machinery and data gathering is a large part of
> the complexity. If that part could be pluggable, i.e. if admins could
> choose a "data gatherer" just complex enough for their needs, I believe
> it could encourage adoption. In other words, I'm looking at a 3-part
> program:
> - the netlink listener
> - the data gatherer
> - the event publisher
>
> Of course, for libudev to work, you would need the full data gatherer;
> but if people aren't using libudev programs, they can use a simpler one,
> closer to what mdev is doing.
> It's all from a very high point of view, and I don't know the details of
> the code so I have no idea whether it's envisionable for vdev, but that's
> what I'm thinking off the top of my head.

This sounds reasonable. In fact, within vdevd there are already distinct
netlink listener and data gatherer threads that communicate over a
producer/consumer queue. Splitting them into separate processes connected
by a pipe is consistent with the current design, and would also help with
portability.
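
For concreteness, the listener stage really can be that small; here is a
rough sketch (not vdevd's actual code) of a process that reads kernel
uevents and emits one line per event on stdout, so a separate
data-gatherer process can consume them through an ordinary pipe:

/* Rough sketch of the standalone "netlink listener" stage: read kernel
   uevents from NETLINK_KOBJECT_UEVENT and write one line per event to
   stdout for the next stage in the pipeline.  Not vdevd's actual code. */
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <linux/netlink.h>

int main(void) {
    struct sockaddr_nl addr;
    char buf[8192];
    int fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_KOBJECT_UEVENT);
    if (fd < 0)
        return 1;

    memset(&addr, 0, sizeof addr);
    addr.nl_family = AF_NETLINK;
    addr.nl_groups = 1;   /* the kernel's uevent multicast group */
    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0)
        return 1;

    for (;;) {
        ssize_t len = recv(fd, buf, sizeof buf - 1, 0);
        if (len <= 0)
            continue;
        buf[len] = 0;
        /* a uevent is a packet of NUL-separated KEY=VALUE strings;
           flatten each packet to a single space-separated line */
        for (ssize_t i = 0; i < len; i += (ssize_t)strlen(buf + i) + 1)
            printf("%s%s", i ? " " : "", buf + i);
        printf("\n");
        fflush(stdout);
    }
}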


>> Funny you mention this--I also created runfs
>> (https://github.com/jcnelson/runfs) to do exactly this. In
>> particular, I use it for PID files.

> I have no love for mechanisms that help people keep using PID files,
> which are an ugly relic that can't end up in the museum of mediaeval
> programming soon enough. :P

Haha, true. I have other purposes for it, though.

> That said, runfs is interesting, and I would love it if Unix provided
> such a mechanism. Unfortunately, for now it has to rely on FUSE, which
> is one of the most clunky mutant features of Linux, and an extra layer
> of complexity; so I find it cleaner if a program can achieve its
> functionality without depending on such a filesystem.


I think this is one of the things Plan 9 got right--letting a process
expose whatever fate-sharing state it wanted through the VFS. I agree
that using FUSE to do this is a lot clunkier, but I don't think that's
FUSE's fault. As far as I know, Linux doesn't allow a process to expose
custom state through /proc.

>> I agree that netlink is lighter, but I avoided it for two reasons:
>> * Sometime down the road, I'd like to port vdev to OpenBSD.

> That's a good reason, and an additional reason to separate the
> netlink listener from the event publisher (and the data gatherer).
> The event publisher and client library can be made 100% portable,
> whereas the netlink listener and data gatherer obviously cannot.


>> * There is no way to namespace netlink messages that I'm aware of.

> I didn't know that - I'm no netlink expert. But that's also a good
> reason. AFAICT, there are 32 netlink multicast groups, and they use
> hardcoded numbers - this is ugly, or at least requires a global
> registry of what each group is used for. If you can't namespace them, it
> becomes even more of a scarce resource; although it's legitimate to
> use one for uevent publishing, I'm pretty sure people will find a way
> to clog them with random crap very soon - better stay away from
> resources you can't reliably lock. And from what you're saying, even
> systemd people have realized that. :)

> I'm not advocating netlink use for anything other than reading kernel
> events. It's just that true multicast will be more efficient than manual
> broadcast; there's no way around it.


>> By using a synthetic filesystem for
>> message transport, I can use bind-mounts to control which device
>> events get routed to which containers.

> I'm torn between "oooh, clever" and "omg this hack is atrocious". :)


Haha, thanks :)
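
(For the curious: the routing amounts to one bind-mount per container.
A hypothetical sketch, with made-up paths rather than vdev's actual
layout:)

/* Expose one device's event subtree inside a container's /dev and
   nothing else.  Paths are invented for illustration. */
#include <sys/mount.h>

int route_events_to_container(void) {
    return mount("/dev/eventfs/sda",                        /* host subtree */
                 "/var/lib/ct/web/rootfs/dev/eventfs/sda",  /* container view */
                 NULL, MS_BIND, NULL);
}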

>> Yes, modulo some other mechanisms to ensure that the libudev-compat
>> process doesn't get back-logged and lose messages.

> What do you mean by that?
> If libudev-compat is, like libudev, linked into the application, then
> you have no control over client behaviour; if a client doesn't properly
> act on a notification, then there's nothing you can do about it and
> it's not your responsibility. Can you give a few details about what
> you're doing client-side?


A bit of background:

* Unlike netlink sockets, a program cannot control the size of an inotify
descriptor's "receive" buffer. This is a system-wide constant, defined in
/proc/sys/fs/inotify/max_queued_events. However, libudev offers clients
the ability to do just this (via udev_monitor_set_receive_buffer_size).
This is what I originally meant--libudev-compat needs to ensure that the
desired receive buffer size is honored.

* libudev's API exposes the udev_monitor's netlink socket descriptor
directly to the client, so it can poll on it (via udev_monitor_get_fd).

* libudev allows clients to define event filters, so they receive only
the events that they want to receive (via udev_monitor_filter_*). The
implementation achieves this by translating filters into BPF programs and
attaching them to the client's netlink socket. It is also somewhat
complex, and I didn't want to have to re-write it each time I synced the
code with upstream.
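
Concretely, the client-side pattern those three constraints serve is the
standard libudev monitor loop; these are libudev's real entry points,
with the "block" filter purely as an example:

/* The libudev client pattern that libudev-compat has to reproduce:
   set a buffer size, install a filter, poll the monitor fd. */
#include <libudev.h>
#include <poll.h>

int watch_block_devices(void) {
    struct udev *udev = udev_new();
    struct udev_monitor *mon = udev_monitor_new_from_netlink(udev, "udev");

    udev_monitor_filter_add_match_subsystem_devtype(mon, "block", NULL);
    udev_monitor_set_receive_buffer_size(mon, 128 * 1024);
    udev_monitor_enable_receiving(mon);

    struct pollfd pfd = { .fd = udev_monitor_get_fd(mon), .events = POLLIN };
    while (poll(&pfd, 1, -1) > 0) {
        struct udev_device *dev = udev_monitor_receive_device(mon);
        if (dev)
            udev_device_unref(dev);   /* a real client would act on it */
    }
    udev_monitor_unref(mon);
    udev_unref(udev);
    return 0;
}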

To work around these constraints, libudev-compat routes a udev_monitor's
events through an internal socket pair. It uses inotify as an edge-trigger
instead of a level-trigger: when there is at least one file to consume
from the event directory, it will read as many files as it can and try to
saturate the struct udev_monitor's socket pair (the number of bytes the
socket pair can hold is now controlled by
udev_monitor_set_receive_buffer_size). The receive end of the socket pair
and the inotify descriptor are unified into a single pollable epoll
descriptor, which gets returned via libudev-compat's udev_monitor_get_fd
(it will poll as ready if either there are unconsumed events in the
socket pair, or a new file has arrived in the directory). The filtering
implementation works almost unmodified, except that it attaches BPF
programs to the receiving end of the udev_monitor's socket pair instead
of a netlink socket.
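
A condensed sketch of that wiring, with made-up names (a paraphrase of
the approach, not libudev-compat's actual source):

/* inotify edge-triggers a drain of the event directory into a
   socketpair whose buffer size stands in for the netlink receive
   buffer; error handling is largely elided. */
#include <sys/socket.h>
#include <sys/epoll.h>
#include <sys/inotify.h>

struct compat_monitor {
    int inotify_fd;   /* watches the monitor's event directory */
    int epoll_fd;     /* what udev_monitor_get_fd() hands back */
    int sp[2];        /* sp[0]: receive end (BPF filters attach here);
                         sp[1]: end the drain loop pushes events into */
};

int compat_monitor_init(struct compat_monitor *m, const char *event_dir) {
    struct epoll_event ev = { .events = EPOLLIN };

    if (socketpair(AF_UNIX, SOCK_DGRAM, 0, m->sp) < 0)
        return -1;

    m->inotify_fd = inotify_init1(IN_NONBLOCK);
    inotify_add_watch(m->inotify_fd, event_dir, IN_CREATE | IN_MOVED_TO);

    /* unify the socketpair's receive end and the inotify descriptor
       into one pollable descriptor, as described above */
    m->epoll_fd = epoll_create1(0);
    ev.data.fd = m->sp[0];
    epoll_ctl(m->epoll_fd, EPOLL_CTL_ADD, m->sp[0], &ev);
    ev.data.fd = m->inotify_fd;
    epoll_ctl(m->epoll_fd, EPOLL_CTL_ADD, m->inotify_fd, &ev);
    return 0;
}

/* udev_monitor_set_receive_buffer_size() then maps onto the socketpair: */
int compat_monitor_set_rcvbuf(struct compat_monitor *m, int size) {
    return setsockopt(m->sp[0], SOL_SOCKET, SO_RCVBUF, &size, sizeof size);
}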

In summary, the system doesn't try to outright prevent event loss for
clients; it tries to ensure that clients can control their receive-buffer
size, with expected results. One of the more subtle reasons for using
eventfs is that it makes it possible to control the maximum number of
bytes an event directory can hold. By making this work on a per-directory
basis, the system retains the ability to control, on a per-monitor basis,
the maximum number of events it will hold before NACKing the
event-pusher. udev_monitor_set_receive_buffer_size would also set the
upper byte-limit for its udev_monitor's event directory, thereby
retaining the original API contract.

>> I think both approaches are good ideas and would work just as well.
>> I really like skabus's approach--I'll take a look at using it for
>> message delivery as an additional (preferred?) vdev-to-libudev-compat
>> message delivery mechanism :) It looks like it offers all the
>> aforementioned benefits over netlink that I'm looking for.

> Unfortunately, it's not published yet, because there's still a lot
> of work to be done on clients. And now I'm wondering whether it would
> be more efficient to store messages in anonymous files and transmit
> fds, instead of transmitting copies of messages. I may have to rewrite
> stuff. :)
> I think I'll be able to get back to work on skabus by the end of this
> year - but no promises, since I'll be working on the Alpine init system
> as soon as I'm done with my current contract. But I can leak a few
> pieces of source code if you're interested.
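
(On Linux, the anonymous-file idea could look roughly like this: write
each message into a memfd and pass only the descriptor with SCM_RIGHTS,
so all subscribers share one kernel-side copy. Every name here is made
up; skabus is unpublished, so this is not its API.)

/* Sketch of "store messages in anonymous files and transmit fds",
   publisher side only.  memfd_create needs Linux 3.17+; going through
   syscall() avoids depending on a recent libc wrapper. */
#define _GNU_SOURCE
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <sys/syscall.h>
#include <sys/uio.h>

static int send_event_fd(int conn, const void *msg, size_t len) {
    int memfd = syscall(SYS_memfd_create, "event", 0);
    char tag = 'E';   /* 1-byte payload; the real data rides in the fd */
    struct iovec iov = { .iov_base = &tag, .iov_len = 1 };
    union { char buf[CMSG_SPACE(sizeof(int))]; struct cmsghdr align; } u;
    struct msghdr mh = {
        .msg_iov = &iov, .msg_iovlen = 1,
        .msg_control = u.buf, .msg_controllen = sizeof u.buf,
    };
    struct cmsghdr *cm = CMSG_FIRSTHDR(&mh);
    int rc = -1;

    if (memfd < 0)
        return -1;
    if (write(memfd, msg, len) == (ssize_t)len) {
        cm->cmsg_level = SOL_SOCKET;
        cm->cmsg_type = SCM_RIGHTS;
        cm->cmsg_len = CMSG_LEN(sizeof(int));
        memcpy(CMSG_DATA(cm), &memfd, sizeof(int));
        /* the kernel keeps the file alive until every recipient closes it */
        rc = sendmsg(conn, &mh, 0) < 0 ? -1 : 0;
    }
    close(memfd);
    return rc;
}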


I'd be willing to take a crack at it, if I have time between now and the
end of the year. I'm trying to finish my PhD this year, which is why vdev
development has been slow-going for the past several months. Will keep
you posted :)

>> A question on the implementation--what do you think of having each
>> subscriber create its own Unix domain socket in a canonical
>> directory, and having the sender connect as a client to each
>> subscriber?

> That's exactly how fifodirs work, with pipes instead of sockets.
> But I don't think that's a good fit here.
>
> A point of fifodirs is to have many-to-many communication: there
> are several subscribers, but there can also be several publishers
> (even if in practice there's often only one publisher). Publishers and
> subscribers are completely independent.
> Here, you only ever have one publisher: the event dispatcher. You
> only ever need one-to-many communication.
>
> Another point of fifodirs is to avoid the need for a daemon to act
> as a bus. It's notification that happens between unrelated processes
> without requiring a central server to ensure the communication.
> It's important because I didn't want my supervision system (which is
> supposed to manage daemons) to itself rely on a daemon (which would
> then have to be unsupervised).
> Here, you don't have that requirement, and you already have a daemon:
> the event dispatcher is long-lived.
> I think a "socketdir" mechanism is just too heavy:
> - for every event, you perform opendir(), readdir() and closedir()
> - for every event * subscriber, you perform at least socket(), connect(),
> sendmsg() and close()
> - the client library needs to listen() and accept(), which means it
> needs its own thread (and I hate, hate, hate, libraries that pull in
> thread support in my otherwise single-threaded programs)
> - the client library needs to perform access control on the socket,
> to avoid connects from unrelated processes, and even then you can't
> be certain it's the event publisher and not a random root process

> You definitely don't want a client library to be listen()ing.
> listen() is server stuff - mixing client and server stuff is complex.
> Too much so for what you need here.

>> Since each subscriber needs its own fd to read and
>> close, the directory of subscriber sockets automatically gives the
>> sender a list of who to communicate with and a count of how many fds
>> to create. It also makes it easy to detect and clean up a dead
>> subscriber's socket: the sender can request a struct ucred from a
>> subscriber to get its PID (and then other details from /proc), and if
>> the process ever exits (which the sender can detect on Linux using a
>> netlink process monitor, like [1]), the process that created the
>> socket can be assumed to be dead and the sender can unlink it. The
>> sender would rely on additional process instance-identifying
>> information from /proc (like its start-time) to avoid PID-reuse
>> races.
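
(On Linux, the credential request above is a single getsockopt() call; a
small sketch, with the /proc start-time check against PID reuse left out
as described:)

/* SO_PEERCRED on a connected Unix socket yields the peer's pid/uid/gid. */
#define _GNU_SOURCE
#include <sys/socket.h>
#include <sys/types.h>

pid_t subscriber_pid(int conn) {
    struct ucred cred;
    socklen_t len = sizeof cred;
    if (getsockopt(conn, SOL_SOCKET, SO_PEERCRED, &cred, &len) < 0)
        return (pid_t)-1;
    return cred.pid;
}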

> Bleh. Of course it can be made to work, but you really don't need all
> that complexity. You have a daemon that wants to publish data, and
> several clients that want to receive data from that daemon: it's
> one (long-lived) to many (short-lived) communication, and there's a
> perfectly appropriate, simple and portable IPC for that: a single Unix
> domain socket that your daemon listens on and your clients connect to.
> If you want to be perfectly reliable, you can implement some kind of
> autoreconnect in the client library - in case you want to restart the
> event publisher without killing X, for instance. But that's still a
> lot simpler than playing with multiple sockets and mixing clients and
> servers when you don't need to.
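
(For comparison, a minimal sketch of the shape Laurent describes, error
handling trimmed: one listening Unix socket owned by the dispatcher, one
fd per connected subscriber.)

#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <sys/un.h>

int make_listener(const char *path) {
    struct sockaddr_un sa = { .sun_family = AF_UNIX };
    int fd = socket(AF_UNIX, SOCK_STREAM | SOCK_NONBLOCK, 0);
    strncpy(sa.sun_path, path, sizeof sa.sun_path - 1);
    unlink(path);
    bind(fd, (struct sockaddr *)&sa, sizeof sa);
    listen(fd, SOMAXCONN);
    return fd;
}

/* publish one event to every live subscriber; reap the dead ones */
void publish(int subs[], int *nsubs, const void *ev, size_t len) {
    for (int i = 0; i < *nsubs; ) {
        if (send(subs[i], ev, len, MSG_NOSIGNAL) < 0) {
            close(subs[i]);
            subs[i] = subs[--*nsubs];   /* swap-remove the dead fd */
        } else {
            i++;
        }
    }
}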

Agreed--if the event dispatcher is going to be a message bus, then a lot
of the aforementioned difficulties can be eliminated by design. But I'm
uncomfortable with the technical debt it can introduce to the
ecosystem--for example, a message bus has its own semantics that
effectively require a bus-specific library; clients' design choices can
require a message bus daemon to be running at all times; pervasive use of
the message bus by system-level software can make the implementation a
hard requirement for having a usable system; etc. (in short, we get dbus
again). By going with a filesystem-oriented approach, this risk is
averted, since the filesystem interface is well-understood, universally
supported, and somewhat future-proof. Most programs can use it without
being aware of the fact.



>> Thanks again for all your input!

> No problem. I love design discussions, I can't get enough of them.
> (The reason why I left the Devuan mailing-list is that there was too
> much ideological mumbo-jumbo, and not enough technical/design stuff.
> Speaking of which, my apologies to Alpine devs for hijacking their ML;
> if it's too OT/uninteresting, we'll take the discussion elsewhere.)

Happy to move offline, unless the Alpine devs still want to be CC'ed :)

-Jude

---
Unsubscribe: alpine-devel+unsubscribe@lists.alpinelinux.org
Help: alpine-devel+help@lists.alpinelinux.org
---

