In-Reply-To: <569CD71C.2020407@skarnet.org>
Date: Tue, 19 Jan 2016 01:20:32 -0500
Subject: Re: [alpine-devel] udev replacement on Alpine Linux
From: Jude Nelson
To: Laurent Bercot
Cc: alpine-devel@lists.alpinelinux.org

Hi Laurent,

> I have a standalone netlink listener:
> http://skarnet.org/software/s6-linux-utils/s6-uevent-listener.html
> Any data gatherer / event dispatcher program can be used behind it.
> I'm currently using it as "s6-uevent-listener s6-uevent-spawner mdev",
> which spawns a mdev instance per uevent.
> Ideally, I should be able to use it as something like
> "s6-uevent-listener vdev-data-gatherer vdev-event-dispatcher" and have
> a pipeline of 3 long-lived processes, every process being independently
> replaceable on the command-line by any other implementation that uses
> the same API.
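For reference on what the first stage of such a pipeline receives: a kernel uevent read from a NETLINK_KOBJECT_UEVENT socket is a datagram holding an `ACTION@DEVPATH` header followed by NUL-separated `KEY=VALUE` strings. Below is a minimal C sketch of decoding one such datagram. `struct uevent`, `uevent_parse` and `uevent_get` are hypothetical names, not part of s6-linux-utils or vdev, and the fixed field limit is an arbitrary assumption.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <string.h>

#define UEVENT_MAX_FIELDS 64

/* Hypothetical container for one decoded uevent. The pointers alias the
 * caller's buffer, so the buffer must outlive the struct. */
struct uevent {
    const char *header;                    /* e.g. "add@/devices/..." */
    const char *fields[UEVENT_MAX_FIELDS]; /* "KEY=VALUE" strings */
    size_t nfields;
};

/* Split a raw datagram of `len` bytes into its NUL-terminated pieces.
 * Returns 0 on success, -1 if it does not look like a kernel uevent. */
int uevent_parse(const char *buf, size_t len, struct uevent *ev)
{
    size_t off = 0;

    ev->header = NULL;
    ev->nfields = 0;
    while (off < len) {
        const char *s = buf + off;
        size_t n = strnlen(s, len - off);
        if (n == len - off)
            return -1;                 /* last piece is not NUL-terminated */
        if (ev->header == NULL) {
            if (strchr(s, '@') == NULL)
                return -1;             /* kernel messages start with ACTION@DEVPATH */
            ev->header = s;
        } else if (n > 0 && ev->nfields < UEVENT_MAX_FIELDS) {
            ev->fields[ev->nfields++] = s;
        }
        off += n + 1;
    }
    return ev->header != NULL ? 0 : -1;
}

/* Return the value for `key` (e.g. "SUBSYSTEM"), or NULL if absent. */
const char *uevent_get(const struct uevent *ev, const char *key)
{
    size_t klen = strlen(key);

    for (size_t i = 0; i < ev->nfields; i++)
        if (strncmp(ev->fields[i], key, klen) == 0 && ev->fields[i][klen] == '=')
            return ev->fields[i] + klen + 1;
    return NULL;
}
```

A data gatherer could call `uevent_get(&ev, "SUBSYSTEM")` on each event before passing it down the pipeline; how events are framed between the pipeline's processes is left open here.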


Sounds good! I'll aim to add that in the medium term.

>> * Unlike netlink sockets, a program cannot
>> control the size of an inotify descriptor's "receive" buffer. This
>> is a system-wide constant, defined in
>> /proc/sys/fs/inotify/max_queued_events. However, libudev offers
>> clients the ability to do just this (via
>> udev_monitor_set_receive_buffer_size). This is what I originally
>> meant--libudev-compat needs to ensure that the desired receive buffer
>> size is honored.
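To make the contrast above concrete: a socket's receive buffer is a per-descriptor property, whereas the inotify queue limit is a single system-wide setting. A minimal C sketch, assuming Linux semantics; `set_rcvbuf` and `rcvbuf_is_per_socket` are hypothetical helpers. Per socket(7), the kernel doubles the requested SO_RCVBUF value to allow for bookkeeping overhead and caps it at `net.core.rmem_max`, so the helper reports the size actually granted rather than the size requested.

```c
#define _GNU_SOURCE
#include <sys/socket.h>
#include <unistd.h>

/* Request a receive buffer of `bytes` on `fd`; return the size the kernel
 * actually granted, or -1 on error. */
int set_rcvbuf(int fd, int bytes)
{
    int granted = 0;
    socklen_t len = sizeof granted;

    if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &bytes, sizeof bytes) < 0)
        return -1;
    if (getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &granted, &len) < 0)
        return -1;
    return granted;
}

/* Both ends of one socketpair() can carry different buffer sizes, which is
 * what a per-client setting like libudev's needs. Returns 1 if so. */
int rcvbuf_is_per_socket(void)
{
    int sv[2];

    if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) < 0)
        return -1;
    int a = set_rcvbuf(sv[0], 4096);
    int b = set_rcvbuf(sv[1], 16384);
    close(sv[0]);
    close(sv[1]);
    return a > 0 && b > a;
}
```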

> Reading the udev_monitor doc pages stirs up horrible memories of the
> D-Bus API. Urge to destroy world rising.

> It looks like udev_monitor_set_receive_buffer_size() could be
> completely stubbed out for your implementation via inotify. It is only
> useful when events queue up in the kernel buffer because a client isn't
> reading them fast enough; but with your system, events are stored in
> the filesystem so they will never be lost - so there's no such thing as
> a meaningful "kernel buffer" in your case, and nobody cares what its
> size is: clients will always have access to the full set of events.
> "return 0;" is the implementation you want here.
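The stub suggested above, written out as a sketch. It assumes libudev-compat keeps libudev's public prototype for this call and stores events in the filesystem, so there is no kernel queue whose size could matter.

```c
/* libudev exposes struct udev_monitor only as an opaque type, so the stub
 * can leave it incomplete. */
struct udev_monitor;

/* With events stored in the filesystem there is no kernel buffer to resize:
 * accept any size and report success. */
int udev_monitor_set_receive_buffer_size(struct udev_monitor *udev_monitor, int size)
{
    (void)udev_monitor;
    (void)size;
    return 0;
}
```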

<snip>

> Blech.
> I understand the API is inherently complex and kinda enforces the
> system's architecture - which is very similar to what systemd does, so
> it's very unsurprising to me that systemd phagocyted udev: those two
> were *made* to be together - but it looks like by deciding to do things
> differently and wanting to still provide compatibility, you ended up
> coding something that's just as complex, and more convoluted (since
> you're not using the original mechanisms) than the original.

> The filter mechanism is horribly specific and does not leave much
> room for alternative implementations, so I know it's hard to do
> correctly, but it seems to me that your implementation gets the worst
> of both worlds:
> - one of your implementation's advantages is that clients can never
> lose events, but by piling your socketpair thingy onto it for an "accurate"
> udev_monitor emulation, you make it so clients can actually shoot
> themselves in the foot. It may be accurate, but it's lower quality than
> your idea permits.
> - the original udev implementation's advantage is that clients are never
> woken up when an event arrives if the event doesn't pass the filter. Here,
> your application will never be woken up indeed, but libudev-compat will be,
> since you will get readability on your inotify descriptor. Filters are
> not server-side (or even kernel-side) as udev intended, they're client-side,
> and that's not efficient.

> I believe that you'd be much better off simply using a normal Unix
> socket connection from the client to an event dispatcher daemon, and
> implementing a small protocol where udev_monitor_filter primitives just
> write strings to the socket, and the server reads them and implements
> filters server-side by *not* linking filtered events to the
> client's event directory. This way, clients really aren't woken up by
> events that do not pass the filter.
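A minimal C sketch of the server half of the protocol proposed above. The one-line wire format `subsystem <name> [<devtype>]` and all names here are assumptions for illustration. The matching rule (an empty filter table admits everything; otherwise any entry with the same subsystem and either no devtype or the same devtype admits the event) is intended to mirror how libudev combines its subsystem/devtype matches; tag matches are left out.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define MAX_FILTERS 16
#define NAME_LEN 32

/* One subsystem/devtype match as the server sees it.
 * An empty devtype means "any devtype". */
struct filter {
    char subsystem[NAME_LEN];
    char devtype[NAME_LEN];
};

struct client_filters {
    struct filter f[MAX_FILTERS];
    int n;
};

/* Parse one line of the hypothetical format "subsystem <name> [<devtype>]".
 * Returns 0 on success, -1 on malformed input or a full table. */
int filters_add_line(struct client_filters *cf, const char *line)
{
    struct filter tmp = {{0}, {0}};
    int got = sscanf(line, "subsystem %31s %31s", tmp.subsystem, tmp.devtype);

    if (got < 1 || cf->n >= MAX_FILTERS)
        return -1;
    cf->f[cf->n++] = tmp;
    return 0;
}

/* Server-side decision: should this event reach the client at all? */
bool filters_match(const struct client_filters *cf,
                   const char *subsystem, const char *devtype)
{
    if (cf->n == 0)
        return true;                    /* no filters: deliver everything */
    for (int i = 0; i < cf->n; i++) {
        const struct filter *f = &cf->f[i];
        if (strcmp(f->subsystem, subsystem) != 0)
            continue;
        if (f->devtype[0] == '\0' ||
            (devtype != NULL && strcmp(f->devtype, devtype) == 0))
            return true;
    }
    return false;
}
```

When `filters_match` returns false, the dispatcher would simply skip linking the event into that client's directory, so the client is never woken for it.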

I agree with everything you have said. It is true that libudev-compat emphasizes compatibility to the point where it sacrifices simplicity and performance to achieve correctness (i.e., consistency with libudev's behavior). This is not because I believe in the soundness of libudev's design, but because I'm trying to avoid any breakage.

Believe me, I would love to get away from libudev completely. If programs expect the device manager to expose device metadata and publish events, then the device manager should do so in a way that lets programs access them directly, without an additional client library. This is what vdev strives to do--its helpers expose all device metadata as a set of easy-to-parse files, and propagate events through the VFS (but I'm in favor of moving towards using an event dispatcher like you suggest, since that would be much simpler to implement and only incur a minimal increase to the subscriber's interface complexity).

I think switching to a carefully designed event dispatcher fixes both of these problems, while allowing me to retain the unmodified event-filtering logic from libudev. Specifically, the event dispatcher would use a UNIX domain socket to establish a shared socket pair with each libudev-compat client, and libudev-compat would install the BPF programs on the client's end of the socket pair (this would also preserve the ability to set the receive buffer size). This approach eliminates zero-copy multicast, but as you pointed out earlier, this is probably not a problem in practice anymore, given how small messages are and how infrequent they appear to be. Moreover, device events could still be namespaced, for example:
* each context would run its own event dispatcher
* the parent context would run a client program (an "event-forwarder") that writes events to a FIFO
* when the child context is started, the FIFO would be bind-mounted to a canonical location, where the child's event dispatcher connects to it and receives events
* the parent context would control which events get propagated to its children by interposing filtering programs between the event-forwarder and the shared FIFO (e.g., don't want the child context to see USB hotplugs? Then capture USB events in the parent context and don't write them to the child's FIFO endpoint.)
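The socket-pair design in the paragraph before the list relies on Linux running classic BPF socket filters on AF_UNIX datagram sockets, not only on netlink sockets. A minimal C sketch, assuming Linux: the dispatcher's end sends two messages, and a filter attached to the client's end keeps only those whose first byte matches a tag. The one-byte tag is a stand-in for illustration; libudev's real filters inspect its own message header, and none of this is libudev-compat's actual code.

```c
#define _GNU_SOURCE
#include <linux/filter.h>
#include <sys/socket.h>
#include <unistd.h>

/* Attach a classic BPF program to `fd` that keeps only datagrams whose
 * first byte equals `tag`; anything else is dropped before it is queued. */
int attach_tag_filter(int fd, unsigned char tag)
{
    struct sock_filter code[] = {
        BPF_STMT(BPF_LD | BPF_B | BPF_ABS, 0),          /* A = payload[0] */
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, tag, 0, 1), /* A == tag ? next : skip */
        BPF_STMT(BPF_RET | BPF_K, 0xffffffffu),         /* accept whole datagram */
        BPF_STMT(BPF_RET | BPF_K, 0),                   /* drop */
    };
    struct sock_fprog prog = {
        .len = sizeof code / sizeof code[0],
        .filter = code,
    };
    return setsockopt(fd, SOL_SOCKET, SO_ATTACH_FILTER, &prog, sizeof prog);
}

/* Dispatcher writes on sv[0]; the filtered client reads on sv[1].
 * Returns how many datagrams reached the client, or -1 on setup failure. */
int deliver_demo(void)
{
    int sv[2], received = 0;
    char buf[64];

    if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) < 0)
        return -1;
    if (attach_tag_filter(sv[1], 'u') < 0) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }
    send(sv[0], "usb add", 7, 0);    /* first byte 'u': passes the filter */
    send(sv[0], "block add", 9, 0);  /* first byte 'b': dropped on the client side */
    while (recv(sv[1], buf, sizeof buf, MSG_DONTWAIT) > 0)
        received++;
    close(sv[0]);
    close(sv[1]);
    return received;
}
```

Because the filter runs as a datagram is queued on the client's socket, a rejected event never makes that socket readable, and the same descriptor can still be given its own SO_RCVBUF size.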

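The forwarder described in the last bullet of the namespacing list can be very small. A C sketch under explicit assumptions: events travel as one text line each, `<subsystem> <action> <devpath>` (a hypothetical format, not vdev's), and the policy is a single blocked subsystem. The demo creates a temporary FIFO with mkfifo(3) in place of the bind-mounted one and reads back what the child's dispatcher would see.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Copy event lines from `in` to `out_fd`, skipping those whose first token
 * (the subsystem) equals `blocked`. Assumes lines shorter than 512 bytes.
 * Returns the number of lines forwarded, or -1 on a short write. */
int forward_events(FILE *in, int out_fd, const char *blocked)
{
    char line[512];
    size_t blen = strlen(blocked);
    int forwarded = 0;

    while (fgets(line, sizeof line, in) != NULL) {
        if (strncmp(line, blocked, blen) == 0 && line[blen] == ' ')
            continue;                   /* the child never sees this event */
        size_t n = strlen(line);
        if (write(out_fd, line, n) != (ssize_t)n)
            return -1;
        forwarded++;
    }
    return forwarded;
}

/* Push two sample events through a temporary FIFO and copy what the child
 * side would read into `out`. Returns the forward count, or -1 on failure. */
int fifo_demo(char *out, size_t outlen)
{
    static const char events[] =
        "usb add /devices/usb1\n"
        "block add /devices/sda\n";
    char dir[] = "/tmp/fwdXXXXXX", path[64];
    int rd = -1, wr = -1, count = -1;
    FILE *in = NULL;

    out[0] = '\0';
    if (mkdtemp(dir) == NULL)
        return -1;
    snprintf(path, sizeof path, "%s/events", dir);
    if (mkfifo(path, 0600) == 0) {
        /* Open the read end first (non-blocking) so that opening the
         * write end does not wait for a reader. */
        rd = open(path, O_RDONLY | O_NONBLOCK);
        wr = rd < 0 ? -1 : open(path, O_WRONLY);
        in = fmemopen((void *)events, sizeof events - 1, "r");
        if (wr >= 0 && in != NULL) {
            count = forward_events(in, wr, "usb");
            close(wr);
            wr = -1;
            ssize_t got = read(rd, out, outlen - 1);
            out[got > 0 ? got : 0] = '\0';
        }
    }
    if (in != NULL)
        fclose(in);
    if (wr >= 0)
        close(wr);
    if (rd >= 0)
        close(rd);
    unlink(path);
    rmdir(dir);
    return count;
}
```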
>> But I'm uncomfortable with the technical debt it can introduce to the
>> ecosystem--for example, a message bus has its own semantics that
>> effectively require a bus-specific library, clients' design choices
>> can require a message bus daemon to be running at all times,
>> pervasive use of the message bus by system-level software can make
>> the implementation a hard requirement for having a usable system,
>> etc. (in short, we get dbus again).

> Huh?
> I wasn't suggesting using a generic bus.
> I was suggesting that the natural architecture for an event dispatcher
> was that of a single publisher (the server) with multiple subscribers
> (the clients). And that was similar to a bus - except simpler, because
> you don't even have multiple publishers.

> It's not about using a system bus or anything of the kind. It's about
> writing the event dispatcher and the client library as you'd write a bus
> server and a bus client library (and please, forget about the insane
> D-Bus model of message-passing between symmetrical peers - a client-server
> model is much simpler, and easier to implement, at least on Unix).
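The single-publisher, multiple-subscriber shape described above can be sketched in a few lines of C. Assumptions: the dispatcher already holds one connected descriptor per client (socket pairs stand in for accepted connections), and `struct dispatcher`, `publish` and `fanout_demo` are hypothetical names. Sends are non-blocking, so one slow subscriber cannot stall the others.

```c
#define _GNU_SOURCE
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

#define MAX_CLIENTS 8

/* Hypothetical dispatcher state: one connected descriptor per subscriber. */
struct dispatcher {
    int client_fd[MAX_CLIENTS];
    int nclients;
};

/* Publish one event to every subscriber without letting a slow one block
 * the others. Returns how many subscribers accepted the event. */
int publish(const struct dispatcher *d, const char *event)
{
    size_t len = strlen(event);
    int delivered = 0;

    for (int i = 0; i < d->nclients; i++) {
        ssize_t r = send(d->client_fd[i], event, len, MSG_DONTWAIT | MSG_NOSIGNAL);
        if (r == (ssize_t)len)
            delivered++;
        /* Otherwise the subscriber is full (EAGAIN) or gone; what to do
         * then is a policy decision this sketch leaves open. */
    }
    return delivered;
}

/* Three subscribers, one event: returns 1 if each received it intact. */
int fanout_demo(void)
{
    static const char ev[] = "add@/devices/usb1";
    struct dispatcher d = { .nclients = 0 };
    int sub_fd[MAX_CLIENTS], ok = 0;
    char buf[64];

    for (int i = 0; i < 3; i++) {
        int sv[2];
        if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) < 0)
            break;
        d.client_fd[d.nclients] = sv[0];   /* publisher's end */
        sub_fd[d.nclients++] = sv[1];      /* subscriber's end */
    }
    int delivered = publish(&d, ev);
    for (int i = 0; i < d.nclients; i++) {
        ssize_t n = recv(sub_fd[i], buf, sizeof buf - 1, MSG_DONTWAIT);
        if (n > 0) {
            buf[n] = '\0';
            ok += strcmp(buf, ev) == 0;
        }
        close(sub_fd[i]);
        close(d.client_fd[i]);
    }
    return delivered == 3 && ok == 3;
}
```

With only one publisher, the server needs no routing or addressing between peers, which is where this model is simpler than a general bus.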

Sorry--let me try to clarify what I meant. I was trying to say that one of the things that appeal to me about exposing events through a specialized filesystem is that it offers a well-understood, universal, and easy-to-use API: all existing file-oriented tools would work with it, without modification. The downside is that it requires a somewhat complex implementation, as we discussed.
I'm not suggesting that we look to dbus for inspiration :) I was trying to point out that while the upside of an event dispatcher is its simple implementation, the downside is that without careful design, even a simply implemented dispatcher can evolve a complex contract with its client programs that is difficult to honor (so much so that a complex client library becomes all but required to mediate access to it). My point was that any system-wide complexity introduced by specifying a dispatcher-specific publish/subscribe protocol for device-aware applications should be counted as part of the "total complexity" of using an event dispatcher, so that it can be minimized up-front (this is the "minimal increase to the subscriber's interface complexity" I mentioned above). But bringing this up was very academic of me ;) I don't think that using a carefully designed event dispatcher is nearly as complex as using a filesystem.
I think I can replace eventfs with an event dispatcher that is both simple to implement and simple to use, while lowering the overall complexity of device propagation and retaining enough functionality to achieve libudev compatibility for legacy programs.

Thanks,
Jude