
Re: [alpine-devel] udev replacement on Alpine Linux

From: Jude Nelson <judecn_at_gmail.com>
Date: Tue, 19 Jan 2016 01:20:32 -0500

Hi Laurent,


> I have a standalone netlink listener:
> http://skarnet.org/software/s6-linux-utils/s6-uevent-listener.html
> Any data gatherer / event dispatcher program can be used behind it.
> I'm currently using it as "s6-uevent-listener s6-uevent-spawner mdev",
> which spawns an mdev instance per uevent.
> Ideally, I should be able to use it as something like
> "s6-uevent-listener vdev-data-gatherer vdev-event-dispatcher" and have
> a pipeline of 3 long-lived processes, every process being independently
> replaceable on the command-line by any other implementation that uses
> the same API.
>
>
Sounds good! I'll aim to add that in the medium-term.
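
To make sure I'm reading the contract right, here is the rough shape I'd
expect a replaceable middle stage of that pipeline to take. This is only
a sketch: I'm assuming each event arrives on stdin as NUL-terminated
KEY=VALUE strings, with a lone extra NUL ending the event (my reading of
the netlink-style framing; whatever s6-uevent-listener actually emits is
authoritative), and handle_event() is a placeholder, not real vdev code.

#include <stdio.h>

#define MAX_FIELDS    64
#define MAX_FIELD_LEN 512

static void handle_event(char fields[][MAX_FIELD_LEN], int n)
{
    /* Placeholder: a real stage would gather data or dispatch here,
     * then forward the event to the next stage on stdout. */
    for (int i = 0; i < n; i++)
        fprintf(stderr, "field: %s\n", fields[i]);
}

int main(void)
{
    static char fields[MAX_FIELDS][MAX_FIELD_LEN];
    int n = 0, len = 0, c;

    while ((c = getchar()) != EOF) {
        if (c != '\0') {
            if (len < MAX_FIELD_LEN - 1)
                fields[n][len++] = (char)c;
            continue;
        }
        if (len == 0) {              /* empty string: end of event */
            handle_event(fields, n);
            n = 0;
            continue;
        }
        fields[n][len] = '\0';       /* end of one KEY=VALUE field */
        if (n < MAX_FIELDS - 1)
            n++;
        len = 0;
    }
    return 0;
}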


>
>> * Unlike netlink sockets, a program cannot
>> control the size of an inotify descriptor's "receive" buffer. This
>> is a system-wide constant, defined in
>> /proc/sys/fs/inotify/max_queued_events. However, libudev offers
>> clients the ability to do just this (via
>> udev_monitor_set_receive_buffer_size). This is what I originally
>> meant--libudev-compat needs to ensure that the desired receive buffer
>> size is honored.
>>
>
> Reading the udev_monitor doc pages stirs up horrible memories of the
> D-Bus API. Urge to destroy world rising.
>
> It looks like udev_monitor_set_receive_buffer_size() could be
> completely stubbed out for your implementation via inotify. It is only
> useful when events queue up in the kernel buffer because a client isn't
> reading them fast enough; but with your system, events are stored in
> the filesystem so they will never be lost - so there's no such thing as
> a meaningful "kernel buffer" in your case, and nobody cares what its
> size is: clients will always have access to the full set of events.
> "return 0;" is the implementation you want here.
>
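
Agreed. For concreteness, a minimal version of that stub: the signature
matches libudev's public one, and the forward declaration stands in for
whatever libudev-compat defines internally.

struct udev_monitor;    /* opaque; libudev-compat's own definition */

/* Events persist in the filesystem and can never be dropped, so there
 * is no kernel receive buffer whose size matters; accept any request. */
int udev_monitor_set_receive_buffer_size(struct udev_monitor *udev_monitor,
                                         int size)
{
    (void)udev_monitor;
    (void)size;
    return 0;
}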

<snip>


>
> Blech.
> I understand the API is inherently complex and kinda enforces the
> system's architecture - which is very similar to what systemd does, so
> it's very unsurprising to me that systemd phagocytized udev: those two
> were *made* to be together - but it looks like by deciding to do things
> differently and wanting to still provide compatibility, you ended up
> coding something that's just as complex, and more convoluted (since
> you're not using the original mechanisms) than the original.
>
> The filter mechanism is horribly specific and does not leave much
> room for alternative implementations, so I know it's hard to do
> correctly, but it seems to me that your implementation gets the worst
> of both worlds:
> - one of your implementation's advantages is that clients can never
> lose events, but by piling your socketpair thingy onto it for an "accurate"
> udev_monitor emulation, you make it so clients can actually shoot
> themselves in the foot. It may be accurate, but it's lower quality than
> your idea permits.
>
> - the original udev implementation's advantage is that clients are never
> woken up when an event arrives if the event doesn't pass the filter. Here,
> your application will never be woken up indeed, but libudev-compat will be,
> since you will get readability on your inotify descriptor. Filters are
> not server-side (or even kernel-side) as udev intended, they're
> client-side, and that's not efficient.
>
> I believe that you'd be much better off simply using a normal Unix
> socket connection from the client to an event dispatcher daemon, and
> implementing a small protocol where udev_monitor_filter primitives just
> write strings to the socket, and the server reads them and implements
> filters server-side by *not* linking filtered events to the
> client's event directory. This way, clients really aren't woken up by
> events that do not pass the filter.


I agree with everything you have said. It is true that libudev-compat
emphasizes compatibility to the point where it sacrifices simplicity and
performance to achieve correctness (i.e. consistency with libudev's
behavior). This is not because I believe in the soundness of libudev's
design, but because I'm trying to avoid any breakage.

Believe me, I would love to get away from libudev completely. If programs
expect the device manager to expose device metadata and publish events,
then the device manager should do so in a way that lets programs access
them directly, without an additional client library. This is what vdev
strives to do--its helpers expose all device metadata as a set of
easy-to-parse files, and propagate events through the VFS (but I'm in favor
of moving towards using an event dispatcher like you suggest, since that
would be much simpler to implement and only incur a minimal increase to the
subscriber's interface complexity).
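
To illustrate the "no client library" point concretely: with metadata in
plain files, reading a device's properties needs nothing beyond stdio.
The path below is hypothetical, standing in for wherever vdev's helpers
actually write a device's metadata.

#include <stdio.h>

int main(void)
{
    char line[256];
    /* Hypothetical per-device properties file written by a vdev helper. */
    FILE *f = fopen("/dev/metadata/sda/properties", "r");

    if (!f) {
        perror("fopen");
        return 1;
    }
    while (fgets(line, sizeof(line), f))
        fputs(line, stdout);    /* KEY=VALUE lines, one per property */
    fclose(f);
    return 0;
}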

I think switching to a carefully-designed event dispatcher fixes both of
these problems, while allowing me to retain the unmodified
event-filtering logic from libudev. Specifically, the event dispatcher
would use a UNIX domain socket to establish a shared socket pair with each
libudev-compat client, and libudev-compat would install the BPF programs on
the client's end of the socket pair (this would also preserve the ability
to set the receiving buffer size). This approach eliminates zero-copy
multicast, but as you pointed out earlier this is probably not a problem in
practice anymore, given how small messages are and how infrequent they
appear to be. Moreover, device events could still be namespaced, for
example:
* each context would run its own event dispatcher
* the parent context runs a client program (an "event-forwarder") that
writes events to a FIFO
* when the child context is started, the FIFO gets bind-mounted to a
canonical location for its event dispatcher to connect to and receive events
* the parent context controls which events get propagated to its children
by interposing filtering programs between the event-forwarder and the
shared FIFO (e.g. don't want the child context to see USB hotplugs? Then
have the parent context catch USB events and simply not write them to the
child's FIFO endpoint).
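
Here is the client-side sketch I promised above. Everything in it is
illustrative: the socket path is hypothetical, the accept-all filter
stands in for whatever BPF program libudev-compat would compile from the
client's udev_monitor_filter_* calls, and I'm assuming the dispatcher
creates a SOCK_DGRAM socketpair (one datagram per event, so the socket
filter sees whole messages) and passes our end over SCM_RIGHTS.

#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <sys/un.h>
#include <linux/filter.h>
#include <unistd.h>

static int dispatcher_connect(const char *path)
{
    struct sockaddr_un addr = { .sun_family = AF_UNIX };
    int fd = socket(AF_UNIX, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;
    strncpy(addr.sun_path, path, sizeof(addr.sun_path) - 1);
    if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Receive our end of the event socketpair from the dispatcher. */
static int recv_event_fd(int conn)
{
    char dummy;
    char cbuf[CMSG_SPACE(sizeof(int))];
    struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
    struct msghdr msg = {
        .msg_iov = &iov, .msg_iovlen = 1,
        .msg_control = cbuf, .msg_controllen = sizeof(cbuf)
    };
    struct cmsghdr *cmsg;
    int fd;

    if (recvmsg(conn, &msg, 0) <= 0)
        return -1;
    cmsg = CMSG_FIRSTHDR(&msg);
    if (!cmsg || cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS)
        return -1;
    memcpy(&fd, CMSG_DATA(cmsg), sizeof(fd));
    return fd;
}

int main(void)
{
    /* Hypothetical rendezvous point for the dispatcher. */
    int conn = dispatcher_connect("/run/vdev/dispatcher.sock");
    int evfd = conn < 0 ? -1 : recv_event_fd(conn);
    int rcvbuf = 128 * 1024;

    /* Accept-all placeholder for the client's compiled filters. */
    struct sock_filter accept_all[] = {
        BPF_STMT(BPF_RET | BPF_K, 0xffffffff)
    };
    struct sock_fprog prog = { .len = 1, .filter = accept_all };

    if (evfd < 0)
        return 1;

    /* The filter runs on our end of the socketpair, so the client is
     * only woken for events that pass it... */
    setsockopt(evfd, SOL_SOCKET, SO_ATTACH_FILTER, &prog, sizeof(prog));

    /* ...and the receive-buffer knob stays meaningful, too. */
    setsockopt(evfd, SOL_SOCKET, SO_RCVBUF, &rcvbuf, sizeof(rcvbuf));

    /* From here on, libudev-compat reads events from evfd exactly as
     * libudev reads them from its netlink socket. */
    return 0;
}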


>
> But I'm uncomfortable with the technical debt it can introduce to the
>> ecosystem--for example, a message bus has its own semantics that
>> effectively require a bus-specific library, clients' design choices
>> can require a message bus daemon to be running at all times,
>> pervasive use of the message bus by system-level software can make
>> the implementation a hard requirement for having a usable system,
>> etc. (in short, we get dbus again).
>>
>
> Huh?
> I wasn't suggesting using a generic bus.
> I was suggesting that the natural architecture for an event dispatcher
> was that of a single publisher (the server) with multiple subscribers
> (the clients). And that was similar to a bus - except simpler, because
> you don't even have multiple publishers.
>
> It's not about using a system bus or anything of the kind. It's about
> writing the event dispatcher and the client library as you'd write a bus
> server and a bus client library (and please, forget about the insane
> D-Bus model of message-passing between symmetrical peers - a client-server
> model is much simpler, and easier to implement, at least on Unix).
>

Sorry--let me try to clarify what I meant. I was trying to say that one of
the things that appeals to me about exposing events through a specialized
filesystem is that it exposes a well-understood, universal, and easy-to-use
API. All existing file-oriented tools would work with it, without
modification. The downside is that it requires a somewhat complex
implementation, as we discussed.

I'm not suggesting that we look to dbus for inspiration :) I was trying to
point out that while the upside of using an event dispatcher is that it has
a simple implementation, the downside is that without careful design, an
event dispatcher with a simple implementation can still evolve a complex
contract with its client programs that is difficult to honor (so much so
that a complex client library is all but required to mediate access to the
dispatcher). I was pointing out that any system-wide complexity introduced
by specifying a dispatcher-specific publish/subscribe protocol for
device-aware applications should be considered as part of the "total
complexity" of using an event dispatcher, so it can be minimized up-front
(this was the "minimal increase to the subscriber's interface complexity" I
mentioned above). But bringing this up was very academic of me ;) I don't
think that using a carefully-designed event dispatcher is nearly as complex
as using a filesystem.

I feel like I can replace eventfs with an event dispatcher that is both
simple to implement and simple to use, while lowering the overall
complexity of device propagation and retaining enough functionality to
achieve libudev compatibility for legacy programs.

Thanks,
Jude


