Subject: Re: [alpine-devel] udev replacement on Alpine Linux
To: alpine-devel@lists.alpinelinux.org
From: Laurent Bercot
Message-ID: <56978822.8020205@skarnet.org>
Date: Thu, 14 Jan 2016 12:36:02 +0100

On 14/01/2016 06:55, Jude Nelson wrote:
> I think you're close. The gist of it is that vdev needs to supply a
> lot more information than the kernel gives it. In particular, its
> helper programs go on to query the properties and status of each
> device (this often requires root privileges, i.e. via privileged
> ioctl()s), and vdev gathers the information into a (much larger)
> event packet and stores it in a directory tree under /dev for
> subsequent query by less-privileged programs.

I see. I think this is exactly what could be made modular. I've heard
people say they were reluctant to use vdev because it's not KISS, and I
suspect the ioctl machinery and data gathering are a large part of the
complexity. If that part could be pluggable, i.e. if admins could choose
a "data gatherer" just complex enough for their needs, I believe it could
encourage adoption.

In other words, I'm looking at a 3-part program:
- the netlink listener
- the data gatherer
- the event publisher

Of course, for libudev to work, you would need the full data gatherer;
but if people aren't using libudev programs, they can use a simpler one,
closer to what mdev is doing.

This is all from a very high-level point of view, and I don't know the
details of the code, so I have no idea whether it's feasible for vdev,
but that's what I'm thinking off the top of my head.
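To make that split a bit more concrete, here is a rough sketch of what
the netlink-listener part alone could look like. It assumes Linux and the
NETLINK_KOBJECT_UEVENT family; the gathering and publishing steps are only
placeholder comments here, not actual vdev code:

  /* netlink listener sketch: receive raw kernel uevents and hand them off */
  #include <stdio.h>
  #include <sys/socket.h>
  #include <linux/netlink.h>

  int main (void)
  {
    struct sockaddr_nl sa = { .nl_family = AF_NETLINK, .nl_groups = 1 } ;  /* kernel uevent group */
    char buf[8192] ;
    int fd = socket(AF_NETLINK, SOCK_DGRAM, NETLINK_KOBJECT_UEVENT) ;
    if (fd < 0) return 111 ;
    if (bind(fd, (struct sockaddr *)&sa, sizeof sa) < 0) return 111 ;
    for (;;)
    {
      ssize_t r = recv(fd, buf, sizeof buf - 1, 0) ;
      if (r <= 0) continue ;
      buf[r] = 0 ;
      /* buf holds "ACTION@DEVPATH\0KEY=VALUE\0..." ; this is where a pluggable
         data gatherer would add its information and where the event publisher
         would take over. Here we just print the summary line. */
      printf("uevent: %s\n", buf) ;
    }
  }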
> Funny you mention this--I also created runfs
> (https://github.com/jcnelson/runfs) to do exactly this. In
> particular, I use it for PID files.

I have no love for mechanisms that help people keep using PID files, which
are an ugly relic that can't end up in the museum of mediaeval programming
soon enough. :P
That said, runfs is interesting, and I would love it if Unix provided such
a mechanism. Unfortunately, for now it has to rely on FUSE, which is one of
the clunkiest mutant features of Linux, and an extra layer of complexity;
so I find it cleaner if a program can achieve its functionality without
depending on such a filesystem.

> I agree that netlink is lighter, but I avoided it for two reasons:
> * Sometime down the road, I'd like to port vdev to OpenBSD.

That's a good reason, and an additional reason to separate the netlink
listener from the event publisher (and the data gatherer). The event
publisher and client library can be made 100% portable, whereas the
netlink listener and data gatherer obviously cannot.

> * There is no way to namespace netlink messages that I'm aware of.

I didn't know that - I'm no netlink expert. But that's also a good reason.
AFAICT, there are 32 netlink multicast groups, and they use hardcoded
numbers - this is ugly, or at least requires a global registry of what each
group is used for. If you can't namespace them, they become even more of a
scarce resource; although it's legitimate to use one for uevent publishing,
I'm pretty sure people will find a way to clog them with random crap very
soon - better to stay away from resources you can't reliably lock. And from
what you're saying, even the systemd people have realized that. :)
I'm not advocating netlink use for anything other than reading kernel
events. It's just that true multicast will always be more efficient than
manual broadcast, there's no way around it.

> By using a synthetic filesystem for
> message transport, I can use bind-mounts to control which device
> events get routed to which containers

I'm torn between "oooh, clever" and "omg this hack is atrocious". :)

> Yes, modulo some other mechanisms to ensure that the libudev-compat
> process doesn't get back-logged and lose messages.

What do you mean by that? If libudev-compat is, like libudev, linked into
the application, then you have no control over client behaviour; if a
client doesn't properly act on a notification, there's nothing you can do
about it and it's not your responsibility. Can you give a few details
about what you're doing client-side?

> I think both approaches are good ideas and would work just as well.
> I really like skabus's approach--I'll take a look at using it for
> message delivery as an additional (preferred?) vdev-to-libudev-compat
> message delivery mechanism :) It looks like it offers all the
> aforementioned benefits over netlink that I'm looking for.

Unfortunately, it's not published yet, because there's still a lot of work
to be done on clients. And now I'm wondering whether it would be more
efficient to store messages in anonymous files and transmit fds, instead
of transmitting copies of messages. I may have to rewrite stuff. :)
I think I'll be able to get back to work on skabus by the end of this year
- but no promises, since I'll be working on the Alpine init system as soon
as I'm done with my current contract. But I can leak a few pieces of
source code if you're interested.
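Just to illustrate the anonymous-file idea, here is a sketch of what
passing a message as a fd could look like, assuming Linux's memfd_create()
and an already-connected Unix domain socket - send_event_fd() is an
illustrative name, not skabus code:

  /* Store one message in an anonymous file, then pass the fd as ancillary
     data; every subscriber then reads the same copy instead of getting its own. */
  #define _GNU_SOURCE
  #include <string.h>
  #include <unistd.h>
  #include <sys/socket.h>
  #include <sys/uio.h>
  #include <sys/mman.h>

  int send_event_fd (int unixfd, char const *msg, size_t len)
  {
    int mfd = memfd_create("event", MFD_CLOEXEC) ;  /* anonymous, unlinked file */
    if (mfd < 0) return -1 ;
    if (write(mfd, msg, len) != (ssize_t)len) { close(mfd) ; return -1 ; }

    char dummy = 'x' ;
    struct iovec iov = { .iov_base = &dummy, .iov_len = 1 } ;
    union { char buf[CMSG_SPACE(sizeof(int))] ; struct cmsghdr align ; } cbuf ;
    struct msghdr mh = { .msg_iov = &iov, .msg_iovlen = 1,
                         .msg_control = cbuf.buf, .msg_controllen = sizeof cbuf.buf } ;
    struct cmsghdr *cm = CMSG_FIRSTHDR(&mh) ;
    cm->cmsg_level = SOL_SOCKET ;
    cm->cmsg_type = SCM_RIGHTS ;
    cm->cmsg_len = CMSG_LEN(sizeof(int)) ;
    memcpy(CMSG_DATA(cm), &mfd, sizeof(int)) ;

    int r = sendmsg(unixfd, &mh, 0) < 0 ? -1 : 0 ;
    close(mfd) ;  /* the receiver's fd keeps the file alive */
    return r ;
  }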
> A question on the implementation--what do you think of having each
> subscriber create its own Unix domain socket in a canonical
> directory, and having the sender connect as a client to each
> subscriber?

That's exactly how fifodirs work, with pipes instead of sockets.
But I don't think that's a good fit here.
A point of fifodirs is to have many-to-many communication: there are
several subscribers, but there can also be several publishers (even if in
practice there's often only one publisher). Publishers and subscribers are
completely independent. Here, you only ever have one publisher: the event
dispatcher. You only ever need one-to-many communication.
Another point of fifodirs is to avoid the need for a daemon to act as a
bus: it's notification between unrelated processes that doesn't require a
central server to relay the communication. That was important because I
didn't want my supervision system (which is supposed to manage daemons) to
itself rely on a daemon (which would then have to be unsupervised). Here,
you don't have that requirement, and you already have a daemon: the event
dispatcher is long-lived.

I think a "socketdir" mechanism is just too heavy:
- for every event, you perform opendir(), readdir() and closedir()
- for every event × subscriber pair, you perform at least socket(),
  connect(), sendmsg() and close()
- the client library needs to listen() and accept(), which means it needs
  its own thread (and I hate, hate, hate libraries that pull in thread
  support in my otherwise single-threaded programs)
- the client library needs to perform access control on the socket, to
  avoid connects from unrelated processes, and even then you can't be
  certain it's the event publisher and not a random root process

You definitely don't want a client library to be listen()ing. listen() is
server stuff - mixing client and server stuff is complex. Too much so for
what you need here.

> Since each subscriber needs its own fd to read and
> close, the directory of subscriber sockets automatically gives the
> sender a list of who to communicate with and a count of how many fds
> to create. It also makes it easy to detect and clean up a dead
> subscriber's socket: the sender can request a struct ucred from a
> subscriber to get its PID (and then other details from /proc), and if
> the process ever exits (which the sender can detect on Linux using a
> netlink process monitor, like [1]), the process that created the
> socket can be assumed to be dead and the sender can unlink it. The
> sender would rely on additional process instance-identifying
> information from /proc (like its start-time) to avoid PID-reuse
> races.

Bleh. Of course it can be made to work, but you really don't need all that
complexity. You have a daemon that wants to publish data, and several
clients that want to receive data from that daemon: it's one (long-lived)
to many (short-lived) communication, and there's a perfectly appropriate,
simple and portable IPC for that: a single Unix domain socket that your
daemon listens on and your clients connect to.
If you want to be perfectly reliable, you can implement some kind of
autoreconnect in the client library - in case you want to restart the
event publisher without killing X, for instance. But that's still a lot
simpler than playing with multiple sockets and mixing clients and servers
when you don't need to.
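For reference, the kind of thing I have in mind is about as simple as the
sketch below, assuming POSIX/Linux sockets; the names and the fixed-size
subscriber table are illustrative, not actual skabus or vdev code:

  /* One listening Unix socket; accept() registers subscribers, and every
     event is written to all of them, dropping dead ones on write error. */
  #include <string.h>
  #include <unistd.h>
  #include <sys/socket.h>
  #include <sys/un.h>

  static int subs[64] ;            /* connected subscriber fds */
  static unsigned int nsubs = 0 ;

  int publisher_socket (char const *path)  /* the daemon listens on this... */
  {
    struct sockaddr_un sa = { .sun_family = AF_UNIX } ;
    int fd = socket(AF_UNIX, SOCK_STREAM, 0) ;
    if (fd < 0) return -1 ;
    strncpy(sa.sun_path, path, sizeof sa.sun_path - 1) ;
    unlink(path) ;
    if (bind(fd, (struct sockaddr *)&sa, sizeof sa) < 0 || listen(fd, SOMAXCONN) < 0)
    { close(fd) ; return -1 ; }
    return fd ;
  }

  void publisher_accept (int lfd)  /* ...and clients just connect() to it */
  {
    int fd = accept(lfd, 0, 0) ;
    if (fd < 0) return ;
    if (nsubs < sizeof subs / sizeof subs[0]) subs[nsubs++] = fd ;
    else close(fd) ;
  }

  void publisher_send (char const *event, size_t len)  /* one-to-many broadcast */
  {
    for (unsigned int i = 0 ; i < nsubs ; )
      if (send(subs[i], event, len, MSG_NOSIGNAL) < 0)
      { close(subs[i]) ; subs[i] = subs[--nsubs] ; }
      else i++ ;
  }

The client side is then a single connect(), plus whatever autoreconnect
logic the client library wants to add.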
> Thanks again for all your input!

No problem. I love design discussions, I can't get enough of them.
(The reason why I left the Devuan mailing-list is that there was too much
ideological mumbo-jumbo, and not enough technical/design stuff. Speaking
of which, my apologies to the Alpine devs for hijacking their ML; if it's
too OT/uninteresting, we'll take the discussion elsewhere.)

--
Laurent

---
Unsubscribe: alpine-devel+unsubscribe@lists.alpinelinux.org
Help:        alpine-devel+help@lists.alpinelinux.org
---