From nobody Thu Mar 28 10:34:08 2024
Date: Sat, 25 Jul 2015 21:11:41 -0300
From: Alan Pillay
To: alpine-devel@lists.alpinelinux.org, judecn@gmail.com, sin@2f30.org,
 hiltjo@codemadness.org, rob@landley.net, frank@tuxrocks.com, dev@frign.de
Subject: [alpine-devel] udev replacement on Alpine Linux

Dear Alpine Linux developers and mailing-list lurkers,

udev is currently being used on Alpine version 3.2.2, but we all know
it detracts from the philosophy to keep things simple, small and
efficient.
There are many programs out there that could replace udev and help
Alpine get in a better shape. I will list some that I know.

[mdev] there are 2 mdev implementations that I know, busybox's and
toybox's. On Alpine Linux, busybox already comes installed by default
(and its mdev comes with it, which is weird since it isn't currently
used, but I digress)

[smdev] smdev is an even simpler implementation of a device manager by
the well-known suckless developers. If it is mature enough, certainly
a high contender.

[eudev] a fork of udev from the gentoo developers. Doesn't appear to
be as small as others, but should be more easily integrated into
alpine.

[vdev] a device manager with an approach a bit different, offers an
optional filesystem interface that implements a per-process view of
/dev. Possibly the least simple alternative, but interesting
nonetheless.

I thought about using this means of communication so developers can
discuss this matter that impacts the use of the Alpine Linux
distribution as a whole.
I am also emailing relevant parties (developers of the cited device
managers, so they can participate if they desire).
Thanks for the attention.

KISS
---
Unsubscribe: alpine-devel+unsubscribe@lists.alpinelinux.org
Help: alpine-devel+help@lists.alpinelinux.org
---

From nobody Thu Mar 28 10:34:08 2024
Date: Sat, 25 Jul 2015 20:51:12 -0400
From: Jude Nelson
To: Alan Pillay
Cc: alpine-devel@lists.alpinelinux.org, sin@2f30.org, hiltjo@codemadness.org,
 rob@landley.net, frank@tuxrocks.com, dev@frign.de
Subject: [alpine-devel] Re: udev replacement on Alpine Linux

Hi everyone,

> [vdev] a device manager with an approach a bit different, offers an
> optional filesystem interface that implements a per-process view of
> /dev. Possibly the least simple alternative, but interesting
> nonetheless.

Thank you for your interest in vdev. I am the principal developer. I
hope you all don't mind me chiming in to provide a little bit more
information.

First, I can't emphasize the "optional" qualifier enough regarding the
filesystem component vdevfs. The hotplug daemon vdevd and the
libudev-compat library are meant to replace udevd and libudev
respectively; the filesystem is an add-on that can be used
independently of whatever device manager is running. There are more
detailed write-ups on the design goals for each of the three vdev
components in [0] and [1].

Second, I would like to point out that vdevd by itself is not too
different from mdev, nldev, and smdev. The udev-like behavior comes
almost entirely from the shell scripts and auxiliary helper programs
it executes in reaction to device uevents. I bring this up because
these scripts and helper programs could easily be ported to mdev,
nldev, and smdev simply by providing a wrapper that sets the
appropriate environment variables (a listing can be found in Appendix
A of [2]).

Third, I would like to point out that libudev-compat is *not*
dependent on vdevd or any device manager. A libudev-compat process
receives device events by watching for new files written into a
well-defined directory in a RAM-backed filesystem. vdevd simply runs a
script to fill in /run/udev and write device event files in order to
communicate with libudev-compat processes.

I'm happy to help test and evaluate vdev with Alpine. This document
[2] contains a tutorial on how to try it out without installing, and
what to send me if something breaks. We'll be making an alpha branch
soon (see [3]).

[0] http://judecnelson.blogspot.com/2015/01/introducing-vdev.html
[1] https://github.com/jcnelson/vdev/issues/32
[2] https://github.com/jcnelson/vdev/blob/master/how-to-test.md
[3] https://github.com/jcnelson/vdev/issues/33

Thanks,
Jude
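
Jude's third point is that a libudev-compat process learns about devices by watching a directory for newly written event files. A minimal sketch of that consumer side, using Linux inotify, might look like this. The function names `watch_event_dir`, `read_event_batch` and `remember_name` are hypothetical, the directory is whatever the caller passes in, and waiting for `IN_CLOSE_WRITE | IN_MOVED_TO` (file finished or renamed into place) is an assumption of this sketch; it is not libudev-compat's actual code or file format.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/inotify.h>
#include <unistd.h>

/* Hypothetical helper: watch DIR for event files that are complete,
 * i.e. closed after writing or renamed into place. Returns an inotify
 * descriptor, or -1 on error. */
int watch_event_dir(const char *dir)
{
    int fd = inotify_init1(IN_CLOEXEC);
    if (fd < 0)
        return -1;
    if (inotify_add_watch(fd, dir, IN_CLOSE_WRITE | IN_MOVED_TO) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Hypothetical callback for the usage example: copy the file name into
 * a caller-supplied buffer of NAME_MAX + 1 bytes. */
void remember_name(const char *name, void *arg)
{
    snprintf((char *)arg, NAME_MAX + 1, "%s", name);
}

/* Block for one read() on FD and report every completed event file in
 * that batch to ON_FILE (may be NULL). Returns the number of files
 * reported, or -1 on error. */
int read_event_batch(int fd, void (*on_file)(const char *name, void *arg),
                     void *arg)
{
    /* inotify records must be read into a suitably aligned buffer. */
    char buf[4096]
        __attribute__((aligned(__alignof__(struct inotify_event))));
    int count = 0;

    assert(fd >= 0);
    ssize_t n = read(fd, buf, sizeof buf);
    if (n < 0)
        return -1;
    for (char *p = buf; p < buf + n;) {
        const struct inotify_event *ev = (const struct inotify_event *)p;
        if (ev->len > 0 && (ev->mask & (IN_CLOSE_WRITE | IN_MOVED_TO))) {
            if (on_file)
                on_file(ev->name, arg);
            count++;
        }
        p += sizeof(struct inotify_event) + ev->len;
    }
    return count;
}
```

A writer would typically create each event file under a temporary name and rename(2) it into the watched directory, so readers only ever see complete files; whether vdevd does exactly that is not stated in the thread.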

From nobody Thu Mar 28 10:34:08 2024
Date: Sat, 25 Jul 2015 22:34:34 -0500
From: Rob Landley
To: Alan Pillay, alpine-devel@lists.alpinelinux.org, judecn@gmail.com,
 sin@2f30.org, hiltjo@codemadness.org, frank@tuxrocks.com, dev@frign.de
Subject: [alpine-devel] Re: udev replacement on Alpine Linux

On 07/25/2015 07:11 PM, Alan Pillay wrote:
> Dear Alpine Linux developers and mailing-list lurkers,
>
> udev is currently being used on Alpine version 3.2.2, but we all know
> it detracts from the philosophy to keep things simple, small and
> efficient.
> There are many programs out there that could replace udev and help
> Alpine get in a better shape. I will list some that I know.
>
> [mdev] there are 2 mdev implementations that I know, busybox's and
> toybox's. On Alpine Linux, busybox already comes installed by default
> (and its mdev comes with it, which is weird since it isn't currently
> used, but I digress)

I'm the primary developer of toybox and the original author of busybox
mdev, but busybox's mdev has grown a lot of new features over the years
that toybox doesn't implement yet.

I'm happy to add them, but am mostly waiting for patches from the users
telling me what they need. (My own embedded systems mostly just use
devtmpfs, they don't tend to hotplug a lot of stuff.)

If there's interest in my fleshing out toybox's mdev, I can bump it up
the todo list, but I tend to be chronically overcommitted so need
repeated poking...

Rob

From nobody Thu Mar 28 10:34:08 2024
Date: Mon, 27 Jul 2015 12:00:52 +0200
From: Natanael Copa
To: Rob Landley
Cc: Alan Pillay, alpine-devel@lists.alpinelinux.org, judecn@gmail.com,
 sin@2f30.org, hiltjo@codemadness.org, frank@tuxrocks.com, dev@frign.de
Subject: Re: [alpine-devel] Re: udev replacement on Alpine Linux

On Sat, 25 Jul 2015 22:34:34 -0500
Rob Landley wrote:

> On 07/25/2015 07:11 PM, Alan Pillay wrote:
> > Dear Alpine Linux developers and mailing-list lurkers,
> >
> > udev is currently being used on Alpine version 3.2.2, but we all know
> > it detracts from the philosophy to keep things simple, small and
> > efficient.
> > There are many programs out there that could replace udev and help
> > Alpine get in a better shape. I will list some that I know.
> >
> > [mdev] there are 2 mdev implementations that I know, busybox's and
> > toybox's. On Alpine Linux, busybox already comes installed by default
> > (and its mdev comes with it, which is weird since it isn't currently
> > used, but I digress)
>
> I'm the primary developer of toybox and the original author of busybox
> mdev, but busybox's mdev has grown a lot of new features over the years
> that toybox doesn't implement yet.

Busybox mdev has lots of features/solutions for problems that I think
should not have been there in the first place. For example, firmware
loading is now handled by the kernel, and device node creation could be
handled by devtmpfs (if we want to be able to optionally use udev for
Xorg we will need devtmpfs anyway). busybox mdev also has a solution
for serialization of the hotplug events, which I think is an ugly hack.
The code could have been simpler by just reading events from netlink.

> I'm happy to add them, but am mostly waiting for patches from the users
> telling me what they need. (My own embedded systems mostly just use
> devtmpfs, they don't tend to hotplug a lot of stuff.)

So here is what I have been thinking:

a netlink socket activator[1] which, when an event arrives, forks and
execs a handler and passes over the netlink socket.

The handler reads various events from the netlink socket. It should be
able to load kernel modules without forking, and ideally, it should be
able to handle each event without forking, including doing blkid
lookups without forking. After one or two seconds without any netlink
event it will exit and the socket activator takes over again.

There was a huge thread about netlink and mdev on the busybox mailing
list. There were some strong opinions about making it more general, to
read events from any pipe, but I think that needlessly complicates
things.
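
The handler described above boils down to: read kernel uevents from a netlink socket, act only on events that carry DEVNAME, and exit after a short idle period so the activator can take over again. A minimal C sketch under those assumptions follows. `open_uevent_socket`, `uevent_get` and `handle_events` are hypothetical names, not functions from nlsockd or nlplug; the kernel facts relied on are the `NETLINK_KOBJECT_UEVENT` protocol, multicast group 1, and the message layout of an `ACTION@DEVPATH` header followed by NUL-separated `KEY=VALUE` strings.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <linux/netlink.h>
#include <poll.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Open a socket bound to the kernel uevent multicast group. In the
 * activator design this is the socket the activator would own and hand
 * to the handler. Returns an fd, or -1 on error. */
int open_uevent_socket(void)
{
    struct sockaddr_nl sa;
    memset(&sa, 0, sizeof sa);
    sa.nl_family = AF_NETLINK;
    sa.nl_groups = 1; /* kernel uevents; nl_pid 0 lets the kernel pick */
    int fd = socket(AF_NETLINK, SOCK_DGRAM | SOCK_CLOEXEC,
                    NETLINK_KOBJECT_UEVENT);
    if (fd < 0)
        return -1;
    if (bind(fd, (struct sockaddr *)&sa, sizeof sa) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Find KEY in a uevent message "ACTION@DEVPATH\0KEY=VALUE\0...".
 * Returns a pointer to VALUE inside BUF, or NULL if KEY is absent. */
static const char *uevent_get(const char *buf, size_t len, const char *key)
{
    assert(buf != NULL && key != NULL);
    size_t klen = strlen(key);
    const char *p = buf, *end = buf + len;

    while (p < end) {
        size_t slen = strnlen(p, (size_t)(end - p));
        if (slen > klen && strncmp(p, key, klen) == 0 && p[klen] == '=')
            return p + klen + 1;
        p += slen + 1;
    }
    return NULL;
}

/* Handle events on FD until it has been idle for IDLE_MS milliseconds.
 * Only events with DEVNAME are passed to ON_DEV (may be NULL).
 * Returns the number of such events, or -1 on error. */
int handle_events(int fd, int idle_ms,
                  void (*on_dev)(const char *action, const char *devname))
{
    char buf[8192];
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int handled = 0;

    for (;;) {
        int r = poll(&pfd, 1, idle_ms);
        if (r == 0)
            return handled; /* idle: exit and let the activator take over */
        if (r < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        ssize_t n = recv(fd, buf, sizeof buf - 1, 0);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        buf[n] = '\0'; /* guarantee the last string is terminated */
        const char *devname = uevent_get(buf, (size_t)n, "DEVNAME");
        if (devname == NULL)
            continue; /* e.g. module or driver events: nothing to do */
        if (on_dev)
            on_dev(uevent_get(buf, (size_t)n, "ACTION"), devname);
        handled++;
    }
}
```

In the activator model the handler would call something like `handle_events(STDIN_FILENO, 2000, cb)` on the socket it inherited and simply exit when it returns; a standalone tool could call `open_uevent_socket()` itself. Because `handle_events` only uses `poll()` and `recv()`, it can be exercised with an ordinary `AF_UNIX` datagram socketpair carrying hand-written messages.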

I am also interested in loading modules without forking, so I was
thinking of making modprobe read modaliases from a stream. Doing so in
busybox would require major refactoring, so I instead looked at using
libkmod for that. But libkmod only works with the binary format of the
modalias index, so busybox depmod needed a fix[2][3] to generate binary
indexes that libkmod can read.

The current plan is to use nlsockd as the socket activator, and a
netlink reader[4] which will load kernel modules with libkmod and fork
mdev, but only on the relevant events: those that have DEVNAME set.

> If there's interest in my fleshing out toybox's mdev, I can bump it up
> the todo list, but I tend to be chronically overcommitted so need
> repeated poking...

What I might be interested in is making toybox mdev read events from a
netlink socket (stdin or another file descriptor) and adding support
for loading modaliases without forking.

-nc

[1]: http://git.alpinelinux.org/cgit/user/ncopa/nlplug/tree/nlsockd.c
[2]: http://lists.busybox.net/pipermail/busybox/2015-July/083143.html
[3]: http://lists.busybox.net/pipermail/busybox/2015-July/083142.html
[4]: http://git.alpinelinux.org/cgit/user/ncopa/nlplug/tree/nlplug.c

From nobody Thu Mar 28 10:34:08 2024
Date: Mon, 27 Jul 2015 13:45:29 -0500
From: Rob Landley
To: Natanael Copa
CC: Alan Pillay, alpine-devel@lists.alpinelinux.org, judecn@gmail.com,
 sin@2f30.org, hiltjo@codemadness.org, frank@tuxrocks.com, dev@frign.de
Subject: Re: [alpine-devel] Re: udev replacement on Alpine Linux

On 07/27/2015 05:00 AM, Natanael Copa wrote:
> On Sat, 25 Jul 2015 22:34:34 -0500
> Rob Landley wrote:
>
>> On 07/25/2015 07:11 PM, Alan Pillay wrote:
>>> Dear Alpine Linux developers and mailing-list lurkers,
>>>
>>> udev is currently being used on Alpine version 3.2.2, but we all know
>>> it detracts from the philosophy to keep things simple, small and
>>> efficient.
>>> There are many programs out there that could replace udev and help
>>> Alpine get in a better shape. I will list some that I know.
>>>
>>> [mdev] there are 2 mdev implementations that I know, busybox's and
>>> toybox's. On Alpine Linux, busybox already comes installed by default
>>> (and its mdev comes with it, which is weird since it isn't currently
>>> used, but I digress)
>>
>> I'm the primary developer of toybox and the original author of busybox
>> mdev, but busybox's mdev has grown a lot of new features over the years
>> that toybox doesn't implement yet.
>
> Busybox mdev has lots of features/solutions for problems that I think
> should not have been there in the first place. For example, firmware
> loading is now handled by the kernel, and device node creation could be
> handled by devtmpfs (if we want to be able to optionally use udev for
> Xorg we will need devtmpfs anyway).

I lean towards devtmpfs too, but you just listed requiring it as one of
the _downsides_ of eudev if I recall... :)

At some point I need to immerse myself in this for a month to design
and implement The Right Thing. Unfortunately, it's about 12th on the
list, especially since Android does its own thing we're unlikely to
displace, and that's literally a billion seats. (Getting Android,
POSIX, LSB, and the prerequisites of a Linux From Scratch build right
are the top priorities; everything else comes after that unless people
submit patches and, to be honest, usually a couple of follow-up pokes.)

> busybox mdev also has a solution for serialization of the hotplug
> events, which I think is an ugly hack. The code could have been
> simpler by just reading events from netlink.

I have a patch to do that somewhere, actually. Not against it, just...
that requires a persistent daemon, which the historical mode doesn't.

>> I'm happy to add them, but am mostly waiting for patches from the users
>> telling me what they need. (My own embedded systems mostly just use
>> devtmpfs, they don't tend to hotplug a lot of stuff.)
>
> So here is what I have been thinking:
>
> a netlink socket activator[1] which, when an event arrives, forks and
> execs a handler and passes over the netlink socket.

Wasn't requiring a new fork for each event a performance bottleneck in
what the sash guys wrote, last email?

> The handler reads various events from the netlink socket. It should be
> able to load kernel modules without forking, and ideally, it should be
> able to handle each event without forking, including doing blkid
> lookups without forking.

Note: last time I benched this, fork was 5% of the overhead and exec
was 95% of the overhead. Is fork what you object to, or is process
spawning what you object to?

Toybox can fork() and internally run insmod or modprobe as a builtin
command without re-execing itself (except on nommu). The reason for the
fork() is to avoid needing to clean up after commands, especially from
error paths that exit halfway through because it couldn't open a file
or something. (There's a recursion limit: after 9 recursive invocations
it invokes the exec() path anyway to keep the stack size down to a dull
roar. But that doesn't come up much in practice.)

I _can_ do nofork calls to various commands, but daemons are careful to
avoid xfunction() library calls that exit on error, and most other
commands aren't, and even if I audit what's there now, relying on it to
stay that way is a regression waiting to happen. (Maybe at some point
I'll expand the nofork stuff, but it's a can of worms and I'm going for
simplicity where possible.)

> After one or two seconds without any netlink event it
> will exit and the socket activator takes over again.

By socket activator you basically mean an inetd variant?

> There was a huge thread about netlink and mdev on the busybox mailing
> list.

I catch up on that _maybe_ yearly these days, lemme see... Google says:

http://lists.busybox.net/pipermail/busybox/2015-March/082690.html

> There were some strong opinions about making it more general, to read
> events from any pipe, but I think that needlessly complicates things.
>
> I am also interested in loading modules without forking, so I was
> thinking of making modprobe read modaliases from a stream. Doing so in
> busybox would require major refactoring, so I instead looked at using
> libkmod for that. But libkmod only works with the binary format of the
> modalias index, so busybox depmod needed a fix[2][3] to generate binary
> indexes that libkmod can read.

The whole of toybox insmod is 46 lines. (There's a 568-line modprobe in
toys/pending but there's a _reason_ it's in pending.)

> The current plan is to use nlsockd as the socket activator, and a
> netlink reader[4] which will load kernel modules with libkmod and fork
> mdev, but only on the relevant events: those that have DEVNAME set.

No man page in Ubuntu, and when I type the command it doesn't suggest a
package to install... Ah, your email has a bibliography.

>> If there's interest in my fleshing out toybox's mdev, I can bump it up
>> the todo list, but I tend to be chronically overcommitted so need
>> repeated poking...
>
> What I might be interested in is making toybox mdev read events from a
> netlink socket (stdin or another file descriptor),

-n netlink file descriptor

> and adding support for loading modaliases without forking.

Again, fork, or exec? (If you really care that much I could probably
move the modalias parsing stuff to lib; it's really #include from lib I
try to avoid. Building on BSD you can switch off commands, but lib/*.c
gets compiled unconditionally and then --gc-sections trimmed.)
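
Both Rob and Natanael are talking about matching a device's MODALIAS against the depmod alias database in-process instead of spawning modprobe. As a rough illustration of how small the matching step itself is, the sketch below scans the text form of `modules.alias` (lines of the form `alias <glob> <module>`) with POSIX `fnmatch()`. It is a deliberate simplification: it stops at the first match, ignores the binary index libkmod reads as well as blacklist and option handling, and the name `modalias_lookup` and the sample alias lines are made up for illustration. Inserting the module without a fork would then be a matter of calling `finit_module(2)` or `init_module(2)` from the same process, which is not shown.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fnmatch.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Scan alias lines from F ("alias <glob> <module>", as in the text
 * modules.alias) and copy the first module whose glob matches MODALIAS
 * into MOD. Returns 0 on a match, -1 otherwise. */
int modalias_lookup(FILE *f, const char *modalias, char *mod, size_t cap)
{
    char *line = NULL;
    size_t lcap = 0;
    int found = -1;

    assert(f != NULL && modalias != NULL && mod != NULL && cap > 0);
    while (found < 0 && getline(&line, &lcap, f) != -1) {
        char pat[256], name[64];
        /* Comments, blank lines and anything that is not an alias line
         * fail to match the format and are skipped. */
        if (sscanf(line, "alias %255s %63s", pat, name) != 2)
            continue;
        if (fnmatch(pat, modalias, 0) == 0) {
            snprintf(mod, cap, "%s", name);
            found = 0;
        }
    }
    free(line);
    return found;
}
```

Reading from a `FILE *` also fits Natanael's idea of feeding modaliases to the loader as a stream rather than re-opening the database per event.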

Rob

From nobody Thu Mar 28 10:34:08 2024
Date: Mon, 27 Jul 2015 10:37:37 +0200
From: Natanael Copa
To: Alan Pillay
Cc: alpine-devel@lists.alpinelinux.org, judecn@gmail.com, sin@2f30.org,
 hiltjo@codemadness.org, rob@landley.net, frank@tuxrocks.com, dev@frign.de
Subject: Re: [alpine-devel] udev replacement on Alpine Linux

On Sat, 25 Jul 2015 21:11:41 -0300
Alan Pillay wrote:

> Dear Alpine Linux developers and mailing-list lurkers,
>
> udev is currently being used on Alpine version 3.2.2, but we all know
> it detracts from the philosophy to keep things simple, small and
> efficient.

udev is optional. Default Alpine Linux uses only mdev.

> There are many programs out there that could replace udev and help
> Alpine get in a better shape. I will list some that I know.
>
> [mdev] there are 2 mdev implementations that I know, busybox's and
> toybox's. On Alpine Linux, busybox already comes installed by default
> (and its mdev comes with it, which is weird since it isn't currently
> used, but I digress)

mdev is used and is fully supported. You may replace mdev with udev if
you need Xorg hotplugging. This is not installed by default, though.

There was also a long discussion about adding netlink support to
busybox mdev on the busybox mailing list. There was some disagreement
on how to do it. There were even some patches that made busybox mdev
read events from stdin.

> [smdev] smdev is an even simpler implementation of a device manager by
> the well-known suckless developers. If it is mature enough, certainly
> a high contender.

I did look at smdev. One of the big benefits of smdev is the mdev.conf
compatibility. I don't want to support 3-4 different config formats
(udev rules, mdev.conf, etc).

smdev requires a fork/exec for every single event, which is a
performance issue. I believe that you can solve the performance issue
too, with just a little more effort.

> [eudev] a fork of udev from the gentoo developers. Doesn't appear to
> be as small as others, but should be more easily integrated into
> alpine.

Alpine Linux switched the udev support to eudev a couple of weeks ago
and rebuilt everything that linked to libudev.

The benefit of eudev is that it is "mainstream" nowadays; upstream
software projects often support only udev. The downside is that the
code comes from systemd and suffers from many of the same management
issues as upstream systemd (e.g. no support for a separate /usr
partition, network interface renaming policies, requiring devtmpfs,
etc.).

To use eudev efficiently we would have to follow whatever systemd does
on many things. I am not comfortable with that thought.

I would like to get rid of eudev/udev, but at the same time, I want
support for hotplugging in Xorg. I want to plug in a mouse and keyboard
and have it just work, without needing to change xorg.conf and restart
Xorg. Today you need (e)udev for that.
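
Natanael's main argument for smdev is that it shares the mdev.conf rule format, so rules only have to be written once. To make the shared core of that format concrete, here is a sketch that parses only the leading `<device regex> <user>:<group> <octal mode>` fields of a rule line and tests a device name against it with POSIX `regcomp()`/`regexec()`. Everything after the mode (the optional path and command fields) and the other matching forms mdev.conf supports are deliberately ignored; `struct mdev_rule`, `parse_mdev_rule` and `rule_matches` are hypothetical names; and counting a rule as matching only when the regex covers the whole device name is an assumption of this sketch.

```c
#include <assert.h>
#include <ctype.h>
#include <regex.h>
#include <stdio.h>
#include <string.h>

/* The leading fields of an mdev.conf-style rule (simplified subset). */
struct mdev_rule {
    char regex[64];
    char user[32];
    char group[32];
    unsigned int mode;
};

/* Parse "<regex> <user>:<group> <mode>". Returns 0 on success, 1 for a
 * blank or comment line, -1 if the line does not fit this subset. */
int parse_mdev_rule(const char *line, struct mdev_rule *r)
{
    assert(line != NULL && r != NULL);
    while (isspace((unsigned char)*line))
        line++;
    if (*line == '\0' || *line == '#')
        return 1;
    if (sscanf(line, "%63s %31[^:]:%31s %o",
               r->regex, r->user, r->group, &r->mode) != 4)
        return -1;
    return 0;
}

/* Return 1 if DEVNAME matches the rule's extended regex over its whole
 * length, 0 if not, -1 if the regex does not compile. */
int rule_matches(const struct mdev_rule *r, const char *devname)
{
    regex_t re;
    regmatch_t m;
    int ok;

    if (regcomp(&re, r->regex, REG_EXTENDED) != 0)
        return -1;
    ok = regexec(&re, devname, 1, &m, 0) == 0
         && m.rm_so == 0 && (size_t)m.rm_eo == strlen(devname);
    regfree(&re);
    return ok;
}
```

Recompiling the regex on every call keeps the sketch short; a daemon handling many events would compile each rule once when loading the file.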
> [vdev] a device manager with an approach a bit different, offers an > optional filesystem interface that implements a per-process view of > /dev. Possibly the least simple alternative, but interesting > nonetheless. I will have to look at vdev. The udev compat might be of interest. > I thought about using this means of communication so developers ca > discuss this matter that impacts the use of the Alpine Linux > distribution as a whole. > I am also emailing relevant parties (developers of the cited device > managers, so they can participate if they desire). > Thanks for the attention. Thanks for raising the topic and for bringing in the people who likely sit with the answers. I think it would be great if we together could come up with something. I should present my thoughts/ideas on the subject in a separate email. Thanks! > > KISS > --- Unsubscribe: alpine-devel+unsubscribe@lists.alpinelinux.org Help: alpine-devel+help@lists.alpinelinux.org --- From nobody Thu Mar 28 10:34:08 2024 X-Original-To: alpine-devel@mail.alpinelinux.org Delivered-To: alpine-devel@mail.alpinelinux.org Received: from mail.alpinelinux.org (dallas-a1.alpinelinux.org [127.0.0.1]) by mail.alpinelinux.org (Postfix) with ESMTP id 01B07DC0739 for ; Mon, 27 Jul 2015 09:43:58 +0000 (UTC) Received: from mail-wi0-f175.google.com (mail-wi0-f175.google.com [209.85.212.175]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (No client certificate requested) by mail.alpinelinux.org (Postfix) with ESMTPS id 9E8F3DC05F2 for ; Mon, 27 Jul 2015 09:43:56 +0000 (UTC) Received: by wibxm9 with SMTP id xm9so104103990wib.0 for ; Mon, 27 Jul 2015 02:43:55 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=date:from:to:subject:message-id:in-reply-to:references:mime-version :content-type:content-transfer-encoding; bh=AOsEzDUS9sZsrGFaVRlRpqDyXApVKMOq1OS7oExp/n4=; b=H1diCD3Yz+qzm+w8K0t1s+CROd3UGGuGQDsVo3J6xzF/1Tma6RjycWzVcH5rpjXqjF 
SjhtMk/9g6T2HOmGi7zBEvKZf6IF1lbuCWX8/lEfVeMm9lwTxYWI9ABKem7eSGA9U4gu Fya4EDpdbsDcRN/TuxjeZ41TIMfjXXlwZHnlGLAavHbkvxpTVPwcBpF/RWNCQe8S5tYw YcFG36shhyTJdsNdZOexuz64K+4YwavK0HpeBmSBJXvWBU5QE06chbpGYrHxM5E0hNUL V1atQr0s9egqYFZxmrsr3WVNALzXdDGsfwB7rktdJj7NgMAmzudPKrelAA3YZXDIVBjZ gtMA== X-Received: by 10.194.95.71 with SMTP id di7mr54583547wjb.125.1437990234930; Mon, 27 Jul 2015 02:43:54 -0700 (PDT) Received: from expedite.oesys.co (mail.oesys.co. [82.71.11.172]) by smtp.gmail.com with ESMTPSA id jz4sm26921634wjb.16.2015.07.27.02.43.53 for (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Mon, 27 Jul 2015 02:43:54 -0700 (PDT) Date: Mon, 27 Jul 2015 11:49:06 +0100 From: Kevin Chadwick To: alpine-devel@lists.alpinelinux.org Subject: Re: [alpine-devel] udev replacement on Alpine Linux Message-ID: <20150727114906.0db6555e@expedite.oesys.co> In-Reply-To: <20150727103737.4f95e523@ncopa-desktop.alpinelinux.org> References: <20150727103737.4f95e523@ncopa-desktop.alpinelinux.org> X-Mailinglist: alpine-devel Precedence: list List-Id: Alpine Development List-Unsubscribe: List-Post: List-Help: List-Subscribe: MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-Virus-Scanned: ClamAV using ClamSMTP > smdev requires fork/exec for every single event which has a performance > issue. I believe that you can solve the performance issue too, with > just a little more effort. I don't know the details of this or of the other options, but after reading a recent OpenBSD paper I wonder whether, despite the performance hit, this lends itself to better process layout randomisation to help fight ROP attacks: fork produces a copy of the parent's layout, whereas exec then loads a new random layout.
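The question above hinges on what fork and exec each do to the address space. fork() gives the child a copy of the parent's memory map, so every address, and therefore any layout an attacker may have learned, is identical in the child; only a following execve() loads a fresh image that ASLR can randomise again. A minimal C sketch checking the first half of that claim (`fork_preserves_layout` is an illustrative name; whether exec actually re-randomises depends on the kernel's ASLR settings):

```c
#include <stdint.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Returns 1 if a forked child sees a stack variable at the same virtual
 * address as its parent, 0 if not, -1 on error. Because the child is a
 * copy of the parent, the addresses match even with ASLR enabled; only a
 * later execve() would give the child a newly randomised layout.
 */
int fork_preserves_layout(void)
{
    int probe = 0;                          /* lives in this stack frame */
    uintptr_t parent_addr = (uintptr_t)&probe;
    uintptr_t child_addr = 0;
    int fds[2];
    ssize_t n;

    if (pipe(fds) < 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        /* Child: report where it sees the same variable, then exit. */
        uintptr_t addr = (uintptr_t)&probe;
        close(fds[0]);
        n = write(fds[1], &addr, sizeof addr);
        _exit(n == (ssize_t)sizeof addr ? 0 : 1);
    }

    close(fds[1]);
    n = read(fds[0], &child_addr, sizeof child_addr);
    close(fds[0]);
    waitpid(pid, NULL, 0);

    if (n != (ssize_t)sizeof child_addr)
        return -1;
    return child_addr == parent_addr ? 1 : 0;
}
```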
-- KISSIS - Keep It Simple So It's Securable --- Unsubscribe: alpine-devel+unsubscribe@lists.alpinelinux.org Help: alpine-devel+help@lists.alpinelinux.org --- From nobody Thu Mar 28 10:34:08 2024 X-Original-To: alpine-devel@mail.alpinelinux.org Delivered-To: alpine-devel@mail.alpinelinux.org Received: from mail.alpinelinux.org (dallas-a1.alpinelinux.org [127.0.0.1]) by mail.alpinelinux.org (Postfix) with ESMTP id A99A1DC69F7 for ; Tue, 28 Jul 2015 00:55:00 +0000 (UTC) Received: from mail-pd0-f175.google.com (mail-pd0-f175.google.com [209.85.192.175]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (No client certificate requested) by mail.alpinelinux.org (Postfix) with ESMTPS id 66933DC01CF for ; Tue, 28 Jul 2015 00:55:00 +0000 (UTC) Received: by pdjr16 with SMTP id r16so61630757pdj.3 for ; Mon, 27 Jul 2015 17:54:59 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=date:from:to:cc:subject:message-id:references:mime-version :content-type:content-disposition:in-reply-to:user-agent; bh=R/UAPTiAerWjap6Vbjts0h0bXPmWfiKpVErUHUNKWf0=; b=Kzs5WdVDRm0HAVJtx6VWeHVh1XNqWhw/LvCpFLPdIx4QB41254UCw7bp3ilDdVquf2 4z+vAG7NU1iMls/EF7v+dYfN4rdLgfMp9Dy7PhPyqfvqFZn1vsvNzUOg1FKs44zpcwkM CGK+kF2oTA7tSFN4NHHLFFCqzw8XCxXBRyaXebDDpWiWcT3YBPV4L4adf53hCShfxnxc ir4WT3lxNSoG1LzJUdIjS9wDXnoYzuIEhi57a5GQjrjb6fRvrvSog+5FrIfhdQWXmnVB V/oZppd0xT5iUWcEJiDU9x1OdH7cLU3ggWE0SIS/nAdqC5yeQJbLLikKIJXMZsI8+yxE Rb0A== X-Received: by 10.70.134.226 with SMTP id pn2mr44016841pdb.53.1438044899134; Mon, 27 Jul 2015 17:54:59 -0700 (PDT) Received: from newbook ([50.0.227.100]) by smtp.gmail.com with ESMTPSA id fl6sm31804233pab.12.2015.07.27.17.54.57 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Mon, 27 Jul 2015 17:54:58 -0700 (PDT) Date: Mon, 27 Jul 2015 17:55:20 -0700 From: Isaac Dunham To: Natanael Copa Cc: Alan Pillay , alpine-devel@lists.alpinelinux.org, judecn@gmail.com, sin@2f30.org, hiltjo@codemadness.org, rob@landley.net, 
frank@tuxrocks.com, dev@frign.de Subject: Re: [alpine-devel] udev replacement on Alpine Linux Message-ID: <20150728005519.GB1923@newbook> References: <20150727103737.4f95e523@ncopa-desktop.alpinelinux.org> X-Mailinglist: alpine-devel Precedence: list List-Id: Alpine Development List-Unsubscribe: List-Post: List-Help: List-Subscribe: MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20150727103737.4f95e523@ncopa-desktop.alpinelinux.org> User-Agent: Mutt/1.5.23 (2014-03-12) X-Virus-Scanned: ClamAV using ClamSMTP On Mon, Jul 27, 2015 at 10:37:37AM +0200, Natanael Copa wrote: > I would like to get rid of eudev/udev, but at the same time, I want > support for hotplugging in Xorg. I want plug in a moue and keyboard and > I want it to just work, without needing changing xorg.xonf and restart > xorg. Today you need (e)udev for that. Quibbling: If you drop some config files in xorg.conf.d (from "mdev-like-a-boss"), you do not need to edit anything...although it will still be necessary to restart X: https://github.com/slashbeast/mdev-like-a-boss/tree/master/xorg.conf.d Unfortunately, the author of that package has not specified a license in that repository, although elsewhere he describes it as BSD. 
HTH, Isaac Dunham --- Unsubscribe: alpine-devel+unsubscribe@lists.alpinelinux.org Help: alpine-devel+help@lists.alpinelinux.org --- From nobody Thu Mar 28 10:34:08 2024 X-Original-To: alpine-devel@mail.alpinelinux.org Delivered-To: alpine-devel@mail.alpinelinux.org Received: from mail.alpinelinux.org (dallas-a1.alpinelinux.org [127.0.0.1]) by mail.alpinelinux.org (Postfix) with ESMTP id ABE58DC5A94 for ; Tue, 28 Jul 2015 05:24:17 +0000 (UTC) Received: from mail-pa0-f52.google.com (mail-pa0-f52.google.com [209.85.220.52]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (No client certificate requested) by mail.alpinelinux.org (Postfix) with ESMTPS id 696A9DC0AF5 for ; Tue, 28 Jul 2015 05:24:17 +0000 (UTC) Received: by pabkd10 with SMTP id kd10so63735056pab.2 for ; Mon, 27 Jul 2015 22:24:16 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=date:from:to:cc:subject:message-id:references:mime-version :content-type:content-disposition:in-reply-to:user-agent; bh=GBQEbvjVnwL+ry85vWf2V/yDuN9hNSffqYCphzVW6/U=; b=d3rPXxyFRm3DI+zTftujwlCiFzQcYsMlPgCoa0uBxaZaWzYh9z5yS/z/Ty/douogsw G4He9C+/FhmG4/uRUTxCNEf3/Ai+Pkzf+2f6i7Wd/8xlEdNs0ohj7+biQhDtv1Sa/1/X XwV24KTTyihfG6lEQNiD7OY2KNFquKwY3IUZB7QLgc9BaB/b1iVRT+U3qIqQJJ9h9I67 zLkYdKPEetVKjZYrGARlWXc5N0Zkybll+fFleNWRkylhOQGy7M6nJwYiRpp9R/sJRndl LZu4+d/ou5qmaHNbtCipbGVzsrHzLAw6rMdrDVj25WCEtQMUcxULg9cIBfLUubEN0DA3 o6Cg== X-Received: by 10.66.236.167 with SMTP id uv7mr74993882pac.134.1438061056343; Mon, 27 Jul 2015 22:24:16 -0700 (PDT) Received: from newbook ([50.0.227.100]) by smtp.gmail.com with ESMTPSA id cz1sm32578567pdb.44.2015.07.27.22.24.15 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Mon, 27 Jul 2015 22:24:15 -0700 (PDT) Date: Mon, 27 Jul 2015 22:24:37 -0700 From: Isaac Dunham To: Natanael Copa Cc: alpine-devel@lists.alpinelinux.org, sin@2f30.org, rob@landley.net, dev@frign.de Subject: Re: [alpine-devel] udev replacement on Alpine Linux Message-ID: 
<20150728052436.GC1923@newbook> References: <20150727103737.4f95e523@ncopa-desktop.alpinelinux.org> X-Mailinglist: alpine-devel Precedence: list List-Id: Alpine Development List-Unsubscribe: List-Post: List-Help: List-Subscribe: MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20150727103737.4f95e523@ncopa-desktop.alpinelinux.org> User-Agent: Mutt/1.5.23 (2014-03-12) X-Virus-Scanned: ClamAV using ClamSMTP [pruning CC: to those who are more likely to care about mdev features and command line] On Mon, Jul 27, 2015 at 10:37:37AM +0200, Natanael Copa wrote: > There was also a long discussion about adding netlink support to > busybox mdev on busybox mailing list. There was some disagreement on > how to do it. > > There was even some patches that made busybox mdev read events from > stdin. A few comments regarding those patches... It is much simpler to debug something like "mdev -i" than a netlink reader spawned by a "netlink inetd", since there's less indirection in using strace and you can trivially create, log, and replay events. Do not underestimate how useful the ability to play back a series of hotplug events is. While I did prepare a patch for mdev -i (read events from stdin) based on your work, other modifications were made to mdev. Meanwhile, the agreement that there had been about reading events from stdin disappeared, Denys assumed that it was entirely about serialization and reimplemented nldev, and a simple rework of the patch to match the new code didn't work, so I never got the patch updated. (I figured that with a maintainer who didn't understand the feature request, it would take a bit more support to get it in.) If mdev -i is desired rather than a netlink reader, I think I could update the patch; but if netlink support in mdev itself is desired, I don't want to set up the environment that would be needed to test it. 
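The replay property described above follows from events being plain text on stdin: they can be typed by hand, appended to a log, and fed back later. The record format of the "mdev -i" patch is not given in the thread, so this C sketch assumes a hypothetical one: one KEY=VALUE per line, with a blank line ending each event. `read_event`, `MAX_VARS` and `MAX_LINE` are illustrative names, not the patch's code:

```c
#include <stdio.h>
#include <string.h>

#define MAX_VARS 32
#define MAX_LINE 256

/*
 * Read one event from `in` in the assumed format: KEY=VALUE lines,
 * terminated by a blank line or EOF. Stores up to `max` variables in
 * `vars` (newline stripped) and returns how many were read; 0 means
 * there are no more events. Lines longer than MAX_LINE-1 characters
 * are not handled (fgets would split them).
 */
int read_event(FILE *in, char vars[][MAX_LINE], int max)
{
    char line[MAX_LINE];
    int n = 0;

    while (fgets(line, sizeof line, in) != NULL) {
        line[strcspn(line, "\n")] = '\0';   /* strip the newline */
        if (line[0] == '\0') {
            if (n > 0)
                break;      /* blank line ends the current event */
            continue;       /* skip extra blank lines between events */
        }
        if (strchr(line, '=') == NULL)
            continue;       /* ignore malformed lines */
        if (n < max)
            strcpy(vars[n++], line);        /* fits: line < MAX_LINE */
    }
    return n;
}
```

A device manager built this way would loop on `read_event(stdin, vars, MAX_VARS)` until it returns 0, so replaying a saved log is just a matter of redirecting the file to stdin. Since `mdev -i` remains a proposal in this thread, any such invocation is hypothetical.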
Thanks, Isaac Dunham --- Unsubscribe: alpine-devel+unsubscribe@lists.alpinelinux.org Help: alpine-devel+help@lists.alpinelinux.org --- From nobody Thu Mar 28 10:34:08 2024 X-Original-To: alpine-devel@mail.alpinelinux.org Delivered-To: alpine-devel@mail.alpinelinux.org Received: from mail.alpinelinux.org (dallas-a1.alpinelinux.org [127.0.0.1]) by mail.alpinelinux.org (Postfix) with ESMTP id 5A364DC5F2F for ; Tue, 12 Jan 2016 12:51:27 +0000 (UTC) Received: from mail-wm0-f44.google.com (mail-wm0-f44.google.com [74.125.82.44]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (No client certificate requested) by mail.alpinelinux.org (Postfix) with ESMTPS id DB488DC037D for ; Tue, 12 Jan 2016 12:51:26 +0000 (UTC) Received: by mail-wm0-f44.google.com with SMTP id b14so318447284wmb.1 for ; Tue, 12 Jan 2016 04:51:26 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type:content-transfer-encoding; bh=qQmfzZj78B2NAyMEQSZaMzKajU9n0UvabHv6nfPWoUc=; b=J9DAv+Wj44xNZWlGoWD3SiwPMoLPeY3ZzCXLbKCVSTlciY2mG7rDeqSgCGBSSk/wYY QFltPD1246FQixXJroulVUP+OUBFnj7GRRG0fmwjdLyovSf5b3hckRQyeUlgJu6R/1t/ N2ndDew9XFIdQAZN6ub6gyu3PnqBpHobrb0VXA1IcFvJ//er4OLYdsdHVcVmS+34hP4Z 1WlyIk5Y2TZ0ciXDJDrLGQEzB/YIF1io/3xoEP2WqehrGBnx2veIxoVVfsDdJVI/hsPz l+ou0PoiNCVeDd6jTOJ9R33SI2+cSyWLOgVyBQ9z7WLPfWGlkhOm5dBQPVQ8vmJkLpfn xIHQ== X-Mailinglist: alpine-devel Precedence: list List-Id: Alpine Development List-Unsubscribe: List-Post: List-Help: List-Subscribe: MIME-Version: 1.0 X-Received: by 10.28.150.72 with SMTP id y69mr18563776wmd.17.1452603085388; Tue, 12 Jan 2016 04:51:25 -0800 (PST) Received: by 10.27.90.207 with HTTP; Tue, 12 Jan 2016 04:51:25 -0800 (PST) In-Reply-To: <20150728052436.GC1923@newbook> References: <20150727103737.4f95e523@ncopa-desktop.alpinelinux.org> <20150728052436.GC1923@newbook> Date: Tue, 12 Jan 2016 10:51:25 -0200 Message-ID: Subject: Re: [alpine-devel] udev 
replacement on Alpine Linux From: Alan Pillay To: Isaac Dunham Cc: Natanael Copa , AlpineLinux ML , sin@2f30.org, FRIGN , Christoph Lohmann <20h@r-36.net> Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable X-Virus-Scanned: ClamAV using ClamSMTP It has been half a year since we had this nice conversation. First of all, I wish everyone a happy new year! Due to other priorities, the udev replacement has been postponed, but now Sören Tempel came up with this idea again. He also proposes smdev to be shipped with the next major Alpine Linux version - 3.4.0 - which is expected to be released in about 4 months. For this reason, I think it is good to reconsider this subject and restart this conversation. Natanael, do you believe smdev is mature enough to be implemented on Alpine Linux as the default device manager? What about the accompanying software, nldev and nlmon? If something is missing, what is it exactly? Would the developers of smdev, nldev and nlmon be willing to help with the transition from udev to their lightweight alternatives? KISS On Tue, Jul 28, 2015 at 2:24 AM, Isaac Dunham wrote: > > [pruning CC: to those who are more likely to care about mdev features > and command line] > > On Mon, Jul 27, 2015 at 10:37:37AM +0200, Natanael Copa wrote: >> There was also a long discussion about adding netlink support to >> busybox mdev on busybox mailing list. There was some disagreement on >> how to do it. >> >> There was even some patches that made busybox mdev read events from >> stdin. > > A few comments regarding those patches... > > It is much simpler to debug something like "mdev -i" than a netlink reader > spawned by a "netlink inetd", since there's less indirection in using strace > and you can trivially create, log, and replay events. Do not underestimate > how useful the ability to play back a series of hotplug events is.
> > While I did prepare a patch for mdev -i (read events from stdin) based > on your work, other modifications were made to mdev. > Meanwhile, the agreement that there had been about reading events from > stdin disappeared, Denys assumed that it was entirely about serialization > and reimplemented nldev, and a simple rework of the patch to match the > new code didn't work, so I never got the patch updated. (I figured that > with a maintainer who didn't understand the feature request, it would take > a bit more support to get it in.) > > If mdev -i is desired rather than a netlink reader, I think I could update > the patch; but if netlink support in mdev itself is desired, I don't want > to set up the environment that would be needed to test it. > > Thanks, > Isaac Dunham > > > --- > Unsubscribe: alpine-devel+unsubscribe@lists.alpinelinux.org > Help: alpine-devel+help@lists.alpinelinux.org > --- > --- Unsubscribe: alpine-devel+unsubscribe@lists.alpinelinux.org Help: alpine-devel+help@lists.alpinelinux.org --- From nobody Thu Mar 28 10:34:08 2024 X-Original-To: alpine-devel@mail.alpinelinux.org Delivered-To: alpine-devel@mail.alpinelinux.org Received: from mail.alpinelinux.org (dallas-a1.alpinelinux.org [127.0.0.1]) by mail.alpinelinux.org (Postfix) with ESMTP id E773BDC037D for ; Tue, 12 Jan 2016 15:38:29 +0000 (UTC) Received: from outgoing.fripost.org (giraff.fripost.org [178.16.208.44]) (using TLSv1 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mail.alpinelinux.org (Postfix) with ESMTPS id A302ADC0139 for ; Tue, 12 Jan 2016 15:38:29 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by outgoing.fripost.org (Postfix) with ESMTP id 184C5377E82 for ; Tue, 12 Jan 2016 16:38:28 +0100 (CET) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=fripost.org; h= in-reply-to:content-transfer-encoding:content-disposition :content-type:content-type:mime-version:references:message-id :subject:subject:from:from:date:date; 
s=20140703; t=1452613107; x=1454427508; bh=nUNnE06ep4+HyPUJuq+eHv+uMbBxskL0K0WsXLcgKw4=; b= N2wIYxycfE9m6keZDzWj+e18YeuibtxvKXZfC/XE2iG8MWhH1Vn/FdPxKxthnpY3 ntaVYrGceOSMEdXiKY9rxlKL4npcE9t/2QuRhMpokvNQw3mUnKM43IYDgsLsdYV0 zXJD8Eszge/eF2jwhTIH5p8PrB5ZfltNcU2Ei7Kw3fo= X-Virus-Scanned: Debian amavisd-new at fripost.org Received: from outgoing.fripost.org ([127.0.0.1]) by localhost (giraff.fripost.org [127.0.0.1]) (amavisd-new, port 10040) with LMTP id wX45SwE6PbkW for ; Tue, 12 Jan 2016 16:38:27 +0100 (CET) Received: from smtp.fripost.org (mistral.fripost.org [178.16.208.43]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "mistral.fripost.org", Issuer "mistral.fripost.org" (not verified)) by outgoing.fripost.org (Postfix) with ESMTPS id 99544377E7F for ; Tue, 12 Jan 2016 16:38:27 +0100 (CET) Received: from [127.0.0.1] (localhost [127.0.0.1]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) by smtp.fripost.org (Postfix) with ESMTPSA id 1356AA2DDC7 for ; Tue, 12 Jan 2016 16:38:20 +0100 (CET) Received: (qmail 28930 invoked from network); 12 Jan 2016 15:34:41 -0000 Received: from unknown (HELO aetey.se) (eh1ba719@127.0.0.1) by mail with ESMTPA; 12 Jan 2016 15:34:41 -0000 Date: Tue, 12 Jan 2016 16:38:04 +0100 From: u-ztsd@aetey.se To: Alan Pillay Cc: AlpineLinux ML Subject: Re: [alpine-devel] udev replacement on Alpine Linux Message-ID: <20160112153804.GI32545@example.net> References: <20150727103737.4f95e523@ncopa-desktop.alpinelinux.org> <20150728052436.GC1923@newbook> X-Mailinglist: alpine-devel Precedence: list List-Id: Alpine Development List-Unsubscribe: List-Post: List-Help: List-Subscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Disposition: inline Content-Transfer-Encoding: quoted-printable In-Reply-To: X-Virus-Scanned: ClamAV using ClamSMTP On Tue, Jan 12, 2016 at 10:51:25AM -0200, Alan Pillay wrote: > Due to other priorities, the udev replacement has been postponed, but > 
now Sören Tempel came up with this idea again. He also proposes smdev > to be shipped with next major Alpine Linux version - 3.4.0 - which is > expected to be released in about 4 months. > For this reason, I think it is good to reconsider this subject and > restart this conversation. > Natanael, do you believe smdev is mature enough to be used implemented > on Alpine Linux as the default device manager? What about the > accompanying software, nldev and nlmon? If something is missing, what > is it exactly? Would the developers of smdev, nldev and nlmon be > willing to help with the transaction from udev to their lightweight > alternatives? I would love to get rid of udev but isn't libudev the harder part? Could you summarize in which ways smdev is better than mdev? I cannot find the corresponding documentation (studying the source is expensive, even for such a compact program). The smdev license is more permissive; what are the other differences? The README says "mostly compatible with mdev but doesn't have all of its features", which reads like "almost as good as mdev", so why replace mdev?
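To make the mdev.conf compatibility concrete: in busybox's mdev.conf, a basic rule is `<device regex> <uid>:<gid> <octal permissions>`, optionally followed by a rename target and a command. A minimal C sketch that parses only those first three fields and matches a device name against the rule with POSIX regex. It assumes the regex must cover the whole device name (busybox mdev appears to behave this way); it is neither busybox's nor smdev's code, and `struct rule`, `parse_rule` and `rule_matches` are hypothetical names:

```c
#include <sys/types.h>
#include <regex.h>
#include <stdio.h>
#include <string.h>

/* One parsed rule: only the first three fields of an mdev.conf line. */
struct rule {
    char regex[128];
    char owner[64];   /* "uid:gid", e.g. "root:disk" */
    unsigned mode;    /* permissions, parsed as octal */
};

/* Parse "<regex> <uid>:<gid> <mode>"; returns 0 on success, -1 otherwise. */
int parse_rule(const char *line, struct rule *r)
{
    if (sscanf(line, "%127s %63s %o", r->regex, r->owner, &r->mode) != 3)
        return -1;
    return strchr(r->owner, ':') != NULL ? 0 : -1;
}

/*
 * Returns 1 if `devname` matches the rule's regex over its whole length,
 * 0 if not, -1 if the regex does not compile.
 */
int rule_matches(const struct rule *r, const char *devname)
{
    regex_t re;
    regmatch_t m;
    int result;

    if (regcomp(&re, r->regex, REG_EXTENDED) != 0)
        return -1;
    result = regexec(&re, devname, 1, &m, 0) == 0
             && m.rm_so == 0
             && (size_t)m.rm_eo == strlen(devname);
    regfree(&re);
    return result;
}
```

For example, the rule `sd[a-z][0-9]* root:disk 0660` matches `sda` and `sda1` but not `xsda1`, because the match must start at the first character and end at the last.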
Rune --- Unsubscribe: alpine-devel+unsubscribe@lists.alpinelinux.org Help: alpine-devel+help@lists.alpinelinux.org --- From nobody Thu Mar 28 10:34:08 2024 X-Original-To: alpine-devel@mail.alpinelinux.org Delivered-To: alpine-devel@mail.alpinelinux.org Received: from mail.alpinelinux.org (dallas-a1.alpinelinux.org [127.0.0.1]) by mail.alpinelinux.org (Postfix) with ESMTP id 0D46CDC037D for ; Tue, 12 Jan 2016 17:41:18 +0000 (UTC) Received: from smtp2.tech.numericable.fr (smtp2.tech.numericable.fr [82.216.111.38]) (using TLSv1 with cipher ECDHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mail.alpinelinux.org (Postfix) with ESMTPS id BC8DBDC0139 for ; Tue, 12 Jan 2016 17:41:17 +0000 (UTC) Received: from sinay.internal.skarnet.org (ip-62.net-82-216-6.versailles2.rev.numericable.fr [82.216.6.62]) by smtp2.tech.numericable.fr (Postfix) with SMTP id 742A96348F for ; Tue, 12 Jan 2016 18:41:15 +0100 (CET) Received: (qmail 21842 invoked from network); 12 Jan 2016 17:41:40 -0000 Received: from elzian.internal.skarnet.org. (HELO ?192.168.0.2?) (192.168.0.2) by sinay.internal.skarnet.org. 
with SMTP; 12 Jan 2016 17:41:40 -0000 Subject: Re: [alpine-devel] udev replacement on Alpine Linux To: alpine-devel@lists.alpinelinux.org References: <20150727103737.4f95e523@ncopa-desktop.alpinelinux.org> <20150728052436.GC1923@newbook> <20160112153804.GI32545@example.net> From: Laurent Bercot Message-ID: <56953ABE.5090203@skarnet.org> Date: Tue, 12 Jan 2016 18:41:18 +0100 User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:38.0) Gecko/20100101 Thunderbird/38.5.1 X-Mailinglist: alpine-devel Precedence: list List-Id: Alpine Development List-Unsubscribe: List-Post: List-Help: List-Subscribe: MIME-Version: 1.0 In-Reply-To: <20160112153804.GI32545@example.net> Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 7bit X-VR-SPAMSTATE: OK X-VR-SPAMSCORE: 50 X-VR-SPAMCAUSE: gggruggvucftvghtrhhoucdtuddrfeekiedrkeehgddvvdcutefuodetggdotefrodftvfcurfhrohhfihhlvgemucfpfgfogfftkfevteeunffgnecuuegrihhlohhuthemuceftddtnecuogetfeejfedqtdegucdlhedtmdenucfjughrpefuvfhfhffkffgfgggjtgfgsehtjegrtddtfeejnecuhfhrohhmpefnrghurhgvnhhtuceuvghrtghothcuoehskhgrqdguvghvvghlsehskhgrrhhnvghtrdhorhhgqeenucffohhmrghinhepghhithhhuhgsrdgtohhmnecurfgrrhgrmhepmhhouggvpehsmhhtphhouhht X-Virus-Scanned: ClamAV using ClamSMTP On 12/01/2016 16:38, u-ztsd@aetey.se wrote: > I would love to get rid of udev but isn't libudev the harder part? Yes, libudev is definitely the harder part. Handling hotplug events via netlink is easy, and has been done several times over already; but libudev introduces policy in software, and most of the work is providing a compatible interface. I have my eyes set on libudev-compat from vdev: https://github.com/jcnelson/vdev but I don't know how much of a drop-in it is, or how production-ready it is. I'll be asking people around who have experience with it.
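For reference, the "easy" netlink part looks roughly like this on Linux: the kernel broadcasts each uevent on a NETLINK_KOBJECT_UEVENT socket (multicast group 1) as a header of the form `ACTION@DEVPATH` followed by NUL-separated `KEY=VALUE` strings. A minimal C sketch; `open_uevent_socket` and `uevent_get` are illustrative helpers, not code from any of the device managers discussed, and it deliberately leaves out everything libudev adds on top (rules, device database, its own re-broadcast messages):

```c
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <unistd.h>

/* Open a socket bound to the kernel's uevent multicast group (group 1). */
int open_uevent_socket(void)
{
    struct sockaddr_nl addr;
    int fd = socket(AF_NETLINK, SOCK_DGRAM, NETLINK_KOBJECT_UEVENT);

    if (fd < 0)
        return -1;
    memset(&addr, 0, sizeof addr);
    addr.nl_family = AF_NETLINK;
    addr.nl_pid = 0;       /* let the kernel pick our port id */
    addr.nl_groups = 1;    /* kernel uevents only */
    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/*
 * Find KEY in one uevent datagram of `len` bytes, laid out as
 * "ACTION@DEVPATH\0KEY=VALUE\0KEY=VALUE\0...". Returns a pointer to the
 * NUL-terminated value inside `buf`, or NULL if KEY is not present.
 */
const char *uevent_get(const char *buf, size_t len, const char *key)
{
    size_t keylen = strlen(key);
    const char *end = buf + len;
    const char *p = memchr(buf, '\0', len);   /* end of the header */

    if (p == NULL)
        return NULL;
    for (p = p + 1; p < end; ) {
        const char *nul = memchr(p, '\0', (size_t)(end - p));
        if (nul == NULL)
            break;                 /* unterminated trailing field: ignore */
        if ((size_t)(nul - p) > keylen && strncmp(p, key, keylen) == 0
            && p[keylen] == '=')
            return p + keylen + 1;
        p = nul + 1;
    }
    return NULL;
}
```

A receive loop would call `recv()` on the descriptor returned by `open_uevent_socket()` and pass the returned length to `uevent_get()`.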
-- Laurent --- Unsubscribe: alpine-devel+unsubscribe@lists.alpinelinux.org Help: alpine-devel+help@lists.alpinelinux.org --- From nobody Thu Mar 28 10:34:08 2024 X-Original-To: alpine-devel@mail.alpinelinux.org Delivered-To: alpine-devel@mail.alpinelinux.org Received: from mail.alpinelinux.org (dallas-a1.alpinelinux.org [127.0.0.1]) by mail.alpinelinux.org (Postfix) with ESMTP id E8BCBDC0170 for ; Tue, 12 Jan 2016 20:06:34 +0000 (UTC) Received: from mail-ob0-f176.google.com (mail-ob0-f176.google.com [209.85.214.176]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (No client certificate requested) by mail.alpinelinux.org (Postfix) with ESMTPS id BFAC6DC0169 for ; Tue, 12 Jan 2016 20:06:34 +0000 (UTC) Received: by mail-ob0-f176.google.com with SMTP id vt7so36651690obb.1 for ; Tue, 12 Jan 2016 12:06:34 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; bh=WHqg0iBbaLfQXW8P+lNpyVt39gXzmpyFbaPdYsN+JIA=; b=sbG03ULzgAlaiX2tw6IcE7k0dRYLy4jVfyCDaRrdLlPnp/PbaPzHQJK8tMoy/DR7MR /azQnH3QFjtXHvxj0P7gJBIV6PkfBXsrGYenJjyGgIni654DzEdTtA0ZjM1KJTv5xdX2 52vCm+adXtaMmvtZ9VS88+dRoOJVcAGkInf4jxJW7MRia5CBvcVfivIpX3CvGTb1VCdr 9mgUcJ52uZhrcc+PPHm8byRucKvUWRYQbu+sUISD6sfFj/UZzt1YKKG4UyO1zQlRcoTc gD9SVShQlVGTgFz6Gb5NdvSnVR1UK4JvmWzy2xHHCCxQq6mmMW2gRCFNsdOlBxzY130N f6/w== X-Mailinglist: alpine-devel Precedence: list List-Id: Alpine Development List-Unsubscribe: List-Post: List-Help: List-Subscribe: MIME-Version: 1.0 X-Received: by 10.60.159.7 with SMTP id wy7mr103947060oeb.71.1452629193225; Tue, 12 Jan 2016 12:06:33 -0800 (PST) Received: by 10.202.81.80 with HTTP; Tue, 12 Jan 2016 12:06:33 -0800 (PST) In-Reply-To: <56953ABE.5090203@skarnet.org> References: <20150727103737.4f95e523@ncopa-desktop.alpinelinux.org> <20150728052436.GC1923@newbook> <20160112153804.GI32545@example.net> <56953ABE.5090203@skarnet.org> Date: Tue, 12 Jan 2016 15:06:33 -0500 
Message-ID: Subject: Re: [alpine-devel] udev replacement on Alpine Linux From: Jude Nelson To: Laurent Bercot Cc: alpine-devel@lists.alpinelinux.org Content-Type: multipart/alternative; boundary=047d7bd761b60d7d900529289749 X-Virus-Scanned: ClamAV using ClamSMTP --047d7bd761b60d7d900529289749 Content-Type: text/plain; charset=UTF-8 On Tue, Jan 12, 2016 at 12:41 PM, Laurent Bercot wrote: > On 12/01/2016 16:38, u-ztsd@aetey.se wrote: > >> I would love to get rid of udev but isn't libudev the harder part? >> > > Yes, libudev is definitely the harder part. Handling hotplug > events via netlink is easy, and has been done several times over > already; but libudev introduces policy in software, and most of > the work is providing a compatible interface. > > I have my eyes set on libudev-compat from vdev: > https://github.com/jcnelson/vdev > > but I don't know how much of a drop-in it is, or how production- > ready it is. I'll be asking people around who have experience with it. I've been using vdev and libudev-compat on my production machine for several months. I use it heavily with Chromium (YouTube and Google Hangouts work) and udev-enabled Xorg (hotplugged input devices work as expected). My encrypted swap partition's device-mapped nodes and directories show up where they should, and my Android development tools work with my Android phone when I plug it in. I wouldn't say it's ready for prime time just yet, though. In particular, because libudev-compat uses (dev)tmpfs to record and distribute event messages as regular files (under /dev/metadata/udev/events), a program can leak files and directories simply by exiting without shutting down libudev (i.e. failing to free the struct udev_device). My plan is to have libudev-compat store its events in a special-purpose FUSE filesystem called eventfs [1] that automatically removes orphaned files and denies all future access to them. Eventfs works in my tests, but I have yet to move over to using it in production.
Instead, I've been running a script every now and then that clears out orphaned directories in /dev/metadata/udev/events. -Jude [1] https://github.com/jcnelson/eventfs > -- > Laurent > > > > > --- > Unsubscribe: alpine-devel+unsubscribe@lists.alpinelinux.org > Help: alpine-devel+help@lists.alpinelinux.org > --- > > --047d7bd761b60d7d900529289749 Content-Type: text/html; charset=UTF-8 Content-Transfer-Encoding: quoted-printable
--047d7bd761b60d7d900529289749-- --- Unsubscribe: alpine-devel+unsubscribe@lists.alpinelinux.org Help: alpine-devel+help@lists.alpinelinux.org --- From nobody Thu Mar 28 10:34:08 2024 X-Original-To: alpine-devel@mail.alpinelinux.org Delivered-To: alpine-devel@mail.alpinelinux.org Received: from mail.alpinelinux.org (dallas-a1.alpinelinux.org [127.0.0.1]) by mail.alpinelinux.org (Postfix) with ESMTP id 2A047DC0236 for ; Tue, 12 Jan 2016 23:37:06 +0000 (UTC) Received: from smtp1.tech.numericable.fr (smtp1.tech.numericable.fr [82.216.111.37]) (using TLSv1 with cipher ECDHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mail.alpinelinux.org (Postfix) with ESMTPS id BF062DC0169 for ; Tue, 12 Jan 2016 23:37:05 +0000 (UTC) Received: from sinay.internal.skarnet.org (ip-62.net-82-216-6.versailles2.rev.numericable.fr [82.216.6.62]) by smtp1.tech.numericable.fr (Postfix) with SMTP id D3EF0143090 for ; Wed, 13 Jan 2016 00:37:03 +0100 (CET) Received: (qmail 21867 invoked from network); 12 Jan 2016 23:37:29 -0000 Received: from elzian.internal.skarnet.org. (HELO ?192.168.0.2?) (192.168.0.2) by sinay.internal.skarnet.org. 
with SMTP; 12 Jan 2016 23:37:29 -0000 Subject: Re: [alpine-devel] udev replacement on Alpine Linux To: alpine-devel@lists.alpinelinux.org References: <20150727103737.4f95e523@ncopa-desktop.alpinelinux.org> <20150728052436.GC1923@newbook> <20160112153804.GI32545@example.net> <56953ABE.5090203@skarnet.org> From: Laurent Bercot Message-ID: <56958E22.90806@skarnet.org> Date: Wed, 13 Jan 2016 00:37:06 +0100 User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:38.0) Gecko/20100101 Thunderbird/38.5.1 X-Mailinglist: alpine-devel Precedence: list List-Id: Alpine Development List-Unsubscribe: List-Post: List-Help: List-Subscribe: MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 7bit X-VR-SPAMSTATE: OK X-VR-SPAMSCORE: 0 X-VR-SPAMCAUSE: gggruggvucftvghtrhhoucdtuddrfeekiedrkeeigddujecutefuodetggdotefrodftvfcurfhrohhfihhlvgemucfpfgfogfftkfevteeunffgnecuuegrihhlohhuthemuceftddtnecunecujfgurhepuffvfhfhkffffgggjggtgfesthejrgdttdefjeenucfhrhhomhepnfgruhhrvghnthcuuegvrhgtohhtuceoshhkrgdquggvvhgvlhesshhkrghrnhgvthdrohhrgheqnecurfgrrhgrmhepmhhouggvpehsmhhtphhouhht X-Virus-Scanned: ClamAV using ClamSMTP On 12/01/2016 21:06, Jude Nelson wrote: > I've been using vdev and libudev-compat it on my production machine > for several months. Sure, but since you're the author, it's certainly easier for you than for other people. ;) > I use it with heavily with Chromium (YouTube and > Google Hangouts work) and udev-enabled Xorg (hotplugged input devices > work as expected). My encrypted swap partition's device-mapped nodes > and directories show up where they should, and my Android development > tools work with my Android phone when I plug it in. That's neat, and very promising. I doubt you're the right person to ask, but do you have any experience running libudev-compat with a different hotplug manager than vdev ? I'd like to stick with (s)mdev as long as I can make it work. > I wouldn't say it's ready for prime time just yet, though. 
In > particular, because libudev-compat uses (dev)tmpfs to record and > distribute event messages as regular files (under > /dev/metadata/udev/events), a program can leak files and directories > simply by exiting without shutting down libudev (i.e. failing freeing > up the struct udev_device). That may be OOT, but I'm interested in hearing the rationale for that choice. An event is ephemeral, a file is (relatively) permanent; recording events as regular files does not sound like a good match, unless you have a reference counting process/thread somewhere that cleans up an event as soon as it's consumed. Anyway, unless I'm misunderstanding the architecture completely, it sounds like leaks could be prevented by wrapping programs you're not sure of. > My plan is to have libudev-compat store > its events to a special-purpose FUSE filesystem called eventfs [1] > that automatically removes orphaned files and denies all future > access to them. Unfortunately, FUSE is a deal breaker for the project I'm working on. I'm under the impression that you're slightly overengineering this; you shouldn't need a specific filesystem to distribute events. My s6-ftrig-* set of tools distribute events to arbitrary subscribers without needing anything specific - the mechanism is just directories and named pipes. But I don't know the details of libudev, so I may be missing something, and I'm really interested in learning more. > Instead, I've been running a script > every now and then that clears out orphaned directories in > /dev/metadata/udev/events. A polling cleaner script works if you have no sensitive data. A better design, though, is a notification-based cleaner, that is triggered as soon as a reference expires. 
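The alternative Laurent outlines needs no special filesystem: each subscriber creates a named pipe in a shared directory and reads from it, and the publisher writes the event to every pipe it finds there. A minimal C sketch of that idea, loosely in the spirit of s6's fifodir mechanism but not its actual code or protocol; `publish_event` is a hypothetical name, and removing stale pipes is left out:

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Write `msg` to every FIFO in directory `dir` (one FIFO per subscriber).
 * Opening with O_NONBLOCK makes open() fail with ENXIO when a FIFO has no
 * reader, so departed subscribers are skipped instead of blocking us.
 * Returns the number of subscribers notified, or -1 if `dir` can't be read.
 */
int publish_event(const char *dir, const char *msg)
{
    char path[4096];
    size_t len = strlen(msg);
    int notified = 0;
    struct dirent *e;
    DIR *d = opendir(dir);

    if (d == NULL)
        return -1;
    while ((e = readdir(d)) != NULL) {
        struct stat st;
        int fd;

        if (e->d_name[0] == '.')
            continue;                   /* skip ".", ".." and dotfiles */
        snprintf(path, sizeof path, "%s/%s", dir, e->d_name);
        if (lstat(path, &st) < 0 || !S_ISFIFO(st.st_mode))
            continue;                   /* only FIFOs are subscribers */
        fd = open(path, O_WRONLY | O_NONBLOCK);
        if (fd < 0)
            continue;                   /* no reader on this FIFO */
        /* Writes of at most PIPE_BUF bytes to a pipe are atomic. */
        if (write(fd, msg, len) == (ssize_t)len)
            notified++;
        close(fd);
    }
    closedir(d);
    return notified;
}
```

A subscriber only needs mkfifo() and a non-blocking open() for reading in that directory, and can unlink its own FIFO when it exits. A real publisher should also ignore SIGPIPE, since a reader may go away between open() and write().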
And I'm almost certain you don't need eventfs for this :) -- Laurent --- Unsubscribe: alpine-devel+unsubscribe@lists.alpinelinux.org Help: alpine-devel+help@lists.alpinelinux.org --- From nobody Thu Mar 28 10:34:08 2024 X-Original-To: alpine-devel@mail.alpinelinux.org Delivered-To: alpine-devel@mail.alpinelinux.org Received: from mail.alpinelinux.org (dallas-a1.alpinelinux.org [127.0.0.1]) by mail.alpinelinux.org (Postfix) with ESMTP id D54A9DC0D48 for ; Wed, 13 Jan 2016 03:47:46 +0000 (UTC) Received: from mail-ob0-f172.google.com (mail-ob0-f172.google.com [209.85.214.172]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (No client certificate requested) by mail.alpinelinux.org (Postfix) with ESMTPS id AAA72DC050F for ; Wed, 13 Jan 2016 03:47:46 +0000 (UTC) Received: by mail-ob0-f172.google.com with SMTP id ba1so454570004obb.3 for ; Tue, 12 Jan 2016 19:47:46 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; bh=2lEiX1lePyeNlj6qWzHBstKImfxyGvXqRJaM+nTCkIQ=; b=sSQ4WdASn5oK4L5fU+YzrZiAVcx9aAdQbhGrRBPqvb5V6AYeM5H8LJVxpPniuuS8ub COcG41XUXvX3BqdcikDIBfKj2v/bXXRPOqVYHr+sKfHD4l/Olpi8S5nCNVEDQD1U2eDf VPkHKjncyGMseE/lFoNdziHw8k9g95QqUAFUAEp62EtXKAWbhQj5x8crjxoZ1ljkuZVk M5uKFO3yFXx8biQRmC2m8rXX8uuNlSxIFtdcENGpoXITFHHHkvz50YE4kXX7/7MFXgQM SXY5nRWkiIEEtR4Q7KrRoKvPYsgdujoC3yv3lP73WMc84GgXXV1Eg7xsYhEv/oCWiRg/ +oMQ== X-Mailinglist: alpine-devel Precedence: list List-Id: Alpine Development List-Unsubscribe: List-Post: List-Help: List-Subscribe: MIME-Version: 1.0 X-Received: by 10.60.134.202 with SMTP id pm10mr50791880oeb.50.1452656865516; Tue, 12 Jan 2016 19:47:45 -0800 (PST) Received: by 10.202.81.80 with HTTP; Tue, 12 Jan 2016 19:47:45 -0800 (PST) In-Reply-To: <56958E22.90806@skarnet.org> References: <20150727103737.4f95e523@ncopa-desktop.alpinelinux.org> <20150728052436.GC1923@newbook> <20160112153804.GI32545@example.net> 
<56953ABE.5090203@skarnet.org> <56958E22.90806@skarnet.org> Date: Tue, 12 Jan 2016 22:47:45 -0500 Message-ID: Subject: Re: [alpine-devel] udev replacement on Alpine Linux From: Jude Nelson To: Laurent Bercot Cc: alpine-devel@lists.alpinelinux.org Content-Type: multipart/alternative; boundary=047d7b417a6373259605292f0892 X-Virus-Scanned: ClamAV using ClamSMTP --047d7b417a6373259605292f0892 Content-Type: text/plain; charset=UTF-8

Hi Laurent, thank you as always for your input.

On Tue, Jan 12, 2016 at 6:37 PM, Laurent Bercot wrote:
> On 12/01/2016 21:06, Jude Nelson wrote:
>> I've been using vdev and libudev-compat on my production machine
>> for several months.
>
> Sure, but since you're the author, it's certainly easier for you than for other people. ;)

Agreed; I was just pointing out that the system has been seeing some real-world use. :)

>> I use it heavily with Chromium (YouTube and Google Hangouts work) and udev-enabled Xorg (hotplugged input devices work as expected). My encrypted swap partition's device-mapped nodes and directories show up where they should, and my Android development tools work with my Android phone when I plug it in.
>
> That's neat, and very promising.
> I doubt you're the right person to ask, but do you have any experience running libudev-compat with a different hotplug manager than vdev? I'd like to stick with (s)mdev as long as I can make it work.

I haven't tried this myself, but it should be doable. Vdev's event-propagation mechanism is a small program that constructs a uevent string from environment variables passed to it by vdev and writes the string to the appropriate place. The vdev daemon isn't aware of its existence; it simply executes it as it would any other matching device-event action. Another device manager could supply the same program with the right environment variables and use it for the same purposes.

>> I wouldn't say it's ready for prime time just yet, though.
>> In particular, because libudev-compat uses (dev)tmpfs to record and distribute event messages as regular files (under /dev/metadata/udev/events), a program can leak files and directories simply by exiting without shutting down libudev (i.e. failing to free the struct udev_device).
>
> This may be off-topic, but I'm interested in hearing the rationale for that choice. An event is ephemeral, a file is (relatively) permanent; recording events as regular files does not sound like a good match, unless you have a reference-counting process/thread somewhere that cleans up an event as soon as it's consumed.

Tmpfs and devtmpfs are already designed to hold ephemeral state, so I'm not sure why the fact that they expose data as regular files is a concern.

I went with a file-oriented model specifically because hard links make reference counting simple. The aforementioned event-propagation tool writes the uevent into a scratch area under /dev, hard-links it into each libudev-compat monitor directory under /dev/metadata/udev, and then unlinks the scratch file (there is a directory in /dev/metadata/udev for each struct udev_monitor created by each libudev-compat program). When a libudev-compat client next wakes up, it consumes any new event files (in delivery order) and unlinks them. Each link acts as one reference, so once every libudev-compat client has "received" the event, the event's resources are fully reclaimed.

> Anyway, unless I'm misunderstanding the architecture completely, it sounds like leaks could be prevented by wrapping programs you're not sure of.

I couldn't think of a simpler way that was also as robust. Unless I'm misunderstanding something, wrapping an arbitrary program to clean up the files it created would, in the extreme, require coming up with a way to do so on SIGKILL. I'd love to know if there is a simple way to do this, though.
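A minimal sketch of the hard-link fan-out described above, written in Python for brevity. The function names, the directory layout and the zero-padded sequence-number file names are hypothetical; this illustrates the technique, not libudev-compat's actual code.

```python
import os


def publish_event(scratch_dir, monitor_dirs, name, payload):
    """Write one event file, hard-link it into every monitor directory,
    then unlink the scratch copy so only the subscribers' links remain."""
    scratch_path = os.path.join(scratch_dir, name)
    with open(scratch_path, "wb") as f:
        f.write(payload)
    for monitor_dir in monitor_dirs:
        # Every link shares one inode: the data is stored exactly once.
        os.link(scratch_path, os.path.join(monitor_dir, name))
    os.unlink(scratch_path)


def consume_events(monitor_dir):
    """Read and unlink all pending events. Names are assumed to sort in
    delivery order (e.g. zero-padded sequence numbers)."""
    events = []
    for name in sorted(os.listdir(monitor_dir)):
        path = os.path.join(monitor_dir, name)
        with open(path, "rb") as f:
            events.append(f.read())
        # Dropping this link decrements the inode's link count; the kernel
        # frees the data once the last subscriber has done the same.
        os.unlink(path)
    return events
```

Hard links cannot cross filesystems, so the scratch area and the monitor directories have to live on the same (dev)tmpfs. A monitor directory whose owner dies without consuming its events keeps its links, and therefore the event data, alive; that is the leak discussed in this thread.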
>> My plan is to have libudev-compat store its events in a special-purpose FUSE filesystem called eventfs [1] that automatically removes orphaned files and denies all future access to them.
>
> Unfortunately, FUSE is a deal breaker for the project I'm working on.
>
> I'm under the impression that you're slightly overengineering this; you shouldn't need a specific filesystem to distribute events. My s6-ftrig-* set of tools distributes events to arbitrary subscribers without needing anything specific: the mechanism is just directories and named pipes.
> But I don't know the details of libudev, so I may be missing something, and I'm really interested in learning more.

I went with a specialized filesystem for two reasons, both of which were to fulfill libudev's API contract:
* Efficient, reliable event multicasting. By using hard links as described above, the event only needs to be written out once, and the OS only needs to store one copy.
* Automatic multicast channel cleanup. Eventfs would ensure that no matter how a process dies, its multicast state would become inaccessible and be reclaimed once it is dead (i.e. a subsequent filesystem operation on the orphaned state, no matter how soon after the process's exit, will fail).

Both of the above are implicitly guaranteed by libudev, since it relies on a netlink multicast group shared with the udevd process to achieve them.

It is my understanding (please correct me if I'm wrong) that with s6-ftrig-*, I would need to write out the event data to each listener's pipe (i.e. once per struct udev_monitor instance), and I would still be responsible for cleaning up the fifodir every now and then if the libudev-compat client failed to do so itself. Is my understanding correct?

Again, I would love to know of a simpler approach that is just as robust.

>> Instead, I've been running a script every now and then that clears out orphaned directories in /dev/metadata/udev/events.
> A polling cleaner script works if you have no sensitive data.
> A better design, though, is a notification-based cleaner that is triggered as soon as a reference expires. And I'm almost certain you don't need eventfs for this :)

I agree that a notification-based cleaner could be just as effective, but I wonder whether the machinery needed to track all libudev-compat processes reliably and efficiently would be simpler than eventfs. Would love to know what you had in mind :)

Thanks for your feedback,
Jude

--047d7b417a6373259605292f0892 Content-Type: text/html; charset=UTF-8 Content-Transfer-Encoding: quoted-printable
--047d7b417a6373259605292f0892-- --- Unsubscribe: alpine-devel+unsubscribe@lists.alpinelinux.org Help: alpine-devel+help@lists.alpinelinux.org --- From nobody Thu Mar 28 10:34:08 2024 X-Original-To: alpine-devel@mail.alpinelinux.org Delivered-To: alpine-devel@mail.alpinelinux.org Received: from mail.alpinelinux.org (dallas-a1.alpinelinux.org [127.0.0.1]) by mail.alpinelinux.org (Postfix) with ESMTP id 714CEDCFCCF for ; Wed, 13 Jan 2016 12:33:23 +0000 (UTC) Received: from smtp1.tech.numericable.fr (smtp1.tech.numericable.fr [82.216.111.37]) (using TLSv1 with cipher ECDHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mail.alpinelinux.org (Postfix) with ESMTPS id 073D1DCF95D for ; Wed, 13 Jan 2016 12:33:22 +0000 (UTC) Received: from sinay.internal.skarnet.org (ip-62.net-82-216-6.versailles2.rev.numericable.fr [82.216.6.62]) by smtp1.tech.numericable.fr (Postfix) with SMTP id 0DDCA140566 for ; Wed, 13 Jan 2016 13:33:20 +0100 (CET) Received: (qmail 22096 invoked from network); 13 Jan 2016 12:33:46 -0000 Received: from elzian.internal.skarnet.org. (HELO ?192.168.0.2?) (192.168.0.2) by sinay.internal.skarnet.org. 
with SMTP; 13 Jan 2016 12:33:46 -0000 Subject: Re: [alpine-devel] udev replacement on Alpine Linux To: alpine-devel@lists.alpinelinux.org References: <20150727103737.4f95e523@ncopa-desktop.alpinelinux.org> <20150728052436.GC1923@newbook> <20160112153804.GI32545@example.net> <56953ABE.5090203@skarnet.org> <56958E22.90806@skarnet.org> From: Laurent Bercot Message-ID: <56964414.1000605@skarnet.org> Date: Wed, 13 Jan 2016 13:33:24 +0100 User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:38.0) Gecko/20100101 Thunderbird/38.5.1 X-Mailinglist: alpine-devel Precedence: list List-Id: Alpine Development List-Unsubscribe: List-Post: List-Help: List-Subscribe: MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 7bit X-VR-SPAMSTATE: OK X-VR-SPAMSCORE: 0 X-VR-SPAMCAUSE: gggruggvucftvghtrhhoucdtuddrfeekiedrkeejgddukecutefuodetggdotefrodftvfcurfhrohhfihhlvgemucfpfgfogfftkfevteeunffgnecuuegrihhlohhuthemuceftddtnecunecujfgurhepuffvfhfhkffffgggjggtgfesthejrgdttdefjeenucfhrhhomhepnfgruhhrvghnthcuuegvrhgtohhtuceoshhkrgdquggvvhgvlhesshhkrghrnhgvthdrohhrgheqnecurfgrrhgrmhepmhhouggvpehsmhhtphhouhht X-Virus-Scanned: ClamAV using ClamSMTP

On 13/01/2016 04:47, Jude Nelson wrote:
> I haven't tried this myself, but it should be doable. Vdev's event-propagation mechanism is a small program that constructs a uevent string from environment variables passed to it by vdev and writes the string to the appropriate place. The vdev daemon isn't aware of its existence; it simply executes it as it would any other matching device-event action. Another device manager could supply the same program with the right environment variables and use it for the same purposes.

Indeed. My question then becomes: what are the differences between the string passed by the kernel (which is more or less a list of environment variables, too) and the string constructed by vdev?
In other words, is vdev itself more than a trivial netlink listener, and if so, what does it do? (I'll just take a pointer to the documentation if that question is answered somewhere.)
For now I'll take a wild guess and say that vdev analyzes the MODALIAS or something, according to a conf file, in order to know the correct fan-out to perform and write the event to the correct subsystems. Am I close?

> Tmpfs and devtmpfs are designed for holding ephemeral state already, so I'm not sure why the fact that they expose data as regular files is a concern?

Two different meanings of "ephemeral".
tmpfs and devtmpfs are supposed to retain their data until the end of the system's lifetime. An event is much more ephemeral than that: it's supposed to be consumed instantly, like the event from the kernel is consumed instantly by the netlink listener. Files, even in a tmpfs, remain alive in the absence of a live process to hold them; but events have no meaning if no process needs them, which is the reason for the "event leaking" problem.
Ideally, you need a file type with basically the same lifetime as a process.

Holding event data in a file is perfectly valid as long as you have a mechanism to reclaim the file as soon as the last reference to it dies.

> I couldn't think of a simpler way that was also as robust. Unless I'm misunderstanding something, wrapping an arbitrary program to clean up the files it created would, in the extreme, require coming up with a way to do so on SIGKILL. I'd love to know if there is a simple way to do this, though.

That's where supervisors come into play: the parent of a process always knows when it dies, even on SIGKILL. Supervised daemons can have a cleaner script in place.
For the general case, it shouldn't be hard to have a wrapper that forks an arbitrary program and cleans up /dev/metadata/whatever/*$childpid* when it dies. The price to pay is an additional process, but that additional process would be very small.
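A minimal sketch of such a wrapper, in Python for brevity. It assumes a hypothetical naming convention in which each process's state under the metadata directory ends in "-<pid>"; matching that exact suffix, rather than a bare *<pid>* glob, avoids deleting PID 123's state when PID 12 exits. The function name is illustrative only.

```python
import glob
import os
import shutil
import subprocess


def run_and_clean(argv, metadata_dir):
    """Run argv as a child process and, however it exits (even SIGKILL),
    remove the per-process entries it left under metadata_dir.

    Assumed (hypothetical) convention: per-process state is named
    "<prefix>-<pid>".
    """
    child = subprocess.Popen(argv)
    status = child.wait()  # the parent is always told when the child dies
    removed = []
    for path in glob.glob(os.path.join(metadata_dir, f"*-{child.pid}")):
        if os.path.isdir(path) and not os.path.islink(path):
            shutil.rmtree(path)
        else:
            os.unlink(path)
        removed.append(path)
    return status, removed
```

Because the wrapper reaps the child itself, the cleanup runs even when the child is killed with SIGKILL (wait() then returns the negative signal number); only the death of the wrapper itself escapes it.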
You can still have a polling "catch-all cleaner" to collect dead events in case the supervisor/wrapper also died, but since that occurrence will be rare, the polling period can be pretty long, so it's not a problem.

> I went with a specialized filesystem for two reasons, both of which were to fulfill libudev's API contract:
> * Efficient, reliable event multicasting. By using hard links as described above, the event only needs to be written out once, and the OS only needs to store one copy.

That's a good mechanism; you're already fulfilling that contract with the non-eventfs implementation.

> * Automatic multicast channel cleanup. Eventfs would ensure that no matter how a process dies, its multicast state would become inaccessible and be reclaimed once it is dead (i.e. a subsequent filesystem operation on the orphaned state, no matter how soon after the process's exit, will fail).

That's where storing events as files is problematic: files survive processes. But I still don't think a specific fs is necessary: you can either ensure files do not survive processes (see the supervisor/cleaner idea above), or you can use another Unix mechanism (see below).

> Both of the above are implicitly guaranteed by libudev, since it relies on a netlink multicast group shared with the udevd process to achieve them.

And honestly, that's not a bad design. If you want to have multicast, and you happen to have a true multicast IPC mechanism, might as well use it. It will be hard to be as efficient as that: if you don't have true multicast, you have to compromise somewhere.
I dare say using a netlink multicast group is lighter than designing a FUSE filesystem to do the same thing. If you want the same functionality, why didn't you adopt the same mechanism?
(It can be made modular. You can have a uevent listener that just gets the event from the kernel and transmits it to the event manager, and the chosen event manager multicasts it.)
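For reference, a minimal sketch of the first half of such a split: decoding what the kernel actually sends. It assumes the kernel's uevent datagram layout ("ACTION@DEVPATH" followed by NUL-separated KEY=VALUE fields); the function name is hypothetical, and receiving the datagram from the netlink socket is left out.

```python
def parse_kernel_uevent(datagram):
    """Split a kernel uevent datagram into (action, devpath, fields).

    Handles only the kernel's layout: an "ACTION@DEVPATH" header followed
    by NUL-separated KEY=VALUE fields. Messages that udevd re-broadcasts
    to libudev clients use their own binary header and are not covered.
    """
    header, *fields = datagram.split(b"\0")
    action, _, devpath = header.decode().partition("@")
    env = {}
    for field in fields:
        key, sep, value = field.decode().partition("=")
        if sep:  # also skips the empty string after the trailing NUL
            env[key] = value
    return action, devpath, env
```

A listener would call this on each datagram received from the kernel's NETLINK_KOBJECT_UEVENT multicast group and hand the resulting fields to whichever event manager is in use.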
> It is my understanding (please correct me if I'm wrong) that with s6-ftrig-*, I would need to write out the event data to each listener's pipe (i.e. once per struct udev_monitor instance), and I would still be responsible for cleaning up the fifodir every now and then if the libudev-compat client failed to do so itself. Is my understanding correct?

Yes and no. I'm not suggesting you use libftrig for your purpose. :)

* My concern with libftrig was never event storage: it was many-to-many notification. I didn't design it to transmit arbitrary amounts of data, but to instantly wake up processes when something happens; data transmission *is* possible, but the original idea is to send one byte at a time, for just 256 types of event.

Notification and data transmission are orthogonal concepts. It's always possible to store data somewhere and notify processes that data is available; then processes can fetch the data. Data transmission can be pull, whereas notification has to be push. libftrig is only about the push.

Leaking space is not a concern with libftrig, because fifodirs never store data, only pipes; at worst, they leak a few inodes. That is why a polling cleaner is sufficient: even if multiple subscribers get SIGKILLed, they will only leave behind a few fifos and no data, so sweeping now and then is more than enough. It's different if you're storing data, because leaks can be much more problematic.

* Unless you have true multicast, you will have to push a notification as many times as you have listeners, no matter what. That's what I'm doing when writing to all the fifos in a fifodir, and that's what you are doing when linking the event into every subscriber's directory. I guess your subscriber library uses some kind of inotify to know when a new file has arrived?

> Again, I would love to know of a simpler approach that is just as robust.

Whenever you have "pull" data transmission, you necessarily have the problem of storage lifetime.
Here, as often, what you want is reference counting: when the last handle to the data disappears, the data is automatically collected. The problem is that your current handle, an inode, is not tied to the subscriber's lifetime. You want a type of handle that will die with the process. File descriptors fit this.

So, an idea would be to do something like:
- Your event manager listens to a Unix domain socket.
- Your subscribers connect to that socket.
- For every event:
  + the event manager stores the event into an anonymous file (e.g. a file in a tmpfs that is unlinked as soon as it is created) while keeping a reading fd on it;
  + the event manager sends a copy of the reading fd, via fd-passing, to every subscriber. This counts as a notification, since it will wake up subscribers;
  + the event manager closes its own fd to the file;
  + subscribers read the fd when they so choose, and close it afterwards. The kernel will also close it when they die, so you won't leak any data.

Of course, at that point, you may as well give up and just push the whole event over the Unix socket. It's what udevd does, except it uses a netlink multicast group instead of a normal socket (so its complexity is independent of the number of subscribers). Honestly, given that the number of subscribers will likely be small, and your events probably aren't too large either, it's the simplest design; it's what I'd go for. (I even already have the daemon to do it, as a part of skabus. Sending data to subscribers is exactly what a pubsub does.)

But if you estimate that the amount of data is too large and you don't want to copy it, then you can just send a fd instead. It's still manual broadcast, but it's not in O(event length * subscribers), it's in O(subscribers), i.e. the same complexity as your "hard link the event file" strategy; and it has the exact storage properties that you want.

What do you think?
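A minimal sketch of the fd-passing scheme described above, in Python 3.9+ (for socket.send_fds() and socket.recv_fds()). The function names and the one-byte "E" notification are hypothetical; pass a tmpfs directory such as /dev/shm as tmpdir to match the design, since the default temporary directory may be disk-backed.

```python
import os
import socket
import tempfile


def broadcast_event(payload, subscriber_socks, tmpdir=None):
    """Store payload in an anonymous file, then pass a read-only fd on it
    to every subscriber over its Unix domain socket."""
    wfd, path = tempfile.mkstemp(dir=tmpdir)
    try:
        view = memoryview(payload)
        while view:  # os.write() may write only part of the buffer
            view = view[os.write(wfd, view):]
        rfd = os.open(path, os.O_RDONLY)  # the "reading fd" to hand out
    finally:
        os.unlink(path)  # anonymous from now on: only open fds keep it alive
        os.close(wfd)
    try:
        for sock in subscriber_socks:
            # SCM_RIGHTS fd-passing; the one-byte message doubles as the
            # notification that wakes the subscriber up.
            socket.send_fds(sock, [b"E"], [rfd])
    finally:
        os.close(rfd)  # the manager drops its own reference right away


def receive_event(sock):
    """Receive one event descriptor, read the whole event, and close it."""
    _, fds, _, _ = socket.recv_fds(sock, 1, 1)
    fd = fds[0]
    try:
        size = os.fstat(fd).st_size
        return os.pread(fd, size, 0)  # explicit offset, see note below
    finally:
        os.close(fd)  # the last close anywhere frees the data
```

One subtlety: every subscriber receives a duplicate of the same open file description, so all of them share a single file offset. Reading with pread() at an explicit offset keeps one subscriber's read from moving the offset the others see.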
-- Laurent --- Unsubscribe: alpine-devel+unsubscribe@lists.alpinelinux.org Help: alpine-devel+help@lists.alpinelinux.org --- From nobody Thu Mar 28 10:34:08 2024 X-Original-To: alpine-devel@mail.alpinelinux.org Delivered-To: alpine-devel@mail.alpinelinux.org Received: from mail.alpinelinux.org (dallas-a1.alpinelinux.org [127.0.0.1]) by mail.alpinelinux.org (Postfix) with ESMTP id 693B4DC0BEE for ; Thu, 14 Jan 2016 05:55:59 +0000 (UTC) Received: from mail-ob0-f169.google.com (mail-ob0-f169.google.com [209.85.214.169]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (No client certificate requested) by mail.alpinelinux.org (Postfix) with ESMTPS id 3CDDFDC014B for ; Thu, 14 Jan 2016 05:55:58 +0000 (UTC) Received: by mail-ob0-f169.google.com with SMTP id is5so74863806obc.0 for ; Wed, 13 Jan 2016 21:55:58 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; bh=AFmIe11Buc+N8GX1c5QpQhUXlecom/+JrtDlixXcvh8=; b=hmIN8ISPMgPkV8pVaoFukr3Su3xOJLcx316IE7KlcyaIuSGFFN0aAvN5xHNx9kfaXp qcJUC3gFK7I9jqmcHHF4v5B9Uuhng04ZWgiGOcGXiHNJxmu8MbujHzESbQs3zaQJpZtk sb+a8WEsswtFIQMxNXtzFcBYuSNJwDKkmbj8Vxa6CIACcHJH9U7Qt+NjBpihsm8dl0TO 0DQhXvH21jW8rRPF8tuWzeI54j++S3lSTlvWU0ZVsYoYo87Y4lK7YkWUmSgjfLk3b5vw RQtGuLxULb/dRMSwm4yNZhgWh21kZoxWLiLxyrUeig83g0rhlBPxCyb0WBKLOtaXCPpw YQyQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:in-reply-to:references:date :message-id:subject:from:to:cc:content-type; bh=AFmIe11Buc+N8GX1c5QpQhUXlecom/+JrtDlixXcvh8=; b=gHnVPPzleZanhqdz49SdINbszgtNzLTmeTTU5B0jQeMy17TakTdiJQPdgABsTkbO5c yB+dTYsZAJIjCb6Vz8GlWEbNM9vZB7cJPX0SkBlsNwudVuAa7XFbo4zOaEeTuw5BF+8T t2Xd52leq2zB891Mrc2Ke8XmbSAG+46AoEGaqaapR4VWO/X/8Rw8yF/LTVJFznA/5+8Q mx0URVN7vKueIMkIdpI1dhsspuFjDBP8cNCrOwDZ8WgMJhwzuKZORIwpSVSqsU8fQ0c/ kcYgzA3v9rAckXOD7p4mMa+K7nqldPgRXdIadC/nw6TqJNs9otgdUjkw2omvBmRH6Xfw QWzg== 
X-Gm-Message-State: ALoCoQkQ0ZLjdERC1GJzqDZoIR3CLd/LYWve+PCDA/wfHeJCEVKyUjSjKYgAg34rLaoGOvCMaajB9Ao8DkeJ80DKQsjKsmEklA== X-Mailinglist: alpine-devel Precedence: list List-Id: Alpine Development List-Unsubscribe: List-Post: List-Help: List-Subscribe: MIME-Version: 1.0 X-Received: by 10.182.28.66 with SMTP id z2mr1766080obg.32.1452750957889; Wed, 13 Jan 2016 21:55:57 -0800 (PST) Received: by 10.202.81.6 with HTTP; Wed, 13 Jan 2016 21:55:57 -0800 (PST) In-Reply-To: <56964414.1000605@skarnet.org> References: <20150727103737.4f95e523@ncopa-desktop.alpinelinux.org> <20150728052436.GC1923@newbook> <20160112153804.GI32545@example.net> <56953ABE.5090203@skarnet.org> <56958E22.90806@skarnet.org> <56964414.1000605@skarnet.org> Date: Thu, 14 Jan 2016 00:55:57 -0500 Message-ID: Subject: Re: [alpine-devel] udev replacement on Alpine Linux From: Jude Nelson To: Laurent Bercot Cc: alpine-devel@lists.alpinelinux.org Content-Type: multipart/alternative; boundary=089e0158b09ecacf4d052944f024 X-Virus-Scanned: ClamAV using ClamSMTP --089e0158b09ecacf4d052944f024 Content-Type: text/plain; charset=UTF-8

Hi Laurent,

On Wed, Jan 13, 2016 at 7:33 AM, Laurent Bercot wrote:
> On 13/01/2016 04:47, Jude Nelson wrote:
>
>> I haven't tried this myself, but it should be doable. Vdev's event-propagation mechanism is a small program that constructs a uevent string from environment variables passed to it by vdev and writes the string to the appropriate place. The vdev daemon isn't aware of its existence; it simply executes it as it would any other matching device-event action. Another device manager could supply the same program with the right environment variables and use it for the same purposes.
>
> Indeed. My question then becomes: what are the differences between the string passed by the kernel (which is more or less a list of environment variables, too) and the string constructed by vdev?
> In other words, is vdev itself more than a trivial netlink listener, and if so, what does it do? (I'll just take a pointer to the documentation if that question is answered somewhere.)
> For now I'll take a wild guess and say that vdev analyzes the MODALIAS or something, according to a conf file, in order to know the correct fan-out to perform and write the event to the correct subsystems. Am I close?

(I should really sit down and write documentation sometime :)

I think you're close. The gist of it is that vdev needs to supply a lot more information than the kernel gives it. In particular, its helper programs go on to query the properties and status of each device (this often requires root privileges, e.g. via privileged ioctl()s), and vdev gathers the information into a (much larger) event packet and stores it in a directory tree under /dev for subsequent queries by less-privileged programs. It doesn't rely on the MODALIAS per se; instead, it matches fields of the kernel's uevent packet (one of which is the MODALIAS) to the right helper programs to run.
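A minimal sketch of matching uevent fields to helper programs, in Python for brevity. The rule table, the helper names and the use of shell-style patterns are all hypothetical; vdev's real configuration format is not shown in this thread.

```python
from fnmatch import fnmatchcase

# Hypothetical rule table, NOT vdev's configuration format: each rule maps
# uevent field names to shell-style patterns, plus the helper to run.
EXAMPLE_RULES = [
    ({"SUBSYSTEM": "block", "DEVTYPE": "disk"}, "disk-properties-helper"),
    ({"SUBSYSTEM": "input"}, "input-helper"),
    ({"MODALIAS": "usb:v18D1p*"}, "android-usb-helper"),
]


def select_helpers(fields, rules):
    """Return the helpers whose patterns all match the uevent's fields.
    A field the event lacks never matches."""
    return [
        helper
        for patterns, helper in rules
        if all(key in fields and fnmatchcase(fields[key], pattern)
               for key, pattern in patterns.items())
    ]
```

In a scheme like this, each selected helper would then be executed with the event's fields as its environment (for example via subprocess.run([helper], env=fields)), and whatever it discovers would be merged into the device's metadata.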
Here's an example of what vdev gathers for my laptop's SATA disk:

$ cat /dev/metadata/dev/sda/properties
VDEV_ATA=1
VDEV_WWN=0x5000c500299a9a7a
VDEV_BUS=ata
VDEV_SERIAL=ST9500420AS_5VJ7A0BM
VDEV_SERIAL_SHORT=5VJ7A0BM
VDEV_REVISION=0003LVM1
VDEV_TYPE=ata
VDEV_MAJOR=8
VDEV_MINOR=0
VDEV_OS_SUBSYSTEM=block
VDEV_OS_DEVTYPE=disk
VDEV_OS_DEVPATH=/devices/pci0000:00/0000:00:1f.2/ata1/host0/target0:0:0/0:0:0:0/block/sda
VDEV_OS_DEVNAME=sda
VDEV_ATA=1
VDEV_ATA_TYPE=disk
VDEV_ATA_MODEL=ST9500420AS
VDEV_ATA_MODEL_ENC=ST9500420ASx20x20x20x20x20x20x20x20x20x20x20x20x20x20x20x20x20x20x20x20x20x20x20x20x20x20x20x20x20
VDEV_ATA_REVISION=0003LVM1
VDEV_ATA_SERIAL=ST9500420AS_5VJ7A0BM
VDEV_ATA_SERIAL_SHORT=5VJ7A0BM
VDEV_ATA_WRITE_CACHE=1
VDEV_ATA_WRITE_CACHE_ENABLED=1
VDEV_ATA_FEATURE_SET_HPA=1
VDEV_ATA_FEATURE_SET_HPA_ENABLED=1
VDEV_ATA_FEATURE_SET_PM=1
VDEV_ATA_FEATURE_SET_PM_ENABLED=1
VDEV_ATA_FEATURE_SET_SECURITY=1
VDEV_ATA_FEATURE_SET_SECURITY_ENABLED=0
VDEV_ATA_FEATURE_SET_SECURITY_ERASE_UNIT_MIN=100
VDEV_ATA_FEATURE_SET_SECURITY_ENHANCED_ERASE_UNIT_MIN=100
VDEV_ATA_FEATURE_SET_SECURITY_FROZEN=1
VDEV_ATA_FEATURE_SET_SMART=1
VDEV_ATA_FEATURE_SET_SMART_ENABLED=1
VDEV_ATA_FEATURE_SET_APM=1
VDEV_ATA_FEATURE_SET_APM_ENABLED=1
VDEV_ATA_FEATURE_SET_APM_CURRENT_VALUE=128
VDEV_ATA_DOWNLOAD_MICROCODE=1
VDEV_ATA_SATA=1
VDEV_ATA_SATA_SIGNAL_RATE_GEN2=1
VDEV_ATA_SATA_SIGNAL_RATE_GEN1=1
VDEV_ATA_ROTATION_RATE_RPM=7200
VDEV_ATA_WWN=0x5000c500299a9a7a
VDEV_ATA_WWN_WITH_EXTENSION=0x5000c500299a9a7a

Anything that starts with "VDEV_ATA_", as well as "VDEV_BUS", "VDEV_SERIAL_*", "VDEV_TYPE", and "VDEV_REVISION", had to be extracted via an ioctl, by exploring files in sysfs, or by querying a hardware database. The kernel only supplied a few of these fields.

>> Tmpfs and devtmpfs are designed for holding ephemeral state already, so I'm not sure why the fact that they expose data as regular files is a concern?
>
> Two different meanings of "ephemeral".
> tmpfs and devtmpfs are supposed to retain their data until the
> end of the system's lifetime. An event is much more ephemeral
> than that: it's supposed to be consumed instantly - like the
> event from the kernel is consumed instantly by the netlink listener.
> Files, even in a tmpfs, remain alive in the absence of a live
> process to hold them; but events have no meaning if no process needs
> them, which is the reason for the "event leaking" problem.
> Ideally, you need a file type with basically the same lifetime
> as a process.
>
> Holding event data in a file is perfectly valid as long as you have
> a mechanism to reclaim the file as soon as the last reference to it
> dies.
>

Funny you mention this--I also created runfs (https://github.com/jcnelson/runfs) to do exactly this. In particular, I use it for PID files. Also, eventfs was actually derived from runfs, but specialized further to make it better suited to managing event queues.

>
>
>> I couldn't think of a simpler way that was also as robust. Unless
>> I'm misunderstanding something, wrapping an arbitrary program to
>> clean up the files it created would, in the extreme, require coming
>> up with a way to do so on SIGKILL. I'd love to know if there is a
>> simple way to do this, though.
>>
>
> That's where supervisors come into play: the parent of a process
> always knows when it dies, even on SIGKILL. Supervised daemons can
> have a cleaner script in place.
> For the general case, it shouldn't be hard to have a wrapper that
> forks an arbitrary program and cleans up /dev/metadata/whatever/*$childpid*
> when it dies. The price to pay is an additional process, but that
> additional process would be very small.
> You can still have a polling "catch-all cleaner" to collect dead events
> in case the supervisor/wrapper also died, but since that occurrence will
> be rare, the polling period can be pretty long so it's not a problem.
>

Agreed.
I would be happy to keep this approach in mind in the design of libudev-compat. Eventfs isn't a hard requirement and I don't want it to be, since there's more than one way to deal with this problem.

>
>
>> I went with a specialized filesystem for two reasons, both of which
>> were to fulfill libudev's API contract:
>> * Efficient, reliable event multicasting. By using hard-links as
>> described above, the event only needs to be written out once, and the
>> OS only needs to store one copy.
>>
>
> That's a good mechanism; you're already fulfilling that contract
> with the non-eventfs implementation.
>
>
>> * Automatic multicast channel cleanup. Eventfs would ensure that no
>> matter how a process dies, its multicast state would become
>> inaccessible and be reclaimed once it is dead (i.e. a subsequent
>> filesystem operation on the orphaned state, no matter how soon after
>> the process's exit, will fail).
>>
>
> That's where storing events as files is problematic: files survive
> processes. But I still don't think a specific fs is necessary: you can
> either ensure files do not survive processes (see the supervisor/cleaner
> idea above), or you can use another Unix mechanism (see below).
>
>
>> Both of the above are implicitly guaranteed by libudev, since it
>> relies on a netlink multicast group shared with the udevd process
>> to achieve them.
>>
>
> And honestly, that's not a bad design. If you want to have multicast,
> and you happen to have a true multicast IPC mechanism, might as well
> use it. It will be hard to be as efficient as that: if you don't have
> true multicast, you have to compromise somewhere.
> I dare say using a netlink multicast group is lighter than designing
> a FUSE filesystem to do the same thing. If you want the same
> functionality, why didn't you adopt the same mechanism ?
>

I agree that netlink is lighter, but I avoided it for two reasons:
* Sometime down the road, I'd like to port vdev to OpenBSD.
Not because I believe that the OpenBSD project is in dire need of a dynamic device manager, but simply because it's the thing I miss the most when I'm using OpenBSD (personal preference). Netlink is Linux-specific, whereas FUSE works on pretty much every Unix these days.
* There is no way to namespace netlink messages that I'm aware of. The kernel (and udev) send the same device events to every container on the system--in fact, this is one of the major reasons cited by the systemd folks for moving off of netlink for udevd-to-libudev communications. By using a synthetic filesystem for message transport, I can use bind-mounts to control which device events get routed to which containers (this is also the reason why the late kdbus was implemented as a synthetic filesystem). Using fifodirs has the same benefit :)

>
> (It can be made modular. You can have a uevent listener that just gets
> the event from the kernel and transmits it to the event manager; and
> the chosen event manager multicasts it.)
>

Good point; something I'll keep in mind in the future evolution of libudev-compat :)

>
>> It is my understanding (please correct me if I'm wrong) that with
>> s6-ftrig-*, I would need to write out the event data to each
>> listener's pipe (i.e. once per struct udev_monitor instance), and I
>> would still be responsible for cleaning up the fifodir every now and
>> then if the libudev-compat client failed to do so itself. Is my
>> understanding correct?
>>
>
> Yes and no. I'm not suggesting that you use libftrig for your purpose. :)
>
> * My concern with libftrig was never event storage: it was
> many-to-many notification. I didn't design it to transmit arbitrary
> amounts of data, but to instantly wake up processes when something
> happens; data transmission *is* possible, but the original idea is
> to send one byte at a time, for just 256 types of event.
>
> Notification and data transmission are orthogonal concepts.
> It's always possible to store data somewhere and notify processes that
> data is available; then processes can fetch the data. Data
> transmission can be pull, whereas notification has to be push.
> libftrig is only about the push.
>
> Leaking space is not a concern with libftrig, because fifodirs
> never store data, only pipes; at worst, they leak a few inodes.
> That is why a polling cleaner is sufficient: even if multiple
> subscribers get SIGKILLed, they will only leave behind a few
> fifos, and no data - so sweeping now and then is more than enough.
> It's different if you're storing data, because leaks can be much
> more problematic.
>
> * Unless you have true multicast, you will have to push a
> notification as many times as you have listeners, no matter what.
> That's what I'm doing when writing to all the fifos in a fifodir.
> That's what you are doing when linking the event into every
> subscriber's directory. I guess your subscriber library uses some
> kind of inotify to know when a new file has arrived?
>

Yes, modulo some other mechanisms to ensure that the libudev-compat process doesn't get back-logged and lose messages. I completely agree with you about the benefits of separating notification (control-plane) from message delivery (data-plane).

>
>
>> Again, I would love to know of a simpler approach that is just as
>> robust.
>>
>
> Whenever you have "pull" data transmission, you necessarily have the
> problem of storage lifetime. Here, as often, what you want is
> reference counting: when the last handle to the data disappears, the data
> is automatically collected.
> The problem is that your current handle, an inode, is not tied to the
> subscriber's lifetime. You want a type of handle that will die with the
> process.
> File descriptors fit this.
>
> So, an idea would be to do something like:
> - Your event manager listens to a Unix domain socket.
> - Your subscribers connect to that socket.
> - For every event:
>   + the event manager stores the event into an anonymous file (e.g. a file
> in a tmpfs that is unlinked as soon as it is created) while keeping a
> reading fd on it
>   + the event manager sends a copy of the reading fd, via fd-passing,
> to every subscriber. This counts as a notification, since it will wake up
> subscribers.
>   + the event manager closes its own fd to the file.
>   + subscribers will read the fd when they so choose, and they will
> close it afterwards. The kernel will also close it when they die, so you
> won't leak any data.
>
> Of course, at that point, you may as well give up and just push the
> whole event over the Unix socket. It's what udevd does, except it uses a
> netlink multicast group instead of a normal socket (so its complexity is
> independent from the number of subscribers). Honestly, given that the
> number of subscribers will likely be small, and your events probably aren't
> too large either, it's the simplest design - it's what I'd go for.
> (I even already have the daemon to do it, as a part of skabus. Sending
> data to subscribers is exactly what a pubsub does.)
>
> But if you estimate that the amount of data is too large and you don't
> want to copy it, then you can just send a fd instead. It's still
> manual broadcast, but it's not in O(event length * subscribers), it's in
> O(subscribers), i.e. the same complexity as your "hard link the event
> file" strategy; and it has the exact storage properties that you want.
>
> What do you think ?

I think both approaches are good ideas and would work just as well. I really like skabus's approach--I'll take a look at using it for message delivery as an additional (preferred?) vdev-to-libudev-compat message delivery mechanism :) It looks like it offers all the aforementioned benefits over netlink that I'm looking for.
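The fd-passing variant quoted above can be sketched in C roughly as follows. This is a minimal sketch, assuming Linux and one connected Unix-domain socket per subscriber (the test uses `socketpair()` in place of accepted connections). The function names are hypothetical, and `/tmp` stands in for whatever tmpfs the event manager would use. The event file is created with `mkstemp()`, reopened read-only to get the "reading fd", and unlinked at once, so the kernel frees it when the last copy of the fd is closed, including when a subscriber dies.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE              /* mkstemp(), pread(), MSG_NOSIGNAL on strict -std= */
#endif
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Store one event in an anonymous file; return a read-only fd to it, or -1.
 * Assumption: /tmp is a tmpfs (any writable directory works). */
int make_event_file(const char *data, size_t len)
{
    char path[] = "/tmp/event-XXXXXX";
    int wfd = mkstemp(path);
    if (wfd < 0) return -1;
    int rfd = open(path, O_RDONLY);  /* the "reading fd" that gets passed around */
    unlink(path);                    /* no name left: storage lives only as long as fds */
    ssize_t w = write(wfd, data, len);
    close(wfd);
    if (rfd >= 0 && w == (ssize_t)len) return rfd;
    if (rfd >= 0) close(rfd);
    return -1;
}

/* Pass `fd` over a connected Unix socket as SCM_RIGHTS ancillary data. */
static int send_fd(int sock, int fd)
{
    char byte = 'E';                 /* one byte of ordinary data carries the fd */
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union { struct cmsghdr align; char buf[CMSG_SPACE(sizeof(int))]; } u;
    struct msghdr msg;
    memset(&u, 0, sizeof u);
    memset(&msg, 0, sizeof msg);
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = u.buf;
    msg.msg_controllen = sizeof u.buf;
    struct cmsghdr *c = CMSG_FIRSTHDR(&msg);
    c->cmsg_level = SOL_SOCKET;
    c->cmsg_type = SCM_RIGHTS;
    c->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(c), &fd, sizeof(int));
    return sendmsg(sock, &msg, MSG_NOSIGNAL) == 1 ? 0 : -1;
}

/* Subscriber side: receive one passed fd, or -1. */
int recv_fd(int sock)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union { struct cmsghdr align; char buf[CMSG_SPACE(sizeof(int))]; } u;
    struct msghdr msg;
    memset(&msg, 0, sizeof msg);
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = u.buf;
    msg.msg_controllen = sizeof u.buf;
    if (recvmsg(sock, &msg, 0) != 1) return -1;
    struct cmsghdr *c = CMSG_FIRSTHDR(&msg);
    if (c == NULL || c->cmsg_level != SOL_SOCKET || c->cmsg_type != SCM_RIGHTS)
        return -1;
    int fd;
    memcpy(&fd, CMSG_DATA(c), sizeof fd);
    return fd;
}

/* Event manager: one file per event, one fd copy per subscriber, then let go.
 * Returns how many subscribers the fd was sent to, or -1. */
int publish_event(const int *subs, size_t nsubs, const char *data, size_t len)
{
    int fd = make_event_file(data, len);
    if (fd < 0) return -1;
    int sent = 0;
    for (size_t i = 0; i < nsubs; i++)
        if (send_fd(subs[i], fd) == 0) sent++;
    close(fd);                       /* in-flight and received copies keep it alive */
    return sent;
}
```

One caveat: every subscriber receives a copy of the *same* open file description, so they all share one file offset. Subscribers should therefore read with `pread(fd, buf, n, 0)` rather than `read()`. The per-event cost is one `sendmsg()` per subscriber regardless of event size, which is the O(subscribers) property described above.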
A question on the implementation--what do you think of having each subscriber create its own Unix domain socket in a canonical directory, and having the sender connect as a client to each subscriber? Since each subscriber needs its own fd to read and close, the directory of subscriber sockets automatically gives the sender a list of who to communicate with and a count of how many fds to create. It also makes it easy to detect and clean up a dead subscriber's socket: the sender can request a struct ucred from a subscriber to get its PID (and then other details from /proc), and if the process ever exits (which the sender can detect on Linux using a netlink process monitor, like [1]), the process that created the socket can be assumed to be dead and the sender can unlink it. The sender would rely on additional process instance-identifying information from /proc (like its start-time) to avoid PID-reuse races.

Thanks again for all your input!
-Jude

[1] http://bewareofgeek.livejournal.com/2945.html?page=1

--089e0158b09ecacf4d052944f024
Content-Type: text/html; charset=UTF-8
Content-Transfer-Encoding: quoted-printable
--089e0158b09ecacf4d052944f024-- --- Unsubscribe: alpine-devel+unsubscribe@lists.alpinelinux.org Help: alpine-devel+help@lists.alpinelinux.org --- From nobody Thu Mar 28 10:34:08 2024 X-Original-To: alpine-devel@mail.alpinelinux.org Delivered-To: alpine-devel@mail.alpinelinux.org Received: from mail.alpinelinux.org (dallas-a1.alpinelinux.org [127.0.0.1]) by mail.alpinelinux.org (Postfix) with ESMTP id 50552DC0368 for ; Thu, 14 Jan 2016 11:36:00 +0000 (UTC) Received: from smtp1.tech.numericable.fr (smtp1.tech.numericable.fr [82.216.111.37]) (using TLSv1 with cipher ECDHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mail.alpinelinux.org (Postfix) with ESMTPS id D4A3BDC0298 for ; Thu, 14 Jan 2016 11:35:59 +0000 (UTC) Received: from sinay.internal.skarnet.org (ip-62.net-82-216-6.versailles2.rev.numericable.fr [82.216.6.62]) by smtp1.tech.numericable.fr (Postfix) with SMTP id 041891414C5 for ; Thu, 14 Jan 2016 12:35:57 +0100 (CET) Received: (qmail 22572 invoked from network); 14 Jan 2016 11:36:23 -0000 Received: from elzian.internal.skarnet.org. (HELO ?192.168.0.2?) (192.168.0.2) by sinay.internal.skarnet.org. 
with SMTP; 14 Jan 2016 11:36:23 -0000 Subject: Re: [alpine-devel] udev replacement on Alpine Linux To: alpine-devel@lists.alpinelinux.org References: <20150727103737.4f95e523@ncopa-desktop.alpinelinux.org> <20150728052436.GC1923@newbook> <20160112153804.GI32545@example.net> <56953ABE.5090203@skarnet.org> <56958E22.90806@skarnet.org> <56964414.1000605@skarnet.org> From: Laurent Bercot Message-ID: <56978822.8020205@skarnet.org> Date: Thu, 14 Jan 2016 12:36:02 +0100 User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:38.0) Gecko/20100101 Thunderbird/38.5.1 X-Mailinglist: alpine-devel Precedence: list List-Id: Alpine Development List-Unsubscribe: List-Post: List-Help: List-Subscribe: MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 7bit X-VR-SPAMSTATE: OK X-VR-SPAMSCORE: 50 X-VR-SPAMCAUSE: gggruggvucftvghtrhhoucdtuddrfeekiedrkeelgddukecutefuodetggdotefrodftvfcurfhrohhfihhlvgemucfpfgfogfftkfevteeunffgnecuuegrihhlohhuthemuceftddtnecuogetfeejfedqtdegucdlhedtmdenucfjughrpefuvfhfhffkffgfgggjtgfgsehtjegrtddtfeejnecuhfhrohhmpefnrghurhgvnhhtuceuvghrtghothcuoehskhgrqdguvghvvghlsehskhgrrhhnvghtrdhorhhgqeenucffohhmrghinhepghhithhhuhgsrdgtohhmnecurfgrrhgrmhepmhhouggvpehsmhhtphhouhht X-Virus-Scanned: ClamAV using ClamSMTP On 14/01/2016 06:55, Jude Nelson wrote: > I think you're close. The jist of it is that vdev needs to supply a > lot more information than the kernel gives it. In particular, its > helper programs go on to query the properties and status of each > device (this often requires root privileges, i.e. via privileged > ioctl()s), and vdev gathers the information into a (much larger) > event packet and stores it in a directory tree under /dev for > subsequent query by less-privileged programs. I see. I think this is exactly what could be made modular. 
I've heard people say they were reluctant to use vdev because it's not KISS, and I suspect the ioctl machinery and data gathering are a large part of the complexity. If that part could be pluggable, i.e. if admins could choose a "data gatherer" just complex enough for their needs, I believe it could encourage adoption.

In other words, I'm looking at a 3-part program:
 - the netlink listener
 - the data gatherer
 - the event publisher

Of course, for libudev to work, you would need the full data gatherer; but if people aren't using libudev programs, they can use a simpler one, closer to what mdev is doing.

It's all from a very high point of view, and I don't know the details of the code, so I have no idea whether it's feasible for vdev, but that's what I'm thinking off the top of my head.

> Funny you mention this--I also created runfs
> (https://github.com/jcnelson/runfs) to do exactly this. In
> particular, I use it for PID files.

I have no love for mechanisms that help people keep using PID files, which are an ugly relic that can't end up in the museum of mediaeval programming soon enough. :P
That said, runfs is interesting, and I would love it if Unix provided such a mechanism. Unfortunately, for now it has to rely on FUSE, which is one of the most clunky mutant features of Linux, and an extra layer of complexity; so I find it cleaner if a program can achieve its functionality without depending on such a filesystem.

> I agree that netlink is lighter, but I avoided it for two reasons:
> * Sometime down the road, I'd like to port vdev to OpenBSD.

That's a good reason, and an additional reason to separate the netlink listener from the event publisher (and the data gatherer). The event publisher and client library can be made 100% portable, whereas the netlink listener and data gatherer obviously cannot.

> * There is no way to namespace netlink messages that I'm aware of.

I didn't know that - I'm no netlink expert. But that's also a good reason.
AFAICT, there are 32 netlink multicast groups, and they use hardcoded numbers - this is ugly, or at least requires a global registry of what each group is used for. If you can't namespace them, they become even more of a scarce resource; although it's legitimate to use one for uevent publishing, I'm pretty sure people will find a way to clog them with random crap very soon - better stay away from resources you can't reliably lock. And from what you're saying, even systemd people have realized that. :)
I'm not advocating netlink use for anything other than reading kernel events. It's just that true multicast will be more efficient than manual broadcast; there's no way around it.

> By using a synthetic filesystem for
> message transport, I can use bind-mounts to control which device
> events get routed to which containers

I'm torn between "oooh, clever" and "omg this hack is atrocious". :)

> Yes, modulo some other mechanisms to ensure that the libudev-compat
> process doesn't get back-logged and lose messages.

What do you mean by that? If libudev-compat is, like libudev, linked into the application, then you have no control over client behaviour; if a client doesn't properly act on a notification, then there's nothing you can do about it and it's not your responsibility. Can you give a few details about what you're doing client-side?

> I think both approaches are good ideas and would work just as well.
> I really like skabus's approach--I'll take a look at using it for
> message delivery as an additional (preferred?) vdev-to-libudev-compat
> message delivery mechanism :) It looks like it offers all the
> aforementioned benefits over netlink that I'm looking for.

Unfortunately, it's not published yet, because there's still a lot of work to be done on clients. And now I'm wondering whether it would be more efficient to store messages in anonymous files and transmit fds, instead of transmitting copies of messages. I may have to rewrite stuff.
:) I think I'll be able to get back to work on skabus by the end of this year - but no promises, since I'll be working on the Alpine init system as soon as I'm done with my current contract. But I can leak a few pieces of source code if you're interested. > A question on the implementation--what do you think of having each > subscriber create its own Unix domain socket in a canonical > directory, and having the sender connect as a client to each > subscriber? That's exactly how fifodirs work, with pipes instead of sockets. But I don't think that's a good fit here. A point of fifodirs is to have many-to-many communication: there are several subscribers, but there can also be several publishers (even if in practice there's often only one publisher). Publishers and subscribers are completely independent. Here, you only ever have one publisher: the event dispatcher. You only ever need one-to-many communication. Another point of fifodirs is to avoid the need for a daemon to act as a bus. It's notification that happens between unrelated processes without requiring a central server to ensure the communication. It's important because I didn't want my supervision system (which is supposed to manage daemons) to itself rely on a daemon (which would then have to be unsupervised). Here, you don't have that requirement, and you already have a daemon: the event dispatcher is long-lived. 

 I think a "socketdir" mechanism is just too heavy:
 - for every event, you perform opendir(), readdir() and closedir()
 - for every event and every subscriber, you perform at least socket(),
connect(), sendmsg() and close()
 - the client library needs to listen() and accept(), which means it
needs its own thread (and I hate, hate, hate libraries that pull in
thread support in my otherwise single-threaded programs)
 - the client library needs to perform access control on the socket,
to avoid connects from unrelated processes, and even then you can't
be certain it's the event publisher and not a random root process

 You definitely don't want a client library to be listen()ing.
listen() is server stuff - mixing client and server stuff is complex.
Too much so for what you need here.

> Since each subscriber needs its own fd to read and
> close, the directory of subscriber sockets automatically gives the
> sender a list of who to communicate with and a count of how many fds
> to create. It also makes it easy to detect and clean up a dead
> subscriber's socket: the sender can request a struct ucred from a
> subscriber to get its PID (and then other details from /proc), and if
> the process ever exits (which the sender can detect on Linux using a
> netlink process monitor, like [1]), the process that created the
> socket can be assumed to be dead and the sender can unlink it. The
> sender would rely on additional process instance-identifying
> information from /proc (like its start-time) to avoid PID-reuse
> races.

 Bleh. Of course it can be made to work, but you really don't need all
that complexity. You have a daemon that wants to publish data, and
several clients that want to receive data from that daemon: it's
one (long-lived) to many (short-lived) communication, and there's a
perfectly appropriate, simple and portable IPC for that: a single Unix
domain socket that your daemon listens on and your clients connect to.
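A minimal sketch of the single-socket, one-to-many design recommended here, assuming Python on Linux. `EventPublisher`, `read_event`, `connect_subscriber` and the 4-byte length-prefix framing are hypothetical illustrations, not vdev or skabus APIs. The publisher never blocks: queued connections are taken with a non-blocking `accept()`, and a client that has gone away, or whose socket buffer is full, is simply dropped, in keeping with the point that client misbehaviour is not the publisher's responsibility. `connect_subscriber` shows a very small form of the client-side autoreconnect idea.

```python
from __future__ import annotations

import os
import socket
import struct
import tempfile
import time

# Hypothetical wire format: 4-byte big-endian length, then the event bytes.
_HEADER = struct.Struct("!I")


class EventPublisher:
    """One long-lived publisher, many short-lived subscribers, one listening socket."""

    def __init__(self, path: str, backlog: int = 16) -> None:
        self.listener = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        self.listener.bind(path)
        self.listener.listen(backlog)
        self.listener.setblocking(False)  # accepting must never stall the daemon
        self.clients: list[socket.socket] = []

    def accept_pending(self) -> None:
        """Accept every connection currently queued on the listening socket."""
        while True:
            try:
                conn, _addr = self.listener.accept()
            except BlockingIOError:
                return
            conn.setblocking(False)  # a slow client must not block the publisher
            self.clients.append(conn)

    def broadcast(self, event: bytes) -> None:
        """Send one framed event to every client, dropping dead or stuck ones."""
        frame = _HEADER.pack(len(event)) + event
        alive = []
        for conn in self.clients:
            try:
                # MSG_NOSIGNAL: report EPIPE as an error instead of raising SIGPIPE.
                if conn.send(frame, socket.MSG_NOSIGNAL) == len(frame):
                    alive.append(conn)
                    continue
            except OSError:  # EPIPE/ECONNRESET (client gone) or EAGAIN (buffer full)
                pass
            conn.close()  # gone, or too slow to keep up: drop it
        self.clients = alive


def _recv_exact(sock: socket.socket, n: int) -> bytes:
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("publisher closed the connection")
        buf += chunk
    return bytes(buf)


def read_event(sock: socket.socket) -> bytes:
    """Client side: read one length-prefixed event."""
    (length,) = _HEADER.unpack(_recv_exact(sock, _HEADER.size))
    return _recv_exact(sock, length)


def connect_subscriber(path: str, attempts: int = 5, delay: float = 0.2) -> socket.socket:
    """Client side: connect, retrying briefly (e.g. while the publisher restarts)."""
    for attempt in range(attempts):
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        try:
            sock.connect(path)
            return sock
        except OSError:
            sock.close()
            if attempt == attempts - 1:
                raise
            time.sleep(delay)
    raise ValueError("attempts must be at least 1")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "events.sock")
        publisher = EventPublisher(path)
        subscribers = [connect_subscriber(path) for _ in range(3)]
        publisher.accept_pending()
        publisher.broadcast(b"ACTION=add\0DEVPATH=/devices/virtual/mem/null")
        for sub in subscribers:
            print(read_event(sub))
```

In a real daemon, `accept_pending()` would be called whenever the listening socket polls readable, rather than once as in this demo.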

 If you want to be perfectly reliable, you can implement some kind of
autoreconnect in the client library - in case you want to restart the
event publisher without killing X, for instance. But that's still a
lot simpler than playing with multiple sockets and mixing clients and
servers when you don't need to.

> Thanks again for all your input!

 No problem. I love design discussions, I can't get enough of them.
(The reason why I left the Devuan mailing-list is that there was too
much ideological mumbo-jumbo, and not enough technical/design stuff.
Speaking of which, my apologies to Alpine devs for hijacking their ML;
if it's too OT/uninteresting, we'll take the discussion elsewhere.)

--
 Laurent

---
Unsubscribe:  alpine-devel+unsubscribe@lists.alpinelinux.org
Help:         alpine-devel+help@lists.alpinelinux.org
---

From nobody Thu Mar 28 10:34:08 2024
In-Reply-To: <56978822.8020205@skarnet.org>
References: <20150727103737.4f95e523@ncopa-desktop.alpinelinux.org>
 <20150728052436.GC1923@newbook> <20160112153804.GI32545@example.net>
 <56953ABE.5090203@skarnet.org> <56958E22.90806@skarnet.org>
 <56964414.1000605@skarnet.org> <56978822.8020205@skarnet.org>
Date: Sat, 16 Jan 2016 12:48:10 -0500
Subject: Re: [alpine-devel] udev replacement on Alpine Linux
From: Jude Nelson
To: Laurent Bercot
Cc: alpine-devel@lists.alpinelinux.org

Hi Laurent, apologies for the delay,

On Thu, Jan 14, 2016 at 6:36 AM, Laurent Bercot <ska-devel@skarnet.org> wrote:

> On 14/01/2016 06:55, Jude Nelson wrote:
>
>> I think you're close. The gist of it is that vdev needs to supply a
>> lot more information than the kernel gives it. In particular, its
>> helper programs go on to query the properties and status of each
>> device (this often requires root privileges, i.e. via privileged
>> ioctl()s), and vdev gathers the information into a (much larger)
>> event packet and stores it in a directory tree under /dev for
>> subsequent query by less-privileged programs.
>>
>
>  I see.
>  I think this is exactly what could be made modular. I've heard
> people say they were reluctant to use vdev because it's not KISS, and
> I suspect the ioctl machinery and data gathering are a large part of
> the complexity. If that part could be pluggable, i.e. if admins could
> choose a "data gatherer" just complex enough for their needs, I believe
> it could encourage adoption. In other words, I'm looking at a 3-part
> program:
>  - the netlink listener
>  - the data gatherer
>  - the event publisher
>
>  Of course, for libudev to work, you would need the full data gatherer;
> but if people aren't using libudev programs, they can use a simpler one,
> closer to what mdev is doing.
>  It's all from a very high point of view, and I don't know the details of
> the code, so I have no idea whether it's feasible for vdev, but that's
> what I'm thinking off the top of my head.

This sounds reasonable. In fact, within vdevd there are already distinct netlink listener and data gatherer threads that communicate over a producer/consumer queue. Splitting them into separate processes connected by a pipe is consistent with the current design, and would also help with portability.

>> Funny you mention this--I also created runfs
>> (https://github.com/jcnelson/runfs) to do exactly this. In
>> particular, I use it for PID files.
>>
>
>  I have no love for mechanisms that help people keep using PID files,
> which are an ugly relic that can't end up in the museum of mediaeval
> programming soon enough. :P
>

Haha, true. I have other purposes for it, though.

>  That said, runfs is interesting, and I would love it if Unix provided
> such a mechanism. Unfortunately, for now it has to rely on FUSE, which
> is one of the clunkiest mutant features of Linux, and an extra layer
> of complexity; so I find it cleaner if a program can achieve its
> functionality without depending on such a filesystem.
>

I think this is one of the things Plan 9 got right--letting a process expose whatever fate-sharing state it wanted through the VFS. I agree that using FUSE to do this is a lot clunkier, but I don't think that's FUSE's fault. As far as I know, Linux doesn't allow a process to expose custom state through /proc.

>> I agree that netlink is lighter, but I avoided it for two reasons:
>> * Sometime down the road, I'd like to port vdev to OpenBSD.
>>
>
>  That's a good reason, and an additional reason to separate the
> netlink listener from the event publisher (and the data gatherer).
> The event publisher and client library can be made 100% portable,
> whereas the netlink listener and data gatherer obviously cannot.
>
>> * There is no way to namespace netlink messages that I'm aware of.
>>
>
>  I didn't know that - I'm no netlink expert. But that's also a good
> reason. AFAICT, there are 32 netlink multicast groups, and they use
> hardcoded numbers - this is ugly, or at least requires a global
> registry of which group is used for what. If you can't namespace them,
> it becomes even more of a scarce resource; although it's legitimate to
> use one for uevent publishing, I'm pretty sure people will find a way
> to clog them with random crap very soon - better stay away from
> resources you can't reliably lock. And from what you're saying, even
> systemd people have realized that. :)
>
>  I'm not advocating netlink use for anything other than reading kernel
> events. It's just that true multicast will be more efficient than manual
> broadcast, there's no way around it.
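The netlink listener stage of the three-part split could look roughly like this: it subscribes to kernel uevents and forwards each one, length-prefixed, down a pipe to a separate data-gatherer process. This is a hedged sketch assuming Linux and Python 3.9+, not vdevd's code. The constant 15 is `NETLINK_KOBJECT_UEVENT` from `<linux/netlink.h>` (defined locally in case the `socket` module does not export it); kernel uevents arrive as an `ACTION@DEVPATH` header followed by NUL-separated `KEY=VALUE` pairs. The function names and the pipe framing are hypothetical.

```python
from __future__ import annotations

import os
import socket
import struct

NETLINK_KOBJECT_UEVENT = 15  # from <linux/netlink.h>; not exported by every Python build
KERNEL_UEVENT_GROUP = 1      # kernel uevents; udevd rebroadcasts on group 2
_HEADER = struct.Struct("!I")  # hypothetical pipe framing: 4-byte length prefix


def open_uevent_socket() -> socket.socket:
    """Subscribe to kernel uevents (Linux only)."""
    sock = socket.socket(socket.AF_NETLINK, socket.SOCK_DGRAM, NETLINK_KOBJECT_UEVENT)
    sock.bind((0, KERNEL_UEVENT_GROUP))  # (port id 0 = let the kernel pick, group mask)
    return sock


def parse_uevent(datagram: bytes) -> dict[str, str] | None:
    """Split 'ACTION@DEVPATH\\0KEY=VALUE\\0...' into a dict; None if not a kernel uevent."""
    fields = datagram.rstrip(b"\0").split(b"\0")
    if b"@" not in fields[0]:  # e.g. udevd's own messages start with "libudev"
        return None
    env: dict[str, str] = {}
    for field in fields[1:]:
        key, sep, value = field.partition(b"=")
        if sep:
            env[key.decode()] = value.decode(errors="replace")
    return env


def forward_uevent(datagram: bytes, out_fd: int) -> bool:
    """Pass one kernel uevent, unmodified but length-prefixed, to the gatherer's pipe."""
    if parse_uevent(datagram) is None:
        return False
    data = memoryview(_HEADER.pack(len(datagram)) + datagram)
    while data:  # loop in case of a short write
        data = data[os.write(out_fd, data):]
    return True


def listener_main(out_fd: int) -> None:
    """Body of a stand-alone listener process; runs forever, so it is not called here."""
    sock = open_uevent_socket()
    while True:
        forward_uevent(sock.recv(8192), out_fd)
```

Keeping this stage this small is what would let the data gatherer behind the pipe be swapped for something simpler, closer to mdev, or for the full libudev-compatible one.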

>> By using a synthetic filesystem for
>> message transport, I can use bind-mounts to control which device
>> events get routed to which containers
>>
>
>  I'm torn between "oooh, clever" and "omg this hack is atrocious". :)
>

Haha, thanks :)

>> Yes, modulo some other mechanisms to ensure that the libudev-compat
>> process doesn't get back-logged and lose messages.
>>
>
>  What do you mean by that?
>  If libudev-compat is, like libudev, linked into the application, then
> you have no control over client behaviour; if a client doesn't properly
> act on a notification, then there's nothing you can do about it and
> it's not your responsibility. Can you give a few details about what
> you're doing client-side?
>

A bit of background:
* Unlike with netlink sockets, a program cannot control the size of an inotify descriptor's "receive" buffer. It is bounded by a system-wide setting, /proc/sys/fs/inotify/max_queued_events. However, libudev offers clients the ability to do just this (via udev_monitor_set_receive_buffer_size()). This is what I originally meant--libudev-compat needs to ensure that the desired receive buffer size is honored.
* libudev's API exposes the udev_monitor's netlink socket descriptor directly to the client, so it can poll on it (via udev_monitor_get_fd()).
* libudev allows clients to define event filters, so they receive only the events that they want to receive (via udev_monitor_filter_*()). The implementation achieves this by translating filters into BPF programs and attaching them to the client's netlink socket. That code is also somewhat complex, and I didn't want to have to rewrite it each time I synced the code with upstream.

To work around these constraints, libudev-compat routes a udev_monitor's events through an internal socket pair. It uses inotify as an edge trigger instead of a level trigger: when there is at least one file to consume from the event directory, it reads as many files as it can and tries to saturate the udev_monitor's socket pair (the number of bytes the socket pair can hold is now controlled by udev_monitor_set_receive_buffer_size()). The receive end of the socket pair and the inotify descriptor are unified into a single pollable epoll descriptor, which libudev-compat's udev_monitor_get_fd() returns (it polls as ready if either there are unconsumed events in the socket pair or a new file has arrived in the directory). The filtering implementation works almost unmodified, except that it attaches BPF programs to the receiving end of the udev_monitor's socket pair instead of to a netlink socket.

In summary, the system doesn't try to outright prevent event loss for clients; it tries to ensure that clients can control their receive-buffer size, with the expected results. One of the more subtle reasons for using eventfs is that it makes it possible to control the maximum number of bytes an event directory can hold. Because this works on a per-directory basis, the system retains the ability to control, per monitor, the maximum number of events it will hold before NACKing the event pusher. udev_monitor_set_receive_buffer_size() would also set the upper byte limit for its udev_monitor's event directory, thereby retaining the original API contract.

>> I think both approaches are good ideas and would work just as well.
>> I really like skabus's approach--I'll take a look at using it for
>> message delivery as an additional (preferred?) vdev-to-libudev-compat
>> message delivery mechanism :) It looks like it offers all the
>> aforementioned benefits over netlink that I'm looking for.
>>
>
>  Unfortunately, it's not published yet, because there's still a lot
> of work to be done on clients. And now I'm wondering whether it would
> be more efficient to store messages in anonymous files and transmit
> fds, instead of transmitting copies of messages. I may have to rewrite
> stuff. :)
>  I think I'll be able to get back to work on skabus by the end of this
> year - but no promises, since I'll be working on the Alpine init system
> as soon as I'm done with my current contract. But I can leak a few
> pieces of source code if you're interested.
>

I'd be willing to take a crack at it, if I have time between now and the end of the year. I'm trying to finish my PhD this year, which is why vdev development has been slow for the past several months. Will keep you posted :)

>> A question on the implementation--what do you think of having each
>> subscriber create its own Unix domain socket in a canonical
>> directory, and having the sender connect as a client to each
>> subscriber?
>>
>
>  That's exactly how fifodirs work, with pipes instead of sockets.
>  But I don't think that's a good fit here.
>
>  One point of fifodirs is to have many-to-many communication: there
> are several subscribers, but there can also be several publishers
> (even if in practice there's often only one publisher). Publishers and
> subscribers are completely independent.
>  Here, you only ever have one publisher: the event dispatcher. You
> only ever need one-to-many communication.
>
>  Another point of fifodirs is to avoid the need for a daemon to act
> as a bus. It's notification that happens between unrelated processes
> without requiring a central server to ensure the communication.
> It's important because I didn't want my supervision system (which is
> supposed to manage daemons) to itself rely on a daemon (which would
> then have to be unsupervised).
>  Here, you don't have that requirement, and you already have a daemon:
> the event dispatcher is long-lived.
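A rough model of the libudev-compat mechanism Jude describes in this reply: event files are moved from an event directory into an internal socket pair until it is full, and the socket pair's receive end plus a notification descriptor are combined into one epoll descriptor that a client can poll. This is a hypothetical sketch, not libudev-compat's code. Python's standard library has no inotify binding, so a plain pipe stands in for the inotify descriptor; event files are assumed to be named so that sorting yields delivery order, consumed files are assumed to be unlinked, and BPF filtering is omitted.

```python
from __future__ import annotations

import os
import select
import socket


class MonitorSketch:
    """Hypothetical model of a libudev-compat udev_monitor; not its actual code."""

    def __init__(self, event_dir: str, notify_fd: int, bufsize: int = 65536) -> None:
        self.event_dir = event_dir
        self.notify_fd = notify_fd  # stand-in for the inotify descriptor
        self.send_end, self.recv_end = socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)
        self.send_end.setblocking(False)  # a full socket pair must not block the refill
        self.set_receive_buffer_size(bufsize)
        self.epoll = select.epoll()
        # One pollable fd: ready if events are queued OR new files were announced.
        self.epoll.register(self.recv_end.fileno(), select.EPOLLIN)
        self.epoll.register(self.notify_fd, select.EPOLLIN)

    def set_receive_buffer_size(self, nbytes: int) -> None:
        """Counterpart of udev_monitor_set_receive_buffer_size()."""
        # On Linux, queued AF_UNIX data is charged to the sending socket, so
        # SO_SNDBUF on the send end is what bounds the socket pair; set both.
        self.send_end.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, nbytes)
        self.recv_end.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, nbytes)

    def fileno(self) -> int:
        """Counterpart of udev_monitor_get_fd(): the epoll descriptor."""
        return self.epoll.fileno()

    def drain_directory(self) -> int:
        """Refill step: move as many event files as fit; return how many moved."""
        moved = 0
        for name in sorted(os.listdir(self.event_dir)):  # assumes names sort in order
            path = os.path.join(self.event_dir, name)
            with open(path, "rb") as f:
                payload = f.read()
            try:
                self.send_end.send(payload)
            except BlockingIOError:
                break  # socket pair full: leave the rest in the directory
            os.unlink(path)  # assumption: a consumed event file is removed
            moved += 1
        return moved

    def receive(self) -> bytes | None:
        """Return the next event, or None if nothing is pending."""
        self.drain_directory()
        try:
            return self.recv_end.recv(65536, socket.MSG_DONTWAIT)
        except BlockingIOError:
            return None
```

The point the sketch tries to capture is that the client-visible buffer is the socket pair, so its size, not the system-wide inotify limit, decides how many events a slow client can fall behind by.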
>  I think a "socketdir" mechanism is just too heavy:
>  - for every event, you perform opendir(), readdir() and closedir()
>  - for every event and every subscriber, you perform at least socket(),
> connect(), sendmsg() and close()
>  - the client library needs to listen() and accept(), which means it
> needs its own thread (and I hate, hate, hate libraries that pull in
> thread support in my otherwise single-threaded programs)
>  - the client library needs to perform access control on the socket,
> to avoid connects from unrelated processes, and even then you can't
> be certain it's the event publisher and not a random root process
>
>  You definitely don't want a client library to be listen()ing.
> listen() is server stuff - mixing client and server stuff is complex.
> Too much so for what you need here.
>
>> Since each subscriber needs its own fd to read and
>> close, the directory of subscriber sockets automatically gives the
>> sender a list of who to communicate with and a count of how many fds
>> to create. It also makes it easy to detect and clean up a dead
>> subscriber's socket: the sender can request a struct ucred from a
>> subscriber to get its PID (and then other details from /proc), and if
>> the process ever exits (which the sender can detect on Linux using a
>> netlink process monitor, like [1]), the process that created the
>> socket can be assumed to be dead and the sender can unlink it. The
>> sender would rely on additional process instance-identifying
>> information from /proc (like its start-time) to avoid PID-reuse
>> races.
>>
>
>  Bleh. Of course it can be made to work, but you really don't need all
> that complexity. You have a daemon that wants to publish data, and
> several clients that want to receive data from that daemon: it's
> one (long-lived) to many (short-lived) communication, and there's a
> perfectly appropriate, simple and portable IPC for that: a single Unix
> domain socket that your daemon listens on and your clients connect to.
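The credential and PID-reuse checks from the socketdir proposal quoted above (which Laurent argues against as unnecessary) can be illustrated on their own. A minimal sketch assuming Linux and Python 3.9+: `SO_PEERCRED` returns a `struct ucred` (pid, uid, gid) for the peer of a connected Unix socket, and field 22 of `/proc/<pid>/stat` is the process start time in clock ticks since boot, so a (pid, start time) pair identifies one process instance even if the PID is later reused. The function names are hypothetical.

```python
from __future__ import annotations

import os
import socket
import struct

# struct ucred on Linux: pid_t pid (int), uid_t uid, gid_t gid (unsigned int).
_UCRED = struct.Struct("iII")


def peer_credentials(sock: socket.socket) -> tuple[int, int, int]:
    """(pid, uid, gid) of the process on the other end of a connected Unix socket."""
    raw = sock.getsockopt(socket.SOL_SOCKET, socket.SO_PEERCRED, _UCRED.size)
    return _UCRED.unpack(raw)


def process_start_time(pid: int) -> int:
    """Field 22 of /proc/<pid>/stat: start time in clock ticks since boot."""
    with open(f"/proc/{pid}/stat", "rb") as f:
        stat = f.read()
    # Field 2 (comm) may contain spaces or ')', so split after the last ')'.
    rest = stat[stat.rindex(b")") + 2:].split()
    return int(rest[19])  # rest[0] is field 3 (state), so field 22 is rest[19]


def process_identity(pid: int) -> tuple[int, int]:
    """A (pid, start time) pair stays unique even if the PID is later reused."""
    return pid, process_start_time(pid)


def same_process(identity: tuple[int, int]) -> bool:
    """True if the recorded process instance is still alive."""
    pid, start = identity
    try:
        return process_start_time(pid) == start
    except (FileNotFoundError, ProcessLookupError):
        return False
```

Even so, this is exactly the bookkeeping that a single listening socket avoids: with one connection per client, a closed connection already tells the publisher that the client is gone.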
> If you want to be perfectly reliable, you can implement some kind of > autoreconnect in the client library - in case you want to restart the > event publisher without killing X, for instance. But that's still a > lot simpler than playing with multiple sockets and mixing clients and > serverswhen you don't need to. Agreed--if the event dispatcher is going to be a message bus, then a lot of the aforementioned difficulties can be eliminated by design. But I'm uncomfortable with the technical debt it can introduce to the ecosystem--for example, a message bus has its own semantics that effectively require a bus-specific library, clients' design choices can require a message bus daemon to be running at all times, pervasive use of the message bus by system-level software can make the implementation a hard requirement for having a usable system, etc. (in short, we get dbus again). By going with filesystem-oriented approach, this risk is averted, since the filesystem interface is well-understood, universally supported, and somewhat future-proof. Most programs can use it without being aware of the fact. > > > > Thanks again for all your input! >> > > No problem. I love design discussions, I can't get enough of them. > (The reason why I left the Devuan mailing-list is that there was too > much ideological mumbo-jumbo, and not enough technical/design stuff. > Speaking of which, my apologies to Alpine devs for hijacking their ML; > if it's too OT/uninteresting, we'll take the discussion elsewhere.) Happy to move offline, unless the Alpine devs still want to be CC'ed :) -Jude > --- > Unsubscribe: alpine-devel+unsubscribe@lists.alpinelinux.org > Help: alpine-devel+help@lists.alpinelinux.org > --- > > --001a113d53109068980529771f3a Content-Type: text/html; charset=UTF-8 Content-Transfer-Encoding: quoted-printable
Hi Laurent, apologies for the delay,

On Thu, Jan 14, 2016 at 6:36 AM, Lauren= t Bercot <ska-devel@skarnet.org> wrote:
On 14/01/2016 06:55, Jude Nelson wrote:
I think you're close.=C2=A0 The jist of it is that vdev needs to supply= a
lot more information than the kernel gives it.=C2=A0 In particular, its
helper programs go on to query the properties and status of each
device (this often requires root privileges, i.e. via privileged
ioctl()s), and vdev gathers the information into a (much larger)
event packet and stores it in a directory tree under /dev for
subsequent query by less-privileged programs.

=C2=A0I see.
=C2=A0I think this is exactly what could be made modular. I've heard people say they were reluctant to using vdev because it's not KISS, and=
I suspect the ioctl machinery and data gathering is a large part of
the complexity. If that part could be pluggable, i.e. if admins could
choose a "data gatherer" just complex enough for their needs, I b= elieve
it could encourage adoption. In other words, I'm looking at a 3-part program:
=C2=A0- the netlink listener
=C2=A0- the data gatherer
=C2=A0- the event publisher

=C2=A0Of course, for libudev to work, you would need the full data gatherer= ;
but if people aren't using libudev programs, they can use a simpler one= ,
closer to what mdev is doing.=C2=A0
=C2=A0It's all from a very high point-of-view, and I don't know the= details of
the code so I have no idea whether it's envisionable for vdev, but that= 's
what I'm thinking off the top of my head.

This sounds reasonable.=C2=A0 In fact, within vdevd there are already di= stinct netlink listener and data gatherer threads that communicate over a p= roducer/consumer queue.=C2=A0 Splitting them into separate processes connec= ted by a pipe is consistent with the current design, and would also help wi= th portability.
=C2=A0


Funny you mention this--I also created runfs
(https://github.com/jcnelson/runfs) to do exactly this.=C2=A0 I= n
particular, I use it for PID files.

=C2=A0I have no love for mechanisms that help people keep using PID files,<= br> which are an ugly relic that can't end up in the museum of mediaeval programming soon enough. :P

Haha, true.= =C2=A0 I have other purposes for it though.

=C2=A0That said, runfs is interesting, and I would love it if Unix provided=
such a mechanism. Unfortunately, for now it has to rely on FUSE, which
is one of the most clunky mutant features of Linux, and an extra layer
of complexity; so I find it cleaner if a program can achieve its
functionality without depending on such a filesystem.


I think this is one of the things Plan 9 g= ot right--letting a process expose whatever fate-sharing state it wanted th= rough the VFS.=C2=A0 I agree that using FUSE to do this is a lot clunkier, = but I don't think that's FUSE's fault.=C2=A0 As far as I know, = Linux doesn't allow a process to expose custom state through /proc.
=C2=A0

I agree that netlink is lighter, but I avoided it for two reasons:
* Sometime down the road, I'd like to port vdev to OpenBSD.

=C2=A0That's a good reason, and an additional reason to separate the netlink listener from the event publisher (and the data gatherer).
The event publisher and client library can be made 100% portable,
whereas the netlink listener and data gatherer obviously cannot.


* There is no way to namespace netlink messages that I'm aware of.

=C2=A0I didn't know that - I'm no netlink expert. But that's al= so a good
reason. AFAICT, there are 32 netlink multicast groups, and they use
hardcoded numbers - this is ugly, or at least requires a global
registry of what group is used for. If you can't namespace them, it
becomes even more of a scarce resource; although it's legitimate to
use one for uevent publishing, I'm pretty sure people will find a way to clog them with random crap very soon - better stay away from
resources you can't reliably lock. And from what you're saying, eve= n
systemd people have realized that. :)

=C2=A0I'm not advocating netlink use for anything else than reading ker= nel
events. It's just that true multicast will be more efficient than manua= l
broadcast, there's no way around it.


By using a synthetic filesystem for
message transport, I can use bind-mounts to control which device
events get routed to which containers

=C2=A0I'm torn between "oooh, clever" and "omg this hack= is atrocious". :)


Haha, thanks :)
=C2= =A0

Yes, modulo some other mechanisms to ensure that the libudev-compat
process doesn't get back-logged and lose messages.

=C2=A0What do you mean by that?
=C2=A0If libudev-compat is, like libudev, linked into the application, then=
you have no control over client behaviour; if a client doesn't properly=
act on a notification, then there's nothing you can do about it and
it's not your responsibility. Can you give a few details about what
you're doing client-side?


A bit of background:
* Unlike ne= tlink sockets, a program cannot control the size of an inotify descriptor&#= 39;s "receive" buffer.=C2=A0 This is a system-wide constant, defi= ned in=C2=A0/proc/sys/fs/inotify/max_queued_events.=C2=A0 However, libudev = offers clients the ability to do just this (via=C2=A0udev_monitor_set_recei= ve_buffer_size).=C2=A0 This is what I originally meant--libudev-compat need= s to ensure that the desired receive buffer size is honored.
* libudev&#= 39;s API exposes the udev_monitor's netlink socket descriptor directly = to the client, so it can poll on it (via=C2=A0udev_monitor_get_fd).
* li= budev allows clients to define event filters, so they receive only the even= ts that they want to receive (via udev_monitor_filter_*).=C2=A0 The impleme= ntation achieves this by translating filters into BPF programs, and attachi= ng them to the client's netlink socket.=C2=A0 It is also somewhat compl= ex, and I didn't want to have to re-write it each time I sync'ed th= e code with upstream.

To work around these constra= ints, libudev-compat routes a udev_monitor's events through an internal= socket pair.=C2=A0 It uses inotify as an edge-trigger instead of a level-t= rigger: =C2=A0when there is at least one file to consume from the event dir= ectory, it will read as many files as it can and try to saturate the struct= udev_monitor's socket pair (the number of bytes the socketpair can hol= d now gets controlled by udev_monitor_set_receive_buffer_size).=C2=A0 The r= eceive end of the socket pair and the inotify descriptor are unified into a= single pollable epoll descriptor, which gets returned via libudev-compat&#= 39;s =C2=A0udev_monitor_get_fd (it will poll as ready if either there are u= nconsumed events in the socket pair, or a new file has arrived in the direc= tory).=C2=A0 The filtering implementation works almost unmodified, except t= hat it attaches BPF programs to the udev_monitor's socket pair's re= ceiving end instead of a netlink socket.

In summar= y, the system doesn't try to outright prevent event loss for clients; i= t tries to ensure the clients can control their receive-buffer size, with e= xpected results.=C2=A0 One of the more subtle reasons for using eventfs is = that it makes it possible to control the maximum number of bytes an event d= irectory can hold.=C2=A0 By making this work on a per-directory basis, the = system retains the ability to control on a per-monitor basis the maximum nu= mber of events it will hold before NACKing the event-pusher.=C2=A0 The=C2= =A0udev_monitor_set_receive_buffer_size would also set the upper byte-limit= value for its udev_monitor's event directory, thereby retaining the or= iginal API contract.
=C2=A0

I think both approaches are good ideas and would work just as well.
I really like skabus's approach--I'll take a look at using it for message delivery as an additional (preferred?) vdev-to-libudev-compat
message delivery mechanism :)=C2=A0 It looks like it offers all the
aforementioned benefits over netlink that I'm looking for.

=C2=A0Unfortunately, it's not published yet, because there's still = a lot
of work to be done on clients. And now I'm wondering whether it would be more efficient to store messages in anonymous files and transmit
fds, instead of transmitting copies of messages. I may have to rewrite
stuff. :)
=C2=A0I think I'll be able to get back to work on skabus by the end of = this
year - but no promises, since I'll be working on the Alpine init system=
as soon as I'm done with my current contract. But I can leak a few
pieces of source code if you're interested.


I'd be willing to take a cr= ack at it, if I have time between now and the end of the year.=C2=A0 I'= m trying to finish my PhD this year, which is why vdev development has been= slow-going for the past several months.=C2=A0 Will keep you posted :)
=C2=A0

A question on the implementation--what do you think of having each
subscriber create its own Unix domain socket in a canonical
directory, and having the sender connect as a client to each
subscriber?

=C2=A0That's exactly how fifodirs work, with pipes instead of sockets.<= br> =C2=A0But I don't think that's a good fit here.

=C2=A0A point of fifodirs is to have many-to-many communication: there
are several subscribers, but there can also be several publishers
(even if in practice there's often only one publisher). Publishers and<= br> subscribers are completely independent.
=C2=A0Here, you only ever have one publisher: the event dispatcher. You
only ever need one-to-many communication.

=C2=A0Another point of fifodirs is to avoid the need for a daemon to act as a bus. It's notification that happens between unrelated processes without requiring a central server to ensure the communication.
It's important because I didn't want my supervision system (which i= s
supposed to manage daemons) to itself rely on a daemon (which would
then have to be unsupervised).
=C2=A0Here, you don't have that requirement, and you already have a dae= mon:
the event dispatcher is long-lived.
=C2=A0I think a "socketdir" mechanism is just too heavy:
=C2=A0- for every event, you perform opendir(), readdir() and closedir()=C2= =A0
=C2=A0- for every event * subscriber, you perform at least socket(), connec= t(),
sendmsg() and close()
=C2=A0- the client library needs to listen() and accept(), which means it needs its own thread (and I hate, hate, hate, libraries that pull in
thread support in my otherwise single-threaded programs)
=C2=A0- the client library needs to perform access control on the socket, to avoid connects from unrelated processes, and even then you can't
be certain it's the event publisher and not a random root process

=C2=A0You definitely don't want a client library to be listen()ing.
listen() is server stuff - mixing client and server stuff is complex.
Too much so for what you need here.

Since each subscriber needs its own fd to read and
close, the directory of subscriber sockets automatically gives the
sender a list of who to communicate with and a count of how many fds
to create. It also makes it easy to detect and clean up a dead
subscriber's socket: the sender can request a struct ucred from a
subscriber to get its PID (and then other details from /proc), and if
the process ever exits (which the sender can detect on Linux using a
netlink process monitor, like [1]), the process that created the
socket can be assumed to be dead and the sender can unlink it. The sender would rely on additional process instance-identifying
information from /proc (like its start-time) to avoid PID-reuse
races.
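For concreteness, the PID lookup and the PID-reuse guard described above might look roughly like this on Linux. This is a sketch, not code from vdev or s6: `peer_pid()` and `proc_starttime()` are hypothetical helper names. It uses the real `SO_PEERCRED` socket option and reads the `starttime` field (field 22) of `/proc/<pid>/stat`, so that a (PID, start time) pair rather than a bare PID identifies a subscriber.

```c
#define _GNU_SOURCE              /* struct ucred needs _GNU_SOURCE (glibc, musl) */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Ask the kernel which process is on the other end of a connected
 * AF_UNIX socket. Returns 0 and stores the PID, or -1 with errno set. */
static int peer_pid(int fd, pid_t *pid)
{
    struct ucred cred;
    socklen_t len = sizeof cred;
    if (getsockopt(fd, SOL_SOCKET, SO_PEERCRED, &cred, &len) < 0)
        return -1;
    *pid = cred.pid;
    return 0;
}

/* Read field 22 (starttime, clock ticks since boot) of /proc/<pid>/stat.
 * A (pid, starttime) pair names one process instance, so a recycled PID
 * is not mistaken for the original subscriber. Returns 0 or -1. */
static int proc_starttime(pid_t pid, unsigned long long *start)
{
    char path[64], buf[1024];
    snprintf(path, sizeof path, "/proc/%ld/stat", (long)pid);
    FILE *f = fopen(path, "r");
    if (!f)
        return -1;
    size_t n = fread(buf, 1, sizeof buf - 1, f);
    fclose(f);
    buf[n] = '\0';

    /* Field 2 (comm) is parenthesised and may itself contain spaces or
     * ')', so resume parsing after the LAST ')' in the line. */
    char *p = strrchr(buf, ')');
    if (!p || p[1] != ' ')
        return -1;
    p += 2;                      /* now at field 3 (state) */

    char *save = NULL;
    int field = 3;
    for (char *tok = strtok_r(p, " ", &save); tok;
         tok = strtok_r(NULL, " ", &save), field++) {
        if (field == 22) {
            *start = strtoull(tok, NULL, 10);
            return 0;
        }
    }
    return -1;
}
```

A sender would record both values when a subscriber's socket appears, and treat the socket as stale once `proc_starttime()` fails or returns a different value for that PID.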

Bleh. Of course it can be made to work, but you really don't need all
that complexity. You have a daemon that wants to publish data, and
several clients that want to receive data from that daemon: it's
one (long-lived) to many (short-lived) communication, and there's a
perfectly appropriate, simple and portable IPC for that: a single Unix
domain socket that your daemon listens on and your clients connect to.
If you want to be perfectly reliable, you can implement some kind of
autoreconnect in the client library - in case you want to restart the
event publisher without killing X, for instance. But that's still a
lot simpler than playing with multiple sockets and mixing clients and
servers when you don't need to.
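A minimal sketch of the publisher side of that one-to-many design, assuming the daemon has already accept()ed one connection per subscriber on its single listening socket. `struct subscribers` and `broadcast()` are hypothetical names, and the fixed-size array and drop-on-failed-write policy are simplifications: a real daemon would poll for writability and buffer per client rather than drop a slow reader.

```c
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical publisher state: one fd per connected subscriber, as
 * returned by accept() on the daemon's single listening socket. */
struct subscribers {
    int fds[64];
    size_t n;
};

/* Send one event to every subscriber. A subscriber whose send fails
 * (e.g. EPIPE because it exited) is closed and removed; the others keep
 * receiving. Returns how many subscribers got the event. */
static size_t broadcast(struct subscribers *s, const void *msg, size_t len)
{
    size_t delivered = 0;
    for (size_t i = 0; i < s->n; ) {
        /* MSG_NOSIGNAL: get EPIPE instead of being killed by SIGPIPE */
        ssize_t r = send(s->fds[i], msg, len, MSG_NOSIGNAL);
        if (r == (ssize_t)len) {
            delivered++;
            i++;
        } else {
            /* dead or misbehaving subscriber: drop it (swap-remove,
             * then re-examine slot i, which now holds the last fd) */
            close(s->fds[i]);
            s->fds[i] = s->fds[--s->n];
        }
    }
    return delivered;
}
```

The client side needs no listen() or accept() at all: it connect()s once and read()s, which is the asymmetry argued for above.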

Agreed--if the event dispatcher is going to be a message bus, then a lot of the aforementioned difficulties can be eliminated by design. But I'm uncomfortable with the technical debt it can introduce to the ecosystem--for example, a message bus has its own semantics that effectively require a bus-specific library, clients' design choices can require a message bus daemon to be running at all times, pervasive use of the message bus by system-level software can make the implementation a hard requirement for having a usable system, etc. (in short, we get dbus again). By going with a filesystem-oriented approach, this risk is averted, since the filesystem interface is well-understood, universally supported, and somewhat future-proof. Most programs can use it without being aware of the fact.



Thanks again for all your input!

No problem. I love design discussions, I can't get enough of them.
(The reason why I left the Devuan mailing-list is that there was too
much ideological mumbo-jumbo, and not enough technical/design stuff.
Speaking of which, my apologies to Alpine devs for hijacking their ML;
if it's too OT/uninteresting, we'll take the discussion elsewhere.)

Happy to move offline, unless the Alpine devs still want to be CC'ed :)

-Jude

---
Unsubscribe:  alpine-devel+unsubscribe@lists.alpinelinux.org
Help:         alpine-devel+help@lists.alpinelinux.org
---


--001a113d53109068980529771f3a-- --- Unsubscribe: alpine-devel+unsubscribe@lists.alpinelinux.org Help: alpine-devel+help@lists.alpinelinux.org --- From nobody Thu Mar 28 10:34:08 2024 X-Original-To: alpine-devel@mail.alpinelinux.org Delivered-To: alpine-devel@mail.alpinelinux.org Received: from mail.alpinelinux.org (dallas-a1.alpinelinux.org [127.0.0.1]) by mail.alpinelinux.org (Postfix) with ESMTP id 4BE30DC0309 for ; Mon, 18 Jan 2016 12:14:27 +0000 (UTC) Received: from smtp1.tech.numericable.fr (smtp1.tech.numericable.fr [82.216.111.37]) (using TLSv1 with cipher ECDHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by mail.alpinelinux.org (Postfix) with ESMTPS id E144BDC014A for ; Mon, 18 Jan 2016 12:14:26 +0000 (UTC) Received: from sinay.internal.skarnet.org (ip-62.net-82-216-6.versailles2.rev.numericable.fr [82.216.6.62]) by smtp1.tech.numericable.fr (Postfix) with SMTP id 840131405B3 for ; Mon, 18 Jan 2016 13:14:19 +0100 (CET) Received: (qmail 24344 invoked from network); 18 Jan 2016 12:14:44 -0000 Received: from elzian.internal.skarnet.org. (HELO ?192.168.0.2?) (192.168.0.2) by sinay.internal.skarnet.org. 
with SMTP; 18 Jan 2016 12:14:44 -0000 Subject: Re: [alpine-devel] udev replacement on Alpine Linux To: alpine-devel@lists.alpinelinux.org References: <20150727103737.4f95e523@ncopa-desktop.alpinelinux.org> <20150728052436.GC1923@newbook> <20160112153804.GI32545@example.net> <56953ABE.5090203@skarnet.org> <56958E22.90806@skarnet.org> <56964414.1000605@skarnet.org> <56978822.8020205@skarnet.org> From: Laurent Bercot Message-ID: <569CD71C.2020407@skarnet.org> Date: Mon, 18 Jan 2016 13:14:20 +0100 User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:38.0) Gecko/20100101 Thunderbird/38.5.1 X-Mailinglist: alpine-devel Precedence: list List-Id: Alpine Development List-Unsubscribe: List-Post: List-Help: List-Subscribe: MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 7bit X-VR-SPAMSTATE: OK X-VR-SPAMSCORE: 50 X-VR-SPAMCAUSE: gggruggvucftvghtrhhoucdtuddrfeekiedrleeigdefiecutefuodetggdotefrodftvfcurfhrohhfihhlvgemucfpfgfogfftkfevteeunffgnecuuegrihhlohhuthemuceftddtnecuogetfeejfedqtdegucdlhedtmdenucfjughrpefuvfhfhffkffgfgggjtgfgsehtjegrtddtfeejnecuhfhrohhmpefnrghurhgvnhhtuceuvghrtghothcuoehskhgrqdguvghvvghlsehskhgrrhhnvghtrdhorhhgqeenucffohhmrghinhepshhkrghrnhgvthdrohhrghenucfrrghrrghmpehmohguvgepshhmthhpohhuth X-Virus-Scanned: ClamAV using ClamSMTP On 16/01/2016 18:48, Jude Nelson wrote: > This sounds reasonable. In fact, within vdevd there are already > distinct netlink listener and data gatherer threads that communicate > over a producer/consumer queue. Splitting them into separate > processes connected by a pipe is consistent with the current design, > and would also help with portability. I have a standalone netlink listener: http://skarnet.org/software/s6-linux-utils/s6-uevent-listener.html Any data gatherer / event dispatcher program can be used behind it. I'm currently using it as "s6-uevent-listener s6-uevent-spawner mdev", which spawns a mdev instance per uevent. 
Ideally, I should be able to use it as something like
"s6-uevent-listener vdev-data-gatherer vdev-event-dispatcher" and have
a pipeline of 3 long-lived processes, every process being independently
replaceable on the command-line by any other implementation that uses
the same API.

> I think this is one of the things Plan 9 got right--letting a process
> expose whatever fate-sharing state it wanted through the VFS.

The more I keep hearing about Plan 9, the more I tell myself I really
need to try it out. The day where I actually do it is getting closer
and closer - I'm just afraid that once I do, I'll realize how horrible
Unix is and won't ever want to work with Unix again, which would be
bad for my financial well-being. XD

> * Unlike netlink sockets, a program cannot
> control the size of an inotify descriptor's "receive" buffer. This
> is a system-wide constant, defined in
> /proc/sys/fs/inotify/max_queued_events. However, libudev offers
> clients the ability to do just this (via
> udev_monitor_set_receive_buffer_size). This is what I originally
> meant--libudev-compat needs to ensure that the desired receive buffer
> size is honored.

Reading the udev_monitor doc pages stirs up horrible memories of the
D-Bus API. Urge to destroy world rising.

It looks like udev_monitor_set_receive_buffer_size() could be
completely stubbed out for your implementation via inotify. It is only
useful when events queue up in the kernel buffer because a client isn't
reading them fast enough; but with your system, events are stored in
the filesystem so they will never be lost - so there's no such thing as
a meaningful "kernel buffer" in your case, and nobody cares what its
size is: clients will always have access to the full set of events.
"return 0;" is the implementation you want here.

> To work around these constraints, libudev-compat routes a
> udev_monitor's events through an internal socket pair.
> (cut layers upon layers of hacks to emulate udev_monitor filters)

Blech.
I understand the API is inherently complex and kinda enforces the
system's architecture - which is very similar to what systemd does, so
it's very unsurprising to me that systemd phagocyted udev: those two
were *made* to be together - but it looks like by deciding to do things
differently and wanting to still provide compatibility, you ended up
coding something that's just as complex, and more convoluted (since
you're not using the original mechanisms) than the original.

The filter mechanism is horribly specific and does not leave much
room for alternative implementations, so I know it's hard to do
correctly, but it seems to me that your implementation gets the worst
of both worlds:
- one of your implementation's advantages is that clients can never
lose events, but by piling your socketpair thingy onto it for an "accurate"
udev_monitor emulation, you make it so clients can actually shoot
themselves in the foot. It may be accurate, but it's lower quality than
your idea permits.
- the original udev implementation's advantage is that clients are never
woken up when an event arrives if the event doesn't pass the filter. Here,
your application will never be woken up indeed, but libudev-compat will be,
since you will get readability on your inotify descriptor. Filters are
not server-side (or even kernel-side) as udev intended, they're client-side,
and that's not efficient.

I believe that you'd be much better off simply using a normal Unix
socket connection from the client to an event dispatcher daemon, and
implementing a small protocol where udev_monitor_filter primitives just
write strings to the socket, and the server reads them and implements
filters server-side by *not* linking filtered events to the
client's event directory. This way, clients really aren't woken up by
events that do not pass the filter.
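Here is one way the server-side filtering suggested above could look. The line protocol (`subsystem <name> [devtype]`) and the names `struct filter`, `parse_filter()` and `event_passes()` are invented for illustration; they only mirror what `udev_monitor_filter_add_match_subsystem_devtype()` expresses. The dispatcher would call `event_passes()` before linking an event into (or writing it to) a client's queue, so a filtered-out event never wakes that client.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical wire format: the client library sends one line per filter,
 * e.g. "subsystem usb\n" or "subsystem block disk\n" (subsystem plus an
 * optional devtype). */
struct filter {
    char subsystem[32];
    char devtype[32];            /* empty string = any devtype */
};

/* Server side: parse one filter line received from a client. */
static bool parse_filter(const char *line, struct filter *f)
{
    f->devtype[0] = '\0';
    int n = sscanf(line, "subsystem %31s %31s", f->subsystem, f->devtype);
    return n >= 1;
}

/* Server side: does an event pass this client's filters? An event passes
 * if it matches any one filter; with no filters, everything passes. */
static bool event_passes(const struct filter *fs, size_t nf,
                         const char *subsystem, const char *devtype)
{
    if (nf == 0)
        return true;
    for (size_t i = 0; i < nf; i++) {
        if (strcmp(fs[i].subsystem, subsystem) != 0)
            continue;
        if (fs[i].devtype[0] == '\0' ||
            (devtype != NULL && strcmp(fs[i].devtype, devtype) == 0))
            return true;
    }
    return false;
}
```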
> But I'm uncomfortable with the technical debt it can introduce to the
> ecosystem--for example, a message bus has its own semantics that
> effectively require a bus-specific library, clients' design choices
> can require a message bus daemon to be running at all times,
> pervasive use of the message bus by system-level software can make
> the implementation a hard requirement for having a usable system,
> etc. (in short, we get dbus again).

Huh?
I wasn't suggesting using a generic bus.
I was suggesting that the natural architecture for an event dispatcher
was that of a single publisher (the server) with multiple subscribers
(the clients). And that was similar to a bus - except simpler, because
you don't even have multiple publishers.

It's not about using a system bus or anything of the kind. It's about
writing the event dispatcher and the client library as you'd write a bus
server and a bus client library (and please, forget about the insane
D-Bus model of message-passing between symmetrical peers - a client-server
model is much simpler, and easier to implement, at least on Unix).

Good luck with your Ph.D. thesis!
-- Laurent --- Unsubscribe: alpine-devel+unsubscribe@lists.alpinelinux.org Help: alpine-devel+help@lists.alpinelinux.org --- From nobody Thu Mar 28 10:34:08 2024 X-Original-To: alpine-devel@mail.alpinelinux.org Delivered-To: alpine-devel@mail.alpinelinux.org Received: from mail.alpinelinux.org (dallas-a1.alpinelinux.org [127.0.0.1]) by mail.alpinelinux.org (Postfix) with ESMTP id C52E2DC0A15 for ; Tue, 19 Jan 2016 06:20:34 +0000 (UTC) Received: from mail-ob0-f179.google.com (mail-ob0-f179.google.com [209.85.214.179]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (No client certificate requested) by mail.alpinelinux.org (Postfix) with ESMTPS id 9A9ACDC0268 for ; Tue, 19 Jan 2016 06:20:33 +0000 (UTC) Received: by mail-ob0-f179.google.com with SMTP id py5so211332458obc.2 for ; Mon, 18 Jan 2016 22:20:33 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; bh=cGut8z4t3f62JMkaF5/W2edKE8IJQXTipu+1dbtQiFI=; b=NCIWOIzf2Zo4L3vDYW5DZWVGFzxDe1yCOcxoE07Tip7ZEHn4TM+6i9KkG9EPBPYvru f0gM3VP1ben+Oifu7pcGiMRgqAc3fAKqh1Z3QY49kYNNUbWTvJblHmcqVZSgDwVKywNR o6opWlWN5Us4r/r5d6ewKOK/yIVltf7bLmOUffLvR5P5RtDkGHJnmj7iDmUSSbte0Nmh mvJA1TKLw6pIokcFe/xVd5bwX7zfLJA9Pf/2RlDVe/ClUDjNEYLi+Mi0HiOBf0eQLsn2 Lxgf8MyTq+7i03ieBzi/AfhaHRZiH60u375RuskkXLEmrgPfgRGssmDXid6oQCz1MCRY rawg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:in-reply-to:references:date :message-id:subject:from:to:cc:content-type; bh=cGut8z4t3f62JMkaF5/W2edKE8IJQXTipu+1dbtQiFI=; b=mjE0OKEkWj4movEOvY5SEZLO4jIKZ+l31V3bAASjrMyS78livlXI3jkG/WsRLvMm0u qonMr/MQqZvwmdQ2e12g1LN/bA2+sItSc39AHXhIzXzKacsnHRkXkP4TceARWbOQSgB7 2QO0F122de5Gdv71w9GfDNB+FXvY0cJj+uDqyL/XDI0csMrqOY97aVTyg8CokNw195Cx g6Bg/uky0LkRX+/aCbj31122Yk69a7EVR308yNhfSCMbFefv6uyz9w2iTAE8CuF3d/Zo m5OODi6v6vzHSBlTGk9Yl6WsRX2t5YNjeMxNPNX/c0YSkpcGDBbg2pVwI8TByoeXyGsj dPHw== 
X-Gm-Message-State: ALoCoQnsbxFSXFNCX+O2VZBHSYnO0W0j+QCcGay+TzIwJGBCFiG6TwuRnO0RVueN4FYuW2EHN9RT71gPBipPTQUp6In2IshKig== X-Mailinglist: alpine-devel Precedence: list List-Id: Alpine Development List-Unsubscribe: List-Post: List-Help: List-Subscribe: MIME-Version: 1.0 X-Received: by 10.60.134.202 with SMTP id pm10mr21998025oeb.50.1453184432705; Mon, 18 Jan 2016 22:20:32 -0800 (PST) Received: by 10.202.81.6 with HTTP; Mon, 18 Jan 2016 22:20:32 -0800 (PST) In-Reply-To: <569CD71C.2020407@skarnet.org> References: <20150727103737.4f95e523@ncopa-desktop.alpinelinux.org> <20150728052436.GC1923@newbook> <20160112153804.GI32545@example.net> <56953ABE.5090203@skarnet.org> <56958E22.90806@skarnet.org> <56964414.1000605@skarnet.org> <56978822.8020205@skarnet.org> <569CD71C.2020407@skarnet.org> Date: Tue, 19 Jan 2016 01:20:32 -0500 Message-ID: Subject: Re: [alpine-devel] udev replacement on Alpine Linux From: Jude Nelson To: Laurent Bercot Cc: alpine-devel@lists.alpinelinux.org Content-Type: multipart/alternative; boundary=047d7b417a63e798580529a9dd36 X-Virus-Scanned: ClamAV using ClamSMTP --047d7b417a63e798580529a9dd36 Content-Type: text/plain; charset=UTF-8 Hi Laurent, > I have a standalone netlink listener: > http://skarnet.org/software/s6-linux-utils/s6-uevent-listener.html > Any data gatherer / event dispatcher program can be used behind it. > I'm currently using it as "s6-uevent-listener s6-uevent-spawner mdev", > which spawns a mdev instance per uevent. > Ideally, I should be able to use it as something like > "s6-uevent-listener vdev-data-gatherer vdev-event-dispatcher" and have > a pipeline of 3 long-lived processes, every process being independently > replaceable on the command-line by any other implementation that uses > the same API. > > Sounds good! I'll aim to add that in the medium-term. > > * Unlike netlink sockets, a program cannot >> control the size of an inotify descriptor's "receive" buffer. 
>> This is a system-wide constant, defined in
>> /proc/sys/fs/inotify/max_queued_events. However, libudev offers
>> clients the ability to do just this (via
>> udev_monitor_set_receive_buffer_size). This is what I originally
>> meant--libudev-compat needs to ensure that the desired receive buffer
>> size is honored.
>
> Reading the udev_monitor doc pages stirs up horrible memories of the
> D-Bus API. Urge to destroy world rising.
>
> It looks like udev_monitor_set_receive_buffer_size() could be
> completely stubbed out for your implementation via inotify. It is only
> useful when events queue up in the kernel buffer because a client isn't
> reading them fast enough; but with your system, events are stored in
> the filesystem so they will never be lost - so there's no such thing as
> a meaningful "kernel buffer" in your case, and nobody cares what its
> size is: clients will always have access to the full set of events.
> "return 0;" is the implementation you want here.
>
> Blech.
> I understand the API is inherently complex and kinda enforces the
> system's architecture - which is very similar to what systemd does, so
> it's very unsurprising to me that systemd phagocyted udev: those two
> were *made* to be together - but it looks like by deciding to do things
> differently and wanting to still provide compatibility, you ended up
> coding something that's just as complex, and more convoluted (since
> you're not using the original mechanisms) than the original.
>
> The filter mechanism is horribly specific and does not leave much
> room for alternative implementations, so I know it's hard to do
> correctly, but it seems to me that your implementation gets the worst
> of both worlds:
> - one of your implementation's advantages is that clients can never
> lose events, but by piling your socketpair thingy onto it for an "accurate"
> udev_monitor emulation, you make it so clients can actually shoot
> themselves in the foot. It may be accurate, but it's lower quality than
> your idea permits.
> - the original udev implementation's advantage is that clients are never
> woken up when an event arrives if the event doesn't pass the filter. Here,
> your application will never be woken up indeed, but libudev-compat will be,
> since you will get readability on your inotify descriptor. Filters are
> not server-side (or even kernel-side) as udev intended, they're client-side,
> and that's not efficient.
>
> I believe that you'd be much better off simply using a normal Unix
> socket connection from the client to an event dispatcher daemon, and
> implementing a small protocol where udev_monitor_filter primitives just
> write strings to the socket, and the server reads them and implements
> filters server-side by *not* linking filtered events to the
> client's event directory. This way, clients really aren't woken up by
> events that do not pass the filter.

I agree with everything you have said. It is true that libudev-compat emphasizes compatibility to the point where it sacrifices simplicity and performance to achieve correctness (i.e. consistency with libudev's behavior). This is not because I believe in the soundness of libudev's design, but because I'm trying to avoid any breakage.

Believe me, I would love to get away from libudev completely. If programs expect the device manager to expose device metadata and publish events, then the device manager should do so in a way that lets programs access them directly, without an additional client library. This is what vdev strives to do--its helpers expose all device metadata as a set of easy-to-parse files, and propagate events through the VFS (but I'm in favor of moving towards using an event dispatcher like you suggest, since that would be much simpler to implement and only incur a minimal increase to the subscriber's interface complexity).

I think switching to a carefully-designed event dispatcher fixes both of these problems, while allowing me to retain the unmodified event-filtering logic from libudev. Specifically, the event dispatcher would use a UNIX domain socket to establish a shared socket pair with each libudev-compat client, and libudev-compat would install the BPF programs on the client's end of the socket pair (this would also preserve the ability to set the receive buffer size). This approach eliminates zero-copy multicast, but as you pointed out earlier, this is probably not a problem in practice anymore, given how small messages are and how infrequent they appear to be. Moreover, device events could still be namespaced, for example:
* each context would run its own event dispatcher
* the parent context runs a client program (an "event-forwarder") that writes events to a FIFO
* when the child context is started, the FIFO gets bind-mounted to a canonical location for its event dispatcher to connect to and receive events
* the parent context controls which events get propagated to its children by interposing filtering programs between the event-forwarder and the shared FIFO (e.g., don't want the child context to see USB hotplugs? Then capture and don't write USB events to the child's FIFO endpoint in the parent context.)

>> But I'm uncomfortable with the technical debt it can introduce to the
>> ecosystem--for example, a message bus has its own semantics that
>> effectively require a bus-specific library, clients' design choices
>> can require a message bus daemon to be running at all times,
>> pervasive use of the message bus by system-level software can make
>> the implementation a hard requirement for having a usable system,
>> etc. (in short, we get dbus again).
>
> Huh?
> I wasn't suggesting using a generic bus.
> I was suggesting that the natural architecture for an event dispatcher
> was that of a single publisher (the server) with multiple subscribers
> (the clients). And that was similar to a bus - except simpler, because
> you don't even have multiple publishers.
>
> It's not about using a system bus or anything of the kind. It's about
> writing the event dispatcher and the client library as you'd write a bus
> server and a bus client library (and please, forget about the insane
> D-Bus model of message-passing between symmetrical peers - a client-server
> model is much simpler, and easier to implement, at least on Unix).

Sorry--let me try to clarify what I meant. I was trying to say that one of the things that appeals to me about exposing events through a specialized filesystem is that it exposes a well-understood, universal, and easy-to-use API. All existing file-oriented tools would work with it, without modification. The downside is that it requires a somewhat complex implementation, as we discussed.

I'm not suggesting that we look to dbus for inspiration :) I was trying to point out that while the upside of using an event dispatcher is that it has a simple implementation, the downside is that without careful design, an event dispatcher with a simple implementation can still evolve a complex contract with its client programs that is difficult to honor (so much so that a complex client library is all but required to mediate access to the dispatcher). I was pointing out that any system-wide complexity introduced by specifying a dispatcher-specific publish/subscribe protocol for device-aware applications should be considered as part of the "total complexity" of using an event dispatcher, so it can be minimized up-front (this was the "minimal increase to the subscriber's interface complexity" I mentioned above). But bringing this up was very academic of me ;) I don't think that using a carefully-designed event dispatcher is nearly as complex as using a filesystem.
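The per-client socket pair described above could be set up roughly as follows. This is a sketch under assumptions: the helper names are hypothetical, the control channel is an already-connected AF_UNIX socket, and attaching libudev's BPF filter to the client's end (for example with `SO_ATTACH_FILTER`) is left out. It shows the two mechanisms the plan depends on: passing one end of a `socketpair()` to the client with `SCM_RIGHTS`, and the client applying its own `SO_RCVBUF`, which is what `udev_monitor_set_receive_buffer_size()` needs.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Dispatcher side: pass fd_to_send over the connected AF_UNIX socket
 * `chan` as SCM_RIGHTS ancillary data. Returns 0 or -1. */
static int send_fd(int chan, int fd_to_send)
{
    char byte = 0;               /* at least one byte of real data */
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union { struct cmsghdr h; char buf[CMSG_SPACE(sizeof(int))]; } u;
    memset(&u, 0, sizeof u);
    struct msghdr msg = { .msg_iov = &iov, .msg_iovlen = 1,
                          .msg_control = u.buf, .msg_controllen = sizeof u.buf };
    struct cmsghdr *c = CMSG_FIRSTHDR(&msg);
    c->cmsg_level = SOL_SOCKET;
    c->cmsg_type = SCM_RIGHTS;
    c->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(c), &fd_to_send, sizeof(int));
    return sendmsg(chan, &msg, 0) == 1 ? 0 : -1;
}

/* Client side: receive one fd from `chan`. Returns the fd or -1. */
static int recv_fd(int chan)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union { struct cmsghdr h; char buf[CMSG_SPACE(sizeof(int))]; } u;
    struct msghdr msg = { .msg_iov = &iov, .msg_iovlen = 1,
                          .msg_control = u.buf, .msg_controllen = sizeof u.buf };
    if (recvmsg(chan, &msg, 0) != 1)
        return -1;
    struct cmsghdr *c = CMSG_FIRSTHDR(&msg);
    if (!c || c->cmsg_level != SOL_SOCKET || c->cmsg_type != SCM_RIGHTS)
        return -1;
    int fd;
    memcpy(&fd, CMSG_DATA(c), sizeof(int));
    return fd;
}

/* Dispatcher: create the per-client pair, hand one end to the client and
 * keep the other for writing events. SOCK_SEQPACKET keeps one event per
 * read(). Returns the dispatcher's end or -1. */
static int give_client_pair(int chan)
{
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_SEQPACKET, 0, sv) < 0)
        return -1;
    if (send_fd(chan, sv[1]) < 0) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }
    close(sv[1]);                /* the client now holds its own copy */
    return sv[0];
}

/* Client: honour a requested receive buffer size on its own end. */
static int set_rcvbuf(int fd, int size)
{
    return setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &size, sizeof size);
}
```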
I feel like I can replace eventfs with an event dispatcher that is both simple to implement and simple to use, while lowering the overall complexity of device propagation and retaining enough functionality to achieve libudev compatibility for legacy programs.

Thanks,
Jude

--047d7b417a63e798580529a9dd36 Content-Type: text/html; charset=UTF-8 Content-Transfer-Encoding: quoted-printable
--047d7b417a63e798580529a9dd36-- --- Unsubscribe: alpine-devel+unsubscribe@lists.alpinelinux.org Help: alpine-devel+help@lists.alpinelinux.org --- From nobody Thu Mar 28 10:34:08 2024 X-Original-To: alpine-devel@mail.alpinelinux.org Delivered-To: alpine-devel@mail.alpinelinux.org Received: from mail.alpinelinux.org (dallas-a1.alpinelinux.org [127.0.0.1]) by mail.alpinelinux.org (Postfix) with ESMTP id 288E9DC1BAF for ; Fri, 15 Jan 2016 04:55:02 +0000 (UTC) Received: from mail-pa0-f54.google.com (mail-pa0-f54.google.com [209.85.220.54]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (No client certificate requested) by mail.alpinelinux.org (Postfix) with ESMTPS id E3F98DC0EC0 for ; Fri, 15 Jan 2016 04:55:01 +0000 (UTC) Received: by mail-pa0-f54.google.com with SMTP id uo6so373308826pac.1 for ; Thu, 14 Jan 2016 20:55:01 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=date:from:to:cc:subject:message-id:references:mime-version :content-type:content-disposition:in-reply-to:user-agent; bh=RIkeKLRq2X3r9QYhKFP9RKpaJcli2chjqXlRvZaJ1Fg=; b=Dph1ZUpulloF962ARnFxsyNrGxyvFnh/2b/1/caLE2xBOS03lZayHT/CBHk4CHXu+L 7vZPsOtGo5nR6KODo73QLTnqNs3BLV41GuGRYtIrT0yrBaPQAo/0JTu+rMpLq375T9g7 iOdl30J6Xrwcb9li/TUkmaj4IraK9otF/xo+9C3vOnTQ5xw0n0omXJEoV1d7fbNIf4pB cMpp2tW5+YQzf1n4eOrDQlE/t1nPQVUdMPoNfpulTPy9faCia/Q1QzpEBQQm6vPBcf7L 9jOT88+GfXC9UEFBCp4SMhdgUzt9e1K3UtEfRpPad+tBNIlkoR3hw4siu6kT5Gbk3KMr aPCw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-type:content-disposition:in-reply-to :user-agent; bh=RIkeKLRq2X3r9QYhKFP9RKpaJcli2chjqXlRvZaJ1Fg=; b=ebb67hIrb2Q7HJs71qXdUSgQpi4FPm/+v4Xba0/pCcT2YjScO50lymPQIp81tN/bSd p8FawIGki2DbzPIX5HtZDac/a1lr8DvYqWn8KNZnroggUfuXWynuwQSoPLqTbfuKBzhm 0bQ/NLFBJ2oL2dP4lffuBHxZuT6uFL9XGqO9iehuwHb9mVgG0BA4AtwXeR5F+GIUXFJB 
C2XiLgpDPHPfNjt6HmWCh9VhOLEQJ10E2NPVugxnYrXZalOSwj9OuVm0TpebYRsU+FwP BJzUfxJxwEUWdZDvFhd8g5DRwv+OU8aFPKWEVVDsJYvWy5BURVGtILoNc3HmjlXYVLU1 1Sdw==
X-Gm-Message-State: ALoCoQkIoRz1w9kuA0Pmjaa7cRPINic6z1upFPGzzaYSML1o2S2RZz9HQzdXGlYqw2Uca8aQoofvBCBhvZUd/lKmqa5bq0cxkw==
X-Received: by 10.66.250.165 with SMTP id zd5mr12113654pac.9.1452833700817; Thu, 14 Jan 2016 20:55:00 -0800 (PST)
Received: from newbook ([50.0.225.136]) by smtp.gmail.com with ESMTPSA id n2sm12663356pfj.16.2016.01.14.20.54.59 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Thu, 14 Jan 2016 20:55:00 -0800 (PST)
Date: Thu, 14 Jan 2016 20:54:52 -0800
From: Isaac Dunham
To: Laurent Bercot
Cc: alpine-devel@lists.alpinelinux.org
Subject: Re: [alpine-devel] udev replacement on Alpine Linux
Message-ID: <20160115045451.GA10573@newbook>
References: <20150728052436.GC1923@newbook> <20160112153804.GI32545@example.net> <56953ABE.5090203@skarnet.org> <56958E22.90806@skarnet.org> <56964414.1000605@skarnet.org> <56978822.8020205@skarnet.org>
X-Mailinglist: alpine-devel
Precedence: list
List-Id: Alpine Development
List-Unsubscribe:
List-Post:
List-Help:
List-Subscribe:
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <56978822.8020205@skarnet.org>
User-Agent: Mutt/1.5.24 (2015-08-30)
X-Virus-Scanned: ClamAV using ClamSMTP

On Thu, Jan 14, 2016 at 12:36:02PM +0100, Laurent Bercot wrote:
> On 14/01/2016 06:55, Jude Nelson wrote:
> >I think you're close. The gist of it is that vdev needs to supply a
> >lot more information than the kernel gives it. In particular, its
> >helper programs go on to query the properties and status of each
> >device (this often requires root privileges, i.e. via privileged
> >ioctl()s), and vdev gathers the information into a (much larger)
> >event packet and stores it in a directory tree under /dev for
> >subsequent query by less-privileged programs.
>
> I see.
> I think this is exactly what could be made modular. I've heard
> people say they were reluctant to use vdev because it's not KISS, and
> I suspect the ioctl machinery and data gathering is a large part of
> the complexity. If that part could be pluggable, i.e. if admins could
> choose a "data gatherer" just complex enough for their needs, I believe
> it could encourage adoption. In other words, I'm looking at a 3-part
> program:
> - the netlink listener
> - the data gatherer
> - the event publisher
>
> Of course, for libudev to work, you would need the full data gatherer;
> but if people aren't using libudev programs, they can use a simpler one,
> closer to what mdev is doing.
> It's all from a very high point-of-view, and I don't know the details of
> the code so I have no idea whether it's envisionable for vdev, but that's
> what I'm thinking off the top of my head.

I haven't really looked at the vdevd code in a while, but from what
I recollect...
The vdev/libudev-compat "solution" is split up as follows:
-the netlink listener, vdevd
-several data gatherers, which are invoked according to a set of rules
for vdevd (analogous to mdev.conf)
-helper scripts to create extra links and so on, run via the same rules
-I don't recall exactly how the event publishing is done, but IIRC,
there's a daemon watching the directory where the helpers write out the
data, and then distributing it as described
-libudev-compat is libudev, patched to read events that are published
as described
-IIRC, there's some helper that will (also? instead?) maintain a list of
devices more like the way udev does, so you don't *need* libudev-compat.

Jude went to a bit of effort to design vdevd and libudev-compat so that
it would be possible to use part alongside mdev. The discussion is in
the Devuan archives; I don't recall how far back, though.
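The first stage of Laurent's three-part split (netlink listener, data gatherer, event publisher), plus the rule-based choice of gatherers Isaac describes, can be sketched roughly as below. This is a minimal Python illustration, not vdevd's code. It assumes the kernel's uevent wire format: an `ACTION@DEVPATH` summary followed by NUL-separated `KEY=VALUE` fields, multicast on netlink group 1. The rule table and the helper names (`gather-block`, `gather-input`, `gather-sound`) are hypothetical. They only mimic the regex-matching idea behind mdev.conf and are not vdevd's actual rule syntax.

```python
import re
import socket

# Value of NETLINK_KOBJECT_UEVENT from <linux/netlink.h>; hardcoded because
# not every Python build exposes it in the socket module.
NETLINK_KOBJECT_UEVENT = 15

# Hypothetical rule table: (event key, regex, helper name). It only mimics
# the regex matching of mdev.conf; it is not vdevd's real rule format.
RULES = [
    ("SUBSYSTEM", r"block", "gather-block"),
    ("SUBSYSTEM", r"input", "gather-input"),
    ("DEVNAME", r"snd/.*", "gather-sound"),
]


def parse_uevent(data: bytes) -> dict:
    """Turn a kernel uevent datagram into a dict of its KEY=VALUE fields.

    The first field is an "ACTION@DEVPATH" summary with no '=', so it is
    skipped; ACTION and DEVPATH also appear as ordinary KEY=VALUE fields.
    """
    env = {}
    for field in data.split(b"\0"):
        key, sep, value = field.decode("utf-8", "replace").partition("=")
        if sep:
            env[key] = value
    return env


def match_helpers(env: dict, rules=RULES) -> list:
    """Return every helper whose rule fully matches the event, in rule order."""
    return [helper for key, pattern, helper in rules
            if key in env and re.fullmatch(pattern, env[key])]


def listen(handle, bufsize: int = 65536) -> None:
    """Linux only: pass each kernel uevent (netlink group 1) to handle(env)."""
    sock = socket.socket(socket.AF_NETLINK, socket.SOCK_DGRAM,
                         NETLINK_KOBJECT_UEVENT)
    sock.bind((0, 1))  # port id 0 lets the kernel choose; group 1 = kernel uevents
    while True:
        handle(parse_uevent(sock.recv(bufsize)))
```

In a fuller daemon, the `handle` callback would run the helpers returned by `match_helpers()` (the "data gatherer" stage) and hand their output to a publisher. Keeping the listener this small is one way to read the modularity argument above: only the gatherers need to grow when libudev compatibility is required.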
HTH,
Isaac Dunham

---
Unsubscribe: alpine-devel+unsubscribe@lists.alpinelinux.org
Help: alpine-devel+help@lists.alpinelinux.org
---
From nobody Thu Mar 28 10:34:08 2024
X-Original-To: alpine-devel@mail.alpinelinux.org
Delivered-To: alpine-devel@mail.alpinelinux.org
Received: from mail.alpinelinux.org (dallas-a1.alpinelinux.org [127.0.0.1]) by mail.alpinelinux.org (Postfix) with ESMTP id EA343DC46B1 for ; Sat, 16 Jan 2016 18:25:04 +0000 (UTC)
Received: from mail-ob0-f179.google.com (mail-ob0-f179.google.com [209.85.214.179]) (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits)) (No client certificate requested) by mail.alpinelinux.org (Postfix) with ESMTPS id C5420DC090A for ; Sat, 16 Jan 2016 18:25:04 +0000 (UTC)
Received: by mail-ob0-f179.google.com with SMTP id vt7so134593415obb.1 for ; Sat, 16 Jan 2016 10:25:04 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; bh=HTheBPEdKzsJaVWkGjct48d0EZEyx6qms0K+YVutCfk=; b=z5nL6DfjsAoUWzUEXSgkaYF934aPmOBULEU63lsvEmomqDXOX3qyuBWTgAVwacaTqO c2gSdvZtzZhD1RtHaOtkQbxK3fHkLQDDW+TwsSIkDnArsw1G5iZvgTIJ3G5TWe+GFyoM LPi+iJmxLd0BqM3RI49QZcA+Y4ztcEYBwS/zZcX/hN5aWob9qIH7zmK2wJtzmu4rNCfc Ff3RTqiQmhHq/la01eNfTNitFQd592GLMj6kZmx/U/dGwIljJ/ZRpKwistH79deOmM/M Da6PF3bZsFHyEz4IEaI55elroYgpR+zcDVTsBUDlqCjjCWQDPv3F0gXe1oNNmZaaH85v 0Kvg==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:in-reply-to:references:date :message-id:subject:from:to:cc:content-type; bh=HTheBPEdKzsJaVWkGjct48d0EZEyx6qms0K+YVutCfk=; b=ZgDyMsxLv9vhLLxGbK0Y+qTulXxbLTj91eTFGBXS5HaOJg7nqXihJY8TRnx1SuVpw6 9FipB9cmdzXAZzzLHbOncEt2ehvaL9ik7BdSyoR3Qgx7iv6eOuDZmLUaHLwrq2FuL+JQ wMDpVD2R7cZC9XEJApbNrQJidy0UQKRyTFtuiB4rASEDRyGUk1KYIcKbcJavemjCUtB8 3cTxCcla/Cc34N+c9UBCOaY6q0fvCA0rzp0yd1XOH6W3C97EBvIQGyJvYz5sxpATvKdM if1tR3ijPgQdPINahKUdOYXrxKGPLYbk34bh/DrE14gbyecmAzVNbEm14o1a7JkaBMa5
lWMQ==
X-Gm-Message-State: ALoCoQkbQVl+4LPdYtfsfTo4QDuOLE5autYw+TO4pLh7hSXjQ1OBVlVoxuCTuxiBubRqkRIZKmH6sET9P7FwhMjh3JYcRQ7ZOQ==
X-Mailinglist: alpine-devel
Precedence: list
List-Id: Alpine Development
List-Unsubscribe:
List-Post:
List-Help:
List-Subscribe:
MIME-Version: 1.0
X-Received: by 10.182.19.234 with SMTP id i10mr13122936obe.67.1452968703950; Sat, 16 Jan 2016 10:25:03 -0800 (PST)
Received: by 10.202.81.6 with HTTP; Sat, 16 Jan 2016 10:25:03 -0800 (PST)
In-Reply-To: <20160115045451.GA10573@newbook>
References: <20150728052436.GC1923@newbook> <20160112153804.GI32545@example.net> <56953ABE.5090203@skarnet.org> <56958E22.90806@skarnet.org> <56964414.1000605@skarnet.org> <56978822.8020205@skarnet.org> <20160115045451.GA10573@newbook>
Date: Sat, 16 Jan 2016 13:25:03 -0500
Message-ID:
Subject: Re: [alpine-devel] udev replacement on Alpine Linux
From: Jude Nelson
To: Isaac Dunham
Cc: Laurent Bercot , alpine-devel@lists.alpinelinux.org
Content-Type: multipart/alternative; boundary=001a11c2c7be7805de052977a360
X-Virus-Scanned: ClamAV using ClamSMTP

--001a11c2c7be7805de052977a360
Content-Type: text/plain; charset=UTF-8

Hi Isaac,

On Thu, Jan 14, 2016 at 11:54 PM, Isaac Dunham wrote:

> On Thu, Jan 14, 2016 at 12:36:02PM +0100, Laurent Bercot wrote:
> > On 14/01/2016 06:55, Jude Nelson wrote:
> > >I think you're close. The gist of it is that vdev needs to supply a
> > >lot more information than the kernel gives it. In particular, its
> > >helper programs go on to query the properties and status of each
> > >device (this often requires root privileges, i.e. via privileged
> > >ioctl()s), and vdev gathers the information into a (much larger)
> > >event packet and stores it in a directory tree under /dev for
> > >subsequent query by less-privileged programs.
> >
> > I see.
> > I think this is exactly what could be made modular. I've heard
> > people say they were reluctant to use vdev because it's not KISS, and
> > I suspect the ioctl machinery and data gathering is a large part of
> > the complexity. If that part could be pluggable, i.e. if admins could
> > choose a "data gatherer" just complex enough for their needs, I believe
> > it could encourage adoption. In other words, I'm looking at a 3-part
> > program:
> > - the netlink listener
> > - the data gatherer
> > - the event publisher
> >
> > Of course, for libudev to work, you would need the full data gatherer;
> > but if people aren't using libudev programs, they can use a simpler one,
> > closer to what mdev is doing.
> > It's all from a very high point-of-view, and I don't know the details of
> > the code so I have no idea whether it's envisionable for vdev, but that's
> > what I'm thinking off the top of my head.
>
> I haven't really looked at the vdevd code in a while, but from what
> I recollect...
> The vdev/libudev-compat "solution" is split up as follows:
> -the netlink listener, vdevd
> -several data gatherers, which are invoked according to a set of rules
> for vdevd (analogous to mdev.conf)
> -helper scripts to create extra links and so on, run via the same rules
> -I don't recall exactly how the event publishing is done, but IIRC,
> there's a daemon watching the directory where the helpers write out the
> data, and then distributing it as described
> -libudev-compat is libudev, patched to read events that are published
> as described
> -IIRC, there's some helper that will (also? instead?) maintain a list of
> devices more like the way udev does, so you don't *need* libudev-compat.
>

Yup--all the requisite device state is maintained under /dev/metadata. The libudev-compat event helper behaves just like any other vdev/mdev-style script--when invoked, it reads /dev/metadata for a given device, sets up the appropriate files in /run/udev, and generates, writes, and hard-links the event-file (which libudev-compat clients detect and consume). Libudev-compat is totally unnecessary if device-aware programs and scripts can get away with reading and watching the contents of /dev/metadata.

-Jude

--001a11c2c7be7805de052977a360
Content-Type: text/html; charset=UTF-8
Content-Transfer-Encoding: quoted-printable
--001a11c2c7be7805de052977a360--

---
Unsubscribe: alpine-devel+unsubscribe@lists.alpinelinux.org
Help: alpine-devel+help@lists.alpinelinux.org
---
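Jude's description of the libudev-compat event helper (read a device's metadata, write an event file, hard-link it where clients will find it) can be illustrated with a small Python sketch. Everything about the on-disk layout is an assumption made for illustration: the `dev/<devname>/properties` file under the metadata root, the one-directory-per-client event tree with a `.staging` subdirectory, and the `KEY=VALUE` event format are hypothetical, not vdev's documented layout. What the sketch does show is why hard links suit the job. The event is written completely to a staging file first, so each `os.link()` makes a whole file appear in a client's directory in one step, and a client never sees a half-written event.

```python
import os


def read_properties(metadata_root: str, devname: str) -> dict:
    """Parse KEY=VALUE lines from a (hypothetical) per-device properties file."""
    path = os.path.join(metadata_root, "dev", devname, "properties")
    props = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            key, sep, value = line.rstrip("\n").partition("=")
            if sep:
                props[key] = value
    return props


def publish_event(metadata_root: str, events_root: str,
                  devname: str, action: str, seqnum: int) -> list:
    """Write one event file and hard-link it into every client's directory.

    Returns the paths the event was linked to. Raises FileExistsError if a
    client already holds an event with the same sequence number.
    """
    props = read_properties(metadata_root, devname)
    props.update(ACTION=action, SEQNUM=str(seqnum))
    body = "".join(f"{k}={v}\n" for k, v in sorted(props.items()))

    # Write the complete event to a staging file before any client can see it.
    staging = os.path.join(events_root, ".staging")
    os.makedirs(staging, exist_ok=True)
    tmp = os.path.join(staging, str(seqnum))
    with open(tmp, "w", encoding="utf-8") as f:
        f.write(body)
        f.flush()
        os.fsync(f.fileno())

    linked = []
    for client in sorted(os.listdir(events_root)):
        client_dir = os.path.join(events_root, client)
        if client.startswith(".") or not os.path.isdir(client_dir):
            continue  # skip the staging area and stray files
        dest = os.path.join(client_dir, str(seqnum))
        os.link(tmp, dest)  # the full file appears in one step
        linked.append(dest)

    os.unlink(tmp)  # the clients' links keep the data alive
    return linked
```

Hard links only work within a single filesystem, so this assumes the staging directory and all client directories live on the same mount (for example, all under a tmpfs-backed /dev). A client that consumes an event simply unlinks its own copy; the data disappears once the last client has done so.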