From: "Laurent Bercot"
To: "Natanael Copa", "Rasmus Thomsen"
Cc: "Francesco Colista", Leonardo, ~alpine/devel@lists.alpinelinux.org, "Sören Tempel"
Subject: Re: Use of supervise-daemon in Alpine
Date: Thu, 27 Aug 2020 15:35:34 +0000
In-Reply-To: <20200827171314.5bca06cf@ncopa-desktop.lan>

>But that would not give sysadmin/user the choice to die on error, which
>I fear will lead to nobody caring if the
>services are buggy or not. The "fix" is to restart the service.

That's a classic administration mistake, and it is absolutely on the
sysadmin or ops person, not on the supervision infrastructure.

A supervision system does not exist so that services can restart when
they die and the admin can continue napping because who cares, the
service is up. A supervision system exists so that services can restart
when they die so they're still kinda functional in an ever-imperfect
world while the admin actually analyzes the error and finds a real fix
for the service.

The goal of a supervision system is to maximize uptime. It is not to
enable laziness in fixing bugs. If nobody cares that a service is buggy,
you can lay the full blame on the people who do not care, not on the
supervision system. Not supervising daemons by default puts more of the
burden on competent admins in order to cater to the others, and madness
lies down this path.

Of course, services should be configured so that if they crash,
appropriate notifications are sent to the admin, so problems will not be
silently ignored. supervise-daemon should have a hook you can use to
take some action depending on the exit code (or signal) of the daemon.

Longruns should be supervised, but if some admin does not want to
supervise a given service, there should be an interface allowing them to
tell the supervisor not to restart the service next time it dies.
supervise-daemon should have such an interface; you shouldn't need to
patch it.

*cough* Needless to say, s6 provides all of this. *cough*

--
Laurent
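
For illustration only, here is a minimal sketch of the two interfaces
described above: a hook that receives the exit code or signal of the
daemon, and a way for the admin to tell the supervisor not to restart
the service next time it dies. It is not how supervise-daemon or s6
implement anything. The names supervise, on_exit, no_restart_flag and
max_runs are hypothetical, and the flag-file convention is an assumption
made for the example.

```python
import subprocess
import time
from pathlib import Path


def supervise(argv, on_exit, no_restart_flag, max_runs=None, delay=0.0):
    """Run argv in a loop, restarting it each time it dies.

    on_exit(exit_code, signal_number) is the notification hook; exactly
    one of its two arguments is not None.
    no_restart_flag is a path: if it exists when the daemon dies, the
    service is left down (the admin's opt-out for that service).
    max_runs is hypothetical and only lets a demo terminate.
    Returns the number of times the daemon was started.
    """
    runs = 0
    while True:
        status = subprocess.call(argv)  # blocks until the daemon dies
        runs += 1
        if status < 0:
            # On POSIX, a negative return code -N means "killed by signal N".
            on_exit(None, -status)
        else:
            on_exit(status, None)
        if Path(no_restart_flag).exists():
            return runs  # admin asked for no restart: stay down
        if max_runs is not None and runs >= max_runs:
            return runs
        time.sleep(delay)  # avoid a tight restart loop
```

The hook is where a notification to the admin would go (mail, pager,
log line), so that a restart keeps the service up without hiding the
crash. Creating the flag file while the daemon is running is the
"do not restart it next time it dies" request.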