Received: from out.migadu.com (out.migadu.com [91.121.223.63]) by nld3-dev1.alpinelinux.org (Postfix) with ESMTPS id 6C626781507 for <~alpine/apk-tools@lists.alpinelinux.org>; Wed, 19 Feb 2020 18:18:52 +0000 (UTC) Received: (Migadu outbound); Wed, 19 Feb 2020 18:17:38 +0000 Authentication-Results: out.migadu.com; auth=pass (plain) Received: from wms1-eu-central.migadu.com (wms1-eu-central.migadu.com [172.104.244.218]) by out.migadu.com (Haraka/2.8.16) with ESMTPSA id 0E325E2D-5707-47E6-996C-F2DC2109AA57.1 envelope-from (authenticated bits=0) (version=TLSv1/SSLv3 cipher=ECDHE-RSA-AES256-GCM-SHA384 verify=FAIL); Wed, 19 Feb 2020 18:17:38 +0000 MIME-Version: 1.0 Date: Wed, 19 Feb 2020 18:17:37 +0000 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable X-Mailer: RainLoop/1.12.1 From: "Ariadne Conill" Message-ID: <4e90cb4a79798ceb74b224574a2da77c@dereferenced.org> Subject: Re: APK Package Name Issue on armel port To: "Timo Teras" Cc: "Carl Chave" , ~alpine/apk-tools@lists.alpinelinux.org In-Reply-To: <20200219104752.01d68e90@vostro.wlan> References: <20200219104752.01d68e90@vostro.wlan> <20200217123334.031047d1@vostro.wlan> <20200214105215.6873e402@vostro.wlan> <62f21c96-ff43-83e4-5b7f-2e65e6d72729@adelielinux.org> <20200217111200.08405b9c@vostro.wlan> <340209c209104e20dd01b7b6aa71251c@dereferenced.org> DKIM-Signature: v=1;a=rsa-sha256;bh=ta2pbNyRg2n35yp73Goj0BETk4sLIHmhbW6ymiu+alI=;c=relaxed/simple;d=dereferenced.org;h=from:subject:date:to;s=default;b=lTDOz6x8SJrSvAqhPdyic+PuaFP0P4+qDXlz7dvapxy85pj7UZ8cnUVqHOCeciPmQHSxVeSvYOhFIjwK9ooMg34j9UM5okqh5e6zjr0md5v2JDs8klAYgEk0OGs7ibsd9QwZie4t7OMMrJbAL4QkcGwH2t/vpT3jFWTQsZKShvE= Hello,=0A=0AFebruary 19, 2020 2:47 AM, "Timo Teras" w= rote:=0A=0A> On Wed, 19 Feb 2020 00:58:31 +0000=0A> "Ariadne Conill" wrote:=0A> =0A>> February 17, 2020 4:33 AM, "Timo = Teras" wrote:=0A>> =0A>> On Mon, 17 Feb 2020 11:12:00= +0200=0A>> Timo Teras wrote:=0A>> =0A>> Reviewing th= e code in apk, it seems that the murmur hashing code is=0A>> not properly= accounting for alignment. So the symptoms do match.=0A>> The hash lookup= does not work (well, works randomly based on few=0A>> things), but when = enumerating all packages (by using the wildcard=0A>> in lookups) it'll sh= ow everything.=0A>> =0A>> Seems this is a common complaint about original= murmur3 and there's=0A>> been other projects affected with this too.=0A>= > =0A>> I wonder if we may be better off adopting a simpler hash function= =0A>> for our hashtables, such as FNV-1? I have had much success over=0A>= > the years using FNV hashes in various projects in a similar role=0A>> a= s what apk uses murmur3 for.=0A> =0A> It used to be DJB hash. And later s= witched to murmur for better speed=0A> and hash qualities. FNV-1 seems to= be slightly better quality and speed=0A> than DJB, but below Murmur3. Th= ough, that probably greatly depends on=0A> input length.=0A> =0A> See als= o:=0A> https://aras-p.info/blog/2016/08/09/More-Hash-Function-Tests=0A> = =0A> In the current code base that needs to intern large amounts of even= =0A> long strings, I'd prefer to keep Murmur3 or go even something better= =0A> like xxHash.=0A> =0A> Though, perhaps this is becoming less importan= t. That is, the=0A> interning of most package data is not needed as we mm= ap the files in=0A> future. If the result is that hashing is needed for s= hort strings only,=0A> such as the package name, it might be worth lookin= g at simplifying=0A> things by going to FNV-1.=0A=0AYeah, that is what I = mean. As the current design is moving away=0Afrom the use of interned st= rings, it may make more sense to use a=0Asimpler byte-for-byte hashing al= gorithm. That way we don't have=0Ato worry about alignment.=0A=0AAriadne