Hello, for your possible interest.
In a thread for the LUGA(ustria) I eventually had to time some
compression algorithms and wondered why zstd is so slow,
especially in the decompression stage, which is a key feature of
this one. It turns out that the -Os compilation causes, well,
dramatic performance degradation. I compiled my own with -O3 and
the difference is up to a factor of four. Just one example:
POSIX standard (C165.txt):
Alpine, -Os:
#?0[steffen@essex tmp]$ time zstd --rm x4.txt
x4.txt : 20.95% (12513780 => 2621685 bytes, x4.txt.zst)
0m00.57s real 0m00.23s user 0m00.12s system
#?0[steffen@essex tmp]$ time zstd -d -c x4.txt.zst >/dev/null
x4.txt.zst : 12513780 bytes
0m00.38s real 0m00.15s user 0m00.12s system
#?0[steffen@essex tmp]$ time zstd --rm -19 x5.txt
x5.txt : 15.40% (12513780 => 1926643 bytes, x5.txt.zst)
0m16.30s real 0m13.53s user 0m00.27s system
#?0[steffen@essex tmp]$ time zstd -d -c x5.txt.zst >/dev/null
x5.txt.zst : 12513780 bytes
0m00.39s real 0m00.12s user 0m00.14s system
-O3:
#?0[steffen@essex tmp]$ time x/zstd/zstd -f x1.txt
x1.txt : 20.95% (12513780 => 2621685 bytes, x1.txt.zst)
0m00.34s real 0m00.12s user 0m00.10s system
#?0[steffen@essex tmp]$ time x/zstd/zstd -d -c x1.txt.zst >/dev/null
x1.txt.zst : 12513780 bytes
0m00.10s real 0m00.02s user 0m00.05s system
#?0[steffen@essex tmp]$ time x/zstd/zstd -19 x1.txt
x1.txt : 15.40% (12513780 => 1926643 bytes, x1.txt.zst)
0m13.29s real 0m11.27s user 0m00.17s system
#?0[steffen@essex tmp]$ time x/zstd/zstd -d -c x1.txt.zst >/dev/null
x1.txt.zst : 12513780 bytes
0m00.12s real 0m00.02s user 0m00.07s system
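The ratios can be read off the transcript directly. As a minimal sketch (the helper names `parse_time_line` and `ratio` are made up for illustration, and the regular expression assumes the `0m00.38s real` layout this shell prints), the wall-clock figures quoted above work out as follows:

```python
import re

# Matches one field of the shell's `time` output as quoted above,
# e.g. "0m00.38s real". The layout is specific to this shell.
_FIELD = re.compile(r"(\d+)m(\d+(?:\.\d+)?)s\s+(real|user|system)")


def parse_time_line(line):
    """Return a dict like {'real': 0.38, 'user': 0.15, 'system': 0.12} in seconds."""
    return {kind: int(m) * 60 + float(s) for m, s, kind in _FIELD.findall(line)}


def ratio(slow_line, fast_line, kind="real"):
    """How many times longer the slow run took than the fast run."""
    return parse_time_line(slow_line)[kind] / parse_time_line(fast_line)[kind]


# Lines copied from the transcript above (packaged -Os build vs. own -O3 build).
os_decompress = "0m00.38s real 0m00.15s user 0m00.12s system"
o3_decompress = "0m00.10s real 0m00.02s user 0m00.05s system"
os_compress19 = "0m16.30s real 0m13.53s user 0m00.27s system"
o3_compress19 = "0m13.29s real 0m11.27s user 0m00.17s system"

print(f"decompression:   {ratio(os_decompress, o3_decompress):.1f}x")  # 3.8x
print(f"compression -19: {ratio(os_compress19, o3_compress19):.1f}x")  # 1.2x
```

So the large gap is on the decompression side; at level 19 the compression times differ by roughly a fifth.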
That actually makes me wonder how ports should deal with CFLAGS.
Is it acceptable for a port to watch for compiler flags and set
them itself? My MUA would go for PIE, relro and all that, then.
Ciao,
--steffen
|
|Der Kragenbaer, The moon bear,
|der holt sich munter he cheerfully and one by one
|einen nach dem anderen runter wa.ks himself off
|(By Robert Gernhardt)
---
Unsubscribe: alpine-user+unsubscribe@lists.alpinelinux.org
Help: alpine-user+help@lists.alpinelinux.org
---
Steffen Nurpmeso wrote:
> Hello, for your possible interest.
>
> In a thread for the LUGA(ustria) I eventually had to time some
> compression algorithms and wondered why zstd is so slow,
> especially in the decompression stage, which is a key feature of
> this one. It turns out that the -Os compilation causes, well,
> dramatic performance degradation. I compiled my own with -O3 and
> the difference is up to a factor of four. Just one example:
What about -O2? Also what are the differences in binary sizes? Are you
using gcc? If yes, try clang.
--
caóc
---
Cág <ca6c@bitmessage.ch> wrote:
|Steffen Nurpmeso wrote:
|> Hello, for your possible interest.
|>
|> In a thread for the LUGA(ustria) I eventually had to time some
|> compression algorithms and wondered why zstd is so slow,
|> especially in the decompression stage, which is a key feature of
|> this one. It turns out that the -Os compilation causes, well,
|> dramatic performance degradation. I compiled my own with -O3 and
|> the difference is up to a factor of four. Just one example:
|
|What about -O2? Also what are the differences in binary sizes? Are you
|using gcc? If yes, try clang.
I thought it could be of interest for those who have many files or
whatever. A factor of four is not nothing, especially if it is
lost at the bottommost level of computing.
In a private message I responded:
Not really comparable, since the build found development files of
other archivers and compiled them in -- he adds more and more
support for other archive formats, and I think that will end up
like the tar a.k.a. libarchive umbrellas do. I do not know how I
could get an isolated quick build, or what make flags I would have
to use to get a stripped version that is comparable. (Too lazy,
too late.)
But sure it will be somewhat larger, -Os is like -O2 (?) with some
reduction -- then again this is not chromium or something but
a (per se) small archiver, and a factor of four on the
decompression side is drastic. It may also be platform dependent.
I mean, for my use case that is all right (but now that I have the
binary around it stays for a while), but if it drove a compressed
file system, or if I had a lot of compressed files to deal with
regularly, or if I had a server with a database or whatever that
was based on such files, then it would matter. (That is why I said
FYI.)
--steffen
---
On Tue, 13 Mar 2018 19:06:48 +0100
Steffen Nurpmeso <steffen@sdaoden.eu> wrote:
> Hello, for your possible interest.
>
> In a thread for the LUGA(ustria) I eventually had to time some
> compression algorithms and wondered why zstd is so slow,
> especially in the decompression stage, which is a key feature of
> this one. It turns out that the -Os compilation causes, well,
> dramatic performance degradation. I compiled my own with -O3 and
> the difference is up to a factor of four. Just one example:
>
> POSIX standard (C165.txt):
>
> Alpine, -Os:
> #?0[steffen@essex tmp]$ time zstd --rm x4.txt
> ...
> #?0[steffen@essex tmp]$ time zstd --rm -19 x5.txt
> ...
> -O3:
> #?0[steffen@essex tmp]$ time x/zstd/zstd -f x1.txt
> ...
Are you compressing the same file? I see x4.txt and x5.txt vs x1.txt.
File content may make a difference too.
> That makes me actually wonder how ports should deal with CFLAGS.
> Is it acceptable for a port to watch for compiler flags and set
> them, my MUA would go for PIE, relro and all that, then?
If the difference is 4x then, yes, I think we should explicitly
set CFLAGS from the aport, with a reference on why. I do prefer -O2
over -O3 though, so it would be nice to see the numbers with -O2, and
also what the numbers are on different platforms.
We already explicitly set -O2 for zlib, because it's a case where we do
want to trade size for more speed.
-nc
---
Hello.
Natanael Copa <ncopa@alpinelinux.org> wrote:
|On Tue, 13 Mar 2018 19:06:48 +0100
|Steffen Nurpmeso <steffen@sdaoden.eu> wrote:
|
|> Hello, for your possible interest.
|>
|> In a thread for the LUGA(ustria) I eventually had to time some
|> compression algorithms and wondered why zstd is so slow,
|> especially in the decompression stage, which is a key feature of
|> this one. It turns out that the -Os compilation causes, well,
|> dramatic performance degradation. I compiled my own with -O3 and
|> the difference is up to a factor of four. Just one example:
...
|Are you compressing the same file? I see x4.txt and x5.txt vs x1.txt.
|File content may make a difference too.
Yes, it was all the same. It was just an excerpt of that LUGA
message, sorry.
|> That makes me actually wonder how ports should deal with CFLAGS.
|> Is it acceptable for a port to watch for compiler flags and set
|> them, my MUA would go for PIE, relro and all that, then?
|
|If the difference is 4x then, yes, I think we should explicitly
|set CFLAGS from the aport, with a reference on why. I do prefer -O2
|over -O3 though, so it would be nice to see the numbers with -O2, and
|also what the numbers are on different platforms.
|
|We already explicitly set -O2 for zlib, because it's a case where we do
|want to trade size for more speed.
I see. I only have access to x86 (with Linux) for now; I really
have to do something about that some day... With -O2:
#?0[steffen@essex zstd]$ CFLAGS=-O2 make zstd
...
#?0[steffen@essex zstd]$ ll zstd
-rwxr-x--- 1 steffen steffen 582392 Mar 16 16:11 zstd*
#?0[steffen@essex zstd]$ ldd zstd
/lib/ld-musl-x86_64.so.1 (0x7fc87972c000)
libz.so.1 => /lib/libz.so.1 (0x7fc879291000)
libc.musl-x86_64.so.1 => /lib/ld-musl-x86_64.so.1 (0x7fc87972c000)
#?0[steffen@essex zstd]$ time ./zstd -c < C165.txt > .t1
0m00.40s real 0m00.27s user 0m00.09s system
#?0[steffen@essex zstd]$ time ./zstd -c < C165.txt > .t1
0m00.31s real 0m00.23s user 0m00.07s system
#?0[steffen@essex zstd]$ time ./zstd -19 -c < C165.txt > .t1
0m12.50s real 0m12.35s user 0m00.13s system
#?0[steffen@essex zstd]$ time ./zstd -19 -c < C165.txt > .t1
0m12.32s real 0m12.14s user 0m00.15s system
#?0[steffen@essex zstd]$ time ./zstd -d -c < .t1 >/dev/null
0m00.17s real 0m00.11s user 0m00.06s system
#?0[steffen@essex zstd]$ time ./zstd -d -c < .t1 >/dev/null
0m00.13s real 0m00.09s user 0m00.03s system
#?0[steffen@essex zstd]$ time ./zstd -d -c < .t1 >/dev/null
0m00.12s real 0m00.09s user 0m00.02s system
No difference with -O3, actually:
#?0[steffen@essex zstd]$ CFLAGS=-O3 make zstd
...
#?0[steffen@essex zstd]$ ll zstd
-rwxr-x--- 1 steffen steffen 619296 Mar 16 16:17 zstd*
#?0[steffen@essex zstd]$ ldd zstd
/lib/ld-musl-x86_64.so.1 (0x7f423a622000)
libz.so.1 => /lib/libz.so.1 (0x7f423a17e000)
libc.musl-x86_64.so.1 => /lib/ld-musl-x86_64.so.1 (0x7f423a622000)
#?0[steffen@essex zstd]$ time ./zstd -c < C165.txt > .t1
0m00.33s real 0m00.26s user 0m00.06s system
#?0[steffen@essex zstd]$ time ./zstd -c < C165.txt > .t1
0m00.28s real 0m00.23s user 0m00.04s system
#?0[steffen@essex zstd]$ time ./zstd -19 -c < C165.txt > .t1
0m12.45s real 0m12.19s user 0m00.21s system
#?0[steffen@essex zstd]$ time ./zstd -19 -c < C165.txt > .t1
0m12.97s real 0m12.82s user 0m00.14s system
#?0[steffen@essex zstd]$ time ./zstd -d -c < .t1 >/dev/null
0m00.13s real 0m00.07s user 0m00.06s system
#?0[steffen@essex zstd]$ time ./zstd -d -c < .t1 >/dev/null
0m00.13s real 0m00.08s user 0m00.05s system
But lots of difference for /usr/bin/zstd:
#?0[steffen@essex zstd]$ ll /usr/bin/zstd
-rwxr-xr-x 1 root root 382792 Dec 27 15:17 /usr/bin/zstd*
#?0[steffen@essex zstd]$ ldd /usr/bin/zstd
/lib/ld-musl-x86_64.so.1 (0x7f2255a3d000)
libc.musl-x86_64.so.1 => /lib/ld-musl-x86_64.so.1 (0x7f2255a3d000)
#?0[steffen@essex zstd]$ time /usr/bin/zstd -c < C165.txt > .t1
0m00.53s real 0m00.44s user 0m00.07s system
#?0[steffen@essex zstd]$ time /usr/bin/zstd -c < C165.txt > .t1
0m00.52s real 0m00.44s user 0m00.07s system
#?0[steffen@essex zstd]$ time /usr/bin/zstd -19 -c < C165.txt > .t1
0m15.16s real 0m15.06s user 0m00.09s system
#?0[steffen@essex zstd]$ time /usr/bin/zstd -19 -c < C165.txt > .t1
0m15.35s real 0m15.19s user 0m00.14s system
#?0[steffen@essex zstd]$ time /usr/bin/zstd -d -c < .t1 >/dev/null
0m00.40s real 0m00.27s user 0m00.12s system
#?0[steffen@essex zstd]$ time /usr/bin/zstd -d -c < .t1 >/dev/null
0m00.36s real 0m00.30s user 0m00.05s system
#?0[steffen@essex zstd]$ time /usr/bin/zstd -d -c < .t1 >/dev/null
0m00.40s real 0m00.27s user 0m00.14s system
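Runs this short vary by a few hundredths of a second from one invocation to the next, which is why each command is repeated. A small harness can automate that. The following is only a sketch, not what was used for the numbers in this message: `best_of` is a made-up helper name, and it reports the fastest of several wall-clock timings, which tends to be less sensitive to cache and scheduler noise than a single run.

```python
import subprocess
import sys
import time


def best_of(argv, runs=5, stdin_path=None):
    """Run argv `runs` times and return the fastest wall-clock time in seconds."""
    timings = []
    for _ in range(runs):
        # Feed the input file on stdin (like `< .t1`), discard stdout
        # (like `>/dev/null`).
        stdin = open(stdin_path, "rb") if stdin_path else subprocess.DEVNULL
        try:
            start = time.perf_counter()
            subprocess.run(argv, stdin=stdin, stdout=subprocess.DEVNULL, check=True)
            timings.append(time.perf_counter() - start)
        finally:
            if stdin_path:
                stdin.close()
    return min(timings)


if __name__ == "__main__":
    # Stand-in command so the sketch runs anywhere; replace it with the
    # binaries under test.
    print(f"{best_of([sys.executable, '-c', 'pass'], runs=3):.3f}s")
```

With the files from this session, the comparison would be along the lines of `best_of(["/usr/bin/zstd", "-d", "-c"], stdin_path=".t1")` against `best_of(["./zstd", "-d", "-c"], stdin_path=".t1")`. Note that the measured time includes process startup, just as the shell's `time` does.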
A quick test with a PDF, Steven-Levy_Hackers-Heroes-Computer-Revolution.pdf:
the difference is not so big here, but decompression is near a factor of two:
#?0[steffen@essex zstd]$ ll slhhcr.pdf
-rw-r----- 1 steffen steffen 2761072 Mar 16 16:24 slhhcr.pdf
#?0[steffen@essex zstd]$ time ./zstd -c < slhhcr.pdf >.t1
0m00.13s real 0m00.06s user 0m00.06s system
#?0[steffen@essex zstd]$ time ./zstd -19 -c < slhhcr.pdf >.t1
0m01.58s real 0m01.50s user 0m00.08s system
#?0[steffen@essex zstd]$ time ./zstd -d -c < .t1 >/dev/null
0m00.03s real 0m00.02s user 0m00.01s system
#?0[steffen@essex zstd]$ time ./zstd -d -c < .t1 >/dev/null
0m00.04s real 0m00.01s user 0m00.02s system
#?0[steffen@essex zstd]$ time ./zstd -d -c < .t1 >/dev/null
0m00.05s real 0m00.02s user 0m00.03s system
#?0[steffen@essex zstd]$ time /usr/bin/zstd -c < slhhcr.pdf >.t1
0m00.18s real 0m00.11s user 0m00.07s system
#?0[steffen@essex zstd]$ time /usr/bin/zstd -19 -c < slhhcr.pdf >.t1
0m01.82s real 0m01.74s user 0m00.07s system
#?0[steffen@essex zstd]$ time /usr/bin/zstd -d < .t1 >/dev/null
0m00.07s real 0m00.03s user 0m00.04s system
#?0[steffen@essex zstd]$ time /usr/bin/zstd -d < .t1 >/dev/null
0m00.09s real 0m00.04s user 0m00.04s system
And finally the Guide_to_Digital_Signal_Processing (a directory of PDFs)
as a tar file, with a decompression factor of three to four:
#?0[steffen@essex zstd]$ ll gtdsp.tar
-rw-r----- 1 steffen steffen 16537600 Mar 16 16:29 gtdsp.tar
#?0[steffen@essex zstd]$ time ./zstd -c < gtdsp.tar >.t1
0m00.36s real 0m00.22s user 0m00.13s system
#?0[steffen@essex zstd]$ time ./zstd -19 -c < gtdsp.tar >.t1
0m06.78s real 0m06.62s user 0m00.14s system
#?0[steffen@essex zstd]$ time ./zstd -d -c < .t1 >/dev/null
0m00.10s real 0m00.06s user 0m00.04s system
#?0[steffen@essex zstd]$ time ./zstd -d -c < .t1 >/dev/null
0m00.10s real 0m00.05s user 0m00.04s system
#?0[steffen@essex zstd]$ time /usr/bin/zstd -c < gtdsp.tar >.t1
0m00.62s real 0m00.43s user 0m00.18s system
#?0[steffen@essex zstd]$ time /usr/bin/zstd -19 -c < gtdsp.tar >.t1
0m07.43s real 0m07.16s user 0m00.23s system
#?0[steffen@essex zstd]$ time /usr/bin/zstd -d -c < .t1 >/dev/null
0m00.37s real 0m00.21s user 0m00.15s system
#?0[steffen@essex zstd]$ time /usr/bin/zstd -d -c < .t1 >/dev/null
0m00.33s real 0m00.29s user 0m00.04s system
Since I have no chance to test, I leave arch= unmodified, but
I do wonder about it, since the Makefile has explicit arm flags?
Ciao,
--steffen
---
On Fri, 16 Mar 2018 16:37:41 +0100
Steffen Nurpmeso <steffen@sdaoden.eu> wrote:
> Hello.
>
> Natanael Copa <ncopa@alpinelinux.org> wrote:
> ...
> I see. I only have access to x86 (with Linux) for now; I really
> have to do something about that some day... With -O2:
>
> #?0[steffen@essex zstd]$ CFLAGS=-O2 make zstd
> ...
> #?0[steffen@essex zstd]$ ll zstd
> -rwxr-x--- 1 steffen steffen 582392 Mar 16 16:11 zstd*
...
> No difference with -O3, actually:
>
> #?0[steffen@essex zstd]$ CFLAGS=-O3 make zstd
> ...
> #?0[steffen@essex zstd]$ ll zstd
> -rwxr-x--- 1 steffen steffen 619296 Mar 16 16:17 zstd*
Yes, no big difference in performance between -O2 and -O3, but the binary gets bigger.
...
> But lots of difference for /usr/bin/zstd:
>
> #?0[steffen@essex zstd]$ ll /usr/bin/zstd
> -rwxr-xr-x 1 root root 382792 Dec 27 15:17 /usr/bin/zstd*
I assume that is with -Os.
I think this alone is good enough reason to force -O2.
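To put the size side of the trade-off in numbers, here is a rough sketch using only the `ll` sizes quoted in this thread. The helper name `growth_percent` is made up for illustration. The figures are indicative at best: the self-built binaries also link libz (see the `ldd` output earlier in the thread), so they are not strictly comparable to the packaged one.

```python
# Binary sizes in bytes, copied from the `ll` output quoted in this thread.
# The packaged /usr/bin/zstd is assumed to be the -Os build.
SIZES = {"-Os (packaged)": 382792, "-O2": 582392, "-O3": 619296}


def growth_percent(size, base):
    """Size increase of `size` over `base`, in percent."""
    return (size - base) / base * 100


base = SIZES["-Os (packaged)"]
for flags, size in SIZES.items():
    print(f"{flags:>15}: {size:>7} bytes ({growth_percent(size, base):+.1f}%)")

# -O3 relative to -O2, the comparison made just above.
print(f"-O3 over -O2: {growth_percent(SIZES['-O3'], SIZES['-O2']):+.1f}%")
```

By these numbers the -O2 build is about half again as large as the packaged binary, and -O3 adds a further few percent on top of -O2 without a measurable speed gain in the timings above.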
Thanks!
-nc