
Re: [alpine-user] FYI: community/zstd binary much (up to 4x) slower than necessary

From: Steffen Nurpmeso <steffen_at_sdaoden.eu>
Date: Fri, 16 Mar 2018 16:37:41 +0100

Hello.

Natanael Copa <ncopa_at_alpinelinux.org> wrote:
 |On Tue, 13 Mar 2018 19:06:48 +0100
 |Steffen Nurpmeso <steffen_at_sdaoden.eu> wrote:
 |
 |> Hello, for your possible interest.
 |>
 |> In a thread for the LUGA(ustria) I eventually had to time some
 |> compression algorithms and wondered why zstd is so slow,
 |> especially in the decompression stage, which is a key feature of
 |> this one. It turns out that the -Os compilation causes, well,
 |> dramatic performance degradation. I compiled my own with -O3 and
 |> the difference is up to a factor of four. Just one example:
 ...
 |Are you compressing the same file? I see x4.txt, x5.txt vs. x1.txt.
 |File content may make a difference too.

Yes, it was the same file throughout. What I posted was just an
excerpt of that LUGA message, sorry.

 |> That actually makes me wonder how ports should deal with CFLAGS.
 |> Is it acceptable for a port to watch for compiler flags and set
 |> them? My MUA would then go for PIE, relro and all that.
 |
 |I think if the difference is 4x then, yes, we should explicitly
 |set CFLAGS from the aport with a reference on why. I do prefer -O2
 |over -O3 though, so it would be nice to see the numbers with -O2 and
 |also what the numbers are on different platforms.
 |
 |We already explicitly set -O2 for zlib, because it's a case where we
 |do want to trade size for more speed.
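
For illustration, a minimal sketch of how such an override might look.
The helper name speed_cflags is hypothetical and not part of abuild, and
the APKBUILD usage in the comment is an assumed layout, not the actual
zstd or zlib aport. The only idea taken from the thread is to replace the
size-oriented -Os with -O2 and to leave a note on why.

```shell
#!/bin/sh
# Hypothetical helper (not part of abuild): swap the size-oriented -Os
# for -O2 in a flag string and leave every other flag untouched.
speed_cflags() {
    out=
    for flag in $1; do
        case $flag in
            -Os) flag=-O2 ;;
        esac
        # Append with a separating space, except for the first flag.
        out="${out:+$out }$flag"
    done
    printf '%s\n' "$out"
}

# Assumed use inside an APKBUILD (illustrative only):
#   build() {
#       # -Os makes zstd decompression up to 4x slower, so prefer
#       # speed over size here
#       export CFLAGS="$(speed_cflags "$CFLAGS")"
#       make
#   }
speed_cflags "-Os -fomit-frame-pointer"
```

If -Os is not present the flags pass through unchanged, so the helper
never overrides an optimization level somebody chose on purpose.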

I see. I only have access to x86 (with Linux) for now; I really
have to do something about that some day... With -O2:

  #?0[steffen_at_essex zstd]$ CFLAGS=-O2 make zstd
  ...
  #?0[steffen_at_essex zstd]$ ll zstd
  -rwxr-x--- 1 steffen steffen 582392 Mar 16 16:11 zstd*
  #?0[steffen_at_essex zstd]$ ldd zstd
          /lib/ld-musl-x86_64.so.1 (0x7fc87972c000)
          libz.so.1 => /lib/libz.so.1 (0x7fc879291000)
          libc.musl-x86_64.so.1 => /lib/ld-musl-x86_64.so.1 (0x7fc87972c000)
  #?0[steffen_at_essex zstd]$ time ./zstd -c < C165.txt > .t1
      0m00.40s real 0m00.27s user 0m00.09s system
  #?0[steffen_at_essex zstd]$ time ./zstd -c < C165.txt > .t1
      0m00.31s real 0m00.23s user 0m00.07s system
  #?0[steffen_at_essex zstd]$ time ./zstd -19 -c < C165.txt > .t1
      0m12.50s real 0m12.35s user 0m00.13s system
  #?0[steffen_at_essex zstd]$ time ./zstd -19 -c < C165.txt > .t1
      0m12.32s real 0m12.14s user 0m00.15s system
  #?0[steffen_at_essex zstd]$ time ./zstd -d -c < .t1 >/dev/null
      0m00.17s real 0m00.11s user 0m00.06s system
  #?0[steffen_at_essex zstd]$ time ./zstd -d -c < .t1 >/dev/null
      0m00.13s real 0m00.09s user 0m00.03s system
  #?0[steffen_at_essex zstd]$ time ./zstd -d -c < .t1 >/dev/null
      0m00.12s real 0m00.09s user 0m00.02s system

No difference with -O3, actually:

  #?0[steffen_at_essex zstd]$ CFLAGS=-O3 make zstd
  ...
  #?0[steffen_at_essex zstd]$ ll zstd
  -rwxr-x--- 1 steffen steffen 619296 Mar 16 16:17 zstd*
  #?0[steffen_at_essex zstd]$ ldd zstd
          /lib/ld-musl-x86_64.so.1 (0x7f423a622000)
          libz.so.1 => /lib/libz.so.1 (0x7f423a17e000)
          libc.musl-x86_64.so.1 => /lib/ld-musl-x86_64.so.1 (0x7f423a622000)
  #?0[steffen_at_essex zstd]$ time ./zstd -c < C165.txt > .t1
      0m00.33s real 0m00.26s user 0m00.06s system
  #?0[steffen_at_essex zstd]$ time ./zstd -c < C165.txt > .t1
      0m00.28s real 0m00.23s user 0m00.04s system
  #?0[steffen_at_essex zstd]$ time ./zstd -19 -c < C165.txt > .t1
      0m12.45s real 0m12.19s user 0m00.21s system
  #?0[steffen_at_essex zstd]$ time ./zstd -19 -c < C165.txt > .t1
      0m12.97s real 0m12.82s user 0m00.14s system
  #?0[steffen_at_essex zstd]$ time ./zstd -d -c < .t1 >/dev/null
      0m00.13s real 0m00.07s user 0m00.06s system
  #?0[steffen_at_essex zstd]$ time ./zstd -d -c < .t1 >/dev/null
      0m00.13s real 0m00.08s user 0m00.05s system

But a big difference with the packaged /usr/bin/zstd:

  #?0[steffen_at_essex zstd]$ ll /usr/bin/zstd
  -rwxr-xr-x 1 root root 382792 Dec 27 15:17 /usr/bin/zstd*
  #?0[steffen_at_essex zstd]$ ldd /usr/bin/zstd
          /lib/ld-musl-x86_64.so.1 (0x7f2255a3d000)
          libc.musl-x86_64.so.1 => /lib/ld-musl-x86_64.so.1 (0x7f2255a3d000)
  #?0[steffen_at_essex zstd]$ time /usr/bin/zstd -c < C165.txt > .t1
      0m00.53s real 0m00.44s user 0m00.07s system
  #?0[steffen_at_essex zstd]$ time /usr/bin/zstd -c < C165.txt > .t1
      0m00.52s real 0m00.44s user 0m00.07s system
  #?0[steffen_at_essex zstd]$ time /usr/bin/zstd -19 -c < C165.txt > .t1
      0m15.16s real 0m15.06s user 0m00.09s system
  #?0[steffen_at_essex zstd]$ time /usr/bin/zstd -19 -c < C165.txt > .t1
      0m15.35s real 0m15.19s user 0m00.14s system
  #?0[steffen_at_essex zstd]$ time /usr/bin/zstd -d -c < .t1 >/dev/null
      0m00.40s real 0m00.27s user 0m00.12s system
  #?0[steffen_at_essex zstd]$ time /usr/bin/zstd -d -c < .t1 >/dev/null
      0m00.36s real 0m00.30s user 0m00.05s system
  #?0[steffen_at_essex zstd]$ time /usr/bin/zstd -d -c < .t1 >/dev/null
      0m00.40s real 0m00.27s user 0m00.14s system

Quick test with a PDF, Steven-Levy_Hackers-Heroes-Computer-Revolution.pdf
(slhhcr.pdf below). The difference is not as big here, but decompression
is still near a factor of two:

  #?0[steffen_at_essex zstd]$ ll slhhcr.pdf
  -rw-r----- 1 steffen steffen 2761072 Mar 16 16:24 slhhcr.pdf
  #?0[steffen_at_essex zstd]$ time ./zstd -c < slhhcr.pdf >.t1
      0m00.13s real 0m00.06s user 0m00.06s system
  #?0[steffen_at_essex zstd]$ time ./zstd -19 -c < slhhcr.pdf >.t1
      0m01.58s real 0m01.50s user 0m00.08s system
  #?0[steffen_at_essex zstd]$ time ./zstd -d -c < .t1 >/dev/null
      0m00.03s real 0m00.02s user 0m00.01s system
  #?0[steffen_at_essex zstd]$ time ./zstd -d -c < .t1 >/dev/null
      0m00.04s real 0m00.01s user 0m00.02s system
  #?0[steffen_at_essex zstd]$ time ./zstd -d -c < .t1 >/dev/null
      0m00.05s real 0m00.02s user 0m00.03s system

  #?0[steffen_at_essex zstd]$ time /usr/bin/zstd -c < slhhcr.pdf >.t1
      0m00.18s real 0m00.11s user 0m00.07s system
  #?0[steffen_at_essex zstd]$ time /usr/bin/zstd -19 -c < slhhcr.pdf >.t1
      0m01.82s real 0m01.74s user 0m00.07s system
  #?0[steffen_at_essex zstd]$ time /usr/bin/zstd -d < .t1 >/dev/null
      0m00.07s real 0m00.03s user 0m00.04s system
  #?0[steffen_at_essex zstd]$ time /usr/bin/zstd -d < .t1 >/dev/null
      0m00.09s real 0m00.04s user 0m00.04s system

And finally the Guide_to_Digital_Signal_Processing (a directory of
PDFs) as a tar file, where decompression differs by a factor of three
to four:

  #?0[steffen_at_essex zstd]$ ll gtdsp.tar
  -rw-r----- 1 steffen steffen 16537600 Mar 16 16:29 gtdsp.tar
  #?0[steffen_at_essex zstd]$ time ./zstd -c < gtdsp.tar >.t1
      0m00.36s real 0m00.22s user 0m00.13s system
  #?0[steffen_at_essex zstd]$ time ./zstd -19 -c < gtdsp.tar >.t1
      0m06.78s real 0m06.62s user 0m00.14s system
  #?0[steffen_at_essex zstd]$ time ./zstd -d -c < .t1 >/dev/null
      0m00.10s real 0m00.06s user 0m00.04s system
  #?0[steffen_at_essex zstd]$ time ./zstd -d -c < .t1 >/dev/null
      0m00.10s real 0m00.05s user 0m00.04s system

  #?0[steffen_at_essex zstd]$ time /usr/bin/zstd -c < gtdsp.tar >.t1
      0m00.62s real 0m00.43s user 0m00.18s system
  #?0[steffen_at_essex zstd]$ time /usr/bin/zstd -19 -c < gtdsp.tar >.t1
      0m07.43s real 0m07.16s user 0m00.23s system
  #?0[steffen_at_essex zstd]$ time /usr/bin/zstd -d -c < .t1 >/dev/null
      0m00.37s real 0m00.21s user 0m00.15s system
  #?0[steffen_at_essex zstd]$ time /usr/bin/zstd -d -c < .t1 >/dev/null
      0m00.33s real 0m00.29s user 0m00.04s system
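
To put a number on that, here is a small sketch that takes the best
"real" time of each run and divides them. The function names best_real
and ratio are made up for this example; the timing lines are copied from
the gtdsp.tar decompression runs above. With only two runs per binary
the result is a rough indication, not a proper benchmark.

```shell
#!/bin/sh
# Print the smallest "real" time, in seconds, among lines of the form
# "0m00.13s real 0m00.07s user 0m00.06s system" read from stdin.
best_real() {
    awk '$2 == "real" {
        split($1, t, "m")      # t[1] = minutes, t[2] = seconds plus "s"
        sub(/s$/, "", t[2])    # drop the trailing unit
        secs = t[1] * 60 + t[2]
        if (!seen || secs < best) { best = secs; seen = 1 }
    }
    END { if (seen) printf "%.2f\n", best }'
}

# ratio SLOW FAST: how many times slower SLOW is than FAST.
ratio() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", a / b }'
}

# Decompression of gtdsp.tar with the locally built ./zstd.
local_build=$(printf '%s\n' \
    '0m00.10s real 0m00.06s user 0m00.04s system' \
    '0m00.10s real 0m00.05s user 0m00.04s system' | best_real)

# The same with the packaged /usr/bin/zstd.
packaged=$(printf '%s\n' \
    '0m00.37s real 0m00.21s user 0m00.15s system' \
    '0m00.33s real 0m00.29s user 0m00.04s system' | best_real)

echo "local ${local_build}s, packaged ${packaged}s, factor $(ratio "$packaged" "$local_build")"
```

For these lines the factor comes out at 3.3, in line with the "three to
four" stated above.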

Since I have no chance to test elsewhere I leave the arch= unmodified,
but I do wonder, since the Makefile has explicit ARM flags.

Ciao,

--steffen
|
|Der Kragenbaer,                The moon bear,
|der holt sich munter           he cheerfully and one by one
|einen nach dem anderen runter  wa.ks himself off
|(By Robert Gernhardt)




Received on Fri Mar 16 2018 - 16:37:41 UTC