
[alpine-devel] SSL connections hang on boot in Alpine VMs

Message ID: <20180916235803.GA5606@homura.localdomain>
Sender timestamp: 1537142283
DKIM signature: missing
Hey guys. I'm dealing with a super bizarre issue and I'm hoping I might
find some help here. I have a script which creates qcow2 images with
Alpine installed:

https://git.sr.ht/~sircmpwn/builds.sr.ht/tree/images/alpine/genimg

Running this as root on an Alpine machine will produce a bootable qcow2
you can feed into qemu to reproduce my problem:

	qemu-system-x86_64 \
		-m 2048 \
		-net nic,model=virtio -net user,hostfwd=tcp::8022-:22 \
		-cpu host \
		-enable-kvm \
		-nographic \
		-drive file="root.img.qcow2",media=disk,snapshot=on,if=virtio

You can then SSH in with `ssh -p 8022 builds@localhost`, with no
password. This user is in the sudoers file. You should then be able to
`curl http://example.org` to see that it can communicate fine with the
outside world. However, when you run `curl https://example.org`, it will
simply hang. It's not a problem specific to curl, as it can also be
reproduced with `openssl s_client example.org:443`.

Here's what makes it really weird: the problem goes away if you `apk del
alpine-sdk && apk add alpine-sdk`. I took one Alpine image on which the
problem was reproducible, and another after reinstalling alpine-sdk, and
diffed the filesystems. The only difference I saw was /etc/apk/world,
which was shuffled beyond the capability of my diff tool. If no one has
ideas, I'm going to try writing some scripts to make the differences
between these files more apparent.

I build these images nightly. The problem first started appearing
sometime between 2018-09-06 20:36 UTC and 2018-09-07 20:36 UTC. I looked
over the commits to aports during that time (and a few days on either
end just to be sure), and found no leads. I also sorted
git.alpinelinux.org by date modified and looked over the same dates in
other Alpine repos, and left similarly empty-handed.

Does anyone have any ideas?

--
Drew DeVault


---
Unsubscribe:  alpine-devel+unsubscribe@lists.alpinelinux.org
Help:         alpine-devel+help@lists.alpinelinux.org
---
Message ID: <20180917114636.GA2008@homura.localdomain>
In-Reply-To: <CAFWK1CACRC+-sGyA=-AMQxWDKr1qpLbhCBtzHHNjVUnDhRUzrg@mail.gmail.com>
Sender timestamp: 1537184796
DKIM signature: missing
> It sounds like /dev/random runs out of entropy in your vm.
>
> Does it help to add `-device virtio-rng-pci`?

I've actually had -device virtio-rng-pci being passed to qemu the whole
time, but I omitted it for brevity in the original email. There might be
something wrong with it, though - I'll try installing haveged and
monitoring /proc/sys/kernel/random/entropy_avail and follow up.
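
A minimal sketch of the kind of monitoring described here, assuming a POSIX
shell in the guest. The function names, the sample count, and the interval
are hypothetical; the file path is a parameter only so the helpers can be
pointed at a test file.

```shell
#!/bin/sh
# Read the kernel's current entropy estimate.
# Defaults to the real procfs file; another path can be passed for testing.
read_entropy() {
	cat "${1:-/proc/sys/kernel/random/entropy_avail}"
}

# Print "HH:MM:SS value" samples.
# Arguments (all optional, hypothetical defaults): count, interval in seconds, file.
watch_entropy() {
	count="${1:-5}"
	interval="${2:-1}"
	file="${3:-/proc/sys/kernel/random/entropy_avail}"
	i=0
	while [ "$i" -lt "$count" ]; do
		printf '%s %s\n' "$(date +%H:%M:%S)" "$(read_entropy "$file")"
		i=$((i + 1))
		# Sleep between samples, but not after the last one.
		if [ "$i" -lt "$count" ]; then
			sleep "$interval"
		fi
	done
}
```

Running `watch_entropy 30 1` in one SSH session while retrying the hanging
`curl https://example.org` in another should show whether the estimate
stays near zero during the hang.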


---
Message ID: <20180917120813.GA1297@homura.localdomain>
In-Reply-To: <20180917114636.GA2008@homura.localdomain>
Sender timestamp: 1537186093
DKIM signature: missing
Okay, so RNG is indeed the issue. The virtio-rng module isn't getting
loaded on boot. Running `modprobe virtio-rng` or installing haveged solves
the problem. Not sure why virtio-rng isn't getting loaded on boot, but now
that the fire's out I can solve that problem at my own pace.
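
A rough sketch of how the module state could be checked and the workaround
made persistent. The helper names are hypothetical, and it assumes that
virtio-rng is built as a loadable module (a built-in driver would not show
up in /proc/modules) and that /etc/modules is read at boot, as Alpine's
OpenRC `modules` service does.

```shell
#!/bin/sh
# Succeed if the named module appears in a /proc/modules-style listing.
# Loaded modules are listed with underscores, so dashes are normalized first.
module_loaded() {
	name="$(printf '%s' "$1" | sed 's/-/_/g')"
	grep -q "^${name} " "${2:-/proc/modules}"
}

# Append the module name to a boot-time module list unless it is already there.
# Assumption: the list is /etc/modules, one module name per line.
persist_module() {
	file="${2:-/etc/modules}"
	grep -qx "$1" "$file" 2>/dev/null || printf '%s\n' "$1" >> "$file"
}

# Usage (as root, not run here):
#   module_loaded virtio-rng || modprobe virtio-rng
#   persist_module virtio-rng
```

This only works around the missing autoload; it does not explain why the
module stopped being loaded in the first place.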

Thanks for the help!


---
Natanael Copa
Message ID: <20180917103238.07f063d1@ncopa-desktop.copa.dup.pw>
In-Reply-To: <20180916235803.GA5606@homura.localdomain>
Sender timestamp: 1537173158
DKIM signature: missing
Hi!

It sounds like /dev/random runs out of entropy in your vm.

Does it help to add `-device virtio-rng-pci`?

https://wiki.qemu.org/Features/VirtIORNG
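
As a sketch of what that could look like on the command line: the helper
below only prints the extra arguments. The `rng-random` backend, the `rng0`
id, and /dev/urandom as the host source are assumptions for illustration;
plain `-device virtio-rng-pci` is the minimal form.

```shell
#!/bin/sh
# Print the extra QEMU arguments that attach a virtio RNG device to the guest.
# Assumption: the host entropy source defaults to /dev/urandom;
# pass another path as the first argument to override it.
rng_args() {
	src="${1:-/dev/urandom}"
	printf '%s\n' \
		"-object" "rng-random,id=rng0,filename=$src" \
		"-device" "virtio-rng-pci,rng=rng0"
}

# Usage (not run here): add the output to the other options from the report.
#   qemu-system-x86_64 -m 2048 -cpu host -enable-kvm -nographic $(rng_args)
```

The guest still needs the virtio-rng driver loaded for the device to feed
its entropy pool.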

-nc

---
Daniel Isaksen
Message ID: <CAFWK1CACRC+-sGyA=-AMQxWDKr1qpLbhCBtzHHNjVUnDhRUzrg@mail.gmail.com>
In-Reply-To: <20180917103238.07f063d1@ncopa-desktop.copa.dup.pw>
Sender timestamp: 1537173663
DKIM signature: missing
Also consider installing haveged; it's a tiny daemon that generates entropy
for the system.

I believe the kernel also draws entropy from both HID and hardware (in this
case, emulated) RNG devices, as ncopa says.

To check the system entropy (< ~200 is bad, > ~1000 is good), run
`cat /proc/sys/kernel/random/entropy_avail`.
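
A small sketch that applies those rough thresholds to a reading. The
function name and the "marginal" label for the middle range are
hypothetical; only the ~200 and ~1000 cutoffs come from the rule of thumb
above.

```shell
#!/bin/sh
# Classify an entropy_avail reading:
# below 200 is "low", above 1000 is "ok", anything in between is "marginal".
classify_entropy() {
	if [ "$1" -lt 200 ]; then
		echo low
	elif [ "$1" -gt 1000 ]; then
		echo ok
	else
		echo marginal
	fi
}

# Usage: classify_entropy "$(cat /proc/sys/kernel/random/entropy_avail)"
```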

-----
Sincerely / Med vennlig hilsen,
Daniel Isaksen <d@duniel.no> (https://duniel.no)
