[alpine-user] Kernel crashes on [edge] (1st VM kill in 606 days!)

Steffen Nurpmeso <steffen@sdaoden.eu>
Message ID: <20181228140158.FsqDX%steffen@sdaoden.eu>
Sender timestamp: 1546005718

Hello!

I am seeing kernel crashes with [edge] and 4.19.12 on my server
VM.  Last night I had two hours of downtime; luckily I was working
and noticed it when I tried to do my final mail sync.
Fortunately the IPSuite web interface exists, but it is very
unresponsive, hard to use, and sometimes freezes.  (Locally,
Chromium was also producing masses of SEGVs (signal 11
SEGV_MAPERR) and other errors behind the scenes, which did not
help.)  So I cannot say much about the cause.  I tried a normal
reboot (openrc: sshd "aborted", dnsmasq process "not killable",
lots of red messages), which ended in a busy loop on both
CPUs: I had to "kill" the VM for the first time in 606 days.
I had no klog running either.  My bad.

Now I am back, happy to see the server is alive.  However, there
was an "automatic" reboot (however that happened!), then the kill, and
now we have in /var/crit

  Feb 10 17:24:35 kernel: [    7.868094] grsec: time set by /usr/sbin/ntpd[ntpd:2340] uid/euid:0/0 gid/egid:0/0, parent /bin/busybox[init:1] uid/euid:0/0 gid/egid:0/0
  Dec 28 03:17:35 kernel: [    0.695542] efi: EFI_MEMMAP is not enabled.
forced restart
  Dec 28 12:15:00 kernel: [33760.768098] BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
  Dec 28 12:34:29 kernel: [    0.634793] efi: EFI_MEMMAP is not enabled.

in messages

  Dec 28 12:15:00 crond[2109]: USER root pid 3906 cmd run-parts /etc/periodic/12hourly
  Dec 28 12:15:00 kernel: [33760.774551] PGD 0 P4D 0
  Dec 28 12:30:00 crond[2109]: USER root pid 3967 cmd run-parts /etc/periodic/15min
  Dec 28 12:34:27 syslogd started: BusyBox v1.29.3
  Dec 28 12:34:27 crond[2105]: crond (busybox 1.29.3) started, log level 8
  Dec 28 12:34:29 kernel: klogd started: BusyBox v1.29.3 (2018-12-06 15:01:53 UTC)

and we do have a _lot_ of messages like

  Dec 28 12:31:00 kernel: [34719.914043] list_add corruption. prev->next should be next (ffff9273faffe2c0), but was ffff9273faffea40. (prev=ffff9273f75a52c0).
  Dec 28 12:31:00 kernel: [34719.919121] WARNING: CPU: 1 PID: 0 at lib/list_debug.c:28 __list_add_valid+0x3c/0x67
  Dec 28 12:31:00 kernel: [34719.921763] Modules linked in: sch_sfq sch_htb nf_log_ipv4 nf_log_common xt_LOG xt_limit ipt_REJECT nf_reject_ipv4 xt_tcpudp xt_recent xt_conn>
  Dec 28 12:31:00 kernel: [34719.932160]  scsi_mod
  Dec 28 12:31:00 kernel: [34719.933585] CPU: 1 PID: 0 Comm: swapper/1 Tainted: G      D W         4.19.12-0-vanilla #1-Alpine
  Dec 28 12:31:00 kernel: [34719.935116] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
  Dec 28 12:31:00 kernel: [34719.936620] RIP: 0010:__list_add_valid+0x3c/0x67
  Dec 28 12:31:00 kernel: [34719.938155] Code: 48 e6 ba 96 48 89 c2 e8 94 f5 dd ff 0f 0b eb 1c 48 8b 10 4c 39 c2 74 17 48 89 c1 4c 89 c6 48 c7 c7 a9 e6 ba 96 e8 76 f5 dd f>
  Dec 28 12:31:00 kernel: [34719.941440] RSP: 0018:ffff9273fdb039d8 EFLAGS: 00010286
  Dec 28 12:31:00 kernel: [34719.943069] RAX: 0000000000000000 RBX: ffff9273faffe2b8 RCX: 000000000000083f
  Dec 28 12:31:00 kernel: [34719.944680] RDX: 0000000000000001 RSI: 00000000000000f6 RDI: 000000000000083f
  Dec 28 12:31:00 kernel: [34719.946300] RBP: ffff9273f75a52c0 R08: 0000000000000000 R09: 0000000000000000
  Dec 28 12:31:00 kernel: [34719.947930] R10: ffff9d950070c418 R11: 0000000000000008 R12: ffff9273faffe2c0
  Dec 28 12:31:00 kernel: [34719.949532] R13: ffff9273fce8df20 R14: ffff9273fb190000 R15: ffffffff96ea3e40
  Dec 28 12:31:00 kernel: [34719.951149] FS:  0000000000000000(0000) GS:ffff9273fdb00000(0000) knlGS:0000000000000000
  Dec 28 12:31:00 kernel: [34719.952780] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  Dec 28 12:31:00 kernel: [34719.954453] CR2: 00005605f1f49228 CR3: 000000007cdca000 CR4: 00000000000006a0
  Dec 28 12:31:00 kernel: [34719.956122] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  Dec 28 12:31:00 kernel: [34719.957774] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  Dec 28 12:31:00 kernel: [34719.959467] Call Trace:
  Dec 28 12:31:00 kernel: [34719.961133]  <IRQ>
  Dec 28 12:31:00 kernel: [34719.962801]  nf_conncount_add+0xbb/0xe3 [nf_conncount]
  Dec 28 12:31:00 kernel: [34719.964475]  nf_conncount_count+0x194/0x3c7 [nf_conncount]
  Dec 28 12:31:00 kernel: [34719.966156]  ? __kmalloc+0x13e/0x1b4
  Dec 28 12:31:00 kernel: [34719.967802]  connlimit_mt+0x122/0x160 [xt_connlimit]
  Dec 28 12:31:00 kernel: [34719.969421]  ipt_do_table+0x27c/0x5e0 [ip_tables]
  Dec 28 12:31:00 kernel: [34719.971028]  nf_hook_slow+0x3c/0x9b
  Dec 28 12:31:00 kernel: [34719.972629]  ip_local_deliver+0xac/0xda
  Dec 28 12:31:00 kernel: [34719.974223]  ? inet_del_offload+0x43/0x43
  Dec 28 12:31:00 kernel: [34719.975814]  ip_rcv+0xa3/0xc1
  Dec 28 12:31:00 kernel: [34719.977386]  ? ip_rcv_finish_core.isra.0+0x2eb/0x2eb
  Dec 28 12:31:00 kernel: [34719.978983]  __netif_receive_skb_one_core+0x52/0x6e
  Dec 28 12:31:00 kernel: [34719.980564]  netif_receive_skb_internal+0xb0/0xcf
  Dec 28 12:31:00 kernel: [34719.982140]  napi_gro_receive+0x8a/0xc0
  Dec 28 12:31:00 kernel: [34719.983754]  receive_buf+0xd2f/0xd53 [virtio_net]
  Dec 28 12:31:00 kernel: [34719.985357]  ? vring_unmap_one+0x1a/0x67 [virtio_ring]
  Dec 28 12:31:00 kernel: [34719.986964]  ? detach_buf+0x5f/0xf9 [virtio_ring]
  Dec 28 12:31:00 kernel: [34719.988562]  virtnet_poll+0x121/0x269 [virtio_net]
  Dec 28 12:31:00 kernel: [34719.990167]  net_rx_action+0x157/0x339
  Dec 28 12:31:00 kernel: [34719.991772]  __do_softirq+0x11c/0x284
  Dec 28 12:31:00 kernel: [34719.993344]  ? sched_clock+0x5/0x8
  Dec 28 12:31:00 kernel: [34719.994868]  irq_exit+0x71/0xb0
  Dec 28 12:31:00 kernel: [34719.996337]  do_IRQ+0xae/0xcc
  Dec 28 12:31:00 kernel: [34719.997757]  common_interrupt+0xf/0xf
  Dec 28 12:31:00 kernel: [34719.999141]  </IRQ>
  Dec 28 12:31:00 kernel: [34720.000457] RIP: 0010:native_safe_halt+0x2/0x3
  Dec 28 12:31:00 kernel: [34720.001744] Code: c9 65 48 8b 04 25 00 5c 01 00 f0 80 60 02 df f0 83 44 24 fc 00 48 8b 00 a8 08 74 0b 65 81 25 1d 80 b2 69 ff ff ff 7f c3 fb f>
  Dec 28 12:31:00 kernel: [34720.004381] RSP: 0018:ffff9d9500387eb0 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffda
  Dec 28 12:31:00 kernel: [34720.005644] RAX: ffffffff964ed820 RBX: 0000000000000000 RCX: ffffffff96e4ee90
  Dec 28 12:31:00 kernel: [34720.006983] RDX: 0000000003cc63a6 RSI: 0000000000000087 RDI: 0000000000000087
  Dec 28 12:31:00 kernel: [34720.008241] RBP: 0000000000000000 R08: 00000000e8bac711 R09: 00000000000000e2
  Dec 28 12:31:00 kernel: [34720.009470] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000001
  Dec 28 12:31:00 kernel: [34720.010677] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
  Dec 28 12:31:00 kernel: [34720.011830]  ? __sched_text_end+0x3/0x3
  Dec 28 12:31:00 kernel: [34720.012948]  default_idle+0x91/0x11d
  Dec 28 12:31:00 kernel: [34720.014040]  do_idle+0xf5/0x224
  Dec 28 12:31:00 kernel: [34720.015142]  cpu_startup_entry+0x6f/0x71
  Dec 28 12:31:00 kernel: [34720.016219]  start_secondary+0x197/0x1b2
  Dec 28 12:31:00 kernel: [34720.017293]  secondary_startup_64+0xa4/0xb0
  Dec 28 12:31:00 kernel: [34720.018364] ---[ end trace e43225a2be2ca9b8 ]---
  Dec 28 12:34:29 kernel: [   13.984661] xt_connbytes: Forcing CT accounting to be enabled

in warn and

  Dec 28 12:15:00 kernel: [33760.775634] Oops: 0000 [#1] SMP PTI
  Dec 28 12:15:00 kernel: [33760.776726] CPU: 0 PID: 3934 Comm: iptables Not tainted 4.19.12-0-vanilla #1-Alpine
  Dec 28 12:15:00 kernel: [33760.777864] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
  Dec 28 12:15:00 kernel: [33760.779111] RIP: 0010:nf_conncount_cache_free+0x26/0x2f [nf_conncount]
  Dec 28 12:15:00 kernel: [33760.780345] Code: 2d 0e ae d5 66 66 66 66 90 55 53 48 8b 77 08 48 8d 5f 08 48 8b 2e 48 39 de 74 15 48 8b 3d 05 20 00 00 e8 0a 0e ae d5 48 89 e>
  Dec 28 12:15:00 kernel: [33760.782357] RSP: 0018:ffff9d9501c87d28 EFLAGS: 00010202
  Dec 28 12:15:00 kernel: [33760.783297] RAX: ffff9273f75a5101 RBX: ffff9273fb7344a0 RCX: 00000000802e0024
  Dec 28 12:15:00 kernel: [33760.784255] RDX: 00000000802e0025 RSI: 0000000000000000 RDI: ffff9273fd417080
  Dec 28 12:15:00 kernel: [33760.785229] RBP: 0000000000000000 R08: 0000000000000001 R09: ffff9273f7760ae8
  Dec 28 12:15:00 kernel: [33760.786173] R10: 0000000000000401 R11: ffff9d9500374001 R12: ffff9273fb734480
  Dec 28 12:15:00 kernel: [33760.787113] R13: ffff9273fd4a7808 R14: ffffffff96ea3e40 R15: ffff9273fb17a118
  Dec 28 12:15:00 kernel: [33760.788077] FS:  00007f7ebb421b88(0000) GS:ffff9273fda00000(0000) knlGS:0000000000000000
  Dec 28 12:15:00 kernel: [33760.789085] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  Dec 28 12:15:00 kernel: [33760.790111] CR2: 0000000000000000 CR3: 0000000035e38000 CR4: 00000000000006b0
  Dec 28 12:15:00 kernel: [33760.791163] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  Dec 28 12:15:00 kernel: [33760.792212] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  Dec 28 12:15:00 kernel: [33760.793279] Call Trace:
  Dec 28 12:15:00 kernel: [33760.794369]  nf_conncount_destroy+0x5a/0x82 [nf_conncount]
  Dec 28 12:15:00 kernel: [33760.795476]  cleanup_match+0x45/0x6d [ip_tables]
  Dec 28 12:15:00 kernel: [33760.796553]  cleanup_entry+0x3e/0xa8 [ip_tables]
  Dec 28 12:15:00 kernel: [33760.797630]  __do_replace+0x171/0x203 [ip_tables]
  Dec 28 12:15:00 kernel: [33760.798706]  do_ipt_set_ctl+0x133/0x195 [ip_tables]
  Dec 28 12:15:00 kernel: [33760.799821]  nf_setsockopt+0x4b/0x64
  Dec 28 12:15:00 kernel: [33760.800937]  __sys_setsockopt+0x8b/0xc1
  Dec 28 12:15:00 kernel: [33760.802054]  __x64_sys_setsockopt+0x20/0x23
  Dec 28 12:15:00 kernel: [33760.803184]  do_syscall_64+0x55/0xe4
  Dec 28 12:15:00 kernel: [33760.804318]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
  Dec 28 12:15:00 kernel: [33760.805483] RIP: 0033:0x7f7ebb3cc3a7
  Dec 28 12:15:00 kernel: [33760.806619] Code: 48 89 c7 e9 0b 35 fe ff c3 c3 31 c0 c3 48 83 ec 08 49 89 ca 48 63 d2 48 63 f6 48 63 ff 45 89 c0 45 31 c9 b8 36 00 00 00 0f 0>
  Dec 28 12:15:00 kernel: [33760.809059] RSP: 002b:00007ffc00b970a0 EFLAGS: 00000246 ORIG_RAX: 0000000000000036
  Dec 28 12:15:00 kernel: [33760.810302] RAX: ffffffffffffffda RBX: 000055b5e912d380 RCX: 00007f7ebb3cc3a7
  Dec 28 12:15:00 kernel: [33760.811571] RDX: 0000000000000040 RSI: 0000000000000000 RDI: 0000000000000004
  Dec 28 12:15:00 kernel: [33760.812872] RBP: 000055b5e9134fc0 R08: 00000000000073d0 R09: 0000000000000000
  Dec 28 12:15:00 kernel: [33760.814156] R10: 000055b5e912d380 R11: 0000000000000246 R12: 00007f7ebb38e7a0
  Dec 28 12:15:00 kernel: [33760.815450] R13: 000055b5e91346a0 R14: 00007f7ebb38e7a8 R15: 0000000000000082
  Dec 28 12:15:00 kernel: [33760.816791] Modules linked in: sch_sfq sch_htb nf_log_ipv4 nf_log_common xt_LOG xt_limit ipt_REJECT nf_reject_ipv4 xt_tcpudp xt_recent xt_conn>
  Dec 28 12:15:00 kernel: [33760.827676]  scsi_mod
  Dec 28 12:15:00 kernel: [33760.829326] CR2: 0000000000000000
  Dec 28 12:15:00 kernel: [33760.830963] ---[ end trace e43225a2be2ca9b3 ]---
  Dec 28 12:15:00 kernel: [33760.832576] RIP: 0010:nf_conncount_cache_free+0x26/0x2f [nf_conncount]
  Dec 28 12:15:00 kernel: [33760.834198] Code: 2d 0e ae d5 66 66 66 66 90 55 53 48 8b 77 08 48 8d 5f 08 48 8b 2e 48 39 de 74 15 48 8b 3d 05 20 00 00 e8 0a 0e ae d5 48 89 e>
  Dec 28 12:15:00 kernel: [33760.837514] RSP: 0018:ffff9d9501c87d28 EFLAGS: 00010202
  Dec 28 12:15:00 kernel: [33760.839162] RAX: ffff9273f75a5101 RBX: ffff9273fb7344a0 RCX: 00000000802e0024
  Dec 28 12:15:00 kernel: [33760.840782] RDX: 00000000802e0025 RSI: 0000000000000000 RDI: ffff9273fd417080
  Dec 28 12:15:00 kernel: [33760.842395] RBP: 0000000000000000 R08: 0000000000000001 R09: ffff9273f7760ae8
  Dec 28 12:15:00 kernel: [33760.843945] R10: 0000000000000401 R11: ffff9d9500374001 R12: ffff9273fb734480
  Dec 28 12:15:00 kernel: [33760.845519] R13: ffff9273fd4a7808 R14: ffffffff96ea3e40 R15: ffff9273fb17a118
  Dec 28 12:15:00 kernel: [33760.847049] FS:  00007f7ebb421b88(0000) GS:ffff9273fda00000(0000) knlGS:0000000000000000
  Dec 28 12:15:00 kernel: [33760.848631] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  Dec 28 12:15:00 kernel: [33760.850205] CR2: 0000000000000000 CR3: 0000000035e38000 CR4: 00000000000006b0
  Dec 28 12:15:00 kernel: [33760.851794] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  Dec 28 12:15:00 kernel: [33760.853400] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  Dec 28 12:20:48 kernel: [34107.728471] ------------[ cut here ]------------

in the older warn.0.
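
To get an overview of which crash markers and faulting kernel symbols show up in log files like crit, warn and messages, something along these lines might help. This is only a minimal sketch: the marker list and the regular expression are assumptions derived from the excerpts above, and `summarize` is a made-up helper name, not an existing tool.

```python
import re
from collections import Counter

# Markers taken from the log excerpts in this thread; extend as needed.
MARKERS = ("BUG:", "Oops:", "WARNING:", "list_add corruption")

# Kernel-space RIP lines look like "RIP: 0010:symbol+0xOFF/0xLEN [module]".
# User-space RIP lines (segment 0033) carry no symbol and are skipped.
RIP_RE = re.compile(
    r"RIP: 0010:([A-Za-z0-9_.]+)\+0x[0-9a-f]+/0x[0-9a-f]+(?: \[(\w+)\])?"
)


def summarize(lines):
    """Count crash markers and faulting (symbol, module) pairs in syslog lines."""
    markers = Counter()
    symbols = Counter()
    for line in lines:
        for marker in MARKERS:
            if marker in line:
                markers[marker] += 1
        match = RIP_RE.search(line)
        if match:
            symbols[(match.group(1), match.group(2))] += 1
    return markers, symbols


# Sample lines copied from the excerpts above.
SAMPLE = [
    "Dec 28 12:15:00 kernel: [33760.768098] BUG: unable to handle kernel "
    "NULL pointer dereference at 0000000000000000",
    "Dec 28 12:15:00 kernel: [33760.779111] RIP: 0010:nf_conncount_cache_free"
    "+0x26/0x2f [nf_conncount]",
    "Dec 28 12:31:00 kernel: [34719.919121] WARNING: CPU: 1 PID: 0 at "
    "lib/list_debug.c:28 __list_add_valid+0x3c/0x67",
    "Dec 28 12:31:00 kernel: [34719.936620] RIP: 0010:__list_add_valid+0x3c/0x67",
    "Dec 28 12:15:00 kernel: [33760.805483] RIP: 0033:0x7f7ebb3cc3a7",
]

sample_markers, sample_symbols = summarize(SAMPLE)
for name, count in sample_markers.items():
    print(f"{count:4d}  {name}")
for (symbol, module), count in sample_symbols.items():
    print(f"{count:4d}  {symbol} [{module or 'core kernel'}]")
```

For a real log, `summarize(open(path, errors="replace"))` works the same way, since the function accepts any iterable of lines.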

--steffen
|
|Der Kragenbaer,                The moon bear,
|der holt sich munter           he cheerfully and one by one
|einen nach dem anderen runter  wa.ks himself off
|(By Robert Gernhardt)


---
Unsubscribe:  alpine-user+unsubscribe@lists.alpinelinux.org
Help:         alpine-user+help@lists.alpinelinux.org
---
Steffen Nurpmeso <steffen@sdaoden.eu>
Message ID: <20181231182229.t_LWK%steffen@sdaoden.eu>
In-Reply-To: <20181228140158.FsqDX%steffen@sdaoden.eu>
Sender timestamp: 1546280549

Steffen Nurpmeso wrote in <20181228140158.FsqDX%steffen@sdaoden.eu>:
 |I am seeing kernel crashes with [edge] and 4.19.12 on my server
 ...
 |and we do have a _lot_ of messages like
 |
 |  Dec 28 12:31:00 kernel: [34719.914043] list_add corruption. prev->next \
 |  should be next (ffff9273faffe2c0), but was ffff9273faffea40. (prev=ffff9\
 |  273f75a52c0).
 |  Dec 28 12:31:00 kernel: [34719.919121] WARNING: CPU: 1 PID: 0 at \
 |  lib/list_debug.c:28 __list_add_valid+0x3c/0x67
 |  Dec 28 12:31:00 kernel: [34719.921763] Modules linked in: sch_sfq \
 |  sch_htb nf_log_ipv4 nf_log_common xt_LOG xt_limit ipt_REJECT nf_reject_i\
 |  pv4 xt_tcpudp xt_recent xt_conn>
 ...
 |  Dec 28 12:15:00 kernel: [33760.775634] Oops: 0000 [#1] SMP PTI
 |  Dec 28 12:15:00 kernel: [33760.776726] CPU: 0 PID: 3934 Comm: iptables \
 |  Not tainted 4.19.12-0-vanilla #1-Alpine
 |  Dec 28 12:15:00 kernel: [33760.777864] Hardware name: QEMU Standard \
 |  PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
 |  Dec 28 12:15:00 kernel: [33760.779111] RIP: 0010:nf_conncount_cache_free\
 |  +0x26/0x2f [nf_conncount]
  ...

Out of interest, after seeing the 4.19.13 announcement on Saturday
(I think) I looked into it, and since it did not seem to mention
anything regarding xt_conntrack, I then looked into the iptables
git.  Indeed, about two weeks ago there were some commits (on the
master branch) that could fit the list corruption.  Maybe the fix
comes with 4.19.14, what do you think?  I have reverted the machine
in the meantime (luckily I always keep c(urrent) and o(ld) kernels
and modules, so that is easy); 4.14.89 works absolutely flawlessly.

--steffen

---
Christian Kujau <lists@nerdbynature.de>
Message ID: <alpine.DEB.2.21.999.1901111650590.1841@trent.utfs.org>
In-Reply-To: <20190111233244.B2bX-%steffen@sdaoden.eu>
Sender timestamp: 1547255233

On Sat, 12 Jan 2019, Steffen Nurpmeso wrote:
>   Dec 29 00:15:01 kernel: [23338.689515] BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
>   Jan 12 00:15:00 kernel: [36690.017115] BUG: unable to handle kernel NULL pointer dereference at 0000000000000000

In the other mails you cited:

  list_add corruption. prev->next should be next (ffff9273faffe2c0), but was ffff9273faffea40. (prev=ffff9273f75a52c0). 

This particular message has been reported in numerous places[0][1][2], but
for older kernels. Some reports[0] suggest disabling huge pages - maybe
try that?
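
If "huge pages" here means transparent huge pages (THP), which is an assumption since the reports may refer to something else, the current mode can be read from sysfs. A minimal sketch; the helper names are made up for illustration:

```python
from pathlib import Path

# Standard sysfs location of the transparent huge page (THP) mode switch.
THP_ENABLED = Path("/sys/kernel/mm/transparent_hugepage/enabled")


def active_thp_mode(text):
    """Return the bracketed (active) mode from e.g. 'always [madvise] never'."""
    for word in text.split():
        if word.startswith("[") and word.endswith("]"):
            return word[1:-1]
    return None


def read_thp_mode(path=THP_ENABLED):
    """Read the active THP mode, or None if the file is missing or unreadable."""
    try:
        return active_thp_mode(path.read_text())
    except OSError:
        return None


print("active THP mode:", read_thp_mode())
```

To switch THP off until the next reboot, root can write `never` to the same file; the `transparent_hugepage=never` kernel command line parameter makes that persistent.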

> I have reverted to 4.14.89.  I have never done this, but i think
> this should be reported to Linux kernel list, then?  pffffff...

Definitely worth a shot, IMHO. Be sure to CC the netdev list for the 
netfilter messages, although these may be just a red herring.

Good luck,
C.

[0] https://support.hpe.com/hpsc/doc/public/display?docId=mmr_kc-0131607
[1] https://bugzilla.redhat.com/show_bug.cgi?id=1507173
[2] https://bugs.openvz.org/browse/OVZ-5620
-- 
BOFH excuse #40:

not enough memory, go get system upgrade


---
Steffen Nurpmeso <steffen@sdaoden.eu>
Message ID: <20190111233244.B2bX-%steffen@sdaoden.eu>
In-Reply-To: <20181231182229.t_LWK%steffen@sdaoden.eu>
Sender timestamp: 1547249564

Hello.

Steffen Nurpmeso wrote in <20181231182229.t_LWK%steffen@sdaoden.eu>:
 |Steffen Nurpmeso wrote in <20181228140158.FsqDX%steffen@sdaoden.eu>:
 ||I am seeing kernel crashes with [edge] and 4.19.12 on my server
  ...
 |Out of interest after seeing the 4.19.13 announcement on Saturday
 ...
 |what do you think?  I have reverted the machine in the meantime
 |(luckily i always have c. urrent and o. ld kernels and modules, so
 |that is easy), 4.14.89 works absolutely neatless.

I have updated to 4.19.14, and the issue still exists on my server
VM:

crit:
  Dec 29 00:15:01 kernel: [23338.689515] BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
  Jan 12 00:15:00 kernel: [36690.017115] BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
warn:
  Jan 12 00:15:00 kernel: [36690.023028] Oops: 0000 [#1] SMP PTI
  Jan 12 00:15:00 kernel: [36690.024368] CPU: 0 PID: 3708 Comm: iptables Not tainted 4.19.14-0-vanilla #1-Alpine
  Jan 12 00:15:00 kernel: [36690.025679] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
  Jan 12 00:15:00 kernel: [36690.027056] RIP: 0010:nf_conncount_cache_free+0x26/0p
messages:
  Jan 12 00:15:00 crond[2046]: USER root pid 3677 cmd run-parts /etc/periodic/12hourly
  Jan 12 00:15:00 kernel: [36690.021645] PGD 0 P4D 0

That periodic script outputs sort(1)ed entries from xt_recent, and
shows the state of the firewall.
I have reverted to 4.14.89.  I have never done this, but I think
this should be reported to the Linux kernel list, then?  pffffff...
Ciao and a nice weekend.
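
The periodic script itself is not shown in this thread. Purely as an illustration of what "sorted entries from xt_recent" can look like, here is a minimal sketch. It assumes the usual /proc/net/xt_recent/<list> line layout beginning with `src=<address>`; the sample entries use documentation addresses and made-up values, and `sort_recent` is a hypothetical helper.

```python
import ipaddress
import re

# Each entry in /proc/net/xt_recent/<list> starts with "src=<address>".
SRC_RE = re.compile(r"^src=(\S+)")


def sort_recent(lines):
    """Return xt_recent entries sorted numerically by source address."""
    entries = []
    for line in lines:
        text = line.strip()
        match = SRC_RE.match(text)
        if not match:
            continue  # skip blank or unexpected lines
        try:
            addr = ipaddress.ip_address(match.group(1))
        except ValueError:
            continue  # skip entries whose address does not parse
        # Sort IPv4 before IPv6, then by numeric address value.
        entries.append(((addr.version, int(addr)), text))
    return [text for _, text in sorted(entries)]


# Made-up sample entries (addresses from the 192.0.2.0/24 documentation range).
SAMPLE = [
    "src=192.0.2.10 ttl: 64 last_seen: 4295000100 oldest_pkt: 1 4295000100",
    "src=192.0.2.9 ttl: 57 last_seen: 4295000200 oldest_pkt: 1 4295000200",
]

for entry in sort_recent(SAMPLE):
    print(entry)
```

Unlike a plain lexical sort, this orders 192.0.2.9 before 192.0.2.10. On a live system the input would come from a file such as /proc/net/xt_recent/DEFAULT, where DEFAULT is the list name used when a rule does not pass --name.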

--steffen

---
Natanael Copa <ncopa@alpinelinux.org>
Message ID: <20190112125421.2330ddc9@ncopa-desktop.copa.dup.pw>
In-Reply-To: <20190111233244.B2bX-%steffen@sdaoden.eu>
Sender timestamp: 1547294061

On Sat, 12 Jan 2019 00:32:44 +0100
Steffen Nurpmeso <steffen@sdaoden.eu> wrote:

> Hello.
> 
> Steffen Nurpmeso wrote in <20181231182229.t_LWK%steffen@sdaoden.eu>:
>  |Steffen Nurpmeso wrote in <20181228140158.FsqDX%steffen@sdaoden.eu>:
>  ||I am seeing kernel crashes with [edge] and 4.19.12 on my server
>   ...
>  |Out of interest after seeing the 4.19.13 announcement on Saturday
>  ...
>  |what do you think?  I have reverted the machine in the meantime
>  |(luckily i always have c. urrent and o. ld kernels and modules, so
>  |that is easy), 4.14.89 works absolutely neatless.
> 
> I have updated to 4.19.14, and the issue still exists on my server
> VM:
> 
> crit:
>   Dec 29 00:15:01 kernel: [23338.689515] BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
>   Jan 12 00:15:00 kernel: [36690.017115] BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
> warn:
>   Jan 12 00:15:00 kernel: [36690.023028] Oops: 0000 [#1] SMP PTI
>   Jan 12 00:15:00 kernel: [36690.024368] CPU: 0 PID: 3708 Comm: iptables Not tainted 4.19.14-0-vanilla #1-Alpine
>   Jan 12 00:15:00 kernel: [36690.025679] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
>   Jan 12 00:15:00 kernel: [36690.027056] RIP: 0010:nf_conncount_cache_free+0x26/0p
> messages:
>   Jan 12 00:15:00 crond[2046]: USER root pid 3677 cmd run-parts /etc/periodic/12hourly
>   Jan 12 00:15:00 kernel: [36690.021645] PGD 0 P4D 0
> 
> That periodic script outputs sort(1)ed entries from xt_recent, and
> shows the state of the firewall.
> I have reverted to 4.14.89.  I have never done this, but i think
> this should be reported to Linux kernel list, then?  pffffff...
> Ciao and a nice weekend.

Please report this upstream. https://bugzilla.kernel.org/

-nc


---
Steffen Nurpmeso <steffen@sdaoden.eu>
Message ID: <20190112134852.cMiIO%steffen@sdaoden.eu>
In-Reply-To: <20190112125421.2330ddc9@ncopa-desktop.copa.dup.pw>
Sender timestamp: 1547300932

Natanael Copa wrote in <20190112125421.2330ddc9@ncopa-desktop.copa.dup.pw>:
 |On Sat, 12 Jan 2019 00:32:44 +0100
 |Steffen Nurpmeso <steffen@sdaoden.eu> wrote:
 |> Steffen Nurpmeso wrote in <20181231182229.t_LWK%steffen@sdaoden.eu>:
 |>|Steffen Nurpmeso wrote in <20181228140158.FsqDX%steffen@sdaoden.eu>:
 |>||I am seeing kernel crashes with [edge] and 4.19.12 on my server
 |>   ...
 |>|Out of interest after seeing the 4.19.13 announcement on Saturday
 ...
 |> I have updated to 4.19.14, and the issue still exists on my server
 |> VM:
 |> 
 |> crit:
 |>   Dec 29 00:15:01 kernel: [23338.689515] BUG: unable to handle kernel \
 |>   NULL pointer dereference at 0000000000000000
 |>   Jan 12 00:15:00 kernel: [36690.017115] BUG: unable to handle kernel \
 |>   NULL pointer dereference at 0000000000000000
 |> warn:
 |>   Jan 12 00:15:00 kernel: [36690.023028] Oops: 0000 [#1] SMP PTI
 |>   Jan 12 00:15:00 kernel: [36690.024368] CPU: 0 PID: 3708 Comm: iptables \
 |>   Not tainted 4.19.14-0-vanilla #1-Alpine
 |>   Jan 12 00:15:00 kernel: [36690.025679] Hardware name: QEMU Standard \
 |>   PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
 |>   Jan 12 00:15:00 kernel: [36690.027056] RIP: 0010:nf_conncount_cache_fre\
 |>   e+0x26/0p
 |> messages:
 |>   Jan 12 00:15:00 crond[2046]: USER root pid 3677 cmd run-parts /etc/peri\
 |>   odic/12hourly
 |>   Jan 12 00:15:00 kernel: [36690.021645] PGD 0 P4D 0
 ...
 |Please report this upstream. https://bugzilla.kernel.org/

I have posted this to linux-kernel@vger.kernel.org, and it seems
the message came through.  24000 messages a month... but I hope
the subject attracts the right person(s).  According to the list
overview page, the list is meant for bug reports too.  (Bugzilla,
oh my.  Please, no.)

--steffen

---
Steffen Nurpmeso <steffen@sdaoden.eu>
Message ID: <20190112135858.q806V%steffen@sdaoden.eu>
In-Reply-To: <alpine.DEB.2.21.999.1901111650590.1841@trent.utfs.org>
Sender timestamp: 1547301538

Christian Kujau wrote in <alpine.DEB.2.21.999.1901111650590.1841@trent.u\
tfs.org>:
 |On Sat, 12 Jan 2019, Steffen Nurpmeso wrote:
 |>   Dec 29 00:15:01 kernel: [23338.689515] BUG: unable to handle kernel \
 |>   NULL pointer dereference at 0000000000000000
 |>   Jan 12 00:15:00 kernel: [36690.017115] BUG: unable to handle kernel \
 |>   NULL pointer dereference at 0000000000000000
 |
 |In the other mails you cited:
 |
 |  list_add corruption. prev->next should be next (ffff9273faffe2c0), \
 |  but was ffff9273faffea40. (prev=ffff9273f75a52c0). 
 |
 |This particular message has been reported in numerous places[0][1][2], but 
 |for older kernels. Some reports[0] suggest to disable huge pages - maybe 
 |try that?

I will keep this suggestion in mind and look at those.  Thanks!
And it seems I have forgotten some things from my Alpine posting in
December, hmm.  I had looked at

 |> I have reverted to 4.14.89.  I have never done this, but i think
 |> this should be reported to Linux kernel list, then?  pffffff...
 |
 |Definitely worth a shot, IMHO. Be sure to CC the netdev list for the 
 |netfilter messages, although these may be just a red herring.

This I have not done.  In fact I also used the mail to thank
the kernel guys; I am sure they will all read it.  Maybe I should
have sent it to the netfilter list; I only searched for "bug" on
the ML overview page.

 |Good luck,
 |C.

You too.  Thanks!

 |[0] https://support.hpe.com/hpsc/doc/public/display?docId=mmr_kc-0131607
 |[1] https://bugzilla.redhat.com/show_bug.cgi?id=1507173
 |[2] https://bugs.openvz.org/browse/OVZ-5620
 |-- 
 |BOFH excuse #40:
 |
 |not enough memory, go get system upgrade
 --End of <alpine.DEB.2.21.999.1901111650590.1841@trent.utfs.org>

--steffen