Steffen Nurpmeso wrote in <20181228140158.FsqDX%steffen@sdaoden.eu>:
|I am seeing kernel crashes with [edge] and 4.19.12 on my server
...
|and we do have a _lot_ of messages like
|
| Dec 28 12:31:00 kernel: [34719.914043] list_add corruption. prev->next \
| should be next (ffff9273faffe2c0), but was ffff9273faffea40. (prev=ffff9\
| 273f75a52c0).
| Dec 28 12:31:00 kernel: [34719.919121] WARNING: CPU: 1 PID: 0 at \
| lib/list_debug.c:28 __list_add_valid+0x3c/0x67
| Dec 28 12:31:00 kernel: [34719.921763] Modules linked in: sch_sfq \
| sch_htb nf_log_ipv4 nf_log_common xt_LOG xt_limit ipt_REJECT nf_reject_i\
| pv4 xt_tcpudp xt_recent xt_conn>
...
| Dec 28 12:15:00 kernel: [33760.775634] Oops: 0000 [#1] SMP PTI
| Dec 28 12:15:00 kernel: [33760.776726] CPU: 0 PID: 3934 Comm: iptables \
| Not tainted 4.19.12-0-vanilla #1-Alpine
| Dec 28 12:15:00 kernel: [33760.777864] Hardware name: QEMU Standard \
| PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
| Dec 28 12:15:00 kernel: [33760.779111] RIP: 0010:nf_conncount_cache_free\
| +0x26/0x2f [nf_conncount]
...
Out of interest, after seeing the 4.19.13 announcement on Saturday,
which did not seem to mention anything regarding xt_conntrack, I
looked into the iptables git. Indeed, about two weeks ago there
were some commits (on the master branch) that could fit this
list corruption. Maybe a fix comes with 4.19.14,
what do you think? I have reverted the machine in the meantime
(luckily I always keep current and old kernels and modules, so
that is easy); 4.14.89 works flawlessly.
--steffen
|
|Der Kragenbaer, The moon bear,
|der holt sich munter he cheerfully and one by one
|einen nach dem anderen runter wa.ks himself off
|(By Robert Gernhardt)
---
Unsubscribe: alpine-user+unsubscribe@lists.alpinelinux.org
Help: alpine-user+help@lists.alpinelinux.org
---
On Sat, 12 Jan 2019, Steffen Nurpmeso wrote:
> Dec 29 00:15:01 kernel: [23338.689515] BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
> Jan 12 00:15:00 kernel: [36690.017115] BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
In the other mails you cited:
list_add corruption. prev->next should be next (ffff9273faffe2c0), but was ffff9273faffea40. (prev=ffff9273f75a52c0).
This particular message has been reported in numerous places[0][1][2], but
for older kernels. Some reports[0] suggest disabling huge pages; maybe
try that?
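A note on what that warning actually checks. With CONFIG_DEBUG_LIST, the kernel verifies two things before it links a new entry between two neighbours: the neighbours must still point at each other, and the entry must not already be one of them. Below is a small Python model of that check. It loosely follows lib/list_debug.c from the WARNING quoted earlier. ListHead, list_add_valid, _list_add and list_add_tail are hypothetical names, and id() stands in for the kernel's pointer values. This is an illustration only, not the kernel code.

```python
class ListHead:
    """Hypothetical stand-in for the kernel's struct list_head:
    a node in a circular, doubly linked list. A fresh node points to itself."""

    def __init__(self):
        self.prev = self
        self.next = self


def list_add_valid(new, prev, next_):
    """Model of the CONFIG_DEBUG_LIST checks done before an insertion.

    Returns None if `new` may be linked in between `prev` and `next_`,
    otherwise a message worded like the kernel's warning.
    """
    if next_.prev is not prev:
        return ("list_add corruption. next->prev should be prev "
                f"({id(prev):#x}), but was {id(next_.prev):#x}. (next={id(next_):#x}).")
    if prev.next is not next_:
        return ("list_add corruption. prev->next should be next "
                f"({id(next_):#x}), but was {id(prev.next):#x}. (prev={id(prev):#x}).")
    if new is prev or new is next_:
        return (f"list_add double add: new={id(new):#x}, "
                f"prev={id(prev):#x}, next={id(next_):#x}.")
    return None


def _list_add(new, prev, next_):
    """Link `new` between `prev` and `next_`; on a failed check, report and skip."""
    err = list_add_valid(new, prev, next_)
    if err is not None:
        return err  # the kernel WARNs here and leaves the list untouched
    next_.prev = new
    new.next = next_
    new.prev = prev
    prev.next = new
    return None


def list_add_tail(new, head):
    """Append `new` at the tail, i.e. between head.prev and head."""
    return _list_add(new, head.prev, head)


# Build head <-> a, then simulate a stray write: a.next no longer points
# back to head, although head.prev still points at a.
head, a, b, stray = ListHead(), ListHead(), ListHead(), ListHead()
assert list_add_tail(a, head) is None
a.next = stray

msg = list_add_tail(b, head)
print(msg)  # starts with "list_add corruption. prev->next should be next"
```

The check fires only when the next insertion finds a stale neighbour pointer. It does not say who wrote that pointer. This may be why the same message turns up for quite unrelated bugs.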
> I have reverted to 4.14.89. I have never done this, but I think
> this should be reported to the Linux kernel list, then? pffffff...
Definitely worth a shot, IMHO. Be sure to CC the netdev list for the
netfilter messages, although these may be just a red herring.
Good luck,
C.
[0] https://support.hpe.com/hpsc/doc/public/display?docId=mmr_kc-0131607
[1] https://bugzilla.redhat.com/show_bug.cgi?id=1507173
[2] https://bugs.openvz.org/browse/OVZ-5620
--
BOFH excuse #40:
not enough memory, go get system upgrade
---
Hello.
Steffen Nurpmeso wrote in <20181231182229.t_LWK%steffen@sdaoden.eu>:
|Steffen Nurpmeso wrote in <20181228140158.FsqDX%steffen@sdaoden.eu>:
||I am seeing kernel crashes with [edge] and 4.19.12 on my server
...
|Out of interest, after seeing the 4.19.13 announcement on Saturday,
...
|what do you think? I have reverted the machine in the meantime
|(luckily I always keep current and old kernels and modules, so
|that is easy); 4.14.89 works flawlessly.
I have updated to 4.19.14, and the issue still exists on my server
VM:
crit:
Dec 29 00:15:01 kernel: [23338.689515] BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
Jan 12 00:15:00 kernel: [36690.017115] BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
warn:
Jan 12 00:15:00 kernel: [36690.023028] Oops: 0000 [#1] SMP PTI
Jan 12 00:15:00 kernel: [36690.024368] CPU: 0 PID: 3708 Comm: iptables Not tainted 4.19.14-0-vanilla #1-Alpine
Jan 12 00:15:00 kernel: [36690.025679] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
Jan 12 00:15:00 kernel: [36690.027056] RIP: 0010:nf_conncount_cache_free+0x26/0p
messages:
Jan 12 00:15:00 crond[2046]: USER root pid 3677 cmd run-parts /etc/periodic/12hourly
Jan 12 00:15:00 kernel: [36690.021645] PGD 0 P4D 0
That periodic script outputs sort(1)ed entries from xt_recent, and
shows the state of the firewall.
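The script itself is not part of this thread. The following is only a hypothetical sketch of what printing "sort(1)ed entries from xt_recent" can look like. xt_recent exposes each list as /proc/net/xt_recent/<name>, and the sketch sorts that file's lines the way plain `LC_ALL=C sort` would. The function names and the sample entries are made up.

```python
def sorted_entries(lines):
    """Drop blank lines and sort the rest by code point, which for ASCII
    input matches `LC_ALL=C sort`."""
    return sorted(line.rstrip("\n") for line in lines if line.strip())


def read_recent_list(name):
    """Read one xt_recent list from /proc/net/xt_recent/<name> and return
    its entries sorted. Assumes a rule using that list name is loaded."""
    with open(f"/proc/net/xt_recent/{name}", encoding="ascii", errors="replace") as f:
        return sorted_entries(f)


# Offline demo with made-up entries (only the leading src= field is shown).
sample = ["src=192.0.2.9\n", "\n", "src=192.0.2.10\n", "src=198.51.100.1\n"]
for entry in sorted_entries(sample):
    print(entry)
```

As with sort(1) without -V, "src=192.0.2.10" sorts before "src=192.0.2.9".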
I have reverted to 4.14.89. I have never done this, but I think
this should be reported to the Linux kernel list, then? pffffff...
Ciao, and have a nice weekend.
--steffen
---
On Sat, 12 Jan 2019 00:32:44 +0100
Steffen Nurpmeso <steffen@sdaoden.eu> wrote:
> Hello.
>
> Steffen Nurpmeso wrote in <20181231182229.t_LWK%steffen@sdaoden.eu>:
> |Steffen Nurpmeso wrote in <20181228140158.FsqDX%steffen@sdaoden.eu>:
> ||I am seeing kernel crashes with [edge] and 4.19.12 on my server
> ...
> |Out of interest, after seeing the 4.19.13 announcement on Saturday,
> ...
> |what do you think? I have reverted the machine in the meantime
> |(luckily I always keep current and old kernels and modules, so
> |that is easy); 4.14.89 works flawlessly.
>
> I have updated to 4.19.14, and the issue still exists on my server
> VM:
>
> crit:
> Dec 29 00:15:01 kernel: [23338.689515] BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
> Jan 12 00:15:00 kernel: [36690.017115] BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
> warn:
> Jan 12 00:15:00 kernel: [36690.023028] Oops: 0000 [#1] SMP PTI
> Jan 12 00:15:00 kernel: [36690.024368] CPU: 0 PID: 3708 Comm: iptables Not tainted 4.19.14-0-vanilla #1-Alpine
> Jan 12 00:15:00 kernel: [36690.025679] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
> Jan 12 00:15:00 kernel: [36690.027056] RIP: 0010:nf_conncount_cache_free+0x26/0p
> messages:
> Jan 12 00:15:00 crond[2046]: USER root pid 3677 cmd run-parts /etc/periodic/12hourly
> Jan 12 00:15:00 kernel: [36690.021645] PGD 0 P4D 0
>
> That periodic script outputs sort(1)ed entries from xt_recent, and
> shows the state of the firewall.
> I have reverted to 4.14.89. I have never done this, but I think
> this should be reported to the Linux kernel list, then? pffffff...
> Ciao, and have a nice weekend.
Please report this upstream. https://bugzilla.kernel.org/
-nc
---
Natanael Copa wrote in <20190112125421.2330ddc9@ncopa-desktop.copa.dup.pw>:
|On Sat, 12 Jan 2019 00:32:44 +0100
|Steffen Nurpmeso <steffen@sdaoden.eu> wrote:
|> Steffen Nurpmeso wrote in <20181231182229.t_LWK%steffen@sdaoden.eu>:
|>|Steffen Nurpmeso wrote in <20181228140158.FsqDX%steffen@sdaoden.eu>:
|>||I am seeing kernel crashes with [edge] and 4.19.12 on my server
|> ...
|>|Out of interest, after seeing the 4.19.13 announcement on Saturday,
...
|> I have updated to 4.19.14, and the issue still exists on my server
|> VM:
|>
|> crit:
|> Dec 29 00:15:01 kernel: [23338.689515] BUG: unable to handle kernel \
|> NULL pointer dereference at 0000000000000000
|> Jan 12 00:15:00 kernel: [36690.017115] BUG: unable to handle kernel \
|> NULL pointer dereference at 0000000000000000
|> warn:
|> Jan 12 00:15:00 kernel: [36690.023028] Oops: 0000 [#1] SMP PTI
|> Jan 12 00:15:00 kernel: [36690.024368] CPU: 0 PID: 3708 Comm: iptables \
|> Not tainted 4.19.14-0-vanilla #1-Alpine
|> Jan 12 00:15:00 kernel: [36690.025679] Hardware name: QEMU Standard \
|> PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
|> Jan 12 00:15:00 kernel: [36690.027056] RIP: 0010:nf_conncount_cache_fre\
|> e+0x26/0p
|> messages:
|> Jan 12 00:15:00 crond[2046]: USER root pid 3677 cmd run-parts /etc/peri\
|> odic/12hourly
|> Jan 12 00:15:00 kernel: [36690.021645] PGD 0 P4D 0
...
|Please report this upstream. https://bugzilla.kernel.org/
I have posted this to linux-kernel@vger.kernel.org, and it seems
the message came through. 24000 messages a month... but I hope
the subject attracts the right person(s). The list overview page
says this list is meant for bugs too, right? (Bugzilla, oh. my. Please no.)
--steffen
---
Christian Kujau wrote in <alpine.DEB.2.21.999.1901111650590.1841@trent.utfs.org>:
|On Sat, 12 Jan 2019, Steffen Nurpmeso wrote:
|> Dec 29 00:15:01 kernel: [23338.689515] BUG: unable to handle kernel \
|> NULL pointer dereference at 0000000000000000
|> Jan 12 00:15:00 kernel: [36690.017115] BUG: unable to handle kernel \
|> NULL pointer dereference at 0000000000000000
|
|In the other mails you cited:
|
| list_add corruption. prev->next should be next (ffff9273faffe2c0), \
| but was ffff9273faffea40. (prev=ffff9273f75a52c0).
|
|This particular message has been reported in numerous places[0][1][2], but
|for older kernels. Some reports[0] suggest disabling huge pages; maybe
|try that?
I will keep this suggestion in mind and look at those. Thanks!
And it seems I have forgotten some stuff from my Alpine posting in
December, hmm. I had looked at
|> I have reverted to 4.14.89. I have never done this, but I think
|> this should be reported to the Linux kernel list, then? pffffff...
|
|Definitely worth a shot, IMHO. Be sure to CC the netdev list for the
|netfilter messages, although these may be just a red herring.
This I have not done. In fact, I misused the mail to thank
the kernel guys; I am sure they will all read it. Maybe I should
have sent it to the netfilter list; I had only searched for "bug" on
the ML overview page.
|Good luck,
|C.
Yours too. Thanks!
|[0] https://support.hpe.com/hpsc/doc/public/display?docId=mmr_kc-0131607
|[1] https://bugzilla.redhat.com/show_bug.cgi?id=1507173
|[2] https://bugs.openvz.org/browse/OVZ-5620
--End of <alpine.DEB.2.21.999.1901111650590.1841@trent.utfs.org>
--steffen
---