Frequent network code panics


Hauke Fath

2017-03-27 09:40:47 UTC

Hi,

with a pair of netbsd-7 amd64 carp/pf routers, I see the primary router
panic and reboot frequently with either something like

fatal page fault in supervisor mode
trap type 6 code 0 rip ffffffff801dc5b9 cs 8 rflags 10246 cr2 20 ilevel 6 rsp fffffe810e876b50
curlwp 0xfffffe821e73b420 pid 0.3 lowest kstack 0xfffffe810e8742c0
panic: trap
cpu0: Begin traceback...
vpanic() at netbsd:vpanic+0x13c
snprintf() at netbsd:snprintf
startlwp() at netbsd:startlwp
alltraps() at netbsd:alltraps+0x96
pf_test() at netbsd:pf_test+0xeaa
pfil4_wrapper() at netbsd:pfil4_wrapper+0x84
pfil_run_hooks() at netbsd:pfil_run_hooks+0xc4
ip_output() at netbsd:ip_output+0x38f
ip_forward() at netbsd:ip_forward+0x16b
ipintr() at netbsd:ipintr+0x2d4
softint_dispatch() at netbsd:softint_dispatch+0x7d
DDB lost frame for netbsd:Xsoftintr+0x4f, trying 0xfffffe810e876ff0
Xsoftintr() at netbsd:Xsoftintr+0x4f
--- interrupt ---
0:
cpu0: End traceback...
rebooting...

or, with either ixg0 of wm0,

fatal protection fault in supervisor mode
trap type 4 code 0 rip ffffffff801d5cab cs 8 rflags 10202 cr2 7f7ff7b4c000 ilevel 6 rsp fffffe810e873c40
curlwp 0xfffffe821e73b840 pid 0.2 lowest kstack 0xfffffe810e8712c0
panic: trap
cpu0: Begin traceback...
vpanic() at netbsd:vpanic+0x13c
snprintf() at netbsd:snprintf
startlwp() at netbsd:startlwp
alltraps() at netbsd:alltraps+0x96
ixgbe_rxeof.isra.9() at netbsd:ixgbe_rxeof.isra.9+0x14a
ixgbe_legacy_irq() at netbsd:ixgbe_legacy_irq+0xcf
intr_biglock_wrapper() at netbsd:intr_biglock_wrapper+0x19
Xintr_ioapic_level1() at netbsd:Xintr_ioapic_level1+0xf2
--- interrupt ---
x86_mwait() at netbsd:x86_mwait+0xd
acpicpu_cstate_idle_enter() at netbsd:acpicpu_cstate_idle_enter+0xc2
acpicpu_cstate_idle() at netbsd:acpicpu_cstate_idle+0x6d
idle_loop() at netbsd:idle_loop+0xe8
cpu0: End traceback...
rebooting...

Do these stack traces ring a bell for anyone?

Cheerio,
hauke

--
The ASCII Ribbon Campaign Hauke Fath
() No HTML/RTF in email Institut für Nachrichtentechnik
/\ No Word docs in email TU Darmstadt
Respect for open standards Ruf +49-6151-16-21344

--
Posted automagically by a mail2news gateway at muc.de e.V.
Please direct questions, flames, donations, etc. to news-***@muc.de

Masanobu SAITOH

2017-03-28 05:16:05 UTC

Post by Hauke Fath
Hi,
with a pair of netbsd-7 amd64 carp/pf routers, I see the primary router
panic and reboot frequently with either something like
fatal page fault in supervisor mode
trap type 6 code 0 rip ffffffff801dc5b9 cs 8 rflags 10246 cr2 20 ilevel 6 rsp fffffe810e876b50
curlwp 0xfffffe821e73b420 pid 0.3 lowest kstack 0xfffffe810e8742c0
panic: trap
cpu0: Begin traceback...
vpanic() at netbsd:vpanic+0x13c
snprintf() at netbsd:snprintf
startlwp() at netbsd:startlwp
alltraps() at netbsd:alltraps+0x96
pf_test() at netbsd:pf_test+0xeaa
pfil4_wrapper() at netbsd:pfil4_wrapper+0x84
pfil_run_hooks() at netbsd:pfil_run_hooks+0xc4
ip_output() at netbsd:ip_output+0x38f
ip_forward() at netbsd:ip_forward+0x16b
ipintr() at netbsd:ipintr+0x2d4
softint_dispatch() at netbsd:softint_dispatch+0x7d
DDB lost frame for netbsd:Xsoftintr+0x4f, trying 0xfffffe810e876ff0
Xsoftintr() at netbsd:Xsoftintr+0x4f
--- interrupt ---
cpu0: End traceback...
rebooting...
or, with either ixg0 of wm0,

ixg0 "or" wm0?

Could you show me the stack trace of wm?

Post by Hauke Fath
fatal protection fault in supervisor mode
trap type 4 code 0 rip ffffffff801d5cab cs 8 rflags 10202 cr2 7f7ff7b4c000 ilevel 6 rsp fffffe810e873c40
curlwp 0xfffffe821e73b840 pid 0.2 lowest kstack 0xfffffe810e8712c0
panic: trap
cpu0: Begin traceback...
vpanic() at netbsd:vpanic+0x13c
snprintf() at netbsd:snprintf
startlwp() at netbsd:startlwp
alltraps() at netbsd:alltraps+0x96
ixgbe_rxeof.isra.9() at netbsd:ixgbe_rxeof.isra.9+0x14a
ixgbe_legacy_irq() at netbsd:ixgbe_legacy_irq+0xcf
intr_biglock_wrapper() at netbsd:intr_biglock_wrapper+0x19
Xintr_ioapic_level1() at netbsd:Xintr_ioapic_level1+0xf2
--- interrupt ---
x86_mwait() at netbsd:x86_mwait+0xd
acpicpu_cstate_idle_enter() at netbsd:acpicpu_cstate_idle_enter+0xc2
acpicpu_cstate_idle() at netbsd:acpicpu_cstate_idle+0x6d
idle_loop() at netbsd:idle_loop+0xe8
cpu0: End traceback...
rebooting...
Do these stack traces ring a bell for anyone?
Cheerio,
hauke

--
-----------------------------------------------
SAITOH Masanobu (***@execsw.org
***@netbsd.org)

Hauke Fath

2017-03-28 07:07:42 UTC

Post by Masanobu SAITOH

Post by Hauke Fath
or, with either ixg0 of wm0,

ixg0 "or" wm0?

Yes, of course, sorry.

Post by Masanobu SAITOH
Could you show me the stack trace of wm?

fatal page fault in supervisor mode
trap type 6 code 0 rip ffffffff801d5cab cs 8 rflags 10202 cr2 20000000c ilevel 6 rsp fffffe810e873c88
curlwp 0xfffffe821e73b840 pid 0.2 lowest kstack 0xfffffe810e8712c0
panic: trap
cpu0: Begin traceback...
vpanic() at netbsd:vpanic+0x13c
snprintf() at netbsd:snprintf
startlwp() at netbsd:startlwp
alltraps() at netbsd:alltraps+0x96
wm_intr() at netbsd:wm_intr+0x3e0
intr_biglock_wrapper() at netbsd:intr_biglock_wrapper+0x19
Xintr_ioapic_level1() at netbsd:Xintr_ioapic_level1+0xf2
--- interrupt ---
x86_mwait() at netbsd:x86_mwait+0xd
acpicpu_cstate_idle_enter() at netbsd:acpicpu_cstate_idle_enter+0xc2
acpicpu_cstate_idle() at netbsd:acpicpu_cstate_idle+0x6d
idle_loop() at netbsd:idle_loop+0xe8
cpu0: End traceback...
rebooting...

Cheerio,
hauke

Masanobu SAITOH

2017-03-29 04:54:29 UTC

Post by Hauke Fath

Post by Masanobu SAITOH

Post by Hauke Fath
or, with either ixg0 of wm0,

ixg0 "or" wm0?

Yes, of course, sorry.

Post by Masanobu SAITOH
Could you show me the stack trace of wm?

someone (carp, pf or others) broke memory?

The source code lines for the following two addresses
might show what happened, but it would be hard
to determine the cause...

Post by Hauke Fath
wm_intr() at netbsd:wm_intr+0x3e0
ixgbe_rxeof.isra.9() at netbsd:ixgbe_rxeof.isra.9+0x14a
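
One way to look up those source lines is to turn each symbol+offset frame into a gdb `list` command and run it against a kernel image with debug symbols. The following is only a minimal sketch: the helper name `frames_to_gdb` is hypothetical, and it assumes that a debug kernel (for example a `netbsd.gdb` built from exactly the same sources and config as the crashing kernel) is available. Symbols are single-quoted because names like `ixgbe_rxeof.isra.9` contain dots that gdb would otherwise try to parse as member access.

```python
import re

# Matches DDB traceback frames that carry an offset, e.g.
#   "wm_intr() at netbsd:wm_intr+0x3e0"
# Frames without an offset ("snprintf() at netbsd:snprintf") are skipped.
FRAME_RE = re.compile(r"at netbsd:(\S+?)\+0x([0-9a-fA-F]+)")


def frames_to_gdb(trace: str) -> list[str]:
    """Return one gdb 'list' command per symbol+offset frame in trace."""
    return [
        f"list *('{sym}'+0x{off})"
        for sym, off in FRAME_RE.findall(trace)
    ]


if __name__ == "__main__":
    # The two frames quoted above, copied from the panic messages.
    trace = (
        "wm_intr() at netbsd:wm_intr+0x3e0\n"
        "ixgbe_rxeof.isra.9() at netbsd:ixgbe_rxeof.isra.9+0x14a\n"
    )
    for cmd in frames_to_gdb(trace):
        print(cmd)
```

The printed commands can then be pasted into a gdb session opened on the debug kernel. If the debug kernel does not match the running one, the reported lines will be misleading.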

Hauke Fath

2017-03-29 06:52:14 UTC

Post by Masanobu SAITOH
someone (carp, pf or others) broke memory?

"broke" meaning what, exactly? Locking issues, stack corruption?

I seem to sense a pattern related to (internal) traffic - the primary
is more likely to crash during nightly backup runs, or on busy days
like Mondays.

Cheerio,
Hauke

Masanobu SAITOH

2017-03-29 07:15:31 UTC

Post by Hauke Fath

Post by Masanobu SAITOH
someone (carp, pf or others) broke memory?

"broke" meaning what, exactly? Locking issues, stack corruption?

netbsd-7's network stack is not MP capable, so I suspect it's not
a locking issue. Stack corruption, an mbuf being freed incorrectly,
or something else...
