Discussion:
netbsd-7 panic in rn_walknext via sysctl and rtsock
Stephen Borrill
2018-11-08 14:39:16 UTC
With a kernel from 2018-11-01 netbsd-7 sources:

(gdb) bt
#0 0xffffffff8065d56f in cpu_reboot (howto=howto@entry=260,
bootstr=bootstr@entry=0x0)
at /usr/src/7.0/sys/arch/amd64/amd64/machdep.c:671
#1 0xffffffff80894242 in vpanic (fmt=fmt@entry=0xffffffff80d50632 "trap",
ap=ap@entry=0xfffffe813a780a60) at /usr/src/7.0/sys/kern/subr_prf.c:340
#2 0xffffffff808942fd in panic (fmt=fmt@entry=0xffffffff80d50632 "trap")
at /usr/src/7.0/sys/kern/subr_prf.c:256
#3 0xffffffff808d7676 in trap (frame=0xfffffe813a780b80)
at /usr/src/7.0/sys/arch/amd64/amd64/trap.c:304
#4 0xffffffff80100f26 in alltraps ()
#5 0xffffffff807aece4 in rn_walknext (printer=0x0, arg=0x0,
rn=0x3a77b79e61ff6d35) at /usr/src/7.0/sys/net/radix.c:959
#6 rn_walktree (h=<optimized out>,
f=f@entry=0xffffffff807f7d70 <rt_walktree_visitor>,
w=w@entry=0xfffffe813a780ca8) at /usr/src/7.0/sys/net/radix.c:992
#7 0xffffffff807f7ea2 in rt_walktree (family=family@entry=2 '\002',
f=f@entry=0xffffffff807fe930 <sysctl_dumpentry>,
v=v@entry=0xfffffe813a780d08) at /usr/src/7.0/sys/net/rtbl.c:204
#8 0xffffffff807fee7f in sysctl_rtable (name=0xfffffe813a780e5c,
namelen=<optimized out>, oldp=0x1bcad2000, oldlenp=0xfffffe813a780e48,
newp=<optimized out>, newlen=<optimized out>, oname=0xfffffe813a780e50,
l=0xfffffe8831a25240, rnode=0xfffffe813a5bf008)
at /usr/src/7.0/sys/net/rtsock.c:1417
#9 0xffffffff806035ac in sysctl_dispatch (name=name@entry=0xfffffe813a780e50,
namelen=6, oldp=0x1bcad2000, oldlenp=oldlenp@entry=0xfffffe813a780e48,
newp=0x0, newlen=0, oname=oname@entry=0xfffffe813a780e50,
l=l@entry=0xfffffe8831a25240, rnode=0xfffffe813a5bf008, rnode@entry=0x0)
at /usr/src/7.0/sys/kern/kern_sysctl.c:451
#10 0xffffffff80603724 in sys___sysctl (l=0xfffffe8831a25240,
uap=0xfffffe813a780f00, retval=<optimized out>)
at /usr/src/7.0/sys/kern/kern_sysctl.c:307
#11 0xffffffff808af2ca in sy_call (rval=0xfffffe813a780eb8,
uap=0xfffffe813a780f00, l=0xfffffe8831a25240,
sy=0xffffffff81043580 <sysent+3232>)
at /usr/src/7.0/sys/sys/syscallvar.h:61
#12 sy_invoke (code=202, rval=0xfffffe813a780eb8, uap=0xfffffe813a780f00,
l=0xfffffe8831a25240, sy=0xffffffff81043580 <sysent+3232>)
at /usr/src/7.0/sys/sys/syscallvar.h:85
#13 syscall (frame=0xfffffe813a780f00)
at /usr/src/7.0/sys/arch/x86/x86/syscall.c:156
#14 0xffffffff80100691 in Xsyscall ()


#5 0xffffffff807aece4 in rn_walknext (printer=0x0, arg=0x0,
rn=0x3a77b79e61ff6d35) at /usr/src/7.0/sys/net/radix.c:959
959 rn = rn->rn_l;
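
A note on the rn value in this frame: 0x3a77b79e61ff6d35 does not look like a kernel pointer at all. The sketch below checks whether an address is canonical, assuming 48-bit amd64 virtual addressing (bits 63..47 all equal); the function name and the DEMO_MAIN guard are made up for illustration. If that assumption holds for this machine, dereferencing rn would fault immediately, which fits a trap at "rn = rn->rn_l".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Return true if addr is canonical under 48-bit amd64 virtual
 * addressing, i.e. bits 63..47 are all zeros or all ones.
 * (Assumption: 4-level paging; the helper name is hypothetical.)
 */
static bool
is_canonical48(uint64_t addr)
{
	uint64_t top = addr >> 47;	/* the 17 bits 63..47 */

	return top == 0 || top == 0x1ffff;
}

#ifdef DEMO_MAIN	/* hypothetical guard: build with -DDEMO_MAIN to run */
int
main(void)
{
	/* rn from frame #5, then the trap frame pointer from frame #3 */
	printf("rn canonical:    %d\n", is_canonical48(0x3a77b79e61ff6d35ULL));
	printf("frame canonical: %d\n", is_canonical48(0xfffffe813a780b80ULL));
	assert(!is_canonical48(0x3a77b79e61ff6d35ULL));
	return 0;
}
#endif
```

The other pointers in the backtrace (0xfffffe81... and 0xffffffff80...) pass this check, so rn stands out as overwritten or stale data rather than a slightly wrong pointer.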


#10 0xffffffff80603724 in sys___sysctl (l=0xfffffe8831a25240,
uap=0xfffffe813a780f00, retval=<optimized out>)
at /usr/src/7.0/sys/kern/kern_sysctl.c:307
307 error = sysctl_dispatch(&name[0], SCARG(uap, namelen),
(gdb) print name[0]
$3 = 4
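
For reading the sysctl arguments: name[0] = 4 matches CTL_NET, and the name pointers in frames #8 and #9 show how much of the six-element MIB had been consumed by the time sysctl_rtable ran. The arithmetic is sketched below, assuming 4-byte ints as on amd64; the helper name and the DEMO_MAIN guard are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Number of MIB components between the original name pointer (oname)
 * and the one handed to the handler (name).
 * Assumption: each component is a 4-byte int, as on amd64.
 */
static size_t
mib_consumed(uint64_t oname, uint64_t name)
{
	return (size_t)((name - oname) / sizeof(int32_t));
}

#ifdef DEMO_MAIN	/* hypothetical guard: build with -DDEMO_MAIN to run */
int
main(void)
{
	/* oname and name as printed in frame #8 (sysctl_rtable) */
	size_t used = mib_consumed(0xfffffe813a780e50ULL,
	    0xfffffe813a780e5cULL);

	/* namelen=6 comes from frame #9 (sysctl_dispatch) */
	printf("consumed %zu of 6, %zu left\n", used, (size_t)6 - used);
	assert(used == 3);
	return 0;
}
#endif
```

Three components are consumed and three remain. If I recall the sysctl_rtable source correctly, it reads those three as address family, operation and argument; family=2 in frame #7 (AF_INET) is consistent with that, so this looks like an ordinary IPv4 routing table dump request.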

I have a core, any suggestions on what to get from gdb?
--
Stephen


--
Posted automagically by a mail2news gateway at muc.de e.V.
Please direct questions, flames, donations, etc. to news-***@muc.de
Ryota Ozaki
2018-11-12 09:13:49 UTC
Post by Stephen Borrill
I have a core, any suggestions on what to get from gdb?
--
Stephen
Perhaps there were parallel accesses to the routing table and sysctl
touched a corrupted entry.

sysctl_rtable probably needs softnet_lock and/or KERNEL_LOCK.
Adding them around splsoftnet in sysctl_rtable would fix the panic.
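
As a rough userland analogue of the idea (not kernel code): the sketch below uses a pthread mutex as a stand-in for softnet_lock/KERNEL_LOCK, with one thread modifying a linked list while another walks it. All names here are invented for illustration. The point is only that the walker and the modifier must take the same lock, otherwise the walker can follow a pointer into freed memory.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Toy stand-in for a routing table: a singly linked list. */
struct node {
	struct node *next;
	int val;
};

static struct node *head;
static pthread_mutex_t table_lock = PTHREAD_MUTEX_INITIALIZER;

/* Modifier: repeatedly insert one node at the head, then remove it. */
static void *
churn(void *arg)
{
	int rounds = *(int *)arg;

	for (int i = 0; i < rounds; i++) {
		struct node *n = malloc(sizeof(*n));

		if (n == NULL)
			break;
		n->val = 1;

		pthread_mutex_lock(&table_lock);
		n->next = head;
		head = n;
		pthread_mutex_unlock(&table_lock);

		pthread_mutex_lock(&table_lock);
		head = n->next;		/* unlink before freeing */
		pthread_mutex_unlock(&table_lock);
		free(n);
	}
	return NULL;
}

/* Walker: sum all values while holding the same lock as the modifier. */
static int
walk_sum(void)
{
	int sum = 0;

	pthread_mutex_lock(&table_lock);
	for (struct node *n = head; n != NULL; n = n->next)
		sum += n->val;
	pthread_mutex_unlock(&table_lock);
	return sum;
}

/* Run both concurrently; return how many walks saw an impossible sum. */
static int
run_demo(int rounds)
{
	pthread_t t;
	int bad = 0;

	if (pthread_create(&t, NULL, churn, &rounds) != 0)
		return -1;
	for (int i = 0; i < rounds; i++) {
		int s = walk_sum();

		if (s != 0 && s != 1)
			bad++;
	}
	pthread_join(t, NULL);
	return bad;
}

#ifdef DEMO_MAIN	/* hypothetical guard: build with -DDEMO_MAIN -pthread */
int
main(void)
{
	printf("inconsistent walks: %d\n", run_demo(100000));
	return 0;
}
#endif
```

Removing the lock/unlock pair from walk_sum() turns this into the same kind of race as the panic: the walker may read n->next from a node the other thread has already freed.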

Regards,
ozaki-r

Stephen Borrill
2019-02-08 14:09:57 UTC
Post by Ryota Ozaki
Post by Stephen Borrill
I have a core, any suggestions on what to get from gdb?
--
Stephen
Perhaps there were parallel accesses to the routing table and sysctl
touched a corrupted entry.
sysctl_rtable probably needs softnet_lock and/or KERNEL_LOCK.
Adding them around splsoftnet in sysctl_rtable would fix the panic.
I've been running with the following patch on netbsd-7 for a few months
with success. Is this applicable to HEAD? If so, should it be committed?
If not, I'll try to work out a way to pull this up to netbsd-7 without a
HEAD commit.

Index: sys/net/rtsock.c
===================================================================
RCS file: /cvsroot/src/sys/net/rtsock.c,v
retrieving revision 1.163.2.1
diff -u -r1.163.2.1 rtsock.c
--- sys/net/rtsock.c 28 Nov 2018 16:30:06 -0000 1.163.2.1
+++ sys/net/rtsock.c 8 Feb 2019 14:06:56 -0000
@@ -1408,6 +1408,8 @@
w.w_needed = 0 - w.w_given;
w.w_where = where;

+ mutex_enter(softnet_lock);
+ KERNEL_LOCK(1, NULL);
s = splsoftnet();
switch (w.w_op) {

@@ -1434,6 +1436,8 @@
break;
}
splx(s);
+ KERNEL_UNLOCK_ONE(NULL);
+ mutex_exit(softnet_lock);

/* check to see if we couldn't allocate memory with NOWAIT */
if (error == ENOBUFS && w.w_tmem == 0 && w.w_tmemneeded)
--
Stephen


Ryota Ozaki
2019-02-19 02:56:21 UTC
Post by Stephen Borrill
Post by Ryota Ozaki
Post by Stephen Borrill
I have a core, any suggestions on what to get from gdb?
--
Stephen
Perhaps there were parallel accesses to the routing table and sysctl
touched a corrupted entry.
sysctl_rtable probably needs softnet_lock and/or KERNEL_LOCK.
Adding them around splsoftnet in sysctl_rtable would fix the panic.
I've been running with the following patch on netbsd-7 for a few months
with success. Is this applicable to HEAD? If so, should it be committed?
If not, I'll try to work out a way to pull this up to netbsd-7 without a
HEAD commit.
Sorry for the late reply.

You can pull the diff up to netbsd-7 on its own: HEAD needs the same
fix, but it is better done in a slightly different way there (HEAD has
utility macros for the locks; netbsd-7 does not).
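
To illustrate what a utility macro pair for these locks can buy you, here is a userland sketch. The macro names are invented and are not the ones in HEAD, and pthread mutexes stand in for softnet_lock and the kernel lock. The acquire order and the reverse release order live in one place, so each call site cannot get them wrong.

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>

/* Userland stand-ins for softnet_lock and the big kernel lock. */
static pthread_mutex_t demo_softnet_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t demo_kernel_lock = PTHREAD_MUTEX_INITIALIZER;
static int demo_lock_depth;	/* how many demo locks are currently held */

/* Hypothetical helper: take both locks, always in the same order. */
#define DEMO_SOFTNET_KERNEL_LOCK() do {				\
	pthread_mutex_lock(&demo_softnet_lock);			\
	pthread_mutex_lock(&demo_kernel_lock);			\
	demo_lock_depth += 2;					\
} while (0)

/* Hypothetical helper: release in the reverse order. */
#define DEMO_SOFTNET_KERNEL_UNLOCK() do {			\
	demo_lock_depth -= 2;					\
	pthread_mutex_unlock(&demo_kernel_lock);		\
	pthread_mutex_unlock(&demo_softnet_lock);		\
} while (0)

/* A call site shaped like the patched region of sysctl_rtable. */
static int
demo_dump(void)
{
	int depth_inside;

	DEMO_SOFTNET_KERNEL_LOCK();
	depth_inside = demo_lock_depth;	/* the table walk would go here */
	DEMO_SOFTNET_KERNEL_UNLOCK();
	return depth_inside;
}

#ifdef DEMO_MAIN	/* hypothetical guard: build with -DDEMO_MAIN -pthread */
int
main(void)
{
	printf("locks held during dump: %d\n", demo_dump());
	return 0;
}
#endif
```

The netbsd-7 diff spells out the same enter/exit sequence by hand at the call site; a macro pair only moves it behind one name.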

Thanks,
ozaki-r
Stephen Borrill
2019-02-22 11:56:03 UTC
Post by Ryota Ozaki
Post by Stephen Borrill
Post by Ryota Ozaki
Post by Stephen Borrill
I have a core, any suggestions on what to get from gdb?
--
Stephen
Perhaps there were parallel accesses to the routing table and sysctl
touched a corrupted entry.
sysctl_rtable probably needs softnet_lock and/or KERNEL_LOCK.
Adding them around splsoftnet in sysctl_rtable would fix the panic.
I've been running with the following patch on netbsd-7 for a few months
with success. Is this applicable to HEAD? If so, should it be committed?
If not, I'll try to work out a way to pull this up to netbsd-7 without a
HEAD commit.
Sorry for the late reply.
You can pull the diff up to netbsd-7 on its own: HEAD needs the same
fix, but it is better done in a slightly different way there (HEAD has
utility macros for the locks; netbsd-7 does not).
Thanks, pullup to netbsd-7 has been requested. Shall I leave you to deal
with HEAD? Which is the best approach for -8?
Ryota Ozaki
2019-02-24 07:36:59 UTC
On Fri, Feb 22, 2019 at 8:56 PM Stephen Borrill <***@precedence.co.uk> wrote:
(snip)
Post by Stephen Borrill
Thanks, pullup to netbsd-7 has been requested.
Thanks!
Post by Stephen Borrill
Shall I leave you to deal
with HEAD? Which is the best approach for -8?
Yes, I'll handle it. The fix for -8 will be the same as HEAD.

ozaki-r
