crashes in nd6_llinfo_timer...
Christos Zoulas
2020-03-28 02:09:33 UTC
I filed http://gnats.netbsd.org/cgi-bin/query-pr-single.pl?number=55030
and I've been trying to fix it myself since my machine crashes every
couple of days... This is working for me so far, and unless someone
comes up with something better, I will commit it soon (without the
messages).

Yes, it does work:

[10:08pm] 30>dmesg -T | grep oof
[Thu Mar 26 16:43:51 EDT 2020] in6_lltable_lookup: oof did not crash
[10:08pm] 31>uptime
10:09PM up 3 days, 2:23, 1 user, load averages: 0.00, 0.00, 0.00

christos


Index: in6.c
===================================================================
RCS file: /cvsroot/src/sys/netinet6/in6.c,v
retrieving revision 1.277
diff -u -u -r1.277 in6.c
--- in6.c 20 Jan 2020 18:38:22 -0000 1.277
+++ in6.c 28 Mar 2020 02:08:01 -0000
@@ -2652,10 +2652,14 @@
 	if (lle == NULL)
 		return NULL;

-	if (flags & LLE_EXCLUSIVE)
-		LLE_WLOCK(lle);
-	else
-		LLE_RLOCK(lle);
+	LLE_RLOCK(lle);
+	if (flags & LLE_EXCLUSIVE) {
+		if (!LLE_TRY_UPGRADE(lle)) {
+			LLE_RUNLOCK(lle);
+			printf("%s: oof did not crash\n", __func__);
+			return NULL;
+		}
+	}
 	return lle;
 }

Index: nd6.c
===================================================================
RCS file: /cvsroot/src/sys/netinet6/nd6.c,v
retrieving revision 1.267
diff -u -u -r1.267 nd6.c
--- nd6.c 9 Mar 2020 21:20:56 -0000 1.267
+++ nd6.c 28 Mar 2020 02:08:02 -0000
@@ -466,7 +466,12 @@

 	SOFTNET_KERNEL_LOCK_UNLESS_NET_MPSAFE();

-	LLE_WLOCK(ln);
+	LLE_RLOCK(ln);
+	if (!LLE_TRY_UPGRADE(ln)) {
+		LLE_RUNLOCK(ln);
+		printf("%s: oof did not crash\n", __func__);
+		goto out;
+	}
 	if ((ln->la_flags & LLE_LINKED) == 0)
 		goto out;
 	if (ln->ln_ntick > 0) {
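
For readers unfamiliar with the pattern in the patch above (take the lock as a reader, try to upgrade, back off if the upgrade fails), here is a rough userland sketch. It is only a toy: `toy_rwlock`, `toy_entry`, and `toy_lookup` are hypothetical names invented for illustration, the lock is a plain counter with no atomics or blocking, and the try-upgrade rule (succeed only when the caller is the sole reader) is an assumption loosely modelled on rw_tryupgrade(9)-style semantics, not the kernel implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy, single-threaded model of a reader/writer lock (hypothetical). */
struct toy_rwlock {
	int  readers;	/* number of read holds */
	bool writer;	/* true while write-held */
};

static bool
toy_tryrdlock(struct toy_rwlock *l)
{
	if (l->writer)
		return false;
	l->readers++;
	return true;
}

static void
toy_rdunlock(struct toy_rwlock *l)
{
	assert(l->readers > 0);
	l->readers--;
}

/* Assumed rule: upgrade succeeds only if the caller is the sole reader. */
static bool
toy_tryupgrade(struct toy_rwlock *l)
{
	assert(l->readers > 0 && !l->writer);
	if (l->readers != 1)
		return false;
	l->readers = 0;
	l->writer = true;
	return true;
}

static void
toy_wrunlock(struct toy_rwlock *l)
{
	assert(l->writer);
	l->writer = false;
}

struct toy_entry {
	struct toy_rwlock lock;
	int value;
};

/*
 * Returns the entry locked (write-locked if exclusive, else read-locked),
 * or NULL when the lock cannot be taken without waiting.
 */
static struct toy_entry *
toy_lookup(struct toy_entry *e, bool exclusive)
{
	if (e == NULL)
		return NULL;
	if (!toy_tryrdlock(&e->lock))
		return NULL;
	if (exclusive && !toy_tryupgrade(&e->lock)) {
		toy_rdunlock(&e->lock);	/* back off instead of waiting */
		return NULL;
	}
	return e;
}
```

The point of the sketch is the failure path: when the upgrade is refused, the read hold is dropped and the caller sees NULL, so every caller has to be prepared to treat a busy entry like a missing one.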

--
Posted automagically by a mail2news gateway at muc.de e.V.
Please direct questions, flames, donations, etc. to news-***@muc.de
Ryota Ozaki
2020-03-30 08:21:13 UTC
Hi christos,

Sorry for the late reply.

It seems that the panic occurs because icmp6_error2 is called while holding
LLE_WLOCK. So I think a fix should avoid that situation by, say, moving
icmp6_error2 somewhere outside the lock.

Thanks,
ozaki-r
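
The suggestion above (decide under the lock, make the call after releasing it) might look roughly like the sketch below. This is a userland illustration only: `struct entry`, `entry_wlock`, `entry_wunlock`, `report_error`, and `entry_timer` are hypothetical stand-ins, not NetBSD kernel code, and the assumption that the deferred call must not run with the entry lock held is taken from the diagnosis in this thread rather than verified against the source.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a link-layer entry; not a kernel structure. */
struct entry {
	bool wlocked;		/* toy "write lock held" flag */
	bool expired;		/* set when the entry has timed out */
	int  errors_sent;	/* how many times report_error() ran */
};

static void
entry_wlock(struct entry *e)
{
	assert(!e->wlocked);	/* taking it twice models "locking against myself" */
	e->wlocked = true;
}

static void
entry_wunlock(struct entry *e)
{
	assert(e->wlocked);
	e->wlocked = false;
}

/*
 * Stands in for a call like icmp6_error2(): assumed here to be unsafe
 * while the entry lock is held.
 */
static void
report_error(struct entry *e)
{
	assert(!e->wlocked);
	e->errors_sent++;
}

static void
entry_timer(struct entry *e)
{
	bool need_report = false;

	entry_wlock(e);
	if (e->expired)
		need_report = true;	/* record the decision, do not call yet */
	entry_wunlock(e);

	/* The lock is released here, so the callee may take it again safely. */
	if (need_report)
		report_error(e);
}
```

Anything the deferred call needs from the entry has to be copied out while the lock is still held, since the entry may change once the lock is dropped.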
Christos Zoulas
2020-03-30 13:12:24 UTC
Thanks, I moved it to the end of the function.

christos