Discussion:
Locking question
(too old to reply)
Mark Keaton
2013-06-10 18:13:29 UTC
Permalink
I am trying to understand the locking strategy used in the following
functions on NetBSD 6:

ip_input()
ip_output()
ip6_input()
ip6_output()
ip6_forward()

There are several cases where these functions are called with the
"softnet_lock" mutex locked, and I assumed that this would always be true.

However, I added a KASSERT(mutex_owned(softnet_lock)) call to these
functions, ran a test, and found that there is at least one case where
this assertion fails. Here is the call chain that caused the failure:

ip6_output()
mld_sendpkt()
mld_start_listening()
in6_addmulti()
in6_joingroup()
in6_update_ifa1()
in6_update_ifa()
in6_control1()
in6_control()
udp6_usrreq_wrapper()
compat_ifioctl()
ifioctl()
soo_ioctl()
sys_ioctl()
sy_call()
syscall()

So, I am wondering if anyone knows if this is a bug in the kernel (i.e.,
"softnet_lock" should have been locked before calling ip6_output())?
Or, am I not understanding the real locking strategy used by these
functions?

Any help is appreciated.

-Mark

---
Mark Keaton
***@bbn.com

--
Posted automagically by a mail2news gateway at muc.de e.V.
Please direct questions, flames, donations, etc. to news-***@muc.de
Mindaugas Rasiukevicius
2013-06-10 20:06:35 UTC
Permalink
Post by Mark Keaton
So, I am wondering if anyone knows if this is a bug in the kernel (i.e.,
"softnet_lock" should have been locked before calling ip6_output())?
Or, am I not understanding the real locking strategy used by these
functions?
It should be locked by softnet_lock. Unfortunately, entry/exit paths into
the stack (and different parts of the stack, like in your example) are not
very well contained in this respect; softnet_lock is a big lock covering
paths across multiple layers, so it is easy to end up with a problem like
this. The ultimate goal is to dismantle softnet_lock.
--
Mindaugas

--
Posted automagically by a mail2news gateway at muc.de e.V.
Please direct questions, flames, donations, etc. to news-***@muc.de
Greg Troxel
2013-06-10 20:15:19 UTC
Permalink
Post by Mindaugas Rasiukevicius
Post by Mark Keaton
So, I am wondering if anyone knows if this is a bug in the kernel (i.e.,
"softnet_lock" should have been locked before calling ip6_output())?
Or, am I not understanding the real locking strategy used by these
functions?
It should be locked by softnet_lock. Unfortunately, entry/exit paths into
the stack (and different parts of the stack, like in your example) are not
very well contained in this respect; softnet_lock is a big lock covering
paths across multiple layers, so it is easy to end up with a problem like
this. The ultimate goal is to dismantle softnet_lock.
So should we be adding a comment to -current (and -6) about this, and
fixing the current lack of locking? I understand that you're moving to
more fine-grained locking without softnet_lock, but it seems good to
make the current code reliable.

How about a commented-out KASSERT for now in ip6_output?
Greg Troxel
2013-06-17 17:55:21 UTC
Permalink
Following up on Mark's query where ip6_output was called without
softnet_lock via a callchain of:

ip6_output()
mld_sendpkt()
mld_start_listening()
in6_addmulti()
in6_joingroup()
in6_update_ifa1()
in6_update_ifa()
in6_control1()
in6_control()
udp6_usrreq_wrapper()
compat_ifioctl()
ifioctl()
soo_ioctl()
sys_ioctl()
sy_call()
syscall()

I looked at where softnet_lock is and isn't acquired in the kernel. I
think this is a simple case of mld_start_listening failing to acquire
the lock.

So I'd suggest just acquiring/releasing at start/end of
mld_{start,stop}_listening, as shown below. However,
mld_start_listening can call mld_sendpkt, which acquires softnet_lock.
And, in6_addmulti does splsoftnet, but doesn't acquire softnet_lock.

From the mutex man page, it seems that it's not acceptable to enter a
mutex that one already holds.

It's also not clear to me why the kernel lock is involved.

wrong patch that will mutex_enter an already held mutex:

--- mld6.c.~1.55.~ 2011-11-21 16:32:25.000000000 -0500
+++ mld6.c 2013-06-17 13:34:43.000000000 -0400
@@ -271,6 +271,9 @@ mld_start_listening(struct in6_multi *in
{
struct in6_addr all_in6;

+ mutex_enter(softnet_lock);
+ KERNEL_LOCK(1, NULL);
+
/*
* RFC2710 page 10:
* The node never sends a Report or Done for the link-scope all-nodes
@@ -296,6 +299,9 @@ mld_start_listening(struct in6_multi *in

mld_starttimer(in6m);
}
+
+ KERNEL_UNLOCK_ONE(NULL);
+ mutex_exit(softnet_lock);
}

static void
@@ -303,15 +309,18 @@ mld_stop_listening(struct in6_multi *in6
{
struct in6_addr allnode, allrouter;

+ mutex_enter(softnet_lock);
+ KERNEL_LOCK(1, NULL);
+
allnode = in6addr_linklocal_allnodes;
if (in6_setscope(&allnode, in6m->in6m_ifp, NULL)) {
/* XXX: this should not happen! */
- return;
+ goto out;
}
allrouter = in6addr_linklocal_allrouters;
if (in6_setscope(&allrouter, in6m->in6m_ifp, NULL)) {
/* XXX impossible */
- return;
+ goto out;
}

if (in6m->in6m_state == MLD_IREPORTEDLAST &&
@@ -320,6 +329,10 @@ mld_stop_listening(struct in6_multi *in6
IPV6_ADDR_SCOPE_INTFACELOCAL) {
mld_sendpkt(in6m, MLD_LISTENER_DONE, &allrouter);
}
+
+ out:
+ KERNEL_UNLOCK_ONE(NULL);
+ mutex_exit(softnet_lock);
}

void

Loading...