Manuel Bouyer
2014-04-09 15:11:51 UTC
Hello,
as reported on current-users@ some time ago, I am experiencing mbuf leaks
on a router running a netbsd-6-1 from December. It's not a slow, continuous leak
but a very sudden one:
in regular use netstat -m reports between 800 and 3000 mbufs in use,
and it can run like that for days. But very suddenly, within a few minutes, the
number of mbufs can rise up to the limit (which I raised to 512k mbclusters).
I suspected it was associated with specific traffic, possibly related
to ipf. I suspected this because it seems to have started showing up after
an ipf rules change, which caused ipf to return ICMP errors to NTP scans,
and MBUFTRACE showed that the lost mbufs were in output queues.
The attached patch does 2 things.
The first one is to add KASSERT(KERNEL_LOCKED_P()) at strategic places,
where we're about to touch the output queues. This is because the
output queues are protected by spl() calls only, so the kernel lock
must be held to protect them.
The KASSERT() did fire, so the second one is to add KERNEL_LOCK()/UNLOCK()
where needed. I found that ifp->if_output() was being called without
KERNEL_LOCK() from ipf(4) and carp(4) so far.
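
To illustrate the pattern (this is not the attached patch), here is a minimal
userspace sketch. The KERNEL_LOCK(), KERNEL_UNLOCK_ONE(), KERNEL_LOCKED_P()
and KASSERT() macros below are mock stand-ins built on a simple counter, and
sketch_enqueue() and sketch_send_from_filter() are hypothetical names; the
real kernel primitives and the real call sites in ipf(4) and carp(4) are
more involved.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Mock stand-ins for the kernel primitives (assumption: a plain counter
 * models the recursion depth of the big kernel lock; no real locking).
 */
static int biglock_depth;

#define KERNEL_LOCK(n, l)	(biglock_depth += (n))
#define KERNEL_UNLOCK_ONE(l)	(biglock_depth -= 1)
#define KERNEL_LOCKED_P()	(biglock_depth > 0)
#define KASSERT(e)		assert(e)

/* Reduced versions of the structures, just enough for a packet queue. */
struct mbuf {
	struct mbuf *m_nextpkt;
};

struct ifqueue {
	struct mbuf *ifq_head;
	struct mbuf *ifq_tail;
	int ifq_len;
};

/*
 * Hypothetical enqueue routine: it touches the output queue, so it
 * asserts that the caller holds the kernel lock.
 */
static void
sketch_enqueue(struct ifqueue *ifq, struct mbuf *m)
{
	KASSERT(KERNEL_LOCKED_P());

	m->m_nextpkt = NULL;
	if (ifq->ifq_tail == NULL)
		ifq->ifq_head = m;
	else
		ifq->ifq_tail->m_nextpkt = m;
	ifq->ifq_tail = m;
	ifq->ifq_len++;
}

/*
 * Hypothetical caller that used to reach the queue unlocked: take the
 * kernel lock around the output call and release it afterwards.
 */
static void
sketch_send_from_filter(struct ifqueue *ifq, struct mbuf *m)
{
	KERNEL_LOCK(1, NULL);
	sketch_enqueue(ifq, m);
	KERNEL_UNLOCK_ONE(NULL);
}
```

Calling sketch_enqueue() directly, without the wrapper, trips the assertion;
that is the same way the KASSERT() exposed the unlocked callers here.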
I've now been running with this patch for 2 days without apparent problems,
and the leak hasn't shown up (but it's too soon to say whether the issue is
fixed or not).
So, 2 questions:
- is my analysis right?
- if so, should I commit the patch as is?
--
Manuel Bouyer <***@antioche.eu.org>
NetBSD: 26 years of experience will always make the difference