Discussion:
Using multiple interfaces (3+) causes intermittent system freezes on NetBSD 5.1_STABLE and later... [was: NetBSD 5 partly freezing, may be related to 802.1Q]
Greg Troxel
2012-08-15 11:55:26 UTC
Fredrik Pettai <***@nordu.net> writes:

[system with 3 NICs freezes, may be vlan, ??]
I haven't found any suitable PR for this problem, so maybe you or I
should file one?
The sad thing is that we can't get any valuable debugging information,
as the system becomes unresponsive...
I know of two problems, and likely neither is the one you are having:

On four-port wm cards, there is some sort of PCI-PCI bridge (leading to two
dual-port wm chips), and netbsd-5 and netbsd-6 both (as of early 2012) fail
to cope, causing some sort of inscrutable hard lockup. This is not
about having 4 wm interfaces; it's specifically about the quad-port
PCI-E cards with a PCI-PCI bridge and 2 chips, each of which is a
2-function PCI device.

In if_bnx.c, there is incorrect handling of failure to get a
replacement mbuf on receive, leading to the loss of many mbufs. We
have a fix and it's on my todo list to extract it and commit it to
current. I do not know of anyone else hitting this. This is not
related to multiple interfaces, but it is more likely to be triggered
because a) traffic on multiple interfaces increases mbuf pressure and
b) bnx allocates 510 cluster mbufs per interface, even with no traffic,
further increasing mbuf pressure. I am not aware of anyone else hitting this
problem; we've been pushing multiple-interface machines pretty hard.
Fredrik Pettai
2012-08-15 12:15:20 UTC
Post by Greg Troxel
[...]
In if_bnx.c, there is incorrect handling of failure to get a
replacement mbuf on receive, leading to the loss of many mbufs. We
have a fix and it's on my todo list to extract it and commit it to
current. I do not know of anyone else hitting this. This is not
related to multiple interfaces, but it is more likely to be triggered
because a) traffic on multiple interfaces increases mbuf pressure and
b) bnx allocates 510 cluster mbufs per interface, even with no traffic,
further increasing mbuf pressure. I am not aware of anyone else hitting this
problem; we've been pushing multiple-interface machines pretty hard.
Actually, the interfaces our system uses are of the bnx type (a quad-port card).
However, we haven't (yet) seen any problems when using just 1 or 2 interfaces, and this system has been running for at least 2 years.
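For scale, here is what the figure of 510 receive clusters per bnx interface quoted above adds up to on a quad-port card when no traffic is flowing. This assumes the common 2048-byte cluster size (MCLBYTES); the actual cluster size and the system-wide cluster limit depend on the port and kernel configuration.

$$
4 \times 510 = 2040\ \text{clusters}, \qquad
2040 \times 2048\ \text{B} = 4\,177\,920\ \text{B} \approx 3.98\ \text{MiB}
$$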

I would be happy to try out the patch, if possible.
(And it would also be nice if this patch could be pulled up before NetBSD 6.0 RELEASE is tagged.)

/P
Fredrik Pettai
2012-08-16 08:50:37 UTC
Post by Greg Troxel
I will try to dig it out reasonably quickly, but it's non-trivial and I
can't publish the repository it's in.
If your system is running fine with 1 or 2 interfaces on the quad-port
card, you are likely not having the problem the patch fixes. The
symptoms of the bug are that no packets can be received on an interface
(because there are no mbufs on the receive ring, and none free), but the
system mostly works otherwise.
To my surprise, the server became unresponsive again yesterday (and it's currently using just a single interface).
I applied your patch (thanks, by the way) and rebooted the machine. I'll patch up the other interfaces later today to speed up triggering the problem we've been seeing.

Re,
/P
--
Posted automagically by a mail2news gateway at muc.de e.V.
Please direct questions, flames, donations, etc. to news-***@muc.de
Fredrik Pettai
2012-11-21 13:56:54 UTC
I can't find a PR on this issue. Shouldn't we create one?
I've now reported this as PR kern/47229.

I have a machine available, so I can help test any patches (and do whatever debugging can be done). I guess verbose debug output from the driver is needed: given the nature of the problem, there isn't much one can do to debug after the system freezes, since it becomes totally unresponsive.

Re,
/P