Discussion:
[Greg Troxel] Dell R610 lockups?
Greg Troxel
2010-08-25 00:47:18 UTC
It's not clear if this is an amd64 issue or a network code issue, so
pointing it out here.
Christos Zoulas
2010-08-25 12:20:21 UTC
Some colleagues at BBN have several Dell R610s, purchased fairly
recently. They've been experiencing total hangs, from which they can
recover only with the power button (hold 4s). Normally ctrl-alt-esc works
to get into DDB, but after the hang ctrl-alt-esc does nothing. The Dell
boxes are pretty normal, with single SATA disks, 4 on-board bnx interfaces
and a 4-port wm card.
The lockup happens with netbsd-5 (RC3, I think) for amd64, installed from
the install CD and running from disk. They haven't tried i386.
It's not exactly clear what triggers the hang, but it seems to be
network traffic, with ping (sourcing and sinking) being worse than
forwarding. A fairly reliable way to hose the machines is to hook up a
cat5 between two of them, ifconfig some addresses, and ping -f across
that. RTT is an impressive 40us, but a lockup usually happens within 20
minutes. Using a switch seems to make the hang less likely. So I
wondered about a locking error triggered by tx complete interrupts
arriving in the middle of processing the next received packet.
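
For anyone wanting to repeat the back-to-back test, here is a rough sketch of a driver script. It is not what the BBN machines ran; the peer address, the log file name, and the once-per-second heartbeat are all assumptions added for illustration. The idea is to run ping -f against the directly cabled peer while syncing a timestamp to disk each second, so that after a power cycle the last logged line gives an approximate time of the hang.

```python
import os
import subprocess
import time


def flood_ping_cmd(peer):
    """Build the argv for a flood ping to `peer` (ping -f needs root)."""
    return ["ping", "-f", peer]


def run_flood(peer, seconds, log_path="heartbeat.log"):
    """Flood-ping `peer` for up to `seconds`, logging a timestamp each second.

    After a hard hang and power cycle, the last line of `log_path`
    approximates when the machine stopped making progress.
    `log_path` is a hypothetical file name chosen for this sketch.
    """
    proc = subprocess.Popen(
        flood_ping_cmd(peer),
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    deadline = time.monotonic() + seconds
    try:
        with open(log_path, "a") as log:
            while time.monotonic() < deadline and proc.poll() is None:
                log.write(f"{int(time.time())}\n")
                log.flush()
                os.fsync(log.fileno())  # push the line to disk before a possible hang
                time.sleep(1)
    finally:
        # Stop the flood if it is still running, then reap the child.
        if proc.poll() is None:
            proc.terminate()
        proc.wait()
```

Run as root on one of the two machines with something like run_flood("10.0.0.2", 4 * 3600), where 10.0.0.2 stands in for whatever address was given to the peer with ifconfig.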
I suggested using LOCKDEBUG (and DIAGNOSTIC and DEBUG). That runs ok
until it hangs :-) Can one enter DDB if the big kernel lock is taken and
not released?
The machines were updated to the latest Dell BIOS; apparently there's a
Dell advisory about a Xeon firmware bug that results in Windows
bluescreens.
Other than the lockup the machines are acting fine. So I wonder if the
machines are buggy, or if there's a locking bug.
I have not seen any postings about trouble with this kind of lockup in
NetBSD, though there are some posts about trouble with Linux on these
Dell machines.
If someone has two beefy machines with bnx or wm and has a few minutes
to connect them with a cable, I'd be very curious to see what happens
after ping -f for several hours.
Has anyone else had similar trouble? Any clues about what to try?
Boot Linux on both and try the same test.

christos

