network-related deadlock
Maxime Villard
2018-01-26 14:20:04 UTC
There appears to be a network deadlock somewhere in the kernel. I've narrowed
the issue down to the following situation: if you have a NetBSD vm on
VirtualBox, with one CPU, and one enabled network card that is not attached
(Settings->Network->Adapter_1->Attached_To = Not attached), the kernel freezes
~ten seconds after booting.

I can log in, type a few commands, and then the keyboard stops responding
and the system deadlocks (no pings either).

I've disassembled %rip, it points to x86_pause(). So there must be a deadlock.
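
For context on that inference: x86_pause() is a thin wrapper around the x86
"pause" instruction, which is normally issued inside busy-wait loops, so a
%rip that stays there suggests a CPU spinning on something that never
becomes available. The toy lock below is only a sketch of that pattern; it
is not NetBSD's mutex code, and toy_spinlock, toy_spin_acquire,
toy_spin_release and cpu_relax are hypothetical names. The spin bound
exists only so the example terminates.

```c
/* Illustrative only: a toy test-and-set spinlock, not NetBSD's mutex code. */
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: issue "pause" on x86, do nothing elsewhere. */
#if defined(__x86_64__) || defined(__i386__)
#define cpu_relax() __asm__ __volatile__("pause")
#else
#define cpu_relax() ((void)0)
#endif

struct toy_spinlock {
	atomic_flag held;	/* set while some owner holds the lock */
};

/*
 * Try to take the lock, giving up after max_spins failed attempts.
 * A real kernel spin loop has no such bound: if the holder can never
 * run again (for example, it was interrupted on the only CPU), the
 * loop never exits and the sampled %rip keeps landing on the pause.
 */
static bool
toy_spin_acquire(struct toy_spinlock *l, unsigned long max_spins)
{
	assert(l != NULL);
	for (unsigned long i = 0; i < max_spins; i++) {
		if (!atomic_flag_test_and_set_explicit(&l->held,
		    memory_order_acquire))
			return true;	/* flag was clear: lock is ours */
		cpu_relax();		/* busy-wait politely, then retry */
	}
	return false;			/* still held after max_spins tries */
}

static void
toy_spin_release(struct toy_spinlock *l)
{
	assert(l != NULL);
	atomic_flag_clear_explicit(&l->held, memory_order_release);
}
```

Acquiring the lock twice from the same context without a release makes the
second call burn all of its spins in cpu_relax() and fail. Without the
bound, that second call would never return.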

If the card settings are switched to "Bridged Adapter" there is no deadlock.

A kernel from December 27 works fine.

Maxime

--
Posted automagically by a mail2news gateway at muc.de e.V.
Please direct questions, flames, donations, etc. to news-***@muc.de
Valery Ushakov
2018-01-26 15:05:12 UTC
Post by Maxime Villard
There appears to be a network deadlock somewhere in the kernel. I've
narrowed the issue down to the following situation: if you have a
NetBSD vm on VirtualBox, with one CPU, and one enabled network card
that is not attached (Settings->Network->Adapter_1->Attached_To =
Not attached), the kernel freezes ~ten seconds after booting.

Don't discount the possibility of bugs in vbox e1000 emulation :).
The kernel still shouldn't lock up in that case, but it would be easy
to miss this on real, well-behaved hardware.

-uwe


Ryota Ozaki
2018-01-29 08:52:59 UTC
Post by Maxime Villard
There appears to be a network deadlock somewhere in the kernel. I've narrowed
the issue down to the following situation: if you have a NetBSD vm on
VirtualBox, with one CPU, and one enabled network card that is not attached
(Settings->Network->Adapter_1->Attached_To = Not attached), the kernel freezes
~ten seconds after booting.
I can log in, type a few commands, and then the keyboard stops responding
and the system deadlocks (no pings either).
I've disassembled %rip, it points to x86_pause(). So there must be a deadlock.
If the card settings are switched to "Bridged Adapter" there is no deadlock.
A kernel from December 27 works fine.

I could reproduce the issue (though I'm not sure it is the same one). I used
VirtualBox 5.2.6 on macOS Sierra. The issue also happened on a two-core
system. It happened with a network adapter in the "Not attached"
configuration, and didn't happen with the "NAT" configuration.

I tried four network adapters: Intel 82540, 82543, 82545, and virtio.
With the 82540, the system hung, but I could enter DDB and got a stack trace:
db{0}> bt/a ffffe4007fbf0860
trace: pid 0 lid 5 at 0xffff800043b2bcc8
breakpoint() at netbsd:breakpoint+0x5
comintr() at netbsd:comintr+0x746
handle_ioapic_edge8() at netbsd:handle_ioapic_edge8+0x66
wm_watchdog() at netbsd:wm_watchdog+0x3c
if_slowtimo() at netbsd:if_slowtimo+0x6d
callout_softclock() at netbsd:callout_softclock+0x41c
softint_dispatch() at netbsd:softint_dispatch+0xd3
DDB lost frame for netbsd:Xsoftintr+0x4f, trying 0xffff800043b2bff0
Xsoftintr() at netbsd:Xsoftintr+0x4f
--- interrupt ---
0:

With the 82543 and 82545, I couldn't enter DDB.

With virtio, the system didn't hang, but dhcpcd was stuck in the virtio driver:
db{0}> bt/a ffffe4007effa660
trace: pid 116 lid 1 at 0xffff800044a8bb30
sleepq_block() at netbsd:sleepq_block+0x97
cv_wait() at netbsd:cv_wait+0xfb
vioif_ctrl_rx() at netbsd:vioif_ctrl_rx+0x1cb
vioif_init() at netbsd:vioif_init+0xf7
vioif_ioctl() at netbsd:vioif_ioctl+0x2f
doifioctl() at netbsd:doifioctl+0x824
sys_ioctl() at netbsd:sys_ioctl+0x101
syscall() at netbsd:syscall+0x1d8
--- syscall (number 54) ---
73339f71a26a:
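
The trace above shows the thread asleep in cv_wait() under vioif_ctrl_rx(),
which would explain why only dhcpcd stalls: a sleeping waiter gives up the
CPU, unlike a spinning one. What exactly it waits for is not visible in the
trace; presumably it is some event the unattached device never delivers.
Below is a user-space sketch of that pattern using POSIX threads, not the
vioif code. toy_completion, toy_wait_done and toy_complete are hypothetical
names, and the timeout is added only so the example can return; an untimed
wait would sleep indefinitely.

```c
/* Illustrative only: a condvar-based completion wait, not driver code. */
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

struct toy_completion {
	pthread_mutex_t lock;
	pthread_cond_t  cv;
	bool            done;	/* set by whoever finishes the request */
};

static void
toy_completion_init(struct toy_completion *c)
{
	assert(c != NULL);
	pthread_mutex_init(&c->lock, NULL);
	pthread_cond_init(&c->cv, NULL);
	c->done = false;
}

/* Sleep until done is set or timeout_ms elapses; true means completed. */
static bool
toy_wait_done(struct toy_completion *c, long timeout_ms)
{
	struct timespec deadline;
	bool ok;

	assert(c != NULL && timeout_ms >= 0);
	/* Default condvars measure absolute deadlines on CLOCK_REALTIME. */
	clock_gettime(CLOCK_REALTIME, &deadline);
	deadline.tv_sec += timeout_ms / 1000;
	deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
	if (deadline.tv_nsec >= 1000000000L) {
		deadline.tv_sec++;
		deadline.tv_nsec -= 1000000000L;
	}

	pthread_mutex_lock(&c->lock);
	while (!c->done) {
		/* Without a deadline this is where the waiter would stay. */
		if (pthread_cond_timedwait(&c->cv, &c->lock,
		    &deadline) == ETIMEDOUT)
			break;
	}
	ok = c->done;
	pthread_mutex_unlock(&c->lock);
	return ok;
}

/* Mark the request finished and wake every waiter. */
static void
toy_complete(struct toy_completion *c)
{
	assert(c != NULL);
	pthread_mutex_lock(&c->lock);
	c->done = true;
	pthread_cond_broadcast(&c->cv);
	pthread_mutex_unlock(&c->lock);
}
```

If nothing ever calls toy_complete(), toy_wait_done() returns false once
the timeout expires. With an untimed wait, the caller would simply remain
asleep, which is the shape of the dhcpcd trace.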

Then I broke the boot loader of my system and made no progress after that...

ozaki-r

Valery Ushakov
2018-01-29 21:13:20 UTC
Post by Maxime Villard
I've disassembled %rip, it points to x86_pause(). So there must be a deadlock.

Forgot to mention: VirtualBox has a built-in debugger (albeit a quirky and
idiosyncratic one). See

https://www.virtualbox.org/manual/ch12.html#ts_debugger


You can use

$ VirtualBox --startvm $vmname --debug-command-line

to start the VM paused.

To get the symbols you can tell the debugger where the image is,
e.g. for i386:

loadimage "/path/to/host/location/of/netbsd" c0100000

You can unpause the VM from the "Machine" menu and examine its state
when it locks up. You may boot the VM normally, create a snapshot and
then switch the network card attachment to "Not attached" to trigger
the problem. Then you can close the VM and select "Poweroff" and tick
"Restore ..." to quickly get back to the snapshot state.

-uwe


Valery Ushakov
2018-01-30 09:47:27 UTC
Post by Maxime Villard
There appears to be a network deadlock somewhere in the kernel. I've
narrowed the issue down to the following situation: if you have a
NetBSD vm on VirtualBox, with one CPU, and one enabled network card
that is not attached (Settings->Network->Adapter_1->Attached_To =
Not attached), the kernel freezes ~ten seconds after booting.

if_wm.c rev 1.562 by knakahara@ seems to have fixed it.

-uwe

