matthew green
2019-07-21 05:17:18 UTC
hi folks..
i've been debugging a hang on the rock64. it's fairly easy to
trigger -- send a lot of data at it.
from ddb i would usually see one cpu whose lwp (typically the
idle lwp) had been fast-switched to the softnet lwp, which had
in turn been fast-switched to the softser lwp. it looked like a
kernel lock issue, as kernel_lock was held and at least one
thread was waiting for it, but i couldn't tell what was going on.
i tried enabling NET_MPSAFE (which, beyond the network stack,
also changes the behaviour of awge(4) / dwc_gmac.c). that kernel
ran for a lot longer, but eventually locked up again; this time
something was waiting on rt_lock. again, i couldn't find where
it was held or which context should be releasing it, though i
again wondered about arm's pic_dispatch() being the last place
to lock and unlock kernel_lock. then i realised that even with
NET_MPSAFE, awge(4)'s frontends don't set up MPSAFE interrupts.
with a kernel patched to do that under NET_MPSAFE, i've had over
5 hours of heavy network access without a hang.
i don't know what the underlying issue is here. it could be a
network stack bug, an awge/gmac bug, or an arm or arm64 bug.
does anyone have a clue where to investigate next? alternatively,
how far off is making NET_MPSAFE the default? :)
.mrg.
--
Posted automagically by a mail2news gateway at muc.de e.V.
Please direct questions, flames, donations, etc. to news-***@muc.de