Discussion:
checking m->m_pkthdr.csum_flags in ip_output()
Takahiro Kambe
2008-04-15 11:32:16 UTC
Hi,

Today, a NetBSD 4.0_STABLE machine panicked in ip_output() while
forwarding an IPv4 multicast packet. The packet was a short (36-octet)
UDP/IP packet.

#0 0xc0424c95 in cpu_reboot (howto=0x0, bootstr=0x0)
at ../../../../arch/i386/i386/machdep.c:896
#1 0xc039f588 in panic (
fmt=0xc062ba50 "ip_output: conflicting checksum offload flags: %d")
at ../../../../kern/subr_prf.c:246
#2 0xc012908b in ip_output (m0=0xc1f84600)
at ../../../../netinet/ip_output.c:246
#3 0xc0124332 in tbf_send_packet (vifp=0xc06d97d0, m=0x0)
at ../../../../netinet/ip_mroute.c:2222
#4 0xc01251f4 in ip_mdq (m=0xc1a33100, ifp=<value optimized out>,
rt=0xc1af3d00) at ../../../../netinet/ip_mroute.c:1885
#5 0xc0122582 in ip_input (m=0xc1a33100) at ../../../../netinet/ip_input.c:780
#6 0xc0122884 in ipintr () at ../../../../netinet/ip_input.c:471
#7 0xc010bcb5 in Xsoftnet ()

The kernel has the DIAGNOSTIC option enabled, and the corresponding code
fragment in ip_output() is:

#ifdef DIAGNOSTIC
    if ((m->m_flags & M_PKTHDR) == 0)
        panic("ip_output: no HDR");

    if ((m->m_pkthdr.csum_flags & (M_CSUM_TCPv6|M_CSUM_UDPv6)) != 0) {
        panic("ip_output: IPv6 checksum offload flags: %d",
            m->m_pkthdr.csum_flags);
    }

    if ((m->m_pkthdr.csum_flags & (M_CSUM_TCPv4|M_CSUM_UDPv4)) ==
        (M_CSUM_TCPv4|M_CSUM_UDPv4)) {
        panic("ip_output: conflicting checksum offload flags: %d",
            m->m_pkthdr.csum_flags);
    }
#endif

It seems that this diagnostic code treats M_CSUM_TCPv4 and
M_CSUM_UDPv4 as mutually exclusive.

I got a crash dump and examined it with gdb; the mbuf above contains these values:

$1 = {m_hdr = {mh_next = 0x0, mh_nextpkt = 0x0, mh_data = 0xcb336810 "E",
mh_owner = 0x0, mh_len = 0x24, mh_flags = 0x9000203,
mh_paddr = 0x3942a100, mh_type = 0x1}, M_dat = {MH = {MH_pkthdr = {
rcvif = 0xc1b5e03c, tags = {slh_first = 0x0}, len = 0x24,
csum_flags = 0x8000004b, csum_data = 0x4210, segsz = 0x0}, MH_dat = {
...

I don't know exactly where csum_flags was set to 0x8000004b, which decodes as:

M_CSUM_NO_PSEUDOHDR | M_CSUM_IPv4 | M_CSUM_DATA | M_CSUM_UDPv4 |M_CSUM_TCPv4
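
The decoding above can be reproduced mechanically. The following is a minimal sketch, not kernel code: the bit values are assumed to match NetBSD's <sys/mbuf.h> of that era (they are at least consistent with the decode of 0x8000004b given here) and should be verified against your own tree. The helper name csum_flags_decode() and the table are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* ASSUMPTION: bit values as in NetBSD <sys/mbuf.h> circa 4.0; verify locally. */
#define M_CSUM_TCPv4        0x00000001u
#define M_CSUM_UDPv4        0x00000002u
#define M_CSUM_DATA         0x00000008u
#define M_CSUM_IPv4         0x00000040u
#define M_CSUM_NO_PSEUDOHDR 0x80000000u

struct flag_name {
    uint32_t bit;
    const char *name;
};

/* Only the five flags named in the message are listed (hypothetical table). */
static const struct flag_name csum_names[] = {
    { M_CSUM_NO_PSEUDOHDR, "M_CSUM_NO_PSEUDOHDR" },
    { M_CSUM_IPv4,         "M_CSUM_IPv4" },
    { M_CSUM_DATA,         "M_CSUM_DATA" },
    { M_CSUM_UDPv4,        "M_CSUM_UDPv4" },
    { M_CSUM_TCPv4,        "M_CSUM_TCPv4" },
};

/*
 * Write the names of the listed bits that are set in 'flags' into 'buf',
 * separated by " | ".  Returns the bits that were not decoded (unknown to
 * the table, or skipped because 'buf' was too small).
 */
static uint32_t
csum_flags_decode(uint32_t flags, char *buf, size_t buflen)
{
    size_t i, used = 0;

    if (buflen > 0)
        buf[0] = '\0';
    for (i = 0; i < sizeof(csum_names) / sizeof(csum_names[0]); i++) {
        int n;

        if ((flags & csum_names[i].bit) == 0)
            continue;
        n = snprintf(buf + used, buflen - used, "%s%s",
            used ? " | " : "", csum_names[i].name);
        if (n < 0 || (size_t)n >= buflen - used)
            break;              /* output would be truncated; stop here */
        used += (size_t)n;
        flags &= ~csum_names[i].bit;
    }
    return flags;
}
```

Calling csum_flags_decode(0x8000004b, buf, sizeof(buf)) with a buffer of 128 bytes leaves no undecoded bits, which matches the five-flag decode listed above.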

The packet was received by bge0, and if_bge.c has this code
fragment in bge_rxeof():

    /*
     * Rx transport checksum-offload may also
     * have bugs with packets which, when transmitted,
     * were `runts' requiring padding.
     */
    if (cur_rx->bge_flags & BGE_RXBDFLAG_TCP_UDP_CSUM &&
        (/* (sc->_bge_quirks & BGE_QUIRK_SHORT_CKSUM_BUG) == 0 ||*/
         m->m_pkthdr.len >= ETHER_MIN_NOPAD)) {
        m->m_pkthdr.csum_data =
            cur_rx->bge_tcp_udp_csum;
        m->m_pkthdr.csum_flags |=
            (M_CSUM_TCPv4|M_CSUM_UDPv4|
             M_CSUM_DATA|M_CSUM_NO_PSEUDOHDR);
    }


But the packet seems too short for csum_flags to have been set here.
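
To make the length test in that fragment concrete, here is a small sketch of just the length part of the condition. It assumes ETHER_MIN_NOPAD is ETHER_MIN_LEN minus ETHER_CRC_LEN, that is 64 - 4 = 60; the helper rx_len_passes_runt_guard() is hypothetical. Note also that the length tested in bge_rxeof() is the length of the received frame, with the Ethernet header and any sender-side padding still present, so it need not equal the 36-octet length seen later in ip_output().

```c
#include <stdbool.h>
#include <stddef.h>

/* ASSUMPTION: values as commonly defined for Ethernet; verify in your tree. */
#define ETHER_MIN_LEN   64      /* minimum frame length, CRC included */
#define ETHER_CRC_LEN   4       /* length of the frame check sequence */
#define ETHER_MIN_NOPAD (ETHER_MIN_LEN - ETHER_CRC_LEN)    /* 60 */

/*
 * Hypothetical helper: mirrors only the length test of the bge_rxeof()
 * condition.  'frame_len' is the received frame length without the CRC.
 */
static bool
rx_len_passes_runt_guard(size_t frame_len)
{
    return frame_len >= ETHER_MIN_NOPAD;
}
```

A length of 36 fails this test, while a frame padded to 60 octets passes it, so which length the driver actually saw is worth checking before ruling this code path out.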


My questions are:

- Is diagnostic code in ip_output() correct?
- How can I investigate the origin of this problem?

I can still access the crash dump, but the machine is running at my
customer's site with mrouted stopped.


Thanks in advance for your advice.
--
Takahiro Kambe <***@back-street.net>

--
Posted automagically by a mail2news gateway at muc.de e.V.
Please direct questions, flames, donations, etc. to news-***@muc.de
Takahiro Kambe
2008-05-04 03:33:05 UTC
Hi,

In message <***@back-street.net>
on Tue, 15 Apr 2008 20:32:16 +0900 (JST),
Post by Takahiro Kambe
Today, a NetBSD 4.0_STABLE machine panicked in ip_output() while
forwarding an IPv4 multicast packet. The packet was a short (36-octet)
UDP/IP packet.
...
Post by Takahiro Kambe
The kernel has DIAGNOSTIC option enabled and corresponding code
fragments in ip_output().
#ifdef DIAGNOSTIC
if ((m->m_flags & M_PKTHDR) == 0)
panic("ip_output: no HDR");
if ((m->m_pkthdr.csum_flags & (M_CSUM_TCPv6|M_CSUM_UDPv6)) != 0) {
panic("ip_output: IPv6 checksum offload flags: %d",
m->m_pkthdr.csum_flags);
}
if ((m->m_pkthdr.csum_flags & (M_CSUM_TCPv4|M_CSUM_UDPv4)) ==
(M_CSUM_TCPv4|M_CSUM_UDPv4)) {
panic("ip_output: conflicting checksum offload flags: %d",
m->m_pkthdr.csum_flags);
}
#endif
It seems that this diagnostic code treats M_CSUM_TCPv4 and
M_CSUM_UDPv4 as mutually exclusive.
I confirmed that bge(4) sets both M_CSUM_TCPv4 and M_CSUM_UDPv4 in
m->m_pkthdr.csum_flags for ordinary unicast IP packets.

I don't know whether this is a bug in bge(4) or whether the DIAGNOSTIC
check above is wrong or obsolete.
--
Takahiro Kambe <***@back-street.net>

Thor Lancelot Simon
2008-05-04 16:53:07 UTC
Post by Takahiro Kambe
Hi,
on Tue, 15 Apr 2008 20:32:16 +0900 (JST),
Post by Takahiro Kambe
Today, a NetBSD 4.0_STABLE machine panicked in ip_output() while
forwarding an IPv4 multicast packet. The packet was a short (36-octet)
UDP/IP packet.
This is a bug in the multicast forwarding code (it is related to a bug
I have been investigating in ipf, pf, and bridge). Look at ip_forward()
and ip_flow(): on NetBSD, if you forward a packet using the same mbuf
in which it was received, you must set csum_flags to 0 before handing
that packet to ip_output.

This is because the same flags were used for "hardware checked checksum
on receive" and "hardware should insert checksum on transmit" which, in
my opinion, was a mistake. The existing M_CSUM_DATA and M_CSUM_NO_PSEUDOHDR
would have been sufficient for receive, leaving M_CSUM_TCPv4 (etc.) for
transmit use.

As it is now, if you receive such a packet and forward it without looking
inside the UDP or TCP layer, you can in fact cause the hardware to stamp
a *good* checksum on a packet which had a *bad* one when received, because
you won't check for M_CSUM_TCPUDP_BAD, but will send the packet into
ip_output() with M_CSUM_TCPv4 or M_CSUM_UDPv4 set, because that's how it
was received.

Anyway, all code forwarding packets on NetBSD must explicitly set
csum_flags to 0 after receive because of this.
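
The rule above can be illustrated with a toy model. This is only a sketch under stated assumptions: struct toy_pkthdr stands in for the checksum fields of a real mbuf packet header, the flag values are assumed to match NetBSD's <sys/mbuf.h>, and the toy_* functions are hypothetical names, not kernel interfaces.

```c
#include <stdbool.h>
#include <stdint.h>

/* ASSUMPTION: bit values as in NetBSD <sys/mbuf.h>; verify locally. */
#define M_CSUM_TCPv4        0x00000001u
#define M_CSUM_UDPv4        0x00000002u
#define M_CSUM_DATA         0x00000008u
#define M_CSUM_NO_PSEUDOHDR 0x80000000u

/* Toy stand-in for the checksum fields of an mbuf packet header. */
struct toy_pkthdr {
    uint32_t csum_flags;
    uint32_t csum_data;
};

/* Receive side: a driver reports "hardware summed the TCP/UDP payload". */
static void
toy_rx_mark(struct toy_pkthdr *ph, uint32_t hw_sum)
{
    ph->csum_data = hw_sum;
    ph->csum_flags |= M_CSUM_TCPv4 | M_CSUM_UDPv4 |
        M_CSUM_DATA | M_CSUM_NO_PSEUDOHDR;
}

/*
 * Transmit side: models the DIAGNOSTIC test quoted earlier in the thread.
 * Returns false where the real kernel would panic with
 * "conflicting checksum offload flags".
 */
static bool
toy_output_flags_ok(const struct toy_pkthdr *ph)
{
    const uint32_t both = M_CSUM_TCPv4 | M_CSUM_UDPv4;

    return (ph->csum_flags & both) != both;
}

/* Forwarding path: discard the receive-side meaning before output. */
static void
toy_forward_prepare(struct toy_pkthdr *ph)
{
    ph->csum_flags = 0;
}
```

In this model a header marked by toy_rx_mark() fails toy_output_flags_ok() until toy_forward_prepare() has been called, which is the situation described for a forwarding path that reuses the received mbuf.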

Thor

Takahiro Kambe
2008-05-08 05:14:19 UTC
Hi,

In message <***@panix.com>
on Sun, 4 May 2008 12:53:07 -0400,
Post by Thor Lancelot Simon
Post by Takahiro Kambe
Today, a NetBSD 4.0_STABLE machine panicked in ip_output() while
forwarding an IPv4 multicast packet. The packet was a short (36-octet)
UDP/IP packet.
This is a bug in the multicast forwarding code (it is related to a bug
I have been investigating in ipf, pf, and bridge). Look at ip_forward()
and ip_flow(): on NetBSD, if you forward a packet using the same mbuf
in which it was received, you must set csum_flags to 0 before handing
that packet to ip_output.
Thanks very much for your explanation.
Post by Thor Lancelot Simon
This is because the same flags were used for "hardware checked checksum
on receive" and "hardware should insert checksum on transmit" which, in
my opinion, was a mistake. The existing M_CSUM_DATA and M_CSUM_NO_PSEUDOHDR
would have been sufficient for receive, leaving M_CSUM_TCPv4 (etc.) for
transmit use.
I agree with your opinion.
Post by Thor Lancelot Simon
Anyway, all code forwarding packets on NetBSD must explicitly set
csum_flags to 0 after receive because of this.
Though I don't understand the code in ip_mroute.c very well, the attached
patch might make things better. (Not tested, since I don't have a
test environment now; I hope to test it this month.)
--
Takahiro Kambe <***@back-street.net>

Index: sys/netinet/ip_mroute.c
===================================================================
RCS file: /cvs/src-4/sys/netinet/ip_mroute.c,v
retrieving revision 1.1.1.1
diff -u -p -d -d -u -p -r1.1.1.1 ip_mroute.c
--- sys/netinet/ip_mroute.c 7 Feb 2007 01:50:26 -0000 1.1.1.1
+++ sys/netinet/ip_mroute.c 6 May 2008 04:51:24 -0000
@@ -1425,6 +1425,11 @@ ip_mforward(struct mbuf *m, struct ifnet
return (1);
}

+ /*
+ * Clear any in-bound checksum flags for this packet.
+ */
+ m->m_pkthdr.csum_flags = 0;
+
#ifdef RSVP_ISI
if (imo && ((vifi = imo->imo_multicast_vif) < numvifs)) {
if (ip->ip_ttl < MAXTTL)


Jason Thorpe
2008-05-08 06:24:36 UTC
Post by Takahiro Kambe
Post by Thor Lancelot Simon
Anyway, all code forwarding packets on NetBSD must explicitly set
csum_flags to 0 after receive because of this.
Though I don't understand the code in ip_mroute.c very well, the attached
patch might make things better. (Not tested, since I don't have a
test environment now; I hope to test it this month.)
Your patch looks correct. Please check it in.
Post by Takahiro Kambe
--
Index: sys/netinet/ip_mroute.c
===================================================================
RCS file: /cvs/src-4/sys/netinet/ip_mroute.c,v
retrieving revision 1.1.1.1
diff -u -p -d -d -u -p -r1.1.1.1 ip_mroute.c
--- sys/netinet/ip_mroute.c 7 Feb 2007 01:50:26 -0000 1.1.1.1
+++ sys/netinet/ip_mroute.c 6 May 2008 04:51:24 -0000
@@ -1425,6 +1425,11 @@ ip_mforward(struct mbuf *m, struct ifnet
return (1);
}
+ /*
+ * Clear any in-bound checksum flags for this packet.
+ */
+ m->m_pkthdr.csum_flags = 0;
+
#ifdef RSVP_ISI
if (imo && ((vifi = imo->imo_multicast_vif) < numvifs)) {
if (ip->ip_ttl < MAXTTL)
-- thorpej


Takahiro Kambe
2008-05-08 08:02:51 UTC
In message <8B68CB47-1D64-48A0-A8C1-***@shagadelic.org>
on Wed, 7 May 2008 23:24:36 -0700,
Post by Jason Thorpe
Post by Takahiro Kambe
Post by Thor Lancelot Simon
Anyway, all code forwarding packets on NetBSD must explicitly set
csum_flags to 0 after receive because of this.
Though I don't understand the code in ip_mroute.c very well, the attached
patch might make things better. (Not tested, since I don't have a
test environment now; I hope to test it this month.)
Your patch looks correct. Please check it in.
Done. I'll request a pull-up to the netbsd-4 branch; this problem
doesn't exist in the netbsd-3 branch and earlier.
--
Takahiro Kambe <***@back-street.net>

Patrick Welche
2008-05-16 14:05:03 UTC
Post by Takahiro Kambe
Hi,
on Tue, 15 Apr 2008 20:32:16 +0900 (JST),
Post by Takahiro Kambe
Today, a NetBSD 4.0_STABLE machine panicked in ip_output() while
forwarding an IPv4 multicast packet. The packet was a short (36-octet)
UDP/IP packet.
...
Post by Takahiro Kambe
The kernel has DIAGNOSTIC option enabled and corresponding code
fragments in ip_output().
#ifdef DIAGNOSTIC
if ((m->m_flags & M_PKTHDR) == 0)
panic("ip_output: no HDR");
if ((m->m_pkthdr.csum_flags & (M_CSUM_TCPv6|M_CSUM_UDPv6)) != 0) {
panic("ip_output: IPv6 checksum offload flags: %d",
m->m_pkthdr.csum_flags);
}
if ((m->m_pkthdr.csum_flags & (M_CSUM_TCPv4|M_CSUM_UDPv4)) ==
(M_CSUM_TCPv4|M_CSUM_UDPv4)) {
panic("ip_output: conflicting checksum offload flags: %d",
m->m_pkthdr.csum_flags);
}
#endif
It seems that this diagnostic code treats M_CSUM_TCPv4 and
M_CSUM_UDPv4 as mutually exclusive.
I confirmed that bge(4) sets both M_CSUM_TCPv4 and M_CSUM_UDPv4 in
m->m_pkthdr.csum_flags for ordinary unicast IP packets.
I don't know whether this is a bug in bge(4) or whether the DIAGNOSTIC
check above is wrong or obsolete.
I don't know whether this is relevant, but a 4.99.60/i386 box with bge gave:

uvm_fault(0xcdfae574, 0, 1) -> 0xe
kernel: supervisor trap page fault, code=0
Stopped in pid 22172.1 (dhcpd) at 0xc03a6f25: movl 0x14(%eax),%eax
db{1}> bt/l
m_length(0,0,cd985abc,c0377c4f,5) at 0xc03a6f25
bpf_mtap(c2d822c0,0,cd985aec,c03a8f5d,cd985a05) at netbsd:bpf_mtap+0x17
bge_start(c2da7004,178,9000003,3,0) at netbsd:bge_start+0x10c
ifq_enqueue(c2da7004,c3111300,c2da7004,2,cdfae574) at netbsd:ifq_enqueue+0x13f
ether_output(c2da7004,c3111300,c06077a0,0,c06077a0) at netbsd:ether_output+0x71e
bpf_write(cdc82300,cdc82300,cd985c60,d5bf99c0,1) at netbsd:bpf_write+0x126
do_filewritev(7,bfbfc668,3,cdc82300,1) at netbsd:do_filewritev+0x270
sys_writev(cdfac900,cd985d04,cd985cfc,cd985d10,c03d0d79) at netbsd:sys_writev+0x3f
syscall(cd985d48,b3,ab,bfbf001f,bfbf001f) at netbsd:syscall+0x141

yesterday...

Cheers,

Patrick

David Young
2008-05-16 22:14:10 UTC
Post by Patrick Welche
Post by Takahiro Kambe
Hi,
on Tue, 15 Apr 2008 20:32:16 +0900 (JST),
Post by Takahiro Kambe
Today, a NetBSD 4.0_STABLE machine panicked in ip_output() while
forwarding an IPv4 multicast packet. The packet was a short (36-octet)
UDP/IP packet.
...
Post by Takahiro Kambe
The kernel has DIAGNOSTIC option enabled and corresponding code
fragments in ip_output().
#ifdef DIAGNOSTIC
if ((m->m_flags & M_PKTHDR) == 0)
panic("ip_output: no HDR");
if ((m->m_pkthdr.csum_flags & (M_CSUM_TCPv6|M_CSUM_UDPv6)) != 0) {
panic("ip_output: IPv6 checksum offload flags: %d",
m->m_pkthdr.csum_flags);
}
if ((m->m_pkthdr.csum_flags & (M_CSUM_TCPv4|M_CSUM_UDPv4)) ==
(M_CSUM_TCPv4|M_CSUM_UDPv4)) {
panic("ip_output: conflicting checksum offload flags: %d",
m->m_pkthdr.csum_flags);
}
#endif
It seems that this diagnostic code treats M_CSUM_TCPv4 and
M_CSUM_UDPv4 as mutually exclusive.
I confirmed that bge(4) sets both M_CSUM_TCPv4 and M_CSUM_UDPv4 in
m->m_pkthdr.csum_flags for ordinary unicast IP packets.
I don't know whether this is a bug in bge(4) or whether the DIAGNOSTIC
check above is wrong or obsolete.
uvm_fault(0xcdfae574, 0, 1) -> 0xe
kernel: supervisor trap page fault, code=0
Stopped in pid 22172.1 (dhcpd) at 0xc03a6f25: movl 0x14(%eax),%eax
db{1}> bt/l
m_length(0,0,cd985abc,c0377c4f,5) at 0xc03a6f25
bpf_mtap(c2d822c0,0,cd985aec,c03a8f5d,cd985a05) at netbsd:bpf_mtap+0x17
bge_start(c2da7004,178,9000003,3,0) at netbsd:bge_start+0x10c
ifq_enqueue(c2da7004,c3111300,c2da7004,2,cdfae574) at netbsd:ifq_enqueue+0x13f
ether_output(c2da7004,c3111300,c06077a0,0,c06077a0) at netbsd:ether_output+0x71e
bpf_write(cdc82300,cdc82300,cd985c60,d5bf99c0,1) at netbsd:bpf_write+0x126
do_filewritev(7,bfbfc668,3,cdc82300,1) at netbsd:do_filewritev+0x270
sys_writev(cdfac900,cd985d04,cd985cfc,cd985d10,c03d0d79) at netbsd:sys_writev+0x3f
syscall(cd985d48,b3,ab,bfbf001f,bfbf001f) at netbsd:syscall+0x141
yesterday...
It looks like IFQ_POLL()/IFQ_DEQUEUE() did not honor their contract. In
order to reach the bpf_mtap() statement, IFQ_POLL() had to return m_head
!= NULL. According to altq(9), "It is guaranteed that IFQ_DEQUEUE()
immediately after IFQ_POLL() returns the same packet."
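
The contract quoted from altq(9) can be pictured with a toy FIFO. This is only an illustrative sketch: struct toy_ifq and the toy_* functions are hypothetical stand-ins, not the real IFQ_POLL()/IFQ_DEQUEUE() macros, and toy_start() merely imitates the poll-then-dequeue shape of a driver start routine.

```c
#include <stddef.h>

/* Toy packet and FIFO queue (hypothetical stand-ins for mbuf and ifqueue). */
struct toy_pkt {
    int id;
    struct toy_pkt *next;
};

struct toy_ifq {
    struct toy_pkt *head;
    struct toy_pkt *tail;
};

static void
toy_enqueue(struct toy_ifq *q, struct toy_pkt *p)
{
    p->next = NULL;
    if (q->tail != NULL)
        q->tail->next = p;
    else
        q->head = p;
    q->tail = p;
}

/* Peek at the next packet without removing it. */
static struct toy_pkt *
toy_poll(const struct toy_ifq *q)
{
    return q->head;
}

/* Remove and return the next packet, or NULL if the queue is empty. */
static struct toy_pkt *
toy_dequeue(struct toy_ifq *q)
{
    struct toy_pkt *p = q->head;

    if (p != NULL) {
        q->head = p->next;
        if (q->head == NULL)
            q->tail = NULL;
        p->next = NULL;
    }
    return p;
}

/*
 * Driver-style loop: poll, then dequeue.  Returns the number of packets
 * taken, or -1 if a dequeue did not return the packet that was just polled.
 */
static int
toy_start(struct toy_ifq *q)
{
    int sent = 0;

    for (;;) {
        struct toy_pkt *peek = toy_poll(q);
        struct toy_pkt *m;

        if (peek == NULL)
            break;
        m = toy_dequeue(q);
        if (m != peek)
            return -1;          /* contract violated */
        sent++;
    }
    return sent;
}
```

As long as nothing else touches the queue between toy_poll() and toy_dequeue(), toy_start() never returns -1; the backtrace suggests that this assumption did not hold in the real system.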

Are you using ALTQ?

Dave
--
David Young OJC Technologies
***@ojctech.com Urbana, IL * (217) 278-3933 ext 24

Patrick Welche
2008-05-17 12:37:58 UTC
Post by David Young
Post by Patrick Welche
Post by Takahiro Kambe
Hi,
on Tue, 15 Apr 2008 20:32:16 +0900 (JST),
Post by Takahiro Kambe
Today, a NetBSD 4.0_STABLE machine panicked in ip_output() while
forwarding an IPv4 multicast packet. The packet was a short (36-octet)
UDP/IP packet.
...
Post by Takahiro Kambe
The kernel has DIAGNOSTIC option enabled and corresponding code
fragments in ip_output().
#ifdef DIAGNOSTIC
if ((m->m_flags & M_PKTHDR) == 0)
panic("ip_output: no HDR");
if ((m->m_pkthdr.csum_flags & (M_CSUM_TCPv6|M_CSUM_UDPv6)) != 0) {
panic("ip_output: IPv6 checksum offload flags: %d",
m->m_pkthdr.csum_flags);
}
if ((m->m_pkthdr.csum_flags & (M_CSUM_TCPv4|M_CSUM_UDPv4)) ==
(M_CSUM_TCPv4|M_CSUM_UDPv4)) {
panic("ip_output: conflicting checksum offload flags: %d",
m->m_pkthdr.csum_flags);
}
#endif
It seems that this diagnostic code treats M_CSUM_TCPv4 and
M_CSUM_UDPv4 as mutually exclusive.
I confirmed that bge(4) sets both M_CSUM_TCPv4 and M_CSUM_UDPv4 in
m->m_pkthdr.csum_flags for ordinary unicast IP packets.
I don't know whether this is a bug in bge(4) or whether the DIAGNOSTIC
check above is wrong or obsolete.
uvm_fault(0xcdfae574, 0, 1) -> 0xe
kernel: supervisor trap page fault, code=0
Stopped in pid 22172.1 (dhcpd) at 0xc03a6f25: movl 0x14(%eax),%eax
db{1}> bt/l
m_length(0,0,cd985abc,c0377c4f,5) at 0xc03a6f25
bpf_mtap(c2d822c0,0,cd985aec,c03a8f5d,cd985a05) at netbsd:bpf_mtap+0x17
bge_start(c2da7004,178,9000003,3,0) at netbsd:bge_start+0x10c
ifq_enqueue(c2da7004,c3111300,c2da7004,2,cdfae574) at netbsd:ifq_enqueue+0x13f
ether_output(c2da7004,c3111300,c06077a0,0,c06077a0) at netbsd:ether_output+0x71e
bpf_write(cdc82300,cdc82300,cd985c60,d5bf99c0,1) at netbsd:bpf_write+0x126
do_filewritev(7,bfbfc668,3,cdc82300,1) at netbsd:do_filewritev+0x270
sys_writev(cdfac900,cd985d04,cd985cfc,cd985d10,c03d0d79) at netbsd:sys_writev+0x3f
syscall(cd985d48,b3,ab,bfbf001f,bfbf001f) at netbsd:syscall+0x141
yesterday...
It looks like IFQ_POLL()/IFQ_DEQUEUE() did not honor their contract. In
order to reach the bpf_mtap() statement, IFQ_POLL() had to return m_head
!= NULL. According to altq(9), "It is guaranteed that IFQ_DEQUEUE()
immediately after IFQ_POLL() returns the same packet."
Are you using ALTQ?
No altq - also this was a kernel from 21st April - I hope I didn't
hijack Takahiro's thread - just noticed that they were both with
bge.

Cheers,

Patrick

David Young
2008-05-19 17:36:18 UTC
Post by Patrick Welche
Post by David Young
Post by Patrick Welche
uvm_fault(0xcdfae574, 0, 1) -> 0xe
kernel: supervisor trap page fault, code=0
Stopped in pid 22172.1 (dhcpd) at 0xc03a6f25: movl 0x14(%eax),%eax
db{1}> bt/l
m_length(0,0,cd985abc,c0377c4f,5) at 0xc03a6f25
bpf_mtap(c2d822c0,0,cd985aec,c03a8f5d,cd985a05) at netbsd:bpf_mtap+0x17
bge_start(c2da7004,178,9000003,3,0) at netbsd:bge_start+0x10c
ifq_enqueue(c2da7004,c3111300,c2da7004,2,cdfae574) at netbsd:ifq_enqueue+0x13f
ether_output(c2da7004,c3111300,c06077a0,0,c06077a0) at netbsd:ether_output+0x71e
bpf_write(cdc82300,cdc82300,cd985c60,d5bf99c0,1) at netbsd:bpf_write+0x126
do_filewritev(7,bfbfc668,3,cdc82300,1) at netbsd:do_filewritev+0x270
sys_writev(cdfac900,cd985d04,cd985cfc,cd985d10,c03d0d79) at netbsd:sys_writev+0x3f
syscall(cd985d48,b3,ab,bfbf001f,bfbf001f) at netbsd:syscall+0x141
yesterday...
It looks like IFQ_POLL()/IFQ_DEQUEUE() did not honor their contract. In
order to reach the bpf_mtap() statement, IFQ_POLL() had to return m_head
!= NULL. According to altq(9), "It is guaranteed that IFQ_DEQUEUE()
immediately after IFQ_POLL() returns the same packet."
Are you using ALTQ?
No altq - also this was a kernel from 21st April - I hope I didn't
hijack Takahiro's thread - just noticed that they were both with
bge.
Is this an SMP box? I don't know how this could happen unless a second
thread or an interrupt handler ran bge_start() simultaneously with the
thread where the fault occurred. Looking at the bge(4) code, I don't
see how that could happen.
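
The kind of interleaving being hypothesized here can be replayed deterministically in a single thread. The sketch below is purely illustrative and makes no claim about what actually happened in bge(4): the one-slot queue and the toy_* names are hypothetical, and contexts "A" and "B" are simulated by ordering the calls by hand.

```c
#include <stddef.h>

/* Toy packet and one-slot queue (hypothetical, for illustration only). */
struct toy_pkt {
    int len;
};

struct toy_ifq {
    struct toy_pkt *slot;
};

/* Peek without removing. */
static struct toy_pkt *
toy_poll(const struct toy_ifq *q)
{
    return q->slot;
}

/* Remove and return the packet, or NULL if the queue is empty. */
static struct toy_pkt *
toy_dequeue(struct toy_ifq *q)
{
    struct toy_pkt *p = q->slot;

    q->slot = NULL;
    return p;
}

/*
 * Replay one bad interleaving of two unsynchronized start routines:
 *   A polls, B polls and dequeues, then A dequeues.
 * Returns what context A gets back from its dequeue.
 */
static struct toy_pkt *
toy_replay_race(struct toy_ifq *q)
{
    struct toy_pkt *a_peek = toy_poll(q);   /* A: sees a packet */
    struct toy_pkt *b_peek = toy_poll(q);   /* B: sees the same packet */

    if (b_peek != NULL)
        (void)toy_dequeue(q);               /* B: takes it */
    if (a_peek == NULL)
        return NULL;                        /* A: nothing to send */
    return toy_dequeue(q);                  /* A: queue is now empty */
}
```

With one packet queued, context A's poll succeeds but its dequeue returns NULL, and passing that NULL on to a tap function would fault much like the m_length(0, ...) frame in the backtrace.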

Dave
--
David Young OJC Technologies
***@ojctech.com Urbana, IL * (217) 278-3933 ext 24

Andrew Doran
2008-05-19 17:43:04 UTC
Post by David Young
Post by Patrick Welche
Post by David Young
Post by Patrick Welche
uvm_fault(0xcdfae574, 0, 1) -> 0xe
kernel: supervisor trap page fault, code=0
Stopped in pid 22172.1 (dhcpd) at 0xc03a6f25: movl 0x14(%eax),%eax
db{1}> bt/l
m_length(0,0,cd985abc,c0377c4f,5) at 0xc03a6f25
bpf_mtap(c2d822c0,0,cd985aec,c03a8f5d,cd985a05) at netbsd:bpf_mtap+0x17
bge_start(c2da7004,178,9000003,3,0) at netbsd:bge_start+0x10c
ifq_enqueue(c2da7004,c3111300,c2da7004,2,cdfae574) at netbsd:ifq_enqueue+0x13f
ether_output(c2da7004,c3111300,c06077a0,0,c06077a0) at netbsd:ether_output+0x71e
bpf_write(cdc82300,cdc82300,cd985c60,d5bf99c0,1) at netbsd:bpf_write+0x126
do_filewritev(7,bfbfc668,3,cdc82300,1) at netbsd:do_filewritev+0x270
sys_writev(cdfac900,cd985d04,cd985cfc,cd985d10,c03d0d79) at netbsd:sys_writev+0x3f
syscall(cd985d48,b3,ab,bfbf001f,bfbf001f) at netbsd:syscall+0x141
yesterday...
It looks like IFQ_POLL()/IFQ_DEQUEUE() did not honor their contract. In
order to reach the bpf_mtap() statement, IFQ_POLL() had to return m_head
!= NULL. According to altq(9), "It is guaranteed that IFQ_DEQUEUE()
immediately after IFQ_POLL() returns the same packet."
Are you using ALTQ?
No altq - also this was a kernel from 21st April - I hope I didn't
hijack Takahiro's thread - just noticed that they were both with
bge.
Is this an SMP box? I don't know how this could happen unless a second
thread or an interrupt handler ran bge_start() simultaneously with the
thread where the fault occurred. Looking at the bge(4) code, I don't
see how that could happen.
Without looking at the code, it seems that the bpf fileops need to take
kernel_lock.

Andrew


Patrick Welche
2008-05-19 18:57:36 UTC
Post by David Young
Post by Patrick Welche
No altq - also this was a kernel from 21st April - I hope I didn't
hijack Takahiro's thread - just noticed that they were both with
bge.
Is this an SMP box?
An SMP box of sorts: a Pentium 4 with hyperthreading, so 2 CPUs as
far as NetBSD is concerned.

Cheers,

Patrick
