David Young
2013-02-01 23:27:07 UTC
I've been trying to get a handle on a few of the many MP-safety problems
in the network stack. My investigations brought me yesterday to
ether_output().
ether_output() is surprisingly complicated. The ether_output() that one
would hope for takes an ifnet, a packet, and a nexthop as arguments,
adds the correct L2 header to the front of the packet, and calls into a
driver's transmit routine. The ether_output() that we have does some
acrobatics with routes, ARP resolution, cons'ing up and prepending an L2
header, queueing the packet and nudging the driver.
Here are the route acrobatics with some annotations:
if ((rt = rt0) != NULL) {
if ((rt->rt_flags & RTF_UP) == 0) {
It's funny that we get into ether_output() with an rtentry that's
not even usable. I'm not sure how that happens. I will have to
look more carefully at what ip_output() is doing.
if ((rt0 = rt = rtalloc1(dst, 1)) != NULL) {
rt->rt_refcnt--;
if (rt->rt_ifp != ifp)
return (*rt->rt_ifp->if_output)
(ifp, m0, dst, rt);
} else
senderr(EHOSTUNREACH);
}
Yikes! To call rt->rt_ifp's output routine with a different ifp is not
advisable, is it? ISTR finding a bug in carp(4) involving a similar
pattern, ifpX->if_output(ifqY, ...).
It feels wrong that a packet should ever make a hairpin turn in an
L2 output routine, but there it is: first it looks like it's going
out ifp, then it goes out rt->rt_ifp.
The code that follows checks for a gateway route. Finding one, it looks
for a host route to the gateway. It seems to me that ether_output() is
a peculiar place to do that: ip_output() should have taken care of that
for us. Also, it seems to me that the (potentially) two-step lookup,
1) lookup(destination) -> gateway_rt,
2) lookup(gateway_rt->destination)
ought to be a one-step lookup, lookup(destination) -> nexthop.
if ((rt->rt_flags & RTF_GATEWAY) && dst->sa_family != AF_NS) {
I wonder what the Xerox NS (AF_NS) exemption is about? Didn't NetBSD
have a sys/netns/ that was deleted? Surely the check isn't needed
any more?
if (rt->rt_gwroute == NULL)
goto lookup;
if (((rt = rt->rt_gwroute)->rt_flags & RTF_UP) == 0) {
rtfree(rt); rt = rt0;
lookup: rt->rt_gwroute = rtalloc1(rt->rt_gateway, 1);
if ((rt = rt->rt_gwroute) == NULL)
senderr(EHOSTUNREACH);
/* the "G" test below also prevents rt == rt0 */
if ((rt->rt_flags & RTF_GATEWAY) ||
(rt->rt_ifp != ifp)) {
rt->rt_refcnt--;
rt0->rt_gwroute = NULL;
senderr(EHOSTUNREACH);
}
}
}
if (rt->rt_flags & RTF_REJECT)
if (rt->rt_rmx.rmx_expire == 0 ||
(u_long) time_second < rt->rt_rmx.rmx_expire)
senderr(rt == rt0 ? EHOSTDOWN : EHOSTUNREACH);
}
The route acrobatics alone are replicated in nd6_output(). nd6_output()
will call if_output == ether_output, so similar acrobatics are
performed twice. Of course, the latter performance is cut short.
Here is the ARP resolution:
case AF_INET:
if (m->m_flags & M_BCAST)
(void)memcpy(edst, etherbroadcastaddr, sizeof(edst));
else if (m->m_flags & M_MCAST)
ETHER_MAP_IP_MULTICAST(&satocsin(dst)->sin_addr, edst);
else if (!arpresolve(ifp, rt, m, dst, edst))
return (0); /* if not yet resolved */
/* If broadcasting on a simplex interface, loopback a copy */
if ((m->m_flags & M_BCAST) && (ifp->if_flags & IFF_SIMPLEX))
mcopy = m_copy(m, 0, (int)M_COPYALL);
etype = htons(ETHERTYPE_IP);
break;
BTW, it makes me grimace to see the simplex-loopback handled in an L2
transmit routine. Here's the rest of that,
if (mcopy)
(void)looutput(ifp, mcopy, dst, rt);
Shouldn't that be looutput(lo0ifp, ...) or lo0ifp->if_output(lo0ifp,
...) ? It seems that the network layer should handle loopback itself by
copying the packet and ip_input()'ing it directly.
Both the route acrobatics and the ARP resolution are replicated in
arc_output(), ieee1394_output(), and fddi_output().
As a first attempt at cleaning this up, I've extracted the route
acrobatics from ether_output and moved them to a routine called
ip_hresolv_output that wraps the ifp->if_output call in ip_output. See
attached patch. I guess that this if any protocols other than IPv4 and
IPv6 rely on the acrobatics, this change will break them.
I'd like ether_output never, ever to deal with the routing table,
but arpresolve() deals some in routes, so eventually I will have
to factor that out. I'm aiming for a streamlined IP transmission
path. Today we have something comparatively baroque,
# enter ip_output
dst <- destination(packet)
nexthop <- lookup(ip routing table, dst)
# enter ether_output
if (nexthop is gateway)
if (gateway is cached)
nexthop <- gwcache(nexthop)
else
gwcache(nexthop) <- lookup(ip routing table,
gwaddr(nexthop))
nexthop <- gwcache(nexthop)
if (nexthop l2 info incomplete)
defer_until_arp_complete(nexthop, packet)
else
l2hdr <- create_l2header(interface(nexthop), gwaddr(nexthop))
transmit(interface(nexthop), l2hdr | packet)
I'm aiming for something more like this:
# enter ip_output
dst <- destination(packet)
nexthop <- lookup(ip routing table, dst)
if (nexthop l2 info incomplete)
defer_until_arp_complete(nexthop, packet)
else
# enter ether_output
transmit(interface(nexthop), l2header(nexthop), packet)
It will take several steps to get there. Here is one. A step in the
wrong direction? Your feedback, please. :-)
Dave
in the network stack. My investigations brought me yesterday to
ether_output().
ether_output() is surprisingly complicated. The ether_output() that one
would hope for takes an ifnet, a packet, and a nexthop as arguments,
adds the correct L2 header to the front of the packet, and calls into a
driver's transmit routine. The ether_output() that we have does some
acrobatics with routes, ARP resolution, cons'ing up and prepending an L2
header, queueing the packet and nudging the driver.
Here are the route acrobatics with some annotations:
if ((rt = rt0) != NULL) {
if ((rt->rt_flags & RTF_UP) == 0) {
It's funny that we get into ether_output() with an rtentry that's
not even usable. I'm not sure how that happens. I will have to
look more carefully at what ip_output() is doing.
if ((rt0 = rt = rtalloc1(dst, 1)) != NULL) {
rt->rt_refcnt--;
if (rt->rt_ifp != ifp)
return (*rt->rt_ifp->if_output)
(ifp, m0, dst, rt);
} else
senderr(EHOSTUNREACH);
}
Yikes! To call rt->rt_ifp's output routine with a different ifp is not
advisable, is it? ISTR finding a bug in carp(4) involving a similar
pattern, ifpX->if_output(ifqY, ...).
It feels wrong that a packet should ever make a hairpin turn in an
L2 output routine, but there it is: first it looks like it's going
out ifp, then it goes out rt->rt_ifp.
The code that follows checks for a gateway route. Finding one, it looks
for a host route to the gateway. It seems to me that ether_output() is
a peculiar place to do that: ip_output() should have taken care of that
for us. Also, it seems to me that the (potentially) two-step lookup,
1) lookup(destination) -> gateway_rt,
2) lookup(gateway_rt->destination)
ought to be a one-step lookup, lookup(destination) -> nexthop.
if ((rt->rt_flags & RTF_GATEWAY) && dst->sa_family != AF_NS) {
I wonder what the Xerox NS (AF_NS) exemption is about? Didn't NetBSD
have a sys/netns/ that was deleted? Surely the check isn't needed
any more?
if (rt->rt_gwroute == NULL)
goto lookup;
if (((rt = rt->rt_gwroute)->rt_flags & RTF_UP) == 0) {
rtfree(rt); rt = rt0;
lookup: rt->rt_gwroute = rtalloc1(rt->rt_gateway, 1);
if ((rt = rt->rt_gwroute) == NULL)
senderr(EHOSTUNREACH);
/* the "G" test below also prevents rt == rt0 */
if ((rt->rt_flags & RTF_GATEWAY) ||
(rt->rt_ifp != ifp)) {
rt->rt_refcnt--;
rt0->rt_gwroute = NULL;
senderr(EHOSTUNREACH);
}
}
}
if (rt->rt_flags & RTF_REJECT)
if (rt->rt_rmx.rmx_expire == 0 ||
(u_long) time_second < rt->rt_rmx.rmx_expire)
senderr(rt == rt0 ? EHOSTDOWN : EHOSTUNREACH);
}
The route acrobatics alone are replicated in nd6_output(). nd6_output()
will call if_output == ether_output, so similar acrobatics are
performed twice. Of course, the latter performance is cut short.
Here is the ARP resolution:
case AF_INET:
if (m->m_flags & M_BCAST)
(void)memcpy(edst, etherbroadcastaddr, sizeof(edst));
else if (m->m_flags & M_MCAST)
ETHER_MAP_IP_MULTICAST(&satocsin(dst)->sin_addr, edst);
else if (!arpresolve(ifp, rt, m, dst, edst))
return (0); /* if not yet resolved */
/* If broadcasting on a simplex interface, loopback a copy */
if ((m->m_flags & M_BCAST) && (ifp->if_flags & IFF_SIMPLEX))
mcopy = m_copy(m, 0, (int)M_COPYALL);
etype = htons(ETHERTYPE_IP);
break;
BTW, it makes me grimace to see the simplex-loopback handled in an L2
transmit routine. Here's the rest of that,
if (mcopy)
(void)looutput(ifp, mcopy, dst, rt);
Shouldn't that be looutput(lo0ifp, ...) or lo0ifp->if_output(lo0ifp,
...) ? It seems that the network layer should handle loopback itself by
copying the packet and ip_input()'ing it directly.
Both the route acrobatics and the ARP resolution are replicated in
arc_output(), ieee1394_output(), and fddi_output().
As a first attempt at cleaning this up, I've extracted the route
acrobatics from ether_output and moved them to a routine called
ip_hresolv_output that wraps the ifp->if_output call in ip_output. See
attached patch. I guess that this if any protocols other than IPv4 and
IPv6 rely on the acrobatics, this change will break them.
I'd like ether_output never, ever to deal with the routing table,
but arpresolve() deals some in routes, so eventually I will have
to factor that out. I'm aiming for a streamlined IP transmission
path. Today we have something comparatively baroque,
# enter ip_output
dst <- destination(packet)
nexthop <- lookup(ip routing table, dst)
# enter ether_output
if (nexthop is gateway)
if (gateway is cached)
nexthop <- gwcache(nexthop)
else
gwcache(nexthop) <- lookup(ip routing table,
gwaddr(nexthop))
nexthop <- gwcache(nexthop)
if (nexthop l2 info incomplete)
defer_until_arp_complete(nexthop, packet)
else
l2hdr <- create_l2header(interface(nexthop), gwaddr(nexthop))
transmit(interface(nexthop), l2hdr | packet)
I'm aiming for something more like this:
# enter ip_output
dst <- destination(packet)
nexthop <- lookup(ip routing table, dst)
if (nexthop l2 info incomplete)
defer_until_arp_complete(nexthop, packet)
else
# enter ether_output
transmit(interface(nexthop), l2header(nexthop), packet)
It will take several steps to get there. Here is one. A step in the
wrong direction? Your feedback, please. :-)
Dave
--
David Young
***@pobox.com Urbana, IL (217) 721-9981
David Young
***@pobox.com Urbana, IL (217) 721-9981