AsiaBSDCon 2014 BoF: Improving bridge(4) or Toward a Unified L2 Framework

Dennis Ferguson

2014-03-26 23:16:17 UTC

Post by Ryota Ozaki
This proposal is a bit radical though, anyway we think
we have to improve bridge(4), say making it L3 capable.

What exactly would a "L3 capable" bridge(4) be? As opposed to "normal
routed interfaces"?

One direction would, as you say, make bridge(4) normal routed interfaces,
like other BSD families and Linux. OTOH, we are thinking of another direction:
having a special interface for a bridge, where that interface acts as a routed
interface instead of the bridge itself. The design has the benefits that it
keeps bridge(4) L2 (letting bridges have L3 features is strange for us),
keeps the bridge(4) code itself simple, and has a structure that fits the
unification of dataplanes that we proposed in the presentation.

Yes, please! I've never understood the current behaviour of bridge(4)
interfaces, but I'm pretty sure the semantic differences between the
definition of an "interface" at the switch level and an "interface" from
the point of view of L3 protocols cause things to break when you try
to treat a single hardware interface as if it were both kinds. Even the
interface flags don't match up, since hardware Ethernet interfaces which
are "multiaccess" interfaces for L3 are generally "point-to-point" interfaces
for L2 purposes. I'll guess the problems with all of this will minimally
show up in broken behaviour for IP multicast, or as a pile of warts in
there to avoid breakage, as multicasting is generally the canary that
tells you something is wrong in the coal mine.

Hardware interfaces should either be L2 switched interfaces (i.e. you make
routing decisions for arriving packets by looking solely at the MAC addresses
and, optionally, VLAN tags), or they should be L3 interfaces, where the
destination MAC address might be used for drop/no-drop filtering but the
routing decision is arrived at by dumping the Ethernet header and looking
at the L3 header instead. If the hardware interface is configured for bridging
it shouldn't have L3 configuration, and vice versa. If you want to add the local
host to the bridge for L3 use you instead conjure up a pseudo-interface which
has one side added to the bridge group and treated pretty much like the hardware
interface members of the same bridge group, while the other side of the
pseudo-interface has a MAC address and gets the L3 configuration. In particular,
no matter which of the bridge's hardware interfaces a packet arrives on, by the
time a packet makes it to the L3 stack its incoming interface should be identified
as the bridge group's pseudo-interface; the things L2 forwarding considers to be
"interfaces" are not relevant to IP. I'm happy the proposal seems to want to
arrange it this way as well.
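
To make the rule above concrete, here is a minimal sketch in Python of a toy model, not kernel code. All of the names (Port, PseudoInterface, BridgeGroup, l3_receive_interface, "vif0") are made up for illustration and do not correspond to anything in bridge(4). It assumes one host-side pseudo-interface per bridge group and ignores MAC learning, VLANs and the drop/no-drop MAC filter on plain L3 ports. It shows only two things: a port cannot carry both bridge membership and L3 configuration, and L3 always sees the pseudo-interface as the receive interface, whichever member port the frame came in on.

```python
from dataclasses import dataclass, field
from typing import Optional

BROADCAST = "ff:ff:ff:ff:ff:ff"


@dataclass
class PseudoInterface:
    """Host-facing side of the bridge: owns a MAC address and the L3 config."""
    name: str
    mac: str
    addresses: list = field(default_factory=list)


@dataclass
class BridgeGroup:
    name: str
    members: list = field(default_factory=list)
    host_if: Optional[PseudoInterface] = None

    def add_member(self, port: "Port") -> None:
        # A port with L3 configuration may not become an L2 switched port.
        if port.addresses:
            raise ValueError(f"{port.name} has L3 configuration; remove it first")
        port.bridge = self
        self.members.append(port)


@dataclass
class Port:
    """A hardware port: either a bridge member (L2) or an L3 interface."""
    name: str
    addresses: list = field(default_factory=list)
    bridge: Optional[BridgeGroup] = field(default=None, repr=False, compare=False)

    def add_address(self, addr: str) -> None:
        # A bridged port gets no L3 configuration; that belongs on the
        # bridge group's pseudo-interface instead.
        if self.bridge is not None:
            raise ValueError(f"{self.name} is bridged; configure the pseudo-interface")
        self.addresses.append(addr)


def l3_receive_interface(port: Port, dst_mac: str) -> Optional[str]:
    """Name of the interface L3 should see for a frame, or None if the
    frame never reaches the L3 stack."""
    if port.bridge is None:
        # Plain L3 port: the port itself is the receive interface
        # (destination MAC filtering is omitted in this sketch).
        return port.name
    host = port.bridge.host_if
    if host is not None and dst_mac in (host.mac, BROADCAST):
        # Identify the frame with the pseudo-interface, not the member port.
        return host.name
    return None
```

With two member ports em0 and em1 in one group, a frame addressed to the pseudo-interface's MAC yields "vif0" from either port, and trying to put an address on em0 raises an error.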

If you do it like this, however, please consider allowing more than one
pseudo-interface to be added to the bridge group when this makes sense. This
might make it possible to, say, share the single hardware interface in a host
between the host itself and a SIMH VAX by putting the hardware interface and
two pseudo-interfaces into a bridge group, one pseudo-interface for the host's
IP configuration and the other to be opened "raw" by the SIMH DEUNA emulation
to send and receive packets (thus avoiding BPF; when BPF is the solution it
is generally an indication that the problem remains to be solved). This might
also help if one of the links you want to add to the bridge is an Ethernet-over-PPP
or ethernet-over-ATM link and a way is needed to glue this in, though this problem
is less common than it used to be.
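
The shared-interface case can be sketched the same way. The following is a toy Python model under stated assumptions: the Bridge class, attach_pseudo, and the member names "host0" and "simh0" are hypothetical, the MAC addresses are arbitrary locally administered values, and learning is simplified (no aging, no station moves). The point is only that the bridge treats the bridge side of each pseudo-interface like any other member, so a frame for the emulator's MAC goes to the emulator's pseudo-interface without the host's L3 stack or BPF being involved.

```python
BROADCAST = "ff:ff:ff:ff:ff:ff"


class Bridge:
    def __init__(self, members):
        self.members = list(members)   # member names, in attach order
        self.table = {}                # MAC address -> member name

    def attach_pseudo(self, name, mac):
        """Add the bridge side of a pseudo-interface and pin its MAC."""
        self.members.append(name)
        self.table[mac] = name

    def forward(self, ingress, src, dst):
        """Return the list of members the frame is sent out of."""
        # Learn the source if unknown (pinned entries are never replaced).
        self.table.setdefault(src, ingress)
        egress = self.table.get(dst)
        if dst == BROADCAST or egress is None:
            # Broadcast or unknown destination: flood to all but the ingress.
            return [m for m in self.members if m != ingress]
        # Known unicast: one egress member, or nothing if it would go back out.
        return [] if egress == ingress else [egress]


HOST_MAC = "02:00:00:00:00:01"    # host's pseudo-interface
VAX_MAC = "02:00:00:00:00:02"     # pseudo-interface opened "raw" by the emulator
REMOTE_MAC = "02:00:00:00:00:aa"  # some station on the wire

bridge = Bridge(["em0"])
bridge.attach_pseudo("host0", HOST_MAC)
bridge.attach_pseudo("simh0", VAX_MAC)
```

A frame from the wire addressed to VAX_MAC is forwarded only to "simh0", one addressed to HOST_MAC only to "host0", and a broadcast from the wire reaches both.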

Finally, on an incidentally related topic, I really wish the IP multicast support
in the kernel were implemented as a unified part of the IP forwarding path, sharing
the basic forwarding code and route lookup with unicast to the extent possible (that
extent is considerable, actually), rather than the bag-on-the-side, whole-different-thing
approach used now. The reason for this is not that I am a fan of IP multicasting (the
opposite of that is closer to the truth), but that I find that asking the question "How
does this work with IP multicast?" and finding a good answer for that usually leads
to better design decisions for the protocol stack as a whole. It might encourage
people to ask the question more often if the multicast support were an integrated,
unavoidable component of the basic forwarding path rather than split out into
separate files that no one ever looks at.
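
As a rough illustration of what sharing the route lookup could look like, here is a small Python sketch. It is an assumption about one possible shape, not a description of any existing kernel: the RouteTable class and the interface names are invented, the lookup is a naive linear longest-prefix match, and real multicast forwarding also depends on the source address and an incoming-interface check, both omitted here. The only idea shown is that a route can resolve to a list of outgoing interfaces, with unicast being the one-element case.

```python
import ipaddress


class RouteTable:
    """One table for unicast prefixes and multicast groups alike."""

    def __init__(self):
        self.routes = []  # list of (network, [outgoing interface names])

    def add(self, prefix, out_ifs):
        self.routes.append((ipaddress.ip_network(prefix), list(out_ifs)))

    def lookup(self, dst):
        """Longest-prefix match; returns the outgoing interfaces, or []."""
        addr = ipaddress.ip_address(dst)
        matches = [(net, ifs) for net, ifs in self.routes if addr in net]
        if not matches:
            return []
        # The most specific matching prefix wins, for unicast and multicast.
        return max(matches, key=lambda m: m[0].prefixlen)[1]


table = RouteTable()
table.add("0.0.0.0/0", ["em0"])                # default route
table.add("192.0.2.0/24", ["em1"])             # directly attached unicast prefix
table.add("239.1.1.1/32", ["em1", "em2"])      # multicast group, two receivers
```

Here a unicast destination in 192.0.2.0/24 resolves to the single interface em1, while the group 239.1.1.1 resolves to em1 and em2 through exactly the same lookup call.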

Dennis Ferguson