Discussion:
IFF_SIMPLEX vs bridge(4)
der Mouse
2008-08-12 21:22:54 UTC
I've just run into a case where IFF_SIMPLEX is interacting badly with
bridge(4).

The setup: 4.0/i386. Two interfaces are in a bridge. One of them
(vr0) is simplex (IFF_SIMPLEX is set). An ARP request is received on
the other interface, generated by an external event. This packet is
bridged through to vr0. But ether_output on vr0 says "oh, this
interface is simplex and this packet is broadcast, we'll loop back a
copy". Thus, the barest instant later, vr0 receives a copy of the very
same packet.
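Here is a minimal C sketch of the mechanism as described so far. It rests on some loud assumptions. `model_frame`, `model_iface` and `model_output()` are hypothetical names. The frame is reduced to its two MAC addresses. The looped-back copy is modelled as simply reappearing on the same interface's input side. That last point is this message's initial hypothesis (a later follow-up finds it is not quite that simple), not the actual NetBSD `ether_output()` code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified frame: just the two Ethernet addresses. */
struct model_frame {
	uint8_t dst[6];
	uint8_t src[6];
};

/* Hypothetical interface model; 'simplex' stands in for IFF_SIMPLEX. */
struct model_iface {
	const char *name;
	bool simplex;			/* cannot hear its own transmissions */
	int looped;			/* copies handed back to the input side */
	struct model_frame last_looped;
};

static const uint8_t model_bcast[6] = { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff };

/*
 * Models the output-side decision: a simplex interface never sees its
 * own broadcasts on the wire, so the stack makes a local copy instead.
 * Returns true if a copy was looped back.
 */
static bool
model_output(struct model_iface *ifp, const struct model_frame *f)
{
	assert(ifp != NULL && f != NULL);
	/* Putting the frame on the wire is not modelled. */
	if (ifp->simplex &&
	    memcmp(f->dst, model_bcast, sizeof(model_bcast)) == 0) {
		ifp->last_looped = *f;
		ifp->looped++;
		return true;
	}
	return false;
}
```

With `simplex` false, the model makes no copy. This assumes that such hardware hears its own broadcasts.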

This causes the bridge to end up with the sending MAC address learnt on
vr0 rather than the interface it really is on. This then causes the
ARP reply, which is aimed at that MAC address, to go unbridged
(because, as far as the bridge is concerned, it was received on the
same interface it should go out on and thus shouldn't be bridged
anywhere).

This leads to the obvious communication failure.
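The learning-bridge behaviour behind that failure can be sketched in C. This is a hypothetical toy table: `model_bridge`, `model_learn()` and `model_forward()` are invented names, not `if_bridge.c` internals. It assumes the simplest policy:

- the last port a source MAC was seen on wins;
- unknown destinations are flooded;
- a frame whose destination was learned on its own input port is not bridged.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, minimal learning-bridge table; not if_bridge.c. */
#define MODEL_MAXENT	16
#define MODEL_FLOOD	(-2)	/* destination unknown: send out all other ports */
#define MODEL_DROP	(-1)	/* destination is behind the input port: don't bridge */

struct model_rt {
	uint8_t mac[6];
	int port;
};

struct model_bridge {
	struct model_rt ent[MODEL_MAXENT];
	int n;
};

/* Returns the port a MAC was learned on, or -1 if it is unknown. */
static int
model_lookup(const struct model_bridge *b, const uint8_t mac[6])
{
	for (int i = 0; i < b->n; i++)
		if (memcmp(b->ent[i].mac, mac, 6) == 0)
			return b->ent[i].port;
	return -1;
}

/* Records (or moves) a MAC to the port its latest frame arrived on. */
static void
model_learn(struct model_bridge *b, const uint8_t mac[6], int port)
{
	for (int i = 0; i < b->n; i++) {
		if (memcmp(b->ent[i].mac, mac, 6) == 0) {
			b->ent[i].port = port;	/* last sighting wins */
			return;
		}
	}
	assert(b->n < MODEL_MAXENT);
	memcpy(b->ent[b->n].mac, mac, 6);
	b->ent[b->n].port = port;
	b->n++;
}

/* Forwarding decision for a frame to 'dst' that arrived on 'in_port'. */
static int
model_forward(const struct model_bridge *b, const uint8_t dst[6], int in_port)
{
	int out = model_lookup(b, dst);

	if (out < 0)
		return MODEL_FLOOD;
	if (out == in_port)
		return MODEL_DROP;
	return out;
}
```

Feed it the sequence from the report: the real request is learned on the xvif side, then the looped copy is "received" on vr0. That leaves d1's MAC recorded on vr0, so a reply that arrives on vr0 for that MAC gets `MODEL_DROP`.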

Thoughts on what the correct fix is?

/~\ The ASCII der Mouse
\ / Ribbon Campaign
X Against HTML ***@rodents-montreal.org
/ \ Email! 7D C8 61 52 5D E7 2D 39 4E F1 31 3E E8 B3 27 4B

--
Posted automagically by a mail2news gateway at muc.de e.V.
Please direct questions, flames, donations, etc. to news-***@muc.de
Daniel Carosone
2008-08-12 21:38:45 UTC
Post by der Mouse
I've just run into a case where IFF_SIMPLEX is interacting badly with
bridge(4).
The setup: 4.0/i386. Two interfaces are in a bridge. One of them
(vr0) is simplex (IFF_SIMPLEX is set). An ARP request is received on
the other interface, generated by an external event. This packet is
bridged through to vr0. But ether_output on vr0 says "oh, this
interface is simplex and this packet is broadcast, we'll loop back a
copy". Thus, the barest instant later, vr0 receives a copy of the very
same packet.
This causes the bridge to end up with the sending MAC address learnt on
vr0 rather than the interface it really is on. This then causes the
ARP reply, which is aimed at that MAC address, to go unbridged
(because, as far as the bridge is concerned, it was received on the
same interface it should go out on and thus shouldn't be bridged
anywhere).
This leads to the obvious communication failure.
Thoughts on what the correct fix is?
Too quick and too little coffee for thoughts to that extent, but in
the meantime an obvious quick workaround is to disable learning on
that bridge port.

A better fix might be some way to mark the copied packet for the same
treatment, or do the analysis to decide whether it's safe not to do the
simplex copy in this case - when we know the packet isn't ours.
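The "mark the copied packet" idea can be sketched as follows. It assumes a per-packet mark that the output path sets only on the simplex loopback copy, and that the bridge checks before learning. Here the mark is the hypothetical `looped` field, standing in for something like a flag on the packet buffer. All names are invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical packet descriptor; 'looped' stands in for a per-packet
 * mark set only on the copy made for a simplex interface.
 */
struct model_pkt {
	uint8_t src[6];
	bool looped;
};

/* Output side: the copy made for a simplex interface carries the mark. */
static struct model_pkt
model_simplex_copy(const struct model_pkt *orig)
{
	struct model_pkt copy = *orig;

	copy.looped = true;
	return copy;
}

/*
 * Bridge input side: learn source addresses only from packets that did
 * not come from our own loopback copy.
 */
static bool
model_may_learn(bool port_learning, const struct model_pkt *p)
{
	assert(p != NULL);
	return port_learning && !p->looped;
}
```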

--
Dan.
der Mouse
2008-08-12 21:34:39 UTC
Permalink
Post by der Mouse
The setup: 4.0/i386. Two interfaces are in a bridge.
[...IFF_SIMPLEX-looped-back packet confusing bridge's MAC list...]
Hmm. It can't be quite this simple, because the same thing doesn't
happen the other way. But the effect - the MAC in question winds up
learned on the wrong interface - is definitely occurring.

I don't have time now to post the full details - I'll do that within a
day or two, unless I figure out what's wrong first. But there's
certainly something amiss. The primary symptoms are that tcpdump on
"the wrong interface" shows the broadcast ARP request twice, and that
the MAC is learned on the wrong interface.

Daniel Carosone
2008-08-12 21:45:42 UTC
Post by Daniel Carosone
Too quick and too little coffee for thoughts to that extent, but in
the meantime an obvious quick workaround is to disable learning on
that bridge port.
A better fix might be some way to mark the copied packet for the same
treatment, or do the analysis to decide whether it's safe not to do the
simplex copy in this case - when we know the packet isn't ours.
No, the fix is for bridge(4) to not learn addresses from broadcast
packets (possibly only if simplex is set). That is, the packet is already
self-marking for this behaviour. We'll learn the address soon enough
from later communications.

--
Dan.
Daniel Carosone
2008-08-12 21:52:38 UTC
Post by der Mouse
I don't have time now to post the full details - I'll do that within a
day or two, unless I figure out what's wrong first. But there's
certainly something amiss: tcpdump on "the wrong interface" shows the
broadcast ARP request twice, and it's learned wrong, are the primary
symptoms.
Hm. If you're seeing it twice, then either you're bridging it twice or
the interface isn't really SIMPLEX. Do you see it twice from an
external listener, on either port?

--
Dan.
der Mouse
2008-08-13 03:20:25 UTC
Post by Daniel Carosone
[...apparent IFF_SIMPLEX vs bridge(4) issue...]
I don't have time now to post the full details [...]
Hm. If you're seeing it twice, then either you're bridging it twice
or the interface isn't really SIMPLEX. Do you see it twice from an
external listener, on either port?
I don't (yet) know.

Here's the situation. The machine is 4.0 i386 Xen. It has five
hardware network interfaces, ne0 rtk0 ex0 vr0 sk0, but only two of them
(ne0 and vr0) are actually involved. (ex0 is also in use but isn't, I
believe, relevant; rtk0 and sk0 are up but unused.) It's being used to
play with some new (to us) DSL hardware; there are three CPEs connected
to it, of which only one is relevant. The DSL stuff seemed to be
working, but we were having some trouble getting it to behave in real
use; while trying to track it down, I ran into this issue.

Hardware connections:
- ex0 is connected to the house LAN.
- ne0 is connected to the DSLAM uplink.
- vr0 is connected to the relevant CPE's LAN port.
- rtk0 is connected to another CPE's LAN port (not used here).
- The DSLAM is configured for the relevant CPE's traffic to come out
the uplink tagged vlan 112. (The other CPEs are tagged vlans 18 and
19, but I think those don't matter.)

There are two domUs, d1 and d2 (well, there's a third set up, but it
isn't running and thus shouldn't have any relevance) and, of course,
the managing dom0.

Conceptually, what I am trying to set up looks like this:

House LAN ---------+---------------+------------
                   |               |
                +--+---+        +--+---+
                |  d1  |        |  d2  |
                +--+---+        +--+---+
                   |               |
                  CPE        DSLAM uplink

But, because of xen, it's not quite that simple. Here's the truth
(including a private network between the domUs which does not appear on
the above simplified diagram but which I'm including here - for
completeness, not because I think it really bears on the problem):

Outside       dom0                                    domUs
------------  --------------------------------------  ---------------------------------------------
House LAN --- ex0 10.10.10.39/23 ----- bridge0 --+--- d1 xennet0  00:16:3e:65:11:19  10.10.10.59/23
                                                 +--- d2 xennet0  00:16:3e:48:f5:90  10.10.10.61/23

CPE --------- vr0 -------------------- bridge1 ------ d1 xennet1  00:16:3e:0c:7f:99  172.16.1.1/24

DSLAM ------- ne0 --- vlan112 -------- bridge2 ------ d2 xennet1  00:16:3e:16:3b:a4  172.16.1.3/24
                      172.16.1.2/24

                                       bridge4 --+--- d1 xennet2  00:16:3e:7f:f9:1e  10.0.0.1/24
                                                 +--- d2 xennet2  00:16:3e:16:c1:38  10.0.0.2/24

In text form, here's the config:

- dom0 ex0 is configured 10.10.10.39/23.
- ne0, vr0, and rtk0 are up but have no addresses of their own.
- There are five bridge interfaces, bridge0 through bridge4.
- Each domU's xennet0 is configured with an address in 10.10.10.0/23
and its corresponding xvif interface is a member of bridge0.
- Each domU's xennet2 is configured with an address in 10.0.0.0/24 and
its corresponding xvif interface is a member of bridge4.
- bridge1 bridges vr0 and the xvif corresponding to domU d1's xennet1.
- domU d1's xennet1 is configured with 172.16.1.1/24.
- bridge2 bridges vlan112 and the xvif corresponding to domU d2's
xennet1.
- domU d2's xennet1 is configured with 172.16.1.3/24.
- vlan112 is configured "vlan 112 vlanif ne0" and has address
172.16.1.2/24.

To reproduce the problem: on d2 I tcpdump xennet1 and on the dom0 I
tcpdump vr0 and vlan112. Then I have d1 "ping -n -c 1 172.16.1.3".
The xennet1 and vlan112 tcpdumps show

00:16:3e:0c:7f:99 > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 60: arp who-has 172.16.1.3 tell 172.16.1.1
00:16:3e:16:3b:a4 > 00:16:3e:0c:7f:99, ethertype ARP (0x0806), length 42: arp reply 172.16.1.3 is-at 00:16:3e:16:3b:a4

but the tcpdump on vr0 shows the request twice

00:16:3e:0c:7f:99 > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 60: arp who-has 172.16.1.3 tell 172.16.1.1
00:16:3e:0c:7f:99 > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 60: arp who-has 172.16.1.3 tell 172.16.1.1
00:16:3e:16:3b:a4 > 00:16:3e:0c:7f:99, ethertype ARP (0x0806), length 64: arp reply 172.16.1.3 is-at 00:16:3e:16:3b:a4

and when I "brconfig bridge1", it says

00:16:3e:16:3b:a4 vr0 1179 flags=0<>
00:16:3e:0c:7f:99 vr0 1179 flags=0<>

However, every interface in sight (vr0, ne0, xvif*.*, vlan112, and even
ex0) turns out to be marked SIMPLEX, and bridge2 has got it right:

00:16:3e:16:3b:a4 xvif10.1 1194 flags=0<>
00:16:3e:0c:7f:99 vlan112 1194 flags=0<>

So it's not just SIMPLEX interacting badly with bridge(4), but I'm not
sure what it is.

Having d1 ping 172.16.1.2 works just as badly.

It has something to do with broadcasts, though; if I manually set an
ARP entry for 172.16.1.2 or 172.16.1.3 with the correct MAC and *then*
ping the relevant address, bridge1 learns the affected MAC address
correctly, and traffic works fine - as long as the ARP entry is around.
And if - even with the ARP entries in place - I ping 172.16.1.255, the
broadcast echo request is duplicated according to vr0's tcpdump, and
bridge1 moves the affected MAC address to vr0. (Only 172.16.1.1
actually _responds_ to the broadcast ping, but that appears to be
irrelevant.) Further unicast traffic from d1 - such as a ping to
172.16.1.2 or .3 using the hardwired ARP table entries - moves it back
again.

I should check out the "external listener" question - interpose a hub
between vr0 and the CPE, snoop it, and see what is actually present on
the wire. I can probably do that sometime Thursday - I don't expect to
get back to this before then. (The lack of a duplicate on vlan112
argues that it isn't duplicated on the wire, but nothing beats actually
checking to see. :)

Martin Husemann
2008-08-13 08:39:07 UTC
Post by der Mouse
This causes the bridge to end up with the sending MAC address learnt on
vr0 rather than the interface it really is on.
If anyone fixes it, please close PR kern/18035 (which is slightly unrelated,
but still open due to this bug).

Martin

der Mouse
2008-08-13 08:59:40 UTC
Post by Martin Husemann
Post by der Mouse
This causes the bridge to end up with the sending MAC address learnt
on vr0 rather than the interface it really is on.
If anyone fixes it, please close PR kern/18035 (which is slightly
unrelated, but still open due to this bug).
Oho! Yes, vr0 is running at "Ethernet autoselect (10baseT)", so it is
probably more generic than just hme. I should see if I can find
something to speed-adapt so vr0 can run at 100 even though the device
it's connected to insists on 10, to see if that "fixes it".

I'm sending a bcc of this to the PR. To anyone reading this via the
PR, my problem was very similar - a bridge learning a MAC on the wrong
interface. My tests indicate that it is related to broadcast packets,
which is hardly surprising. (Maybe "broadcast or multicast"; I didn't
even try to test multicast.)

I think "bridge doesn't learn MACs from broadcast packets" is a
reasonable approximation to a fix. I can justify doing this on work
time, since the case that's breaking for me is a work setup, which
means it's likely going to happen sometime Thursday 2008-08-14.

Any thoughts on whether it would be better to do "bridge doesn't learn
MACs from broadcast or multicast packets"?

der Mouse
2008-08-15 04:03:26 UTC
Post by der Mouse
I think "bridge doesn't learn MACs from broadcast packets" is a
reasonable approximation to a fix.
It is. I tried that and, while I still see the ARP request duplicated
when tcpdumping vr0, I no longer see bridge1 learning MAC addresses on
the wrong interface.

The patch I got this effect with is almost ludicrously simple:

--- /dev/fd/4 Tue Dec 9 16:18:08 2003
+++ /dev/fd/5 Tue Dec 9 16:18:08 2003
@@ -1360,10 +1360,12 @@
 	/*
 	 * If the interface is learning, and the source
 	 * address is valid and not multicast, record
-	 * the address.
+	 * the address. But don't do this if the destination
+	 * is broadcast; such packets are looped back too often.
 	 */
 	if ((bif->bif_flags & IFBIF_LEARNING) != 0 &&
 	    ETHER_IS_MULTICAST(eh->ether_shost) == 0 &&
+	    memcmp(etherbroadcastaddr,eh->ether_dhost,sizeof(etherbroadcastaddr)) &&
 	    (eh->ether_shost[0] == 0 &&
 	     eh->ether_shost[1] == 0 &&
 	     eh->ether_shost[2] == 0 &&

I'm sending a bcc of this mail, too, to the PR, so's to ensure the
patch is there for anyone who wants it. (The above is relative to
stock 4.0 source, if_bridge.c,v 1.46.)
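For reference, here is the patched learning test restated as a standalone C function. `MODEL_IS_MULTICAST`, `MODEL_IFBIF_LEARNING` and `model_should_learn()` are local stand-ins, assumed to mirror `ETHER_IS_MULTICAST`, `IFBIF_LEARNING` and the `etherbroadcastaddr` comparison. The all-zeros source check is assumed to continue through `ether_shost[5]` and reject an all-zero source, as the comment's "valid" suggests; the patch context stops at `ether_shost[2]`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/*
 * Local stand-ins so the sketch compiles outside the kernel; these names
 * and values are assumptions chosen to mirror the identifiers in the patch.
 */
#define MODEL_IS_MULTICAST(a)	(((a)[0] & 0x01) != 0)
#define MODEL_IFBIF_LEARNING	0x01

static const uint8_t model_broadcast[6] = { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff };
static const uint8_t model_zero[6] = { 0 };

/*
 * Learn the source address only if the port is learning, the source is
 * not multicast, the source is not all zeros, and (the patch's addition)
 * the destination is not the broadcast address.
 */
static bool
model_should_learn(unsigned bif_flags, const uint8_t shost[6],
    const uint8_t dhost[6])
{
	return (bif_flags & MODEL_IFBIF_LEARNING) != 0 &&
	    !MODEL_IS_MULTICAST(shost) &&
	    memcmp(model_broadcast, dhost, sizeof(model_broadcast)) != 0 &&
	    memcmp(shost, model_zero, sizeof(model_zero)) != 0;
}
```

A broadcast ARP request is therefore never used for learning. The unicast reply that follows it still is, which is the "we'll learn the address soon enough" behaviour suggested earlier in the thread.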

Christoph Badura
2008-08-26 21:16:05 UTC
Post by der Mouse
Post by der Mouse
I think "bridge doesn't learn MACs from broadcast packets" is a
reasonable approximation to a fix.
It is. I tried that and, while I still see the ARP request duplicated
when tcpdumping vr0, I no longer see bridge1 learning MAC addresses on
the wrong interface.
@@ -1360,10 +1360,12 @@
/*
* If the interface is learning, and the source
* address is valid and not multicast, record
- * the address.
+ * the address. But don't do this if the destination
+ * is broadcast; such packets are looped back too often.
*/
if ((bif->bif_flags & IFBIF_LEARNING) != 0 &&
ETHER_IS_MULTICAST(eh->ether_shost) == 0 &&
+ memcmp(etherbroadcastaddr,eh->ether_dhost,sizeof(etherbroadcastaddr)) &&
(eh->ether_shost[0] == 0 &&
eh->ether_shost[1] == 0 &&
eh->ether_shost[2] == 0 &&
You should be using ETHER_IS_MULTICAST(eh->ether_dhost) instead of memcmp().
That is the standard way.

I'm wondering whether the 's' in ether_shost in the original code is a typo.
's' and 'd' are next to each other on US keyboards.
AFAIK no protocols send ethernet packets with a multicast source address.
And if the ETHER_IS_MULTICAST() check is supposed to ensure that no multicast
addresses are put into the bridge routing table, the check would be better
placed inside bridge_rtupdate().
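Moving the check, as suggested, could look like the sketch below. The hypothetical `model_rtupdate()` refuses any address with the group bit set, so automatic learning and explicit insertion are treated alike. The table layout and the -1/-2 return codes are assumptions for illustration, not `bridge_rtupdate()`'s actual interface.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MODEL_MAXENT	8

/* Hypothetical table entry and table; not the if_bridge.c structures. */
struct model_rt {
	uint8_t mac[6];
	int port;
};

struct model_table {
	struct model_rt ent[MODEL_MAXENT];
	int n;
};

/*
 * Hypothetical update routine with the multicast check moved inside it,
 * so every caller is refused a group address. Returns 0 on success,
 * -1 for a multicast/broadcast address, -2 if the table is full.
 */
static int
model_rtupdate(struct model_table *t, const uint8_t mac[6], int port)
{
	assert(t != NULL && mac != NULL);
	if (mac[0] & 0x01)		/* group bit set: multicast or broadcast */
		return -1;
	for (int i = 0; i < t->n; i++) {
		if (memcmp(t->ent[i].mac, mac, 6) == 0) {
			t->ent[i].port = port;	/* existing entry moves */
			return 0;
		}
	}
	if (t->n >= MODEL_MAXENT)
		return -2;
	memcpy(t->ent[t->n].mac, mac, 6);
	t->ent[t->n].port = port;
	t->n++;
	return 0;
}
```

The reply below raises the trade-off. With the check inside the update routine, a multicast entry can no longer be added deliberately either.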

--chris

der Mouse
2008-08-26 21:20:50 UTC
Post by Christoph Badura
Post by der Mouse
if ((bif->bif_flags & IFBIF_LEARNING) != 0 &&
ETHER_IS_MULTICAST(eh->ether_shost) == 0 &&
+ memcmp(etherbroadcastaddr,eh->ether_dhost,sizeof(etherbroadcastaddr)) &&
(eh->ether_shost[0] == 0 &&
eh->ether_shost[1] == 0 &&
eh->ether_shost[2] == 0 &&
You should be using ETHER_IS_MULTICAST(eh->ether_dhost) instead of
memcmp(). That is the standard way.
The semantics are different, though; I found no ETHER_IS_BROADCAST
macro, so I went with memcmp(). It's possible that ETHER_IS_MULTICAST
actually gives the correct semantics, but, since I have no easy way to
test for this problem with multicast-but-not-broadcast packets, I went
for the minimal change.
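Two small predicates make the semantic difference concrete.

- `model_is_multicast()` tests the group bit (the low-order bit of the first octet), the same bit `ETHER_IS_MULTICAST` tests.
- `model_is_broadcast()` is a hypothetical stand-in for the missing `ETHER_IS_BROADCAST`.

Every broadcast address passes both tests. A multicast address such as 01:00:5e:00:00:01 passes only the first. So writing the patch with `ETHER_IS_MULTICAST(eh->ether_dhost)` would also stop learning from multicast-destined frames.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Group (multicast) bit: low-order bit of the first octet. */
static bool
model_is_multicast(const uint8_t a[6])
{
	return (a[0] & 0x01) != 0;
}

/* Hypothetical broadcast test: all six octets must be 0xff. */
static bool
model_is_broadcast(const uint8_t a[6])
{
	static const uint8_t bcast[6] = { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff };

	return memcmp(a, bcast, sizeof(bcast)) == 0;
}
```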
Post by Christoph Badura
I'm wondering whether the 's' in ether_shost in the original code is a typo.
Possibly, but I think probably not. The effect is "never learn
multicast addresses", which strikes me as the right thing.
Post by Christoph Badura
AFAIK no protocols send ethernet packets with a multicast source address.
I don't know of any either, but I'm definitely not ready to say there
are none. (I can easily imagine some sort of load-balancing setup that
uses a multicast MAC as if it were an ordinary MAC, for example.)
Post by Christoph Badura
And if the ETHER_IS_MULTICAST() check is supposed to ensure that no
multicast addresses are put into the bridge routing table, the check
should better be moved inside bridge_rtupdate().
Perhaps. I don't consider myself competent to judge that, so I didn't
meddle with it. It does seem to me that the current placement has the
effect of "never automatically learn multicast, but allow them to be
inserted by specific action", which seems to me like TRT: do the
usually-right thing automatically, but allow it to be overridden.

Daniel Carosone
2008-08-28 02:12:08 UTC
Post by der Mouse
Post by Christoph Badura
I'm wondering whether the 's' in ether_shost in the original code is a typo.
Possibly, but I think probably not. The effect is "never learn
multicast addresses", which strikes me as the right thing.
Yes, because such destinations should be splashed to all ports; if we
learn them behind one port that will no longer happen.
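Two hypothetical forwarding functions illustrate that point; their names and the `MODEL_FLOOD` sentinel are invented for this sketch. `learned_port` is whatever a table lookup returned for the destination, or -1 if nothing was found. The naive version trusts the table for any destination, so a group address that somehow got learned would leave by one port only. The defensive version floods group-addressed frames before it consults the table.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_FLOOD	(-2)	/* send out every port except the input one */

/* Naive: trusts the table for any destination, group addresses included. */
static int
model_out_port_naive(const uint8_t dst[6], int learned_port)
{
	(void)dst;
	return learned_port >= 0 ? learned_port : MODEL_FLOOD;
}

/* Defensive: group-addressed frames always flood, whatever the table says. */
static int
model_out_port(const uint8_t dst[6], int learned_port)
{
	if (dst[0] & 0x01)	/* multicast or broadcast */
		return MODEL_FLOOD;
	return learned_port >= 0 ? learned_port : MODEL_FLOOD;
}
```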
Post by der Mouse
Post by Christoph Badura
AFAIK no protocols send ethernet packets with a multicast source address.
I don't know of any either, but I'm definitely not ready to say there
are none.
.. and regardless we should be defensive against such senders.
Post by der Mouse
(I can easily imagine some sort of load-balancing setup that
uses a multicast MAC as if it were an ordinary MAC, for example.)
No need to imagine, I can cite a specific example. Check Point
firewalls running in a particular active-active load-balancing cluster
mode use exactly this trick. They use a multicast MAC address for the
unicast IP address of the firewall, and respond to ARP requests
accordingly with that multicast MAC source address. The intention is
thus that all cluster members get sent copies of all packets, and they
arbitrate amongst themselves as to who will process the packet further
using the usual header-hash-bucket-ownership pattern. This doesn't
always work (some devices, notably Cisco routers, refuse to learn from
such ARP replies and need to have the ARP entry statically configured)
so there are other cluster modes in the product (with other tradeoffs)
too.

--
Dan.
