Simplify bridge(4)

Discussion:

Simplify bridge(4)

(too old to reply)

Ryota Ozaki

2016-02-12 08:26:36 UTC

Date: Wed, 10 Feb 2016 18:56:46 +0900
Thanks to introducing softint-based if_input,
we can simplify bridge(4).
Awesome! I love patches that have loads more -'s than +'s, and
simplify locking schemes, and remove sketchy cpu_intr_p conditionals,
and things like that.
http://www.netbsd.org/~ozaki-r/simplify-bridge.diff
Remove cpu_intr_p from BRIDGE_RT_RENTER/REXIT too?

Definitely I forgot them :-/ Fixed.

I wonder how much of a difference BRIDGE_MPSAFE really makes on
uniprocessor systems. If this were new code I wouldn't have done any
conditional compilation of that. I can't imagine the performance
impact is very high: maybe a few more words of memory are used, but
uniprocessor mutex acquisition should be pretty cheap. Maybe in a
future patch we can eliminate all that.

I agree. I'll do that soon.

BTW, currently softints of a percpuq is SOFTINT_MPSAFE so they can
run in parallel even now if a underlying driver supports multi-queue
(now only wm(4) supports it though). On that reason, I think we
should always enable bridge(4) MP-safe codes and add KERNEL_LOCK to
vlan(4) for safe. (Or remove SOFTINT_MPSAFE at this point.)

Hmm... Another note, not related to your patch: queue(3) does not
issue the necessary memory barriers for pserialization. So the use of
LIST_* for rtlist and iflist is not actually safe here -- I imagine it
has worked only by accident before.
Either we need to make a variant of queue(3) that is pserialize-safe
(https://mail-index.netbsd.org/tech-kern/2014/11/21/msg018055.html) or
favour of just open-coding it in the few places where it's needed. I
don't think that's a good idea but I didn't care to take on that fight
at the time.)

I prefer that such fundamental requirements are guaranteed by the API.
Users (including me) of pserialize aren't always expert on memory
barriers (of every CPU architectures!).

Thanks,
ozaki-r

--
Posted automagically by a mail2news gateway at muc.de e.V.
Please direct questions, flames, donations, etc. to news-***@muc.de

Roy Marples

2016-02-12 10:33:17 UTC

Permalink

[J]ust wondering if we are going to see vether(4) anytime soon.

How would this vether differ from the existing tap? Presumably I'm
just missing something....

dhcpcd didn't work well with bridge(4) and tap(4) didn't help that.
vether(4) would help that. We may be able to address the issue by
fixing bridge or tap but I have no idea for now.

It's not actually dhcpcd itself - it's the kernel BPF implementation.
There was also an issue where some DHCPv6 messages were not following
across the bridge properly either.

If vether solves that then great, but does that mean we could drop the
tap interface entirely or just swap it in place?
From my perspective (a user), there is no difference between tap and vether?

Roy

--
Posted automagically by a mail2news gateway at muc.de e.V.
Please direct questions, flames, donations, etc. to news-***@muc.de

Rhialto

2016-02-13 13:29:27 UTC

Permalink

tap(4) is a direct interface between userland and the network.
vether(4) would not be (although you could use BPF, etc.). It
would be an ethernet device that represents the host. If you know
how to configure Cisco devices, think BVI.

As someone who tinkers with virtual machine emulators (PDP-10, PDP-11
etc) I like tap(4) very much, because it really makes it easy to make
the VM communicate over ethernet with the outside world. Some other OSes
also have a tap device. It would be REALLY inconvenient if suddenly
NetBSD would change methods, and need not only NetBSD-dependent code
but also NetBSD-version-dependent code.

-Olaf.

--
___ Olaf 'Rhialto' Seibert -- The Doctor: No, 'eureka' is Greek for
\X/ rhialto/at/xs4all.nl -- 'this bath is too hot.'

Roy Marples

2016-02-12 12:56:51 UTC

Permalink

[J]ust wondering if we are going to see vether(4) anytime soon.

How would this vether differ from the existing tap? Presumably I'm
just missing something....

dhcpcd didn't work well with bridge(4) and tap(4) didn't help that.
vether(4) would help that. We may be able to address the issue by
fixing bridge or tap but I have no idea for now.

I have a theory for why this happens.
I have a local change to bridge(4) in my tree that explicitly adds the
MAC address for each member interface rather than adding it lazily the
first time it is seen in a packet.

Does it handle the MAC address changing on a member interface?

Roy

--
Posted automagically by a mail2news gateway at muc.de e.V.
Please direct questions, flames, donations, etc. to news-***@muc.de

Ryota Ozaki

2016-03-23 11:03:54 UTC

Permalink

Post by Ryota Ozaki

Definitely I forgot them :-/ Fixed.

I agree. I'll do that soon.

...not so much soon though, I prepared patches:
http://www.netbsd.org/~ozaki-r/bridge-psz-list.diff
http://www.netbsd.org/~ozaki-r/bridge-default-mpsafe.diff

The first one introduces membar to list operations used in bridge
for pserialize, and the other one removes BRIDGE_MPSAFE switch and
make bridge MP-safe by default. The first one should be revisited
once we have the official API of list operations for pserialize.

ozaki-r

--
Posted automagically by a mail2news gateway at muc.de e.V.
Please direct questions, flames, donations, etc. to news-***@muc.de

Ryota Ozaki

2016-03-28 07:07:19 UTC

Permalink

On Fri, Mar 25, 2016 at 11:40 PM, Taylor R Campbell

Date: Wed, 23 Mar 2016 20:03:54 +0900
http://www.netbsd.org/~ozaki-r/bridge-psz-list.diff
http://www.netbsd.org/~ozaki-r/bridge-default-mpsafe.diff
The first one introduces membar to list operations used in bridge
for pserialize, and the other one removes BRIDGE_MPSAFE switch and
make bridge MP-safe by default. The first one should be revisited
once we have the official API of list operations for pserialize.
This looks great. Let's try again to get the pserialize-safe list
operations in properly -- not having them is obviously getting in the
way of useful progress. But I have no objection to putting in local
definitions in the bridge code for now, as long as we make sure to get
them into queue.h soon.
The comments below suggest to me that we should perhaps change the
`_PSZ' suffix, because it does not generically mean `if you're doing
pserialize, you need to use the _PSZ variant'. E.g., LIST_FOREACH_PSZ
is for use by readers -- writers under the exclusive write lock don't
need it.
We could have
LIST_INSERT_HEAD_WRITER = LIST_INSERT_HEAD with membar_producer
LIST_REMOVE_WRITER = LIST_REMOVE without QUEUEDEBUG
LIST_FOREACH_WRITER = LIST_FOREACH
LIST_FOREACH_READER = LIST_FOREACH with membar_datadep_consumer
so that it's still easy to eyeball whether you're using a
pserialize-safe thing or not (no suffix? not safe), and then to
eyeball whether it's the correct one for the context.

Thanks. I changed the names so.

diff --git a/sys/net/bridgestp.c b/sys/net/bridgestp.c
index 01968f9..fa494dc 100644
--- a/sys/net/bridgestp.c
+++ b/sys/net/bridgestp.c
@@ -341,7 +341,7 @@ bstp_config_bpdu_generation(struct bridge_softc *sc)
{
struct bridge_iflist *bif;
- LIST_FOREACH(bif, &sc->sc_iflist, bif_next) {
+ LIST_FOREACH_PSZ(bif, &sc->sc_iflist, bif_next) {
It looks like all of the bstp code runs under an exclusive lock, so
there's no need for the _PSZ here.

Sure. Reverted.

diff --git a/sys/net/if_bridge.c b/sys/net/if_bridge.c
index 651e830..57b37c2 100644
--- a/sys/net/if_bridge.c
+++ b/sys/net/if_bridge.c
@@ -433,7 +433,7 @@ bridge_clone_create(struct if_clone *ifc, int unit)
if_alloc_sadl(ifp);
mutex_enter(&bridge_list_lock);
- LIST_INSERT_HEAD(&bridge_list, sc, sc_list);
+ LIST_INSERT_HEAD_PSZ(&bridge_list, sc, sc_list);
mutex_exit(&bridge_list_lock);
return (0);
@@ -461,7 +461,7 @@ bridge_clone_destroy(struct ifnet *ifp)
BRIDGE_UNLOCK(sc);
mutex_enter(&bridge_list_lock);
- LIST_REMOVE(sc, sc_list);
+ LIST_REMOVE_PSZ(sc, sc_list);
mutex_exit(&bridge_list_lock);
No need for the _PSZ variants here: the bridge list is used only under
an exclusive lock.
Actually... I don't see any users of that list! Do we need it any
more? Why was it there to begin with?
Looks like it was added in the first commit and never used. Let's get
rid of it! (Separate commit.)

Heh, that's true. Removed!

So patches are updated:
http://www.netbsd.org/~ozaki-r/bridge-psz-list.diff
http://www.netbsd.org/~ozaki-r/bridge-default-mpsafe.diff

BTW, I found a possible race condition in bridge_output:

BRIDGE_PSZ_RENTER(s);
LIST_FOREACH_READER(bif, &sc->sc_iflist, bif_next) {
bif = bridge_try_hold_bif(bif);
if (bif == NULL)
continue;
BRIDGE_PSZ_REXIT(s);

// sleepable operations

bridge_release_member(sc, bif);
BRIDGE_PSZ_RENTER(s);
}
BRIDGE_PSZ_REXIT(s);

bif can be freed after bridge_release_member on another CPU,
however, LIST_FOREACH_READER touches bif after it.

IIUC, using psref solves the issue like this:

BRIDGE_PSZ_RENTER(s);
bif = (&sc->sc_iflist)->lh_first;
while (bif != LIST_END(&sc->sc_iflist)) {
struct bridge_iflist *next;

membar_datadep_consumer();

psref_acquire(&psref, &bif->target);
BRIDGE_PSZ_REXIT(s);

// sleepable operations

BRIDGE_PSZ_RENTER(s);
next = bif->field.le_next;
psref_release(&psref, &bif->target);
bif = next;
}
BRIDGE_PSZ_REXIT(s);

So I'm thinking the above patches should be committed
after introducing psref.

Thanks,
ozaki-r

--
Posted automagically by a mail2news gateway at muc.de e.V.
Please direct questions, flames, donations, etc. to news-***@muc.de

Ryota Ozaki

2016-03-30 01:30:54 UTC

Permalink

On Tue, Mar 29, 2016 at 12:27 AM, Taylor R Campbell

Date: Mon, 28 Mar 2016 16:07:19 +0900
On Fri, Mar 25, 2016 at 11:40 PM, Taylor R Campbell

No need for the _PSZ variants here: the bridge list is used only under
an exclusive lock.
Actually... I don't see any users of that list! Do we need it any
more? Why was it there to begin with?
Looks like it was added in the first commit and never used. Let's get
rid of it! (Separate commit.)

Heh, that's true. Removed!
Great!
http://www.netbsd.org/~ozaki-r/bridge-psz-list.diff
http://www.netbsd.org/~ozaki-r/bridge-default-mpsafe.diff
Thanks, I'll take a look when I have a chance.
[...]
bif can be freed after bridge_release_member on another CPU,
however, LIST_FOREACH_READER touches bif after it.
BRIDGE_PSZ_RENTER(s);
bif = (&sc->sc_iflist)->lh_first;
while (bif != LIST_END(&sc->sc_iflist)) {
struct bridge_iflist *next;
membar_datadep_consumer();
psref_acquire(&psref, &bif->target);
BRIDGE_PSZ_REXIT(s);
// sleepable operations
BRIDGE_PSZ_RENTER(s);
next = bif->field.le_next;
psref_release(&psref, &bif->target);
bif = next;
}
BRIDGE_PSZ_REXIT(s);
So I'm thinking the above patches should be committed
after introducing psref.
You don't *need* to switch to psref for this to work: it is sufficient
for bridge_release_member to work inside a pserialize read section.
Currently the only reason it doesn't work inside a pserialize read
section is that it acquires an adaptive lock

Yes. So I thought I need to use a spin mutex if I want to put
bridge_release_member inside the pserialize read section.

-- but it doesn't look
like it actually needs to acquire that lock.

Hm, I'm confused. I added the lock because convar(9) says:
The mutex passed to the wait function (mtx) must also be held when
calling cv_broadcast().

Isn't it correct?

However, as an aside, I am a little puzzled by the protocol that
bridge_try_hold_bif, bridge_release_member, and bridge_delete_member
follow.
(Also: Use LIST_FIRST and LIST_NEXT, not .lh_first and .le_next!)

Sure!

Thanks,
ozaki-r

--
Posted automagically by a mail2news gateway at muc.de e.V.
Please direct questions, flames, donations, etc. to news-***@muc.de