Discussion:
apparently missing locking in if_bnx.c
Greg Troxel
2012-03-05 20:45:10 UTC
Permalink
My colleague Beverly Schwartz has found a problem in if_bnx.c where
bnx_start takes packets off ifp->if_snd without a mutex. bnx_start can
be called from multiple places (the output routine, the tx complete
routine, and the allocate-more-tx-buffers routine). We have in fact
seen calls to bnx_start while it is running, resulting in the packet
being queued twice and a double free. With pure netbsd-6, we can lock
up a machine (i386) with multiple bnx interfaces by running multiple
transfers in both directions on multiple interfaces at once. With the
following patch, it is solid. (I am pretty sure we are running with
only 1 core.)

I'm sending this as early as I can, because whatever change we converge
on will need to be pulled up to netbsd-6 and I'd like some more eyes on
it.

There are splnet calls around ifq_enqueue, so perhaps the bug is
instead that bnx_start is somehow being called when not at splnet. But
bnx_intr should be running at splnet, which should serialize access.



Author: Bev Schwartz <***@repo>
Date: Mon Mar 5 15:02:04 2012 -0500

Fix to race condition in bnx_start

The bnx driver has a defect which causes the same mbuf to be
queued in the bnx driver twice. bnx_start is re-entrant, and
the ifp->if_snd queue was not protected with a mutex, thus the
same head could be grabbed and passed to the bnx code twice.
Added mutex to protect ifp->if_snd.

In addition, the variable used_tx_bd was not protected by a mutex,
but should be. Use of tx_pkt_mtx was expanded to include
protection for used_tx_bd.

diff --git a/netbsd/src/sys/dev/pci/if_bnxvar.h b/netbsd/src/sys/dev/pci/if_bnxvar.h
index 685b7ce..1397a8d 100644
--- a/netbsd/src/sys/dev/pci/if_bnxvar.h
+++ b/netbsd/src/sys/dev/pci/if_bnxvar.h
@@ -258,11 +258,28 @@ struct bnx_softc
bus_dma_segment_t tx_mbuf_seg;
int tx_mbuf_rseg;

+ /*
+ * ifq_pkt_mtx is for protecting the ifp->if_snd queue in bnx_start.
+ * bnx_start can be called from interface queue handling in the
+ * kernel, or from the bnx interrupt handler in bnx_intr. This
+ * mutex prevents the same packet being queued in the tx_used_pkts
+ * twice. Note: The tx_pkt_mtx will be used when ifq_pkt_mtx is
+ * held.
+ */
+ kmutex_t ifq_pkt_mtx;
+
/* S/W maintained mbuf TX chain structure. */
+ /*
+ * Note: tx_pkt_mtx protects the tx_free_pkts and tx_used_pkts
+ * queues as well as the counters tx_pkt_count and used_tx_bd.
+ * The ifq_pkt_mtx may already be taken when we attempt to get
+ * tx_pkt_mtx. Therefore, ifq_pkt_mtx must not be taken when
+ * tx_pkt_mtx is held.
+ */
kmutex_t tx_pkt_mtx;
- u_int tx_pkt_count;
- struct bnx_pkt_list tx_free_pkts;
- struct bnx_pkt_list tx_used_pkts;
+ u_int tx_pkt_count; /* take tx_pkt_mtx */
+ struct bnx_pkt_list tx_free_pkts; /* take tx_pkt_mtx */
+ struct bnx_pkt_list tx_used_pkts; /* take tx_pkt_mtx */

/* S/W maintained mbuf RX chain structure. */
bus_dmamap_t rx_mbuf_map[TOTAL_RX_BD];
@@ -271,7 +288,7 @@ struct bnx_softc
/* Track the number of rx_bd and tx_bd's in use. */
u_int16_t free_rx_bd;
u_int16_t max_rx_bd;
- u_int16_t used_tx_bd;
+ u_int16_t used_tx_bd; /* take tx_pkt_mtx */
u_int16_t max_tx_bd;

/* Provides access to hardware statistics through sysctl. */
diff --git a/netbsd/src/sys/dev/pci/if_bnx.c b/netbsd/src/sys/dev/pci/if_bnx.c
index 7b7f315..1b88d10 100644
--- a/netbsd/src/sys/dev/pci/if_bnx.c
+++ b/netbsd/src/sys/dev/pci/if_bnx.c
@@ -2416,6 +2416,7 @@ bnx_dma_alloc(struct bnx_softc *sc)
TAILQ_INIT(&sc->tx_used_pkts);
sc->tx_pkt_count = 0;
mutex_init(&sc->tx_pkt_mtx, MUTEX_DEFAULT, IPL_NET);
+ mutex_init(&sc->ifq_pkt_mtx, MUTEX_DEFAULT, IPL_NET);

/*
* Allocate DMA memory for the Rx buffer descriptor chain,
@@ -4056,7 +4057,9 @@ bnx_free_tx_chain(struct bnx_softc *sc)
BNX_TX_CHAIN_PAGE_SZ, BUS_DMASYNC_PREWRITE);
}

+ mutex_enter(&sc->tx_pkt_mtx);
sc->used_tx_bd = 0;
+ mutex_exit(&sc->tx_pkt_mtx);

/* Check if we lost any mbufs in the process. */
DBRUNIF((sc->tx_mbuf_alloc),
@@ -4676,9 +4679,9 @@ bnx_tx_intr(struct bnx_softc *sc)
mutex_enter(&sc->tx_pkt_mtx);
TAILQ_INSERT_TAIL(&sc->tx_free_pkts, pkt, pkt_entry);
}
- mutex_exit(&sc->tx_pkt_mtx);

sc->used_tx_bd--;
+ mutex_exit(&sc->tx_pkt_mtx);
DBPRINT(sc, BNX_INFO_SEND, "%s(%d) used_tx_bd %d\n",
__FILE__, __LINE__, sc->used_tx_bd);

@@ -4702,6 +4705,7 @@ bnx_tx_intr(struct bnx_softc *sc)
ifp->if_timer = 0;

/* Clear the tx hardware queue full flag. */
+ mutex_enter(&sc->tx_pkt_mtx);
if (sc->used_tx_bd < sc->max_tx_bd) {
DBRUNIF((ifp->if_flags & IFF_OACTIVE),
aprint_debug_dev(sc->bnx_dev,
@@ -4709,6 +4713,7 @@ bnx_tx_intr(struct bnx_softc *sc)
sc->used_tx_bd, sc->max_tx_bd));
ifp->if_flags &= ~IFF_OACTIVE;
}
+ mutex_exit(&sc->tx_pkt_mtx);

sc->tx_cons = sw_tx_cons;
}
@@ -4912,8 +4917,13 @@ bnx_tx_encap(struct bnx_softc *sc, struct mbuf *m)
bus_dmamap_sync(sc->bnx_dmatag, map, 0, map->dm_mapsize,
BUS_DMASYNC_PREWRITE);
/* Make sure there's room in the chain */
- if (map->dm_nsegs > (sc->max_tx_bd - sc->used_tx_bd))
+ mutex_enter(&sc->tx_pkt_mtx);
+ if (map->dm_nsegs > (sc->max_tx_bd - sc->used_tx_bd)) {
+ mutex_exit(&sc->tx_pkt_mtx);
goto nospace;
+ } else {
+ mutex_exit(&sc->tx_pkt_mtx);
+ }

/* prod points to an empty tx_bd at this point. */
prod_bseq = sc->tx_prod_bseq;
@@ -4962,9 +4972,9 @@ bnx_tx_encap(struct bnx_softc *sc, struct mbuf *m)

mutex_enter(&sc->tx_pkt_mtx);
TAILQ_INSERT_TAIL(&sc->tx_used_pkts, pkt, pkt_entry);
- mutex_exit(&sc->tx_pkt_mtx);

sc->used_tx_bd += map->dm_nsegs;
+ mutex_exit(&sc->tx_pkt_mtx);
DBPRINT(sc, BNX_INFO_SEND, "%s(%d) used_tx_bd %d\n",
__FILE__, __LINE__, sc->used_tx_bd);

@@ -5006,7 +5016,7 @@ bnx_start(struct ifnet *ifp)
struct bnx_softc *sc = ifp->if_softc;
struct mbuf *m_head = NULL;
int count = 0;
- u_int16_t tx_prod, tx_chain_prod;
+ u_int16_t tx_prod, tx_chain_prod, used_tx_bd;

/* If there's no link or the transmit queue is empty then just exit. */
if ((ifp->if_flags & (IFF_OACTIVE|IFF_RUNNING)) != IFF_RUNNING) {
@@ -5028,11 +5038,22 @@ bnx_start(struct ifnet *ifp)
/*
* Keep adding entries while there is space in the ring.
*/
- while (sc->used_tx_bd < sc->max_tx_bd) {
+ mutex_enter(&sc->tx_pkt_mtx);
+ used_tx_bd = sc->used_tx_bd;
+ mutex_exit(&sc->tx_pkt_mtx);
+ while (used_tx_bd < sc->max_tx_bd) {
+
+ /*
+ * Take ifq_pkt_mtx to prevent the same
+ * mbuf being enqueued in bnx twice.
+ */
+ mutex_enter(&sc->ifq_pkt_mtx);
/* Check for any frames to send. */
IFQ_POLL(&ifp->if_snd, m_head);
- if (m_head == NULL)
+ if (m_head == NULL) {
+ mutex_exit(&sc->ifq_pkt_mtx);
break;
+ }

/*
* Pack the data into the transmit ring. If we
@@ -5043,15 +5064,21 @@ bnx_start(struct ifnet *ifp)
ifp->if_flags |= IFF_OACTIVE;
DBPRINT(sc, BNX_INFO_SEND, "TX chain is closed for "
"business! Total tx_bd used = %d\n",
- sc->used_tx_bd);
+ used_tx_bd);
+ mutex_exit(&sc->ifq_pkt_mtx);
break;
}

IFQ_DEQUEUE(&ifp->if_snd, m_head);
+ mutex_exit(&sc->ifq_pkt_mtx);
count++;

/* Send a copy of the frame to any BPF listeners. */
bpf_mtap(ifp, m_head);
+
+ mutex_enter(&sc->tx_pkt_mtx);
+ used_tx_bd = sc->used_tx_bd;
+ mutex_exit(&sc->tx_pkt_mtx);
}

if (count == 0) {
Manuel Bouyer
2012-03-06 08:31:20 UTC
Permalink
Post by Greg Troxel
My colleague Beverly Schwartz has found a problem in if_bnx.c where
bnx_start takes packets off ifp->if_snd without a mutex. bnx_start can
be called from multiple places (the output routine, the tx complete
routine, and the allocate-more-tx-buffers routine). We have in fact
seen calls to bnx_start while it is running, resulting in the packet
being queued twice and a double free. With pure netbsd-6, we can lock
up a machine (i386) with multiple bnx interfaces by running multiple
transfers in both directions on multiple interfaces at once. With the
following patch, it is solid. (I am pretty sure we are running with
only 1 core.)
I'm sending this as early as I can, because whatever change we converge
on will need to be pulled up to netbsd-6 and I'd like some more eyes on
it.
There are splnet calls around ifq_enqueue. So perhaps the bug is
instead that bnx_start is somehow called not at splnet. But bnx_intr
should be running at splnet, which should serialize access.
Yes, bnx_start() should be called at splnet(), or raise to splnet() itself.
This driver runs under the KERNEL_LOCK() anyway, so no mutex is required,
only splnet().
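For reference, the conventional spl(9) shape of a start routine that raises
the priority itself, rather than relying on its callers, is roughly the
following. This is only an illustrative sketch with generic names, not the
actual bnx code:

/*
 * Illustrative only: protect if_snd handling with splnet()/splx().
 * example_start is a placeholder name, not a real driver function.
 */
static void
example_start(struct ifnet *ifp)
{
	int s;

	s = splnet();	/* block network interrupts on this CPU */
	/* ... dequeue from ifp->if_snd and hand frames to the chip ... */
	splx(s);	/* restore the previous interrupt priority level */
}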
--
Manuel Bouyer <***@antioche.eu.org>
NetBSD: 26 ans d'experience feront toujours la difference
Manuel Bouyer
2012-03-06 11:51:29 UTC
Permalink
We were able to produce the same behavior with the intel device driver (wm). It appears this problem is in multiple drivers.
Are you sure it's not a change in your local tree ?
I have several hosts with multiple NICs, and I'm not seeing this
problem ...
No network driver is SMP-safe at this time, so they're all running
under KERNEL_LOCK. splnet() is enough to protect the queues.
--
Manuel Bouyer <***@antioche.eu.org>
NetBSD: 26 ans d'experience feront toujours la difference
Manuel Bouyer
2012-03-06 12:01:09 UTC
Permalink
I was using the tip of netbsd-6. I can test it again to be sure.
You should add tests to make sure that:
- bnx_start() is always called at splnet():
KASSERT(curcpu()->ci_level >= IPL_NET);
- bnx_start() is always called with KERNEL_LOCK held:
KASSERT(ci->ci_biglock_count > 0);
--
Manuel Bouyer <***@antioche.eu.org>
NetBSD: 26 ans d'experience feront toujours la difference
Manuel Bouyer
2012-03-06 12:07:20 UTC
Permalink
Post by Manuel Bouyer
I was using the tip of netbsd-6. I can test it again to be sure.
KASSERT(curcpu()->ci_level >= IPL_NET);
KASSERT(ci->ci_biglock_count > 0);
make that:
KASSERT(curcpu()->ci_ilevel >= IPL_NET);
KASSERT(curcpu()->ci_biglock_count > 0);
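For completeness, the asserts would sit at the very top of bnx_start() in
if_bnx.c, which already pulls in the headers providing KASSERT() and
curcpu(); a DIAGNOSTIC kernel is needed for them to fire. Sketch only:

void
bnx_start(struct ifnet *ifp)
{
	/* Both ci_ilevel and ci_biglock_count are x86-specific fields. */
	KASSERT(curcpu()->ci_ilevel >= IPL_NET);	/* at splnet or above */
	KASSERT(curcpu()->ci_biglock_count > 0);	/* KERNEL_LOCK is held */

	/* ... existing bnx_start body unchanged ... */
}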
--
Manuel Bouyer <***@antioche.eu.org>
NetBSD: 26 ans d'experience feront toujours la difference
Beverly Schwartz
2012-03-06 15:45:11 UTC
Permalink
I can get a kernel to boot with the recommended KASSERTs, but as soon as I try to ssh into the machine, I get a panic on:

KASSERT(curcpu()->ci_biglock_count > 0);

(This is in bnx.)

I will be trying our wm interfaces soon.

-Bev
Post by Manuel Bouyer
Post by Manuel Bouyer
I was using the tip of netbsd-6. I can test it again to be sure.
KASSERT(curcpu()->ci_level >= IPL_NET);
KASSERT(ci->ci_biglock_count > 0);
KASSERT(curcpu()->ci_ilevel >= IPL_NET);
KASSERT(curcpu()->ci_biglock_count > 0);
Manuel Bouyer
2012-03-06 15:59:13 UTC
Permalink
Post by Manuel Bouyer
KASSERT(curcpu()->ci_biglock_count > 0);
OK, so this means the kernel lock is not held. What's interesting
now is how we did get there. Can you set ddb.onpanic=1 in /etc/sysctl.conf
and get a stack trace from here ?
--
Manuel Bouyer <***@antioche.eu.org>
NetBSD: 26 ans d'experience feront toujours la difference
Beverly Schwartz
2012-03-06 16:02:42 UTC
Permalink
I compiled with kgdb enabled. I ran it in the debugger, did a stack trace and got:

#0 0xc0276d64 in breakpoint()
#1 0xc0575512 in kgdb_connect (verbose=0)
#2 0xc057557f in kgdb_panic()
#3 0x00000000 in ?? ()

Useful, huh?

I can try using ddb instead.

-Bev
Post by Manuel Bouyer
Post by Manuel Bouyer
KASSERT(curcpu()->ci_biglock_count > 0);
OK, so this means the kernel lock is not held. What's interesting
now is how we did get there. Can you set ddb.onpanic=1 in /etc/sysctl.conf
and get a stack trace from here ?
Manuel Bouyer
2012-03-06 16:10:12 UTC
Permalink
Post by Beverly Schwartz
#0 0xc0276d64 in breakpoint()
#1 0xc0575512 in kgdb_connect (verbose=0)
#2 0xc057557f in kgdb_panic()
#3 0x00000000 in ?? ()
Useful, huh?
I can try using ddb instead.
Yes, please try ddb. I've never tried kgdb, but ddb usually gives
more useful output.
--
Manuel Bouyer <***@antioche.eu.org>
NetBSD: 26 ans d'experience feront toujours la difference
Beverly Schwartz
2012-03-06 16:56:45 UTC
Permalink
ddb backtrace produces:
vpanic
kern_assert
bnx_start
bnx_alloc_pkts
workqueue_worker
Post by Manuel Bouyer
Post by Beverly Schwartz
#0 0xc0276d64 in breakpoint()
#1 0xc0575512 in kgdb_connect (verbose=0)
#2 0xc057557f in kgdb_panic()
#3 0x00000000 in ?? ()
Useful, huh?
I can try using ddb instead.
Yes, please try ddb. I've never tried kgdb, but ddb usually gives
more useful output.
Manuel Bouyer
2012-03-06 17:02:51 UTC
Permalink
Post by Beverly Schwartz
vpanic
kern_assert
bnx_start
bnx_alloc_pkts
workqueue_worker
thanks, so the problem is really the workqueue that should not
be marked MPSAFE ...
--
Manuel Bouyer <***@antioche.eu.org>
NetBSD: 26 ans d'experience feront toujours la difference
Greg Troxel
2012-03-06 17:25:23 UTC
Permalink
Post by Manuel Bouyer
Post by Beverly Schwartz
vpanic
kern_assert
bnx_start
bnx_alloc_pkts
workqueue_worker
thanks, so the problem is really the workqueue that should not
be marked MPSAFE ...
Restating some off-list discussion for the record, now that we've
figured it out:

bnx_start can defer work to allocate tx data structures via a
workqueue

the workqueue registration is marked MPSAFE

so when the workqueue calls the alloc routines, the kernel lock is not
held

the alloc routine calls bnx_start, and it protects that with splnet,
but it hasn't taken the kernel lock

so bnx_start (the second time on the first packet) is running at
splnet, without the kernel lock. This triggers the assert.

if the assert isn't there, then there's the possibility of another
processor handling an interrupt and calling bnx_start. Both the
workqueue-called copy and the intr-called copy will be at splnet, but
on different processors.

The above is typically rare, and it seems to take heavy load to
trigger it sometimes. It's probably the combination of multiple TCPs
opening up cwnd and the CPU utilization getting high that leads to the
unintended concurrency.

The proposed fix is to not mark bnx's workqueue MPSAFE (instead of the
patch I sent earlier).
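Concretely, the fix amounts to dropping WQ_MPSAFE from the workqueue_create()
call that registers the tx-allocation worker, so that the worker thread takes
KERNEL_LOCK before calling bnx_alloc_pkts (and hence bnx_start). A sketch of
the shape of the change; the softc field and queue name below are
illustrative, not necessarily the exact identifiers in if_bnx.c:

/* Before: the callback may run without KERNEL_LOCK, on another CPU. */
error = workqueue_create(&sc->bnx_wq, "bnxalloc", bnx_alloc_pkts, sc,
    PRI_NONE, IPL_NET, WQ_MPSAFE);

/* After: no WQ_MPSAFE, so workqueue_worker holds KERNEL_LOCK around
 * bnx_alloc_pkts, restoring the serialization bnx_start relies on. */
error = workqueue_create(&sc->bnx_wq, "bnxalloc", bnx_alloc_pkts, sc,
    PRI_NONE, IPL_NET, 0);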
Ignatios Souvatzis
2012-03-06 19:40:22 UTC
Permalink
Post by Manuel Bouyer
Post by Ignatios Souvatzis
Post by Manuel Bouyer
This is what I've just committed. This doesn't mean your patch isn't
correct, but I prefer to go the easy way for netbsd-6 first.
So bnx is safe on netbsd-6-to-be? What about netbsd-5.1?
The workqueue is not present in netbsd-5; which means it may be subject
to PR# 45051
Oh. Planning to deploy a bnx machine as a Xen server shortly. What to do?

Manuel Bouyer
2012-03-06 19:44:31 UTC
Permalink
Post by Ignatios Souvatzis
Post by Manuel Bouyer
Post by Ignatios Souvatzis
Post by Manuel Bouyer
This is what I've just committed. This doesn't mean your patch isn't
correct, but I prefer to go the easy way for netbsd-6 first.
So bnx is safe on netbsd-6-to-be? What about netbsd-5.1?
The workqueue is not present in netbsd-5; which means it may be subject
to PR# 45051
Oh. Planning to deploy a bnx machine as a Xen server shortly. What to do?
ftp.lip6.fr, web and ftp server, with traffic graphs always above 100Mb/s
has a bnx interface and has never run into PR# 45051 ...
--
Manuel Bouyer <***@antioche.eu.org>
NetBSD: 26 ans d'experience feront toujours la difference
Greg Troxel
2012-03-06 19:49:28 UTC
Permalink
Post by Ignatios Souvatzis
Oh. Planning to deploy a bnx machine as a Xen server shortly. What to do?
Wait 3 days and run netbsd-6? (Or grab the change yourself from current
- it's one line.)
Ignatios Souvatzis
2012-03-07 07:17:37 UTC
Permalink
Hi,
Post by Manuel Bouyer
Post by Ignatios Souvatzis
Oh. Planning to deploy a bnx machine as a Xen server shortly. What to do?
ftp.lip6.fr, web and ftp server, with traffic graphs always above 100Mb/s
has a bnx interface and has never run into PR# 45051 ...
As a dom0 ?

-is
--
seal your e-mail: http://www.gnupg.org/

Manuel Bouyer
2012-03-07 10:41:41 UTC
Permalink
Post by Ignatios Souvatzis
Hi,
Post by Manuel Bouyer
Post by Ignatios Souvatzis
Oh. Planning to deploy a bnx machine as a Xen server shortly. What to do?
ftp.lip6.fr, web and ftp server, with traffic graphs always above 100Mb/s
has a bnx interface and has never run into PR# 45051 ...
As a dom0 ?
No, it's a plain amd64 host.
--
Manuel Bouyer <***@antioche.eu.org>
NetBSD: 26 ans d'experience feront toujours la difference
Martin Husemann
2012-03-07 12:18:43 UTC
Permalink
Oh, I do agree that LOCKDEBUG causing a silent lockup is a bug (but probably
not easy to track).

Martin

David Young
2012-03-07 18:13:47 UTC
Permalink
Post by Beverly Schwartz
I understand that lockdebug changes timing and can impact performance
and that my bug may "disappear" because of that. What I found really
difficult was that for my bug, lockdebug consistently caused a freeze
up. Not a panic. Not a crash. An indiscriminate freeze. And it
happened quickly and consistently. That sent me down a completely
wrong path of inquiry, taking me further from "my" bug.
Can you drop into the kernel debugger, or is it too frozen even for
that?

Dave
--
David Young
***@pobox.com Urbana, IL (217) 721-9981

Thor Lancelot Simon
2012-03-07 19:54:07 UTC
Permalink
Post by David Young
Post by Beverly Schwartz
I understand that lockdebug changes timing and can impact performance
and that my bug may "disappear" because of that. What I found really
difficult was that for my bug, lockdebug consistently caused a freeze
up. Not a panic. Not a crash. An indiscriminate freeze. And it
happened quickly and consistently. That sent me down a completely
wrong path of inquiry, taking me further from "my" bug.
Can you drop into the kernel debugger, or is it too frozen even for
that?
Dropping into the kernel debugger has been hosed on amd64 for a while,
a problem with cnmagic, isn't it?

Or did that get fixed, and did I miss the memo -- again?

Thor

Manuel Bouyer
2012-03-07 20:05:44 UTC
Permalink
Post by Thor Lancelot Simon
Post by David Young
Post by Beverly Schwartz
I understand that lockdebug changes timing and can impact performance
and that my bug may "disappear" because of that. What I found really
difficult was that for my bug, lockdebug consistently caused a freeze
up. Not a panic. Not a crash. An indiscriminate freeze. And it
happened quickly and consistently. That sent me down a completely
wrong path of inquiry, taking me further from "my" bug.
Can you drop into the kernel debugger, or is it too frozen even for
that?
Dropping into the kernel debugger has been hosed on amd64 for a while,
a problem with cnmagic, isn't it?
I'm not sure why it would be worse on amd64 than on i386, the
console code is the same.
I don't remember issues, but I mostly use serial console on my
test systems ...
--
Manuel Bouyer <***@antioche.eu.org>
NetBSD: 26 ans d'experience feront toujours la difference
Dave Huang
2012-03-07 20:28:44 UTC
Permalink
Post by Thor Lancelot Simon
Dropping into the kernel debugger has been hosed on amd64 for a while,
a problem with cnmagic, isn't it?
Or did that get fixed, and did I miss the memo -- again?
IIRC, the issue is that wscons doesn't use cnmagic--dropping into the debugger is hardcoded as Ctrl+Alt+Esc, which, AFAIK, works fine with a PS/2 keyboard. I do seem to recall hearing about issues if it's a USB keyboard though.
--
Name: Dave Huang | Mammal, mammal / their names are called /
INet: ***@azeotrope.org | they raise a paw / the bat, the cat /
FurryMUCK: Dahan | dolphin and dog / koala bear and hog -- TMBG
Dahan: Hani G Y+C 36 Y++ L+++ W- C++ T++ A+ E+ S++ V++ F- Q+++ P+ B+ PA+ PL++


Manuel Bouyer
2012-03-06 18:30:40 UTC
Permalink
Post by Greg Troxel
Restating some off-list discussion for the record, now that we've
bnx_start can defer work to allocate tx data structures via a
workqueue
the workqueue registration is marked MPSAFE
so when the workqueue calls the alloc routines, the kernel lock is not
held
the alloc routine calls bnx_start, and it protects that with splnet,
but it hasn't taken the kernel lock
so bnx_start (the second time on the first packet) is running at
splnet, without the kernel lock. This triggers the assert.
if the assert isn't there, then there's the possibility of another
processor handling an interrupt and calling bnx_start. Both the
workqueue-called copy and the intr-called copy will be at splnet, but
on different processors.
The above is typically rare, and it seems to take heavy load to
trigger it sometimes. It's probably the combination of multiple TCPs
opening up cwnd and the CPU utilization getting high that leads to the
unintended concurrency.
The proposed fix is to not mark bnx's workqueue MPSAFE (instead of the
patch I sent earlier).
This is what I've just committed. This doesn't mean your patch isn't
correct, but I prefer to go the easy way for netbsd-6 first.
--
Manuel Bouyer <***@antioche.eu.org>
NetBSD: 26 ans d'experience feront toujours la difference
Beverly Schwartz
2012-03-06 18:59:01 UTC
Permalink
Got your patch.

This change is allowing for very heavy usage on the bnx drivers with no problems.

However, I tried the same test with wm drivers. In that case, the kernel just freezes. (I put your KASSERTs in wm_start.) So there is a problem with the wm driver, it's just not the same thing.

-Bev
Post by Manuel Bouyer
Post by Greg Troxel
Restating some off-list discussion for the record, now that we've
bnx_start can defer work to allocate tx data structures via a
workqueue
the workqueue registration is marked MPSAFE
so when the workqueue calls the alloc routines, the kernel lock is not
held
the alloc routine calls bnx_start, and it protects that with splnet,
but it hasn't taken the kernel lock
so bnx_start (the second time on the first packet) is running at
splnet, without the kernel lock. This triggers the assert.
if the assert isn't there, then there's the possibility of another
processor handling an interrupt and calling bnx_start. Both the
workqueue-called copy and the intr-called copy will be at splnet, but
on different processors.
The above is typically rare, and it seems to take heavy load to
trigger it sometimes. It's probably the combination of multiple TCPs
opening up cwnd and the CPU utilization getting high that leads to the
unintended concurrency.
The proposed fix is to not mark bnx's workqueue MPSAFE (instead of the
patch I sent earlier).
This is what I've just committed. This doesn't mean your patch isn't
correct, but I prefer to go the easy way for netbsd-6 first.
Ignatios Souvatzis
2012-03-06 19:20:25 UTC
Permalink
Post by Manuel Bouyer
This is what I've just committed. This doesn't mean your patch isn't
correct, but I prefer to go the easy way for netbsd-6 first.
So bnx is safe on netbsd-6-to-be? What about netbsd-5.1?

-is

Greg Troxel
2012-03-06 19:30:04 UTC
Permalink
Post by Ignatios Souvatzis
Post by Manuel Bouyer
This is what I've just committed. This doesn't mean your patch isn't
correct, but I prefer to go the easy way for netbsd-6 first.
So bnx is safe on netbsd-6-to-be? What about netbsd-5.1?
This change needs to be pulled up to netbsd-6.

As I read the code in netbsd-5, the workqueue to defer bnx_alloc_pkts
does not exist, so this particular bug is not present.
Manuel Bouyer
2012-03-06 19:30:06 UTC
Permalink
Post by Ignatios Souvatzis
Post by Manuel Bouyer
This is what I've just committed. This doesn't mean your patch isn't
correct, but I prefer to go the easy way for netbsd-6 first.
So bnx is safe on netbsd-6-to-be? What about netbsd-5.1?
The workqueue is not present in netbsd-5; which means it may be subject
to PR# 45051
--
Manuel Bouyer <***@antioche.eu.org>
NetBSD: 26 ans d'experience feront toujours la difference
Matthew Mondor
2012-03-06 19:30:52 UTC
Permalink
On Tue, 6 Mar 2012 13:59:01 -0500
Post by Beverly Schwartz
Got your patch.
This change is allowing for very heavy usage on the bnx drivers with no problems.
However, I tried the same test with wm drivers. In that case, the kernel just freezes. (I put your KASSERTs in wm_start.) So there is a problem with the wm driver, it's just not the same thing.
It might or might not be related, but since network interface problems
are on topic, I also experienced issues with re(4) on both netbsd-6 and
-current; the details can be found in PR kern/45928. There is nothing
in that driver that pretends to be MPSAFE though, and I've not found
mismatched spl(9) calls when I audited it the other day.
--
Matt

Greg Troxel
2012-03-06 19:32:42 UTC
Permalink
Post by Manuel Bouyer
KASSERT(curcpu()->ci_ilevel >= IPL_NET);
KASSERT(curcpu()->ci_biglock_count > 0);
I wonder if these should be present all the time. They are individually
cheap checks, and thus qualify for DIAGNOSTIC, but I can see an argument
that having multiple KASSERTs per packet is too much. Alternatively
they could be part of LOCKDEBUG, perhaps written differently.
It does seem that at some level of paranoia >= DIAGNOSTIC and <=
LOCKDEBUG that invariants about locks held should be checked in as many
places as possible.
Manuel Bouyer
2012-03-06 19:35:28 UTC
Permalink
Post by Greg Troxel
Post by Manuel Bouyer
KASSERT(curcpu()->ci_ilevel >= IPL_NET);
KASSERT(curcpu()->ci_biglock_count > 0);
I wonder if these should be present all the time. They are individually
cheap checks, and thus qualify for DIAGNOSTIC, but I can see an argument
that having multiple KASSERTs per packet is too much. Alternatively
they could be part of LOCKDEBUG, perhaps written differently.
It does seem that at some level of paranoia >= DIAGNOSTIC and <=
LOCKDEBUG that invariants about locks held should be checked in as many
places as possible.
curcpu()->ci_ilevel is x86-specific, unfortunately ...
--
Manuel Bouyer <***@antioche.eu.org>
NetBSD: 26 ans d'experience feront toujours la difference
Martin Husemann
2012-03-06 19:39:33 UTC
Permalink
Post by Manuel Bouyer
curcpu()->ci_ilevel is x86-specific, unfortunately ...
Make it something like:

#ifdef CPU_ASSERT_SPL
CPU_ASSERT_SPL(...)
#endif

and have archs (optionally) provide the magic (which is similar or identical
to the x86 one on quite a lot of archs, IIRC).
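On x86 the machine-dependent half could be as small as the following;
CPU_ASSERT_SPL does not exist in the tree, so all of this is an
illustrative sketch:

/* Hypothetical addition to an MD header such as <machine/cpu.h>. */
#ifdef DIAGNOSTIC
#define	CPU_ASSERT_SPL(ipl)	KASSERT(curcpu()->ci_ilevel >= (ipl))
#else
#define	CPU_ASSERT_SPL(ipl)	do { /* nothing */ } while (0)
#endif

/* Driver usage, compiled out on ports that don't provide it: */
#ifdef CPU_ASSERT_SPL
	CPU_ASSERT_SPL(IPL_NET);
#endif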

Martin

Manuel Bouyer
2012-03-06 19:45:29 UTC
Permalink
Post by Martin Husemann
Post by Manuel Bouyer
curcpu()->ci_ilevel is x86-specific, unfortunately ...
#ifdef CPU_ASSERT_SPL
CPU_ASSERT_SPL(...)
#endif
and have archs (optionally) provide the magic (which is similar or identical
to the x86 one on quite a lot of archs, IIRC).
I'd like to see something like that.
--
Manuel Bouyer <***@antioche.eu.org>
NetBSD: 26 ans d'experience feront toujours la difference
Greg Troxel
2012-03-07 00:15:04 UTC
Permalink
For others trying to repeat this kind of stress test:

Note that we've found that actually triggering a problem seems to be
very dependent on all sorts of things that shouldn't matter, e.g. i386
vs amd64, firmware revisions, etc. But that may be about bugs in
private code; we don't have enough experience to make this statement
about the workqueue/MPSAFE bug.

We were reliably able to induce a lockup with

netbsd-6 from yesterday
2 machines
3 bnx each, cabled back-to-back in pairs
each machine runs a web server, with a ~10G+ file
each machine runs 3 wget, pulling per interface from the other machine

so this is 6 tcp streams, one per direction on each of 3 pairs of
interfaces. With the workqueue/remove-MPSAFE patch, the machines are
totally solid under this load. With the mutex patch I posted earlier,
they were almost solid, but not quite (probably because access to the tx
dma setup hardware was not serialized).

Further, with the patch and LOCKDEBUG, the systems run without
crashing/panicking, but about 40x slower. Without the patch and with
LOCKDEBUG, there were mysterious hangs.

I would expect that on most machines, it wouldn't be possible to provoke
the bug with only one interface.

My understanding is that the above stress test with 3 pairs of wm
(yesterday or today netbsd-6) also leads to hangs. (wm doesn't use
workqueues, so it must be something else. But wm quad-port cards seem
to have funky bridge chips that netbsd-5 at least doesn't handle.)
Beverly Schwartz
2012-03-07 00:37:44 UTC
Permalink
Post by Greg Troxel
My understanding is that the above stress test with 3 pairs of wm
(yesterday or today netbsd-6) also leads to hangs. (wm doesn't use
workqueues, so it must be something else. But wm quad-port cards seem
to have funky bridge chips that netbsd-5 at least doesn't handle.)
I paired a machine w/ wm interfaces with a machine w/ bnx interfaces.

If I run wget on the machine w/ the bnx interfaces (3x, one for each interface, pulling data from wm machine to bnx machine), it would take a long time to get bad (freeze/hang) behavior.

If I run wget on the machine w/ the wm interfaces (3x, one for each interface, pulling data from bnx machine to wm machine), I get a freeze pretty promptly.

I can get a freeze with or without lockdebug. So I don't know if lockdebug affects the underlying problem.

I will say that with the bnx problem, if I ran without lockdebug, then I would get a panic in the m_freem in bnx_tx_intr. If I ran *with* lockdebug, I would get a freeze up quite quickly with no indication of where or what the problem was.

Lockdebug, so far, has been entirely unhelpful, even detrimental.

I will be spending some more time with lockdebug tomorrow.

-Bev
Martin Husemann
2012-03-07 08:33:04 UTC
Permalink
Post by Beverly Schwartz
Lockdebug, so far, has been entirely unhelpful, even detrimental.
Lockdebug is very useful to catch a lot of locking errors and provide
great diagnostics. However, it has a very severe performance impact. So
if the problem you are hunting is not one of the lock-misuses lockdebug
can detect (like in the bnx case), it will not help. Worse, due to the
different timing it will change chances for all races involved, which
might even hide bugs.

Martin

Beverly Schwartz
2012-03-07 12:15:48 UTC
Permalink
Post by Martin Husemann
Post by Beverly Schwartz
Lockdebug, so far, has been entirely unhelpful, even detrimental.
Lockdebug is very useful to catch a lot of locking errors and provide
great diagnostics. However, it has a very severe performance impact. So
if the problem you are hunting is not one of the lock-misuses lockdebug
can detect (like in the bnx case), it will not help. Worse, due to the
different timing it will change chances for all races involved, which
might even hide bugs.
I understand that lockdebug changes timing and can impact performance and that my bug may "disappear" because of that. What I found really difficult was that for my bug, lockdebug consistently caused a freeze up. Not a panic. Not a crash. An indiscriminate freeze. And it happened quickly and consistently. That sent me down a completely wrong path of inquiry, taking me further from "my" bug.

Now that we have a fix to "my" bug, lockdebug is affecting performance, as expected, but not causing additional problems.

So, in my mind, lockdebug was an impediment to my work. Only after I disabled lockdebug did I make progress. While lockdebug may "hide" a bug because of timing changes, it should not create completely new problems and symptoms that are more difficult to work with than the original bug.

-Bev




Beverly Schwartz
2012-03-06 11:53:42 UTC
Permalink
I was using the tip of netbsd-6. I can test it again to be sure.

-Bev
Post by Manuel Bouyer
We were able to produce the same behavior with the intel device driver (wm). It appears this problem is in multiple drivers.
Are you sure it's not a change in your local tree ?
I have several hosts with multiple NICs, and I'm not seeing this
problem ...
No network driver is SMP-safe at this time, so they're all running
under KERNEL_LOCK. splnet() is enough to protect the queues.
Beverly Schwartz
2012-03-06 11:35:04 UTC
Permalink
We were able to produce the same behavior with the intel device driver (wm). It appears this problem is in multiple drivers.

-Bev
Post by Manuel Bouyer
Post by Greg Troxel
My colleague Beverly Schwartz has found a problem in if_bnx.c where
bnx_start takes packets off ifp->if_snd without a mutex. bnx_start can
be called from multiple places (the output routine, the tx complete
routine, and the allocate-more-tx-buffers routine). We have in fact
seen calls to bnx_start while it is running, resulting in the packet
being queued twice and a double free. With pure netbsd-6, we can lock
up a machine (i386) with multiple bnx interfaces by running multiple
transfers in both directions on multiple interfaces at once. With the
following patch, it is solid. (I am pretty sure we are running with
only 1 core.)
I'm sending this as early as I can, because whatever change we converge
on will need to be pulled up to netbsd-6 and I'd like some more eyes on
it.
There are splnet calls around ifq_enqueue. So perhaps the bug is
instead that bnx_start is somehow called not at splnet. But bnx_intr
should be running at splnet, which should serialize access.
Yes, bnx_start() should be called at splnet(), or raise to splnet() itself.
This driver runs under the KERNEL_LOCK() anyway, so no mutex is required,
only splnet().
Thor Lancelot Simon
2012-03-06 14:43:42 UTC
Permalink
Post by Manuel Bouyer
No network driver is SMP-safe at this time, so they're all running
under KERNEL_LOCK. splnet() is enough to protect the queues.
I think there are a few that are SMP-safe. One of Matt's for a powerpc board,
and probably ixgbe.

I know there are extensive patches floating around out there to make wm and
some of the software "interfaces" SMP-safe. I'll let their authors speak up
for themselves if they think it's appropriate...

Thor

Manuel Bouyer
2012-03-06 14:55:09 UTC
Permalink
Post by Thor Lancelot Simon
Post by Manuel Bouyer
No network driver is SMP-safe at this time, so they're all running
under KERNEL_LOCK. splnet() is enough to protect the queues.
I think there are a few that are SMP-safe. One of Matt's for a powerpc board,
and probably ixgbe.
I know there are extensive patches floating around out there to make wm and
some of the software "interfaces" SMP-safe. I'll let their authors speak up
for themselves if they think it's appropriate...
It's possible some of them have already been made SMP-safe (I think xennet
is, or is close to being). But all the driver's entry points from upper
level (e.g. if_ethersubr.c, netinet, etc ...) should still be called with
KERNEL_LOCK held (partly because most network drivers are not SMP-safe
yet; partly because the upper levels are not SMP-safe either).
If the driver's interrupt is not registered SMP-safe it will be called
with KERNEL_LOCK too, and so the whole driver can safely use spl locking.

Now it's possible some code path enters a network driver without KERNEL_LOCK;
but this would be a bug that could cause concurrency issues, and I don't think
the right place to fix it is in the driver itself at this time (or all drivers
would have to be fixed at once).

I tested the KASSERTs I mentioned in the wm driver; they didn't fire.
But I don't have much besides ethernet v4 and v6 configured.
A more complicated setup may be needed to reproduce the problem and
discover the faulty code path.
--
Manuel Bouyer <***@antioche.eu.org>
NetBSD: 26 ans d'experience feront toujours la difference
Thor Lancelot Simon
2012-03-06 15:33:27 UTC
Permalink
Post by Manuel Bouyer
Post by Thor Lancelot Simon
Post by Manuel Bouyer
No network driver is SMP-safe at this time, so they're all running
under KERNEL_LOCK. splnet() is enough to protect the queues.
I think there are a few that are SMP-safe. One of Matt's for a powerpc board,
and probably ixgbe.
It's possible some of them have already been made SMP-safe (I think xennet
is, or is close to being). But all the driver's entry points from upper
level (e.g. if_ethersubr.c, netinet, etc ...) should still be called with
KERNEL_LOCK held (partly because most network drivers are not SMP-safe
yet; partly because the upper levels are not SMP-safe either).
If the driver's interrupt is not registered SMP-safe it will be called
with KERNEL_LOCK too, and so the whole driver can safely use spl locking.
But I believe there are, in fact, one or two drivers in the tree whose interrupts
are registered SMP-safe. I don't think it's a good idea to skip over opportunities
to phase out spl "locking" in drivers just because, for a little while longer at
least, it's still safe.
Post by Manuel Bouyer
Now it's possible some code path enters a network driver without KERNEL_LOCK;
but this would be a bug that could cause concurrency issues, and I don't think
the right place to fix it is in the driver itself at this time (or all drivers
would have to be fixed at once).
I'd be in favor of both. Replace spl "locking" in drivers incrementally, and also
figure out what isn't holding KERNEL_LOCK in the case we're talking about here.

I am wondering whether a stacked software driver is what's calling the bnx
start routine in BBN's case. The locking in some of those is kind of dodgy.

Thor

Manuel Bouyer
2012-03-06 15:40:17 UTC
Permalink
Post by Thor Lancelot Simon
Post by Manuel Bouyer
It's possible some of them have already been made SMP-safe (I think xennet
is, or is close to being). But all the driver's entry points from upper
level (e.g. if_ethersubr.c, netinet, etc ...) should still be called with
KERNEL_LOCK held (partly because most network drivers are not SMP-safe
yet; partly because the upper levels are not SMP-safe either).
If the driver's interrupt is not registered SMP-safe it will be called
with KERNEL_LOCK too, and so the whole driver can safely use spl locking.
But I believe there are, in fact, one or two drivers in the tree whose interrupts
are registered SMP-safe. I don't think it's a good idea to skip over opportunities
to phase out spl "locking" in drivers just because, for a little while longer at
least, it's still safe.
Post by Manuel Bouyer
Now it's possible some code path enters a network driver without KERNEL_LOCK;
but this would be a bug that could cause concurrency issues, and I don't think
the right place to fix it is in the driver itself at this time (or all drivers
would have to be fixed at once).
I'd be in favor of both. Replace spl "locking" in drivers incrementally, and also
figure out what isn't holding KERNEL_LOCK in the case we're talking about here.
I am wondering whether a stacked software driver is what's calling the bnx
start routine in BBN's case. The locking in some of those is kind of dodgy.
I agree that patches to get rid of spl locking in drivers should not
be rejected. But if a driver is made SMP-safe just to "fix" a bug,
then it's not the right thing to do because the bug is still there.
--
Manuel Bouyer <***@antioche.eu.org>
NetBSD: 26 ans d'experience feront toujours la difference
Greg Troxel
2012-03-06 15:45:35 UTC
Permalink
I am wondering whether a stacked software driver is what's calling the bnx
start routine in BBN's case. The locking in some of those is kind of dodgy.

I don't think that's it. While we're doing some complicated things in a
private tree, chasing down this problem led to backing off on those
things and finally running a build of just netbsd-6, and then just
pushing traffic over the interfaces, without any odd configuration. (It
does seem to take multiple flows on multiple interfaces to lead to the
issue.)
Manuel Bouyer
2012-03-06 15:57:13 UTC
Permalink
Post by Thor Lancelot Simon
I am wondering whether a stacked software driver is what's calling the bnx
start routine in BBN's case. The locking in some of those is kind of dodgy.
I don't think that's it. While we're doing some complicated things in a
private tree, chasing down this problem led to backing off on those
things and finally running a build of just netbsd-6, and then just
pushing traffic over the interfaces, without any odd configuration. (It
does seem to take multiple flows on multiple interfaces to lead to the
issue.)
multiple interfaces alone are not enough (each interface has its own queue).
It's probably how multiple interfaces are used that matters, like bridges,
tunnels, or other setups that could cause a packet to be sent back to
another interface without going to the IP stack.
--
Manuel Bouyer <***@antioche.eu.org>
NetBSD: 26 ans d'experience feront toujours la difference
Greg Troxel
2012-03-06 16:11:11 UTC
Permalink
Post by Manuel Bouyer
I agree that patches to get rid of spl locking in drivers should not
be rejected. But if a driver is made SMP-safe just to "fix" a bug,
then it's not the right thing to do because the bug is still there.
No argument on this from me. There's pretty clearly something off in
the grand scheme of how splnet/kernel_lock is supposed to protect
bnx_start, and we definitely need to get to the bottom of it.
David Young
2012-03-06 17:00:40 UTC
Permalink
Post by Thor Lancelot Simon
I am wondering whether a stacked software driver is what's calling the bnx
start routine in BBN's case. The locking in some of those is kind of dodgy.
Dodgy and expensive. :-/

Word to the wise: avoid a mutex acquire/release on every packet if you
can! There are lots of possible strategies, e.g., amortize locks over
several packets, or else use per-CPU packet queues so you don't have to
synchronize with other CPUs.
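As an illustration of the amortization idea, a start routine can drain a
small batch of packets under one acquire/release and only then do the
per-packet hardware setup. A rough sketch with made-up names (sc_snd_mtx,
example_encap), not tied to any in-tree driver:

/* Dequeue up to TX_BATCH mbufs under a single mutex acquisition. */
#define	TX_BATCH	16
	struct mbuf *batch[TX_BATCH];
	int i, n;

	mutex_enter(&sc->sc_snd_mtx);
	for (n = 0; n < TX_BATCH; n++) {
		IFQ_DEQUEUE(&ifp->if_snd, batch[n]);
		if (batch[n] == NULL)
			break;
	}
	mutex_exit(&sc->sc_snd_mtx);

	/* Per-packet DMA/descriptor setup happens outside the queue lock. */
	for (i = 0; i < n; i++)
		if (example_encap(sc, batch[i]) != 0)
			m_freem(batch[i]);	/* sketch: drop on encap failure */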

Dave
--
David Young
***@pobox.com Urbana, IL (217) 721-9981

David Young
2012-03-06 16:51:16 UTC
Permalink
Post by Thor Lancelot Simon
Post by Manuel Bouyer
No network driver is SMP-safe at this time, so they're all running
under KERNEL_LOCK. splnet() is enough to protect the queues.
I think there are a few that are SMP-safe. One of Matt's for a powerpc board,
and probably ixgbe.
I know there are extensive patches floating around out there to make wm and
some of the software "interfaces" SMP-safe. I'll let their authors speak up
for themselves if they think it's appropriate...
I am working on wm(4) performance. My work so far has entailed making
wm(4) more MP-friendly, especially in the Tx path. The plan is to feed
all of my changes back.

I'm also working on a fast replacement to m_tag(9), and on various
improvements to vlan(4) and agr(4).

BTW, I've mined Matt Thomas' driver in
sys/arch/powerpc/booke/dev/pq3etsec.c for a lot of fine ideas about
structuring an ethernet driver. Hats off to Matt.

Dave
--
David Young
***@pobox.com Urbana, IL (217) 721-9981
