multi-core router

Thomas E. Spanjaard

2007-11-03 22:53:16 UTC

The choices were OpenBSD 4.2 with the updated pf (supposedly lots of
performance improvements) or FreeBSD 5.3 I think it was.

Perhaps a bit off-topic, but if that option was really FreeBSD 5.3, I'd
call it a rather unfair comparison, as FreeBSD 5.3 is that much older
than OpenBSD 4.2, which was only released this week or so. Also, I don't
think FreeBSD 5.3 supports (as many) 10GbE adapters as newer
6.2/7-CURRENT releases/snapshots do.

Cheers,

--
Thomas E. Spanjaard
***@netphreax.net

Daniel Horecki

2007-11-03 23:01:58 UTC

Post by Thomas E. Spanjaard

The choices were OpenBSD 4.2 with the updated pf (supposedly lots of
performance improvements) or FreeBSD 5.3 I think it was.

6.3 will be released in a couple of weeks. Maybe it was that.

morr

--
Daniel 'Shinden' Horecki
http://morr.pl

--
Posted automagically by a mail2news gateway at muc.de e.V.
Please direct questions, flames, donations, etc. to news-***@muc.de

Erik Fair

2007-11-04 01:24:15 UTC

Actually, that's similar to what Matt Dillon has been doing in
DragonFly BSD - dedicating particular kernel threads to particular
processors. He and I had a long discussion about it a few years ago.
I haven't been paying close attention to what he's been doing since,
but he claimed that the approach worked well (as opposed to
generalized SMP, where any processor is eligible to run
processes/threads) because it resulted in much better cache
utilization. I was worried about one core (or CPU) being overburdened
while the other(s) idled, but he claimed that was not a problem in
practice.

Erik <***@netbsd.org>

Thomas E. Spanjaard

2007-11-04 17:10:46 UTC

Post by Erik Fair
Actually, that's similar to what Matt Dillon has been doing in
DragonFly BSD - dedicating particular kernel threads to particular
processors. He and I had a long discussion about it a few years ago. I
haven't been paying close attention to what he's been doing since, but
he claimed that the approach worked well (as opposed to generalized SMP,
where any processor is eligible to run processes/threads) because it
resulted in much better cache utilization. I was worried about one core
(or CPU) being overburdened while the other(s) idled, but he claimed
that was not a problem in practice.

He might have plans to do that, but that's not the current reality in
DragonFly; at least, not to the point of dedicating a core to certain
jobs like transput, and NOT using it for anything else. What is true
though is that each core basically manages itself, in that it runs its
own scheduler, manages its own routing table, etc. Also, threads
basically belong to a single core, and usually aren't moved to another
one unless they allow it. This also leads to better cache utilization
and less TLB purge/refill activity.

As a side note, it might be beneficial to take advantage of shared caches
in e.g. Core 2 family CPUs, where a number of cores (in current models,
2) share a common L2 or L3 cache, which makes moving threads between
them less of a problem for cache utilization. As far as I know, no
current open-source operating system makes scheduling decisions at this
level, with this kind of knowledge about the hierarchy of the system.
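
A rough Python sketch of what such a cache-aware decision could look like follows. It is purely illustrative and not taken from any existing scheduler: the function name, the `cache_groups` representation (a list of sets of core IDs that share a cache), and the tie-breaking rule are all assumptions.

```python
def pick_migration_target(current_core, idle_cores, cache_groups):
    """Choose a core to move a thread to, preferring one that shares a cache.

    cache_groups is a hypothetical topology description, e.g. [{0, 1}, {2, 3}]
    for a four-core system where each pair of cores shares an L2 cache.
    Returns None if no other core is idle.
    """
    candidates = set(idle_cores) - {current_core}
    if not candidates:
        return None
    for group in cache_groups:
        if current_core in group:
            # Idle cores behind the same cache keep the thread's data warm.
            shared = candidates & set(group)
            if shared:
                return min(shared)
    # No cache-sharing core is idle, so fall back to any idle core.
    return min(candidates)
```

With `cache_groups = [{0, 1}, {2, 3}]`, a thread on core 0 goes to core 1 when it is idle, and only crosses to core 2 or 3 when it is not.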

Cheers,

--
Thomas E. Spanjaard
***@netphreax.net

Bill Stouder-Studenmund

2007-11-04 04:09:44 UTC

Post by Daniel Horecki

Post by Thomas E. Spanjaard

The choices were OpenBSD 4.2 with the updated pf (supposedly lots of
performance improvements) or FreeBSD 5.3 I think it was.

6.3 will be released in a couple of weeks. Maybe it was that.

Probably.

Take care,

Bill

Robert Watson

2007-11-04 14:27:03 UTC

The choices were OpenBSD 4.2 with the updated pf (supposedly lots of
performance improvements) or FreeBSD 5.3 I think it was.

Perhaps a bit off-topic, but if that option was really FreeBSD 5.3, I'd call
it a rather unfair comparison, as FreeBSD 5.3 is that much older than
OpenBSD 4.2, which was only released this week or so. Also, I don't think
FreeBSD 5.3 supports (as many) 10GbE adapters as newer 6.2/7-CURRENT
releases/snapshots do.

I would suggest running a FreeBSD 7.0 beta (or the full release, if it's
available when you do the experimentation). It has both a more recent pf and
significantly improved multiprocessor performance, and will run IP to
completion in multiple threads (and hence on multiple cores). It also
includes vendor-supported 10 Gbps drivers from most of the major 10 Gbps
vendors; the Chelsio and Myricom PCIe cards seem to perform particularly well.
Using the new release will give you access to a lot more in the way of
features, and also allow you to report any problems to us so that we can fix
them :-). FreeBSD 5.3 is a three-year-old release and predates a lot of our
SMP networking work.

Robert N M Watson
Computer Laboratory
University of Cambridge

Darren Reed

2007-11-04 17:57:26 UTC

...
So that leads me to two questions for these lists.
1) How would you design a router to make use of multiple cores? We've
talked about different multi-core setups for running the TCP stack on one
core and IP on another and so on to sustain 10 GigE TCP connections. But
this case is different in that all we're doing is NAT and forwarding.

Depends on how good the NICs are and how you can interact with them.

On a good 10GE NIC, you can do some decent hardware classification into
multiple receive rings...providing the O/S supports that. Solaris is
going that way, so is Microsoft Windows (there was some press release
earlier in the year about them rewriting/redesigning their networking
stack to take advantage of more capable NICs).
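
To make the receive-ring classification concrete, here is a minimal Python sketch of flow-based ring selection. It is only an illustration: the function names are hypothetical, and CRC32 is a stand-in for whatever hash a given NIC implements in hardware (RSS-capable NICs typically use a Toeplitz hash instead). The point is that every packet of a flow maps to the same ring, so one core can service that flow without sharing state with the others.

```python
import socket
import struct
import zlib

def flow_key(src_ip, dst_ip, src_port, dst_port, proto):
    """Pack the IPv4 5-tuple into bytes so it can be hashed."""
    return (socket.inet_aton(src_ip) + socket.inet_aton(dst_ip)
            + struct.pack("!HHB", src_port, dst_port, proto))

def select_rx_ring(src_ip, dst_ip, src_port, dst_port, proto, num_rings):
    """Map a flow to a receive ring index in the range [0, num_rings)."""
    # CRC32 is an assumption for illustration, not what real hardware uses.
    key = flow_key(src_ip, dst_ip, src_port, dst_port, proto)
    return zlib.crc32(key) % num_rings
```

Packets from different flows spread across the rings, while packets of one flow always arrive on the same ring and therefore stay in order.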

And what is "multiple cores"? 2 or 4 or more?
And is it 1 thread per core or more?
(Think Niagara, which has 8 cores and 4 threads per core.)

On a 4 core router, why not dedicate a core per descriptor ring used, so for
a 2 port box, you've got 4 active areas (rx port A, tx port A, rx port B, tx
port B.)
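
The core-per-ring layout suggested above can be sketched as a simple assignment table. This is a hypothetical Python model, not an interface of any real kernel or driver; the function name and the wrap-around behaviour when rings outnumber cores are assumptions made for illustration.

```python
def assign_rings_to_cores(ports, num_cores):
    """Give each (port, direction) descriptor ring its own core.

    If there are more rings than cores, assignment wraps around so that
    some cores service more than one ring.
    """
    rings = [(port, direction) for port in ports for direction in ("rx", "tx")]
    return {ring: index % num_cores for index, ring in enumerate(rings)}
```

For a 2-port box with 4 cores, `assign_rings_to_cores(["A", "B"], 4)` places rx A, tx A, rx B and tx B on cores 0 through 3 respectively.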

But all of this is wasted if you're using pf - this will make your NAT
single threaded.

So in order to get good multi-core performance, you need to use ipfilter
or ipfw (both of these use locking that's more intelligent than
giant-lock style).
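
The difference between giant-lock style and finer-grained locking can be illustrated with a small Python model of a NAT state table that takes one lock per hash bucket rather than one lock for the whole table. This is a sketch of the general idea only and makes no claim about how ipfilter, ipfw or pf are actually implemented; the class and method names are hypothetical, and Python threads are used here just to show the lock structure.

```python
import threading

class ShardedStateTable:
    """State table with one lock per bucket instead of a single giant lock."""

    def __init__(self, num_buckets=64):
        self._buckets = [{} for _ in range(num_buckets)]
        self._locks = [threading.Lock() for _ in range(num_buckets)]

    def _index(self, flow):
        # Flows that hash to different buckets never contend for a lock.
        return hash(flow) % len(self._buckets)

    def insert(self, flow, translation):
        i = self._index(flow)
        with self._locks[i]:
            self._buckets[i][flow] = translation

    def lookup(self, flow):
        i = self._index(flow)
        with self._locks[i]:
            return self._buckets[i].get(flow)
```

With a single lock, every lookup on every core serializes on it; with per-bucket locks, two cores only wait on each other when their flows land in the same bucket.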

Darren
