Discussion: multiple rx/tx rings and interrupt delivery on newer NICs
Darren Reed
2007-06-13 08:12:23 UTC
Has anyone started to think about how NetBSD can take advantage
of NICs that have a larger number of rx/tx descriptor rings,
and/or MSI interrupts?

For example, would this get tied in with ALTQ or something else?

Should I be able to dedicate a rx/tx ring pair to http traffic
and another to ssh, etc?

...but to do any of that will require some sort of framework.

Darren


Thor Lancelot Simon
2007-06-13 13:50:29 UTC
Post by Darren Reed
Has anyone started to think about how NetBSD can take advantage
of NICs that have a larger number of rx/tx descriptor rings,
and/or MSI interrupts?
Think, yes -- do, no.
Post by Darren Reed
For example, would this get tied in with ALTQ or something else?
Ooh, I hope not. ALTQ is slow.
Post by Darren Reed
Should I be able to dedicate a rx/tx ring pair to http traffic
and another to ssh, etc?
I've been thinking about this -- I know Solaris can do this now -- and
I have to say I'm somewhat skeptical. Does this really save much overhead?

I might rather have the packets all land in one ring, tagged for what
rule they've matched in classification. Now, on the other hand, what would
be of great performance benefit for _my_ application, at least, on a
multiprocessor, would be the ability to use multiple rings according to
a slightly different set of classification parameters, e.g. destination
IP address. That way you can avoid stomping the cache of CPU A when
packets come in that will only ever be touched by the network stack on
CPU B.
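Roughly, in C -- with the ring count and hash invented for illustration,
not code from any driver:

#include <stdint.h>

#define NRXRINGS	4	/* hypothetical ring count, power of two */

/*
 * Pick an rx ring -- and therefore a CPU, if ring N's interrupt is
 * bound to CPU N -- from the destination address alone, so a given
 * destination's packets are only ever touched by one CPU's stack.
 */
static unsigned int
rxring_for_daddr(uint32_t daddr)
{
	/* Any cheap mix will do; a real NIC does this in hardware. */
	uint32_t h = daddr * 0x9e3779b1;

	return ((h >> 16) & (NRXRINGS - 1));
}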
Post by Darren Reed
...but to do any of that will require some sort of framework.
Yes. Do you know how Sun's recent work in this area has been working out?

It seems to me our highest priority for accommodating smarter NICs ought
to be header splitting. It's ubiquitous and should give a good performance
boost while we try to figure out how to handle the fancier new NICs that
can classify packets and the like.

Thor

Darren Reed
2007-06-14 07:03:03 UTC
Post by Thor Lancelot Simon
Post by Darren Reed
Has anyone started to think about how NetBSD can take advantage
of NICs that have a larger number of rx/tx descriptor rings,
and/or MSI interrupts?
Think, yes -- do, no.
Post by Darren Reed
For example, would this get tied in with ALTQ or something else?
Ooh, I hope not. ALTQ is slow.
Post by Darren Reed
Should I be able to dedicate a rx/tx ring pair to http traffic
and another to ssh, etc?
I've been thinking about this -- I know Solaris can do this now -- and
I have to say I'm somewhat skeptical. Does this really save much overhead?
It's not just about speeding up delivery to an application, but also
being able to deliver a certain QoS, because you don't have to treat
each queue equally.

For local delivery, if you have specific rings being served by specific
cores/CPUs, then it stands to reason you should see some
locality-of-execution benefits.
Post by Thor Lancelot Simon
I might rather have the packets all land in one ring, tagged for what
rule they've matched in classification. Now, on the other hand, what would
be of great performance benefit for _my_ application, at least, on a
multiprocessor, would be the ability to use multiple rings according to
a slightly different set of classification parameters, e.g. destination
IP address.
I believe the standard matching for packets is a 5-tuple:
source address, destination address, protocol, source port, destination port
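In C terms, a classification rule over that tuple might look something
like this (all names invented for illustration):

#include <stdint.h>

struct fivetuple {
	uint32_t	ft_saddr;	/* source address */
	uint32_t	ft_daddr;	/* destination address */
	uint16_t	ft_sport;	/* source port */
	uint16_t	ft_dport;	/* destination port */
	uint8_t		ft_proto;	/* IPPROTO_TCP, etc. */
};

/* A rule steering matching packets to a dedicated rx/tx ring pair. */
struct ring_rule {
	struct fivetuple rr_match;	/* values to compare */
	struct fivetuple rr_mask;	/* set bits are significant */
	unsigned int	 rr_ring;	/* ring pair to deliver to */
};

/* e.g. "all TCP traffic to port 22 lands in ring pair 1": */
static const struct ring_rule ssh_rule = {
	.rr_match = { .ft_dport = 22, .ft_proto = 6 /* TCP */ },
	.rr_mask  = { .ft_dport = 0xffff, .ft_proto = 0xff },
	.rr_ring  = 1,
};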
Post by Thor Lancelot Simon
That way you can avoid stomping the cache of CPU A when
packets come in that will only ever be touched by the network stack on
CPU B.
Right, see above.
Post by Thor Lancelot Simon
Post by Darren Reed
...but to do any of that will require some sort of framework.
Yes. Do you know how Sun's recent work in this area has been working out?
Yes, I do...the people working on the project are my co-workers, so
this is about as involved in any NetBSD stuff as I can realistically be.
The project to do this is known as "crossbow" and is hosted on
the opensolaris.org website at:
http://www.opensolaris.org/os/project/crossbow/

Darren


Cherry G. Mathew
2007-06-15 03:14:39 UTC
Post by Darren Reed
Post by Thor Lancelot Simon
Post by Darren Reed
Has anyone started to think about how NetBSD can take advantage
of NICs that have a larger number of rx/tx descriptor rings,
and/or MSI interrupts?
Think, yes -- do, no.
Post by Darren Reed
For example, would this get tied in with ALTQ or something else?
Ooh, I hope not. ALTQ is slow.
Post by Darren Reed
Should I be able to dedicate a rx/tx ring pair to http traffic
and another to ssh, etc?
I've been thinking about this -- I know Solaris can do this now -- and
I have to say I'm somewhat skeptical. Does this really save much overhead?
It's not just about speeding up delivery to an application, but also
being able to deliver a certain QoS, because you don't have to treat
each queue equally.
For local delivery, if you have specific rings being served by specific
cores/CPUs, then it stands to reason you should see some
locality-of-execution benefits.
Post by Thor Lancelot Simon
I might rather have the packets all land in one ring, tagged for what
rule they've matched in classification. Now, on the other hand, what would
be of great performance benefit for _my_ application, at least, on a
multiprocessor, would be the ability to use multiple rings according to
a slightly different set of classification parameters, e.g. destination
IP address.
source address, destination address, protocol, source port, destination port
Post by Thor Lancelot Simon
That way you can avoid stomping the cache of CPU A when
packets come in that will only ever be touched by the network stack on
CPU B.
Right, see above.
Post by Thor Lancelot Simon
Post by Darren Reed
...but to do any of that will require some sort of framework.
Yes. Do you know how Sun's recent work in this area has been working out?
Yes, I do...the people working on the project are my co-workers, so
this is about as involved in any NetBSD stuff as I can realistically be.
The project to do this is known as "crossbow" and is hosted on
http://www.opensolaris.org/os/project/crossbow/
Darren
http://www.xensource.com/files/xensummit_4/xen-ny-summit-smartnic_Pratt.pdf
--
~Cherry

Darren Reed
2007-06-15 05:02:50 UTC
Post by Cherry G. Mathew
Post by Darren Reed
...
Yes, I do...the people working on the project are my co-workers, so
this is about as involved in any NetBSD stuff as I can realistically be.
The project to do this is known as "crossbow" and is hosted on
http://www.opensolaris.org/os/project/crossbow/
Darren
http://www.xensource.com/files/xensummit_4/xen-ny-summit-smartnic_Pratt.pdf
Exactly. I believe the Solaris approach is to define vnics as being
the bottom side of what interfaces with Xen. Combine this with what
the crossbow project is delivering in terms of control of descriptor
rings and you can fill in the blanks.

Darren


Sepherosa Ziehau
2009-03-24 09:46:38 UTC
Post by Darren Reed
source address, destination address, protocol, source port, destination port
Nah, not the 5-tuple. And you will have a tough time figuring out the
protocol of the packet if the NIC does not provide it in the RX descriptor.

Best Regards,
sephe
--
Live Free or Die

Thor Lancelot Simon
2009-03-24 12:03:50 UTC
Post by Sepherosa Ziehau
Post by Darren Reed
source address, destination address, protocol, source port, destination port
Nah, not the 5-tuple. And you will have a tough time figuring out the
protocol of the packet if the NIC does not provide it in the RX descriptor.
For many applications, just {saddr, daddr} is good enough to give you
reasonable distribution of input packets over CPUs, which is good enough
to let you get a lot of concurrency in the network stack with only a
single network interface.

The newer 8257x can actually go so far as to compute the TCP PCB hash
for us (well, this would require changing our PCB hash function, but
that's no big deal), allowing a ton of the logic in ip_input and tcp_input
to be skipped, as well as memory access even to the packet _headers_ until
quite late in the receive path, which is a neat trick indeed.
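Sketched in C, with a made-up descriptor layout (the real 8257x
descriptors differ), the idea is:

#include <stdint.h>

struct rxdesc {				/* hypothetical rx descriptor */
	uint64_t	rxd_paddr;	/* buffer physical address */
	uint32_t	rxd_rsshash;	/* flow hash computed by the NIC */
	uint16_t	rxd_len;
	uint16_t	rxd_flags;
};

struct inpcb_head;			/* stand-in for a PCB hash bucket */
extern struct inpcb_head *pcbhashtbl;
extern unsigned long pcbhashmask;

/*
 * Index the PCB table with the hardware-supplied hash; no packet
 * header has been read at this point.  This only works once the
 * stack's PCB hash function is changed to compute the same function
 * the NIC does.
 */
static struct inpcb_head *
pcb_bucket_from_desc(const struct rxdesc *rxd)
{
	return (&pcbhashtbl[rxd->rxd_rsshash & pcbhashmask]);
}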
--
Thor Lancelot Simon ***@rek.tjls.com
"Even experienced UNIX users occasionally enter rm *.* at the UNIX
prompt only to realize too late that they have removed the wrong
segment of the directory structure." - Microsoft WSS whitepaper

Sepherosa Ziehau
2009-03-25 01:37:27 UTC
Post by Thor Lancelot Simon
Post by Sepherosa Ziehau
Post by Darren Reed
source address, destination address, protocol, source port, destination port
Nah, not the 5-tuple. And you will have a tough time figuring out the
protocol of the packet if the NIC does not provide it in the RX descriptor.
For many applications, just {saddr, daddr} is good enough to give you
reasonable distribution of input packets over CPUs, which is good enough
to let you get a lot of concurrency in the network stack with only a
single network interface.
The newer 8257x can actually go so far as to compute the TCP PCB hash
_All_ devices supporting the RSS standard hash functions can do the
{faddr,laddr,fport,lport} hash; this includes jme(4). Intel's
extension on their PCIe devices is hashing non-fragmented UDP packets
using {faddr,laddr,fport,lport}. BTW, _all_ 8257x parts support RSS,
including the relatively weak 82573 in this product line.
Post by Thor Lancelot Simon
for us (well, this would require changing our PCB hash function, but
that's no big deal), allowing a ton of the logic in ip_input and tcp_input
Yep, the Toeplitz hash function implemented on the host does not incur a
noticeable performance hit (at least that is so on DragonFly).
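For reference, a software Toeplitz fits in a few lines of C; this
sketch uses the well-known example key from the RSS spec and is only
meant to show the shape of the computation:

#include <stddef.h>
#include <stdint.h>

/* The 40-byte example key from the RSS specification. */
static const uint8_t rss_key[40] = {
	0x6d, 0x5a, 0x56, 0xda, 0x25, 0x5b, 0x0e, 0xc2,
	0x41, 0x67, 0x25, 0x3d, 0x43, 0xa3, 0x8f, 0xb0,
	0xd0, 0xca, 0x2b, 0xcb, 0xae, 0x7b, 0x30, 0xb4,
	0x77, 0xcb, 0x2d, 0xa3, 0x80, 0x30, 0xf2, 0x0c,
	0x6a, 0x42, 0xb7, 0x3b, 0xbe, 0xac, 0x01, 0xfa,
};

/*
 * Toeplitz-hash `len` bytes of input -- e.g. {saddr,daddr} or
 * {saddr,daddr,sport,dport} in network byte order.  For each set
 * input bit, XOR in the 32-bit window of the key at that bit
 * position, sliding the window left one bit per input bit.
 */
static uint32_t
toeplitz_hash(const uint8_t *in, size_t len)
{
	uint32_t hash = 0;
	uint32_t key = ((uint32_t)rss_key[0] << 24) |
	    ((uint32_t)rss_key[1] << 16) |
	    ((uint32_t)rss_key[2] << 8) | rss_key[3];
	size_t i, b;

	for (i = 0; i < len; i++) {
		for (b = 0; b < 8; b++) {
			if (in[i] & (0x80 >> b))
				hash ^= key;
			key <<= 1;	/* slide the window... */
			if (i + 4 < sizeof(rss_key) &&
			    (rss_key[i + 4] & (0x80 >> b)))
				key |= 1;	/* ...in with the next key bit */
		}
	}
	return (hash);
}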
Post by Thor Lancelot Simon
to be skipped, as well as memory access even to the packet _headers_ until
quite late in the receive path, which is a neat trick indeed.
I don't know, but I think you will still have to check the length fields
in the IP header during ip_input(), which means you will have to touch
the IP header earlier.

Best Regards,
sephe
--
Live Free or Die

Allen Briggs
2007-06-13 14:11:29 UTC
Post by Darren Reed
Has anyone started to think about how NetBSD can take advantage
of NICs that have a larger number of rx/tx descriptor rings,
and/or MSI interrupts?
I've given larger numbers of rx/tx rings a little bit of thought, but
haven't come up with any good general-purpose way to make use
of them.

For one application, I'd have liked to have had multiple tx
rings for handling normal traffic and high-priority traffic (for
"application"-level exception/flow handling). But the hardware
in question there doesn't offer multiple tx rings.
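Had the hardware offered them, the enqueue side could have been as
simple as this sketch (types and names invented):

#include <stdbool.h>

struct txring;			/* hypothetical per-ring queue state */

struct txsc {
	struct txring	*tx_hi;	/* exception/flow-control traffic */
	struct txring	*tx_lo;	/* everything else */
};

/*
 * Urgent frames get their own ring so they are never stuck behind
 * a long backlog of bulk transmits queued on the other ring.
 */
static struct txring *
txring_select(struct txsc *sc, bool urgent)
{
	return (urgent ? sc->tx_hi : sc->tx_lo);
}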

Different hardware has different kinds of packet classification
for incoming packets and probably different kinds of scheduling
for outgoing queues. Is there prior art in some kind of general
purpose API to utilize these?

I'd be interested to hear if anyone's started thinking about MSI,
too, but that's not really a tech-net issue so much as a tech-kern
issue.

-allen
--
Allen Briggs | http://www.ninthwonder.com/~briggs/ | ***@ninthwonder.com

Darren Reed
2007-06-14 07:05:13 UTC
Post by Allen Briggs
...
Different hardware has different kinds of packet classification
for incoming packets and probably different kinds of scheduling
for outgoing queues. Is there prior art in some kind of general
purpose API to utilize these?
I don't believe so.

Darren


Sepherosa Ziehau
2009-03-24 09:43:14 UTC
Post by Allen Briggs
...
Different hardware has different kinds of packet classification
for incoming packets and probably different kinds of scheduling
M$ defined two standard classifications of incoming packets. Intel has
several extensions (like using the addr/port tuple for UDP packets) in
their PCIe NICs; however, I don't think it will get widely deployed. As
for the multi-TX queues, AFAIK, all NICs supporting multiple TX queues
provide some mechanism to make all TX queues use the same priority.
Post by Allen Briggs
for outgoing queues. Is there prior art in some kind of general
purpose API to utilize these?
--
Live Free or Die

Darren Reed
2009-04-01 15:59:03 UTC
Post by Sepherosa Ziehau
Post by Allen Briggs
...
Different hardware has different kinds of packet classification
for incoming packets and probably different kinds of scheduling
M$ defined two standard classifications of incoming packets. Intel has
several extensions (like using the addr/port tuple for UDP packets) in
their PCIe NICs; however, I don't think it will get widely deployed. As
for the multi-TX queues, AFAIK, all NICs supporting multiple TX queues
provide some mechanism to make all TX queues use the same priority.
I disagree.

Everyone that wants to sell a NIC for use with virtualisation is
going to add support for delivery into rings based on IP address.


The latest releases of Solaris that you can download (Solaris
Express Community Edition - not an officially supported product)
can make full use of these NIC features for queueing
packets up for either zones or applications. It'll be in the next
"release" of OpenSolaris. You can be sure that if Linux does
not have something like that now, it will "soon", and the same
goes for Microsoft. NetBSD can sit around, look at the ceiling,
whistle Dixie and pretend that it isn't relevant, but nothing
is going to stop the others striving to be "better".

Intel isn't adding these features because people don't want
them or won't use them; if anything, it is the exact opposite.
They've got a single 10Gb NIC and would like to turn that into
10 virtual 1Gb NICs, etc.

I can easily see this feature being used for routing (let's give
all port 80 traffic its own set of tx/rx rings). Wouldn't you
rather be able to dedicate a descriptor ring or two for your ssh
traffic than rely on ALTQ and the device driver for priority
delivery? Or maybe you want specific rings for ssh and http
because you're using BitTorrent a lot, and that uses effectively
random addresses and ports?

This capability will eventually work its way into cheaper NICs,
if only in a very limited fashion, because people will want to
run a virtual guest on the desktop using the motherboard NIC
and to not have to suffer from the virtual guest "flooding"
their NIC.

Darren


Sepherosa Ziehau
2009-04-02 02:36:56 UTC
Post by Darren Reed
Post by Sepherosa Ziehau
Post by Allen Briggs
...
Different hardware has different kinds of packet classification
for incoming packets and probably different kinds of scheduling
M$ defined two standard classifications of incoming packets. Intel has
several extensions (like using the addr/port tuple for UDP packets) in
their PCIe NICs; however, I don't think it will get widely deployed. As
for the multi-TX queues, AFAIK, all NICs supporting multiple TX queues
provide some mechanism to make all TX queues use the same priority.
I disagree.
Mmm, which point do you disagree with :)? The description of the RX
side or the TX side?
Post by Darren Reed
Everyone that wants to sell a NIC for use with virtualisation is
going to add support for delivery into rings based on IP address.
Do you mean by the hash of the {faddr,laddr} pair? All NICs supporting
RSS can do that.
Post by Darren Reed
The latest releases of Solaris that you can download (Solaris
Express Community Edition - not an officially supported product)
can make full use of these NIC features for queueing
Could you tell me which NIC (the chip/vendor) and which feature you
have mentioned above? On the RX side, I think the most common feature
is RSS, which is widely implemented in many modern NICs. Broadcom's
NICs have a kind of programmable filtering mechanism on the RX side;
I don't know whether you mean that. Currently in DragonFly, we only
utilize the RSS feature of some NICs.
Post by Darren Reed
packets up for either zones or applications. It'll be in the next
"release" of OpenSolaris. You can be sure that if Linux does
not have something like that now, it will "soon", and the same
goes for Microsoft. NetBSD can sit around, look at the ceiling,
whistle Dixie and pretend that it isn't relevant, but nothing
is going to stop the others striving to be "better".
Intel isn't adding these features because people don't want
Could you describe "these features" in a little bit more detail?
Post by Darren Reed
them or won't use them; if anything, it is the exact opposite.
They've got a single 10Gb NIC and would like to turn that into
10 virtual 1Gb NICs, etc.
Well, I have to say, I currently don't have a clear idea of how
SR-IOV works in Intel's relatively newer products (e.g. the 82576;
it is a 1Gb NIC, though), if by "these features" you mean SR-IOV
(I think your description is quite close to it).
Post by Darren Reed
I can easily see this feature being used for routing (let's give
all port 80 traffic its own set of tx/rx rings). Wouldn't you
rather be able to dedicate a descriptor ring or two for your ssh
traffic than rely on ALTQ and the device driver for priority
delivery? Or maybe you want specific rings for ssh and http
because you're using BitTorrent a lot, and that uses effectively
random addresses and ports?
Well, these are nice features. It is definitely doable on the TX side.
However, I have no idea how you could program any currently
available NICs to do that on the RX side. I didn't take a close look at
the NetXtreme II's RX filtering mechanism; maybe it could kinda do what
you have described?
Post by Darren Reed
This capability will eventually work its way into cheaper NICs,
if only in a very limited fashion, because people will want to
run a virtual guest on the desktop using the motherboard NIC
and to not have to suffer from the virtual guest "flooding"
their NIC.
Yep, if one day they appeared in the NICs, they would be nice features
to support :)

Best Regards,
sephe
--
Live Free or Die

Darren Reed
2009-04-04 19:05:24 UTC
Post by Sepherosa Ziehau
Post by Darren Reed
Post by Sepherosa Ziehau
Post by Allen Briggs
...
Different hardware has different kinds of packet classification
for incoming packets and probably different kinds of scheduling
M$ defined two standard classifications of incoming packets. Intel has
several extensions (like using the addr/port tuple for UDP packets) in
their PCIe NICs; however, I don't think it will get widely deployed. As
for the multi-TX queues, AFAIK, all NICs supporting multiple TX queues
provide some mechanism to make all TX queues use the same priority.
I disagree.
Mmm, which point do you disagree with :)? The description of the RX
side or the TX side?
I don't agree with "I don't think it will get widely deployed."

Darren


Sepherosa Ziehau
2009-04-05 01:48:32 UTC
Post by Darren Reed
Post by Sepherosa Ziehau
Post by Darren Reed
Post by Sepherosa Ziehau
Post by Allen Briggs
...
Different hardware has different kinds of packet classification
for incoming packets and probably different kinds of scheduling
M$ defined two standard classifications of incoming packets. Intel has
several extensions (like using the addr/port tuple for UDP packets) in
their PCIe NICs; however, I don't think it will get widely deployed. As
for the multi-TX queues, AFAIK, all NICs supporting multiple TX queues
provide some mechanism to make all TX queues use the same priority.
I disagree.
Mmm, which point do you disagree with :)? The description of the RX
side or the TX side?
I don't agree with "I don't think it will get widely deployed."
Haha, I see. However, using the 4-tuple for non-fragmented UDP datagrams
is not mentioned in the RSS standard at all, and currently most vendors
only do the RSS standard hash functions. I personally would like to see
"hash using the 4-tuple for non-frag UDP" become standard one day :)

Best Regards,
sephe
--
Live Free or Die

Jason Thorpe
2007-06-13 22:08:55 UTC
Post by Darren Reed
Has anyone started to think about how NetBSD can take advantage
of NICs that have a larger number of rx/tx descriptor rings,
and/or MSI interrupts?
I've been thinking about it. Don't have anything concrete yet. I
think the first priority should be supporting MSI interrupts, tho.
Post by Darren Reed
For example, would this get tied in with ALTQ or something else?
Should I be able to dedicate a rx/tx ring pair to http traffic
and another to ssh, etc?
...but to do any of that will require some sort of framework.
Darren
-- thorpej


Darren Reed
2009-03-13 09:35:21 UTC
Post by Darren Reed
Has anyone started to think about how NetBSD can take advantage
of NICs that have a larger number of rx/tx descriptor rings,
and/or MSI interrupts?
I've been thinking about it. Don't have anything concrete yet. I think
the first priority should be supporting MSI interrupts, tho.
To pick up on this...

It would be nice if we could "divide" a NIC that has (say) 16
RX/TX rings into 2 "virtual" NICs that each had 8 and somehow
tell Xen it can use one set of 8 for dom0 and one set of 8 for
domU. How would that then get configured into Xen?
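Purely hypothetically -- nothing like this exists today -- the
description handed to Xen might look like:

#include <stdint.h>

#define NIC_MAX_PARTITIONS	8

/* One contiguous slice of a NIC's rx/tx ring pairs. */
struct nic_partition {
	uint32_t	np_first_ring;	/* first ring pair in the slice */
	uint32_t	np_nrings;	/* number of ring pairs */
	uint32_t	np_domid;	/* 0 = dom0, otherwise a domU */
};

struct nic_partition_req {
	uint32_t		npr_count;
	struct nic_partition	npr_part[NIC_MAX_PARTITIONS];
};

/* Splitting a 16-ring NIC 8/8 between dom0 and domU 1: */
static const struct nic_partition_req split_8_8 = {
	.npr_count = 2,
	.npr_part = {
		{ .np_first_ring = 0, .np_nrings = 8, .np_domid = 0 },
		{ .np_first_ring = 8, .np_nrings = 8, .np_domid = 1 },
	},
};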

Darren

Manuel Bouyer
2009-03-13 16:47:38 UTC
Post by Darren Reed
Post by Darren Reed
Has anyone started to think about how NetBSD can take advantage
of NICs that have a larger number of rx/tx descriptor rings,
and/or MSI interrupts?
I've been thinking about it. Don't have anything concrete yet. I think
the first priority should be supporting MSI interrupts, tho.
To pick up on this...
It would be nice if we could "divide" a NIC that has (say) 16
RX/TX rings into 2 "virtual" NICs that each had 8 and somehow
tell Xen it can use one set of 8 for dom0 and one set of 8 for
domU. How would that then get configured into Xen?
We'd need a generic way to add priority tags to packets, and then
use it in NIC drivers.
Then we could add a way to tag packets at the bridge level.

This would be useful for ALTQ too ...
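A minimal sketch using the existing m_tag(9) interface, with the
PACKET_TAG_PRIORITY type and its one-byte payload invented for
illustration:

#include <sys/param.h>
#include <sys/errno.h>
#include <sys/mbuf.h>

#define PACKET_TAG_PRIORITY	100	/* hypothetical tag type */

/* Attach a priority to a packet, e.g. at the bridge level. */
static int
pkt_set_priority(struct mbuf *m, uint8_t prio)
{
	struct m_tag *mtag;

	mtag = m_tag_get(PACKET_TAG_PRIORITY, sizeof(prio), M_NOWAIT);
	if (mtag == NULL)
		return (ENOMEM);
	*(uint8_t *)(mtag + 1) = prio;	/* payload follows the tag */
	m_tag_prepend(m, mtag);
	return (0);
}

/* A NIC driver (or ALTQ) reads it back to pick a tx ring/queue. */
static uint8_t
pkt_get_priority(struct mbuf *m)
{
	struct m_tag *mtag;

	mtag = m_tag_find(m, PACKET_TAG_PRIORITY, NULL);
	return ((mtag != NULL) ? *(uint8_t *)(mtag + 1) : 0);
}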
--
Manuel Bouyer, LIP6, Universite Paris VI. ***@lip6.fr
NetBSD: 26 years of experience will always make the difference

Thor Lancelot Simon
2009-03-13 17:37:27 UTC
Post by Manuel Bouyer
We'd need a generic way to add priority tags to packets, and then
use it in NIC drivers.
But most modern NICs can use multiple receive rings based on much
richer (and more useful) criteria than just priority. And, generally
speaking, you never want anything in the system to touch all packets --
you want them segregated to a receive ring based on, for example,
source and destination address and port, and you never want anything
that's not on CPU N to touch receive ring N.

Thor

Darren Reed
2009-03-17 11:48:49 UTC
Post by Thor Lancelot Simon
Post by Manuel Bouyer
We'd need a generic way to add priority tags to packets, and then
use it in NIC drivers.
But most modern NICs can use multiple receive rings based on much
richer (and more useful) criteria than just priority. And, generally
speaking, you never want anything in the system to touch all packets --
you want them segregated to a receive ring based on, for example,
source and destination address and port, and you never want anything
that's not on CPU N to touch receive ring N.
Right.

Hardware classification on server NICs now supports using the
complete 5-tuple (srcip, dstip, protocol, srcport, dstport) as
part of deciding which rx ring to place a packet in.

Darren

Sepherosa Ziehau
2009-03-24 09:38:32 UTC
Post by Darren Reed
Post by Thor Lancelot Simon
Post by Manuel Bouyer
We'd need a generic way to add priority tags to packets, and then
use it in NIC drivers.
But most modern NICs can use multiple receive rings based on much
richer (and more useful) criteria than just priority. And, generally
speaking, you never want anything in the system to touch all packets --
you want them segregated to a receive ring based on, for example,
source and destination address and port, and you never want anything
that's not on CPU N to touch receive ring N.
Right.
Hardware classification on server NICs now supports using the
complete 5-tuple (srcip, dstip, protocol, srcport, dstport) as
Only TCP uses the {faddr,laddr,fport,lport} tuple; the rest just use
{faddr,laddr}. AFAIK, many NICs (e.g. Intel's 8257x) cannot
differentiate IP fragments and non-UDP/non-TCP non-fragmented packets.

Best Regards,
sephe
--
Live Free or Die
