packet-classifier Summer of Code RFQ
Andy Pyles
2010-03-25 14:16:22 UTC
---------
In kernel API for registering packet classes
--------


Hello, I'm interested in the Summer of Code proposal for the
packet-classifier API. Below is a proposal that I have put together
for this SOC project. Any comments will be greatly appreciated.
Most of this request is a rephrasing of the proposal outlined here:
http://www.netbsd.org/contrib/soc-projects.html#packet-classifying
This is my understanding of the proposal, and may contain errors, in
which case corrections are welcome.


Request for Comment: packet-classifier API

Overview:
Currently, when a new class is added via pf/ALTQ or ALTQD, it is added
to an internal ALTQ data structure. Similarly, PF uses tags, its own
data structure for tagging packets. Because PF and ALTQ are tightly
coupled, I propose a way to decouple them. Suppose, for instance, that
an alternative firewall would like to integrate with ALTQ: currently
there is no viable way to do this outside of the tight coupling with
PF. What is required is a common API that can be used by ALTQ, PF and
network device drivers.
This common API is called the "packet-classifier" API. The
packet-classifier will store packet classes in a separate data
structure. ALTQ and network device drivers can register new classes
with the packet-classifier. For instance, an Ethernet device driver
whose hardware has transmit rings of different priority could register
two classes, "hi_priority" and "low_priority". When a register() call
succeeds, the packet-classifier returns a token that can then be used
to label a packet, for instance in its MBUF, so that the packet can be
classified later.

The packet-classifier API will have the following function calls:
    Token *packet_classifier_register(....);     // register a new class; returns a Token
    int packet_classifier_lookup(....);          // look up a classifier
    int packet_classifier_destroy(Token);        // destroy the classifier associated with Token
    int packet_classifier_replace(Token,.....);  // replace the classifier associated with Token
                                                 // with updated information
    TODO: others??
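
To make the four calls above concrete, here is a minimal userland sketch
of what such a registry could look like. Everything beyond the four
function names is an assumption made for illustration: the argument
lists (elided as "...." in the proposal), the Token layout, the fixed-size
table, and the use of a pointer rather than a by-value Token in
destroy() and replace(). It is not kernel code and does no locking.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define PC_MAX_CLASSES 32
#define PC_NAME_LEN    32

/* Hypothetical token: one slot in the classifier's table. */
typedef struct pc_class {
	char name[PC_NAME_LEN];	/* class name, e.g. "hi_priority" */
	int  priority;		/* illustrative payload for the class */
	int  in_use;		/* nonzero while the class is registered */
} Token;

static Token pc_table[PC_MAX_CLASSES];

/* Register a new class. Returns its token, or NULL if the name is too
 * long, already registered, or the table is full. */
Token *
packet_classifier_register(const char *name, int priority)
{
	Token *slot = NULL;
	size_t i;

	assert(name != NULL);
	if (strlen(name) >= PC_NAME_LEN)
		return NULL;
	for (i = 0; i < PC_MAX_CLASSES; i++) {
		if (pc_table[i].in_use) {
			if (strcmp(pc_table[i].name, name) == 0)
				return NULL;	/* duplicate name */
		} else if (slot == NULL) {
			slot = &pc_table[i];	/* first free slot */
		}
	}
	if (slot == NULL)
		return NULL;			/* table full */
	strcpy(slot->name, name);
	slot->priority = priority;
	slot->in_use = 1;
	return slot;
}

/* Fetch the data behind a token. Returns 0, or ENOENT if the token is
 * NULL or no longer registered. */
int
packet_classifier_lookup(const Token *tok, int *priority)
{
	if (tok == NULL || !tok->in_use)
		return ENOENT;
	*priority = tok->priority;
	return 0;
}

/* Update the data behind an existing token. */
int
packet_classifier_replace(Token *tok, int priority)
{
	if (tok == NULL || !tok->in_use)
		return ENOENT;
	tok->priority = priority;
	return 0;
}

/* Unregister the class behind a token. */
int
packet_classifier_destroy(Token *tok)
{
	if (tok == NULL || !tok->in_use)
		return ENOENT;
	tok->in_use = 0;
	return 0;
}
```

A driver would call packet_classifier_register("hi_priority", ...) once at
attach time and keep the returned token; a consumer such as ALTQ would
later pass the token found on a packet to packet_classifier_lookup().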

Challenges:
The most difficult part of this project is the learning curve. A
number of internals will need to be changed, specifically ALTQ, PF and
at least one Ethernet driver, so there are a lot of things to follow
up on. However, having looked at the source code over the last couple
of days, things look pretty straightforward.

Approach:
I plan to do the following. First, set up a simulated environment
with traffic shaping enabled. I already have this environment set up
with two virtual machines; it gives ssh traffic higher precedence
than http traffic.
Second, I will start with PF, using the packet-classifier to convert
TAG names to packet-classifier tokens and to label the packet with the
token in the MBUF packet header.
Third, I will modify ALTQ to read the token from the MBUF and then
lookup() the token in the packet-classifier to determine how to
classify the packet, i.e. on which queue to enqueue it.
Finally, assuming I can obtain an appropriate NIC, I will add support
to at least one NIC driver.


Open Questions:
What data structure to use for the packet-classifier?
At this point I'm thinking of a hash table or a binary tree.
The SOC description describes an ethernet NIC with high and low
priority transmit rings. Can someone give me an example of such a NIC?

Thanks for your time in reading this. Please let me know if you have
any suggestions or comments.

regards,
Andy

--
Posted automagically by a mail2news gateway at muc.de e.V.
Please direct questions, flames, donations, etc. to news-***@muc.de
David Young
2010-03-25 23:43:19 UTC
Andy,

A couple of comments about your proposal:

1) Some of the ALTQ schedule types *may* be able to use a "packet
extract," such as a src/dst port/addr tuple (or hash), that the
packet filter has already calculated. For example, a schedule may
assign packets to different queues using the hash of the source port
& address.

2) Each interface may need a mapping from one or more class-labels
applied by the packet filter to service categories. E.g., "bulk"
may be the name of a packet classification, but "hipri" and "lopri"
are NIC 1's service types, and "hi", "med", "lo" are NIC 2's service
types. The operator may desire for NIC 1 to assign service type
"lopri" to "bulk" packets, and for NIC 2 to assign service type "med"
to "bulk" packets.
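
The per-interface mapping in point 2 could be represented as a small
lookup table attached to each NIC. The sketch below uses the names from
the example ("bulk", "lopri", "med"); the struct layout, the function
name nic_service_for(), and the choice of a default service type for
unmapped labels are assumptions for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One operator-configured rule: class-label -> this NIC's service type. */
struct svc_map_entry {
	const char *class_label;	/* label applied by the packet filter */
	const char *service_type;	/* service type offered by the NIC */
};

/* Hypothetical per-interface mapping table. */
struct nic_svc_map {
	const char *default_service;	/* used when no rule matches (assumed) */
	const struct svc_map_entry *entries;
	size_t nentries;
};

/* Return the service type this NIC assigns to a class-label. */
const char *
nic_service_for(const struct nic_svc_map *map, const char *class_label)
{
	size_t i;

	assert(map != NULL && class_label != NULL);
	for (i = 0; i < map->nentries; i++) {
		if (strcmp(map->entries[i].class_label, class_label) == 0)
			return map->entries[i].service_type;
	}
	return map->default_service;
}

/* NIC 1 offers "hipri"/"lopri"; NIC 2 offers "hi"/"med"/"lo". */
static const struct svc_map_entry nic1_entries[] = { { "bulk", "lopri" } };
static const struct svc_map_entry nic2_entries[] = { { "bulk", "med" } };

const struct nic_svc_map nic1_map = { "lopri", nic1_entries, 1 };
const struct nic_svc_map nic2_map = { "lo", nic2_entries, 1 };
```

With these tables, the same "bulk" label resolves to "lopri" on NIC 1 and
to "med" on NIC 2, which is the behaviour described above.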

BTW, these days I am also concerned with three problems that are
related to the packet-classifying one:

1) What flows go through my router or network interface, *right now*?
How much traffic is in each flow? What are the historical flows on
this router/interface?

Several programs purport to track this information, but none of them
have met my needs.

2) Lots of bus-mastering NICs have both a transmit-descriptor ring that
holds umpteen packets, and a transmission queue (struct ifnet member
if_snd) that holds umpteen more. The queue empties into the ring.
The queue size is controllable by sysctl, but the ring size usually
is not. ISTM that *one* sysctl variable should control the maximum
length of the ring plus the queue.

A related problem is that ALTQ operates on the transmission queue,
not on the ring+queue. I don't think that ALTQ can be effective by
prioritizing packets on the transmission queue if the ring can grow
very long.
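
As a rough model of the "one limit for ring plus queue" idea, the
following sketch counts packets in a software queue and a descriptor
ring and admits a new packet only while their sum is below a single
bound. The struct and function names are hypothetical, and real drivers
track descriptors rather than simple counters.

```c
#include <stdbool.h>

/* Hypothetical accounting for one interface's transmit path. */
struct tx_path {
	unsigned ring_len;	/* packets in the transmit-descriptor ring */
	unsigned queue_len;	/* packets in the software queue (if_snd) */
	unsigned ring_cap;	/* hardware capacity of the ring */
	unsigned total_max;	/* the single tunable: max of ring + queue */
};

/* Admit a packet to the queue only if ring + queue is under the limit. */
bool
tx_enqueue(struct tx_path *tx)
{
	if (tx->ring_len + tx->queue_len >= tx->total_max)
		return false;	/* combined limit reached: caller drops */
	tx->queue_len++;
	return true;
}

/* Move packets from the queue into free ring slots; returns the count. */
unsigned
tx_fill_ring(struct tx_path *tx)
{
	unsigned moved = 0;

	while (tx->queue_len > 0 && tx->ring_len < tx->ring_cap) {
		tx->queue_len--;
		tx->ring_len++;
		moved++;
	}
	return moved;
}
```

Moving a packet from the queue to the ring does not change the sum, so
the total number of packets waiting ahead of a newly admitted one never
exceeds total_max.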

Dave
--
David Young OJC Technologies
***@ojctech.com Urbana, IL * (217) 278-3933

Andy Pyles
2010-03-26 20:07:25 UTC
Post by David Young
1) Some of the ALTQ schedule types *may* be able to use a "packet
  extract," such as a src/dst port/addr tuple (or hash), that the
  packet filter has already calculated.  For example, a schedule may
  assign packets to different queues using the hash of the source port
  & address.
OK, this is good to know; it could help performance.
Post by David Young
2) Each interface may need a mapping from one or more class-labels
  applied by the packet filter to service categories.  E.g., "bulk"
  may be the name of a packet classification, but "hipri" and "lopri"
  are NIC 1's service types, and "hi", "med", "lo" are NIC 2's service
  types.  The operator may desire for NIC 1 to assign service type
  "lopri" to "bulk" packets, and for NIC 2 to assign service type "med"
  to "bulk" packets.
So there should be a distinction between service types and
class-labels, then. Perhaps when registering with the packet-classifier
API, we should distinguish whether an entry is a generic class-label or
a service category. Can you give an example of a NIC with these
characteristics? Also, is it even possible to configure different
service types on a NIC? A quick scan through some of the latest device
driver man pages did not indicate how this is done.
Post by David Young
BTW, these days I am also concerned with three problems that are
1) What flows go through my router or network interface, *right now*?
  How much traffic is in each flow?  What are the historical flows on
  this router/interface?
  Several programs purport to track this information, but none of them
  have met my needs.
Doesn't pfctl -sa give you all that you need? What is it that you are
not getting from this output? (Assuming you are using PF.)
Post by David Young
2) Lots of bus-mastering NICs have both a transmit-descriptor ring that
  holds umpteen packets, and a transmission queue (struct ifnet member
  if_snd) that holds umpteen more.  The queue empties into the ring.
  The queue size is controllable by sysctl, but the ring size usually
  is not.  ISTM that *one* sysctl variable should control the maximum
  length of the ring plus the queue.
What is the current behavior?
Post by David Young
  A related problem is that ALTQ operates on the transmission queue,
  not on the ring+queue.  I don't think that ALTQ can be effective by
  prioritizing packets on the transmission queue if the ring can grow
  very long.
I see how this could be an issue. How large can the ring get? I
imagine on some of the gigabit cards this could be more of an issue
because there is probably a larger transmit ring on these.
The overhead of prioritizing packets in the transmit ring could
impede performance though.
Dave,

David Young
2010-04-04 17:22:04 UTC
Post by Andy Pyles
Post by David Young
2) Each interface may need a mapping from one or more class-labels
  applied by the packet filter to service categories.  E.g., "bulk"
  may be the name of a packet classification, but "hipri" and "lopri"
  are NIC 1's service types, and "hi", "med", "lo" are NIC 2's service
  types.  The operator may desire for NIC 1 to assign service type
  "lopri" to "bulk" packets, and for NIC 2 to assign service type "med"
  to "bulk" packets.
So there should be a distinction between service types and
class-labels then. Perhaps when registering to the packet-classifier
API,
we should distinguish if this is a generic class-label to a service
category. Can you give an example of a NIC with these characteristics?
Also is that even possible to configure different service types on a
NIC? A quick scan through some of the latest device driver man pages
did not indicate how this is done.
Just for example, an Atheros WLAN NIC supports more than one type of
service, and a Realtek WLAN NIC provides three transmission rings.
IIRC, ath(4) supports 802.11 type of service (WME), but that's not
documented.
?
It seems that you need to expand on your API in order to support both
packet classifications and types of service.
What do you think?
Post by Andy Pyles
Post by David Young
BTW, these days I am also concerned with three problems that are
1) What flows go through my router or network interface, *right now*?
  How much traffic is in each flow?  What are the historical flows on
  this router/interface?
  Several programs purport to track this information, but none of them
  have met my needs.
Doesn't pfctl -sa give you all that you need? What is it that you are
not getting from this output?( Assuming you are using PF)
That only shows flow information for current filter states---i.e., no
past information. The information is not easily digested by a person or
by a visualization program.
Post by Andy Pyles
Post by David Young
2) Lots of bus-mastering NICs have both a transmit-descriptor ring that
  holds umpteen packets, and a transmission queue (struct ifnet member
  if_snd) that holds umpteen more.  The queue empties into the ring.
  The queue size is controllable by sysctl, but the ring size usually
  is not.  ISTM that *one* sysctl variable should control the maximum
  length of the ring plus the queue.
What is the current behavior?
Today, you can control only the maximum length of the queue with a
sysctl, and even that is a recent development!
Post by Andy Pyles
Post by David Young
  A related problem is that ALTQ operates on the transmission queue,
  not on the ring+queue.  I don't think that ALTQ can be effective by
  prioritizing packets on the transmission queue if the ring can grow
  very long.
I see how this could be an issue. How large can the ring get? I
imagine on some of the gigabit cards this could be more of an issue
because there is probably a larger transmit ring on these.
It's an issue even on 100 Mb/s cards if they've negotiated a slow link
rate (10 Mb/s).
Post by Andy Pyles
The overhead of prioritizing packets in the transmit ring could
impede performance though.
Why do you say that?

Dave
--
David Young OJC Technologies
***@ojctech.com Urbana, IL * (217) 278-3933

Mihai Chelaru
2010-04-02 20:43:30 UTC
Post by Andy Pyles
---------
In kernel API for registering packet classes
--------
Hi Andy,

The main issue is how to map between classes registered by different
interfaces. Suppose you have two interfaces: one (let's call it input)
has M priority classes and the other (let's call it output) has N
priority classes, and you have to forward packets from input to output,
keeping in mind that higher-priority packets should be put on the wire
first, no matter what. For example, suppose pf registers 4 classes
(realtime, high, medium, low) and you have to map those onto 2 hi-lo
Ethernet rings. To achieve this you have to design solutions (and
finally code common interfaces) for at least these issues:

- How can you parse queues (and rings, in the Ethernet case) to see
what packets/frames are there?
- How can you insert priority packets into queues?
- What should you do if the queues are full and you need to insert a
high-priority packet?
- How can you flush packets from rings and refeed them to the driver
queue?

Also, pseudo-interfaces like gre should have a mechanism to map
automatically to the "parent" interface. Keep in mind that a
pseudo-interface can have a different number of classes than its parent.

Moreover, consider that other kernel interfaces like INET can register
classes. In this case an interface should be presented to userspace,
probably via setsockopt (if we're thinking only about pre-defined
classes).
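
For the M-to-N mapping itself (for example the 4 pf classes onto 2
rings), one possible policy is to scale the class index proportionally
so that ordering by priority is preserved. This is only a sketch of
that one policy; the function name and the convention that index 0 is
the most urgent class or ring are assumptions, and it does not address
the queue and ring questions listed above.

```c
#include <assert.h>

/*
 * Map class index c (0 = most urgent) out of m classes onto one of
 * n rings (0 = most urgent), preserving relative order.
 */
unsigned
class_to_ring(unsigned c, unsigned m, unsigned n)
{
	assert(m > 0 && n > 0 && c < m);
	/* Widen before multiplying so c * n cannot overflow. */
	return (unsigned)(((unsigned long long)c * n) / m);
}
```

With m = 4 and n = 2, realtime and high (indices 0 and 1) land on ring 0,
and medium and low (indices 2 and 3) land on ring 1.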
--
Mihai


David Young
2010-04-04 21:45:59 UTC
Post by Mihai Chelaru
Post by Andy Pyles
---------
In kernel API for registering packet classes
--------
Hi Andy,
The main issue is how to map between classes registered by different
interfaces. Think that you have two interfaces, one (let's call it input)
have M priority classes and other (let's call it output) has N priority
classes and you have to forward packets from input to output, keeping in
mind that higher priority should be put on wire first, no matter what. For
example think that pf is registering 4 classes (realtime, high, medium,
low) and you have to map those on 2 hi-lo ethernet rings. In order to
achieve this you have to design solutions (and finally code common
- how can you parse queues (and rings in ethernet case) in order to see what
packets/frames are there ?
- how can you insert priority packets into queues ?
- what should you do if queues are full and you need to insert a high
priority packet ?
- how can you flush packets from rings and refeed them to driver queue ?
Also pseudo-interfaces like gre should have a mechanism in order to
automatically map to "parent" interface. Keep in mind that a
pseudo-interface can have a different number of classes than its parent.
There can be no fixed relationship between pseudo-interfaces and "real"
interfaces.

Perhaps every interface, pseudo- or not, should apply a mapping from
packet class to packet class in its output routine? Maybe we can come
up with a good conceptual model and, eventually, an architecture, by
thinking of per-interface class-to-class mappings.

Where C is the set of packet classes, and I is the set of interfaces,
let there be a global class map, g : I x C -> C, and a mapping for each
interface, f[i] : C -> C, f[i](c) = g(i, c). Let c[d] be the default
packet class.

A reasonable global default mapping is g(i, c) = c[d].

Sometimes, for a tunnel 't', the identity mapping may be appropriate:

f[t](c) -> c
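
The model above translates fairly directly into a two-dimensional
table indexed by interface and class. In this sketch the class names,
the interface count, and the function names are made up for
illustration; only the shape (g as a table, f[i] as a row lookup, a
default class c[d], and the identity mapping for a tunnel) comes from
the description.

```c
#include <assert.h>

/* Hypothetical global class namespace; PC_DEFAULT plays the role of c[d]. */
enum pkt_class { PC_DEFAULT = 0, PC_BULK, PC_INTERACTIVE, PC_NCLASSES };

enum { NIFACES = 4 };	/* assumed number of interfaces */

/*
 * g : I x C -> C. Static storage is zero-initialized and
 * PC_DEFAULT == 0, so initially g(i, c) = c[d] for every i and c.
 */
static enum pkt_class g_map[NIFACES][PC_NCLASSES];

/* f[i](c) = g(i, c) */
enum pkt_class
iface_map(int i, enum pkt_class c)
{
	assert(i >= 0 && i < NIFACES);
	assert((int)c >= 0 && (int)c < PC_NCLASSES);
	return g_map[i][c];
}

/* Install the identity mapping f[i](c) = c, e.g. for a tunnel. */
void
iface_set_identity(int i)
{
	int c;

	assert(i >= 0 && i < NIFACES);
	for (c = 0; c < PC_NCLASSES; c++)
		g_map[i][c] = (enum pkt_class)c;
}

/* Override a single entry of the map for interface i. */
void
iface_set(int i, enum pkt_class from, enum pkt_class to)
{
	assert(i >= 0 && i < NIFACES);
	assert((int)from >= 0 && (int)from < PC_NCLASSES);
	g_map[i][from] = to;
}
```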

For a WLAN, w, net80211 currently provides this default mapping from IP
ToS field to WME classes, in ieee80211_classify():

f[w](tos) -> BK ("background")   if tos is in {0x08, 0x20}
             VI ("video")        if tos is in {0x28, 0xa0}
             VO ("voice")        if tos is in {0x30, 0xe0, 0x88, 0xb8}
             BE ("best effort")  for all other tos
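
Written as C, the table above is a single switch statement. This is a
transcription of the mapping as quoted in this message, not a copy of
ieee80211_classify(); the enum and function names are made up for the
sketch.

```c
/* WME access categories, named after the abbreviations in the table. */
enum wme_ac { AC_BE, AC_BK, AC_VI, AC_VO };

/* Map an IP ToS value to a WME class, following the table above. */
enum wme_ac
tos_to_wme(unsigned char tos)
{
	switch (tos) {
	case 0x08:
	case 0x20:
		return AC_BK;	/* background */
	case 0x28:
	case 0xa0:
		return AC_VI;	/* video */
	case 0x30:
	case 0xe0:
	case 0x88:
	case 0xb8:
		return AC_VO;	/* voice */
	default:
		return AC_BE;	/* best effort */
	}
}
```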
Post by Mihai Chelaru
Moreover, think that other kernel interfaces like INET can register classes.
Good point.
Post by Mihai Chelaru
In this case an interface should be presented to userspace - probably via
setsockopt (if we're thinking only about pre-defined classes).
Already there is a socket option for IP ToS. ip(4) gives this example:

setsockopt(s, IPPROTO_IP, IP_TOS, &tos, sizeof(tos));

Supposing that there is a global packet-class namespace, we need just
one new socket-level option:

setsockopt(s, SOL_SOCKET, SO_PKT_CLASS, &pkt_class, sizeof(pkt_class));
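
For reference, here is the existing IP_TOS call wrapped in two small
helpers, with a getter to read the value back. The helper names are
made up for this sketch. SO_PKT_CLASS is only the option proposed
above and is not defined by any system header, so it is deliberately
not used in the code; a real implementation would presumably follow
the same pattern at the SOL_SOCKET level.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/ip.h>

/* Set the IP ToS byte for packets sent on socket s. Returns 0 or -1. */
int
set_socket_tos(int s, int tos)
{
	return setsockopt(s, IPPROTO_IP, IP_TOS, &tos, sizeof(tos));
}

/* Read back the IP ToS value configured on socket s. Returns 0 or -1. */
int
get_socket_tos(int s, int *tos)
{
	socklen_t len = sizeof(*tos);

	return getsockopt(s, IPPROTO_IP, IP_TOS, tos, &len);
}
```

An application that wants low-delay treatment would call
set_socket_tos(s, IPTOS_LOWDELAY) on an AF_INET socket before sending.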

Dave
--
David Young OJC Technologies
***@ojctech.com Urbana, IL * (217) 278-3933
