Discussion:
Socket options KPI
(too old to reply)
Elad Efrat
2008-01-18 17:08:37 UTC
Permalink
Hi,

I have a patch to make a large portion of the network code use a new
"struct sockopt" (inspired by similar work done in FreeBSD, with input
from dyoung@ and yamt@) instead of passing socket options in mbufs.

The work is incomplete - although I think the majority of the code was
covered. I've been running it here for about a month or so, but I'm
putting it up for public review as it's unlikely I'll be able to test
all of the code paths affected by these changes, specifically in network
stacks I don't use...

The diff is available online:

http://www.netbsd.org/~elad/sockopt-20080118.diff

All comments are welcome. It's not planned on being committed anytime
soon (unless, of course, all feedback suggests otherwise), so take your
time carefully reviewing the changes if you're interested. :)

Thanks,

-e.

--
Posted automagically by a mail2news gateway at muc.de e.V.
Please direct questions, flames, donations, etc. to news-***@muc.de
Christos Zoulas
2008-01-18 18:21:43 UTC
Permalink
Post by Elad Efrat
Hi,
I have a patch to make a large portion of the network code use a new
"struct sockopt" (inspired by similar work done in FreeBSD, with input
The work is incomplete - although I think the majority of the code was
covered. I've been running it here for about a month or so, but I'm
putting it up for public review as it's unlikely I'll be able to test
all of the code paths affected by these changes, specifically in network
stacks I don't use...
http://www.netbsd.org/~elad/sockopt-20080118.diff
All comments are welcome. It's not planned on being committed anytime
soon (unless, of course, all feedback suggests otherwise), so take your
time carefully reviewing the changes if you're interested. :)
You have a few int x = sizeof(); which should be size_t's.

christos


--
Posted automagically by a mail2news gateway at muc.de e.V.
Please direct questions, flames, donations, etc. to news-***@muc.de
matthew green
2008-01-18 19:06:15 UTC
Permalink
hi elad,

could you explain what the purpose/benefit of this is?

thanks,


.mrg.

--
Posted automagically by a mail2news gateway at muc.de e.V.
Please direct questions, flames, donations, etc. to news-***@muc.de
Elad Efrat
2008-01-18 22:04:03 UTC
Permalink
Post by matthew green
hi elad,
could you explain what the purpose/benefit of this is?
thanks,
.mrg.
Instead of passing "struct mbuf" for everything (socket options, various
socket addresses, etc.) I'd like to -- like done in FreeBSD over 10
years ago -- use (the) proper types for everything.

-e.

--
Posted automagically by a mail2news gateway at muc.de e.V.
Please direct questions, flames, donations, etc. to news-***@muc.de
Joerg Sonnenberger
2008-01-18 19:26:28 UTC
Permalink
Post by Elad Efrat
All comments are welcome. It's not planned on being committed anytime
soon (unless, of course, all feedback suggests otherwise), so take your
time carefully reviewing the changes if you're interested. :)
so_setsockopt:
Shouldn't valsize be the normal size_t? Does socklen_t really make sense
here? The val pointer should be const.

sockopt_ensure_writeable --> I don't like the name.

If you can now also push the socket options down into ipv4/ipv6 and the
patch can go without the sockopt_setmbuf/getmbuf, that would be very
nice :-)

Without a full review, this looks like a huge improvement.

Joerg

--
Posted automagically by a mail2news gateway at muc.de e.V.
Please direct questions, flames, donations, etc. to news-***@muc.de
Joerg Sonnenberger
2008-01-18 22:21:28 UTC
Permalink
Post by matthew green
could you explain what the purpose/benefit of this is?
The use of mbufs for socket options is a micro optimisation from the
time when no efficient memory allocator was present.

Joerg

--
Posted automagically by a mail2news gateway at muc.de e.V.
Please direct questions, flames, donations, etc. to news-***@muc.de
matthew green
2008-01-18 23:51:08 UTC
Permalink
Post by Joerg Sonnenberger
Shouldn't valsize be the normal size_t? Does socklen_t really make sense
here?
I vaguely remember there was a reason for this socklen_t... maybe the
original FreeBSD changes. Anyway, I'm thinking it should be unsigned
int, actually, to fit the len parameter to sys_setsockopt() and
sys_getsockopt(), no?


don't getsockopt/setsockopt take socklen_t? oh ouch. their
manual pages claim socklen_t but their implementations use
unsigned int. that should really be fixed.

socklen_t is the right type.


.mrg.

--
Posted automagically by a mail2news gateway at muc.de e.V.
Please direct questions, flames, donations, etc. to news-***@muc.de
Elad Efrat
2008-01-27 18:33:06 UTC
Permalink
Something I'm not keen on is the SOPT_SET/SOPT_GET thing. Would it be too
much work to split pr_ctloutput() into pr_setopt() and pr_getopt()?
Let's do this in smaller steps so we don't break too much stuff. :)
For the first step, I'd like to reduce the mbuf abuse.
In that case, I think better to keep the PRCO_ and not encode the
int pr_ctloutput(int req, struct socket *so, struct sockopt *opt);
(just because I don't like the fit)
What you're suggesting implies we will be changing the KPI that way,
which, at the moment, is not necessarily true. I would very much like
for the first sweep to be close to what is done in FreeBSD.

While separating usrreq to individual routines (bind, connect, etc.) is
done in FreeBSD, breaking ctloutput to "set" and "get" isn't.

Can we agree on revising this at a later date please?
also,
int
sockopt_setint(struct sockopt *sopt, int val)
{
return sockopt_set(sopt, &val, sizeof(val));
}
would avoid duplicating (minimal, I admit :) code..
Righto, I'll add this one too. :)

-e.

--
Posted automagically by a mail2news gateway at muc.de e.V.
Please direct questions, flames, donations, etc. to news-***@muc.de
Iain Hibbert
2008-01-27 18:56:18 UTC
Permalink
Post by Elad Efrat
For the first step, I'd like to reduce the mbuf abuse.
In that case, I think better to keep the PRCO_ and not encode the
int pr_ctloutput(int req, struct socket *so, struct sockopt *opt);
(just because I don't like the fit)
What you're suggesting implies we will be changing the KPI that way,
which, at the moment, is not necessarily true. I would very much like
for the first sweep to be close to what is done in FreeBSD.
Ah, I see - I did not notice they changed it that way.

my objection really was that pr_ctloutput() seems to be designed such that
it can be extended in a per-protocol way so that protocols can be layered
in the kernel. changing it to use SOPT_ means that you can only get and
set options (though I think there are no other usages)

in fact the way that protocols are implemented is borked, you can't
actually layer them without the lower knowing about the upper and I had to
work it differently when writing the bluetooth code (for which the upper
layer is anonymous)
Post by Elad Efrat
Can we agree on revising this at a later date please?
thats fine

iain

--
Posted automagically by a mail2news gateway at muc.de e.V.
Please direct questions, flames, donations, etc. to news-***@muc.de
Iain Hibbert
2008-01-28 21:22:10 UTC
Permalink
Post by Iain Hibbert
Post by Elad Efrat
What you're suggesting implies we will be changing the KPI that way,
which, at the moment, is not necessarily true. I would very much like
for the first sweep to be close to what is done in FreeBSD.
Ah, I see - I did not notice they changed it that way.
I'm looking into this some more but finding some trouble. Its not really
possible to separate the 'sockopt' from the netbt protocol layers because
it is necessary to know the length of the data before you extract
anything. So, the sockopt will have to be passed down.

I don't see sockopt_get/sockopt_set in the FreeBSD code (that I have), are
you willing to consider changing sockopt_get() from

void *
sockopt_get(const struct sockopt *sopt, size_t len)
{
if (sopt->sopt_un == NULL || sopt->sopt_size != len)
return NULL;

return (void *)sopt->sopt_un;
}

to

int
sockopt_get(const struct sockopt *sopt, void *buf, size_t len)
{

if (sopt->sopt_un == NULL || sopt->sopt_size != len)
return EINVAL;

memcpy(buf, sopt->sopt_un, sopt->sopt_size);
return 0;
}

?

IMHO this results in cleaner code (see below for what I've made in the
L2CAP case). It also mirrors sockopt_set() which also does the copy and
returns an errno(2). It means that you have to have a (probably stack
based) variable but I prefer to do that as it removes any alignment
issues. (If you wanted to provide a sockopt_getptr() also I would not
object)

iain

/* sys/netbt/l2cap_socket.c */
int
l2cap_ctloutput(struct socket *so, struct sockopt *sopt)
{
struct l2cap_channel *pcb = so->so_pcb;
int err = 0;

if (pcb == NULL)
return EINVAL;

if (sopt->sopt_level != BTPROTO_L2CAP)
return ENOPROTOOPT;

switch(sopt->sopt_dir) {
case SOPT_GET:
err = l2cap_getopt(pcb, sopt);
break;

case SOPT_SET:
err = l2cap_setopt(pcb, sopt);
break;

default:
err = EINVAL;
break;
}

return err;
}

/* sys/netbt/l2cap_upper.c */
int
l2cap_setopt(struct l2cap_channel *chan, struct sockopt *sopt)
{
uint16_t mtu;
int mode, err = 0;

switch (sopt->sopt_name) {
case SO_L2CAP_IMTU: /* set Incoming MTU */
err = sockopt_get(sopt, &mtu, sizeof(uint16_t)):
if (err)
break;

if (mtu < L2CAP_MTU_MINIMUM)
err = EINVAL;
else if (chan->lc_state == L2CAP_CLOSED)
chan->lc_imtu = mtu;
else
err = EBUSY;

break;

case SO_L2CAP_LM: /* set link mode */
err = sockopt_get(sopt, &mode, sizeof(int));
if (err)
break;

mode &= (L2CAP_LM_SECURE | L2CAP_LM_ENCRYPT | L2CAP_LM_AUTH);

if (mode & L2CAP_LM_SECURE)
mode |= L2CAP_LM_ENCRYPT;

if (mode & L2CAP_LM_ENCRYPT)
mode |= L2CAP_LM_AUTH;

chan->lc_mode = mode;

if (chan->lc_state == L2CAP_OPEN)
err = l2cap_setmode(chan);

break;

case SO_L2CAP_OQOS: /* set Outgoing QoS flow spec */
case SO_L2CAP_FLUSH: /* set Outgoing Flush Timeout */
default:
err = ENOPROTOOPT;
break;
}

return err;
}

int
l2cap_getopt(struct l2cap_channel *chan, struct sockopt *sopt)
{

switch (sopt->sopt_name) {
case SO_L2CAP_IMTU: /* get Incoming MTU */
return sockopt_set(sopt, &chan->lc_imtu, sizeof(uint16_t));

case SO_L2CAP_OMTU: /* get Outgoing MTU */
return sockopt_set(sopt, &chan->lc_omtu, sizeof(uint16_t));

case SO_L2CAP_IQOS: /* get Incoming QoS flow spec */
return sockopt_set(sopt, &chan->lc_iqos, sizeof(l2cap_qos_t));

case SO_L2CAP_OQOS: /* get Outgoing QoS flow spec */
return sockopt_set(sopt, &chan->lc_oqos, sizeof(l2cap_qos_t));

case SO_L2CAP_FLUSH: /* get Flush Timeout */
return sockopt_set(sopt, &chan->lc_flush, sizeof(uint16_t));

case SO_L2CAP_LM: /* get link mode */
return sockopt_set(sopt, &chan->lc_mode, sizeof(int));

default:
break;
}

return EPROTONOOPT;
}

--
Posted automagically by a mail2news gateway at muc.de e.V.
Please direct questions, flames, donations, etc. to news-***@muc.de
Elad Efrat
2008-01-28 21:50:36 UTC
Permalink
Post by Iain Hibbert
I'm looking into this some more but finding some trouble. Its not really
possible to separate the 'sockopt' from the netbt protocol layers because
it is necessary to know the length of the data before you extract
anything. So, the sockopt will have to be passed down.
I don't see sockopt_get/sockopt_set in the FreeBSD code (that I have), are
you willing to consider changing sockopt_get() from
void *
sockopt_get(const struct sockopt *sopt, size_t len)
{
if (sopt->sopt_un == NULL || sopt->sopt_size != len)
return NULL;
return (void *)sopt->sopt_un;
}
to
int
sockopt_get(const struct sockopt *sopt, void *buf, size_t len)
{
if (sopt->sopt_un == NULL || sopt->sopt_size != len)
return EINVAL;
memcpy(buf, sopt->sopt_un, sopt->sopt_size);
return 0;
}
?
IMHO this results in cleaner code (see below for what I've made in the
L2CAP case). It also mirrors sockopt_set() which also does the copy and
returns an errno(2). It means that you have to have a (probably stack
based) variable but I prefer to do that as it removes any alignment
issues. (If you wanted to provide a sockopt_getptr() also I would not
object)
Sure, this sounds good to me and I'll do it unless others object or have
different opinions.

Thanks!

-e.

--
Posted automagically by a mail2news gateway at muc.de e.V.
Please direct questions, flames, donations, etc. to news-***@muc.de
Thor Lancelot Simon
2008-06-26 19:02:26 UTC
Permalink
Post by Elad Efrat
Hi,
I have a patch to make a large portion of the network code use a new
"struct sockopt" (inspired by similar work done in FreeBSD, with input
What happened to this? It looked like you got mostly positive comments...

Thor

--
Posted automagically by a mail2news gateway at muc.de e.V.
Please direct questions, flames, donations, etc. to news-***@muc.de
Elad Efrat
2008-06-26 21:40:25 UTC
Permalink
Post by Thor Lancelot Simon
Post by Elad Efrat
Hi,
I have a patch to make a large portion of the network code use a new
"struct sockopt" (inspired by similar work done in FreeBSD, with input
What happened to this? It looked like you got mostly positive comments...
See http://mail-index.netbsd.org/tech-net/2008/05/02/msg000501.html.

-e.

--
Posted automagically by a mail2news gateway at muc.de e.V.
Please direct questions, flames, donations, etc. to news-***@muc.de
Jason Thorpe
2008-08-05 16:05:51 UTC
Permalink
Post by Elad Efrat
Post by Thor Lancelot Simon
Post by Elad Efrat
Hi,
I have a patch to make a large portion of the network code use a new
"struct sockopt" (inspired by similar work done in FreeBSD, with input
What happened to this? It looked like you got mostly positive comments...
See http://mail-index.netbsd.org/tech-net/2008/05/02/msg000501.html.
Let's get this checked in!
Post by Elad Efrat
-e.
-- thorpej


--
Posted automagically by a mail2news gateway at muc.de e.V.
Please direct questions, flames, donations, etc. to news-***@muc.de
Iain Hibbert
2008-08-05 22:41:36 UTC
Permalink
Post by Jason Thorpe
Post by Elad Efrat
Post by Thor Lancelot Simon
Post by Elad Efrat
Hi,
I have a patch to make a large portion of the network code use a new
"struct sockopt" (inspired by similar work done in FreeBSD, with input
What happened to this? It looked like you got mostly positive comments...
See http://mail-index.netbsd.org/tech-net/2008/05/02/msg000501.html.
Let's get this checked in!
Sorry. I didn't want to check it in while nobody was around to fix any
problems (I was away for a month+, and Elad was busy)

In the meantime, we have moved to feature freeze and this is a large
change so I was holding off.. if we want it in 5.0 I can check it in
though?

iain

--
Posted automagically by a mail2news gateway at muc.de e.V.
Please direct questions, flames, donations, etc. to news-***@muc.de
Jason Thorpe
2008-08-05 23:17:29 UTC
Permalink
Post by Iain Hibbert
In the meantime, we have moved to feature freeze and this is a large
change so I was holding off.. if we want it in 5.0 I can check it in
though?
I think it's fairly important to get in. I would also like to see the
transition OFF of mbufs done for 5.0
Post by Iain Hibbert
iain
-- thorpej


--
Posted automagically by a mail2news gateway at muc.de e.V.
Please direct questions, flames, donations, etc. to news-***@muc.de
Iain Hibbert
2008-08-06 08:40:00 UTC
Permalink
Post by Iain Hibbert
In the meantime, we have moved to feature freeze and this is a large
change so I was holding off.. if we want it in 5.0 I can check it in
though?
I think it's fairly important to get in. I would also like to see the
transition OFF of mbufs done for 5.0
Ok I'm just converting the recent ACCEPT_FILTERS code to use sockopt now
will do it this evening..

iain

--
Posted automagically by a mail2news gateway at muc.de e.V.
Please direct questions, flames, donations, etc. to news-***@muc.de
Loading...