Discussion:
Proposal: socketfrom()
Thor Lancelot Simon
2007-07-05 05:38:05 UTC
I have an application that makes outbound TCP connections at a very high
rate, so high that the overhead of additional system calls to set socket
options considerably impacts performance.

I could partially address this by adding a system call that sets multiple
socket options at once (which, I think, would be a better API than
setsockopt() anyway) but that gets rid of _all but one_ system call to
set up the socket before connect(); I want to get rid of them all.

I'd like to make it possible to set options on one "template" or "master"
socket and then have them inherited by children, as listen()/accept() make
possible for the other direction. I'm thinking of something along the lines
of this:

int socketfrom(int template, int domain, int type, int protocol);

Which would return a new socket using the socket options already set on
socket "template". If domain, type, and protocol don't match, this is
an error (or perhaps it would be best to omit them entirely and just
have one argument, the template socket).
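
As an illustration of the semantics being proposed, here is a minimal user-space sketch. The name socketfrom_emul() and the list of copied options are assumptions made for this example, not part of the proposal. It copies a few SOL_SOCKET options from the template with getsockopt()/setsockopt(), so it makes more system calls rather than fewer; the point of a kernel socketfrom() would be to do the same copy in a single call. Buffer sizes are left out on purpose, because some systems do not report them in the same units they accept.

```c
#include <errno.h>
#include <sys/socket.h>
#include <unistd.h>

/* Boolean SOL_SOCKET options copied by this sketch (illustrative list). */
static const int bool_opts[] = {
    SO_KEEPALIVE, SO_REUSEADDR, SO_DONTROUTE, SO_OOBINLINE
};

/*
 * Hypothetical user-space stand-in for the proposed socketfrom().
 * Returns a new socket carrying the template's options, or -1 with
 * errno set. A socket type mismatch is reported as EINVAL.
 */
int socketfrom_emul(int tmpl, int domain, int type, int protocol)
{
    struct linger lg;
    socklen_t len;
    int ttype, val, saved, s;
    size_t i;

    len = sizeof(ttype);
    if (getsockopt(tmpl, SOL_SOCKET, SO_TYPE, &ttype, &len) == -1)
        return -1;
    if (ttype != type) {
        errno = EINVAL;
        return -1;
    }

    s = socket(domain, type, protocol);
    if (s == -1)
        return -1;

    /* Copy each boolean option from the template to the new socket. */
    for (i = 0; i < sizeof(bool_opts) / sizeof(bool_opts[0]); i++) {
        len = sizeof(val);
        if (getsockopt(tmpl, SOL_SOCKET, bool_opts[i], &val, &len) == -1 ||
            setsockopt(s, SOL_SOCKET, bool_opts[i], &val, sizeof(val)) == -1)
            goto fail;
    }

    /* SO_LINGER takes a struct rather than an int. */
    len = sizeof(lg);
    if (getsockopt(tmpl, SOL_SOCKET, SO_LINGER, &lg, &len) == -1 ||
        setsockopt(s, SOL_SOCKET, SO_LINGER, &lg, sizeof(lg)) == -1)
        goto fail;

    return s;

fail:
    saved = errno;
    close(s);
    errno = saved;
    return -1;
}
```

A caller would set options once on a template socket and then call socketfrom_emul(template, AF_INET, SOCK_STREAM, 0) before each connect().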

Opinions?
--
Thor Lancelot Simon ***@rek.tjls.com

"The inconsistency is startling, though admittedly, if consistency is to
be abandoned or transcended, there is no problem." - Noam Chomsky

--
Posted automagically by a mail2news gateway at muc.de e.V.
Please direct questions, flames, donations, etc. to news-***@muc.de
Greg A. Woods
2007-07-05 06:05:11 UTC
At Thu, 5 Jul 2007 01:38:05 -0400, Thor Lancelot Simon wrote:
Subject: Proposal: socketfrom()
Post by Thor Lancelot Simon
I'd like to make it possible to set options on one "template" or "master"
socket and then have them inherited by children, as listen()/accept() make
possible for the other direction. I'm thinking of something along the lines
int socketfrom(int template, int domain, int type, int protocol);
That might be easier to implement inside the kernel, but wouldn't it be
better from a clean API design point of view to have something like a:

int
setsockopt_default(int level,
                   int optname,
                   const void *optval,
                   socklen_t optlen);

Which would set the default option(s) for all new sockets created by the
process until the next call to it?

(options would be set back to the system defaults across exec, and
perhaps also when called with all params zeros except "level")
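
To make the process-wide idea concrete, here is a rough user-space sketch. The names setsockopt_default_emul() and socket_with_defaults(), and the fixed table sizes, are assumptions for illustration only; a real implementation would live in the kernel and apply to every socket() call in the process. The reset convention follows the parenthetical above: all parameters zero except "level" clears the defaults recorded for that level.

```c
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

#define MAX_DEFAULTS 16   /* arbitrary table size for this sketch */
#define MAX_OPTLEN   32   /* largest option value the table can hold */

struct sock_default {
    int level;
    int optname;
    socklen_t optlen;
    unsigned char optval[MAX_OPTLEN];
};

static struct sock_default defaults[MAX_DEFAULTS];
static size_t ndefaults;

/* Record (or replace) a default option; hypothetical emulation. */
int setsockopt_default_emul(int level, int optname,
                            const void *optval, socklen_t optlen)
{
    struct sock_default *d = NULL;
    size_t i, kept = 0;

    /* All zeros except level: forget the defaults for that level. */
    if (optname == 0 && optval == NULL && optlen == 0) {
        for (i = 0; i < ndefaults; i++)
            if (defaults[i].level != level)
                defaults[kept++] = defaults[i];
        ndefaults = kept;
        return 0;
    }
    if (optval == NULL || optlen > MAX_OPTLEN) {
        errno = EINVAL;
        return -1;
    }
    for (i = 0; i < ndefaults; i++)
        if (defaults[i].level == level && defaults[i].optname == optname)
            d = &defaults[i];
    if (d == NULL) {
        if (ndefaults == MAX_DEFAULTS) {
            errno = ENOMEM;
            return -1;
        }
        d = &defaults[ndefaults++];
    }
    d->level = level;
    d->optname = optname;
    d->optlen = optlen;
    memcpy(d->optval, optval, optlen);
    return 0;
}

/* socket() wrapper that applies every recorded default. */
int socket_with_defaults(int domain, int type, int protocol)
{
    size_t i;
    int saved, s = socket(domain, type, protocol);

    if (s == -1)
        return -1;
    for (i = 0; i < ndefaults; i++) {
        if (setsockopt(s, defaults[i].level, defaults[i].optname,
                       defaults[i].optval, defaults[i].optlen) == -1) {
            saved = errno;
            close(s);
            errno = saved;
            return -1;
        }
    }
    return s;
}
```

Because the table is not keyed by address family or socket type, a default that only makes sense for one protocol would make socket_with_defaults() fail for others.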
--
Greg A. Woods

H:+1 416 218-0098 W:+1 416 489-5852 x122 VE3TCP RoboHack <***@robohack.ca>
Planix, Inc. <***@planix.com> Secrets of the Weird <***@weird.com>
Daniel Carosone
2007-07-05 06:17:48 UTC
Post by Greg A. Woods
That might be easier to implement inside the kernel, but wouldn't it be
Which would set the default option(s) for all new sockets created by the
process until the next call to it?
This makes a process-wide default; the other way a process can keep
around multiple templates and clone the one it needs each time.

--
Dan.
Jason Thorpe
2007-07-05 15:14:34 UTC
Post by Greg A. Woods
Subject: Proposal: socketfrom()
Post by Thor Lancelot Simon
I'd like to make it possible to set options on one "template" or "master"
socket and then have them inherited by children, as listen()/accept() make
possible for the other direction. I'm thinking of something along the lines
int socketfrom(int template, int domain, int type, int protocol);
That might be easier to implement inside the kernel, but wouldn't it be
int
setsockopt_default(int level,
                   int optname,
                   const void *optval,
                   socklen_t optlen);
Which would set the default option(s) for all new sockets created by the
process until the next call to it?
Not a great API, IMO, for many apps. What if you have connections
using several different AFs / protocols within an app -- surely most
of the options won't be applicable across AFs...
Post by Greg A. Woods
(options would be set back to the system defaults across exec, and
perhaps also when called with all params zeros except "level")
--
Greg A. Woods
-- thorpej


Simon Burge
2007-07-05 06:27:53 UTC
Post by Thor Lancelot Simon
I have an application that makes outbound TCP connections at a very high
rate, so high that the overhead of additional system calls to set socket
options considerably impacts performance.
I could partially address this by adding a system call that sets multiple
socket options at once (which, I think, would be a better API than
setsockopt() anyway) but that gets rid of _all but one_ system call to
set up the socket before connect(); I want to get rid of them all.
I'd like to make it possible to set options on one "template" or "master"
socket and then have them inherited by children, as listen()/accept() make
possible for the other direction. I'm thinking of something along the lines
int socketfrom(int template, int domain, int type, int protocol);
Which would return a new socket using the socket options already set on
socket "template". If domain, type, and protocol don't match, this is
an error (or perhaps it would be best to omit them entirely and just
have one argument, the template socket).
Opinions?
My initial thoughts are that this is a special case system call that
would be there to speed up one application and wouldn't have much use
in general, and that we'd open the floodgates to keep on adding system
calls for the next performance problem and so on.

Is there any prior art for this sort of thing? Googling for
"socketfrom" didn't return anything useful.

Cheers,
Simon.

der Mouse
2007-07-05 06:38:54 UTC
Post by Thor Lancelot Simon
int socketfrom(int template, int domain, int type, int protocol);
Which would return a new socket using the socket options already set
on socket "template". If domain, type, and protocol don't match,
this is an error (or perhaps it would be best to omit them entirely
and just have one argument, the template socket).
Opinions?
Sort of like dup() but at the socket layer instead of the
open-file-table layer. I like it, though I'd do as you suggest and
skip everything but the template socket.

I don't like Greg Woods's idea of having process-default socket
options, because I much prefer to have some way to name this collection
of options, rather than having just one set of default options for a
whole process. Using a socket on which they have been set isn't the
best (it uses up a file descriptor relatively unnecessarily), but it's
not unreasonable, and I think it's cleaner than inventing a new kind of
API object for the purpose. I also don't like imposing the same set of
options on sockets of all types.

/~\ The ASCII der Mouse
\ / Ribbon Campaign
X Against HTML ***@rodents.montreal.qc.ca
/ \ Email! 7D C8 61 52 5D E7 2D 39 4E F1 31 3E E8 B3 27 4B

Michael Richardson
2007-07-05 13:46:46 UTC
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
der> I don't like Greg Woods's idea of having process-default socket
der> options, because I much prefer to have some way to name this
der> collection of options, rather than having just one set of
der> default options for a whole process. Using a socket on which

I agree with you.

Having said that, the ability to make some template socket the
default, and to be able to inherit that would be cool. That would
permit an entire tree of processes to be set up to make connections from
a particular IP on a multihomed host.

So, I think that Thor's proposal makes a good step, which can be
extended later to do something more sophisticated.

- --
] Bear: "Me, I'm just the shape of a bear." | firewalls [
] Michael Richardson, Xelerance Corporation, Ottawa, ON |net architect[
] ***@xelerance.com http://www.sandelman.ottawa.on.ca/mcr/ |device driver[
] panic("Just another Debian GNU/Linux using, kernel hacking, security guy"); [





-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Finger me for keys

iQEVAwUBRoz2Q4CLcPvd0N1lAQI5vAgAjwebpaH1sq/uFIASnNxLt4sFwx8vLxpa
qqdlOKle0R8oXyuPyznHDCRKGt2SoEXih/fghIR+yMvvNlrLg+BIqaVuWCK1q52Z
NcmyOb0q0oBgNvVJRfPSmq+LCBxj4OMwHWqOBn3/X59AvTBw4/HSFx3+iAflnyDh
br89vRf3ntKkOgmsC3MvNEi36ujcqkmzm9w/z80uvbQIlk7hD1ZXV88dgJlHcpsG
/MQauXcdvIzciKtzWEi9GBsSW328vnZsLLBch2CNOGefSPTQEGqyoCgGyTLFvBrP
MnDcCN5wjaABYQJ8wPYBP+eou20HIVVp66E5SyOVBRqIAr3KDXc9sg==
=oNcM
-----END PGP SIGNATURE-----

Greg A. Woods
2007-07-05 18:24:24 UTC
At Thu, 5 Jul 2007 02:38:54 -0400 (EDT), der Mouse wrote:
Subject: Re: Proposal: socketfrom()
Post by der Mouse
I don't like Greg Woods's idea of having process-default socket
options, because I much prefer to have some way to name this collection
of options, rather than having just one set of default options for a
whole process.
I'm not so sure why having default "options" for a process is such a bad
thing.

You can change the defaults at any time again and then still create new
sockets with the "new" defaults.

The concept of "defaults" for socket options for a process already
effectively exists -- it's just that they're currently the same for all
processes all of the time. All I'm suggesting is a way to change them,
and only for the scope of the current process and only for sockets yet
to be created.

From a user's perspective it always makes much more sense to be able to
change the defaults than it does to add yet another magic way to do
something which there's already a very well established way of doing.
Post by der Mouse
I also don't like imposing the same set of
options on sockets of all types.
Well that wasn't actually an intention of my proposal, just a side
effect of my hasty design of the function call signature.
Post by der Mouse
Using a socket on which they have been set isn't the
best (it uses up a file descriptor relatively unnecessarily), but it's
not unreasonable, and I think it's cleaner than inventing a new kind of
API object for the purpose.
Perhaps, BUT, if the API and its semantics are not VERY carefully
thought out then the result will be that you'll have these magic new
types of file descriptors that look and feel almost exactly like every
other file descriptor. That would be a far less than ideal result.

Jason's socketlike() [I greatly prefer dup_socket()] and the way he
described it and it was elaborated upon is the closest way I've seen
from the discussion today to avoid ending up with a magic new type of
handle that looks just like every other handle while still being able to
use a socket itself as the "template".

Other than of course my idea of just changing the current defaults. ;-)

I agree that a new type of template object doesn't really solve anything
and just makes everything much more complicated for no good reason.
--
Greg A. Woods

H:+1 416 218-0098 W:+1 416 489-5852 x122 VE3TCP RoboHack <***@robohack.ca>
Planix, Inc. <***@planix.com> Secrets of the Weird <***@weird.com>
David Maxwell
2007-07-05 18:48:34 UTC
Post by Greg A. Woods
Subject: Re: Proposal: socketfrom()
Post by der Mouse
I don't like Greg Woods's idea of having process-default socket
options, because I much prefer to have some way to name this collection
of options, rather than having just one set of default options for a
whole process.
I'm not so sure why having default "options" for a process is such a bad
thing.
You can change the defaults at any time again and then still create new
sockets with the "new" defaults.
It sounded like Thor's application creates new sockets often enough that
it suffers from not having defaults. So, per-process defaults might work
in his case, but it's not hard to imagine a more general solution where
an application creates two different types of sockets often - maybe even
interleaving them - and having to switch the defaults back and forth
every time wouldn't help much.
--
David Maxwell, ***@vex.net|***@maxwell.net --> Mastery of UNIX, like
mastery of language, offers real freedom. The price of freedom is always dear,
but there's no substitute. Personally, I'd rather pay for my freedom than live
in a bitmapped, pop-up-happy dungeon like NT. - Thomas Scoville

Sam Leffler
2007-07-07 20:00:26 UTC
Post by Greg A. Woods
Subject: Re: Proposal: socketfrom()
Post by Thor Lancelot Simon
They do, and always were intended to -- but we had a bug for many years
that caused the watermarks and timeouts to not be copied. I just fixed
it a few days ago.
Notoriously, Linux does not propagate this state from the listen socket
to the accepted sockets, though the Berkeley documentation has always
been pretty clear that that is part of the API.
I don't necessarily want to make excuses for Linux, but this fact is
certainly not documented very well, just as the Linux (mis)behaviour
attests. Saying simply that the new socket has the same "properties" as
the listen socket is very vague, and it wasn't enough to clue me in.
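I did finally find a concrete reference. Stevens said in UNIX Network
Programming Vol. 1, 2nd edition, p. 183 section 7.4:
The following socket options are inherited by a connected TCP
socket from the listening socket (pp. 462-463 of TCPv2):
SO_DEBUG, SO_DONTROUTE, SO_KEEPALIVE, SO_LINGER, SO_OOBINLINE,
SO_RCVBUF, and SO_SNDBUF.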
(there is also a rationale given for why some options must be inherited
too)
It's no wonder the Linux kernel isn't doing the same thing -- they're
not likely to reproduce behaviour that's not documented clearly in the
manual page.
"properties" could be read to only include the socket type and the local
address the socket is bound to. After all getsockopt(2) is quite
careful to use only the word "options" and never the word "properties".
An option setting could be a property of something, but nothing ever
really comes out and says that they are for sure.
Hmm, I thought it was pretty clear in the original sockets api
documentation I wrote but it has been a while and many hands have
touched things...

Sam


Greg A. Woods
2007-07-08 20:15:00 UTC
At Sat, 07 Jul 2007 13:00:26 -0700, Sam Leffler wrote:
Subject: Re: Proposal: socketfrom()
Post by Sam Leffler
Hmm, I thought it was pretty clear in the original sockets api
documentation I wrote but it has been a while and many hands have
touched things...
Where might I find something close to the original, either online or in
printed form?

Which reminds me, I need to get a copy of the third edition of the UNIX
Network Programming set! :-)

Looks like the POSIX folks read accept(2) the same way I did and
interpreted "properties" to mean only the socket type, protocol, and
address family:

http://www.opengroup.org/onlinepubs/009695399/functions/accept.html

The accept() function shall extract the first connection on the
queue of pending connections, create a new socket with the same
socket type protocol and address family as the specified socket,
and allocate a new file descriptor for that socket.

Sadly the socket API documented by Douglas Comer and David Stevens in
their Internetworking with TCP/IP Vol 3 (client-server programming and
applications, Linux/POSIX Sockets Version) barely even mentions socket
options, and certainly doesn't mention them in context of the accept()
call.

Now that I take the time to look, I see that the Linux accept(2) manual
page does explicitly deny inheritance of at least F_SETFL flags by the
new connected socket:

Note that any per file descriptor flags (everything that can be
set with the F_SETFL fcntl(), like non blocking or async state)
are not inherited across an accept.

The Linux accept(2) manual page then goes on to claim conformance with
4.4BSD with the following caveat:

Linux accept does _not_ inherit socket flags like
O_NONBLOCK. This behaviour differs from other BSD socket
implementations. Portable programs should not rely on this
behaviour and always set all required flags on the socket
returned from accept.
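
The portable pattern that caveat recommends can be sketched as follows. The helper names set_nonblocking() and accept_nonblocking() are made up for this example; the calls themselves are plain POSIX fcntl() and accept(). The code sets O_NONBLOCK explicitly on every accepted socket instead of relying on inheritance from the listening socket.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/socket.h>
#include <unistd.h>

/* Turn on O_NONBLOCK for fd; returns 0 on success, -1 on error. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);

    if (flags == -1)
        return -1;
    if (fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1)
        return -1;
    return 0;
}

/*
 * accept() a connection and make the new socket non-blocking,
 * whether or not the platform inherits the flag from lfd.
 */
int accept_nonblocking(int lfd, struct sockaddr *addr, socklen_t *addrlen)
{
    int saved, fd = accept(lfd, addr, addrlen);

    if (fd == -1)
        return -1;
    if (set_nonblocking(fd) == -1) {
        saved = errno;
        close(fd);
        errno = saved;
        return -1;
    }
    return fd;
}
```

This costs two extra fcntl() calls per connection, which is the kind of per-socket overhead the thread is trying to avoid.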
--
Greg A. Woods

H:+1 416 218-0098 W:+1 416 489-5852 x122 VE3TCP RoboHack <***@robohack.ca>
Planix, Inc. <***@planix.com> Secrets of the Weird <***@weird.com>
David Brownlee
2007-07-05 19:51:38 UTC
Post by Greg A. Woods
Subject: Re: Proposal: socketfrom()
Post by der Mouse
I don't like Greg Woods's idea of having process-default socket
options, because I much prefer to have some way to name this collection
of options, rather than having just one set of default options for a
whole process.
I'm not so sure why having default "options" for a process is such a bad
thing.
You can change the defaults at any time again and then still create new
sockets with the "new" defaults.
The concept of "defaults" for socket options for a process already
effectively exists -- it's just that they're currently the same for all
processes all of the time. All I'm suggesting is a way to change them,
and only for the scope of the current process and only for sockets yet
to be created.
From a user's perspective it always makes much more sense to be able to
change the defaults than it does to add yet another magic way to do
something which there's already a very well established way of doing.
It means any library that creates a socket, including third
party libraries, needs to know about this feature and
specifically code around it.
--
David/absolute -- www.NetBSD.org: No hype required --

der Mouse
2007-07-05 21:15:37 UTC
Post by Greg A. Woods
[...per-process default socket options...]
I'm not so sure why having default "options" for a process is such a
bad thing.
It means any library that creates a socket, including third party
libraries, needs to know about this feature and specifically code
around it.
Hm. I hadn't thought about that.

This is a reason to have process-wide defaults, but it's also a reason
to have named sets of defaults that do not apply to naïvely-created
sockets.

In particular, you want to avoid process-wide defaults when you just
want to affect your own sockets, not those created by library routines;
but you *do* want process-wide defaults if you want certain options to
apply to *all* networking your process does, including that done by
not-yours library routines.

Is having this flexibility actually worth the complexity it would add?
That's debatable. Personally, I think it's not; I'd dump process-wide
defaults, myself.

/~\ The ASCII der Mouse
\ / Ribbon Campaign
X Against HTML ***@rodents.montreal.qc.ca
/ \ Email! 7D C8 61 52 5D E7 2D 39 4E F1 31 3E E8 B3 27 4B

Greg A. Woods
2007-07-05 21:54:36 UTC
At Thu, 5 Jul 2007 14:48:34 -0400, David Maxwell wrote:
Subject: Re: Proposal: socketfrom()
Post by David Maxwell
It sounded like Thor's application creates new sockets often enough that
it suffers from not having defaults. So, per-process defaults might work
in his case, but it's not hard to imagine a more general solution where
an application creates two different types of sockets often - maybe even
interleaving them - and having to switch the defaults back and forth
every time wouldn't help much.
I'm having a bit of a hard time imagining such an application.... :-)

However, in an application where different types of sockets are
created often, there may not be any need to change the defaults.

This changing of the defaults idea is really only to solve performance
problems where nearly identical sockets are created in great numbers and
where the system-wide socket option defaults are radically different
from what is required (i.e. where many setsockopt() calls would be
necessary for each socket being created).

I have less trouble imagining other applications like Thor's, though the
one or two similar to what I imagine and which I've examined (including
"pen") don't set any options on the sockets they create frequently.

Unless perhaps you're talking about options for sockets created by
accept()? Changing the defaults for the process works for them too, and
perhaps that's the only way to do it without yet another system call.
If the loop is "accept(); ... socket();" and each needs different
options then my "defaults" idea still works assuming its interface also
allows the user to set different defaults for accept() vs. socket()
sockets (and of different socket types, AF_*'s, etc., etc., etc.).

I don't know off-hand of any applications which call setsockopt() on
sockets created by accept(), though of course there may be many I don't
know of.
--
Greg A. Woods

H:+1 416 218-0098 W:+1 416 489-5852 x122 VE3TCP RoboHack <***@robohack.ca>
Planix, Inc. <***@planix.com> Secrets of the Weird <***@weird.com>
Greg A. Woods
2007-07-05 21:58:40 UTC
At Thu, 5 Jul 2007 20:51:38 +0100 (BST), David Brownlee wrote:
Subject: Re: Proposal: socketfrom()
Post by David Brownlee
It means any library that creates a socket, including third
party libraries, needs to know about this feature and
specifically code around it.
Or rather that any application making use of the setsockopt_defaults()
facility know about all library routines it uses which could create a
socket and then reset the defaults back to "normal" for the duration of
the library call..... :-)

(assuming the new defaults would be unworkable for the library call)

Not ideal if those calls are interleaved with socket() or accept() calls
mind you, but perhaps pairs of setsockopt_defaults() would be faster
than many setsockopt() calls and so would still be a win....
--
Greg A. Woods

H:+1 416 218-0098 W:+1 416 489-5852 x122 VE3TCP RoboHack <***@robohack.ca>
Planix, Inc. <***@planix.com> Secrets of the Weird <***@weird.com>
der Mouse
2007-07-05 22:08:42 UTC
Post by Greg A. Woods
[I]t's not hard to imagine a more general solution where an
application creates two different types of sockets often - maybe
even interleaving them - and having to switch the defaults back and
forth every time wouldn't help much.
I'm having a bit of a hard time imagining such an application.... :-)
Consider something that creates a lot of transient connections, but
always in pairs - one long-haul, with option set A, and the other
local, with option set B.
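
A small sketch of that scenario with two named option sets, done in user space. Everything here is illustrative: the struct layout, the names optset_long_haul and optset_local, and the choice of SO_KEEPALIVE for the long-haul side and TCP_NODELAY for the local side are assumptions, not anything stated in the thread. With template sockets, each set would instead be a socket on which the options had been set once.

```c
#include <errno.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <unistd.h>

/* One integer-valued option in a named set. */
struct optset_entry {
    int level;
    int optname;
    int value;
};

struct optset {
    const struct optset_entry *entries;
    size_t count;
};

/* "Option set A": long-haul connections (illustrative choice). */
static const struct optset_entry long_haul_entries[] = {
    { SOL_SOCKET, SO_KEEPALIVE, 1 },
};

/* "Option set B": local connections (illustrative choice). */
static const struct optset_entry local_entries[] = {
    { IPPROTO_TCP, TCP_NODELAY, 1 },
};

const struct optset optset_long_haul = {
    long_haul_entries,
    sizeof(long_haul_entries) / sizeof(long_haul_entries[0])
};
const struct optset optset_local = {
    local_entries,
    sizeof(local_entries) / sizeof(local_entries[0])
};

/* Create a socket and apply one named set; one setsockopt() per entry. */
int socket_with_optset(int domain, int type, int protocol,
                       const struct optset *set)
{
    size_t i;
    int saved, s = socket(domain, type, protocol);

    if (s == -1)
        return -1;
    for (i = 0; i < set->count; i++) {
        const struct optset_entry *e = &set->entries[i];

        if (setsockopt(s, e->level, e->optname,
                       &e->value, sizeof(e->value)) == -1) {
            saved = errno;
            close(s);
            errno = saved;
            return -1;
        }
    }
    return s;
}
```

Interleaving the two kinds of socket needs no switching of process-wide state; the caller just names the set it wants each time.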
Post by Greg A. Woods
Unless perhaps you're talking about options for sockets created by
accept()? Changing the defaults for the process works for them too,
Does it? accept() currently, I think, inherits options settings and
such from the listening socket (and thus already has a template socket
built in). There's no need to setsockopt on accept()-created sockets
unless you want the new connection's socket to have different options
from those on the listening socket - which basically means, there's
some option with meaning both for listening and connected sockets, but
you want them different. I'm having trouble thinking of an example,
but that could just be betraying my ignorance of the variety of socket
options out there.
Post by Greg A. Woods
I don't know off-hand of any applications which call setsockopt() on
sockets created by accept(), though of course there may be many I
don't know of.
I have the feeling I may have written one once, but, looking over the
list of socket options, I can't think which option(s) that could have
been, so I may be conflating it with other "options" such as O_NONBLOCK
that aren't really covered by this discussion.

/~\ The ASCII der Mouse
\ / Ribbon Campaign
X Against HTML ***@rodents.montreal.qc.ca
/ \ Email! 7D C8 61 52 5D E7 2D 39 4E F1 31 3E E8 B3 27 4B

Thor Lancelot Simon
2007-07-05 23:06:56 UTC
Post by Greg A. Woods
Subject: Re: Proposal: socketfrom()
Post by David Maxwell
It sounded like Thor's application creates new sockets often enough that
it suffers from not having defaults. So, per-process defaults might work
in his case, but it's not hard to imagine a more general solution where
an application creates two different types of sockets often - maybe even
interleaving them - and having to switch the defaults back and forth
every time wouldn't help much.
I'm having a bit of a hard time imagining such an application.... :-)
Hint: if I understand it correctly, your proposal would break the
resolver library.

Thor

Greg A. Woods
2007-07-06 00:25:17 UTC
At Thu, 5 Jul 2007 19:06:56 -0400, Thor Lancelot Simon wrote:
Subject: Re: Proposal: socketfrom()
Post by Thor Lancelot Simon
Post by Greg A. Woods
Subject: Re: Proposal: socketfrom()
Post by David Maxwell
It sounded like Thor's application creates new sockets often enough that
it suffers from not having defaults. So, per-process defaults might work
in his case, but it's not hard to imagine a more general solution where
an application creates two different types of sockets often - maybe even
interleaving them - and having to switch the defaults back and forth
every time wouldn't help much.
I'm having a bit of a hard time imagining such an application.... :-)
Hint: if I understand it correctly, your proposal would break the
resolver library.
Hmmm, I suppose it could cause some funny business, assuming the
application set some option for SOCK_DGRAM sockets that wouldn't work so
well for the purposes of resolver(3) and also didn't realize the
potential conflict.

It's still not too hard for me to imagine such an application
temporarily revoking the custom defaults prior to using resolver(3) (or
anything similar) and re-instating them later, or indeed always doing
so during the execution of any thread with the sole purpose of
interacting with such a set of library functions.

I would think though that any application that currently made so many
socket() and setsockopt() calls that some performance enhancement was
needed would have a hard time finding the time to do DNS lookups,
especially within its main loop of execution. Does your application do
DNS at the same time it's doing all the socket creation and setup?
--
Greg A. Woods

H:+1 416 218-0098 W:+1 416 489-5852 x122 VE3TCP RoboHack <***@robohack.ca>
Planix, Inc. <***@planix.com> Secrets of the Weird <***@weird.com>
Greg A. Woods
2007-07-06 00:39:39 UTC
At Thu, 5 Jul 2007 18:08:42 -0400 (EDT), der Mouse wrote:
Subject: Re: Proposal: socketfrom()
Post by der Mouse
Does it? accept() currently, I think, inherits options settings and
such from the listening socket (and thus already has a template socket
built in). There's no need to setsockopt on accept()-created sockets
unless you want the new connection's socket to have different options
from those on the listening socket - which basically means, there's
some option with meaning both for listening and connected sockets, but
you want them different. I'm having trouble thinking of an example,
but that could just be betraying my ignorance of the variety of socket
options out there.
I'm not even sure that sockets returned from accept() do in fact inherit
their options from the listening socket. A quick browse of the accept()
code and annotations in TCP/IP Illustrated Vol2 suggests not, but I may
be mistaken.

Perhaps SO_KEEPALIVE and maybe SO_DEBUG would be set only on the
accept()ed sockets, or maybe even only on some of them? I dunno -- I'm
just grasping at examples out of thin air.

There's also the potential need to adjust buffer sizes and hi/lo-water
marks, timeouts, etc., all on a per-accept() basis, though again I'm
just thinking out loud....
--
Greg A. Woods

H:+1 416 218-0098 W:+1 416 489-5852 x122 VE3TCP RoboHack <***@robohack.ca>
Planix, Inc. <***@planix.com> Secrets of the Weird <***@weird.com>
Thor Lancelot Simon
2007-07-06 02:14:07 UTC
Post by Greg A. Woods
I'm not even sure that sockets returned from accept() do in fact inherit
their options from the listening socket. A quick browse of the accept()
code and annotations in TCP/IP Illustrated Vol2 suggests not, but I may
be mistaken.
They do, and always were intended to -- but we had a bug for many years
that caused the watermarks and timeouts to not be copied. I just fixed
it a few days ago.

Notoriously, Linux does not propagate this state from the listen socket
to the accepted sockets, though the Berkeley documentation has always
been pretty clear that that is part of the API. I don't think they even
propagate whether the socket's blocking or not -- which is why you see
modern application code doing eight setsockopts and a fcntl every time
it accepts a connection, ugh.

Thor

David Maxwell
2007-07-06 02:47:12 UTC
Post by Greg A. Woods
Post by David Maxwell
in his case, but it's not hard to imagine a more general solution where
an application creates two different types of sockets often - maybe even
interleaving them - and having to switch the defaults back and forth
every time wouldn't help much.
I'm having a bit of a hard time imagining such an application.... :-)
Hmm. Maybe a bittorrent-like client, that is creating connections to
peers for data transfer, and connections which want different options
for control channels?

Or a distributed conferencing application - asterisk + video +
whiteboarding...

I didn't have something specific in mind - just was thinking that at the
time one talks about adding syscalls, it's nice to get it right the
first time instead of having another 'Oh, I need something new' request
in the short-term.
--
David Maxwell, ***@vex.net|***@maxwell.net -->
(About an Amiga rendering landscapes) It's not thinking, it's being artistic!
- Jamie Woods

Greg A. Woods
2007-07-06 21:33:40 UTC
At Thu, 5 Jul 2007 22:14:07 -0400, Thor Lancelot Simon wrote:
Subject: Re: Proposal: socketfrom()
Post by Thor Lancelot Simon
They do, and always were intended to -- but we had a bug for many years
that caused the watermarks and timeouts to not be copied. I just fixed
it a few days ago.
Notoriously, Linux does not propagate this state from the listen socket
to the accepted sockets, though the Berkeley documentation has always
been pretty clear that that is part of the API.
I don't necessarily want to make excuses for Linux, but this fact is
certainly not documented very well, just as the Linux (mis)behaviour
attests. Saying simply that the new socket has the same "properties" as
the listen socket is very vague, and it wasn't enough to clue me in.

I did finally find a concrete reference. Stevens said in UNIX Network
Programming Vol. 1, 2nd edition, p. 183 section 7.4:

The following socket options are inherited by a connected TCP
socket from the listening socket (pp. 462-463 of TCPv2):
SO_DEBUG, SO_DONTROUTE, SO_KEEPALIVE, SO_LINGER, SO_OOBINLINE,
SO_RCVBUF, and SO_SNDBUF.

(there is also a rationale given for why some options must be inherited
too)
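
Whether a given system actually inherits one of these options can be checked empirically. The probe below is a sketch, with the function name accept_inherits_bool_opt() invented for this example. It sets a boolean SOL_SOCKET option on a loopback listening socket, connects to it, and reads the option back from the accepted socket. It assumes a working IPv4 loopback interface and reports only what the running kernel does, not what any documentation promises.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Returns 1 if the accepted socket has the boolean option set,
 * 0 if it does not, and -1 if the probe itself failed.
 */
int accept_inherits_bool_opt(int optname)
{
    struct sockaddr_in addr;
    socklen_t len;
    int lfd = -1, cfd = -1, afd = -1;
    int on = 1, val = 0, result = -1;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;                 /* let the kernel pick a port */

    /* Listening socket with the option set before listen(). */
    if ((lfd = socket(AF_INET, SOCK_STREAM, 0)) == -1)
        goto out;
    if (setsockopt(lfd, SOL_SOCKET, optname, &on, sizeof(on)) == -1)
        goto out;
    if (bind(lfd, (struct sockaddr *)&addr, sizeof(addr)) == -1)
        goto out;
    if (listen(lfd, 1) == -1)
        goto out;
    len = sizeof(addr);
    if (getsockname(lfd, (struct sockaddr *)&addr, &len) == -1)
        goto out;

    /* Client side: connect to the port the kernel chose. */
    if ((cfd = socket(AF_INET, SOCK_STREAM, 0)) == -1)
        goto out;
    if (connect(cfd, (struct sockaddr *)&addr, sizeof(addr)) == -1)
        goto out;
    if ((afd = accept(lfd, NULL, NULL)) == -1)
        goto out;

    /* Read the option back from the accepted socket. */
    len = sizeof(val);
    if (getsockopt(afd, SOL_SOCKET, optname, &val, &len) == -1)
        goto out;
    result = (val != 0);

out:
    if (afd != -1)
        close(afd);
    if (cfd != -1)
        close(cfd);
    if (lfd != -1)
        close(lfd);
    return result;
}
```

Calling accept_inherits_bool_opt(SO_KEEPALIVE) on each system of interest gives a direct answer for that option; SO_LINGER and the buffer sizes would need their own variants because they are not simple booleans.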

It's no wonder the Linux kernel isn't doing the same thing -- they're
not likely to reproduce behaviour that's not documented clearly in the
manual page.

"properties" could be read to only include the socket type and the local
address the socket is bound to. After all getsockopt(2) is quite
careful to use only the word "options" and never the word "properties".
An option setting could be a property of something, but nothing ever
really comes out and says that they are for sure.
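
Whether a given kernel propagates an option from the listening socket can be checked empirically. The following is a minimal sketch, assuming a loopback TCP connection and using SO_KEEPALIVE as the probe option; the function name accept_inherits_keepalive() is made up for illustration, and the result depends on the platform it runs on.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Returns 1 if SO_KEEPALIVE set on the listening socket is also set on
 * the socket returned by accept(), 0 if it is not, and -1 on any error.
 */
int accept_inherits_keepalive(void)
{
    int lfd = -1, cfd = -1, afd = -1, result = -1;
    int on = 1, val = 0;
    socklen_t len;
    struct sockaddr_in addr;

    lfd = socket(AF_INET, SOCK_STREAM, 0);
    if (lfd < 0)
        goto out;
    /* Set the option on the listening socket only. */
    if (setsockopt(lfd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) < 0)
        goto out;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;                      /* let the kernel pick a port */
    if (bind(lfd, (struct sockaddr *)&addr, sizeof(addr)) < 0)
        goto out;
    if (listen(lfd, 1) < 0)
        goto out;
    len = sizeof(addr);
    if (getsockname(lfd, (struct sockaddr *)&addr, &len) < 0)
        goto out;                           /* learn the chosen port */

    cfd = socket(AF_INET, SOCK_STREAM, 0);
    if (cfd < 0)
        goto out;
    if (connect(cfd, (struct sockaddr *)&addr, sizeof(addr)) < 0)
        goto out;
    afd = accept(lfd, NULL, NULL);
    if (afd < 0)
        goto out;

    /* Read the option back from the accepted socket. */
    len = sizeof(val);
    if (getsockopt(afd, SOL_SOCKET, SO_KEEPALIVE, &val, &len) < 0)
        goto out;
    result = (val != 0);
out:
    if (afd >= 0)
        close(afd);
    if (cfd >= 0)
        close(cfd);
    if (lfd >= 0)
        close(lfd);
    return result;
}
```

The same pattern works for the other options in the Stevens list by swapping the option name and value type.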
--
Greg A. Woods

H:+1 416 218-0098 W:+1 416 489-5852 x122 VE3TCP RoboHack <***@robohack.ca>
Planix, Inc. <***@planix.com> Secrets of the Weird <***@weird.com>
David Laight
2007-07-05 06:54:50 UTC
Permalink
Post by Thor Lancelot Simon
I have an application that makes outbound TCP connections at a very high
rate, so high that the overhead of additional system calls to set socket
options considerably impacts performance.
That sounds as though there is something badly wrong in the code
paths somewhere.
Even allowing for the system call costs, I'm surprised that
setsockopt() gets anywhere near the cost of connect().

David
--
David Laight: ***@l8s.co.uk

der Mouse
2007-07-05 07:02:15 UTC
Permalink
Post by David Laight
Post by Thor Lancelot Simon
I have an application that makes outbound TCP connections at a very
high rate, so high that the overhead of additional system calls to
set socket options considerably impacts performance.
That sounds as though there is something badly wrong in the code
paths somewhere.
Maybe, but I'm not so sure.
Post by David Laight
Even allowing for the system calls costs, I'm surprised that
setsockopt() gets anywhere near the cost of connect().
Some processors make crossing the kernel/user boundary very expensive.
That crossing can be a substantial portion of the cost of a call such
as setsockopt() - or even connect(), especially async connect() (which
is probably what's going on).

Upon considering that Thor's code was probably doing at least a
half-dozen setsockopt() calls for each socket, I find it totally
plausible that doing (say) eight syscalls instead of two for each
connection could "considerably" impact performance.
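
For concreteness, here is a sketch of the kind of per-connection setup being discussed. The option list is an assumption chosen for illustration (the thread never says which options the application sets), and make_configured_socket() is a made-up name. It counts the system calls issued before connect() is even reached.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <stddef.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Create a TCP socket and apply an illustrative set of options before
 * connect(). *ncalls receives the number of system calls issued.
 * Returns the configured socket, or -1 on failure.
 */
int make_configured_socket(int *ncalls)
{
    int on = 1, bufsz = 65536;
    struct linger lg = { .l_onoff = 1, .l_linger = 5 };
    const struct {
        int level, name;
        const void *val;
        socklen_t len;
    } opts[] = {
        { SOL_SOCKET,  SO_KEEPALIVE, &on,    sizeof(on)    },
        { SOL_SOCKET,  SO_SNDBUF,    &bufsz, sizeof(bufsz) },
        { SOL_SOCKET,  SO_RCVBUF,    &bufsz, sizeof(bufsz) },
        { SOL_SOCKET,  SO_LINGER,    &lg,    sizeof(lg)    },
        { IPPROTO_TCP, TCP_NODELAY,  &on,    sizeof(on)    },
    };
    size_t i;
    int fd;

    *ncalls = 0;
    fd = socket(AF_INET, SOCK_STREAM, 0);
    ++*ncalls;                              /* one call for socket() */
    if (fd < 0)
        return -1;

    for (i = 0; i < sizeof(opts) / sizeof(opts[0]); i++) {
        ++*ncalls;                          /* one call per option */
        if (setsockopt(fd, opts[i].level, opts[i].name,
                       opts[i].val, opts[i].len) < 0) {
            close(fd);
            return -1;
        }
    }
    return fd;
}
```

With these five options the function makes six system calls, and connect() is a seventh. A template-based call would reduce that to two.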

/~\ The ASCII der Mouse
\ / Ribbon Campaign
X Against HTML ***@rodents.montreal.qc.ca
/ \ Email! 7D C8 61 52 5D E7 2D 39 4E F1 31 3E E8 B3 27 4B

Trevor Talbot
2007-07-05 06:59:47 UTC
Permalink
Post by Thor Lancelot Simon
I have an application that makes outbound TCP connections at a very high
rate, so high that the overhead of additional system calls to set socket
options considerably impacts performance.
int socketfrom(int template, int domain, int type, int protocol);
Which would return a new socket using the socket options already set on
socket "template". If domain, type, and protocol don't match, this is
an error (or perhaps it would be best to omit them entirely and just
have one argument, the template socket.
Windows XP/2003 era has a notion of "reusable sockets". A bunch of
caveats apply, but the general notion is that you can disconnect a
socket a certain way, and then reuse it in a call to
accept/connect/etc. Socket creation itself is rather expensive on
Windows, which is what this is meant to help, but it keeps the socket
options around too. Applications would generally use this by
maintaining a pool of recently-disconnected sockets.

I don't know how hard this would be to implement, or if it would help
your application, but it's another idea.

Matthias Scheler
2007-07-05 11:50:31 UTC
Permalink
Post by Thor Lancelot Simon
int socketfrom(int template, int domain, int type, int protocol);
Is there any other useful application than using this for outbound
connection?

If there isn't what about this:

int connectfrom(int template, int domain, int type, int protocol,
struct sockaddr *name, socklen_t namelen);

Kind regards
--
Matthias Scheler http://zhadum.org.uk/

der Mouse
2007-07-05 14:59:59 UTC
Permalink
Post by Matthias Scheler
Post by Thor Lancelot Simon
int socketfrom(int template, int domain, int type, int protocol);
Is there any other useful application than using this for outbound
connection?
I don't see one offhand, but I wouldn't want to assume there aren't
any; there are socket types that don't need connect() - UDP, raw
sockets, and special cases like routing sockets. I'm not convinced
they need this kind of socket cloning, but I'm not convinced they
don't, either, and I can easily imagine it being useful for UDP.
Post by Matthias Scheler
int connectfrom(int template, int domain, int type, int protocol,
struct sockaddr *name, socklen_t namelen);
I still think the domain/type/protocol should be inherited from the
template socket rather than passed in.

/~\ The ASCII der Mouse
\ / Ribbon Campaign
X Against HTML ***@rodents.montreal.qc.ca
/ \ Email! 7D C8 61 52 5D E7 2D 39 4E F1 31 3E E8 B3 27 4B

Thor Lancelot Simon
2007-07-05 15:11:17 UTC
Permalink
Post by Matthias Scheler
Post by Thor Lancelot Simon
int socketfrom(int template, int domain, int type, int protocol);
Is there any other useful application than using this for outbound
connection?
You can use it for UDP sockets where there's no connect(), at least.

Thor

Jason Thorpe
2007-07-05 15:20:22 UTC
Permalink
Post by Matthias Scheler
Post by Thor Lancelot Simon
int socketfrom(int template, int domain, int type, int protocol);
Is there any other useful application than using this for outbound
connection?
int connectfrom(int template, int domain, int type, int protocol,
struct sockaddr *name, socklen_t namelen);
I'm not sure "connectfrom()" is a great name for this. It makes it
sound more like a bind / connect in one call.
Post by Matthias Scheler
Kind regards
--
Matthias Scheler http://zhadum.org.uk/
-- thorpej


Matthias Scheler
2007-07-05 15:30:25 UTC
Permalink
Post by Thor Lancelot Simon
Post by Matthias Scheler
Is there any other useful application than using this for outbound
connection?
You can use it for UDP sockets where there's no connect(), at least.
But why would you need several UDP sockets unless you are using them
in connected mode?

Kind regards
--
Matthias Scheler http://zhadum.org.uk/

der Mouse
2007-07-05 17:09:13 UTC
Permalink
Post by Matthias Scheler
Post by Thor Lancelot Simon
You can use it for UDP sockets where there's no connect(), at least.
But why would you need several UDP sockets unless you are using them
in connected mode?
One per internal subsystem using them?

One per bound address?

Just the first two possible reasons that occur to me.

/~\ The ASCII der Mouse
\ / Ribbon Campaign
X Against HTML ***@rodents.montreal.qc.ca
/ \ Email! 7D C8 61 52 5D E7 2D 39 4E F1 31 3E E8 B3 27 4B

Matthias Scheler
2007-07-05 15:31:15 UTC
Permalink
Post by Jason Thorpe
Post by Matthias Scheler
int connectfrom(int template, int domain, int type, int protocol,
struct sockaddr *name, socklen_t namelen);
I'm not sure "connectfrom()" is a great name for this. It makes it
sound more like a bind / connect in one call.
Yes, I agree it's a bad name. Maybe dup_connect()?

Kind regards
--
Matthias Scheler http://zhadum.org.uk/

Darren Reed
2007-07-05 15:03:35 UTC
Permalink
Post by Thor Lancelot Simon
I have an application that makes outbound TCP connections at a very high
rate, so high that the overhead of additional system calls to set socket
options considerably impacts performance.
I could partially address this by adding a system call that sets multiple
socket options at once (which, I think, would be a better API than
setsockopt() anyway) but that gets rid of _all but one_ system call to
set up the socket before connect(); I want to get rid of them all.
I'd like to make it possible to set options on one "template" or "master"
socket and then have them inherited by children, as listen()/accept() make
possible for the other direction. I'm thinking of something along the lines
int socketfrom(int template, int domain, int type, int protocol);
Which would return a new socket using the socket options already set on
socket "template". If domain, type, and protocol don't match, this is
an error (or perhaps it would be best to omit them entirely and just
have one argument, the template socket.
Opinions?
I dislike this approach because it is a new way to create a socket.

Currently you create a socket with socket(), have the system allocate
you a new one with accept() and...I think that is the limit.

I would rather see something like:

setsockopt(fd, SOL_SOCKET, SO_TEMPLATE, &template, sizeof(template))

The next question is then how to define the "template" data structure
in such a way that it is extensible for arbitrary sockets.

Another reason I prefer this approach is that it is easier to
learn or adapt to. Also, someone reading new code is confronted
with a "wtf is this socket option" rather than a "wtf is this
socketfrom" that sends them looking for socketfrom() in your app
because it isn't a widely known system call.

Darren


der Mouse
2007-07-05 15:11:19 UTC
Permalink
Post by Darren Reed
Post by Thor Lancelot Simon
int socketfrom(int template, int domain, int type, int protocol);
I dislike this approach because it is a new way to create a socket.
Currently you create a socket with socket(), have the system allocate
you a new one with accept() and...I think that is the limit.
Yes, this is a third way - and what's wrong with that?
Post by Darren Reed
setsockopt(fd, SOL_SOCKET, SO_TEMPLATE, &template, sizeof(template))
The next question is then how to define the "template" data structure
in such a way that it is extensible for arbitrary sockets.
My own suggestion would be to make it an opaque-to-userland blob
created by the kernel with something like getsockopt(SO_GET_TEMPLATE).

/~\ The ASCII der Mouse
\ / Ribbon Campaign
X Against HTML ***@rodents.montreal.qc.ca
/ \ Email! 7D C8 61 52 5D E7 2D 39 4E F1 31 3E E8 B3 27 4B

Michael Richardson
2007-07-05 16:19:15 UTC
Permalink
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Darren> Another reason that I prefer this approach because it is easier to
Darren> learn or adapt to use. Also when reading new code, someone is
Darren> confronted with a "wtf is this socket option" rather than "wtf is
Darren> this socketfrom" and goes looking for socketfrom() in your app
Darren> because it isn't a widely known system call.

I've always found documentation for socket options to be very hard to
find.

- --
] Bear: "Me, I'm just the shape of a bear." | firewalls [
] Michael Richardson, Xelerance Corporation, Ottawa, ON |net architect[
] ***@xelerance.com http://www.sandelman.ottawa.on.ca/mcr/ |device driver[
] panic("Just another Debian GNU/Linux using, kernel hacking, security guy"); [



-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Finger me for keys

iQEVAwUBRo0aAoCLcPvd0N1lAQJhwwf8DqlBF6E+LYfXnvK899Aa0ZgI2I8fZYbI
tFLbTyxk0vnNyFZuMhkM+ovVCIUA76B89F3z1i4Z3OOjz6TPjwMIZZDupuQbTX4T
YPdYuWy1DLHCle9X+QQZhccv3NywhyHxX6NDXo5sAh+0cxtXOJT25+ZuUgZVnQQm
kc/EXhXLgSGm4sReg9Vk5XQ384Q6bBcVCBtrJmKTJd5MLZDLz62Cag/EOU6lcrhk
aEmzNS1iIFW+ATDJaW4KVn9wHJUF9Muol5W9HNlHDjL1SmT4Em8bu6MBFEcweR6t
cksS7dnxjEWE14B7n2Hc6yQkBKJgAM2GROrR7e5LD2yqbU59ARTHhw==
=IG1m
-----END PGP SIGNATURE-----

Christos Zoulas
2007-07-06 19:00:57 UTC
Permalink
Post by Darren Reed
Post by Thor Lancelot Simon
I have an application that makes outbound TCP connections at a very high
rate, so high that the overhead of additional system calls to set socket
options considerably impacts performance.
I could partially address this by adding a system call that sets multiple
socket options at once (which, I think, would be a better API than
setsockopt() anyway) but that gets rid of _all but one_ system call to
set up the socket before connect(); I want to get rid of them all.
I'd like to make it possible to set options on one "template" or "master"
socket and then have them inherited by children, as listen()/accept() make
possible for the other direction. I'm thinking of something along the lines
int socketfrom(int template, int domain, int type, int protocol);
Which would return a new socket using the socket options already set on
socket "template". If domain, type, and protocol don't match, this is
an error (or perhaps it would be best to omit them entirely and just
have one argument, the template socket.
Opinions?
I dislike this approach because it is a new way to create a socket.
Currently you create a socket with socket(), have the system allocate
you a new one with accept() and...I think that is the limit.
setsockopt(fd, SOL_SOCKET, SO_TEMPLATE, &template, sizeof(template))
The next question is then how to define the "template" data structure
in such a way that it is extensible for arbitrary sockets.
Another reason that I prefer this approach because it is easier to
learn or adapt to use. Also when reading new code, someone is
confronted with a "wtf is this socket option" rather than "wtf is
this socketfrom" and goes looking for socketfrom() in your app because
it isn't a widely known system call.
You can pass the template socket fd in as the template :-)

christos


Jason Thorpe
2007-07-05 15:18:48 UTC
Permalink
Post by David Laight
Post by Thor Lancelot Simon
I have an application that makes outbound TCP connections at a very high
rate, so high that the overhead of additional system calls to set socket
options considerably impacts performance.
That sounds as though there is something badly wrong in the code
paths somewhere.
Even allowing for the system calls costs, I'm surprised that
setsockopt() gets anywhere near the cost of connect().
Lots and lots of calls?
Post by David Laight
David
--
-- thorpej


Jason Thorpe
2007-07-05 15:23:08 UTC
Permalink
Post by Thor Lancelot Simon
int socketfrom(int template, int domain, int type, int protocol);
Instead of "socketfrom()", what about "socketlike()"? That's what
you're asking for .. "give me a socket just like this one". Drop
everything except for the "template" argument. Have it inherit not
only options, but also any bind that had been done on the template
socket.

-- thorpej


Martin Husemann
2007-07-05 15:28:19 UTC
Permalink
Post by Jason Thorpe
Instead of "socketfrom()", what about "socketlike()"? That's what
you're asking for .. "give me a socket just like this one". Drop
everything except for the "template" argument. Have it inherit not
only options, but also any bind that had been done on the template
socket.
I like this variant.

Martin

Matt Thomas
2007-07-05 18:00:27 UTC
Permalink
Post by Thor Lancelot Simon
I have an application that makes outbound TCP connections at a very high
rate, so high that the overhead of additional system calls to set socket
options considerably impacts performance.
I could partially address this by adding a system call that sets multiple
socket options at once (which, I think, would be a better API than
setsockopt() anyway) but that gets rid of _all but one_ system call to
set up the socket before connect(); I want to get rid of them all.
I'd like to make it possible to set options on one "template" or "master"
socket and then have them inherited by children, as listen()/accept() make
possible for the other direction. I'm thinking of something along the lines
int socketfrom(int template, int domain, int type, int protocol);
Which would return a new socket using the socket options already set on
socket "template". If domain, type, and protocol don't match, this is
an error (or perhaps it would be best to omit them entirely and just
have one argument, the template socket.
new_fd = socketclone(int template_fd);

Would be my suggestion. What about non-socket level options (IP, TCP, etc.)?
--
Matt Thomas email: ***@3am-software.com
3am Software Foundry www: http://3am-software.com/bio/matt/
Cupertino, CA disclaimer: I avow all knowledge of this message.


Thor Lancelot Simon
2007-07-05 18:04:22 UTC
Permalink
Post by Matt Thomas
new_fd = socketclone(int template_fd);
Would be my suggestion. What about non-socket level options (IP, TCP, etc.)?
My intent was to copy them across as well. Do I understand correctly
from my earlier discussion of accept() with you that these are in fact
encoded in so_state?
--
Thor Lancelot Simon ***@rek.tjls.com

"The inconsistency is startling, though admittedly, if consistency is to
be abandoned or transcended, there is no problem." - Noam Chomsky

Darren Reed
2007-07-06 03:51:06 UTC
Permalink
Post by Thor Lancelot Simon
I have an application that makes outbound TCP connections at a very high
rate, so high that the overhead of additional system calls to set socket
options considerably impacts performance.
I could partially address this by adding a system call that sets multiple
socket options at once (which, I think, would be a better API than
setsockopt() anyway) but that gets rid of _all but one_ system call to
set up the socket before connect(); I want to get rid of them all.
I'd like to make it possible to set options on one "template" or "master"
socket and then have them inherited by children, as listen()/accept() make
possible for the other direction. I'm thinking of something along the lines
int socketfrom(int template, int domain, int type, int protocol);
Which would return a new socket using the socket options already set on
socket "template". If domain, type, and protocol don't match, this is
an error (or perhaps it would be best to omit them entirely and just
have one argument, the template socket.
Opinions?
Thor, if someone came to you and said they wanted to add this
system call to NetBSD, what would your reflex reaction be?

Wind back the clock n years, or whatever it takes to be at a point
in time where you didn't have this problem. Would your instinctive
reaction be "yes, add a new system call"? And if so, would you
accept the proposal given so far?

To put your proposal in proper context:
- you have a specialised app and haven't given us any details about it
- you have a problem that nobody else does (that we're aware of)
AND
- you want to add a new system call to make *it* faster.

Forgive me for being cynical, but I think if some ordinary/unknown
user came along and presented the same case you are, the response
would be a bit different.

That said, it's stupid to ignore the idea that there is a problem
here that needs attention.

The proposal thus far has been to create a system call that clones
a socket - well, almost. It clones *only* the socket options. What
would happen if it is called after bind()? Or even after connect()?
Are addresses copied over too? If they weren't, is that an intuitive
leap from the API presented? Or are there different failure modes
introduced if called after bind/connect/listen?

How different is the behaviour of one of these calls to using dup(2)
on an unbound/unconnected socket? Is there some reason that
dup(2) shouldn't work as desired here? If it did, would that break
applications that use dup(2) today with sockets?

In the case of dup, the usual problem is that it just creates a new
reference (fd) to the same file as the original fd. By this line of
thinking, creating a socket_dup() or even dup_socket() seems
like a confusing path to take.

The idea with most merit that I've seen is to be able to save and
restore socket option state. Save it into a binary blob using a call
to getsockopt() and apply it to another with setsockopt(). I think
there's much more programmability and usefulness with this model
than any other - it lets me apply the socket options to fd's that get
passed in from other processes amongst other things.

The only downside from Thor's point of view is that this would still
be 2 calls, and not 1, to create the socket - but still fewer than the 4
or 5 (guess) he is likely to be using now. Yes, the socket options
may not be documented well, but if the properly architected solution
to this problem lies down that path, the state of documentation should
not be a barrier.

Comments?

Darren


der Mouse
2007-07-06 03:59:30 UTC
Permalink
Post by Darren Reed
How different is the behaviour of one of these calls to using dup(2)
on an unbound/unconnected socket?
It's the difference between dup()ing a fd open onto a file versus
opening a new fd onto the file.

It's the difference between dup()ing a socket versus calling socket()
plus a stream of setsockopt()s (and maybe others - what else, if
anything, is yet to be hashed out).
Post by Darren Reed
Is there some reason that dup(2) shouldn't work as desired here?
In the case of dup, the usual problem is that it just creates a new
reference (fd) to the same file
(socket, not file, in this case)
Post by Darren Reed
as the original fd. By this line of thinking, creating a
socket_dup() or even dup_socket() seems like a confusing path to
take.
Only if you name it in a way that suggests dup() has anything to do
with it. Think shallow copy versus deep copy. ln -s versus cp -r.
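
The shallow-copy behaviour of dup() is easy to demonstrate with a short sketch. The helper name option_visible_via() is made up for illustration: it sets SO_KEEPALIVE through a second descriptor and reads it back through the first, where the second descriptor is either a dup() of the first or an independent socket().

```c
#include <sys/socket.h>
#include <unistd.h>

/*
 * Set SO_KEEPALIVE through a second descriptor and read it back through
 * the first. If use_dup is nonzero the second descriptor is dup(first),
 * otherwise it is a separate socket(). Returns 1 if the option is
 * visible through the first descriptor, 0 if not, -1 on error.
 */
int option_visible_via(int use_dup)
{
    int on = 1, val = 0, result = -1;
    socklen_t len = sizeof(val);
    int first = socket(AF_INET, SOCK_STREAM, 0);
    int second;

    if (first < 0)
        return -1;
    second = use_dup ? dup(first) : socket(AF_INET, SOCK_STREAM, 0);
    if (second >= 0 &&
        setsockopt(second, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) == 0 &&
        getsockopt(first, SOL_SOCKET, SO_KEEPALIVE, &val, &len) == 0)
        result = (val != 0);

    if (second >= 0)
        close(second);
    close(first);
    return result;
}
```

With use_dup set, both descriptors refer to the same socket, so the option shows up on the original. Without it, the new socket starts from defaults and the original is untouched, which is the gap a cloning call would fill.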

This is not to say I totally disagree with you. Encapsulating socket
options into blobs is an attractive approach to me as well, and, as you
point out, it gives abilities not available via socketclone() or
whatever is the principal call under discussion here.

/~\ The ASCII der Mouse
\ / Ribbon Campaign
X Against HTML ***@rodents.montreal.qc.ca
/ \ Email! 7D C8 61 52 5D E7 2D 39 4E F1 31 3E E8 B3 27 4B

Daniel Carosone
2007-07-06 04:43:27 UTC
Permalink
Post by Darren Reed
The idea with most merit that I've seen is to be able to save and
restore socket option state. Save it into a binary blob using a call
to getsockopt() and apply it to another with setsockopt().
I don't really like this idea, for a couple of reasons:

- the "opaque blob" may be manipulated by a malicious program, we
need to be defensive about not trusting the content in the kernel
once it's handed back to us.

- to try and avoid this, if the blob is to be a reference to some
other storage in the kernel where the real data is, we now have
another namespace, maybe more interface to manipulate the options
stored in the blobs, we have storage management to deal with, etc.

Frankly, if you're going to pass something to represent a set of
options to copy, why not just pass the fd of the socket that has
those options set? That seems like the place to store them without
any of the above issues (or at least, well defined semantics and
tested implementations of those issues :-).

The question then becomes where to put the operation that does this
copying:

- in a new syscall, as Thor proposes, though I prefer socketlike() as
a name, and with the single argument.

- in setsockopt(), with a new variant/magic option that says "make
the options on this socket the same as that socket" and takes
another fd-number as the parameter.

- in the user program, with a different get/setsockprops() interface
(and a blob format whose opacity is the subject of endless debate :)

I like the first as a direct response to the immediate requirement,
but the second or third are more flexible/extensible, in that it could
be applied to existing sockets as well as just at the creation of new
ones, at the cost of one extra syscall in Thor's (and perhaps the most
common?) case.

--
Dan.
der Mouse
2007-07-06 05:02:04 UTC
Permalink
Post by Daniel Carosone
Post by Darren Reed
The idea with most merit that I've seen is to be able to save and
restore socket option state. Save it into a binary blob using a
call to getsockopt() and apply it to another with setsockopt().
- the "opaque blob" may be manipulated by a malicious program, we
need to be defensive about not trusting the content in the kernel
once it's handed back to us.
I don't consider this a big deal. If the opaque blob format is
suitably designed and its handling code is sensible, handing arbitrary
blobs to the kernel cannot do anything that making equally arbitrary
calls to setsockopt couldn't; I see no particular hazard there. The
only difference would be one kernel/user crossing for the whole setting
operation instead of one per option.

I'm thinking something like array of packed struct { unsigned char opt;
int value; } (mutatis mutandis for options not using int), though of
course the details would be private to the kernel. If the kernel finds
an invalid option, or value, in the blob, it would fail the set call,
just as it would a setsockopt call that got passed similar garbage.
Post by Daniel Carosone
- to try and avoid this, if the blob is to be a reference to some
other storage in the kernel where the real data is, [...]
Oh, ugh. I like that about as much as you seem to. :-þ
Post by Daniel Carosone
Frankly, if you're going to pass something to represent a set of
options to copy, why not just pass the fd of the socket that has
those options set?
It requires that that socket still exist. I see no particular reason
to demand one file descriptor be kept open per set of options, rather
than storing just the useful information.

I see no particular reason *not* to provide such an interface, though.

/~\ The ASCII der Mouse
\ / Ribbon Campaign
X Against HTML ***@rodents.montreal.qc.ca
/ \ Email! 7D C8 61 52 5D E7 2D 39 4E F1 31 3E E8 B3 27 4B

Darren Reed
2007-07-06 05:17:04 UTC
Permalink
Post by Daniel Carosone
Post by Darren Reed
The idea with most merit that I've seen is to be able to save and
restore socket option state. Save it into a binary blob using a call
to getsockopt() and apply it to another with setsockopt().
- the "opaque blob" may be manipulated by a malicious program, we
need to be defensive about not trusting the content in the kernel
once it's handed back to us.
It is the job of the implementation of whatever takes in the blob
to verify it. How is this blob any different to, say, a struct
sockaddr, or a BPF program or...?
Post by Daniel Carosone
- to try and avoid this, if the blob is to be a reference to some
other storage in the kernel where the real data is, we now have
another namespace, maybe more interface to manipulate the options
stored in the blobs, we have storage management to deal with, etc.
You're judging the implementation before it has been prototyped.
Post by Daniel Carosone
Frankly, if you're going to pass something to represent a set of
options to copy, why not just pass the fd of the socket that has
those options set? That seems like the place to store them without
any of the above issues (or at least, well defined semantics and
tested implementations of those issues :-).
Maybe you want to do privilege splitting in the application and don't trust
the "other part" that deals with the "clone" fd's to access the original?

Who knows.
Post by Daniel Carosone
The question then becomes where to put the operation that does this
- in a new syscall, as Thor proposes, though I prefer socketlike() as
a name, and with the single argument.
I've yet to see any generality to this system call that makes it
anything other than a hack to solve Thor's problem. Those
kinds of things belong in ThorBSD.

Darren


Daniel Carosone
2007-07-06 05:44:59 UTC
Permalink
Post by der Mouse
I'm thinking something like array of packed struct { unsigned char opt;
int value; } (mutatis mutandis for options not using int)
As I suggested earlier, *if* we were to go down this path, we already
have proplib for marshalling mutatis mutandis. There's more structure
here, remember: options at several protocol levels, for starters.
Post by der Mouse
though of course the details would be private to the kernel.
though of course proplib has the benefit of not being opaque to the
user process (so it could manipulate/filter/examine the option set
without needing any syscalls).
Post by der Mouse
If the kernel finds an invalid option, or value, in the blob, it
would fail the set call, just as it would a setsockopt call that got
passed similar garbage.
And here we have the atomicity issue: the kernel would have to check
them all for syntax validity first (what kind of syntax and existing
tools might be handy for that, I wonder?) before setting any options.

Then it would also have to check the full set of options in the blob,
and the ones already on the socket, for mutual consistency and
compatibility, etc. That checking currently gets done as the options
are set, and would have to be duplicated or factored out, otherwise
we'd then need code to roll back all the previous options from the
blob when one fails, and the side-effects they may have, and so on..

Oh, and it might be nice to tell the user which option in the blob
caused the problem if we can, which rather implies that the blob not
be opaque.

Once again, proplib offers opportunities here (since this kind of work
was its design purpose), but it's a rather more ambitious exercise
than the present proposal requires.

So once again: I think if someone wanted to go down this road for some
reason, this would be the right way for them to do it. For all that
the academic discussion is interesting, I'm pretty sure we don't
presently have the reason, and quite sure we don't have the someone.

Until and unless, doing the copy in a socketlike() avoids all the
conflict and checking issues, since it copies state that is already
valid and doesn't have to merge it with anything else.
Post by der Mouse
Post by Daniel Carosone
- to try and avoid this, if the blob is to be a reference to some
other storage in the kernel where the real data is, [...]
Oh, ugh. I like that about as much as you seem to. :-þ
Right.
Post by der Mouse
Post by Daniel Carosone
Frankly, if you're going to pass something to represent a set of
options to copy, why not just pass the fd of the socket that has
those options set?
It requires that that socket still exist. I see no particular reason
to demand one file descriptor be kept open per set of options, rather
than storing just the useful information.
True, but that's a feature by the point we agree on above - the fd
already has all the properties we want as a storage handle, including
doing all the cleanup on close.

--
Dan.
Daniel Carosone
2007-07-06 06:15:24 UTC
Permalink
Post by Darren Reed
Post by Daniel Carosone
- the "opaque blob" may be manipulated by a malicious program, we
need to be defensive about not trusting the content in the kernel
once it's handed back to us.
It is the job of the implementation of whatever takes in the blob
to verify it.
Oh, I agree entirely - my objection/dislike is simply the need for
having all that implementation at all.

I'm highlighting that what might seem otherwise like a simple proposal
carries the implication of that extra work/code/possibility of error,
etc. which might not otherwise be obvious.
Post by Darren Reed
Post by Daniel Carosone
Frankly, if you're going to pass something to represent a set of
options to copy, why not just pass the fd of the socket that has
those options set?
Maybe you want to do privilege splitting in the application and don't trust
the "other part" that deals with the "clone" fd's to access the original?
I'm not sure that's a good example: in that case you're either going
to be passing the unpriv'd code fully-prepared and possibly connected
fd's, even now after many set* syscalls each, or you'll pass a
created-for-purpose template fd with nothing connected or shared in
the original.

Still, the point you're trying to make has merit, but so do the
principles of KISS and YAGNI.

--
Dan.
der Mouse
2007-07-06 06:04:11 UTC
Permalink
Post by Daniel Carosone
I'm thinking something like [...]
As I suggested earlier, *if* we were to go down this path, we already
have proplib for marshalling mutatis mutandis. There's more
structure here, remember: options at several protocol levels, for
starters.
Proplib occurred to me. I'm not convinced it is a right answer (but
also definitely not convinced it is a wrong answer).
Post by Daniel Carosone
though of course the details would be private to the kernel.
though of course proplib has the benefit of not being opaque to the
user process (so it could manipulate/filter/examine the option set
without needing any syscalls).
This may or may not be a benefit. It's a benefit in that it turns the
"apply these settings" call into a general-purpose multi-option
setsockopt. It's a drawback in that it commits the kernel to
supporting a general-purpose multi-option setsockopt with a proplib
interface, which making it an opaque blob (which just currently happens
to be a proplib serialization) doesn't.
Post by Daniel Carosone
If the kernel finds an invalid option, or value, in the blob, it
would fail the set call, just as it would a setsockopt call that got
passed similar garbage.
And here we have the atomicity issue: the kernel would have to check
them all for syntax validity first (what kind of syntax and existing
tools might be handy for that, I wonder?) before setting any options.
Not necessarily. I see nothing wrong with documentation language like
"if an error occurs, some of the options contained in the blob may get
set anyway".
Post by Daniel Carosone
Then it would also have to check the full set of options in the blob,
and the ones already on the socket, for mutual consistency and
compatibility, etc.
Again, not necessarily; the semantics could be defined as something
like "this call is semantically equivalent to multiple setsockopt calls
setting the same options in some unspecified order, stopping at the
first error", with all the consistency and compatibility implications
that has.
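
Those semantics can be sketched in userland C as below. This is only an
illustration under stated assumptions: sockopt_entry and apply_sockopts are
hypothetical names, and the sketch uses array order where the wording above
leaves the order unspecified.

```c
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>

/* One option-and-value setting (hypothetical layout). */
struct sockopt_entry {
    int         level;  /* e.g. SOL_SOCKET, IPPROTO_TCP */
    int         name;   /* e.g. SO_REUSEADDR */
    const void *value;  /* points at the option value */
    socklen_t   len;    /* size of the value in bytes */
};

/*
 * Apply the entries in array order, stopping at the first error.
 * Returns n if every entry was set; otherwise returns the index of the
 * entry that failed, with errno as left by setsockopt(). Entries before
 * that index remain set; nothing is rolled back.
 */
size_t
apply_sockopts(int fd, const struct sockopt_entry *opts, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++) {
        if (setsockopt(fd, opts[i].level, opts[i].name,
            opts[i].value, opts[i].len) == -1)
            break;
    }
    return i;
}
```

A caller compares the return value with n to learn whether, and where, the
sequence stopped. All the consistency checking is whatever setsockopt()
already does for each option in turn.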
Post by Daniel Carosone
Oh, and it might be nice to tell the user which option in the blob
caused the problem if we can, which rather implies that the blob not
be opaque.
Well, it would imply that. I'm not sure that's necessary, or even very
desirable; it makes sense only if the blob decomposes into a sequence
of option-and-value settings. (For example, I could see it containing
a copy of so_options directly, rather than broken out into individual
options.) If it's an opaque blob that was generated legitimately (ie,
with the "get" call on a socket of the same family/type/protocol), any
error is a can't-happen; otherwise, userland is either busted or trying
to fool the kernel, and either way I'm not sure there's value in trying
to say exactly what's wrong.
Post by Daniel Carosone
[...] why not just pass the fd of the socket that has those options
set?
I see no particular reason to demand one file descriptor be kept
open per set of options [...]
True, but that's a feature by the point we agree on above -
Which point do you mean?
Post by Daniel Carosone
the fd already has all the properties we want as a storage handle,
including doing all the cleanup on close.
Actually, it lacks some properties I think could be good, such as being
able to be stored in and retrieved from a file, and not consuming a
relatively scarce resource (file descriptor space).

Anyway, you're certainly right that this is all academic until and
unless someone comes along with a use for it and volunteers to create
an implementation.

/~\ The ASCII der Mouse
\ / Ribbon Campaign
X Against HTML ***@rodents.montreal.qc.ca
/ \ Email! 7D C8 61 52 5D E7 2D 39 4E F1 31 3E E8 B3 27 4B

Lucio De Re
2007-07-06 07:04:03 UTC
Permalink
Post by Daniel Carosone
As I suggested earlier, *if* we were to go down this path, we already
have proplib for marshalling mutatis mutandis. There's more structure
here, remember: options at several protocol levels, for starters.
Sounds reasonable to me, not that my opinion can be treated as
important; I lack the background to be in any way authoritative.

It does strike me, however, that an opaque binary blob could be
checksummed or even encrypted so the kernel could trust it not to have
been altered. This way, the objection that a malicious process could
cause trouble can be disposed of.

Secondly, I think that "cloning" makes the most sense. Of course, the
interpretation of "clone" may be a little off the ordinary, but it does
not seem very remote from the use we'd make of it here.

Just my contribution, in case no one else raises these issues.

++L
Darren Reed
2007-07-06 09:23:47 UTC
Permalink
Post by Daniel Carosone
Post by Darren Reed
Post by Daniel Carosone
- the "opaque blob" may be manipulated by a malicious program, we
need to be defensive about not trusting the content in the kernel
once it's handed back to us.
It is the job of the implementation of whatever takes in the blob
to verify it.
Oh, I agree entirely - my objection/dislike is simply the need for
having all that implementation at all.
I'm highlighting that what might seem otherwise like a simple proposal
carries the implication of that extra work/code/possibility of error,
etc. which might not otherwise be obvious.
Post by Darren Reed
Post by Daniel Carosone
Frankly, if you're going to pass something to represent a set of
options to copy, why not just pass the fd of the socket that has
those options set?
Maybe you want to do privilege splitting in the application and don't trust
the "other part" that deals with the "clone" fd's to access the original?
I'm not sure that's a good example: in that case you're either going
to be passing the unpriv'd code fully-prepared and possibly connected
fd's, even now after many set* syscalls each, or you'll pass a
created-for-purpose template fd with nothing connected or shared in
the original.
Still, the point you're trying to make has merit, but so do the
principles of KISS and YAGNI.
Well, using the getsockopt/setsockopt approach, I can implement a
socketclone() (with varying levels of efficiency) but it'll always take
more than one system call.

If we implement a socketclone(2), then there's nothing more you can build on
top of that.
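
A rough sketch of what such a userland clone could look like, to make the
system call count concrete. The name socketclone_userland and the list of
copied options are assumptions for illustration; a real version would need
to cover every option level the application cares about, not just a few
integer-valued SOL_SOCKET flags.

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Integer-valued SOL_SOCKET options to copy; an illustrative subset only. */
static const int copied_opts[] = {
    SO_REUSEADDR, SO_KEEPALIVE, SO_OOBINLINE, SO_BROADCAST,
};

/*
 * Create a new socket and copy the listed options from tmpl onto it.
 * Returns the new descriptor, or -1 with errno set on failure.
 * Cost: one socket() call plus one getsockopt() and one setsockopt()
 * per copied option.
 */
int
socketclone_userland(int tmpl, int domain, int type, int protocol)
{
    int fd = socket(domain, type, protocol);

    if (fd == -1)
        return -1;

    for (size_t i = 0; i < sizeof(copied_opts) / sizeof(copied_opts[0]); i++) {
        int val;
        socklen_t len = sizeof(val);

        if (getsockopt(tmpl, SOL_SOCKET, copied_opts[i], &val, &len) == -1 ||
            setsockopt(fd, SOL_SOCKET, copied_opts[i], &val, len) == -1) {
            int saved = errno;  /* keep the original error across close() */

            close(fd);
            errno = saved;
            return -1;
        }
    }
    return fd;
}
```

With the four options listed, that is nine system calls per new socket,
against one for a kernel-side clone.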

Darren