Discussion:
TCP socket buffers automatic sizing
Mindaugas R.
2007-07-12 14:40:48 UTC
Hello,
here is a patch [1] for automatic sizing of TCP socket buffers. I have ported
it from FreeBSD [2], but have not yet tested it carefully.

Could anyone test and review it, please?

[1]. http://www.netbsd.org/~rmind/tcp_buf_autosizing.diff
[2]. http://mail-index.netbsd.org/tech-net/2007/02/04/0006.html
--
Best regards,
Mindaugas
www.NetBSD.org


Ignatios Souvatzis
2007-07-12 14:59:50 UTC
Post by Mindaugas R.
Hello,
here is a patch [1] for automatic sizing of TCP socket buffers. I have ported
it from FreeBSD [2], but have not yet tested it carefully.
Could anyone test and review it, please?
You're confusing me - wasn't this added to -current a few weeks ago?

-is

Mindaugas R.
2007-07-12 15:12:20 UTC
Post by Ignatios Souvatzis
You're confusing me - wasn't this added to -current a few weeks ago?
I have seen neither the source nor the commit logs. Perhaps you are confused,
though I hope not by me ;)
--
Best regards,
Mindaugas
www.NetBSD.org


Mindaugas R.
2007-07-12 18:13:15 UTC
Post by Mindaugas R.
[1]. http://www.netbsd.org/~rmind/tcp_buf_autosizing.diff
I missed one piece in the patch, sorry. If anybody is looking at it, please
grab the updated patch.
--
Best regards,
Mindaugas
www.NetBSD.org


Mindaugas R.
2007-07-21 07:08:17 UTC
Post by Mindaugas R.
Could anyone test and review it, please?
[1]. http://www.netbsd.org/~rmind/tcp_buf_autosizing.diff
Nobody cares?..
I would like to put this into the tree.
--
Best regards,
Mindaugas
www.NetBSD.org


Zafer Aydogan
2007-07-21 07:10:22 UTC
Post by Mindaugas R.
Post by Mindaugas R.
Could anyone test and review it, please?
[1]. http://www.netbsd.org/~rmind/tcp_buf_autosizing.diff
Nobody cares?..
No, of course not. I will apply it now.
Post by Mindaugas R.
I would like to put this into the tree.
Please do.
Blair Sadewitz
2007-07-21 08:26:06 UTC
I certainly do care, and have wanted this feature for quite a while.
Thank you for the effort.

--Blair

Greg Troxel
2007-07-21 11:01:13 UTC
Post by Mindaugas R.
Post by Mindaugas R.
Could anyone test and review it, please?
[1]. http://www.netbsd.org/~rmind/tcp_buf_autosizing.diff
Nobody cares?..
I would like to put this into the tree.
I looked it over, but not super carefully.

It seems that this will cause NetBSD to advertise a receive window
without actually having an allocated buffer. Will a segment that
doesn't fit in the buffer but is in the window be received and stored if
there is actually memory available? In other words, are in-window
segments only dropped if the resize allocation fails?

It seems that autosize is turned off permanently for the socket if there
is a single allocation failure, and that there is no way for it to be
turned back on.

@@ -4015,8 +4088,7 @@ syn_cache_add(struct sockaddr *src, stru
sc->sc_requested_s_scale = tb.requested_s_scale;
sc->sc_request_r_scale = 0;
while (sc->sc_request_r_scale < TCP_MAX_WINSHIFT &&
- TCP_MAXWIN << sc->sc_request_r_scale <
- so->so_rcv.sb_hiwat)
+ (0x1 << sc->sc_request_r_scale) < tcp_minmss)
sc->sc_request_r_scale++;
} else {
sc->sc_requested_s_scale = 15;

I didn't understand this hunk.

Blair Sadewitz
2007-07-21 11:01:19 UTC
It's working quite well for me, and the improvement is noticeable: with
really fast remote sites (my cable modem can sustain 30 Mbit/s
downstream), I seem to see a ~20% improvement in speed.
However, this could just be variations in network congestion. I'll
have to run kttcp and see.

--Blair

Greg Troxel
2007-07-21 12:03:27 UTC
Post by Mindaugas R.
[1]. http://www.netbsd.org/~rmind/tcp_buf_autosizing.diff
tcp(4) should discuss this.

Is there an RFC about this, or an implementation elsewhere? If so, that
should be mentioned. A quick search turns up a prior implementation in
NetBSD 1.2 (!):

http://www.psc.edu/networking/ftp/papers/autotune_sigcomm98.ps

an early work that uses pre-measurement:

http://dast.nlanr.net/Projects/Autobuf/autotcp.html

a survey article:

http://www.lanl.gov/radiant/pubs/hptcp/hpdc02-drs.pdf

a paper that seems to talk about shrinking buffers too:

http://citeseer.ist.psu.edu/dovrolis04socket.html

FreeBSD had a patch in the fall - I don't know if it's in now:

http://www.freebsd.org/news/status/report-2006-10-2006-12.html#Automatic-TCP-Send-and-Receive-Socket-Buffer-Sizing

a summary page with lots of links:

http://kb.pert.geant2.net/PERTKB/TCPBufferAutoTuning


Have you looked at connections with tcpdump2xplot and xplot (both in
pkgsrc/graphics/xplot)? It's very illuminating about TCP behavior.


I think I may have misunderstood in my previous comments. How does the
advertised window relate to the current allocated buffer size? If it's
only what's allocated, then my concerns about dropping in-window
segments are probably at least mostly incorrect. But, we have to
advertise a large window to get the sender to open up, even if our
application reads the data promptly.

I don't understand the "only if no reordering" constraint. It would
seem that increasing the buffer is warranted if we receive a segment
that we'd like to store. But I can see the point that if we missed a
segment, and therefore have not consumed data that we otherwise would
have, the buffer doesn't need to stay big.

Consider the case of a network that has no drops and no reorders, a
large bandwidth-delay product, a sender with a large buffer, and a
receiver with an application that receives promptly. The rx buffer will
remain small. Now, assume one dropped packet. The entire in-flight
data will arrive and need buffering, at least until fast retransmit or
SACK causes a resend (RTT plus a few packets). Given that we've
advertised the receive window, it seems rude and contrary to the TCP
RFCs to not accept the segments if we have memory (if we don't have
memory, we've overcommitted and gotten unlucky, which is central to this
scheme - I'm not objecting to that case).
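(As a rough illustration: on, say, a 30 Mbit/s path with a 100 ms RTT, a
full window is about 375 KB of in-flight data, all of which arrives and
needs buffering behind the hole until the retransmission fills it.)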

How does the receive window get set? Should it be the max that autosize
will go to? If it's not large, how does this help?

Shouldn't SB_AUTOSIZE only be enabled on sockets if the sysctl is on?
Shouldn't there be a socket option? Does explicitly setting a buffer
size clear SB_AUTOSIZE?

I think this is a good feature, especially for busy servers, so I'm not
saying you shouldn't move forward. I wouldn't even say that the above
is cause not to commit it.

Probably this should default to off at first (sysctl initial setting)
until more people have run it, which will happen once it's in -current -
sysctl -w is easier than patch/rebuild. I'd turn it on, but haven't
done a reboot cycle to try it.


Joerg Sonnenberger
2007-07-21 13:18:09 UTC
Post by Blair Sadewitz
It's working quite well for me, and the improvement is noticeable: with
really fast remote sites (my cable modem can sustain 30 Mbit/s
downstream), I seem to see a ~20% improvement in speed.
However, this could just be variations in network congestion. I'll
have to run kttcp and see.
I couldn't measure any difference here, but I am currently somewhat
limited in my resources as this notebook doesn't allow me very large
windows and I don't have a good remote site to test with (e.g. large
RTT, large windows, fat pipe).

Joerg

Mindaugas R.
2007-07-21 14:20:58 UTC
Post by Greg Troxel
Is there an RFC about this, or an implementation elsewhere?
As I wrote in the original email, this is a port from FreeBSD. One could take
a look at the extensions described in RFC 1323, but I do not know of an RFC
specifically about this.
Post by Greg Troxel
http://www.freebsd.org/news/status/report-2006-10-2006-12.html#Automatic-TCP-Send-and-Receive-Socket-Buffer-Sizing
They have had it in -current for about 5 months, enabled by default.
Post by Greg Troxel
Have you looked at connections with tcpdump2xplot and xplot (both in
pkgsrc/graphics/xplot)? It's very illuminating about TCP behavior.
No, which is why I would be happy if someone did some analysis of the TCP
behaviour :)
Post by Greg Troxel
@@ -4015,8 +4088,7 @@ syn_cache_add(struct sockaddr *src, stru
sc->sc_requested_s_scale = tb.requested_s_scale;
sc->sc_request_r_scale = 0;
while (sc->sc_request_r_scale < TCP_MAX_WINSHIFT &&
- TCP_MAXWIN << sc->sc_request_r_scale <
- so->so_rcv.sb_hiwat)
+ (0x1 << sc->sc_request_r_scale) < tcp_minmss)
sc->sc_request_r_scale++;
} else {
sc->sc_requested_s_scale = 15;
I didn't understand this hunk.
The patch is updated. Please take a look at that place; there is a comment
from FreeBSD.
Post by Greg Troxel
<...> But, we have to advertise a large window to get the sender to open
up, even if our application reads the data promptly.
<...>
How does the receive window get set? Should it be the max that autosize
will go to? If it's not large, how does this help?
I am not sure what you mean. The window is scaled up in steps, as described
in the criteria and the algorithm in the patch.
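
Roughly, the growth criterion works like this (a simplified user-space sketch
of the heuristic as I understand it from the FreeBSD code; the struct, the
constants and the function names here are made up for illustration and are
not the literal patch code): count how many bytes arrived during one RTT,
and if that filled most of the current buffer, nothing was reordered, and we
are still below the configured maximum, grow the buffer by a fixed increment.

#include <stdio.h>

struct conn {
	unsigned int sb_hiwat;    /* current receive buffer limit (bytes) */
	unsigned int rfbuf_cnt;   /* bytes received during the current RTT */
	int          reordered;   /* saw out-of-order data this RTT */
};

static const unsigned int autorcvbuf_inc = 16 * 1024;   /* growth step */
static const unsigned int autorcvbuf_max = 256 * 1024;  /* upper bound */

/* Called once per measured round-trip time. */
static void
autosize_step(struct conn *c)
{
	/*
	 * Grow only if more than 7/8 of the current buffer was used
	 * during the last RTT, no reordering was seen, and we are
	 * still below the configured maximum.
	 */
	if (!c->reordered &&
	    c->rfbuf_cnt > c->sb_hiwat / 8 * 7 &&
	    c->sb_hiwat < autorcvbuf_max) {
		unsigned int newsize = c->sb_hiwat + autorcvbuf_inc;

		if (newsize > autorcvbuf_max)
			newsize = autorcvbuf_max;
		c->sb_hiwat = newsize;
	}
	/* Start the measurement over for the next RTT. */
	c->rfbuf_cnt = 0;
	c->reordered = 0;
}

int
main(void)
{
	struct conn c = { 32 * 1024, 0, 0 };
	int i;

	/* Simulate RTTs in which the sender keeps the buffer full. */
	for (i = 0; i < 6; i++) {
		c.rfbuf_cnt = c.sb_hiwat;
		autosize_step(&c);
		printf("RTT %d: sb_hiwat = %u\n", i + 1, c.sb_hiwat);
	}
	return 0;
}

An idle or application-limited connection never fills the buffer within an
RTT, so its buffer simply stays small; only connections that actually keep
the pipe full end up with the larger allocation.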
Post by Greg Troxel
Shouldn't SB_AUTOSIZE only be enabled on sockets if the sysctl is on?
I do not see the necessity right now.
Post by Greg Troxel
Shouldn't there be a socket option? Does explicitly setting a buffer
size clear SB_AUTOSIZE?
Currently, autosizing is a global setting, not per-socket. If we were to make
it per-socket, both of your points would be reasonable.
Post by Greg Troxel
Probably this should default to off at first (sysctl initial setting)
until more people have run it, which will happen once it's in -current -
sysctl -w is easier than patch/rebuild. I'd turn it on, but haven't
done a reboot cycle to try it.
Yes, I just had not mentioned that I would commit this with the default off.
--
Best regards,
Mindaugas
www.NetBSD.org


Greg Troxel
2007-07-21 15:03:03 UTC
Your answers all seem reasonable, and I think we're better off having
this as it is than not having it.


Matt Thomas
2007-07-21 15:28:55 UTC
Post by Mindaugas R.
Post by Greg Troxel
Shouldn't there be a socket option? Does explicitly setting a buffer
size clear SB_AUTOSIZE?
Currently, autosizing is a global setting, not per-socket. If we were to make
it per-socket, both of your points would be reasonable.
This should definitely be controllable per socket.

Steven M. Bellovin
2007-07-21 16:03:54 UTC
On Sat, 21 Jul 2007 04:26:06 -0400
Post by Blair Sadewitz
I certainly do care, and have wanted this feature for quite a while.
Thank you for the effort.
Same here. When the subject was first mentioned, I cranked up my
buffer sizes to the largest NetBSD would permit via sysctl; I noticed a
very significant improvement on some of my connections.


--Steve Bellovin, http://www.cs.columbia.edu/~smb

Mindaugas R.
2007-07-21 16:36:52 UTC
Post by Steven M. Bellovin
Same here. When the subject was first mentioned, I cranked up my
buffer sizes to the largest NetBSD would permit via sysctl; I noticed a
very significant improvement on some of my connections.
Did you mean sendbuf_max? Do you have any details?
--
Best regards,
Mindaugas
www.NetBSD.org


Steven M. Bellovin
2007-07-21 16:46:08 UTC
On Sat, 21 Jul 2007 19:36:52 +0300
Post by Mindaugas R.
Post by Steven M. Bellovin
Same here. When the subject was first mentioned, I cranked up my
buffer sizes to the largest NetBSD would permit via sysctl; I
noticed a very significant improvement on some of my connections.
Did you mean sendbuf_max? Do you have any details?
I have
net.inet.tcp.sendspace=131072
net.inet.tcp.recvspace=131072

in sysctl.conf. Those improved throughput significantly on large,
high-latency paths. (I set them six months ago and no longer recall
the details.)

--Steve Bellovin, http://www.cs.columbia.edu/~smb

Thor Lancelot Simon
2007-07-21 17:20:39 UTC
Post by Greg Troxel
It seems that autosize is turned off permanently for the socket if there
is a single allocation failure, and that there is no way for it to be
turned back on.
I looked at the FreeBSD code a couple of months ago and was quite
dissatisfied with what seemed like some of the possible consequences
of how they shrink buffers (among other things, it's grossly unfair
between sockets, which seems like it could be trivially exploited by
a malicious application to reduce throughput for others).

But if they haven't had trouble with it in practice -- what does Linux do,
for comparison?

Thor

Blair Sadewitz
2007-07-22 00:29:17 UTC
I should note that, as Steve Bellovin reported, I've gotten roughly
the same performance improvement simply by increasing the socket
buffer size. Of course, this can be a waste, but waste is always
relative to reward. ;)

As I said before, I haven't tested this systematically, but with
ftp/http transfers, it seems that I don't notice much benefit until I
hit around 1MiB/s. At ~2MiB/s, downloads from a site nearby seem to
run on average about 25% faster.

I wonder if there have been any reports of this feature being abused on
FreeBSD hosts in the way you describe, e.g. for DoS attacks.

There is also some interesting stuff at
http://www.psc.edu/networking/projects (the corresponding FTP URL has
tarballs if the link to the file you want is broken). There's lots of
code for NetBSD, albeit from around the 1.6.2 days.

--Blair

Steven M. Bellovin
2007-07-22 00:57:22 UTC
On Sat, 21 Jul 2007 20:29:17 -0400
Post by Blair Sadewitz
I should note that, as Steve Bellovin reported, I've gotten roughly
the same performance improvement simply by increasing the socket
buffer size. Of course, this can be a waste, but waste is always
relative to reward. ;)
As I said before, I haven't tested this systematically, but with
ftp/http transfers, it seems that I don't notice much benefit until I
hit around 1MiB/s. At ~2MiB/s, downloads from a site nearby seem to
run on average about 25% faster.
A lot depends on file size and path latency. TCP ramps up slowly, and
not many http files are big enough for the difference to matter much.
The bigger buffer matters a lot more for high-delay paths. (My office
is only 8 ms from my house; my primary Internet server is only 6 ms from
my office. But my secondary server is ~40 ms from my primary server.)
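For a rough sense of scale: a 131072-byte window on a ~40 ms path caps a
single connection at about 131072 * 8 / 0.04, i.e. roughly 26 Mbit/s, no
matter how fast the link is, while on a 6-8 ms path the same window allows
well over 100 Mbit/s, so the larger buffer mostly pays off on the long path.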


--Steve Bellovin, http://www.cs.columbia.edu/~smb

Mindaugas R.
2007-07-29 02:20:22 UTC
Post by Mindaugas R.
Could anyone test and review it, please?
[1]. http://www.netbsd.org/~rmind/tcp_buf_autosizing.diff
I am going to commit this, but unfortunately nobody has tested this patch
thoroughly. Hence, the word "experimental" has been added to the sysctl
descriptions, and auto-sizing will be disabled by default. If somebody would
take a look and/or improve this, that would be great.
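
For anyone who wants to experiment once it is in the tree, enabling it should
then be a matter of setting something like

net.inet.tcp.recvbuf_auto=1
net.inet.tcp.sendbuf_auto=1

in sysctl.conf, or with sysctl -w (assuming the FreeBSD sysctl names survive
the port unchanged; treat the names as tentative until the commit).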

Any objections?
--
Best regards,
Mindaugas
www.NetBSD.org

