Discussion:
very old rtt calculation bug?
Thor Lancelot Simon
2010-07-27 03:30:01 UTC
A coworker is looking at some ad-hoc stack tuning for heavy load which
has parameters that depend on rtt. We are seeing the smoothed rtt vary
for TCP connections on loopback in ways we don't expect.

We have begun to suspect that the changes in tcp_input.c 1.16 and 1.27
are wrong. In particular, as my coworker points out:

| I am pretty sure the NetBSD algo is wrong:
|
| if (tp->t_srtt != 0) {
| /*
| * srtt is stored as fixed point with 3 bits after the
| * binary point (i.e., scaled by 8). The following magic
| * is equivalent to the smoothing algorithm in rfc793 with
| * an alpha of .875 (srtt = rtt/8 + srtt*7/8 in fixed
| * point). Adjust rtt to origin 0.
| */
| delta = (rtt << 2) - (tp->t_srtt >> TCP_RTT_SHIFT);
| if ((tp->t_srtt += delta) <= 0)
| tp->t_srtt = 1 << 2;
|
| The (rtt << 2) means that we are adding 1/2 the RTT to srtt, rather than 1/8.
| Given that srtt is << 3, if we do not shift rtt by anything, adding it to
| srtt adds rtt/8 to it.

FreeBSD shifts rtt by two here, but they keep 5 bits of precision vs our 3,
so there it's correct.

Are we missing something?

Thor

--
Posted automagically by a mail2news gateway at muc.de e.V.
Please direct questions, flames, donations, etc. to news-***@muc.de
Greg Troxel
2010-11-10 13:06:03 UTC
Post by Thor Lancelot Simon
A coworker is looking at some ad-hoc stack tuning for heavy load which
has parameters that depend on rtt. We are seeing the smoothed rtt vary
for TCP connections on loopback in ways we don't expect.
We have begun to suspect that the changes in tcp_input.c 1.16 and 1.27
|
| if (tp->t_srtt != 0) {
| /*
| * srtt is stored as fixed point with 3 bits after the
| * binary point (i.e., scaled by 8). The following magic
| * is equivalent to the smoothing algorithm in rfc793 with
| * an alpha of .875 (srtt = rtt/8 + srtt*7/8 in fixed
| * point). Adjust rtt to origin 0.
| */
| delta = (rtt << 2) - (tp->t_srtt >> TCP_RTT_SHIFT);
| if ((tp->t_srtt += delta) <= 0)
| tp->t_srtt = 1 << 2;
|
| The (rtt << 2) means that we are adding 1/2 the RTT to srtt, rather than 1/8.
| Given that srtt is << 3, if we do not shift rtt by anything, adding it to
| srtt adds rtt/8 to it.
FreeBSD shifts rtt by two here, but they keep 5 bits of precision vs our 3,
so there it's correct.
Are we missing something?
Bev Schwartz, Laura Ma, and I have been looking at this. This is from
memory -- defuzzed version and patch to follow probably next week.

There are several problems in the RTT code:

* the value is stored with a +1 to let 0 be special, but the +1 is not
  subtracted before use

* rounding errors when dealing with very low RTTs

* "very low" is not actually that low, because this is based on a long
  timer tick

We have a fix for the first two problems; with the fix TCP performs
better under loss due to faster RTO processing.
YAMAMOTO Takashi
2011-03-09 04:11:34 UTC
hi,
Post by Greg Troxel
Post by Thor Lancelot Simon
A coworker is looking at some ad-hoc stack tuning for heavy load which
has parameters that depend on rtt. We are seeing the smoothed rtt vary
for TCP connections on loopback in ways we don't expect.
We have begun to suspect that the changes in tcp_input.c 1.16 and 1.27
|
| if (tp->t_srtt != 0) {
| /*
| * srtt is stored as fixed point with 3 bits after the
| * binary point (i.e., scaled by 8). The following magic
| * is equivalent to the smoothing algorithm in rfc793 with
| * an alpha of .875 (srtt = rtt/8 + srtt*7/8 in fixed
| * point). Adjust rtt to origin 0.
| */
| delta = (rtt << 2) - (tp->t_srtt >> TCP_RTT_SHIFT);
| if ((tp->t_srtt += delta) <= 0)
| tp->t_srtt = 1 << 2;
|
| The (rtt << 2) means that we are adding 1/2 the RTT to srtt, rather than 1/8.
| Given that srtt is << 3, if we do not shift rtt by anything, adding it to
| srtt adds rtt/8 to it.
FreeBSD shifts rtt by two here, but they keep 5 bits of precision vs our 3,
so there it's correct.
Are we missing something?
Bev Schwartz, Laura Ma, and I have been looking at this. This is from
memory -- defuzzed version and patch to follow probably next week.
There are several problems in the RTT code.
value is stored with a +1 to let 0 be special, but it's not subtracted
before use
rounding errors when dealing with very low RTTs
"very low" is not that low because this is based on a long timer tick
We have a fix for the first two problems; with the fix TCP performs
better under loss due to faster RTO processing.
what's the status of this?

YAMAMOTO Takashi

Greg Troxel
2011-03-09 23:55:33 UTC
I have been looking at our changed code, and it's sufficiently hairy
that I'm not sure our change is correct.

Here are the issues:

* meta-issue: limited precision

The real problem is limited precision. RTT measurements have 500 ms
granularity. Stored SRTT is also too granular.

* +1 as flag bit

In tcp_input.c, around line 1687, ts_rtt is set to a difference of
timestamps PLUS ONE. The +1 is used as a flag bit to denote that the
measurement is valid.

In two later places, ts_rtt is tested for != 0, but then ts_rtt itself
is used, instead of ts_rtt - 1.

In tcp_xmit_timer, there's no explanation of how this extra 500 ms is
removed.

* bad rounding

In tcp_xmit_timer, there's no rounding when 1/8 of the old srtt is
subtracted. This seems to prevent srtt from getting low enough.


Our code changes (I'm working on extracting the diff from larger
unrelated changes) basically address these two points, and our stack
then runs with more sensible timeout values.

I am unclear on whether the stored srtt, said to be in <<3 fixed point,
is in units of seconds, 500 ms ticks, or something else.

Thor Simon
2010-11-10 13:29:46 UTC
Post by Greg Troxel
There are several problems in the RTT code.
value is stored with a +1 to let 0 be special, but it's not subtracted
before use
rounding errors when dealing with very low RTTs
"very low" is not that low because this is based on a long timer tick
We have a fix for the first two problems; with the fix TCP performs
better under loss due to faster RTO processing.
Thanks, Greg!

Thor
