kern/55567: tcp-send slows down to slow single byte transfers (analysis and fix)
Tom Ivar Helbekkmo
2020-09-02 15:39:32 UTC
Frank,

I'm probably missing something - but the data in kern/55567 seems to
show that snd_wl1 is getting left behind, not snd_wl2. Is that just an
accidental mis-labeling of those numbers? Your patch obviously works,
and makes sense. I'm just confused by the seeming discrepancy.

-tih
--
Most people who graduate with CS degrees don't understand the significance
of Lisp. Lisp is the most important idea in computer science. --Alan Kay

--
Posted automagically by a mail2news gateway at muc.de e.V.
Please direct questions, flames, donations, etc. to news-***@muc.de
Frank Kardel
2020-09-02 17:44:46 UTC
It is easy to get confused by the many sequence numbers used to manage
a bi-directional connection.

The setting is that we, as the client, send a data stream to the
server. This is managed by the SND.* variables. In this case we receive
no data from the server (except for the initial SYN).

When updating the send window, the send side stores the *server's*
sequence number in SND.WL1. As the server sends no data, its sequence
number does not increase once the SYN of the handshake has been
accounted for. That's why SND.WL1 and SEG.SEQ are not moving.

We are, however, getting a stream of ACK-only packets (no data) from
the server, acknowledging the data we send. The acknowledgment number
is stored in SND.WL2 on window updates to make sure that only later
ACKs are used for send window updates.

Usually you expect SND.WL1 to follow the server's sequence number
stream and SND.WL2 to follow the ACK sequence numbers in the range
SND.UNA =< SEG.ACK =< SND.MAX.
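
As an illustration (not code from the NetBSD tree), the window update
test described above can be sketched in Python. The form of the test
follows RFC 793; the helper names seq_diff, seq_lt, seq_leq and
window_update_allowed are made up for this example, and sequence
numbers are assumed to be 32-bit values compared modulo 2^32.

```python
MOD = 1 << 32  # TCP sequence numbers are 32-bit and wrap around


def seq_diff(a, b):
    """Signed 32-bit difference a - b in sequence space."""
    d = (a - b) % MOD
    return d - MOD if d >= (1 << 31) else d


def seq_lt(a, b):
    return seq_diff(a, b) < 0


def seq_leq(a, b):
    return seq_diff(a, b) <= 0


def window_update_allowed(snd_wl1, snd_wl2, seg_seq, seg_ack):
    """RFC 793 style test: accept a window update only from a segment
    that is newer than the one used for the previous update."""
    return seq_lt(snd_wl1, seg_seq) or (
        snd_wl1 == seg_seq and seq_leq(snd_wl2, seg_ack)
    )
```

With a server that sends no data, SEG.SEQ always equals SND.WL1, so
the outcome depends entirely on the SND.WL2 =< SEG.ACK comparison.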

Now, the bug is in the fast path, which neglected to adjust SND.WL2 so
that it never violates the invariant SND.UNA =< SEG.ACK =< SND.MAX.
Thus SND.NXT could increase (with wraparound) so much that the window
update test for "newer" ACKs fails and no more window updates are done.
In that state the send window closes down to zero and the stack falls
back to zero window probes, sending 1 byte every 5 seconds and slowing
down even more over time. All the bytes will be acknowledged, but the
window will not be opened, giving us a very slow send path where we
might not even have the time and data to overcome this error condition.
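
To get a feeling for when the comparison breaks, here is a small
Python sketch. It assumes the usual signed 32-bit sequence comparison
and a SND.WL2 that is never touched while SEG.ACK keeps advancing; the
function name bytes_until_stall and the 1 MiB step are arbitrary
choices for this example, not values from the PR.

```python
MOD = 1 << 32  # 32-bit sequence space


def seq_leq(a, b):
    """True if a =< b in wrapping 32-bit sequence space."""
    d = (a - b) % MOD
    if d >= (1 << 31):
        d -= MOD
    return d <= 0


def bytes_until_stall(stale_wl2, first_ack, step):
    """Advance SEG.ACK by `step` bytes per ACK while SND.WL2 stays
    stale; return the number of bytes acknowledged when the test
    SND.WL2 =< SEG.ACK fails for the first time."""
    acked = 0
    ack = first_ack
    while seq_leq(stale_wl2, ack):
        ack = (ack + step) % MOD
        acked += step
    return acked


# Start close to the wrap point to show that wraparound is handled.
stalled_after = bytes_until_stall(0xFFFF0000, 0xFFFF0000, 1 << 20)
print(stalled_after)
```

Under these assumptions the test keeps passing for the first 2^31
bytes (2 GiB) and fails on the next ACK after that, so only long-lived
one-way bulk transfers would run into it.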

TL;DR:

The server is not sending data, thus SEG.SEQ/SND.WL1 does not change.
SND.WL2 was not updated in the fast path for ACK-only packets, thus the
valid sequence number window could move so far away from SND.WL2 that
the "greater" test fails once we have received enough fast-path-eligible
pure ACK packets from the server.
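
The effect of updating SND.WL2 in the fast path can be sketched with a
toy model in Python. This is only an illustration of the idea in the
summary above, not the actual patch: SendState, fast_path_ack and
run_transfer are hypothetical names, and the model tracks nothing but
SND.UNA and SND.WL2.

```python
from dataclasses import dataclass

MOD = 1 << 32  # 32-bit sequence space


def seq_leq(a, b):
    """True if a =< b in wrapping 32-bit sequence space."""
    d = (a - b) % MOD
    if d >= (1 << 31):
        d -= MOD
    return d <= 0


@dataclass
class SendState:
    snd_una: int = 0
    snd_wl2: int = 0


def fast_path_ack(state, seg_ack, update_wl2):
    """Handle a pure ACK in the (modelled) fast path."""
    state.snd_una = seg_ack
    if update_wl2:
        # Assumed fix: keep SND.WL2 in step with the latest ACK.
        state.snd_wl2 = seg_ack


def run_transfer(update_wl2, total=3 << 30, step=1 << 20):
    """Acknowledge `total` bytes via the fast path, then report whether
    the next ACK would still pass the SND.WL2 =< SEG.ACK test."""
    state = SendState()
    ack = 0
    for _ in range(total // step):
        ack = (ack + step) % MOD
        fast_path_ack(state, ack, update_wl2)
    return seq_leq(state.snd_wl2, (ack + step) % MOD)


print(run_transfer(update_wl2=False))  # stale SND.WL2
print(run_transfer(update_wl2=True))   # SND.WL2 follows the ACKs
```

In this model, after 3 GiB of acknowledged data the stale variant
rejects the next window update, while the variant that advances
SND.WL2 still accepts it.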

Hope I didn't add too much to the confusion.

Frank
Tom Ivar Helbekkmo
2020-09-02 17:58:53 UTC
Frank,

thank you! It's all clear, now. :)

-tih