Mouse
2013-02-09 21:13:03 UTC
I've just started trying to use 5.2 (I've dabbled a little in 5.1, but
mostly have avoided 5.x so far). I had a connection hang mysteriously,
so I did it again with tcpdump running on an intermediate host.
It looks to me as though something's wrong with TCP retransmissions,
though that's a fairly superficial diagnosis.
Here's the tail of the tcpdump output.
15:40:28.696730 IP 216.46.14.122.22 > 10.0.7.14.65521: . ack 34290468 win 32432 <nop,nop,timestamp 2413 2412>
15:40:28.697043 IP 10.0.7.14.65521 > 216.46.14.122.22: . 34315332:34316780(1448) ack 43441 win 4197 <nop,nop,timestamp 2413 2413>
15:40:28.697164 IP 10.0.7.14.65521 > 216.46.14.122.22: . 34316780:34318228(1448) ack 43441 win 4197 <nop,nop,timestamp 2413 2413>
15:40:28.697287 IP 10.0.7.14.65521 > 216.46.14.122.22: . 34318228:34319676(1448) ack 43441 win 4197 <nop,nop,timestamp 2413 2413>
15:40:28.916594 IP 216.46.14.122.22 > 10.0.7.14.65521: P 43441:43485(44) ack 34291616 win 33580 <nop,nop,timestamp 2414 2412>
15:40:28.916641 IP 216.46.14.122.22 > 10.0.7.14.65521: . ack 34293912 win 32432 <nop,nop,timestamp 2414 2412>
15:40:28.917976 IP 10.0.7.14.65521 > 216.46.14.122.22: . 34319676:34321124(1448) ack 43485 win 4197 <nop,nop,timestamp 2414 2414>
15:40:28.918104 IP 10.0.7.14.65521 > 216.46.14.122.22: . 34321124:34322572(1448) ack 43485 win 4197 <nop,nop,timestamp 2414 2414>
15:40:28.918221 IP 10.0.7.14.65521 > 216.46.14.122.22: . 34322572:34324020(1448) ack 43485 win 4197 <nop,nop,timestamp 2414 2414>
15:40:28.918344 IP 10.0.7.14.65521 > 216.46.14.122.22: . 34324020:34325468(1448) ack 43485 win 4197 <nop,nop,timestamp 2414 2414>
15:40:29.180893 IP 216.46.14.122.22 > 10.0.7.14.65521: . ack 34295060 win 33580 <nop,nop,timestamp 2415 2412>
15:40:29.181292 IP 10.0.7.14.65521 > 216.46.14.122.22: . 34325468:34326916(1448) ack 43485 win 4197 <nop,nop,timestamp 2414 2415>
15:40:31.190954 IP 216.46.14.122.22 > 10.0.7.14.65521: P 43441:43485(44) ack 34295060 win 33580 <nop,nop,timestamp 2419 2412>
15:40:31.191204 IP 10.0.7.14.65521 > 216.46.14.122.22: P 34326916:34327216(300) ack 43485 win 4197 <nop,nop,timestamp 2418 2419,nop,nop,sack sack 1 {43441:43485} >
15:40:31.406736 IP 216.46.14.122.22 > 10.0.7.14.65521: . ack 34295060 win 33580 <nop,nop,timestamp 2419 2412,nop,nop,sack sack 1 {34326916:34327216} >
15:40:31.407115 IP 10.0.7.14.65521 > 216.46.14.122.22: . 34295060:34296508(1448) ack 43485 win 4197 <nop,nop,timestamp 2419 2419>
15:40:33.899361 IP 10.0.7.14.65521 > 216.46.14.122.22: . 34296508:34297656(1148) ack 43485 win 4197 <nop,nop,timestamp 2424 2419>
15:40:33.899435 IP 10.0.7.14.65521 > 216.46.14.122.22: . 34297656:34298504(848) ack 43485 win 4197 <nop,nop,timestamp 2424 2419>
15:40:33.899517 IP 10.0.7.14.65521 > 216.46.14.122.22: . 34298504:34299440(936) ack 43485 win 4197 <nop,nop,timestamp 2424 2419>
15:40:34.006752 IP 216.46.14.122.22 > 10.0.7.14.65521: . ack 34295060 win 33580 <nop,nop,timestamp 2424 2412,nop,nop,sack sack 2 {34296508:34297656}{34326916:34327216} >
15:40:34.226326 IP 216.46.14.122.22 > 10.0.7.14.65521: . ack 34295060 win 33580 <nop,nop,timestamp 2424 2412,nop,nop,sack sack 2 {34296508:34298504}{34326916:34327216} >
15:40:34.226371 IP 216.46.14.122.22 > 10.0.7.14.65521: . ack 34295060 win 33580 <nop,nop,timestamp 2424 2412,nop,nop,sack sack 2 {34296508:34299440}{34326916:34327216} >
At this point, it's 15:51:36 and tcpdump hasn't shown anything further.
I checked with netstat on each end, and 10.0.7.14 has data to send
Proto Recv-Q Send-Q Local Address Foreign Address State
tcp 0 32156 10.0.7.14.65521 216.46.14.122.22 ESTABLISHED
but 216.46.14.122 doesn't
Proto Recv-Q Send-Q Local Address Foreign Address State
tcp 0 0 216.46.14.122.22 10.0.7.14.65521 ESTABLISHED
(and, yes, as that implies, there is no NAT on this connection, despite
oen end being RFC1918-private).
I see nothing wrong with the underlying connectivity; ping -n -c 2 -I
10.0.7.14 216.46.14.122 produces a quick response, which shows up as
expected in the tcpdump output:
15:52:41.989631 IP 10.0.7.14 > 216.46.14.122: icmp 64: echo request seq 0
15:52:42.024922 IP 216.46.14.122 > 10.0.7.14: icmp 64: echo reply seq 0
The obvious question is, since 10.0.7.14 has data to send, why isn't it
filling in the SACK-indicated holes? Or at least retransmitting
_something_? As an off-the-cuff guess, my first inclination would be
to suspect the retransmission timers. But that's such a basic thing
that if that were what's wrong, it surely would have been noticed long
since.
It's now 16:12 and the tcpdump indicates no further traffic at all.
The tcpdump is being done very close to the 10.0.7.14 end - it's on
10.0.7.1, which is 10.0.7.14's default route, and on the Ethernet
shared between the two.
Any ideas what could be going wrong here? I'm going to leave the
connection in this wedged state for at least an hour or two in case
anyone happens to see this promptly and has anything to suggest.
(Unfortunately I can't really leave it this way indefinitely, though if
it's as repeatable as this seems to indicate I can probably recreate it
fairly easily.)
/~\ The ASCII Mouse
\ / Ribbon Campaign
X Against HTML ***@rodents-montreal.org
/ \ Email! 7D C8 61 52 5D E7 2D 39 4E F1 31 3E E8 B3 27 4B
--
Posted automagically by a mail2news gateway at muc.de e.V.
Please direct questions, flames, donations, etc. to news-***@muc.de
mostly have avoided 5.x so far). I had a connection hang mysteriously,
so I did it again with tcpdump running on an intermediate host.
It looks to me as though something's wrong with TCP retransmissions,
though that's a fairly superficial diagnosis.
Here's the tail of the tcpdump output.
15:40:28.696730 IP 216.46.14.122.22 > 10.0.7.14.65521: . ack 34290468 win 32432 <nop,nop,timestamp 2413 2412>
15:40:28.697043 IP 10.0.7.14.65521 > 216.46.14.122.22: . 34315332:34316780(1448) ack 43441 win 4197 <nop,nop,timestamp 2413 2413>
15:40:28.697164 IP 10.0.7.14.65521 > 216.46.14.122.22: . 34316780:34318228(1448) ack 43441 win 4197 <nop,nop,timestamp 2413 2413>
15:40:28.697287 IP 10.0.7.14.65521 > 216.46.14.122.22: . 34318228:34319676(1448) ack 43441 win 4197 <nop,nop,timestamp 2413 2413>
15:40:28.916594 IP 216.46.14.122.22 > 10.0.7.14.65521: P 43441:43485(44) ack 34291616 win 33580 <nop,nop,timestamp 2414 2412>
15:40:28.916641 IP 216.46.14.122.22 > 10.0.7.14.65521: . ack 34293912 win 32432 <nop,nop,timestamp 2414 2412>
15:40:28.917976 IP 10.0.7.14.65521 > 216.46.14.122.22: . 34319676:34321124(1448) ack 43485 win 4197 <nop,nop,timestamp 2414 2414>
15:40:28.918104 IP 10.0.7.14.65521 > 216.46.14.122.22: . 34321124:34322572(1448) ack 43485 win 4197 <nop,nop,timestamp 2414 2414>
15:40:28.918221 IP 10.0.7.14.65521 > 216.46.14.122.22: . 34322572:34324020(1448) ack 43485 win 4197 <nop,nop,timestamp 2414 2414>
15:40:28.918344 IP 10.0.7.14.65521 > 216.46.14.122.22: . 34324020:34325468(1448) ack 43485 win 4197 <nop,nop,timestamp 2414 2414>
15:40:29.180893 IP 216.46.14.122.22 > 10.0.7.14.65521: . ack 34295060 win 33580 <nop,nop,timestamp 2415 2412>
15:40:29.181292 IP 10.0.7.14.65521 > 216.46.14.122.22: . 34325468:34326916(1448) ack 43485 win 4197 <nop,nop,timestamp 2414 2415>
15:40:31.190954 IP 216.46.14.122.22 > 10.0.7.14.65521: P 43441:43485(44) ack 34295060 win 33580 <nop,nop,timestamp 2419 2412>
15:40:31.191204 IP 10.0.7.14.65521 > 216.46.14.122.22: P 34326916:34327216(300) ack 43485 win 4197 <nop,nop,timestamp 2418 2419,nop,nop,sack sack 1 {43441:43485} >
15:40:31.406736 IP 216.46.14.122.22 > 10.0.7.14.65521: . ack 34295060 win 33580 <nop,nop,timestamp 2419 2412,nop,nop,sack sack 1 {34326916:34327216} >
15:40:31.407115 IP 10.0.7.14.65521 > 216.46.14.122.22: . 34295060:34296508(1448) ack 43485 win 4197 <nop,nop,timestamp 2419 2419>
15:40:33.899361 IP 10.0.7.14.65521 > 216.46.14.122.22: . 34296508:34297656(1148) ack 43485 win 4197 <nop,nop,timestamp 2424 2419>
15:40:33.899435 IP 10.0.7.14.65521 > 216.46.14.122.22: . 34297656:34298504(848) ack 43485 win 4197 <nop,nop,timestamp 2424 2419>
15:40:33.899517 IP 10.0.7.14.65521 > 216.46.14.122.22: . 34298504:34299440(936) ack 43485 win 4197 <nop,nop,timestamp 2424 2419>
15:40:34.006752 IP 216.46.14.122.22 > 10.0.7.14.65521: . ack 34295060 win 33580 <nop,nop,timestamp 2424 2412,nop,nop,sack sack 2 {34296508:34297656}{34326916:34327216} >
15:40:34.226326 IP 216.46.14.122.22 > 10.0.7.14.65521: . ack 34295060 win 33580 <nop,nop,timestamp 2424 2412,nop,nop,sack sack 2 {34296508:34298504}{34326916:34327216} >
15:40:34.226371 IP 216.46.14.122.22 > 10.0.7.14.65521: . ack 34295060 win 33580 <nop,nop,timestamp 2424 2412,nop,nop,sack sack 2 {34296508:34299440}{34326916:34327216} >
At this point, it's 15:51:36 and tcpdump hasn't shown anything further.
I checked with netstat on each end, and 10.0.7.14 has data to send
Proto Recv-Q Send-Q Local Address Foreign Address State
tcp 0 32156 10.0.7.14.65521 216.46.14.122.22 ESTABLISHED
but 216.46.14.122 doesn't
Proto Recv-Q Send-Q Local Address Foreign Address State
tcp 0 0 216.46.14.122.22 10.0.7.14.65521 ESTABLISHED
(and, yes, as that implies, there is no NAT on this connection, despite
oen end being RFC1918-private).
I see nothing wrong with the underlying connectivity; ping -n -c 2 -I
10.0.7.14 216.46.14.122 produces a quick response, which shows up as
expected in the tcpdump output:
15:52:41.989631 IP 10.0.7.14 > 216.46.14.122: icmp 64: echo request seq 0
15:52:42.024922 IP 216.46.14.122 > 10.0.7.14: icmp 64: echo reply seq 0
The obvious question is, since 10.0.7.14 has data to send, why isn't it
filling in the SACK-indicated holes? Or at least retransmitting
_something_? As an off-the-cuff guess, my first inclination would be
to suspect the retransmission timers. But that's such a basic thing
that if that were what's wrong, it surely would have been noticed long
since.
It's now 16:12 and the tcpdump indicates no further traffic at all.
The tcpdump is being done very close to the 10.0.7.14 end - it's on
10.0.7.1, which is 10.0.7.14's default route, and on the Ethernet
shared between the two.
Any ideas what could be going wrong here? I'm going to leave the
connection in this wedged state for at least an hour or two in case
anyone happens to see this promptly and has anything to suggest.
(Unfortunately I can't really leave it this way indefinitely, though if
it's as repeatable as this seems to indicate I can probably recreate it
fairly easily.)
/~\ The ASCII Mouse
\ / Ribbon Campaign
X Against HTML ***@rodents-montreal.org
/ \ Email! 7D C8 61 52 5D E7 2D 39 4E F1 31 3E E8 B3 27 4B
--
Posted automagically by a mail2news gateway at muc.de e.V.
Please direct questions, flames, donations, etc. to news-***@muc.de