A strange TCP timestamp problem?
Dave Huang
2015-06-05 21:34:16 UTC
I wanted to record a video stream (HTTP Live Streaming) with ffmpeg,
but was getting errors like "Failed to open segment of playlist 0" and
"Connection timed out". However, I didn't have any problems watching
the video the regular way through a web browser on Windows (on the
same local network as the NetBSD machine running ffmpeg), so it didn't
seem like it was a network issue.

I tried a few more experiments and found that ffmpeg on Windows worked
fine, as did ffmpeg on Linux (Debian jessie), so it didn't seem like
an ffmpeg issue either (ffmpeg 2.6.3 in the case of NetBSD and Linux,
and on Windows, a git snapshot from 20141205).

This seemed to narrow the issue down to NetBSD. I collected some
tcpdump logs, and the problem seems to be that the remote server
doesn't always respond to NetBSD's TCP SYN packets. It almost seems
like the other end is doing some sort of rate limiting, since the
initial connection generally works (in the case of HTTP Live
Streaming, the one to download the playlist file), and the next one
works most of the time (downloading the first video segment), but the
connection to download the second segment of the video takes 3 SYNs to
get a response. Actually, ffmpeg times out before it connects, since
it expects to be able to download a new video segment every 5 seconds
or so, but I increased the timeout to see what would happen. But if I
wait a few dozen seconds and try again, the connection succeeds
immediately. In any case, it doesn't make sense for the remote end to
rate limit connections so aggressively, since the whole point of HLS
is that the client continually requests small chunks of video--but it
feels like that's what it's doing.

Looking at the tcpdump from Linux and Windows shows that all SYNs are
immediately ACKed. And just to take my network out of the equation, I
also did tests on some Amazon EC2 instances, one NetBSD, and one
FreeBSD. It seems that the SYN packets are all different, but the TCP
options from Linux and FreeBSD are kinda similar, whereas NetBSD is
more different--a lot more NOPs for example, causing it to send 24
bytes of options when Linux/FreeBSD can send the same options in 20
bytes. And Windows is also very different. After more experimenting, I
found that turning off TCP timestamps makes things work (sysctl -w
net.inet.tcp.timestamps=0). NetBSD always sends TSval 1, whereas Linux
and FreeBSD send a seemingly-random value. (And Windows apparently
doesn't enable timestamps by default?)

For consistency, I repeated all tests on EC2 instances. The command I
ran was:
ffmpeg -v verbose -y -i http://thaipbs-live.cdn.byteark.com/live/playlist_720p/index.m3u8 -t 60 -c copy test.ts

Here's an unanswered SYN from a NetBSD-x86_64-6.1.5-20141015-1523 EC2
instance (frame 2639 from
http://www.azeotrope.org/~khym/tpbs-netbsd.cap)

Transmission Control Protocol, Src Port: 65498 (65498), Dst Port: 80 (80), Seq: 0, Len: 0
Source Port: 65498 (65498)
Destination Port: 80 (80)
[Stream index: 3]
[TCP Segment Len: 0]
Sequence number: 0 (relative sequence number)
Acknowledgment number: 0
Header Length: 44 bytes
.... 0000 0000 0010 = Flags: 0x002 (SYN)
Window size value: 32768
[Calculated window size: 32768]
Checksum: 0x9253 [validation disabled]
Urgent pointer: 0
Options: (24 bytes), Maximum segment size, No-Operation (NOP), Window scale, SACK permitted, No-Operation (NOP), No-Operation (NOP), No-Operation (NOP), No-Operation (NOP), Timestamps
Maximum segment size: 1460 bytes
No-Operation (NOP)
Window scale: 3 (multiply by 8)
TCP SACK Permitted Option: True
No-Operation (NOP)
No-Operation (NOP)
No-Operation (NOP)
No-Operation (NOP)
Timestamps: TSval 1, TSecr 0

A SYN (answered) from a FreeBSD 10.1-STABLE-amd64-2015-06-02 EC2
instance (frame 826 from
http://www.azeotrope.org/~khym/tpbs-freebsd.cap):

Internet Protocol Version 4, Src: 172.31.30.65 (172.31.30.65), Dst: 163.44.103.164 (163.44.103.164)
Transmission Control Protocol, Src Port: 48941 (48941), Dst Port: 80 (80), Seq: 0, Len: 0
Source Port: 48941 (48941)
Destination Port: 80 (80)
[Stream index: 5]
[TCP Segment Len: 0]
Sequence number: 0 (relative sequence number)
Acknowledgment number: 0
Header Length: 40 bytes
.... 0000 0000 0010 = Flags: 0x002 (SYN)
Window size value: 65535
[Calculated window size: 65535]
Checksum: 0xd55f [validation disabled]
Urgent pointer: 0
Options: (20 bytes), Maximum segment size, No-Operation (NOP), Window scale, SACK permitted, Timestamps
Maximum segment size: 1460 bytes
No-Operation (NOP)
Window scale: 6 (multiply by 64)
TCP SACK Permitted Option: True
Timestamps: TSval 309833, TSecr 0

A SYN from an Amazon Linux AMI 2015.03 (HVM) EC2 instance (frame 397
from http://www.azeotrope.org/~khym/tpbs-linux.cap):

Internet Protocol Version 4, Src: 172.31.36.78 (172.31.36.78), Dst: 163.44.108.190 (163.44.108.190)
Transmission Control Protocol, Src Port: 38481 (38481), Dst Port: 80 (80), Seq: 0, Len: 0
Source Port: 38481 (38481)
Destination Port: 80 (80)
[Stream index: 7]
[TCP Segment Len: 0]
Sequence number: 0 (relative sequence number)
Acknowledgment number: 0
Header Length: 40 bytes
.... 0000 0000 0010 = Flags: 0x002 (SYN)
Window size value: 26883
[Calculated window size: 26883]
Checksum: 0xe086 [validation disabled]
Urgent pointer: 0
Options: (20 bytes), Maximum segment size, SACK permitted, Timestamps, No-Operation (NOP), Window scale
Maximum segment size: 8961 bytes
TCP SACK Permitted Option: True
Timestamps: TSval 44576, TSecr 0
No-Operation (NOP)
Window scale: 7 (multiply by 128)

And finally, a SYN from a Microsoft Windows Server 2012 R2 Base EC2
instance (frame 637 of
http://www.azeotrope.org/~khym/tpbs-windows.cap):

Internet Protocol Version 4, Src: 172.31.27.132 (172.31.27.132), Dst: 163.44.108.190 (163.44.108.190)
Transmission Control Protocol, Src Port: 49305 (49305), Dst Port: 80 (80), Seq: 0, Len: 0
Source Port: 49305 (49305)
Destination Port: 80 (80)
[Stream index: 3]
[TCP Segment Len: 0]
Sequence number: 0 (relative sequence number)
Acknowledgment number: 0
Header Length: 32 bytes
.... 0000 1100 0010 = Flags: 0x0c2 (SYN, ECN, CWR)
Window size value: 8192
[Calculated window size: 8192]
Checksum: 0xd7b4 [validation disabled]
Urgent pointer: 0
Options: (12 bytes), Maximum segment size, No-Operation (NOP), Window scale, No-Operation (NOP), No-Operation (NOP), SACK permitted
Maximum segment size: 8961 bytes
No-Operation (NOP)
Window scale: 8 (multiply by 256)
No-Operation (NOP)
No-Operation (NOP)
TCP SACK Permitted Option: True
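
As a cross-check on the four dumps above, here is a small Python sketch (not part of the original captures) that adds up the encoded size of each option list. The per-option sizes are the standard ones (MSS 4, window scale 3, SACK permitted 2, timestamps 10, NOP 1); the table and function names are made up for illustration.

```python
# Standard encoded sizes, in bytes, of the TCP options seen in the dumps.
OPTION_SIZE = {"MSS": 4, "NOP": 1, "WS": 3, "SACK_PERM": 2, "TS": 10}

# Option order copied from the Wireshark "Options:" line of each SYN.
SYN_OPTIONS = {
    "NetBSD": ["MSS", "NOP", "WS", "SACK_PERM", "NOP", "NOP", "NOP", "NOP", "TS"],
    "FreeBSD": ["MSS", "NOP", "WS", "SACK_PERM", "TS"],
    "Linux": ["MSS", "SACK_PERM", "TS", "NOP", "WS"],
    "Windows": ["MSS", "NOP", "WS", "NOP", "NOP", "SACK_PERM"],
}

BASE_TCP_HEADER = 20  # fixed part of the TCP header, in bytes


def options_length(options):
    """Total bytes used by a list of option names."""
    return sum(OPTION_SIZE[name] for name in options)


def header_length(options):
    """Full TCP header length: fixed part plus options."""
    return BASE_TCP_HEADER + options_length(options)


if __name__ == "__main__":
    for system, options in SYN_OPTIONS.items():
        padding = options.count("NOP")
        print(f"{system}: {options_length(options)} option bytes "
              f"({padding} NOP), header {header_length(options)} bytes")
```

The totals agree with the "Options: (N bytes)" and "Header Length" lines in the dumps: NetBSD spends 5 of its 24 option bytes on NOP padding, where FreeBSD and Linux carry the same four options with a single NOP.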

So perhaps all of that is Too Much Information... but my question is
what are your opinions about what's going on? Is it good/bad/neutral
that NetBSD always sends TSval 1 in the initial SYN? What are Linux
and FreeBSD using as the initial value, and is it a trivial change to
have NetBSD do the same thing? I'm curious if doing that will fix the
issues connecting to this server. While it's possible/likely that the
other end is broken, I think there's pretty much zero chance that I'd
be able to get them to fix things, so having NetBSD get along with it
would be nice. And I suppose disabling timestamps is an OK
workaround...
--
Name: Dave Huang | Mammal, mammal / their names are called /
INet: ***@azeotrope.org | they raise a paw / the bat, the cat /
FurryMUCK: Dahan | dolphin and dog / koala bear and hog -- TMBG
Dahan: Hani G Y+C 39 Y++ L+++ W- C++ T++ A+ E+ S++ V++ F- Q+++ P+ B+ PA+ PL++

--
Posted automagically by a mail2news gateway at muc.de e.V.
Please direct questions, flames, donations, etc. to news-***@muc.de
Christos Zoulas
2015-06-05 23:36:50 UTC
Here's what the source says about why we send 1 on SYN:

/*
 * Initialize our timebase. When we send timestamps, we take
 * the delta from tcp_now -- this means each connection always
 * gets a timebase of 1, which makes it, among other things,
 * more difficult to determine how long a system has been up,
 * and thus how many TCP sequence increments have occurred.
 *
 * We start with 1, because 0 doesn't work with linux, which
 * considers timestamp 0 in a SYN packet as a bug and disables
 * timestamps.
 */
tp->ts_timebase = tcp_now - 1;

We could easily modify this to send something based on uptime like
FreeBSD does and have it based on a sysctl. The question is, is rejecting
the packet based on tsval = 1 a reasonable behavior?

christos


Dave Huang
2015-06-06 01:18:00 UTC
Ah, I guess that does sound like a good thing in general :)

FWIW, I tried changing that line to tp->ts_timebase = 0, and that
fixed the problem with repeated connections to
thaipbs-live.cdn.byteark.com--ffmpeg was able to record the stream
without any issues. But as the comment implies, the initial timestamp
sent was very low, since I had just rebooted with the new kernel. I
wonder what other OSes do--start at some (pseudo-)random number?
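
To make the difference between the two settings concrete, here is a toy Python model. It is not the kernel code; it only assumes, as the quoted source comment says, that the TSval sent is the delta between tcp_now and the connection's ts_timebase. The function names are invented for the example.

```python
MASK32 = 0xFFFFFFFF  # TSval is a 32-bit field


def tsval(tcp_now, ts_timebase):
    """TSval modelled as the delta from tcp_now, wrapped to 32 bits."""
    return (tcp_now - ts_timebase) & MASK32


def first_tsval_stock(tcp_now):
    """Stock behaviour: ts_timebase = tcp_now - 1 at connection setup."""
    ts_timebase = tcp_now - 1
    return tsval(tcp_now, ts_timebase)


def first_tsval_patched(tcp_now):
    """Experimental change from this post: ts_timebase = 0."""
    ts_timebase = 0
    return tsval(tcp_now, ts_timebase)


if __name__ == "__main__":
    # tcp_now values are arbitrary tick counts, chosen only for illustration.
    for tcp_now in (600, 123456):
        print(tcp_now, first_tsval_stock(tcp_now), first_tsval_patched(tcp_now))
```

In this model the stock setting gives 1 on every SYN regardless of tcp_now, while a timebase of 0 exposes the raw tick counter, which is small right after a reboot.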

Joerg Sonnenberger
2015-06-06 11:46:35 UTC
Post by Christos Zoulas
| tp->ts_timebase = tcp_now - 1;
| We could easily modify this to send something based on uptime like
| FreeBSD does and have it based on a sysctl.

I wouldn't mind using a mangled version of the current uptime as
ts_timebase, even unconditionally. But we shouldn't leak the uptime
without good reason.
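
One possible reading of "mangled uptime", sketched in Python rather than kernel C: add a random offset, picked once (for example at boot), to the tick counter. This is only an illustration of the idea, not a proposed patch, and every name in it is hypothetical.

```python
import secrets

MASK32 = 0xFFFFFFFF  # TSval is a 32-bit field


def make_boot_offset():
    """Pick a random 32-bit offset once, e.g. at boot (hypothetical helper)."""
    return secrets.randbits(32)


def mangled_timebase(boot_offset):
    """Timebase chosen so that TSval = tcp_now + boot_offset (mod 2**32)."""
    return (-boot_offset) & MASK32


def tsval(tcp_now, ts_timebase):
    """TSval as the delta from tcp_now, never returning 0."""
    value = (tcp_now - ts_timebase) & MASK32
    # The kernel comment quoted earlier in the thread says Linux treats a
    # TSval of 0 in a SYN as a bug, so substitute 1 in that one case.
    return value or 1


if __name__ == "__main__":
    timebase = mangled_timebase(make_boot_offset())
    # Two connections opened 500 ticks apart get increasing TSvals
    # (barring 32-bit wrap), without the first one revealing the uptime.
    print(tsval(1000, timebase), tsval(1500, timebase))
```

With a shared offset the value still advances across connections, so a peer that expects increasing timestamps from one host sees them, while the first SYN no longer gives away the uptime directly. An observer can still watch the counter tick, so this hides the starting point, not the rate.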

Joerg

Robert Elz
2015-06-06 00:42:49 UTC
Date: Fri, 5 Jun 2015 23:36:50 +0000 (UTC)
From: ***@astron.com (Christos Zoulas)
Message-ID: <mktbqi$nsm$***@ger.gmane.org>

| The question is, is rejecting
| the packet based on tsval = 1 a reasonable behavior?

No, but it is believable that implementations might do that (not that the
"1" in particular should be important - more likely just the repeated
unchanging value.)

It is common to use tsval in combination with the seq number to
extend the range of the latter - making seq number roll around less
of a problem (it never used to be in the days of 56 kbps links, but
with 10 Gbps links, it can be a real problem.)
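
For a sense of scale, a back-of-the-envelope Python calculation of how long it takes to send 2**32 bytes (one full trip around the 32-bit sequence space), assuming the link is kept fully saturated. The function name is made up for the example.

```python
SEQ_SPACE_BYTES = 2 ** 32  # size of the 32-bit TCP sequence space


def seq_wrap_seconds(link_bits_per_second):
    """Seconds needed to send 2**32 bytes at the given line rate."""
    return SEQ_SPACE_BYTES * 8 / link_bits_per_second


if __name__ == "__main__":
    print(f"56 kbps: {seq_wrap_seconds(56e3) / 86400:.1f} days")
    print(f"10 Gbps: {seq_wrap_seconds(10e9):.2f} seconds")
```

That works out to roughly a week at 56 kbps against a few seconds at 10 Gbps, which is why the timestamp is needed to tell old segments from new ones on fast links.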

As long as the ISN is varying in the SYN packets for the new connections,
it should not be a problem.

But I can imagine a system relying on tsval being some kind of monotonic
time representation (over multiple connections) in order to get the larger
seq number space benefit, and treating multiple syns with the same tsval as
some kind of attack.
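
A purely hypothetical Python sketch of the kind of check imagined here: a server that remembers the last TSval per client address for a while and silently drops SYNs whose TSval has not advanced. Nothing in the captures shows that the remote server works this way, the 60-second memory is an arbitrary choice, and the class is invented for illustration; it does not try to reproduce the exact pattern seen in the traces.

```python
class PerHostTimestampFilter:
    """Drop SYNs whose TSval did not advance for a recently seen client."""

    def __init__(self, memory_seconds=60):
        self.memory_seconds = memory_seconds
        self.last_seen = {}  # client_ip -> (tsval, arrival_time)

    def accept_syn(self, client_ip, tsval, now):
        """Return True to answer the SYN, False to drop it silently."""
        previous = self.last_seen.get(client_ip)
        if previous is not None:
            last_tsval, last_time = previous
            recent = now - last_time < self.memory_seconds
            # 32-bit wraparound of TSval is ignored to keep the sketch short.
            if recent and tsval <= last_tsval:
                return False
        self.last_seen[client_ip] = (tsval, now)
        return True


if __name__ == "__main__":
    constant = PerHostTimestampFilter()
    # A client that always sends TSval 1: first SYN answered, retries
    # dropped until the remembered entry is old enough to be ignored.
    for t in (0, 5, 8, 61):
        print("TSval 1 at t =", t, "->", constant.accept_syn("192.0.2.1", 1, t))

    increasing = PerHostTimestampFilter()
    # A client whose TSval grows between connections is always answered.
    for t, ts in ((0, 309833), (5, 310333)):
        print("TSval", ts, "->", increasing.accept_syn("192.0.2.2", ts, t))
```

Under a check like this, a constant TSval works for the first connection and again after a pause, while anything in between looks like a stale or replayed SYN.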

kre

