Discussion:
Odd TCP snd_cwnd behaviour, resulting in pathetic throughput?
Paul Ripke
2022-02-09 06:05:22 UTC
Permalink
I've recently been trying to get duplicity running for backups of my
main home server, out to the magical cloud (aws s3 to wasabi.com).

I have discovered that bandwidth oscillates in a sawtooth fashion,
ramping up from near nothing, to saturate my uplink (somewhere around
20-30Mbit/s), then I get a dropped packet, and it drops to near nothing
and repeats, each cycle taking around a minute. Looking at netstat -P
for the PCB in question, I see the snd_cwnd following the pattern,
which makes sense. I've flipped between reno, newreno & cubic, and
while subtly different, they all have the snd_cwnd dropping to near
nothing after a single dropped packet. I didn't think this was
expected behaviour, especially with SACKs enabled.
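
For reference, this is roughly how I've been flipping the congestion
control and watching the cwnd (the grep pattern and PCB address below
are just placeholders):

  # list the available algorithms and switch between them
  sysctl net.inet.tcp.congctl.available
  sysctl -w net.inet.tcp.congctl.selected=cubic
  # confirm SACK is on
  sysctl net.inet.tcp.sack.enable
  # find the PCB address for the flow, then dump it (snd_cwnd included)
  netstat -A -a -f inet | grep <wasabi address>
  netstat -P <pcbaddr>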

Reading tcpdump, the only odd thing I see is that the duplicate ack
triggering the fast retransmit is repeated 70+ times. But tracing
other flows, this doesn't seem abnormal.

It's worth noting that running their "speedtest" thru firefox on the
same machine is fine - and bandwidth is as I'd expect.

Is there anyone willing to take a look at a pcap and tell me what
I'm missing? ie. cluebat, please?
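
If it helps, a capture along these lines should show everything needed
(the interface name and the Wasabi endpoint here are just placeholders):

  # headers only, just the backup flow
  tcpdump -i wm0 -s 128 -w duplicity.pcap \
      'host <wasabi endpoint> and tcp port 443'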

fwiw, I do have npf and altq configured, but disabling altq doesn't
appear to change the behaviour.

fwiw#2, I briefly toyed with the idea of bringing BBR from FreeBSD,
but I think we'd need more infrastructure for doing pacing? And while
it might "fix" this, I think we're better off fixing whatever is
actually broken.

Thanks,
--
Paul Ripke
"Great minds discuss ideas, average minds discuss events, small minds
discuss people."
-- Disputed: Often attributed to Eleanor Roosevelt. 1948.

Michael van Elst
2022-02-09 06:33:19 UTC
Permalink
Post by Paul Ripke
I've recently been trying to get duplicity running for backups of my
main home server, out to the magical cloud (aws s3 to wasabi.com).
I have discovered that bandwidth oscillates in a sawtooth fashion,
ramping up from near nothing, to saturate my uplink (somewhere around
20-30Mbit/s), then I get a dropped packet, and it drops to near nothing
and repeats, each cycle taking around a minute.
That could be someone doing bandwidth shaping on the connection.

If ALTQ is working (which means you need to bump HZ for modern network
speeds), you could try throttling outgoing data with CBQ to see if the
pattern goes away.
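
Something along these lines in /etc/altq.conf is what I had in mind
(interface name and rates are only examples, and the class layout is
just a sketch):

  interface wm0 bandwidth 25M cbq
  class cbq wm0 root_class NULL pbandwidth 100
  class cbq wm0 def_class root_class borrow pbandwidth 95 default
  class cbq wm0 bulk_class def_class pbandwidth 50
  # all TCP (protocol 6) into the throttled class
  filter wm0 bulk_class 0 0 0 0 6

Start the bulk class well below your measured uplink rate; if the
sawtooth disappears, something upstream is policing you.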


Greg Troxel
2022-02-09 12:47:04 UTC
Permalink
Post by Paul Ripke
I've recently been trying to get duplicity running for backups of my
main home server, out to the magical cloud (aws s3 to wasabi.com).
I have discovered that bandwidth oscillates in a sawtooth fashion,
ramping up from near nothing, to saturate my uplink (somewhere around
20-30Mbit/s), then I get a dropped packet, and it drops to near nothing
and repeats, each cycle taking around a minute. Looking at netstat -P
for the PCB in question, I see the snd_cwnd following the pattern,
which makes sense. I've flipped between reno, newreno & cubic, and
while subtly different, they all have the snd_cwnd dropping to near
nothing after a single dropped packet. I didn't think this was
expected behaviour, especially with SACKs enabled.
Long ago I almost found a bug in how our TCP handles retransmits, but
for reasons unrelated to the bug itself I never managed to identify and
land a fix.
Post by Paul Ripke
Reading tcpdump, the only odd thing I see is that the duplicate ack
triggering the fast retransmit is repeated 70+ times. But tracing
other flows, this doesn't seem abnormal.
That is normal. However, on the third dupack it should trigger "fast
recovery", clocking out one new (or retransmitted) packet for each
further dupack.
Post by Paul Ripke
It's worth noting that running their "speedtest" thru firefox on the
same machine is fine - and bandwidth is as I'd expect.
huh.
Post by Paul Ripke
Is there anyone willing to take a look at a pcap and tell me what
I'm missing? ie. cluebat, please?
Sure, send it to me, or put it up for download.


The right tool for this is in pkgsrc as xplot-devel, though you may
need a not-yet-packaged, modified tcpdump2xplot to keep up with drift
in tcpdump's output format.
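
Roughly this, from memory (the exact tcpdump flags tcpdump2xplot wants
may need adjusting):

  tcpdump -r <your pcap> -n -tt -S tcp | tcpdump2xplot
  xplot <the generated plot files>

The time-sequence graph makes it obvious whether the sender is doing
fast recovery on dupacks or stalling in a retransmission timeout.
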
Post by Paul Ripke
fwiw, I do have npf and altq configured, but disabling altq doesn't
appear to change the behaviour.
fwiw#2, I briefly toyed with the idea of bringing BBR from FreeBSD,
but I think we'd need more infrastructure for doing pacing? And while
it might "fix" this, I think we're better off fixing whatever is
actually broken.
What you should be seeing is still a sawtooth in speed, but one that,
very roughly, ramps from half rate to full and then drops back to half -
not one that collapses to near nothing.
Paul Ripke
2022-02-10 23:29:25 UTC
Permalink
Post by Michael van Elst
Post by Paul Ripke
I've recently been trying to get duplicity running for backups of my
main home server, out to the magical cloud (aws s3 to wasabi.com).
I have discovered that bandwidth oscillates in a sawtooth fashion,
ramping up from near nothing, to saturate my uplink (somewhere around
20-30Mbit/s), then I get a dropped packet, and it drops to near nothing
and repeats, each cycle taking around a minute.
That could be someone doing bandwidth shaping on the connection.
If ALTQ is working (which means you need to bump HZ for modern network
speeds), you could try throttling outgoing data with CBQ to see if the
pattern goes away.
While I haven't bumped HZ (holding out for tickless, etc. :) ), I do use
altq with cbq, and when I throttle this flow (with an appropriate TBR) I
end up blowing the queue on the altq class: the configured TCP window is
larger than the maximum queue length I can configure, so altq ends up
dropping packets and triggering the same behaviour.

afaict, altq "mostly works" even without bumping HZ, though perhaps not
optimally.
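
One thing I may try next (assuming the sysctl names are what I think
they are) is clamping the send buffer so the in-flight window can't
outgrow the class queue, something like:

  # current send space and buffer auto-sizing settings
  sysctl net.inet.tcp.sendspace
  sysctl net.inet.tcp.sendbuf_auto
  # cap the auto-sized send buffer at e.g. 128 kB
  sysctl -w net.inet.tcp.sendbuf_max=131072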
--
Paul Ripke
"Great minds discuss ideas, average minds discuss events, small minds
discuss people."
-- Disputed: Often attributed to Eleanor Roosevelt. 1948.
