Discussion:
wm(4) performance issues
Jonathan A. Kollasch
2008-03-05 21:46:06 UTC
Hi,

I recently picked up an Intel Pro/1000 PT Desktop wm(4).
(Because nfe(4) was rather unhappy for me. But that's another
story.)

I was surprised to find it has performance issues under NetBSD.

On an amd64 4.99.54 box (Socket 754, nforce4) I couldn't get it to source
or sink much more than 25 Mbyte/s. On another instance of the same model
of motherboard running 4.99.31, 57 Mbyte/s was obtainable. Both of these
rates are rather disappointing considering the speeds obtained under
Linux can reach 90+ Mbyte/s.

The peer machines I've used for testing are either an nforce4 nfe(4)
or a BCM5705 (32-bit Legacy PCI) bge(4) alone on an Intel 6300ESB's PCI-X
bus. Both of these boxes run NetBSD-current > 4.99.30.

Testing consists of my usual sending of /dev/zero via progress(1)
and pkgsrc/net/netcat6.
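
For the record, that amounts to roughly the following; the host name and
port are made up here, and the exact options may differ slightly:

    # receiver, discarding everything it gets:
    nc6 -l -p 5001 > /dev/null

    # sender, feeding zeroes through progress(1) so it prints the rate:
    progress -f /dev/zero nc6 peerhost 5001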

Any ideas how the performance of this wm(4) could be improved?

Jonathan Kollasch
Jonathan A. Kollasch
2008-03-06 15:42:00 UTC
Post by Jonathan A. Kollasch
Hi,
I recently picked up an Intel Pro/1000 PT Desktop wm(4).
(Because nfe(4) was rather unhappy for me. But that's another
story.)
I was surprised to find it has performance issues under NetBSD.
On an amd64 4.99.54 box (Socket 754, nforce4) I couldn't get it to source
or sink much more than 25 Mbyte/s. On another instance of the same model
of motherboard running 4.99.31, 57 Mbyte/s was obtainable. Both of these
rates are rather disappointing considering the speeds obtained under
Linux can reach 90+ Mbyte/s.
The peer machines I've used for testing are either an nforce4 nfe(4)
or a BCM5705 (32-bit Legacy PCI) bge(4) alone on an Intel 6300ESB's PCI-X
bus. Both of these boxes run NetBSD-current > 4.99.30.
Testing consists of my usual sending of /dev/zero via progress(1)
and pkgsrc/net/netcat6.
Any ideas how the performance of this wm(4) could be improved?
Well, it appears that the interrupt moderation timers are to
blame. That's odd, though, since the recent commit that adjusted
them claimed the change was made to improve performance.

It's almost as if the chip I have has some sort of
limit on how many packets it will accept for a
single interrupt (on top of the interrupt rate already
being limited by the moderation timers).
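
Back-of-the-envelope, with entirely made-up numbers: if the chip only
services a handful of full-size frames per interrupt and the moderation
timers cap the interrupt rate, the product of the two bounds throughput.
For example:

    # hypothetical: 2000 interrupts/s, 8 full-size 1500-byte frames each
    echo $((2000 * 8 * 1500))    # 24000000 bytes/s, i.e. about 24 Mbyte/s

which is suspiciously close to what I'm seeing, though I haven't
confirmed either number against the hardware.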

Jonathan Kollasch
Thor Lancelot Simon
2008-03-08 14:07:20 UTC
Post by Jonathan A. Kollasch
Hi,
I recently picked up an Intel Pro/1000 PT Desktop wm(4).
(Because nfe(4) was rather unhappy for me. But that's another
story.)
I was surprised to find it has performance issues under NetBSD.
On an amd64 4.99.54 box (Socket 754, nforce4) I couldn't get it to source
or sink much more than 25 Mbyte/s. On another instance of the same model
of motherboard running 4.99.31, 57 Mbyte/s was obtainable. Both of these
This is a single-stream test? What are your send and receive socket buffer
sizes at each end?

Tuning the driver (almost any driver, really) for 1Gbit/sec throughput with
our tiny default socket buffer sizes requires an unacceptably high interrupt
rate limit and CPU consumption. With reasonable socket buffer sizes for
gigabit networking, the driver seems to perform quite well for me (though
I think Simon is going to check in some more adjustments to the interrupt
timer code soon).
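
The arithmetic is simple enough: a single TCP stream can move at most
one window per round trip, so with a 32 KB default buffer and a made-up
1 ms LAN round-trip time you get roughly:

    # throughput <= window / RTT
    echo $((32768 * 8 * 1000))        # ~262 Mbit/s ceiling at 1 ms RTT
    # filling 1 Gbit/s at 1 ms RTT needs at least this much buffer:
    echo $((1000000000 / 8 / 1000))   # 125000 bytes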

Thor

Steven M. Bellovin
2008-03-08 16:35:28 UTC
On Sat, 8 Mar 2008 09:07:20 -0500
Post by Thor Lancelot Simon
Post by Jonathan A. Kollasch
Hi,
I recently picked up an Intel Pro/1000 PT Desktop wm(4).
(Because nfe(4) was rather unhappy for me. But that's another
story.)
I was surprised to find it has performance issues under NetBSD.
On an amd64 4.99.54 box (Socket 754, nforce4) I couldn't get it to
source or sink much more than 25 Mbyte/s. On another instance of
the same model of motherboard running 4.99.31, 57 Mbyte/s was
obtainable. Both of these
This is a single-stream test? What are your send and receive socket
buffer sizes at each end?
Tuning the driver (almost any driver, really) for 1Gbit/sec
throughput with our tiny default socket buffer sizes requires an
unacceptably high interrupt rate limit and CPU consumption. With
reasonable socket buffer sizes for gigabit networking, the driver
seems to perform quite well for me (though I think Simon is going to
check in some more adjustments to the interrupt timer code soon).
Thor
Performance issues are quite complex. I ran a few tests with ttcp on
my home gigE network, on a variety of machines. The results are
certainly not intuitive. The machines differ widely in CPU speed; most
have some variety of wm. For example, a 1.5 GHz AMD running 4.0rc4 can
receive from a 1.667 GHz AMD running 4.0 at 480 Mbit/s, but it can only
send at 230 Mbit/s. Both have i82541PI chips with the IGP01E1000 PHY.

Talking to a dual-core, 2.2 GHz amd64-current laptop with an i82801H
chip, the faster of those two machines can send at 267 Mbit/s and receive
at 323 Mbit/s. That makes it seem as if -current can't send that fast,
compared to 4.0. However, that very same laptop can send at 670 Mbit/s
to a fast -current desktop with a bge card -- but it only receives from
it at 311 Mbit/s. (Both -current machines have tcp.sendbuf_auto and
tcp.recvbuf_auto set to 1.)
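
For anyone who wants to reproduce this, the usual ttcp source/sink
pattern is roughly the following (options from memory, transfer size
arbitrary):

    # receiver (sinks the data, no disk or stdin involved):
    ttcp -r -s
    # sender (sources a test pattern instead of reading stdin):
    ttcp -t -s -n 16384 peerhost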


--Steve Bellovin, http://www.cs.columbia.edu/~smb

Jonathan A. Kollasch
2008-03-08 15:03:46 UTC
Post by Thor Lancelot Simon
Post by Jonathan A. Kollasch
Hi,
I recently picked up an Intel Pro/1000 PT Desktop wm(4).
(Because nfe(4) was rather unhappy for me. But that's another
story.)
I was surprised to find it has performance issues under NetBSD.
On an amd64 4.99.54 box (Socket 754, nforce4) I couldn't get it to source
or sink much more than 25 Mbyte/s. On another instance of the same model
of motherboard running 4.99.31, 57 Mbyte/s was obtainable. Both of these
This is a single-stream test? What are your send and receive socket buffer
sizes at each end?
Single stream, I believe. Defaults.
Post by Thor Lancelot Simon
Tuning the driver (almost any driver, really) for 1Gbit/sec throughput with
our tiny default socket buffer sizes requires an unacceptably high interrupt
rate limit and CPU consumption. With reasonable socket buffer sizes for
gigabit networking, the driver seems to perform quite well for me (though
I think Simon is going to check in some more adjustments to the interrupt
timer code soon).
Ah. I just find it a bit odd that wm(4) behaves differently and may be
perceived as worse than, say, a low-end bge(4) or an integrated nfe(4).

I was able to tune the moderation timers (mostly based on the
defaults in FreeBSD's em(4)) to get 950 to 980 Mbit/s with
kttcp. However, I didn't bother to get a baseline from
before I adjusted them. :/
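
For reference, FreeBSD exposes those delays as per-device sysctls, which
is one way to check what em(4)'s defaults actually are. The names below
are from memory and may not match the current driver exactly; the
hardware itself counts these delays in units of 1.024 us:

    sysctl dev.em.0.rx_int_delay dev.em.0.rx_abs_int_delay \
           dev.em.0.tx_int_delay dev.em.0.tx_abs_int_delay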

BTW,
What are the best ways to test this sort of thing?

Jonathan Kollasch
Thor Lancelot Simon
2008-03-08 15:07:51 UTC
Post by Jonathan A. Kollasch
Post by Thor Lancelot Simon
This is a single-stream test? What are your send and receive socket buffer
sizes at each end?
Single stream I believe. Defaults.
To be blunt, running a single-stream TCP test on a gigabit network with
32 kilobyte socket buffers is a considerable waste of time; tuning based
on the results of such a test is a prodigious one.

You'll get more performance from your system with far less effort by
adjusting the default socket buffer sizes with sysctl (or turning on
automatic socket buffer sizing) than by turning down interrupt latency
in device drivers until your kernel is doing little else...
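
Concretely, something along these lines; the static values are just
examples, not recommendations, and large static sizes may also need
kern.sbmax raised:

    # bigger static defaults:
    sysctl -w net.inet.tcp.sendspace=262144
    sysctl -w net.inet.tcp.recvspace=262144

    # or let the kernel grow the buffers on its own:
    sysctl -w net.inet.tcp.sendbuf_auto=1
    sysctl -w net.inet.tcp.recvbuf_auto=1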

Thor
