Discussion:
wm(4) and the maximum buffer length for TSO
David Young
2012-09-19 23:34:15 UTC
wm(4) sets up its Tx DMA maps like this,

if ((error = bus_dmamap_create(sc->sc_dmat, WM_MAXTXDMA,
WM_NTXSEGS, WTX_MAX_LEN, 0, 0,
&sc->sc_txsoft[i].txs_dmamap)) != 0) {

WM_MAXTXDMA is round_page(IP_MAXPACKET) == round_page(65535) ==
65536. Thus wm(4) will fail to map for Tx any mbuf whose m_pkthdr.len
exceeds 65536. That's OK if tcp_output() produces a buffer no longer
than 65536 bytes for the NIC to segment, but in practice it will
produce a longer buffer because first it clamps the length to
IP_MAXPACKET,

if (use_tso) {
/*
* Truncate TSO transfers to IP_MAXPACKET, and make
* sure that we send equal size transfers down the
* stack (rather than big-small-big-small-...).
*/
#ifdef INET6
CTASSERT(IPV6_MAXPACKET == IP_MAXPACKET);
#endif
len = (min(len, IP_MAXPACKET) / txsegsize) * txsegsize;

...

and then it adds in the combined length of the IP and TCP headers:

m->m_pkthdr.len = hdrlen + len;

In this way, wm(4) can see an m->m_pkthdr.len greater than 65536 and fail
to map m. It sends no feedback to TCP to stop trying to send such
long unsegmented buffers. Also, it looks to me like it will retry
forever to map the same mbuf for DMA; that matches the misbehavior
we're seeing at $DAYJOB, where the wm(4) ceases to transmit anything.

It seems to me that drivers should advertise the maximum unsegmented
buffer length that they support (at least some wm instances support
1MB), and tcp_output() should be very careful not to send a buffer any
longer than what is supported. What do you think?

Dave
--
David Young
***@pobox.com Urbana, IL (217) 721-9981

--
Posted automagically by a mail2news gateway at muc.de e.V.
Please direct questions, flames, donations, etc. to news-***@muc.de
Manuel Bouyer
2012-09-20 07:12:00 UTC
It's strange that I didn't run into this; I do use TSO with wm(4).
I guess that the problem depends on the value of txsegsize:
if its value is right, len is rounded down far enough to leave
room for the header.
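The dependence on txsegsize can be made precise. Write L = IP_MAXPACKET = 65535, s = txsegsize, h = hdrlen, take the Tx map size as 65536, count only hdrlen as in the quoted code, and assume the send queue holds at least L bytes so the min() clamp binds:

```latex
\begin{aligned}
\mathit{len} &= \left\lfloor \frac{L}{s} \right\rfloor s = L - (L \bmod s), \\
h + \mathit{len} \le 65536
  &\iff h - (L \bmod s) \le 65536 - L = 1
  \iff L \bmod s \ge h - 1 .
\end{aligned}
```

For example (illustrative values of s), 65535 mod 1448 = 375 leaves ample room, whereas 65535 mod 1456 = 15, so any real IP + TCP header (at least 40 bytes) overflows the map.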
Post by David Young
> It seems to me that drivers should advertise the maximum unsegmented
> buffer length that they support (at least some wm instances support
> 1MB),
Some are also limited to 64k
Post by David Young
> and tcp_output() should be very careful not to send a buffer any
> longer than what is supported. What do you think?
I think we need something like that.
--
Manuel Bouyer <***@antioche.eu.org>
NetBSD: 26 years of experience will always make the difference
David Young
2012-09-20 15:19:16 UTC
Post by Manuel Bouyer
> It's strange that I didn't run into this, I do use TSO with wm(4).
> I guess that the problem is dependent on the value of txsegsize:
> if its value is right, len will be rounded down and there is
> enough space for the header.
I rounded 65535 down using several values of txsegsize that I thought
were likely, but each one yielded a len substantially less than 65535,
so there should have been plenty of room for a header. I'll have to
log the txsegsize sometime and see what is actually chosen.
Post by Manuel Bouyer
Post by David Young
>> and tcp_output() should be very careful not to send a buffer any
>> longer than what is supported. What do you think?
> I think we need something like that.
OK. It seems that tcp_output() should be rejiggered to compute the
hdrlen before finalizing len (if that is possible) and then to clamp len
at (IP_MAXPACKET - hdrlen). I guess that we can prevail upon the driver
to create a DMA map whose size is IP_MAXPACKET + Ethernet header length
+ other encapsulation overhead (e.g., a VLAN tag), but I *think* I would
prefer that all of the encapsulation overhead between TCP and the wire
were conveyed to TCP. Maybe in the TCP/IP Stack Overhaul of Our Dreams,
we can do something like that.

Dave
--
David Young
***@pobox.com Urbana, IL (217) 721-9981