[patch] bug fix & TCP networking performance improvements

Discussion:


David Young

2011-04-08 17:16:20 UTC

I am preparing to commit some improvements to TCP networking and a bug
fix both contributed by CoyotePoint Systems, Inc. The patch is large,
so I posted it at
<ftp://elmendorf.ojctech.com/users/netbsd-bdffaa79/devel-vtw-diff1>.

Please review the patches and send any questions/comments you may have.

Here's the proposed commit message:

Sometimes the reclamation of protocol buffers by tcp_drain(),
arp_drain(), frag6_drain(), or ip_drain() will sleep. When a
protocol-drain routine called in an interrupt context sleeps, the
system deadlocks. Avoid a deadlock when memory is low by deferring the
reclamation of protocol buffers to a thread context. The protocol's fast
time-out timer, protosw.pr_fasttimo, runs in thread context, so defer
reclamation to the next fast time-out.
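
As a rough illustration of the deferral described above (a user-space
sketch only, not code from the patch): the drain entry point merely
records a request, and the fast time-out handler does the work that may
sleep. The names proto_drain_request(), proto_drain_now(),
proto_fasttimo(), and buffers_held are hypothetical stand-ins, and a C11
atomic flag is assumed in place of whatever synchronization the kernel
code actually uses.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical names throughout; none are taken from the patch. */
static atomic_bool drain_wanted;  /* set when a drain has been requested */
static int buffers_held = 8;      /* stand-in for reclaimable buffers */

/*
 * May be called from interrupt context when memory is low, so it must
 * not sleep: it only records that a drain is wanted.
 */
void proto_drain_request(void)
{
    atomic_store(&drain_wanted, true);
}

/* The actual reclamation, which may sleep; thread context only. */
static int proto_drain_now(void)
{
    assert(buffers_held >= 0);
    int freed = buffers_held;
    buffers_held = 0;
    return freed;
}

/*
 * Fast time-out handler, assumed to run in thread context.  Performs
 * the deferred drain if one was requested since the last tick and
 * returns the number of buffers reclaimed.
 */
int proto_fasttimo(void)
{
    if (atomic_exchange(&drain_wanted, false))
        return proto_drain_now();
    return 0;
}
```

The cost of this arrangement is latency: memory is not reclaimed until
the next fast time-out fires.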

Even after a TCP session enters the TIME_WAIT state, its corresponding
socket and protocol control blocks (PCBs) stick around until the TCP
Maximum Segment Lifetime (MSL) expires. On a host whose workload
necessarily creates and closes down many TCP sockets, the sockets & PCBs
for TCP sessions in TIME_WAIT state amount to many megabytes of dead
weight in RAM.

This patch reduces the resources demanded by TIME_WAIT-state sessions
using methods called Vestigial Time-Wait and Maximum Segment Lifetime
Truncation.

Maximum Segment Lifetime Truncation (MSLT) assigns each TCP session to
a class based on the nearness of the peer. Corresponding to each class
is an MSL, and a session uses the MSL of its class. The classes are
loopback (local host equals remote host), local (local host and remote
host are on the same link/subnet), and remote (local host and remote
host communicate via one or more gateways). Classes corresponding to
nearer peers have lower MSLs by default: 2 seconds for loopback, 10
seconds for local, 60 seconds for remote. Loopback and local sessions
expire more quickly when MSLT is used.
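
The classification above can be sketched as follows. This is a
simplified illustration, not the patch's code: it assumes IPv4 addresses
in host byte order and decides "same subnet" with a single netmask,
whereas a real implementation would presumably consult the interface or
routing tables. The names mslt_class, mslt_classify(), and mslt_msl()
are hypothetical; only the three classes and their default MSLs come
from the description.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names; only the classes and defaults come from the text. */
enum mslt_class { MSLT_LOOPBACK, MSLT_LOCAL, MSLT_REMOTE };

/* Default MSL per class, in seconds: loopback, local, remote. */
static const unsigned mslt_default_msl[] = { 2, 10, 60 };

/*
 * Classify a session by the nearness of its peer.  Addresses are IPv4
 * in host byte order; netmask is assumed to be the mask of the local
 * interface's subnet.
 */
enum mslt_class mslt_classify(uint32_t laddr, uint32_t faddr,
    uint32_t netmask)
{
    if (laddr == faddr)
        return MSLT_LOOPBACK;   /* local host equals remote host */
    if ((laddr & netmask) == (faddr & netmask))
        return MSLT_LOCAL;      /* same link/subnet */
    return MSLT_REMOTE;         /* reached via one or more gateways */
}

/* Return the MSL, in seconds, that a session of class c would use. */
unsigned mslt_msl(enum mslt_class c)
{
    assert((unsigned)c <= MSLT_REMOTE);
    return mslt_default_msl[c];
}
```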

Vestigial Time-Wait (VTW) replaces a TIME_WAIT session's PCB/socket
dead weight with a compact representation of the session, called a
"vestigial PCB". VTW data structures are designed to be very fast and
memory-efficient: for fast insertion and lookup of vestigial PCBs,
the PCBs are stored in a hash table that is designed to minimize the
number of cacheline visits per lookup/insertion. The memory both
for vestigial PCBs and for elements of the PCB hash table comes from
fixed-size pools, and linked data structures exploit this to conserve
memory by representing references with a narrow index/offset from the
start of a pool instead of a pointer. When space for new vestigial PCBs
runs out, VTW makes room by discarding old vestigial PCBs, oldest first.
VTW cooperates with MSLT.
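
To make the pool-and-index idea concrete, here is a toy sketch under
stated assumptions; it is not the data structure from the patch. Entries
live in one fixed-size array, hash chains are linked by 16-bit indices
into that array rather than by pointers, and slots are reused in ring
order so that the oldest entry is the one discarded when the pool is
full. All names (vtw_entry, vtw_insert(), vtw_lookup(), and so on), the
hash function, and the tiny sizes are hypothetical, and the sketch makes
no attempt at the cacheline-conscious layout described above.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define VTW_POOL_SIZE 4           /* tiny, for illustration only */
#define VTW_NBUCKETS  8
#define VTW_NIL       UINT16_MAX  /* "no entry" index */

/* Hypothetical compact record of a TIME_WAIT session (IPv4 only). */
struct vtw_entry {
    uint32_t laddr, faddr;
    uint16_t lport, fport;
    uint16_t hash_next;           /* pool index of next entry in bucket */
    uint8_t  in_use;
};

static struct vtw_entry vtw_pool[VTW_POOL_SIZE];
static uint16_t vtw_bucket[VTW_NBUCKETS];  /* pool index of chain head */
static uint16_t vtw_oldest;                /* next slot to (re)use */

void vtw_init(void)
{
    memset(vtw_pool, 0, sizeof(vtw_pool));
    for (int i = 0; i < VTW_NBUCKETS; i++)
        vtw_bucket[i] = VTW_NIL;
    vtw_oldest = 0;
}

static unsigned vtw_hash(uint32_t la, uint32_t fa, uint16_t lp, uint16_t fp)
{
    uint32_t h = la ^ fa ^ lp ^ fp;
    h ^= h >> 16;
    h ^= h >> 8;
    return h % VTW_NBUCKETS;
}

/* Remove pool slot idx from its hash chain and mark it free. */
static void vtw_unlink(uint16_t idx)
{
    struct vtw_entry *e = &vtw_pool[idx];
    uint16_t *link =
        &vtw_bucket[vtw_hash(e->laddr, e->faddr, e->lport, e->fport)];

    while (*link != VTW_NIL && *link != idx)
        link = &vtw_pool[*link].hash_next;
    assert(*link == idx);
    *link = e->hash_next;
    e->in_use = 0;
}

/* Insert a session, discarding the oldest entry if the pool is full. */
void vtw_insert(uint32_t la, uint32_t fa, uint16_t lp, uint16_t fp)
{
    uint16_t idx = vtw_oldest;
    struct vtw_entry *e = &vtw_pool[idx];

    if (e->in_use)
        vtw_unlink(idx);          /* slots fill in ring order: oldest */
    unsigned h = vtw_hash(la, fa, lp, fp);
    *e = (struct vtw_entry){ la, fa, lp, fp, vtw_bucket[h], 1 };
    vtw_bucket[h] = idx;
    vtw_oldest = (uint16_t)((idx + 1) % VTW_POOL_SIZE);
}

/* Return 1 if the session is present, 0 otherwise. */
int vtw_lookup(uint32_t la, uint32_t fa, uint16_t lp, uint16_t fp)
{
    for (uint16_t i = vtw_bucket[vtw_hash(la, fa, lp, fp)];
        i != VTW_NIL; i = vtw_pool[i].hash_next) {
        const struct vtw_entry *e = &vtw_pool[i];
        if (e->laddr == la && e->faddr == fa &&
            e->lport == lp && e->fport == fp)
            return 1;
    }
    return 0;
}
```

Because every link is a 16-bit index rather than a pointer, each link in
this sketch costs 2 bytes instead of 8 on a 64-bit machine, which is the
kind of saving the narrow index/offset representation is after.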

It may help to think of VTW as a "FIN cache" by analogy to the SYN
cache.

A 2.8-GHz Pentium 4 running a test workload that creates TIME_WAIT
sessions as fast as it can is approximately 17% idle when VTW is active
versus 0% idle when VTW is inactive. It has 103 megabytes more free RAM
when VTW is active (approximately 64k vestigial PCBs are created) than
when it is inactive.

Dave

--
David Young OJC Technologies
***@ojctech.com Urbana, IL * (217) 344-0444 x24

--
Posted automagically by a mail2news gateway at muc.de e.V.
Please direct questions, flames, donations, etc. to news-***@muc.de

Matthias Drochner

2011-04-14 20:44:36 UTC