Discussion:
4.0.1 NAT checksum failure?
der Mouse
2011-04-06 02:13:43 UTC
I'm seeing something which looks like failure to recompute the IP
header checksum when NATting packets with 4.0.1.

I can't believe this wouldn't've been noticed long ago if it were a
generic problem (I'm even on i386), so there's obviously some respect
in which I'm pushing an envelope here.

For example, here's the post-NAT header of a NATted ping:

45 ip_hl=5 ip_v=4
00 ip_tos [Routine]
00 54 ip_len [84] (dropping 2 trailing bytes)
1a d8 ip_id
00 00 ip_off [0]
fe ip_ttl [254]
01 ip_p [ICMP]
18 82 ip_sum
45 c4 b5 1d ip_src [69.196.181.29]
d8 2e 05 0d ip_dst [216.46.5.13]

ip_sum is definitely wrong. But the pre-NAT source address was
172.16.0.3, and if I compute the checksum with 45 c4 b5 1d replaced
with ac 10 00 03, I find that 18 82 is correct.

There's another machine (also i386 4.0.1) which is set up to do NAT for
two others, and it works for one of them and doesn't work for the
other. The only thing that I can see that could be related is that in
each of the failure cases, the failing address is an alias address on
the interface in question rather than being the principal address.

For example, to return to the ping whose header I quoted above, the
ping arrived on the NATting machine via ex0:

ex0: flags=8863<UP,BROADCAST,NOTRAILERS,RUNNING,SIMPLEX,MULTICAST> mtu 1500
capabilities=3f00<IP4CSUM_Rx,IP4CSUM_Tx,TCP4CSUM_Rx,TCP4CSUM_Tx,UDP4CSUM_Rx,UDP4CSUM_Tx>
enabled=0
address: 00:b0:d0:24:eb:c9
media: Ethernet autoselect (100baseTX full-duplex)
status: active
inet 10.0.7.1 netmask 0xffffff00 broadcast 10.0.7.255
inet alias 172.16.0.1 netmask 0xfffffff0 broadcast 172.16.0.15
inet6 fe80::2b0:d0ff:fe24:ebc9%ex0 prefixlen 64 scopeid 0x2

Note that 172.16.0.3 is on an alias network.

On the "one works, one doesn't" machine, the relevant interface is also
ex0, configured

ex0: flags=8b63<UP,BROADCAST,NOTRAILERS,RUNNING,PROMISC,ALLMULTI,SIMPLEX,MULTICAST> mtu 1500
capabilities=3f00<IP4CSUM_Rx,IP4CSUM_Tx,TCP4CSUM_Rx,TCP4CSUM_Tx,UDP4CSUM_Rx,UDP4CSUM_Tx>
enabled=0
address: 00:10:5a:71:ba:b0
media: Ethernet autoselect (10baseT)
status: active
inet 10.0.4.1 netmask 0xffffff00 broadcast 10.0.4.255
inet alias 10.0.255.1 netmask 0xffffff00 broadcast 10.0.255.255
inet6 fe80::210:5aff:fe71:bab0%ex0 prefixlen 64 scopeid 0x3

and the working NAT is for 10.0.4.128 while the failing NAT is on the
10.0.255.* network (I can't recall the last octet offhand, the machine
isn't alive right now, and I'm not there to kick it).

Can anyone confirm or refute the theory that 4.0.1's NAT simply doesn't
get checksums right for addresses on alias networks in this sense?
I'll be digging through the code, but I don't know that code, and
strengthening or refuting my guess would help me focus my search.

/~\ The ASCII Mouse
\ / Ribbon Campaign
X Against HTML ***@rodents-montreal.org
/ \ Email! 7D C8 61 52 5D E7 2D 39 4E F1 31 3E E8 B3 27 4B

--
Posted automagically by a mail2news gateway at muc.de e.V.
Please direct questions, flames, donations, etc. to news-***@muc.de
Matthew Mondor
2011-04-06 03:33:31 UTC
On Tue, 5 Apr 2011 22:13:43 -0400 (EDT)
Post by der Mouse
> Can anyone confirm or refute the theory that 4.0.1's NAT simply doesn't
> get checksums right for addresses on alias networks in this sense?
> I'll be digging through the code, but I don't know that code, and
> strengthening or refuting my guess would help me focus my search.

It's not exactly clear to me which is inbound and which is outbound in
the example (it's also late in my tz, admittedly), but is it possible
that, because hardware TCP4 checksumming is enabled, the dumps don't
report it properly, yet output packets get the sum adjusted at actual
delivery?

Thanks,
--
Matt

--
Posted automagically by a mail2news gateway at muc.de e.V.
Please direct questions, flames, donations, etc. to news-***@muc.de
der Mouse
2011-04-06 03:41:53 UTC
Post by Matthew Mondor
Post by der Mouse
>> Can anyone confirm or refute the theory that 4.0.1's NAT simply
>> doesn't get checksums right for addresses on alias networks [...]
> It's not exactly clear to me which is inbound and outbound in the
> example (it's also late in my tz, admittedly), but is it possible that
> because hardware TCP4 checksum is enabled the dumps don't report it
> properly yet output packets get the sum adjusted at actual delivery?

Well, not TCP4, because this isn't TCP; it happens even on pings. But
presumably IP-layer checksums have similar bits.

It definitely is not as simple as I thought/feared it might be; I did
some more tests and found a test case where a ping from a non-alias
network does not get NATted correctly. Now I need to figure out what
the _actually_ relevant difference between working and broken is. :-/

Also, it's not just checksum offload; all network interfaces on this
machine are configured with no checksum offload, even the ones that are
capable of doing it. I definitely need to focus on the checksum code,
though since it's a checksum issue, that's been obvious from the start.

Matthew Mondor
2011-04-06 04:22:51 UTC
On Tue, 5 Apr 2011 23:41:53 -0400 (EDT)
Post by der Mouse
> Well, not TCP4, because this isn't TCP; it happens even on pings. But
> presumably IP-layer checksums have similar bits.

Indeed
Post by der Mouse
> Also, it's not just checksum offload; all network interfaces on this
> machine are configured with no checksum offload, even the ones that are
> capable of doing it. I definitely need to focus on the checksum code,
> though since it's a checksum issue, that's been obvious from the start.

Oh, I didn't even notice that those were listed as capabilities but
indeed disabled, or they'd also show on the first line. Sorry about
that.

In case this helps at all (this may have changed since netbsd-4): in
netbsd-5, where I'm more familiar with the code layout, I see calls to
in_delayed_cksum() in sys/netinet/ip_output.c, which calls in4_cksum()
from in4_cksum.c, which in turn uses cpu_in_cksum() from cpu_in_cksum.c
(I haven't checked, but each arch may supply its own). It seems somewhat
tricky, since the offset/length are parameters that any caller could
pass wrongly. Then in sys/dist/ipf/netinet/ there are a number of lines
matching cksum as well, including in ip_nat.c ...

Good night and happy debugging,
--
Matt

--
Posted automagically by a mail2news gateway at muc.de e.V.
Please direct questions, flames, donations, etc. to news-***@muc.de
der Mouse
2011-04-06 04:43:04 UTC
> Also, it's not just checksum offload; [...]

Okay, I believe I know what was going on here.

In my tree, if_srt.c has code to call pfil_run_hooks, so that you can
configure srt to point out an interface and then configure NAT for
outgoing packets on that interface and have it work. (Without this,
the NAT code will run for the srt interface, not the forwarded-to
interface - but, because the return traffic comes in on the
forwarded-to interface rather than the srt interface, return traffic
won't be handled right even if you configure the NAT on the srt.)

It turned out the actual relevant difference between my "works" and
"fails" cases was that the "works" cases were hitting host routes that
bypassed the relevant srt; it had nothing to do with primary versus
alias addresses anywhere.

I've added code to recompute checksums after running the pfil hooks:

--- a/sys/net/if_srt.c
+++ b/sys/net/if_srt.c
@@ -298,6 +298,13 @@ static int srt_if_output(
{ simple_unlock(&sc->lock);
return(rv);
}
+ if (dst->sa_family == AF_INET)
+ { struct ip *ip;
+ ip = mtod(m,struct ip *);
+ ip->ip_sum = 0;
+ ip->ip_sum = in_cksum(m,ip->ip_hl<<2);
+ m->m_pkthdr.csum_flags &= ~M_CSUM_IPv4;
+ }
#endif
/*
* We have to hold sc->lock across the underlying interface's output

and now NAT works just fine in conjunction with srt.

Not directly relevant to NetBSD, because, as far as I can tell,
NetBSD's srt doesn't have the pfil hooks calls to make NAT even *try*
to work with srt-based routing. (If anyone is interested in changing
that, I'll be happy to do what I can to help, though the code has been
mangled badly enough by things like KNFification that it probably will
require some human-layer intelligence, not just diffs.)
