Discussion:
Revamping optimised in_cksum/in4_cksum/in6_cksum support
(too old to reply)
Joerg Sonnenberger
2008-01-10 15:12:08 UTC
Permalink
Hi all,
there are currently three very similiar checksum algorithms in the
IPv4/IPv6 stack that platforms should provide assembler versions for.

in_cksum is the core and implemented on all platforms.

in4_cksum and in6_cksum are variants of in_cksum that skip parts of the
mbuf chain and instead use a separately computed initial checksum over
the header. in6_cksum is currently only implement for i386.

I would like to simplify this and implement in4_cksum and in6_cksum in C
and have a single MD backend function (md_in_cksum).

md_in_cksum takes the mbuf, len, initial cksum (as uint32_t) and offset
to skip. The required modifications for the existing in_cksum
implementation is loading the initial checksum instead of zeroing it and
a small loop to compensate for the offset. I expect this to two loads
and a branch for in_cksum, which should be cheap enough to not bother
with a single entry point.

Opinions?

Joerg

--
Posted automagically by a mail2news gateway at muc.de e.V.
Please direct questions, flames, donations, etc. to news-***@muc.de
Joerg Sonnenberger
2008-01-24 18:16:49 UTC
Permalink
Post by Joerg Sonnenberger
I would like to simplify this and implement in4_cksum and in6_cksum in C
and have a single MD backend function (md_in_cksum).
With the a small modification (cpu_in_cksum as MD backend), the patch
http://www.netbsd.org/~joerg/cpu_in_cksum.diff implements this. The
portable version was tested on Sparc and Alpha. i386 and amd64 are
converted, other platforms will need some changes.

A regression test can be found in src/regress/sys/net/in_cksum.

The existing in_cksum implementation need to be adjusted to do the
equivalent of the first for loop in the portable code.

I would like to continue with this without compat code, the MD work is
relative straight forward. The MD versions should be timed, at least on
Alpha the portable version is faster than the current MD code, I
wouldn't be surprised if other platforms have similiar issues.

Joerg

--
Posted automagically by a mail2news gateway at muc.de e.V.
Please direct questions, flames, donations, etc. to news-***@muc.de
Gerald Lee
2008-01-24 19:32:16 UTC
Permalink
Compiling for an EV64260, I got an error building tcp_usrreq.
The issue relates to the static function inet6_ident_core not being
behind an INET6 ifdef. Diff below.
--

RCS file: /cvsroot/src/sys/netinet/tcp_usrreq.c,v
retrieving revision 1.129.2.1
diff -c -r1.129.2.1 tcp_usrreq.c
*** tcp_usrreq.c 21 Jan 2008 20:17:48 -0000 1.129.2.1
--- tcp_usrreq.c 24 Jan 2008 19:31:46 -0000
***************
*** 1150,1155 ****
--- 1150,1156 ----
return copyout_uid(sockp, oldp, oldlenp);
}

+ #ifdef INET6
static inline int
inet6_ident_core(struct in6_addr *raddr, u_int rport,
struct in6_addr *laddr, u_int lport,
***************
*** 1180,1185 ****
--- 1181,1187 ----
else
return copyout_uid(sockp, oldp, oldlenp);
}
+ #endif

/*
* sysctl helper routine for the net.inet.tcp.drop and


--
Posted automagically by a mail2news gateway at muc.de e.V.
Please direct questions, flames, donations, etc. to news-***@muc.de
Joerg Sonnenberger
2008-01-24 19:36:04 UTC
Permalink
Post by Gerald Lee
Compiling for an EV64260, I got an error building tcp_usrreq.
The issue relates to the static function inet6_ident_core not being
behind an INET6 ifdef. Diff below.
It is in my tree. It was added last June.

Joerg

--
Posted automagically by a mail2news gateway at muc.de e.V.
Please direct questions, flames, donations, etc. to news-***@muc.de
Christos Zoulas
2008-01-25 18:46:56 UTC
Permalink
Post by Gerald Lee
Compiling for an EV64260, I got an error building tcp_usrreq.
The issue relates to the static function inet6_ident_core not being
behind an INET6 ifdef. Diff below.
--
RCS file: /cvsroot/src/sys/netinet/tcp_usrreq.c,v
retrieving revision 1.129.2.1
diff -c -r1.129.2.1 tcp_usrreq.c
*** tcp_usrreq.c 21 Jan 2008 20:17:48 -0000 1.129.2.1
--- tcp_usrreq.c 24 Jan 2008 19:31:46 -0000
***************
*** 1150,1155 ****
--- 1150,1156 ----
return copyout_uid(sockp, oldp, oldlenp);
}
+ #ifdef INET6
static inline int
inet6_ident_core(struct in6_addr *raddr, u_int rport,
struct in6_addr *laddr, u_int lport,
***************
*** 1180,1185 ****
--- 1181,1187 ----
else
return copyout_uid(sockp, oldp, oldlenp);
}
+ #endif
/*
* sysctl helper routine for the net.inet.tcp.drop and
thanks, it seems to have been fixed on head.

christos




--
Posted automagically by a mail2news gateway at muc.de e.V.
Please direct questions, flames, donations, etc. to news-***@muc.de
Simon Burge
2008-01-27 11:23:31 UTC
Permalink
Post by Joerg Sonnenberger
Post by Joerg Sonnenberger
I would like to simplify this and implement in4_cksum and in6_cksum in C
and have a single MD backend function (md_in_cksum).
With the a small modification (cpu_in_cksum as MD backend), the patch
http://www.netbsd.org/~joerg/cpu_in_cksum.diff implements this. The
portable version was tested on Sparc and Alpha. i386 and amd64 are
converted, other platforms will need some changes.
A regression test can be found in src/regress/sys/net/in_cksum.
The existing in_cksum implementation need to be adjusted to do the
equivalent of the first for loop in the portable code.
I would like to continue with this without compat code, the MD work is
relative straight forward. The MD versions should be timed, at least on
Alpha the portable version is faster than the current MD code, I
wouldn't be surprised if other platforms have similiar issues.
One part of this concerns me:

while (mlen >= 32) {
__builtin_prefetch(data + 32);

On at least some MIPS and PowerPC implementations (and possibly others)
a prefetch on an address that isn't mapped can cause a bus error or
similar exception. If you have an buffer that finishes on a page
boundary and the next page isn't mapped, you may trip up on this.

I'm not sure how to best deal with this. Adding extra tests before
calling __builtin_prefetch may negate the benefits of that prefetch.

Simon.

--
Posted automagically by a mail2news gateway at muc.de e.V.
Please direct questions, flames, donations, etc. to news-***@muc.de
Joerg Sonnenberger
2008-01-27 13:54:48 UTC
Permalink
Post by Simon Burge
while (mlen >= 32) {
__builtin_prefetch(data + 32);
On at least some MIPS and PowerPC implementations (and possibly others)
a prefetch on an address that isn't mapped can cause a bus error or
similar exception.
__builtin_prefetch is not supposed to generate traps. If you can
actually trigger that, it is a GCC bug.

Joerg

--
Posted automagically by a mail2news gateway at muc.de e.V.
Please direct questions, flames, donations, etc. to news-***@muc.de
Simon Burge
2008-01-27 14:18:03 UTC
Permalink
Post by Joerg Sonnenberger
Post by Simon Burge
while (mlen >= 32) {
__builtin_prefetch(data + 32);
On at least some MIPS and PowerPC implementations (and possibly others)
a prefetch on an address that isn't mapped can cause a bus error or
similar exception.
__builtin_prefetch is not supposed to generate traps. If you can
actually trigger that, it is a GCC bug.
At least MIPS and PowerPC will generate "pref" and "dcbt" instructions
respectively that point to potentially invalid addresses with the code
as it currently is.

I had a look at the gcc description of __builtin_prefetch() and I'm
curious as to how "Data prefetch does not generate faults if ADDR
is invalid" can be guaranteed if it just blindly issues the raw
cache/prefetch instructions with the addresses you pass it when those
instructions can fault on at least some CPUs of both the architectures
I've looked at so far...

Simon.

--
Posted automagically by a mail2news gateway at muc.de e.V.
Please direct questions, flames, donations, etc. to news-***@muc.de
der Mouse
2008-01-27 17:08:09 UTC
Permalink
Post by Simon Burge
Post by Simon Burge
__builtin_prefetch(data + 32);
__builtin_prefetch is not supposed to generate traps. If you can
actually trigger that, it is a GCC bug.
At least MIPS and PowerPC will generate "pref" and "dcbt"
instructions respectively that point to potentially invalid addresses
with the code as it currently is.
And they can trap? Then surely that's just a bug in the MIPS/PPC
configuration for gcc, no?
Post by Simon Burge
I had a look at the gcc description of __builtin_prefetch() and I'm
curious as to how "Data prefetch does not generate faults if ADDR is
invalid" can be guaranteed if it just blindly issues the raw
cache/prefetch instructions with the addresses you pass it when those
instructions can fault on at least some CPUs of both the
architectures I've looked at so far...
Just means the configurations for those CPUs are broken, I'd say.

Sounds to me as though someone implemented prefetching on those CPUs
without reading the specs closely enough (whether the gcc specs or the
CPU specs, I'm in no position to say).

/~\ The ASCII der Mouse
\ / Ribbon Campaign
X Against HTML ***@rodents.montreal.qc.ca
/ \ Email! 7D C8 61 52 5D E7 2D 39 4E F1 31 3E E8 B3 27 4B

--
Posted automagically by a mail2news gateway at muc.de e.V.
Please direct questions, flames, donations, etc. to news-***@muc.de
Joerg Sonnenberger
2008-01-27 14:17:40 UTC
Permalink
Post by Simon Burge
At least MIPS and PowerPC will generate "pref" and "dcbt" instructions
respectively that point to potentially invalid addresses with the code
as it currently is.
I remember some discussion of the Power4 modifications for Linux to map
NULL for the same reason. If in doubt, we can conditionalize that
statement for platforms that don't have a save prefetch.

Joerg

--
Posted automagically by a mail2news gateway at muc.de e.V.
Please direct questions, flames, donations, etc. to news-***@muc.de
Simon Burge
2008-01-27 14:49:59 UTC
Permalink
Post by Joerg Sonnenberger
Post by Simon Burge
At least MIPS and PowerPC will generate "pref" and "dcbt" instructions
respectively that point to potentially invalid addresses with the code
as it currently is.
I remember some discussion of the Power4 modifications for Linux to map
NULL for the same reason. If in doubt, we can conditionalize that
statement for platforms that don't have a save prefetch.
The cases I'm referring to aren't against zero or low addresses, but for
any generic virtual addresses that aren't mapped.

I'm thinking that it might be safer to default to no prefetch (or at
least to only prefetching known valid addresses) given that we have two
analysed architectures so far and both have the possibility of problems.

Simon.

--
Posted automagically by a mail2news gateway at muc.de e.V.
Please direct questions, flames, donations, etc. to news-***@muc.de
YAMAMOTO Takashi
2008-01-27 23:47:50 UTC
Permalink
Post by Joerg Sonnenberger
Post by Simon Burge
At least MIPS and PowerPC will generate "pref" and "dcbt" instructions
respectively that point to potentially invalid addresses with the code
as it currently is.
I remember some discussion of the Power4 modifications for Linux to map
NULL for the same reason. If in doubt, we can conditionalize that
statement for platforms that don't have a save prefetch.
Joerg
just FYI:
http://www.kerneltraffic.org/kernel-traffic/kt20031006_234.html#3

(i don't know if __builtin_prefetch can produce the relevant instructions.)

YAMAMOTO Takashi

--
Posted automagically by a mail2news gateway at muc.de e.V.
Please direct questions, flames, donations, etc. to news-***@muc.de
Loading...