Discussion:
pluggable pktq_rps_hash()
(too old to reply)
Kengo NAKAHARA
2021-09-24 04:27:26 UTC
Permalink
Hi,

I implement some of pktq_rps_hash() and make pluggable it for each
interface type such as ether, pppoe, gif, and ipsecif.
Here is the patch
https://github.com/knakahara/netbsd-src/commit/68a43105b541aec79c6f067e81dbe34ad2282dec

Of course, the default behavior is not changed. We can change RPS
behavior by "sysctl -w net.{ether,pppoe,gif,ipsecif}.rps_hash=${value}".
Currently, the values are
- zero : always return 0
- curcpu : return current cpuid
- toeplitz : return toeplitz hash value which calculated from mbuf

I think it is easy to add new rps hash functions to the above patch
and to apply the pluggable rps hash function to other interfaces.

Could you comment the patch?


Thanks,
--
//////////////////////////////////////////////////////////////////////
Internet Initiative Japan Inc.

Device Engineering Section,
Product Division,
Technology Unit

Kengo NAKAHARA <k-***@iij.ad.jp>



--
Posted automagically by a mail2news gateway at muc.de e.V.
Please direct questions, flames, donations, etc. to news-***@muc.de
Jason Thorpe
2021-10-07 03:45:11 UTC
Permalink
Hi,
Can I commit the following patch?
If there are no objection, I will commit the patch next week.
I’m sorry I missed it before! I just reviewed it and it looks fine to me.

-- thorpej


--
Posted automagically by a mail2news gateway at muc.de e.V.
Please direct questions, flames, donations, etc. to news-***@muc.de
Taylor R Campbell
2021-10-07 09:26:24 UTC
Permalink
Date: Thu, 7 Oct 2021 12:06:08 +0900
Post by Kengo NAKAHARA
Hi,
I implement some of pktq_rps_hash() and make pluggable it for each
interface type such as ether, pppoe, gif, and ipsecif.
Here is the patch
��� https://github.com/knakahara/netbsd-src/commit/68a43105b541aec79c6f067e81dbe34ad2282dec
Of course, the default behavior is not changed.� We can change RPS
behavior by "sysctl -w net.{ether,pppoe,gif,ipsecif}.rps_hash=${value}".
Currently, the values are
��� - zero : always return 0
��� - curcpu : return current cpuid
��� - toeplitz : return toeplitz hash value which calculated from mbuf
I think it is easy to add new rps hash functions to the above patch
and to apply the pluggable rps hash function to other interfaces.
Could you comment the patch?
Looks reasonable! Some minor comments:

- Where you do atomic_swap_ptr(func, ...) and discard the result, you
should just do atomic_store_relaxed(func, ...) instead. Cheaper way
to get the same effect.

- pktqueue.c, pktq_rps_hash_toeplitz, line 266:
}
else if (...) {
should be
} else if (...) {

- In the sysctl definitions, no need for explicit (void *) cast.

- Indentation in the sysctl_createv calls is a little confusing -- I
suggest putting `CTL_CREATE, CTL_EOL' on its own line to make it
clear that `PKTQ_RPS_HASH_NAME_LEN' is not actually part of the
sysctl mib.

(Is it necessary to pass PKTQ_RPS_HASH_NAME_LEN, anyway? A lot of
sysctl_createv calls just pass zero here. I can't tell from the man
page what it does.)

- pktqueue.c, sysctl_pktq_rps_hash_handler, lines 320 and 325: no need
for a stack buffer to copy the type name into; can just do

const char *type;
...
type = pktq_get_rps_hash_type(*func);

- Instead of copying & pasting the conditional default everywhere,

static pktq_rps_hash_func_t xyzif_pktq_rps_hash_p =
#ifdef NET_MPSAFE
pktq_rps_hash_curcpu;
#else
pktq_rps_hash_zero;
#endif

how about defining a default in pktqueue.c/h and using that
everywhere?

static pktq_rps_hash_func_t xyzif_pktq_rps_hash_p =
pktq_rps_hash_default;

Or is there a reason to make the default subsystem-specific?

- Could avoid writing lots of subsystem-specific wrapper functions
like ether_pktq_rps_hash by defining a function to call the
subsystem-specific pktq rps hash function:

/* pktqueue.c */
uint32_t
pktq_rps_hash(pktq_rps_hash_func_t *funcp, struct mbuf *m)
{
pktq_rps_hash_func_t func = atomic_load_relaxed(funcp);

KASSERT(func != NULL);

return (*func)(m);
}

/* if_ethersubr.c */
const uint32_t h = pktq_rps_hash(&ether_pktq_rps_hash_p, m);

This would also avoid triggering sanitizers which object to loading
ether_pktq_rps_hash_p with a non-atomic load. (Not actually a
problem on any CPUs we care about, but helps reduce false alarms
from sanitizers.)

This would also give us more flexibility if the pktq rps hash had
more state, e.g. maybe a random seed alongside it. (Not sure that's
an important avenue of flexibility, but it keeps the API details
more neatly confined to pktqueue.c.)

--
Posted automagically by a mail2news gateway at muc.de e.V.
Please direct questions, flames, donations, etc. to news-***@muc.de
Taylor R Campbell
2021-10-08 09:33:21 UTC
Permalink
Date: Fri, 8 Oct 2021 14:56:03 +0900
Post by Taylor R Campbell
- In the sysctl definitions, no need for explicit (void *) cast.
Yes, however, that causes following build failure.
====================
[...]
/disk2/home/k-nakahara/repos/netbsd-src/sys/sys/sysctl.h:1152:42: note: expected 'char *' but argument is of type 'uint32_t (**)(const struct mbuf *)' {aka 'unsigned int (**)(const struct mbuf *)'}
1152 | __sysctl_verify_##ctl_type##_arg(c_type *arg) \
====================
https://nxr.netbsd.org/xref/src/sys/sys/sysctl.h#1159
I think the above verifying macro is wrong, hmm, it should be fixed
another time...
I see -- it's because you defined it with CTLTYPE_STRING, but the
object stored in the kernel and passed to sysctl_pktq_rps_hash_handler
isn't actually a string. So the (void *) cast is correct after all.
Post by Taylor R Campbell
(Is it necessary to pass PKTQ_RPS_HASH_NAME_LEN, anyway? A lot of
sysctl_createv calls just pass zero here. I can't tell from the man
page what it does.)
Yes, because the "flags" of sysctl_createv is CTLFLAG_READWRITE. "newlen"
argument is used to set buffer size from userland. A lot of sysctl which
calls pass zero would have CTLFLAG_READONLY "flags".
OK, thanks. This sysctl(9) API sure is difficult to understand...
Post by Taylor R Campbell
- pktqueue.c, sysctl_pktq_rps_hash_handler, lines 320 and 325: no need
for a stack buffer to copy the type name into; can just do
const char *type;
...
type = pktq_get_rps_hash_type(*func);
I think that is required by sysctl_lookup() to pass old value to userland.
Sorry if I am wrong.
I guess so!
I rebase and update patch, here is the update diff,
https://github.com/knakahara/netbsd-src/commit/7ce7c9293782916c8a2e486ca6605911f4b4fea1
and here is unified diff with a little update,
https://github.com/knakahara/netbsd-src/pull/8
Thanks! One tiny nit left: why pktq_rps_hash(&pktq_rps_hash_default,
...) in dev/pci/xmm7360.c while all the others get their own sysctl?
Otherwise, LGTM.

(A legitimate answer is `I don't have the hardware to test changes to
xmm7360.c, so someone else will have to do it.')

(Maybe wg(4) should also have the pktq_rps_hash sysctl treatment too;
for some reason I don't remember I (or maybe Ozaki-san?) just wrote
curcpu()->ci_index with a comment `// pktq_rps_hash(m)'.)

--
Posted automagically by a mail2news gateway at muc.de e.V.
Please direct questions, flames, donations, etc. to news-***@muc.de
Kengo NAKAHARA
2021-10-11 04:18:51 UTC
Permalink
Hi,

Thank you for your comments.
Post by Taylor R Campbell
Thanks! One tiny nit left: why pktq_rps_hash(&pktq_rps_hash_default,
...) in dev/pci/xmm7360.c while all the others get their own sysctl?
Otherwise, LGTM.
(A legitimate answer is `I don't have the hardware to test changes to
xmm7360.c, so someone else will have to do it.')
Exactly. Moreover, I am not sure which sysctl entry should be used
"net.wwan.rps_hash", "hw.xmm7360.rps_hash", or something else.
Post by Taylor R Campbell
(Maybe wg(4) should also have the pktq_rps_hash sysctl treatment too;
curcpu()->ci_index with a comment `// pktq_rps_hash(m)'.)
Ozaki-san says he don't know the comment and the subsequent pktq_enqueue().
Hmm, I think wg(4) may require more than one pktq_rps_hash sysctl.


Thanks,
Post by Taylor R Campbell
Date: Fri, 8 Oct 2021 14:56:03 +0900
Post by Taylor R Campbell
- In the sysctl definitions, no need for explicit (void *) cast.
Yes, however, that causes following build failure.
====================
[...]
/disk2/home/k-nakahara/repos/netbsd-src/sys/sys/sysctl.h:1152:42: note: expected 'char *' but argument is of type 'uint32_t (**)(const struct mbuf *)' {aka 'unsigned int (**)(const struct mbuf *)'}
1152 | __sysctl_verify_##ctl_type##_arg(c_type *arg) \
====================
https://nxr.netbsd.org/xref/src/sys/sys/sysctl.h#1159
I think the above verifying macro is wrong, hmm, it should be fixed
another time...
I see -- it's because you defined it with CTLTYPE_STRING, but the
object stored in the kernel and passed to sysctl_pktq_rps_hash_handler
isn't actually a string. So the (void *) cast is correct after all.
Post by Taylor R Campbell
(Is it necessary to pass PKTQ_RPS_HASH_NAME_LEN, anyway? A lot of
sysctl_createv calls just pass zero here. I can't tell from the man
page what it does.)
Yes, because the "flags" of sysctl_createv is CTLFLAG_READWRITE. "newlen"
argument is used to set buffer size from userland. A lot of sysctl which
calls pass zero would have CTLFLAG_READONLY "flags".
OK, thanks. This sysctl(9) API sure is difficult to understand...
Post by Taylor R Campbell
- pktqueue.c, sysctl_pktq_rps_hash_handler, lines 320 and 325: no need
for a stack buffer to copy the type name into; can just do
const char *type;
...
type = pktq_get_rps_hash_type(*func);
I think that is required by sysctl_lookup() to pass old value to userland.
Sorry if I am wrong.
I guess so!
I rebase and update patch, here is the update diff,
https://github.com/knakahara/netbsd-src/commit/7ce7c9293782916c8a2e486ca6605911f4b4fea1
and here is unified diff with a little update,
https://github.com/knakahara/netbsd-src/pull/8
Thanks! One tiny nit left: why pktq_rps_hash(&pktq_rps_hash_default,
...) in dev/pci/xmm7360.c while all the others get their own sysctl?
Otherwise, LGTM.
(A legitimate answer is `I don't have the hardware to test changes to
xmm7360.c, so someone else will have to do it.')
(Maybe wg(4) should also have the pktq_rps_hash sysctl treatment too;
curcpu()->ci_index with a comment `// pktq_rps_hash(m)'.)
--
//////////////////////////////////////////////////////////////////////
Internet Initiative Japan Inc.

Device Engineering Section,
Product Division,
Technology Unit

Kengo NAKAHARA <k-***@iij.ad.jp>



--
Posted automagically by a mail2news gateway at muc.de e.V.
Please direct questions, flames, donations, etc. to news-***@muc.de
Loading...