Discussion:
SoC ideas - mbuf API
(too old to reply)
Pavel Cahyna
2006-06-21 07:47:49 UTC
Permalink
I would like to present some proposals about planned modifications to the
mbuf API for my SoC project.

In short, I think that the current mbuf code makes too difficult to write
correct code, and too easy to make mistakes, as noted in this post:

http://permalink.gmane.org/gmane.os.netbsd.devel.network/7299

My proposal precises my reply in that thread:

http://permalink.gmane.org/gmane.os.netbsd.devel.network/7302

I see basically two problems with the current mbuf API:

The 1st problem: if the code does not use m_pullup/pulldown when it should, it
works in most cases, but when it encounters a packet fragmented in a right
way, it malfunctions.

Example: kern/29014.

The 2nd problem: mbufs can be read-only, and there are cases when they are
overwritten without checking for writability. See the above-referenced
thread for an example, or PR kern/33162.

Moreover, m_pullup always copies from the mbuf cluster, even if the data
are contiguous, for historical reasons, so the current code avoids calling
it if not necessary making it even more complicated. The result looks like:

if (m->m_len < sizeof(struct ether_header)) {
m = m_pullup(m, sizeof(struct ether_header));
if (m == NULL)
return NULL;
}
eh = mtod(m, struct ether_header *);

Such code is present in about hundred of places in the kernel.

Solution: simplify this common idiom. Let's make a mptr(m, type) macro which
would be used like:

eh = mptr_pullup(m, struct ether_header, 0)
if (eh == NULL)
recovery from error;


where the mptr_pullup(struct mbuf *m, type, off)

and it would return the pointer to data in m starting at off, cast to
const type * (in tis case to const struct ether_header *), and make the
mbuf contiguous for sizeof (type) bytes, like m_pulldown does.

Then deprecate mtod from all the code which does not have to deal with mbuf
internals.

For the code which "knows" that the packet is already contiguous it would
be enough to have a mptr() which would not do m_pulldown but check
and panic if DIAGNOSTIC. (The goal is to pull-up early and make this the
common case.)

For most of the uses the version returning const should be enough. Code
which needs a writable pointer to the mbuf could use a mptr_rw macro,
which would do a m_makewritable on the returned region. Or such code
should be converted to m_copyback, then mptr_rw wouldn't be needed. There
shouldn't be many places where writing to mbufs is necessary.

Comments? What do people think about the naming of the proposed functions?

Pavel Cahyna

--
Posted automagically by a mail2news gateway at muc.de e.V.
Please direct questions, flames, donations, etc. to news-***@muc.de
Pavel Cahyna
2006-06-26 09:45:42 UTC
Permalink
Post by Pavel Cahyna
Solution: simplify this common idiom. Let's make a mptr(m, type) macro which
should have been "mptr(m, type, off) and mptr_pullup(m, type, off)"
Post by Pavel Cahyna
eh = mptr_pullup(m, struct ether_header, 0)
if (eh == NULL)
recovery from error;
mptr_pullup can need to free the mbuf m and allocate a new one. As it has
to be a macro, it could do it and change the "m" argument. But to provide
a more function-like semantics, it would be better to pass a pointer to
the mbuf pointer, so the "prototype" would look like:

const type * mptr_pullup(struct mbuf **mp, type, int off).

In general, functions which may allocate a new mbuf and free the old one
should IMHO have a similar prototype, instead of accepting a mbuf pointer
and returning a new one like m_pullup does, to reduce risks of using the
old, invalid pointer.

The mptr variand which does not move data does not need it, but I would
like to change it to the same "prototype" for consistency.

Another detail is error handling. Currently m_pullup/pulldown consider a
too short mbuf chain an error and free it, returning NULL, just as they do
when failing to allocate memory. With the proposed macros, it is possible
to report errors in a more fine-grained way. When the packet is too short,
the macros could return NULL without altering the mbuf chain, while on
allocation failure, they could both return NULL and free the mbuf chain,
while setting the mbuf pointer to NULL.

A question arises, what should the mptr variant (which does not do an
equivalent of m_pullup) do when the packet is too short, but contiguous?
Should it panic, so that the responsibility for checking is left to the
caller, or should it rather report the error to the caller?

The implementation I'm working on is flexible enough to allow quickly
changing such details, so I think such decisions should be postponed until
I get more experience with the needs of the consumers of this API.

Pavel Cahyna

--
Posted automagically by a mail2news gateway at muc.de e.V.
Please direct questions, flames, donations, etc. to news-***@muc.de
YAMAMOTO Takashi
2006-06-26 10:04:52 UTC
Permalink
Post by Pavel Cahyna
The 1st problem: if the code does not use m_pullup/pulldown when it should, it
works in most cases, but when it encounters a packet fragmented in a right
way, it malfunctions.
Example: kern/29014.
i've played with some debug code in mtod() while ago. (a patch attached)

YAMAMOTO Takashi
Pavel Cahyna
2006-06-26 10:23:00 UTC
Permalink
Post by YAMAMOTO Takashi
Post by Pavel Cahyna
The 1st problem: if the code does not use m_pullup/pulldown when it should, it
works in most cases, but when it encounters a packet fragmented in a right
way, it malfunctions.
Example: kern/29014.
i've played with some debug code in mtod() while ago. (a patch attached)
Thanks. My ideas were similar. But I decided to replace the pointer type
in mtod by the pointed-to type, because otherwise there is the nonobvious
requirement of not using void*. Also it is easy to apply const in the
macro itself.

Pavel Cahyna

--
Posted automagically by a mail2news gateway at muc.de e.V.
Please direct questions, flames, donations, etc. to news-***@muc.de
David Young
2006-06-27 16:54:21 UTC
Permalink
Post by Pavel Cahyna
I would like to present some proposals about planned modifications to the
mbuf API for my SoC project.
In short, I think that the current mbuf code makes too difficult to write
http://permalink.gmane.org/gmane.os.netbsd.devel.network/7299
http://permalink.gmane.org/gmane.os.netbsd.devel.network/7302
The 1st problem: if the code does not use m_pullup/pulldown when it should, it
works in most cases, but when it encounters a packet fragmented in a right
way, it malfunctions.
Example: kern/29014.
The 2nd problem: mbufs can be read-only, and there are cases when they are
overwritten without checking for writability. See the above-referenced
thread for an example, or PR kern/33162.
Moreover, m_pullup always copies from the mbuf cluster, even if the data
are contiguous, for historical reasons, so the current code avoids calling
if (m->m_len < sizeof(struct ether_header)) {
m = m_pullup(m, sizeof(struct ether_header));
if (m == NULL)
return NULL;
}
eh = mtod(m, struct ether_header *);
Such code is present in about hundred of places in the kernel.
Solution: simplify this common idiom. Let's make a mptr(m, type) macro which
eh = mptr_pullup(m, struct ether_header, 0)
if (eh == NULL)
recovery from error;
where the mptr_pullup(struct mbuf *m, type, off)
and it would return the pointer to data in m starting at off, cast to
const type * (in tis case to const struct ether_header *), and make the
mbuf contiguous for sizeof (type) bytes, like m_pulldown does.
Then deprecate mtod from all the code which does not have to deal with mbuf
internals.
For the code which "knows" that the packet is already contiguous it would
be enough to have a mptr() which would not do m_pulldown but check
and panic if DIAGNOSTIC. (The goal is to pull-up early and make this the
common case.)
For most of the uses the version returning const should be enough. Code
which needs a writable pointer to the mbuf could use a mptr_rw macro,
which would do a m_makewritable on the returned region. Or such code
should be converted to m_copyback, then mptr_rw wouldn't be needed. There
shouldn't be many places where writing to mbufs is necessary.
Careful: m_makewritable does not guarantee that the mbuf is contiguous,
only that each segment is writable. That deserves to be explicitly
mentioned in mbuf(9); I keep forgetting to add that.

Dave
--
David Young OJC Technologies
***@ojctech.com Urbana, IL * (217) 278-3933

--
Posted automagically by a mail2news gateway at muc.de e.V.
Please direct questions, flames, donations, etc. to news-***@muc.de
Pavel Cahyna
2006-06-27 21:29:39 UTC
Permalink
Post by David Young
Post by Pavel Cahyna
For most of the uses the version returning const should be enough. Code
which needs a writable pointer to the mbuf could use a mptr_rw macro,
which would do a m_makewritable on the returned region. Or such code
should be converted to m_copyback, then mptr_rw wouldn't be needed. There
shouldn't be many places where writing to mbufs is necessary.
Careful: m_makewritable does not guarantee that the mbuf is contiguous,
only that each segment is writable. That deserves to be explicitly
mentioned in mbuf(9); I keep forgetting to add that.
Yes, I'm aware of that. A question is if the mptr_rw macro would also
ensure a contiguous region, or if there should be a separate
mptr_rw_pullup macro. (At this point the number of possibilities becomes
large enough that it might be preferable to have just two macros, mptr and
mptr_rw, with an extra "flags" argument, and make the _pullup variants one
of the flags.)

Pavel

--
Posted automagically by a mail2news gateway at muc.de e.V.
Please direct questions, flames, donations, etc. to news-***@muc.de
Jason Thorpe
2006-06-27 23:07:23 UTC
Permalink
On Jun 27, 2006, at 2:29 PM, Pavel Cahyna wrote:

If we're going to rework / add mbuf APIs, I'd like to propose
something a little more radical...

- Make mbufs an opaque type... "typedef struct mbuf *mbuf_t;", hide
the definition of "struct mbuf", and use accessors / mutators that
have well-defined semantics to make it less easy for mbuf users to
screw themselves.

- mbuf_data() would return a const void *. If a user wants a
writable buffer, then mbuf_mutable_data() would return a void *.
This would allow us to encapsulate the whole "make writable" thing,
and allow us also to keep some state in the mbuf itself to possibly
avoid unnecessary iterations over the chain (if it's already
mutable). Anyway, this would also force us to const poison the
networking code, which in turn would help the compiler generate
better code which in turn could lead to better performance. It's all
good.

Take a look at /System/Library/Frameworks/Kernel.framework/Headers/
sys/kpi_mbuf.h on a Mac OS X 10.4 system for what I think is a good
starting point for where I think we should go in this regard.
Post by Pavel Cahyna
Post by David Young
Post by Pavel Cahyna
For most of the uses the version returning const should be
enough. Code
which needs a writable pointer to the mbuf could use a mptr_rw macro,
which would do a m_makewritable on the returned region. Or such code
should be converted to m_copyback, then mptr_rw wouldn't be
needed. There
shouldn't be many places where writing to mbufs is necessary.
Careful: m_makewritable does not guarantee that the mbuf is
contiguous,
only that each segment is writable. That deserves to be explicitly
mentioned in mbuf(9); I keep forgetting to add that.
Yes, I'm aware of that. A question is if the mptr_rw macro would also
ensure a contiguous region, or if there should be a separate
mptr_rw_pullup macro. (At this point the number of possibilities becomes
large enough that it might be preferable to have just two macros, mptr and
mptr_rw, with an extra "flags" argument, and make the _pullup
variants one
of the flags.)
Pavel
-- thorpej


--
Posted automagically by a mail2news gateway at muc.de e.V.
Please direct questions, flames, donations, etc. to news-***@muc.de
Pavel Cahyna
2006-06-27 23:19:33 UTC
Permalink
Post by Jason Thorpe
If we're going to rework / add mbuf APIs, I'd like to propose
something a little more radical...
- Make mbufs an opaque type... "typedef struct mbuf *mbuf_t;", hide
the definition of "struct mbuf", and use accessors / mutators that
have well-defined semantics to make it less easy for mbuf users to
screw themselves.
- mbuf_data() would return a const void *. If a user wants a
writable buffer, then mbuf_mutable_data() would return a void *.
This would allow us to encapsulate the whole "make writable" thing,
and allow us also to keep some state in the mbuf itself to possibly
avoid unnecessary iterations over the chain (if it's already
mutable). Anyway, this would also force us to const poison the
networking code, which in turn would help the compiler generate
better code which in turn could lead to better performance. It's all
good.
Isn't it mostly what I have proposed? I have proposed mptr_rw returning
writable pointer, and mptr... returning const pointer.

A difference is that the proposed macros would also do the cast to the
required data type, to be more similar to the current mtod() and at the
same time to be able to use sizeof on this type.

Pavel

--
Posted automagically by a mail2news gateway at muc.de e.V.
Please direct questions, flames, donations, etc. to news-***@muc.de
Jason Thorpe
2006-06-27 23:25:51 UTC
Permalink
Post by Pavel Cahyna
Isn't it mostly what I have proposed? I have proposed mptr_rw
returning
writable pointer, and mptr... returning const pointer.
A difference is that the proposed macros would also do the cast to the
required data type, to be more similar to the current mtod() and at the
same time to be able to use sizeof on this type.
You proposed macros, which I don't like so much for this. I would
also rather pass in a real length, rather than using sizeof magic
inside the macro itself.

I also would prefer different names other than mptr*(). I'm not
really much of a fan of the current m_*() names either :-)

-- thorpej


--
Posted automagically by a mail2news gateway at muc.de e.V.
Please direct questions, flames, donations, etc. to news-***@muc.de
Pavel Cahyna
2006-09-03 21:23:06 UTC
Permalink
Hello,

back to this old topic, as SoC is finished and time for integrating the
project came...
Post by Pavel Cahyna
Isn't it mostly what I have proposed? I have proposed mptr_rw returning
writable pointer, and mptr... returning const pointer.
A difference is that the proposed macros would also do the cast to the
required data type, to be more similar to the current mtod() and at the
same time to be able to use sizeof on this type.
You proposed macros, which I don't like so much for this. I would also
rather pass in a real length, rather than using sizeof magic inside the
macro itself.
I'm not a big fan of macro magic. But I think that in this case it is
justified. I wanted to have something to replace mtod with additional
sanity checks, similar to what yamt@ suggested for mtod. For this, a
function which returns void * and accepts a real length is not suitable,
because it won't automatically guarantee that the length is sensible.

In some cases a function accepting real length is preferable, I have added
this.

My experience with using the new API is that it is well suited for
replacing most of the current uses of mtod.

BTW I have looked at the Mac OS X mbuf KPI and wasn't impressed. There is
not much innovation (renaming struct mbuf * - > mbuf_t, m_ -> mbuf_ does
not count). I don't see what is gained by replacing the direct access to
struct mbuf members by accessor functions, especially when the semantics is
not well defined (how do the function that change mbuf length interact
with the packet length?), or not well chosen (mbuf_setdata? why does it
accept a pointer when it must set the data pointer to data inside the mbuf
storage?)
I also would prefer different names other than mptr*(). I'm not really
much of a fan of the current m_*() names either :-)
Let's discuss it in a separate thread.

Pavel

--
Posted automagically by a mail2news gateway at muc.de e.V.
Please direct questions, flames, donations, etc. to news-***@muc.de
YAMAMOTO Takashi
2007-04-21 12:45:03 UTC
Permalink
[ replying to a little old mail ]
Post by Pavel Cahyna
I also would prefer different names other than mptr*(). I'm not really
much of a fan of the current m_*() names either :-)
Let's discuss it in a separate thread.
Pavel
has it been discussed since then?
i also prefer names other than mptr*.

YAMAMOTO Takashi

--
Posted automagically by a mail2news gateway at muc.de e.V.
Please direct questions, flames, donations, etc. to news-***@muc.de
Pavel Cahyna
2007-04-21 20:00:23 UTC
Permalink
Post by YAMAMOTO Takashi
[ replying to a little old mail ]
Post by Pavel Cahyna
I also would prefer different names other than mptr*(). I'm not really
much of a fan of the current m_*() names either :-)
Let's discuss it in a separate thread.
Pavel
has it been discussed since then?
I expected follow-up on
http://permalink.gmane.org/gmane.os.netbsd.devel.network/7959 , but the
amount of feedback was underwhelming.
Post by YAMAMOTO Takashi
i also prefer names other than mptr*.
Suggest something, then...

Pavel

--
Posted automagically by a mail2news gateway at muc.de e.V.
Please direct questions, flames, donations, etc. to news-***@muc.de
YAMAMOTO Takashi
2007-04-22 06:16:54 UTC
Permalink
Post by Pavel Cahyna
Post by YAMAMOTO Takashi
[ replying to a little old mail ]
Post by Pavel Cahyna
I also would prefer different names other than mptr*(). I'm not really
much of a fan of the current m_*() names either :-)
Let's discuss it in a separate thread.
Pavel
has it been discussed since then?
I expected follow-up on
http://permalink.gmane.org/gmane.os.netbsd.devel.network/7959 , but the
amount of feedback was underwhelming.
i thought you were going to create a dedicated thread for the naming. :)
Post by Pavel Cahyna
Post by YAMAMOTO Takashi
i also prefer names other than mptr*.
Suggest something, then...
MBUF_PTR*().

YAMAMOTO Takashi

--
Posted automagically by a mail2news gateway at muc.de e.V.
Please direct questions, flames, donations, etc. to news-***@muc.de
Pavel Cahyna
2007-04-22 06:52:37 UTC
Permalink
Post by YAMAMOTO Takashi
Post by Pavel Cahyna
Post by YAMAMOTO Takashi
i also prefer names other than mptr*.
Suggest something, then...
MBUF_PTR*().
I would prefer M_PTR*(), to be consistent with existing mbuf macros.

Pavel

--
Posted automagically by a mail2news gateway at muc.de e.V.
Please direct questions, flames, donations, etc. to news-***@muc.de
Jason Thorpe
2007-04-23 03:25:32 UTC
Permalink
Post by Pavel Cahyna
Post by YAMAMOTO Takashi
Post by Pavel Cahyna
Post by YAMAMOTO Takashi
i also prefer names other than mptr*.
Suggest something, then...
MBUF_PTR*().
I would prefer M_PTR*(), to be consistent with existing mbuf macros.
I'm ambivalent about that. The existing mbuf API / ABI needs to be
completely gutted anyway.

-- thorpej


--
Posted automagically by a mail2news gateway at muc.de e.V.
Please direct questions, flames, donations, etc. to news-***@muc.de
David Young
2007-04-22 07:00:30 UTC
Permalink
Post by YAMAMOTO Takashi
MBUF_PTR*().
Pavel,

The names are fine the way they are. Words written in lowercase letters
are easier to read, apparently because of the mixed height[1].

Dave

[1] For an interesting discussion,
<http://www.tc-forum.org/topicus/ru28theu.htm>.
--
David Young OJC Technologies
***@ojctech.com Urbana, IL * (217) 278-3933

--
Posted automagically by a mail2news gateway at muc.de e.V.
Please direct questions, flames, donations, etc. to news-***@muc.de
YAMAMOTO Takashi
2007-04-23 08:01:02 UTC
Permalink
Post by David Young
Post by YAMAMOTO Takashi
MBUF_PTR*().
Pavel,
The names are fine the way they are. Words written in lowercase letters
are easier to read, apparently because of the mixed height[1].
Dave
[1] For an interesting discussion,
<http://www.tc-forum.org/topicus/ru28theu.htm>.
--
David Young OJC Technologies
if it was something which can be written as a function, i don't care.
however, it's inherently a macro.

YAMAMOTO Takashi

--
Posted automagically by a mail2news gateway at muc.de e.V.
Please direct questions, flames, donations, etc. to news-***@muc.de
Jason Thorpe
2006-09-05 02:59:25 UTC
Permalink
Post by Pavel Cahyna
BTW I have looked at the Mac OS X mbuf KPI and wasn't impressed. There is
not much innovation (renaming struct mbuf * - > mbuf_t, m_ -> mbuf_ does
not count). I don't see what is gained by replacing the direct
access to
struct mbuf members by accessor functions, especially when the
semantics is
not well defined (how do the function that change mbuf length interact
with the packet length?), or not well chosen (mbuf_setdata? why does it
accept a pointer when it must set the data pointer to data inside the mbuf
storage?)
The main benefits of was OS X did here was:

- clean up the symbol namespace (important for having nice, stable ABIs)

- Make the mbuf type opaque (again, for stable ABIs)

-- thorpej


--
Posted automagically by a mail2news gateway at muc.de e.V.
Please direct questions, flames, donations, etc. to news-***@muc.de
Loading...