Discussion:
Checking mbuf cluster usage
Edgar Fuß
2022-04-12 10:18:02 UTC
Hello.

I just had a network hiccup on our gateway, probably due to "mclpool limit
reached; increase kern.mbuf.nmbclusters". I should probably monitor that,
so I get warned before the limit is reached and the gateway stops functioning.

But how do I do that? netstat -m reports mbufs allocated, while
kern.mbuf.nmbclusters is about, well, mbuf clusters. How do these values
correlate?

--
Posted automagically by a mail2news gateway at muc.de e.V.
Please direct questions, flames, donations, etc. to news-***@muc.de
Edgar Fuß
2022-04-12 10:57:08 UTC
vmstat -m should tell you (check mclpl)
Ah, thanks. And the number in use is Requests minus Releases?

Edgar Fuß
2022-04-26 16:15:26 UTC
vmstat -m should tell you (check mclpl)
OK, thanks.

For the bounded pools, how do I calculate the relative usage?

I have

Name Size Requests Fail Releases Pgreq Pgrel Npage Hiwat Minpg Maxpg Idle
[...]
mclpl 2048 108878 0 108234 10005 9679 326 1471 4 261858 4
[...]

so I guess the number I'm looking for is some combination of (108878-108234), 2048, 326 and 261858.

I have kern.mbuf.nmbclusters = 523716, but fail to correlate that with any of the numbers above.


Michael van Elst
2022-04-26 21:00:07 UTC
Name Size Requests Fail Releases Pgreq Pgrel Npage Hiwat Minpg Maxpg Idle
mclpl 2048 108878 0 108234 10005 9679 326 1471 4 261858 4
so I guess the number I'm looking for is some combination of (108878-108234), 2048, 326 and 261858.
I have kern.mbuf.nmbclusters = 523716, but fail to correlate that with any of the numbers above.
This is simple:

261858 * 2 = 523716

or in detail with MCLBYTES = 2048:

nmbclusters * MCLBYTES = Maxpg * PageSz

where PageSz is shown by vmstat -mW as 4096. This is the "pool page size",
often but not necessarily identical to the CPU page size.
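To make that relationship concrete, here is a small Python sketch (illustrative, not from the thread) that derives the cluster capacity and current usage from the mclpl fields quoted above; it assumes the pool page size of 4096 reported by vmstat -mW:

```python
# Derive mbuf-cluster capacity and relative usage from the mclpl line
# of vmstat -m, using the numbers quoted earlier in the thread.
MCLBYTES = 2048      # "Size" column: bytes per cluster
PAGE_SZ = 4096       # pool page size, as shown by vmstat -mW (assumed here)

requests = 108878    # clusters ever allocated from the pool
releases = 108234    # clusters ever freed back to the pool
maxpg = 261858       # "Maxpg" column: page limit for the pool

in_use = requests - releases              # clusters currently allocated
capacity = maxpg * PAGE_SZ // MCLBYTES    # max clusters the pool may hold
usage = in_use / capacity                 # relative usage, for monitoring

print(in_use, capacity, round(100 * usage, 3))
```

The computed capacity equals kern.mbuf.nmbclusters = 523716, which is exactly the nmbclusters * MCLBYTES = Maxpg * PageSz identity stated above.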



Michael van Elst
2022-04-27 16:54:47 UTC
Post by Edgar Fuß
Thanks.
That 2048 is from the "Size" column, right?
Vice versa :) MCLBYTES is the definition and the pool is created with an
item size of MCLBYTES.
Post by Edgar Fuß
Post by Michael van Elst
where PageSz is shown by vmstat -mW as 4096.
Oh, I didn't know about -W.
What exactly is the "Util" column in -W output?
(Requests - Releases) * Size / (Npage * PageSz).

Pools are allocated in "pool pages", so that's the
ratio of used memory vs. memory taken from the VM
system.
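As a sketch of that formula, the -W Util value can be recomputed from the mclpl numbers quoted earlier in the thread:

```python
# Recompute the vmstat -mW "Util" column for mclpl from the formula above:
#   Util = (Requests - Releases) * Size / (Npage * PageSz)
size, requests, releases, npage, page_sz = 2048, 108878, 108234, 326, 4096

util = (requests - releases) * size / (npage * page_sz)
print(f"{util:.1%}")   # fraction of pool-page memory actually holding items
```

So for this pool, almost all of the memory taken from the VM system is occupied by allocated clusters.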


Edgar Fuß
2022-04-26 21:10:13 UTC
Thanks.
That 2048 is from the "Size" column, right?
Post by Michael van Elst
where PageSz is shown by vmstat -mW as 4096.
Oh, I didn't know about -W.

What exactly is the "Util" column in -W output?

Edgar Fuß
2022-04-27 14:56:04 UTC
Post by Edgar Fuß
I just had a network hickup on our gateway, probably due to "mclpool limit
reached; increase kern.mbuf.nmbclusters". I should probably monitor that,
so I get warned before the limit is reached and the gateway stops functioning.
So I'm doing that now, and within a few hours, usage rose from 27% to 42%
and now 43%. At least I can now re-boot before the gateway becomes
non-functional.

Are there any known memory leaks in 8.2? I'm heavily using IPFilter on that
gateway, but it never ran out of mbuf clusters during years of operation and
then twice within a few weeks without any major change to the configuration
or filter rules.

Mouse
2022-04-27 15:06:20 UTC
I just had a network [hiccup] on our gateway, probably due to
"mclpool limit reached; increase kern.mbuf.nmbclusters". [...]
Are there any known memory leaks in 8.2? I'm heavily using IPFilter
on that gateway, but it never ran out of mbuf cluster during years of
operation and then twice within a few weeks without any major change
to the configuration or filter rules.
It might or might not be relevant: one of my 1.4T/sparc machines rarely
survives more than about 50 days of uptime without crashing "malloc:
out of space in kmem_map". I have so far not managed to track down the
leak (when an edit-test-debug cycle takes two months, debugging is not
fast), so it's pure speculation that it even has anything to do with
networking.

But it might be worth at least considering the possibility that what
you're seeing is a very old issue.

/~\ The ASCII Mouse
\ / Ribbon Campaign
X Against HTML ***@rodents-montreal.org
/ \ Email! 7D C8 61 52 5D E7 2D 39 4E F1 31 3E E8 B3 27 4B

Edgar Fuß
2022-04-27 19:42:57 UTC
Another idea: can I get more details on what those mbuf clusters are used for?

Michael van Elst
2022-04-27 20:06:51 UTC
Post by Edgar Fuß
Another idea: can I get more details on what those mbuf clusters are used for?
An mbuf stores 256 bytes, used for the data and some metadata
of a network packet (some systems use 512 bytes instead of 256).

For network packets that are larger, only the metadata is kept
in the mbuf, but the data is stored externally in a cluster of 2048
bytes, so you need 1 mbuf + 1 cluster.

Input data packets are often stored in clusters to easily accommodate
arbitrary Ethernet packets (usually 1514 bytes or a bit larger).
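That rule can be sketched as a rough model; the constants here are assumptions (MSIZE and the usable payload per mbuf are system-specific), not values taken from any particular kernel:

```python
# Rough model of mbuf vs. cluster usage per packet, per the rule above.
# MSIZE and MBUF_OVERHEAD are illustrative assumptions, not kernel values.
MSIZE = 256          # total mbuf size (512 on some systems)
MBUF_OVERHEAD = 56   # hypothetical metadata bytes inside each mbuf
MCLBYTES = 2048      # external cluster size

def buffers_for(pktlen: int) -> tuple[int, int]:
    """Return (mbufs, clusters) a packet of pktlen bytes roughly needs."""
    if pktlen <= MSIZE - MBUF_OVERHEAD:
        return (1, 0)                  # small packet: data fits in the mbuf
    # larger packet: metadata in the mbuf, data in external cluster(s)
    clusters = -(-pktlen // MCLBYTES)  # ceiling division
    return (clusters, clusters)        # roughly one mbuf per cluster in a chain

print(buffers_for(64), buffers_for(1514))   # (1, 0) (1, 1)
```

A full-size Ethernet frame thus costs 1 mbuf + 1 cluster, matching the description above.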


Edgar Fuß
2022-04-29 14:49:18 UTC
I decided to re-boot again at 75%. One observation was that I had zero
mclpl Releases on the gateway, while I do have Releases on other machines.

Can this be a locking problem, i.e., is there a lock preventing mbuf clusters
from being released?

Michael van Elst
2022-04-30 10:29:33 UTC
However, the system is running just fine, happily forwarding traffic,
and has been for the last 160 days. Also, for the duration of the time
I've been writing this e-mail, that number has remained totally constant
even though I've been interacting with it over SSH.
There is only pressure on the pool caches when the system runs out
of memory. Then the pool caches will be invalidated, and all
unused items will be returned to the pools. If that frees
pool pages, these will be returned to the VM system.

E.g.:

Name Size Requests Fail Releases Pgreq Pgrel Npage Hiwat Minpg Maxpg Idle
mclpl 2048 45 0 0 27 0 27 27 0 129204 4

[ ... fetching a program that just allocates all memory, and running it ... ]

Name Size Requests Fail Releases Pgreq Pgrel Npage Hiwat Minpg Maxpg Idle
mclpl 2048 50 0 45 27 4 23 27 0 129204 20


The interesting part is why the cache needed 27 pages (at 2 items
per page, you'd need 23) and why the remaining 50-45 items still
need 23 pages (worst case should be 5 pages for 5 items).



Michael van Elst
2022-05-01 12:58:52 UTC
Post by Michael van Elst
There is only pressure on the pool caches when the system runs out
of memory.
I don't get that. What if the mbuf cluster pool runs full? Something
must lead to Reclaims, because I observe them on other machines.
If the cluster pool is full, you hit a limit and the allocation fails.

When you free a cluster, it's given back to the cache layer (so that the
next cluster allocation is fast). The cache layer may eventually give
it to the pool, and the pool may release it to the VM system, but that
only happens when the system runs out of memory and tries to reclaim
such unused space.
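The lifecycle described above can be modeled as a toy sketch (purely illustrative; the names here are made up, not the kernel's pool_cache API): frees park items in the cache layer, and the pool's Releases counter only moves when memory pressure drains the cache.

```python
# Toy model of the pool-cache behavior described above: freed clusters sit
# in the cache layer, so the pool's "Releases" stays 0 until a drain.
class PoolCache:
    def __init__(self):
        self.requests = 0   # pool-level allocations
        self.releases = 0   # pool-level frees (only happen via drain)
        self.cache = []     # freed items parked in the cache layer

    def alloc(self):
        if self.cache:
            return self.cache.pop()   # fast path: reuse a cached item
        self.requests += 1            # slow path: take from the pool
        return object()

    def free(self, item):
        self.cache.append(item)       # parked, NOT released to the pool

    def drain(self):                  # invoked under memory pressure
        self.releases += len(self.cache)
        self.cache.clear()

pc = PoolCache()
items = [pc.alloc() for _ in range(5)]
for it in items:
    pc.free(it)
print(pc.requests, pc.releases)   # 5 0  -> zero Releases, like the router
pc.drain()
print(pc.requests, pc.releases)   # 5 5  -> Releases move only on a drain
```

This matches the observation elsewhere in the thread that a lightly loaded router can show zero Releases indefinitely.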


Edgar Fuß
2022-05-01 13:20:35 UTC
Post by Michael van Elst
When you free a cluster
So we're back to square one: I don't observe any Reclaims (that's what free means, no?) on that machine. One of my questions was whether that could be a locking error.

Michael van Elst
2022-04-29 15:17:00 UTC
Post by Edgar Fuß
I decided to re-boot again at 75%. One observation was that I had zero
mclpl Releases on the gateway, while I do have Releases on other machines.
Can this be a locking problem, i.e., is there a lock preventing mbuf clusters
from being released?
My router also has zero Releases in mclpl (and 38 Requests).

The reason is that the mclpl pool is a "pool cache": things get allocated
from the pool but only lazily freed back to the pool (and to the VM system)
when there is a memory shortage. A system like a router rarely requires
lots of memory, so it's possible that things never get freed.

With vmstat -C you see the cache statistics.


Edgar Fuß
2022-04-29 15:25:25 UTC
Post by Michael van Elst
Reason is that the mclpl pool is a "pool cache", things get allocated
from the pool but only lazily freed to the pool (and to the VM system)
when there is a memory shortage.
Ah, thanks for the explanation. But things should also be freed in case
the pool cache runs full, no? What I observe is that (in the triggered-by-
something-unknown case where it massively grows) the mbuf cluster pool
runs full, up to the point where the machine's network becomes non-operational.

Edgar Fuß
2022-05-01 11:55:46 UTC
Post by Michael van Elst
There is only pressure on the pool caches when the system runs out
of memory.
I don't get that. What if the mbuf cluster pool runs full? Something must lead to Reclaims, because I observe them on other machines.
