Problems with mbuf allocation

Post by David A. L » Wed, 07 Oct 1992 10:24:50

I am experiencing a very annoying problem with the mbuf routines on AIX.
Our device driver uses mbuf clusters, and lots of them.
Typical load may only need 50 clusters at a time, but under peak
conditions we may need up to 1000 4K mbuf clusters.
The problem is that I cannot reliably set the mbuf pool to the amount
I need.    Changing the value in smit that says "Maximum # of physical
4K pages for network buffers" (or something close; I'm not at the site now)
SHOULD set "thewall" but does nothing I can detect, even after a reboot.
Using "no -o thewall=6144" does set "thewall", but only occasionally
have I actually seen the buffer pool increase to anywhere near "thewall".
I have 512 MB of real system memory and I am the only user of this machine,
so I doubt that available real memory is the problem.

One day, I used "no -o thewall=6144" and actually got the expected results.
The next day (after a reboot, and RE-setting thewall) it didn't work!
Our device driver registers its expected and low-water marks when
it is configured, and I see the change reflected in "no -a" output.
However, the failure mode is that m_getclust() waits forever (M_WAIT)
and netstat -m reports increasing memory request denied/delayed counts;
then NFS, TCP/IP, and X Windows hang and I have to reboot the system.

I am going to call Austin tomorrow to try to get to the bottom
of this "feature", but would appreciate anyone with suggestions about
what I could be doing wrong, or what else I could try.  If I can't get
this fixed, I will be forced to xmalloc() a huge chunk of memory at
config time, but I really don't want to do this.

David Lee


Problems with mbuf allocation

Post by Tom Trusco » Thu, 08 Oct 1992 11:12:11

mbuf tuning is explained in "AIX V3.2 ... Performance Monitoring
and Tuning Guide", order # SC23-2365.
Look in the index in the back under "mbuf".
I think a version of this info was posted to this newsgroup a while back.
Here is a shell script for tuning the mbuf pools for a busy NFS system,
hand typed, sorry.


# raise the ceiling on total mbuf memory (in KB)
no -o thewall=10000

# set minimum number of small mbufs
no -o lowmbuf=3000

# generate traffic to force pool expansion
# ($REMOTEHOST was missing from the hand-typed original;
# substitute any reachable host)
ping $REMOTEHOST 1000 1 > /dev/null

# restore default lowmbuf to prevent netm thrashing
no -d lowmbuf

# set max # of free clusters (about 6MB)
no -o mb_cl_hiwat=1500

# gradually expand cluster pool
# (N's starting value, "do", and "done" were missing from the
# hand-typed original)
N=10
while [ $N -lt 1500 ]
do
        no -o lowclust=$N
        ping $REMOTEHOST 1000 1 > /dev/null
        let N=N+10
done

# restore default lowclust to prevent netm thrashing
no -d lowclust


Run "netstat -m" to see if you have enough clusters ("mapped pages").
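
A quick way to watch for trouble is to pull the failure counters out of
"netstat -m" directly.  This is a sketch of my own (not an AIX tool),
assuming the "requests for memory denied/delayed" lines shown in the
netstat -m report quoted later in this thread:

```shell
# count_denied: read "netstat -m" output on stdin and print the
# "requests for memory denied/delayed" counters on one line.
# (Hypothetical helper; the input format is assumed from the
# netstat -m report quoted later in this thread.)
count_denied() {
    awk '
        /requests for memory denied/  { denied  = $1 }
        /requests for memory delayed/ { delayed = $1 }
        END { printf "denied=%d delayed=%d\n", denied+0, delayed+0 }'
}

# typical use, on AIX:
#   netstat -m | count_denied
```

If either counter keeps climbing while your driver is under load, the
pools are not expanding fast enough.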

This does seem awfully kludgy:
0.  I have no idea why a loop is needed.

1.  "no" itself should trigger mbuf allocation (or deallocation),
one should not need to "ping".

2.  There are two missing options: "minclust" which is the minimum
# of clusters that should be present (whether free or allocated),
and "minmbuf" which is ditto for mbufs.
Then you would be able to do:
    no -o minclust=1000
and that would be that.  Internally the system would calculate:
    effective_minclust = max(minclust, lowclust);
    effective_mb_cl_hiwat = max(mb_cl_hiwat, effective_minclust**);
        ** perhaps increased by some amount to avoid thrashing?
    /* since there should be at least as many mbufs as clusters ... */
    effective_lowmbuf = max(lowmbuf, lowclust);
    effective_minmbuf = max(minmbuf, effective_lowmbuf);
    /* hmmm, maybe there needs to be an "mb_hiwat" option too ... */

3.  The options are badly named.
Isn't "maxfreeclust" clearer than mb_cl_hiwat,
and "minfreeclust" clearer than "lowclust"?
And it would be nice if the kernel provided a general mechanism
for getting/setting options, rather than having each subsystem
provide its own.  But that is getting carried away, I suppose.

Tom Truscott


Problems with mbuf allocation

Post by Curt Finch 903 2F021 c.. » Thu, 08 Oct 1992 00:47:41

(David A. Lee) writes:

>I am experiencing a very annoying problem with the mbuf routines on AIX.
>Our device driver uses mbuf clusters, and lots of them.
>Typical load may only need 50 clusters at a time, but under peak
>conditions, we may need up to 1000 4k mbuf clusters.

You should be able to get your problem solved with a 5-minute phone
conversation with the level 2 or level 3 support people.

The smit portion of configuring the mbuf pool did not work, but was fixed
a few months ago. Proper configuring of the mbuf pool should start by
reading the Network Tuning Guide included below.

Changing thewall does not affect the pool at all. It merely sets a
ceiling on the amount of memory the mbuf facility can allocate.


                       AIX 3.2 Network Tuning Guide

                              Revision: 1.0

                              April 29, 1992

                        IBM AIX System Performance
                            11400 Burnet Road
                             Austin, TX 78758


       Revision History

       Revision 1.0


       1.  Tuning the memory buffer (mbuf) pools

       1.1  Why tune the mbuf pools

       The network subsystem uses a memory management facility that
       revolves around a data structure called an "mbuf".  Mbufs
       are mostly used to store data for incoming and outbound
       network traffic.  Having mbuf pools of the right size can
       have a very positive effect on network performance. If the
       mbuf pools are configured improperly, both network and
       system performance can suffer.  AIX offers the capability
       for run-time mbuf pool configuration. With this convenience
       comes the responsibility for knowing when the pools need
       adjusting and how much they should be adjusted.

       1.2  Overview of the mbuf management facility

       The mbuf management facility controls two pools of buffers:
       a pool of small buffers (256 bytes each), which are simply
       called "mbufs", and a pool of large buffers (4096 bytes
       each), which are usually called "mbuf-clusters" or just
       "clusters". The pools are created from system memory by
       making an allocation request to the Virtual Memory Manager
       (VMM). The pools consist of pinned pieces of virtual memory;
       this means that they must always reside in physical memory
       and are never paged out. The result is that the real memory
       available for paging-in application programs and data has
       been decreased by the amount that the mbuf pools have been
       increased. This is a non-trivial cost that must always be
       taken into account when considering an increase in the size
       of the mbuf pools.

       The initial size of the mbuf pools is system-dependent.
       There is a minimum number of (small) mbufs and clusters
       allocated for each system, but these minimums are increased
       by an amount that depends on the specific system
       configuration.  One factor affecting how much they are
       increased is the number of communications adapters in the
       system. The default pool sizes are initially configured to
       handle small to medium size network loads (network traffic
       100-500 packets/second). The pool sizes dynamically increase
       as network loads increase. The cluster pool size is reduced
       as network loads decrease.  The mbuf pool is never reduced.
       To optimize network performance, the administrator should
       balance mbuf pool sizes with network loads (packets/second).
       If the network load is particularly oriented towards UDP
       traffic (e.g. NFS server) the size of the mbuf pool should
       be 2 times the packet/second rate. This is due to UDP
       traffic consuming an extra small mbuf.
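
       As a sketch of that sizing rule (the packet rate below is an
       example figure, not a value from the guide):

```shell
# sketch of the 2x sizing rule for UDP-heavy (e.g. NFS server) loads;
# the measured rate is an assumed example value
pkts_per_sec=500                     # e.g. measured with "netstat -I tr0 1"
mbuf_target=$((pkts_per_sec * 2))    # each UDP packet uses an extra small mbuf
echo "small-mbuf pool target: $mbuf_target"    # prints: small-mbuf pool target: 1000
```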

       To provide an efficient mbuf allocation service, an attempt
       is made to maintain a minimum number of free buffers in the
       pools at all times. The following network options (which can
       be manipulated using the no command)  are used to define
       these lower limits:

          o lowmbuf

          o lowclust

       The lowmbuf option controls the minimum number of free
       buffers for the mbuf pool. The lowclust option controls the
       minimum number of free buffers for the cluster pool.  When
       the number of free buffers in the pools drops below the lowmbuf or
       lowclust thresholds, the pools are expanded by some amount.
       The expansion of the mbuf free pools is not done
       immediately, but is scheduled to be done by a kernel process
       with the process name of "netm".  When netm is dispatched,
       the pools will be expanded to meet the minimum requirements
       of lowclust and lowmbuf. Having a kernel process do this
       work is required by the structure of the VMM.

       An additional function that netm provides is to limit the
       growth of the cluster pool. The network option that defines
       this maximum value is:

          o mb_cl_hiwat

       The mb_cl_hiwat option controls the maximum number of free
       buffers the cluster pool can contain. When the number of
       free clusters in the pool exceeds mb_cl_hiwat, netm will be
       scheduled to release some of the clusters back to the VMM.

       The last network option that is used by the mbuf management
       facility is

          o thewall

       The thewall option controls the maximum RAM (in K bytes)
       that the mbuf management facility can allocate from the VMM.
       This option is used to prevent unbalanced VMM resources
       which result in poor system performance.
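
       For example (a sketch with assumed numbers, not values from the
       guide): a driver that may need 1000 4 KB clusters at peak, like
       the one described at the top of this thread, implies a thewall
       of at least 4000 KB plus some headroom:

```shell
# sketch: estimate a thewall value (in KB) for a peak demand of
# 1000 4 KB clusters; the 2048 KB headroom for small mbufs and other
# users of the pool is an assumption, not a guide recommendation
clusters=1000
headroom_kb=2048
thewall_kb=$((clusters * 4 + headroom_kb))
echo "no -o thewall=$thewall_kb"     # prints: no -o thewall=6048
```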


       1.3  When to tune the mbuf pools

       When and how much to tune the mbuf pools is directly related
       to the network load a given machine is being subjected to. A
       server machine that is supporting many clients is a good
       candidate for having the mbuf pools tuned to optimize
       network performance.  It is important for the system
       administrator to understand the networking load for a given
       system. By using the netstat command you can get a rough
       idea of the network load in packets/second. For example:
       netstat -I tr0 1 reports the input and output traffic for
       both the tr0 network interface and for all network
       interfaces on the system. The output below shows the
       activity caused by a large ftp operation:

            input   (tr0)      output            input  (Total)     output
        packets errs  packets errs  colls   packets errs  packets errs  colls
        183     0     349     0     0       183     0     349     0     0
        183     0     353     0     0       183     0     353     0     0
        203     0     380     0     0       203     0     380     0     0
        189     0     363     0     0       189     0     363     0     0
        158     0     293     0     0       158     0     293     0     0
        191     0     365     0     0       191     0     365     0     0
        179     0     339     0     0       179     0     339     0     0
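
       A rough packets/second figure can be computed from such samples.
       This sketch (my own, not from the guide) averages the (Total)
       input and output packet columns of data rows fed on stdin:

```shell
# avg_pps: average (Total) input+output packets/second over the sample
# rows on stdin; columns are assumed as in the report above
# ($6 = total input packets, $8 = total output packets per row)
avg_pps() {
    awk '{ sum += $6 + $8; n++ } END { if (n) print int(sum / n) }'
}

# typical use, on AIX (feed it data rows, not the header lines):
#   netstat -I tr0 1 | avg_pps
```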

       The netstat command also has an option, -m, that gives
       detailed information about the use and availability of the
       mbufs and clusters:

       182 mbufs in use:
               17 mbufs allocated to data
               2 mbufs allocated to packet headers
               60 mbufs allocated to socket structures
               83 mbufs allocated to protocol control blocks
               11 mbufs allocated to routing table entries
               6 mbufs allocated to socket names and addresses
               3 mbufs allocated to interface addresses
       16/54 mapped pages in use
       261 Kbytes allocated to network (41% in use)
       0 requests for memory denied
       0 requests for memory delayed
       0 calls to protocol drain routines

       The line that begins "16/54 mapped pages..." indicates that
       there are 54 pinned clusters, of which 16 are currently in
       use. If the "requests for memory denied" value is nonzero,
       the mbuf and/or cluster pools may need to be expanded.
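
       The free-cluster count can be pulled out of that line directly.
       A small sketch (hypothetical helper, not an AIX tool):

```shell
# free_clusters: read "netstat -m" output on stdin and print the number
# of free (pinned but unused) clusters, i.e. M - N from the
# "N/M mapped pages in use" line
free_clusters() {
    awk '/mapped pages in use/ { split($1, a, "/"); print a[2] - a[1] }'
}

# typical use, on AIX:
#   netstat -m | free_clusters      # "16/54 mapped pages" -> prints 38
```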

       This report can be compared against the existing system
       parameters by issuing the command "no -a", which reports all of
       the current settings (the following report has been edited to
       show only the mbuf-related options):
                        lowclust = 29
                         lowmbuf = 88
                         thewall = 2048
                     mb_cl_hiwat = 58

       It is clear that on the test system the "261 Kbytes
       allocated to the network" is considerably short of the thewall
       value of 2048K, and the (54 - 16 = 38) free clusters are short
       of the mb_cl_hiwat limit of 58.

       The "requests for memory denied" counter is maintained by
       the mbuf management facility and is incremented each time a
       request for an mbuf allocation cannot be satisfied.
       Normally the  "requests for memory denied" value will be
       zero. If a system experiences a high burst of network
       traffic, the default configured mbuf pools will not be
       sufficient to meet the demand of the incoming burst, causing
       the error counter to be incremented once for each mbuf
       allocation request that fails. Usually this is in the
       thousands due to the large number of packets arriving all at
       once. The request for memory denied statistic will
       correspond with dropped packets on the network. Dropped
       network packets mean re-transmissions, resulting in degraded
       network performance.  If the "requests for memory denied"
       value is greater than zero it may be appropriate to tune the
       mbuf parameters -- see "How to tune the mbuf Pools", below.

       The "Kbytes allocated to the network" statistic is
       maintained by the mbuf management facility and represents
       the current amount of memory that the mbuf management
       facility has allocated from the VMM.
