vmstat: is the key metric page-in numbers or page-out numbers?


Post by Bob Harfo » Thu, 06 Jun 2002 05:53:53



Here's a quote from Bill Hassell, Hewlett-Packard Response Center, on
this usenet group:  "Yes, but it is very important to zero the stats
and measure the
page outs over a limited period. Page in always includes program
start up as well as return from swap so it isn't useful in performance
measurements.  Page outs are real writes to swap...but if 32,000
pages were swapped out over the last year, you don't have a
problem.  32,000 in one day means you are seriously short on RAM."

Here's a quote from Donald Burleson in "Oracle High-Performance Tuning
with Statspack" (page 95):  "In sum, page-out operations are a normal
part of virtual memory operation, but page-in (pi) operations indicate
that the server has excessive RAM demands."

So who's right?  Is being short on RAM indicated by a high pi number
or a high po number?  Also, what exactly is too high -- e.g., 100
page-outs over 10 minutes?

Thanks.

--Bob Harford

Oracle/data warehousing consultant (not UNIX SA)

 
 
 


Post by ECSta » Thu, 06 Jun 2002 13:04:12


Robert,

Both are true, though I think Bill's is the more practical answer.

Page-outs are a normal part of VM and occur when you're short of free memory
(or, to a lesser extent, for certain memory-mapped file operations).

Page-ins are more normal :-) and also an indicator of memory demands, but not
necessarily of memory shortfalls.

As for your subjective question about which is the key metric: for most admins
it'll be page-outs, as an indicator of memory shortfalls.

As Bill pointed out, use the frequency of page-outs to gauge the severity of
the memory shortfall.

An interesting and related HP Measureware metric is GBL_MEM_QUEUE, indicating
the number of processes during the given interval that are blocked while
waiting for pages to come back from disk. In other words, it is an indicator of
what effect the paged-out memory has on system performance.

Maybe you've got X hundred MB paged out to disk. But the system will tend to
page out memory that has been less recently accessed, and memory that's not
needed as often should have less impact on system performance when it's paged out.

In vmstat I also like to watch the "re" (reclaim) value: memory that was paged
out but then had to be reclaimed, i.e., the system made a not-so-good guess in
those cases.

Regards,

Eric Stahl




 
 
 


Post by b.. » Thu, 06 Jun 2002 20:14:49



> Here's a quote from Donald Burleson in "Oracle High-Performance Tuning
> with Statspack" (page 95):  "In sum, page-out operations are a normal
> part of virtual memory operation, but page-in (pi) operations indicate
> that the server has excessive RAM demands."

  Not correct for HP-UX.  Page-in is defined by the operating system
  as returning swapped pages back to memory *OR* bringing new
  programs into memory--there is no way to tell the difference.
  So if you start 100 programs that each require 100 pages of RAM,
  then page-in will show 10,000 pages.  That is not an indicator
  of memory pressure unless you know that no new programs were
  started during the measurement period.

  However, page-out means that real memory pressure has occurred, that
  RAM had to be freed in order to continue.  Just to muddy the waters
  slightly, memory mapped files (if used by processes) can generate
  some page-outs too.

> So who's right?  Is being short on RAM indicated by a high pi number
> or a high po number?  Also, what exactly is too high -- e.g., 100
> page-outs over 10 minutes?

  Page-out is the only useful metric as it is a sure indicator that
  memory is being moved from RAM to swap.  100 page-outs in 10 minutes
  isn't worth measuring.  100 page-outs in 1 second may be a problem
  *BUT* ignore that burst rate if it only occurs every few minutes.
  On the other hand, that 100 page-out/second rate, if sustained for
  10 minutes, would be 60,000 pages and indeed indicates a massive
  memory limitation.
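
  The distinction above between a short burst and a sustained rate can
  be sketched in a few lines of Python.  This is only an illustration,
  not an HP-UX tool: the function names are made up, the input is
  assumed to be a list of page-out rates in pages per second (one value
  per sampling interval, e.g. collected by hand from the po column),
  and the 100 pages/second threshold and 10-minute window are simply
  the figures used in this post, not official limits.

```python
import math


def pages_moved(rate_per_sec, duration_sec):
    """Total pages paged out at a steady rate over a period."""
    return rate_per_sec * duration_sec


def is_sustained(po_rates, interval_sec, threshold, min_duration_sec):
    """True if the page-out rate stays at or above `threshold`
    (pages/second) for at least `min_duration_sec` in a row.

    po_rates: one page-out rate per sampling interval (assumed input).
    """
    needed = math.ceil(min_duration_sec / interval_sec)
    run = 0
    for rate in po_rates:
        # Extend the current run, or reset it when the rate drops.
        run = run + 1 if rate >= threshold else 0
        if run >= needed:
            return True
    return False


# 100 pages/second for 10 minutes:
print(pages_moved(100, 10 * 60))  # 60000

# One 5-second burst followed by quiet intervals: ignore it.
burst = [100] + [0] * 59
print(is_sustained(burst, 5, 100, 600))  # False

# The same rate held for 120 samples of 5 seconds (10 minutes).
steady = [100] * 120
print(is_sustained(steady, 5, 100, 600))  # True
```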

  This would occur if two or more programs each need
  more than half of RAM and the programs truly run all day (as opposed
  to interactive programs like shells that wait on slow humans most
  of the day).  If each program runs for 4 hours by itself (no
  paging), then running both at the same time could take as much
  as a *WEEK*.  That's because every few
  seconds, one program is deactivated, then its pages are removed until
  there is enough room to bring in the missing pages for program 2,
  then program 2 runs for a few seconds, and so on.  The programs will
  spend most of their time moving back and forth in the swap
  area.

  Now interactive programs are very different.  They spend most of their
  time waiting on I/O (the slow carbon-based life form at the keyboard).
  I ran a system with as many as 260 copies of LaserROM (the predecessor
  to HP's Instant Information CD) on a small D370 with only 128 megs of
  RAM.  Each copy of LaserROM needed 2-8 megs of RAM, averaging 3 megs,
  or 780 megs of RAM to run everything.  Yet these 200+ users were very
  pleased with the performance because it took only a second or two to
  wake up, and while the user interacted with the program, they saw
  full-speed performance.  True, pages were removed from other programs,
  but those users were on the phone or getting coffee, etc.
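
  For a sense of scale, the LaserROM numbers above work out as follows.
  This is just the arithmetic from this post written as Python; the
  "copies resident at once" figure is a rough assumption that ignores
  the memory used by the kernel and everything else on the box.

```python
copies = 260          # concurrent LaserROM sessions
avg_mb_per_copy = 3   # average RAM per copy, in megs
ram_mb = 128          # physical RAM on the D370

total_demand_mb = copies * avg_mb_per_copy
oversubscription = total_demand_mb / ram_mb

# Rough upper bound on copies that fit in RAM at the same moment
# (assumption: ignores kernel and other processes).
resident_copies = ram_mb // avg_mb_per_copy

print(total_demand_mb)             # 780
print(f"{oversubscription:.1f}x")  # 6.1x
print(resident_copies)             # 42
```

  Demand was about six times physical RAM, which only works because
  most of those sessions were idle at any given moment.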

  So if you see an occasional page-out, ignore the metric.  For Oracle,
  you should be maximizing SGA to accommodate in-core sorts (no temp
  sort area on disk), creating a large cache for data, etc., and
  reading the details about shared memory in the memory and process
  management white papers found in /usr/share/doc.  If you are
  running 64-bit Oracle, there's nothing to consider--just ask for
  several Gb of shared memory and you're home free.

  But for 32-bit Oracle (just like any other 32-bit program), having 10
  or 20 Gb of RAM is meaningless since a process can't address more than
  960 megs in the data areas without special handling (hint:  look for
  the word MAGIC in the docs).  Even then, shared memory is limited to
  about 1750 megs and is also subject to fragmentation caused by multiple
  uses of this one memory map.  A workaround is to use memory windows to
  eliminate fragmentation issues.  The white paper on memory windows is
  also found in /usr/share/doc, but if you don't see it, then your
  system is not patched correctly.

--

Bill 'shortsig' Hassell, HP Remote Engineering Services

 
 
 


Post by Bob Harfo » Fri, 07 Jun 2002 05:12:06


Bill,

Many thanks indeed for taking the time to write a definitive answer.
Your answer on the rate of page-outs is particularly appreciated.

As someone who has been charged with tuning a problematic database for
the last several months, I find it a bit frustrating to come across
advice like "check that the OS page outs don't get too high, or add
more memory to Oracle if too many data dictionary reads are coming
from disk."  This type of advice is obviously useless since neither the
measurement period nor a good benchmark is provided.

--Bob Harford
Web Data Access, data warehousing consultant