Technical explanation of load?


Post by Jay Alle » Tue, 01 Oct 1996 04:00:00



Can anyone give a technical explanation of load? How is it calculated?
What does a given load number mean in terms of resources being consumed
on a given Unix box? Is load a calculation made by the kernel (I think
it is), and do different Unix systems use different algorithms to
calculate load?

If anyone can point me to a good explanation of load, on-line or
otherwise, that answers my questions, please be my guest.

-J-

 
 
 

Technical explanation of load?

Post by Mark McCullou » Wed, 02 Oct 1996 04:00:00




>Can anyone give a technical explanation of load? How is it calculated?
>What does a given load number mean in terms of resources being consumed
>on a given Unix box? Is load a calculation made by the kernel (I think
>it is), and do different Unix systems use different algorithms to
>calculate load?

As I understand it, the load average is really three
numbers: the average number of jobs waiting for the CPU over the last
one, five, and fifteen minutes.  I don't think different
systems use a different calculation method, but I can't say for sure.

The general rule I go by is that if the five-minute load average breaks
10, or the fifteen-minute load average breaks five, the system is
effectively disabled and drastic steps need to be taken.  Of course
there was the time when the load average stayed at 20 all morning...

--
Mark McCullough                             Systems-Programmer


 
 
 

Technical explanation of load?

Post by Ling Wan » Wed, 02 Oct 1996 04:00:00





> >Can anyone give a technical explanation of load? How is it calculated?
> >What does a given load number mean in terms of resources being consumed
> >on a given Unix box? Is load a calculation made by the kernel (I think
> >it is), and do different Unix systems use different algorithms to
> >calculate load?

> The understanding I have is that the load average is really three
> numbers.  It's the average number of jobs waiting for the CPU over
> one minute, five minutes, and fifteen minutes.  I don't think different
> systems use a different calculation method, but I can't say for sure.

Isn't load just the average number of jobs on the run queue +
the ones blocked for resources + the number of processes on the CPU?
Or, put another way, the O + R entries in the process state field
of the ps command?

So a 64-CPU CS6400 with a load of 60 is not being
fully utilized. (Of course, there is more to it than that.)

vmstat's proc fields or sar -q would also give you
the processes in the queue.
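
A minimal sketch of the O + R idea in Python. It assumes a Solaris-style ps whose one-letter state column uses O for a process currently on a processor and R for a runnable one; other systems use different letters. The function name and the sample snapshot are made up for illustration.

```python
from collections import Counter

def runnable_count(states, runnable_codes=("O", "R")):
    """Count processes whose one-letter state is in runnable_codes.

    states: iterable of state strings, e.g. the S column of ps output.
    runnable_codes: assumed Solaris-style codes (O = on a CPU, R = runnable).
    """
    # Keep only the first non-blank character of each entry; skip empty lines.
    tally = Counter(s.strip()[:1] for s in states if s.strip())
    return sum(tally[code] for code in runnable_codes)

# Hypothetical snapshot: 2 on a CPU, 3 runnable, 5 sleeping, 1 zombie.
sample = ["O", "O", "R", "R", "R", "S", "S", "S", "S", "S", "Z"]
print(runnable_count(sample))
```

A single snapshot like this only gives an instantaneous count; the load average smooths many such samples over time.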

 
 
 

Technical explanation of load?

Post by Mark McCullou » Thu, 03 Oct 1996 04:00:00




[snip]

>> The understanding I have is that the load average is really three
>> numbers.  It's the average number of jobs waiting for the CPU over
>> one minute, five minutes, and fifteen minutes.  I don't think different
>> systems use a different calculation method, but I can't say for sure.

>Isn't load just the average number of jobs on queue +
>ones blocked for resources + number of processes in the CPU?
> Or

Can't be.  :-)  If my system breaks load average five, it's hard to use,
and I have a _lot_ more than 5 processes blocked for resources.  Looking
at my man pages, it looks like I was kind of right.  It says (of uptime)
that it prints "the average number of jobs in the run queue over the
last 1, 5 and 15 minutes."  I'm just used to single cpu systems, so for
a single cpu system, my explanation will give load + 1.  Of course,
multiple cpu systems would be able to handle a higher load average
reasonably, if this explanation is correct.

>O + R field under process state in the ps command

>So a 64 CPU CS6400 with a load of 60 means it is not being
>fully utilized(Of course, there is more to it than that)

I'd have to see that myself, I just have a hard time not panicking
when I do "uptime" and see 15 minute load averages around 60-100.  Who
knows, maybe it truly averages it out over each cpu...

>vmstat's proc fields or sar -q would also give you
>the processes in the queue

Now if only I could fix my system so that sar would work again.  

--
Mark McCullough                             Systems-Programmer

 
 
 

Technical explanation of load?

Post by Ling Wan » Thu, 03 Oct 1996 04:00:00






> [snip]

> >> The understanding I have is that the load average is really three
> >> numbers.  It's the average number of jobs waiting for the CPU over
> >> one minute, five minutes, and fifteen minutes.  I don't think different
> >> systems use a different calculation method, but I can't say for sure.

> >Isn't load just the average number of jobs on queue +
> >ones blocked for resources + number of processes in the CPU?
> > Or

> Can't be.  :-)  If my system breaks load average five, it's hard to use,
> and I have a _lot_ more than 5 processes blocked for resources.  Looking
> at my man pages, it looks like I was kind of right.  It says (of uptime)
> that it prints "the average number of jobs in the run queue over the
> last 1, 5 and 15 minutes."  I'm just used to single cpu systems, so for
> a single cpu system, my explanation will give load + 1.  Of course,
> multiple cpu systems would be able to handle a higher load average
> reasonably, if this explanation is correct.

By "waiting for a resource" I mean that the process has the CPU but
cannot proceed because it has to get at a resource from somewhere else:
something like wio% in sar or paging in vmstat, not waiting
for a packet or for keyboard input.

On Solaris machines, processes waiting for a resource are not included in
the load average, but on SunOS machines they are.

So on a machine with very heavy I/O from paging or just plain
disk reads and writes, a Solaris machine might show a light load,
while a SunOS machine will show a load that reflects
the activity going on.
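
A toy sketch of the distinction described above, in Python. It only illustrates the two counting policies as stated in this post; it is not taken from either kernel, and the state labels and the sample machine are invented for the example.

```python
def instantaneous_load(states, count_blocked):
    """Number of processes that contribute to load in one snapshot.

    states: list of labels, each "running", "runnable", "blocked_io" or
            "sleeping" (labels invented for this sketch).
    count_blocked: True to also count processes blocked on disk I/O or paging.
    """
    counted = {"running", "runnable"}
    if count_blocked:
        counted.add("blocked_io")
    return sum(1 for s in states if s in counted)

# Hypothetical I/O-bound box: 1 running, 1 runnable, 6 blocked on disk,
# 10 idle sleepers waiting for keyboard or network input.
snapshot = ["running", "runnable"] + ["blocked_io"] * 6 + ["sleeping"] * 10

print(instantaneous_load(snapshot, count_blocked=False))  # run queue only
print(instantaneous_load(snapshot, count_blocked=True))   # run queue plus blocked
```

With the same set of processes, the first policy reports a much smaller figure than the second, which is why the same workload can look light on one system and heavy on another.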

It is best not to use uptime as a measuring tool; it is deceptive
and easily misinterpreted.


> >O + R field under process state in the ps command

> >So a 64 CPU CS6400 with a load of 60 means it is not being
> >fully utilized(Of course, there is more to it than that)

> I'd have to see that myself, I just have a hard time not panicking
> when I do "uptime" and see 15 minute load averages around 60-100.  Who
> knows, maybe it truly averages it out over each cpu...

> >vmstat's proc fields or sar -q would also give you
> >the processes in the queue

> Now if only I could fix my system so that sar would work again.

> --
> Mark McCullough                             Systems-Programmer


 
 
 

Technical explanation of load?

Post by Larry Mascarenha » Thu, 03 Oct 1996 04:00:00



> Date: 2 Oct 1996 15:04:19 GMT

> Newsgroups: comp.unix.admin
> Subject: Re: Technical explanation of load?




> [snip]

> >> The understanding I have is that the load average is really three
> >> numbers.  It's the average number of jobs waiting for the CPU over
> >> one minute, five minutes, and fifteen minutes.  I don't think different
> >> systems use a different calculation method, but I can't say for sure.

> >Isn't load just the average number of jobs on queue +
> >ones blocked for resources + number of processes in the CPU?
> > Or

> Can't be.  :-)  If my system breaks load average five, it's hard to use,
> and I have a _lot_ more than 5 processes blocked for resources.  Looking
> at my man pages, it looks like I was kind of right.  It says (of uptime)
> that it prints "the average number of jobs in the run queue over the
> last 1, 5 and 15 minutes."  I'm just used to single cpu systems, so for
> a single cpu system, my explanation will give load + 1.  Of course,
> multiple cpu systems would be able to handle a higher load average
> reasonably, if this explanation is correct.

I have a Sequent with 16 Pentium 100 processors & at a load of 3.5 or
more, it starts crawling. My guess is that irrespective of the number of
CPUs, when the run queue is 1, it probably means 1 per CPU...:)

I once had a problem when the load average went to 20 & the system was
cruising & I was panicking, until I found a faulty program that was
spawning several processes & the processes were not really doing much. I
would probably never feel comfortable with a load greater than 4.00.

> >O + R field under process state in the ps command

> >So a 64 CPU CS6400 with a load of 60 means it is not being
> >fully utilized(Of course, there is more to it than that)

> I'd have to see that myself, I just have a hard time not panicking
> when I do "uptime" and see 15 minute load averages around 60-100.  Who
> knows, maybe it truly averages it out over each cpu...

> Now if only I could fix my system so that sar would work again.  

What's the problem with sar? Maybe we can help.
Larry Mascarenhas
-----------------
Systems Administrator
CSC Networks / Prentice Hall in NYC

 
 
 

Technical explanation of load?

Post by Matthew Tove » Fri, 04 Oct 1996 04:00:00



> Can anyone give a technical explanation of load? How is it calculated?
> What does a given load number mean in terms of resources being consumed
> on a given Unix box? Is load a calculation made by the kernel (I think
> it is), and do different Unix systems use different algorithms to
> calculate load?

> If anyone can point me to a good explanation of load, on-line or
> otherwise, that answers my questions, please be my guest.

Check the 'uptime' man page for some description of load.

Load means the same on all (unix) systems - it is the number of
processes on the system in a 'runnable' state. Why is this number not an
integer then? Because the number is averaged over an amount of time.
This can vary between systems - under Digital Unix, the 3 figures given
by 'uptime' represent the load average over the last 5 seconds, 30
seconds, and 60 seconds.
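
The post does not say how the averaging is done. As a sketch only: many BSD-derived kernels and Linux keep an exponentially damped moving average of the runnable-process count, sampled every few seconds. The 5-second sampling interval, the function names, and the sample workload below are assumptions for illustration and are not taken from this thread.

```python
import math

def update_load(load, n_runnable, window_s, interval_s=5.0):
    """One step of an exponentially damped moving average.

    load: previous average.
    n_runnable: runnable processes seen in this sample.
    window_s: averaging window in seconds (60, 300, 900 for 1/5/15 minutes).
    interval_s: assumed time between samples.
    """
    decay = math.exp(-interval_s / window_s)
    return load * decay + n_runnable * (1.0 - decay)

def simulate(samples, window_s, interval_s=5.0):
    """Feed a sequence of run-queue samples through the average, starting at 0."""
    load = 0.0
    for n in samples:
        load = update_load(load, n, window_s, interval_s)
    return load

# Hypothetical workload: 2 runnable processes held steady for one minute
# (12 samples, 5 seconds apart) on a previously idle machine.
one_min = simulate([2] * 12, window_s=60)
fifteen_min = simulate([2] * 12, window_s=900)
print(round(one_min, 2), round(fifteen_min, 2))
```

Because each sample is blended into the previous value, the result is fractional even though every sample is a whole number, and the short-window figure reacts to a change much faster than the long-window one.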

Depending on the circumstances, the load figure can be very useful or
completely useless. A rough guide is that if the load is equal to the
number of processors in the machine, the CPU resources are fully
utilised. If you are running a highly CPU intensive job, it will
generally add 1 to the load.

However if you have a number of I/O intensive jobs, they may push the
load up much higher, without having consumed all available CPU
resources. In this case using 'top' or similar to check on the CPU idle
percentage gives a better guide to CPU utilisation.
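
A rough sketch of how a CPU idle percentage can be derived from two readings of cumulative tick counters, in Python. It assumes the Linux-style aggregate "cpu" line from /proc/stat (user, nice, system, idle, iowait, irq, softirq, steal, ...); other Unix systems expose these counters differently, and the sample numbers are made up.

```python
def cpu_idle_percent(before, after):
    """Idle percentage between two samples of a Linux-style aggregate 'cpu' line.

    Each argument is a line such as
    'cpu  user nice system idle iowait irq softirq steal ...'
    holding cumulative tick counts. Only the first eight counters are used,
    since guest time is already included in user time.
    """
    def fields(line):
        parts = line.split()
        if not parts or parts[0] != "cpu":
            raise ValueError("expected an aggregate 'cpu' line")
        return [int(x) for x in parts[1:9]]

    a, b = fields(before), fields(after)
    deltas = [y - x for x, y in zip(a, b)]
    total = sum(deltas)
    if total <= 0:
        raise ValueError("samples must be taken at different times")
    idle = deltas[3]  # fourth counter is idle; iowait (fifth) is counted separately
    return 100.0 * idle / total

# Made-up samples: 1000 ticks elapsed, 700 of them idle, 200 waiting on I/O.
t0 = "cpu  1000 0 500 8000 300 0 0 0 0 0"
t1 = "cpu  1080 0 520 8700 500 0 0 0 0 0"
print(cpu_idle_percent(t0, t1))
```

A high load figure together with a large idle or I/O-wait share suggests that jobs are waiting on devices rather than competing for the CPU.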

Regards,

Matt
--
Matt Tovey                              http://wwwcn.cern.ch/~mtovey

Linux is okay, but the boxes it comes in are too small.

 
 
 

Technical explanation of load?

Post by Galen.Arno » Fri, 04 Oct 1996 04:00:00


On our Sun 670 MP machine (4 cpu), the machine responds as fast with a load
of 3 as it does with a load of .10 when it's running cpu intensive apps.
Solaris 2.5 does a great job of multiprocessing.  The situation can change
dramatically though if 3 processes are using the same i/o device.  In
Solaris, the load ave. can equal the number of cpu's and performance is
good--if the i/o is spread across various devices or the jobs are compute
heavy.
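
A small sketch of the per-CPU rule of thumb, in Python, using figures mentioned in this thread. It assumes the load counts only runnable processes, which (as discussed elsewhere in the thread) is not true on every system; the function name is made up. On a current Unix, Python's os.getloadavg() and os.cpu_count() could supply the two inputs.

```python
def load_per_cpu(load, ncpus):
    """Divide a load average by the processor count for a rough per-CPU figure."""
    if ncpus < 1:
        raise ValueError("ncpus must be at least 1")
    return load / ncpus

# Figures mentioned in this thread.
print(load_per_cpu(3.0, 4))    # 4-CPU Sun with a load of 3
print(load_per_cpu(60.0, 64))  # 64-CPU CS6400 with a load of 60
print(load_per_cpu(5.0, 1))    # single-CPU box with a load of 5
```

A value below 1 suggests spare CPU capacity and a value well above 1 suggests processes queueing for a processor, but it says nothing about contention for a shared I/O device.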
 --
____________________________________________________________________________

Illinois State Geological Survey, Champaign, IL 61820         (217) 244-2514
____________________________________________________________________________

 
 
 

Technical explanation of load?

Post by Ling Wan » Fri, 04 Oct 1996 04:00:00




> > Date: 2 Oct 1996 15:04:19 GMT

> > Newsgroups: comp.unix.admin
> > Subject: Re: Technical explanation of load?




> > [snip]

> > >> The understanding I have is that the load average is really three
> > >> numbers.  It's the average number of jobs waiting for the CPU over
> > >> one minute, five minutes, and fifteen minutes.  I don't think different
> > >> systems use a different calculation method, but I can't say for sure.

> > >Isn't load just the average number of jobs on queue +
> > >ones blocked for resources + number of processes in the CPU?
> > > Or

> > Can't be.  :-)  If my system breaks load average five, it's hard to use,
> > and I have a _lot_ more than 5 processes blocked for resources.  Looking
> > at my man pages, it looks like I was kind of right.  It says (of uptime)
> > that it prints "the average number of jobs in the run queue over the
> > last 1, 5 and 15 minutes."  I'm just used to single cpu systems, so for
> > a single cpu system, my explanation will give load + 1.  Of course,
> > multiple cpu systems would be able to handle a higher load average
> > reasonably, if this explanation is correct.

> I have a Sequent with 16 Pentium 100 processors & at a load of 3.5 or
> more, it starts crawling. My guess is that irrespective of the number of
> CPUs, when the run queue is 1, it probably means 1 per CPU...:)

> I once had a problem when the load average went to 20 & the system was
> cruising & I was panicking, until I found a faulty program that was
> spawning several processes & the processes were not really doing much. I
> would probably never feel comfortable with a load greater than 4.00.

That is why I said elsewhere in this thread that
uptime's load is not an accurate indicator of how busy
the system is.

SVR4 does not take into account processes waiting for
various types of I/O, so they do not show up in the load.

vmstat might show that there are many blocked processes, which may
indicate that these processes are generating heavy disk I/O
and slowing everything else down.