Performance Problems on an L3000

Post by KENNETH NICAISE » Mon, 15 Jan 2001 11:47:02



Hello All,

I've got one that just about has me beat.

We've got an in-house developed application running on an L3000 with quad
500MHz PA-8600 processors that just can't get out of its own way. The
application itself was developed in a toolset called CellWorks and is used
to control and manage manufacturing equipment on one of our production lines.
Here's the lowdown on what I'm seeing:

First, the system has 1000Base-SX network controllers in it and is seeing
~400-600 pps on average. I would not consider this a heavy load (we've been
able to easily sustain 10X this level during file transfer operations).
Second, the system has 2GB of memory, of which only 1GB is actually in use,
and the dynamic buffer cache is set to 5%-20%. In short, there's plenty of
memory. Third, physical I/O to disk is very light. We verified this both
with the tried and true method of watching the activity lights on the drives
themselves and by loading the entire application onto a solid state disk
drive, which gave no noticeable performance improvement.

In case you were wondering, this app uses a very small (<25MB)
memory-resident database to keep track of where production units are on a
manufacturing line. As a unit moves through the line, each piece of
equipment sends a message up to the app saying, in effect, "I've got
unit #?????, can I proceed?" The app looks in its database and, if
everything is A-OK, says "yeah, go ahead." This is not rocket science....
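For a sense of scale, the lookup described above can be sketched in a few lines. This is a hypothetical model, not CellWorks code: it assumes the in-memory database is a map from unit ID to the station where the unit is expected, and `handle_request` is an illustrative name. If the real lookup is anything like this, it should cost microseconds, so multi-second response times probably come from somewhere other than the lookup itself.

```python
import time


def handle_request(db: dict[str, str], unit_id: str, station: str) -> bool:
    """Hypothetical: approve ("go ahead") only if unit_id is expected at station."""
    # A missing unit yields None, which never equals a station name -> refuse.
    return db.get(unit_id) == station


if __name__ == "__main__":
    # Hypothetical line: 100,000 units spread over 40 stations.
    db = {f"U{n:05d}": f"ST{n % 40:02d}" for n in range(100_000)}
    start = time.perf_counter()
    approved = sum(handle_request(db, f"U{n:05d}", f"ST{n % 40:02d}") for n in range(100_000))
    elapsed = time.perf_counter() - start
    print(f"{approved} of 100,000 requests approved in {elapsed:.3f} s")
```

The timing loop is only there to show the order of magnitude on whatever machine runs it.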

However, we're seeing 4-, 8-, and 10-second delays in responding to
equipment requests when things really get cranked up. All I can find is a
bunch of apps waiting in "GBL_OTHER_IO_WAIT" (many of the key modules at
greater than 90%). During peak loads one CPU runs at roughly 60% and the
other three at about 20%, and I can't find a single process that is being
priority-suspended. We've already got PRM running on this system. I'm using
one of the tuned kernel configuration parameter sets and have only fiddled
with maxdsiz_64bit and maxssiz_64bit as per HP's recommendation (which also
had no impact).
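The CPU figures above (one processor at roughly 60%, three at roughly 20%) can be summarized with a little arithmetic. The sketch below is just that calculation: about 1.2 CPUs' worth of work spread over four processors, with the busiest one at twice the mean. A skew like this is consistent with, though it does not prove, much of the work being funnelled through a single thread or lock.

```python
def cpu_summary(per_cpu_pct: list[float]) -> tuple[float, float, float]:
    """Return (CPUs' worth of work, mean utilization %, busiest CPU / mean)."""
    total_pct = sum(per_cpu_pct)
    mean_pct = total_pct / len(per_cpu_pct)
    return total_pct / 100.0, mean_pct, max(per_cpu_pct) / mean_pct


# Peak-load figures from the post: one CPU at ~60%, three at ~20%.
busy, mean, ratio = cpu_summary([60, 20, 20, 20])
print(f"{busy:.1f} CPUs busy, mean {mean:.0f}%, busiest CPU at {ratio:.1f}x the mean")
# prints: 1.2 CPUs busy, mean 30%, busiest CPU at 2.0x the mean
```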

Does anyone have any thoughts on just what exactly GBL_OTHER_IO_WAIT is, or
where I could look next? Or would you agree that we've thrown enough
hardware at this baby and the reality is it's just not gonna run any better
regardless?

Any relevant thoughts would be greatly appreciated....

KEN


Post by Simon Water » Tue, 16 Jan 2001 03:57:01



> I've got one that just about has me beat.

It always feels like that before you solve it. Otherwise you won't feel
satisfied when you do fix it.

> Dynamic buffer cache is set to 5%-20%.  IE: there's plenty of memory.

I'm a bit confused: you say you only fiddled with max[ds]siz_64bit, but
5%-20% is not the default DBC setting IIRC, so I assume you picked this up
from one of the pre-configured kernel parameter sets? (Which one?)

Based on past experience, I suspect HP don't test these preset kernel
configurations in great detail. Certainly I had one where the values
were in conflict with the constraints, so SAM wouldn't let you
change any values (vi /stand/system worked 8-).

> In case you were wondering, this app uses a very small (<25MB) memory
> resident database to keep track of where production units are on a
> manufacturing line.

What kind of database:
relational/object/in-house/proprietary/commercial..
Does it perform any locking? If so is it at row, page, object or table
level?
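Lock granularity matters because a coarse lock turns concurrent requests into a queue. The following is a hypothetical illustration, not a model of the application in question: `TableLockedDB` and `RowLockedDB` are made-up classes, and the `hold` parameter stands in for whatever slow work might happen while a lock is held. Two threads updating different units take roughly twice as long under a single table lock as under per-row locks.

```python
import threading
import time
from typing import Optional


class TableLockedDB:
    """Table-level locking: a single lock guards every row."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._rows: dict = {}

    def update(self, key: str, value: str, hold: float = 0.0) -> None:
        with self._lock:
            time.sleep(hold)  # stand-in for slow work done while the table lock is held
            self._rows[key] = value

    def get(self, key: str) -> Optional[str]:
        with self._lock:
            return self._rows.get(key)


class _Row:
    def __init__(self) -> None:
        self.lock = threading.Lock()
        self.value: Optional[str] = None


class RowLockedDB:
    """Row-level locking: a short-lived index lock plus one lock per row."""

    def __init__(self) -> None:
        self._index_lock = threading.Lock()  # guards only the key -> row mapping
        self._rows: dict = {}

    def _row(self, key: str) -> _Row:
        with self._index_lock:
            return self._rows.setdefault(key, _Row())

    def update(self, key: str, value: str, hold: float = 0.0) -> None:
        row = self._row(key)
        with row.lock:
            time.sleep(hold)  # stand-in for slow work done while the row lock is held
            row.value = value

    def get(self, key: str) -> Optional[str]:
        row = self._row(key)
        with row.lock:
            return row.value


def time_two_updates(db, hold: float) -> float:
    """Update two *different* keys from two threads; return wall-clock seconds."""
    threads = [threading.Thread(target=db.update, args=(k, "seen", hold)) for k in ("U1", "U2")]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    print(f"table lock: {time_two_updates(TableLockedDB(), 0.3):.2f} s")  # serialized: at least 0.6 s
    print(f"row locks:  {time_two_updates(RowLockedDB(), 0.3):.2f} s")   # concurrent: about 0.3 s
```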

> This is not rocket science....

I've always been of the opinion that since rockets work by conservation
of momentum, rocket science can't be much more than O level (read: high
school?!) physics. Whereas understanding the inner workings of a modern
computer operating system with over 17,000 files, supporting multiple
processors and many complex and detailed standards, now that is an
intellectual challenge, covering everything from quantum mechanics to
queuing theory.

> And all I can find is a bunch
> of apps waiting in "GBL_OTHER_IO_WAIT"..(many of the key modules greater than
> 90%).

I'd be really interested to know how fast it runs on one CPU (call it a
gut feeling), though I guess experimenting with a semiconductor(?)
production facility is a bit out of order.

> Does anyone have any thoughts on just what exactly is GBL_OTHER_IO_WAIT, or
> where I could look next??  Or would you agree that we've thrown enough
> hardware at this baby and the reality is it's just not gonna run any better
> regardless.

Throwing hardware at a problem is rarely the best solution.

Other I/O wait states: I suspect it means exactly what it says, i.e. all
the waiting for I/O not covered by the other I/O wait states.

Documentation on wait states seems scarce, but most of the other states
seem to be related to disk I/O (and a little terminal I/O), so anything
involving database locking, spin locks or buffer cache could well end
up dumped in this category.

Is the database in your application always memory resident, or was it
disk resident before and you ran into some other problem?


Post by Rick Jones » Wed, 17 Jan 2001 01:59:03


A tusc trace of one or more of the processes involved might yield
interesting data. If the process is multi-threaded, be sure to have
tusc display thread IDs. Include the option for timestamps. For even
more data, and more overhead, ask tusc to trace both syscall entry
_and_ exit. Look for the big time gaps.
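Finding the big time gaps in a long trace is easy to automate. The script below is a sketch that assumes each trace line begins with a timestamp in seconds (optionally in parentheses); that layout is an assumption, not a documented tusc format, so adjust `TIMESTAMP_RE` to whatever your tusc version actually prints. With entry and exit both traced, a large gap between a syscall's entry and its exit is time spent blocked in that call. For multi-threaded traces, filter down to one thread ID first, since gaps between interleaved threads mean little.

```python
import re
import sys
from collections.abc import Iterable

# ASSUMPTION: every trace line starts with a timestamp in seconds, e.g.
#   "12.345678 read(5, ...) = 12"   or   "( 12.345678) read(5, ...) = 12"
# Adjust this pattern to the timestamp format your tusc version emits.
TIMESTAMP_RE = re.compile(r"^\s*\(?\s*(\d+\.\d+)\)?\s+(.*)$")


def find_gaps(lines: Iterable[str], threshold: float) -> list[tuple[float, str, str]]:
    """Return (gap_seconds, line_before, line_after) for every gap >= threshold, largest first."""
    gaps = []
    prev_ts, prev_text = None, None
    for line in lines:
        m = TIMESTAMP_RE.match(line.rstrip("\n"))
        if not m:
            continue  # skip lines with no recognizable timestamp
        ts, text = float(m.group(1)), m.group(2)
        if prev_ts is not None and ts - prev_ts >= threshold:
            gaps.append((ts - prev_ts, prev_text, text))
        prev_ts, prev_text = ts, text
    return sorted(gaps, key=lambda g: g[0], reverse=True)


if __name__ == "__main__" and len(sys.argv) > 1:
    # usage: python find_gaps.py TRACE_FILE [THRESHOLD_SECONDS]
    threshold = float(sys.argv[2]) if len(sys.argv) > 2 else 1.0
    with open(sys.argv[1]) as trace:
        for gap, before, after in find_gaps(trace, threshold):
            print(f"{gap:9.3f} s between:\n    {before}\n    {after}")
```

Running it as `python find_gaps.py trace.txt 0.5` (file names are placeholders) lists every gap of half a second or more, largest first.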

Also, while ~600 packets per second is something that should not make a
GbE interface or even an L3000 break a networking sweat, it might be
interesting to take a tcpdump trace of 60 seconds of traffic. Get the
latest tcpdump/libpcap from www.tcpdump.org. I would suggest writing
to a binary file with the -w option, and then post-processing it later
with -r (the manpage will add clarity :)
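Post-processing can also be scripted. The sketch below reads the capture file directly rather than through `tcpdump -r`, assuming it is in the classic libpcap format that `tcpdump -w` writes (pcapng files are not handled). It reports packets per second, to check the ~400-600 pps figure, and the largest silence between packets, which can be lined up against the slow responses.

```python
import struct
import sys
from collections import Counter

PCAP_MAGIC_USEC = 0xA1B2C3D4  # classic libpcap format, microsecond timestamps
PCAP_MAGIC_NSEC = 0xA1B23C4D  # classic libpcap format, nanosecond timestamps


def summarize_pcap(data: bytes) -> tuple[dict[int, int], float]:
    """Return (packets per whole second, largest inter-packet gap in seconds)."""
    if len(data) < 24:
        raise ValueError("too short for a pcap global header")
    # The magic number tells us the byte order the file was written in.
    for endian in ("<", ">"):
        magic = struct.unpack(endian + "I", data[:4])[0]
        if magic in (PCAP_MAGIC_USEC, PCAP_MAGIC_NSEC):
            break
    else:
        raise ValueError("not a classic pcap file")
    frac_scale = 1e9 if magic == PCAP_MAGIC_NSEC else 1e6
    record_hdr = struct.Struct(endian + "IIII")  # ts_sec, ts_frac, incl_len, orig_len

    per_second = Counter()
    largest_gap, prev_ts = 0.0, None
    offset = 24  # skip the 24-byte global header
    while offset + record_hdr.size <= len(data):
        ts_sec, ts_frac, incl_len, _orig_len = record_hdr.unpack_from(data, offset)
        offset += record_hdr.size + incl_len  # skip over the captured packet bytes
        ts = ts_sec + ts_frac / frac_scale
        per_second[ts_sec] += 1
        if prev_ts is not None:
            largest_gap = max(largest_gap, ts - prev_ts)
        prev_ts = ts
    return dict(per_second), largest_gap


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as f:
        counts, gap = summarize_pcap(f.read())
    for sec in sorted(counts):
        print(sec, counts[sec])
    print(f"largest gap between packets: {gap:.6f} s")
```

Something like `python pcap_summary.py capture.pcap` (both names are placeholders) prints one line per second of capture followed by the largest gap.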

rick jones
--
ftp://ftp.cup.hp.com/dist/networking/misc/rachel/
these opinions are mine, all mine; HP might not want them anyway... :)
feel free to email, OR post, but please do NOT do BOTH...
my email address is raj in the cup.hp.com domain...


Post by Mark Land » Wed, 17 Jan 2001 05:49:43


On Sun, 14 Jan 2001 02:47:02 GMT, "KENNETH NICAISE"


>Hello All,

>I've got one that just about has me beat.

Well you don't mention what flavor of 11.x you are running, but I
would first make sure you know what all the latest "performance
patches" are for your system, and consider applying those which you
don't have yet.

1. Linux NFS Server and HP-UX Client performance problem

Hello,

I have a performance problem with a Linux (SuSE 8.0) NFS server and an HP-UX
10.20 client. If I try to copy 100 small files (200 bytes each) from the
HP-UX client disc to the Linux NFS server, the copy takes 25 seconds and an
rm of the same files takes 23 seconds. With an HP-UX NFS server and HP-UX
client it takes 3.5 seconds. I know about the NFS problem with small files,
but is there a way to tune the HP-UX/Linux connection to HP-UX/HP-UX speed
(or faster)?

thanks and regards
Rolf

2. Eudora & Speakable Items

3. Samba Performance Problem with HPUX11 and pwrite

4. XBOX Console

5. Java: Performance problem using Thread.sleep()

6. Zip keyboard

7. performance problem

8. Warp won't Install !!!

9. DB Performance problems under 10.20 - shmmax?

10. performance problem

11. Weir network performance problem

12. Performance problems with parallel compiler directives