Cache Control

Post by bill » Wed, 07 Sep 1994 12:32:47



Does the cache control function work on a 4D/320S and should I
use it when talking to a memory mapped device on the VME?

Thanks

 
 
 

Cache Control

Post by Dave Ols » Wed, 07 Sep 1994 17:17:44


| Does the cache control function work on a 4D/320S and should I

Yes.

| use it when talking to a memory mapped device on the VME?

No.  I/O addresses are never cached anyway.
--

The most beautiful things in the world are              |   Dave Olson
those from which all excess weight has been             |   Silicon Graphics
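
Since I/O addresses are not cached, no cache flush is needed around device accesses, but the compiler still has to be told not to keep device values in CPU registers. A minimal sketch in C, assuming the device registers are already mapped into the process; the helper names are hypothetical, and the VME mapping step itself is device-specific and not shown:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: write one 32-bit device register.
 * "regs" would point at the mapped device. The volatile qualifier
 * keeps the compiler from eliding or merging the access. */
void reg_write32(volatile uint32_t *regs, size_t index, uint32_t value)
{
    assert(regs != NULL);
    regs[index] = value;
}

/* Hypothetical helper: read one 32-bit device register.
 * Every call performs a real load from the mapped address. */
uint32_t reg_read32(volatile uint32_t *regs, size_t index)
{
    assert(regs != NULL);
    return regs[index];
}
```

For experimenting without hardware, an ordinary uint32_t array can stand in for the mapped region: write a value with reg_write32 and read it back with reg_read32.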


 
 
 

1. Controlling Data Cache

I have LOTS of questions here about _data cache_ on SGI machines.

Can someone there please explain to me how a programmer can control the use
of the _CPU data cache_ on a multiple-CPU SGI machine?

I would like to control what data is copied from memory into each of the
individual caches (one cache per CPU), and when the data is copied.

Is this possible?

If it is not possible, is there some way to tell how efficiently the caches
are being utilized during a run?

It's my understanding that multi-CPU SGI machines use shared memory.
Apparently when a program is run on several CPUs in parallel, these machines
incur a certain amount of overhead making sure that when one CPU computes
a new value for some variable, each of the CPU caches (one for each CPU)
reflects the updated value for that variable (if the variable is represented
in any of the CPU caches).  I believe this is called prevention of "cache
missing".  When more CPUs are crunching in parallel, more cross-checking must
be done between the caches, creating more overhead.

Apparently the newer SGI machines (e.g., Power Challenge) crunch numbers much
faster than the older ones.  HOWEVER, the newer ones do not check for cache
missing proportionately faster than the older ones.  Consequently, while a
program runs much faster on a newer SGI using N CPUs than on an older SGI using
N CPUs, the speedup one gets from going from N CPUs to N+1 CPUs is MUCH LESS
than it was on the older SGIs.

When we run the NPARC code on the older SGIs (e.g., renegade), we cut our CPU
time by almost 50% by going from 1 CPU to 2.  But on the Power Challenge, we
only get a 30% speedup.  The speedup in adding a 3rd CPU to a Power Challenge
run is dismal.  We think the machine is bottlenecked in checking for cache
missing.
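
To put the two-CPU figures above on a common scale, they can be converted to speedup and parallel efficiency. This is a rough sketch, assuming "almost 50%" is taken as exactly 50% and that "30% speedup" means a 30% cut in run time; both readings are assumptions, and the function names are made up for illustration:

```c
#include <assert.h>

/* Speedup implied by cutting run time by the given fraction (0 <= f < 1). */
double speedup_from_time_cut(double fraction_cut)
{
    assert(fraction_cut >= 0.0 && fraction_cut < 1.0);
    return 1.0 / (1.0 - fraction_cut);
}

/* Parallel efficiency: speedup divided by the number of CPUs used. */
double parallel_efficiency(double speedup, int ncpus)
{
    assert(ncpus > 0);
    return speedup / (double)ncpus;
}
```

Under those assumptions, a 50% cut on two CPUs is a speedup of 2.0 (100% efficiency), while a 30% cut is a speedup of about 1.43 (about 71% efficiency). If "30% speedup" instead means a speedup factor of 1.3, the efficiency is 65%.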

However, it is also possible that the problem could be drastically reduced if
our arrays were structured differently.  But how should they be structured?
When an array element is accessed by a program, are other data moved into
cache too?  Which?  When are they, and when are they not?
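
As general background on the question above: caches move data in fixed-size lines, so touching one array element brings its neighbors in the same line along with it. When two CPUs repeatedly write different variables that share a line, the line bounces between their caches (often called false sharing). The sketch below only illustrates the layout idea; the 128-byte line size is an assumption, not a documented Power Challenge figure, and all names are hypothetical:

```c
#include <assert.h>
#include <stddef.h>

/* ASSUMPTION: 128-byte cache lines. The real line size depends on the
 * machine and cache level; check the hardware documentation. */
#define CACHE_LINE_BYTES 128

/* One partial sum per CPU, padded so each one fills a whole cache line. */
struct padded_sum {
    double value;
    char pad[CACHE_LINE_BYTES - sizeof(double)];
};

/* Byte offset of element i in a plain, tightly packed array of doubles. */
size_t packed_offset(size_t i)
{
    return i * sizeof(double);
}

/* Byte offset of the slot for a given CPU in an array of padded_sum. */
size_t padded_offset(size_t cpu)
{
    assert(sizeof(struct padded_sum) == CACHE_LINE_BYTES);
    return cpu * sizeof(struct padded_sum);
}

/* Returns 1 if two byte offsets into a line-aligned array fall in the
 * same cache line, 0 otherwise. */
int same_cache_line(size_t offset_a, size_t offset_b)
{
    return (offset_a / CACHE_LINE_BYTES) == (offset_b / CACHE_LINE_BYTES);
}
```

With these assumptions, elements 0 and 1 of a packed double array share a line, while slots 0 and 1 of a padded_sum array do not. Giving each CPU its own padded slot, or its own contiguous block of the array, is one common way to cut down on lines being passed back and forth between caches.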

Is there some way to manage the cache to improve the performance of a program
running on a Power Challenge?

Thanks.

--
Robert Michael Wood                | "Do you know what it's like to die

                                   |  It's redundant." - Bakersfield P.D.
