Core (was Re: I need a new 'hard drive'/ 'CPU')

Post by M. Ranjit Mathew » Mon, 16 Mar 1998 04:00:00



Quote:> A big problem in large-scale scientific and engineering simulations is
> that there is typically very little cache reuse.  There is very often
> only about one FLOP (floating-point operation) per memory reference.
> This is what leads to the DRAM bottleneck.  Peak performance of
> a cache-based microprocessor may be around 1 GFLOP, but you're lucky to
> get 10% of this in a real application. (To IBM's credit, they've
> designed their systems with this in mind.)

I'm sure some hardware designer at IBM would be pleased to hear this, but being a software man,
I don't know who he/she is. Off the top of my head, all I can say is that IBM produces
general-purpose systems, not ones tuned purely for compute-intensive applications. Most
supercomputer vendors seem to have gone out of business because they couldn't find enough
customers willing to pay the price premium that such CPU-to-primary-memory bandwidth would
require.

Quote:> You're right about new features (such as speculative execution)
> addressing this, but in my experience they aren't very effective so far.

If you take a look at what it takes to deliver 100 Megaflops on multiplication of numbers in
IEEE floating point format, assuming your code is entirely in cache, you need two incoming
operand streams of 800 MB/sec and one outgoing stream of 800 MB/sec (100 million 8-byte doubles
per second in each stream). That's 2.4 GB/sec. I'm not a hardware expert, but I suspect that it
would be very expensive to deliver a memory and bus system with ten times that throughput. On a
256-byte-wide bus, that would require 100 million bus transactions per second, which doesn't
seem altogether infeasible, but then I'm a layperson in this area. Find me a multi-billion
dollar market and I'll try to have IBM deliver such a computer to your door :-)
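
Just to make the arithmetic explicit, here is a tiny illustrative C program (a sketch only,
assuming 8-byte IEEE doubles and a two-loads-one-store access pattern) that recomputes the
same figures:

    #include <stdio.h>

    int main(void)
    {
        double mflops          = 100.0;  /* target multiply rate, in MFLOPS */
        double bytes_per_value = 8.0;    /* IEEE double precision           */
        double streams         = 3.0;    /* two operands in, one result out */

        /* Each stream carries one 8-byte double per multiplication. */
        double mb_per_stream = mflops * bytes_per_value;         /* MB/sec */
        double total_gb      = streams * mb_per_stream / 1000.0; /* GB/sec */

        printf("per stream: %.0f MB/sec, total: %.1f GB/sec\n",
               mb_per_stream, total_gb);
        return 0;
    }

Run as-is it prints 800 MB/sec per stream and 2.4 GB/sec total; scale mflops by ten to see
the 24 GB/sec a tenfold-faster machine would need.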
 
 
 

Core (was Re: I need a new 'hard drive'/ 'CPU')

Post by John McCalpin » Tue, 17 Mar 1998 04:00:00




Quote:>> A big problem in large-scale scientific and engineering simulations is
>> that there is typically very little cache reuse.  There is very often
>> only about one FLOP (floating-point operation) per memory reference.
>> This is what leads to the DRAM bottleneck.

Sometimes.....

I recently did a study of commercial science/engineering software
that had been optimized for Silicon Graphics servers.  Few
of the codes displayed a significant bandwidth bottleneck, some
displayed a modest bottleneck, and over half showed no significant
performance degradation due to bandwidth limitations.

That is not to say that the average user who tried to implement
these algorithms would get such good results ---
sometimes a lot of work and experience was required.

Quote:> If you take a look at what it takes to deliver 100 Megaflops on
> multiplication of numbers in IEEE floating point format, assuming
> your code is entirely in cache, you need two incoming streams of
> 800 MB/sec and one outgoing stream of 800 MB/sec. That's 2.4
> GB/sec.

Yes, although many algorithms have adds in there, too, which would
boost it to 200 MFLOPS for that 2.4 GB/s bandwidth (a multiply-add
pair costs two FLOPs for the same three 8-byte memory references).

Quote:> I'm not a hardware expert, but I suspect that it would be
> very expensive to deliver a memory and bus system with ten times
> that throughput.

A sustainable 2.4 GB/s is not far above what current external caches
provide, and should be easily sustainable in the next generation of
"killer micros".  On the other hand, I don't know of any systems at
less than about $200k that can sustain 2.4 GB/s from DRAM.

Measured results for lots of systems are at:

        http://www.cs.virginia.edu/stream/
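
For a flavour of what those results measure, a rough triad-style loop in C looks like this
(an illustration only, with a made-up array size and crude clock()-based timing; use the
actual benchmark at the URL above for real measurements):

    #include <stdio.h>
    #include <time.h>

    #define N 1000000                  /* big enough to spill out of cache */

    static double a[N], b[N], c[N];

    int main(void)
    {
        double  s = 3.0, secs, bytes;
        clock_t t0, t1;
        long    i;

        for (i = 0; i < N; i++) { b[i] = 1.0; c[i] = 2.0; }

        t0 = clock();
        for (i = 0; i < N; i++)
            a[i] = b[i] + s * c[i];    /* 2 FLOPs per 3 eight-byte references */
        t1 = clock();

        secs  = (double)(t1 - t0) / CLOCKS_PER_SEC;   /* crude, CPU-time based */
        bytes = 3.0 * sizeof(double) * N;
        printf("approx %.0f MB/s (a[0] = %f)\n", bytes / secs / 1e6, a[0]);
        return 0;
    }

The point of the kernel is its ratio: two floating-point operations for every 24 bytes moved,
which is why sustained memory bandwidth rather than peak FLOPS sets the ceiling.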

Quote:> Find me a multi-billion dollar market and I'll try to have IBM
> deliver such a computer to your door :-)

The global technical compute market for machines in the $100k US
and up range is somewhere in the $3e9-$4e9 range.  A fraction of
that has very high main memory bandwidth demands.  As caches get
bigger and as microprocessors get better bandwidth and as scalable
algorithms become more widespread, many of these high bandwidth
customers are migrating to scalable microprocessor-based systems.
--
John D. McCalpin, Ph.D.      Server System Architect
Server Platform Engineering  http://www.veryComputer.com/


 
 
 

Core (was Re: I need a new 'hard drive'/ 'CPU')

Post by Peter Moyl » Tue, 17 Mar 1998 04:00:00



>> A big problem in large-scale scientific and engineering simulations is
>> that there is typically very little cache reuse.  There is very often
>> only about one FLOP (floating-point operation) per memory reference.
>> This is what leads to the DRAM bottleneck.  Peak performance of
>> a cache-based microprocessor may be around 1 GFLOP, but you're lucky to
>> get 10% of this in a real application. (To IBM's credit, they've
>> designed their systems with this in mind.)

Since this is alt.usage.english:

That GFLOP should be GFLOPS.  You have to fit the "per second"
in somewhere.  (You realise, I'm sure, that you can afford to ignore
those people who think that "GFLOPS" is the plural of "GFLOP".)

Unless, of course, you mean a thousand million floating point operations
per memory reference.

--

 
 
 

'Unusual' cpu's

All,

The C standard is currently under revision (called C9X, what
else?).  We on the committee have heard about machines
with 'unusual' properties.  This was an argument in favour
of some of the latitude allowed by the current C standard.

We are currently back to discussing the same basic issues.
There is lots of talk about 'unusual' machines.  But do they
still exist?

Some of the issues that frequently crop up are:

   o Not everybody uses IEEE floating point

   o Not everybody uses two's complement

   o Not all machines use 'simple' byte addressing.

When pushed to name such machines and describe some of their
properties I am often reduced to naming a few special cases
that I happen to know about.
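
To make these concrete, here is a short sketch of C code whose output quietly depends on
exactly those properties (illustrative only, not taken from any committee paper):

    #include <stdio.h>
    #include <limits.h>
    #include <string.h>

    int main(void)
    {
        int           all_ones = ~0;
        double        one = 1.0;
        unsigned char bytes[sizeof one];

        /* Assumes two's complement: on one's-complement or
           sign-magnitude machines ~0 is not -1.                  */
        printf("~0 == -1?          %s\n", all_ones == -1 ? "yes" : "no");

        /* Assumes an 8-bit byte: CHAR_BIT is 9 on some
           word-addressed 36-bit machines.                        */
        printf("CHAR_BIT:          %d\n", CHAR_BIT);

        /* Assumes IEEE floating point (and a particular byte
           order): the stored bit pattern of 1.0 is only
           predictable on IEEE machines.                          */
        memcpy(bytes, &one, sizeof one);
        printf("first byte of 1.0: 0x%02X\n", bytes[0]);

        return 0;
    }

None of the three printed answers is pinned down by the standard, which is exactly the
latitude in question.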

Life would be much easier if there was a web page to consult.
So I have decided to create one.

Would people please send me references to 'unusual' machines
that they know about.  If possible please include:

    o A URL to the machine instruction set

    o A list of C compilers known to exist for the machine

    o A list of unusual, surprising or odd behaviours

    o Some idea of how widely used the machine is

I will collate the results and create a web page.

derek

--
Derek M Jones                                     tel: +44 (0) 1252 520 667

Applications Standards Conformance Testing       http://www.knosof.co.uk
