Quote:> A big problem in large-scale scientific and engineering simulations is
> that there is typically very little cache reuse. There is very often
> only about one FLOP (floating-point operation) per memory reference.
> This is what leads to the DRAM bottleneck. Peak performance of a
> cache-based microprocessor may be around 1 GFLOP, but you're lucky to
> get 10% of this in a real application. (To IBM's credit, they've
> designed their systems with this in mind.)

I'm sure some hardware designer at IBM would be pleased to hear this, but being a software man, I don't know who he/she is. Off the top of my head, all I can say is that IBM produces general
purpose systems, not ones tuned purely for compute-intensive applications. Most supercomputer
vendors seem to have gone out of business because they couldn't find enough customers willing to
pay the price premium that would be required for that kind of CPU-to-primary-memory bandwidth.
Quote:> You're right about new features (such as speculative execution)
> addressing this, but in my experience they aren't very effective so far.

If you take a look at what it takes to deliver 100 megaflops on multiplication of numbers in
IEEE floating-point format, assuming your code is entirely in cache, you need two incoming
streams of 800 MB/sec and one outgoing stream of 800 MB/sec. That's 2.4 GB/sec. I'm not a
hardware expert, but I suspect that it would be very expensive to deliver a memory and bus
system with ten times that throughput. On a 256-byte bus, that would require about 100 million bus
transactions per second, which doesn't seem altogether infeasible, but then I'm a layperson in
this area. Find me a multi-billion dollar market and I'll try to have IBM deliver such a
computer to your door :-)
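The back-of-envelope arithmetic above can be sketched out explicitly. This assumes 8-byte IEEE 754 double-precision operands (the post doesn't state the operand width, so that's my assumption):

```python
# Bandwidth needed to sustain 100 Mflops of multiplications,
# assuming 8-byte double-precision operands (an assumption;
# the operand width isn't stated above).
flops = 100e6          # 100 million multiplications per second
bytes_per_operand = 8  # one IEEE 754 double

# Each multiply reads two operands and writes one result.
read_bw  = 2 * flops * bytes_per_operand   # two 800 MB/sec incoming streams
write_bw = 1 * flops * bytes_per_operand   # one 800 MB/sec outgoing stream
total_bw = read_bw + write_bw              # 2.4 GB/sec total

# Ten times that throughput on a 256-byte-wide bus:
bus_width = 256                            # bytes moved per bus transaction
transactions_per_sec = 10 * total_bw / bus_width

print(total_bw)              # 2.4e9 bytes/sec
print(transactions_per_sec)  # ~94 million transactions/sec
```

The exact figure comes out to about 94 million transactions per second, which rounds to the 100 million quoted above.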