I finally tried to measure memory move speed on our Sparc Server 1000.
It has four 50 MHz CPU modules (TI,TMS390Z55 according to prtconf).
What I find is that with my own version of memcpy (which I am pretty
proud of; it is a lot better on big transfers than the one in the C
library), a Sparc 10 with a 40 MHz CPU (TI,TMS390Z50) runs 20% faster.
With Sun's own memcpy the Sparc 10 is about 30% faster (its speed
depends on alignment and cache collisions).
Now I have read these white papers about this fantastic XDbus, with
throughput up to 250 MB/s. I have also studied this magnificent cache.
I have also read about these bcopy/bzero accelerators. Where are they?
Not in the C library memcpy, at least. And from my device writer
experience I have not seen anything very fast. Are they used somewhere
in the kernel under Solaris 2.2? Under Solaris 2.3?
Now, just moving memory around with memcpy is not very useful in
itself. But I have an application that reads a lot of memory, does a
lot of bit fiddling, checks the data a bit and demultiplexes the data
(turns a big matrix around). This application moves data around about
as fast as my own heavily tuned memcpy (I hope)!
I also see that even though the CPU should be 25% faster, this
application speeds up by less than 15%. It seems I have hit the
memory move barrier.
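
To put a rough number on that gap, here is a small sketch. It assumes a simple two-part model that is not from any measurement: a fraction m of the run time on the 40 MHz machine is bound by memory and does not change with the clock, while the rest scales with the clock ratio. The function name is made up for illustration.

```c
#include <assert.h>

/* Assumed model: T_new / T_old = m + (1 - m) / clock_ratio,
 * where m is the fraction of the old run time that does not
 * scale with the CPU clock. Solving for m gives the value below.
 * non_scaling_fraction() is a hypothetical helper name. */
double non_scaling_fraction(double clock_ratio, double observed_speedup)
{
    double ideal = 1.0 / clock_ratio;        /* time ratio if everything scaled */
    double actual = 1.0 / observed_speedup;  /* time ratio actually seen */

    assert(clock_ratio > 1.0 && observed_speedup > 0.0);
    return (actual - ideal) / (1.0 - ideal);
}
```

With a clock ratio of 50/40 = 1.25 and a speedup of 1.15, this gives m of roughly 0.35. Since the speedup is below 15%, at least about a third of the run time would not be following the clock under this model. The model also assumes the memory system is equally fast on both machines, which the memcpy numbers suggest is not the case, so treat it as a rough bound only.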
One more thing. When I run several processes moving memory in parallel,
the speed fell by 25%. We should be pretty far from the 250 MB/s
limit of the XDbus.
Copying 2 MB 100 times:

  on Sparc Server 1000:  10.5 seconds   19 MB/s    on the fastest
                         12.5 seconds   16 MB/s    on the slowest
    Very dependent on small differences, probably because of
    cache lengths and collisions.
  on Sparc 10:            8.1 seconds   24.7 MB/s
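
For anyone who wants to repeat the measurement, here is a minimal sketch of this kind of test. It uses the C library memcpy, not my tuned version, and it times with clock(), which is an assumption made for the sketch. The function names are made up for illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Throughput in MB/s when `mbytes` megabytes are copied `reps` times
 * in `seconds` seconds, e.g. 2 MB * 100 / 10.5 s = about 19 MB/s. */
double throughput_mb_per_s(double mbytes, int reps, double seconds)
{
    assert(seconds > 0.0);
    return mbytes * reps / seconds;
}

/* Copy `nbytes` bytes `reps` times with the C library memcpy.
 * Returns the CPU time used in seconds, or -1.0 on bad size or
 * allocation failure. Timing with clock() is an assumption. */
double time_memcpy(size_t nbytes, int reps)
{
    static volatile char sink;  /* read back so the copies are not optimized away */
    char *src, *dst;
    clock_t start, end;
    int i;

    if (nbytes == 0)
        return -1.0;
    src = malloc(nbytes);
    dst = malloc(nbytes);
    if (src == NULL || dst == NULL) {
        free(src);
        free(dst);
        return -1.0;
    }
    memset(src, 0x5a, nbytes);  /* touch every page before timing starts */
    memset(dst, 0, nbytes);

    start = clock();
    for (i = 0; i < reps; i++) {
        src[0] = (char)i;       /* make each pass distinct */
        memcpy(dst, src, nbytes);
        sink = dst[nbytes - 1];
    }
    end = clock();

    free(src);
    free(dst);
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Calling time_memcpy(2 * 1024 * 1024, 100) and passing the result to throughput_mb_per_s(2.0, 100, seconds) gives a figure comparable to the ones above. On a fast machine the elapsed time can round to zero with clock(), so raise the repetition count if that happens.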
Thorbjørn Willoch | Schlumberger Geco-Prakla
Phone: +47-67575548 | Jongsåsvn 4