Hi.
I'm porting my new perspective filler to asm, and
for the last few hours I've been trying to work out how many CPU cycles each
asm instruction takes (genuine Intel Pentium CPU).
I already have a working perspective filler, and even though its
inner loop contains only 15 asm instructions (just adds and movs), it manages only
15 fps at 640x480x8 (a simple 10-polygon star).
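To put the 15 fps in perspective, here is a back-of-envelope sketch in C of the cycle budget per pixel. The clock rate is an assumption (it is not stated anywhere above, so plug in your own), and the function names are made up for illustration. It also assumes the whole 640x480 screen is filled once per frame, which a star does not do.

```c
/* Back-of-envelope cycle budget per pixel.
   clock_hz is an ASSUMPTION supplied by the caller; the post does not state it.
   Function names are hypothetical, for illustration only. */

/* Pixels written per second if every screen pixel is drawn once per frame. */
unsigned long pixels_per_second(unsigned long width,
                                unsigned long height,
                                unsigned long fps)
{
    return width * height * fps;
}

/* CPU cycles available per pixel at the given (assumed) clock rate. */
double cycles_per_pixel(double clock_hz,
                        unsigned long width,
                        unsigned long height,
                        unsigned long fps)
{
    return clock_hz / (double)pixels_per_second(width, height, fps);
}
```

With a hypothetical 100 MHz clock, cycles_per_pixel(100e6, 640, 480, 15) comes out a bit under 22 cycles per pixel. The star covers only part of the screen, so if the fill loop dominates the frame time, the real cost per drawn pixel is higher than that.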
And the way the CPU behaves makes me sick.
Example:
Tloop:
nop
nop
dec esi
jnz Tloop
Each iteration takes 2 cycles (the first two nops -- 1 cycle, dec and jnz --
another one).
Tloop:
nop
dec esi
jnz Tloop
Each iteration again takes 2 cycles.
BUT
Tloop:
dec esi
jnz Tloop
TAKES 5 CYCLES >8\
That's just the simplest example... believe me, the others are even more
*...
mov al, byte ptr [ecx+Offset]
takes 18 cycles (with ECX ranging over 0...65536) -- because of cache misses.
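A rough way to see where those misses come from is a toy cache model in C. Everything in it is an assumption, not a measurement: it models an 8 KB, 2-way set-associative data cache with 32-byte lines and LRU replacement (which is how I understand the original Pentium's L1 data cache to be organised; other models differ), it only counts misses rather than cycles, and the function name is made up.

```c
#include <stdint.h>
#include <string.h>

/* ASSUMED cache geometry: 128 sets * 2 ways * 32 bytes = 8 KB. */
enum { LINE_BYTES = 32, WAYS = 2, SETS = 128 };

typedef struct {
    uint32_t tag[WAYS];
    int      valid[WAYS];
    int      lru;          /* index of the way to evict next */
} cache_set;

/* Read one byte at offsets 0, stride, 2*stride, ... below `range`,
   repeating the whole sweep `passes` times from a cold cache.
   Returns how many of those reads miss the modelled cache. */
unsigned long count_misses(uint32_t range, uint32_t stride, int passes)
{
    static cache_set sets[SETS];
    unsigned long misses = 0;

    if (stride == 0)
        return 0;                         /* avoid an endless loop */
    memset(sets, 0, sizeof sets);         /* start with an empty cache */

    for (int p = 0; p < passes; p++) {
        for (uint32_t addr = 0; addr < range; addr += stride) {
            uint32_t line = addr / LINE_BYTES;
            cache_set *s = &sets[line % SETS];
            uint32_t tag = line / SETS;
            int way = -1;

            for (int w = 0; w < WAYS; w++)
                if (s->valid[w] && s->tag[w] == tag)
                    way = w;              /* hit */

            if (way < 0) {                /* miss: replace the LRU way */
                misses++;
                way = s->lru;
                s->tag[way] = tag;
                s->valid[way] = 1;
            }
            s->lru = 1 - way;             /* the other way is now LRU (2 ways only) */
        }
    }
    return misses;
}
```

In this model count_misses(65536, 1, 1) gives 2048, one miss per 32-byte line, while count_misses(65536, 256, 1) (256 being a hypothetical row pitch) misses on every one of its 256 reads. A second pass over the full 64 KB misses just as often as the first, because the range does not fit in 8 KB; a 4 KB range, by contrast, misses only on the first pass.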
I need to know how a working filler is done, just for comparison with my
code,
and I need some docs on Intel's optimization rules (pairing rules, pairing
integer/floating-point instructions...).
And hey, my L2 cache is 512 kB (like many others) -- is there any way to
force the
cache to hold the entire bitmap to minimize misses??
Big thanks in advance for any help... I'm really counting on this, or else I'll
go mad...
CU
PS: as soon as I get this code working, my current sources will be placed
in the /archives section of my page.
--
***> http://www.veryComputer.com/ <*** collection of my demos / sources ***