|> :> I'm looking for optimization hints for the Alpha 21164. Any input will
|>
|> As far as compiling goes: Get a recent egcs (egcs.cygnus.com), install it.
|> Compile with something like:
|>
|> -fomit-frame-pointer -funroll-loops -O5 -finline-functions -ffast-math
|>
|> and only use -mieee if you really need it (if you get weird Floating point
|> exceptions, try it).
But egcs still doesn't produce good code...
Some suggestions:
Avoid byte/short accesses: without the BWX extension of the 21164PC, each one
translates to at least 4 to 5 instructions. The sieve benchmark shows over 50%
speedup with an int array instead of char, despite the higher memory consumption.
BTW: A Pentium II loses more than half of its performance with the int array :-/
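To make the char-vs-int comparison concrete, here is a minimal sieve sketch in C. It is not the benchmark code referred to above; the function name count_primes and the FLAG_T macro are made up for illustration. Building it once as is and once with -DFLAG_T=char gives the two variants to time against each other.

```c
#include <stdlib.h>

/* Hypothetical switch (not from the original benchmark):
   compile with -DFLAG_T=char to get the byte-array variant. */
#ifndef FLAG_T
#define FLAG_T int
#endif

/* Count the primes <= limit with a plain sieve of Eratosthenes.
   Returns -1 if the flag array cannot be allocated. */
static long count_primes(long limit)
{
    FLAG_T *composite;
    long i, j, count = 0;

    if (limit < 2)
        return 0;
    composite = calloc((size_t)limit + 1, sizeof *composite);
    if (composite == NULL)
        return -1;

    for (i = 2; i <= limit; i++) {
        if (composite[i])
            continue;
        count++;
        if (i > limit / i)      /* i*i is past the end, nothing to mark */
            continue;
        for (j = i * i; j <= limit; j += i)
            composite[j] = 1;   /* the store whose width is under test */
    }
    free(composite);
    return count;
}
```

How big the difference is will depend on the CPU, the compiler and the cache size, so treat any number you get as specific to your box.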
Try to prefetch data and keep it in registers. I've made measurements with the
inner loop of the mpg123 decoder. It consists mainly of loops like the following
(already unrolled in the code):
float *a,*b;
for(n...)
{
sum=*a++ * *b++; /* multiply and accumulate MAC*/
sum+=*a++ * *b++;
/* 16 times total */
/* store sum... */
}
egcs-1.01 does a really poor job here and translates each line to roughly the
following:
ldt $f1,($1)
ldt $f2,($2)
mult $f1,$f2,$f3
addt $f4,$f3,$f4
lda $1,8($1)
lda $2,8($2)
This is very inefficient, since the mult always has to wait for its data (and the
memory latency is very high compared to the instruction cycle time). So the
pipeline is never filled and you can forget your MFLOPS...
To avoid that, you can prefetch the data:
ldt $f1,($1)
ldt $f2,($2)
ldt $f3,8($1)
ldt $f4,8($2)
/* etc */
mult $f1,$f2,$f0
mult $f3,$f4,$f3
/* etc */
addt $f0,$f3,$f0
/* etc */
I've tried this in assembler with 4 MACs (i.e. prefetching 8 values),
interleaving the mults with the ldts (after a mult its source registers are free
and can be loaded with the next data). This cut the needed processor cycles to 50%!
IMHO this can also be achieved with the right compiler hints (register variables)
and good coding.
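On the C side, the idea could look roughly like the sketch below. It is not the mpg123 source; the function name mac4 and the 4-wide grouping are assumptions for illustration. All eight operands are read into locals first, then multiplied, then summed, so the compiler is at least free to issue the loads ahead of the multiplies. Whether a given compiler keeps that order is not guaranteed, so check the generated assembler.

```c
/* Hypothetical helper (not from mpg123): multiply-accumulate over n floats,
   4 MACs per iteration, with the loads grouped ahead of the multiplies. */
static float mac4(const float *a, const float *b, int n)
{
    float sum = 0.0f;
    int i;

    for (i = 0; i + 4 <= n; i += 4) {
        /* fetch all eight operands into locals first ... */
        float a0 = a[i], a1 = a[i + 1], a2 = a[i + 2], a3 = a[i + 3];
        float b0 = b[i], b1 = b[i + 1], b2 = b[i + 2], b3 = b[i + 3];
        /* ... then four multiplies that do not depend on each other ... */
        float p0 = a0 * b0, p1 = a1 * b1, p2 = a2 * b2, p3 = a3 * b3;
        /* ... and only then the additions */
        sum += (p0 + p1) + (p2 + p3);
    }
    for (; i < n; i++)          /* leftover elements, one at a time */
        sum += a[i] * b[i];
    return sum;
}
```

Note that summing the products pairwise changes the order of the additions compared to a single running sum, so the last bits of the result can differ.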
Just my 0.02 (hey, where's the euro key?)
--
Bye
http://www.veryComputer.com/~acher/
"Oh no, not again !" The bowl of petunias