Hi,
I have released optimized BLAS routines (all of Level 1 and Level 3,
and some of Level 2).
Although the gemm_based Level 3 routines are available at netlib, I
wrote my own optimized (blocked) routines instead. The available
routines are listed under item 5 below.
Features
1. Fast and Fast
Most Level 3 routines run at over 1 GFLOPS on a 677MHz 21264 and
are faster than CXML or ATLAS (though they use my optimized
routine).
In particular, the complex routines (c-, z-) run much faster than
CXML and ATLAS.
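For readers who want to check a figure like this themselves, here is
a minimal sketch of the usual GFLOPS calculation for a GEMM call. The
helper names gemm_flops and gflops are made up for illustration and
are not part of this library. It assumes the conventional operation
counts (2*m*n*k for real GEMM, 8*m*n*k for complex GEMM), and you
supply the elapsed time from your own timer.

```c
#include <assert.h>

/* Conventional operation count for C := alpha*A*B + beta*C, where A is
   m-by-k and B is k-by-n: one multiply and one add per inner-product term.
   (Hypothetical helper, not part of the library.) */
double gemm_flops(double m, double n, double k, int is_complex)
{
    /* A complex multiply-add is counted as 8 real operations:
       6 for the multiply and 2 for the add. */
    double per_term = is_complex ? 8.0 : 2.0;
    return per_term * m * n * k;
}

/* Rate in GFLOPS for a given operation count and elapsed time in seconds.
   (Hypothetical helper, not part of the library.) */
double gflops(double flops, double seconds)
{
    assert(seconds > 0.0);
    return flops / seconds / 1e9;
}
```

For example, a 1000x1000x1000 dgemm performs 2e9 operations, so it
must finish in under two seconds to exceed 1 GFLOPS.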
2. Size-independent
These routines sustain high speed even when the matrices are large,
and small-matrix performance is better than before.
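The routines are described above as blocked. As a rough illustration
of the general idea only (this is not the actual kernel code, which
is hand-tuned for Alpha), a blocked matrix multiply works on tiles
small enough to stay in cache, so the speed does not collapse once
the matrices outgrow the cache. The function name blocked_matmul and
the tile size BLOCK are assumptions made for this sketch.

```c
#include <assert.h>
#include <stddef.h>

enum { BLOCK = 64 };  /* assumed tile size; a real library tunes this per cache */

static size_t min_sz(size_t a, size_t b) { return a < b ? a : b; }

/* C := C + A*B for n-by-n column-major matrices, processed tile by tile.
   (Illustrative sketch, not the library's implementation.) */
void blocked_matmul(size_t n, const double *a, const double *b, double *c)
{
    assert(a != NULL && b != NULL && c != NULL);
    for (size_t jj = 0; jj < n; jj += BLOCK) {
        size_t jmax = min_sz(jj + BLOCK, n);
        for (size_t kk = 0; kk < n; kk += BLOCK) {
            size_t kmax = min_sz(kk + BLOCK, n);
            for (size_t ii = 0; ii < n; ii += BLOCK) {
                size_t imax = min_sz(ii + BLOCK, n);
                /* Multiply one tile of A by one tile of B. */
                for (size_t j = jj; j < jmax; j++) {
                    for (size_t k = kk; k < kmax; k++) {
                        double bkj = b[k + j * n];
                        for (size_t i = ii; i < imax; i++)
                            c[i + j * n] += a[i + k * n] * bkj;
                    }
                }
            }
        }
    }
}
```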
3. Automatic architecture detection
You do not have to check your machine's architecture: these
Level 3 and Level 2 routines detect automatically whether it is
based on ev5 or ev6. You only need to link your program with this
library.
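How the detection is done is not described in this message. One
common way to structure this kind of library is to keep one kernel
per architecture and choose between them through a function pointer.
The sketch below shows only that selection step. The names cpu_kind,
daxpy_ev5, daxpy_ev6 and select_daxpy are hypothetical, and the two
kernels are plain C stand-ins rather than tuned code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result of some CPU detection step (not shown here). */
typedef enum { CPU_EV5, CPU_EV6 } cpu_kind;

typedef void (*daxpy_fn)(size_t n, double alpha, const double *x, double *y);

/* Stand-in for an ev5-tuned kernel: a plain loop computing y := alpha*x + y. */
static void daxpy_ev5(size_t n, double alpha, const double *x, double *y)
{
    for (size_t i = 0; i < n; i++)
        y[i] += alpha * x[i];
}

/* Stand-in for an ev6-tuned kernel: same result, unrolled by two. */
static void daxpy_ev6(size_t n, double alpha, const double *x, double *y)
{
    size_t i = 0;
    for (; i + 1 < n; i += 2) {
        y[i]     += alpha * x[i];
        y[i + 1] += alpha * x[i + 1];
    }
    if (i < n)  /* leftover element when n is odd */
        y[i] += alpha * x[i];
}

/* Pick the kernel that matches the detected CPU. */
daxpy_fn select_daxpy(cpu_kind cpu)
{
    assert(cpu == CPU_EV5 || cpu == CPU_EV6);
    return cpu == CPU_EV6 ? daxpy_ev6 : daxpy_ev5;
}
```

With this layout the caller links against a single library and never
needs to name the architecture.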
4. 'R' option is supported
Some Level 3 routines support the 'R' option (non-transposed and
conjugated), as CXML does.
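To make the four transpose options concrete, here is a small sketch
of which element of a column-major matrix A each option reads. The
function op_element is a hypothetical helper written for this
illustration, not an entry point of this library or of CXML. It
assumes 'R' means conj(A) without transposition, as stated above.

```c
#include <assert.h>
#include <complex.h>
#include <stddef.h>

/* Element (i, j) of op(A) for a column-major matrix A with leading
   dimension lda. (Hypothetical helper for illustration only.) */
double complex op_element(char trans, const double complex *a,
                          size_t lda, size_t i, size_t j)
{
    switch (trans) {
    case 'N': return a[i + j * lda];        /* op(A) = A          */
    case 'R': return conj(a[i + j * lda]);  /* op(A) = conj(A)    */
    case 'T': return a[j + i * lda];        /* op(A) = A^T        */
    case 'C': return conj(a[j + i * lda]);  /* op(A) = conj(A)^T  */
    default:
        assert(!"unknown trans option");
        return 0.0;
    }
}
```

The standard BLAS interface defines only 'N', 'T' and 'C', so code
that passes 'R' depends on a library with this extension.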
5. Available optimized routines.
All Level 1 routines (except for xerbla)
lsame, dcabs1, scabs1
isamax, idamax, icamax, izamax
saxpy, daxpy, caxpy, zaxpy
scopy, dcopy, ccopy, zcopy
sdot, sdsdot, ddot, dsdot
cdotc, cdotu, zdotc, zdotu
snrm2, dnrm2, scnrm2, dznrm2
srot, drot, csrot, zdrot
crotg, srotg, drotg, zrotg
srotm, drotm, srotmg, drotmg
sscal, dscal, cscal, zscal, csscal, zdscal
sasum, dasum, scasum, dzasum
sswap, dswap, cswap, zswap
Some Level 2 routines
sgemv, dgemv, cgemv, zgemv
sger, dger, cger, zger
strsv, dtrsv, ctrsv, ztrsv
All Level 3 routines
sgemm, dgemm, cgemm, zgemm
ssymm, dsymm, csymm, zsymm
strmm, dtrmm, ctrmm, ztrmm
strsm, dtrsm, ctrsm, ztrsm
ssyrk, dsyrk, csyrk, zsyrk
ssyr2k, dsyr2k, csyr2k, zsyr2k
chemm, zhemm
cherk, zherk
cher2k, zher2k
6. TO DO
The remaining Level 2 routines will be available soon. First,
though, I must re-optimize the GEMV and GER routines, because I do
not think they are well optimized yet.
7. Getting the source
ftp://www.netstat.ne.jp/pub/Linux/Linux-Alpha-JP/BLAS
Enjoy "GFLOPS" World!!
Thanks,