Hi, libffm and BLAS users,
I made a libffm patch that adds handling of exceptional values, and some optimized BLAS routines.
1. Exceptional-value handling patch for libffm-0.21.
The current version of libffm is 0.21, but its routines cannot
handle exceptional values (NaN, +-Inf, subnormals). This patch enables
them to handle such values. For example, the sqrt() routine calculates
subnormal values exactly (this is done without emulation, so it is only
a little slower than for normal values).
Most routines are as fast as before, but the exp() routine is a little slower.
### The atan/asin/acos routines are not finished yet. ###
Also, I added some useful (maybe??) routines.
sqrti : calculates 1.0/sqrt(x); it's fast.
sqrtv/sqrtiv : vectorized sqrt/sqrti routines. These routines take
only 17 clocks per element.
For now, I have made only the double-precision version for C. If you
want a single-precision version or a FORTRAN version, please
let me know.
This patch is for TEST ONLY, so please do not include it in your packages.
Please wait until the next public release.
2. Optimized BLAS routines.
Some optimized BLAS routines are available (these figures were measured
on a 600MHz 21164 LX machine with 2MB L3 cache).
sgemm/dgemm : 960/820 MFlops constantly.
Can you hear "Alpha resonance"? I do not know why,
but I can hear a kind of resonance from 21164.
sgemv/dgemv : if the data is in cache, it runs at about 700 MFlops.
sdot, ddot, dsdot, zdotu, zdotc, cdotu, cdotc :
pretty fast, but I do not know the exact values
(maybe 650 to 700 MFlops).
saxpy/daxpy : joke :-)
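For readers who want to relate the MFlops figures to running time, here is a small C sketch of how such figures are usually computed. It is an illustration under assumptions, not the benchmark used for the numbers above: it assumes the common convention of counting 2*m*n*k floating-point operations for a gemm and 2*n for a dot product or axpy. The ref_ddot and ref_daxpy functions are simplified unit-stride references with made-up names; the real BLAS routines also take stride arguments.

```c
#include <stddef.h>

/* Reference dot product, unit stride: 2n flops. Hypothetical name. */
double ref_ddot(size_t n, const double *x, const double *y)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += x[i] * y[i];
    return sum;
}

/* Reference axpy, unit stride: y = alpha*x + y, 2n flops. */
void ref_daxpy(size_t n, double alpha, const double *x, double *y)
{
    for (size_t i = 0; i < n; i++)
        y[i] += alpha * x[i];
}

/* Conventional flop count for a gemm with A m-by-k and B k-by-n
   (assumed convention: one multiply and one add per inner step). */
double gemm_flops(double m, double n, double k)
{
    return 2.0 * m * n * k;
}

/* Convert a flop count and an elapsed time in seconds to MFlops. */
double mflops(double flops, double seconds)
{
    return flops / seconds / 1.0e6;
}
```

Under this convention, a 1000 x 1000 x 1000 dgemm is 2.0e9 operations, so at 820 MFlops it would take about 2.4 seconds.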
Now I'm working on the caxpy/zaxpy, cgemv and zgemv routines. They should be
available by next week.