BLAS and new LibFFM routine for Alpha

Post by Kazushige Go » Thu, 21 Jun 2001 21:55:48

Hi, this is an announcement of a BLAS library and new LibFFM routines for Alpha.

  This is an optimized BLAS library for Alpha, including ...

         1. Level 1, Level 2, Level 3
         2. Some extended Level 1
         3. Compaq's extended routines (GEMA, GEMS, GEMT)
         4. Some LAPACK routines (LASWP, GETF2, GETRF, GETRS)

  The Level 1, GEMV, GER, and GEMM (Level 3) routines are written in assembler.
The Level 3 GEMM routines in particular run close to theoretical peak
performance (SGEMM: 94.5% of peak, DGEMM: 92.5% of peak, on a 667MHz
21264 with DDR cache).  Small-matrix performance is also much better
than before (faster than ATLAS and than a plain unrolled assembler routine).
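For readers who have not used GEMM, here is a minimal reference sketch in C of what the double-precision routine computes (C = alpha*A*B + beta*C, column-major storage, no transpose options), plus a helper that turns a timing into MFlops using the usual 2*m*n*k flop count.  This is not the assembler code from this library; the names ref_dgemm and gemm_mflops are made up for illustration.

```c
#include <stddef.h>

/* Reference (unoptimized) DGEMM: C = alpha*A*B + beta*C.
 * Column-major storage as in Fortran BLAS; no transpose options.
 * A is m x k (leading dimension lda), B is k x n (ldb), C is m x n (ldc).
 * ref_dgemm is a hypothetical name, not part of the announced library. */
void ref_dgemm(size_t m, size_t n, size_t k,
               double alpha, const double *a, size_t lda,
               const double *b, size_t ldb,
               double beta, double *c, size_t ldc)
{
    for (size_t j = 0; j < n; j++) {
        for (size_t i = 0; i < m; i++) {
            double sum = 0.0;
            for (size_t p = 0; p < k; p++)
                sum += a[i + p * lda] * b[p + j * ldb];
            c[i + j * ldc] = alpha * sum + beta * c[i + j * ldc];
        }
    }
}

/* MFlops for an m x n x k GEMM that took `seconds`, counting one multiply
 * and one add per inner-loop step (2*m*n*k flops in total). */
double gemm_mflops(size_t m, size_t n, size_t k, double seconds)
{
    return 2.0 * (double)m * (double)n * (double)k / seconds / 1.0e6;
}
```

A percent-of-peak figure is then the measured MFlops divided by the machine's theoretical peak.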

  Other features ...
         1. SMP systems are supported.  The Linpack peak performance on
            4 CPUs is 4430 MFlops (83% of peak).
         2. It works on Linux/Alpha and Tru64 UNIX.  I have not checked
            whether it works on *BSD.
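The 83% figure can be checked by hand under two assumptions that are not stated in the post: that each 21264 can complete one floating-point add and one multiply per cycle (2 flops per cycle), and that the four CPUs run at the same 667MHz as the machine quoted for GEMM.  A small sketch of that arithmetic, with hypothetical helper names:

```c
/* Theoretical peak in MFlops, assuming `flops_per_cycle` results per clock
 * on each of `ncpu` processors running at `mhz` MHz. */
double peak_mflops(int ncpu, double mhz, double flops_per_cycle)
{
    return ncpu * mhz * flops_per_cycle;
}

/* Fraction of peak reached by a measured rate. */
double fraction_of_peak(double measured_mflops, double peak)
{
    return measured_mflops / peak;
}
```

With these assumptions, peak_mflops(4, 667.0, 2.0) is 5336 MFlops, and fraction_of_peak(4430.0, 5336.0) comes out at roughly 0.83, which matches the quoted number.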

LibFFM :

 I've just restarted development of the Free Fast Math libraries.  The goal
is "faster and more accurate".  I think that is really difficult, but
I'll try.  Anyway, I've made new SIN/COS/TAN routines.  The SIN/COS
routines are much more accurate and faster than before (I use
polynomial functions, not table algorithms).  The TAN routine is slightly

# I don't know whether anyone wants a new libFFM.  I will also make
# vectorized routines that are compatible with Compaq's VLIB (CXML).
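To illustrate what "polynomial functions, not table algorithms" can look like, here is a minimal sketch of a polynomial sine on a reduced range, evaluated with Horner's rule.  The coefficients are plain Taylor terms picked for illustration; they are certainly not the coefficients used in libFFM, and the name poly_sin is hypothetical.  A real routine would also need argument reduction and would likely use better-tuned coefficients.

```c
/* Sketch of a polynomial sine for |x| <= pi/4 using Horner's rule.
 * The coefficients are plain Taylor terms (-1/3!, 1/5!, ...), chosen for
 * illustration only; they are NOT the coefficients used in libFFM.
 * No argument reduction and no special-value handling is done here. */
double poly_sin(double x)
{
    const double c3  = -1.0 / 6.0;
    const double c5  =  1.0 / 120.0;
    const double c7  = -1.0 / 5040.0;
    const double c9  =  1.0 / 362880.0;
    const double c11 = -1.0 / 39916800.0;
    const double c13 =  1.0 / 6227020800.0;
    double z = x * x;
    /* sin(x) ~= x + x*z*(c3 + z*(c5 + z*(c7 + z*(c9 + z*(c11 + z*c13))))) */
    double p = c3 + z * (c5 + z * (c7 + z * (c9 + z * (c11 + z * c13))));
    return x + x * z * p;
}
```

The whole evaluation is a short chain of multiplies and adds with no memory lookups, which is the usual argument for the polynomial approach on a machine with fast floating-point pipelines.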

The sources are available at

good luck,


1. libffm patch and BLAS routine

Hi, libffm and BLAS users,

I made a libffm patch that adds handling of exceptional values, and some
optimized BLAS routines.

1. exceptional-value handling patch for libffm-0.21.

  The current version of libffm is 0.21, but its routines cannot
  handle exceptional values (NaN, +-Inf, subnormal).  This patch makes
  them handle such values.  For example, the sqrt() routine computes
  subnormal values exactly (it is not emulated, so it is only
  a little slower than for normal values).
  Most routines are as fast as before, but the exp() routine is a little
  slower.

  ##  The atan/asin/acos routines are not finished yet.  ##

  I also added some (maybe) useful routines.

  sqrti         : calculates 1.0/sqrt(); it's fast.
  sqrtv/sqrtiv  : vectorized sqrt/sqrti routines.  They take
                  only 17 clocks per element.
                  For now, I have made only the double-precision version
                  for C.  If you want a single-precision version or a
                  FORTRAN version, please let me know.


  This patch is for TEST ONLY, so please do not include it in your packages.
  Please wait until the next public release.

  See at

2. optimized BLAS routine.

  Some optimized BLAS routines are available (these figures were measured
  on a 600MHz 21164 LX machine with 2MB L3 cache).

   sgemm/dgemm : 960/820 MFlops constantly.
                 Can you hear "Alpha resonance"?  I do not know why,
                 but I can hear a kind of resonance from 21164.

   sgemv/dgemv : if the data is in cache, they run at about 700 MFlops.

   sdot, ddot, dsdot, zdotu, zdotc, cdotu, cdotc :
                 pretty fast, but I do not know the exact value
                 (maybe 650 to 700 MFlops).

   saxpy/daxpy : joke :-)

  See at

  I'm now working on caxpy/daxpy, cgemv, and zgemv routines.  They should be
  available by next week.
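  For reference, the Level 1 dot and axpy operations named above compute the following.  This is a plain C sketch of the double-precision variants with hypothetical names (ref_ddot, ref_daxpy); it handles only positive increments, unlike the real BLAS interface, and is in no way the optimized code.

```c
#include <stddef.h>

/* Reference DDOT: returns the sum of x[i*incx] * y[i*incy], i in [0, n).
 * Only positive increments are handled in this sketch. */
double ref_ddot(size_t n, const double *x, size_t incx,
                const double *y, size_t incy)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += x[i * incx] * y[i * incy];
    return sum;
}

/* Reference DAXPY: y := alpha*x + y, same increment convention. */
void ref_daxpy(size_t n, double alpha, const double *x, size_t incx,
               double *y, size_t incy)
{
    for (size_t i = 0; i < n; i++)
        y[i * incy] += alpha * x[i * incx];
}
```

  Both loops perform one multiply and one add per element, so an MFlops figure for them is 2*n divided by the run time in microseconds.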

