FastMath routines? libffm.a?


Post by fchi.. » Thu, 06 Jan 2000 04:00:00



The DEC Alpha had/has some custom math libraries which were tuned
for the Alpha.  They apparently were way faster than dumb old libm.a.

Has anyone bothered to write some of these for the new P III SSE
 (SIMD+MMX) or G4 AltiVec units?

Libraries like that could really speed up JPEG/MPEG and OpenGL stuff
on Linux as well as make it a better scientific system now that
 P III and Athlons are so fast.  It would be nice to put SGI out
of their misery and have another reason not to buy a Sun for those
floating point users out there.

I don't know how much kernel support is needed to allow these things to
work.  The old MMX was a waste and took forever to switch modes.  I
would assume that Intel fixed that for the new stuff and that Motorola
did it right in the first place.  The kernel should only have to save
and restore the vector unit's state when it interrupts a vector
calculation, but that would still mean kernel support is needed just
to get the libraries working.

Any ideas?


 
 
 

FastMath routines? libffm.a?

Post by Christopher Brow » Fri, 07 Jan 2000 04:00:00




>The DEC Alpha had/has some custom math libraries which were tuned
>for the Alpha.  They apparently were way faster than dumb old libm.a.

>Has anyone bothered to write some of these for the new P III SSE
> (SIMD+MMX) or G4 AltiVec units?

>Libraries like that could really speed up JPEG/MPEG and OpenGL stuff
>on Linux as well as make it a better scientific system now that
> P III and Athlons are so fast.  It would be nice to put SGI out
>of their misery and have another reason not to buy a Sun for those
>floating point users out there.

>I don't know how much kernel support is needed to allow these things to
>work.  The old MMX was a waste and took forever to switch modes.  I
>would assume that Intel fixed that for the new stuff and that Motorola
>did it right in the first place.  The kernel should only have to save
>and restore the vector unit's state when it interrupts a vector
>calculation, but that would still mean kernel support is needed just
>to get the libraries working.

>Any ideas?

There is a notable distinction between IA-32 and Alpha (and I can't
speak for PPC):

   IA-32 has FP instructions specifically for trig and transcendental
   functions.

   Alpha doesn't.

The "original" libm.a for Alpha was coded in C, wasn't terribly tuned,
and provided rather slower performance than you'd get with the
hardware FP instructions on IA-32.

It was *extremely* worthwhile to hand-craft these functions in Alpha
assembly language.

Note that this has nothing to do with MMX, AltiVec, or the Alpha
equivalent (whose name escapes me...).

There is merit to creating libraries to try to take advantage of
"MMX-like" technologies; I suspect they'd need to be fairly
application-oriented.
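
To make "application-oriented" a bit more concrete, here is a minimal sketch in C of the kind of routine such a library might contain: a single-precision `y += a*x` loop that uses SSE intrinsics when the compiler targets SSE and falls back to plain scalar code otherwise. The function name `saxpy_sketch` is hypothetical and is not taken from libmmx or any other library mentioned here; only the `_mm_*_ps` intrinsics from `<xmmintrin.h>` are real.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__SSE__)
#include <xmmintrin.h>  /* SSE intrinsics: __m128 and the _mm_*_ps family */
#endif

/* Hypothetical helper (not from any library in this thread):
 * computes y[i] += a * x[i] for i in [0, n). */
void saxpy_sketch(size_t n, float a, const float *x, float *y)
{
    size_t i = 0;
    assert(n == 0 || (x != NULL && y != NULL));

#if defined(__SSE__)
    /* Process four floats per iteration; unaligned loads and stores are
     * used so the caller does not have to align the arrays. */
    __m128 va = _mm_set1_ps(a);
    for (; i + 4 <= n; i += 4) {
        __m128 vx = _mm_loadu_ps(x + i);
        __m128 vy = _mm_loadu_ps(y + i);
        _mm_storeu_ps(y + i, _mm_add_ps(vy, _mm_mul_ps(va, vx)));
    }
#endif

    /* Scalar tail (or the whole loop when SSE is not available). */
    for (; i < n; ++i)
        y[i] += a * x[i];
}
```

Calling `saxpy_sketch(7, 2.0f, x, y)` handles the first four elements in one vector step and the remaining three in the scalar tail. A real library would presumably pick routines like this per application (image filters, vertex transforms and so on) rather than try to speed up libm as a whole.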

<http://www.veryComputer.com/~rfisher/Research/Libmmx/libmmx.html>
<http://www.veryComputer.com/>

I expect there would be merit to having some MMX/ 3DNow/ AltiVec/
... support in XFree86, but it's not worth *thinking* about 'til
XFree86 4.0 is actually released.

--
"Using Java  as a general purpose application  development language is
like  going big  game  hunting  armed with  Nerf  weapons."
-- Author Unknown


 
 
 

1. BLAS and new LibFFM routine for Alpha

Hi,  this is an announcement of BLAS and new LibFFM routines for Alpha.

BLAS :
  This is an optimized BLAS library for Alpha, including ...

         1. Level 1, Level 2, Level 3
         2. Some extended Level 1
         3. Compaq's extended routines (GEMA, GEMS, GEMT)
         4. Some LAPACK routines (LASWP, GETF2, GETRF, GETRS)

  Level 1, GEMV, GER, and GEMM (Level 3) routines are written in assembler.
In particular, the Level 3 GEMM routine runs near theoretical peak
performance (SGEMM: 94.5% of peak, DGEMM: 92.5% of peak, on a 667MHz
21264 with DDR cache).  Small-matrix performance is also much better
than before (faster than ATLAS and than a merely unrolled assembler routine).

  Other features ...
         1. Supports SMP systems.  The Linpack peak performance on
            4 CPUs is 4430 MFlops (83% of peak).
         2. Works on Linux/Alpha and Tru64 UNIX.  I have not checked
            whether it works on *BSD.
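
The "% of peak" figures above can be checked with a little arithmetic. The sketch below assumes the 21264 can retire 2 floating-point operations per cycle (one add and one multiply); that figure is an assumption brought in for illustration and is not stated in the announcement. The helper names are hypothetical.

```c
#include <assert.h>

/* Hypothetical helper: theoretical peak in MFlops,
 * i.e. clock (MHz) x flops per cycle x number of CPUs. */
double peak_mflops(double clock_mhz, double flops_per_cycle, int cpus)
{
    assert(cpus > 0);
    return clock_mhz * flops_per_cycle * cpus;
}

/* Hypothetical helper: fraction of the theoretical peak actually reached. */
double efficiency(double measured_mflops, double peak)
{
    assert(peak > 0.0);
    return measured_mflops / peak;
}
```

Under that assumption, `peak_mflops(667.0, 2.0, 4)` gives 5336 MFlops for the 4-CPU system, and `efficiency(4430.0, 5336.0)` is about 0.83, which matches the 83% quoted for Linpack. For a single CPU the peak would be 1334 MFlops, so 94.5% and 92.5% of peak correspond to roughly 1261 and 1234 MFlops for SGEMM and DGEMM.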

LibFFM :

 I've just restarted development of the Free Fast Math libraries.  The target
is "faster and more accurate".  I think it's really difficult, but
I'll try it.  Anyway, I've made new SIN/COS/TAN routines.  The SIN/COS
routines are much more accurate and faster than before (I will use
polynomial functions, not table algorithms).  The TAN routine is slightly
faster.
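
For readers wondering what "polynomial functions, not table algorithms" looks like in practice, here is a minimal sketch in C. It is not the libFFM code: it uses plain Taylor coefficients (a tuned library would more likely use minimax coefficients and a more careful range reduction), and every name in it is hypothetical. The idea is to reduce the argument to r in [-pi/4, pi/4] and then evaluate a short polynomial for sin(r) or cos(r), depending on the quadrant.

```c
#include <assert.h>

#define HALF_PI 1.57079632679489661923

/* Taylor polynomial for sin(r), terms through r^13, in Horner form. */
static double sin_poly(double r)
{
    double r2 = r * r;
    return r * (1.0 + r2 * (-1.0 / 6 + r2 * (1.0 / 120 + r2 * (-1.0 / 5040
             + r2 * (1.0 / 362880 + r2 * (-1.0 / 39916800
             + r2 * (1.0 / 6227020800.0)))))));
}

/* Taylor polynomial for cos(r), terms through r^14, in Horner form. */
static double cos_poly(double r)
{
    double r2 = r * r;
    return 1.0 + r2 * (-1.0 / 2 + r2 * (1.0 / 24 + r2 * (-1.0 / 720
         + r2 * (1.0 / 40320 + r2 * (-1.0 / 3628800
         + r2 * (1.0 / 479001600 + r2 * (-1.0 / 87178291200.0)))))));
}

/* Hypothetical sketch of a table-free sine. The naive reduction below
 * loses accuracy for large |x|, so the input range is restricted. */
double sketch_sin(double x)
{
    assert(x > -1.0e6 && x < 1.0e6);  /* also rejects NaN */

    /* k = nearest integer to x / (pi/2); r = x - k*(pi/2) is in [-pi/4, pi/4]. */
    double t = x / HALF_PI;
    long long k = (long long)(t + (t >= 0.0 ? 0.5 : -0.5));
    double r = x - (double)k * HALF_PI;

    /* Quadrant 0..3, made non-negative for negative k. */
    int q = (int)(((k % 4) + 4) % 4);
    switch (q) {
    case 0:  return  sin_poly(r);
    case 1:  return  cos_poly(r);
    case 2:  return -sin_poly(r);
    default: return -cos_poly(r);
    }
}
```

With these truncation points the polynomial error on [-pi/4, pi/4] is on the order of 1e-14 or better, so for moderate arguments the result should agree with the libm `sin` to roughly a dozen decimal digits. The appeal over a table-driven method is that the whole evaluation is a short chain of multiplies and adds with no memory lookups.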

# I don't know anyone who wants a new libFFM.  I will also make
# vectorized routines compatible with Compaq's VLIB (CXML).

The sources are available at

http://members.jcom.home.ne.jp/kgoto/

good luck,

2. bochs-2000 and LinuxPPC

3. Version 0.21 of free fast math routines (libffm, preliminary version) now released !

4. No KDE or GNOME

5. libffm patch and BLAS routine

6. Aopen Sound Cards

7. Free fast math routines (libffm, preliminary version) being released !

8. Help with zone file format

9. libffm 0.21 announcement, nicer version !??

10. libffm & ByteMark 2.1

11. libffm.0.28 released !!

12. Any date routines?

13. STANDARDIZING PPP SETUP ROUTINE