Floating Point Exception

Floating Point Exception

Post by Tone Kokal » Wed, 08 Sep 1999 04:00:00



Hi All !

Please, can somebody help me.

I have the following problem. I have access to two Linux/Alpha machines
(PC21164-P7), but I cannot do anything really useful with them, because
I get "Floating Point Exceptions" and the console reports something
like:

arithmetic trap at 0000000120090248: 11 0000000800000000
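
For readers trying to interpret a trap line like the one above, here is a minimal sketch in C. It assumes that the two-digit value after the address (11 here) is the Alpha exception summary and that its low bits are laid out in the order listed in exc_name below. The bit names and the function decode_exc_summary are assumptions made for illustration; nothing in this thread confirms them.

```c
#include <stddef.h>
#include <stdio.h>

/* Assumed names of exception summary bits 0..6, lowest bit first:
   software completion, invalid operation, division by zero, overflow,
   underflow, inexact result, integer overflow. */
static const char *const exc_name[7] = {
    "SWC", "INV", "DZE", "OVF", "UNF", "INE", "IOV"
};

/* Write the names of the bits set in `summary` into buf, separated by
   spaces, and return how many names were written. */
int decode_exc_summary(unsigned long summary, char *buf, size_t size)
{
    int count = 0;
    size_t used = 0;

    if (size > 0)
        buf[0] = '\0';
    for (int bit = 0; bit < 7; bit++) {
        if (summary & (1UL << bit)) {
            int n = snprintf(buf + used, size - used, "%s%s",
                             count ? " " : "", exc_name[bit]);
            if (n < 0 || (size_t)n >= size - used)
                break;  /* buffer too small: stop decoding */
            used += (size_t)n;
            count++;
        }
    }
    return count;
}
```

Under that assumption, decoding 0x11 gives "SWC UNF", which would point at a floating-point underflow. The last number on the trap line is presumably a mask of the registers the trapping instruction would have written, but that is a guess as well.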

It is not that the applications I am running are bad: they are well
established and run quite well on other platforms.

There is one more thing that bothers me: whenever I run some program
(for example a compiler), the console reports messages like:

<sc 208(11ffffbba,3e8,64)>

I don't know what these messages mean.

I am running kernel 2.0.37 and Debian GNU/Linux 2.1 (slink).

Does anybody have the same problems or know how to resolve them?

Thanks,
Tone

--
+------------------------------------------------------------------------+

| Department of Physical and Organic Chemistry Phone: x 386 61 177 3520  |
| Jozef Stefan Institute                         Fax: x 386 61 177 3811  |
| Jamova 39, SI-1000 Ljubljana                                           |
| SLOVENIA                                                               |
+------------------------------------------------------------------------+

 
 
 

Floating Point Exception

Post by Greg Linda » Wed, 08 Sep 1999 04:00:00



> I have the following problem. I have access to two Linux/Alphas
> PC21164-P7.  But I am not able to do something really useful with
> them, since I got "Floating Point Exceptions" and the console report
> something like:

> arithmetic trap at 0000000120090248: 11 0000000800000000

Unfortunately the FAQ never got an entry for this. Can someone write one?

-- g

 
 
 

Floating Point Exception

Post by J. Josh Fen » Wed, 08 Sep 1999 04:00:00



> Hi All !

> Please, can somebody help me.

> I have the following problem. I have access to two Linux/Alphas
> PC21164-P7.  But I am not able to do something really useful with
> them, since I got "Floating Point Exceptions" and the console report
> something like:

> arithmetic trap at 0000000120090248: 11 0000000800000000

> It is not that the applications I am running are bad, since they are
> well established and runs on other platforms quite well.

I had this quite often. I assume you use GCC (egcs, more precisely). My
guess is that some variable in your program is used before a value is
assigned to it. For example:

....

double x;
double y[10];

printf("x = %le, y0 = %le\n", x, y[0]);

....

On other platforms, variables are initialized to zero (maybe), but on
Alpha, variables are initialized to some special value (like NaN or
something), which may be the cause of the problem.

> There is one more thing that bothers me and that is that console reports
> messages like, whenever I run some program (for example compiler or some
> other programs):

> <sc 208(11ffffbba,3e8,64)>

> And I don't know what this messages means.

> I am running kernel 2.0.37 and Debian/GNU Linux slink 2.1.

I doubt this may be a debugging message from the kernel. Since I moved to
kernel 2.2.10 they have not appeared again, AFAIK. I would recommend CPML,
which can be downloaded at

http://www.unix.digital.com/linux/software.htm

It really soups up programs.

--
J. Joshua Feng

 
 
 

Floating Point Exception

Post by Tone Kokal » Thu, 09 Sep 1999 04:00:00




> > Hi All !

> > Please, can somebody help me.

> > I have the following problem. I have access to two Linux/Alphas
> > PC21164-P7.  But I am not able to do something really useful with
> > them, since I got "Floating Point Exceptions" and the console report
> > something like:

> > arithmetic trap at 0000000120090248: 11 0000000800000000

> > It is not that the applications I am running are bad, since they are
> > well established and runs on other platforms quite well.

> I had this quite often. I assume you use GCC (egcs more precisely). My
> guess is that some
> variable in your program is used before a value is assigned to it. For
> example:

First, thanks for the answer. But it is really not linked with GCC. I can
compile the program with FORTRAN90 on a DEC Alpha, linking the libraries
statically; the result runs 100% OK on DEC but fails with an FPE on
Linux/GNU Alpha.

This morning I did a little debugging. I found out that the program
(written in FORTRAN77 and compiled with g77) crashed while executing a
FORMAT statement. More precisely, it crashed in
../../../../libf2c/libI77/wref.c:211. And that is terrifying: to crash in
FORMAT.

The only "specialized" program that runs successfully is my own (written
in C), since I wrote a small FPE signal handler. But I would rather omit
this FPE signal handler in numerical programs, since I don't know what
its impact would be (maybe all obtained values will be meaningless).
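
One way to gain some confidence in the results, sketched here in C under stated assumptions, is to check computed values explicitly before trusting them. The helper first_non_finite is hypothetical, isfinite() is the C99 macro from <math.h>, and the sketch assumes the platform delivers NaN and infinity values instead of trapping on them.

```c
#include <math.h>
#include <stddef.h>

/* Return the index of the first entry that is NaN or infinite,
   or -1 if every entry is an ordinary finite number. */
long first_non_finite(const double *values, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        if (!isfinite(values[i]))
            return (long)i;
    }
    return -1;
}
```

A numerical program could call this on its result arrays after each major step and stop with a diagnostic as soon as the return value is not -1.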

Best regards,
Tone

--
+------------------------------------------------------------------------+

| Department of Physical and Organic Chemistry Phone: x 386 61 177 3520  |
| Jozef Stefan Institute                         Fax: x 386 61 177 3811  |
| Jamova 39, SI-1000 Ljubljana                                           |
| SLOVENIA                                                               |
+------------------------------------------------------------------------+

 
 
 

Floating Point Exception

Post by J. Josh Fen » Thu, 09 Sep 1999 04:00:00





> > > Hi All !                    

> > > Please, can somebody help me.

> > > I have the following problem. I have access to two Linux/Alphas
> > > PC21164-P7.  But I am not able to do something really useful with
> > > them, since I got "Floating Point Exceptions" and the console report
> > > something like:

> > > arithmetic trap at 0000000120090248: 11 0000000800000000

> > > It is not that the applications I am running are bad, since they are
> > > well established and runs on other platforms quite well.

> > I had this quite often. I assume you use GCC (egcs more precisely). My
> > guess is that some
> > variable in your program is used before a value is assigned to it. For
> > example:

> First, thanks for the answer. But it is really not linked with GCC. I could
> compile theprogram with FORTRAN90 on DEC Alpha with static linking the
> libraries, which will
> run 100% OK on DEC, but will FPE fail on Linux/GNU Alpha.

I think g77 translates the Fortran program to a C program first and
then uses gcc.

> This morning I did a little of debugging. I found out that the program
> (written in FORTRAN77 and compiled with g77) crashed while executing
> FORMAT sentence. More precisely it crashed in
> ../../../../libf2c/libI77/wref.c:211. And that is terrifying - to crash in
> FORMAT.

Try this little Fortran program:

        real x, y

        write(*,*) x, y

        end

It will give an FPE. Now modify it to

        real x, y

        x = 0
        y = 0
        write(*,*) x, y

        end

Then it runs OK. It seems that the I/O routine (wref.c) cannot handle
'undefined' values (there must be some bits set in undefined variables
before you assign any values to them, such that those bits confuse the
I/O routine).

> The only "specialized" program that runs successfully is my own (written
> in C), since I wrote small FPE signal handler. But I would really omit
> this FPE signal handler in numerical programs, since I don't know what
> would be its impact (maybe all obtained values will be meaningless).

I have this kind of concern too, so I don't use any signal handler in my
programs, and if they run OK, then I have to assume or believe that
everything works the way it should. Frankly, I don't really know the
technical reason for the FPE. I have tried to hack the GCC compiler on
Alpha, but it is just beyond my comprehension.

--
J. Joshua Feng                            (O)608/262-3640
Electrical & Computer Engr., Univ. of Wisconsin - Madison

 
 
 

Floating Point Exception

Post by Greg Linda » Thu, 09 Sep 1999 04:00:00



> In other platforms, variables are initialized to zero (maybe), but in
> Alpha, variables
> are initialized to some special value (like NaN or something), which may
> be the cause
> of the problem.

This is incorrect. On both other platforms and the Alpha, processes
start with zeroed memory.

> I doubt this may be a debugging message from the kernel.

But it is. It's an unknown system call.

-- g

 
 
 

Floating Point Exception

Post by Greg Linda » Thu, 09 Sep 1999 04:00:00



> I think g77 will translate the Fortran program to a C program first
> and then uses gcc.

No.

> Then it runs ok. It seems that the I/O routine (wref.c) can not handle
> 'undefined' values (there must be some bits set in undefined variables
> before
> you assign any values to them such that those bits will confuse I/O
> routine).

No. x and y in your example get their values at random off the stack.
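
To illustrate why values picked up at random off the stack can matter, the following C sketch classifies a raw 64-bit pattern the way it would be read as an IEEE 754 double. The function name classify_bits is hypothetical, and it is an assumption here (not stated in this thread) that NaN, infinity and subnormal operands are the ones that trap on Alpha when the code is not built for full IEEE behaviour.

```c
#include <stdint.h>

/* Classify a 64-bit pattern as an IEEE 754 double-precision value:
   1 sign bit, 11 exponent bits, 52 fraction bits. */
const char *classify_bits(uint64_t bits)
{
    uint64_t exponent = (bits >> 52) & 0x7ff;
    uint64_t fraction = bits & ((UINT64_C(1) << 52) - 1);

    if (exponent == 0x7ff)              /* all exponent bits set */
        return fraction ? "nan" : "infinite";
    if (exponent == 0)                  /* no exponent bits set */
        return fraction ? "subnormal" : "zero";
    return "normal";
}
```

Any leftover stack word whose upper bits are mostly zero, such as a small integer, reads as a subnormal double, so an uninitialized real can easily fall into one of the special classes even though process memory starts out zeroed.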

-- g

 
 
 

Floating Point Exception

Post by Tone Kokal » Fri, 10 Sep 1999 04:00:00




> > In other platforms, variables are initialized to zero (maybe), but in
> > Alpha, variables
> > are initialized to some special value (like NaN or something), which may
> > be the cause
> > of the problem.

> This is incorrect. On both other platforms and the Alpha, processes
> start with zeroed memory.

> > I doubt this may be a debugging message from the kernel.

> But it is. It's an unknown system call.

> -- g

Hello Greg and Josh and others !!!

First, I do think that the FPE is not due to GCC, since programs compiled
with DEC FORTRAN90 have nothing to do with GCC, and yet the FPE occurs.

Yesterday I compiled kernel 2.2.12. It runs OK, and there are no more <sc *>
messages.

However, I still don't know how to get rid of the FPEs. Do you have any
recipes for getting rid of them?

Best regards,
Tone

--
+------------------------------------------------------------------------+

| Department of Physical and Organic Chemistry Phone: x 386 61 177 3520  |
| Jozef Stefan Institute                         Fax: x 386 61 177 3811  |
| Jamova 39, SI-1000 Ljubljana                                           |
| SLOVENIA                                                               |
+------------------------------------------------------------------------+

 
 
 

Floating Point Exception

Post by Eberhard Bur » Fri, 10 Sep 1999 04:00:00



> First, I do think that FPE is not due to GCC, since compiled programs with
> DEC FORTAN90 have
> nothing to do with GCC, but yet FPE occurs.

> Yesterday I compiled kernel 2.2.12. It runs OK, and there are no more <sc *>
> messages.

> However, I still don't know how to get rid of FPEs. Do You have any recipes
> how
> to get rid of that???

Hmm, it seems that two weeks ago, just after I had installed this nice
SX164, my news system was misconfigured, so my call for help on the
very same subject didn't get through. I've since learned the following,
and I'd appreciate getting even more (or better) information on it.

Here we go:
 1) The Alpha is not fully IEEE compliant by itself and differs from
    other processors in that the FPU exception cannot be switched off.
    When a division by zero occurs, other processors represent the
    result with a special bit pattern meaning "this result is invalid",
    while the Alpha raises an exception (the same goes for underflows,
    overflows and undefined operations, of course).
 2) The Alpha is faster in floating point. Part of this is because it
    doesn't have to treat those special bit patterns (NaN, INF and
    friends, which are defined somewhere in math.h on i86 Linux).
 3) Someone suggested using the -mieee switch to gcc to get the same
    behaviour as with the defaults on i86. I have not tried it though.
 4) Exception handling is easy:

   a) make a global variable to hold error codes
     #include <signal.h>

     volatile sig_atomic_t my_err;  /* type that is safe to set in a handler */

   b) write the error handler, which might be as lean as
     static void my_fpu_handler(int sig)
     {
       /* sig is the signal number (SIGFPE), not an FPU error code */
       my_err = sig;  /* my_err defined globally and volatile */
     }

   c) install the handler:

     void (*oldhandler)(int);  /* keep a reference to the initial value
                                  so it can be restored later */

     oldhandler = signal(SIGFPE, my_fpu_handler);
     if (SIG_ERR == oldhandler) {
        /* you might want to write your own BARF(), of course */
        BARF("cannot install floating point exception handler");
     }

   d) instead of using the functions isnan(), finite() and friends to
    check results for validity, check the content of my_err and reset
    it to zero after having taken sensible measures.

 5) Code which neither uses exception handling nor the ieee-conformant
    finite() isnan() ... functions to test results is broken by
    design.

 6) the gcc-documentation is rather good, it's just a bit hard to read
    because it's in info format. Same for the glibc-documentation;
    -mieee and friends are described in the egcs node about
    architecture-dependent options, the signal() function is in the
    glibc docs. Both are definitely worth reading if you want to make
    use of the alpha's superior floating point performance.

kind regards,
--
Eberhard Burr    check http://www.uni-karlsruhe.de/~Eberhard.Burr/publickey.asc
                 for PGP Key -- #include <stddisc.h> -- electric cookie follows
Truly simple systems... require infinite testing.
                -- Norman Augustine

 
 
 

Floating Point Exception

Post by Tone Kokal » Sat, 11 Sep 1999 04:00:00



> hmm, seems when two weeks ago after I had just installed this nice
> SX164, my news system was misconfiged and thus my call for help in the
> very same subject didn't get thru... I've since learned the following
> and I'd appreciate to get even more (or better) information on it.

> Here we go:
>  1) the Alpha is not fully IEEE compliant by itself and differs from
>     other processors in that the FPU exception cannot be switched off.
>     When a division by zero appears then other processors represent
>     the result with a special bit pattern which has the meaning "this
>     result is invalid" while the alpha fires an exception (same for
>     underflows, overflows and undefined operations, of course)
>  2) the alpha is faster in floating point. Part of this is because it
>     doesn't have to treat those special bit patterns (NaN, INF and
>     friends, which are defined somewhere in math.h on i86 Linux).
>  3) someone suggested using the -mieee switch to gcc to get the same
>     behaviour as with the defaults on i86. I have not tried it though.
>  4) Exception handling is easy:

>    a) make a global variable to hold error codes
>      volatile int my_err;

>    b) write the error handler, which might be as lean as
>      static void my_fpu_handler(int err)
>      {
>        my_err = err;  /* my_err defined globally and volatile */
>      }

>    c) install the handler:

>      void (*oldhandler)();  /* keep a reference to the initial value
>                                so it can be restored later */

>      oldhandler = signal(SIGFPE, my_fpu_handler);
>      if (SIG_ERR == oldhandler) {
>         /* you might want to write your own BARF(), of course */
>         BARF("cannot install floating point exception handler");
>     }

>    d) instead of using the functions isnan(), finite() and friends to
>     check results for validity, check the content of my_err and reset
>     it to zero after having taken sensible measures.

>  5) Code which neither uses exception handling nor the ieee-conformant
>     finite() isnan() ... functions to test results is broken by
>     design.

>  6) the gcc-documentation is rather good, it's just a bit hard to read
>     because it's in info format. Same for the glibc-documentation;
>     -mieee and friends are described in the egcs node about
>     architecture-dependent options, the signal() function is in the
>     glibc docs. Both are definitely worth reading if you want to make
>     use of the alpha's superior floating point performance.

Thank you, Eberhard. I will see what I can do. These are certainly nice
hints. Once I have studied them and performed some tests I will probably
be back; I may have a couple of questions then.

Best regards,
Tone

--
+------------------------------------------------------------------------+

| Department of Physical and Organic Chemistry Phone: x 386 61 177 3520  |
| Jozef Stefan Institute                         Fax: x 386 61 177 3811  |
| Jamova 39, SI-1000 Ljubljana                                           |
| SLOVENIA                                                               |
+------------------------------------------------------------------------+