Strange floating point behavior

Post by Jeff Wa » Mon, 09 Feb 2004 08:44:09



I am working on code to implement IEEE 754-compliant floating point in
software (specifically in Java, but that does not matter).  I am
using the "testfloat" and "softfloat" programs to test my code and I
have come across a few examples that I cannot explain.

Assume 64-bit double format is in use and consider the operation
(0.00...01 * 2^-1023) + (0.11...11 * 2^-1023).  This is the smallest
possible positive number added to the largest possible subnormal
number.  The result should be (1.00...00 * 2^-1023).  The problem here
is that an exponent of -1023 indicates a subnormal number, which must
have a zero in the most significant bit of the significand.  Hence my
code outputs zero for a result and does not set any flags.  The "real"
answer, given by "testfloat" as well as an IA-32 and a Sun system, is
the number (1.00...00 * 2^-1022) with no exception flags thrown.  It
seems they round the number up so that it is normalized, but if that
were the case, should not an inexact exception be thrown?

Am I missing something completely obvious here, or is this some kind of
flaw in the IEEE 754 standard?

-Jeff Ward


Strange floating point behavior

Post by Norbert Juff » Mon, 09 Feb 2004 12:10:12



> I am working on code to implement IEEE 754-compliant floating point in
> software (specifically in Java, but that does not matter).  I am
> using the "testfloat" and "softfloat" programs to test my code and I
> have come across a few examples that I cannot explain.

> Assume 64-bit double format is in use and consider the operation
> (0.00...01 * 2^-1023) + (0.11...11 * 2^-1023).  This is the smallest
> possible positive number added to the largest possible subnormal
> number.  The result should be (1.00...00 * 2^-1023).  The problem here
> is that an exponent of -1023 indicates a subnormal number, which must
> have a zero in the most significant bit of the significand.  Hence my
> code outputs zero for a result and does not set any flags.  The "real"
> answer, given by "testfloat" as well as an IA-32 and a Sun system, is
> the number (1.00...00 * 2^-1022) with no exception flags thrown.  It
> seems they round the number up so that it is normalized, but if that
> were the case, should not an inexact exception be thrown?

> Am I missing something completely obvious here, or is this some kind of
> flaw in the IEEE 754 standard?

IEEE 754 has this to say about "inexact":

"If the rounded result of an operation is not exact or if it overflows
 without an overflow trap, then the inexact exception shall be signaled."

In this particular case, there is no overflow, and the result is exact,
i.e. the result does not differ from the result one would have gotten
using arithmetic employing infinite precision and unbounded exponent
range.

I assume you meant to write "The result should be (1.00...00 * 2^-1022)"
not "... (1.00...00 * 2^-1023)" ? The expected sum of smallest denormal
and largest denormal is the smallest normal, as you observed on the test
platforms.

-- Norbert


Strange floating point behavior

Post by Jeff Wa » Mon, 09 Feb 2004 23:51:12




> > I am working on code to implement IEEE 754-compliant floating point in
> > software (specifically in Java, but that does not matter).  I am
> > using the "testfloat" and "softfloat" programs to test my code and I
> > have come across a few examples that I cannot explain.

> > Assume 64-bit double format is in use and consider the operation
> > (0.00...01 * 2^-1023) + (0.11...11 * 2^-1023).  This is the smallest
> > possible positive number added to the largest possible subnormal
> > number.  The result should be (1.00...00 * 2^-1023).  The problem here
> > is that an exponent of -1023 indicates a subnormal number, which must
> > have a zero in the most significant bit of the significand.  Hence my
> > code outputs zero for a result and does not set any flags.  The "real"
> > answer, given by "testfloat" as well as an IA-32 and a Sun system, is
> > the number (1.00...00 * 2^-1022) with no exception flags thrown.  It
> > seems they round the number up so that it is normalized, but if that
> > were the case, should not an inexact exception be thrown?

> > Am I missing something completely obvious here, or is this some kind of
> > flaw in the IEEE 754 standard?

> IEEE 754 has this to say about "inexact":

> "If the rounded result of an operation is not exact or if it overflows
>  without an overflow trap, then the inexact exception shall be signaled."

> In this particular case, there is no overflow, and the result is exact,
> i.e. the result does not differ from the result one would have gotten
> using arithmetic employing infinite precision and unbounded exponent
> range.

> I assume you meant to write "The result should be (1.00...00 * 2^-1022)"
> not "... (1.00...00 * 2^-1023)" ? The expected sum of smallest denormal
> and largest denormal is the smallest normal, as you observed on the test
> platforms.

No, I did not mean to write this, and this is the exact source of my
confusion.  Why is the exponent -1022 and not -1023?  The problem,
simplified, is essentially the following, is it not?

   0.111 * 2^0 = 0.875
  +0.001 * 2^0 = 0.125
  ------         -----
   1.000 * 2^0   1.000

The exponents in the operands are equal going into the operation and
the result does not carry into the next highest place, so no shifting
or changing of exponents should occur.  If some shifting or rounding
is necessary to make the number representable in a given format, then
why isn't the number considered "inexact"?
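The simplified sum above can be checked directly in Java: 0.875 (binary 0.111) and 0.125 (binary 0.001) are both exactly representable as doubles, and their sum lands exactly on 1.0, so no bits are lost even though the leading 1 moves up one place:

```java
public class ExactCarry {
    public static void main(String[] args) {
        double a = 0.875;  // 0.111 in binary
        double b = 0.125;  // 0.001 in binary
        double sum = a + b;
        // The carry into the units place produces exactly 1.000 * 2^0;
        // every bit of the mathematical result fits, so the sum is exact.
        System.out.println(sum == 1.0);  // true
    }
}
```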

-Jeff Ward


Strange floating point behavior

Post by Jeff Kento » Tue, 10 Feb 2004 02:17:45


Jeff:

Your problem is with your interpretation of the denormalized format.  The
exponent is always -1022, not -1023, and the results you see are correct.  This
is often shown in the manuals by having an exponent bias of 1023 for normal
numbers, but only 1022 for denormals.  This makes sense: the smallest normal
is 1.0 * 2^-1022, and the largest denormal is 0.11...11 * 2^-1022.
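A sketch of what this looks like at the bit level in Java: the biased exponent field of the smallest normal is 1 (giving 1 - 1023 = -1022), while subnormals have exponent field 0 but are still scaled by 2^-1022, as if their bias were 1022:

```java
public class DecodeExponent {
    public static void main(String[] args) {
        long normalBits = Double.doubleToLongBits(Double.MIN_NORMAL);  // 1.0 * 2^-1022
        long subBits    = Double.doubleToLongBits(Double.MIN_VALUE);   // smallest subnormal

        long normalExpField = (normalBits >>> 52) & 0x7FF;  // biased exponent field
        long subExpField    = (subBits >>> 52) & 0x7FF;

        // Normal:    value = 1.fraction * 2^(field - 1023); field 1 gives 2^-1022
        // Subnormal: field is 0, but value = 0.fraction * 2^-1022 (bias acts like 1022)
        System.out.println(normalExpField);  // 1
        System.out.println(subExpField);     // 0
    }
}
```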


> I am working on code to implement IEEE 754-compliant floating point in
> software (specifically in Java, but that does not matter).  I am
> using the "testfloat" and "softfloat" programs to test my code and I
> have come across a few examples that I cannot explain.

> Assume 64-bit double format is in use and consider the operation
> (0.00...01 * 2^-1023) + (0.11...11 * 2^-1023).  This is the smallest
> possible positive number added to the largest possible subnormal
> number.  The result should be (1.00...00 * 2^-1023).  The problem here
> is that an exponent of -1023 indicates a subnormal number, which must
> have a zero in the most significant bit of the significand.  Hence my
> code outputs zero for a result and does not set any flags.  The "real"
> answer, given by "testfloat" as well as an IA-32 and a Sun system, is
> the number (1.00...00 * 2^-1022) with no exception flags thrown.  It
> seems they round the number up so that it is normalized, but if that
> were the case, should not an inexact exception be thrown?

> Am I missing something completely obvious here, or is this some kind of
> flaw in the IEEE 754 standard?

> -Jeff Ward

--

-------------------------------------------------------------------------
=    Jeff Kenton      Consulting and software development               =
=                     http://home.comcast.net/~jeffrey.kenton           =
-------------------------------------------------------------------------


Strange floating point behavior

Post by Jeff Kento » Tue, 10 Feb 2004 03:49:52


Jeff:

Your problem is in your interpretation of the denormalized format.  For normalized
DP numbers, the exponent bias is +1023, so the smallest normalized exponent is
-1022.  But for denormalized numbers the bias is +1022, so denormalized
exponents are always -1022.  In your example, this means that you are
adding 0.11...11 * 2^-1022 + 0.00...01 * 2^-1022, resulting in 1.00...00 * 2^-1022,
which is the smallest normal number.
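One way to see that subnormals and normals meet seamlessly at 2^-1022: incrementing the 64-bit pattern of the largest subnormal by one yields exactly the smallest normal. This adjacency is the same step Java's Math.nextUp takes:

```java
public class BitContinuum {
    public static void main(String[] args) {
        long largestSubBits = 0x000FFFFFFFFFFFFFL;  // largest subnormal: 0.11...11 * 2^-1022
        double largestSub = Double.longBitsToDouble(largestSubBits);

        // The very next bit pattern upward is the smallest normal, 1.00...00 * 2^-1022
        double next = Double.longBitsToDouble(largestSubBits + 1);
        System.out.println(next == Double.MIN_NORMAL);                      // true
        System.out.println(Math.nextUp(largestSub) == Double.MIN_NORMAL);   // true
    }
}
```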




>>>I am working on code to implement IEEE 754-compliant floating point in
>>>software (specifically in Java, but that does not matter).  I am
>>>using the "testfloat" and "softfloat" programs to test my code and I
>>>have come across a few examples that I cannot explain.

>>>Assume 64-bit double format is in use and consider the operation
>>>(0.00...01 * 2^-1023) + (0.11...11 * 2^-1023).  This is the smallest
>>>possible positive number added to the largest possible subnormal
>>>number.  The result should be (1.00...00 * 2^-1023).  The problem here
>>>is that an exponent of -1023 indicates a subnormal number, which must
>>>have a zero in the most significant bit of the significand.  Hence my
>>>code outputs zero for a result and does not set any flags.  The "real"
>>>answer, given by "testfloat" as well as an IA-32 and a Sun system, is
>>>the number (1.00...00 * 2^-1022) with no exception flags thrown.  It
>>>seems they round the number up so that it is normalized, but if that
>>>were the case, should not an inexact exception be thrown?

>>>Am I missing something completely obvious here, or is this some kind of
>>>flaw in the IEEE 754 standard?

>>IEEE 754 has this to say about "inexact":

>>"If the rounded result of an operation is not exact or if it overflows
>> without an overflow trap, then the inexact exception shall be signaled."

>>In this particular case, there is no overflow, and the result is exact,
>>i.e. the result does not differ from the result one would have gotten
>>using arithmetic employing infinite precision and unbounded exponent
>>range.

>>I assume you meant to write "The result should be (1.00...00 * 2^-1022)"
>>not "... (1.00...00 * 2^-1023)" ? The expected sum of smallest denormal
>>and largest denormal is the smallest normal, as you observed on the test
>>platforms.

> No, I did not mean to write this, and this is the exact source of my
> confusion.  Why is the exponent -1022 and not -1023?  The problem,
> simplified, is essentially the following, is it not?

>    0.111 * 2^0 = 0.875
>   +0.001 * 2^0 = 0.125
>   ------         -----
>    1.000 * 2^0   1.000

> The exponents in the operands are equal going into the operation and
> the result does not carry into the next highest place, so no shifting
> or changing of exponents should occur.  If some shifting or rounding
> is necessary to make the number representable in a given format, then
> why isn't the number considered "inexact"?

> -Jeff Ward

--

-------------------------------------------------------------------------
=    Jeff Kenton      Consulting and software development               =
=                     http://home.comcast.net/~jeffrey.kenton           =
-------------------------------------------------------------------------