## FLT_ROUNDS

### FLT_ROUNDS

...

> A few notes on the program Pete's presented here earlier:

> It finds the infinite sum in the form

>          s = SUM(i=0,...,inf) 2**(-i)

> or
>          s = 1 + 0.5 + 0.25 + 0.125 + ...

> This sum rapidly converges (and its limit is certainly
> well known, it is equal to 1/(1-0.5) = 2, see below) and in

That sum certainly converges; the question is whether that sum
correctly describes what the code does. Pete is suggesting that if
round-toward-infinity mode is in effect, then the division by 2 will
cease to cut the magnitude of delta in half each time. Once delta has
shrunk down to the smallest representable positive floating-point
value, delta/2.0 == delta, and from that point on the loop keeps adding
a constant value. Also, because of the rounding mode, new_float+delta >
old_float even when delta < old_float*DBL_EPSILON, and that will make
the sum start differing from your formula long before
(delta/2) == delta. Therefore, the sum would not terminate until it
overflows.

### FLT_ROUNDS

> >>So on the first (recursive) call, when you're trying to determine
> >>ln(sqrt(2)), C will be 0.0, right?

> > Yes. At that point, x will be equal to A,
> > which means that n will be equal to zero,
> > and (b = C * n + a) will be equal to a.

> >>What's wrong with simply stating this directly instead?

> >>   static double A = 1.4142135623730951,
> >>                 B = 0.70710678118654757,
> >>                 C = 0.3465735902799727;

> > The function is intended to use all of the available precision,
> > regardless of how much is available.

> I realized that that had to be the case, but a _very_ long decimal
> expansion would still suffice, or wouldn't it? (Those I used were
> intended to be good for about 64-bit mantissas.)

> I.e. you are programming with 'double' here,
> not arbitrary precision. ;-)

The intent is to accommodate all theoretically conforming
definitions of double. It's an academic exercise.

--
pete

### FLT_ROUNDS

> ...
> > A few notes on the program Pete's presented here earlier:

> > It finds the infinite sum in the form

> >          s = SUM(i=0,...,inf) 2**(-i)

> > or
> >          s = 1 + 0.5 + 0.25 + 0.125 + ...

> > This sum rapidly converges (and its limit is certainly
> > well known, it is equal to 1/(1-0.5) = 2, see below) and in

James,

> That sum certainly converges; the question is whether that sum
> correctly describes what the code does.

An excellent observation; however, my assumption was that the code
had been written with the intention of computing that sum. I might
be wrong about this...

> Pete is suggesting that if
> round-to-infinity mode is in effect, then the division by 2 will cease
> to cut the magnitude of the argument in half each time. If delta has
> shrunk down to the smallest representable positive floating point
> value, then delta/2.0 == delta, and from that point on the sum is over
> a constant value.

Yes, very much so, if two conditions are met: round-up mode and delta(i) > 0.
I posted a correction to my first post in the thread, pointing out
that (new_float != old_float) is not a safe stopping condition.

> Also because of the rounding mode, new_float+delta > old_float,

Yes, very much so, provided the two previous conditions hold and
old_float and new_float are both > 0.

> even when delta<old_float*DBL_EPSILON, and that will make
> the sum start differing from your formula long before
> (delta/2)==delta. Therefore, the sum would not terminate until it
> overflows.

I am afraid I do not follow you in this. The condition for terminating
the loop I had suggested was (delta >= new_float*EPSILON). Essentially,
the loop terminates as soon as delta is small in comparison with new_float.
As you noticed, (delta/2)==delta when delta is the smallest representable
number. The condition I suggested stops the iterations when delta is
on the order of 2*macheps, which is much, much larger than the smallest
representable number.

It works for this particular algorithm in any rounding mode, for either
precision, and it ensures that the loop calculates the correct result to
within machine relative error bounds.

Rgds,
Andrew

### FLT_ROUNDS

...
> >>What's wrong with simply stating this directly instead?

> >>   static double A = 1.4142135623730951,
> >>                 B = 0.70710678118654757,
> >>                 C = 0.3465735902799727;

> > The function is intended to use all of the available precision,
> > regardless of how much is available.

> I realized that that had to be the case, but a _very_ long decimal
> expansion would still suffice, or wouldn't it? (Those I used were
> intended to be good for about 64-bit mantissas.)

No matter how long you made it, someday it might need to be a little
longer. There would be no warning to the users; it would simply give
out answers with less precision than it could with the other approach.

### FLT_ROUNDS

> > ...
> > > A few notes on the program Pete's presented here earlier:

> > > It finds the infinite sum in the form

> > >          s = SUM(i=0,...,inf) 2**(-i)

> > > or
> > >          s = 1 + 0.5 + 0.25 + 0.125 + ...

> > > This sum rapidly converges (and its limit is certainly
> > > well known, it is equal to 1/(1-0.5) = 2, see below) and in

> James,

> > That sum certainly converges; the question is whether that sum
> > correctly describes what the code does.

> Excellent observation, however my assumption was that the code
> had been written with the intention to compute that sum. I might
> be wrong in this...

A reasonable assumption. The question being asked was whether it
actually computes that sum in all rounding modes.

...

> > even when delta<old_float*DBL_EPSILON, and that will make
> > the sum start differing from your formula long before
> > (delta/2)==delta. Therefore, the sum would not terminate until it
> > overflows.

> I am afraid I do not follow you in this. The condition for terminating
> the loop I had suggested was (delta >= new_float*EPSILON). Essentially,

I was referring to the code as originally written. Although the message
in which you suggested that alternative has an older timestamp than my
own, I had not yet seen it when I composed mine.

### FLT_ROUNDS

>>I realized that that had to be the case, but a _very_ long decimal
>>expansion would still suffice, or wouldn't it? (Those I used were
>>intended to be good for about 64-bit mantissas.)

> No matter how long you made it, someday it might need to be a little
> longer. There would be no warning to the users; it would simply give
> out answers with less precision than it could with the other approach.

In principle I agree absolutely.

However, I _really_ cannot ever see this happening to programs written
in C, defining variables of type 'double'.

Eventually getting 'long double' into the standard might be feasible,
but not redefining what a regular 'double' is.

Had it been called 'myMaximumPrecisionFloat' or some such, then I'd
really agree with the approach.

Terje
--

"almost all programming can be viewed as an exercise in caching"

### FLT_ROUNDS

>Eventually getting 'long double' into the standard might be feasible,
>but not redefining what a regular 'double' is.

long double has been a standard C feature since C89, i.e. since day one.
And the definition of both double and long double is a lot less precise
than most people think.

Dan
--
Dan Pop
DESY Zeuthen, RZ group

### FLT_ROUNDS

> >>I realized that that had to be the case, but a _very_ long decimal
> >>expansion would still suffice, or wouldn't it? (Those I used were
> >>intended to be good for about 64-bit mantissas.)

> > No matter how long you made it, someday it might need to be a little
> > longer. There would be no warning to the users; it would simply give
> > out answers with less precision than it could with the other approach.

> In principle I agree absolutely.

> However, I _really_ cannot ever see this happening to programs written
> in C, defining variables of type 'double'.

Someone (Bill Gates?) was famously certain that personal computers
would never need more than 640K of memory. Why unnecessarily restrict
the portability of your program? Sure, it's important to pay attention
to the characteristics of currently popular implementations, but you
should avoid unnecessarily tying your code to those characteristics.

> Eventually getting 'long double' into the standard might be feasible,
> but not redefining what a regular 'double' is.

'long double' is already standard, so I have to agree that it is
feasible. However, there's no upper limit on the precision of double,
or even of float, for that matter. All three types could be 128-bit
types; in the not-too-distant future there might even be machines
where that's a reasonable way to implement them.

### FLT_ROUNDS

>or even of float, for that matter. All three types could be 128-bit
>types - in the not-too-distant future there might even be machines
>where that's a reasonable way to implement them.

Not very likely, given the current popularity of IEEE-754 (or whatever
its current name might be).

The trends in *hosted* computing are certainly converging, with the
oddballs on their way out.  Gone are the word-addressed architectures,
gone are the non-octet-byte architectures, gone are the
non-two's-complement architectures, and practically gone are the
architectures without a linear address space; non-IEEE-754 fp survives
only for backward compatibility on architectures that also offer
IEEE-754 support.  Byte order is, by far, the most important difference
still remaining.  The market forces pushing this convergence are so
strong that it is difficult to believe these trends can be reversed.
If anything is worth considering as a likely (reasonably near) future
feature, it is the widespread availability of 128-bit long doubles.

Of course, things are different in the world of freestanding computing,
but the C standard doesn't have much relevance there, mostly because
the language section specifies too much and the library section too
little.  As a result, there are precious few conforming implementations
of standard C for the embedded control market.  The wild architectural
differences between processors are not well accommodated by the C
language specification.  Double precision support is too much for most
8-bit controllers, and the complete lack of support for interrupt
handlers and low-level I/O, as well as the lack of a standard library
specification, significantly reduces the amount of portable code that
can be written for such platforms.

Given this state of affairs, it would make sense to have two different
C standards: one for hosted computing and another for freestanding
computing.  The attempt to support both in the same document has not
yielded particularly encouraging results: for each type of
implementation there is, at the same time, too much and too little,
because the same document must also serve the other kind of
implementation.

Dan
--
Dan Pop
DESY Zeuthen, RZ group