## cordic: sqrt(x^2+y^2+z^2) ??

### cordic: sqrt(x^2+y^2+z^2) ??

Can anyone help me with following Problem:

I would like to compute sqrt(x^2+y^2+z^2) using
CORDIC. Is there a simple possibility to extend
the 2D algorithm to the 3D case?

Robert Klinski

Sent via Deja.com http://www.deja.com/

### cordic: sqrt(x^2+y^2+z^2) ??

>> Can anyone help me with following Problem:

>> I would like to compute sqrt(x^2+y^2+z^2) using
>> CORDIC. Is there a simple possibility to extend
>> the 2D algorithm to the 3D case?

>If you can multiply cheaply enough, just perform the 3 squares first, and
>then use the cordic algorithm to find the root.  If the values are integral
>and small, you might even keep a table of squares if you have enough silicon
>for that.

The cordic algorithm does not calculate the square root, but the length

Since sqrt(x^2+y^2+z^2) = sqrt( sqrt(x^2+y^2)^2 + z^2),
it appears, that 3d length can be performed by two successive
rounds of cordic - first to rotate [x y z] to [x' y' 0] (around y-axis), and
then to [x'' 0 0] (around z-axis).

--
Problems      1) do NOT write a virus or a worm program
"A.K.Dewdney, The New Turing Omnibus; Chapter 60: Computer viruses"

### cordic: sqrt(x^2+y^2+z^2) ??

: Here is a very fast integer square root algorithm written by Lawrence Kirby:
: static unsigned fred_sqrt(unsigned long x)
: {
:     static const unsigned char sqq_table[] = {
[ snipped for space, see references or email me ]

I was doubtful that an algorithm with a table lookup
would work all that well on modern memory-bound CPUs, so
I wrote a "test harness" to compare it against the "vanilla"
algorithm:

#define N_BITS  32
#define MAX_BIT ((N_BITS+1)/2-1)

unsigned int isqrt(unsigned long x)
{
register unsigned long xroot, m2, x2;

xroot = 0;
m2 = 1L<<(MAX_BIT * 2); /* Start looking at MSB of input register */

do {
x2 = xroot + m2; /* Form a "guess" from current root and next bit */
xroot >>= 1;       /* Pre-shift to make add below neater */
if(x2 <= x) {         /* If our guess is still low,... */
x -= x2;     /* Reduce input by root-so-far */
xroot += m2; /* Add the bit that succeded to root */
}
} while(m2 >>= 2);     /* Shift "mask" down until it disappears */
if(xroot < x) ++xroot;
return(xroot);

Quote:}

To my surprise (and temporary delight), fred_sqrt is about
50% faster on a PPro (gcc -O2), and 3X faster on an SGI Indy
(MIPS CC -O2). Then that "little voice" said: "But does it get
truncating to be the right answer". On a test of 100000
random (from rand(), seeded by srand(42)) numbers, fred_sqrt
returned a lower answer than isqrt almost half the time, and
in every case, squaring isqrt's answer produced a value closer
to the original number than squaring fred_sqrt's answer.

FYI. "Trust, but verify" :-)

Mike

I can't figure out how to do the Newton Raphson iteration for sqrt.
Here's what I've got so far:

Let's solve this equation for X:
X**N = C

We do it by finding a zero of f(X) = X**N - C

Given an approximation X0, we can find a next approximation by:

X1 = X0 - f(X0)/f'(X0)
= X0(1-1/N) + C/N*X0**(1-N)

So, the iteration for reciprocal is:
N=-1.  X1 = 2*X0 - C*X0*X0

for reciprocal square root, the iteration is:
N=-2.  X1 = 3/2*X0 - C/2*X0*X0*X0

And for square root, the iteration is:
N=2.   X1 = 0.5*X0 + C/2/X0

Recip and rsqrt are done with multiplies and adds, so they can reuse
the existing multipliers and adders in the datapath.  But sqrt() has
a reciprocal in it.  So how is sqrt() done with a Newton-Raphson
iteration WITHOUT a reciprocal?

BTW: I think Terje suggested that an X**(-3/2) primitive might be
useful.  I tried to see if there was a nice N-R iterator for such a
primitive, but it looks to me like there is no gain (except maybe a
bit or two of precision) over simply cubing the argument in software
and then applying the rsqrt() primitive.  I looked at solving the
equation

X**N = C**M

The iterator is just what we have above, with C replaced by C**M,
which makes sense when you think about it.

I'm convinced that the solution to the sqrt() iterator as well as
Terje's iterator lie in solving some other equation to get the same