Hi all,

I've written a simple zsh script to calculate the standard deviation of a set of numbers. For convenience, I read the numbers from stdin (one per line), which lets me use the script in a pipe. (Very handy in my situation.) I've accomplished this by reading the data values into an array.

Unfortunately, I'm finding that once I start using more than a few thousand data values, my script runs terribly slowly. On a 2.6GHz CPU, it takes ~11 seconds to compute the std dev for 10,000 values, and minutes for 100,000 values. The largest data set I can reasonably expect to use has 500,000 data values.

So my question is: is there anything I can do to optimize my script? An obvious solution is to store the data in a file instead of an array; however, I'd really like to know if I'm doing something inherently wrong in the way I'm using arrays in zsh.

Any info/pointers much appreciated.

SCoTT. :)

PS I'm using zsh 4.2.0 on RH9 Linux.

#!/usr/local/bin/zsh

float sum=0.0
float mean sd

# Read the input into an array & count the number of elements.
let n=0
while read datum ; do
    (( n++ ))
    data[$n]=$datum
done

echo n is $n
(( $n == 0 )) && { echo "No data!" 1>&2 ; exit 2 }

# Sum the elements.
for datum in $data ; do
    (( sum += $datum ))
done

# Calculate the mean value.
let mean=$sum/$n

# Calculate the sum of the squares of the residuals.
float sumSq=0.0
for datum in $data ; do
    (( sumSq += ($datum - $mean) ** 2 ))
done

# Calculate the standard deviation.
let sd="($sumSq/$n)**0.5"

printf "sum: %g\n" $sum
printf "mean: %g\n" $mean
# printf "sumSq: %g\n" $sumSq
printf "stdDev: %g\n" $sd
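For comparison, since the sum and the sum of squares can both be accumulated while reading, the array (and the second and third passes over the data) can be avoided entirely. A minimal sketch of that one-pass approach using awk instead of zsh arithmetic — the five-value input here is just hypothetical sample data, and this uses the identity (1/n)Σ(x−mean)² = (Σx²)/n − mean², so it matches the script's population standard deviation:

```shell
# One-pass standard deviation: accumulate n, sum, and sum of squares
# while reading, so no array of the data is ever kept in memory.
printf '%s\n' 1 2 3 4 5 | awk '
    { n++; sum += $1; sumSq += $1 * $1 }
    END {
        mean = sum / n
        sd = sqrt(sumSq / n - mean * mean)
        printf "sum: %g\nmean: %g\nstdDev: %g\n", sum, mean, sd
    }'
# prints: sum: 15 / mean: 3 / stdDev: 1.41421
```

This produces the same three output lines as the script above, and because it never stores the data, its cost stays linear in the input size regardless of how many values are piped in.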