## zsh: large arrays very slow

### zsh: large arrays very slow

Hi all,

I've written a simple zsh script to calculate the standard deviation of a
set of numbers. For convenience, I read the numbers from stdin (1 per line) -
this enables me to use my script in a pipe. (Very handy in my situation.)

I've accomplished this, by reading the data values into an array.
Unfortunately, I'm finding that once I start using more than a few
thousand data values my script runs terribly slowly. On a 2.6GHz CPU, it
takes ~11 seconds to compute the std dev for 10,000 values. It takes
minutes to compute for 100,000 values. The largest data set I can
reasonably expect to use has 500,000 data values.

So my question is: is there anything I can do to optimize my script?
An obvious solution is to store the data in a file instead of an array;
however, I'd really like to know if I'm doing something inherently wrong
in the way I'm using arrays in zsh.

Any info/pointers muchly appreciated.

SCoTT. :)
PS I'm using zsh 4.2.0 on RH9 Linux.

#!/usr/local/bin/zsh

float sum=0.0

# Read the input into an array & count the number of elements.
let n=0
while read datum ; do
    (( n++ ))
    data[$n]=$datum
done

echo n is $n
(( $n == 0 )) && { echo "No data!" 1>&2 ; exit(2) }

# Sum the elements.
for datum in $data ; do
    (( sum += $datum ))
done

# Calculate the mean value.
let mean=$sum/$n

# Calculate the sum of the square of the residuals.
float sumSq=0.0
for datum in $data ; do
    (( sumSq += ($datum - $mean) ** 2 ))
done

# Calculate standard deviation.
let sd="($sumSq/$n)**0.5"
# if (( $sd < 0.0 )) ; then
#       let sd=-$sd
# fi

printf "sum: %g\n" $sum
printf "mean: %g\n" $mean
# printf "sumSq: %g\n" $sumSq
printf "stdDev: %g\n" $sd

### zsh: large arrays very slow

I use bash, so I can't tell you about the specifics of arrays in zsh, but I can tell you this: you can't rely on scripts if you want performance; just rewrite it in C.

But in this case, I think that won't be a solution either. From what you tell me, the problem is in the calculation, not in the for loop that goes through the array.

Try debugging the script with the 'time' command and logging its output. That way you can check how much processor time each calculation uses and estimate how long it should take; if it then takes longer than you expected, you can be sure the problem is in the script, and either optimize it or rewrite it in C, which in this case would be rather easy.
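That profiling advice can be sketched in a self-contained way by bracketing each stage with date(1); the `time` builtin gives the same per-command breakdown interactively. The temp file and the awk summing stage here are only stand-ins for the real script's read loop and calculations:

```shell
# Bracket each stage with date(1) to see where the time goes.
# The generator and the awk stage are placeholders for the real work.
t0=$(date +%s)
seq 1 10000 > /tmp/nums.$$                            # stage 1: make test input
t1=$(date +%s)
sum=$(awk '{ s += $0 } END { print s }' /tmp/nums.$$) # stage 2: sum it
t2=$(date +%s)
echo "sum: $sum"
echo "generate: $((t1 - t0))s  sum: $((t2 - t1))s"
rm -f /tmp/nums.$$
```

Timing the same stages at 100k and 500k lines would show immediately whether the cost grows linearly or worse.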

Hope this helps.

ALMAFUERTE

On Wed, 18 Aug 2004 05:51:45 GMT:

> Hi all,

> I've written a simple zsh script to calculate the standard deviation of a
> set of numbers. For convenience, I read the numbers from stdin (1 per line) -
> this enables me to use my script in a pipe. (Very handy in my situation.)

> I've accomplished this, by reading the data values into an array.
> Unfortunately, I'm finding that once I start using more than a few
> thousand data values my script runs terribly slowly. On a 2.6GHz CPU, it
> takes ~11 seconds to compute the std dev for 10,000 values. It takes
> minutes to compute for 100,000 values. The largest data set I can
> reasonably expect to use has 500,000 data values.

> So my question is: is there anything I can do to optimize my script?
> An obvious solution is to store the data in a file instead of an array,
> however, I'd really like to know if I'm doing something inherently wrong
> in the way I'm using arrays in zsh.

> Any info/pointers muchly appreciated.

> SCoTT. :)
> PS I'm using zsh 4.2.0 on RH9 Linux.

> #!/usr/local/bin/zsh

> float sum=0.0

> # Read the input into an array & count the number of elements.
> let n=0
> while read datum ; do
>    (( n++ ))
>    data[$n]=$datum
> done

> echo n is $n
> (( $n == 0 )) && { echo "No data!" 1>&2 ; exit(2) }

> # Sum the elements.
> for datum in $data ; do
>    (( sum += $datum ))
> done

> # Calculate the mean value.
> let mean=$sum/$n

> # Calculate the sum of the square of the residuals.
> float sumSq=0.0
> for datum in $data ; do
>    (( sumSq += ($datum - $mean) ** 2 ))
> done

> # Calculate standard deviation.
> let sd="($sumSq/$n)**0.5"
> # if (( $sd < 0.0 )) ; then
> #  let sd=-$sd
> # fi

> printf "sum: %g\n" $sum
> printf "mean: %g\n" $mean
> # printf "sumSq: %g\n" $sumSq
> printf "stdDev: %g\n" $sd

### zsh: large arrays very slow

<snip>

> # Read the input into an array & count the number of elements.
> let n=0
> while read datum ; do
>    (( n++ ))
>    data[$n]=$datum

Move the "(( sum += $datum ))" line here, then you don't need the first
loop below.

> done

> echo n is $n
> (( $n == 0 )) && { echo "No data!" 1>&2 ; exit(2) }

> # Sum the elements.
> for datum in $data ; do
>    (( sum += $datum ))
> done

Have you considered just building a string expression from your input
and passing it to "bc" or "dc" to do the calculations?
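A minimal sketch of that idea: join all input lines into one "+"-separated expression, then evaluate it in a single step instead of looping per line (shell arithmetic stands in here for bc, which would evaluate the same string):

```shell
# Join the input lines with "+" into one big expression.
expr=$(seq 1 5 | paste -s -d+ -)   # expr is now "1+2+3+4+5"
echo "$expr"
# bc would evaluate it as: echo "$expr" | bc
sum=$(( expr ))                    # shell arithmetic evaluates it too
echo "sum: $sum"                   # prints: sum: 15
```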

Ed.

### zsh: large arrays very slow

> Hi all,

> I've written a simple zsh script to calculate the standard deviation of a
> set of numbers. For convenience, I read the numbers from stdin (1 per line) -
> this enables me to use my script in a pipe. (Very handy in my situation.)

> I've accomplished this, by reading the data values into an array.
> Unfortunately, I'm finding that once I start using more than a few
> thousand data values my script runs terribly slowly. On a 2.6GHz CPU, it
> takes ~11 seconds to compute the std dev for 10,000 values. It takes
> minutes to compute for 100,000 values. The largest data set I can
> reasonably expect to use has 500,000 data values.

> So my question is: is there anything I can do to optimize my script?
> An obvious solution is to store the data in a file instead of an array,
> however, I'd really like to know if I'm doing something inherently wrong
> in the way I'm using arrays in zsh.

> Any info/pointers muchly appreciated.

Use Awk.  It's designed for this kind of thing.
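For example, a one-pass awk sketch of the population standard deviation, accumulating the sum and sum of squares and applying stddev = sqrt(E[x^2] - E[x]^2) at end of input (the sample values are made up):

```shell
# One pass over stdin: keep running sum and sum of squares,
# then stddev = sqrt(mean of squares - square of mean).
printf '2\n4\n4\n4\n5\n5\n7\n9\n' |
  awk '{ s += $0; s2 += $0 * $0 }
       END { print "stdDev:", sqrt(s2/NR - (s/NR) * (s/NR)) }'
# prints: stdDev: 2
```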

--

### zsh: large arrays very slow

Hi, :)

> I use bash, so I can't tell you about the specifics of arrays in zsh,
> but I can tell you this: you can't rely on scripts if you want
> performance; just rewrite it in C.

I'm happy to pay a constant-factor penalty for scripting it. What bothers
me is that the _rate_ at which my script processes input _drops_ when I
increase the size of the input.

For example,

If I read in 100k lines, my script processes them at ~10k/sec. If I read
in 500k lines, my script processes them at ~1k/sec!

I was expecting 500k of input to take ~5 times longer than 100k of input.
I have rewritten my script to use a file (instead of an array) & all of
the above (about the rate dropping) still holds true. :(

So I guess, now, I'm interested in confirming that the dropping rate is
something inherent in the shell and not a bug.
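A rate that drops with input size is consistent with each indexed assignment `data[$n]=$datum` costing time proportional to the current array size, which makes the whole read loop quadratic. A minimal sketch of the append form (`+=`), the usual way to grow an array; whether it avoids the per-element rebuild depends on the shell version, so this is a suggestion to benchmark, not a guaranteed fix:

```shell
# Grow the array by appending instead of assigning to a computed index.
data=()
n=0
while read -r datum; do
  n=$((n + 1))
  data+=("$datum")        # instead of: data[$n]=$datum
done <<EOF
1.5
2.5
3.0
EOF
echo "read $n values: ${data[*]}"   # prints: read 3 values: 1.5 2.5 3.0
```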

My rewritten script is attached.

SCoTT. :)

#!/usr/local/bin/zsh

float sum=0.0

# Read the input into an array & count the number of elements.
let n=0
d=$(date +%s)
foreach datum (${(f)"$(<$1)"})
    (( n++ ))
    (( $n % 10000 == 0 )) && echo -n .
    # Sum the elements.
    (( sum += $datum ))
end
echo "read took: $(( $(date +%s) - $d )) seconds"

echo n is $n
(( $n == 0 )) && { echo "No data!" 1>&2 ; exit(2) }

# Calculate the mean value.
let mean=$sum/$n

# Calculate the sum of the square of the residuals.
float sumSq=0.0
d=$(date +%s)
let counter=0
foreach datum (${(f)"$(<$1)"})
    (( sumSq += ($datum - $mean) ** 2 ))
    (( counter++ % 10000 == 0 )) && echo -n .
end
echo "residuals took: $(( $(date +%s) - $d )) seconds"

# Calculate standard deviation.
let sd="($sumSq/$n)**0.5"

printf "sum: %g\n" $sum
printf "mean: %g\n" $mean
# printf "sumSq: %g\n" $sumSq
printf "stdDev: %g\n" $sd

### zsh: large arrays very slow

>  Hi all,

>  I've written a simple zsh script to calculate the standard deviation of a
>  set of numbers. For convenience, I read the numbers from stdin (1 per line) -
>  this enables me to use my script in a pipe. (Very handy in my situation.)

>  I've accomplished this, by reading the data values into an array.
>  Unfortunately, I'm finding that once I start using more than a few
>  thousand data values my script runs terribly slowly. On a 2.6GHz CPU, it
>  takes ~11 seconds to compute the std dev for 10,000 values. It takes
>  minutes to compute for 100,000 values. The largest data set I can
>  reasonably expect to use has 500,000 data values.

>  So my question is: is there anything I can do to optimize my script?
>  An obvious solution is to store the data in a file instead of an array,
>  however, I'd really like to know if I'm doing something inherantly wrong
>  in the way I'm using arrays in zsh.

Well, the main bottlenecks of your script were growing the array element
by element and using for loops.  It runs much faster when these
constructs are avoided.  Considering that zsh is not intended for
calculations, a factor-of-8 slowdown relative to gawk is not too bad.

Have fun!

Pavol

$ time zsh std.zsh < x.dat
n is 500000
sum: 250069
mean: 0.500138
stdDev: 0.288616
10.01s user 0.27s system 97% cpu 10.565 total

$ time gawk '{s2+=$0^2; s+=$0}
END{print "stdDev:", sqrt(s2/NR - (s/NR)^2)}' < x.dat
stdDev: 0.288616
1.27s user 0.01s system 99% cpu 1.289 total

$ matlab
>> tic; f=fopen('x.dat'); x=fscanf(f,'%f');fclose(f); std(x,1), toc
ans = 0.2886
Elapsed time is 0.918466 seconds.
>> tic; std(x,1); toc
Elapsed time is 0.034074 seconds.

############ std.zsh ############
#!/usr/local/bin/zsh

float sum=0.0

# Read the input into an array & count the number of elements.
# let n=0
# while read datum ; do
#       (( n++ ))
#       data[$n]=$datum
# done
#-------------------------------
# $(<&3) is the same as $( cat <&3 ).  You need to save standard
# input in a file descriptor, otherwise it is not
# available inside $( )
3<&0 data=( $( <&3 ) )
n=${#data}

echo n is $n
(( $n == 0 )) && { echo "No data!" 1>&2 ; exit(2) }

# Sum the elements.
# for datum in $data ; do
#       (( sum += $datum ))
# done
#-------------------------------
# build one big expression instead by joining all elements with "+"
sumExpr=${(j:+:)data}
(( sum = sumExpr ))

# Calculate the mean value.
let mean=$sum/$n

# Calculate the sum of the square of the residuals.
float sumSq=0.0
# for datum in $data ; do
#       (( sumSq += ($datum - $mean) ** 2 ))
# done
#-------------------------------
# again, replace this with a single _BIG_ expression
sumSqExpr=( "("${^data}" - $mean)**2.0" )
sumSqExpr=${(j: + :)sumSqExpr}
(( sumSq = sumSqExpr ))

# Calculate standard deviation.
let sd="($sumSq/$n)**0.5"
# if (( $sd < 0.0 )) ; then
#       let sd=-$sd
# fi

printf "sum: %g\n" $sum
printf "mean: %g\n" $mean
# printf "sumSq: %g\n" $sumSq
printf "stdDev: %g\n" $sd

### zsh: large arrays very slow

2004-08-24, 03:56(+00), Pavol Juhas:
[...]

> # $(<&3) is the same as $( cat <&3 ).  You need to save standard
> # input in a file descriptor, otherwise it is not
> # available inside $( )
> 3<&0 data=( $( <&3 ) )

data=($(cat))

or:

data=($(<&0))
same as
data=($(5>&-))
or
data=($($NULLCMD))

($NULLCMD is used when there's a redirection and no command nor
assignment. Beware that $(<) is not a special operator as in
bash or ksh).

> n=${#data}

> echo n is $n
> (( $n == 0 )) && { echo "No data!" 1>&2 ; exit(2) }

exit 2

--
Stephane

### zsh: large arrays very slow

>  2004-08-24, 03:56(+00), Pavol Juhas:
>  [...]
> > # $(<&3) is the same as $( cat <&3 ).  You need to save standard
> > # input in a file descriptor, otherwise it is not
> > # available inside $( )
> > 3<&0 data=( $( <&3 ) )

>  data=($(cat))

>  or:

>  data=($(<&0))
>  same as
>  data=($(5>&-))
>  or
>  data=($($NULLCMD))

Hi Stephane, in my zsh 4.2.0:

ls | m=( $(cat) )
cat: -: Input/output error

ls | 3<&0 m=( $(cat <&3) )
OK

ls | zsh -c 'm=( $(cat) )'
OK

Do you know why the first pipe fails?
Thanks,

Pavol

### zsh: large arrays very slow

2004-08-25, 22:23(+00), Pavol Juhas:
[...]
> Hi Stephane, in my zsh 4.2.0:

>   ls | m=( $(cat) )
>       cat: -: Input/output error

>   ls | 3<&0 m=( $(cat <&3) )
>       OK

>   ls | zsh -c 'm=( $(cat) )'
>       OK

> Do you know why the first pipe fails?

[...]

Looks like a bug.

--
Stephane

In writing a new compctl command (for dread/dwrite/dremove, accessing the
NeXT defaults database) I have realized that my understanding of arrays
in zsh is flawed.  Running zsh 2.5.03 under NEXTSTEP 3.2... anyway,
here is the problem:

10% array=( foo "bar baz" )
11% echo $array
foo bar baz
12% echo $array[2]
bar baz
13% alias input_function='echo foo \"bar baz\"'
14% input_function
foo "bar baz"
15% array=( $(input_function) )
16% echo $array[2]
"bar
17% array=( `input_function` )
18% echo $array[2]
"bar
19% array=( "$(input_function)" )
20% echo $array
foo "bar baz"

Assume I have some shell function, script or program called input_function
that produces the string [foo "bar baz"] as output (without the brackets).
In the above example, input_function is just an echo, but in my real case
it's something complicated.  Now, what I want is to assign to $array the
two strings [foo] and [bar baz], so that $array[2]="bar baz".  As you
can see, just running input_function, either in a $() or in backticks,
treats the quotes as literals and gives me a three-element array, while
protecting the whole thing with quotes gives me a one-element array.  How
can I get the two-element array I want?

I've even considered writing a horrible kluge of a solution that runs
$(input_function | awk '{print ...}') N times, with an awk string that
pulls out the Nth argument.  But this is not really practical, since in
my real problem N > 200, instead of just 2.  There has to be an easier way.
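One possible approach (a sketch, not from the thread, and safe only when input_function's output is trusted, since eval re-executes it as shell code): eval forces a second round of parsing, so the quotes embedded in the command substitution's output are honored as word grouping.

```shell
# input_function here is a stand-in for the real command.
input_function() { echo 'foo "bar baz"'; }

# eval re-parses the substituted text, so "bar baz" stays one word.
eval "array=( $(input_function) )"

echo "count: ${#array[@]}"    # prints: count: 2
echo "second: ${array[1]}"    # bash is 0-indexed; zsh would use $array[2]
```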

Nothing is so useless as ...

-- Macaulay
