## What is stochastic universal sampling?

### What is stochastic universal sampling?

Jonathan,

This is what Michalewicz says about Stochastic Universal Sampling (SUS)
in his book, "Genetic Algoorithms + Data Structures = Evolution
Programs", 2nd edition, 1994, p. 57:

This method uses a single wheel spin.  This wheel, which is
contructed in the standard way (Chapter 2), is spun with a
number of equally spaced markers equal to the population size as
opposed to a single one.

He indicated it is a "new improved method" (in comparison to the other
selection schemes such as stochastic remainder sampling); and he uses
it as the basis of his modGA GA, which in turn is the basis of other
algorithms such as Genocop.  (I looked at Genocop2.0, and it doesn't look
like stochastic universal sampling to me, if I recall, it was the basic
SGA style wheel spin for each selection).

Anyway, here is what I think SUS is, by example:

Say the pop size is 5 and fitness values are 10,30,15,35,10
Sum of fitnesses is 10+30+15+35+10 = 100.

Construct a roulette wheel in the traditional manner: individual 1 gets
10/100 = 10% or the roulette wheel, individual 2 gets 30% of the
roulette wheel space, etc (fitnesses are shown in the boxes, and below
the boxes at the box boundaries are the cumulative fitnesses normalized
to sum to 1.0.  Above the boxes is the number of the "individual"
owning the space):

1         2         3          4         5
|  10  |     30     |  15   |     35      | 10  |
0     .1           .4      .55           .9     1.0

Now the simplest roulette wheel method (not the SUS method) spins
the wheel 5 times, if the 5 spins produce the 5 random numbers
.7, .3, .6, .8, .2, then individuals 4,2,4,4,2 are selected.  This
is terrible, an example of stochastic error, where the number of
times an individual is selected can be far greater than its
expected number of selections.  The expected number of times
each individual should appear in the new population (or in the mating
pool or whatever) is:

#1: 0.5, #2: 1.5, #3: 0.75, #4: 1.75, #5: 0.5

i.e. the expected number of appearances of indiv #1 in the new popn is
0.5, the expected number of appearances of indiv #2 in the new popn is
1.5, etc.

Goldberg has some techniques for greatly reducing stochastic error
sampling, one that comes to mind is called something like the remainder
sampling method (I'm doing this from memory).  But he upshot of this
method is that individual #1 will appear in the new population only 0
or 1 times (never 2 or more times), individual #2 will appear in the
new population 1 or 2 times (never 0 times, and never 3 or more times)
etc.

The Goldberg methods, while fine, and doing the job well, aren't as
elegant as SUS.  In the SUS method you put in 5 (i.e. popsize) equally
spaced markers around but outside the roulette wheel (on the boundary
part that doesn't spin).  Thus markers are spaced 0.2 units apart (on
a sum-to-one normalized scale).  You spin the wheel once, by generating
a random 0 to 1 real number, to set the offset of the first marker.
Say the random number turned out to be .3.  Then the first marker is
placed .3*.2 = .06 to the right of the 0 point.  The other markers are
each 0.2 to the right of the previous marker.  So after the funny single
wheel spin, the 5 markers end up at .06, .26, .46, .66, and .86, as shown
below by the "^" marks:

1         2         3          4         5
|  10  |     30     |  15   |     35      | 10  |
0     .1           .4      .55           .9     1.0
^        ^         ^        ^       ^         <-markers
.06      .26       .46      .66     .86

The result is that #1 is selected once, #2 is selected once, #3 is
selected 1 time, #4 is selected twice, and #5 is not selected.  This
meets the criteria that the number of times an individual is selected
is close (within 1) of the expected number of times the individual is
selected.

The above was assuming the 100% replacement model -- you want to make
5 selections to replace (well actually the children of the 5 selected
individuals replace) the 5 individuals in the population.  If, on
the other hand, you want to replace 60% of the population, then you
would do the same as above, but use 3 equally spaced markers
spaced 0.333 apart, and with an offset randomly between 0 and 0.333.

The main drawback of the method is that one needs to know how many
selections one is going to make in advance.  For example, in the 60%
model we know we WANT to select 3 individuals, whose children are to
replace 3 members of the population.  But what if a child produced is
bad (as occurs in some genetic operations methods in Genocop), or if
a child is identical to some individual already in the population (some
methods, to maintain diversity, require that all individuals in the
population be unique).  So what could happen is that say 4 selections
had to be made to produce the 3 good children that are desired.
But since we only have 3 equally spaced markers, we must do something
to produce the 4th selection.  Its not a devastating problem to
overcome, but it makes it less clean and elegant than it appears on
the surface.

Good luck with it.

Jim Larson

Here's what I just did, and it worked. Sorry I haven't had time to put
in comments yet. I got it from page 167 of Melanie Mitchell's "An
Introduction to Genetic Algorithms" (a great book), and Melanie got it
from J. E. Baker.

I wrote it to run in MS-Windows. I have a population of just ten, and
the expected values of the chromosomes are in ExpVal. The "cout i;" is
in place of selecting a chromosome; my program just tells you which one
it would select.

The stuff at the beginning is just to make rand() more random (on my
computer, at least).

Dan Amodeo

#include <ctime>
#include <iostream>
using std::cout;
using std::endl;

int main()
{
unsigned int unsi=330333333+((time(0)%483)*1000000);
srand(unsi);

double p = (double(rand())/double(RAND_MAX+1.0));
//      double ExpVal[10] = {3.2, 1.5, 1.3, 1.0, 0.7, 0.7, 0.7, 0.3, 0.3,
0.3};
double ExpVal[10] = {3.2, 1.8, 1.5, 1.0, 0.7, 0.7, 0.7, 0.3, 0.3, 0.3};

double sum=0.0;
for (int i=0; i<10; i++)
for ( sum += ExpVal[i]; sum > p; p++)
cout << i;

cout << endl;

return 0;