Jonathan,

This is what Michalewicz says about Stochastic Universal Sampling (SUS) in his book, "Genetic Algorithms + Data Structures = Evolution Programs", 2nd edition, 1994, p. 57:

    This method uses a single wheel spin. This wheel, which is
    constructed in the standard way (Chapter 2), is spun with a
    number of equally spaced markers equal to the population size as
    opposed to a single one.

He indicated it is a "new improved method" (in comparison to the other selection schemes such as stochastic remainder sampling), and he uses it as the basis of his modGA, which in turn is the basis of other algorithms such as Genocop. (I looked at Genocop 2.0, and it doesn't look like stochastic universal sampling to me; if I recall, it was the basic SGA-style wheel spin for each selection.)

Anyway, here is what I think SUS is, by example:

Say the pop size is 5 and the fitness values are 10, 30, 15, 35, 10. The sum of fitnesses is 10+30+15+35+10 = 100.

Construct a roulette wheel in the traditional manner: individual 1 gets 10/100 = 10% of the roulette wheel, individual 2 gets 30% of the roulette wheel space, etc. (Fitnesses are shown in the boxes; below the boxes, at the box boundaries, are the cumulative fitnesses normalized to sum to 1.0; above the boxes is the number of the "individual" owning the space):

  1    2    3    4    5
| 10 | 30 | 15 | 35 | 10 |
0   .1   .4  .55   .9  1.0
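As a quick sketch of the wheel construction (the variable names are mine, not from any particular GA library):

```python
# Build the normalized cumulative boundaries of the roulette wheel
# for the example fitnesses above.
fitnesses = [10, 30, 15, 35, 10]
total = sum(fitnesses)

boundaries = []      # right edge of each individual's slice, normalized
running = 0
for f in fitnesses:
    running += f
    boundaries.append(running / total)

print(boundaries)    # -> [0.1, 0.4, 0.55, 0.9, 1.0]
```

A random number in [0, 1) then selects the individual whose slice contains it.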

Now the simplest roulette wheel method (not the SUS method) spins the wheel 5 times. If the 5 spins produce the 5 random numbers .7, .3, .6, .8, .2, then individuals 4, 2, 4, 4, 2 are selected. This is terrible: an example of stochastic error, where the number of times an individual is selected can be far greater than its expected number of selections. The expected number of times each individual should appear in the new population (or in the mating pool, or whatever) is:

#1: 0.5, #2: 1.5, #3: 0.75, #4: 1.75, #5: 0.5

i.e. the expected number of appearances of individual #1 in the new population is 0.5, the expected number of appearances of individual #2 is 1.5, etc.
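A minimal sketch of this one-spin-per-selection scheme, reproducing the example numbers above (the `spin` helper and variable names are mine, not from the posting):

```python
import bisect

fitnesses = [10, 30, 15, 35, 10]
boundaries = [0.1, 0.4, 0.55, 0.9, 1.0]   # cumulative, normalized

def spin(r):
    """Map a random number r in [0, 1) to a 1-based individual number."""
    return bisect.bisect_right(boundaries, r) + 1

# The five example spins, each an independent selection:
picks = [spin(r) for r in [0.7, 0.3, 0.6, 0.8, 0.2]]
print(picks)                               # -> [4, 2, 4, 4, 2]

# Expected number of appearances of each individual in 5 selections:
expected = [5 * f / sum(fitnesses) for f in fitnesses]
print(expected)                            # -> [0.5, 1.5, 0.75, 1.75, 0.5]
```

The gap between `picks` (three copies of #4, none of #1, #3, or #5) and `expected` is the stochastic error.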

Goldberg has some techniques for greatly reducing stochastic sampling error; one that comes to mind is called something like the remainder sampling method (I'm doing this from memory). But the upshot of this method is that individual #1 will appear in the new population only 0 or 1 times (never 2 or more times), individual #2 will appear in the new population 1 or 2 times (never 0 times, and never 3 or more times), etc.

The Goldberg methods, while fine and doing the job well, aren't as elegant as SUS. In the SUS method you put 5 (i.e. popsize) equally spaced markers around but outside the roulette wheel (on the boundary part that doesn't spin). Thus the markers are spaced 0.2 units apart (on a sum-to-one normalized scale). You spin the wheel once, by generating a random 0-to-1 real number, to set the offset of the first marker. Say the random number turned out to be .3. Then the first marker is placed .3 * .2 = .06 to the right of the 0 point. The other markers are each 0.2 to the right of the previous marker. So after the funny single wheel spin, the 5 markers end up at .06, .26, .46, .66, and .86, as shown below by the "^" marks:

  1    2    3    4    5
| 10 | 30 | 15 | 35 | 10 |
0   .1   .4  .55   .9  1.0
   ^    ^   ^    ^ ^        <- markers
  .06  .26  .46 .66 .86

The result is that #1 is selected once, #2 is selected once, #3 is selected once, #4 is selected twice, and #5 is not selected. This meets the criterion that the number of times an individual is selected is close to (within 1 of) its expected number of selections.
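Putting the whole procedure together, here is a sketch of SUS as I've described it (the `sus_select` name and interface are my own; the injectable `rng` is just so the .3 spin from the example can be reproduced):

```python
import bisect
import random

def sus_select(fitnesses, n, rng=random.random):
    """Select n individuals (0-based indices) with a single wheel spin."""
    total = sum(fitnesses)
    # Cumulative slice boundaries, normalized to sum to 1.0.
    boundaries, running = [], 0.0
    for f in fitnesses:
        running += f
        boundaries.append(running / total)
    spacing = 1.0 / n
    offset = rng() * spacing                     # the single "wheel spin"
    markers = [offset + i * spacing for i in range(n)]
    # Each marker falls in exactly one individual's slice.
    return [bisect.bisect_right(boundaries, m) for m in markers]

# With the spin .3 from the example, markers land at .06, .26, .46, .66, .86:
print(sus_select([10, 30, 15, 35, 10], 5, rng=lambda: 0.3))
# -> [0, 1, 2, 3, 3], i.e. individuals 1, 2, 3, 4, 4 one-based
```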

The above assumed the 100% replacement model: you want to make 5 selections to replace the 5 individuals in the population (well, actually the children of the 5 selected individuals replace them). If, on the other hand, you want to replace 60% of the population, then you would do the same as above, but use 3 equally spaced markers spaced 0.333 apart, with an offset randomly between 0 and 0.333.
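The 60% case is the same marker arithmetic with n = 3; in this sketch the offset .12 is an arbitrary illustrative value (as if the spin had returned .36):

```python
import bisect

boundaries = [0.1, 0.4, 0.55, 0.9, 1.0]   # the example wheel from above
n = 3                                     # replace 60% of a population of 5
spacing = 1 / n
offset = 0.12          # pretend spin: .36 * (1/3) = .12, must be < 1/3
markers = [offset + i * spacing for i in range(n)]
picks = [bisect.bisect_right(boundaries, m) + 1 for m in markers]
print(picks)           # 1-based individuals selected -> [2, 3, 4]
```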

The main drawback of the method is that one needs to know in advance how many selections one is going to make. For example, in the 60% model we know we WANT to select 3 individuals, whose children are to replace 3 members of the population. But what if a child produced is bad (as occurs in some genetic operator methods in Genocop), or if a child is identical to some individual already in the population (some methods, to maintain diversity, require that all individuals in the population be unique)? So what could happen is that, say, 4 selections had to be made to produce the 3 good children that are desired. But since we only have 3 equally spaced markers, we must do something to produce the 4th selection. It's not a devastating problem to overcome, but it makes the method less clean and elegant than it appears on the surface.

Good luck with it.

Jim Larson