Dear all,

On 28th August, colleague Rivero pointed to a very important problem
for those who use Cluster Analysis in SPSS: the absence of measures for
validating a given classification.

Everyone is aware of the subjectivity inherent in Cluster Analysis,
especially (which I think is the most common situation) when we have no
a priori idea of the structure of the data. So, usually, we try a
combination of, say, 3 distance measures between observations with 3
classification methods, which gives 3x3=9 different classifications.
Which one best reflects the original structure of the data?

There are several methods to find out, among them: (1) the cophenetic
correlation; (2) Monte Carlo procedures; and (3) significance tests.

None of these is implemented in SPSS. Without one of these, or some
other validation, any classification is, in my opinion, almost useless.

So we have to write syntax for it. Perhaps the most feasible of the
methods is the cophenetic correlation, which involves calculating a
correlation between two matrices (A and B), which I think is more or
less:

       <A,B>          (inner product)
    -----------
    ||A|| ||B||       (product of the norms)
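
To make the formula concrete, here is a minimal sketch in Python rather
than SPSS syntax (chosen only for illustration). It assumes both matrices
are symmetric, list the cases in the same order, and that only the
entries above the diagonal are compared. The function names and the
4-case data are hypothetical. Note that the cophenetic correlation as it
is usually defined is the Pearson correlation of those entries, i.e. the
same ratio computed after subtracting each matrix's mean, so both
variants are shown.

```python
from math import sqrt

def upper_triangle(m):
    """Return the entries above the diagonal of a square matrix, row by row."""
    n = len(m)
    return [m[i][j] for i in range(n) for j in range(i + 1, n)]

def normalized_inner_product(a, b):
    """<A,B> / (||A|| ||B||), computed over the upper-triangle entries."""
    x, y = upper_triangle(a), upper_triangle(b)
    dot = sum(p * q for p, q in zip(x, y))
    return dot / (sqrt(sum(p * p for p in x)) * sqrt(sum(q * q for q in y)))

def cophenetic_correlation(a, b):
    """The same ratio after centring each set of entries on its mean (Pearson r)."""
    x, y = upper_triangle(a), upper_triangle(b)
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    dx = [p - mean_x for p in x]
    dy = [q - mean_y for q in y]
    dot = sum(p * q for p, q in zip(dx, dy))
    return dot / (sqrt(sum(p * p for p in dx)) * sqrt(sum(q * q for q in dy)))

# Hypothetical example: original distances between 4 cases ...
original = [[0, 1, 4, 5],
            [1, 0, 3, 6],
            [4, 3, 0, 2],
            [5, 6, 2, 0]]
# ... and the fusion distances a single-linkage tree gives for them.
fusion = [[0, 1, 3, 3],
          [1, 0, 3, 3],
          [3, 3, 0, 2],
          [3, 3, 2, 0]]

print(normalized_inner_product(original, fusion))
print(cophenetic_correlation(original, fusion))
```

The uncentred ratio tends to come out higher than the centred one because
all distances are non-negative, so the centred version is probably the
safer one for comparing classifications.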

I think this is hard to do in SPSS, so we would use the "RESHAPE"
procedure, as Dr. Nichols pointed out. The problem is that we do not
have the second matrix, i.e., the matrix of "fusion distances".

By hand, we could create this matrix from, for instance, the dendrogram.
But, once again, SPSS transforms ("rescales") the original distances
(with no option to turn this off), which makes it impossible to compare
the two.

Anyway, this would only be feasible with small data sets.
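
To show what building the fusion matrix amounts to, here is a rough
sketch in Python (not SPSS syntax). It assumes the merge history is
available as a list of (cluster_a, cluster_b, coefficient) steps on the
original distance scale, that cases are numbered from 0, and that the
merged cluster keeps the first label of each step. The function name,
the input layout and the example schedule are all hypothetical.

```python
def fusion_matrix(n_cases, schedule):
    """Build the matrix of fusion distances from a merge history.

    schedule: (cluster_a, cluster_b, coefficient) tuples in merge order.
    After each step the combined cluster is assumed to keep label
    cluster_a, and label cluster_b is no longer used.
    """
    members = {i: [i] for i in range(n_cases)}  # cluster label -> its cases
    fusion = [[0.0] * n_cases for _ in range(n_cases)]
    for a, b, coefficient in schedule:
        # Every pair split between the two clusters first joins at this level.
        for i in members[a]:
            for j in members[b]:
                fusion[i][j] = fusion[j][i] = coefficient
        # Fold cluster b into cluster a.
        members[a] = members[a] + members.pop(b)
    return fusion

# Hypothetical merge history for 4 cases.
schedule = [(0, 1, 1.0), (2, 3, 2.0), (0, 2, 3.0)]
for row in fusion_matrix(4, schedule):
    print(row)
```

Each pair of cases is filled in exactly once, at the step where its two
clusters merge, so the work grows with the number of pairs and the size
of the data set stops being an obstacle. What remains open is getting
the merge levels out of SPSS unrescaled.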

I had this problem, i.e., HOW TO OBTAIN THE FUSION MATRIX, a few months
ago (if anyone remembers), but I did not end up with a solution.

If anyone could offer any ideas, or criticise the ideas in this e-mail,
I (and many other people, I guess) would be very grateful.

Sorry for the long e-mail.

Thanks and regards to all,

