> I have recently found out from SPSS

> statistical support that the

> 'weighting' function (from the data -

> weight data pull down option) is

> intended for frequency analysis ONLY.

> Even with only one weighting variable,

> using this function leads to inaccurate

> estimates. The weighting function

> should be used with an integer number

> weight variable only, I was told. I had

> wanted to weight a regression analysis.

> To do this, I was told I needed

> Complex Samples.

> Is this true? Can I not weight

> regression analysis or means tables and

> trust the validity of my results?

Doris, this subject matter has been dealt with repeatedly on this list

lately. I copy below a recent message of mine about this problem. In

short, what they told you (or you say they told you) is not accurate.

Either your weights are good for your sample or not; if they are good,

they are good to estimate absolute values (frequencies) and to weigh

your cases in a regression analysis, or they are wrong in both.

What they probably meant was:

1. The usual 'weights' used in surveys accomplish two functions: to

'expand' (from sample to universe size) and to 'weigh' proper, i.e. to

give each case a different weight (if needed) reflecting different

sampling proportions. In a simple random survey, the weight is the same

for all, so you can use no weights (weight=1 for all cases) or you can

'expand' to universe size (weight=N/n for all cases). In this Simple

Random Sample situation, you should 'weigh' (i.e. you should expand)

your data to get frequencies reflecting the size of your population, but

you shouldn't 'weigh' (i.e. you shouldn't expand) your data when

producing a correlation or any other statistical analysis, because the

significance tests are computed in SPSS taking as 'sample size' the

WEIGHTED number of cases. If your sample is expanded, SPSS assumes you

have a sample the size of your universe, and correspondingly

underestimates your sample error. Thus their advice: use weights for

frequencies, do not use them for correlations or regressions or whatever

else. And (separately) if you don't have a simple random sample, your

weights will be different for different cases, so you may use them for

frequencies but not for other analyses.
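
The scale effect described above can be illustrated outside SPSS with a small Python sketch. It assumes frequency-weight formulas in which the sum of the weights is taken as the number of cases, which is the behaviour this message attributes to SPSS; the data values and the population size N are hypothetical.

```python
import math

def weighted_mean_and_naive_se(values, weights):
    """Weighted mean plus a naive standard error that treats the sum of
    the weights as the number of cases (frequency-weight logic)."""
    total_w = sum(weights)
    mean = sum(w * x for x, w in zip(values, weights)) / total_w
    # Frequency-weight variance: each case is counted w times.
    var = sum(w * (x - mean) ** 2 for x, w in zip(values, weights)) / (total_w - 1)
    return mean, math.sqrt(var / total_w)

values = [2.0, 4.0, 4.0, 5.0, 7.0, 9.0]   # hypothetical simple random sample
n = len(values)                            # sample size
N = 6000                                   # hypothetical population size
unit = [1.0] * n                           # weight = 1 for all cases
expanded = [N / n] * n                     # weight = N/n for all cases

mean_unit, se_unit = weighted_mean_and_naive_se(values, unit)
mean_exp, se_exp = weighted_mean_and_naive_se(values, expanded)

print("means:", mean_unit, mean_exp)            # the estimate does not change
print("SE shrink factor:", se_unit / se_exp)    # the naive SE shrinks sharply
```

The point estimate is the same under both weightings; only the apparent precision changes, because the expanded weights make six cases look like six thousand.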

The problem is that samples are quite often not simple random samples. They

may be multistage, stratified, clustered samples, or whatever mix of

sampling techniques that may happen to have been used in your particular

survey. In that case, the significance values yielded by SPSS would not

be accurate. If you apply correct (different) weights to your data,

reflecting different sampling probabilities, the regression coefficient

estimates (or whatever you're estimating) will be fine, it is the

significance that would be wrong.

What you would need is WesVar Complex Samples, to estimate the

significance of a result obtained with a complex sampling model.

Otherwise, you should use some approximate shortcut such as the one I

recommend in the message below. The essential insight to grasp that

approach is to distinguish between 'expanding' and 'weighting', or if

you prefer, between absolute and relative weighting. You can have one

without the other.

Here's an extract of my former message:

The matter stands as follows:

1. SPSS uses the weighted data total to compute 'sample size' when it

computes statistical tests. Thus, chi square for a sample of 100 that

the weights expand to 1,000,000 is treated as a sample of one million

people. This greatly distorts the significance of your results, usually

making them appear far more significant than they really are.

2. Weights accomplish two functions: to weight in the strict sense, and

to expand the scale to population size. Strict weighting means giving

each sample case its correct weight, when sampling probabilities have

been unequal (otherwise all cases should be equally weighted).

3. One can weight without expanding, but this is a trick with a cost. It

works as follows:

a. You COMPUTE a second weighting variable:

COMPUTE WEIGHT_2 = WEIGHT_1 * (n/N)

where n=sample size and N=population size.

b. Use this new variable for weighting your cases.

As a result, your tables would yield a 'total number of cases' of n (the

sample size), plus or minus a slight rounding error, but your individual

cases will have different weights: some less than 1, some more than 1,

as long as your original weights (weight_1) were not uniform in the

first place.
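
As a rough check of the arithmetic in step 3, here is a minimal Python sketch of the same rescaling (Python is used only for illustration; it is not SPSS syntax). The weights are hypothetical, and the sketch assumes the expansion weights sum to the population size N.

```python
def rescale_weights(weight_1, n, N):
    """Apply WEIGHT_2 = WEIGHT_1 * (n/N) to every case."""
    return [w * n / N for w in weight_1]

# Hypothetical expansion weights for six cases; assumed to sum to N.
weight_1 = [50.0, 50.0, 150.0, 150.0, 300.0, 300.0]
n = len(weight_1)        # sample size
N = sum(weight_1)        # population size implied by the weights

weight_2 = rescale_weights(weight_1, n, N)

print(weight_2)          # some weights below 1, some above 1
print(sum(weight_2))     # weighted total is the sample size, up to rounding
```

The ratios between cases are untouched (a case that counted six times as much as another still does), so only the overall scale of the weights changes.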

Using these new weights takes away the 'scale effect' from the

significance tests: it gives you a significance level referred to your

actual sample size.
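
To see why the scale of the weights matters for a test statistic, the following Python sketch computes a Pearson chi-square on a hypothetical 2x2 table of 100 cases, and again after multiplying every cell by the factor that expands it to one million. The table and the factor are made up for illustration; the function is a plain textbook Pearson statistic, not SPSS's own code.

```python
def pearson_chi_square(table):
    """Pearson chi-square statistic for a two-way table of (weighted) counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand
            stat += (observed - expected) ** 2 / expected
    return stat

sample_table = [[30, 20], [20, 30]]    # hypothetical counts, n = 100
factor = 10_000                        # expands n = 100 to N = 1,000,000
expanded_table = [[c * factor for c in row] for row in sample_table]

chi_sample = pearson_chi_square(sample_table)
chi_expanded = pearson_chi_square(expanded_table)

print(chi_sample, chi_expanded)
```

The statistic grows in direct proportion to the weighted total, so the same cell percentages that give a modest chi-square at n = 100 give an enormous one at a million cases. Weights rescaled to sum to n avoid this inflation.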

Is this all you need? Not quite. Your sample was not a simple random

sample, but a stratified one. Thus, your true significance is somewhat

higher than this procedure yields. Before you were overestimating the

significance (usually by too much) and now you're underestimating it

(usually by not so much).

If you could estimate the size 'm' of the hypothetical simple random

sample the significance of which would equal the significance achieved

with your (smaller) stratified sample of size 'n', you might get a

better approximation to true significance if you compute

WEIGHT_3=WEIGHT_1*(m/N), where 'n' is replaced by 'm'=the equivalent

simple random sample. This way, your results will be expanded to 'm'

cases instead of remaining at 'n', but you'd be closer to an unbiased

estimate of significance.
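
A minimal Python sketch of this last adjustment follows. The message does not say how to estimate 'm'; the sketch assumes you already have an overall design effect (deff) for your survey and uses the standard relation m = n / deff. The weights and the deff value below are hypothetical.

```python
def effective_sample_size(n, deff):
    """Equivalent simple random sample size, m = n / deff."""
    if deff <= 0:
        raise ValueError("design effect must be positive")
    return n / deff

def weights_for_effective_size(weight_1, m):
    """Apply WEIGHT_3 = WEIGHT_1 * (m/N), taking N as the sum of WEIGHT_1."""
    N = sum(weight_1)
    return [w * m / N for w in weight_1]

# Hypothetical expansion weights; assumed to sum to the population size N.
weight_1 = [50.0, 50.0, 150.0, 150.0, 300.0, 300.0]
n = len(weight_1)
deff = 0.8   # hypothetical: below 1 when stratification gains dominate

m = effective_sample_size(n, deff)
weight_3 = weights_for_effective_size(weight_1, m)

print(m)               # larger than n because deff < 1
print(sum(weight_3))   # weighted total is m instead of n
```

With a design effect above 1 (for example, when clustering dominates) the same code gives m smaller than n, and the weighted total shrinks accordingly.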

Unfortunately, SPSS regards all sampling as simple random. SPSS Inc is

announcing a new product which would deal with more complex sample

designs. Stay tuned.

[Apart from stratification, the sampling model may involve clustering,

which tends to INCREASE sampling error, whereas stratification tends to

decrease it].

SPSS would ignore the additional error coming from sample clustering,

and the additional precision coming from stratified sampling. These two

factors, however, tend to compensate each other, so in rough terms the

significance computed by SPSS with weights that preserve overall sample

size (such as yours) can give you some guidance, though it does not

represent the exact level of significance of your results.

If you know the overall effect of your sample design, you can modify

your weights to yield a weighted number of cases which equals the

hypothetical simple random sample giving the same level of significance

your sample yields.

That alternative sample size might be lower or higher than your actual

sample, according to the predominance of stratification or clustering

effects. If you use those alternative weights you may be closer to the

effective levels of significance of your results.

In the end, however, you must face the disappointing fact that SPSS does

not deal with complex sampling designs. If you want better results, you

should use WesVar Complex Samples indeed.

Hector Maletta

Universidad del Salvador

Buenos Aires, Argentina