> I have recently found out from SPSS
> statistical support that the
> 'weighting' function (from the data -
> weight data pull down option) is
> intended for frequency analysis ONLY.
> Even with only one weighting variable,
> using this function leads to inaccurate
> estimates. The weighting function
> should be used with an integer number
> weight variable only, I was told. I had
> wanted to weight a regression analysis.
> To do this, I was told I needed
> Complex Samples.
> Is this true? Can I not weight
> regression analysis or means tables and
> trust the validity of my results?
Doris, this subject has been dealt with repeatedly on this list
lately. I copy below a recent message of mine about this problem. In
short, what they told you (or what you say they told you) is not
accurate. Either your weights are good for your sample or they are not;
if they are good, they are good both for estimating absolute values
(frequencies) and for weighting your cases in a regression analysis, and
if they are wrong, they are wrong for both.
What they probably meant was:
1. The usual 'weights' used in surveys accomplish two functions: to
'expand' (from sample to universe size) and to 'weigh' proper, i.e. to
give each case a different weight (if needed) reflecting different
sampling proportions. In a simple random survey, the weight is the same
for all, so you can use no weights (weight=1 for all cases) or you can
'expand' to universe size (weight=N/n for all cases). In this Simple
Random Sample situation, you should 'weigh' (i.e. you should expand)
your data to get frequencies reflecting the size of your population, but
you shouldn't 'weigh' (i.e. you shouldn't expand) your data when
producing a correlation or any other statistical analysis, because the
significance tests are computed in SPSS taking as 'sample size' the
WEIGHTED number of cases. If your sample is expanded, SPSS assumes you
have a sample the size of your universe, and correspondingly
underestimates your sampling error. Thus their advice: use weights for
frequencies, do not use them for correlations or regressions or whatever
else. And (separately) if you don't have a simple random sample, your
weights will be different for different cases, so you may use them for
frequencies but not for other analyses.
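To make the expand-versus-weigh distinction concrete, here is a small Python sketch (Python is only my choice for illustration; the data and the population size are invented). It computes a weighted mean and a naive standard error that, like the behaviour described above, treats the sum of the weights as the number of cases. Expanding every weight by N/n leaves the mean unchanged but shrinks the standard error by roughly the square root of N/n. The variance formula with (sum of weights - 1) in the denominator is one common frequency-weight convention; I am not claiming it reproduces SPSS output to the last decimal.

```python
import math


def weighted_mean_and_naive_se(values, weights):
    """Weighted mean plus a naive standard error that treats the sum of
    the weights as if it were the number of sampled cases."""
    total_w = sum(weights)
    mean = sum(w * x for w, x in zip(weights, values)) / total_w
    # Frequency-weight style variance: the denominator is (sum of weights - 1)
    var = sum(w * (x - mean) ** 2 for w, x in zip(weights, values)) / (total_w - 1)
    # The "sample size" used for the standard error is the weighted total
    se = math.sqrt(var / total_w)
    return mean, se


values = [3.0, 5.0, 4.0, 6.0, 2.0, 5.0, 4.0, 3.0, 6.0, 4.0]  # invented data
n = len(values)  # actual sample size: 10
N = 1_000_000    # hypothetical population size

mean_1, se_1 = weighted_mean_and_naive_se(values, [1.0] * n)    # weight = 1
mean_e, se_e = weighted_mean_and_naive_se(values, [N / n] * n)  # weight = N/n

print(f"weight=1:   mean={mean_1:.3f}  se={se_1:.4f}")
print(f"weight=N/n: mean={mean_e:.3f}  se={se_e:.6f}")
```

The point estimate is identical in both runs; only the apparent precision changes, which is exactly the false gain in significance produced by expanded weights.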
The problem is that samples quite often are not simple random samples. They
may be multistage, stratified, clustered samples, or whatever mix of
sampling techniques that may happen to have been used in your particular
survey. In that case, the significance values yielded by SPSS would not
be accurate. If you apply correct (different) weights to your data,
reflecting different sampling probabilities, the regression coefficient
estimates (or whatever you're estimating) will be fine; it is the
significance that would be wrong.
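The claim that point estimates survive weighting, while only the significance suffers, can be checked with a weighted least squares fit. In this hypothetical Python sketch (invented data and weights), multiplying all weights by the same constant leaves the intercept and slope unchanged: only the relative weights matter, and unequal relative weights do change the coefficients compared with the unweighted fit.

```python
def wls_fit(x, y, w):
    """Weighted least squares fit of y = a + b*x with case weights w.
    Returns (a, b)."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    b = sxy / sxx
    a = ybar - b * xbar
    return a, b


x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 2.9, 4.2, 4.8, 6.3, 6.9]
# Hypothetical unequal (relative) weights, e.g. proportional to 1/sampling probability
w = [1.0, 1.0, 3.0, 3.0, 2.0, 2.0]

a_w, b_w = wls_fit(x, y, w)
# Expanding: every weight multiplied by the same constant
a_e, b_e = wls_fit(x, y, [wi * 5000.0 for wi in w])
# Unweighted fit for comparison
a_u, b_u = wls_fit(x, y, [1.0] * len(x))

print(f"relative weights: a={a_w:.4f} b={b_w:.4f}")
print(f"expanded weights: a={a_e:.4f} b={b_e:.4f}")
print(f"unweighted:       a={a_u:.4f} b={b_u:.4f}")
```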
What you would need is WesVar Complex Samples, to estimate the
significance of a result obtained with a complex sampling model.
Otherwise, you should use some approximate shortcut such as the one I
recommend in the message below. The essential insight to grasp that
approach is to distinguish between 'expanding' and 'weighting', or if
you prefer, between absolute and relative weighting. You can have one
without the other.
Here's an extract of my former message:
The matter stands as follows:
1. SPSS uses the weighted data total to compute 'sample size' when it
computes statistical tests. Thus, chi square for a sample of 100 that
the weights expand to 1,000,000 is treated as a sample of one million
people. This greatly distorts the significance of your results,
usually making them look far more significant than they really are.
2. Weights accomplish two functions: to weight in the strict sense, and
to expand the scale to population size. Strict weighting means giving
each sample case its correct weight, when sampling probabilities have
been unequal (otherwise all cases should be equally weighted).
3. One can weight without expanding, but this is a trick with a cost. It
works as follows:
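a. You COMPUTE a second weighting variable:
COMPUTE WEIGHT_2 = WEIGHT_1 * (n/N).
where n = sample size and N = population size, typed in as actual
numbers (SPSS variable names are not case-sensitive, so n and N could
not be two separate variables).
b. Use this new variable for weighting your cases:
WEIGHT BY WEIGHT_2.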
As a result, your tables would yield a 'total number of cases' of n (the
sample size), plus or minus a slight rounding error, but your individual
cases will have different weights: some less than 1, some more than 1,
as long as your original weights (WEIGHT_1) were not uniform.
Using these new weights takes away the 'scale effect' from the
significance tests: it gives you a significance level referred to your
actual sample size.
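The effect of steps a and b can be sketched in Python on an invented stratified sample of 100 cases representing one million people, with two strata sampled at different rates. The function names are hypothetical. Pearson's chi-square grows in proportion to the weighted total, so the expanding weights (WEIGHT_1) inflate it by a factor of N/n, while WEIGHT_2 brings it back to the scale of the actual sample without altering the relative weights.

```python
def normalize_weights(weights, n, N):
    """WEIGHT_2 = WEIGHT_1 * (n / N): same ratios between cases; the total
    becomes n (exactly n only if WEIGHT_1 sums to N)."""
    return [w * n / N for w in weights]


def weighted_chi_square(rows, cols, weights):
    """Pearson chi-square for a two-way table built from weighted cases."""
    row_levels = sorted(set(rows))
    col_levels = sorted(set(cols))
    table = {(r, c): 0.0 for r in row_levels for c in col_levels}
    for r, c, w in zip(rows, cols, weights):
        table[(r, c)] += w
    total = sum(table.values())
    row_tot = {r: sum(table[(r, c)] for c in col_levels) for r in row_levels}
    col_tot = {c: sum(table[(r, c)] for r in row_levels) for c in col_levels}
    chi2 = 0.0
    for r in row_levels:
        for c in col_levels:
            expected = row_tot[r] * col_tot[c] / total
            chi2 += (table[(r, c)] - expected) ** 2 / expected
    return chi2


# Invented data: stratum A (40 cases, 600,000 people), stratum B (60 cases, 400,000 people)
rows = ["A"] * 40 + ["B"] * 60
cols = ["yes"] * 25 + ["no"] * 15 + ["yes"] * 20 + ["no"] * 40
weight_1 = [15000.0] * 40 + [400000.0 / 60] * 60  # expansion weights, sum about N

n = len(weight_1)  # 100
N = 1_000_000

weight_2 = normalize_weights(weight_1, n, N)
chi2_expanded = weighted_chi_square(rows, cols, weight_1)
chi2_normalized = weighted_chi_square(rows, cols, weight_2)

print(f"sum of WEIGHT_2 = {sum(weight_2):.2f}")
print(f"chi-square, expanded weights:   {chi2_expanded:.1f}")
print(f"chi-square, normalized weights: {chi2_normalized:.2f}")
```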
Is this all you need? Not quite. Your sample was not a simple random
sample, but a stratified one. Thus, your true significance is somewhat
higher than this procedure yields. Before, you were overestimating the
significance (usually by too much) and now you're underestimating it
(usually by not so much).
If you could estimate the size 'm' of the hypothetical simple random
sample whose significance would equal the significance achieved with
your (smaller) stratified sample of size 'n', you might get a better
approximation to the true significance by computing
WEIGHT_3 = WEIGHT_1 * (m/N), i.e. replacing 'n' with 'm', the size of the
equivalent simple random sample. This way your results will be expanded
to 'm' cases instead of remaining at 'n', but you'd be closer to an
unbiased estimate of significance.
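A minimal Python sketch of the WEIGHT_3 idea, assuming you somehow know the design effect (deff), i.e. the ratio of your design's sampling variance to that of a simple random sample of the same size; then m = n/deff. The deff value of 0.8 and the weights are invented. Rescaling by the sum of the weights instead of by N is a small variation on the formula in the text: it gives exactly m even when WEIGHT_1 does not add up exactly to N.

```python
def effective_sample_size(n, deff):
    """Size m of a simple random sample with the same precision as a
    complex sample of size n whose design effect is deff."""
    return n / deff


def rescale_weights(weights, target_total):
    """Rescale weights so they add up to target_total, keeping their ratios."""
    total = sum(weights)
    return [w * target_total / total for w in weights]


# Invented example: stratified sample of n = 100 cases, two strata
weight_1 = [15000.0] * 40 + [400000.0 / 60] * 60  # expansion weights, sum about N
n = len(weight_1)
deff = 0.8  # hypothetical design effect (< 1: stratification gains precision)

m = effective_sample_size(n, deff)       # about 125 "equivalent" SRS cases
weight_3 = rescale_weights(weight_1, m)  # same ratios as WEIGHT_1, total m
print(f"m = {m:.1f}, sum of WEIGHT_3 = {sum(weight_3):.1f}")
```

The same few lines would feed an SPSS COMPUTE statement: only the target total changes between WEIGHT_2 (n) and WEIGHT_3 (m).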
Unfortunately, SPSS regards all sampling as simple random. SPSS Inc is
announcing a new product which would deal with more complex sample
designs. Stay tuned.
[Apart from stratification, the sampling model may involve clustering,
which tends to INCREASE sampling error, whereas stratification tends to
DECREASE it. SPSS would ignore both the additional error coming from
sample clustering and the additional precision coming from stratified
sampling. These two factors, however, tend to compensate each other,
so in rough terms the significance computed by SPSS with weights that
preserve the overall sample size (such as yours) can give you some
guidance, though it does not represent the exact level of significance
of your results.
If you know the overall effect of your sample design, you can modify the
weights to yield a weighted number of cases equal to the size of the
simple random sample that would give the same level of significance as
your sample. That alternative sample size might be lower or higher than
your actual sample size, according to the predominance of stratification
or clustering effects. If you use those alternative weights, you may be
closer to the effective levels of significance of your results.]
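One standard way to put a number on the clustering side is Kish's approximation for cluster samples, deff = 1 + (b - 1) * rho, where b is the average number of cases per cluster and rho is the intraclass correlation of the variable being analysed. This Python sketch uses invented values for b, rho and for a stratification design effect, just to show how the equivalent sample size m can fall below or rise above n.

```python
def cluster_design_effect(avg_cluster_size, intraclass_corr):
    """Kish's approximate design effect for cluster sampling:
    deff = 1 + (b - 1) * rho."""
    return 1.0 + (avg_cluster_size - 1.0) * intraclass_corr


n = 1000  # hypothetical sample size

# Clustering: 50 clusters of 20 interviews each, rho = 0.05 (invented values)
deff_cluster = cluster_design_effect(20, 0.05)  # 1 + 19 * 0.05 = 1.95
m_cluster = n / deff_cluster                    # fewer "effective" cases than n

# Stratification alone, with an invented design effect below 1
deff_strata = 0.85
m_strata = n / deff_strata                      # more "effective" cases than n

print(f"clustered:  deff={deff_cluster:.2f}  m={m_cluster:.0f}")
print(f"stratified: deff={deff_strata:.2f}  m={m_strata:.0f}")
```

This is only a rough planning formula (it assumes clusters of similar size); the real design effect of a survey depends on the variable and the estimator.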
In the end, however, you must face the disappointing fact that SPSS does
not deal with complex sampling designs. If you want better results, you
should indeed use WesVar Complex Samples.
Universidad del Salvador
Buenos Aires, Argentina