Hi all,
I want to simulate the difference in sampling variance between simple random sampling (SRS) and stratified sampling (SS). In particular, I want to test the case where stratification does not produce homogeneous groups.
I use a small universe of 50 units. The values VV are drawn from a normal distribution with mean=100000 and variance=1000.
Then I divided the observations into two groups. The within-group variance is very similar in both groups, so I expect no gain from using SS instead of SRS.
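For reference, here are the textbook design variances of the estimated universe sum $\hat{Y}$. They assume the expansion estimator, sampling without replacement, and $S^2$, $S_h^2$ computed with divisors $N-1$ and $N_h-1$. The second line plugs in $N=50$, $n=5$, $N_1=15$, $n_1=2$, $N_2=35$, $n_2=3$, which is the allocation used in the simulation (2 units from the group of 15, 3 from the group of 35).

$$
V_{SRS}(\hat{Y}) = N^2\left(1-\frac{n}{N}\right)\frac{S^2}{n},
\qquad
V_{SS}(\hat{Y}) = \sum_{h} N_h^2\left(1-\frac{n_h}{N_h}\right)\frac{S_h^2}{n_h}
$$

$$
V_{SRS} = 450\,S^2,
\qquad
V_{SS} = 97.5\,S_1^2 + \frac{1120}{3}\,S_2^2 \approx 97.5\,S_1^2 + 373.3\,S_2^2
$$

Under the rough assumption $S_1^2 \approx S_2^2 \approx S^2$ (the situation described above), the stratified coefficients add up to about 470.8, compared with 450 for SRS. With proportional allocation ($n_h = nN_h/N$, i.e. 1.5 and 3.5 units, not integers here), the two coefficients would be 135 and 315, which sum to exactly 450.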
I ran the following code to create every possible sample of 5 elements. The first part simulates stratified sampling (2 elements from the first group of 15 units and 3 elements from the second group of 35 units). The second part simulates SRS with a sample of 5 elements.
From each sample I estimate the universe sum.
---------------------------------------------------------------------------
/* Universe of 50 values; zf1, zf2 and zf are the expansion weights N/n */
data univ;
s1=98732.4827088742;
s2=107216.090125439;
s3=89101.1839281418;
s4=93957.4081508908;
s5=119316.121324664;
s6=98747.8986440692;
s7=87339.9701755261;
s8=115679.779608035;
s9=97374.1523717763;
s10=100898.148755368;
s11=110504.163583391;
s12=98789.5193953591;
s13=92648.2587397913;
s14=121355.481294449;
s15=100180.92123355;
s16=115262.139640981;
s17=88184.2086324468;
s18=93293.704392272;
s19=83388.4430699982;
s20=111601.719052123;
s21=109095.720088226;
s22=116420.062820544;
s23=99392.6167007885;
s24=94120.9580356372;
s25=108706.911103218;
s26=104347.884896561;
s27=98898.8747645635;
s28=95823.0205265863;
s29=97495.1151671121;
s30=93106.7009129038;
s31=103422.81509802;
s32=89250.9777055238;
s33=108031.497600314;
s34=93863.4800855652;
s35=93438.609635632;
s36=97742.2362462676;
s37=113571.025192505;
s38=86556.5996515215;
s39=93765.3001325089;
s40=110225.357982563;
s41=101765.147317201;
s42=87894.8528930778;
s43=100758.450369176;
s44=86273.6331124324;
s45=104331.900527177;
s46=104758.589966514;
s47=95794.6215545235;
s48=108587.949196226;
s49=79214.5899846219;
s50=94804.1249808739;
zf1=15/2; zf2=35/3; zf=50/5;
run;

/* Stratified sampling: every sample of 2 units from s1-s15
   combined with 3 units from s16-s50 */
data univ1;
set univ;
array gp1 (*) s1--s15;
array gp2 (*) s16--s50;
keep vvstot;   /* only the estimate is needed by PROC MEANS */
do i=1 to (dim(gp1)-1);
  do j=i+1 to dim(gp1);
    do k=1 to (dim(gp2)-2);
      do w=k+1 to (dim(gp2)-1);
        do x=w+1 to (dim(gp2));
          vvgp1s=gp1(i)*zf1 + gp1(j)*zf1;
          vvgp2s=gp2(k)*zf2 + gp2(w)*zf2 + gp2(x)*zf2;
          vvstot=vvgp2s+vvgp1s;   /* estimated universe sum */
          output;
        end;
      end;
    end;
  end;
end;
run;

/* Simple random sampling: every sample of 5 units from s1-s50 */
data univ2;
set univ;
array gp1 (*) s1--s50;
keep vvtots;   /* only the estimate is needed by PROC MEANS */
do i=1 to (dim(gp1)-4);
  do j=i+1 to (dim(gp1)-3);
    do k=j+1 to (dim(gp1)-2);
      do w=k+1 to (dim(gp1)-1);
        do x=w+1 to (dim(gp1));
          vvtots=(gp1(i)+gp1(j)+gp1(k)+gp1(w)+gp1(x))*zf;   /* estimated universe sum */
          output;
        end;
      end;
    end;
  end;
end;
run;

proc means data=univ1;
var vvstot;
run;

proc means data=univ2;
var vvtots;
run;
---------------------------------------------------------------------------
According to statistical theory, the expected value of the estimated sum equals the universe sum for both techniques, while the sampling variance should be no greater for SS. What I get is that the variances are very similar, but the SRS variance is slightly smaller than the SS variance!
---------------------------------------------------------------------------
            SS            SRS
N           687225        2118760
Mean        4995031.42    4995031.42
Std.Dev.    210736.37     208152.14
---------------------------------------------------------------------------
Could you help me understand where the mistake is? Thanks for any suggestions.
Gianluca
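As a cross-check outside SAS, here is a minimal Python sketch. It enumerates both designs with `itertools.combinations` instead of nested DO loops and compares the variance over all equally likely samples with the textbook design-variance formulas. The function names and the 8-unit toy population are hypothetical. The toy population is used only because full enumeration of the 50-unit universe is slow in pure Python. `statistics.variance` uses the n-1 divisor (S^2), while `statistics.pvariance` divides by the number of samples, which is the exact design variance when every sample is equally likely.

```python
from itertools import combinations, product
from math import comb
from statistics import pvariance, variance


def srs_estimates(population, n):
    """Estimated total N/n * (sample sum) for every SRS sample of size n."""
    w = len(population) / n  # same role as zf in the SAS code
    return [w * sum(s) for s in combinations(population, n)]


def ss_estimates(strata, sizes):
    """Estimated total for every stratified sample, taking sizes[h] units
    from strata[h] and weighting them by N_h/n_h (like zf1, zf2)."""
    per_stratum = [[len(s) / n * sum(c) for c in combinations(s, n)]
                   for s, n in zip(strata, sizes)]
    # every combination of one sample per stratum, as in the nested loops
    return [sum(parts) for parts in product(*per_stratum)]


def var_srs_total(population, n):
    """Textbook design variance under SRS without replacement:
    N^2 (1 - n/N) S^2 / n."""
    N = len(population)
    return N**2 * (1 - n / N) * variance(population) / n


def var_ss_total(strata, sizes):
    """Textbook design variance of the stratified estimator:
    sum over strata of N_h^2 (1 - n_h/N_h) S_h^2 / n_h."""
    return sum(len(s)**2 * (1 - n / len(s)) * variance(s) / n
               for s, n in zip(strata, sizes))


# Sample counts written by the two SAS data steps (the N row of PROC MEANS)
print(comb(15, 2) * comb(35, 3), comb(50, 5))  # 687225 2118760

# Hypothetical toy population, small enough to enumerate every sample
toy = [3.0, 7.0, 1.0, 9.0, 4.0, 6.0, 2.0, 8.0]
strata = [toy[:3], toy[3:]]
srs = srs_estimates(toy, 4)
ss = ss_estimates(strata, [2, 2])

# Both estimators average to the population total (up to rounding)
print(sum(srs) / len(srs), sum(ss) / len(ss), sum(toy))
# Variance over all equally likely samples vs. the formulas
print(pvariance(srs), var_srs_total(toy, 4))
print(pvariance(ss), var_ss_total(strata, [2, 2]))
```

To apply this to the 50 values s1 to s50, use strata `values[:15]` and `values[15:]` with sizes `[2, 3]` and n = 5 for SRS. The square roots of `var_ss_total` and `var_srs_total` should then agree with the two PROC MEANS standard deviations, apart from the negligible effect of PROC MEANS dividing by the number of samples minus 1.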