## sample size and sampling error?

### sample size and sampling error?

Hello All.

Sounds as though this may be a reasonable question for the list.  Hope so at least.  Would greatly appreciate any feedback you might have on the subject.

We have an estimated 479,000 hunters in Ohio and we want to conduct a survey
to estimate such things as hunter success rates, participation rates, and
opinions on various issues related to deer management.  The first question
of course, is how large a sample?  My former boss conducted a similar
survey and ended up with 3,800 usable responses (he actually mailed surveys
to 6,700 hunters; apparently he knew that the response rate would be around
45-55% and took this into account when calculating the necessary sample
size).  I'm now getting ready to conduct a similar survey and
the question of sample size once again needs to be addressed.  There was
little documentation on how he arrived at 6,700 or 3,800 for that matter, so
I'm left with coming up with my own estimate and of course justifying it.
All of the statistics textbooks that I've been able to lay my hands on
deal with minimum sample sizes for estimating the mean of a given
variable or a proportion.  In each case, you're asked to specify a
confidence level (typically 95%) and also the bound on the error that you are
willing to accept, for instance plus/minus 2.5 lbs in the case of the average
weight of a particular strain of eggplant.  In this survey that I plan on
running, I'm going to ask 40 questions.  Am I to do this for every variable
and take the maximum sample size needed to achieve the desired level of
confidence or what?  If not, is there a similar formula that one uses in
situations such as mine to come up with a sample size required for x-level
of confidence?

On a related note, after discussing this issue with a statistician in
*ia, he sent me an Excel spreadsheet that asked for two inputs - the size
of the sample and the size of the population and the output was the maximum
% sampling error.  The inputs and outputs are presented below.

| Population Size | Sample Size | Max Sampling Error (95% CI) | Percent Error (95% CI) |
|---|---|---|---|
| 479000.00 | 3800.00 | 0.0158345305 | 1.583453047 |

I'm having a tough time grasping just exactly what the 1.58% means (in
simple terms that administrators can understand!).  Does that mean that in
repeated sampling of n=3800, 95 times out of 100, the sample mean plus or
minus 1.58% of the mean will contain the actual population mean?  How, if
at all, does this relate to the standard error and CV?  I
know that the CV is actually a percentage (the SE expressed as a percent of
the mean).  Is this 1.58% the maximum CV for all variables in the survey?
If anyone can help me sort this out, I would greatly appreciate it.

Thanks in advance for any assistance you might be able to offer.

Mike Tonkovich

Michael J. Tonkovich, Ph.D.
Wildlife Research Biologist
ODNR, Division of Wildlife

### sample size and sampling error?

Mike

How accurate your results are depends MUCH more on how good your response rate is, and on how representative the respondents are, compared to the total population.

IF your respondents (as opposed to those you mail the survey to) are a random sample, then you can get fairly good estimates with relatively small samples.  Exactly how accurate the result will be depends on exactly what you are trying to measure.  From what you say, it sounds like you are estimating proportions.  For proportions that are not very close to 0 or 1, the SE formula is (pq/n)^.5, where p is the proportion saying "yes" (or whatever) and q = 1 - p.  So, for example, if you had 1000 respondents and 500 said "yes", your standard error = (.5*.5/1000)^.5 = .0158.  With 10,000 respondents, it only goes down to .005 (unless I've pushed the wrong button somewhere).
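The arithmetic above can be checked with a few lines of Python. This is only a minimal sketch: the function name `se_proportion` is made up for illustration, and it assumes simple random sampling with no finite population correction.

```python
import math

def se_proportion(p: float, n: int) -> float:
    """Standard error of a sample proportion: (p*q/n)^.5 with q = 1 - p."""
    return math.sqrt(p * (1.0 - p) / n)

# The two cases from the text: 1000 and 10,000 respondents, half saying "yes".
for n in (1000, 10000):
    print(n, round(se_proportion(0.5, n), 4))
```

Going from 1000 to 10,000 respondents multiplies n by ten but shrinks the standard error only by a factor of about 3.2 (the square root of ten).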

But if your respondents are NOT random, then no sample size is going to help.  So I'd recommend allocating resources to getting as many people to respond as possible.

Peter L. Flom, Ph.D.
Principal Research Associate
National Development and Research Institutes, Inc.
16th floor
New York, NY 10048

(212) 845-4485
(212) 845-4698 (fax)

### sample size and sampling error?

Mike Tonkovich wrote [in part]:

> On a related note, after discussing this issue with a statistician in
> *ia, he sent me an excel spreadsheet that asked for 2 inputs - the size
> of the sample and the size of the population and the output was the
> maximum % sampling error.  The inputs and outputs are presented below.
>
> Population Size  Sample Size  Max Sampling Error (95% CI)  Percent Error (95% CI)
> 479000.00        3800.00      0.0158345305                 1.583453047

Yes, the 1.58% is 1.96 * the [finite-size-corrected] standard error for
the estimated proportion that would yield the largest possible error
[which happens to be when p is exactly 0.5 ].  So you would be guaranteed
that [if you have 3800 returned] your 95% confidence intervals would all
have a half-width of 1.58% or less.  If you don't need that tight a
confidence interval, you can make do with a smaller sample size.
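As a rough check, the spreadsheet figure can be reproduced in Python. This is a sketch under stated assumptions: the spreadsheet's formula was not shown, so the form of the finite population correction used here, sqrt((N - n)/(N - 1)), is a guess, and the function names `max_margin` and `required_n` are made up for illustration.

```python
import math

Z_95 = 1.96  # normal critical value for a 95% confidence interval

def max_margin(n: int, N: int, z: float = Z_95) -> float:
    """Worst-case (p = 0.5) half-width of the CI for a proportion."""
    se = math.sqrt(0.25 / n)             # standard error at p = 0.5
    fpc = math.sqrt((N - n) / (N - 1))   # assumed form of the correction
    return z * se * fpc

def required_n(margin: float, N: int, z: float = Z_95) -> int:
    """Smallest n whose worst-case half-width is at most `margin`."""
    n0 = z ** 2 * 0.25 / margin ** 2     # size needed for an infinite population
    return math.ceil(n0 / (1 + (n0 - 1) / N))

print(max_margin(3800, 479000))   # close to the spreadsheet's 0.0158
print(required_n(0.03, 479000))   # returns needed for a looser +/- 3 points
```

Because p = 0.5 is the worst case for every yes/no question, one sample size computed this way covers all the proportion questions in the survey at once. Note that `required_n` counts returned surveys, not surveys mailed.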

But your *real* problem is mentioned elsewhere:

> My former boss conducted a similar
> survey and ended up with 3800 (actually he mailed surveys to 6700 hunters,
> apparently he knew that the response rate would be down around 45-55% and
> took this into consideration when calculating the necessary sample size)
> useable responses.

If you have a non-response rate of 43%, you are in big trouble.  Any
difference whatsoever between the populations of responders and
non-responders would be potentially catastrophic, and you wouldn't know,
since you can't characterize the population of non-responders.  I strongly
suggest you consider some of the standard ways of coping with a non-response
effect this large.  One potential way would be a re-mailing to a random
subset of those who didn't respond, with some sort of incentive to get them
to send the info back, and then the use of sampling-theory estimators to
combine the first and second sample results to extrapolate to the entire
population of interest.  Alternatively, some incentive up-front [like a
drawing from all returned names, with ten free licenses given out, as one
wacky example] might get your non-response down to something more workable.
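One way to picture the "combine the first and second sample results" step is the textbook double-sampling estimator for non-response, sketched below in Python. The function name `combined_proportion` and the two proportions are hypothetical, chosen only for illustration. The sketch also assumes that everyone in the follow-up subsample responds, so that the subsample stands in for all initial non-responders.

```python
def combined_proportion(n_mailed: int, n_first: int,
                        p_first: float, p_followup: float) -> float:
    """Weight each group by its share of the original mail-out."""
    w_resp = n_first / n_mailed   # share who answered the first mailing
    w_nonresp = 1.0 - w_resp      # share who did not
    return w_resp * p_first + w_nonresp * p_followup

# Hypothetical proportions: 40% "yes" among first-wave responders,
# 25% "yes" in the follow-up subsample of initial non-responders.
est = combined_proportion(n_mailed=6700, n_first=3800,
                          p_first=0.40, p_followup=0.25)
print(round(est, 3))
```

If the two groups really differ this much, the first-wave figure alone would noticeably overstate the population proportion, which is the risk described above.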

David
--
David Cassell, CSC

Senior computing specialist
mathematical statistician
