# Estimating population size from sample

#### Paul King-Fisher

##### New Member
Folks:

A colleague administered a survey and is seeking my advice. I am showing my rustiness in posing this question. A survey was sent to a universe of 445, of whom 103 responded.

The population is assumed to be normally distributed. One question was binomial; 20 of the 103 responded in the affirmative.

I calculated the margin of error at +/- 7.6% on a 95% confidence level using the formula:

1. ME = z x sqrt[p(1-p)/n], where z = 1.96 and n = 103

Because the population is small and a relatively large proportion was sampled, I applied a finite population adjustment factor of 0.8714, where:

2. FPAF = sqrt[(N-n)/(N-1)], where N = 425 and n = 103

With this adjustment, the margin of error is +/- 6.6%.

This means that 95% of the time, the “true” number of yeses in the population is between 12.8% and 26%, i.e., 86 +/- 29.
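The arithmetic above can be reproduced with a short Python sketch. The function and variable names are illustrative only. Note that the quoted factor of 0.8714 corresponds to N = 425, whereas the stated universe of 445 gives roughly 0.878, so the loop prints both for comparison.

```python
import math

def fpc(N, n):
    """Finite population correction factor sqrt((N - n) / (N - 1))."""
    return math.sqrt((N - n) / (N - 1))

def margin_of_error(p, n, z=1.96):
    """Normal-approximation margin of error for a sample proportion."""
    return z * math.sqrt(p * (1 - p) / n)

n = 103          # respondents
yes = 20         # affirmative answers
p_hat = yes / n  # sample proportion

me = margin_of_error(p_hat, n)
for N in (425, 445):  # 425 as written in formula 2, 445 as the stated universe
    adj = me * fpc(N, n)
    print(f"N={N}: FPC={fpc(N, n):.4f}, ME={me:.3f}, adjusted ME={adj:.3f}")

# Scale the proportion and the adjusted margin up to the universe of 445
N = 445
adj = me * fpc(N, n)
print(f"estimated yeses: {p_hat * N:.0f} +/- {adj * N:.0f}")
```

The difference between the two population sizes moves the adjusted margin by well under half a percentage point, so the conclusions are not sensitive to it.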

A second question generated ordinal-level data. Of the 20 yeses, a total of 24 events were recorded, with the vast majority of the 20 indicating one event.

My question is, what is the most statistically valid or robust way of expressing the number of events and the margin of error at the population level? It seems to me a crude method would be to multiply the mean or modal value by the estimated number of yeses in the population, e.g., 1 x 86, +/- 29.

Is there another statistical test that is valid to apply to this data?

#### BioStatMatt

##### TS Contributor
I'd like to make a couple of comments. You mentioned:

"This means that 95% of the time, the “true” number of yeses in the population is between 12.8% and 26%, i.e., 86 +/- 29."

This is really incorrect, as the "true" number of yeses in the population is a fixed number, not a random variable. Two better interpretations:

"I am 95% confident that the "true" mean lies within these boundaries", or

"If I were to draw many samples of size n from the population and compute an interval from each, about 95% of those intervals would contain the "true" value."

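The repeated-sampling reading of a confidence interval can be checked with a small simulation. This is only a sketch: the true count of 86 yeses is an assumption made for illustration (the real value is unknown), and the interval is the normal-approximation one with the finite population correction.

```python
import math
import random

random.seed(0)  # fixed seed so the run is repeatable

N, n, z = 445, 103, 1.96
true_yes = 86                       # ASSUMED true count, for illustration only
population = [1] * true_yes + [0] * (N - true_yes)
true_p = true_yes / N
fpc = math.sqrt((N - n) / (N - 1))  # finite population correction

trials = 2000
covered = 0
for _ in range(trials):
    sample = random.sample(population, n)  # sampling without replacement
    p_hat = sum(sample) / n
    me = z * math.sqrt(p_hat * (1 - p_hat) / n) * fpc
    if p_hat - me <= true_p <= p_hat + me:
        covered += 1

coverage = covered / trials
print(f"share of intervals containing the true proportion: {coverage:.3f}")
```

The true proportion never moves between trials; it is the interval that changes from sample to sample, which is why the probability statement attaches to the procedure rather than to the parameter.
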
As for the second problem, are these data counts of a certain event given they answered affirmative on the previous question? For example, are they questions like:

1) "Have you ever played cards?"
2) "How many times have you played cards in the past month?"

If this is the case, you can model the data from the second question with a binomial (as before) or Poisson distribution. When you present the data, you can make a statement like: "Of those respondents who have ever played cards, the average number of games played per month is <insert mean> (95% CI <insert 95%CI>)."
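As a rough sketch of the Poisson option, the snippet below fills in that statement using the figures from the first post (24 events among 20 yes respondents). It assumes the counts are Poisson and uses a normal approximation for the interval, which is crude with so few events; an exact Poisson interval would be somewhat wider.

```python
import math

events = 24       # total events reported (from the first post)
respondents = 20  # respondents who answered yes
z = 1.96          # 95% normal quantile

mean = events / respondents
# For a Poisson count the variance equals the mean, so the total is estimated
# to have standard deviation sqrt(events), and the mean sqrt(events) / respondents.
se = math.sqrt(events) / respondents
lower, upper = mean - z * se, mean + z * se
print(f"mean = {mean:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```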

I hope this helps.

~Matt

#### Paul King-Fisher

##### New Member
Matt:

Thanks for your response; apologies for the delay in responding. I am going to continue to show my rustiness. The card games analogy works reasonably well, although the second question more precisely is about how many of a type of asset respondents currently own, and the assets are long-lived, can be owned concurrently, but ownership is still otherwise independent.

With respect to the first question, aside from the survey result (now 21 of 103 respondents of a universe of 445 indicating they own at least one of this type of asset), I don’t have prior knowledge of the likelihood of them owning this asset. I know that the sample meets the tests for being modeled with a binomial or Poisson distribution. Assuming it is appropriate to use that percentage in a binomial, does this mean I can assert that there is a 9.7% probability that exactly 20 of 103 people randomly selected from the population of 445 will own the asset (in Excel, =BINOMDIST(20,103,0.2039,FALSE)), or that there is only a 14% probability that more than 25 of them will own the asset?
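For readers without Excel, the two binomial probabilities can be reproduced with Python's standard library. This is a sketch under the same assumptions as the Excel formula: the true ownership rate is taken to be exactly 0.2039, and the 103 draws are treated as independent (binomial) rather than as sampling without replacement from 445. The helper name `binom_pmf` is illustrative.

```python
import math

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 103, 0.2039

# Same quantity as =BINOMDIST(20,103,0.2039,FALSE)
p_exactly_20 = binom_pmf(20, n, p)
# "More than 25" means 26 or more owners among the 103 sampled
p_more_than_25 = sum(binom_pmf(k, n, p) for k in range(26, n + 1))

print(f"P(X = 20) = {p_exactly_20:.3f}")
print(f"P(X > 25) = {p_more_than_25:.3f}")
```

Both numbers describe how many owners would turn up in a new sample of 103 if the rate were 0.2039; they say nothing directly about how uncertain that rate is.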

How does this relate to my determination of the confidence interval (0.2039 ± 0.068), i.e., that the true proportion of those who own at least one of this type of asset lies somewhere between 13.6% and 27.2% of the population, or between 60 and 121 of the population of 445?

With respect to the second question – how many of this asset do respondents own, and by inference, what can be said about how many are owned in the population of 445 – there are 21 yeses to owning the asset, but of those, only 16 respondents gave a count total (a total of 21 assets, with a modal value of 1).

It’s not clear to me how to proceed. Do I still model this using a binomial, i.e., use the sample of 103 to model the 21 count? Again, I don’t have data that would enable me to assign a probability…

With respect to the confidence interval, I assumed that because of the small sample of 16 I should use a two-tailed t-distribution to estimate the confidence interval, i.e., I am 95% confident that, of the people who own this asset, the true mean of the number owned at the population level lies between 0.809 and 1.81.
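A t-based interval of this kind can be sketched as follows. The individual responses are not given in the thread, so the `counts` list is entirely hypothetical: it was chosen only to match the stated summary (16 respondents, 21 assets, mode of 1), and its interval will therefore differ from the 0.809 to 1.81 quoted above. The critical value 2.131 is the tabulated 0.975 quantile of the t-distribution with 15 degrees of freedom.

```python
import math
import statistics

# HYPOTHETICAL counts: 16 respondents, 21 assets in total, mode of 1.
# The real responses are not given in the thread.
counts = [1] * 13 + [2, 2, 4]

n = len(counts)
mean = statistics.mean(counts)
se = statistics.stdev(counts) / math.sqrt(n)  # standard error of the mean
t_crit = 2.131                                # t quantile, 0.975, 15 d.f. (from tables)
lower, upper = mean - t_crit * se, mean + t_crit * se
print(f"mean = {mean:.3f}, 95% CI {lower:.3f} to {upper:.3f}")
```

With counts this skewed and a sample this small, the t-interval is itself only approximate, so it is best read as a rough range.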