t distribution for Proportion impossible?

#1
Good afternoon,

What are the reasons that we only use Z distribution for proportion confidence intervals, but we can use both Z and t distributions for mean confidence intervals?

My thinking is that t distribution is more spread out so it can account for added estimation error when using sample standard deviation, s, when calculating a confidence interval for mean. However, proportion is not calculated with standard deviation.

Am I thinking along the right lines? Any advice would be appreciated :)
 
#2
Under the null hypothesis, the variance of the proportion can be calculated exactly (it depends only on the hypothesized proportion and the sample size). Thus, the z-distribution is appropriate. The t-distribution is used when you have to estimate the variance of the mean.
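As a rough illustration of that point, here is a minimal sketch of a one-sample z-test for a proportion. The function name and the numbers (60 successes out of 100, hypothesized proportion 0.5) are made up for the example; the thing to notice is that the standard error uses only the hypothesized proportion and n, so nothing is estimated from the sample.

```python
from math import sqrt
from statistics import NormalDist

def one_sample_prop_z(successes, n, p0):
    """z statistic and two-sided p-value for H0: p = p0 (hypothetical helper)."""
    p_hat = successes / n
    # Under H0 the standard error is known exactly: it depends only on p0 and n.
    se0 = sqrt(p0 * (1 - p0) / n)
    z = (p_hat - p0) / se0
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Illustrative numbers only, not data from this thread.
z, p_value = one_sample_prop_z(successes=60, n=100, p0=0.5)
print(z, p_value)
```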
 
#3
I'm not sure that the original question has really been answered. The question was about confidence intervals, but the answer just given relates to significance tests. Confidence intervals make no reference to any null hypothesis.

Why is the t-distribution not used when computing confidence intervals for a population proportion - especially since the *sample* proportion (i.e. p-hat) is used in that formula's margin of error, just like the *sample* standard deviation is used when computing an interval for a mean. Also, the sample proportion IS a type of sample mean (a mean of 1's and 0's).

So why again, can't the t-distribution be used when computing a confidence interval for a population proportion?
 

Dason

Ambassador to the humans
#4
Because it's silly to model proportion data with a normal in the first place. It's clearly not normal. But we do have asymptotic results that tell us as the sample size gets larger the sampling distribution of the proportion will go to a normal distribution.

There is absolutely nothing about the T-distribution in there. It's an asymptotic result so we need n to be large. But there is no theoretical justification for using a t-distribution instead of a normal distribution.
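The asymptotic claim can be illustrated without any simulation. The sketch below (my own helper, with p = 0.3 chosen arbitrarily) computes the largest gap between the exact binomial CDF of the standardized sample proportion and the standard normal CDF for a few sample sizes; the gap should shrink as n grows.

```python
from math import comb, sqrt
from statistics import NormalDist

def max_cdf_gap(n, p):
    """Largest gap between the exact CDF of standardized p-hat and the N(0, 1) CDF."""
    sd = sqrt(n * p * (1 - p))
    normal = NormalDist()
    exact_cdf = 0.0
    gap = 0.0
    for k in range(n + 1):
        # Standardizing the count k is the same as standardizing p-hat = k / n.
        phi = normal.cdf((k - n * p) / sd)
        gap = max(gap, abs(exact_cdf - phi))  # just below the jump at k
        exact_cdf += comb(n, k) * p**k * (1 - p) ** (n - k)
        gap = max(gap, abs(exact_cdf - phi))  # at the jump itself
    return gap

# p = 0.3 and these sample sizes are arbitrary choices for illustration.
gaps = {n: max_cdf_gap(n, 0.3) for n in (25, 100, 400)}
for n, gap in gaps.items():
    print(n, round(gap, 4))
```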
 
#5
Is there a distribution which describes the small-sample behavior of the sample proportion, when the standard error of that statistic (i.e. p-hat) is being used instead of an assumed-known standard deviation for it? If so, such a distribution would need to account for the increased variability contributed by the standard deviation's estimator (i.e. the standard error), just as the t-distribution does this when a normal distribution's standard deviation is replaced with its estimator...

[Edit: See post below...]
 
#7
...There is absolutely nothing about the T-distribution in there... there is no theoretical justification for using a t-distribution instead of a normal distribution.
I'm still unclear why you said the above; it seems that of course the t-distribution is involved here? Since p-hat is in fact a sample mean (an x-bar), you can substitute p-hat for x-bar in the single-sample t-confidence interval formula for a population mean. If the sample size is "large" enough to overcome the population's non-normality (say n is at least 40), the distribution of the sample proportion of successes (p-hat) should technically be a bit more accurately described by the corresponding t-distribution versus a normal distribution, although these differences are small and get smaller with larger sample sizes. It seems the reason the t-distribution approach should be (slightly) more accurate is because variance is indeed unknown here and being estimated. Is this not correct?
 

Dason

Ambassador to the humans
#8
If the sample size is "large" enough to overcome the population's non-normality (say n is at least 40), the distribution of the sample proportion of successes (p-hat) should technically be a bit more accurately described by the corresponding t-distribution versus a normal distribution, although these differences are small and get smaller with larger sample sizes. It seems the reason the t-distribution approach should be (slightly) more accurate is because variance is indeed unknown here and being estimated. Is this not correct?
But we're only making a single estimate. In the case with the normal distribution you're separately estimating the variance. You say 'the distribution of the sample proportion of successes (p-hat) should technically be a bit more accurately described by the corresponding t-distribution versus a normal distribution' but I don't think you have any theoretical justification for this. Can you back up your feelings with anything other than 'it just feels like we should have to use the t-distribution'?

Edit: Maybe I'll throw together a simulation sometime and we can see how some of this stuff plays out.
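A simulation is not strictly necessary for a first look, because the coverage of either interval can be computed exactly by summing binomial probabilities. The following is only a sketch under my own assumptions: n = 20 and p = 0.3 are arbitrary, the "t interval" is taken to mean p-hat plus or minus t times s/sqrt(n) with s computed from the 0/1 data, and the t critical value for 19 degrees of freedom is hardcoded from a standard table rather than computed.

```python
from math import comb, sqrt
from statistics import NormalDist

Z_975 = NormalDist().inv_cdf(0.975)
T_975_DF19 = 2.093  # tabulated 0.975 quantile of t with 19 df (hardcoded assumption)

def exact_coverage(n, p, crit, use_sample_sd):
    """Exact probability that p-hat +/- crit * SE covers the true p."""
    total = 0.0
    for k in range(n + 1):
        p_hat = k / n
        # For 0/1 data s^2 = n/(n-1) * p_hat * (1 - p_hat), so s^2 / n divides by n - 1.
        denom = n - 1 if use_sample_sd else n
        half_width = crit * sqrt(p_hat * (1 - p_hat) / denom)
        if p_hat - half_width <= p <= p_hat + half_width:
            total += comb(n, k) * p**k * (1 - p) ** (n - k)
    return total

n, p = 20, 0.3  # arbitrary illustrative values
cov_z = exact_coverage(n, p, Z_975, use_sample_sd=False)
cov_t = exact_coverage(n, p, T_975_DF19, use_sample_sd=True)
print("z interval coverage:", cov_z)
print("t interval coverage:", cov_t)
```

Because the t version is always at least as wide, its coverage can never be lower than the z version's. That only shows it is more conservative; it does not show that the t-distribution describes the statistic. Comparing both numbers to the nominal 0.95 over a grid of p values would be the natural next step.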
 
#9
...Can you back up your feelings with anything other than 'it just feels like we should have to use the t-distribution'?
Hi Dason,

Just for the record, your last line is not what I actually said; note that I did say "Since p-hat is in fact a sample mean (an x-bar), you can substitute p-hat for x-bar in the single-sample t-confidence interval formula for a population mean."

By this, I mean substituting p-hats for ALL the x-bars in the above-mentioned formula - including within the formula for s, in the standard-error side of that formula.

With large sample sizes, approximately normal behavior of p-hat is guaranteed by the CLT; so it certainly seems all theoretical justifications for using Gosset's t-distribution are indeed satisfied. So why shouldn't we use it?
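To make the substitution concrete, here is a small numerical check with made-up 0/1 data (7 successes in 20 trials). For such data the sample variance works out to n/(n-1) times p-hat(1 - p-hat), so s is a function of p-hat alone rather than a second, separately estimated quantity.

```python
from math import isclose
from statistics import mean, variance

# Made-up sample for illustration: 7 successes and 13 failures.
data = [1] * 7 + [0] * 13
n = len(data)

p_hat = mean(data)    # the sample proportion is the sample mean of the 0/1 values
s2 = variance(data)   # usual sample variance with the n - 1 divisor

# Closed form for 0/1 data: s^2 = n / (n - 1) * p_hat * (1 - p_hat)
closed_form = n / (n - 1) * p_hat * (1 - p_hat)
print(s2, closed_form, isclose(s2, closed_form))
```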
 

BGM

TS Contributor
#11
First of all, if the sample \( X_1, X_2, \ldots, X_n \) is i.i.d. \( N(\mu, \sigma^2) \)
with both \( \mu \) and \( \sigma^2 \) unknown,

then the pivotal quantity

\( \sqrt{n}\frac {\bar{X} - \mu} {S} \sim t(n - 1) \)

and this is exact. That's why we rely on the t-distribution to derive the
confidence interval for the mean of a normal population when the variance is
unknown, which is the usual case.
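For reference, inverting that pivot gives the familiar interval. Writing \( t_{n-1,\,1-\alpha/2} \) for the \( 1-\alpha/2 \) quantile of \( t(n-1) \) (my notation, not used elsewhere in this thread):

\[ P\left( -t_{n-1,\,1-\alpha/2} \le \sqrt{n}\,\frac{\bar{X} - \mu}{S} \le t_{n-1,\,1-\alpha/2} \right) = 1 - \alpha \quad\Longrightarrow\quad \bar{X} \pm t_{n-1,\,1-\alpha/2}\,\frac{S}{\sqrt{n}} \]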

Second, note by CLT,

\( \frac {\hat{p} - p} {\sqrt{\frac {p(1-p)} {n}}} \to N(0, 1) \)

But since \( \hat{p}(1-\hat{p}) \) is a consistent estimator for \( p(1-p) \),

by Slutsky's theorem you may replace \( p \) with \( \hat{p} \) in the denominator
and still obtain asymptotic normality. The t-distribution plays no role in this argument.

That is, when n is small the distribution may well deviate from a normal distribution, but
nothing tells you whether it is close to a t-distribution either.
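Spelled out, the Slutsky step is the following factorization, assuming \( 0 < p < 1 \):

\[ \frac{\hat{p} - p}{\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}} = \frac{\hat{p} - p}{\sqrt{\frac{p(1-p)}{n}}} \cdot \sqrt{\frac{p(1-p)}{\hat{p}(1-\hat{p})}} \]

The first factor converges in distribution to \( N(0, 1) \) by the CLT, and the second converges in probability to 1 because \( \hat{p} \to p \), so the product converges in distribution to \( N(0, 1) \). No exact finite-sample distribution comes out of this argument, unlike the normal-population case where \( t(n-1) \) is exact.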