# 95% confidence interval using standard error of a proportion

#### throughstream2

##### New Member
Hi, firstly apologies my stats knowledge is limited.

I am looking at linear binomial regression.

I have a study with a yes/no outcome, so it is binomial. Out of 50 cases I get 49 yeses, so an estimated probability of success (p) of 98%. I want to work out the 95% confidence interval for this probability of success. My tutor has stated that I should use the standard error of a proportion, $SE = \sqrt{p(1-p)/n}$. Using this formula gives me a standard error of 0.019799.

I believe I then add and subtract the standard error from p to work out the confidence interval, which in this case is 96% to 100%.

However, is that a 95% confidence interval? I read that the above calculation gives a 68% CI and that I need to multiply the SE by 1.96. But doing so would push the upper limit above 100%, which is impossible?
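The arithmetic in this question can be checked with a few lines of Python. This is only a sketch of the normal-approximation (Wald) calculation described above; the variable names are illustrative, and z = 1.96 is the usual two-sided 95% multiplier.

```python
from math import sqrt

successes, n = 49, 50
p_hat = successes / n                      # observed proportion, 0.98

# Standard error of a proportion: sqrt(p * (1 - p) / n)
se = sqrt(p_hat * (1 - p_hat) / n)

z = 1.96                                   # two-sided 95% normal multiplier
lower_68, upper_68 = p_hat - se, p_hat + se            # roughly a 68% interval
lower_95, upper_95 = p_hat - z * se, p_hat + z * se    # nominal 95% interval

print(f"SE = {se:.6f}")
print(f"p +/- 1 SE    : {lower_68:.4f} to {upper_68:.4f}")
print(f"p +/- 1.96 SE : {lower_95:.4f} to {upper_95:.4f}")
```

The upper limit of the 1.96 SE interval comes out above 1, which is the symptom described above: the normal approximation breaks down when p is this close to 0 or 1.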

Any help would be most appreciated.

#### throughstream2

##### New Member
Hi thank you so much for getting back to me:

I wondered if you could help me with a worked example.

I presume you are referring to the following formulae when you suggest using the beta distribution confidence interval:

From the original information:

Alpha = 0.05
k = 49
n = 50
Betainv = not sure how you know/calculate this?
I'm guessing the (,) indicates multiplication.

Again thank you so much for your help.

#### checkthebias

##### New Member
Betainv is the inverse of the cumulative distribution function (the quantile function).
The beta distribution has two shape parameters (alpha and beta), so you need to evaluate the inverse cumulative distribution function, Betainv(0.975, alpha, beta).
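
For reference, the Clopper-Pearson ("exact") interval is commonly written with beta quantiles. The formula image from the thread is not available, so the notation here is an assumption: k is the number of successes in n trials, α is the significance level, and $B^{-1}(q;\ a,\ b)$ is the inverse beta CDF (Excel's BETAINV / BETA.INV) evaluated at probability q with shape parameters a and b.

$$
p_L = B^{-1}\!\left(\tfrac{\alpha}{2};\ k,\ n-k+1\right), \qquad
p_U = B^{-1}\!\left(1-\tfrac{\alpha}{2};\ k+1,\ n-k\right)
$$

By convention $p_L = 0$ when $k = 0$ and $p_U = 1$ when $k = n$. The commas separate function arguments; they do not mean multiplication. With k = 49, n = 50 and α = 0.05 this would correspond to BETA.INV(0.025, 49, 2) for the lower limit and BETA.INV(0.975, 50, 1) for the upper limit.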

#### obh

##### Active Member
Hi,

I think that a sample size of 50 is small enough to use the binomial distribution directly instead of an approximating distribution.
Since the distribution is discrete, the confidence level won't be exactly the required level.

On the other hand, maybe a sample size of 50 is big enough to use the normal distribution even with such a high proportion?
It would be interesting to run a simulation to check.

#### throughstream2

##### New Member
Hi, thank you. Sorry, this is still a bit beyond me. Can I just check: for calculating Betainv, is α = number of successes in n trials and β = number of failures in n trials?
That is straightforward to calculate in Excel.

So where does this formula come into play?

#### EdGr

##### Member
I would recommend using the exact method. Do it in Excel.

=1-BINOM.DIST(48,50,0.90,TRUE)

This formula calculates the probability of getting from 0 to 48 successes in 50 trials if the probability on each trial is 0.9. This is then subtracted from 1 to give the probability of getting *more* than 48 successes (since you had more, namely 49).

You then fool around with the 0.90 proportion until you find a value that gives exactly 0.05 probability of getting greater than 48/50. That value of p is your lower bound 95% one-tailed CI. I would do this one-tailed -- the upper bound is basically 1.

For example, using 0.9 I get 0.033. So, if the true proportion was 0.9, I would get more than 48 only 3.3% of the time. If the true proportion was 0.9, you would get an observed value of 48 or less 1 - 0.033 or 96.7% of the time. By definition, 0.9 is the lower bound of a one-tailed 0.967 CI.

You want a value such that the observed data would occur only 5% of the time or more, thus you can be (sort of) 95% sure the true value is no bigger than it. That's how we interpret CI (it isn't technically correct, of course).

But of course, you want a 0.95 CI. Somewhere around 0.91. Play with it!
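
The trial-and-error search described in this post can be automated with bisection. A minimal sketch in Python using only the standard library; the function names `prob_more_than` and `lower_bound_one_sided` are illustrative, not part of Excel or any library.

```python
from math import comb

def prob_more_than(k, n, p):
    """P(X > k) for X ~ Binomial(n, p); mirrors =1-BINOM.DIST(k, n, p, TRUE)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1, n + 1))

def lower_bound_one_sided(successes, n, alpha=0.05, tol=1e-10):
    """Find p such that P(X >= successes | p) = alpha, by bisection.

    The tail probability increases with p, so if it is below alpha
    the candidate p is too small and the search moves upward.
    """
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if prob_more_than(successes - 1, n, mid) < alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# The check from the post: true proportion 0.9, more than 48 successes out of 50
print(f"P(X > 48 | p = 0.9) = {prob_more_than(48, 50, 0.9):.4f}")

# One-sided 95% lower bound for 49 successes out of 50
print(f"lower bound = {lower_bound_one_sided(49, 50):.4f}")
```

Under these assumptions the search settles a little below 0.91, in line with the estimate above.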

#### throughstream2

##### New Member
Thank you so much for this, really appreciate you taking the time to explain it.

#### throughstream2

##### New Member

Within this test I agree that with 49 successes out of 50 the upper bound is basically 1. However, if I had a similar test with, say, 38 successes out of 50, how would I go about finding the upper bound?

Can you explain a bit more why this is only "sort of 95% sure" and "technically not correct"?


#### obh

##### Active Member
Hi,

If you want to use a binomial distribution:

p=38/50=0.76

P( x > 44 ) = 0.0106530, and 44/50 = 0.88
P( x > 43 ) = 0.0279590, too big: bigger than 0.025 (0.05/2).

P( x < 33 ) = 0.0384254, too big
P( x < 32 ) = 0.0190879, and 32/50 = 0.64

So the confidence interval, including the endpoints, is [0.64, 0.88].
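
The tail probabilities in this post can be reproduced with a short Python sketch (standard library only; `binom_pmf` is an illustrative helper name). It treats the observed proportion 0.76 as if it were the true p, which is the approach taken in this post. Note that the second upper-tail value, 0.0279590, corresponds to P(X > 43).

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p_hat = 50, 38 / 50

# Upper tails, with the observed proportion used as the true p
p_gt_44 = sum(binom_pmf(k, n, p_hat) for k in range(45, n + 1))  # P(X > 44)
p_gt_43 = sum(binom_pmf(k, n, p_hat) for k in range(44, n + 1))  # P(X > 43)

# Lower tails
p_lt_33 = sum(binom_pmf(k, n, p_hat) for k in range(0, 33))      # P(X < 33)
p_lt_32 = sum(binom_pmf(k, n, p_hat) for k in range(0, 32))      # P(X < 32)

print(f"P(X > 44) = {p_gt_44:.7f}")
print(f"P(X > 43) = {p_gt_43:.7f}")
print(f"P(X < 33) = {p_lt_33:.7f}")
print(f"P(X < 32) = {p_lt_32:.7f}")
```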

But I'm not sure if this is the best method. http://users.stat.ufl.edu/~aa/articles/agresti_coull_1998.pdf

Generally, I would expect an exact method to be more correct than any approximation.
I can't say the attached article is very clear to me.

You may say that since it is a discrete distribution, it makes sense to get a CI only from discrete values.
The problem I can see with using the binomial distribution is that the discrete values are based on an estimated probability, so the values between the discrete results of the binomial are also possible values.
So, because of the discreteness, the binomial CI is wider, and its actual confidence level is higher than the required confidence level.

The method called exact (that checkthebias mentioned) is actually Clopper-Pearson, based on the beta distribution, which, as I understand it, is also an approximation?
I checked in R (library(Hmisc); there are also other options) and didn't see the binomial distribution offered as an option (only Normal, Clopper-Pearson, and Wilson).

In https://www.rdocumentation.org/packages/Hmisc/versions/4.2-0/topics/binconf they write :
"Following Agresti and Coull, the Wilson interval is to be preferred and so is the default."

So what is the most accurate method for a confidence interval? @Miner @Dason

Why no binomial? Is it because of what I wrote above?


#### EdGr

##### Member
I see another person (OBH) chimed in, but I fear he or she is doing it wrong. OBH is starting with the observed proportion and figuring out the range of values that would occur with 95% probability if that were the population proportion.

It may not make a huge difference, but I believe the recommended approach is to take possible population values and find the ones under which a result beyond what you obtained, in either direction, would have a low probability.

So, with 38/50 (0.76) we play with probabilities. If the true proportion was 0.64, then you would get a value greater than 38/50 2.5% of the time. A population proportion of around 0.854 would give less than 38/50 about 2.46% of the time. So my interval is 0.64 to 0.854.

None of these methods are perfect. What we ideally want is an interval with 95% probability of containing the population parameter. But that depends on the likely location of the population parameter before we did the study, which is unknown, subjectively estimated, or both. So we use methods that work fairly well given that our data are the primary source of information. They never give a true probability that the population parameter is in the interval.

#### obh

##### Active Member
Hi Edgr,

Thanks for the correction, it makes sense!

When using the binomial distribution, I get the following results:
P( x ≥ 38 | p=0.618309 ) = 0.025
P( x ≤ 38 | p=0.86939 ) = 0.025

So I can see I get exactly the same results as the Clopper-Pearson (exact) CI, which uses the beta distribution...
So I assume this is the reason why they call it 'exact'.
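
The two bounds can be found numerically by solving the binomial tail equations directly. A minimal sketch in Python (standard library only; `tail_ge`, `bisect_increasing` and `exact_interval` are illustrative names). It assumes a two-sided 95% interval with 2.5% in each tail and includes the observed count in both tails.

```python
from math import comb

def tail_ge(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def bisect_increasing(func, target, tol=1e-10):
    """Find p in [0, 1] with func(p) = target, assuming func increases with p."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if func(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def exact_interval(k, n, alpha=0.05):
    """Two-sided interval from the binomial tails (Clopper-Pearson style)."""
    # Lower bound: P(X >= k | p) = alpha / 2
    lower = 0.0 if k == 0 else bisect_increasing(
        lambda p: tail_ge(k, n, p), alpha / 2)
    # Upper bound: P(X <= k | p) = alpha / 2, i.e. P(X >= k + 1 | p) = 1 - alpha / 2
    upper = 1.0 if k == n else bisect_increasing(
        lambda p: tail_ge(k + 1, n, p), 1 - alpha / 2)
    return lower, upper

lower, upper = exact_interval(38, 50)
print(f"38/50: {lower:.6f} to {upper:.6f}")
```

For 38/50 this should land close to the 0.618309 and 0.86939 quoted above.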

Questions
1. I didn't see R use the binomial distribution, only Clopper-Pearson (and Wilson and normal).
Is that because the Clopper-Pearson beta distribution gives a very good approximation to the binomial, or is it exactly the same as using the binomial?

2. As I understand it, the "exact" method is not the best method for a confidence interval, and the Wilson score gives a better result.
"coverage probability tend to be too large for 'exact' confidence interval"
Any idea why the 'exact' method is not correct? It sounds like an accurate calculation.
In the link below they write: "This procedure is necessarily conservative because of the discreteness of the binomial distribution (Neyman 1935)"
http://users.stat.ufl.edu/~aa/articles/agresti_coull_1998.pdf
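
For comparison, here is a sketch of the Wilson score interval mentioned in question 2, written in Python with the standard library only. It uses the textbook formula without continuity correction; the function name `wilson_interval` is illustrative, and no claim is made that it matches any particular R function digit for digit.

```python
from math import sqrt

def wilson_interval(k, n, z=1.959964):
    """Wilson score interval for k successes in n trials (no continuity correction)."""
    p_hat = k / n
    denom = 1 + z**2 / n
    centre = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width

for k in (49, 38):
    lower, upper = wilson_interval(k, 50)
    print(f"{k}/50: {lower:.4f} to {upper:.4f}")
```

Unlike the p ± 1.96 SE interval, the Wilson limits stay inside [0, 1] even for 49/50.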


#### obh

##### Active Member
I ran some simulations to get a better feeling, and got results similar to those in the article.

The reason the exact method is not so good is, as they wrote, the "discreteness of the binomial distribution".

If p=0.5 and n=15, the mean is np = 15*0.5 = 7.5, but you can't actually observe this value.

So when I run a simulation with n=15 and a confidence level of 0.95 with method="exact",
the actual confidence level is not 0.95, and it also depends on p.
A minor change in p changes the confidence level.

For example (rep=20,000):

| p | Actual confidence level |
|---|---|
| 0.5 | 0.96455 |
| 0.51 | 0.96315 |
| 0.52 | 0.98275 |
| 0.53 | 0.98415 |

Average over random p: 0.98
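
Because n is small, the coverage can also be computed exactly rather than simulated: sum the binomial probabilities of every outcome whose interval contains the true p. A sketch in Python (standard library only; all function names are illustrative), assuming the "exact" interval is obtained by solving the binomial tail equations with 2.5% in each tail.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def tail_ge(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(binom_pmf(i, n, p) for i in range(k, n + 1))

def bisect_increasing(func, target, tol=1e-10):
    """Find p in [0, 1] with func(p) = target, assuming func increases with p."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if func(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def exact_interval(k, n, alpha=0.05):
    """Two-sided interval from the binomial tails (Clopper-Pearson style)."""
    lower = 0.0 if k == 0 else bisect_increasing(
        lambda p: tail_ge(k, n, p), alpha / 2)
    upper = 1.0 if k == n else bisect_increasing(
        lambda p: tail_ge(k + 1, n, p), 1 - alpha / 2)
    return lower, upper

def coverage(n, p, alpha=0.05):
    """Probability that the interval built from a Binomial(n, p) draw contains p."""
    total = 0.0
    for k in range(n + 1):
        lower, upper = exact_interval(k, n, alpha)
        if lower <= p <= upper:
            total += binom_pmf(k, n, p)
    return total

for p in (0.5, 0.51, 0.52, 0.53):
    print(f"p = {p}: coverage = {coverage(15, p):.5f}")
```

The jump between neighbouring values of p happens whenever p crosses an interval endpoint for some outcome k, so one more (or one fewer) outcome counts as covering p.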