Which SD to use in z-test for proportion in population

I am testing how well games in a league season can be predicted and, as a naive approach, would like to check whether the proportion of correctly predicted games is significantly greater than 0.5.
To do that, I sampled ~100 games from every season over a few years, along with the corresponding predictions, and now I have the proportion of correctly predicted games for each season.

The Question:
My description above is analogous to a sample of 100 coin flips, and I would like to test the following hypothesis.
Let p represent the proportion of correctly predicted games (win/lose):
H0: p = 0.5
H1: p > 0.5

Now, to calculate the appropriate z-value I would use the following formula: (sample proportion - expected proportion 0.5) / SE

My question is which is the appropriate SE to use here and why?
My options are:
1. I can calculate the SE from the sample SD, sqrt(p_hat(1-p_hat)), divided by sqrt(n), where p_hat is the sample proportion. As I understand it, this is what we do when we assume we don't know the actual distribution (binom(n, ?)). But in that case it feels inconsistent with the null hypothesis, which assumes binom(n, 0.5).
2. I can calculate the SE from the theoretical SD, sqrt(0.5(1-0.5)), divided by sqrt(n). As I understand it, this is what we do when we assume we actually know that the distribution is binom(n, 0.5).
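To make the two options concrete, here is a minimal sketch that computes both standard errors and the resulting z-values. The numbers (n = 100, sample proportion 0.60) are hypothetical and not taken from the actual seasons, and the function names are made up for illustration.

```python
import math

def se_from_sample(p_hat, n):
    # Option 1: SE based on the observed sample proportion.
    return math.sqrt(p_hat * (1 - p_hat) / n)

def se_from_null(p0, n):
    # Option 2: SE based on the proportion assumed under H0.
    return math.sqrt(p0 * (1 - p0) / n)

# Hypothetical season: 100 sampled games, 60 predicted correctly.
n = 100
p_hat = 0.60
p0 = 0.5

z_option1 = (p_hat - p0) / se_from_sample(p_hat, n)
z_option2 = (p_hat - p0) / se_from_null(p0, n)

print(f"option 1: SE = {se_from_sample(p_hat, n):.4f}, z = {z_option1:.3f}")
print(f"option 2: SE = {se_from_null(p0, n):.4f}, z = {z_option2:.3f}")
```

Because p(1-p) is largest at p = 0.5, the option 2 SE is never smaller than the option 1 SE when the null value is 0.5, so option 2 gives a z-value that is no larger in magnitude.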

However, all the resources I have found say that the first option is the one I should use.
Thank you all in advance :)
My understanding of a z-score is that it's a measurement of distance from the center of a bell curve to some sample point:
(sampleValue - expectedValue)

and the resulting distance is converted to another unit of measure, a count of standard deviations from expectedValue to sampleValue:
z-score = (sampleValue - expectedValue) / standardDeviation
I would use the standard deviation assumed under the null hypothesis, i.e. sqrt(0.5(1-0.5)/n). This is because you are performing a hypothesis test, and the test statistic is computed under the distribution that the null assumes. Whether the test statistic shows evidence against the null then depends on the data.
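As a rough sketch of that test, the snippet below computes the z-statistic with the null-hypothesis SE and a one-sided p-value from the normal approximation. The function name and the example counts (60 correct out of 100) are hypothetical, and it assumes n is large enough for the normal approximation to be reasonable.

```python
import math

def one_sided_proportion_z_test(successes, n, p0=0.5):
    """z-test of H0: p = p0 against H1: p > p0, using the SE under H0."""
    p_hat = successes / n
    se0 = math.sqrt(p0 * (1 - p0) / n)  # SE assumed under the null
    z = (p_hat - p0) / se0
    # Upper-tail probability of a standard normal: P(Z >= z).
    p_value = 0.5 * math.erfc(z / math.sqrt(2))
    return z, p_value

# Hypothetical season: 60 of 100 sampled games predicted correctly.
z, p_value = one_sided_proportion_z_test(successes=60, n=100)
print(f"z = {z:.3f}, one-sided p-value = {p_value:.4f}")
```

With several seasons, the same function could be called once per season, keeping in mind that running many tests raises the chance of at least one false positive.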