Brier score calculation: two methods should yield the same result


I have a set of 234 predictions of tennis match outcomes and 5 different prediction models. I use two different methods for calculating the Brier score, described, for instance, here.

The first method is:

\(BS=\frac{1}N\sum_{t=1}^N(f_{t}-o_{t})^2 \)

Where \(N\) is the number of forecasting instances, \(f_{t}\) is the forecast probability of the \(t\)-th instance, and \(o_{t}\) is the outcome (either \(0\) or \(1\)).
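For reference, here is a minimal Python sketch of the first method. It assumes forecasts are probabilities in [0, 1] and outcomes are coded as 0 or 1. The function name `brier_score` and the five-element toy data are hypothetical illustrations. They are not the tennis data or the spreadsheet from this post.

```python
def brier_score(forecasts, outcomes):
    """First method: mean squared difference between forecast probabilities and 0/1 outcomes."""
    if not forecasts or len(forecasts) != len(outcomes):
        raise ValueError("forecasts and outcomes must be non-empty and equally long")
    n = len(forecasts)  # N: number of forecasting instances
    return sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / n


# Hypothetical toy data, not the 234 tennis predictions.
forecasts = [0.7, 0.7, 0.4, 0.9, 0.4]
outcomes = [1, 0, 0, 1, 1]
print(brier_score(forecasts, outcomes))
```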

The second method decomposes the Brier score into *Reliability*, *Resolution*, and *Uncertainty*:

\(BS=\frac{1}{N}\sum\limits _{k=1}^{K}{n_{k}(\mathbf{f_{k}}-\mathbf{\bar{o}}_{\mathbf{k}})}^{2}-\frac{1}{N}\sum\limits _{k=1}^{K}{n_{k}(\mathbf{\bar{o}_{k}}-\bar{\mathbf{o}})}^{2}+\mathbf{\bar{o}}\left({1-\mathbf{\bar{o}}}\right) \)

Here \(\textstyle N\) is the total number of forecasts issued, \(\textstyle K\) is the number of unique forecasts issued, \(\mathbf{\bar{o}}={\sum_{t=1}^{N}}\mathbf{{o_t}}/N\) is the observed base rate for the event to occur, \( n_{k}\) is the number of forecasts in the same probability category, and \(\mathbf{\overline{o}}_{\mathbf{k}}\) is the observed frequency given forecasts of probability \(\mathbf{f_{k}}\).
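The decomposition can be sketched the same way. This sketch assumes one bin per unique forecast value, so \(K\) is the number of distinct forecast values and every forecast in bin \(k\) equals \(\mathbf{f_{k}}\). `brier_decomposition` is a hypothetical helper name, and the data are the same made-up toy values. The block also recomputes the first method so the two results can be compared.

```python
from collections import defaultdict


def brier_score(forecasts, outcomes):
    """First method: mean squared difference between forecasts and 0/1 outcomes."""
    return sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / len(forecasts)


def brier_decomposition(forecasts, outcomes):
    """Return (reliability, resolution, uncertainty), one bin per unique forecast value."""
    n = len(forecasts)
    base_rate = sum(outcomes) / n  # o-bar: overall observed base rate
    bins = defaultdict(list)  # f_k -> outcomes observed when f_k was forecast
    for f, o in zip(forecasts, outcomes):
        bins[f].append(o)
    reliability = 0.0
    resolution = 0.0
    for f_k, obs in bins.items():
        n_k = len(obs)
        o_bar_k = sum(obs) / n_k  # observed frequency in bin k
        reliability += n_k * (f_k - o_bar_k) ** 2
        resolution += n_k * (o_bar_k - base_rate) ** 2
    uncertainty = base_rate * (1 - base_rate)
    return reliability / n, resolution / n, uncertainty


# Hypothetical toy data, not the tennis predictions.
forecasts = [0.7, 0.7, 0.4, 0.9, 0.4]
outcomes = [1, 0, 0, 1, 1]
rel, res, unc = brier_decomposition(forecasts, outcomes)
print("direct:", brier_score(forecasts, outcomes))
print("REL - RES + UNC:", rel - res + unc)
```

With one bin per unique forecast value, reliability minus resolution plus uncertainty matches the direct Brier score up to floating-point rounding.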

For more details see Wikipedia.

I would therefore expect both methods to yield the same result. However, with my data they do not, so there must be an error somewhere. Below is a minimal working example with my data (a Google spreadsheet) showing how I calculate the Brier score both ways.

Here is the link to the minimal "working" example:
(Note: the *Resolution* in my example is 0 because the data set is not split into bins, so the *bin base rate* (aka *observed frequency*) equals the *overall observed base rate*.)

I would greatly appreciate it if you could:

- point me to errors in my calculation
- explain what I am doing wrong
- provide the correct calculations for both ways of calculating the Brier score

I have already spent a few hours trying to get this right, without success. As pointed out above, I suspect a slight error in how I apply the decomposed Brier score formula (second method), but I could not identify exactly what I am doing wrong. One thing I noticed is that the second formula refers to the *total number of forecasts* and the *number of unique forecasts*. Since my example (for simplicity) uses only 1 bin, I wonder: are the *total number of forecasts* and the *number of unique forecasts* the same, or do they mean something different?
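To illustrate the distinction between \(N\) and \(K\), the sketch below counts \(N\) (all forecasts issued) and \(K\) (distinct forecast values), then puts every forecast into a single bin. A single bin has only one \(\mathbf{f_{k}}\), but the forecasts inside it differ, so some single value has to be chosen. The sketch assumes the mean forecast. This is an assumption, since the post does not say which value the spreadsheet uses. Under that assumption the three terms no longer add up to the direct Brier score, because the three-term decomposition is exact only when every forecast within a bin equals that bin's \(\mathbf{f_{k}}\). The helper `one_bin_terms` and the toy data are hypothetical.

```python
def brier_score(forecasts, outcomes):
    """First method: mean squared difference between forecasts and 0/1 outcomes."""
    return sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / len(forecasts)


def one_bin_terms(forecasts, outcomes):
    """Reliability, resolution, uncertainty when all N forecasts share one bin (K = 1)."""
    n = len(forecasts)
    o_bar = sum(outcomes) / n
    # Assumption: represent the single bin by the mean forecast.
    f_1 = sum(forecasts) / n
    reliability = (f_1 - o_bar) ** 2  # n_1 / N = 1 with a single bin
    resolution = 0.0  # (o_bar_1 - o_bar)^2 = 0 because the bin holds every outcome
    uncertainty = o_bar * (1 - o_bar)
    return reliability, resolution, uncertainty


# Hypothetical toy data, not the tennis predictions.
forecasts = [0.7, 0.7, 0.4, 0.9, 0.4]
outcomes = [1, 0, 0, 1, 1]

n_total = len(forecasts)        # N: every forecast issued
k_unique = len(set(forecasts))  # K: distinct forecast values
rel, res, unc = one_bin_terms(forecasts, outcomes)
print("N =", n_total, "K =", k_unique)
print("direct:", brier_score(forecasts, outcomes))
print("one-bin decomposition:", rel - res + unc)
```

On this toy data \(N = 5\) and \(K = 3\), and the one-bin sum (about 0.2404) differs from the direct score (about 0.222), whereas binning by unique forecast value makes the two agree.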