The concept of probability averaging only arises in relation to some prescribed probability sampling schemes. Thus, for simple random sampling we have the concept of the expected value of \(y_i\), the \(i\)th observation in the sample. That is,

\(E[y_i] = \sum_{j=1}^{N} Y_j \Pr(y_i=Y_j) = \frac{1}{N}\sum_{j=1}^{N} Y_j = \bar{Y}\)

The result that \(\Pr(y_i=Y_j) = \frac{1}{N}\) holds because the number of samples with \(y_i=Y_j\) is \(\frac{(N-1)!}{(N-n)!}\) and each has probability \(\frac{(N-n)!}{N!}\).

---------------------------------------------

I am unsure how to justify to myself that the number of samples with \(y_i=Y_j\) is \(\frac{(N-1)!}{(N-n)!}\); the book doesn't discuss this further.

In terms of notation, we have a population \(Y_1, Y_2, \ldots, Y_N\) and a sample \(y_1, y_2, \ldots, y_n\).
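
One way to sanity-check the count is a small brute-force enumeration. The sketch below assumes the book is counting ordered samples drawn without replacement, which is consistent with each sample having probability \(\frac{(N-n)!}{N!}\). Under that assumption, fixing the \(i\)th draw to be unit \(j\) leaves \(n-1\) positions to fill, in order, from the remaining \(N-1\) units, which can be done in \(\frac{(N-1)!}{(N-n)!}\) ways. The function name `count_ordered_samples` and the labelling of units as `1..N` are illustrative choices, not anything from the book.

```python
from itertools import permutations
from math import factorial


def count_ordered_samples(N, n, i, j):
    """Count ordered samples of size n, drawn without replacement from
    units 1..N, whose i-th draw (1-indexed) is unit j."""
    units = range(1, N + 1)
    return sum(1 for s in permutations(units, n) if s[i - 1] == j)


N, n = 5, 3
total = factorial(N) // factorial(N - n)         # all ordered samples: N!/(N-n)!
claimed = factorial(N - 1) // factorial(N - n)   # the book's count: (N-1)!/(N-n)!

# The count should be the same for every position i and every unit j.
for i in range(1, n + 1):
    for j in range(1, N + 1):
        assert count_ordered_samples(N, n, i, j) == claimed

print(claimed, total, claimed / total)  # 12 60 0.2
```

For \(N=5\) and \(n=3\) this gives 12 of the 60 ordered samples for each choice of \(i\) and \(j\), so \(\Pr(y_i=Y_j) = 12/60 = 1/5 = 1/N\), which agrees with the formula in the quoted passage.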