Probability of test passes

I have X students who take tests of equal difficulty.
There are Y tests in total, and you can either pass or fail each one.
I have data on all X students, but so far not every student has taken every test: some have taken all Y, some Y-1, some Y-2, and so on down to 1.

I want to know how I’d calculate:

1.) the probability that a future student passes all Y tests
2.) the expected total number of fails for a future student

To work it out, I thought about treating each exam as an independent event and using a flat probability of passing each exam (total passes / total attempts) for every student. However, that doesn’t capture the fact that some students are better than others, and that one fail increases the likelihood of another (which is what the data shows).
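For reference, the flat model is only a few lines. This is a sketch, assuming a hypothetical data layout of one list per student with Y entries (True = pass, False = fail, None = not taken yet); the function names are made up for illustration. It uses every attempt, including from students who haven't finished, and by construction it ignores the student-to-student differences mentioned above.

```python
# Sketch of the flat model: every test is an independent pass/fail trial
# with one pooled pass probability p for every student.
# Assumed (hypothetical) data layout: one list per student with Y entries,
# True = pass, False = fail, None = test not taken yet.


def flat_pass_rate(results):
    """Total passes / total attempts, ignoring tests not yet taken."""
    attempts = [r for student in results for r in student if r is not None]
    return sum(attempts) / len(attempts)


def flat_model(results, num_tests):
    """Return (P(pass all tests), expected number of fails) under the flat model."""
    p = flat_pass_rate(results)
    prob_all = p ** num_tests             # Y independent passes in a row
    expected_fails = num_tests * (1 - p)  # mean of a Binomial(Y, 1 - p)
    return prob_all, expected_fails


if __name__ == "__main__":
    # Made-up data: 4 students, Y = 4 tests, some not taken yet.
    data = [
        [True, True, True, True],
        [True, False, None, None],
        [False, False, True, None],
        [True, True, True, None],
    ]
    print(flat_model(data, num_tests=4))  # -> (0.31640625, 1.0)
```

Here p = 9 passes / 12 attempts = 0.75, so P(pass all 4) = 0.75^4 and the expected number of fails is 4 × 0.25 = 1.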


I did not stare too hard at your question, but suppose there are 4 tests, 80% of people have completed all 4, and the remaining 20% are representative of those who completed them. Then you can just work out the probabilities from the 80%, conditioning on how many of the earlier tests each person passed. For example:

75% of people that passed 3/3 initial tests passed the fourth test.
50% of people that passed 2/3 initial tests passed the fourth test.
20% of people that passed 1/3 initial tests passed the fourth test.
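A sketch of how those conditional rates could be turned into answers to both questions. Assumptions: only students who have taken all Y tests are used (as above, treated as representative), each is a list of True/False, and the state is simply "number of passes so far", as in the 75% / 50% / 20% figures. The function names and the four students below are hypothetical.

```python
from collections import defaultdict


def conditional_rates(completed, num_tests):
    """Estimate P(pass test k | j passes among the tests before k).

    Uses only students who have taken all tests (lists of True/False).
    Returns a dict keyed by (k, j), with tests numbered from 0.
    """
    counts = defaultdict(lambda: [0, 0])  # (k, j) -> [passes, attempts]
    for student in completed:
        passed_so_far = 0
        for k in range(num_tests):
            cell = counts[(k, passed_so_far)]
            cell[1] += 1
            if student[k]:
                cell[0] += 1
                passed_so_far += 1
    return {key: passes / attempts for key, (passes, attempts) in counts.items()}


def chain_model(completed, num_tests):
    """Return (P(pass all tests), expected fails) by stepping through the tests
    and tracking the probability of each running pass count."""
    rates = conditional_rates(completed, num_tests)
    dist = {0: 1.0}  # probability of having j passes so far
    for k in range(num_tests):
        new_dist = defaultdict(float)
        for j, prob in dist.items():
            if prob == 0:
                continue  # unreachable state, may have no data behind it
            p = rates[(k, j)]
            new_dist[j + 1] += prob * p      # pass test k
            new_dist[j] += prob * (1 - p)    # fail test k
        dist = new_dist
    prob_all = dist.get(num_tests, 0.0)
    expected_fails = sum(prob * (num_tests - j) for j, prob in dist.items())
    return prob_all, expected_fails


if __name__ == "__main__":
    T, F = True, False
    # Made-up students who have taken all 4 tests.
    done = [[T, T, T, T], [T, T, T, F], [T, F, T, T], [F, F, F, F]]
    print(chain_model(done, 4))  # -> (0.25, 1.5)
```

On the same made-up students the flat model would give p = 10/16 = 0.625, so 0.625^4 ≈ 0.153 for passing all four, versus 0.25 here; the expected number of fails (1.5) comes out the same either way. That gap is the "one fail makes another more likely" effect from the question.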


I suspect, without knowing for sure, that the pass rate differs from student to student, but the questions are about a generic future student. So this seems like a measure of the tests rather than of the students.

If we lump all test results into one set and call them the student body's performance against a given test, then each attempt could be treated as a Bernoulli trial, provided the data from the student body does indeed look like a bell curve.

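Under that model, with p = total passes / total attempts, both questions have simple answers: the probability of passing all Y tests is p^Y, and the expected number of fails is Y(1 - p).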
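One rough way to check the bell-curve condition (a standard comparison, not something from this thread): if everyone really shared a single pass probability p, the number of passes per student would follow a Binomial(Y, p), with variance Y·p·(1 - p). A spread well above that suggests some students are consistently better than others. This sketch assumes students who have taken all Y tests, each a list of True/False; the function name and data are hypothetical, and with only a handful of students the ratio is noisy.

```python
from statistics import mean, pvariance


def dispersion_check(completed, num_tests):
    """Compare the spread of per-student pass counts with the spread a single-p
    binomial (Bernoulli-trial) model predicts.

    completed: students who have taken all tests, each a list of True/False.
    Returns (observed variance, binomial variance, ratio).
    """
    counts = [sum(student) for student in completed]  # passes per student
    p = mean(counts) / num_tests                       # pooled pass probability
    observed = pvariance(counts)
    expected = num_tests * p * (1 - p)                 # variance of Binomial(Y, p)
    ratio = observed / expected if expected > 0 else float("nan")
    return observed, expected, ratio


if __name__ == "__main__":
    T, F = True, False
    done = [[T, T, T, T], [T, T, T, F], [T, F, T, T], [F, F, F, F]]
    print(dispersion_check(done, 4))  # -> (2.25, 0.9375, 2.4)
```

A ratio near 1 is consistent with the single-p model; a ratio like 2.4 says per-student pass counts are much more spread out than coin-flipping with one shared p would produce.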
Another way to examine the results, if there are many, is to get the average and standard deviation for each student. This can then be looked at from either perspective: student performance against a test, or test performance against students. I mention this not because it's directly applicable to the binary pass/fail problem, but because it gives a clear sense of how to attribute blame to either the student or the test, which may or may not be correct.
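Both perspectives can be computed from the same table, scoring a pass as 1 and a fail as 0. A sketch, assuming the same hypothetical layout (one list per student, True / False / None for not taken); `summarize` is a made-up name. For 0/1 scores the standard deviation is just sqrt(m·(1 - m)) of the mean m, which is one reason it adds little for pass/fail data.

```python
from statistics import mean, pstdev


def summarize(results):
    """Pass rate and standard deviation per student and per test.

    results: one list per student with Y entries, True = pass, False = fail,
    None = not taken yet (hypothetical layout). A row or column with no
    attempts gets None instead of a (mean, stdev) pair.
    """
    def stats(values):
        scores = [int(v) for v in values if v is not None]  # pass = 1, fail = 0
        return (mean(scores), pstdev(scores)) if scores else None

    per_student = [stats(student) for student in results]
    num_tests = len(results[0])
    per_test = [stats([student[k] for student in results]) for k in range(num_tests)]
    return per_student, per_test


if __name__ == "__main__":
    data = [
        [True, True, True, True],
        [True, False, None, None],
        [False, False, True, None],
        [True, True, True, None],
    ]
    students, tests = summarize(data)
    print(students)  # student view: how each student does across tests
    print(tests)     # test view: how each test does across students
```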