- Thread starter: Janthony

Each person has a distribution of test scores, so you could see how many standard deviations away a score of, say, 95 would be, or perform 5 one-sample t-tests and compare the output. That doesn't quite get at your question, but it may get the thought juices flowing. Another issue: the scores are presumably bounded to 0-100, right?

Assume that the test scores are a good indicator of the final exam score.

Resample the tests with replacement, keeping the scores for each test together across students. Work out the average for each student. Record which student has the highest average.

Repeat, say 5000 times.

Find the proportion of the 5000 times each student was top. That is the probability you want.
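The resampling recipe above can be sketched in a few lines of Python. The gradebook below is made up purely for illustration; substitute the real scores.

```python
import random
from collections import Counter

# Hypothetical gradebook: each student's scores on the same set of tests.
# Names and numbers are invented for illustration.
scores = {
    "A": [64, 70, 61, 68, 59],
    "B": [72, 75, 69, 74, 70],
    "C": [66, 63, 71, 65, 68],
}
n_tests = len(scores["A"])
TRIALS = 5000

wins = Counter()
for _ in range(TRIALS):
    # Resample tests with replacement, keeping each test's column of
    # scores together across students.
    cols = [random.randrange(n_tests) for _ in range(n_tests)]
    avgs = {s: sum(marks[c] for c in cols) / n_tests
            for s, marks in scores.items()}
    wins[max(avgs, key=avgs.get)] += 1  # record who came top this round

# Proportion of trials each student was top = estimated probability.
probs = {s: wins[s] / TRIALS for s in scores}
print(probs)
```

With real data you would just swap in the actual score lists; everything else stays the same.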

You would need a spreadsheet with some Monte Carlo functions.

For each student make a range for their mean using =NORMINV(RAND(),mean,SD/SQRT(15)) with the 15 for the 15 tests.

Find which student has the maximum mark =MATCH(MAX(mark range),mark range,0)

Copy down say 2500 times.

Collate.
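The same spreadsheet steps can be sketched in Python, with `random.gauss` playing the role of `NORMINV(RAND(), mean, SD/SQRT(15))`. The per-student means and SDs below are hypothetical placeholders.

```python
import math
import random

# Per-student (mean, SD) summaries -- hypothetical numbers for illustration.
students = {"A": (64, 15), "B": (72, 13), "C": (67, 10)}
N_TESTS = 15   # the SD of the mean is SD / sqrt(15), as in the formula above
ROWS = 2500    # "copy down say 2500 times"

wins = {name: 0 for name in students}
for _ in range(ROWS):
    # random.gauss(mu, sigma) is equivalent to NORMINV(RAND(), mean, SD/SQRT(15))
    draws = {name: random.gauss(mu, sd / math.sqrt(N_TESTS))
             for name, (mu, sd) in students.items()}
    winner = max(draws, key=draws.get)   # the MATCH(MAX(...), ..., 0) step
    wins[winner] += 1

# Collate: proportion of rows in which each student came top.
for name in students:
    print(name, wins[name] / ROWS)
```

Each loop iteration corresponds to one spreadsheet row, and the final loop is the "collate" step.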

Here's what it looks like with your sample data:

Should give you a good idea.

Student Grade Study with additional columns, ±1.65σ and ±1.28σ, the (roughly) 90% and 80% confidence ranges for each student.

For a generic test, it can be said with roughly 90% confidence that a student will achieve a score in the ±1.65σ range (1.645 is the one-sided 95% point; ±1.96σ would be needed for two-sided 95%). Equivalently, about 90% of their scores on a test fall in this range. The range can be narrowed by trading away confidence that the scores will land in the narrower band, as shown by the 80% range of ±1.28 standard deviations from the mean.
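Under a normal model, the two-sided coverage of these ±z·σ bands can be checked directly with Python's `statistics.NormalDist` (note that ±1.645σ covers about 90% two-sided, since 1.645 is the one-sided 95% point). The example student below is hypothetical.

```python
from statistics import NormalDist

nd = NormalDist()  # standard normal

def coverage(z):
    """Two-sided coverage of a mean +/- z*sigma band under a normal model."""
    return nd.cdf(z) - nd.cdf(-z)

cov_165 = coverage(1.645)   # ~0.90: +/-1.645 sigma is a two-sided 90% band
cov_128 = coverage(1.2816)  # ~0.80: +/-1.28 sigma is a two-sided 80% band
print(round(cov_165, 3), round(cov_128, 3))

# Band for a hypothetical student with mean 64, SD 15:
mean, sd = 64, 15
band_90 = (mean - 1.645 * sd, mean + 1.645 * sd)
```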

The merit of a test could also be measured against the history of test results in a similar manner. A good test would produce the least variance of student scores from their means, while a poor test would result in student scores farther away from their respective averages.


View attachment 3871


I did a little spreadsheet work on the dice model interpretation, didn't get to the end of it though.

Somebody mentioned this might be a calculus/integration problem: something like having two normal distributions, a = {mean: 64, stdev: 15} and b = {mean: 72, stdev: 13}, and working out the overlap from there?
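If the two scores are modeled as independent normals with the figures quoted above, no numerical integration is actually needed: the difference of two independent normals is itself normal, so P(b > a) is a single CDF evaluation. A minimal sketch, assuming independence:

```python
from math import sqrt
from statistics import NormalDist

# a ~ N(64, 15^2), b ~ N(72, 13^2), assumed independent.
# Then b - a ~ N(72 - 64, 15^2 + 13^2), so:
diff = NormalDist(72 - 64, sqrt(15**2 + 13**2))
p_b_beats_a = 1 - diff.cdf(0)   # P(b > a)
print(p_b_beats_a)              # roughly 0.66
```

With more than two students the pairwise trick no longer gives "probability of being top" directly, which is why the Monte Carlo approach above is the easier route.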

The problem with averaging out the Monte Carlo runs is that I'd be back to the mean, no?

One basic problem is that although a student's test average is probably a good indicator of their expected exam score, we need some idea of the random variation that a student may have about that expected value.

No, I don't think so. The output from the Monte Carlo is the ID# of the student with the highest exam score. So the output looks like 2 4 3 2 3 2 1 2 3 2 ... and from this you can get the proportion (probability) of each student being top.


Is this real data? Are you running a sweep?