Hi, just to clarify, I'm not a student and this isn't homework; I am doing this for my job. I ran across a statistic my office has been calculating and I feel like it's flawed, but I can't pin down what the correct method would be. Searching for the right method online has proved fruitless; I may be using the wrong keywords. Anyway, here's my problem...

I work for an office that runs 3-stage competitions (where reaching the 3rd stage essentially means having won the competition). We are trying to figure out the "success rate" of various groups of people from stage 1 to 2, 2 to 3, and 1 to 3. To calculate the success of group A going from stage 1 to 2, we have been using the equation: (# of people in group A in stage 2) / (# of people in group A in stage 1). This has been their simple "success rate of group A" ratio. HOWEVER, I perceive a problem that is not being addressed, which has to do with sample size. The point of the 3-stage competition is essentially to reduce the number of applicants that make it to stage 3. As an example, we usually have 400 people in stage 1, 100 people in stage 2 and 20 people in stage 3 (so about 20 people win).

If group A has 5 people in stage 1 (out of 400 total) and 3 of them make it to stage 2 (out of 100 total), their relative success seems higher to me than that of group B, which has 395 people in stage 1 and 97 in stage 2. But that's not always showing in the numbers, and this is what sent up red flags for me.
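For concreteness, here is a minimal Python sketch of the ratio described above, applied to the example numbers for groups A and B. To make the sample-size concern visible it also prints a 95% Wilson score interval for each rate. The Wilson interval is an assumption added purely for illustration (it is one common way to put an uncertainty range on a proportion), not something the office is described as using, and the function names are made up for this example.

```python
import math

def success_rate(advanced, entered):
    """Plain ratio: people in the next stage / people in the current stage."""
    return advanced / entered

def wilson_interval(advanced, entered, z=1.96):
    """Wilson score interval for a proportion (z=1.96 gives roughly 95%).

    Assumption for illustration only: each person is treated as an
    independent success/failure trial.
    """
    p = advanced / entered
    z2 = z * z
    denom = 1 + z2 / entered
    center = (p + z2 / (2 * entered)) / denom
    half = z * math.sqrt(p * (1 - p) / entered + z2 / (4 * entered ** 2)) / denom
    return center - half, center + half

# Example numbers from the post: (people in stage 2, people in stage 1)
groups = {"A": (3, 5), "B": (97, 395)}

results = {}
for name, (advanced, entered) in groups.items():
    rate = success_rate(advanced, entered)
    low, high = wilson_interval(advanced, entered)
    results[name] = (rate, low, high)
    print(f"group {name}: rate={rate:.3f}, 95% interval=({low:.3f}, {high:.3f})")
```

With these inputs the plain ratio is 0.6 for group A and about 0.246 for group B, but group A's interval runs from roughly 0.23 to 0.88 while group B's is only about 0.21 to 0.29. The two intervals overlap, which is one way to see why a rate based on 5 people deserves much less weight than a rate based on 395.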

Am I making a mountain out of a molehill? How do people usually calculate this? Is the sample size difference even relevant? We want to ensure the competition is treating all groups fairly, which is why they are trying to figure out these numbers, and I am worried that if the method they are using to calculate them is flawed, then their conclusions about the fairness of the competition could be flawed too.

Thank you very much for any help you can provide. If you know of a reference where the methodology you recommend is used I would greatly appreciate that as well (but I'll take any help so post even if you don't have a reference at hand).
THANK YOU!

If it is a competitive game where different matchups occur, it is never fair in that sense unless you use a pretty elaborate scoring scheme or everyone plays everyone. Here "elaborate" means that relative performance is somehow taken into account. You didn't mention anything like that, so I think we can assume it is not present.

Without that, statistical intuition says that if it is a game where players can "hurt" other players by being "good" (i.e. hold them back), then keeping the number of people in a group roughly "medium or larger" will go a long way toward keeping it fair.

The reason is that the skill of the players is sampled from a population. If a group is "large", the distribution of the group's skill will tend to match that of the larger population. However, if the group is "small", the distribution of the group's skill varies a lot more from the larger population. For example, the 5 players in group A might all suck or they might all rock.

But if there were 50 players in group A and 50 players in group B, then you might expect both groups to have roughly the same distribution of skill, strong and weak players alike.
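A quick way to see this is a small simulation. This is only a sketch under assumptions of my own: skill is modelled as a standard normal variable, groups are drawn at random from that population, and the function name is made up for the example. It draws many groups of a given size and measures how much the average skill of a group moves around from draw to draw.

```python
import random
import statistics

def spread_of_group_means(group_size, trials=2000, seed=0):
    """Standard deviation of the average skill across many random groups.

    Assumption for illustration: individual skill ~ Normal(mean 0, sd 1).
    """
    rng = random.Random(seed)
    means = []
    for _ in range(trials):
        group = [rng.gauss(0.0, 1.0) for _ in range(group_size)]
        means.append(statistics.mean(group))
    return statistics.stdev(means)

spread_small = spread_of_group_means(5)    # a group like the 5 players in group A
spread_large = spread_of_group_means(50)   # a group of 50 players

print(f"spread of average skill, groups of 5:  {spread_small:.3f}")
print(f"spread of average skill, groups of 50: {spread_large:.3f}")
```

Under these assumptions the average skill of a 5-player group varies about three times as much as that of a 50-player group (the spread shrinks like 1/sqrt(group size)), so a small group can easily look much better or much worse than the population just by the luck of the draw.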

Now, as for fair: it really depends on the context. Even though 5-person matchups are unfair in the sense that an unlucky draw might be to your detriment (imagine only 2 people playing), it might still be called fair if the 5-person matchup was assumed to be random (though it doesn't sound like that's a factor in your game).

