I work for an office that runs 3-stage competitions (reaching the 3rd stage essentially means winning). We are trying to figure out the "success rate" of various groups of people from stage 1 to 2, 2 to 3, and 1 to 3. To calculate the success of group A going from stage 1 to 2, we have been using the equation: (# of people in group A in stage 2) / (# of people in group A in stage 1). This has been our simple "success rate of group A" ratio. HOWEVER, I see a problem that is not being addressed, which has to do with sample size. The whole point of the 3-stage competition is to reduce the number of applicants who make it to stage 3. As an example, we usually have 400 people in stage 1, 100 people in stage 2, and 20 people in stage 3 (so about 20 people win).
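To be concrete about the calculation we currently use, here is a minimal sketch of the ratio described above, applied to the overall stage totals from our typical competition (400, 100, 20):

```python
def success_rate(count_from: int, count_to: int) -> float:
    """The office's current metric: (# reaching later stage) / (# in earlier stage)."""
    return count_to / count_from

# Typical overall totals: 400 in stage 1, 100 in stage 2, 20 in stage 3.
print(success_rate(400, 100))  # stage 1 -> 2
print(success_rate(100, 20))   # stage 2 -> 3
print(success_rate(400, 20))   # stage 1 -> 3
```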

If there are 5 people in group A out of the 400 in stage 1, and 3 of them make it to stage 2 (where there are 100 people), their relative success seems higher to me than that of group B, where 97 of 395 stage-1 entrants make it to stage 2. But that is not always reflected in the numbers, and this is what sent up red flags for me.
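One way to illustrate why the group sizes matter is to put an uncertainty interval around each group's proportion; with only 5 people, group A's rate is far less precisely estimated than group B's. The sketch below uses a Wilson score interval (my suggestion for illustrating the concern, not something the office currently computes; z = 1.96 for a 95% interval is an assumed choice), with the A and B numbers from the example above:

```python
from math import sqrt

def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 ~ 95%)."""
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = (z / denom) * sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return center - half, center + half

# Group A: 3 of 5 advance; group B: 97 of 395 advance.
rate_a, rate_b = 3 / 5, 97 / 395
lo_a, hi_a = wilson_interval(3, 5)
lo_b, hi_b = wilson_interval(97, 395)
print(f"A: rate {rate_a:.3f}, 95% CI ({lo_a:.3f}, {hi_a:.3f})")
print(f"B: rate {rate_b:.3f}, 95% CI ({lo_b:.3f}, {hi_b:.3f})")
```

Group A's interval comes out far wider than group B's, which is exactly the sample-size effect that worries me: the point estimates alone hide how little we actually know about the small group.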

Am I making a mountain out of a molehill? How do people usually calculate this? Is the sample size difference even relevant? We want to ensure the competition treats all groups fairly, which is why we are trying to figure out these numbers, and I am worried that if the method we are using to calculate them is flawed, then our conclusions about the fairness of the competition could be flawed as well.

Thank you very much for any help you can provide. If you know of a reference where the methodology you recommend is used, I would greatly appreciate that as well (but I'll take any help, so please post even if you don't have a reference at hand).

THANK YOU!