How to compare the kappa of different groups?

Hi there,

I want to conduct a study in which 15 raters will rate 100 photos of cutaneous nevi. These 15 raters are divided into three groups of five people based on their level of training and expertise. To evaluate the level of agreement between the raters within each group, I will use the kappa statistic as described by Fleiss. Now comes the hard part for me: how do I compare the level of agreement between the three groups? I'm having a hard time finding out which statistical test I should use. Would I simply have to compare the confidence intervals of these groups?

Any help is appreciated!
kind regards,
No, I'm currently busy writing the research proposal, but I have to mention which statistical tests I am going to use.


OK. I only mention SPSS because I know it will give you approximate standard errors and confidence intervals for Fleiss' kappa, but I guess you will have some other source that does the same job.
One method is to compare the 95% CIs of kappa for each pair of groups, as you suggest. Two intervals can overlap and the kappas can still be significantly different at the 5% level. A pair of intervals that just touch indicates a p value of less than 1%. However, you will be making three comparisons, so the cutoff for significance should be less than 0.05, say just under 2% (Bonferroni: 0.05/3 is about 0.017), so the no-overlap method is close. The CIs SPSS calculates are large-sample ones, so the true CIs will probably be somewhat wider. In short, "no overlap for significance" probably works out about right but may take some explaining.
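To make the "just touching" claim concrete, here is a rough check in Python. It assumes both groups have the same SE (the value itself cancels out), uses the usual SQRT(SE1^2+SE2^2) for the SE of a difference, and treats the two kappas as independent, which is only an approximation because all groups rate the same photos.

```python
from math import sqrt
from statistics import NormalDist

std_normal = NormalDist()

# Half-width multiplier of a 95% CI (about 1.96).
z_crit = std_normal.inv_cdf(0.975)

# Assumption: both groups have the same SE. Its value cancels out below.
se = 1.0

# If the two 95% CIs just touch, the kappas differ by two half-widths.
gap = 2 * z_crit * se

# z statistic and two-sided p value for that difference.
z = gap / sqrt(se**2 + se**2)
p_touching = 2 * (1 - std_normal.cdf(z))

# Bonferroni cutoff for three pairwise comparisons.
bonferroni_alpha = 0.05 / 3

print(f"p when CIs just touch: {p_touching:.4f}")
print(f"Bonferroni cutoff:     {bonferroni_alpha:.4f}")
```

With unequal SEs the p value for touching intervals changes somewhat, so this is a guide rather than an exact rule.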
The next method is to use the SEs that your software gives you. For each pair, calculate the difference in kappa and the SE of the difference, which is SQRT(SE1^2+SE2^2); then difference / SE of difference is approximately standard normal if there is no true difference. Find the tail area beyond the observed value and double it for a two-sided p value. Again, use an adjusted critical p value for significance. This at least has some calculation behind it.
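A minimal sketch of that calculation in Python, in case it helps with the proposal. The function name, the group names and the kappa/SE values are all made up for illustration; you would plug in whatever your software reports. It also treats the kappas as independent, which is an approximation since every group rates the same photos.

```python
from itertools import combinations
from math import sqrt
from statistics import NormalDist


def kappa_difference_test(kappa1, se1, kappa2, se2):
    """Approximate two-sided z test for the difference of two kappas."""
    diff = kappa1 - kappa2
    se_diff = sqrt(se1**2 + se2**2)
    z = diff / se_diff
    # Tail area beyond |z|, doubled for a two-sided p value.
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return diff, se_diff, z, p


# Hypothetical (kappa, SE) pairs, NOT real results.
groups = {
    "novice": (0.40, 0.05),
    "intermediate": (0.55, 0.05),
    "expert": (0.70, 0.04),
}

# Bonferroni-adjusted cutoff for the three pairwise comparisons.
alpha = 0.05 / 3

for (name_a, (k_a, se_a)), (name_b, (k_b, se_b)) in combinations(groups.items(), 2):
    diff, se_diff, z, p = kappa_difference_test(k_a, se_a, k_b, se_b)
    verdict = "significant" if p < alpha else "not significant"
    print(f"{name_a} vs {name_b}: diff={diff:.3f}, z={z:.2f}, p={p:.4f} ({verdict})")
```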
Perhaps the best method, in my opinion, is a randomization test on the differences. It generates its own SE difference and produces a p value. Bonferroni will still be needed.
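There is more than one way to set up such a randomization test, so take the following as one possible sketch rather than the definitive version. It assumes that, if level of training made no difference, raters would be exchangeable between the two groups being compared, so it reassigns the raters to the two groups in every possible way and recomputes the difference in Fleiss' kappa each time. The function names and the data layout (one row per photo, one column per rater) are my own choices for illustration.

```python
from itertools import combinations


def fleiss_kappa(ratings):
    """Fleiss' kappa. ratings: one row per photo, one category label per rater."""
    n_subjects = len(ratings)
    n_raters = len(ratings[0])
    categories = sorted({c for row in ratings for c in row})
    totals = {c: 0 for c in categories}
    agreement_sum = 0.0
    for row in ratings:
        counts = {c: row.count(c) for c in categories}
        for c in categories:
            totals[c] += counts[c]
        # Proportion of agreeing rater pairs for this photo.
        squares = sum(v * v for v in counts.values())
        agreement_sum += (squares - n_raters) / (n_raters * (n_raters - 1))
    p_bar = agreement_sum / n_subjects
    p_e = sum((t / (n_subjects * n_raters)) ** 2 for t in totals.values())
    if p_e == 1.0:
        return float("nan")  # only one category used: kappa is undefined
    return (p_bar - p_e) / (1 - p_e)


def permutation_test(ratings_a, ratings_b):
    """Exact two-sided permutation test on the difference in Fleiss' kappa.

    Assumption: under the null hypothesis, raters are exchangeable
    between the two groups.
    """
    # Pool the raters of both groups, photo by photo.
    pooled = [row_a + row_b for row_a, row_b in zip(ratings_a, ratings_b)]
    n_a = len(ratings_a[0])
    n_total = len(pooled[0])
    observed = fleiss_kappa(ratings_a) - fleiss_kappa(ratings_b)
    extreme = 0
    total = 0
    # Enumerate every way of assigning n_a of the pooled raters to group A.
    for idx_a in combinations(range(n_total), n_a):
        chosen = set(idx_a)
        idx_b = [j for j in range(n_total) if j not in chosen]
        split_a = [[row[j] for j in idx_a] for row in pooled]
        split_b = [[row[j] for j in idx_b] for row in pooled]
        diff = fleiss_kappa(split_a) - fleiss_kappa(split_b)
        total += 1
        # Small tolerance so the observed split always counts itself.
        if abs(diff) >= abs(observed) - 1e-12:
            extreme += 1
    return observed, extreme / total
```

With five raters per group there are only 252 possible splits, so the smallest two-sided p value this scheme can produce is 2/252, about 0.008. That is below the Bonferroni cutoff of about 0.017, but the test will be coarse.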
Thanks for the answer! I'm going to go through your explanation with the rest of the team. I'll let you know how it works out.