Meta-analysis of kappa statistics

I'm looking at doing a meta-analysis on the reliability of a few pathology outcomes. I'm trying to follow the method set out in this paper as the reliability is usually measured using Cohen's Kappa. So far just a couple of questions.

1) Some papers I have looked at report a weighted kappa statistic, while others do not (even though some outcomes have three ordered categories: mild, moderate and severe). Weighted and unweighted kappas generally give different values for the same data (the weighted one is usually the higher of the two when most disagreements fall in adjacent categories), so is it still reasonable to pool these results, or should unweighted kappas be excluded? The vast majority of papers do not give enough data for me to compute the statistics myself.

2) Many papers give a series of pairwise kappa statistics. However, in the example in the paper I'm following, only one kappa score per study is given. I suspect it is Fleiss' kappa they are reporting, though this is not made explicit. Again, does it make sense to pool Fleiss' and Cohen's kappa? They seem like different statistics to me, even though they measure similar things.
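To make question 1 concrete, here is a minimal sketch of how unweighted and linearly weighted Cohen's kappa can differ on the same two-rater table. The counts are made up purely for illustration (they come from no study), and `cohen_kappa` is a hypothetical helper written for this post, not a library function.

```python
def cohen_kappa(table, weights=None):
    """Cohen's kappa from a square table of counts (rater A rows, rater B columns).

    weights=None        -> unweighted kappa (all disagreements count equally)
    weights="linear"    -> disagreement weight |i - j| / (k - 1)
    weights="quadratic" -> disagreement weight (|i - j| / (k - 1)) ** 2
    """
    k = len(table)
    n = sum(sum(row) for row in table)
    row_p = [sum(row) / n for row in table]
    col_p = [sum(table[i][j] for i in range(k)) / n for j in range(k)]

    def disagreement(i, j):
        if weights is None:
            return 0.0 if i == j else 1.0
        d = abs(i - j) / (k - 1)
        return d if weights == "linear" else d * d

    cells = [(i, j) for i in range(k) for j in range(k)]
    observed = sum(disagreement(i, j) * table[i][j] / n for i, j in cells)
    expected = sum(disagreement(i, j) * row_p[i] * col_p[j] for i, j in cells)
    return 1.0 - observed / expected


# Hypothetical counts for mild / moderate / severe, invented for illustration.
example_table = [
    [20, 5, 0],
    [4, 15, 6],
    [1, 5, 14],
]

kappa_unweighted = cohen_kappa(example_table)
kappa_linear = cohen_kappa(example_table, weights="linear")
print(round(kappa_unweighted, 3), round(kappa_linear, 3))
```

In this made-up table most disagreements are between adjacent grades, so the weighted value comes out higher than the unweighted one. That gap is the reason mixing the two in one pooled estimate is questionable.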

As always, any insights appreciated.


Omega Contributor
You can always try to contact the corresponding authors to clarify published results. If there seems to be a systematic threat related to how the results are reported, I would ask whether there are enough studies to run separate meta-analyses for each type of result.
I will contact some authors, good idea, but some studies are over 30 years old and were published before emails were a thing (apparently there was such a time). They are still relevant though, that's how slowly the field of pathology moves - hell, they're still using techniques essentially the same as from the Edwardian era.

I'm hoping there are enough studies to run multiple meta-analyses. So far I have ten studies, and there are three more that look relevant but for which I have yet to find a full-text version. Another problem is that not every study uses the same outcome measures. For instance, the grade of a tissue sample is sometimes recorded as mild, moderate or severe, but sometimes simply as mild or severe.

To clarify, I'm looking at the determination of grade, type and invasion of suspected colon cancer biopsy tissues. There are guidelines, but the determination ultimately relies on a pathologist's opinion.


Omega Contributor
Is your end game to try and publish a meta-analysis? If so, you really need to create a protocol before commencing, that way you don't let the results and availability of results bias your process.

Yes, trying to pool results across studies has many limitations, including study designs, data collection, and reporting. Good Luck.
Cheers. It's just for my own research at the moment.

So now I have a question about whether to use a fixed-, random- or mixed-effects model.

All along I thought a mixed effects model would be best as the studies don't study the exact same outcomes - there are always lesser or greater differences, and I know what some of them are so I can include them as moderators in a mixed-effects model.

But my ultimate aim is simply to compare the reliability of pathologists regardless of the types of samples they are working with. So now I'm wondering whether a random-effects model would be sufficient to this end.
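For reference, a rough sketch of what the random-effects option amounts to, using the DerSimonian-Laird estimator of between-study variance. This is an assumption for illustration and not necessarily the method in the paper I'm following. It also assumes each study supplies a kappa and a standard error, which many of the papers don't report. The function name and the numbers are hypothetical.

```python
import math


def pool_random_effects(estimates, std_errors):
    """DerSimonian-Laird random-effects pooling of per-study estimates.

    Returns (pooled_estimate, pooled_standard_error, tau_squared).
    """
    variances = [se ** 2 for se in std_errors]

    # Fixed-effect (inverse-variance) step, needed for the Q statistic.
    w_fixed = [1.0 / v for v in variances]
    sum_w = sum(w_fixed)
    fixed_mean = sum(w * y for w, y in zip(w_fixed, estimates)) / sum_w

    # Cochran's Q and the method-of-moments estimate of tau^2.
    q = sum(w * (y - fixed_mean) ** 2 for w, y in zip(w_fixed, estimates))
    df = len(estimates) - 1
    c = sum_w - sum(w ** 2 for w in w_fixed) / sum_w
    tau_squared = max(0.0, (q - df) / c)

    # Random-effects weights add tau^2 to each study's own variance.
    w_random = [1.0 / (v + tau_squared) for v in variances]
    pooled = sum(w * y for w, y in zip(w_random, estimates)) / sum(w_random)
    pooled_se = math.sqrt(1.0 / sum(w_random))
    return pooled, pooled_se, tau_squared


# Hypothetical kappas and standard errors, invented for illustration only.
kappas = [0.45, 0.62, 0.38, 0.70, 0.55]
ses = [0.08, 0.05, 0.10, 0.06, 0.07]

pooled, pooled_se, tau_squared = pool_random_effects(kappas, ses)
print(round(pooled, 3), round(pooled_se, 3), round(tau_squared, 4))
```

A mixed-effects model would go one step further and regress the study estimates on moderators (for example, two-category versus three-category grading) while keeping the same between-study variance term. If the moderators are not of interest in themselves, the random-effects version treats those differences as part of tau squared.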