
So, I am trying to compare the scores given by two groups.

One group is an expert group, and the other is a novice group.

I am trying to see if the scores both groups give relate to each other.

The scores are going to be ordinal, in that 1 = fail, 2 = pass with error, 3 = perfect.

These individual scores are summed into a composite score for the whole test for one person.

Looking at some papers, some have used ICC, others have used kappa.

I have found a paper that uses an unweighted kappa for ordinal data (and a stats paper saying weighted kappa is for ordinal).

They used ICC for inter-rater reliability on the whole composite score (7-21), and unweighted kappa for the individual scores (1-3).

Confused.com

- Kappa treats ordinal data as if it were nominal data. This results in a loss of information.
- Weighted kappa takes the ordinal nature of the data into consideration, but results can be influenced by the weighting scheme used.
- Kendall's is often used with ordinal data.
- ICC is used for continuous data, or when there are enough ordinal levels to treat the data as continuous (3 would not meet this criterion, but 7-21 would). There is risk involved in making this assumption.

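To make the unweighted-vs-weighted distinction in that list concrete, here is a minimal sketch in plain NumPy. The scores are invented, and the weighting schemes shown are the standard linear and quadratic ones; this is an illustration, not anyone's published method:

```python
import numpy as np

def kappa(rater1, rater2, categories, weights="unweighted"):
    """Cohen's kappa between two raters, optionally weighted for ordinal data."""
    k = len(categories)
    idx = {c: i for i, c in enumerate(categories)}
    obs = np.zeros((k, k))
    for a, b in zip(rater1, rater2):
        obs[idx[a], idx[b]] += 1
    obs /= obs.sum()                                   # observed proportions
    exp = np.outer(obs.sum(axis=1), obs.sum(axis=0))   # expected under chance
    i, j = np.indices((k, k))
    if weights == "linear":
        dis = np.abs(i - j) / (k - 1)        # adjacent misses penalised less
    elif weights == "quadratic":
        dis = ((i - j) / (k - 1)) ** 2       # far misses penalised much more
    else:
        dis = (i != j).astype(float)         # every disagreement counts fully
    return 1 - (dis * obs).sum() / (dis * exp).sum()

# Hypothetical 1-3 screen scores from one expert and one novice
expert = [1, 2, 3, 2]
novice = [1, 3, 3, 2]
print(kappa(expert, novice, [1, 2, 3]))                    # ~0.64 unweighted
print(kappa(expert, novice, [1, 2, 3], weights="linear"))  # ~0.71 linear-weighted
```

The only disagreement here is a 2 scored as a 3: unweighted kappa penalises that near-miss as heavily as a 1-vs-3 miss would be, while linear weights give it partial credit, which is exactly the information-loss point in the list above.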

Kendall's Tau??

So, comparing rater groups on the individual screens (1-3 scoring system) using Kendall's tau?
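For that screen-level comparison, Kendall's tau-b is one concrete option: SciPy implements it, and the tie correction matters with a 1-3 scale. The scores below are made up:

```python
from scipy.stats import kendalltau

# Hypothetical screen scores for ten participants, one rater per group
expert = [3, 2, 1, 3, 2, 2, 1, 3, 1, 2]
novice = [3, 2, 2, 3, 1, 2, 1, 2, 1, 2]

tau, p_value = kendalltau(expert, novice)  # tau-b by default, tie-corrected
print(f"tau = {tau:.2f}, p = {p_value:.3f}")
```

One caveat worth keeping in mind: tau measures association, not agreement. Two raters who rank participants identically but use the scale differently (e.g. the novice systematically one level lower) can still get tau near 1, so if absolute agreement is the question, kappa or ICC is still needed.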

And comparing the composite scores between groups, ICC?
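For the composite 7-21 scores, one common choice is ICC(2,1) in Shrout and Fleiss's notation: two-way random effects, absolute agreement, single rater. SciPy does not ship an ICC, so here is a hedged sketch computed from the ANOVA mean squares, with invented composite scores:

```python
import numpy as np

def icc2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.
    `ratings` is an n_subjects x n_raters array."""
    Y = np.asarray(ratings, dtype=float)
    n, k = Y.shape
    grand = Y.mean()
    ss_rows = k * ((Y.mean(axis=1) - grand) ** 2).sum()  # between subjects
    ss_cols = n * ((Y.mean(axis=0) - grand) ** 2).sum()  # between raters
    ss_err = ((Y - grand) ** 2).sum() - ss_rows - ss_cols
    msr = ss_rows / (n - 1)                  # mean squares
    msc = ss_cols / (k - 1)
    mse = ss_err / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

# Hypothetical composite scores (7-21) for six participants, two raters
scores = [[12, 13], [18, 17], [9, 10], [21, 20], [14, 14], [7, 8]]
print(icc2_1(scores))
```

If averages of raters (rather than a single rater's score) will be used in practice, the ICC(2,k) variant is the matching choice instead; packages like pingouin report the whole family at once.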

I feel I should also look at intra-rater scores, but this would involve a further testing protocol; currently I'm trying to do "live" testing in the real world, so test-retest is pretty much impossible! But if I were to do so...?

We then compare all raters (in two groups) from Monday on all participants. If there is agreement between the groups, then great. But surely with only 3 possible outcomes, the chance of accidental agreement is high.

Thus, test them all again on Wednesday and Friday and see if rater A gives the same score for the same screen as they did on the other days?
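If that test-retest design ever becomes feasible, the "accidental agreement with only 3 outcomes" worry is exactly what kappa is built for: it subtracts the agreement expected by luck from the raw percent agreement. A minimal self-contained sketch, with invented Monday/Wednesday scores for one rater:

```python
import numpy as np

def cohens_kappa(day1, day2, categories=(1, 2, 3)):
    """Unweighted Cohen's kappa: agreement beyond what chance alone predicts."""
    idx = {c: i for i, c in enumerate(categories)}
    m = np.zeros((len(categories), len(categories)))
    for a, b in zip(day1, day2):
        m[idx[a], idx[b]] += 1
    m /= m.sum()
    p_obs = np.trace(m)                               # raw percent agreement
    p_chance = (m.sum(axis=1) * m.sum(axis=0)).sum()  # agreement expected by luck
    return (p_obs - p_chance) / (1 - p_chance)

# Rater A's scores for eight participants on two days (made up)
monday    = [3, 2, 1, 3, 2, 2, 1, 3]
wednesday = [3, 2, 1, 3, 1, 2, 1, 3]
print(cohens_kappa(monday, wednesday))
```

Here raw agreement is 7/8 = 0.875, but about a third of that would be expected from the marginal score frequencies alone, so kappa lands noticeably lower (~0.81), which is the chance correction doing its job.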