Measuring concordance between raters

I have just come into a project where the data has already been collected and I’m not sure how to analyse it, so I hope someone can help me!

The research looks at which aspects of therapy people find most helpful.

The data: around 400 participant statements about what people find helpful.
The models: 3 models of helpfulness (2 with 6 categories, 1 with 8 categories).

The aim: to evaluate which model is the best.

Each statement has been rated by 8 raters against each model, i.e. each rater assigned it to exactly 1 category in each model.

I know that kappa statistics or Kendall's coefficient of concordance might be involved, but I'm not sure exactly how to go about it.
I also imagine there is a problem with the models containing different numbers of categories. Would this mean some weighting is needed to adjust for it?

To summarise: 8 raters have each allocated the roughly 400 statements 3 times, once per model, to exactly 1 category in that model.

How do I assess which model has the best overall concordance between raters?
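
To make the question concrete, here is a minimal sketch of the kind of calculation I imagine, assuming Fleiss' kappa (a multi-rater kappa for nominal categories) is a suitable statistic. That is my assumption, not something settled for this project. The function name, the data layout (one list of rater labels per statement) and the toy labels are all hypothetical, and the numbers are made up purely for illustration.

```python
from collections import Counter


def fleiss_kappa(ratings, categories):
    """Fleiss' kappa for nominal ratings.

    ratings: list with one entry per statement; each entry is a list holding
             the category label chosen by each rater (same number of raters
             for every statement).
    categories: list of all category labels in the model.
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    if any(len(r) != n_raters for r in ratings):
        raise ValueError("every statement needs the same number of ratings")

    # How many raters chose each category, per statement.
    counts = [Counter(r) for r in ratings]

    # Observed agreement: proportion of agreeing rater pairs per statement.
    p_items = [
        (sum(c * c for c in cnt.values()) - n_raters)
        / (n_raters * (n_raters - 1))
        for cnt in counts
    ]
    p_bar = sum(p_items) / n_items

    # Chance agreement, from the overall share of ratings in each category.
    p_cat = [
        sum(cnt[cat] for cnt in counts) / (n_items * n_raters)
        for cat in categories
    ]
    p_e = sum(p * p for p in p_cat)

    if p_e == 1.0:
        raise ValueError("kappa is undefined when one category takes all ratings")
    return (p_bar - p_e) / (1.0 - p_e)


# Toy illustration only: 3 statements, 4 raters, two hypothetical models.
toy = {
    "model_A": (
        [["c1", "c1", "c1", "c2"], ["c2", "c2", "c2", "c2"], ["c3", "c1", "c3", "c3"]],
        ["c1", "c2", "c3"],
    ),
    "model_B": (
        [["d1", "d2", "d3", "d4"], ["d2", "d2", "d1", "d2"], ["d4", "d4", "d4", "d3"]],
        ["d1", "d2", "d3", "d4"],
    ),
}
kappas = {name: fleiss_kappa(r, cats) for name, (r, cats) in toy.items()}
print(kappas)
```

With the real data this would be run once per model on the 8 ratings for each of the roughly 400 statements, and the three kappas compared. The chance-agreement term depends on how often each category is used, so it may partly account for the 6 versus 8 category difference, but whether that is enough is exactly what I am unsure about.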

My little brain just can’t seem to think its way through this so I would be hugely grateful to anyone who could help me.

Thanks, dalia