Comparing Cronbach's alpha / intraclass correlations with varying rater combinations

#1
I've run a study for my PhD comparing different methods/conditions that might affect inter-rater agreement when people rate the creativity of others' drawings. I had 600+ drawings that needed to be rated, so it wasn't feasible to have the same few people rate all of them. Instead, 24 raters were each given a randomized sequence of 87 drawings and rated each one on 5 different measures of creativity. This ensured that each drawing was rated by at least 3 different raters (but not always the same combination of 3 raters). I was comparing 3 different methods/conditions, and the 24 raters were different for each of the 3 (independent samples). I have calculated intraclass correlations (Cronbach's alpha; absolute agreement) for each measure within each method/condition, and now I want to compare them statistically to see whether one method/condition produced significantly higher ICCs than the others.
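To make concrete what I mean by Cronbach's alpha, here is a minimal Python sketch of the classical formula, treating each drawing as a row and each rating slot as a column. The function name and the toy ratings are made up for illustration; this is not my data or my actual analysis. It also assumes every drawing has a complete set of ratings in fixed columns, which my design only approximates because the raters behind each slot change. As far as I know, this classical form corresponds to a consistency-type coefficient, so it would not be identical to an absolute-agreement ICC.

```python
from statistics import variance


def cronbach_alpha(ratings):
    """Classical Cronbach's alpha for a complete drawings-by-raters table.

    ratings: list of rows, one row per drawing, one value per rating slot.
    Assumes no missing values and at least 2 rows and 2 columns.
    """
    n_rows = len(ratings)
    k = len(ratings[0])
    if n_rows < 2 or k < 2:
        raise ValueError("need at least 2 drawings and 2 rating slots")
    if any(len(row) != k for row in ratings):
        raise ValueError("every drawing needs the same number of ratings")

    # Sample variance of each rating slot (column) across drawings.
    slot_variances = [variance(column) for column in zip(*ratings)]
    # Sample variance of the summed score per drawing.
    total_variance = variance([sum(row) for row in ratings])

    return k / (k - 1) * (1 - sum(slot_variances) / total_variance)


# Hypothetical toy ratings: 4 drawings, 3 rating slots each.
toy_ratings = [
    [1, 2, 3],
    [2, 3, 4],
    [3, 4, 5],
    [4, 5, 6],
]
print(cronbach_alpha(toy_ratings))
```

In the toy table the three slots differ only by a constant offset, which this formula ignores; that is exactly the kind of rater difference an absolute-agreement coefficient would penalize.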

A medical statistician friend told me she didn't think I could compare them, because the raters differed from drawing to drawing as well as between conditions, but she wasn't sure. I've had a Google and found references on how to compare ICCs, but in all those examples the raters stayed constant; there was no mixture of rater combinations throughout, so I'm not sure whether those methods apply in my case.

I also found a calculator for a Feldt comparison test and a Fisher-Bonett test, the first for large samples and the second for small ones (I'm assuming the sample they're referring to here would be the number of ICCs being compared, i.e., 3?, rather than the 663 drawings?). But I'm not sure whether I can use either of those in this case. Can I?
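For reference, my understanding of the Feldt test for two independent alphas is that it forms the ratio W = (1 - alpha_2) / (1 - alpha_1) and compares it to an F distribution with n_1 - 1 and n_2 - 1 degrees of freedom. Below is a minimal Python sketch of that statistic only. The function name and the example numbers are hypothetical, and what n_1 and n_2 should be in my design is part of what I'm asking. It also handles only two coefficients at a time, whereas I have three conditions.

```python
def feldt_w(alpha_1, n_1, alpha_2, n_2):
    """Feldt-style ratio for two independent alpha coefficients.

    Returns (W, df_1, df_2), where W = (1 - alpha_2) / (1 - alpha_1)
    and the degrees of freedom are n_1 - 1 and n_2 - 1.
    n_1 and n_2 are the sample sizes behind each coefficient
    (assumption: one sample size per coefficient).
    """
    if alpha_1 >= 1 or alpha_2 >= 1:
        raise ValueError("alpha values must be below 1")
    if n_1 < 2 or n_2 < 2:
        raise ValueError("each sample size must be at least 2")
    w = (1 - alpha_2) / (1 - alpha_1)
    return w, n_1 - 1, n_2 - 1


# Hypothetical numbers, not results from my study.
w, df_1, df_2 = feldt_w(alpha_1=0.80, n_1=100, alpha_2=0.90, n_2=100)
print(w, df_1, df_2)
# A p-value would then come from the F distribution with (df_1, df_2),
# e.g. via scipy.stats.f if SciPy is available.
```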

Can anyone please help? I just need some reassurance that I am doing this correctly, as my supervisors aren't sure either. Thanks so much in advance!