Comparison of two different rating methods.


I would like to compare two rating methods statistically. I will have 200 students' essays to be rated using two different methods, and I would like to compare the results. The methods are: two raters plus a discussion when their scores differ, versus two double-blind raters plus a third rater when their scores differ.

Can I calculate z scores for each method and compare them to suggest that one method is better than the other?
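
To make the question concrete, here is a minimal sketch of the z-score calculation I have in mind, written in Python with simulated scores. Everything in it is a placeholder rather than real data: the 0-100 scale, the noise levels, the names `method_a`, `method_b`, and `z_scores`, and the assumption that every essay gets a final score under both methods.

```python
import random
import statistics


def z_scores(scores):
    """Standardize a list of scores to mean 0 and sample SD 1."""
    mean = statistics.fmean(scores)
    sd = statistics.stdev(scores)
    return [(s - mean) / sd for s in scores]


random.seed(0)
n_essays = 200

# Hypothetical data: an unobserved essay quality plus method-specific noise.
# The real final scores from each rating method would replace these lists.
true_quality = [random.gauss(70, 10) for _ in range(n_essays)]
method_a = [q + random.gauss(0, 3) for q in true_quality]  # raters + discussion
method_b = [q + random.gauss(0, 5) for q in true_quality]  # blind raters + third rater

# Standardize each method's scores separately.
z_a = z_scores(method_a)
z_b = z_scores(method_b)

# Per-essay difference between the two standardized scores.
diffs = [a - b for a, b in zip(z_a, z_b)]

print("mean and SD of z_a:", statistics.fmean(z_a), statistics.stdev(z_a))
print("mean and SD of z_b:", statistics.fmean(z_b), statistics.stdev(z_b))
print("mean per-essay z difference:", statistics.fmean(diffs))
```

One thing the sketch makes visible is that standardizing each method separately forces both sets of z scores to have mean 0 and standard deviation 1. The per-essay differences can therefore show where the two methods place an essay differently, but I am not sure they can show, on their own, which method is better.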

Can you suggest an alternative analysis?