# Help me with a statistical hypothesis

#### truonghoquang

##### New Member
Hi,

I have 46 dissertations from students. I asked 46 students and 4 experts to grade the dissertations on an 8-point scale (1 ... 8).
Each student has to evaluate at least 10 different dissertations, and each expert has to evaluate at least 20.
In the end, each dissertation is graded by at least 10 students and 2 experts.

My test hypothesis is: are the evaluations (grades) of students and experts consistent?

I tried to find the correlation between the students' evaluations and the experts' evaluations, but I am not sure it is the right test. Can you please help me with the problem? What type of test should I use?

Regards,
Truong

#### noetsi

##### No cake for spunky
Test hypotheses are not stated as questions. It would be more logically stated as: students and experts grade consistently. Or alternatively: the difference between the average ratings of students and experts is zero.

To decide what test to use, you have to decide whether an 8-point scale is reasonably interval or not (formally it is not, but such scales are often treated as "interval-like" and means are calculated). If you can consider it interval, then you can use regression, ANOVA, and similar methods to test the results. If it isn't, there are various non-parametric tests (including those that test medians or ranks). There is also ordered logistic regression, although 8 distinct levels would probably cause problems in practice.
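If you don't treat the scale as interval, one simple non-parametric option along these lines is a sign test on the per-dissertation difference between the mean student grade and the mean expert grade. The grades below are invented purely for illustration:

```python
# Hypothetical example: exact two-sided sign test on per-dissertation
# differences between mean student and mean expert grades.
from math import comb

# (mean student grade, mean expert grade) for 8 hypothetical dissertations
pairs = [(6.2, 5.5), (4.1, 4.8), (7.0, 6.5), (3.3, 3.0),
         (5.5, 5.0), (6.8, 6.1), (2.9, 3.4), (5.1, 4.6)]

diffs = [s - e for s, e in pairs if s != e]  # ties are dropped
n_pos = sum(d > 0 for d in diffs)
n = len(diffs)

# Exact two-sided binomial p-value under H0: P(student > expert) = 0.5
k = min(n_pos, n - n_pos)
p_value = min(1.0, 2 * sum(comb(n, i) for i in range(k + 1)) / 2 ** n)
print(f"{n_pos} of {n} differences positive, p = {p_value:.3f}")
```

A small p-value would suggest students systematically grade higher (or lower) than experts; it says nothing about agreement on individual dissertations, which is the interrater-reliability question.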

An entirely different approach would be interrater reliability, commonly used in education to decide whether graders are consistent in their grading. It has been too long since I worked with that for me to comment on it further.
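As a rough illustration of interrater reliability, Cohen's kappa for a pair of raters can be computed by hand: observed agreement, corrected for the agreement expected by chance from each rater's marginal frequencies. The two raters and their grades below are hypothetical:

```python
# Minimal sketch of Cohen's kappa for two raters grading the same
# dissertations on the 1-8 scale; all data are made up.
from collections import Counter

rater_a = [6, 5, 7, 3, 5, 6, 3, 5, 4, 6]  # e.g. an expert's grades
rater_b = [6, 4, 7, 3, 5, 5, 3, 5, 4, 7]  # e.g. a student's grades

n = len(rater_a)
p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

# Chance agreement from each rater's marginal grade frequencies
count_a, count_b = Counter(rater_a), Counter(rater_b)
p_expected = sum(count_a[c] * count_b[c] for c in count_a) / n ** 2

kappa = (p_observed - p_expected) / (1 - p_expected)
print(f"observed={p_observed:.2f} expected={p_expected:.2f} kappa={kappa:.2f}")
```

Kappa near 0 means agreement no better than chance; near 1 means near-perfect agreement. With 8 categories, exact matches are rare, which is one argument for a coarser scale.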

#### rogojel

##### TS Contributor
hi,
it depends a bit on how you define "consistent". I think that an 8-point scale is too fine to get a high percentage of agreement, but maybe you could use a coarser scale, say BAD, NEUTRAL, GOOD, and define consistency as agreement on this scale, i.e. if the expert and the students all classified a work as GOOD then they are consistent.
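The coarse-scale idea could be sketched like this; the cut points (1-3 BAD, 4-5 NEUTRAL, 6-8 GOOD) and the grades are assumptions for illustration, not from the thread:

```python
# Collapse the 8-point scale into three coarse categories and check
# whether all raters of one dissertation agree on the coarse label.
def coarse(grade):
    if grade <= 3:
        return "BAD"
    if grade <= 5:
        return "NEUTRAL"
    return "GOOD"

# hypothetical grades for a single dissertation
student_grades = [5, 6, 4, 5, 6, 5, 4, 5, 6, 5]
expert_grades = [5, 6]

labels = [coarse(g) for g in student_grades + expert_grades]
consistent = len(set(labels)) == 1  # unanimity on the coarse scale
print(labels, "consistent" if consistent else "not consistent")
```

Unanimity is a strict criterion; a softer variant would compare the majority coarse label of the students with that of the experts.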

From a test perspective you might want to look at Fleiss' kappa (implemented in Minitab under Attribute Agreement Analysis).
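Fleiss' kappa (chance-corrected agreement among multiple raters) is also straightforward to compute by hand. Note it assumes the same number of raters per subject, which the design here only approximately satisfies; the counts below are invented for illustration:

```python
# Minimal sketch of Fleiss' kappa; 12 raters per dissertation and the
# category counts are assumptions, not data from the thread.
def fleiss_kappa(counts):
    """counts[i][j] = number of raters assigning subject i to category j."""
    n = len(counts)        # subjects
    m = sum(counts[0])     # raters per subject (must be constant)
    k = len(counts[0])     # categories
    # marginal proportion of each category across all ratings
    p_j = [sum(row[j] for row in counts) / (n * m) for j in range(k)]
    # mean observed pairwise agreement per subject
    p_bar = sum((sum(c * c for c in row) - m) / (m * (m - 1))
                for row in counts) / n
    p_e = sum(p * p for p in p_j)  # chance agreement
    return (p_bar - p_e) / (1 - p_e)

# 4 dissertations, 12 raters each, 3 coarse categories (BAD, NEUTRAL, GOOD)
counts = [[0, 3, 9],
          [1, 8, 3],
          [10, 2, 0],
          [0, 2, 10]]
print(f"Fleiss' kappa = {fleiss_kappa(counts):.3f}")
```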

regards