Testing reliability for video coding of durations


I do research in primatology that involves coding monkey behavior from video. I've checked inter-observer and intra-observer reliability for the *frequency* of events (and got high percent agreement, Pearson's r, and Cohen's kappa).

But how do I do a reliability check on the *durations* arrived at by two different coders (or recodes by one coder)?

Imagine a list with 2 columns: score 1 and score 2
And maybe a third column: the difference between them (or abs value of diff?)

That's what I've got to work with: 293 comparisons, most of them very close (the *sum* of the differences is -6 s, and the individual durations range up to about 30 s). So I think it's going to come out reliable, but I need a proper statistical test to show it.

Unfortunately, my three stats texts do not touch on this issue, even in the index. Any ideas?


New Member
This is not my area, but I would probably plot the data first (score 1 against score 2) to see whether there are any obvious deviations.

Maybe you could fit a straight line forced through the origin? If the slope (B) is close to 1 and R is high, the reliability should be high. You could fit a BLUE (best linear unbiased estimator) line through the origin and see how well it fits, or draw the line that represents perfect reliability (slope 1, through the origin) and see how close the data lie to it. Or you could simply calculate the mean absolute difference and put a confidence interval around it.
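A rough sketch of those two checks in Python, using made-up data (`score1`/`score2` stand in for the two coders' duration columns; the noise level is an assumption, not your data):

```python
import math
import random
import statistics

# Hypothetical paired duration scores (seconds) from two coders.
random.seed(0)
score1 = [random.uniform(1, 30) for _ in range(293)]
score2 = [s + random.gauss(0, 0.5) for s in score1]  # coder 2 agrees closely

# Slope of a least-squares line forced through the origin:
# b = sum(x*y) / sum(x*x). Perfect reliability would give b close to 1.
b = sum(x * y for x, y in zip(score1, score2)) / sum(x * x for x in score1)

# Mean absolute difference with an approximate 95% CI
# (normal z = 1.96 is a reasonable approximation at n = 293).
diffs = [abs(x - y) for x, y in zip(score1, score2)]
n = len(diffs)
mad = statistics.mean(diffs)
se = statistics.stdev(diffs) / math.sqrt(n)
ci = (mad - 1.96 * se, mad + 1.96 * se)

print(f"slope through origin: {b:.3f}")
print(f"mean abs difference: {mad:.2f} s, 95% CI ({ci[0]:.2f}, {ci[1]:.2f})")
```

If the slope sits near 1 and the mean absolute difference (with its CI) is small relative to your 30 s durations, that supports calling the coding reliable.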