Methods for trimming reaction times

I am a student currently analyzing data from an oddball experiment. We have 30 participants who each went through two conditions (call them A and B) twice. The order of the conditions was counterbalanced, so each participant did either ABAB or BABA. In each condition (collapsed across blocks), we obtained approximately 95 reaction times (RTs) to "standard" stimuli and 95 RTs to "odd" (deviant) stimuli. (It might seem strange that we have as many standards as odds, but it makes sense within the context of the experimental design.) Thus, we get four sets of ~95 RTs per participant: 95 A standards, 95 A odds, 95 B standards, and 95 B odds. The aim of the experiment beyond this should be irrelevant to my question.

The thing is, I want to filter the RTs properly, removing outliers in an accepted way. I've read a few articles on the subject, but my lack of proficiency with statistics has made it hard for me to interpret them properly.

To begin with, take this article. The most relevant section is pages 391-393. The authors eventually suggest filtering reaction-time data by eliminating RTs outside ±3 SD by subject and by item (and then running the ANOVA on both the trimmed and the untrimmed data to see whether the analysis yields significant results in both cases, just one, or neither). I assume the "by subjects" part means that you build a distribution of all RTs for each individual subject and then remove all RTs more than 3 SD above or below that subject's mean. However, I am not sure I get the "by item" part. Is the implication that for each of the ~380 (95 × 4) items/trials we should build a distribution of the 30 RTs accumulated across subjects (30 for the first standard, 30 for the second standard, 30 for the first odd, etc.) and then exclude the most extreme RTs from each item using the ±3 SD rule (that is, using the local SD for each item)? If so, should one apply the by-item trim or the by-subject trim first? Or am I misunderstanding all of this completely?
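To make sure I'm reading the procedure right, here is how I would implement my current understanding in pandas. The column names (`subject`, `item`, `rt`) and the toy data are my own invention, and I've arbitrarily applied the by-subject trim before the by-item trim, which is exactly the ordering question I'm asking about:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Toy long-format data: one row per trial. In the real data each subject
# would contribute ~380 trials; 40 per subject keeps the example small.
n_subjects, n_items = 30, 40
df = pd.DataFrame({
    "subject": np.repeat(np.arange(n_subjects), n_items),
    "item": np.tile(np.arange(n_items), n_subjects),
    "rt": rng.normal(500, 100, size=n_subjects * n_items),
})
df.loc[0, "rt"] = 5000.0  # plant an obviously extreme RT (subject 0, item 0)

def trim_3sd(frame, by):
    """Keep rows whose RT is within ±3 SD of the mean of its `by` group."""
    z = frame.groupby(by)["rt"].transform(lambda x: (x - x.mean()) / x.std())
    return frame[z.abs() <= 3]

by_subject = trim_3sd(df, "subject")   # one distribution per subject
both = trim_3sd(by_subject, "item")    # then one per item, on what's left
print(len(df), len(by_subject), len(both))
```

Is this the intended reading, or should the two trims be computed on the original data independently rather than sequentially?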

Also, is there an argument for additionally using an absolute cutoff to remove the most obvious/extreme outliers, for instance removing all RTs below 200 ms or above 1500 ms, before applying the ±3 SD rule to trim the data? Or are the above methods assumed to take care of such cases?
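For concreteness, the absolute-cutoff step I have in mind would just be a fixed window applied before any SD-based trimming; the 200/1500 ms bounds here are the ones from my question, not a recommendation I've seen stated anywhere:

```python
import numpy as np

# Hypothetical RTs in milliseconds.
rts = np.array([150.0, 320.0, 480.0, 510.0, 950.0, 1600.0, 700.0])

# Keep only trials inside an a-priori plausible window (my own bounds).
kept = rts[(rts >= 200) & (rts <= 1500)]
print(kept)  # the 150 ms and 1600 ms trials are dropped
```

My worry is that without this step, a single absurd RT (e.g. a 10 s lapse) could inflate the subject's SD enough to mask other outliers from the ±3 SD rule.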

Now, if I get answers to the above questions I might not need anything else, but I'll also mention that I've read this study, which is referenced a lot. On page 519, the author sums up his recommendations. I do not follow his steps there, but I'll come back to that if necessary after looking into the questions above.