Suppose you are conducting a study with n = 10 subjects. Each subject completes a behavioural experiment under both treatment A and treatment B, and several data points are collected from each subject. Before the ANOVA is run, an average score is calculated for each subject from all of that subject's individual data points. These average scores are then submitted to the ANOVA.
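
To make the setup concrete, here is a minimal sketch in Python of the procedure described above. Everything about the data is an assumption made for illustration: the number of trials per subject, the score distributions, and the reading that one average is computed per subject per treatment. It also assumes the ANOVA is a within-subject one, in which case, with only two treatments, it is equivalent to a paired t-test on the subject means (F = t²).

```python
import math
import random
import statistics

random.seed(0)  # fixed seed so the simulated data are reproducible

N_SUBJECTS = 10  # from the study description
N_TRIALS = 20    # hypothetical number of data points per subject per treatment


def simulate_subject():
    """Return made-up raw trial scores for treatments A and B for one subject."""
    baseline = random.gauss(50, 5)  # subject-specific level (assumed)
    a = [random.gauss(baseline, 8) for _ in range(N_TRIALS)]
    b = [random.gauss(baseline + 3, 8) for _ in range(N_TRIALS)]
    return a, b


data = [simulate_subject() for _ in range(N_SUBJECTS)]

# Step 1: collapse each subject's trials to one mean per treatment.
# The trial-to-trial spread within a subject is discarded at this point.
mean_a = [statistics.mean(a) for a, _ in data]
mean_b = [statistics.mean(b) for _, b in data]

# Step 2: with two within-subject conditions, the repeated-measures ANOVA
# reduces to a paired t-test on the subject means, with F = t^2.
diffs = [mb - ma for ma, mb in zip(mean_a, mean_b)]
t_stat = statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(N_SUBJECTS))
f_stat = t_stat ** 2  # F statistic on (1, N_SUBJECTS - 1) degrees of freedom

print(f"t({N_SUBJECTS - 1}) = {t_stat:.3f}, F(1, {N_SUBJECTS - 1}) = {f_stat:.3f}")
```

Note that only `mean_a` and `mean_b` enter the test; the individual trials in `data` play no further role once the means are taken.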

My question is: why is the variability in the individual data points essentially ignored? The ANOVA is performed only on the mean values. Is the proper technique here to conduct the ANOVA and also report a measure of reliability for the behavioural task? If so, it seems that too few people report reliability measures for their tasks, and that more replication studies should be done. On the topic of replication, I think people who replicate results deserve more credit than they currently receive.
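
As one example of what I mean by a reliability measure, here is a rough sketch of a split-half estimate with the Spearman-Brown correction, again in Python. The data are simulated and the trial count is hypothetical; split-half is only one of several possible reliability measures, and I am not claiming it is the right one for this design.

```python
import random
import statistics

random.seed(1)  # fixed seed so the simulated data are reproducible

N_SUBJECTS = 10  # from the study description
N_TRIALS = 20    # hypothetical number of data points per subject

# Made-up raw scores for one treatment: one list of trials per subject.
scores = []
for _ in range(N_SUBJECTS):
    level = random.gauss(50, 5)  # subject-specific level (assumed)
    scores.append([random.gauss(level, 8) for _ in range(N_TRIALS)])


def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


# Split each subject's trials into two halves (1st, 3rd, ... vs 2nd, 4th, ...)
# and average each half separately.
odd_means = [statistics.mean(s[0::2]) for s in scores]
even_means = [statistics.mean(s[1::2]) for s in scores]

# Correlate the two half-scores across subjects, then apply the
# Spearman-Brown correction to estimate reliability at full test length.
r_half = pearson(odd_means, even_means)
r_full = 2 * r_half / (1 + r_half)

print(f"split-half r = {r_half:.3f}, Spearman-Brown corrected = {r_full:.3f}")
```

Unlike the ANOVA on the means, this uses the individual data points directly, which is why I wonder whether it should be reported alongside the test.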