Utilizing Individual Data Points in an ANOVA, Reliability, Replication

Hi all.

Suppose you are conducting a study with n = 10 subjects. Each subject runs through a behavioural experiment under treatment A and treatment B, and several data points (trials) are collected in each treatment. Before conducting the ANOVA, an AVERAGE score is calculated for each subject in each treatment from all of their individual data points. These average scores are then submitted to the ANOVA.
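A minimal Python sketch of the pipeline described above, using simulated data. The 20 trials per subject per treatment, the effect size and the noise level are all made-up assumptions. Each subject's trials are collapsed to one mean per treatment, and only those means reach the repeated-measures ANOVA. `subject_means` and `rm_anova_two_levels` are hypothetical helpers, not a library API.

```python
import random
from statistics import mean

def subject_means(trials):
    """Collapse trial-level data to one mean per subject per treatment.

    trials: dict mapping subject -> {treatment: [trial scores]}.
    """
    return {s: {t: mean(v) for t, v in conds.items()} for s, conds in trials.items()}

def rm_anova_two_levels(means, a="A", b="B"):
    """One-way repeated-measures ANOVA on per-subject means (two within-subject levels)."""
    subjects = list(means)
    n, k = len(subjects), 2
    grand = mean(means[s][t] for s in subjects for t in (a, b))
    # Treatment, subject, and total sums of squares computed from the means only.
    ss_treat = n * sum((mean(means[s][t] for s in subjects) - grand) ** 2 for t in (a, b))
    ss_subj = k * sum(((means[s][a] + means[s][b]) / 2 - grand) ** 2 for s in subjects)
    ss_total = sum((means[s][t] - grand) ** 2 for s in subjects for t in (a, b))
    ss_error = ss_total - ss_treat - ss_subj  # subject x treatment interaction = error term
    df_treat, df_error = k - 1, (k - 1) * (n - 1)
    f = (ss_treat / df_treat) / (ss_error / df_error)
    return f, df_treat, df_error

# Made-up data: 10 subjects, 20 trials per treatment, B about 1 unit higher than A.
random.seed(1)
trials = {
    f"s{i}": {
        "A": [10 + 0.5 * i + random.gauss(0, 1) for _ in range(20)],
        "B": [11 + 0.5 * i + random.gauss(0, 1) for _ in range(20)],
    }
    for i in range(10)
}
means = subject_means(trials)
f, df1, df2 = rm_anova_two_levels(means)
print(f"F({df1}, {df2}) = {f:.2f}")
```

With only two within-subject levels, this F equals the square of the paired t statistic on the per-subject differences. Note that nothing inside `rm_anova_two_levels` can see how variable each subject's 20 trials were.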

My question is: why is the variability in the individual data points essentially ignored? The ANOVA is performed only on the means. Is the proper technique here to conduct the ANOVA and also obtain a measure of the reliability of the behavioural task? If so, it seems that not enough people measure the reliability of their tasks, not to mention that more replication studies should be done. On the topic of replication, I think people who replicate results deserve more credit than they currently receive.
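On the reliability question: one common way to estimate the reliability of a behavioural task from the same trial-level data that gets averaged away is split-half reliability. Average each subject's odd-numbered and even-numbered trials separately, correlate the two half-scores across subjects, and step the correlation up to full length with the Spearman-Brown formula 2r / (1 + r). This is a minimal sketch, assuming a single condition, trials stored in presentation order, and simulated data. The odd/even split is only one choice; random splits or an intraclass correlation are common alternatives.

```python
import math
import random
from statistics import mean

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (assumes neither is constant)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def split_half_reliability(trials_by_subject):
    """Odd/even split-half reliability, stepped up with the Spearman-Brown formula.

    trials_by_subject: one list of trial scores per subject, in presentation order,
    for a single condition.
    """
    odd_trials = [mean(t[0::2]) for t in trials_by_subject]   # trials 1, 3, 5, ...
    even_trials = [mean(t[1::2]) for t in trials_by_subject]  # trials 2, 4, 6, ...
    r = pearson(odd_trials, even_trials)
    return 2 * r / (1 + r)

# Simulated data: 10 subjects with different true levels, 40 noisy trials each.
random.seed(2)
true_levels = [random.gauss(0, 1) for _ in range(10)]
trials = [[mu + random.gauss(0, 2) for _ in range(40)] for mu in true_levels]
print(f"split-half reliability: {split_half_reliability(trials):.2f}")
```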

Hi Jeff,

When you have multiple measurements, sometimes it makes sense to sum or average them into a summary score, and other times doing so can be grossly misleading. Averaging can hide important features of your data, such as clusters of response patterns, temporal trends, or other informative structure.
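A toy illustration of the point about clusters, with invented response times: both hypothetical subjects contribute exactly the same average to the ANOVA, even though one of them responds in two distinct modes (fast and slow).

```python
from statistics import mean, stdev

# Hypothetical response times (ms) for two subjects in the same condition.
consistent = [500, 510, 490, 505, 495, 500]  # one response mode around 500 ms
bimodal = [300, 700, 310, 690, 305, 695]     # two clusters: fast and slow responses

for name, rts in [("consistent", consistent), ("bimodal", bimodal)]:
    # The averages are identical; only the trial-level data (here summarized by the SD)
    # show that the two subjects behave very differently.
    print(f"{name:>10}: mean = {mean(rts):.1f} ms, SD = {stdev(rts):.1f} ms")
```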

When deciding whether to summarize data with an average, it's important to consider the data-generating mechanism:
1) What are the sources of variability in the measurements?
2) Are there order effects? Practice effects, fatigue, habituation, differences in test forms, ...
3) What are the relationships of the quantities you're averaging? Are they all really measuring the same underlying construct?
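For question 2, a quick first check for practice or fatigue effects is to fit a least-squares slope of each subject's scores on trial order before averaging. This is a minimal sketch with made-up scores, in which all three subjects have the same mean of 6 but very different trajectories. `trial_slope` is a hypothetical helper; a fuller analysis would put trial number into the model itself rather than fitting subjects one at a time.

```python
from statistics import mean

def trial_slope(scores):
    """Least-squares slope of scores against trial index 0, 1, ..., n-1."""
    xs = range(len(scores))
    mx, my = mean(xs), mean(scores)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, scores))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

# Hypothetical scores for three subjects, in presentation order.
subjects = {
    "s1": [4, 5, 6, 7, 8],  # steady improvement (practice?)
    "s2": [8, 7, 6, 5, 4],  # steady decline (fatigue?)
    "s3": [6, 6, 6, 6, 6],  # flat
}
slopes = {s: trial_slope(y) for s, y in subjects.items()}
for s, y in subjects.items():
    print(f"{s}: mean = {mean(y)}, slope per trial = {slopes[s]:+.2f}")
```

The average of these slopes is 0, so practice effects in some subjects and fatigue in others can also cancel out at the group level.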

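I'd rather let the model do the averaging, i.e. fit the model to the individual data points (for example, a mixed-effects model with subject as a random effect) rather than to per-subject means. It uses all of the data and can give you more information, such as lack of fit and variance components. While it takes a bit more work to understand conceptually, it gives you a much richer understanding of your data.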
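As a sketch of what the variance components look like, the code below estimates them for a balanced one-way random-effects model, y_ij = mu + b_i + e_ij (subject effect b_i, trial-level noise e_ij), using the classical ANOVA (method-of-moments) estimators. They also connect back to the reliability question: the reliability of a subject's mean over k trials is s2_between / (s2_between + s2_within / k). This is a simplified illustration with made-up numbers, one condition and equal trial counts. In practice, a mixed-effects model fit to the trial-level data (for example `lmer` in R's lme4 package or `MixedLM` in Python's statsmodels) estimates the same quantities by REML, handles unbalanced data, and can include treatment as a fixed effect.

```python
from statistics import mean

def variance_components(trials):
    """ANOVA (method-of-moments) estimates for a balanced one-way random-effects model.

    trials: list of per-subject trial lists, all of the same length k.
    Assumed model: y_ij = mu + b_i + e_ij, Var(b_i) = s2_between, Var(e_ij) = s2_within.
    Returns (s2_between, s2_within, k).
    """
    n, k = len(trials), len(trials[0])
    if any(len(t) != k for t in trials):
        raise ValueError("this estimator assumes every subject has the same number of trials")
    subj_means = [mean(t) for t in trials]
    grand = mean(subj_means)
    ms_between = k * sum((m - grand) ** 2 for m in subj_means) / (n - 1)
    ms_within = sum((y - m) ** 2 for t, m in zip(trials, subj_means) for y in t) / (n * (k - 1))
    # E[MS_between] = s2_within + k * s2_between and E[MS_within] = s2_within.
    s2_between = max(0.0, (ms_between - ms_within) / k)  # negative estimates truncated to 0
    return s2_between, ms_within, k

def mean_reliability(s2_between, s2_within, k):
    """Reliability of a subject's average over k trials (an intraclass correlation)."""
    return s2_between / (s2_between + s2_within / k)

# Made-up data: 3 subjects x 2 trials.
data = [[1.0, 3.0], [4.0, 6.0], [7.0, 9.0]]
s2_b, s2_w, k = variance_components(data)
print(f"between-subject variance = {s2_b:.2f}, within-subject variance = {s2_w:.2f}")
print(f"reliability of a {k}-trial mean = {mean_reliability(s2_b, s2_w, k):.3f}")
```

For these toy numbers the estimates are 8 and 2, giving a 2-trial-mean reliability of 8/9 (about 0.889).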