I'm currently working on neurons. On each neuron we measure some phenotypic parameters (e.g. the frequency of activity or the amplitude of a given parameter) and the expression level of a gene (mRNA). The idea (not mine) is to compute the correlation between a parameter and a gene on grouped data. For example, we could divide our neurons (n = 144) into 6 or more groups (group size 24) according to their ordered frequency of activity (very high, high, medium, low, very low, silent). The mean frequency of each group would be the independent variable, and the mean gene expression within the same group the dependent variable. We would thus compute a correlation on 6 (or 12) x-y pairs instead of 144.

I wonder whether this is legitimate, because it seems biased to me: averaging removes within-group variability, which should inflate r (with 2 groups, r would always be 1 or -1, right?). I ran some simulations, but the picture is less clear when the number of groups is large (e.g. 10).
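A quick simulation makes the concern concrete. The sketch below uses Python with NumPy; the function names grouped_correlation and simulate are hypothetical, and the data are simulated bivariate-normal values with an assumed true correlation of 0.3, not real recordings. It bins neurons by sorted activity frequency, averages each bin, and compares r on the group means with r on the 144 individual pairs.

```python
import numpy as np


def grouped_correlation(x, y, n_groups):
    """Pearson r between group means, after binning neurons by sorted x.

    Neurons are ordered by x and split into n_groups contiguous bins;
    np.array_split allows bins of unequal size when n does not divide evenly.
    """
    order = np.argsort(x)
    x_means = np.array([b.mean() for b in np.array_split(x[order], n_groups)])
    y_means = np.array([b.mean() for b in np.array_split(y[order], n_groups)])
    return np.corrcoef(x_means, y_means)[0, 1]


def simulate(n=144, true_r=0.3, group_counts=(2, 6, 10, 12), n_sim=1000, seed=0):
    """Repeat the experiment n_sim times on simulated data with a known r (assumption)."""
    rng = np.random.default_rng(seed)
    results = {"individual": []}
    results.update({k: [] for k in group_counts})
    for _ in range(n_sim):
        x = rng.standard_normal(n)
        # Population correlation between x and y is exactly true_r.
        y = true_r * x + np.sqrt(1 - true_r**2) * rng.standard_normal(n)
        results["individual"].append(np.corrcoef(x, y)[0, 1])
        for k in group_counts:
            results[k].append(grouped_correlation(x, y, k))
    return {key: np.array(vals) for key, vals in results.items()}


res = simulate()
for key, r in res.items():
    print(f"{key!s:>10}: mean r = {r.mean():.2f}, sd = {r.std():.2f}")
```

With 2 groups the grouped r is always 1 or -1, since any two points lie on a line. Calling simulate(true_r=0.0) shows the other side of the problem: when there is no association at all, r on k group means still varies with a standard deviation of roughly 1/sqrt(k - 1), so any significance test on it has only k - 2 degrees of freedom.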

Alternatively, could we keep this grouping idea but treat the groups as a categorical factor instead of a continuous variable (i.e. a one-way ANOVA of gene expression across groups)?
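For the categorical alternative, a one-way ANOVA asks whether mean expression differs among the activity groups. This is a minimal sketch assuming NumPy only: one_way_anova_f and permutation_pvalue are hypothetical helpers, the data are simulated, and the p-value comes from shuffling group labels rather than from the F distribution (scipy.stats.f_oneway would give the usual parametric p-value). Note that it treats the groups as unordered, so it does not use the fact that they are ranked by frequency.

```python
import numpy as np


def one_way_anova_f(y, labels):
    """Classic one-way ANOVA F: between-group mean square over within-group mean square."""
    groups = [y[labels == g] for g in np.unique(labels)]
    k, n = len(groups), len(y)
    grand_mean = y.mean()
    ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def permutation_pvalue(y, labels, n_perm=2000, seed=0):
    """Observed F and a p-value from shuffling group labels across neurons."""
    rng = np.random.default_rng(seed)
    observed = one_way_anova_f(y, labels)
    count = sum(one_way_anova_f(y, rng.permutation(labels)) >= observed
                for _ in range(n_perm))
    return observed, (count + 1) / (n_perm + 1)


# Hypothetical data: 144 neurons binned into 6 ordered activity groups of 24.
rng = np.random.default_rng(1)
freq = rng.standard_normal(144)
expr = 0.3 * freq + rng.standard_normal(144)
labels = np.empty(144, dtype=int)
labels[np.argsort(freq)] = np.repeat(np.arange(6), 24)  # 0 = lowest frequency

F, p = permutation_pvalue(expr, labels)
print(f"F = {F:.2f}, permutation p = {p:.4f}")
```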

Thanks a lot

Fabien