Question - Linear regression using mean values

#1
Hi,

I am conducting a meta-analysis. I collected average values of two variables: length and weight of fish of several species. I have 160 averages of length and 160 averages of weight. Is it correct to correlate these averages? Thank you
 
#6
I want to compare the length-weight relationship (i.e., the slope of the regression between these variables) among tropical freshwater environments (lakes, great lakes, wetlands, and rivers). I have the mean and standard deviation of length and weight of several fish from several lakes, rivers, etc. My idea is to fit a linear regression to these mean values within each ecosystem and compare the results.
 

j58

Active Member
#9
It is important to understand that the relationship observed at the aggregate level will generally not apply at the individual level. Compared with the correlation between the means, the correlation measured at the individual level can be stronger, weaker, or even in the opposite direction.
 
#10
I understand. Even though it does not apply at the individual level, will I still be able to represent the ecosystem? For instance, will the correlation between the 160 mean length values and the 160 mean weight values represent the length-weight relationship of a given ecosystem?
 

j58

Active Member
#11
It will represent the relationship between the mean length and mean weight in the population from which the measurements were derived.
 

Dason

Ambassador to the humans
#12
Yeah, with just the means/SDs of the different groups it's theoretically possible (although it doesn't seem likely in this case) that there is a negative correlation within each group but a positive correlation for the data aggregated to the means. I attached an example of how that could play out in practice (even if it is unlikely): Flip_Example.png

The black dots are the individual data points for the groups. The red dots are the means for the groups. Each black line is a rough best fit line within each group (they all have negative slope). The red line is the best fit line for the aggregated data. So it's possible that the relationship 'flips' depending on what you actually care about. Note that even in this example, if we had the raw data and ignored the grouping, the overall trend would have a positive slope. If we included the grouping variable, so that each group at least gets its own intercept, we would end up with a negative slope though.
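
For anyone who wants to reproduce the idea without the picture, here is a minimal sketch in Python with NumPy. The data are made up purely for illustration (five hypothetical groups of five points each, not the data behind the attached figure and not real fish measurements), and the names `centers`, `offsets` and `slope` are my own. It fits the four regressions described above: within each group, on the group means, on the pooled raw data ignoring group, and on the raw data with a separate intercept per group.

```python
import numpy as np

# Hypothetical data: 5 groups whose centers rise together in x and y,
# while inside each group y falls as x rises.
centers = np.array([0.0, 10.0, 20.0, 30.0, 40.0])   # assumed group centers
offsets = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])     # assumed within-group spread

n_groups = len(centers)
group = np.repeat(np.arange(n_groups), len(offsets))
x = np.repeat(centers, len(offsets)) + np.tile(offsets, n_groups)
y = np.repeat(centers, len(offsets)) - np.tile(offsets, n_groups)


def slope(xs, ys):
    """Slope of the ordinary least-squares line of ys on xs."""
    return np.polyfit(xs, ys, 1)[0]


# 1) Separate regression inside each group (the black lines).
within_slopes = [slope(x[group == g], y[group == g]) for g in range(n_groups)]

# 2) Regression on the group means only (the red line).
mean_x = np.array([x[group == g].mean() for g in range(n_groups)])
mean_y = np.array([y[group == g].mean() for g in range(n_groups)])
means_slope = slope(mean_x, mean_y)

# 3) Regression on all raw points, ignoring the grouping.
pooled_slope = slope(x, y)

# 4) Raw points with one intercept per group and a common slope:
#    design matrix = [x, dummy for group 0, ..., dummy for group 4].
dummies = (group[:, None] == np.arange(n_groups)).astype(float)
design = np.column_stack([x, dummies])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
adjusted_slope = coef[0]

print("within-group slopes:", np.round(within_slopes, 3))
print("slope on group means:", round(float(means_slope), 3))
print("pooled slope, group ignored:", round(float(pooled_slope), 3))
print("common slope with group intercepts:", round(float(adjusted_slope), 3))
```

With these made-up numbers the within-group slopes and the slope with group intercepts come out negative, while the slope on the means and the pooled slope come out positive. If you only have the means and SDs, only the second regression is available to you, which is why the result describes the relationship between group means rather than between individual fish.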