Regression R^2 value vs. Spearman correlation coefficient

#1
Greetings

I have been assessing some of the relationships between EDTA-extractable Pb and Pb in plants, etc., using regression analysis in Minitab. I just found a study that used correlation analysis instead, reporting the correlation coefficient and p-value to assess whether the relationship is significant.

I tried this myself in Minitab (Basic stats - Correlation - Spearman, non-parametric) and found that the relationships with a relatively strong R^2 value in my regression analysis were not the same ones that the correlation analysis found to be significant.

Can someone explain this to me?

Regards
 

hlsmith

Omega Contributor
#2
Please provide the values you got from both analyses, along with a description of the variables, how they were formatted, and the sample size.


Was the regression model appropriately specified, and were the model assumptions met?


Did the regression model control for other variables?


Did you have confidence intervals around your r^2 value to actually know if it was significantly different from zero?


Pearson correlation can be derived from regression, but Spearman is a rank-based approach, so the results can deviate.
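To make the Pearson/Spearman distinction concrete, here is a minimal sketch in plain Python (standard library only). The data are made up for illustration and are not the Zn or Pb measurements from this thread, and the helper names `pearson`, `ranks`, `spearman` and `simple_regression_r2` are hypothetical. It shows that the R^2 of a simple least-squares line equals Pearson r squared, and that Spearman's rho is just Pearson r computed on the ranks, so a monotonic but strongly non-linear pattern can give rho = 1 while r is much lower.

```python
import math


def mean(v):
    return sum(v) / len(v)


def pearson(x, y):
    """Pearson product-moment correlation."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(v):
    """Ranks starting at 1; tied values share the average of their positions."""
    order = sorted(range(len(v)), key=lambda i: v[i])
    r = [0.0] * len(v)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and v[order[j + 1]] == v[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def simple_regression_r2(x, y):
    """Least-squares line y = b0 + b1*x, returning R^2 = 1 - SSE/SST."""
    mx, my = mean(x), mean(y)
    b1 = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
    b0 = my - b1 * mx
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - my) ** 2 for yi in y)
    return 1 - sse / sst


# Made-up data: y increases at every step (perfectly monotonic),
# but one large value makes the relationship far from linear.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2, 3, 4, 5, 6, 7, 8, 60]

r = pearson(x, y)
print(f"Pearson r      = {r:.3f}")
print(f"r^2            = {r * r:.3f}")
print(f"Regression R^2 = {simple_regression_r2(x, y):.3f}")
print(f"Spearman rho   = {spearman(x, y):.3f}")
```

The two correlations can also differ in the other direction (Pearson above Spearman), depending on the data. This sketch reports the coefficients only, not their p-values.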
 
#3
The sample size is typically N = 18-30. For example, regressing total Zn in plants on EDTA Zn gave R-sq = 26.4% (0.264) and R-sq(adj) = 21.8% (0.218), whereas Pearson correlation analysis gave a correlation coefficient of 0.513 with a p-value of 0.029, and Spearman gave 0.713 with a p-value of 0.001.

This is the output from the regression analysis:

Regression Analysis: Total Zn plant versus EDTA Zn

The regression equation is
Total Zn plant = 557.4 + 1.029 EDTA Zn

S = 471.479   R-Sq = 26.4%   R-Sq(adj) = 21.8%

Analysis of Variance

Source      DF       SS        MS     F      P
Regression   1  1273080   1273080  5.73  0.029
Error       16  3556677    222292
Total       17  4829757
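As a check on the output above, the summary statistics can be recomputed from the ANOVA table. This is a small plain-Python sketch using only the numbers posted here (n = 18, since Total DF = 17). It does not compute the p-value itself, which would need an F or t distribution function. It also shows that the t statistic for testing Pearson r = 0 satisfies t^2 = F in simple regression, which is why the Pearson p-value (0.029) matches the regression p-value.

```python
import math

# Numbers copied from the Minitab output above; Total DF = 17 implies n = 18.
n = 18
ss_reg, ss_err, ss_tot = 1273080, 3556677, 4829757
df_reg, df_err, df_tot = 1, 16, 17

ms_reg = ss_reg / df_reg
ms_err = ss_err / df_err        # about 222292
f_stat = ms_reg / ms_err        # F in the ANOVA table

r_sq = ss_reg / ss_tot                                  # R-Sq
r_sq_adj = 1 - (ss_err / df_err) / (ss_tot / df_tot)    # R-Sq(adj)
s = math.sqrt(ms_err)                                   # S, residual standard error

# |r| recovered from R-Sq; the sign of r follows the slope, which is positive (+1.029).
r = math.sqrt(r_sq)

# t statistic for H0: Pearson r = 0, with n - 2 degrees of freedom.
t_stat = r * math.sqrt(n - 2) / math.sqrt(1 - r_sq)

print(f"R-Sq      = {100 * r_sq:.1f}%")
print(f"R-Sq(adj) = {100 * r_sq_adj:.1f}%")
print(f"S         = {s:.3f}")
print(f"F         = {f_stat:.2f}")
print(f"r         = {r:.3f}")
print(f"t^2       = {t_stat ** 2:.2f}  (same as F)")
```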
 
#4
Okay, so I just read that regression analysis is one of the most abused statistical methods among scientists and that there are in fact lots of assumptions associated with it (as you mentioned). I was not aware of this, and since my data are not normally distributed, I have now removed all the regression analyses from my report and replaced them with Spearman correlation analysis. Cheers! I almost made a massive mistake there.
 

hlsmith

Omega Contributor
#5
Yeah, that is a concern for sure. Side note: did you see how your R^2 = correlation^2? And yes, the two correlation measures use different algorithms, so the results can definitely differ.
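For a single predictor the equality is exact, not a coincidence. A short derivation, writing S_xx, S_yy and S_xy for the centred sums of squares and cross-products (notation introduced here for the sketch):

$$
S_{xx}=\sum_i (x_i-\bar x)^2,\qquad S_{yy}=\sum_i (y_i-\bar y)^2,\qquad S_{xy}=\sum_i (x_i-\bar x)(y_i-\bar y)
$$

$$
\hat\beta_1=\frac{S_{xy}}{S_{xx}},\qquad \hat\beta_0=\bar y-\hat\beta_1\bar x \;\Rightarrow\; \hat y_i-\bar y=\hat\beta_1 (x_i-\bar x)
$$

$$
R^2=\frac{SS_{\text{reg}}}{SS_{\text{total}}}=\frac{\sum_i(\hat y_i-\bar y)^2}{S_{yy}}=\frac{\hat\beta_1^2 S_{xx}}{S_{yy}}=\frac{S_{xy}^2}{S_{xx}S_{yy}}=\left(\frac{S_{xy}}{\sqrt{S_{xx}S_{yy}}}\right)^2=r^2
$$

With the posted table, 1273080 / 4829757 ≈ 0.264 and √0.264 ≈ 0.513, the Pearson r reported in #3. The identity holds for Pearson r only: Spearman's 0.713 squared (about 0.51) has no such link to the regression R^2.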
 
#6
No, I did not, not until you pointed it out! So is that one of the differences between R^2 and a correlation coefficient? And also, is that why it's called R^2? Or did it occur by chance?

Thanks again, really appreciate the help
 

hlsmith

Omega Contributor
#7
Correlation is denoted by rho or r, and yes, the proportion of variability explained is equal to r*r, or r^2. I believe this direct equivalence only holds for simple linear regression. If you had more than one predictor in the model, the formula is different to account for partial correlation, etc.
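To illustrate the multiple-predictor case, here is a minimal sketch with two made-up predictors (hypothetical data and helper names, not from this thread). It fits ordinary least squares by solving the centred 2x2 normal equations, then compares R^2 with each single-predictor r^2. In general R^2 no longer equals either of them; it does equal the squared correlation between the observed y and the fitted values (the multiple correlation coefficient R, squared).

```python
import math


def mean(v):
    return sum(v) / len(v)


def pearson(x, y):
    """Pearson product-moment correlation."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fit_two_predictors(x1, x2, y):
    """OLS fit of y = b0 + b1*x1 + b2*x2 by solving the centred 2x2 normal equations."""
    m1, m2, my = mean(x1), mean(x2), mean(y)
    c1 = [a - m1 for a in x1]
    c2 = [a - m2 for a in x2]
    cy = [a - my for a in y]
    s11 = sum(a * a for a in c1)
    s22 = sum(a * a for a in c2)
    s12 = sum(a * b for a, b in zip(c1, c2))
    s1y = sum(a * b for a, b in zip(c1, cy))
    s2y = sum(a * b for a, b in zip(c2, cy))
    det = s11 * s22 - s12 * s12          # non-zero unless x1 and x2 are collinear
    b1 = (s1y * s22 - s2y * s12) / det   # Cramer's rule
    b2 = (s2y * s11 - s1y * s12) / det
    b0 = my - b1 * m1 - b2 * m2
    return b0, b1, b2


# Made-up illustrative data with two predictors.
x1 = [1, 2, 3, 4, 5, 6, 7, 8]
x2 = [3, 1, 4, 1, 5, 9, 2, 6]
y = [2, 4, 5, 4, 7, 9, 8, 11]

b0, b1, b2 = fit_two_predictors(x1, x2, y)
y_hat = [b0 + b1 * a + b2 * c for a, c in zip(x1, x2)]

my = mean(y)
sse = sum((obs - fit) ** 2 for obs, fit in zip(y, y_hat))
sst = sum((obs - my) ** 2 for obs in y)
r_sq = 1 - sse / sst

print(f"R^2 (two predictors) = {r_sq:.3f}")
print(f"r(y, x1)^2           = {pearson(y, x1) ** 2:.3f}")
print(f"r(y, x2)^2           = {pearson(y, x2) ** 2:.3f}")
print(f"r(y, fitted)^2       = {pearson(y, y_hat) ** 2:.3f}")
```

Because the model includes an intercept, adding a predictor can never lower R^2, so the two-predictor R^2 is at least as large as either single-predictor r^2.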