- Thread starter Biostat212

When a problem asks you to "predict <y variable> from <x variable>", you should do a linear regression.

Once you have fitted a linear regression, you get a formula like y = 2x + 1. Then, for a given x, you can work out what y should be.
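To make that concrete, here is a minimal sketch of fitting a line by ordinary least squares and then using it to predict. The data points and the helper name `fit_line` are made up for illustration; the points are chosen to lie exactly on y = 2x + 1 so the fit recovers that formula.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sum of squared deviations in x
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data lying exactly on y = 2x + 1
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(xs, ys)

# Use the fitted formula to predict y for a new x
predicted = slope * 10.0 + intercept
print(slope, intercept, predicted)
```

With real data the points will not sit exactly on a line, so the slope and intercept are only estimates and the prediction carries some error.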

A correlation on its own, on the other hand, does not let you determine y for given values of x. Instead, a correlation that is high in magnitude (close to +1 or close to -1) tells you that there is a strong linear dependence between the x and y variables, i.e., you can use the correlation to judge how much you can rely on your linear regression.
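As a rough illustration, the sketch below computes the Pearson correlation for three made-up data sets. The helper name `pearson_r` and the data are assumptions for the example. Note that the result is a single number describing linear dependence, not a formula you can plug x into.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

xs = [-2.0, -1.0, 0.0, 1.0, 2.0]

# Perfect increasing line: correlation is +1
r_up = pearson_r(xs, [2 * x + 1 for x in xs])

# Perfect decreasing line: correlation is -1
r_down = pearson_r(xs, [-3 * x + 4 for x in xs])

# y depends on x exactly, but not linearly: correlation is 0 here
r_curve = pearson_r(xs, [x ** 2 for x in xs])

print(r_up, r_down, r_curve)
```

The last case is a reminder that correlation only measures linear dependence: y = x^2 is fully determined by x, yet on this symmetric range the correlation comes out as zero.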

You often want to check correlations if you are planning to do regressions (linear, logistic, etc.).


1) Correlation gives a measure of the strength of the linear relationship between y and x.

2) Regression tells you how much of the variance in y can be explained by x.

1) and 2) are related: 1) gives the correlation r, and 2) gives the explained variance, which in a simple linear regression (one x variable) is r squared. So a lower correlation means lower explained variance and poorer prediction.

Sometimes you could have a reasonable-looking correlation, say 0.5. This means the regression explains only 25% of the total variance in y (0.5 squared = 0.25), so you may not get a reasonable prediction of y using x.
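Here is a small sketch of that arithmetic, checking that the explained variance from a simple linear regression matches the squared correlation. The four data points are made up and were picked so that the correlation comes out at 0.5; the variable names are only for illustration.

```python
import math

# Hypothetical data chosen so that the correlation is 0.5
xs = [1.0, -1.0, 0.0, 0.0]
ys = [2.0, 0.0, 0.0, -2.0]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
sxx = sum((x - mean_x) ** 2 for x in xs)
syy = sum((y - mean_y) ** 2 for y in ys)

# 1) Correlation
r = sxy / math.sqrt(sxx * syy)

# 2) Simple linear regression fitted by least squares
slope = sxy / sxx
intercept = mean_y - slope * mean_x
fitted = [slope * x + intercept for x in xs]

# Explained variance: 1 - (residual sum of squares / total sum of squares)
ss_res = sum((y - f) ** 2 for y, f in zip(ys, fitted))
r_squared = 1.0 - ss_res / syy

print(r, r ** 2, r_squared)
```

Both routes give 0.25 for this data, so three quarters of the variation in y is left unexplained by x even though the correlation is 0.5.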

bbkaran

I was just trying to distinguish between meanings of regression and correlation.

I think what you also said is good: "If you want to determine y for given values of x (ie regression) then you want a good correlation."
