# How to calculate "expected" value given correlation coefficient, etc.?

#### jonmrich

##### New Member
I have two variables: the number of stores per metro area (about 100 areas) and the number of actual visitors. I was able to figure out (or rather had Tableau figure out) that the two have an R-squared of 0.61 (r = 0.78), so there is a strong correlation, which I would expect. I also have values for the standard error, SSE, MSE, and p-value. So far, so good.

So, here's the question: I'd like to calculate what the number of visitors "should" be based on the number of stores given the correlation. For example, metro area X with 10 stores had 25,000 visitors...how many "should" they have had given the data?

What's the best...if any...way to do this?

And I should also say that this isn't for homework, but it didn't seem to fit anywhere else, and since I need to figure it out for a work project, I guess it may as well be homework.
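For reference, in a simple linear regression with an intercept, the fitted ("expected") value can be written in terms of the correlation coefficient. Here $x$ is the number of stores, $y$ the number of visitors, $\bar{x}$ and $\bar{y}$ their sample means, and $s_x$ and $s_y$ their sample standard deviations (notation introduced for illustration, not from the post):

$$
\hat{y} = \bar{y} + r\,\frac{s_y}{s_x}\,(x - \bar{x}), \qquad R^2 = r^2 \approx 0.78^2 \approx 0.61
$$

So $r$ alone is not enough: you also need the means and standard deviations, which together give the slope $r\,s_y/s_x$ and intercept $\bar{y} - r\,(s_y/s_x)\,\bar{x}$ of the regression line. For metro area X, substituting $x = 10$ gives the expected visitors, and $25{,}000 - \hat{y}$ is how far above (positive) or below (negative) the line that area sits.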

#### Dason

That isn't really what correlation is used for - it sounds like you actually want to fit a regression.
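A minimal sketch of that regression in plain Python (no libraries), using made-up numbers in place of the real ~100 metro areas. `fit_simple_ols` and `fit_summary` are hypothetical helper names. Assuming the trend line is an ordinary least-squares fit with an intercept, running this on the real data should reproduce its slope, intercept, SSE, MSE, standard error, and R-squared.

```python
from math import sqrt


def fit_simple_ols(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def fit_summary(x, y, intercept, slope):
    """SSE, MSE, standard error of the regression, and R-squared for a fitted line."""
    n = len(x)
    y_bar = sum(y) / n
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    mse = sse / (n - 2)  # n - 2 degrees of freedom: slope and intercept were estimated
    return {"SSE": sse, "MSE": mse, "std_error": sqrt(mse), "R2": 1 - sse / sst}


# Hypothetical data: stores per metro area (predictor) and visitors (response).
stores = [2, 4, 5, 7, 8, 10, 12, 15]
visitors = [6000, 11000, 12500, 16000, 21000, 24000, 29000, 37000]

intercept, slope = fit_simple_ols(stores, visitors)
summary = fit_summary(stores, visitors, intercept, slope)

expected = intercept + slope * 10  # expected visitors for an area with 10 stores
residual = 25_000 - expected       # > 0 means more visitors than the line predicts

print(f"visitors = {intercept:.1f} + {slope:.1f} * stores")
print(f"expected at 10 stores: {expected:.0f} (residual {residual:+.0f})")
print(summary)
```

The residual is the answer to "how many should they have had": the prediction is the line's value at 10 stores, and areas with large positive or negative residuals are the ones doing better or worse than their store count alone would suggest.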

#### jonmrich

##### New Member
Well...that's the problem. Thanks.

Should have been thinking linear regression...it's been a long time.

Semi-related question: I have the linear trend line plotted and the r. I am given the option to fit the trend line with or without forcing the y-intercept to zero. I think you generally don't do this, but I'm wondering if it would make sense in my case. Here, the y axis is the number of stores in the metro area (independent variable...I think) and the x axis is visitors to stores in the metro area. Since there can never be fewer than zero stores, shouldn't the intercept be at 0?

#### jonmrich

##### New Member
And a related question...do I have my independent and dependent variables mixed up? It kills me that I used to know this stuff, but it's been wiped from my brain apparently.

#### Dason

I would say don't restrict the intercept to be 0. Even if it makes sense from a theoretical standpoint, it's not worth it from an applied point of view: it can mess up the fit, and we don't typically think of the regression line as providing a perfect fit over the full range of unobserved x values. So why force the intercept to be something that will provide a worse fit on the range of values we did observe?
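A small sketch comparing the two trend-line options on the observed points, with hypothetical data and helper names. Because the with-intercept fit chooses both the intercept and the slope to minimize squared error, its SSE on the observed data can never be larger than the through-origin fit's. Comparing SSE rather than R-squared is deliberate: some tools (R's `summary.lm`, for instance) compute R-squared for a no-intercept model against zero instead of the mean, so the two R-squared values aren't directly comparable.

```python
def fit_with_intercept(x, y):
    """Least-squares line y = a + b * x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def fit_through_origin(x, y):
    """Least-squares line y = b * x with the intercept forced to 0.

    Minimizing sum((y - b * x) ** 2) over b gives b = sum(x * y) / sum(x * x).
    """
    slope = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)
    return 0.0, slope


def sse(x, y, intercept, slope):
    """Sum of squared errors of a line on the observed points."""
    return sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))


# Hypothetical data whose trend clearly does not pass through the origin.
stores = [3, 5, 6, 8, 10, 12, 14]
visitors = [11500, 14000, 17500, 21000, 24500, 29500, 32000]

free = fit_with_intercept(stores, visitors)
forced = fit_through_origin(stores, visitors)

for label, (a, b) in [("with intercept", free), ("through the origin", forced)]:
    print(f"{label:>18}: a={a:.1f} b={b:.1f} SSE={sse(stores, visitors, a, b):.0f}")
```

Forcing the intercept to 0 only costs little when the observed points already line up close to the origin; otherwise it tilts the whole line to get there.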

For the case of regression the 'independent' variable is also known as the 'predictor' and the dependent variable is the 'response' (the value that you are trying to predict).
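Getting the direction right matters for the prediction itself: regressing visitors on stores and regressing stores on visitors give two different lines (their slopes multiply to r squared, so they coincide only when |r| = 1). A sketch with hypothetical data and helper names, comparing the prediction at 10 stores from the stores-as-predictor line with the one obtained by inverting the swapped-axes line:

```python
from math import sqrt


def ols(x, y):
    """Least-squares fit of y = intercept + slope * x; returns (intercept, slope)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def pearson_r(x, y):
    """Sample correlation coefficient."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical data: stores per metro area and visitors.
stores = [2, 4, 5, 7, 8, 10, 12, 15]
visitors = [9000, 10000, 15000, 14000, 22000, 21000, 30000, 33000]

# Predictor = stores, response = visitors: the direction the question needs.
a_vs, b_vs = ols(stores, visitors)
# Axes swapped: stores regressed on visitors.
a_sv, b_sv = ols(visitors, stores)

direct = a_vs + b_vs * 10      # visitors predicted at 10 stores
inverted = (10 - a_sv) / b_sv  # solve 10 = a_sv + b_sv * visitors for visitors
r = pearson_r(stores, visitors)

print(f"r = {r:.3f}, product of slopes = {b_vs * b_sv:.3f} (equals r**2)")
print(f"direct prediction: {direct:.0f}, inverted swapped-axes line: {inverted:.0f}")
```

Both lines pass through the point of means, but when drawn with stores on the x axis the inverted line is steeper by a factor of 1/r squared, so it exaggerates differences between areas. The line to use for "expected visitors" is the one with stores as the predictor.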

#### jonmrich

##### New Member
Makes sense. Thanks for the explanation.

The last sentence for independent v. dependent helps a ton.

Appreciate the help.