# Dependent Variable vs Independent Variable

#### jkow

##### New Member
Hi all,

You will need to forgive me if this sounds a bit primitive, but I'm not a stats guy. I'm working on a project and have hit a bit of a bump.

I've been given a data series of property information: Past Sale Price (dependent variable), number of Bedrooms, Bathrooms and Carparks (independent variables), and Rental Yield (%) (independent variable).

My issue is that rental yield (%) is derived from Past Sale Price. Is it possible to complete a regression analysis where an independent variable is derived from the dependent variable without it being biased?

Secondly, a property that sold for $500,000 could have a rental yield of 4%. Similarly, a property that sold for $400,000, $900,000 or $750,000 could also have a rental yield of 5%, so the yield says nothing definitive about the price.

The data is from 100 properties.

Looking forward to hearing some feedback

#### obh

##### Active Member
Hi Jkow,

There is always some correlation between the IVs, so this is not necessarily a problem.
You may run into multicollinearity, which can affect the coefficient estimates and the ability to choose between them, but it shouldn't affect the DV if you choose the correct coefficients. (http://www.statskingdom.com/doc_regression_validation.html)

#### GretaGarbo

##### Human
> Is it possible to complete a regression analysis where an independent variable is derived from the dependent variable without it being biased?

That seems to be a problem.

Suppose the dependent variable is Past Sale Price (PSP) and one of the explanatory variables is rental yield (RY), which is a function of PSP, so that the model looks like:

PSP = a + b1*RY(PSP) + (other) + epsilon

Then the explanatory variable RY will be correlated with the disturbance term epsilon. That will lead to biased and inconsistent estimates.
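
To make this concrete, here is a minimal simulation sketch in Python. Everything in it is an assumption made up for illustration (the coefficients, the rent equation, the sample size of 5000, and the helper names `ols` and `corr`); it is not the 100-property data set from the question. Price is generated from bedrooms plus a disturbance, rent is generated without any reference to that disturbance, and RY is then computed as rent divided by price, so by construction RY plays no part in generating the price.

```python
import random

def ols(rows, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    # Augmented matrix [X'X | X'y]
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                f = a[r][col] / a[col][col]
                a[r] = [u - f * v for u, v in zip(a[r], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]

def corr(u, v):
    """Pearson correlation of two equal-length lists."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((x - mu) * (y - mv) for x, y in zip(u, v))
    suu = sum((x - mu) ** 2 for x in u)
    svv = sum((y - mv) ** 2 for y in v)
    return suv / (suu * svv) ** 0.5

random.seed(0)
n = 5000  # hypothetical sample size, large so that sampling noise is small

# Hypothetical data-generating process, all amounts in $1000s
beds = [random.randint(1, 5) for _ in range(n)]
eps = [random.gauss(0, 50) for _ in range(n)]            # disturbance term
psp = [300 + 100 * b + e for b, e in zip(beds, eps)]     # price: bedrooms + epsilon only
rent = [15 + 4 * b + random.gauss(0, 2) for b in beds]   # annual rent, unrelated to eps
ry = [100 * r / p for r, p in zip(rent, psp)]            # yield (%) is derived from price

b_without = ols([[1, b] for b in beds], psp)                  # PSP ~ bedrooms
b_with = ols([[1, b, y] for b, y in zip(beds, ry)], psp)      # PSP ~ bedrooms + RY

print("corr(RY, epsilon):", round(corr(ry, eps), 2))
print("bedroom coefficient without RY:", round(b_without[1], 1))
print("RY coefficient when RY is included:", round(b_with[2], 1))
```

Under these assumptions RY should come out clearly negatively correlated with epsilon, and its coefficient should land far from zero even with 5000 observations, although RY has no role in how the price was generated. A large, "significant" coefficient in this setting reflects only that RY was computed from PSP.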

---

But why do you have to use RY in the model? Or maybe you can find an instrumental variable.

You can search for "Hedonic price indices" about house prices and number of rooms etc.

#### obh

##### Active Member
Hi Greta

Generally, a high correlation between IVs doesn't affect the DV or the model as a whole; it only increases the variance of the coefficient estimates.
So you can't know how each predictor separately influences the DV.

Do you mean that PSP will be biased not only because there is a high correlation between RY and PSP, but also because RY depends on PSP?
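
A rough way to check the variance claim is a small simulation. This is only a sketch: the true coefficients (1, 2, 3), the sample size, the number of replications and the helper names `ols` and `simulate_b1` are all made up for illustration. It fits the same model many times with uncorrelated and with highly correlated predictors and compares the spread of the estimated coefficient on x1.

```python
import random
import statistics

def ols(rows, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    # Augmented matrix [X'X | X'y]
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                f = a[r][col] / a[col][col]
                a[r] = [u - f * v for u, v in zip(a[r], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]

def simulate_b1(rho, reps=300, n=100):
    """Estimated coefficient on x1 from `reps` simulated samples of size n."""
    estimates = []
    for _ in range(reps):
        x1 = [random.gauss(0, 1) for _ in range(n)]
        # x2 is built to have correlation rho with x1
        x2 = [rho * v + (1 - rho ** 2) ** 0.5 * random.gauss(0, 1) for v in x1]
        # True model (hypothetical): y = 1 + 2*x1 + 3*x2 + noise
        y = [1 + 2 * v + 3 * w + random.gauss(0, 1) for v, w in zip(x1, x2)]
        estimates.append(ols([[1, v, w] for v, w in zip(x1, x2)], y)[1])
    return estimates

random.seed(1)
low = simulate_b1(rho=0.0)    # uncorrelated predictors
high = simulate_b1(rho=0.95)  # highly correlated predictors

for label, est in (("rho=0.00", low), ("rho=0.95", high)):
    print(label,
          "mean of b1:", round(statistics.mean(est), 2),
          "sd of b1:", round(statistics.stdev(est), 2))
```

In this setup the average estimate should stay close to the true value of 2 in both cases, while the spread of the estimates should be noticeably larger when the predictors are highly correlated.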

Thanks

#### GretaGarbo

##### Human
Yes, I am not talking about the usual correlation among x-variables, i.e. multicollinearity.

I am talking about when one or several x-variables are correlated with the stochastic disturbance term (epsilon). That means there is no statistical exogeneity.

(Usually the x-variables are thought of as "fixed numbers", possibly set by an experimental design, or as stochastic but independent of epsilon.)

> but also RY depends on PSP

Yes, that is what I mean. That means that there is simultaneity: PSP depends on RY, but RY also depends on PSP. Without other identifying restrictions, the estimates will be biased and inconsistent.

#### jkow

##### New Member
obh, not quite what I mean. You're looking at the relationship between IV and IV. I'm looking at the relationship between DV and IV, and my point is that an IV should not be influenced by the DV.

Greta Garbo is very much confirming what I initially thought. IV can't or shouldn't be affected by DV. It creates a perpetuity.

This was a data set given by my university for an assignment on Property and Real Estate analysis (Regression Model). Otherwise I wouldn't use it. The marks are out and I got a good mark (so I'm not worried about collusion on this assignment), but I'm still butting heads with lecturers over my argument that Rental Yield % creates a bias. They cannot see this point of view. My opinion was that the inclusion of rental yield % in the data set would make the model biased, as it was some form of simultaneity and lacked exogeneity (exact words used in my assignment).

Their argument is that the rental yield is statistically significant, which I confirmed, but it is significant purely because it's a number derived from the DV itself. It draws its significance from the DV that is valuing it.

Many thanks for all your replies.

#### jkow

##### New Member
Also, I have no idea what epsilon is, so I didn't venture as far as the correlation with the disturbance term. We didn't get that far into it.

Not a statistics major. Just a Property and Real Estate student.

#### jkow

##### New Member
And do you agree with what she is saying?

A second opinion is always warranted.

#### obh

##### Active Member
> Also, I have no idea what epsilon is, so I didn't venture as far as the correlation with the disturbance term. We didn't get that far into it.
>
> Not a statistics major. Just a Property and Real Estate student.

In a regression, not all of the variance is explained by the predictors. I assume epsilon is the random part (the errors, which the residuals estimate), and it should be normally distributed with mean 0.

#### jkow

##### New Member
I have no idea what you just said... sorry... Way over my head.

#### obh

##### Active Member
Residual = observed value - predicted value. (The residuals are estimates of the errors, since the model uses estimates of the coefficients, not their true values.)
e = y - ŷ

For example, if the estimated regression model is:
ŷ = 2x + 1

and the observation is y = 3.1 at x = 1, then:
ŷ = 2*1 + 1 = 3
e = 3.1 - 3 = 0.1
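
The same arithmetic can be sketched in Python for a handful of points. The five (x, y) pairs below are made up for illustration and chosen so the fitted line lands near ŷ = 2x + 1. The sketch fits a one-predictor least-squares line, computes each residual as e = y - ŷ, and checks that the residuals of a line fitted with an intercept sum to (numerically) zero.

```python
# Made-up observations, purely for illustration
xs = [1, 2, 3, 4, 5]
ys = [3.1, 4.8, 7.2, 9.1, 10.8]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Ordinary least squares with a single predictor
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
sxx = sum((x - mean_x) ** 2 for x in xs)
slope = sxy / sxx
intercept = mean_y - slope * mean_x

predicted = [intercept + slope * x for x in xs]       # y-hat for each x
residuals = [y - p for y, p in zip(ys, predicted)]    # e = y - y-hat

print("fitted line: y-hat =", round(slope, 2), "* x +", round(intercept, 2))
for x, y, p, e in zip(xs, ys, predicted, residuals):
    print("x =", x, " y =", y, " y-hat =", round(p, 2), " e =", round(e, 2))
print("sum of residuals:", round(sum(residuals), 10))
```

Each residual is the part of the observed y that the fitted line does not explain; some are positive and some negative, and together they cancel out.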

#### jkow

##### New Member
Could you suggest any journal articles or websites that would agree with my point of view? I've struggled to find any that confirm my opinion.

#### GretaGarbo

##### Human
> Would you suggest that there are any Journal articles or websites that would agree with my point of view? I've struggled to find any that confirm my opinion.

You, or your teachers, can have a look at an econometrics book on simultaneous equations, for example Greene's book (Chapter 15, Simultaneous-Equations Models). But this is not the easiest reading.

By the way, if the variable RY is based on a previous value (i.e. it is a lagged value, like RY_t-1) then it is a predetermined value and there will be no bias.
