Dependent Variable vs Independent Variable

jkow

New Member
#1
Hi all,

You will need to forgive me if this sounds a bit primitive, but I'm not a stats guy. I'm working on a project and have hit a bit of a bump.

I've been given a data set of property information, including Past Sale Price (dependent variable), number of bedrooms, bathrooms and car parks (independent variables), and Rental Yield (%) (independent variable).

My issue is that rental yield (%) is derived from Past Sale Price. Is it possible to complete a regression analysis where an independent variable is derived from the dependent variable without it being biased?

Secondly, a property that sold for $500,000 could have a rental yield of 4%, while a property that sold for $400,000, $900,000 or $750,000 could have a rental yield of 5%, so the yield does not seem to say anything defining about the price.

The data is from 100 properties.

Looking forward to hearing some feedback
 
#3
Is it possible to complete a regression analysis where an independent variable is derived from the dependent variable without it being biased?
That seems to be a problem.

Suppose the dependent variable is Past Sale Price (PSP) and one of the explanatory variables is rental yield (RY), which is a function of PSP, so that the model looks like:

PSP = a + b1*RY(PSP) + (other) + epsilon

Then the explanatory variable RY will be correlated with the disturbance term epsilon. That will lead to biased and inconsistent estimates.

-- -

But why do you have to use RY in the model? Or maybe you can find an instrumental variable.

You can search for "hedonic price indices", which model house prices from characteristics such as the number of rooms.
 
#4
Hi Greta :)

Generally, a high correlation between IVs doesn't affect the model as a whole (the prediction of the DV); it only increases the variance of the coefficient estimates, so you can't know how each predictor separately influences the DV.


Do you mean that the estimates for PSP will be biased because not only is there a high correlation between RY and PSP, but also RY depends on PSP?

Thanks
 
#5
Yes, I am not talking about the usual correlation among x-variables, i.e. multicollinearity.

I am talking about when one or several x-variables are correlated with the stochastic disturbance term (epsilon). That means there is no statistical exogeneity.

(Usually the x-variables are thought of as "fixed numbers", possibly set by an experimental design, or as stochastic but independent of epsilon.)

but also RY depends on PSP
Yes, that is what I mean. That means that there is simultaneity: PSP depends on RY, but RY also depends on PSP. Without other identifying restrictions, the estimates will be biased and inconsistent.
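A minimal simulation sketch of this point, under made-up assumptions: the true price depends only on the number of bedrooms, annual rent is drawn independently of epsilon, and rental yield is computed as 100 * rent / PSP. All numbers and variable names (`annual_rent`, `ols`, and so on) are hypothetical and are not taken from the assignment data.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000  # large sample so the bias is not hidden by sampling noise

# Hypothetical data-generating process: price depends on bedrooms only.
bedrooms = rng.integers(1, 6, size=n).astype(float)  # 1 to 5 bedrooms
eps = rng.normal(0.0, 30_000.0, size=n)              # disturbance term
psp = 200_000.0 + 100_000.0 * bedrooms + eps

# Rent is independent of eps, but rental yield is derived from PSP.
annual_rent = rng.normal(25_000.0, 3_000.0, size=n)
ry = 100.0 * annual_rent / psp

def ols(y, *columns):
    """Least-squares coefficients, intercept first."""
    X = np.column_stack([np.ones_like(y), *columns])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

beta_without_ry = ols(psp, bedrooms)   # correctly specified model
beta_with_ry = ols(psp, bedrooms, ry)  # RY added as a regressor
corr_ry_eps = np.corrcoef(ry, eps)[0, 1]

print("bedrooms coefficient without RY:", beta_without_ry[1])
print("bedrooms coefficient with RY:   ", beta_with_ry[1])
print("RY coefficient (true effect 0): ", beta_with_ry[2])
print("correlation of RY with epsilon: ", corr_ry_eps)
```

In this sketch RY has no causal effect on price, yet it picks up a clearly nonzero coefficient and pulls the bedrooms coefficient away from its true value, because RY is correlated with epsilon by construction.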
 

jkow

New Member
#6
obh, that's not quite what I mean. You're looking at the relationship between one IV and another IV. I'm looking at the relationship between the DV and an IV, and my point is that the IV should not be influenced by the DV.

Greta Garbo is very much confirming what I initially thought: an IV can't, or shouldn't, be affected by the DV. It creates a circularity.

This was a data set given by my university for an assignment on Property and Real Estate analysis (Regression Model). Otherwise I wouldn't use it. The marks are out and I got a good mark (so I'm not worried about collusion on this assignment), but I'm still butting heads with lecturers over my argument that Rental Yield % creates a bias. They cannot see this point of view. My opinion was that including rental yield % in the data set would make the model biased, as it is a form of simultaneity and lacks exogeneity (exact words used in my assignment).

Their argument is that the rental yield is statistically significant. I confirmed that it is, but purely because it's a number derived from the DV itself. It draws its significance from the DV it is being used to explain.

Many thanks for all your replies.

 

jkow

New Member
#7
Also, I have no idea what epsilon is, so I didn't venture as far as the correlation with the disturbance term. We didn't get that far into it.

Not a statistics major. Just a Property and Real Estate student.


 
#8
Sorry, I didn't read your question correctly, but Greta did :)
 
#10
In a regression, not all of the variance is explained by the predictors. I assume epsilon is the random part, estimated by the residuals, which is assumed to be normally distributed with mean 0.
 
#13
Residual = observed value - predicted value. (Residuals are estimates of the errors, since the model uses the estimated coefficients, not the true values of the coefficients.)
e = y - ŷ

For example, if the estimated regression model is:
ŷ = 2x + 1

and the observed values are y = 3.1 and x = 1, then:
ŷ = 2*1 + 1 = 3
e = 3.1 - 3 = 0.1
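
The same arithmetic as a short Python sketch. The function name `predict` and the variable names are just illustrative; the numbers are the ones from the example above.

```python
def predict(x):
    """Estimated regression model from the example: y_hat = 2x + 1."""
    return 2.0 * x + 1.0

x_observed = 1.0
y_observed = 3.1

y_hat = predict(x_observed)      # predicted value
residual = y_observed - y_hat    # e = y - y_hat

# Rounded for display, since floating-point subtraction is not exact.
print(f"predicted = {y_hat}, residual = {residual:.2f}")
```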
 

jkow

New Member
#14
Could you suggest any journal articles or websites that would support my point of view? I've struggled to find any that confirm my opinion.

 
#15
You, or your teachers, can have a look at an econometrics book on simultaneous equations, for example Greene's book (Chapter 15, Simultaneous-Equations Models). But this is not the easiest reading.

That would confirm your opinion.

By the way, if the variable RY is based on a previous value (i.e. it is a lagged value, like RY_t-1), then it is predetermined and there will be no bias.
 

jkow

New Member
#16
Simultaneous-Equations Models is exactly what I was looking for. The endogeneity created by this variable is the basis for my issue with including rental yield. I can see that there are tests you can apply to see whether it can be overcome, but on a basic level it should be removed from the model.
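
For anyone curious what "overcoming" the endogeneity might look like, here is a rough two-stage least squares sketch using an instrumental variable, as suggested earlier in the thread. It rests on strong assumptions: the data are simulated, the true effect of RY on price is set to zero, and annual rent is assumed to be a valid instrument (related to RY but unrelated to epsilon), which holds here only because the simulation is built that way. All names and numbers are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

# Hypothetical data: price depends on bedrooms only, RY is derived from PSP.
bedrooms = rng.integers(1, 6, size=n).astype(float)
eps = rng.normal(0.0, 30_000.0, size=n)
psp = 200_000.0 + 100_000.0 * bedrooms + eps
annual_rent = rng.normal(25_000.0, 3_000.0, size=n)  # assumed instrument
ry = 100.0 * annual_rent / psp

def ols(y, X):
    """Least-squares coefficients for design matrix X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

ones = np.ones(n)

# Naive OLS with the endogenous regressor RY.
beta_naive = ols(psp, np.column_stack([ones, bedrooms, ry]))

# Stage 1: regress RY on the exogenous variables and the instrument.
Z = np.column_stack([ones, bedrooms, annual_rent])
ry_hat = Z @ ols(ry, Z)

# Stage 2: regress PSP on the exogenous variables and the fitted RY.
# Note: standard errors from this second-stage OLS would not be valid
# 2SLS standard errors; only the point estimates are used here.
beta_2sls = ols(psp, np.column_stack([ones, bedrooms, ry_hat]))

print("RY coefficient, naive OLS:", beta_naive[2])
print("RY coefficient, 2SLS:     ", beta_2sls[2])
print("bedrooms coefficient, 2SLS:", beta_2sls[1])
```

In this simulation the naive estimate assigns RY a large spurious effect, while the two-stage estimate stays close to the true value of zero. With real data, whether a usable instrument exists at all is a separate question.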