Increasing variance in linear regression

#1
Hi,
I have a response and a predictor variable, weight and height, to which I've fitted a linear regression. The relationship appears to be approximately linear and passes roughly through the origin. However, the variance increases as x increases.


Is there anything I can do with this? I've got a vague memory about ratio estimation being valid only if the line passes through the origin and the variance increases...

Maybe my line is valid if I fit a line with no constant?

Any help gratefully received. :)
 

TheEcologist

Global Moderator
#2
What about the coefficient of variation, does it increase as well?

How are your residuals distributed? Do they get systematically larger?

If so, a generalized linear model (GLM) might be better suited.
http://faculty.ucr.edu/~hanneman/linear_models/c10.html
 
#5
Thank you both for your suggestions.

Does the coefficient of variation increase?
I'm not sure what you mean by this. My R-squared value is 98.5% (so it's a well-fitting model). I thought it was a constant value for the fitted model and would only change if I fitted a different model? :confused:

How are your residuals distributed? Do they get systematically larger?
Residuals are distributed evenly either side of 0, except they get systematically larger as x increases.

Or just take logs of the data. That might be the simplest fix in this case.
If I take logs, the spread of the residuals becomes constant, but the line is curved and not as good a fit (even with a quadratic model) as the linear fit on the untransformed data.

I just wondered if there was some standard technique for fixing increasing variance. Perhaps there isn't. :)

I'm familiar with GLMs, but rusty. How can they help me with my variance issue?
 

TheEcologist

Global Moderator
#6
Thank you both for your suggestions.

Does the coefficient of variation increase?
I'm not sure what you mean by this. My R-squared value is 98.5% (so it's a well-fitting model). I thought it was a constant value for the fitted model and would only change if I fitted a different model? :confused:
The coefficient of variation is not the same thing as R-squared. It is the standard deviation divided by the mean:
http://en.wikipedia.org/wiki/Coefficient_of_variation
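To make the question concrete, here is a minimal Python sketch of the check I have in mind. The data are made up (a line through the origin with noise whose standard deviation is proportional to x), and the bin edges are arbitrary, so treat it as an illustration rather than an analysis of your data. It fits an ordinary least-squares line, then compares the residual standard deviation and the ratio sd / mean of y across bins of x.

```python
import random
import statistics

random.seed(0)

# Synthetic data (an assumption, not the poster's data): the true line passes
# through the origin and the noise standard deviation is proportional to x.
xs = [random.uniform(1, 100) for _ in range(3000)]
ys = [2.0 * x + random.gauss(0, 0.1 * x) for x in xs]

# Ordinary least-squares line y = a + b*x, computed by hand.
mx, my = statistics.mean(xs), statistics.mean(ys)
b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
a = my - b * mx
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]

# Arbitrary bins of x; within each bin compare the spread to the mean response.
edges = [0, 25, 50, 75, 101]
sds, cvs = [], []
for lo, hi in zip(edges[:-1], edges[1:]):
    in_bin = [i for i, x in enumerate(xs) if lo <= x < hi]
    sd = statistics.stdev([residuals[i] for i in in_bin])
    mean_y = statistics.mean([ys[i] for i in in_bin])
    sds.append(sd)
    cvs.append(sd / mean_y)
    print(f"x in [{lo}, {hi}): residual sd = {sd:.2f}, sd / mean = {cvs[-1]:.3f}")
```

On data like these the residual sd climbs from bin to bin while sd / mean stays roughly flat. If your own data show the same pattern, the spread is growing in proportion to the mean, which is the situation where a log transform or a gamma-type error structure is usually considered.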

Or just take logs of the data. That might be the simplest fix in this case.
If I take logs, the spread of the residuals becomes constant, but the line is curved and not as good a fit (even with a quadratic model) as the linear fit on the untransformed data.
Could you post your scatter plot? There might be a simple solution to this.


I just wondered if there was some standard technique for fixing increasing variance. Perhaps there isn't. :)

I'm familiar with GLMs, but rusty. How can they help me with my variance issue?
GLMs don't need the residuals to have constant variance, because the error structure of the model doesn't have to be "normal". With a Poisson error structure the variance equals the mean, so it increases linearly with the fitted value. With a gamma error structure the variance is proportional to the square of the mean, so it increases ever faster, while the coefficient of variation stays constant. So GLMs are one of the standard techniques for dealing with this. However, I still believe that the log transformation will work. Just post your log-transformed scatter plot and we will see.
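Coming back to the ratio estimation you half-remembered in your first post: a related and simpler technique is weighted least squares for a line with no constant, where each point is weighted by the inverse of its assumed variance. The sketch below is Python on made-up data (true slope 2, noise sd proportional to x), and the function name slope_through_origin is just something I made up for illustration. With variance assumed proportional to x the estimate reduces to sum(y) / sum(x), and with variance assumed proportional to x squared it reduces to the mean of y / x, which are the two classical ratio estimators.

```python
import random

random.seed(1)

# Synthetic data (an assumption): true slope 2, no intercept,
# noise standard deviation proportional to x.
xs = [random.uniform(1, 100) for _ in range(2000)]
ys = [2.0 * x + random.gauss(0, 0.1 * x) for x in xs]

def slope_through_origin(xs, ys, power):
    """Weighted least-squares slope for y = b*x with weights 1 / x**power.

    power=0: constant variance (ordinary least squares with no constant)
    power=1: variance proportional to x    -> equals sum(y) / sum(x)
    power=2: variance proportional to x**2 -> equals mean(y / x)
    """
    num = sum(x * y / x**power for x, y in zip(xs, ys))
    den = sum(x * x / x**power for x in xs)
    return num / den

for power in (0, 1, 2):
    print(f"weights 1/x^{power}: slope = {slope_through_origin(xs, ys, power):.4f}")
```

All three estimates land close to the true slope here, since unequal variance does not bias the slope. What the weighting changes is the precision of the estimate and the validity of the standard errors, so which weights to use depends on how the spread of your residuals actually grows with x.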
 
#7
It looks like I've got a lot of reading to do!

The stuff I'm studying isn't as advanced as the GLM stuff you suggested, so I think the log transform is my best bet.

Thanks again

:)