# Increasing variance in linear regression

#### Statstastic

##### New Member
Hi,
I have a response variable (weight) and a predictor variable (height), and I've fitted a linear regression of weight on height. The relationship appears to be approximately linear and passes roughly through the origin. However, the variance around the line increases as x (height) increases.

Is there anything I can do about this? I've got a vague memory about ratio estimation being valid only if the line passes through the origin and the variance increases with x...

Maybe my line would be valid if I fitted it with no constant (intercept)?

Any help gratefully appreciated.
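On the no-constant idea and the ratio-estimation memory: if the line passes through the origin and the residual variance grows in proportion to x, weighted least squares through the origin with weights 1/x reduces exactly to the classical ratio estimator, sum(y) / sum(x). The sketch below checks that numerically in plain Python. The function names and the small height/weight numbers are hypothetical, made up for illustration, and it assumes all x values are positive.

```python
def slope_through_origin(x, y, w=None):
    """Weighted least-squares slope for the no-intercept model y = b * x.

    Minimises sum(w_i * (y_i - b * x_i) ** 2); the closed form is
    b = sum(w * x * y) / sum(w * x * x). With w=None every weight is 1,
    i.e. ordinary least squares through the origin.
    """
    if w is None:
        w = [1.0] * len(x)
    num = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    den = sum(wi * xi * xi for wi, xi in zip(w, x))
    return num / den


def ratio_estimate(x, y):
    """Classical ratio estimator: total of y divided by total of x."""
    return sum(y) / sum(x)


# Hypothetical illustrative data (heights in cm, weights in kg), not real measurements.
height_cm = [150.0, 160.0, 170.0, 180.0, 190.0]
weight_kg = [48.0, 57.0, 60.0, 71.0, 70.0]

b_ols = slope_through_origin(height_cm, weight_kg)
# Weights 1/x correspond to assuming Var(residual) is proportional to x.
b_wls = slope_through_origin(height_cm, weight_kg, [1.0 / h for h in height_cm])
b_ratio = ratio_estimate(height_cm, weight_kg)

print(f"OLS through origin:            {b_ols:.5f}")
print(f"WLS through origin, w = 1/x:   {b_wls:.5f}")
print(f"Ratio estimator sum(y)/sum(x): {b_ratio:.5f}")
```

With w = 1/x the numerator becomes sum(y) and the denominator sum(x), which is why the two estimates agree; the unweighted fit generally gives a slightly different slope because it treats every point as equally noisy.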

#### TheEcologist

##### Global Moderator
What about the coefficient of variation: does it increase as well?

How are your residuals distributed? Do they get systematically larger?

If so, a generalized linear model (GLM) might be better suited.
http://faculty.ucr.edu/~hanneman/linear_models/c10.html

#### brianjd

##### New Member
Or just take logs of the data. That might be the simplest fix in this case.

#### TheEcologist

##### Global Moderator
> Or just take logs of the data. That might be the simplest fix in this case.

LOL, yeah, that could do the trick. Now why didn't I think of that?

#### Statstastic

##### New Member
Thank you both for your suggestions.

> Does the coefficient of variation increase?

I'm not sure what you mean by this. My R-squared value is 98.5% (so it's a good-fitting model). I thought it was a constant value for the fitted model and would only change if I fitted a different model?

> How are your residuals distributed? Do they get systematically larger?

Residuals are distributed evenly either side of 0, except they get systematically larger as x increases.

> Or just take logs of the data. That might be the simplest fix in this case.

If I take logs, I do get residuals with a constant spread, but the relationship is curved and doesn't fit as well (even with a quadratic model) as the linear model on the untransformed data.

I just wondered if there was some standard technique for fixing increasing variance. Perhaps there isn't.

I'm familiar with GLMs, but rusty. How can they help me with my variance issue?

#### TheEcologist

##### Global Moderator
> Thank you both for your suggestions.
>
> Does the coefficient of variation increase?
>
> I'm not sure what you mean by this. My R-squared value is 98.5% (so it's a good-fitting model). I thought it was a constant value for the fitted model and would only change if I fitted a different model?

A coefficient of variation is the standard deviation divided by the mean; it's not the same thing as R-squared. See:
http://en.wikipedia.org/wiki/Coefficient_of_variation
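The point of the question is how the spread of y changes along x. If the standard deviation grows in proportion to the mean, the coefficient of variation (sd / mean) stays roughly constant across groups of similar x, and that is the situation in which a log transformation tends to stabilise the variance. A minimal sketch; the grouped values below are hypothetical.

```python
import statistics


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean (assumes a positive mean)."""
    return statistics.stdev(values) / statistics.fmean(values)


# Hypothetical y values grouped by similar x: the spread grows with the level.
groups = {
    "small x": [9.0, 10.0, 11.0],
    "medium x": [18.0, 20.0, 22.0],
    "large x": [36.0, 40.0, 44.0],
}

for label, ys in groups.items():
    sd = statistics.stdev(ys)
    cv = coefficient_of_variation(ys)
    print(f"{label:>9}: mean={statistics.fmean(ys):5.1f}  sd={sd:4.1f}  CV={cv:.3f}")
```

Here the standard deviation doubles from group to group while the CV does not change, which is the "variance increases, CV constant" pattern.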

> Or just take logs of the data. That might be the simplest fix in this case.
>
> If I take logs, I do get residuals with a constant spread, but the relationship is curved and doesn't fit as well (even with a quadratic model) as the linear model on the untransformed data.

Could you post your scatter plot? There might be a simple solution to this.

> I just wondered if there was some standard technique for fixing increasing variance. Perhaps there isn't.

> I'm familiar with GLMs, but rusty. How can they help me with my variance issue?

GLMs don't necessarily require the residuals to have a constant spread, because you can give the model an error structure other than the normal one. With a Poisson error structure the variance increases linearly with the mean, and with a gamma error structure it increases with the square of the mean, so ever faster. So GLMs are one of the standard techniques for dealing with this. However, I still believe the log transformation will work. Just post your log-transformed scatter plot and we will see.
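To make the error-structure point concrete: a GLM with an identity link and no intercept (mu = b·x), fitted by iteratively reweighted least squares (IRLS), weights each observation by 1/V(mu), where V is the assumed variance function. Changing only V changes the slope: constant variance gives ordinary least squares through the origin, variance proportional to the mean (Poisson-type, i.e. quasi-Poisson for continuous data) gives sum(y)/sum(x), and variance proportional to the mean squared (gamma-type) gives the mean of the individual ratios y/x. The hand-rolled `irls_through_origin` below is hypothetical and for illustration only; it assumes all x and y are positive. In practice a library GLM routine, such as statsmodels' `GLM` with a `Gamma` family, would be used.

```python
def irls_through_origin(x, y, variance_power, n_iter=25, tol=1e-12):
    """Fit mu = b * x (identity link, no intercept) by IRLS.

    The variance function is assumed to be V(mu) = mu ** variance_power:
    0 -> constant variance (normal), 1 -> Poisson-type, 2 -> gamma-type.
    With the identity link the working response is just y and the working
    weight is 1 / V(mu). Assumes positive x and y so mu stays positive.
    """
    # Start from ordinary least squares through the origin.
    b = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)
    for _ in range(n_iter):
        w = [1.0 / (b * xi) ** variance_power for xi in x]
        b_new = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y)) / sum(
            wi * xi * xi for wi, xi in zip(w, x)
        )
        if abs(b_new - b) < tol:
            return b_new
        b = b_new
    return b


# Hypothetical positive data, made up for illustration.
x = [1.0, 2.0, 3.0, 4.0]
y = [1.2, 1.9, 3.3, 3.8]

for p, name in [(0, "constant variance"),
                (1, "variance ~ mean (Poisson-type)"),
                (2, "variance ~ mean^2 (gamma-type)")]:
    print(f"{name:32}: slope = {irls_through_origin(x, y, p):.4f}")
```

For this special case (identity link, single slope) the weighted fit settles after one or two iterations, because the scale factor b cancels out of the weights; with other links or more parameters IRLS genuinely needs to iterate.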

#### Statstastic

##### New Member
It looks like I've got a lot of reading to do!

The stuff I'm studying isn't as advanced as the GLM stuff you suggested, so I think the log transform is my best bet.

Thanks again!