# Homogeneity requirements for linear regression

#### JDB

##### New Member
I have a large data set (more than 22,000 observations). This is industrial data with many unequal cells: some cells have thousands of observations and some have tens. Levene's test for equality of variances gives a highly significant p-value, indicating non-homogeneity. Regardless of all that, the analysis makes very good sense and is useful for making a decision.

Residuals are normally distributed.

My question: is there a need for homogeneity of variance with linear regression? I see the need for normality of the residuals (and mine are normal), but I do not see any discussion of homogeneity of variances for linear regression.

Thanks for any help or ideas.

JDB
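A minimal sketch of the kind of check described above, assuming Python with NumPy and SciPy. The cell sizes and spreads are made-up values, not JDB's data. `scipy.stats.levene` with `center="mean"` computes Levene's original statistic; the SciPy default, `center="median"`, is the Brown-Forsythe variant.

```python
import numpy as np
from scipy import stats

# Hypothetical cells of very unequal size and spread (thousands vs tens of observations)
rng = np.random.default_rng(0)
sizes = [3000, 1500, 40, 25]
sds = [1.0, 1.5, 3.0, 0.5]
groups = [rng.normal(loc=10.0, scale=sd, size=n) for n, sd in zip(sizes, sds)]

# Levene's test for equality of variances across all cells
stat, p = stats.levene(*groups, center="mean")
print(f"Levene W = {stat:.1f}, p = {p:.3g}")

# Per-cell sample SDs: the size of the differences, not just their significance
for n, g in zip(sizes, groups):
    print(f"n = {n:5d}  sd = {g.std(ddof=1):.2f}")
```

With cells this large, even modest differences in spread produce tiny p-values, so the per-cell SDs are often more informative than the p-value alone.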

#### GretaGarbo

##### Human
> My question: is there a need for homogeneity with linear regression?

In this case it will not matter. Your parameter estimates will be fine.

(Constant variance matters when you want to do significance tests with a sample size of, say, n = 20.)
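A small simulation illustrating the point about parameter estimates, under assumed (hypothetical) values: the true line is y = 2 + 0.5x and the error SD grows with x. Across repeated samples, the ordinary least squares slope estimates should average out close to the true slope even though the variance is not constant. This sketch says nothing about standard errors or p-values.

```python
import numpy as np

# Hypothetical model: y = 2 + 0.5 x + error, with error SD increasing in x
rng = np.random.default_rng(1)
n, reps = 5000, 200
true_slope = 0.5

slopes = []
for _ in range(reps):
    x = rng.uniform(0, 10, size=n)
    y = 2.0 + true_slope * x + rng.normal(loc=0.0, scale=0.2 + 0.5 * x)
    # np.polyfit with deg=1 is an OLS line fit; it returns [slope, intercept]
    slope, intercept = np.polyfit(x, y, deg=1)
    slopes.append(slope)

slopes = np.array(slopes)
print(f"mean slope estimate = {slopes.mean():.4f} (true value {true_slope})")
```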

#### obh

##### Well-Known Member
Hi Greta,

From what sample size is the "homogeneity of the variances" assumption no longer important for linear regression?
I assume it is the same for ANOVA?
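How large is large enough likely depends on the design, so one hedged way to answer this for a specific case is to simulate it. The sketch below (Python with SciPy; the group sizes, SDs and the helper name `anova_type1_rate` are hypothetical, chosen so the smallest cell has the largest spread) estimates how often `scipy.stats.f_oneway` rejects when all group means are truly equal, for increasingly large versions of the same unbalanced design. The printed rates can be compared with the nominal alpha of 0.05.

```python
import numpy as np
from scipy import stats

def anova_type1_rate(sizes, sds, reps=1000, alpha=0.05, seed=0):
    """Share of simulated data sets where the one-way ANOVA F test rejects,
    given equal group means (H0 true) but possibly unequal SDs."""
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(reps):
        groups = [rng.normal(0.0, sd, size=n) for n, sd in zip(sizes, sds)]
        if stats.f_oneway(*groups).pvalue < alpha:
            rejections += 1
    return rejections / reps

# Hypothetical design: same SDs, sample sizes scaled up by 1x, 10x, 50x
sds = [1.0, 1.0, 3.0]
for scale in (1, 10, 50):
    sizes = [50 * scale, 50 * scale, 5 * scale]
    print(sizes, anova_type1_rate(sizes, sds))
```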

#### JDB

##### New Member
Thanks for the input. Very helpful. Eases my mind.

JDB

#### noetsi

##### Fortran must die
Generally speaking, as the sample size gets larger, violations of most (although not all) of the assumptions become less important. The results are asymptotically correct.
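One way to see "asymptotically correct" in practice is the central limit theorem acting on skewed data that has finite variance. This sketch assumes Exp(1) data, for which the skewness of the mean of n observations is 2/sqrt(n), so the simulated skewness of the sample means should shrink toward 0 as n grows. The helper name `skewness_of_means` is hypothetical.

```python
import numpy as np
from scipy import stats

def skewness_of_means(n, reps=10000, seed=0):
    """Sample skewness of 'reps' means, each of n draws from Exp(1)."""
    rng = np.random.default_rng(seed)
    means = rng.exponential(scale=1.0, size=(reps, n)).mean(axis=1)
    return stats.skew(means)

for n in (5, 50, 500):
    # Theory for Exp(1): skewness of the mean of n draws is 2 / sqrt(n)
    print(f"n = {n:4d}  simulated {skewness_of_means(n):.3f}  theory {2 / np.sqrt(n):.3f}")
```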

#### hlsmith

##### Less is more. Stay pure. Stay poor.
Yeah, you all say this, but I think it should still be looked at to make sure nothing crazy extreme is going on. Visualization is so important. It could also reveal, say, that some outcome values aren't present or that values are bounded, or help find erroneous outliers. It can't hurt to look at it.
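A numeric stand-in for a residuals-vs-fitted plot, as a sketch using NumPy only (the data and the helper `residual_spread_by_bin` are hypothetical). It sorts observations by fitted value, splits them into equal-count bins and reports the residual SD in each bin; a spread that changes steadily across bins is what a plot would show as a funnel. A scatter plot of `resid` against `fitted` (for example with matplotlib) shows the same thing and also makes outliers and bounded values visible.

```python
import numpy as np

def residual_spread_by_bin(fitted, resid, n_bins=10):
    """Split observations into equal-count bins of fitted value and return
    (mean fitted value, residual SD, count) for each bin."""
    order = np.argsort(fitted)
    rows = []
    for idx in np.array_split(order, n_bins):
        rows.append((fitted[idx].mean(), resid[idx].std(ddof=1), idx.size))
    return rows

# Hypothetical data whose error SD grows with x
rng = np.random.default_rng(3)
x = rng.uniform(0, 10, size=22000)
y = 1.0 + 0.8 * x + rng.normal(loc=0.0, scale=0.3 + 0.3 * x)
slope, intercept = np.polyfit(x, y, deg=1)
fitted = intercept + slope * x
resid = y - fitted

print(f"outcome range: {y.min():.2f} to {y.max():.2f}")  # quick check for bounds
for center, sd, count in residual_spread_by_bin(fitted, resid):
    print(f"fitted ~ {center:6.2f}  residual sd = {sd:5.2f}  (n = {count})")
```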

#### obh

##### Well-Known Member
> Generally speaking, as the sample size gets larger, violations of most (although not all) of the assumptions become less important. The results are asymptotically correct.

Is there a "common" sample size above which it becomes less important to check the "homogeneity of the variances"?
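A single cut-off may not exist, since it depends on how the variance changes. A hedged way to probe it for one assumed pattern is to simulate the classical OLS t test for the slope when the null hypothesis is true. The model (true slope 0, x uniform on [-1, 1], error SD equal to |x|) and the helper name `slope_test_rejection_rate` are hypothetical; `scipy.stats.linregress` reports the usual two-sided p-value for the slope.

```python
import numpy as np
from scipy import stats

def slope_test_rejection_rate(n, reps=1000, alpha=0.05, seed=0):
    """Empirical type I error of the classical OLS slope t test
    under a hypothetical heteroscedastic model with true slope 0."""
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(reps):
        x = rng.uniform(-1.0, 1.0, size=n)
        # True slope is 0; error SD equals |x| (assumed variance pattern)
        y = rng.normal(loc=0.0, scale=np.abs(x))
        if stats.linregress(x, y).pvalue < alpha:
            rejections += 1
    return rejections / reps

for n in (20, 200, 2000):
    print(f"n = {n:5d}  rejection rate = {slope_test_rejection_rate(n):.3f}  (nominal 0.05)")
```

If the rejection rate does not move toward 0.05 as n grows, a bigger sample alone is not fixing the problem for that variance pattern.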

#### obh

##### Well-Known Member
> Yeah, you all say this, but I think it should still be looked at to make sure nothing crazy extreme is going on. Visualization is so important. It could also reveal, say, that some outcome values aren't present or that values are bounded, or help find erroneous outliers. It can't hurt to look at it.

Correct. For example, the CLT (if relevant...) doesn't always work: try applying it to data with undefined skewness, like F(3,3), which also has infinite variance. The sample average will stay skewed even for a huge sample size.
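The F(3,3) example can be checked by simulation. F(3,3) has a finite mean, 3/(3 - 2) = 3, but infinite variance, so the usual CLT does not apply. This sketch (sample size, number of replications and seed are arbitrary choices) compares the share of sample means falling below the true mean for F(3,3) and for Exp(1), a skewed distribution with finite variance. For a roughly symmetric sampling distribution that share is close to one half.

```python
import numpy as np

rng = np.random.default_rng(4)
n, reps = 1000, 2000

# Each row is one sample of size n; take the mean of each row
f_means = rng.f(3, 3, size=(reps, n)).mean(axis=1)                # true mean 3
exp_means = rng.exponential(scale=1.0, size=(reps, n)).mean(axis=1)  # true mean 1

# Share of sample means below the true mean (about 0.5 if roughly symmetric)
frac_f = np.mean(f_means < 3.0)
frac_exp = np.mean(exp_means < 1.0)
print(f"F(3,3): share of means below 3 = {frac_f:.3f}")
print(f"Exp(1): share of means below 1 = {frac_exp:.3f}")
```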