- Thread starter noetsi

BTW, if you see a lot of diagonal lines in your residuals, it is probably caused by using an integer predictor.
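A minimal NumPy sketch of how that striping arises (hypothetical simulated data; in this illustration it is an integer-valued *outcome* that produces the classic slope −1 diagonals, since residual = y − fitted, so for each distinct integer value of y the points fall on a straight line against the fitted values):

```python
import numpy as np

rng = np.random.default_rng(0)  # hypothetical simulated data
n = 5000
x = rng.normal(size=n)
# integer-valued outcome, e.g. a count or a rounded measurement
y = np.round(2 + 0.5 * x + rng.normal(size=n))

# ordinary least squares fit
X = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
fitted = X @ beta
resid = y - fitted

# Because resid = y - fitted and y takes only integer values, the points
# for each distinct y lie exactly on a slope -1 line against the fitted
# values -- those lines are the diagonal stripes in a residual-vs-fitted plot.
for k in np.unique(y)[:3]:
    band = y == k
    assert np.allclose(resid[band] + fitted[band], k)
```

The stripes are an artifact of discreteness in the plotted quantities, not a sign that the model is mis-specified.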

Thanks miner. From what you say this is probably not a case of non-linearity, although I am not sure. I don't really care about the regression assumptions other than non-linearity: violations of the others don't bias the estimates, I have 5,000-plus data points, and I actually have the population.

Can this type of issue distort the regression findings? Particularly the estimates?

It does not appear to in my experience. I have encountered this when helping design engineers develop algorithms for embedded software. Those algorithms were thoroughly tested and we never saw any issues. However, I cannot speak to any theory. As you said, this doesn't seem to be addressed in any of the books.

The literature I know says that, except for non-linearity, violations of the regression assumptions affect the standard errors rather than the effect sizes. There are issues I have heard of that do influence effect size, such as omitted variable bias, or attenuation of a coefficient when one of the variables has most of its values at a single level, but these are not part of the classical assumptions one reads about.
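A quick Monte Carlo sketch of that claim (hypothetical numbers throughout): with deliberately heteroskedastic errors, which violate a classical assumption, the OLS slope stays centered on the true value; it is the usual standard-error formula, not the coefficient, that would be wrong.

```python
import numpy as np

rng = np.random.default_rng(1)
true_beta = 2.0
n, reps = 500, 400
betas = []
for _ in range(reps):
    x = rng.uniform(0, 1, n)
    # heteroskedastic noise: the error variance grows with x,
    # violating the homoskedasticity assumption
    y = 1.0 + true_beta * x + rng.normal(scale=0.5 + 2 * x)
    X = np.column_stack([np.ones(n), x])
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    betas.append(b[1])

# The slope estimates average out to the true value despite the
# violated assumption; only the naive standard errors are distorted.
assert abs(np.mean(betas) - true_beta) < 0.1
```

This is only a sketch of unbiasedness under one specific violation; it says nothing about inference, where robust standard errors would be needed.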

When I was in graduate school, violation of the regression assumptions seemed like a huge deal to me. But as I read more, it seems that if you have several thousand points (which I usually do) they really aren't that important. There are flaws in individual methods that still concern me, sometimes a lot, such as the issue of time invariance with fixed effects models, or nesting, but the classical assumptions no longer seem to be stressed as a major problem if you have enough data.

I think how generalizable your data is, is probably the major issue (especially over time), but that does not come up much in the literature I read, and dealing with it is difficult. Since I commonly deal with populations, this is not as much of a problem for me as it is for researchers (which I am not, of course).