Long ago I was brought up on the view that analysis of the residuals was critical for regression, but I am confused about the current advice on it. My question concerns data sets that have thousands (or tens of thousands) of points: how important is residual analysis there?
Normality: The sense I get is that few people concern themselves with this any more for large data sets, which suggests not reviewing it.
Heteroskedasticity: I am unclear how important this is anymore. Some suggest just using White standard errors and ignoring it.
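To make the White suggestion concrete, here is a minimal sketch of the HC0 (White) standard error for the slope of a one-predictor regression, next to the classical one. The data are simulated and the function name `ols_slope_with_se` is hypothetical; it only illustrates what the correction changes and is not a recommendation for any particular data set.

```python
import math
import random

def ols_slope_with_se(x, y):
    """Fit y = a + b*x by OLS; return slope, classical SE, and White (HC0) SE."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    dx = [xi - x_bar for xi in x]
    sxx = sum(d * d for d in dx)
    slope = sum(d * (yi - y_bar) for d, yi in zip(dx, y)) / sxx
    intercept = y_bar - slope * x_bar
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    # Classical SE: assumes a single common error variance.
    sigma2 = sum(e * e for e in resid) / (n - 2)
    se_classical = math.sqrt(sigma2 / sxx)
    # White (HC0) SE: each squared residual is weighted by its squared x-deviation.
    se_white = math.sqrt(sum(d * d * e * e for d, e in zip(dx, resid))) / sxx
    return slope, se_classical, se_white

rng = random.Random(0)
n = 10_000
x = [rng.uniform(0.0, 10.0) for _ in range(n)]
# Simulated (assumed) data: the error standard deviation grows with x.
y = [1.0 + 2.0 * xi + rng.gauss(0.0, 0.5 * xi) for xi in x]

slope, se_classical, se_white = ols_slope_with_se(x, y)
print(f"slope={slope:.4f}  classical SE={se_classical:.4f}  White SE={se_white:.4f}")
```

In this simulated setup the slope estimate is the same either way; only the standard error differs, which is why the robust errors are often presented as a substitute for diagnosing the variance pattern.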
Outliers (this includes the issue of leverage): The prevailing view on these is not clear to me. Two questions are particularly pertinent. In a large data set, can one or a few points move the regression line? And if you find an outlier (assuming it matters), what do you do? The most common advice I have seen recently is to leave it in, unless it is just a mistake.
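One way to probe the first of these questions is a small simulation: fit the line to 10,000 clean points, then refit after adding a single extra point. The sketch below uses simulated data and a hypothetical helper `fit_line`; the added point's values (y = -500, at x = 0 or x = 50) are arbitrary assumptions chosen to contrast an outlier in y alone with one that also has high leverage.

```python
import random

def fit_line(x, y):
    """Return (intercept, slope) of the OLS fit of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

rng = random.Random(1)
n = 10_000
# Simulated (assumed) clean data: true intercept 1, true slope 2.
x = [rng.gauss(0.0, 1.0) for _ in range(n)]
y = [1.0 + 2.0 * xi + rng.gauss(0.0, 1.0) for xi in x]

_, slope_clean = fit_line(x, y)
# One extreme y value at a typical x: an outlier with little leverage.
_, slope_outlier = fit_line(x + [0.0], y + [-500.0])
# The same extreme y value at an extreme x: an outlier with high leverage.
_, slope_leverage = fit_line(x + [50.0], y + [-500.0])

print(f"clean slope:              {slope_clean:.3f}")
print(f"with low-leverage point:  {slope_outlier:.3f}")
print(f"with high-leverage point: {slope_leverage:.3f}")
```

Under these assumptions the point in the middle of the x range barely changes the slope, while the same y value far out in x changes it substantially, so sample size alone does not rule out influence from a single point.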
Independence: This is seen as important, but I have never found a test for it outside time series, where it takes the form of a test for serial correlation. I understand that in some cases theory leads you to conclude that it has been violated, but I know of no formal way to test this. Nor have I seen a solution for it other than time series and multilevel models (which are special cases where theory suggests independence is likely to be violated).
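For reference, the serial-correlation check mentioned above is typically the Durbin-Watson statistic, which only makes sense when the residuals have a natural ordering such as time. The sketch below computes it on two simulated series that stand in for ordered residuals; the function name `durbin_watson` and the AR(1) coefficient 0.7 are assumptions made for illustration.

```python
import random

def durbin_watson(resid):
    """Durbin-Watson statistic: near 2 when successive residuals are uncorrelated."""
    num = sum((resid[t] - resid[t - 1]) ** 2 for t in range(1, len(resid)))
    den = sum(e * e for e in resid)
    return num / den

rng = random.Random(2)
n = 10_000

# Stand-in for residuals with no serial correlation.
independent = [rng.gauss(0.0, 1.0) for _ in range(n)]

# Stand-in for residuals following an AR(1) process with coefficient rho.
rho = 0.7
correlated = [rng.gauss(0.0, 1.0)]
for _ in range(n - 1):
    correlated.append(rho * correlated[-1] + rng.gauss(0.0, 1.0))

dw_independent = durbin_watson(independent)
dw_correlated = durbin_watson(correlated)
print(f"DW, independent series: {dw_independent:.3f}")
print(f"DW, AR(1) series:       {dw_correlated:.3f}")
```

The statistic is roughly 2(1 - r), where r is the lag-1 autocorrelation of the residuals, so values well below 2 point to positive serial correlation. It says nothing about dependence that is not tied to the ordering of the rows, which is the gap the paragraph above describes.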
I have grown lax about analyzing residuals because of the many comments that downplay its importance (and admittedly because I may have the whole population of interest, though this is subject to dispute).