I have a dataset on which I am performing linear regression with multiple covariates. The goal of the analysis is to identify the covariates (categorical and continuous) that have an important effect on the response. This is a natural resource dataset, so it is not as clear-cut as other data I have worked with (lots of noise). My issue has been meeting the assumptions of no autocorrelation and normally distributed residuals.
My first tactic was to drop every other observation to remove the autocorrelation (which worked) and then to eliminate outliers of the response until the residuals were normally distributed. Unfortunately, the outliers I eliminated contain valuable system relationships that are no longer present in the analysis (i.e. a large increase in the response based on actions taken at a few points). Would it be acceptable to violate the assumption of normally distributed residuals when the outcome makes sense in the system and we are trying to identify important system covariates? Also, any thoughts about autocorrelation would be great too.
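For reference, here is a minimal sketch of how the two assumptions mentioned above could be checked on a vector of residuals, using only the Python standard library. The residuals are made up (a seeded AR(1) series with coefficient 0.5), not from the actual dataset, and the function names `durbin_watson` and `skew_and_excess_kurtosis` are just illustrative helpers. It also computes the Durbin-Watson statistic again after dropping every other observation, as in the tactic described above.

```python
import random
import statistics


def durbin_watson(residuals):
    """Durbin-Watson statistic; values near 2 suggest little lag-1 autocorrelation."""
    num = sum((residuals[i] - residuals[i - 1]) ** 2 for i in range(1, len(residuals)))
    den = sum(r ** 2 for r in residuals)
    return num / den


def skew_and_excess_kurtosis(residuals):
    """Sample skewness and excess kurtosis; both are near 0 for normal residuals."""
    n = len(residuals)
    mean = statistics.fmean(residuals)
    m2 = sum((r - mean) ** 2 for r in residuals) / n
    m3 = sum((r - mean) ** 3 for r in residuals) / n
    m4 = sum((r - mean) ** 4 for r in residuals) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0


# Hypothetical residuals: an AR(1) series, standing in for real model residuals.
rng = random.Random(0)
resid = [0.0]
for _ in range(999):
    resid.append(0.5 * resid[-1] + rng.gauss(0, 1))

dw_full = durbin_watson(resid)           # all observations
dw_thinned = durbin_watson(resid[::2])   # every other observation dropped
skew, excess_kurt = skew_and_excess_kurtosis(resid)

print(f"Durbin-Watson (full):    {dw_full:.2f}")
print(f"Durbin-Watson (thinned): {dw_thinned:.2f}")
print(f"skewness: {skew:.2f}, excess kurtosis: {excess_kurt:.2f}")
```

With real data, `resid` would be replaced by the residuals of the fitted regression, in time (or spatial) order.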