# Cross-estimating the independent variables to exclude outliers

#### erikstatman

##### New Member
I am working on a dataset in which some observations of the independent variables have measurement issues (i.e. feet and metric units mixed). I'm considering cross-estimating all the variables to exclude outliers before building the actual model. In other words, I would define and exclude outliers and data issues by using ALL the other variables, including the dependent variable, to estimate each independent variable in turn. From the estimate of each variable I can then exclude all observations that fall outside a 99 % confidence interval for that independent variable.

X1 = Y + X2 + X3, discard all X1 outside +/- 2 SD of mean
X2 = Y + X1 + X3, discard all X2 outside +/- 2 SD of mean
X3 = Y + X1 + X2, discard all X3 outside +/- 2 SD of mean
then
Y = X1 + X2 + X3

How statistically viable would this be? Also, would the standard error of the estimate and R-squared of the final model still be meaningful?

I have not seen or heard of this approach before, and I am a bit skeptical. Still, given that these are observations from nature, not social science, the confidence interval of the estimate for each independent variable should be valid. What could make it fall apart is that the very outliers I am trying to get rid of are included in the estimation. However, I am hoping their influence on the method will be marginal.

Purpose: pragmatic data mining and prediction, NOT for publication or science
Data: observations from nature, so a high degree of stability is expected in the relationships
N: approx. 15 k