Two of my models rejected the null hypotheses, which is not really a surprise: we expected the non-survey estimations to be way off. The third model failed to reject the nulls, and both the intercept and the beta took the expected values, with t statistics low enough to indicate no statistically significant difference between the two series (t values of 1.88 and 1.99, respectively). However, the coefficient of determination, R2, was extremely low (< 0.05). This was the case for all three regressions, but my main concern is with the model that otherwise performed well.

My big question is how to explain why R2 was so low while the coefficient and intercept fit expectations. I ran the data through a few other tests. The other goodness-of-fit measure, I (defined below), also showed a poor fit. Theil's inequality coefficient, U, indicated that a naive shot-in-the-dark could have performed better than the models (U > 2), with the bias proportion (UM) being quite low (0.005) but the variance proportion (US) moderately high (0.404).
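For anyone wanting to reproduce the decomposition, here is a minimal sketch of Theil's U and its bias/variance/covariance proportions (the Pindyck–Rubinfeld form). Note I use the unbounded version of U (RMSE relative to the RMS of the actuals), since that is the one that can exceed 1; the bounded [0, 1] form divides by the sum of both RMS terms instead. The variable names and synthetic data are my own, not your data.

```python
import numpy as np

def theil_decomposition(actual, predicted):
    """Theil's U plus its bias (UM), variance (US), and covariance (UC)
    proportions. `actual` = survey series, `predicted` = non-survey series.
    Uses the unbounded form of U; the bounded form would divide by
    sqrt(mean(p**2)) + sqrt(mean(a**2)) instead."""
    a = np.asarray(actual, dtype=float)
    p = np.asarray(predicted, dtype=float)
    mse = np.mean((p - a) ** 2)
    u = np.sqrt(mse) / np.sqrt(np.mean(a ** 2))
    s_a, s_p = a.std(), p.std()              # population std (ddof=0)
    r = np.corrcoef(a, p)[0, 1]
    um = (p.mean() - a.mean()) ** 2 / mse    # bias proportion
    us = (s_p - s_a) ** 2 / mse              # variance proportion
    uc = 2.0 * (1.0 - r) * s_a * s_p / mse   # covariance proportion
    return u, um, us, uc

# quick sanity check on synthetic data (illustrative only)
rng = np.random.default_rng(1)
a = rng.uniform(1.0, 5.0, 200)               # stand-in "survey" values
p = a + rng.normal(0.0, 0.5, 200)            # noisy "non-survey" values
u, um, us, uc = theil_decomposition(a, p)
```

The three proportions sum to 1 by construction, which is a handy check that the decomposition was computed correctly.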

I(X, X*) = Σ_i | X*_i log2(X*_i / X_i) |

where X_i is the survey variable and X*_i is the non-survey variable.

Interpretation: values of I closer to 0 indicate better goodness of fit (0 = perfect fit, i.e. X_i = X*_i); higher values indicate a poorer fit.
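For completeness, the information measure above is a one-liner to compute. This sketch follows the formula as I stated it (sum of absolute values, all observations strictly positive); the example arrays are made up for illustration.

```python
import numpy as np

def info_inaccuracy(x, x_star):
    """I(X, X*) = sum_i | X*_i * log2(X*_i / X_i) |.
    Assumes all values in both series are strictly positive."""
    x = np.asarray(x, dtype=float)
    x_star = np.asarray(x_star, dtype=float)
    return float(np.sum(np.abs(x_star * np.log2(x_star / x))))

# illustrative values, not real data
x = np.array([0.02, 0.05, 0.08])
perfect = info_inaccuracy(x, x)        # identical series -> perfect fit
worse = info_inaccuracy(x, 2.0 * x)    # diverging series -> larger I
```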

Now, my inclination is that the low R2 results from the high variance, as indicated by the US value. Another possible explanation is that the observations are all close to zero, so there may be weighting issues with the small values; but I'm not sure that's as problematic, given that the bulk of the observations are uniformly small.

So does the high variance seem like a plausible explanation, or am I way off? If I'm off, what other explanations seem appropriate?