So, I foolishly tried to implement a test Gelman and his colleagues propose to determine whether observed data actually fits a model (Bayesian Data Analysis, Edition 1, p.172; or better: Gelman, Meng, Stern, 1996). If the test worked as advertised, it would be really useful--an omnibus way to determine whether any model fits the data. But, it doesn't seem to work--it doesn't catch really poor data / model fits I generate with Monte Carlo. What I'm wondering is whether anyone else has tried to implement this test and had any luck with it, and whether I should expect it to catch the kind of bad models I've tested it against.
If I understand the test correctly, the test statistic compares the chi-square fit of y's replicated from the fitted model (y.rep), across MCMC draws of the parameters theta, with the chi-square fit of the observed y's (y.obs) evaluated at the same theta draws. The fit is the sum, across the y's, of the squared difference between each y and the expected value of y given theta (XB in an OLS model), divided by the variance of y given theta. Each draw of theta generates two chi-square values, one for y.rep and one for y.obs. After several thousand draws of theta, if the chi-squares for y.obs prove systematically larger (or, I suppose, smaller) than the chi-squares for y.rep, then the model probably doesn't fit the data.
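To make sure we're talking about the same computation, here is a minimal sketch of that comparison in Python (my actual script is in R). Everything below is illustrative: the data are drawn from the true model, and the "posterior" draws of theta = (mu, sigma) are a crude normal approximation around the sample estimates, not real MCMC output.

```python
import random

random.seed(1)

# "Observed" data: n draws from the true model N(5, 2), just for illustration.
n = 100
y_obs = [random.gauss(5, 2) for _ in range(n)]

def chisq(ys, mu, sigma):
    # Chi-square discrepancy: sum of squared standardized residuals.
    return sum((yi - mu) ** 2 / sigma ** 2 for yi in ys)

ybar = sum(y_obs) / n
s = (sum((yi - ybar) ** 2 for yi in y_obs) / (n - 1)) ** 0.5

rep_larger = 0
n_draws = 2000
for _ in range(n_draws):
    # Crude stand-in for posterior draws of theta -- NOT a real sampler.
    mu = random.gauss(ybar, s / n ** 0.5)
    sigma = s * (1 + random.gauss(0, 0.05))
    # Replicate a data set from the fitted model at this theta draw.
    y_rep = [random.gauss(mu, sigma) for _ in range(n)]
    # Compare the two discrepancies at the SAME theta draw.
    if chisq(y_rep, mu, sigma) > chisq(y_obs, mu, sigma):
        rep_larger += 1

ppp = rep_larger / n_draws  # posterior predictive p-value
print(ppp)  # should land near 0.5 when the model fits
```

Since the data here really do come from the fitted model family, the fraction of draws with chi-sq(y.rep) > chi-sq(y.obs) should hover around one half; an extreme p-value in either tail is what would signal misfit.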
The problem is that I've tried one Monte Carlo model after another in which I deliberately make the data not fit the proposed model, yet chi-sq(y.rep) and chi-sq(y.obs) come out neither systematically larger nor smaller than each other (chi-sq(y.rep) proves larger in 48% of the draws, give or take a percentage point). I've tried an OLS model with normal errors against data with t-distributed (df=1) errors. I've tried another OLS model after taking a sixth of the y's and setting them to zero (graphically: a cloud of points well above zero and a row of points at zero).
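For the record, here is roughly what the zeroed-points experiment looks like, again as an illustrative Python sketch rather than my actual R script: an intercept-only "OLS" (just a mean and a variance) and the same crude normal-approximation stand-in for the posterior. Despite a sixth of the y's being forced to zero, the p-value stays unremarkable.

```python
import random

random.seed(2)

# Data from N(10, 1), but every sixth y forced to zero -- deliberate misfit.
n = 120
y = [random.gauss(10, 1) for _ in range(n)]
for i in range(0, n, 6):
    y[i] = 0.0

def chisq(ys, mu, sigma):
    return sum((yi - mu) ** 2 / sigma ** 2 for yi in ys)

ybar = sum(y) / n
# The sample sd is inflated by the zeros -- this is the mechanism I suspect.
s = (sum((yi - ybar) ** 2 for yi in y) / (n - 1)) ** 0.5

rep_larger = 0
n_draws = 2000
for _ in range(n_draws):
    # Crude stand-in for posterior draws of theta, as before.
    mu = random.gauss(ybar, s / n ** 0.5)
    sigma = s * (1 + random.gauss(0, 0.05))
    y_rep = [random.gauss(mu, sigma) for _ in range(n)]
    if chisq(y_rep, mu, sigma) > chisq(y, mu, sigma):
        rep_larger += 1

ppp = rep_larger / n_draws
print(ppp)  # comes out near 0.5 even though the model is badly wrong
```

The inflated s makes chi-sq(y.obs) sum to roughly n - 1 by construction, while chi-sq(y.rep) is roughly a chi-square with n degrees of freedom, so the comparison is nearly balanced no matter how lumpy the observed residuals are.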
I've been over the R script I wrote with a fine-tooth comb and see no errors. Stepping through, I can even see that, say, the zeroed y values show elevated residuals. But OLS inflates the error variance to accommodate the outlying zero values, with the apparent result that the non-zero points contribute correspondingly less error. Once the chi-square is summed, there's no difference between y.rep and y.obs. The kinds of models Gelman, Meng, and Stern use to illustrate the test are unlike anything I've seen in my corner of social science, so maybe the test only works in odd non-OLS situations?