Bayesian model checking: a problem with Gelman's chi-squared test

So, I foolishly tried to implement a test that Gelman and his colleagues propose for determining whether observed data actually fit a model (Bayesian Data Analysis, 1st edition, p. 172; or better: Gelman, Meng, and Stern, 1996). If the test worked as advertised, it would be really useful--an omnibus way to determine whether any model fits the data. But it doesn't seem to work--it doesn't catch really poor data/model fits that I generate by Monte Carlo. What I'm wondering is whether anyone else has tried to implement this test and had any luck with it, and whether I should expect it to catch the kinds of bad models I've tested it against.

If I understood the test correctly, the test statistic involves a comparison between the chi-square fit of y's replicated (y.rep) from the fitted model (across various MCMC draws of the parameters, theta) and the chi-square fit of the observed y's (y.obs) evaluated at the same theta draws. The fit is the sum, across the y's, of the squared difference between each y and the expected value of y given theta (XB in the OLS model), divided by the variance of y given theta. Each draw of theta generates two chi-square estimates, one for y.rep and one for y.obs. After several thousand draws of theta, if the chi-squareds for y.obs prove systematically larger (or, I suppose, smaller) than the chi-squareds for y.rep, then the model probably doesn't fit the data.
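To make that concrete, here is a minimal sketch of the two discrepancies for a single theta draw in the linear-model case. All the names (X, b, sigma, etc.) are mine, just for illustration, and the "posterior draw" here is simulated rather than coming from an actual MCMC run:

```r
# Hypothetical single posterior draw theta = (b, sigma), with design matrix X
# and observed data y.obs. Everything below is simulated for illustration.
set.seed(42)
n <- 200
X <- cbind(1, rnorm(n))                  # intercept + one predictor
b <- c(1, 2); sigma <- 1.5               # one posterior draw of the parameters
y.obs <- X %*% b + rnorm(n, sd = sigma)  # the "observed" data

Ey    <- X %*% b                         # E(y.i | theta) = XB
y.rep <- Ey + rnorm(n, sd = sigma)       # replicated data generated under theta

chi.obs <- sum((y.obs - Ey)^2 / sigma^2) # discrepancy of observed data
chi.rep <- sum((y.rep - Ey)^2 / sigma^2) # discrepancy of replicated data
# Repeat over all theta draws; the posterior predictive p-value is the
# fraction of draws with chi.rep >= chi.obs.
```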

The problem is that I've tried one Monte Carlo model after another in which I deliberately make the data not fit the proposed model, yet neither chi-sq(y.rep) nor chi-sq(y.obs) proves systematically larger than the other (chi-sq(y.rep) comes out larger in 48% of the draws, give or take a percentage point). I've tried an OLS model with normal errors against data generated with t (df = 1) errors. I've tried another OLS model where I take a sixth of the y's and set them to zero (graphically: a cloud of points well above zero and a row of points at zero).
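For reference, the two misspecifications I'm simulating are along these lines (a sketch with my own variable names and coefficients, not my actual script):

```r
set.seed(1)
n <- 200
x1 <- rnorm(n); x2 <- rnorm(n)

# Scenario 1: the model assumes normal errors, but the data are
# generated with very heavy-tailed t (df = 1) errors:
y.t <- 1 + 2*x1 - x2 + rt(n, df = 1)

# Scenario 2: normal errors, but a sixth of the y's forced to zero
# (a cloud of points above zero plus a row of points at zero):
y.zero <- 5 + 2*x1 - x2 + rnorm(n)
y.zero[sample(n, n %/% 6)] <- 0
```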

I've been over the R script I wrote with a fine-tooth comb and see no errors. I can even see, as I step through, that, say, the zeroed y values show elevated errors. But OLS inflates the residual standard error to accommodate the outlying zero values, with the apparent result that the non-zero points contribute correspondingly less error. Once the chi-square is summed, there's no difference between y.rep and y.obs. The kinds of models Gelman, Meng, and Stern use to illustrate the test are unlike anything I have seen in my social science work, so maybe the test only works in odd non-OLS situations?
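A toy illustration of why the sums might wash out: in classical OLS the sigma estimate is defined from the residuals themselves, so the mean squared standardized residual is (n - p)/n by construction, no matter how badly the error distribution is misspecified. (The Bayesian check uses posterior draws of sigma rather than sigma-hat, so this isn't exactly the same quantity, but it's the same mechanism. Example is mine.)

```r
set.seed(1)
n <- 200
x <- rnorm(n)
y <- 1 + 2*x + rt(n, df = 1)      # grossly non-normal errors
fit <- lm(y ~ x)
s <- summary(fit)$sigma           # sigma-hat inflates to absorb the outliers
mean((residuals(fit) / s)^2)      # = (n - 2)/n = 0.99 exactly, by definition
                                  #   of sigma-hat, whatever the errors look like
```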
Hi Dason: Whoops, should have looked back sooner. Here's my script (see below). codaResults is the standard output from JAGS (an mcmc.list object). df.model is a data frame with the model's y and x's. yPositions and xPositions just indicate where in df.model the y and x's are. rightBOrder indicates the correct order of the coefficients in codaResults corresponding to the x's entered. My code isn't entirely elegant, but hopefully it is correct.

df.test.chi=fn.getGelmanChiSq(df.model1, v.yPositions=c(1), v.xPositions=c(2:4), codaResults, rightBOrder=c(1:3), sigmaLocation=4)

fn.getGelmanChiSq <- function(df.model, v.yPositions, v.xPositions, codaResults,
                              rightBOrder, sigmaLocation){
	#The formula is sum[ (y.rep.i-E(y.i | theta))^2 / Var(y.i | theta) ] .
	#Substitute y.obs for y.rep for the 2nd Chi^2 formula.  What the
	#formula means in the OLS case is (I think):  E(y.i | theta) = XB and
	#Var(y.i | theta) = sigma^2 (for linear model).
	#thetaMat is draws X # of thetas, say 6000 X 4 including sigma.
	#bMat removes the sigma and transposes so it's 3 X 6000.
	#Each column of bMat is another random draw of parameters.
	#xMat is #obs X #coefficients; say 200 X 3.
	#expYMat=xMat%*%bMat.  expYMat is 200 X 6000.  Each column is
	#E(y.i | theta) for a given drawing of theta.
	#You end up comparing y.rep.i-E(y.i | theta) w/ y.obs.i-E(y.i | theta)
	#Get xMat and y:
	xMat=as.matrix(df.model[, v.xPositions])
	y=df.model[, v.yPositions]
	Nobs=nrow(xMat)
	#collapse codaResults (stack the chains into one draws X thetas matrix):
	thetaMat=as.matrix(codaResults[[1]])
	if(length(codaResults) > 1)
		for(i in 2:length(codaResults))
			thetaMat=rbind(thetaMat, as.matrix(codaResults[[i]]))
	NdrawsTheta=nrow(thetaMat)
	#get the linear coefficients B 
	#(also must be transposed so each column is a sample of the B params):
	bMat=t(thetaMat[, rightBOrder])
	#get sigma:
	sigma.v=thetaMat[, sigmaLocation]
	#get E(y.i | theta):
	#Each column corresponds to y1...yn for a given theta draw
	expYMat=xMat%*%bMat
	#create a matrix of y.obs:
	yobsMat=matrix(data = y, nrow = Nobs, ncol = NdrawsTheta, byrow = F)
	#create a matrix of sigmas (same sigma for each column):
	sigmaMat=matrix(data = sigma.v, nrow = Nobs, ncol = NdrawsTheta, byrow = T)
	#generate a matrix of random errors drawn from sigmaMat:
	errMat=matrix(data=rnorm(Nobs*NdrawsTheta, sd=sigmaMat), nrow=Nobs, ncol=NdrawsTheta, byrow=F)
	#Get sum[ (y.rep.i-E(y.i | theta))^2 / Var(y.i | theta) ] per theta draw.
	#Note that y.rep.i-E(y.i | theta) is just error! because
	#y.rep.i = E(y.i | theta) + error
	chirep= colSums((errMat^2) / (sigmaMat^2))
	#Calculate chi for observed values:
	chiobs= colSums((yobsMat-expYMat)^2 / (sigmaMat^2))
	#combine into a data frame, one row per theta draw:
	df.chi=data.frame(chiobs, chirep)
	names(df.chi)=c("chiobs", "chirep")
	return(df.chi)
}
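Once the function returns, the check I'm running is just the fraction of draws where chirep exceeds chiobs (the posterior predictive p-value). A sketch, using a toy stand-in data frame so the line is self-contained rather than my actual JAGS output:

```r
# Toy stand-in for the function's output: one row per theta draw,
# columns chiobs and chirep (real values would come from fn.getGelmanChiSq).
set.seed(1)
df.test.chi <- data.frame(chiobs = rchisq(6000, df = 200),
                          chirep = rchisq(6000, df = 200))
p.ppc <- mean(df.test.chi$chirep >= df.test.chi$chiobs)
p.ppc  # values near 0 or 1 would indicate misfit; mine hover around 0.48
```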