Hi folks: I'm trying to apply cross-validation of Bayesian models (e.g., as in the paper "On Cross-Validation of Bayesian Models" by Alqallaf and Gustafson) in the context of count data and have run into an interesting problem.
My understanding of how such cross-validation should work is that one draws replicated y values (y.rep) from the posterior predictive distribution and then compares these replications with the actually observed y's (y.obs). I'm interested in fit, specifically the sum of squared errors (SSE) between y.rep and y.obs.
If I understand the generation of y.rep correctly, this includes, if it is present, any random error. That is, let's say I have the following model, with X & B as matrices and normal error:
MODEL 1
y ~ poisson(lambda)
lambda=exp(XB + Normal(0,sd))
Then, my understanding is that I would take posterior draws of all parameters (B, sd) from the model fit to the training data and, for each draw, calculate XB (with X from the validation data), add a freshly drawn normal error with standard deviation sd, exponentiate to get lambda, and finally draw y.rep from Poisson(lambda).
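To make that concrete, here is a rough sketch in Python/numpy of what I have in mind. The names and shapes (B_draws, sd_draws, X_val, y_obs) are just placeholders for the posterior draws from the training fit and for the validation data, not output from any particular package:

import numpy as np

rng = np.random.default_rng(0)

# Placeholder stand-ins for the posterior draws and validation data (assumed shapes):
n_draws, n_val, p = 1000, 50, 3
B_draws  = rng.normal(0.0, 0.3, size=(n_draws, p))      # posterior draws of B (placeholder)
sd_draws = np.abs(rng.normal(0.5, 0.1, size=n_draws))    # posterior draws of sd (placeholder)
X_val    = rng.normal(size=(n_val, p))                   # validation covariates (placeholder)
y_obs    = rng.poisson(2.0, size=n_val)                  # observed validation counts (placeholder)

sse = np.empty(n_draws)
for s in range(n_draws):
    eta    = X_val @ B_draws[s]                          # XB on the validation data
    eps    = rng.normal(0.0, sd_draws[s], size=n_val)    # freshly drawn normal error
    lam    = np.exp(eta + eps)                           # Model 1 rate
    y_rep  = rng.poisson(lam)                            # posterior predictive replicate
    sse[s] = np.sum((y_rep - y_obs) ** 2)                # fit for this draw

print(sse.mean())                                        # average SSE over posterior draws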
Now, the problem. Let's say I compare the above model to the same model, but without the normal error:
MODEL 2
y ~ poisson(lambda)
lambda=exp(XB)
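In the same sketch notation (with B2_draws as a placeholder for Model 2's own posterior draws of B), the Model 2 replicates would simply drop the error term:

B2_draws = rng.normal(0.0, 0.3, size=(n_draws, p))       # placeholder posterior draws for Model 2

sse2 = np.empty(n_draws)
for s in range(n_draws):
    lam2    = np.exp(X_val @ B2_draws[s])                # no added normal error
    y_rep2  = rng.poisson(lam2)
    sse2[s] = np.sum((y_rep2 - y_obs) ** 2)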
Because I'm adding noise when generating y.rep in Model 1, isn't it at a disadvantage relative to Model 2? Model 2's y.rep values directly reflect its best estimates of the means, while Model 1 adds noise around its best means. I've actually tried this, and Model 1 does indeed come out at a huge disadvantage.
So, should I instead be comparing the *expected value* of exp(XB + Normal(0,sd)) in Model 1 with exp(XB) in Model 2, even though the former is not a draw from the posterior predictive distribution? But when I do model checking by comparing the distribution of y.rep with y.obs (say, the 0.95 quantile, or the proportion of zeros), should I still use the full posterior predictive distribution for each model?
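(To be concrete about what I mean by the expected value: given B and sd, exp(XB + Normal(0,sd)) is lognormal, so its conditional mean is exp(XB + sd^2/2). Continuing the sketch above, the point-prediction comparison I have in mind would look roughly like this:)

# Point predictions: posterior mean of lambda for each validation case.
# Model 1: E[exp(XB + Normal(0, sd)) | B, sd] = exp(XB + sd**2 / 2)   (lognormal mean)
lam1_point = np.exp(X_val @ B_draws.T + sd_draws ** 2 / 2).mean(axis=1)
lam2_point = np.exp(X_val @ B2_draws.T).mean(axis=1)

sse1_point = np.sum((lam1_point - y_obs) ** 2)
sse2_point = np.sum((lam2_point - y_obs) ** 2)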