External Model Validation


Trying to get some input on the process below for externally validating a prediction model!

I have a sample of 200 people's dependent variable (a positive continuous lab value) and days since an exposure (the independent variable). I fit a GLM with a gamma distribution and log link, which gives me a model coefficient relating the expected DV to time since exposure.
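For concreteness, here's a minimal numpy sketch of that first step. It fits log E[y] = b0 + b1·days by IRLS, which for the gamma family with a log link reduces to iterated OLS on a working response (the function name and simulated data are mine, not from the question; statsmodels' GLM with a gamma family and log link would be the usual tool):

```python
import numpy as np

def fit_gamma_glm_log_link(t, y, n_iter=25):
    """Fit y ~ exp(b0 + b1*t) assuming a gamma GLM with log link, via IRLS.

    For the gamma family with a log link the IRLS working weights are
    constant, so each step is an ordinary least-squares solve on the
    working response z = eta + (y - mu) / mu.
    """
    X = np.column_stack([np.ones_like(t), t])
    # start from a log-linear OLS fit (valid since y > 0)
    beta = np.linalg.lstsq(X, np.log(y), rcond=None)[0]
    for _ in range(n_iter):
        eta = X @ beta
        mu = np.exp(eta)
        z = eta + (y - mu) / mu          # working response
        beta = np.linalg.lstsq(X, z, rcond=None)[0]
    return beta
```

With n = 200 this recovers the coefficients well when the gamma/log-link assumption holds.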

Next, I procured external data for people with at least two pairs of dependent/independent values. For each person, I use their first observed value, the number of days to the second observation, and the coefficient from the model above to predict the second dependent value. I do this for all of the external data.
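If I've read the prediction step right, the log link makes the model multiplicative in time, so the second value would be predicted as y1 · exp(b1 · Δdays). A sketch under that assumption (the function names are hypothetical):

```python
import numpy as np

def predict_followup(y_first, days_elapsed, b1):
    """Predict a later value from an earlier one under a log-link model:
    log E[y2] - log E[y1] = b1 * (t2 - t1), so E[y2] = y1 * exp(b1 * dt).
    """
    return y_first * np.exp(b1 * days_elapsed)

def prediction_errors(y_first, y_second, days_elapsed, b1):
    """Observed second value minus its model-based prediction."""
    return np.asarray(y_second) - predict_followup(np.asarray(y_first),
                                                   np.asarray(days_elapsed), b1)
```

Note this anchors each prediction to the person's own first value rather than the population intercept, which is worth stating explicitly since the fitted model's intercept plays no role.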

Next, I subtract each prediction from the observed subsequent value to get a prediction error. Some people had multiple serial values and so contribute more than one prediction and error. To estimate how well the model scores external data, I put all of these errors into a multilevel model to account for errors being clustered within the same person, and I use the estimate from the MLM to describe the mean prediction error while controlling for repeat observations.
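A random-intercept model on the errors can be sketched without a mixed-model library: estimate the within- and between-person variance components by method of moments (the one-way ANOVA estimator) and take a GLS-weighted mean of the per-person mean errors. This is a simplified stand-in for the MLM described above (statsmodels' MixedLM or lme4 would be the usual tools):

```python
import numpy as np

def clustered_mean_error(errors, person_ids):
    """Estimate the mean prediction error accounting for clustering by
    person: a random-intercept model with method-of-moments variance
    components and a GLS-weighted mean of per-person mean errors."""
    errors = np.asarray(errors, dtype=float)
    ids = np.asarray(person_ids)
    groups = [errors[ids == g] for g in np.unique(ids)]
    n_j = np.array([len(g) for g in groups])
    ybar_j = np.array([g.mean() for g in groups])
    N, k = n_j.sum(), len(groups)

    # within-person (residual) variance
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    sigma2_e = ss_within / (N - k)

    # between-person variance via the one-way ANOVA estimator
    grand = errors.mean()
    ss_between = (n_j * (ybar_j - grand) ** 2).sum()
    n0 = (N - (n_j ** 2).sum() / N) / (k - 1)
    sigma2_u = max((ss_between / (k - 1) - sigma2_e) / n0, 0.0)

    # GLS weights: each person's mean has variance sigma2_u + sigma2_e / n_j
    w = 1.0 / (sigma2_u + sigma2_e / n_j)
    mean_error = (w * ybar_j).sum() / w.sum()
    se = np.sqrt(1.0 / w.sum())
    return mean_error, se
```

The key behavior is that people with many serial values don't dominate the mean: once the between-person variance is non-trivial, each person's weight is capped near 1/sigma2_u regardless of how many observations they contribute.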

That's basically the whole process: fit a model, apply it to external data, then put the resulting errors into another model to describe them. Does anyone see an issue with doing this?