What is the proper name for this kind of regression model: 2-stage or 2-step?

Dear community experts,

I've run into an interesting situation that is outside my area of expertise, so I need advice from someone more experienced in the field.

The problem is the following. Suppose we have a binary logit model; let's call it the old model. The model is estimated over a set of covariates and observations. Then we use the model to predict log-odds for new observations. The predicted log-odds are saved together with the new observations.

Now we do the following. We take the new observations and use them to develop a new model, also a binary logit, and we use the log-odds predicted by the old model as one of the covariates for the new model, plus some other variables. However, the observations are completely different.
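To make sure I am describing it clearly, here is a rough sketch of the setup with synthetic data (the variables and coefficients below are made up for illustration, and the logit fit is a hand-rolled Newton-Raphson, not the actual estimation code):

```python
import numpy as np

def fit_logit(X, y, iters=25):
    """Fit a binary logit by Newton-Raphson; X must include an intercept column."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(np.clip(-X @ beta, -30, 30)))
        # Newton step: beta += (X' W X)^{-1} X' (y - p), with W = diag(p(1-p))
        beta = beta + np.linalg.solve((X.T * (p * (1 - p))) @ X, X.T @ (y - p))
    return beta

rng = np.random.default_rng(0)

# --- "old" model, estimated on the old observations ---
n_old = 500
X_old = np.column_stack([np.ones(n_old), rng.normal(size=(n_old, 2))])
y_old = rng.binomial(1, 1 / (1 + np.exp(-(X_old @ np.array([-0.5, 1.0, -1.5])))))
b_old = fit_logit(X_old, y_old)

# --- "new" model on completely different observations: the old model's
# predicted log-odds become one covariate, plus another variable z ---
n_new = 500
X_for_old = np.column_stack([np.ones(n_new), rng.normal(size=(n_new, 2))])
log_odds_old = X_for_old @ b_old          # saved with the new observations
z = rng.normal(size=n_new)
X_new = np.column_stack([np.ones(n_new), log_odds_old, z])
y_new = rng.binomial(1, 1 / (1 + np.exp(-(0.2 + 0.8 * log_odds_old + 0.5 * z))))
b_new = fit_logit(X_new, y_new)           # second entry: coefficient on log_odds_old
```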

This is the part I cannot completely understand. It looks like we are using a predicted value to predict another value. I wonder what shortcomings/advantages this has. I mean, we can do that, but then we must be careful with confidence intervals and model interpretation, must we not?

I have read a bit about 2-stage regression models and instrumental variables, and they do have some similarities to this issue. But, as far as I understand, they run over the same set of observations, or am I wrong?

My area of expertise in statistics is a bit different, so I do not know whether I'm correct. Does this situation have a name? What do we call such models?

Is there any academic work on similar matters that I could use as a reference? Or maybe some nice examples with an explanation? Something like a 2-stage logit-logit with different data sets that are connected to each other in the way I described.
I tried to Google the matter, but that is hard to do when I do not know exactly what to search for.

I would appreciate any guidance on this problem.


Less is more. Stay pure. Stay poor.
I've never heard of such a thing, though I do not know a tremendous amount. It reminds me of cross-validation, or of using one dataset to hone a model and then trying to validate it on a comparable dataset. However, in those approaches you would carry over the chosen variables (covariates of interest), not predicted probabilities. When you say the next dataset is "completely different," do you also mean the sampling procedure used to procure it? A random sample is customary, or one year compared to the next. If it were truly a completely different dataset, that would seem problematic.

Propensity scores are calculated this way, but they are applied back into the model to adjust covariate levels (weights) when predicting the actual outcome. So that approach is still not like your description.
It is a completely different dataset with real observations; no random sampling was applied. The only thing is that the observations in the dataset for the new model have log-odds recorded according to the old model.

The problem now is the following: if we want to predict log-odds according to the new model for completely new observations, we first have to run the old model to predict the old log-odds, then substitute them into the new model to predict another set of log-odds. It does look like a 2-stage, instrumental-variable setup. But the new model was built over a completely different dataset... I'm lost here.
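Concretely, the prediction chain I mean looks like this (the coefficient vectors here are invented for illustration, not taken from the actual models):

```python
import numpy as np

# Hypothetical fitted coefficients, intercept first.
b_old = np.array([-0.5, 1.0, -1.5])   # old model: intercept, x1, x2
b_new = np.array([0.2, 0.8, 0.5])     # new model: intercept, old log-odds, z

def predict(x1, x2, z):
    """Two-step prediction: the old model's log-odds feed the new model."""
    log_odds_old = b_old @ np.array([1.0, x1, x2])            # step 1: old model
    log_odds_new = b_new @ np.array([1.0, log_odds_old, z])   # step 2: new model
    return 1.0 / (1.0 + np.exp(-log_odds_new))                # final probability

p = predict(x1=0.3, x2=-1.2, z=0.5)   # roughly 0.85 with these made-up numbers
```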


I do not know much about instrumental variables, but from what I do know, this does not seem relevant. I'm not sure where you are getting your description or definition of IV.

Why don't you rebuild the model using insights from the first model? From your description, I am not sure what the purpose of using the first model was or is. What variables are in the first model, and are those variables in the second model?


There are Bayesian approaches where you use "prior" information for probabilities, but I don't think you would do it this way.
Well, it is not me who is using this approach. I just happened to notice it in someone's work, and as a statistician I simply cannot stand it. I feel that it is wrong, but I want to find some proof. Or maybe I'm wrong, and it is perfectly fine to do it this way.

By the way, Bayesian statistics is my specialty, so I know how to use it. In this case these are plain binary logit models, estimated with the usual "frequentist" technique. Not Bayesian at all.

The purpose of using the output from the first model is that, according to the second model's estimates, it is statistically significant (p-value = 0), i.e. it helps to predict outcomes for the second model. But this p-value is wrong, because it does not take into account the "random" nature of that covariate.

The variables from the first model are not in the second. That was my first natural thought: why do we need this 2-stage "dirty" approach, if we could simply use the variables from the first model plus the variables from the second model and be done with it?

But in this case, even by common sense, without any statistical theory, it looks like this:
1) we predict something with one model; like any predicted value, it has some error;
2) we use the predicted value to predict another value, again with error.

It looks like we simply stack the errors, and that's it.
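To illustrate this point with a toy simulation (everything below is synthetic, and the logit fit is a hand-rolled Newton-Raphson; it is only a sketch of the argument, not the actual analysis): the naive standard error from the second fit treats the old log-odds as a fixed covariate, while a bootstrap that re-estimates both stages also propagates the first-stage error, and the two can be compared.

```python
import numpy as np

rng = np.random.default_rng(1)

def fit_logit(X, y, iters=25):
    """Newton-Raphson binary logit; returns (coefficients, naive covariance)."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(np.clip(-X @ beta, -30, 30)))
        H = (X.T * (p * (1.0 - p))) @ X          # Fisher information
        beta = beta + np.linalg.solve(H, X.T @ (y - p))
    return beta, np.linalg.inv(H)

n = 400
# stage 1 ("old") data
X1 = np.column_stack([np.ones(n), rng.normal(size=n)])
y1 = rng.binomial(1, 1 / (1 + np.exp(-(X1 @ np.array([-0.3, 1.2])))))
# stage 2 ("new") data: the covariate the old model uses, plus z
X_for_old = np.column_stack([np.ones(n), rng.normal(size=n)])
z = rng.normal(size=n)
true_lo = X_for_old @ np.array([-0.3, 1.2])
y2 = rng.binomial(1, 1 / (1 + np.exp(-(0.1 + 0.9 * true_lo + 0.6 * z))))

# naive two-stage fit: estimated old log-odds used as if they were fixed
b1, _ = fit_logit(X1, y1)
X2 = np.column_stack([np.ones(n), X_for_old @ b1, z])
b2, cov2 = fit_logit(X2, y2)
naive_se = np.sqrt(cov2[1, 1])                   # SE for the log-odds coefficient

# bootstrap BOTH stages, so the first-stage estimation error is propagated
boot = []
for _ in range(200):
    i1 = rng.integers(0, n, n)                   # resample old data, refit stage 1
    b1b, _ = fit_logit(X1[i1], y1[i1])
    i2 = rng.integers(0, n, n)                   # resample new data, refit stage 2
    X2b = np.column_stack([np.ones(n), (X_for_old @ b1b)[i2], z[i2]])
    b2b, _ = fit_logit(X2b, y2[i2])
    boot.append(b2b[1])
boot_se = float(np.std(boot))
# boot_se reflects the extra first-stage variability that naive_se ignores
```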

I just want to find a scholarly article about it, with a proof or a refutation of this way of using variables.