GLM Added Variable Plot

Buckeye

Active Member
#1
Hello, I've always been told to look at residual plots against omitted variables to see if there is any obvious pattern. A pattern would suggest that the omitted variable might be useful in the model. I would like to check my understanding.

1.) Suppose we have residuals from an existing model (call them Res1).
2.) If we have a new continuous variable (Xnew), we can build a model to predict Xnew using the existing variables in the model, then store these residuals (call them Res2).
3.) Finally, predict Res1 using Res2 as an independent variable. Will the resulting slope be the effect of the new variable on the original response, given all the other variables in the model?
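The three steps above can be sketched in code for the ordinary least squares case, where the equivalence is exact (the Frisch-Waugh-Lovell result). This is only a minimal sketch: the simulated data, the coefficients, and names such as `x1`, `x2`, `xnew`, and `ols` are assumptions made up for illustration, and for a non-Gaussian GLM the match would only be approximate.

```python
import numpy as np

# Simulated data (hypothetical): two existing predictors and a new one
# that is correlated with them.
rng = np.random.default_rng(0)
n = 500
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
xnew = 0.6 * x1 - 0.3 * x2 + rng.normal(size=n)
y = 1.0 + 2.0 * x1 - 1.0 * x2 + 0.5 * xnew + rng.normal(size=n)

def ols(X, target):
    """Least-squares coefficients for target ~ X."""
    beta, *_ = np.linalg.lstsq(X, target, rcond=None)
    return beta

X_old = np.column_stack([np.ones(n), x1, x2])

# Step 1: residuals of the response from the existing model (Res1).
res1 = y - X_old @ ols(X_old, y)

# Step 2: residuals of the new variable given the existing predictors (Res2).
res2 = xnew - X_old @ ols(X_old, xnew)

# Step 3: slope of Res1 on Res2 (no intercept needed, both have mean zero).
slope_avp = (res2 @ res1) / (res2 @ res2)

# Compare with the coefficient of xnew in the full model.
X_full = np.column_stack([X_old, xnew])
slope_full = ols(X_full, y)[-1]

print(slope_avp, slope_full)
```

In this linear setting the two printed slopes agree up to floating-point error, which is why plotting Res1 against Res2 is called an added variable plot.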

Does this procedure work with a binary variable? For example, can I use logistic regression in step 2?
 

hlsmith

Less is more. Stay pure. Stay poor.
#4
So you would have a linear model. How would you then have a logistic model unless you switched your DV?

I wonder if people still think the general process is good. I was not taught this and I am not experienced with it. I know residuals get used in two-stage least squares. But I guess a person would need strong context knowledge; otherwise they could be adding and removing a modifier, mediator, or collider, and accidentally treating it like a confounder if values were changing.

Do you have any reference to this approach?
 

Buckeye

Active Member
#6
I suppose I can summarize my question more generally than this specific procedure. If we want to find new variables to include in an existing model, can we take the existing residuals and try to predict them with new variables? Seems like a reasonable approach since a smaller residual means we can explain some amount of previously unexplained error.
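One caveat with regressing the existing residuals directly on the raw new variable can be sketched as follows, again assuming ordinary least squares. The simulated data and all names (`x1`, `xnew`, `ols`, `r2`) are hypothetical. When the new variable is correlated with the predictors already in the model, the slope from this shortcut is shrunk relative to the full-model coefficient by a factor of 1 - R^2, where R^2 comes from regressing the new variable on the existing predictors.

```python
import numpy as np

# Simulated data (hypothetical): the new variable overlaps with x1.
rng = np.random.default_rng(1)
n = 500
x1 = rng.normal(size=n)
xnew = 0.8 * x1 + rng.normal(size=n)
y = 1.0 + 2.0 * x1 + 0.5 * xnew + rng.normal(size=n)

def ols(X, target):
    """Least-squares coefficients for target ~ X."""
    beta, *_ = np.linalg.lstsq(X, target, rcond=None)
    return beta

X_old = np.column_stack([np.ones(n), x1])

# Residuals from the existing model.
res1 = y - X_old @ ols(X_old, y)

# Shortcut: regress the residuals on the raw new variable.
slope_raw = np.polyfit(xnew, res1, 1)[0]

# Full-model coefficient for comparison.
slope_full = ols(np.column_stack([X_old, xnew]), y)[-1]

# R^2 of the new variable on the existing predictors.
res2 = xnew - X_old @ ols(X_old, xnew)
xc = xnew - xnew.mean()
r2 = 1.0 - (res2 @ res2) / (xc @ xc)

print(slope_raw, slope_full * (1.0 - r2))
```

So the shortcut can still flag a useful variable, but its slope understates the effect unless the new variable is uncorrelated with what is already in the model.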
 

hlsmith

Less is more. Stay pure. Stay poor.
#7
Yeah, this rings familiar, given that in instrumental variables you do something slightly like this with two-stage least squares. Perhaps look that up as well. Also, the RESET test does something like looking at the residuals to try to make sure the IV and DV aren't flipped.

The process seems like it would require content knowledge and a reasonable amount of independence between IVs. I couldn't access the link yesterday, since it is an international site - I may check it out this weekend.
 

Buckeye

Active Member
#10
In the context of insurance rating models, I think this is what they're looking at when they want to create segmentation using new variables.