Instrumental variable approach in case of interacting endogenous predictors

Hello dear forum members,

Suppose I have the following model:

y = a + x1 + x2 + z1 + z2 + z1*z2, where y is exponential, x1 and x2 are controls, and z1 and z2 are the endogenous, interacting predictors of interest.

Assuming I want to correct for endogeneity using an IV approach, which would be the appropriate specification:

A. y = a + x1 + x2 + (z1 z2 = iv1 iv2) --> consider endogenous variables separately;
B. y = a + x1 + x2 + (z1*z2 = iv1 iv2) --> consider endogenous variables as an interaction;
C. or something else?

I'd appreciate your suggestions on this :)
Are you able to draw this out for us using a graph?
Thank you for the reply, hlsmith. Building on theory and prior research, I have a rather complex model -- see attached.

Note that while the DV and some predictors vary over time, others, including the regressors of interest, do not (marked in red). Since a fixed-effects specification is not feasible (except for the interactions, which become time-variant after multiplication), at this preliminary stage of the analysis I am using a Poisson estimator with population-averaged effects.


Less is more. Stay pure. Stay poor.
Hmm. I am not well enough versed in IV to help you out here, but I will ask for more clarification. So you have an IV, or a vector of IVs, that is exogenous to X1-X6?
All IVs (whether taken separately or as a vector) are exogenous with respect to y(it) and are correlated (.45) with X1/X6 -- following Terza et al. (2008a,b).

For the sake of example, we may consider a simplified version of the model omitting X2/X4, and thus consider the following "full" model with interactions:

Y = C1+C2+C3+X1+X5+X6+X1*X5+X1*X6+X5*X6+X1*X5*X6

As such, I suspect that any term that involves an endogenous Xi becomes endogenous too. Is this correct?

If so, then to correct for endogeneity using an IV approach, I need at least one IV per endogenous term to identify the model -- that is, 7(!) IVs for this model alone (X1, X5, X6, the three two-way products, and the three-way product).
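
The count can be checked mechanically. Here is a minimal Python sketch, assuming that X1, X5 and X6 are the endogenous regressors, that C1-C3 are exogenous, and that every product of the endogenous variables enters the model (the names are just labels copied from the equation):

```python
from itertools import combinations

endogenous = ["X1", "X5", "X6"]  # assumed endogenous regressors

# Every main effect and product that contains at least one endogenous variable
endogenous_terms = [
    "*".join(combo)
    for order in range(1, len(endogenous) + 1)
    for combo in combinations(endogenous, order)
]

print(endogenous_terms)
print(len(endogenous_terms))  # minimum number of instruments (order condition)
```

Under these assumptions the list has 7 entries. This only checks the order condition (at least as many instruments as endogenous terms); it says nothing about instrument strength.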

Assuming I have the necessary IVs, is my logic for testing the endogenous effects correct?


Terza, J. V., Basu, A., and Rathouz, P. J. 2008a. "Two-Stage Residual Inclusion Estimation: Addressing Endogeneity in Health Econometric Modeling," Journal of Health Economics (27:3), pp. 531-543.
Terza, J. V., Bradford, W. D., and Dismuke, C. E. 2008b. "The Use of Linear Instrumental Variables Methods in Health Services Research and Health Economics: A Cautionary Note," Health Services Research (43:3), pp. 1102-1120.


Unfortunately, I am probably less savvy at this than you are. I have just been trying to gain a little more knowledge here and there on IV, so the interaction part is foreign to me. I will take a look at the "Counterfactuals and Causal Inference" book by Morgan and Winship when I get home tonight and see if they mention this.

I will also give GretaGarble a PM, since she probably knows this better than most of us here at TS.
Hi everyone,

Just wanted to provide some updates on the topic. So, whenever an endogenous variable is involved in the interaction, the interaction becomes endogenous too and has to be "corrected".
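
To make the mechanics concrete, here is a minimal simulated sketch in Python (numpy only). Everything in it is an assumption made for illustration: the outcome is linear rather than exponential, the data-generating process and coefficients are made up, and the product iv1*iv2 is used as the extra instrument for z1*z2, which relies on the instruments being independent of the error rather than merely uncorrelated with it.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Hypothetical data-generating process (all numbers are made up)
iv1, iv2 = rng.normal(size=n), rng.normal(size=n)
u = rng.normal(size=n)                      # unobserved confounder
z1 = 0.8 * iv1 + u + rng.normal(size=n)     # endogenous: depends on u
z2 = 0.8 * iv2 + u + rng.normal(size=n)     # endogenous: depends on u
true_beta = np.array([1.0, 0.5, -0.3, 0.4])  # const, z1, z2, z1*z2
y = (true_beta[0] + true_beta[1] * z1 + true_beta[2] * z2
     + true_beta[3] * z1 * z2 + 2.0 * u + rng.normal(size=n))

def ols(X, target):
    """Least-squares coefficients of target on the columns of X."""
    return np.linalg.lstsq(X, target, rcond=None)[0]

const = np.ones(n)
X = np.column_stack([const, z1, z2, z1 * z2])     # three endogenous columns
Z = np.column_stack([const, iv1, iv2, iv1 * iv2])  # one instrument per column

beta_ols = ols(X, y)

# 2SLS: project every regressor (including the product) on all instruments,
# then regress y on the fitted values
X_hat = Z @ ols(Z, X)
beta_2sls = ols(X_hat, y)

print("true:", true_beta)
print("OLS: ", beta_ols.round(3))
print("2SLS:", beta_2sls.round(3))
```

The point of the sketch is that z1*z2 gets its own first stage, like any other endogenous regressor, instead of being built from the fitted values of z1 and z2. In this particular Gaussian setup the OLS bias shows up in the main effects; the standard errors from the second-stage regression would not be valid as computed here, so a dedicated IV routine is needed for inference.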

IVs are the most common remedy. However, one needs at least one (ideally strong) IV for each endogenous regressor to identify the model. When IVs are unavailable or weak, I'd recommend looking at Lewbel's (2012) approach with "generated" instruments. For a Stata implementation, see -ivreg2h, gen [fe]-.

Exponential models should be estimated using (1) Poisson with a fixed-effects specification, which is robust to over-dispersion and serial dependence (Wooldridge, 2015; I can provide the reference if interested), or (2) a population-averaged specification, if one is interested in the regressor's hypothetical effect averaged across a given population. Note that whereas the former precludes estimation of the effects of time-invariant variables, the latter does not.

Now, some interesting stuff that I came across recently: a random-effects specification with heterogeneity bias modeled using Mundlak's (1978) formulation -- has anyone tried this?

See: Bell, A., & Jones, K. (2015). Explaining fixed effects: Random effects modeling of time-series cross-sectional and panel data. Political Science Research and Methods, 3(01), 133-153.
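
For anyone who wants to see the idea in its simplest form, here is a minimal Python sketch (numpy only). It is a simplification and rests on assumptions: a linear outcome instead of a Poisson one, pooled OLS instead of random-effects GLS, a single time-varying regressor, and simulated data with made-up coefficients. The Mundlak device is just adding the unit-level mean of x as an extra regressor.

```python
import numpy as np

rng = np.random.default_rng(1)
n_units, n_periods = 500, 6
unit = np.repeat(np.arange(n_units), n_periods)   # unit id for each row

# Hypothetical panel: x is correlated with the unobserved unit effect
alpha = rng.normal(size=n_units)
x = 0.7 * alpha[unit] + rng.normal(size=unit.size)
y = 2.0 * x + alpha[unit] + rng.normal(size=unit.size)

def unit_mean(v):
    """Mean of v within each unit, expanded back to one value per row."""
    sums = np.bincount(unit, weights=v, minlength=n_units)
    counts = np.bincount(unit, minlength=n_units)
    return (sums / counts)[unit]

x_bar = unit_mean(x)

# Naive pooled regression that ignores the unit effect
beta_pooled = np.linalg.lstsq(
    np.column_stack([np.ones_like(x), x]), y, rcond=None)[0]

# Mundlak-style regression: add the unit mean of x as a regressor
beta_mundlak = np.linalg.lstsq(
    np.column_stack([np.ones_like(x), x, x_bar]), y, rcond=None)[0]

# Within (fixed-effects) estimator for comparison
x_w, y_w = x - x_bar, y - unit_mean(y)
beta_within = (x_w @ y_w) / (x_w @ x_w)

print("pooled slope: ", round(beta_pooled[1], 3))
print("Mundlak slope:", round(beta_mundlak[1], 3))
print("within slope: ", round(beta_within, 3))
```

In this linear setup the coefficient on x in the Mundlak-style regression coincides with the within estimator, while time-invariant regressors could still be added to the model. Whether that carries over cleanly to the Poisson case discussed above is something I would check against the references rather than take from this sketch.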