Instrumental variables


Fortran must die
Instrumental variables must not be correlated with the error term. How can you test if they are? The only way I can think of is testing their correlation with the residuals, and I don't think the residuals are the same as the unseen error term.

How do you know if the instrumental variable is correlated with the error term?


Not a robit
Sorry, I did not see you made this post. I will elaborate on my prior chatbox comment.

An instrumental variable (IV) would be correlated with an outcome's error term if there were a confounding variable (an exogenous variable, AKA a parent or antecedent) connecting the IV and the outcome, creating a backdoor path.

I believe you need to have collected data on the potential confounder in order to check whether the IV is endogenous with respect to it. Vinux mentioned an endogeneity test, though I think you need data on the confounder for that test too. If you have data on the suspected confounder, you could also fit the model with and without that variable and look for a change in the IV coefficient (a change of 10% is a typical threshold in epidemiology).
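To make the with/without comparison concrete, here is a minimal sketch on simulated data. Everything in it is an assumption for illustration: the variable names, the effect sizes, the helper `ols_coefs`, and the choice to express the change relative to the adjusted estimate. It only works because the confounder `c` is measured here, which is exactly the situation you may not be in.

```python
import numpy as np

def ols_coefs(y, *columns):
    """OLS coefficients of y on an intercept plus the given columns."""
    X = np.column_stack([np.ones(len(y)), *columns])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

rng = np.random.default_rng(0)
n = 5000
c = rng.normal(size=n)                        # measured confounder (simulated)
z = 0.8 * c + rng.normal(size=n)              # instrument, contaminated by c
x = 0.7 * z + rng.normal(size=n)              # exposure driven by the instrument
y = 0.5 * x + 1.0 * c + rng.normal(size=n)    # outcome, also driven by c

# Coefficient on the instrument in the outcome model, without and with c.
b_without = ols_coefs(y, z)[1]
b_with = ols_coefs(y, z, c)[1]

# Change-in-estimate, here taken relative to the adjusted coefficient.
pct_change = 100 * abs(b_without - b_with) / abs(b_with)
print(f"without c: {b_without:.3f}, with c: {b_with:.3f}, change: {pct_change:.0f}%")
```

A change well above 10% in this setup flags `c` as a confounder of the instrument-outcome relationship; with a clean instrument (no `c` term in `z`) the two coefficients should be close.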

Since we don't know what we don't know, I believe you are out of luck and would have to make an assumption based on your content knowledge for the topic being examined.

Another possibility, when you have repeatedly collected data, is the use of a negative control. I am not well informed on this topic. To my knowledge it can be a variable correlated with, in our case perhaps, the IV, but one that should not be associated with the outcome because it is a future measurement of the variable. If it is associated with the outcome anyway, that indicates confounding. As I said, this is still an area I am not overly familiar with.

P.S. GG is our IV-knowledgeable contributor, so perhaps ask her as well. Also, if you are thinking of using an IV, that would signify a suspected confounder further down the causal path. Is this so, and do you have no data on that confounder either?


Not a robit
I was just thinking about this question again and had a little more information to add.

A confounder (or set of confounding variables) may affect the IV in multiple ways. First, it could fit into what may be lumped under a pseudo-(un)faithfulness assumption, meaning its effect on the outcome goes in the exact opposite direction (opposite sign) and has the exact same magnitude as the IV's, thus cancelling out the IV's identifiable association.

Or the confounder could go in the opposite direction to a greater or lesser extent, meaning it changes the magnitude of the IV's effect on the outcome without exactly cancelling it. This is similar to the former example, except that it does not perfectly wash the effect away.

Next, you could have a confounder that goes in the same direction as the IV or, if the IV has no real effect, creates an association. If this is suspected, some people use a sensitivity analysis: they state what magnitude the unknown confounder would need to have in order to account for the entirety of the effect or to reverse it.

So the take-home message: if the IV has a decent-sized effect on the outcome, the suspected confounder would also need to be large to meaningfully alter the IV's influence on the outcome. There could also be a mixture of confounders doing all types of things with the outcome and the IV, which makes interpretation difficult.
Hello there!

Checking the correlation between the IVs and the second-stage residuals is at best an informal solution. Assuming you want to check the correlation between the IV and the error in order to establish the validity of the instrument, I'd say the following. Formally establishing instrument validity is quite complex and requires a comprehensive approach. I personally prefer the summary provided by Nichols (2007); see Section 4. Among the approaches covered there is the Wu-Hausman test. It is available, for example, in Stata's -ivendog- command. Failure to reject the null indicates exogeneity of the potentially endogenous regressor that was instrumented with (presumably) valid IVs.
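One reason the residual check is only informal: with exactly one instrument per endogenous regressor (the just-identified case), the IV estimator solves Z'(y - Xb) = 0, so the instrument is uncorrelated with the IV residuals by construction, even when it is correlated with the true error. The sketch below illustrates this on simulated data with a deliberately invalid instrument; the data-generating numbers and variable names are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000
u = rng.normal(size=n)                        # true error (unobserved in practice)
z = 0.6 * u + rng.normal(size=n)              # INVALID instrument: correlated with u
x = 1.0 * z + 0.5 * u + rng.normal(size=n)    # endogenous regressor
y = 2.0 * x + u                               # true slope is 2.0

Z = np.column_stack([np.ones(n), z])          # instruments (with intercept)
X = np.column_stack([np.ones(n), x])          # regressors (with intercept)

# Just-identified IV estimator: solve Z'X b = Z'y.
beta_iv = np.linalg.solve(Z.T @ X, Z.T @ y)
resid = y - X @ beta_iv

corr_with_resid = np.corrcoef(z, resid)[0, 1]  # zero up to rounding, by construction
corr_with_error = np.corrcoef(z, u)[0, 1]      # clearly nonzero: instrument is invalid
print(f"corr(z, residual) = {corr_with_resid:.2e}")
print(f"corr(z, true error) = {corr_with_error:.3f}")
print(f"IV slope = {beta_iv[1]:.3f} (true value 2.0)")
```

The residual correlation comes out at rounding-error level while the slope estimate is biased, so a near-zero correlation with the residuals says nothing about validity here. With more instruments than endogenous regressors the residuals are no longer orthogonal to every instrument, which is what overidentification tests exploit.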

Hope this helps :)


New Member
A simple way to test for endogeneity:

H0: x is exogenous
Ha: x is endogenous

Step 1: Regress each suspect x on all the other x variables and the instrumental variables
Step 2: Keep the residuals from Step 1
Step 3: Regress y on all the x variables (but not the instrumental variables), plus the residuals from Step 2
Step 4: Test the coefficient on the residual variable. If it is significant, reject H0: x is endogenous. If it is insignificant, you fail to reject exogeneity.
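
The steps above can be sketched in a few lines of NumPy. This is an illustration on simulated data, not a definitive implementation: the helper `ols`, the variable names, the effect sizes, the plain homoskedastic standard errors, and the 1.96 cutoff (normal approximation at the 5% level) are all assumptions made for the example. Note that the test relies on the instrument itself being valid.

```python
import numpy as np

def ols(y, X):
    """Return OLS coefficients, homoskedastic standard errors, and residuals."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    return beta, se, resid

rng = np.random.default_rng(2)
n = 5000
u = rng.normal(size=n)                        # structural error
z = rng.normal(size=n)                        # instrument (valid by construction)
w = rng.normal(size=n)                        # exogenous control
x = 0.8 * z + 0.3 * w + 0.7 * u + rng.normal(size=n)   # endogenous through u
y = 1.0 * x + 0.5 * w + u

const = np.ones(n)

# Steps 1-2: regress x on the other regressors and the instrument, keep residuals.
_, _, v_hat = ols(x, np.column_stack([const, w, z]))

# Step 3: regress y on the regressors (not the instrument) plus those residuals.
beta, se, _ = ols(y, np.column_stack([const, x, w, v_hat]))

# Step 4: t-test on the residual term (last column).
t_stat = beta[3] / se[3]
endogenous = abs(t_stat) > 1.96
print(f"t on first-stage residual = {t_stat:.1f}, reject exogeneity: {endogenous}")
```

Because x was simulated to depend on u, the residual term should come out strongly significant. Dropping the `0.7 * u` term from x makes x exogenous, and the t statistic should then usually fall inside the +/-1.96 band.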