Is there a test for omitted variable bias?

#1
I study finance and economics, and every time I read an econometric study using OLS regression I wonder how the author can be sure there is no omitted variable bias. My guess is that this bias is present in almost every economic study that uses regression.
Is there a method or test to check whether this bias is present?
I haven't found one.
 

noetsi

Fortran must die
#2
I am not an economist; this is my view based on what I have read about regression. Caveat emptor. :)

There are practical problems with such an analysis. Every model leaves out pertinent variables, because many variables influence reality and authors only study a few. So it's nearly certain that every analysis has this type of problem. Knowing you have the problem, which may bias the slopes, and figuring out how it biases the slopes, is extremely difficult (if not impossible) to do.
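To make that concrete, here is a small simulation (my own illustration, not something from this thread) of the textbook omitted-variable-bias result: leaving q out of y = b0 + b1*x + b2*q + e shifts the OLS slope on x by b2 times the slope from regressing q on x.

```python
# Minimal sketch, assuming numpy and statsmodels are available.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 100_000
x = rng.normal(size=n)
q = 0.6 * x + rng.normal(size=n)             # pertinent variable, correlated with x
y = 1.0 + 2.0 * x + 1.5 * q + rng.normal(size=n)

short = sm.OLS(y, sm.add_constant(x)).fit()  # q deliberately omitted
print(f"slope on x: {short.params[1]:.2f}")  # about 2.9 = 2.0 + 1.5 * 0.6, not 2.0
```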

A theoretical (or philosophical) reason you cannot do this is that running such a test would require being able to estimate the true model. And many doubt a true model even exists (it's almost certain we won't know it in economics; if we did, no one would be running models, since the reality would already be known).

The closest thing I can think of to a test of this is a version of White's test, which, while primarily aimed at heteroscedasticity, also picks up a misspecified model. But the test won't tell you whether heteroscedasticity or misspecification is behind a rejection. Another way you might spot a misspecified model is outliers: misspecification might be behind the strange outliers you find.
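For anyone who wants to try it, here is a minimal sketch (assuming statsmodels; the data are simulated, so treat it as illustrative only) where White's test flags a model whose problem is actually an omitted squared term:

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_white

rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + 1.5 * x**2 + rng.normal(size=n)  # true model is quadratic

X = sm.add_constant(x)                    # fitted model omits the squared term
res = sm.OLS(y, X).fit()

lm_stat, lm_pval, f_stat, f_pval = het_white(res.resid, X)
print(f"White test p-value: {lm_pval:.4f}")  # small p: heteroscedasticity and/or
                                             # misspecification; it can't say which
```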
 

hlsmith

Not a robit
#3
There is no test. You can try to quantify how large a lurking variable would have to be to negate your results; you can do this via external or internal record-level sensitivity analysis.
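One concrete flavor of such a sensitivity analysis is the E-value of VanderWeele and Ding; a minimal sketch (the numbers here are made up for illustration):

```python
import math

def e_value(rr: float) -> float:
    """E-value for an observed risk ratio rr > 1: the minimum strength of
    association an unmeasured confounder would need with both the exposure
    and the outcome to fully explain away the observed association."""
    return rr + math.sqrt(rr * (rr - 1.0))

print(f"{e_value(2.0):.2f}")  # about 3.41 for an observed risk ratio of 2.0
```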
 

hlsmith

Not a robit
#4
You don't know what you don't know. If you do know you are missing a variable of interest, you can do things like Bayesian or Monte Carlo simulation to try to figure out its possible effect.
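As a rough sketch of the Monte Carlo idea (my own construction, with made-up numbers): draw plausible values for the omitted variable's strength and see how far the observed slope could sit from the causal one, using the classic bias formula (bias = effect of the omitted variable on y, times the slope of the omitted variable on x):

```python
import numpy as np

rng = np.random.default_rng(1)
observed_slope = 0.9                         # hypothetical OLS estimate

beta_q = rng.uniform(0.0, 1.0, 100_000)      # assumed effect of omitted q on y
delta = rng.uniform(0.0, 0.5, 100_000)       # assumed slope of q on x
adjusted = observed_slope - beta_q * delta   # bias-corrected slope, per draw

lo, hi = np.percentile(adjusted, [2.5, 97.5])
print(f"adjusted slope: mean {adjusted.mean():.2f}, 95% range [{lo:.2f}, {hi:.2f}]")
```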
 
#5
You don't know what you don't know. If you do know you are missing a variable of interest, you can do things like Bayesian or Monte Carlo simulation to try to figure out its possible effect.
So... in a complex system (and almost every real-life inference problem sits inside a complex system), omitted variable bias is almost certain.
Not very well ...
 

noetsi

Fortran must die
#6
This goes to what Box said: all models are wrong, but some are useful. Since you will never know the true causal relationships, you will almost always have omitted some important variables.
 

hlsmith

Not a robit
#7
If you do not randomize the exposure and also control for extraneous factors, you can never truly know whether an omitted variable is impacting the outcome. With enough evidence and causal assumptions you can presume a causal relationship, but you always have to posit that a confounder could be present. Sir Ronald Fisher sided with the cigarette companies in the debate over whether smoking causes cancer. Cornfield then made a solid case for the relationship, and the Surgeon General issued a warning. More recently, however, a genetic marker has been discovered that appears to be linked both to how much people smoke and to their cancer rates. Does smoking cause cancer? On average, yes. Are there secondary contributing factors? Yes. In addition, even in studies where everything is collected, conclusions can be wrong due to selection bias or poor modeling of the relationships between variables.
 

hlsmith

Not a robit
#9
Suppose there are two groups of kids: one gets a computer, one doesn't. Later you see which group was more likely to pass a test. Well, was it the computer, or did one group have a disproportionately high number of honors students? The computer could have helped, or those kids would have done better regardless. Randomize who gets the computer and you get the same proportion of smart kids in each group, so the only systematic difference is the computer.
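A toy simulation of exactly this story (names and numbers are mine, purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(42)
n = 10_000
ability = rng.normal(size=n)                  # unobserved "honors student" factor
true_effect = 0.2                             # the computer's real effect

# Confounded: high-ability kids are more likely to get the computer
got_pc = rng.random(n) < 1 / (1 + np.exp(-2 * ability))
score = ability + true_effect * got_pc + rng.normal(scale=0.5, size=n)
print(f"confounded: {score[got_pc].mean() - score[~got_pc].mean():.2f}")  # inflated

# Randomized: a coin flip breaks the link between ability and the computer
got_pc = rng.random(n) < 0.5
score = ability + true_effect * got_pc + rng.normal(scale=0.5, size=n)
print(f"randomized: {score[got_pc].mean() - score[~got_pc].mean():.2f}")  # near 0.2
```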
 
#11
Forgive me if I ask some questions that may be trivial for someone with a very strong statistics background, unlike me, but:

First of all:
In economics and finance the unconfoundedness assumption is implicit and constant in EVERY paper.
Imho, making that assumption implicitly in order to draw causal inferences is really inexcusable, because it is very likely to be false.
So I don't understand the great number of scientific economics papers that rely on that fragile assumption.
I am no longer able to read a paper once a linear regression with causal ambitions appears in it... it is becoming almost an obsession for me.
Regression coefficients, by definition, have no causal meaning.
They only minimize a loss function (squared error in OLS, for example).
So WHY did causal inference take hold in regression models?

Second:
I don't understand how a randomized inference procedure can be applied to an economics study.
For example: say I want to infer the causal factors of an economic or financial phenomenon (cross-sectional, so as not to complicate everything with time series).
I don't know exactly which factors are causal.
I guess there are 3 probable causal regressors, but I'm not sure, and there could be more than 3; still, I start my regression model with those 3 regressors.
So: how can I check whether those 3 regressors are causal? Is it possible? I don't understand how the randomized procedure can help me.

Is it possible in a complex system to find causal factors?

Example: y = log(GDP) (cross-sectional) or y = Δlog(GDP) (time series)

Imho it is not possible to find the causal factors of that y, either cross-sectionally or over time.
You can find some approximate causal factors, but not a causal regression model with unbiased and consistent estimators.
 
#12
Your skepticism is well founded. Statistics don't equal causality, but you can add causal assumptions to generate greater evidence. There are usually three main assumptions in the process, which are not satisfiable in all contexts, so in those cases extra considerations are used: exchangeability (i.e., unconfoundedness or conditional unconfoundedness), no interference (i.e., one unit's treatment doesn't affect another unit's outcome), and a well-defined intervention. Other things can be incorporated as well; I usually think about the Bradford Hill criteria (temporality, dose-response, biological plausibility, consistency, specificity where possible, etc.) and contagion, as well as sensitivity analyses. If these things are confirmable or reasonable, then the results of the model can be assumed to be possibly causal. In addition, a model is just an approximation of the real data-generating process, so the model also has to be correct in the above scenario.

I am not an economics person, but they try to get closer to the randomization paradigm via instrumental variables and things like regression discontinuity. Still, unless the data are generated in a vacuum in a laboratory with treatment assignment controlled, you have never truly confirmed causality. That said, most experts in causal inference will agree that you can make causal claims if you are able to meet most of the causal assumptions.
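For a flavor of the instrumental-variables idea, here is a bare-bones two-stage least squares sketch on simulated data (assuming statsmodels; every name and number is mine, not from any study):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 5_000
u = rng.normal(size=n)                        # unobserved confounder
z = rng.normal(size=n)                        # instrument: shifts x, unrelated to u
x = 0.8 * z + u + rng.normal(size=n)          # endogenous regressor
y = 1.0 + 0.5 * x + u + rng.normal(size=n)    # true causal effect of x is 0.5

ols = sm.OLS(y, sm.add_constant(x)).fit()
print(f"OLS:  {ols.params[1]:.2f}")           # biased away from 0.5 by u

# Stage 1: project x onto the instrument; stage 2: regress y on the fitted values.
# (Point estimate only; the naive second-stage standard errors are not correct.)
x_hat = sm.OLS(x, sm.add_constant(z)).fit().fittedvalues
iv = sm.OLS(y, sm.add_constant(x_hat)).fit()
print(f"2SLS: {iv.params[1]:.2f}")            # close to the true 0.5
```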

Does this help?
 
#13
I am not an economics person, but they try to get closer to the randomization paradigm via instrumental variables and things like regression discontinuity. Still, unless the data are generated in a vacuum in a laboratory with treatment assignment controlled, you have never truly confirmed causality. That said, most experts in causal inference will agree that you can make causal claims if you are able to meet most of the causal assumptions.
But in the economics and financial world you are not in a laboratory... you receive the data from the markets or the real world, period.
For example: instrumental variables are a theoretical matter. They are not practicable in reality when you are inspecting a complex system such as financial markets.


Anyway, thank you for answers.
 
#14
No, I agree that in economics it is hard to establish causality. Another design is the interrupted time series with a negative control.
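A bare-bones sketch of that design's workhorse, segmented regression (simulated data, my own illustration; a negative-control series would be a second outcome that should show no break at the intervention):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
t = np.arange(48)                              # e.g., 48 monthly observations
post = (t >= 24).astype(float)                 # intervention at month 24
y = 10 + 0.1 * t + 2.0 * post + 0.05 * post * (t - 24) + rng.normal(0, 0.5, 48)

X = sm.add_constant(np.column_stack([t, post, post * (t - 24)]))
res = sm.OLS(y, X).fit()
print(res.params)  # [baseline level, pre-trend, level shift, slope change]
```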

The more you know about study design and analytics, the less you trust grandiose study conclusions. But studies still provide little pieces of knowledge.