Omitted variable bias


No cake for spunky
I was originally concerned about unobserved heterogeneity, and read an interesting article by Jake on that. However, his article (much of which was beyond me :p) raised a different question, one that seems to apply to linear regression as well as logistic regression, although it is perhaps worse in the latter. This is the article (and I know the question I am raising is different from the focus of the article).

Logistic regression is not fucked | Cookie Scientist

"The second rhetorical question from above asked, “how can we interpret the slopes from any logistic regression model that we estimate, since we know that the estimates would change as soon as we included additional relevant covariates, even when there’s no confounding?” The answer is that we interpret them conditional on all and only the covariates that were included in the model. Again, conceptually speaking, the coefficients refer to a population in which we know the values of the covariates represented in the model and nothing more. There’s no problem with comparing these coefficients between samples or over time as long as these coefficients refer to the same population, that is, populations where the same sets of covariates are observed."

If I understand this correctly, it means that in regression you interpret the slopes in the specific context of the model you built, and that they may well be different if other variables are in the model.

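To make the quoted point concrete for myself, here is a minimal sketch of a logistic slope changing when a covariate is dropped even though there is no confounding. Everything in it is an assumption of mine for illustration, not something from the article: a binary predictor x, a covariate z that is standard normal and independent of x, and made-up coefficients of 1.0 on x and 2.0 on z. Because x is binary, the slope from a logistic regression of y on x alone equals the marginal log odds ratio, so no model fitting is needed; the code just averages over z numerically.

```python
import math

def expit(t):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-t))

def marginal_prob(x, b_x, b_z, n_grid=4001, z_max=8.0):
    """P(y=1 | x) after averaging over z ~ N(0, 1), via a simple Riemann sum."""
    step = 2.0 * z_max / (n_grid - 1)
    total = 0.0
    weight_sum = 0.0
    for i in range(n_grid):
        z = -z_max + i * step
        w = math.exp(-0.5 * z * z)  # unnormalised standard normal density
        total += w * expit(b_x * x + b_z * z)
        weight_sum += w
    return total / weight_sum

def log_odds(p):
    return math.log(p / (1.0 - p))

# Hypothetical coefficients, chosen only for illustration.
B_X, B_Z = 1.0, 2.0

# Slope on x when z is in the model (true by construction).
conditional_slope = B_X

# Slope on x when z is left out: the marginal log odds ratio for x.
marginal_slope = (log_odds(marginal_prob(1, B_X, B_Z))
                  - log_odds(marginal_prob(0, B_X, B_Z)))

print("slope on x with z in the model:", conditional_slope)
print("slope on x with z left out:   ", round(marginal_slope, 3))
```

Under these assumptions the slope with z left out comes out noticeably smaller than 1.0, even though z is unrelated to x. As far as I can tell, that is the "estimates would change ... even when there's no confounding" part of the quote.
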
The practical problem I have with this (with the reality behind it, not with the argument itself) is that in the real world you will almost always leave variables out, and some of those will be strongly associated with both the predictors in the model and the DV, so they will bias the slope. And in practice, if not in theory, you are unlikely to know these relationships, so you can't correct the slope (something I have never seen done in any article I have read).
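A small simulation of what I mean, for the linear case. This is only a sketch under assumptions I made up: an omitted variable z, a predictor x = 0.8*z + noise, and an outcome y = 1.0*x + 2.0*z + noise, with all noise standard normal. The function names are mine. It compares the slope on x when z is left out with the slope when z is included.

```python
import random

def simulate(n=20000, b_x=1.0, b_z=2.0, a=0.8, seed=0):
    """Draw (x, z, y) where z drives both x and y (all values hypothetical)."""
    rng = random.Random(seed)
    xs, zs, ys = [], [], []
    for _ in range(n):
        z = rng.gauss(0, 1)
        x = a * z + rng.gauss(0, 1)
        y = b_x * x + b_z * z + rng.gauss(0, 1)
        xs.append(x)
        zs.append(z)
        ys.append(y)
    return xs, zs, ys

def mean(v):
    return sum(v) / len(v)

def cov(u, v):
    mu, mv = mean(u), mean(v)
    return sum((p - mu) * (q - mv) for p, q in zip(u, v)) / len(u)

def slope_short(xs, ys):
    """OLS slope from regressing y on x only (z omitted)."""
    return cov(xs, ys) / cov(xs, xs)

def slope_long(xs, zs, ys):
    """OLS coefficient on x from regressing y on x and z (2x2 normal equations)."""
    sxx, szz, sxz = cov(xs, xs), cov(zs, zs), cov(xs, zs)
    sxy, szy = cov(xs, ys), cov(zs, ys)
    return (sxy * szz - szy * sxz) / (sxx * szz - sxz ** 2)

xs, zs, ys = simulate()
short = slope_short(xs, ys)
long_ = slope_long(xs, zs, ys)

print("slope on x, z omitted: ", round(short, 3))
print("slope on x, z included:", round(long_, 3))
```

With these made-up numbers the textbook bias is b_z * cov(x, z) / var(x) = 2.0 * 0.8 / 1.64, or about 0.98, so the slope with z omitted should land near 1.98 instead of the true 1.0. Of course, the simulation only works because I know z and its coefficients, which is exactly what you do not know in practice.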

For theory building this might not matter. You can say that if this is true, that would happen. But I am not a theorist; I want to tell a group of managers that if they do X, Y will happen. And I honestly don't see how you can do this given this issue. It is made worse in my field because we have almost no theory about how X would impact Y. Maybe regression is not useful for non-theoretical issues (a disturbing thought to me, having spent a lot of years trying to learn it in order to give practical advice)?

Maybe practitioners should go back to descriptives and hope. :p