regression issues

noetsi

Fortran must die
#1
Ok, as I read a very good Gelman book on regression, some questions have come up.

He stresses the use of robust methods in dealing with outliers. My question is how many outliers there have to be, and how serious they have to be, before you change your method. I think most data sets (mine have thousands of points) will have some serious outliers. There are many ways to define and detect outliers, but I remain unclear on when you should be concerned enough about them to do things like transformations, robust regression, changing the assumed distribution, etc.
 

noetsi

Fortran must die
#2
A related question. No model will be perfect. You will always omit some variables that predict the DV. Some of them will be related to factors in the model (given all the factors that exist in the real world, how can this not be true, as a former professor reminded me).

So how do you deal with omitted variable bias?
 

hlsmith

Less is more. Stay pure. Stay poor.
#3
Ok, as I read a very good Gelman book on regression, some questions have come up.

He stresses the use of robust methods in dealing with outliers. My question is how many outliers there have to be, and how serious they have to be, before you change your method. I think most data sets (mine have thousands of points) will have some serious outliers. There are many ways to define and detect outliers, but I remain unclear on when you should be concerned enough about them to do things like transformations, robust regression, changing the assumed distribution, etc.

I have told you this before: run the model with and without the extreme outliers. If dropping them doesn't substantially change the estimates, you are fine.
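
A minimal sketch of that with/without comparison, in Python with NumPy. Everything here is an assumption for illustration: the data are simulated, the helper names (`ols_coefs`, `compare_with_without_outliers`) are made up, and "extreme" is defined as a residual more than 3 robust standard deviations from the median, which is only one of many possible rules.

```python
import numpy as np

def ols_coefs(x, y):
    """Return [intercept, slope] from an ordinary least squares fit."""
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

def compare_with_without_outliers(x, y, z_cut=3.0):
    """Fit on all points, then refit after dropping points whose residual
    from the full fit lies more than z_cut robust SDs from the median."""
    full = ols_coefs(x, y)
    resid = y - (full[0] + full[1] * x)
    centered = resid - np.median(resid)
    # 1.4826 * MAD estimates the SD for normally distributed residuals
    robust_sd = 1.4826 * np.median(np.abs(centered))
    keep = np.abs(centered) <= z_cut * robust_sd
    trimmed = ols_coefs(x[keep], y[keep])
    return full, trimmed, int((~keep).sum())

# Simulated data (hypothetical): true intercept 1, true slope 2,
# with ten gross outliers added to the outcome.
rng = np.random.default_rng(0)
n = 2000
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)
y[:10] += 25.0

full, trimmed, n_dropped = compare_with_without_outliers(x, y)
print("points dropped:", n_dropped)
print("all points     -> intercept, slope:", full)
print("outliers removed -> intercept, slope:", trimmed)
```

If the two sets of coefficients tell the same substantive story, the outliers are not driving the result. If they differ materially, that is the point at which a transformation or a robust method is worth considering.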
 

hlsmith

Less is more. Stay pure. Stay poor.
#4
A related question. No model will be perfect. You will always omit some variables that predict the DV. Some of them will be related to factors in the model (given all the factors that exist in the real world, how can this not be true, as a former professor reminded me).

So how do you deal with omitted variable bias?
You need to think about the relationships between the variables. If the omitted variable is exogenous, it won't be a big deal. If it is a mediator, you just won't know its effect, but its cause should still help explain the outcome. It all depends on the purpose. If you are using ORs, RRs, or RDs as your effect estimate, Tyler VanderWeele created E-values, which tell you how strongly an unknown/uncollected variable would have to be associated with both the exposure and the outcome to negate your estimate of interest.
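
For a sense of the arithmetic, here is a small sketch of the E-value for a risk ratio point estimate, using the published formula E = RR + sqrt(RR * (RR - 1)), with RR inverted first when it is below 1. The function name `e_value` is made up for illustration, and the sketch covers only a risk ratio (an odds ratio for a rare outcome is usually treated as an approximate risk ratio); other effect measures need a conversion that is not shown here.

```python
import math

def e_value(rr):
    """E-value for a risk ratio point estimate.

    Minimum strength of association, on the risk ratio scale, that an
    unmeasured confounder would need with both exposure and outcome
    to fully explain away the observed risk ratio.
    """
    if rr <= 0:
        raise ValueError("risk ratio must be positive")
    if rr < 1:
        rr = 1.0 / rr  # protective effects: work with the inverse
    return rr + math.sqrt(rr * (rr - 1.0))

print(round(e_value(2.0), 2))  # 3.41
print(round(e_value(0.5), 2))  # 3.41, same as RR = 2 after inversion
print(e_value(1.0))            # 1.0, a null estimate needs no confounding
```

A large E-value means only a strong unmeasured confounder could account for the estimate; a value near 1 means even weak confounding could.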

A general note: including instrumental variables for the causes in your model, given they don't have a direct effect on the outcome, will increase your SEs, while including other independent causes of the outcome should increase the precision of the estimate for your variable of interest.
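
A quick simulation can illustrate that general note. This is only a sketch under made-up assumptions: `z` affects `y` only through `x` (an instrument), `w` is an independent cause of `y`, all effects are linear, and the helper `ols_se` is a hypothetical name.

```python
import numpy as np

def ols_se(X, y):
    """Return (coefficients, standard errors) for OLS; an intercept is added."""
    X = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return beta, np.sqrt(np.diag(cov))

rng = np.random.default_rng(1)
n = 5000
z = rng.normal(size=n)                # instrument: affects y only through x
w = rng.normal(size=n)                # independent cause of the outcome
x = z + rng.normal(size=n)            # variable of interest
y = 0.5 * x + w + rng.normal(size=n)  # outcome

_, se_base = ols_se(x, y)
_, se_with_z = ols_se(np.column_stack([x, z]), y)
_, se_with_w = ols_se(np.column_stack([x, w]), y)

# Index 1 is the coefficient on x in every model (index 0 is the intercept).
print("SE of x, y ~ x:    ", se_base[1])
print("SE of x, y ~ x + z:", se_with_z[1])
print("SE of x, y ~ x + w:", se_with_w[1])
```

In this setup, adjusting for `z` removes part of the variation in `x` without explaining any extra variation in `y`, so the SE for `x` goes up. Adjusting for `w` shrinks the residual variance and leaves the variation in `x` intact, so the SE goes down.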
 

noetsi

Fortran must die
#5
Precision is not a major factor for me since I have the population in nearly all cases. I ignore the SEs and p-values these days.

I do not know what OR, RR, or RD means in this case. Odds ratio, relative risk? Can you get E-values in SAS? I have never heard of this before.

I have essentially zero theory to build from, so whether a variable is exogenous, a mediator, or whatever is not known to me unless I can determine this from the data. I do not work in a field that does a lot of empirical analysis (vocational rehabilitation). Which is sad.

Thanks for your comments hlsmith. Staying poor is a lot easier in my field than yours :p
 

Dason

Ambassador to the humans
#6
You always say you have the population, but unless you literally only care about making a statement about a particular group at some point in the past, you definitely don't have the actual entire population of interest.
 

noetsi

Fortran must die
#7
I have everyone we have ever had as customers, including all our active customers. And that is who I care about. I guess you could argue that future customers will be substantively different. But I strongly doubt that. Our future customers, years from now, likely will vary little from our customers now. They certainly won't vary for several years given the displacement levels.

I only care about what is true of our customers. No one else. That is who I get paid to run analysis on. I don't do academic research.