Overdispersion / unobserved heterogeneity in logistic regression

hlsmith

Not a robit
#2
Do you have a source referencing these as concerns in logistic reg? I haven't heard of overdispersion in logistic; in Poisson and count models, yes. Heterogeneity, can you be more specific? Treatment heterogeneity is a concern when you have an exposure variable with an uncontrolled-for interaction.
 

hlsmith

#3
Nice to see you posting, @noetsi.
 

noetsi

Fortran must die
#4
Thanks. I got over some of my physical problems which made it impossible to post until recently.

A reference, from a book I think you have, is Allison's book "Logistic Regression Using SAS," 2nd ed., p. 98. Some authors raise more concern than he does; I list him because I think you are familiar with his book.

This deals with a related concept called overdispersion.

http://support.sas.com/documentatio...efault/viewer.htm#statug_logistic_sect068.htm

Comparing coefficients across models can be very difficult

Logistic regression estimates do not behave like linear regression estimates in one important respect: They are affected by omitted variables, even when these variables are unrelated to the independent variables in the model. You cannot straightforwardly interpret log-odds ratios or odds ratios as effect measures, because they also reflect the degree of unobserved heterogeneity in the model. You cannot compare log-odds ratios or odds ratios for similar models across groups, samples, or time points, or across models with different independent variables in a sample. This article discusses these problems and possible ways of overcoming them. [This is the only article I have found that says this, although in general such comparisons are difficult across samples even for linear regression.]

http://esr.oxfordjournals.org/content/26/1/67.abstract

I have found no test for unobserved heterogeneity and am not sure one exists.
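For anyone following along, the effect the Mood abstract describes (usually called non-collapsibility of the odds ratio) can be shown with plain arithmetic, no data needed. A minimal sketch, assuming a binary X, a binary covariate Z that is completely independent of X, and made-up coefficients: it averages the true conditional probabilities over Z and computes the log-odds ratio a logistic model omitting Z would recover. The same exercise for a linear model leaves the slope unchanged. This only illustrates the arithmetic; whether it is a real problem for interpretation is what the rest of the thread argues about.

```python
import math

def expit(x):
    """Inverse logit: maps a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-x))

def logit(p):
    """Log-odds of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))

def marginal_log_or(b0, b1, b2, pz=0.5):
    """Log-odds ratio for binary X after averaging over a binary Z.

    Assumes (hypothetically) the true model is
    logit P(Y=1 | X, Z) = b0 + b1*X + b2*Z, with Z independent of X
    and P(Z=1) = pz.
    """
    def p_given_x(x):
        # Average the conditional probability over the distribution of Z.
        return (1 - pz) * expit(b0 + b1 * x) + pz * expit(b0 + b1 * x + b2)
    return logit(p_given_x(1)) - logit(p_given_x(0))

def marginal_linear_slope(a, c, d, pz=0.5):
    """Same exercise for a linear model E[Y | X, Z] = a + c*X + d*Z."""
    def mean_given_x(x):
        return (1 - pz) * (a + c * x) + pz * (a + c * x + d)
    return mean_given_x(1) - mean_given_x(0)

conditional = 1.0  # made-up log-odds ratio for X, holding Z fixed
marginal = marginal_log_or(b0=0.0, b1=conditional, b2=2.0)
print(f"conditional log-OR = {conditional:.3f}, marginal log-OR = {marginal:.3f}")

linear = marginal_linear_slope(a=0.2, c=0.3, d=0.4)
print(f"linear: conditional slope = 0.300, marginal slope = {linear:.3f}")
```

With b2 = 0 (no omitted heterogeneity) the marginal and conditional log-odds ratios coincide; with b2 = 2 the marginal one is noticeably closer to 0 even though Z is unrelated to X.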

hlsmith

#5
I will start with overdispersion, why the heck are they using logistic instead of Poisson???????

They end up with odds ratios instead of rate ratios, which are more interpretable and match the scale of the outcome!

Not drinking the Kool-Aid unless someone gives a good rationale. I will check the Allison book at work tomorrow, but it seems wack.
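To put numbers on the odds ratio vs. rate ratio point: a minimal sketch with hypothetical group proportions p0 and p1. A logistic model targets the odds ratio, while a log-link model such as Poisson with a log(trials) offset targets the ratio of the proportions themselves. The two are close when the outcome is rare and drift apart as it becomes common.

```python
def risk_ratio(p1, p0):
    """Ratio of two proportions (what a log-link / Poisson-with-offset model targets)."""
    return p1 / p0

def odds_ratio(p1, p0):
    """Ratio of the odds p / (1 - p) (what a logistic model targets)."""
    return (p1 / (1 - p1)) / (p0 / (1 - p0))

# Hypothetical proportions: a rare outcome, then increasingly common ones.
for p0, p1 in [(0.01, 0.02), (0.20, 0.40), (0.50, 0.75)]:
    print(f"p0={p0:.2f} p1={p1:.2f}  "
          f"RR={risk_ratio(p1, p0):.2f}  OR={odds_ratio(p1, p0):.2f}")
```

In the first row both ratios are about 2; in the second the risk ratio is still 2 but the odds ratio is about 2.67.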
 

hlsmith

#7
I will skim that other article tomorrow, but yeah, you can't compare outcomes from different models; that is true. I would be interested to read why they say that isn't true in linear models.
 

hlsmith

#8
If the response is binomial then logistic makes sense to model what you actually think the random variables would be.
I get your rationale, but they set it up as a rate (e.g., number of positive trials divided by the number of possible positive trials). But you always see these types of data modeled as counts. I bet a Google search of Poisson would reveal very comparable examples over and over again.
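On the binomial vs. count question, one concrete difference is the variance each model assumes for k successes out of n trials. A minimal sketch with a hypothetical n and two values of p: the binomial variance is n*p*(1 - p) and k can never exceed n, while a Poisson count with the same mean n*p has variance n*p and puts some probability on impossible values k > n. The two agree closely only when p is small, which is roughly when modelling such data as counts works well.

```python
import math

def binomial_variance(n, p):
    """Variance of k ~ Binomial(n, p); k is bounded above by n."""
    return n * p * (1 - p)

def poisson_variance(n, p):
    """Variance of a Poisson count with the same mean n*p (variance = mean)."""
    return n * p

def poisson_prob_exceeds(n, p):
    """P(K > n) for K ~ Poisson(n*p): mass a Poisson model puts on impossible counts."""
    mu = n * p
    cdf = sum(math.exp(-mu) * mu ** k / math.factorial(k) for k in range(n + 1))
    return max(0.0, 1.0 - cdf)  # clamp tiny negative values from rounding

n = 20  # hypothetical number of trials per unit
for p in (0.05, 0.6):
    print(f"p={p}: binomial var={binomial_variance(n, p):.2f}, "
          f"Poisson var={poisson_variance(n, p):.2f}, "
          f"Poisson P(k > n)={poisson_prob_exceeds(n, p):.4f}")
```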

noetsi

#9
These are all binomial response variables. What hlsmith called a rate, my books and links call odds, but I assume they are the same.
 

Jake

Cookie Scientist
#10
"Unobserved heterogeneity" in logistic regression is nothing to be afraid of. I address this here, arguing directly against Allison and Mood: http://jakewestfall.org/blog/index.php/2018/03/12/logistic-regression-is-not-fucked/

Overdispersion is a completely different issue. In logistic regression it can only happen if your DV can take on more than 2 values; it can't happen with binary 0 vs. 1 outcomes. But if you do have more than 2 DV values, then yes, you should check whether your outcome is more dispersed than your model implies.
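A common way to run that check with events/trials data is to divide the Pearson chi-square statistic by its residual degrees of freedom; values well above 1 point to overdispersion relative to the binomial model. A minimal sketch with hypothetical grouped data and an intercept-only model (so the fitted probability is just the pooled proportion). With ungrouped 0/1 data, where every n is 1, this ratio does not give a usable overdispersion check, which fits the point above about binary outcomes.

```python
def pearson_dispersion(successes, trials, fitted_p, n_params):
    """Pearson chi-square divided by residual df for grouped binomial data.

    Values well above 1 suggest more spread than the binomial model allows.
    """
    x2 = sum((k - n * p) ** 2 / (n * p * (1 - p))
             for k, n, p in zip(successes, trials, fitted_p))
    df = len(successes) - n_params
    return x2 / df

# Hypothetical events/trials data: 5 groups of 20 trials each.
successes = [2, 15, 5, 12, 9]
trials = [20, 20, 20, 20, 20]

# Intercept-only logistic model: the MLE of p is the pooled proportion.
p_hat = sum(successes) / sum(trials)
fitted = [p_hat] * len(successes)

phi = pearson_dispersion(successes, trials, fitted, n_params=1)
print(f"pooled p = {p_hat:.2f}, dispersion = {phi:.2f}")
```

Here the groups vary far more than a single binomial probability would produce, so the ratio comes out around 5.6 rather than near 1.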
 

noetsi

#11
Thanks a lot, Jake. Although now I have two statistical experts (you and Allison) disagreeing entirely. :p I never know quite what to do about that, since I am not an expert :)

We won't be running ordinal or multinomial logistic regression, so we will never have more than two levels of the DV. I have been told that the agency we report to uses linear regression (aka linear probability models) for a two-level DV, so we may do that in the end. Crazy as that seems to me, you do what those who pay you tell you to do. :)
 

noetsi

#12
"For virtually every logistic regression model that we estimate in the real world, there will be some uncorrelated covariates that are statistically associated with the binary outcome, but that we couldn’t observe to include in the model. In other words, there’s always unobserved heterogeneity in our data on covariates we couldn’t measure. But then—the argument goes—how can we interpret the slopes from any logistic regression model that we estimate, since we know that the estimates would change as soon as we included additional relevant covariates, even when there’s no confounding?"

I think in practice this problem exists in linear regression as well. In few if any real-world situations are you going to include all variables that are related to the DV. And since many of these omitted variables will also be related to variables in the model, all slopes are biased (again, in real-world data).

Which, since you don't know how much the bias is, always seemed to be a strong argument against using regression :p
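The linear-regression side of this can be made concrete with the textbook omitted-variable bias result: if y = b1*x + b2*z and the omitted z = g*x + u with u uncorrelated with x, then the slope from regressing y on x alone is b1 + b2*g. A minimal sketch with a small hypothetical design (u built to have mean zero and zero correlation with x, no noise). It shows the bias described above when g is not 0, and also the contrast with the logistic case: when g = 0 the linear slope is unaffected by leaving z out.

```python
def ols_slope(x, y):
    """Slope from a simple least-squares regression of y on x (with intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx

# Hypothetical design: u has mean zero and is exactly orthogonal to x.
x = [-1, 0, 1, -1, 0, 1]
u = [1, -2, 1, -1, 2, -1]

b1, b2 = 0.5, 2.0  # made-up true effects of x and z on y
for g in (0.0, 0.3):  # g = how strongly the omitted z depends on x
    z = [g * xi + ui for xi, ui in zip(x, u)]
    y = [b1 * xi + b2 * zi for xi, zi in zip(x, z)]  # no noise, for clarity
    print(f"g={g}: slope omitting z = {ols_slope(x, y):.3f}, "
          f"b1 + b2*g = {b1 + b2 * g:.3f}")
```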