Logistic Regression: Include constant in the model or not?

#1
Hi!

1. Could anybody give me a detailed explanation of why it is inappropriate not to use a constant in a logistic regression model? Could you provide an example of how that would bias the results?

2. Are there any exceptions to this rule? For example, small/large sample sizes, a low/high number of independent variables in the model, an outcome incidence below, at, or above 50% in the sample, or anything else?

3. I get strikingly different results depending on whether I include the constant in the model. Our hypothesis is favored when the constant is excluded. Is that to be expected?

I really appreciate your help, guys.

Looking forward to learning with you,

Best regards
 
#2
Hey neuroscientist,

I'm pretty sure the constant serves two purposes. The first is pretty straightforward: it tells you what the probability of success is when all other covariates are set to zero. So if you omit the constant, you're essentially saying that if all other variables are zero, then the probability of success is zero.

The second is much more important, though: the constant term is what allows your dependent variable to be dichotomous, i.e., to take only the two standard values (usually zero and one) without changing the interpretation of the results. In other words, to make a logistic regression without a constant valid, you would need to use a value other than one for "success" and/or zero for "failure" in order for your estimates to be unbiased. It would be difficult, if not impossible, to know the correct value a priori... for instance, failure without a constant could still be coded as zero, but then perhaps success would have to be represented by \(\pi\).

Unlike with simple linear regression, I know of no circumstances under which omitting a constant in a logistic regression is a reasonable assertion.
 
#3
Well... I found a nice article that may be helpful for somebody with the same question:

http://www.duke.edu/~rnau/regnotes.htm

Briefly, it is interesting that they mention:
"In rare cases you may wish to exclude the constant from the model. (...) Usually, this will be done only if (i) it is possible to imagine the independent variables all assuming the value zero simultaneously, and you feel that in this case it should logically follow that the dependent variable will also be equal to zero; or else (ii) the constant is redundant with the set of independent variables you wish to use."
 

Jake

Cookie Scientist
#4
> So if you omit the constant, you're essentially saying that if all other variables are zero, then the probability of success is zero.
Remember we are dealing with the logit scale here. So a 0 intercept means that when all the predictors are 0, the logit is 0. And logit = 0 implies probability = 0.5 (not probability = 0).

I didn't really follow what you were saying in the second paragraph. Maybe you could explain it again.
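To make the logit-scale point concrete, here is a minimal sketch in plain Python (the `sigmoid` helper is just the inverse-logit function written out by hand, not from any library):

```python
import math

def sigmoid(z):
    """Inverse logit: maps a log-odds value back to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

# With no intercept and all predictors at 0, the linear predictor
# (the logit) is 0 -- and a logit of 0 corresponds to p = 0.5:
p_no_intercept = sigmoid(0.0)
print(p_no_intercept)  # 0.5, not 0

# With an intercept b0, the baseline probability can be anything:
for b0 in (-2.0, 0.0, 2.0):
    print(b0, sigmoid(b0))
```

So constraining the intercept to zero does not force the baseline probability to zero; it pins it at exactly 0.5.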
 
#5
> Remember we are dealing with the logit scale here. So a 0 intercept means that when all the predictors are 0, the logit is 0. And logit = 0 implies probability = 0.5 (not probability = 0).
>
> I didn't really follow what you were saying in the second paragraph. Maybe you could explain it again.
So let's say the dependent variable is a disease, and I have the same number of controls and patients. If the predictors (independent variables) are all 0, then the logit is 0 and the probability of the outcome (disease) is 0.5, which is true in my sample because I matched patients with controls. Therefore, am I allowed to exclude the constant?
 

Jake

Cookie Scientist
#6
The intercept reflects the base rate of the disease--it has nothing to do with whether patients are matched with controls. You certainly should estimate the intercept in your model, not exclude it. It is a rare circumstance indeed when one would be justified in constraining the intercept in a regression model, and nothing that I have read so far suggests to me that this is one of those rare circumstances.
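To illustrate the base-rate point numerically: in an intercept-only logistic model, the maximum-likelihood estimate of the intercept is simply the logit of the sample prevalence. A minimal sketch in plain Python (the counts below are hypothetical):

```python
import math

def logit(p):
    """Log-odds of a probability p."""
    return math.log(p / (1.0 - p))

# Hypothetical sample: 20 cases out of 100, i.e. a base rate of 0.20.
n, cases = 100, 20
base_rate = cases / n

# In an intercept-only logistic model, the MLE of the intercept
# is the logit of the sample base rate:
b0_hat = logit(base_rate)
print(b0_hat)  # log(0.2/0.8) = log(0.25) ~ -1.386
```

Fixing the intercept at zero instead would assert a base rate of exactly 0.5, which in a matched case-control sample is an artifact of the sampling design, not a feature of the population.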