Logit v. Probit: A fight to the death

Dason

Ambassador to the humans
#1
Every now and then I have to work with binary responses with continuous covariates. So of course this screams generalized linear model. The question then arises as to what link function to use. In the past (before I really knew any alternatives) I would just do a logistic regression (logit link) and be happy with that. My recent studies have made me think about the modeling process more though, and I'm not sure I'm completely satisfied with using logit just because it's what's familiar.

The probit offers some interesting ways of interpreting certain things that I've found somewhat appealing recently. I know there are other links that could be used but I haven't really dealt with those much, so I left them out of the title. Does anybody have any opinions on when a probit link seems more appropriate than the logit, or the logit more appropriate than the probit?

I could explain my reasoning for finding the probit attractive but I'd like to see at least a few responses first - even if it's just what you use and "cause that's what I've always done" as your reasoning. I guess I'm just interested in how others approach this problem.
 

vinux

Dark Knight
#2
In general, one could use any CDF with support on the whole real line R as the index (inverse link) function for a binary response. The main advantage of the logit is interpretation: we are modeling the log odds, and that idea is easy to sell in analytics. With the probit there is no comparably simple relation.

Also, in a GLM with a logit link, the mean and variance of the beta estimates are easier to estimate than with a probit link.

When you compare the fitted sigmoid curves, there is not much difference between probit and logit (in my experience). The data may suit probit or logit, but logit gives the more interpretable structure. So my vote is for logit.
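A minimal sketch of how close the two curves are, using only the Python standard library. The rescaling constant 1.702 is an assumption brought in from the usual logistic-to-normal approximation, not something stated in the thread; the grid and function names are illustrative.

```python
import math
from statistics import NormalDist


def logistic_cdf(x: float) -> float:
    """Inverse logit link: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def probit_cdf(x: float) -> float:
    """Inverse probit link: the standard normal CDF."""
    return NormalDist().cdf(x)


# Assumed rescaling constant: the logistic curve is stretched so that its
# spread roughly matches the standard normal before the two are compared.
SCALE = 1.702

# Evaluate both curves on a grid from -6 to 6 in steps of 0.01.
grid = [i / 100 for i in range(-600, 601)]
max_gap = max(abs(probit_cdf(x) - logistic_cdf(SCALE * x)) for x in grid)

print(f"largest gap between the curves on [-6, 6]: {max_gap:.4f}")
```

After rescaling, the two links disagree by less than one percentage point anywhere on the grid, so the coefficients differ mostly by a scale factor and the fitted probabilities are usually hard to tell apart.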
 

Jake

Cookie Scientist
#3
I have traditionally used logit basically because that is what I was taught. I had at least been made aware of the existence of probit, but it was quickly dismissed as being "basically equivalent to logit," and it wasn't until recently that I was motivated to read a little about what probit really is (specifically, upon learning that the standard signal detection analysis is equivalent to performing a probit regression of the "yes" responses on whether "yes" is the correct response -- I still think that is really neat!).

One nice thing about logits is that the coefficients can be fairly straightforwardly interpreted in terms of odds ratios. However, as the signal detection example illustrates, probits can have a meaningful interpretation too. For example, the probit slope for the model I mentioned above is often called d' in the signal detection literature, where d' can be conceptualized as the standardized distance between the distribution of some normally-distributed latent response variable in the presence of signal + noise vs. the distribution of the latent response variable in the presence of noise alone. So while not as concrete as the interpretation of logits, probit distances are like shifts in some latent continuous variable that is underlying the manifest binary responses.
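The signal detection connection can be sketched in a few lines. The hit and false-alarm rates below are hypothetical, and the coding of the predictor (0 for noise alone, 1 for signal + noise) is an assumption. With a single binary predictor the probit model is saturated, so its coefficients can be read straight off the two observed rates without running an optimizer.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, sd 1


def d_prime(hit_rate: float, false_alarm_rate: float) -> float:
    """Standardized distance between the signal and noise distributions."""
    return STD_NORMAL.inv_cdf(hit_rate) - STD_NORMAL.inv_cdf(false_alarm_rate)


# Hypothetical data: proportion of "yes" responses in each condition.
hit_rate = 0.80          # P(yes | signal + noise)
false_alarm_rate = 0.20  # P(yes | noise alone)

# Probit model: P(yes) = Phi(intercept + slope * signal), signal coded 0/1.
intercept = STD_NORMAL.inv_cdf(false_alarm_rate)  # probit of the noise-only rate
slope = d_prime(hit_rate, false_alarm_rate)       # the probit slope is d'

# The two coefficients reproduce both observed rates.
fitted_noise = STD_NORMAL.cdf(intercept)
fitted_signal = STD_NORMAL.cdf(intercept + slope)

print(f"d' (probit slope) = {slope:.3f}")
print(f"fitted P(yes | noise) = {fitted_noise:.2f}, fitted P(yes | signal) = {fitted_signal:.2f}")
```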
 
#4
The great majority of analysts in the real world, I am sure, use logistic regression, because the log of the odds can be explained far more easily than a probit can be. That is, no one outside the statistical community understands what a probit is, and it's very hard to relate it to anything people do understand. Also, I don't think probit has an odds ratio, by far the most common way to analyze the results (although I could be wrong on that - I have never heard it mentioned).
 

Dason

Ambassador to the humans
#5
I guess I'm just not fully satisfied with using logistic regression solely because of the ease of interpretation. It does seem to provide a good fit, but can anybody actually explain why the log odds should be linear in the predictors (other than that it seems to work reasonably well)?
 

Link

Ninja say what!?!
#6
In my understanding, there is no real reason why the log odds SHOULD be linear. Logistic regression is just a way of modelling a binary outcome as a linear combination of predictors, while still restricting the probabilities to the range 0-1. Everyone who works with it should know that no process in the "real world" follows a logistic model. It's simply a tool we use to try and estimate effects.

One thing that does arouse my curiosity is why you're asking. What is the purpose of the model? If it's to predict an outcome, I'd instead recommend machine learning. If it's to estimate an effect, I'd recommend the IPTW method via NP SCMs.
 

bryangoodrich

Probably A Mammal
#7
I really have nothing to offer to this contrast, since I've never really worked with GLMs at all. However, the issue seems to be that logit is convenient, especially for interpreting. This is important when you have to communicate results to the uninitiated. I have to wonder, is there no good way to bring the probit to simple terms for the lay person? Of course, my question is ambiguous, since the interpretation and what is to be communicated depends ultimately on the audience and the intended message or question to be answered.
 

Dason

Ambassador to the humans
#8
In my understanding, there is no real reason why the log odds SHOULD be linear.
And this is partially why I ask. We're making that assumption, so I was just wondering if anybody has heard any good justification for it. Just like how we can justify why a Poisson might be a good distribution for count data by looking at the Poisson postulates, I was wondering if anybody has heard a justification for a logistic model in a similar manner. I guess the ease of interpretation does offer some justification, in that a unit increase in the predictor has the same multiplicative effect on the odds no matter what the original X value was. If we can consider that reasonable, then I guess that offers some reasoning for why a logistic model makes sense. I am just wondering how reasonable that is in most cases.
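A small sketch of that constant-multiplier property, with the same check run under a probit link for contrast. The coefficients and starting X values are hypothetical and chosen only for illustration.

```python
import math
from statistics import NormalDist

B0, B1 = -1.0, 0.5  # hypothetical intercept and slope


def odds(p: float) -> float:
    """Convert a probability to odds."""
    return p / (1.0 - p)


def logit_prob(x: float) -> float:
    """P(y = 1 | x) under a logit link."""
    return 1.0 / (1.0 + math.exp(-(B0 + B1 * x)))


def probit_prob(x: float) -> float:
    """P(y = 1 | x) under a probit link with the same linear predictor."""
    return NormalDist().cdf(B0 + B1 * x)


# Odds ratio for a one-unit increase in x, starting from different x values.
starting_points = [-2.0, 0.0, 2.0]
logit_ratios = [odds(logit_prob(x + 1)) / odds(logit_prob(x)) for x in starting_points]
probit_ratios = [odds(probit_prob(x + 1)) / odds(probit_prob(x)) for x in starting_points]

for x, lr, pr in zip(starting_points, logit_ratios, probit_ratios):
    print(f"x = {x:+.0f}: logit odds ratio = {lr:.3f}, probit odds ratio = {pr:.3f}")
```

Under the logit link every ratio equals exp(B1) regardless of the starting point. Under the probit link an odds ratio can still be computed from the fitted probabilities, but it changes with X, so no single coefficient summarizes it.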

One thing that does arouse my curiosity is why you're asking. What is the purpose of the model? If it's to predict an outcome, I'd instead recommend machine learning. If it's to estimate an effect, I'd recommend the IPTW method via NP SCMs.
I have no particular application in mind. There was a post here that had to do with either logistic or probit regression (I can't remember which) that made me think a little bit more about these issues. I like the latent variable way of thinking about probit regression which is why I was comparing these two. Logistic regression definitely has the advantage in interpretability of the parameters (and in that it is used much more often so it's much more familiar to most people).

Probit regression isn't as well known so I'm not sure how many are even familiar with the latent variable way of thinking about probit regression (I just looked and I thought somebody had mentioned it in this thread but nothing like that appears to be here anymore...) but I find it at least somewhat satisfying in that it gives me a nice way to think about the underlying process and whether I find the assumptions to go along with the model reasonable. It seems all we have in the logistic regression case are "well it seems to work pretty well" and the argument I gave about odds ratios (I guess also: "Hey look - it's easily implementable in my software package of choice!")
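For anyone who hasn't seen the latent variable view, here is a minimal simulation sketch. It assumes an unobserved continuous response equal to a linear predictor plus standard normal noise, with the binary outcome recording whether that latent value is above zero. The coefficients, the X value, and the sample size are all hypothetical.

```python
import random
from statistics import NormalDist


def simulate_share_of_ones(x: float, b0: float, b1: float, n: int, rng: random.Random) -> float:
    """Draw n latent responses at a fixed x and return the share that exceed zero."""
    ones = 0
    for _ in range(n):
        latent = b0 + b1 * x + rng.gauss(0.0, 1.0)  # unobserved continuous response
        if latent > 0:                               # we only observe the sign
            ones += 1
    return ones / n


rng = random.Random(0)  # fixed seed so the run is reproducible
b0, b1, x = -0.5, 1.0, 1.0

empirical = simulate_share_of_ones(x, b0, b1, 100_000, rng)
theoretical = NormalDist().cdf(b0 + b1 * x)  # probit model: Phi(b0 + b1 * x)

print(f"simulated P(y = 1) = {empirical:.3f}, probit formula = {theoretical:.3f}")
```

The simulated share should sit close to the probit formula. Replacing the normal noise with standard logistic noise gives the logit model in the same way, so the logit also has a latent variable reading, just with a less familiar error distribution.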
 

vinux

Dark Knight
#9
What is the advantage of probit over logit? What measure are you using for goodness of fit?

Regarding the linear relationship, there is empirical literature available. I read a few papers a long time back (about 7 years ago). It was in a bio context.
 

Jake

Cookie Scientist
#10
For what it's worth, I mentioned the latent variable interpretation of probit in the context of signal detection analysis. For me that was a big turning point in terms of convincing me that probit may actually be worth looking into more deeply.
 
#11
If you are not required to explain your results to a non-academic then it does not matter if you can make it comprehensible or not. Most are not that blessed :) A correct answer no one understands (aside from statisticians) in the real world is commonly worse than no answer at all, as you will find when you present your data (I have had that wondrous experience).

In terms of linearity, the logit (that is, the predicted value in logistic regression) is linearly related to X. The odds and the probability are not linearly related to X (I think the odds ratio is a constant, although I am not sure).
 

Dason

Ambassador to the humans
#12
If you are not required to explain your results to a non-academic then it does not matter if you can make it comprehensible or not. Most are not that blessed :) A correct answer no one understands (aside from statisticians) in the real world is commonly worse than no answer at all, as you will find when you present your data (I have had that wondrous experience).
I think it takes quite a bit to explain either model. You're still working in the framework of the generalized linear model which isn't something that most people 'get' right away. Like I've said I think the latent variable approach to thinking about the probit model does help in understanding and might actually make explaining the model a little bit easier compared to the logistic regression. I also know that you're of the opinion that a simpler/easier to explain model is justifiable to use (especially when you need to explain it to people that might not understand) but once again I have to disagree somewhat. If I have a technique that does significantly better (whether it's better at predicting or just offers a more satisfying justification) at the task that I'm being given then I am going to use that technique regardless of how much harder it is to explain to somebody else. If I can't explain it properly then I probably don't understand it well enough to justify using it with any confidence.
 

Link

Ninja say what!?!
#13
You know, your post has gotten me extremely curious about GLMs for binary dependent outcomes.

We can either use a logit link with a binomial family (i.e., logistic regression) OR a log link with a binomial family (which would give us relative risks rather than odds ratios). Are there any thoughts as to why to use one over the other here?
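One trade-off that often comes up can be sketched with made-up numbers. The baseline risk of 10% and the risk ratio of 1.5 per unit of X are hypothetical. Under a log link, exp(beta) is a relative risk at every X, but nothing keeps the fitted value below 1.

```python
import math

# Hypothetical log-link coefficients: baseline risk 0.10, risk ratio 1.5 per unit of x.
B0 = math.log(0.10)
B1 = math.log(1.5)


def log_link_prob(x: float) -> float:
    """Fitted 'probability' under a log link: exp(b0 + b1 * x)."""
    return math.exp(B0 + B1 * x)


# The risk ratio for a one-unit increase is exp(B1) no matter where we start.
risk_ratios = [log_link_prob(x + 1) / log_link_prob(x) for x in range(0, 4)]
print("risk ratios:", [round(r, 3) for r in risk_ratios])

# The same model extrapolated to a larger x produces a value above 1.
x_large = 6
print(f"fitted value at x = {x_large}: {log_link_prob(x_large):.3f}")
```

The logit link never leaves the 0-1 range, which is one reason it is the default. The log link buys the relative risk interpretation at the cost of possible out-of-range fitted values, and fitting such models can be less stable when risks are high.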
 

spunky

Can't make spagetti
#15
I also know that you're of the opinion that a simpler/easier to explain model is justifiable to use (especially when you need to explain it to people that might not understand) but once again I have to disagree somewhat. If I have a technique that does significantly better (whether it's better at predicting or just offers a more satisfying justification) at the task that I'm being given then I am going to use that technique regardless of how much harder it is to explain to somebody else. If I can't explain it properly then I probably don't understand it well enough to justify using it with any confidence.
i definitely have to second you on your agreement to disagree. this idea of let's do something wrong (or "wronger") because it's easier to explain or to compute or whatever translates into bad practice regardless of how you look at it, and i find it so often in my everyday life, both with students and faculty who need help in data analysis, that it makes me wonder whether there's even a point in having editors and reviewers in journals...

now, with that being said, i do believe there is merit in using probit over logit depending on what you're trying to accomplish, which is kind of what Link was referring to when he asked about the purpose of the model. although not as common as it used to be, the basic tenets of item response theory lived and died by probit regression because of the reliance on the normal ogive. there are even corrections out there to logit models to make them look more like probit models because of the assumption of normality in the parameter(s) of IRT equations, especially in Rasch Models. now, the advantage here is that we're starting from the assumption of normally-distributed latent variables but i'm not sure how that would hold for probit-over-logit choices when you have no theory to adhere to as to why one should be chosen over the other.

(ps- i hate being late to these interesting discussions but i'm taking my 1st formal data mining course at the uni and just couldn't get away from the lab, lolz...)
 

Link

Ninja say what!?!
#16
LOL. The probit link is nice I guess. I've never really gotten into it myself.

With the log-link vs the logit-link though, we retain our interpretability of the effect estimate, only estimating them on different scales. Because logistic regression has been around so long, I can see how it's more saturated and more popular. However, lay persons would be able to interpret Relative Risks more easily than Odds Ratios. So why not use a log-link binomial GLM rather than logistic regression?
 
#17
I think it takes quite a bit to explain either model. You're still working in the framework of the generalized linear model which isn't something that most people 'get' right away. Like I've said I think the latent variable approach to thinking about the probit model does help in understanding and might actually make explaining the model a little bit easier compared to the logistic regression. I also know that you're of the opinion that a simpler/easier to explain model is justifiable to use (especially when you need to explain it to people that might not understand) but once again I have to disagree somewhat. If I have a technique that does significantly better (whether it's better at predicting or just offers a more satisfying justification) at the task that I'm being given then I am going to use that technique regardless of how much harder it is to explain to somebody else. If I can't explain it properly then I probably don't understand it well enough to justify using it with any confidence.
One thing to remember is that what I believe is true only in the business world I operate in. I would not argue this in research. In my experience, if you cannot explain something very simply to senior managers, there is zero chance it will be used. So it does not matter if method b is better than a; method b won't be used if it's too complex for managers (with little to no understanding of statistics). I offer two examples (both done for senior managers far more knowledgeable than the norm - they both had doctorates). I created a metric (after several days' hard work) that clearly measured reality far better than a second one, but the metric was not intuitively easy to explain. My manager, a brilliant PhD in economics, told me flat out not to use it, because there was no way senior management would understand it and thus it was useless.

Last summer I ran hundreds of t-tests, despite my expressed concerns about familywise error, because it was the simplest way to do that. Again, what was better was harder to do or explain, so it did not get done. That is my uniform experience in the private and public sector (and talking to others I think it is almost uniform outside research entities). The people who make the decisions are not interested in being better at the cost of greater complexity, and they won't (usually) just let you run stuff for them that they have no understanding of, taking your word for the results, if it's too complex.

I absolutely agree that it is not easy to explain odds ratios or logistic regression to non-statisticians. I spent an hour trying that last year with one consultant -in very simple terms- and got nowhere.

Now admittedly I have nowhere remotely near your statistical ability, Dason. But I think what I stated here is generally true, because it has to do largely with the understanding of statistics by managers and not by the one explaining them.
 
#18
LOL. The probit link is nice I guess. I've never really gotten into it myself.

With the log-link vs the logit-link though, we retain our interpretability of the effect estimate, only estimating them on different scales. Because logistic regression has been around so long, I can see how it's more saturated and more popular. However, lay persons would be able to interpret Relative Risks more easily than Odds Ratios. So why not use a log-link binomial GLM rather than logistic regression?
Because no one has heard of log-link binomial GLM, and people use what is known (particularly in organizations with limited knowledge of statistics).

I have tried to calculate relative risk (which is easier to understand for most than odds ratios I believe) but all the formulas I ran into required information I could not get.
 

Link

Ninja say what!?!
#19
I have tried to calculate relative risk (which is easier to understand for most than odds ratios I believe) but all the formulas I ran into required information I could not get.
There is a formula published by Zhang and Yu that allows you to convert the odds ratio to RRs without the need of more information than what the model requires. Just do a search on Google Scholar. There are caveats though by some researchers:
-Incorrect adjustment for confounding
-Estimate still biased away from 1
-Confidence intervals are too narrow
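A minimal sketch of that conversion as it is usually stated: RR = OR / ((1 - P0) + P0 * OR), where P0 is the risk of the outcome in the reference (unexposed) group. The function name and the example numbers are hypothetical, and the caveats listed above still apply to anything it returns.

```python
def odds_ratio_to_risk_ratio(odds_ratio: float, baseline_risk: float) -> float:
    """Approximate a relative risk from an odds ratio and the baseline risk P0.

    baseline_risk is the outcome risk in the reference (unexposed) group.
    """
    if odds_ratio <= 0.0:
        raise ValueError("odds_ratio must be positive")
    if not 0.0 <= baseline_risk < 1.0:
        raise ValueError("baseline_risk must be in [0, 1)")
    return odds_ratio / ((1.0 - baseline_risk) + baseline_risk * odds_ratio)


# Hypothetical example: the same odds ratio of 2.0 at different baseline risks.
for p0 in (0.01, 0.10, 0.50):
    rr = odds_ratio_to_risk_ratio(2.0, p0)
    print(f"baseline risk {p0:.2f}: OR = 2.00 -> RR = {rr:.3f}")
```

When the outcome is rare the two measures nearly coincide. As the baseline risk grows, the odds ratio increasingly overstates the relative risk, which is when the conversion (or a log-link model) starts to matter.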

Also, if you're able to run logistic regression, running log-link binomial regression should not be much different. Just letting you know in case you're interested.

Lastly, I still lean towards doing what is more "correct", regardless of whether it is known or not. I do understand where noetsi is coming from, as I've had my share of experiences trying to explain complicated things to lay people. However, if people don't know, then I'd teach them. I'd think that showing them log-link GLMs wouldn't be difficult if they understand logistic regression.
 
#20
Thanks, Link, that is great.

I am interested in log-link binomial regression (assuming that SPSS or SAS will run it). I just never heard of it before today :)

But I respectfully disagree that you can explain complex statistics to senior managers, or that it is wise to try:) They are most certainly not interested in learning them:) Based on my own experience just using the term log link GLM would doom the exercise...

I have never met a manager who understood logistic regression. I have met one who was willing to take my explanation of what it showed (simply in terms of the odds ratio),