# Maximum likelihood estimation and GLMs - can you help me refresh my rusty memory?

#### JohnDoe2014

##### New Member
Hi all,

I was wondering if you can help me refresh my rusty memory on maximum likelihood estimation.

I think I remember the basic concept from my university courses in a previous life. A banal example: we want to fit a normal distribution to the observations 3, 4, 5; the best fit is centred at 4, because that mean maximises the likelihood that those observations came from the distribution.

However, when we apply ML estimation to a (generalised) linear model, we are assuming a specific distribution for what exactly: the observed independent variables, the parameters we are estimating, the errors, or what? I remember that in a linear model the OLS and the ML estimates of the parameters are the same, when normality is assumed. But I am confused because I could have cases where the observations are not normally distributed, yet a linear model (estimated with ML and assuming normality) can still be a good fit. Say I have a relationship like y = 2x; this relationship can hold even if my x are concentrated around 2, 4 and 6, and my y around (respectively) 4, 8 and 12, i.e. when my observations are not normally distributed.
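For what it's worth, the toy example can be checked numerically. A minimal sketch in plain Python (note that the ML estimate of the variance divides by n, so for 3, 4, 5 it comes out at 2/3 rather than the unbiased sample variance of 1):

```python
import math

def normal_mle(data):
    """ML estimates for a normal: sample mean and average squared deviation."""
    n = len(data)
    mu = sum(data) / n
    sigma2 = sum((x - mu) ** 2 for x in data) / n  # divides by n, not n - 1
    return mu, sigma2

def log_likelihood(data, mu, sigma2):
    """Normal log-likelihood of the data under N(mu, sigma2)."""
    n = len(data)
    return (-0.5 * n * math.log(2 * math.pi * sigma2)
            - sum((x - mu) ** 2 for x in data) / (2 * sigma2))

mu_hat, s2_hat = normal_mle([3, 4, 5])
# mu_hat = 4.0; s2_hat = 2/3, slightly below the unbiased variance of 1,
# and the ML fit has a (slightly) higher likelihood than N(4, 1):
print(mu_hat, round(s2_hat, 3))
```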

#### CB

##### Super Moderator
The distribution is assumed with respect to the error terms.

Another way of saying this is that the distribution of the Ys conditional on the X values is normal.

The conditional distribution of the Ys may indeed be normal despite the marginal distribution not being normal. A group of us from the forum wrote a paper on regression assumptions in which we used a simple simulation showing a situation like this: http://pareonline.net/getvn.asp?v=18&n=11
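To make the simulation idea concrete, here is a small toy sketch of my own (not the one from the paper) using the y = 2x example from the original post: the marginal distribution of y is trimodal, but y conditional on x is normal:

```python
import numpy as np

rng = np.random.default_rng(0)

# x takes only the values 2, 4 and 6 (a clearly non-normal marginal),
# and y = 2x plus standard normal noise, as in the example above.
x = rng.choice([2.0, 4.0, 6.0], size=3000)
y = 2 * x + rng.normal(0.0, 1.0, size=3000)

# Marginal distribution of y: three clusters around 4, 8 and 12 (not normal).
# Conditional distribution of y given x: normal around 2x.
residuals = y - 2 * x

# The residuals (y given x, centred) look normal: mean ~0, sd ~1,
# and roughly 95% fall within 2 standard deviations.
within_2sd = np.mean(np.abs(residuals) < 2)
print(round(residuals.mean(), 2), round(residuals.std(), 2), round(within_2sd, 2))
```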

#### JohnDoe2014

##### New Member
> The distribution is assumed with respect to the error terms.
>
> Another way of saying this is that the distribution of the Ys conditional on the X values is normal.
>
> The conditional distribution of the Ys may indeed be normal despite the marginal distribution not being normal. A group of us from the forum wrote a paper on regression assumptions in which we used a simple simulation showing a situation like this: http://pareonline.net/getvn.asp?v=18&n=11

Thank you - the paper is extremely useful! It should be compulsory reading in all introductory statistics courses.

So, if I understand correctly, with a linear model we assume normality of the errors, and therefore normality of Y|X, not of the marginal distribution of Y. A rough way of visualising this can be to plot the residuals vs the predicted Ys: if the residuals increase as the predicted Ys increase, then there is no normality.

How does this translate to a generalised linear model, though? A GLM is identified by a link function and a distribution.
For example:

A logistic regression is a GLM with logit link function and binomial distribution. Does this mean we are assuming errors or Ys which are binomially distributed?

If I want to model a relationship like log(Y) = a + bX, I could use a GLM with a log link function, and... what distribution and how would I test if the distribution assumption holds?

#### CB

##### Super Moderator
> A logistic regression is a GLM with logit link function and binomial distribution. Does this mean we are assuming errors or Ys which are binomially distributed?
Similar thing here as in a linear model - we assume the "errors" are binomially distributed, though in a GLM especially it makes more intuitive sense to say (equivalently) that the conditional distribution of the Ys is binomial.
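A quick way to see the "conditional, not marginal" point for logistic regression is a numpy sketch in which Y | x is Bernoulli with a logit-linear mean (the coefficients a = -1, b = 0.5 are invented purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)

# Assumed model: logit(P(Y=1 | x)) = a + b*x, i.e. Y | x ~ Bernoulli(p(x)).
a, b = -1.0, 0.5
x = rng.uniform(-3, 3, size=20000)
p = 1 / (1 + np.exp(-(a + b * x)))  # inverse logit link
y = rng.binomial(1, p)              # conditional distribution is binomial

# The marginal distribution of Y is just a lump of 0s and 1s; the GLM
# assumption is about Y | x. Check it at a slice of x near 2, where the
# model says P(Y=1) = inverse-logit(-1 + 0.5*2) = 0.5:
mask = np.abs(x - 2.0) < 0.1
print(round(y[mask].mean(), 2))  # close to 0.5
```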

> If I want to model a relationship like log(Y) = a + bX, I could use a GLM with a log link function, and... what distribution and how would I test if the distribution assumption holds?
It's slightly unclear here whether you would need a log link function, since the way you've expressed the model above suggests that you'd already be log-transforming the Y variable before estimating the model. It isn't really possible to say what distribution you'd want to use without more information. In general the question is: Conditional on the predictors, what distribution is the Y variable likely to take?
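As an illustration of the log-link route (rather than log-transforming Y, which would choke on zeros), here is a toy Poisson regression fitted by iteratively reweighted least squares in numpy; the coefficients a = 0.5, b = 0.3 are invented for the example:

```python
import numpy as np

rng = np.random.default_rng(2)

# Log link: log E[Y|x] = a + b*x, with Y | x ~ Poisson(exp(a + b*x)).
# Zeros in Y are perfectly legal here, unlike with a log-transformed Y.
a_true, b_true = 0.5, 0.3
x = rng.uniform(0, 4, size=5000)
y = rng.poisson(np.exp(a_true + b_true * x))

# Fit by iteratively reweighted least squares (Newton's method on the
# Poisson log-likelihood) -- essentially what GLM software does internally.
X = np.column_stack([np.ones_like(x), x])
beta = np.zeros(2)
for _ in range(25):
    mu = np.exp(X @ beta)             # current fitted means
    z = X @ beta + (y - mu) / mu      # working response
    W = mu                            # Poisson working weights
    beta = np.linalg.solve(X.T @ (W[:, None] * X), X.T @ (W * z))

print(np.round(beta, 2))  # should land near the true (0.5, 0.3)
```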

Hope that helps!
CB

#### JohnDoe2014

##### New Member
> Similar thing here as in a linear model - we assume the "errors" are binomially distributed, though in a GLM especially it makes more intuitive sense to say (equivalently) that the conditional distribution of the Ys is binomial.
Understood, thanks.

> It's slightly unclear here whether you would need a log link function, since the way you've expressed the model above suggests that you'd already be log-transforming the Y variable before estimating the model. It isn't really possible to say what distribution you'd want to use without more information. In general the question is: Conditional on the predictors, what distribution is the Y variable likely to take?
>
> Hope that helps!
> CB
Mine was more of a generic question. I was trying to understand a bit more about GLMs in general.

In my specific case, I will not be log-transforming the Y, as there are many zeros.

#### CB

##### Super Moderator
> Mine was more of a generic question. I was trying to understand a bit more about GLMs in general.
>
> In my specific case, I will not be log-transforming the Y, as there are many zeros.
Gotcha. Well, I guess the answer is that there are several distributions one might use with a log link function (e.g., Poisson, quasi-Poisson, negative binomial), and again it does come down to what seems most plausible based on what you know about the data and how it was generated. I think your question about how to tell whether the conditional distribution of the Ys actually follows the assumed distribution is a good one. I guess one main way would be to use a q-q plot of the residuals (i.e. plotting the quantiles of the residuals against the quantiles of the assumed distribution). People do this often when the distribution assumed is the normal distribution, but the tactic can be used more generally.
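A numeric stand-in for the q-q idea, as a sketch: sort the residuals and correlate them with the quantiles of the assumed distribution (normal here, via the standard library's `NormalDist`); a q-q correlation near 1 supports the assumption:

```python
import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(3)

# Simulated residuals that actually are normal, for illustration.
residuals = rng.normal(0, 1, size=500)

# Compare the sorted residuals with the theoretical quantiles of the
# assumed distribution at standard plotting positions (i - 0.5) / n.
n = len(residuals)
probs = (np.arange(1, n + 1) - 0.5) / n
theoretical = np.array([NormalDist().inv_cdf(p) for p in probs])
empirical = np.sort(residuals)

# Correlation of the q-q points; values near 1 support the assumption.
r = np.corrcoef(theoretical, empirical)[0, 1]
print(round(r, 3))
```

Swapping `NormalDist` for the quantile function of another assumed distribution gives the same check for a non-normal GLM.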

How do others here go about testing whether the assumed distribution is well approximated in a GLM?

> So, if I understand correctly, with a linear model we assume normality of the errors, and therefore normality of Y|X, not of the marginal distribution of Y. A rough way of visualising this can be to plot the residuals vs the predicted Ys: if the residuals increase as the predicted Ys increase, then there is no normality.
Sorry, I forgot to address this earlier. The plot of residuals vs predicted values is primarily helpful for testing the assumptions of homoscedasticity (is the variance of the residuals similar regardless of the level of the predicted values?) and exogeneity (is the mean of the errors zero regardless of the levels of the predictor variables?).
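As a rough numeric version of the residuals-vs-fitted plot (a sketch on simulated data with constant error variance): split the residuals by low/high fitted value and compare their spread and mean.

```python
import numpy as np

rng = np.random.default_rng(4)

# Homoscedastic toy data: y = 2x plus noise with constant variance.
x = rng.uniform(0, 10, size=2000)
y = 2 * x + rng.normal(0, 1, size=2000)

fitted = 2 * x        # using the true line for simplicity
resid = y - fitted

# Similar spread in the low- and high-fitted halves supports
# homoscedasticity; a mean near zero supports exogeneity.
low = resid[fitted < np.median(fitted)]
high = resid[fitted >= np.median(fitted)]
print(round(low.std(), 2), round(high.std(), 2), round(resid.mean(), 2))
```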

For testing the normality assumption you're more likely to use a q-q plot, or a statistical test like the Shapiro-Wilk, or maybe just a good ol' fashioned histogram.
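And for the histogram option, a quick numeric sketch: bin standardised residuals and compare the bin shares with the normal reference (about 2.3 / 13.6 / 34.1 / 34.1 / 13.6 / 2.3 per cent):

```python
import numpy as np

rng = np.random.default_rng(5)

# Simulated standardised residuals that are genuinely normal.
residuals = rng.normal(0, 1, size=10000)

# "Good ol' fashioned histogram" in text form: bin at -2, -1, 0, 1, 2
# and compare the bin shares with what a standard normal would give.
counts, edges = np.histogram(residuals, bins=[-np.inf, -2, -1, 0, 1, 2, np.inf])
shares = counts / counts.sum()
# Normal reference: ~0.023, 0.136, 0.341, 0.341, 0.136, 0.023
print(np.round(shares, 3))
```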