I was wondering if you could help me refresh my rusty memory of maximum likelihood estimation.

I think I remember the basic concept from my university courses in a previous life. A banal example: if we want to fit a normal distribution to the observations 3, 4, 5, then N(4, 2/3) is the best fit, because the sample mean 4 and the divide-by-n sample variance 2/3 are the values that maximise the likelihood of observing those data under that distribution.
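To make that concrete, here is a quick numpy sketch of what I mean (if I remember correctly, the closed-form MLEs for a normal are the sample mean and the divide-by-n sample variance):

```python
import numpy as np

data = np.array([3.0, 4.0, 5.0])

# MLE of the mean is the sample mean
mu_hat = data.mean()                        # 4.0

# MLE of the variance divides by n, not n - 1 (the biased estimator)
sigma2_hat = ((data - mu_hat) ** 2).mean()  # 2/3
```

So the best-fitting variance is 2/3 rather than 1, but the point about the mean stands.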

However, when we apply ML estimation to a (generalised) linear model, what exactly are we assuming a specific distribution for: the observed independent variables, the dependent variable, the parameters we are estimating, the errors, or something else?

I remember that in a linear model the OLS and the ML estimates of the parameters are the same when normality is assumed. But I am confused, because I can have cases where the observations are not normally distributed, yet a linear model (estimated with ML, assuming normality) is still a good fit. Say I have a relationship like y = 2x; this relationship can hold even if my x are concentrated around 2, 4 and 6, and my y (respectively) around 4, 8 and 12, i.e. when my observations are not normally distributed.
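Here is a small numpy sketch of the situation I have in mind (the clusters, noise level and seed are made up for illustration): x is clearly not normally distributed, yet OLS, which I believe coincides with the ML fit under normal errors, still recovers the slope of 2:

```python
import numpy as np

rng = np.random.default_rng(0)

# x clustered around 2, 4 and 6: clearly not normally distributed
x = np.concatenate([c + 0.1 * rng.standard_normal(50) for c in (2, 4, 6)])

# y = 2x plus small normal noise on the errors
y = 2 * x + 0.1 * rng.standard_normal(x.size)

# OLS fit (the same as the ML estimates when the errors are assumed normal)
slope, intercept = np.polyfit(x, y, 1)
print(slope, intercept)  # slope close to 2, intercept close to 0
```

Is the resolution simply that the normality assumption concerns something other than the distribution of x?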

Thank you for your patience!