Generalized linear model: application conditions

Q&A

New Member
#1
Hello all,

I would like to know what the conditions are for the application of a GLM.
Should we test whether the response variable is normally distributed (Shapiro-Wilk)?
Is it a problem if my variables are correlated with each other but my observations are not?

Thanks
 

Buckeye

Active Member
#2
You don't need to run a statistical test before using a GLM. Knowledge of the subject is enough to steer us in the right direction. For example, do we have count data, proportion data, continuous data? etc. The normality assumption applies to the residuals of an OLS model. You can check for multicollinearity in the predictors after the model fit.
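As a rough illustration of the multicollinearity check, here is a minimal sketch of the variance inflation factor (VIF) using only NumPy. The function name `vif` and the simulated predictors are made up for the example; the formula used is the usual VIF_j = 1 / (1 - R²_j), where R²_j comes from regressing predictor j on the other predictors.

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of X.

    X is an (n, p) array of predictors WITHOUT an intercept column
    (an intercept is added inside each auxiliary regression).
    """
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        target = X[:, j]
        # Regress predictor j on an intercept plus all other predictors.
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ coef
        r2 = 1.0 - (resid @ resid) / np.sum((target - target.mean()) ** 2)
        out[j] = 1.0 / (1.0 - r2)
    return out

# Simulated data (hypothetical): x2 is almost a copy of x1, x3 is unrelated.
rng = np.random.default_rng(0)
n = 300
x1 = rng.normal(size=n)
x2 = x1 + 0.1 * rng.normal(size=n)
x3 = rng.normal(size=n)
vifs = vif(np.column_stack([x1, x2, x3]))
print(vifs)  # x1 and x2 should show large values, x3 a value near 1
```

Note that the VIF only involves the predictors, so it does not depend on which family or link the GLM uses.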
 

Q&A

New Member
#3
Hi,

Yes, I have fitted my model and used the VIF to remove the variables whose VIF was too high.
But when I ran a Shapiro-Wilk test on the residuals of the model (my response variable is a lifetime), the null hypothesis of normality was rejected.
So I don't know whether normally distributed residuals are one of the application conditions or not.
Can you please tell me more?
 

Buckeye

Active Member
#4
Did you fit a plain old linear regression or something else? Generally, the coefficient estimates are robust to violations of normality for large sample sizes. Statistical inference is affected more by missing/inappropriate predictors than by non-normality of residuals. Personally, I don't like running a ton of statistical tests and would rather look at plots of the data. If you fit a GLM, then I wouldn't expect the residuals to be normally distributed. Can you post some plots?

PS: You can fit a Gaussian GLM via maximum likelihood, and I'm pretty sure it should give the same coefficients as OLS.
 

hlsmith

Less is more. Stay pure. Stay poor.
#5
Great insights @Buckeye

I believe the ML and OLS models are only identical in the case of one predictor. Otherwise they can deviate some. Yes, I would also like to hear about the sample size and the number and type of predictors.
 
#6
If you do maximum likelihood estimation under the assumption that the data y (conditional on X) are independently normally distributed with constant variance (which is the same as saying that the errors are independent and normally distributed), then the estimator is the same as ordinary least squares (OLS):

beta_hat = (X'X)^-1*X'y

The formula is the same (for both maximum likelihood and OLS) whether there is one independent variable or several.
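A quick numerical check of this with several predictors, as a sketch on simulated data (the sample size, coefficients and noise level are made up for the example). The maximum likelihood fit is done here by plain gradient ascent on the Gaussian log-likelihood, deliberately without using the closed-form formula, so that the two answers are obtained independently. The variance is held fixed at 1 because it only rescales the gradient and does not change where the maximum in beta is.

```python
import numpy as np

# Simulated data with an intercept and three predictors (hypothetical values).
rng = np.random.default_rng(42)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])
beta_true = np.array([1.0, 2.0, -1.0, 0.5])
y = X @ beta_true + rng.normal(scale=1.5, size=n)

# OLS: solve the normal equations X'X beta = X'y.
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)

def score(beta, sigma2=1.0):
    """Gradient of the Gaussian log-likelihood with respect to beta."""
    return X.T @ (y - X @ beta) / sigma2

# Maximum likelihood by gradient ascent. The step size 1 / (largest
# eigenvalue of X'X) guarantees convergence for this quadratic problem.
step = 1.0 / np.linalg.eigvalsh(X.T @ X).max()
beta_ml = np.zeros(X.shape[1])
for _ in range(2000):
    beta_ml = beta_ml + step * score(beta_ml)

print(beta_ols)
print(beta_ml)
print(np.allclose(beta_ols, beta_ml, atol=1e-8))
```

The two coefficient vectors agree up to numerical precision, with four columns in X rather than one.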

GLM stands for generalized linear model. It covers response distributions from the exponential family, such as the normal, binomial, Poisson, exponential and gamma distributions, among others.

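Outside the normal case with identity link there is in general no closed-form formula for the estimates; the maximum likelihood estimates are instead computed by iteratively reweighted least squares (IRLS).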
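For what it's worth, here is a minimal sketch of iteratively reweighted least squares for one specific case, a Poisson GLM with log link, written with NumPy only. The function name `irls_poisson`, the starting values and the simulated data are assumptions made for the example, not what any particular package does internally. Each iteration is just a weighted least squares fit of a "working response" z on X with weights w.

```python
import numpy as np

def irls_poisson(X, y, max_iter=50, tol=1e-10):
    """Fit a Poisson GLM with log link by IRLS (illustrative sketch)."""
    mu = y + 0.5                 # simple starting values, keeps log() finite
    eta = np.log(mu)
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        w = mu                   # working weights for Poisson with log link
        z = eta + (y - mu) / mu  # working response
        XtW = X.T * w            # X'W without forming the diagonal matrix
        beta_new = np.linalg.solve(XtW @ X, XtW @ z)
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        eta = X @ beta
        mu = np.exp(eta)
        if converged:
            break
    return beta

# Simulated count data (hypothetical coefficients).
rng = np.random.default_rng(0)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
beta_true = np.array([0.5, 0.3, -0.2])
y = rng.poisson(np.exp(X @ beta_true))

beta_hat = irls_poisson(X, y)
# At the maximum likelihood estimate the score X'(y - mu) is zero.
score = X.T @ (y - np.exp(X @ beta_hat))
print(beta_hat)
print(score)
```

If the same loop is run with a normal response and identity link, the weights are constant and z is just y, so the first iteration already returns the OLS formula given earlier.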