Negative Binomial Mixed Effects Model

Hi everyone,

This is related to an earlier post. I'm studying repair time (in days) for vehicles, which is a discrete count. I'm trying to remember the justification for including random effects. I'm considering repair state and vehicle make as random effects in a model, among other variables. I imagine that each state and make has its own correlated responses. I have 2.8 million rows and 55 distinct makes. I heard that if the test set contains a make that was not in the training set, the model falls back to the global mean for that observation. This sounds appealing, so I'm drawn towards this random effect idea. Any thoughts are appreciated.



Less is more. Stay pure. Stay poor.
What does the distribution of the outcome look like?

Two random effects? So three total levels? People clustered in makes clustered in States. Please describe the model some more :)
Well, I'm just brainstorming ideas for now. It's definitely a negative binomial distribution. The 99th percentile is 57 days. The top 1% has values that extend into the hundreds of days, so I needed to "zoom in" to really see the distribution. I'm trying to jog my memory of the use cases for random effects in particular. I'm reading that they don't take up degrees of freedom in the same way as fixed effects. I'm also wary of scenarios where there are unseen levels in the test set. I just hate lumping a variable like vehicle make into an "other" category. There are 50 states and 50+ makes, and I always have this argument with myself that I'm using too many dfs. But this is probably not the case.
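A quick way to sanity-check the negative binomial claim is to compare the sample mean and variance, since the NB2 form assumes Var(Y) = mu + alpha * mu^2. Below is a minimal sketch in plain Python. The function name and the repair-time values are made up for illustration, and the method-of-moments alpha is only a rough starting value, not a fitted estimate.

```python
from statistics import mean, variance

def moment_dispersion(counts):
    """Method-of-moments estimate of alpha in Var(Y) = mu + alpha * mu**2."""
    m = mean(counts)
    v = variance(counts)  # sample variance (n - 1 denominator)
    return m, v, (v - m) / (m * m)

# Hypothetical repair times in days (illustrative values, not the real data).
repair_days = [1, 2, 2, 3, 3, 4, 5, 7, 12, 61]
m, v, alpha = moment_dispersion(repair_days)
print(f"mean={m:.2f} variance={v:.2f} alpha={alpha:.2f}")
```

A variance far above the mean (alpha well above 0) is at least consistent with overdispersion relative to a Poisson, though it does not by itself confirm a negative binomial.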


Less is more. Stay pure. Stay poor.
Well, you have a lot of data, which is good. The age-old example is nested groups, like kids in classrooms or schools: kids within a cluster are similar to each other, so you need to account for the clustering and for the share of variability it explains. That lets you separate the between-group and within-group variability. It also lets you have random intercepts and random slopes. Random effects come in when observations are grouped, and predictors can then sit at the group level rather than the individual level. Sex is individual level. Speed limits may be state level. To mix things up more, you can also have interactions across levels, say sex affecting adherence to speed limits across states. Probably not the best example, but hopefully it helps.

Side note: most people don't go beyond two levels, since you need theory to justify it and there can be model convergence issues, but you could have kids nested in classrooms nested in schools. Also, if you have clusters but not a lot of them, or you have difficulty modeling them, some people will just run OLS with robust SEs (sandwich SEs). I do this sometimes when I have a dataset where a few patients have repeated encounters, so they are in the dataset more than once, and I don't want to drop them or use MLM. I just run an OLS with robust SEs.
That's pretty slick. I was getting confused about when to include slopes/intercepts. The visual helps! There is a hierarchical nature to my data. Maybe not so much states and makes; more so repair shop type nested within state. There are two types of shops per state (A and B). Each repair shop type has its own "starting point," if you will. So, I'm thinking about letting the shop type effect vary by state. I think I would have random slopes and intercepts in this case.


Less is more. Stay pure. Stay poor.
There is a quote that I can't remember who to attribute it to, but it goes,

"Once you learn multilevel models, all you can see are multi-level models" I probably butchered it, but there is some truth in it.

I once did a cross-level interaction in a project. I did not have that many groups but went for it anyway. I looked at people's radiation exposures (risks) during different settings in the hospital. And certain people and areas had higher risk, but when those people were in those areas of the hospital the risks were multiplicatively higher.
That's pretty cool from a stats perspective. But, probably terrifying from a patient perspective. Thankfully, most people don't need to be in the hospital for long periods of time. Unless they are employees! mwahahahaha!!
As mentioned earlier, I have a lot of data: 2.8 million rows, which I subset further to key in on attributes of interest, down to 1.3 million rows. The models I've tried thus far are taking a long time to fit. So, I'm thinking of building a model from 10% of these 1.3 million rows, so long as the sample is representative of the original 1.3 million data points. I'm thinking of sampling 10% and stratifying the data such that the outcome variable has the same distribution as in the original data. Does this sound reasonable? One of the mixed models I'm trying has been running on the server for over 24 hours and counting. A model with just random slopes and random intercepts for state and shop type took 3 hours. That model is over-dispersed, likely due to omitted covariates. I ain't got time to wait days just to see if the model converges.
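The stratified 10% idea could be sketched roughly as below, in plain Python. Everything here is an assumption for illustration: the function names, the "days" field, and the bin cut points (only the 57-day cutoff echoes the 99th percentile mentioned earlier). The same logic could equally be written in R or SAS.

```python
import random
from collections import defaultdict

def stratified_sample(rows, key, frac, seed=0):
    """Draw roughly `frac` of the rows from each stratum defined by key(row)."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for row in rows:
        strata[key(row)].append(row)
    sample = []
    for members in strata.values():
        k = max(1, round(frac * len(members)))  # keep at least one row per stratum
        sample.extend(rng.sample(members, k))
    return sample

def day_bin(row):
    """Hypothetical outcome bins; the cut points are illustrative only."""
    days = row["days"]
    if days <= 7:
        return "0-7"
    if days <= 30:
        return "8-30"
    if days <= 57:
        return "31-57"
    return "58+"

# Made-up data: 1000 rows with repair times cycling through 0..99 days.
rows = [{"days": d % 100} for d in range(1000)]
subset = stratified_sample(rows, day_bin, frac=0.10)
print(len(subset), "rows sampled")
```

One design caveat: forcing at least one row per stratum slightly over-represents very small strata, and stratifying on the outcome alone does not guarantee that rare states or makes survive the sampling, so the key may need to include those as well.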


Active Member
Regarding your data overflow, the only advice I could give is to try to compute the model using the sufficient statistics (i.e., summary statistics appropriate to your model). What those statistics are is probably too hard to work out for a complicated model, though. I would consider using a generalized estimating equation; I think I mentioned it in a previous post. For a Poisson it will probably work out to be total counts and appropriate denominators in some partition of the data. Hope that helps, or at least does little harm.
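As a rough illustration of the Poisson case, here is a minimal sketch that collapses row-level data to one record per covariate cell, keeping the total count and the number of rows as the denominator. The field names ("state", "shop", "days") and the function name are hypothetical, and this only applies when all predictors are categorical (or binned).

```python
from collections import defaultdict

def collapse_poisson(rows, keys, outcome="days"):
    """Collapse rows to (total count, number of rows) per covariate cell."""
    cells = defaultdict(lambda: [0, 0])
    for row in rows:
        cell = tuple(row[k] for k in keys)
        cells[cell][0] += row[outcome]  # total count in the cell
        cells[cell][1] += 1             # denominator: rows in the cell
    return {cell: (total, n) for cell, (total, n) in cells.items()}

# Made-up rows for illustration only.
rows = [
    {"state": "TX", "shop": "A", "days": 3},
    {"state": "TX", "shop": "A", "days": 5},
    {"state": "TX", "shop": "B", "days": 10},
    {"state": "OH", "shop": "A", "days": 2},
]
for cell, (total, n) in collapse_poisson(rows, ["state", "shop"]).items():
    print(cell, "total days:", total, "repairs:", n, "mean:", total / n)
```

The collapsed table can then be fit as a Poisson model on the totals with log(n) as an offset. As far as I know this shortcut is not exact for a negative binomial with an estimated dispersion parameter, so treat it as an approximation there.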


Less is more. Stay pure. Stay poor.
My 2 cents: yeah, getting it to run on a random subsample is always a good idea. Most of the MLMs I have run in the past were in SAS; I am guessing you are using R. In SAS, sorting by the group ID before running the model was a known trick for decreasing the processing time. Not sure if this would carry over or not.

Traditionally, MLM results may not be too different from OLS or MLE estimates for the fixed effects. The MLM allows you to account for the between- and within-group variability and get random effects, but depending on the model's purpose, you may well be able to use the basic model for now and have your MLM run over a weekend. Then compare the results.

Normally I don't have processing resource issues, but my current project has me simulating a 10k dataset tens of thousands of times. So I had to get the whole thing working on data from a small hospital, then recycled the code for a larger one (same as you), so that I wasn't waiting an hour to find out I had a bug.
Thanks for the replies. hlsmith, I fit a fixed effects model on 100k rows and it ran in like 15 seconds. I standardized my numeric predictors and added a quadratic term. Deviance / degrees of freedom was ~ 1.02 and the residual plots looked alright. I use R and SAS. We have SAS Grid :cool:. I'm not too familiar with PROC GLIMMIX syntax. I used a lot of PROC MIXED in school. I will probably try some other things, but I think I'm on the right track. My problem is, I don't have much past experience looking at diagnostic plots. So, I don't know what is considered acceptable in some cases. I also don't know why I'm seeing resources online talk about normal QQ plots for a GLM. I don't remember considering that plot in the past.


Less is more. Stay pure. Stay poor.
So wouldn't residuals be important in any model, to see if there may be a linearity issue or heteroscedasticity in the errors? Those would make a linear model, or the use of standard SEs for precision estimates, a poor choice, respectively.

One of the first steps in MLM is to fit an empty model (e.g., DV ~ nothing) and put the random cluster ID in the model to see how much variability in the DV is explained just by controlling for the clusters.
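Written out, the empty model looks roughly like this. The symbols are my own notation, not anything from earlier in the thread, and the first two lines are the linear (Gaussian) case. For a count outcome the same random intercept sits on the log scale, and the intraclass correlation (ICC) has no single agreed definition there.

```latex
\begin{aligned}
y_{ij} &= \gamma_{00} + u_{0j} + e_{ij}, \qquad u_{0j} \sim N(0, \tau^2), \quad e_{ij} \sim N(0, \sigma^2) \\
\mathrm{ICC} &= \frac{\tau^2}{\tau^2 + \sigma^2} \\
\log \mu_{ij} &= \gamma_{00} + u_{0j} \qquad \text{(count outcome, log link)}
\end{aligned}
```

Here i indexes observations, j indexes clusters, and the ICC is the share of total variance that sits between clusters.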

Glad to hear you are making progress.
I'm specifically referring to a normal QQ plot of the deviance residuals for a negative binomial model. Do we plot this to show that the selected distribution (i.e., negative binomial) is correct? I have high outliers, so the QQ plot shows the right skew. I read that we can have deviations in the QQ plot and still have the correct distribution. I understand plotting deviance residuals by fitted values. For my model, the residuals-by-fitted plot shows a fairly constant band around 0, suggesting that non-linearity may not be a problem.
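For reference, here is a minimal sketch of how a deviance residual is computed for the NB2 parameterization, Var(Y) = mu + alpha * mu^2, treating alpha as known. The function name is hypothetical, and software may scale or parameterize things differently (for example theta = 1 / alpha), so treat this as an illustration and not as what R or SAS does internally.

```python
import math

def nb_deviance_residual(y, mu, alpha):
    """Deviance residual for NB2 with Var(Y) = mu + alpha * mu**2."""
    # The y * log(y / mu) term is taken as 0 when y == 0.
    first = y * math.log(y / mu) if y > 0 else 0.0
    second = (y + 1.0 / alpha) * math.log((1.0 + alpha * y) / (1.0 + alpha * mu))
    unit_deviance = 2.0 * (first - second)
    # Guard against tiny negative values from rounding, then attach the sign.
    return math.copysign(math.sqrt(max(unit_deviance, 0.0)), y - mu)

# Made-up observed values against a fitted mean of 10 days with alpha = 0.5.
for y in (0, 5, 10, 60):
    print(y, round(nb_deviance_residual(y, mu=10.0, alpha=0.5), 3))
```

These residuals are only approximately normal even when the model is right, and the approximation tends to be worse for small counts, which is one reason a normal QQ plot can show some departure without the distribution being wrong.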