Binary mixed-model logistic regression using lmer() of lme4 for multilevel analysis

#1
In my experiment, each patient has been observed twice, once using placebo and the other time using the real treatment. I want to assess the predictive effects of the treatment and patients' demographics (age, sex, etc.). Therefore I have for example 100 patients, but actually 200 pairs of treatment (0 and 1 for placebo and the real treatment, respectively) and disease (0 and 1 for the absence or presence of the disease).

This way, I have duplicated (copied & pasted) my 100 rows of patients' demographics into 200 rows and have distinguished these two 100-row bundles of data by the variable "treatment" (100 rows with treatment = 0, 100 copied rows with treatment = 1).

Now I need to run a mixed model regression to account for the multilevel design and avoid atomistic fallacy (thanks to Greta).

I have written the following R syntax, but there are some problems: first that the P values are really disappointing (all above 0.8, although they might be correct). Second that I don't know if the code is correct in the first place or not!!

My code is:
Code:
(mixed <- lmer(Disease ~ Gender + Age + Demographic3 + Demographic4 + TREATMENT 
             + Age*Demographic4 + Age*TREATMENT + Age*Gender + Age*Demographic3 + Demographic4*TREATMENT + Demographic4*Gender 
             + Demographic4*Demographic3 + TREATMENT*Gender + TREATMENT*Demographic3 + Gender*Demographic3 
             
             + (Age          | PatientID) 
             + (Gender       | PatientID) 
             + (Demographic3 | PatientID) 
             + (Demographic4 | PatientID)
 
             , family=binomial(logit), MixedModelData))
After converging, R warns
Code:
In mer_finalize(ans) : false convergence (8).
I would be so grateful if you (looking at you Jake!) could check if my syntax is correct. I want R to treat the four demographics as repeated-measures or nested within the duplicated patients. Each patient has a unique PatientID [100 PatientIDs repeated in 200 cases].

Many thanks. :)
 

spunky

Doesn't actually exist
#2
Re: Binary mixed-model logistic regression using lmer() of lme4 for multilevel analys

and if someone can come up with the GEE version of it you get a free (virtual) cookie from me!
 

Jake

Cookie Scientist
#3
Re: Binary mixed-model logistic regression using lmer() of lme4 for multilevel analys

It looks like your model is misspecified in a few different ways.

First of all, your random effects specification attempts to fit 4 separate random intercepts for the same units, 1 for each parenthesis block. The correct syntax for this would be to put all 4 of these random slopes in a single parenthesis block like so: (Age+Gender+Demograpic3+Demographic4|PatientID)

Second and more importantly, you can't actually estimate random effects for these demographic variables. In a two-level model like you have here, it is only possible to estimate random effects of predictors that vary within the experimental units, not ones that vary only between the units. I think the correct model should look more like this:
Code:
(mixed <- lmer(Disease ~ (Gender + Age + Demographic3 + Demographic4 + TREATMENT)^2 
             + (TREATMENT | PatientID) 
             , family=binomial(logit), data=MixedModelData))
And naturally, because we have a lot of interactions terms here (all two-way interactions), I would make sure the predictors are centered for interpretation reasons.
 
#4
Re: Binary mixed-model logistic regression using lmer() of lme4 for multilevel analys

Thanksss Jake :)

First of all, your random effects specification attempts to fit 4 separate random intercepts for the same units, 1 for each parenthesis block. The correct syntax for this would be to put all 4 of these random slopes in a single parenthesis block like so: (Age+Gender+Demograpic3+Demographic4|PatientID)
Second and more importantly, you can't actually estimate random effects for these demographic variables. In a two-level model like you have here, it is only possible to estimate random effects of predictors that vary within the experimental units, not ones that vary only between the units.
Awesome! I guess you mean that in each bundle (treatment = 0 or 1), there might be some variation and thus effect for the demographics, but not between the copied bundles. Then perhaps this is why all P values were close to 1.0 before. Thanks. :)

Also thanks for the squared parentheses hint which equals all the main effects plus their 2-way interactions.

..........................

I ran the new code (both with and without interactions), but the same warning happened each time and the P values were around 0.9. Perhaps the real model has no good P values (and as you mentioned the model was almost free of multicollinearity [centered variables and VIFs less than 2 in a single level regression]) which can be sad news, but why that warning message?
 

Jake

Cookie Scientist
#5
Re: Binary mixed-model logistic regression using lmer() of lme4 for multilevel analys

The false convergence warning is kind of a famously vague warning sometimes given by the optimizer used in the current version of lme4. (By the way, a newer version of lme4 that uses a different optimizer has been submitted to CRAN and reportedly should be available from CRAN within a week.) The warning generally is a sign that the model you have tried to fit is too complex for the data to support.

Try fitting the following model:
Code:
(mixed2 <- lmer(Disease ~ (Gender + Age + Demographic3 + Demographic4 + TREATMENT)^2 
             + (1 | PatientID) + (0 + TREATMENT | PatientID) 
             , family=binomial(logit), data=MixedModelData))
And let me know if you still get the warning message. If you do, I would next try eliminating the random TREATMENT slope.
 

spunky

Doesn't actually exist
#6
Re: Binary mixed-model logistic regression using lmer() of lme4 for multilevel analys

this is a small chat Jake and i had about your thread victor. Jake considered it worthwhile to add it here so i'm copy-pasting our conversation:


spunky: Jaaaaake!! what does the ()^2 thing does in the code you gave victor? never seen it before

Jake: you can enclose groups of predictors in parentheses and raise the block to ^n to add all interactions between the enclosed predictors up to order n

Jake: so ^2 adds all possible two-ways, ^3 adds all possible three-ways and two-ways, etc.

spunky: awesome! now, i haven't used lme4 in a while so just as a refresher. when you did (1 | PatientID) + (0 + TREATMENT | PatientID) what you're asking is for a random intercept of TREATMENT across PatientID and a random slope for TREATMET across PatientID and the "0" at first is to prevent a correlation between the two, right?

Jake: essentially yes. technically the 0 says to suppress the intercept. the reason for this being that we already added the intercept in the previous parenthesis block, and unfortunately lmer is not smart enough to recognize when you've instructed it to estimate the same effect twice, so it will really try to do it...

Jake: the lack of correlation comes from the fact that the effects are put in separate parenthesis blocks. all predictors written in the same block get their covariance estimated, while predictors written in different parenthesis blocks are assumed uncorrelated

Jake: so we have to put the intercept in 1 block, and the slope in another block. but since by default just saying (x | group) also adds a random intercept for group, we had to manually suppress it

Jake: i expect what we will find in victor's output is that the variance in the random treatment slopes is close to 0

Jake: but we'll see
 
#7
Re: Binary mixed-model logistic regression using lmer() of lme4 for multilevel analys

Thanks a lot Jake. I couldn't quite understand the (1 | PatientID) and (0 + TREATMENT | PatientID) probably since I should study well the mixed models. Another thing is that this function is 1000x faster than SPSS' version.

Jake: i expect what we will find in victor's output is that the variance in the random treatment slopes is close to 0
Unfortunately so true. :( The new code reduced the P values for about 0.1 in general, but still around 0.8 and greater.
 

Jake

Cookie Scientist
#8
Re: Binary mixed-model logistic regression using lmer() of lme4 for multilevel analys

Note that I was not talking about the fixed treatment effect, but rather the variance of the random treatment effects. If you post the glmer() output of the model I mentioned I can point out to you what I'm talking about.
 
#9
Re: Binary mixed-model logistic regression using lmer() of lme4 for multilevel analys

Thanks for clarification :)

Also I bootstrapped it using the following code, and it gave an error after 1.5 hours of running 50 replicates.

The code:

Code:
library (boot)

mixedGLM <- function(formula, data, indices) {
  d <- data[indices, ]
  (fit <- lmer(DV ~ (Demo1 +Demo2+Demo3 +Demo4 +Trt)^2 
             + (1 | PatientID) + (0 + Trt | PatientID)
             , family=binomial(logit), d))
    return(coef(fit))
}

results <- boot(data=MixedModelData4 , statistic = mixedGLM, R= 50, formula= DV~Demo1 +Demo2 +Demo3 +Demo4 +Trt)

results
The errors:

Code:
Error in t.star[r, ] <- res[[r]] : 
  incorrect number of subscripts on matrix
In addition: There were 50 or more warnings (use warnings() to see the first 50)
Code:
> warnings()
Warning messages:
1: In mer_finalize(ans) : false convergence (8)
2: glm.fit: fitted probabilities numerically 0 or 1 occurred
3: glm.fit: fitted probabilities numerically 0 or 1 occurred
4: glm.fit: fitted probabilities numerically 0 or 1 occurred
5: In mer_finalize(ans) : false convergence (8)
6: glm.fit: fitted probabilities numerically 0 or 1 occurred
7: In mer_finalize(ans) : false convergence (8)
8: In mer_finalize(ans) : false convergence (8)
9: glm.fit: fitted probabilities numerically 0 or 1 occurred
10: In mer_finalize(ans) : false convergence (8)
11: glm.fit: fitted probabilities numerically 0 or 1 occurred
12: In mer_finalize(ans) : false convergence (8)
13: glm.fit: fitted probabilities numerically 0 or 1 occurred
14: In mer_finalize(ans) : false convergence (8)
15: glm.fit: fitted probabilities numerically 0 or 1 occurred
16: In mer_finalize(ans) : false convergence (8)
17: glm.fit: algorithm did not converge
18: glm.fit: fitted probabilities numerically 0 or 1 occurred
19: In mer_finalize(ans) : false convergence (8)
20: glm.fit: fitted probabilities numerically 0 or 1 occurred
21: In mer_finalize(ans) : false convergence (8)
22: In mer_finalize(ans) : false convergence (8)
23: glm.fit: fitted probabilities numerically 0 or 1 occurred
24: In mer_finalize(ans) : false convergence (8)
25: glm.fit: algorithm did not converge
26: glm.fit: fitted probabilities numerically 0 or 1 occurred
27: glm.fit: fitted probabilities numerically 0 or 1 occurred
28: In mer_finalize(ans) : false convergence (8)
29: glm.fit: algorithm did not converge
30: glm.fit: fitted probabilities numerically 0 or 1 occurred
31: In mer_finalize(ans) : false convergence (8)
32: glm.fit: algorithm did not converge
33: glm.fit: fitted probabilities numerically 0 or 1 occurred
34: glm.fit: fitted probabilities numerically 0 or 1 occurred
35: In mer_finalize(ans) : false convergence (8)
36: glm.fit: fitted probabilities numerically 0 or 1 occurred
37: In mer_finalize(ans) : false convergence (8)
38: glm.fit: fitted probabilities numerically 0 or 1 occurred
39: In mer_finalize(ans) : false convergence (8)
40: glm.fit: fitted probabilities numerically 0 or 1 occurred
41: glm.fit: fitted probabilities numerically 0 or 1 occurred
42: In mer_finalize(ans) : false convergence (8)
43: glm.fit: fitted probabilities numerically 0 or 1 occurred
44: glm.fit: fitted probabilities numerically 0 or 1 occurred
45: In mer_finalize(ans) : false convergence (8)
46: glm.fit: fitted probabilities numerically 0 or 1 occurred
47: In mer_finalize(ans) : false convergence (8)
48: In mer_finalize(ans) : false convergence (8)
49: glm.fit: fitted probabilities numerically 0 or 1 occurred
50: In mer_finalize(ans) : false convergence (8)
Can you pretty please tell me what is the bug? I should add the same bootstrap syntax worked fine for fixed-effects (simple) binary logistic regression of mine, although it gave warnings similar to the warnings of this one.
 
#10
Re: Binary mixed-model logistic regression using lmer() of lme4 for multilevel analys

Spunky, I ran a GEE model but it terminated with an error:

Code:
library(gee)

GEE <- gee(DV ~ (Demo1 +Demo2 +Demo3 +Demo4 +Trt)^2, PatientID,
                 MixedModelData4,
                 family=binomial(logit))
error:
Code:
Beginning Cgee S-function, @(#) geeformula.q 4.13 98/01/27

running glm to get initial regression estimate
       ...All the coefficients for the variables and interactions in a simple GLM...

Error in gee(DV ~ (Demo1 +Demo2 +Demo3 +Demo4 +Trt)^2,  : 
  Cgee: error: logistic model for probability has fitted value very close to 1.
estimates diverging; iteration terminated.
 
#11
Re: Binary mixed-model logistic regression using lmer() of lme4 for multilevel analys

Ok thanks to my dear friends (Jake and Henrik), I could finally and "hopefully" (if the destroyer of hope allows) do a bootstrapped mixed model binary logistic regression. The syntax is as follows:

Code:
library(afex)
results5 <- mixed(DV ~ (Demo1+Demo2+Demo3 +Demo4 +Trt)^2 
                  + (0 + Trt | PatientID),
                  family=binomial(logit), data = MixedModelData5,
                  method = "PB", args.test = list(nsim = 2))
 
#12
Re: Binary mixed-model logistic regression using lmer() of lme4 for multilevel analys

Victor didn't supply a reproducible example so an other user on another site did that.
(I just don't want to hide a cross posting discussion.)


And I added (and deleted) a little:

Code:
library(lme4)



# Look at an example =======================================

# Create some data


set.seed(2708) 

wide.df <- data.frame(id=1:100 ,  Age=rnorm(100,40,10), trt1=rbinom(100,1,0.5),
                      disease1=rbinom(100,1,0.5), disease2=rbinom(100,1,0.5)  , 
                      Gender =  sample(c(0,1), 100,replace=TRUE),
                      Demo1  =  sample(c(0,1), 100,replace=TRUE),
                      Demo2  =  sample(c(0,1), 100,replace=TRUE),
                      Demo3  =  sample(c(0,1), 100,replace=TRUE), 
                      ind.level = rnorm(100,mean=0,sd=1)   )



wide.df$trt2 <- abs(wide.df$trt1 -1)

head(wide.df)


# Restrucure for multilevel model
long.df <- reshape(wide.df , 
                   varying=list(c("trt1","trt2"),c("disease1","disease2")),
                   v.names=c("trt","disease"),
                   times=1:2 , 
                   direction="long")

#setting up expected values to simulate from, "p" in binomial distribuion
long.df$p <- 1/(1+exp(-(-1+long.df$ind.level+0.5*long.df$Gender+
         0.2*long.df$Demo1+0.3*long.df$Demo2+0.40*long.df$Demo3+0.5*long.df$trt )))

hist(long.df$p)

#OR ODDS RATIO:
#exp(0.5) 
  
long.df$disease3 <-  rbinom(200,1 ,long.df$p)

table(long.df$disease3)


# Reorder to look at
long.df<-long.df[order(long.df$id) ,]

long.df[1:10,]

# long.df[1:20, c(1,2,3,10,11,12)]


# estimates
#my glmer version: 0.99999-0

(res1 <- glmer(disease3 ~ (Gender+Demo1+Demo2+Demo3+Age + trt) 
              + (0 + trt | id ),
              family=binomial(logit), data = long.df ))


(res2 <- glmer(disease3 ~ (Gender+Demo1+Demo2+Demo3+Age + trt) 
        +(1   | id )  ,
        family=binomial(logit), data = long.df ))


(res3 <- glmer(disease3 ~ (Gender+Demo1+Demo2+Demo3+Age + trt) 
        +(1   | id )+ (0 + trt | id ),
        family=binomial(logit), data = long.df ))


(res4 <- glmer(disease3 ~ (Gender+Demo1+Demo2+Demo3+Age + trt) +
          (1 + trt | id ),
        family=binomial(logit), data = long.df ))

I got the impression that Victor preferred the res1 model. But I am not sure why. I guess that many would start with the res2 model.

From the estimates I get the impression that the res3 is over parameterised, by having both a random intercept and a random slope. And res4 seems even more so. But it is interesting and instructive to see different models.
 
#13
Re: Binary mixed-model logistic regression using lmer() of lme4 for multilevel analys

Thanks a lot dear Greta. This is just Awesome especially with your superb explanations.

Btw, what does over parametrized mean? Is having both random slope and intercept something good?
 

Jake

Cookie Scientist
#14
Re: Binary mixed-model logistic regression using lmer() of lme4 for multilevel analys

It is definitely not appropriate to go with the res1 model, which omits the random intercepts for subjects. When you estimate these effects there turns out to be a lot of variance in the random intercepts, so a model that ignores this variance (as res1 does) should not be used. I would go with res2 or res3 (assuming you can get stable estimates out of res3, which maybe you can't).
 
#15
Re: Binary mixed-model logistic regression using lmer() of lme4 for multilevel analys

“Over parametrization” is when you try to estimate more parameters “than you have data for”. The model “res3” seems to be over parametrized. It's estimate come close to zero (with these data).

Is there an easy way to get a parameter estimate for “p” for for example females and its confidence interval?

Is there an easy way to get a parameter estimate for “p” for for example females and its confidence interval?

Is this OK for a confidence interval?

The point estimate:

\( \hat{p} = \frac{1}{1+exp(-(\hat{\alpha}+\hat{\beta}1 ))} \)

where "1" indicates for females.

And the confidence intervals given by:

\( Low_{p} = \frac{1}{1+exp(-(\hat{\alpha}+\hat{\beta}1 -1.96\sqrt{stderr^2_\alpha +stderr^2_\beta+2cov(\hat{\alpha},\hat{\beta} } ))} \)

And the upper interval:

\( Upper_{p} = \frac{1}{1+exp(-(\hat{\alpha}+\hat{\beta}1 +1.96\sqrt{stderr^2_\alpha +stderr^2_\beta+2cov(\hat{\alpha},\hat{\beta} } ))} \)

Where the covariance (cov()) is estimated from:

\( cov(\hat{\alpha},\hat{\beta} )= stderr_\alpha stderr_\beta Corr(\hat{\alpha}, \hat{\beta}) \)

Let's pretend that the large sample properties hold.

Or is there any kind of "confint" command for this type of confidence interval?
 

Jake

Cookie Scientist
#16
Re: Binary mixed-model logistic regression using lmer() of lme4 for multilevel analys

Greta, yes. In the soon-to-be-replaced version of lme4 that is currently on CRAN, there is no built-in function for doing this, but one can accomplish it using the following code recipe: LINK

I believe that this code will produce the same standard errors / confidence intervals that you wrote in your post.

In the development version (soon to be the official CRAN version), there is built-in functionality for this, I think through the predict() method for merMod objects.