Binomial model with 'complete separation' ...?

#1
Hello All,

I have a data set similar to this:

Code:
d4 <- structure(list(d.in.p = c(4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 
4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L), time2 = c(0, 0.30209836793605, 
0.292033542976939, 0.00731537389688806, 0, 0, 0.0249544211485871, 
0, 0.368636194723151, 0, 0.975045578851413, 1, 0.631363805276849, 
1, 1, 0.69790163206395, 0.707966457023061, 0.992684626103112, 
1, 1), CR = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 
1, 1, 1, 1), F.name = structure(c(3L, 5L, 6L, 15L, 9L, 7L, 13L, 
11L, 2L, 4L, 13L, 11L, 2L, 4L, 3L, 5L, 6L, 15L, 9L, 7L), .Label = c("", 
"10a", "10b", "11b", "2c", "4c", "5c", "6a", "6c", "7a", "8a", 
"8b", "9a", "9b", "z"), class = "factor")), .Names = c("d.in.p", 
"time2", "CR", "F.name"), row.names = c("7", "35", "99", "112", 
"149", "160", "17", "31", "125", "137", "171", "311", "1251", 
"1371", "72", "351", "991", "1121", "1491", "1601"), class = "data.frame")

I would like to fit a model of how time ('time2') relates to being chosen or rejected ('CR' in the data frame).

When I run code like this:
Code:
glm(CR~time2,family=binomial,data=d4)
I get warnings like this:
Warning messages:
1: glm.fit: algorithm did not converge
2: glm.fit: fitted probabilities numerically 0 or 1 occurred



I understand the problem, and I found some ideas in other threads, such as increasing the number of iterations, but none have worked so far.

Can anyone help?
 

Dason

Ambassador to the humans
#2
Code:
> library(plyr)
> ddply(d4, .(CR), summarize, min = min(time2), max = max(time2))
  CR       min       max
1  0 0.0000000 0.3686362
2  1 0.6313638 1.0000000
We can see that the max of time2 when CR=0 is .369 and the min of time2 when CR=1 is .631, so the two groups don't overlap at all. This kind of complete separation causes problems with logistic regression. Basically the parameters aren't estimable: the likelihood keeps increasing as the coefficient on time2 grows, so the estimate diverges to infinity as the fit tries to get as close as possible to the model "predict 1 for time2 > c, predict 0 for time2 <= c" for some c between .369 and .631.
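For anyone who wants to check this without plyr, here is a minimal base-R sketch. It rebuilds only the time2 and CR columns from the posted data (the data frame name `d` and the helper names are mine, not from the original post), tests for a gap between the two groups, and refits with a few iteration caps to see how the slope behaves. The choice of 5, 10 and 25 iterations is arbitrary.

```r
# time2 and CR copied from the data posted in #1; 'd' is a name chosen here
d <- data.frame(
  time2 = c(0, 0.30209836793605, 0.292033542976939, 0.00731537389688806,
            0, 0, 0.0249544211485871, 0, 0.368636194723151, 0,
            0.975045578851413, 1, 0.631363805276849, 1, 1,
            0.69790163206395, 0.707966457023061, 0.992684626103112, 1, 1),
  CR = rep(c(0, 1), each = 10)
)

# Largest time2 among CR = 0 and smallest time2 among CR = 1
max0 <- max(d$time2[d$CR == 0])
min1 <- min(d$time2[d$CR == 1])

# This only checks one direction (CR = 0 low, CR = 1 high),
# which is the direction seen in this data set
separated <- max0 < min1
print(c(max0 = max0, min1 = min1))
print(separated)

# Refit with different iteration caps and collect the time2 slope.
# Warnings are suppressed because they are the ones quoted in #1.
slopes <- sapply(c(5, 10, 25), function(k) {
  fit <- suppressWarnings(
    glm(CR ~ time2, family = binomial, data = d,
        control = glm.control(maxit = k))
  )
  unname(coef(fit)["time2"])
})
print(slopes)
```

If the slope keeps changing as the cap goes up, that is consistent with the point above: raising the iteration limit does not help because there is no finite estimate to converge to.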
 
#3
Dang... Is there any way I can generate a bunch of data that mimics the distribution of the data I have presented? Then maybe there would be some overlap.
 
#4
... I was able to use the brglm package to get the model up and running. I know it corrects for some sort of bias in a binomial model, but I'm not exactly sure how it works.

Just thought I'd share.
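A rough sketch of what the brglm fit might look like, since the exact call was not posted. As far as I know, brglm fits a bias-reduced (Firth-type) version of the logistic model, which gives finite estimates even when the data are completely separated. The data frame `d` below is my own rebuild of the time2 and CR columns from #1, and the call assumes the brglm package is installed; it is skipped otherwise.

```r
# time2 and CR copied from the data posted in #1; 'd' is a name chosen here
d <- data.frame(
  time2 = c(0, 0.30209836793605, 0.292033542976939, 0.00731537389688806,
            0, 0, 0.0249544211485871, 0, 0.368636194723151, 0,
            0.975045578851413, 1, 0.631363805276849, 1, 1,
            0.69790163206395, 0.707966457023061, 0.992684626103112, 1, 1),
  CR = rep(c(0, 1), each = 10)
)

# Assumption: brglm is installed (install.packages("brglm") if not)
if (requireNamespace("brglm", quietly = TRUE)) {
  # Same formula and family as the glm() call in #1
  fit_br <- brglm::brglm(CR ~ time2, family = binomial, data = d)
  print(coef(fit_br))
} else {
  message("brglm is not installed; skipping the bias-reduced fit")
}
```

The coefficients from this fit should be read with care: they are finite because of the bias-reduction adjustment, not because the data contain information about where between the two groups the switch happens.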