One for the pros: conditional logistic regression discrepancy

pisti

New Member
#1
OK, here's the problem. I am using R version 2.8.0 on a Windows XP machine. I am not a statistician.

I am trying to perform a conditional univariate regression in a matched case control group.

The raw data look like this (matset = matched set, case = diseased patient, retr = an exposure); cases are matched 1:3 with controls:

> m
   matset case retr
1       1    1    0
2       1    0    1
3       1    0    0
4       1    0    1
5       2    1    0
6       2    0    0
7       2    0    0
8       2    0    0
9       3    1    1
10      3    0    0
11      3    0    0
12      3    0    0
13      4    1    1
14      4    0    0
15      4    0    1
16      4    0    0
17      5    1    1
18      5    0    0
19      5    0    0
20      5    0    0
21      6    1    1
22      6    0    0
23      6    0    0
24      6    0    0
25      7    1    0
26      7    0    1
27      7    0    0
28      7    0    0
29      8    1    1
30      8    0    0
31      8    0    1
32      8    0    1
33      9    1    1
34      9    0    1
35      9    0    0
36      9    0    1
37     10    1    1
38     10    0    0
39     10    0    0
40     10    0    1
41     11    1    0
42     11    0    0
43     11    0    0
44     11    0    0

Calling the appropriate function works like a dream:

> clogit(case~retr+strata(matset))
Call:
clogit(case ~ retr + strata(matset))


     coef exp(coef) se(coef)    z     p
retr 1.70      5.45     0.84 2.02 0.043

Likelihood ratio test=4.86 on 1 df, p=0.0275 n= 44
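
(For anyone reproducing this: clogit() comes from the survival package, so the session assumes it has been loaded first.)

# clogit() is provided by the survival package
library(survival)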

I have just one tiny problem. The above data are fake! In the real data I have three more cases with the exposure "retr", which means (at least to me) that I have an even stronger case for an association between the exposure "retr" and case status. Unfortunately, conditional logistic regression does not work with the real data. Calling the above function with the authentic data yields the following result:

> cl <- clogit(case~retr+strata(matset))
Warning message:
In fitter(X, Y, strats, offset, init, control, weights = weights, :
Ran out of iterations and did not converge

and there is more:

> summary(cl)
Call:
coxph(formula = Surv(rep(1, 44L), case) ~ retr + strata(matset),
method = "exact")

n= 44
     coef exp(coef) se(coef)       z p
retr 21.8  2.99e+09    13666 0.00160 1

     exp(coef) exp(-coef) lower .95 upper .95
retr  2.99e+09   3.35e-10         0       Inf

Rsquare= 0.32 (max possible= 0.5 )
Likelihood ratio test= 17.0 on 1 df, p=3.79e-05
Wald test = 0 on 1 df, p=0.999
Score (logrank) test = 13.4 on 1 df, p=0.000257


Nice. The lower .95 confidence interval has become 0. There goes this association. My boss will be delighted! How can this be?

I appreciate any helpful comment.

Greetings, Pisti
 
#2
I don't know R, but may I add another question on top of yours -- using one binary variable to predict another binary variable with logistic regression seems strange to me (thinking of the probability curve). What is the difference between logistic and linear regression here?
 

pisti

New Member
#3
Hm, don't know exactly, but.

Dear owenpediatrica,

good point. I am not a statistician, just a number cruncher working on a need-to-know basis.

Using "normal" logistic regression I would include non-categorical data (such as age) as well as categorical (such as ***).

I use conditional logistic regression because I believe it more accurately accounts for the matching I have done (I matched each case with three controls according to sex). See also chapter 16 of the book available for free here.
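
If it helps, my understanding is that the unconditional analysis would be a plain glm() with a binomial family, while clogit() conditions on the matched sets. A sketch (using the toy data frame m from my first post):

library(survival)

# unconditional logistic regression -- ignores the matching
# (further covariates such as age could be added to the formula)
glm(case ~ retr, family = binomial, data = m)

# conditional logistic regression -- conditions on the matched sets
clogit(case ~ retr + strata(matset), data = m)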

Greetings, P
 
#4
Thanks for the info,
I am not sure, but it looks like your test may not have enough power, perhaps because of the small sample size and the small between-group difference. If that is true, you would find similarly inconclusive results with other statistical tests. Regarding the between-group difference -- have you looked at the contingency table of the two factors?
 
#5
I don't know R either, although I can follow what you're doing.

If I understand you correctly, you have had to change some of your cases or retrs to 0 to get it to run.

Without checking it out thoroughly, my guess is you have complete separation (or close to it). What that would mean is that there aren't enough retr=0 observations that match up with case=1 (or some other combination). R can't calculate the odds of success if there are no successes in one of your conditions -- the odds become infinite.

Owenpediatrica is right that doing a contingency table will make this clear.

The only thing to do is what you did: change some of your data to a more conservative situation, and realize that your results are biased conservatively. The only other options in this situation are to collect more data or to collapse categories of your predictor, but the latter won't work here since you have only two categories.

And in response to other questions here:
1. Lack of power doesn't lead to non-convergence, just high p-values.
2. It's totally fine to use logistic regression with a single categorical predictor, just as you can use linear regression with a single categorical predictor. Log-linear models were set up for this situation, just as ANOVA is, but hey, if it works and you can understand it, don't bother with something more complicated that will give you the same results another way. And I have no idea whether log-linear models handle the matching.

Karen
 

pisti

New Member
#6
Thank you for your comments.

Ok, I will try to explain the problem with contingency tables.

Let´s assume the raw data look like this:
Code:
> table(case,retr)
    retr
case  0  1
   0 24  9
   1  4  7
meaning: of the 11 cases (lower line), 7 have the exposure while 4 do not.
While the odds ratio (OR) is not significant...
Code:
> fisher.test(table(case,retr))

        Fisher's Exact Test for Count Data

data:  table(case, retr)
p-value = 0.06681
alternative hypothesis: true odds ratio is not equal to 1
95 percent confidence interval:
  0.890246 26.417833
sample estimates:
odds ratio
  4.484251
...the result of conditional logistic regression is:
Code:
> e5clogit <- clogit(case~retr+strata(matset),data=e5)
> summary(e5clogit)
Call:
coxph(formula = Surv(rep(1, 44L), case) ~ retr + strata(matset),
    data = e5, method = "exact")

  n= 44
    coef exp(coef) se(coef)    z     p
retr 1.70      5.45     0.84 2.02 0.043

    exp(coef) exp(-coef) lower .95 upper .95
retr      5.45      0.183      1.05      28.3

Rsquare= 0.105   (max possible= 0.5 )
Likelihood ratio test= 4.86  on 1 df,   p=0.0275
Wald test            = 4.08  on 1 df,   p=0.0435
Score (logrank) test = 4.8  on 1 df,   p=0.0285
Good. Now again, the above data are fake. I have *reduced* the number of cases with the exposure so that conditional logistic regression via the R function clogit() converges. The real data look like this in the contingency table:
Code:
> table(case,retr)
    retr
case  0  1
   0 24  9
   1  1 10
So in reality, most cases (10 of 11) have the exposure, while only 9 of 33 controls have it. To me this looks like an even stronger case for an association between the exposure and the disease. The OR indeed becomes significant...
Code:
> fisher.test(table(case,retr))

        Fisher's Exact Test for Count Data

data:  table(case,retr)
p-value = 0.000311
alternative hypothesis: true odds ratio is not equal to 1
95 percent confidence interval:
    2.825337 1194.820962
sample estimates:
odds ratio
  24.51318
...nevertheless, the output from conditional logistic regression is different:
Code:
> e1clogit <- clogit(case~retr+strata(matset),data=e1)
Warning message:
In fitter(X, Y, strats, offset, init, control, weights = weights,  :
  Ran out of iterations and did not converge
> summary(e1clogit)
Call:
coxph(formula = Surv(rep(1, 44L), case) ~ retr + strata(matset),
    data = e1, method = "exact")

  n= 44
    coef exp(coef) se(coef)       z p
retr 21.8  2.99e+09    13666 0.00160 1

    exp(coef) exp(-coef) lower .95 upper .95
retr  2.99e+09   3.35e-10         0       Inf

Rsquare= 0.32   (max possible= 0.5 )
Likelihood ratio test= 17.0  on 1 df,   p=3.79e-05
Wald test            = 0  on 1 df,   p=0.999
Score (logrank) test = 13.4  on 1 df,   p=0.000257
As you can see, the lower .95 confidence limit is now 0. Furthermore, the likelihood ratio test and the Wald test yield conflicting results.
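
(If I understand the output correctly, that interval is just the Wald interval exp(coef +/- 1.96*se), and with se = 13666 the endpoints underflow to 0 and overflow to Inf:)
Code:
# the Wald CI is exp(coef +/- 1.96 * se); with se = 13666 the
# lower endpoint underflows to 0 and the upper overflows to Inf
> exp(21.8 + c(-1.96, 1.96) * 13666)
[1]   0 Inf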

Now due to the warning message...
Code:
Warning message:
In fitter(X, Y, strats, offset, init, control, weights = weights,  :
  Ran out of iterations and did not converge
...I tried to increase the number of iterations using the function coxph.control(iter.max=10e10), but that didn't help.
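
(For completeness: I believe clogit() passes extra arguments through to coxph(), so the control object would go in like this -- a sketch, and with separated data more iterations just let the coefficient drift further toward infinity:)
Code:
# sketch: raise the iteration limit via coxph.control's iter.max;
# this does not cure separation -- the likelihood keeps improving
# as the coefficient grows without bound
e1clogit <- clogit(case ~ retr + strata(matset), data = e1,
                   control = coxph.control(iter.max = 1000))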

The whole case control study investigates more exposures (which I haven´t listed here), all of which, however, have nonsignificant odds ratios.

Maybe, as Karen suggests, the categories are too separated. Maybe there are too few cases that *lack* the exposure in question (only 1 of 11 does).
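
One way to see which matched sets drive this would be to tabulate each stratum separately (a sketch, using my real data frame e1):
Code:
# cross-tabulate case vs. retr within each matched set; sets in
# which the case is the only exposed subject are the ones pushing
# the estimate toward infinity
by(e1[, c("case", "retr")], e1$matset,
   function(d) table(case = d$case, retr = d$retr))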

Anyway, collecting more data is not an option, because we simply (or luckily) did not have more patients with the condition at the time. I did extend the number of controls from 22 (a 2:1 match) to 33 (a 3:1 match), but the problem remained.

I hope the contingency tables make the situation a bit more transparent.

I appreciate further helpful comments.

Thanks, P
 
#7
Thanks for TheAnalysisFactor's clarification.
What I meant is -- conceptually, your 95% CI runs from 0 to infinity, which means you might not have enough data to get a narrower 95% CI (i.e., a well-defined estimate).
But sorry, I do not have a good answer to your problem. Is it really so bad to use Fisher's exact test in your case? Why bother with matched tests? I would use Fisher's exact test.
 
#8
Yup, that's it: quasi-complete separation.

You can google it to find more info. I just did and a bunch came up.

Or I know that Scott Menard's book "Applied Logistic Regression Analysis" has info on it. It's one of those little green Sage books.

You only have the 3 choices I gave you before, and since two don't work, what you did is your best choice.

It's really an ironic problem to have--your predictor is so good you can't measure how good it is.

And owenpediatrica, thanks for the clarification about power. That makes sense. And something you said made me think of one more thing.

Fisher's exact test does seem a good idea, but I think reviewers will balk if you ignore the matching. I would suggest looking into the StatXact software. It can do just about anything on categorical data using exact p-values. I don't know if it can do a conditional logit, but I would try to find out. They have the absolute best manuals I've ever seen in statistical software, and they would at least explain it well.

At the very least, you probably don't have to change 3 observations. One should do it.

Karen
 

pisti

New Member
#9
Thanks

Dear Karen and owenpediatrica,

thank you for your hints.

A method that accounts for the matching would be great. As Karen said, one has to get past the reviewers these days.

Hopefully there is something in R that can manage this degree of separation. I have to ask some R gurus.
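
One candidate I have come across is Firth's penalized likelihood, which is apparently designed for exactly this separation problem. The logistf package implements it for ordinary (unconditional) logistic regression, so as a rough sanity check that ignores the matching (a sketch, untested):
Code:
# hypothetical sanity check with Firth's penalized likelihood
# (logistf package); note this IGNORES the matching and is only a
# rough check that a finite, nonzero estimate exists
# install.packages("logistf")  # if not yet installed
library(logistf)
summary(logistf(case ~ retr, data = e1))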

Greetings, P