Hello,
I have a cancer dataset of 98 observations. The cancer detection rate was determined for 2 detection modalities (C and S). One of the independent variables of interest was a 3-tiered scoring system (possible scores: 3, 4, and 5). On univariate logistic regression, the score was statistically significant for C but not for S. The coefficients and standard errors (SE) are very different between the two models. However, when I look at the tables for C and S side by side, they look remarkably similar. I am using R; the output is below.
Is there a straightforward way to explain the math on this? Why are the standard errors so wildly different? Is it related to the 4 additional cancer detection observations in C, or perhaps to the 0 cancer detections for S with a score of 3? I am using these lecture notes as a guide for the math but am having a difficult time wrapping my head around it: https://web.stanford.edu/class/archive/stats/stats200/stats200.1172/Lecture26.pdf
Thanks!
Call:
glm(formula = cancer_detect_C ~ score, family = binomial)

Deviance Residuals:
   Min      1Q  Median      3Q     Max
-1.607  -1.101   0.802   0.802   2.297

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)   -2.565      1.038  -2.472  0.01345 *
score4         2.383      1.081   2.204  0.02752 *
score5         3.534      1.096   3.223  0.00127 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 135.82 on 97 degrees of freedom
Residual deviance: 114.89 on 95 degrees of freedom
AIC: 120.89

Number of Fisher Scoring iterations: 5

Call:
glm(formula = cancer_detect_S ~ score, family = binomial)

Deviance Residuals:
     Min        1Q    Median        3Q       Max
-1.49929  -1.06331  -0.00013   0.88661   1.29596

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)   -18.57    1743.25  -0.011    0.992
score4         18.29    1743.25   0.010    0.992
score5         19.30    1743.25   0.011    0.991

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 135.49 on 97 degrees of freedom
Residual deviance: 110.62 on 95 degrees of freedom
AIC: 116.62

Number of Fisher Scoring iterations: 17
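
A minimal R sketch of the arithmetic behind the two intercept rows may help here. It rests on assumptions that are not stated in the post: the score-3 group is taken to hold 14 observations with 1 detection by C (inferred from exp(-2.565) being close to 1/13 and from the SE of 1.038) and 0 detections by S (as mentioned in the question). For a single group with a detections and b non-detections, the estimated log-odds is log(a/b) and its Wald standard error is sqrt(1/a + 1/b). With a = 0 both are infinite, so glm can only stop at some large finite value. The names n3, a_C, a_S, group_logit, and group_se are hypothetical helpers for illustration.

```r
# Assumed counts for the score = 3 group (inferred from the output, not given in the post)
n3  <- 14   # observations with score 3
a_C <- 1    # detections by C
a_S <- 0    # detections by S

# Closed-form log-odds and Wald SE for one group: log(a/b) and sqrt(1/a + 1/b)
group_logit <- function(a, n) log(a / (n - a))
group_se    <- function(a, n) sqrt(1 / a + 1 / (n - a))

logit_C <- group_logit(a_C, n3)   # close to the C intercept, -2.565
se_C    <- group_se(a_C, n3)      # close to the C intercept SE, 1.038
logit_S <- group_logit(a_S, n3)   # -Inf: no finite maximum likelihood estimate
se_S    <- group_se(a_S, n3)      # Inf

# The same quantities from glm fitted to the score-3 group alone
y_C <- c(rep(1, a_C), rep(0, n3 - a_C))
y_S <- rep(0, n3)
fit_C <- glm(y_C ~ 1, family = binomial)
fit_S <- suppressWarnings(glm(y_S ~ 1, family = binomial))

coef_C <- summary(fit_C)$coefficients
coef_S <- summary(fit_S)$coefficients
print(coef_C)
print(coef_S)   # large negative estimate with a very large SE
```

If these assumed counts are right, the S intercept of -18.57 is simply where the fitting algorithm stopped on its way toward minus infinity, and the SE of 1743.25 reflects the nearly flat likelihood in that direction. The score4 and score5 rows are contrasts against the score-3 baseline, so they would inherit the same huge SE. In that situation the Wald z tests are not informative, while the drop in deviance (135.49 - 110.62 = 24.87 on 2 degrees of freedom) can still be used for a likelihood ratio test.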