# Trouble understanding univariate logistic regression using categorical data

#### Uroboy

##### New Member
Hello,

I have a cancer dataset of 98 observations. Cancer detection rate was determined for 2 detection modalities (C and S). One of the independent variables of interest was a 3-tiered scoring system (possible scores: 3, 4, and 5). On univariate logistic regression, the score was statistically significant for C but not for S. The coefficients and standard errors are very different between the two models. However, when I look at the tables for C and S side by side, they look remarkably similar. I am using R; the output is below.

Is there a straightforward way to explain the math behind this? Why are the standard errors so wildly different? Is it related to the 4 additional cancer detections in C, or perhaps to the 0 cancer detections for S with a score of 3? I am using these lecture notes as a guide for the math but am having a difficult time wrapping my head around it: https://web.stanford.edu/class/archive/stats/stats200/stats200.1172/Lecture26.pdf

Thanks!

```
Call:
glm(formula = cancer_detect_C ~ score, family = binomial)

Deviance Residuals:
   Min      1Q  Median      3Q     Max
-1.607  -1.101   0.802   0.802   2.297

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)   -2.565      1.038  -2.472  0.01345 *
score4         2.383      1.081   2.204  0.02752 *
score5         3.534      1.096   3.223  0.00127 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 135.82 on 97 degrees of freedom
Residual deviance: 114.89 on 95 degrees of freedom
AIC: 120.89

Number of Fisher Scoring iterations: 5
```

```
Call:
glm(formula = cancer_detect_S ~ score, family = binomial)

Deviance Residuals:
     Min        1Q    Median        3Q       Max
-1.49929  -1.06331  -0.00013   0.88661   1.29596

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)   -18.57    1743.25  -0.011    0.992
score4         18.29    1743.25   0.010    0.992
score5         19.30    1743.25   0.011    0.991

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 135.49 on 97 degrees of freedom
Residual deviance: 110.62 on 95 degrees of freedom
AIC: 116.62

Number of Fisher Scoring iterations: 17
```
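
A note on the math being asked about: with a single categorical predictor the model has one parameter per score group, so the maximum-likelihood estimates and their Wald standard errors can be written directly in terms of the cell counts. The symbols $a_k$ (detections at score $k$) and $b_k$ (non-detections at score $k$) are notation introduced here for illustration, with score 3 as the reference level.

$$
\begin{aligned}
\hat\beta_0 &= \log\frac{a_3}{b_3}, &
\mathrm{SE}(\hat\beta_0) &= \sqrt{\frac{1}{a_3}+\frac{1}{b_3}},\\
\hat\beta_k &= \log\frac{a_k}{b_k}-\log\frac{a_3}{b_3}, &
\mathrm{SE}(\hat\beta_k) &= \sqrt{\frac{1}{a_3}+\frac{1}{b_3}+\frac{1}{a_k}+\frac{1}{b_k}},
\qquad k\in\{4,5\}.
\end{aligned}
$$

For C, the intercept of $-2.565 \approx \log(1/13)$ and its standard error of $1.038 \approx \sqrt{1 + 1/13}$ would be consistent with 1 detection among 14 score-3 observations (an inference from the output, not a count stated in the post). If S has 0 detections at score 3, then $a_3 = 0$, so $\log(0/b_3) = -\infty$ and $1/a_3$ is undefined: no finite estimate exists. The fitting algorithm simply stops at a large negative intercept ($-18.57$ after 17 iterations) with a huge standard error, and because score 3 is the reference level, the $1/a_3$ term inflates the standard errors of score4 and score5 as well. This pattern is usually called quasi-complete separation.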

#### hlsmith

##### Less is more. Stay pure. Stay poor.
Univariate analyses have only one variable, such as the mean of a group or a proportion. What you have is bivariate: an IV and a DV, right? For convention's sake, I would just say you ran a logistic regression with one IV and avoid the terms univariate and bivariate, since multivariate actually means multiple DVs. It's all semantics, but it will help you convey your questions and results.
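
On the original question, the 0 detections for S at score 3 are the most likely reason the two fits look so different. Here is a minimal R sketch that tries to reproduce the behaviour. The counts are hypothetical: they were back-calculated to roughly match the printed coefficients and are not the poster's actual data. The helper `expand` and the data frame `d` are likewise made up for illustration.

```r
# Hypothetical counts per score level (3, 4, 5): back-calculated to roughly
# match the printed coefficients, NOT the poster's actual data.
n        <- c(14, 44, 40)   # observations at each score
events_C <- c(1, 20, 29)    # detections by C
events_S <- c(0, 19, 27)    # detections by S: none at score 3

# Expand counts into one 0/1 outcome per observation, ordered by score.
expand <- function(events, n) {
  rep(rep(c(1, 0), times = length(n)), times = c(rbind(events, n - events)))
}

d <- data.frame(
  score           = factor(rep(c(3, 4, 5), times = n)),
  cancer_detect_C = expand(events_C, n),
  cancer_detect_S = expand(events_S, n)
)

fit_C <- glm(cancer_detect_C ~ score, family = binomial, data = d)
fit_S <- glm(cancer_detect_S ~ score, family = binomial, data = d)

# Wald table for C: finite estimates and standard errors.
print(round(coef(summary(fit_C)), 3))

# Wald table for S: the score-3 log-odds is log(0/14) = -Inf, so the
# intercept drifts to a large negative value with an enormous SE.
print(round(coef(summary(fit_S)), 2))

# Likelihood-ratio test for score in the S model; unlike the Wald z-tests,
# it only needs the deviances, which stay finite.
print(anova(fit_S, test = "Chisq"))
```

With counts like these, the Wald p-values for S are uninformative, while the drop in deviance reported in the post (135.49 to 110.62 on 2 degrees of freedom) points to a strong association between score and detection. Options often suggested for this situation include Firth's penalized likelihood (for example the `logistf` package), an exact test such as `fisher.test`, or collapsing score levels. Whether any of these suits this study is a judgment call.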