I have this complex toy example through which I'm trying to learn how to apply generalized linear models (a logit model, to be exact, where the response has two levels) to a research problem I'm battling. However, since I have very limited experience with these types of models, I need help interpreting the validity/goodness of fit of the model.
I've been reading about the subject for days, so I know a bit, but how, for example, should the plots and the ANOVA table be interpreted? Please help me out...
Below are some results/printouts from my model. Keep in mind that the problem is hard, i.e., the success rate is low, around 60% with an optimal cut-off.
SUMMARY:
Code:
Call:
glm(formula = truth ~ factor_1 + factor_3 + factor_4 + factor_6 + factor_7 + factor_10 +
    factor_11 + factor_12 + factor_13 + factor_14 + factor_15 + factor_17 + factor_18, family = binomial("logit"))

Deviance Residuals:
     Min        1Q    Median        3Q       Max
-2.10579  -1.14359   0.07018   1.15778   4.07823

Coefficients:
              Estimate Std. Error z value Pr(>|z|)
(Intercept)  1.006e+00  4.910e-02  20.484  < 2e-16 ***
factor_1    -1.518e-02  5.708e-04 -26.591  < 2e-16 ***
factor_3    -1.230e-01  5.697e-03 -21.588  < 2e-16 ***
factor_4     1.498e-02  6.399e-03   2.342 0.019188 *
factor_6     4.784e-01  2.831e-02  16.896  < 2e-16 ***
factor_7    -1.736e-01  1.550e-02 -11.202  < 2e-16 ***
factor_10   -5.781e-07  2.958e-08 -19.546  < 2e-16 ***
factor_11   -1.110e-03  2.042e-04  -5.437 5.43e-08 ***
factor_12   -1.137e-01  3.306e-02  -3.439 0.000584 ***
factor_13   -8.764e-02  3.650e-02  -2.401 0.016338 *
factor_14   -3.583e-01  3.455e-02 -10.371  < 2e-16 ***
factor_15   -3.363e-01  4.176e-02  -8.052 8.16e-16 ***
factor_17    1.800e-06  6.156e-08  29.232  < 2e-16 ***
factor_18   -6.518e-03  3.175e-03  -2.053 0.040089 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 207944 on 149999 degrees of freedom
Residual deviance: 203171 on 149986 degrees of freedom
AIC: 203199
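A few of the summary numbers can be cross-checked by hand, which may help when reading the printout. The sketch below (Python, standard library only) uses nothing but values copied from the output above. The variable names are hypothetical, and it assumes the response is ungrouped 0/1 data, in which case the residual deviance equals -2 times the log-likelihood.

```python
import math

# Values copied from the summary printout (nothing is refit here).
n_obs = 150000               # null degrees of freedom (149999) + 1
null_deviance = 207944
residual_deviance = 203171
n_coefficients = 14          # intercept + 13 predictors

# Drop in deviance = likelihood-ratio statistic against the
# intercept-only model, with one degree of freedom per predictor.
lr_statistic = null_deviance - residual_deviance
lr_df = 149999 - 149986

# McFadden's pseudo R^2: relative reduction in deviance
# (assumes ungrouped 0/1 data, so deviance = -2 * logLik).
mcfadden_r2 = 1 - residual_deviance / null_deviance

# AIC = -2 * logLik + 2k; under the same assumption this should
# reproduce the printed AIC of 203199.
aic = residual_deviance + 2 * n_coefficients

# Largest null deviance possible for n binary observations,
# reached when the two response levels are split 50/50.
max_null_deviance = 2 * n_obs * math.log(2)

# Odds ratios for two coefficients: multiplicative change in the odds
# per one-unit increase in the predictor, other predictors held fixed.
odds_ratio_factor_6 = math.exp(4.784e-01)
odds_ratio_factor_1 = math.exp(-1.518e-02)

print("LR statistic:", lr_statistic, "on", lr_df, "df")
print("McFadden pseudo R^2:", round(mcfadden_r2, 4))
print("AIC:", aic)
print("Max null deviance:", round(max_null_deviance, 1))
print("Odds ratio factor_6:", round(odds_ratio_factor_6, 3))
print("Odds ratio factor_1:", round(odds_ratio_factor_1, 3))
```

If this arithmetic is right, the 13 predictors reduce the deviance by 4773 on 13 degrees of freedom, but that is only about 2.3% of the null deviance. The null deviance also sits essentially at its 50/50 maximum, so the two response levels look roughly balanced. That would be consistent with the roughly 60% success rate mentioned above: with 150,000 observations the terms are highly significant, yet the predictive power is modest.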
ANOVA:
Code:
Analysis of Deviance Table

Model: binomial, link: logit

Response: truth

Terms added sequentially (first to last)

          Df Deviance Resid. Df Resid. Dev  P(>|Chi|)
NULL                     149999     207944
factor_1   1  1393.15    149998     206551  < 2.2e-16 ***
factor_3   1   934.43    149997     205617  < 2.2e-16 ***
factor_4   1    77.46    149996     205539  < 2.2e-16 ***
factor_6   1   289.88    149995     205249  < 2.2e-16 ***
factor_7   1   365.69    149994     204884  < 2.2e-16 ***
factor_10  1   349.35    149993     204534  < 2.2e-16 ***
factor_11  1    36.79    149992     204497  1.315e-09 ***
factor_12  1   120.22    149991     204377  < 2.2e-16 ***
factor_13  1   162.69    149990     204214  < 2.2e-16 ***
factor_14  1    48.78    149989     204166  2.862e-12 ***
factor_15  1    71.28    149988     204094  < 2.2e-16 ***
factor_17  1   919.35    149987     203175  < 2.2e-16 ***
factor_18  1     4.22    149986     203171    0.04005 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
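Because the terms are added sequentially, each Deviance entry is the drop in residual deviance when that term is added to the ones listed before it. The entries should therefore add up to the total drop from 207944 to 203171, and how much is attributed to each term depends on the order of the formula. A small sketch in Python that checks this from the printed values (the names are hypothetical; the 1-df p-value uses the identity P(chi-square with 1 df > x) = erfc(sqrt(x / 2))):

```python
import math

# Sequential deviance reductions copied from the table above, in order.
deviance_drop = {
    "factor_1": 1393.15,
    "factor_3": 934.43,
    "factor_4": 77.46,
    "factor_6": 289.88,
    "factor_7": 365.69,
    "factor_10": 349.35,
    "factor_11": 36.79,
    "factor_12": 120.22,
    "factor_13": 162.69,
    "factor_14": 48.78,
    "factor_15": 71.28,
    "factor_17": 919.35,
    "factor_18": 4.22,
}

# The sum should match null deviance minus residual deviance
# (207944 - 203171 = 4773) up to rounding in the printout.
total_drop = sum(deviance_drop.values())

# Share of the total reduction attributed to each term in THIS ordering.
shares = {name: drop / total_drop for name, drop in deviance_drop.items()}


def chi2_sf_1df(x):
    """Upper-tail probability of a chi-square variable with 1 df."""
    return math.erfc(math.sqrt(x / 2))


# Reproduce the p-value of the last term from its printed deviance.
p_factor_18 = chi2_sf_1df(deviance_drop["factor_18"])

for name, share in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{name:10s} {share:6.1%}")
print(f"total drop: {total_drop:.2f}")
print(f"factor_18 p-value: {p_factor_18:.4f}")
```

In this ordering, factor_1, factor_3 and factor_17 together account for roughly two thirds of the total drop. Reordering the formula could shift these shares, so they are best read as order-dependent contributions rather than as a ranking of importance.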
WALD TEST:
Code:
Wald test

Model 1: truth ~ factor_1 + factor_3 + factor_4 + factor_6 + factor_7 + factor_10 + factor_11 + factor_12 +
    factor_13 + factor_14 + factor_15 + factor_17 + factor_18
Model 2: truth ~ 1

  Res.Df  Df Chisq Pr(>Chisq)
1 149986
2 149999 -13  4483  < 2.2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Plots:
