R - Logistic regression. Coding to get all predictor variables for categorical vars

#1
I'll preface my question by stating that I am new to R. I am trying to fit a logistic regression to my data. I typed glm(formula = purchase ~ <predictor variables separated by + signs>, family=binomial, data=modeldata)
The output does not contain the multiple intercepts and beta coefficients that I am expecting for up to 10 categorical variables.
I am trying to replicate what I get in SPSS using a subset of the data. SPSS does generate the intercepts for all the levels of the categorical variables that fit the model. SPSS would be too slow to run on the entire database.

My questions are:
If R is not reading these variables as categorical, how do I specify them as categorical?
When I get the output from R, will it generate just the final solution after attempting to fit all the predictors? Or will it only include those predictors that have a P(z) < .05?
How do I go about achieving my desired solution? Thanks for your attention.

William Cooper
 

Dason

Ambassador to the humans
#2

Well what DO you get? It's hard to say what the issue is if we don't have your data and don't have your output. Basically at the moment all we have is "it's not quite giving me what I want" and it's really hard for us to diagnose anything with that amount of information.
 
#3

When I use the command str(modeldata), I get this
$ Age : int 38 36 46 42 68 44 42 54 40 42 ...
$ BankCard : int 1 0 1 1 1 1 1 1 1 1 ...
$ Cat : int 0 0 0 0 0 0 1 0 1 1 ...
$ Dog : int 0 0 0 0 1 0 0 0 1 0 ...
$ DwellingType : int 1 1 1 1 1 1 1 1 1 2 ...
$ Education : int 0 1 2 0 3 1 2 0 1 1 ...
$ Income : int 8 0 13 7 3 10 5 13 8 5 ...

This is my output

             Estimate   Std. Error z value Pr(>|z|)
Age          -0.0069081 0.0007494  -9.218  < 0.0000000000000002 ***
BankCard      0.0506236 0.0295674   1.712  0.086870 .
Cat          -0.0111615 0.0243875  -0.458  0.647187
Dog           0.0235260 0.0223738   1.051  0.293030
DwellingType -0.0620168 0.0335837  -1.847  0.064800 .
Education     0.0037874 0.0099639   0.380  0.703860
Income       -0.0077887 0.0031929  -2.439  0.014712 *

BankCard, Cat, Dog and DwellingType have only two possible answers. But Age, Education and Income are multiple-choice demographic questions, so they are categorical.
I am expecting Age to show as Age(1), Age(2), Age(3), etc., and the same for Education and Income. What should the variable type be for these three variables?

Thanks for your prompt response,
William Cooper
 
#4

R constructs the model based on the types of your variables. Since your variables are all of type integer, it models each of them as a single numeric predictor with one slope. Note that this seems to be correct in the case of Age, which, contrary to what you say, appears to be coded with the actual age rather than with category codes.
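To make the point about variable types concrete, here is a minimal sketch on made-up data (the data frame `toy` and its values are assumptions for illustration, not the poster's data). It compares the design-matrix columns R builds for the same codes stored as an integer and as a factor.

```r
# Toy data: the values are invented purely for illustration.
toy <- data.frame(Education = c(0L, 1L, 2L, 3L, 1L, 2L))

# Stored as an integer, Education contributes one numeric column (one slope).
int_cols <- colnames(model.matrix(~ Education, data = toy))
print(int_cols)

# Stored as a factor, it contributes one dummy column per level except the
# reference level (the first level, here 0), which goes into the intercept.
toy$EducationF <- factor(toy$Education)
fac_cols <- colnames(model.matrix(~ EducationF, data = toy))
print(fac_cols)
```

The factor version is what produces the separate per-level rows in the coefficient table.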

You need to convert your integers to factors. Before fitting your model, try adapting the data frame like so:

Code:
# Education takes at least the codes 0-3 in your str() output, so any labels = c(...) must have one entry per code.
modeldata$Education <- factor(modeldata$Education)
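As a rough sketch of the whole workflow, the example below converts several columns to factors at once and then fits the model. The data are simulated (the data frame `sim`, its size, and its values are assumptions), with column names borrowed from the str() output above. It also speaks to the earlier question about significance: glm() keeps every term in the formula and does not drop predictors based on their p-values.

```r
set.seed(1)  # simulated data only; this is not the poster's data
n <- 500
sim <- data.frame(
  purchase  = rbinom(n, 1, 0.4),
  Education = sample(0:3, n, replace = TRUE),
  Income    = sample(0:13, n, replace = TRUE),
  Dog       = rbinom(n, 1, 0.3)
)

# Convert the multi-category columns to factors in one step.
cat_vars <- c("Education", "Income")
sim[cat_vars] <- lapply(sim[cat_vars], factor)

fit <- glm(purchase ~ Education + Income + Dog, family = binomial, data = sim)

# One coefficient per non-reference level, plus the intercept and Dog.
# All of them are reported, whatever their p-values.
coef_names <- names(coef(fit))
print(coef_names)
print(summary(fit))
```

The first level of each factor serves as the reference category. If another package uses a different reference category, its estimates will differ from these; relevel() can change the reference level of a factor before fitting.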
 

gianmarco

TS Contributor
#5

Once you have properly prepared your data, you may want to use an R function I have put together, which displays the fitted model's results visually (i.e., betas and ORs). It can also plot some model diagnostics.
The function is described here: http://cainarchaeology.weebly.com/r-function-for-binary-logistic-regression.html
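For readers who only need the betas and odds ratios as numbers, a minimal base-R sketch follows. It is not the function linked above; the simulated variables `x` and `y` are assumptions for illustration, and the intervals are simple Wald intervals from confint.default().

```r
set.seed(2)  # simulated data for illustration only
x <- rnorm(200)
y <- rbinom(200, 1, plogis(0.5 * x))

fit <- glm(y ~ x, family = binomial)

betas <- coef(fit)                  # coefficients on the log-odds scale
odds_ratios <- exp(betas)           # exponentiate to get odds ratios
or_ci <- exp(confint.default(fit))  # Wald 95% intervals on the OR scale

print(cbind(beta = betas, OR = odds_ratios, or_ci))
```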

Short video tutorial here:
https://www.youtube.com/watch?v=zv8vz8SVzaA

On the same site, a couple of functions are also available for performing internal validation of logistic regression models.

Hope this helps.
Best
gm
 
#6

Thanks to everyone for their help. Especially for the video, that's the best resource.