Logistic regression

#1
Hi guys,

I am a bit of a noob in statistics and I need your help running a logistic regression in SPSS on the data that I have.

I have a dichotomous outcome: final vision of 20/60 or better versus final vision worse than 20/60.
The other variables that I want to fit in the model are:
Age: scale
Sex: male, female
Surgery: cataract, glaucoma, vitrectomy, injections
Pain: no, yes
Baseline vision: LP, CF, NLP, etc. (these are not scale values but categories too)
Please see the attached picture Untitled.png

How do I fit these variables in a regression model? Please let me know!

Thanks for your help.
 

hlsmith

Omega Contributor
#2
You lob them in and run the model. It looks like most of the independent variables are categorical. What is your sample size?
 
#3
My sample size is 250. Also, would the sample size affect how I set up my regression model?

I can put them all in the model, but my question is about the categorical variables that have more than two categories: should I create dummy variables for them and enter all but one in the model, or just enter them as they are and mark them as categorical in the model itself?

Thanks
 
#4
Just because the software allows you to "fit them all in the model" doesn't mean that putting them all in is the right thing to do. Estimating more parameters at a given sample size increases the risk of overfitting, unreliable parameter estimates, and other problems. Sample size is very important, but in this case the total sample size matters less than the size of the smaller outcome group: 5 events and 245 non-events is, generally speaking, far worse than 50 events and 50 non-events, even though the total sample size is smaller. The more terms you estimate in the model, the more degrees of freedom you eat up.
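
To make the comparison concrete, here is a minimal Python sketch that divides the size of the smaller outcome group by the number of parameters being estimated. The function name and the parameter count of 10 are assumptions made up for illustration, not numbers from the OP's data. A commonly quoted rule of thumb asks for roughly 10 events per parameter, but that threshold is debated, so treat the ratio as a rough guide only.

```python
def events_per_parameter(n_events, n_non_events, n_parameters):
    """Size of the smaller outcome group divided by the number of estimated parameters."""
    limiting_group = min(n_events, n_non_events)
    return limiting_group / n_parameters


# Hypothetical parameter count (assumption): intercept excluded, 10 slopes estimated.
N_PARAMETERS = 10

# The two scenarios mentioned above.
unbalanced = events_per_parameter(5, 245, N_PARAMETERS)   # total n = 250
balanced = events_per_parameter(50, 50, N_PARAMETERS)     # total n = 100

print(f"5 events / 245 non-events: {unbalanced:.1f} per parameter")
print(f"50 events / 50 non-events: {balanced:.1f} per parameter")
```

The balanced data set supports the model better despite being less than half the size, because the limiting group is ten times larger.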
 

j58

Active Member
#5
@Nooooooby - You can create a set of k-1 dummy variables for each k-category independent variable; however, it is common for statistical software to do this for you. You should look at the documentation for the logistic regression program in SPSS, and see if there is an option to enter categorical variables as factors.
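
As a rough illustration of what k-1 dummy coding means (and of what the software typically does behind the scenes when you flag a variable as categorical), here is a small Python sketch. The helper `dummy_code`, the `is_` column prefix, and the choice of "cataract" as the reference category are all assumptions for the example; the surgery categories are the ones listed in post #1.

```python
def dummy_code(values, reference):
    """Return k-1 indicator columns for a k-category variable.

    The reference category gets no column of its own: rows in that
    category are coded 0 in every indicator.
    """
    levels = sorted(set(values) - {reference})
    return {f"is_{level}": [int(v == level) for v in values] for level in levels}


# Toy data (assumption), using the surgery categories from post #1.
surgery = ["cataract", "glaucoma", "vitrectomy", "injections", "cataract"]
columns = dummy_code(surgery, reference="cataract")

for name, indicator in columns.items():
    print(name, indicator)
```

With four surgery categories you get three indicator columns, and each coefficient is then interpreted relative to the reference category.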

I wonder, though, why you've dichotomized a perfectly good continuous outcome, like visual acuity.
 
#6
I'm unsure whether the OP has a similar background/mindset, but in my experience medical researchers do this a lot because they want a "firm" guideline that says if X then do Y, otherwise do Z. Blood pressure guidelines, abnormal lab values, and many other things are dichotomized in medicine so that practice guidelines can be written (I'm not saying this is correct, as the cutoff is often arbitrary).
 

j58

Active Member
#7
Yes, I know. Doctors are notorious dichotomaniacs. But, in my experience, if you explain to them why they shouldn't do that, they listen. Dichotomizing a continuous outcome results in loss of statistical power and obfuscates the true functional relationship between the treatment/exposure and the outcome. When the outcome is continuous, the analysis should always be done on the continuous variable. Once the relationship between the treatment/exposure and the outcome is understood, cutpoints for treatment decisions can be decided on.
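
A quick simulation can show the information loss. This is only a sketch under made-up assumptions (a linear effect of 0.5, normal noise, a median cutpoint, and simulated data, not the OP's): the same exposure correlates less strongly with the dichotomized outcome than with the continuous one, and that weaker signal is what costs you power.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


rng = random.Random(0)  # fixed seed so the run is reproducible

# Simulated data (assumption): continuous outcome depends linearly on the exposure.
exposure = [rng.gauss(0, 1) for _ in range(20000)]
outcome = [0.5 * x + rng.gauss(0, 1) for x in exposure]

# Dichotomize the outcome at its median, as a "good vs. poor" split would.
cutpoint = sorted(outcome)[len(outcome) // 2]
outcome_binary = [int(y > cutpoint) for y in outcome]

r_continuous = pearson(exposure, outcome)
r_binary = pearson(exposure, outcome_binary)

print(f"correlation with continuous outcome:   {r_continuous:.3f}")
print(f"correlation with dichotomized outcome: {r_binary:.3f}")
```

A split away from the median generally loses even more information than the median split used here.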
 

hlsmith

Omega Contributor
#8
I agree with all the above comments. In addition, and this wasn't directly mentioned, you would want to check whether there is a non-linear relationship between a continuous variable and the outcome. I think this gets overlooked even by people who are using variables in their continuous form.
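
One simple way to eyeball this, sketched in Python under stated assumptions: sort the continuous predictor, cut it into equal-sized bins, and compute the empirical log-odds of the outcome in each bin. If the log-odds do not change roughly linearly across bins, a single linear term is probably not enough. The function name, the five bins, the 0.5 continuity correction, and the deliberately U-shaped simulated data are all choices made for the example.

```python
import math
import random


def empirical_logits(x, y, n_bins=5):
    """Return (mean of x, empirical log-odds of y) for equal-count bins of x."""
    pairs = sorted(zip(x, y))
    size = len(pairs) // n_bins
    result = []
    for i in range(n_bins):
        start = i * size
        # The last bin absorbs any leftover observations.
        end = len(pairs) if i == n_bins - 1 else start + size
        chunk = pairs[start:end]
        events = sum(flag for _, flag in chunk)
        non_events = len(chunk) - events
        mean_x = sum(value for value, _ in chunk) / len(chunk)
        # Adding 0.5 to each count avoids log(0) when a bin has no events or no non-events.
        logit = math.log((events + 0.5) / (non_events + 0.5))
        result.append((mean_x, logit))
    return result


rng = random.Random(1)  # fixed seed so the run is reproducible

# Simulated data (assumption): the true log-odds are quadratic in x, i.e. U-shaped.
x = [rng.uniform(-3, 3) for _ in range(3000)]
y = [int(rng.random() < 1 / (1 + math.exp(-(v * v - 3)))) for v in x]

bins = empirical_logits(x, y)
for mean_x, logit in bins:
    print(f"mean x = {mean_x:6.2f}   empirical log-odds = {logit:6.2f}")
```

Here the log-odds are high at both ends and low in the middle, a pattern a single linear term would miss entirely. In a real analysis you might follow up with a polynomial or spline term for that variable.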