A little clarification on logistic regression in SPSS

#1
Hi! I need some clarification.

I have a dichotomous outcome (not cancer/cancer) and I want to analyze the influence of 5 different variables (2 continuous and 3 categorical) on predicting one of the outcomes (cancer). For that I ran a binomial logistic regression in SPSS and got the following results:

- The Wald test was statistically significant
- The omnibus test of model coefficients was also statistically significant
- The Hosmer-Lemeshow test was not statistically significant (which I learned is good)
- I got about 80% correct prediction in the second classification table (it was about 65% in the first classification table)

Up until this point my model seems good at predicting the outcome, right?

My problem is when I go to the last table (Variables in the Equation): none of my variables are significant. I ran the analysis with each variable on its own and two of them are significant, but when I run it with all variables together they lose significance.
So is my model not good after all? Should I only run the analysis with my 2 significant variables? (I tried that and 1 of the 2 loses statistical significance.) My sample size is 50; could this influence the test?
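
(In case it helps to show concretely what I did, here is a rough equivalent of my analysis as a Python/statsmodels sketch on made-up data. The variable names and values are invented, since I ran the real thing through the SPSS menus; the likelihood-ratio test stands in for SPSS's omnibus test, the coefficient p-values correspond to the Wald tests in the Variables in the Equation table, and the 0.5 cutoff mimics the classification table.)

```python
# Minimal sketch with invented data (n = 50, 2 continuous + 3 categorical predictors),
# just to show where the "omnibus", "Wald" and classification numbers come from.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 50
df = pd.DataFrame({
    "age":        rng.normal(50, 10, n),   # continuous
    "bmi":        rng.normal(27, 4, n),    # continuous
    "smoker":     rng.integers(0, 2, n),   # categorical (0/1)
    "parity_cat": rng.integers(0, 2, n),   # categorical (0/1)
    "oc_use":     rng.integers(0, 2, n),   # categorical (0/1)
})
# Outcome loosely related to some of the predictors
logit_p = -8 + 0.10 * df["age"] + 0.08 * df["bmi"] + 0.8 * df["smoker"]
df["cancer"] = rng.binomial(1, 1 / (1 + np.exp(-logit_p)))

X = sm.add_constant(df[["age", "bmi", "smoker", "parity_cat", "oc_use"]])
fit = sm.Logit(df["cancer"], X).fit(disp=0)

print(fit.summary())                               # per-variable Wald tests ("Variables in the Equation")
print("Omnibus (LR) p-value:", fit.llr_pvalue)     # full model vs. intercept-only model
pred = (fit.predict(X) >= 0.5).astype(int)
print("Correctly classified:", (pred == df["cancer"]).mean())  # classification-table accuracy
```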


Bear with me: my statistics class didn't really go into detail about logistic models, only regression in general, so all I know is basically from watching YouTube tutorials.

Any help is appreciated!
 
#4
Cancer and many other diseases are too serious to talk about superficially in a post or two. Do you have access to a statistician nearby? If so, bring her (or him) into the discussion. Then we can all learn.

Yes, it is possible, and very common, that each one of the variables is not statistically significant on its own, but that overall, taken together, they are significant.
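
Here is a quick simulated illustration in Python (invented data, nothing to do with your actual dataset): with two strongly correlated predictors and n = 50, the individual Wald p-values are often large even though the overall likelihood-ratio test is clearly significant.

```python
# Simulated illustration: two highly correlated predictors, n = 50.
# Individually the Wald p-values tend to be large, yet the joint LR test tends to be small.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 50
x1 = rng.normal(size=n)
x2 = x1 + 0.2 * rng.normal(size=n)                 # x2 carries almost the same information as x1
y = rng.binomial(1, 1 / (1 + np.exp(-(x1 + x2))))  # outcome depends on both

X = sm.add_constant(np.column_stack([x1, x2]))
fit = sm.Logit(y, X).fit(disp=0)
print(fit.pvalues)     # Wald p-values for const, x1, x2 (often all above 0.05 here)
print(fit.llr_pvalue)  # overall model test (often well below 0.05)
```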
 
#5
Thank you for taking the time to reply to my thread!

My advisor is very good in the field of cancer diagnosis, but not so much with more complex statistical analysis. I wanted to report what I found in my preliminary analysis with the logistic model at our upcoming meeting on Monday, so I posted here because I didn't really know how to report my results. Is it correct to report the Exp(B) values even if they are not significant and say that the overall model is significant, like you said?
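
(To be clear about what I mean by Exp(B): the exponentiated coefficients, i.e., the odds ratios. Continuing the Python sketch from my first post, where fit is the fitted model, they could be pulled out together with 95% confidence intervals like this:)

```python
# Continuing the earlier sketch: 'fit' is the fitted statsmodels Logit result from post #1.
# Exp(B) = exponentiated coefficient = odds ratio; same transformation for the CI limits.
import numpy as np

odds_ratios = np.exp(fit.params)     # SPSS's Exp(B) column
or_ci = np.exp(fit.conf_int())       # 95% CI for Exp(B)
print(odds_ratios)
print(or_ci)
```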

I'll seek help from a statistician in my research department next week and if it is of interest I can come back here with their answer.
 
#6
It's not the main part of my research; I just wanted to verify the influence of demographic factors like smoking habits, number of children, use of oral contraceptives, etc. on gynecological cancer (the patients are already diagnosed). These factors are known from the literature to have an influence; I want to know what their influence is in my study population.
 
#7
It happens quite often that, when two explanatory variables, x1 and x2, are correlated, each one of them is statistically significant in its influence on the dependent variable y (cancer/no cancer) when entered on its own, but when both of them are included, neither is formally significant, even though taken together they have a significant influence on y. That is called "multicollinearity". (Of course there can be more explanatory variables, like x1, x2, x3, x4, x5.)
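
Here is a rough sketch in Python of how one might check for that (the variable names and data are invented; in SPSS a common trick is to look at the correlation matrix and to get the collinearity diagnostics from the linear regression dialog):

```python
# Rough multicollinearity check: pairwise correlations plus variance inflation factors (VIF).
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(2)
n = 50
X = pd.DataFrame({"x1": rng.normal(size=n)})
X["x2"] = X["x1"] + 0.3 * rng.normal(size=n)   # deliberately correlated with x1
X["x3"] = rng.normal(size=n)

print(X.corr())                                # large off-diagonal values are a warning sign
Xc = sm.add_constant(X)
vifs = {col: variance_inflation_factor(Xc.values, i)
        for i, col in enumerate(Xc.columns) if col != "const"}
print(vifs)                                    # rule of thumb: VIF above roughly 5-10 suggests trouble
```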

One crude method to deal with that (when x1 and x2 are positively correlated) is to take the mean of the two variables, like x_new = (x1 + x2)/2, and use x_new as the single explanatory variable.
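
A minimal sketch of that, continuing the Python check above; note that I am assuming x1 and x2 are standardized first, since averaging only makes sense when the two variables are on comparable scales:

```python
# Crude fix sketch, continuing from the check above: standardize the two
# correlated predictors (so they are on comparable scales), then average them.
x1_z = (X["x1"] - X["x1"].mean()) / X["x1"].std()
x2_z = (X["x2"] - X["x2"].mean()) / X["x2"].std()
x_new = (x1_z + x2_z) / 2    # single combined predictor to use in the logistic model
```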

Another, more sophisticated, method is to combine the explanatory variables (say x1, x2, x3, x4, x5) into a single variable, pc1, with principal component analysis, and use that as the single explanatory variable.
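
Roughly like this, sketched with scikit-learn on invented data (standardizing before the PCA is the usual choice, so that no variable dominates just because of its scale):

```python
# PCA sketch: collapse the five predictors into the first principal component (pc1)
# and use pc1 as the single explanatory variable in the logistic regression.
import numpy as np
import statsmodels.api as sm
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

rng = np.random.default_rng(3)
n = 50
X = rng.normal(size=(n, 5))             # stand-in for x1..x5
y = rng.binomial(1, 0.5, size=n)        # stand-in outcome

X_scaled = StandardScaler().fit_transform(X)
pc1 = PCA(n_components=1).fit_transform(X_scaled)   # shape (n, 1)

fit = sm.Logit(y, sm.add_constant(pc1)).fit(disp=0)
print(fit.summary())
```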

(Other methods are ridge regression, LASSO, elastic net, or PLS regression. But let's not make it too complicated.)
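
(For anyone curious anyway, a penalized fit is only a couple of lines with scikit-learn; this is just a sketch, reusing X_scaled and y from the PCA example above.)

```python
# LASSO-penalized logistic regression sketch (shrinks coefficients, can zero some out).
from sklearn.linear_model import LogisticRegression

lasso_logit = LogisticRegression(penalty="l1", solver="liblinear", C=1.0)
lasso_logit.fit(X_scaled, y)        # X_scaled, y from the PCA sketch above
print(lasso_logit.coef_)
```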