How to perform several univariate analyses (binary logistic regression)

#1
I have a simple question about SPSS.
I have a dataset in which I want to identify variables (among 154) which may be associated with my dependent variable (binary).
To present the results, the peer reviewers want a univariate analysis for all 154 of them. Do I have to run a separate binary logistic regression of the dependent variable on variable A, then on variable B, and so on, 154 times?
Or is there a way to get the results for all of them at once (without adjusting each one for the other covariates)? I hope I am clear enough.
Thank you in advance.

remark:
I tested GLM to estimate the beta coefficient with one variable, but the result is not the same as the one I got from the binary logistic regression analysis. So I am at a loss.
 

gianmarco

TS Contributor
#2
Hello,
I am not a statistician, rather a statistics user. So I will try to help you on the basis of my own experience.

First of all, I read somewhere that selecting predictors via several univariate tests is not sound, even though I have seen that approach used very often in PhD dissertations.
On the other hand, using all 154 predictors at once may prove difficult in practice, and could cause problems if your sample size is too small for that many predictors. Some form of stepwise predictor selection could be used, even though many people here would advise against stepwise procedures. I have read some literature about logistic regression, and scholars have mixed opinions about stepwise methods. Should you use them, you have to provide a strong justification (the principle of parsimony, for instance).

Unfortunately, I cannot help you with SPSS, even though I believe that there must be a facility to perform stepwise procedures.

Should you be familiar with R, or willing to use it, performing stepwise logistic regression would be quite simple. In particular, I happen to use a package (bootStepAIC) that implements a bootstrap stepwise approach, mainly based on this work (which has been used in published literature in different research fields):
Austin, P. C., and J. V. Tu. 2004. Bootstrap Methods for Developing Predictive Models. The American Statistician 58(2): 131–137.

The rationale of the procedure is (quoting from the article):
One would expect that variables that truly were independent predictors of the outcome would be identified as predictors in a majority of the bootstrap samples, whereas noise variables would be identified as predictors in only a minority
The procedure makes it possible to gauge
the posterior probability of each variable being included in the model.
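
To illustrate the idea (this is not the bootStepAIC implementation itself), here is a minimal Python sketch of the resampling loop. The function names, the `rows` layout (a list of dicts with a 0/1 outcome under the key `y`) and the selection rule are all assumptions made for illustration. In particular, `mean_gap_rule` is a deliberately crude stand-in for the stepwise fit: it keeps a variable when its mean differs between the two outcome groups by at least `min_gap`. In a real analysis the `select` argument would be an actual stepwise procedure.

```python
import random
from collections import Counter


def mean_gap_rule(sample, names, outcome="y", min_gap=1.0):
    """Hypothetical stand-in for a stepwise fit (NOT stepwise AIC):
    keep a variable if its group means differ by at least min_gap."""
    cases = [r for r in sample if r[outcome] == 1]
    controls = [r for r in sample if r[outcome] == 0]
    if not cases or not controls:
        # Degenerate resample with a single outcome class: select nothing.
        return []
    selected = []
    for name in names:
        mean_cases = sum(r[name] for r in cases) / len(cases)
        mean_controls = sum(r[name] for r in controls) / len(controls)
        if abs(mean_cases - mean_controls) >= min_gap:
            selected.append(name)
    return selected


def bootstrap_inclusion_frequency(rows, names, select, n_boot=200, seed=0):
    """Share of bootstrap samples in which each variable is selected."""
    rng = random.Random(seed)
    counts = Counter()
    n = len(rows)
    for _ in range(n_boot):
        # Resample the rows with replacement, keeping the original size.
        sample = [rows[rng.randrange(n)] for _ in range(n)]
        counts.update(set(select(sample, names)))
    # Variables that were never selected get a frequency of 0.0.
    return {name: counts[name] / n_boot for name in names}
```

The returned dictionary maps each variable name to the fraction of bootstrap samples in which the rule selected it, which is the quantity the quoted passage refers to.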
Finally, should you need to validate your model in R (i.e., to assess to what extent your model is able to generalize outside the training dataset), I have put together a couple of R functions to perform internal validation of binary logistic regression models. The functions are described on my website (on the page 'Other tools for statistics'):
http://cainarchaeology.weebly.com/


Hope this helps,
gm
 
#3
Thank you for your reply.
I know univariate analysis is not meaningful, but still, I need it to prove that the factors I put in the multivariate model are coherent.
Hence, I need to have the unadjusted OR for each covariate...

Unfortunately, I am not an R aficionado, which is why I was asking for a solution in SPSS :(
 

gianmarco

TS Contributor
#4
Hello,
maybe my reply was not so clear.
Regardless of whether you use R or anything else, my point was that performing 154 preliminary tests to spot which predictors can be deemed to have an impact on your dependent variable is not sound.
I guess that what your reviewer is actually suggesting is not to fit 154 separate logistic regression models, but rather to perform 154 separate tests: for example, if one of your 154 predictors is continuous (say, blood pressure), you could perform a t-test with your dependent variable as the grouping variable (i.e., factor) to see if there is a significant difference in blood pressure between the two groups defined by the dependent variable. This should be repeated for the remaining 153 predictors.
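
Whatever test is chosen, the repetition over predictors can be scripted instead of being done by hand. As a rough illustration in Python (not SPSS syntax), the sketch below fits one single-predictor logistic model per column by Newton-Raphson and reports the unadjusted odds ratio with a Wald 95% confidence interval and p-value. The function names and the `columns` layout (a dict mapping each predictor name to a list of numeric values, with `y` coded 0/1) are assumptions for illustration. The code handles only numeric or 0/1 predictors and simply raises an error for a constant predictor or under complete separation.

```python
import math
from statistics import NormalDist


def _score_and_info(b0, b1, x, y):
    """Gradient and information matrix of the log-likelihood at (b0, b1)."""
    g0 = g1 = h00 = h01 = h11 = 0.0
    for xi, yi in zip(x, y):
        eta = max(-30.0, min(30.0, b0 + b1 * xi))  # clamp to avoid overflow
        p = 1.0 / (1.0 + math.exp(-eta))
        w = p * (1.0 - p)
        g0 += yi - p
        g1 += (yi - p) * xi
        h00 += w
        h01 += w * xi
        h11 += w * xi * xi
    return g0, g1, h00, h01, h11


def fit_univariate_logit(x, y, max_iter=50, tol=1e-10):
    """Fit logit P(y=1) = b0 + b1*x; return (b1, standard error of b1)."""
    b0 = b1 = 0.0
    for _ in range(max_iter):
        g0, g1, h00, h01, h11 = _score_and_info(b0, b1, x, y)
        det = h00 * h11 - h01 * h01
        if det <= 1e-12:
            raise ValueError("singular information matrix")
        # Newton step: inverse of the 2x2 information matrix times the score.
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < tol:
            break
    else:
        raise ValueError("Newton-Raphson did not converge")
    # Standard error from the information matrix at the final estimate.
    _, _, h00, h01, h11 = _score_and_info(b0, b1, x, y)
    det = h00 * h11 - h01 * h01
    return b1, math.sqrt(h00 / det)


def unadjusted_odds_ratios(columns, y, level=0.95):
    """One separate single-predictor model per column (no adjustment)."""
    normal = NormalDist()
    z_crit = normal.inv_cdf(0.5 + level / 2.0)
    results = {}
    for name, x in columns.items():
        b1, se = fit_univariate_logit(x, y)
        z = b1 / se
        results[name] = {
            "or": math.exp(b1),
            "ci_low": math.exp(b1 - z_crit * se),
            "ci_high": math.exp(b1 + z_crit * se),
            "p": 2.0 * (1.0 - normal.cdf(abs(z))),  # two-sided Wald test
        }
    return results
```

Calling `unadjusted_odds_ratios(columns, y)` returns one entry per predictor. Each model contains only that predictor and an intercept, so none of the estimates is adjusted for the other covariates.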

You could reply to the reviewer giving grounds for changing the suggested approach (which is not totally sound), and use whatever stepwise methods SPSS allows you to perform. BUT you have to find a reason to justify this, and back it up with literature.

Gm