Logistic regression - one multivariable model or several bivariate models

#1
Hi,

I want to assess the strength of the association between several risk factors and a dependent variable. The dependent variable is binary (0/1) and the risk factors are categorical, e.g. gender or education category.

I would like to ask what is the difference between the two approaches:

a) I use logistic regression to calculate B, p, OR and 95% CI for each factor separately. For example, a separate analysis for sex and a separate analysis for education category.
b) I add several factors to one regression analysis, e.g. sex as factor 1 and education category as factor 2.

When I compare the results, I get slightly different ORs and quite different p values. What explains the differences?
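For concreteness, here is a minimal sketch of the two approaches in Python with statsmodels (the data file and the column names outcome, sex and education are placeholders, not from the original post). In b) each OR is adjusted for the other factor, which is why the estimates and p values differ from a) whenever the risk factors are related to each other.

```python
# Sketch of approaches a) and b); file and column names are hypothetical.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("data.csv")  # binary 0/1 outcome plus categorical risk factors

# a) one bivariate model per risk factor -> unadjusted ORs
for factor in ["sex", "education"]:
    m = smf.logit(f"outcome ~ C({factor})", data=df).fit(disp=False)
    print(factor, np.exp(m.params), np.exp(m.conf_int()), m.pvalues, sep="\n")

# b) one model with both factors -> each OR is adjusted for the other factor
m_full = smf.logit("outcome ~ C(sex) + C(education)", data=df).fit(disp=False)
print(np.exp(m_full.params), np.exp(m_full.conf_int()), m_full.pvalues, sep="\n")
```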
 
#3
Thank you. So if I am only interested in assessing the impact of specific risk factors on the dependent variable, and not in building a multi-factor model, is it enough to carry out several separate analyses (approach A)? Should I then correct the p values for multiple testing?
 

hlsmith

Not a robit
#4
If you have a model and two predictors - put them both in the model at the same time, unless your sample is freakishly small. Both predictors could be explaining the same phenomenon, and entering them separately masks this possibility. Why are you averse to option B?
 
#5
hlsmith said:
If you have a model and two predictors - put them both in the model at the same time, unless your sample is freakishly small. Both predictors could be explaining the same phenomenon, and entering them separately masks this possibility. Why are you averse to option B?
I have many more factors (around 12; the number of levels varies from 2 to 4). In principle I have no preference for either option; I would just like to know whether both approaches are statistically correct, especially since I am not interested in the relationships between the individual risk factors themselves.

If I could additionally ask:
I found at least a few examples in the scientific literature in my field (psychiatry) where, when analysing similar data, a chi-square test was calculated first and its p values reported, followed by either multivariable or bivariate logistic regression of the same variables, reporting only the OR and 95% CI. Is there any justification for such an approach? It seems to me that when regression is used, there is no point in also carrying out a chi-square test.
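As a rough illustration of that point, a chi-square test and a one-predictor logistic regression ask essentially the same question of the same 2x2 table, so reporting both adds little; the p values are close but not identical because one is a score-type test and the other a Wald test. The counts below are invented purely for the sketch.

```python
# Invented 2x2 counts: rows = exposed / unexposed, columns = cases / non-cases.
import numpy as np
from scipy.stats import chi2_contingency
import statsmodels.api as sm

table = np.array([[40, 60],
                  [25, 75]])
chi2, p, dof, _ = chi2_contingency(table, correction=False)
print(f"chi-square p = {p:.4f}")

# The same table expanded to individual records and fed to logistic regression.
y = np.repeat([1, 0, 1, 0], table.flatten())   # case / non-case
x = np.repeat([1, 1, 0, 0], table.flatten())   # exposed / unexposed
m = sm.Logit(y, sm.add_constant(x)).fit(disp=False)
print(f"logistic Wald p = {m.pvalues[1]:.4f}, OR = {np.exp(m.params[1]):.2f}")
```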
 

hlsmith

Not a robit
#6
Correct, there isn't a need in this day and age to do this; it is an artifact from back when software and calculations weren't easy to run. You can look up the Table 2 fallacy. Given this, you also shouldn't just dump a bunch of terms into a saturated model.
 
#7
hlsmith said:
Correct, there isn't a need in this day and age to do this; it is an artifact from back when software and calculations weren't easy to run. You can look up the Table 2 fallacy. Given this, you also shouldn't just dump a bunch of terms into a saturated model.
Thank you for your help; I am beginning to understand it better now.

I would be very grateful if you would assess whether this approach would be appropriate:

I am studying the influence of various factors on the occurrence of depression. The factors I take into consideration are: age, gender, year of study, living situation, financial status, alcohol use, nicotine use, being in a relationship, and average grades.

I would include age, gender and year of study in the first logistic regression model. I would check multicollinearity between age and gender, age and year of study, and gender and year of study using variance inflation factors. I would note the OR, 95% CI and p for these 3 variables.

For the second model, I would include the variables from model 1 plus one more factor from the list, again checking the VIF between the new factor and the others. Again, I would note the OR, 95% CI and p for the new variable.

And so on with subsequent models until the variables are exhausted.
 

hlsmith

Not a robit
#8
No need to do the process over and over; if you have a rationale for including the variables, just dump them into a single model and examine the VIFs. Also, you don't need to report p-values, just estimates with 95% CIs. If someone tells you differently, tell them to **** off!
 
#9
Thank you very much for your help.
I would just like to make sure that I correctly understand the idea of calculating the VIF in this case (I am using SPSS): I should add all variables to one linear regression model (with the presence of depression as the dependent variable) and calculate the VIF for each independent variable. Will this VIF be a 'test' of multicollinearity between that independent variable and the rest of the independent variables in the model?

Once again, thanks a lot.
 

hlsmith

Not a robit
#10
Not an SPSS person, but the process is pretty simple. You typically put all of the variables into a linear model, just like you would for a logistic model (a bunch of IVs and a single DV). Then you will likely click on the option to calculate either the VIF or the tolerance statistic (they both say the same thing). Then review the output.
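For illustration, a rough equivalent of the process described above in Python with statsmodels (the data file and column names are placeholders, not from the thread): each predictor's VIF comes from regressing it on all the other predictors, and tolerance is simply 1/VIF.

```python
# Compute a VIF for each predictor by regressing it on all the others.
# File and column names are hypothetical placeholders.
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

df = pd.read_csv("data.csv")
X = pd.get_dummies(df[["age", "gender", "year_of_study", "alcohol_use"]],
                   drop_first=True).astype(float)   # dummy-code the categories
X = sm.add_constant(X)

vifs = pd.Series(
    [variance_inflation_factor(X.values, i) for i in range(X.shape[1])],
    index=X.columns,
)
print(vifs.drop("const"))   # tolerance = 1 / VIF
```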