Logistic regression: trouble finding out which variables are significant

#1
Hi,

I'm using logistic regression to analyse a dataset on managerial decision making. Managers have made decisions to fire 2 employees out of a group of 8 from a fictional company used in the research setup. This data is coded binomially (0 = chosen to retain, 1 = fired). To find out which variables have an impact on the fictional employees getting fired, I gathered data on the managers' opinions using 7-point Likert scale questions (e.g. "academic background is important", 1-7), and ran binomial logistic regressions in SPSS with a number of these kinds of variables on the decision to fire a certain employee or not.

The problem I'm having is that in the complete model, some variables are not even close to significant (with values in the sig. column of more than 0,500). However, when I run logistic regressions with just one variable at a time, in some cases they are significant, with values in the sig. column of <0,05. The omnibus chi-square is sometimes significant for these single-variable models and sometimes not, and I'm not sure how important that is for judging the relevance of a variable.

I've tried looking at correlations between variables, but in some cases a variable that doesn't correlate with any of the other variables will become significant if tested individually.

Now, I'm not sure how to determine which of my variables influence the decision whether or not to fire a certain employee, and which don't.

Also, I don't know how to interpret the unstandardized B values and the Exp(B) values, because testing a variable individually gives a different B value from the one found when I include all the variables in the model. This means I'm not sure what to report as the actual effect (Exp(B)) of these variables on the decision.

Thank you for any and all help! :)
 

noetsi

No cake for spunky
#3
The problem I'm having is that in the complete model, some variables are not even close to significant (with values in the sig. column of more than 0,500). However, when I run logistic regressions with just one variable at a time, in some cases they are significant, with values in the sig. column of <0,05.
This is normal in regression. The univariate relationship between a single predictor and the response variable often has nothing at all to do with the marginal effect (i.e., the slope) of that predictor when it is in a group of predictors. That is the whole reason you do multiple regression: to see the impact of a variable controlling for other variables. It really does not matter what a univariate relationship is, because much of the impact of a single variable may actually be explained (or duplicated) by other variables. Personally I don't think univariate analysis has much value other than perhaps as a diagnostic to catch outliers etc.
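Here is a quick simulated illustration of that point, in Python/statsmodels rather than SPSS; the variable names and data below are made up, not yours. A predictor that is merely correlated with the real driver of the outcome can look significant on its own and contribute nothing once the other variable is in the model:

```python
# Simulated illustration (Python/statsmodels; made-up variable names, not the real data).
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 78

# Two correlated 7-point Likert predictors and one unrelated predictor
academic = rng.integers(1, 8, n)
experience = np.clip(academic + rng.integers(-2, 3, n), 1, 7)
teamwork = rng.integers(1, 8, n)
X = pd.DataFrame({"academic": academic, "experience": experience, "teamwork": teamwork})

# Outcome (0 = retained, 1 = fired) driven by 'academic' only
p_fire = 1 / (1 + np.exp(-(-3 + 0.6 * academic)))
y = rng.binomial(1, p_fire)

# Full model: each Wald p-value reflects the effect of a predictor
# while controlling for the others
full = sm.Logit(y, sm.add_constant(X)).fit(disp=0)
print(full.pvalues.round(4))

# One predictor at a time: the single-variable p-value for 'experience'
# picks up the effect of 'academic' through their correlation
for col in X.columns:
    single = sm.Logit(y, sm.add_constant(X[[col]])).fit(disp=0)
    print(col, round(single.pvalues[col], 4))
```

The exact numbers depend on the simulated draw, but the pattern (single-predictor p-values looking much better than the full-model ones) is exactly what you describe.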

The one thing to be careful about is multicollinearity. Run a VIF or tolerance test.
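If it helps, the check itself is only a few lines outside SPSS. A rough sketch with statsmodels, assuming your predictors sit in a DataFrame (the column names in the usage comment are placeholders):

```python
# Sketch of a VIF / tolerance check (Python/statsmodels; column names are illustrative).
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

def vif_table(predictors: pd.DataFrame) -> pd.DataFrame:
    """VIF and tolerance for each predictor; tolerance = 1 / VIF = 1 - R^2."""
    exog = sm.add_constant(predictors)  # constant so each auxiliary regression has an intercept
    rows = []
    for i, name in enumerate(exog.columns):
        if name == "const":
            continue
        vif = variance_inflation_factor(exog.values, i)
        rows.append({"variable": name, "VIF": vif, "tolerance": 1 / vif})
    return pd.DataFrame(rows)

# e.g. vif_table(df[["academic", "experience", "teamwork"]])
# Common rules of thumb flag VIF above about 4 (or 10), i.e. tolerance below 0.25 (or 0.10).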
 
#4
Thanks noetsi. If I understand correctly, I should just look at the variables that are significant in the overall model and ignore the significance from the single-predictor models? And of course check for multicollinearity :)

@ hlsmith, I have 7 covariates in the complete model and each dependent variable has 78 observations, so although it's not a big sample I think it should be enough, right?

In the meantime I've done a factor analysis with which I could also reduce the variables to 4 main factors; this might give the model a better fit. Any tips on how I could calculate the new variables? Just take the average of the scores of all variables that go into one factor?
 

noetsi

No cake for spunky
#5
You ignore the univariate results; only the significance of each variable in the overall model matters. But, and this is critical, if you have multicollinearity the significance of the individual predictors will be inaccurate. In that case it is possible to have a significant model with no significant predictors. In linear regression heteroskedasticity can also distort your significance tests, but homoskedasticity is not an assumption of logistic regression.

In the meantime I've done a factor analysis with which I could also reduce the variables to 4 main factors; this might give the model a better fit. Any tips on how I could calculate the new variables? Just take the average of the scores of all variables that go into one factor?
If you have multicollinearity this may eliminate the problem (at least if you used an orthogonal rotation). There are many ways to create the scores of the new variables from the factors. I just added (rather than averaged) the scores for the variables and my instructor said that was acceptable. But this is an area you might want to read an article on before you do it.
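For what it's worth, the scoring step itself is trivial once you've decided which items belong to which factor. A rough pandas sketch, with hypothetical item and factor names:

```python
# Sketch: turn Likert items into factor-based composites (names are hypothetical).
import pandas as pd

# Hypothetical mapping from factors to the items that loaded on them
factor_items = {
    "competence": ["academic", "experience"],
    "fit":        ["teamwork", "attitude"],
}

def factor_scores(df: pd.DataFrame, mapping: dict, average: bool = False) -> pd.DataFrame:
    """Sum (or average) the raw item scores for each factor."""
    scores = {}
    for factor, items in mapping.items():
        total = df[items].sum(axis=1)
        scores[factor] = total / len(items) if average else total
    return pd.DataFrame(scores)

# new_vars = factor_scores(df, factor_items)                # summed scores
# new_vars = factor_scores(df, factor_items, average=True)  # averaged scores
```

Summing and averaging give the same significance tests (one is just a rescaling of the other); saving regression-based factor scores from the factor analysis itself is the other common option.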
 
#6
Ok thanks,

So I ran linear regressions to test for multicollinearity, and it doesn't appear to be a problem: the lowest tolerance value (1 − R²) I found was 0,390 (VIF ≈ 2,6). If you take tolerance ≤ 0,25 and VIF ≥ 4 as critical values, all appears to be fine.

However I am still experiencing some problems. I just ran a logistic regression with 6 of the 7 variables in block 1, then added the 7th (let's call it variable C) in block 2. In block 1, variables A and B were significant; in block 2 (with all variables), only variable A was significant. Doing the same thing but swapping variables B and C, I got a block 1 model (without variable B) with variables A and C significant, and block 2 again with just variable A significant.

I first suspected the cause would be multicollinearity (I did this before running the multicollinearity tests), but there is no evidence of it. The variables causing me this headache only have a Pearson correlation of 0,597. They did, however, 'merge' into the same factor when I did the factor analysis. All in all, I'm not sure what to make of this.
 

noetsi

No cake for spunky
#7
To me it looks like you are doing stepwise regression, and I strongly disagree with that approach (as do a lot of researchers a lot smarter than I). :p Even without multicollinearity, chance and bivariate collinearity between variables can significantly distort the results. So can the order in which you add the variables, which looks like what occurred here. Moreover, stepwise is not robust: the purely empirical results that drive it can vary by sample and can totally change were you to take another sample. .6 seems a reasonably high correlation to me, and the fact that they loaded on the same factor further suggests these variables are related. That is always a major problem with stepwise if you include one of these variables and leave the other out.

In short I would not enter the variables in blocks (I never do) unless you are doing so based on some theory of relative importance (what some call hierarchical regression, which is totally different from stepwise). If that is what is occurring, it looks like your theory is not working :)
 
#8
Thanks again for the feedback!

I'm actually aiming to avoid stepwise regression, mostly because it complicates matters and I am neither an experienced researcher nor an experienced statistician :p But when this came up I was curious what was going on. Further analysis shows that the reduced model based on the factor analysis works equally well, though not necessarily better (not much has changed in terms of significant results, and the pseudo R² values are still roughly the same). When I made an interaction variable between the two variables that seemed to be acting strangely, I found it to be significant, which leads me to conclude that there is an effect for respondents scoring high on both, but that neither has a significant effect on its own. That would also explain why each could be significant in a model where the other was excluded.
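For reference, a rough sketch of what including that interaction looks like outside SPSS (Python/statsmodels formula API, with placeholder names and simulated data, not my actual variables):

```python
# Sketch: logistic regression with an interaction term (placeholder names, simulated data).
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 78
df = pd.DataFrame({
    "A": rng.integers(1, 8, n),
    "B": rng.integers(1, 8, n),
})
# Illustrative outcome that depends on the product of A and B, not on either alone
logit_p = -4 + 0.12 * df["A"] * df["B"]
df["fired"] = rng.binomial(1, 1 / (1 + np.exp(-logit_p)))

# 'A * B' expands to A + B + A:B, so the interaction enters alongside both main effects
model = smf.logit("fired ~ A * B", data=df).fit(disp=0)
print(model.summary())
```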

In any case, this chat has definitely helped me to better understand the idea of what I am doing in terms of statistics, so I hope I'll be able to work ahead from here :)