Relative impact

This was my first question 11 years ago, and I still cycle back to it every few years. I have a series of dummy predictor variables and a two-level DV, and I am running logistic regression. What I want to do is determine which variable has the greatest impact on the DV, because we want to know which predictor changed the DV the most, controlling for the others. The dependent variable is overall satisfaction; the predictors are satisfaction with specific things like pay. Answers to this question tend to say either that relative impact cannot be assessed with regression, or that you should standardize your predictors and see which is larger. But a lot of analysts disagree with standardizing dummy variables.


Less is more. Stay pure. Stay poor.
If everything is binary (i.e., IVs and DV) - given your scenario I would use LASSO logistic regression. You may have some sparsity, but I would run the LASSO on half the data and then fit a model with the selected features (IVs) on the second half to get estimates.
OK, I know nothing about that, though you mentioned it before. I don't care at all about the actual slopes, which we will never use. I just want to know which predictor has more impact.

Does LASSO work when the predictors are correlated, as mine will be? Correlation seems to be a major issue for relative impact, although I don't know whether it affects LASSO or not.
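To make the suggested approach concrete, here is a rough pure-Python sketch of L1-penalized (LASSO) logistic regression fit by proximal gradient descent, on simulated survey data. Everything here is made up for illustration: the variable names (pay, work, noise), the data-generating process, and the tuning values. The point it demonstrates is the soft-threshold step, which can drive a coefficient to exactly zero, which is how the LASSO drops weaker predictors even when predictors are correlated.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lasso_logistic(X, y, lam=0.05, lr=0.1, iters=2000):
    """L1-penalized logistic regression via proximal gradient descent.
    The soft-threshold step can drive coefficients to exactly zero."""
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(iters):
        gw, gb = [0.0] * p, 0.0
        # gradient of the mean negative log-likelihood
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            gb += err / n
            for j in range(p):
                gw[j] += err * xi[j] / n
        b -= lr * gb  # the intercept is not penalized
        for j in range(p):
            wj = w[j] - lr * gw[j]
            # proximal step: soft-threshold the coefficient toward zero
            w[j] = math.copysign(max(abs(wj) - lr * lam, 0.0), wj)
    return w, b

# Simulated survey: 'pay' drives overall satisfaction, 'work' is
# correlated with 'pay', 'noise' is irrelevant (all hypothetical).
random.seed(1)
X, y = [], []
for _ in range(400):
    pay = random.random() < 0.5
    work = pay if random.random() < 0.8 else random.random() < 0.5
    noise = random.random() < 0.5
    sat = random.random() < (0.9 if pay else 0.2)
    X.append([float(pay), float(work), float(noise)])
    y.append(float(sat))

w, b = lasso_logistic(X, y)
# expect: w[0] (pay) clearly nonzero, w[2] (noise) shrunk toward zero
```

In practice you would do this with a packaged implementation (glmnet in R, PROC GLMSELECT-style tools in SAS) rather than hand-rolled code; the sketch is only meant to show why correlated predictors are survivable: the penalty tends to keep one of a correlated pair and shrink the other.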

You might be interested in this approach, which I had never heard of before. I have to find out if you can do it in SAS, or if I can learn the R.

(attachment: ORM341993 767..781)

It is called dominance analysis.
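For readers who have not seen it, here is a rough pure-Python sketch of the general-dominance idea: each predictor's importance is its incremental contribution to model fit, averaged over all subsets of the other predictors (averaged within each subset size first, then across sizes). For simplicity this sketch scores fit with R² from a least-squares linear probability model; the dominance-analysis literature does the same thing for logistic regression with a pseudo-R². The data and variable roles are simulated, not the thread's actual survey.

```python
import itertools
import random

def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * p for a, p in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

def r_squared(X, y, cols):
    """R^2 of an OLS fit of y on the chosen columns (plus an intercept)."""
    Z = [[1.0] + [row[j] for j in cols] for row in X]
    k = len(Z[0])
    A = [[sum(z[a] * z[b] for z in Z) for b in range(k)] for a in range(k)]
    v = [sum(z[a] * yi for z, yi in zip(Z, y)) for a in range(k)]
    beta = solve(A, v)
    ybar = sum(y) / len(y)
    sse = sum((yi - sum(bc * zc for bc, zc in zip(beta, z))) ** 2
              for z, yi in zip(Z, y))
    sst = sum((yi - ybar) ** 2 for yi in y)
    return 1.0 - sse / sst

def general_dominance(X, y):
    """Average each predictor's incremental R^2 over all subsets of the
    other predictors, within each subset size, then across sizes."""
    p = len(X[0])
    imp = []
    for j in range(p):
        others = [k for k in range(p) if k != j]
        levels = []
        for r in range(p):  # subset sizes 0 .. p-1
            incs = [r_squared(X, y, list(S) + [j]) - r_squared(X, y, list(S))
                    for S in itertools.combinations(others, r)]
            levels.append(sum(incs) / len(incs))
        imp.append(sum(levels) / p)
    return imp

# Same hypothetical setup: 'pay' drives satisfaction, 'work' is
# correlated with 'pay', 'noise' is irrelevant.
random.seed(2)
X, y = [], []
for _ in range(400):
    pay = random.random() < 0.5
    work = pay if random.random() < 0.8 else random.random() < 0.5
    noise = random.random() < 0.5
    sat = random.random() < (0.9 if pay else 0.2)
    X.append([float(pay), float(work), float(noise)])
    y.append(float(sat))

imp = general_dominance(X, y)
```

A nice property of this decomposition is that the importances sum to the full-model R², so it splits the explained variance among correlated predictors rather than forcing an all-or-nothing choice the way the LASSO does.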


Less is more. Stay pure. Stay poor.
I have always done it in R previously, since it wasn't available in SAS, but via the link below it seems accessible in SAS now. I would do a data split and run the LASSO, then fit the selected model to the holdout set using ordinary logistic regression and look at the coefficients and SEs to make a decision. That can be a little tricky, since you can have a big coefficient with a big SE, or a little coefficient with a little SE, etc., and have to judge which is more reliable. Part of the LASSO's purpose is to circumvent collinearity. Another approach could be fitting a random forest and just looking at the variable importance list - but I think the latter is my choice.
I only want it to say which variables are more important, which it seems to do indirectly by getting rid of the less important ones. It still won't tell me which of the remaining variables are better, but there seems to be no agreed-on way to do that with dummy variables, which cannot be standardized.

What do you think about the logic, for dummy predictors, of saying that the ones with the highest odds ratios are relatively more important after the LASSO?


Less is more. Stay pure. Stay poor.
If all the variables are binary - the issue is moot, since they are all formatted the same, correct?
They are all formatted the same and have the same scaling, since they are all coded 0 and 1.

In that case, if I understand correctly, if one of the predictors has a higher odds ratio than another, then it has the greater impact. But I note that some disagree that you can ever measure relative impact with regression, particularly if the predictors are correlated, which they certainly will be in this case. Regression deals with this in the slopes by controlling for the other variables, and to me that would seem to address the issue. But it is clear to me that others disagree, and this is a topic I can find little on in the literature.
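One wrinkle worth flagging in this logic: an odds ratio below 1 can be just as strong an effect as one above 1 (an OR of 0.5 is as strong as an OR of 2), so if you rank same-scale dummies this way, rank by the coefficient's distance from zero, i.e. |log OR|, not by the raw odds ratio. A tiny sketch with hypothetical coefficient values:

```python
import math

# hypothetical fitted log-odds coefficients for 0/1 dummy predictors
coefs = {"pay": 2.94, "supervisor": 1.10, "commute": -0.45}

# odds ratio = exp(coefficient)
odds_ratios = {k: math.exp(v) for k, v in coefs.items()}

# rank by |log OR|, so that protective effects (OR < 1) are not
# automatically ranked last just because their OR is small
ranked = sorted(coefs, key=lambda k: abs(coefs[k]), reverse=True)
```

Because all the predictors share the same 0/1 coding, this is at least a same-units comparison, which is the part standardization normally has to buy you.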

Probably because the people writing the articles don't really care much about which has greater impact. :p They are theorists, not practitioners.
Here is where, in practical terms, logistic regression gets tricky. We ran a series of predictors coded 1 = satisfied, 0 = unsatisfied. The DV is coded the same way. We ran logistic regression.

The odds ratio for pay was about 19, meaning that people on average were far more likely to be satisfied overall if they were satisfied with pay than if they were not, controlling for about 30 other drivers of satisfaction. But does that mean that pay satisfaction drove overall satisfaction, or had more influence than other factors with lower odds ratios?
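To pin down what an odds ratio of about 19 does and does not say: it scales the odds, not the probability, so its effect on the probability depends on the baseline. A small sketch with a made-up baseline rate:

```python
def shift_probability(p_base, odds_ratio):
    """Apply an odds ratio to a baseline probability of the outcome."""
    odds = p_base / (1.0 - p_base) * odds_ratio
    return odds / (1.0 + odds)

# made-up baseline: a 20% chance of overall satisfaction when
# unsatisfied with pay means odds of 0.25; times 19 gives odds 4.75,
# i.e. a probability of 4.75 / 5.75, roughly 0.83
p_satisfied_with_pay = shift_probability(0.20, 19)
```

Even translated into probabilities, though, a big odds ratio only says how different the two groups are; it does not establish that pay caused the difference, which is the open question in this thread.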

I don't know. That is the problem with categorical dummies for me. You know that one group is higher or lower than another group, but there is no reasonable way to know whether they actually caused anything, particularly relative to other factors that influence satisfaction.

Other than experimental design, which is not possible for our organization, does anyone know a way to address this?


Less is more. Stay pure. Stay poor.
Yeah, this goes back to "all models are wrong." You can't know the truth without having zero non-respondents and perfect accuracy in responses. So you just have to deal with it.
Well, you can't be sure without random assignment. But since that is not possible in our case, I am not sure what the alternative is. It remains a major issue for measuring impact.
Anyone here familiar with dominance analysis? It seems like the right way to go after relative impact.

One problem with relative impact is correlated variables (I assume they mean multicollinearity). I ran a VIF test on the variables in my model and none were even 3, which suggests multicollinearity is not a big issue (which surprised me). Is that low enough to suggest I don't have to worry about correlation affecting the analysis?
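As a sanity check on the VIF logic: the VIF for predictor j is 1 / (1 - R²_j), where R²_j comes from regressing X_j on the other predictors. In the special case of exactly two predictors, R²_j is just the squared correlation between them, which makes a tiny illustration possible (the data here are made up):

```python
def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return sxy / (sx * sy)

def vif_two_predictors(x1, x2):
    """With exactly two predictors, R^2 of one on the other is corr^2,
    so both predictors share VIF = 1 / (1 - corr^2)."""
    r = pearson(x1, x2)
    return 1.0 / (1.0 - r * r)

# made-up 0/1 dummies with correlation 0.5, so VIF = 1 / 0.75
x1 = [0, 0, 1, 1, 0, 1, 0, 1]
x2 = [0, 1, 1, 1, 0, 1, 0, 0]
vif = vif_two_predictors(x1, x2)
```

VIFs under 3 are well below the common rules of thumb of 5 or 10, so instability in the coefficients from severe multicollinearity is unlikely. But moderate correlation can still blur which predictor deserves credit for shared explained variance, which is exactly the problem dominance analysis is designed to address.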
A really interesting review of this topic. It is very discouraging to those like me who hope for a simple answer. Or really any answer.

(attachment: Grömping2015_WiresCS_AcceptedVersion.pdf)

As I review this topic I find several things.

1) There are many approaches to doing this.
2) No one agrees on a way to do it.
3) No one agrees on which way works or does not work.
4) A lot of people don't think you should do it :)