I have a data set with 1 continuous dependent variable and several categorical variables. How can I find the most important categorical variable?


I have a data set that consists of one continuous dependent variable Y (it can be made discrete if I choose to round the values) and several categorical and discrete data columns that may or may not have an effect on Y.
Y, in this case, is not normally distributed, so a Kruskal-Wallis test can be used to check whether a given categorical column (independent variable) has a significant effect on Y.

However, my objective is to find out which categorical column is the most important one, that is, which one has the most significant effect on Y.

Could someone please point me in the right direction, which statistical test/analysis is applicable here?
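
One possible way to make "most important" concrete is to compare an effect size per column rather than p-values, since p-values also depend on group sizes. The sketch below computes the Kruskal-Wallis H statistic (with tie correction) and the epsilon-squared effect size H / (n - 1) for each categorical column, then sorts the columns. The data, the column names `material` and `factory`, and all function names are hypothetical, and the sketch assumes independent observations, which repeated measurements would violate. For p-values, `scipy.stats.kruskal` is the usual choice in practice.

```python
from collections import Counter, defaultdict


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # positions are 0-based, ranks 1-based
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


def kruskal_h(y, groups):
    """Tie-corrected Kruskal-Wallis H (undefined if all y values are equal)."""
    n = len(y)
    sizes = Counter(groups)
    rank_sums = defaultdict(float)
    for rank, group in zip(average_ranks(y), groups):
        rank_sums[group] += rank
    h = 12 / (n * (n + 1)) * sum(rank_sums[g] ** 2 / sizes[g] for g in sizes) - 3 * (n + 1)
    ties = sum(t ** 3 - t for t in Counter(y).values())
    return h / (1 - ties / (n ** 3 - n))


def epsilon_squared(y, groups):
    """Effect size for Kruskal-Wallis: H / (n - 1), between 0 and 1."""
    return kruskal_h(y, groups) / (len(y) - 1)


# Hypothetical toy data: lifetimes and two made-up categorical columns.
y = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
columns = {
    "material": ["A", "A", "A", "B", "B", "B"],
    "factory": ["X", "Y", "X", "Y", "X", "Y"],
}

# Sort column names from largest to smallest effect size.
ranking = sorted(columns, key=lambda name: epsilon_squared(y, columns[name]), reverse=True)
for name in ranking:
    print(name, round(epsilon_squared(y, columns[name]), 3))
```

Such a ranking is descriptive only: it looks at each column separately and ignores any overlap or interaction between the predictors.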


TS Contributor
"Y, in this case, is not normally distributed"
This is irrelevant for the general linear model (analysis of variance, linear regression). The assumption is that the residuals from the model come from a normal distribution, and even this matters little if the sample size is large enough (n > 30 or so, cf. the central limit theorem). How large is your sample?
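
The distinction between the distribution of Y and the distribution of the residuals can be illustrated with a small sketch. It uses made-up numbers and a one-way model in which the fitted value is simply the group mean. The data, the group labels, and the `skewness` helper are assumptions for illustration only.

```python
from collections import defaultdict
from statistics import mean


def skewness(values):
    """Moment-based skewness: m3 / m2**1.5 (0 for a symmetric sample)."""
    center = mean(values)
    m2 = mean((v - center) ** 2 for v in values)
    m3 = mean((v - center) ** 3 for v in values)
    return m3 / m2 ** 1.5


# Hypothetical data: two groups of unequal size with very different means.
y = [8.0, 9.0, 9.0, 10.0, 10.0, 11.0, 11.0, 12.0, 29.0, 31.0]
group = ["A"] * 8 + ["B"] * 2

# One-way model: the fitted value of each observation is its group mean.
by_group = defaultdict(list)
for value, g in zip(y, group):
    by_group[g].append(value)
group_mean = {g: mean(values) for g, values in by_group.items()}
residuals = [value - group_mean[g] for value, g in zip(y, group)]

print("skewness of Y:        ", round(skewness(y), 2))
print("skewness of residuals:", round(skewness(residuals), 2))
```

Here Y as a whole is strongly right-skewed because of the group difference, while the residuals are symmetric around zero, so a non-normal Y does not by itself rule out the linear model.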

As to your other problem: what is the topic of your study, what are your variables, and what do they contain?

With kind regards

Hi Karabiner

I'll use a generic example, since I cannot share my data set. Let Y be the lifetime of a certain type of machine. A given machine has a lot of other information associated with it, such as its material, the factory that runs it, the conditions it runs in, etc. These serve as the predictor variables of Y. Some of them are categorical and some take numerical values.

There are 138 measurements.

The measurements are also not independent due to repetitions. You can have multiple measurements for the same machine.
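
One simple way to handle such repetitions, sketched below, is to collapse the data to one value per machine before testing, so that each remaining row is an independent unit. This is only an assumption about how the data might be laid out: the record format, the machine ids, and the function name are hypothetical, and the approach only works for predictors that are constant within a machine. A mixed model with a random intercept per machine (for example `mixedlm` in statsmodels) would be an alternative that keeps all rows.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (machine id, material, measured lifetime).
records = [
    ("m1", "steel", 100.0),
    ("m1", "steel", 110.0),
    ("m2", "alu", 80.0),
    ("m3", "steel", 90.0),
    ("m3", "steel", 94.0),
    ("m3", "steel", 92.0),
]


def collapse_by_machine(rows):
    """Average repeated measurements so each machine contributes one row."""
    lifetimes = defaultdict(list)
    material_of = {}
    for machine, material, lifetime in rows:
        lifetimes[machine].append(lifetime)
        # The predictor must not change within a machine for this to be valid.
        if material_of.setdefault(machine, material) != material:
            raise ValueError(f"material varies within machine {machine}")
    return [(m, material_of[m], mean(values)) for m, values in lifetimes.items()]


collapsed = collapse_by_machine(records)
for row in collapsed:
    print(row)
```

Collapsing reduces the sample size from the number of measurements to the number of machines, which is the sample size that matters for tests assuming independence.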

All the best


TS Contributor
I do not fully understand your design, I'm afraid. For example, I do not know what
"The measurements are also not independent due to repetitions" means.
I hesitate to deal with generic examples instead of the true study, because many times something important is missing.
Your sample size was again not stated.

Maybe someone else here has got a clue.

With kind regards