Is there a Correlation metric for Categorical vs Numerical features?

hlopes

New Member
I've been searching for some time for a correlation metric analogous to the Pearson correlation value for numerical vs numerical features, or Cramér's V for categorical vs categorical features, but this time for categorical vs numerical features.

This is my toy data example in Python. The categorical variable is not ordinal, and note that the number of observations per class of the categorical feature is not the same:

import numpy as np
import pandas as pd

df = pd.DataFrame({'numerical': np.array([19, 27, 31, 26, 39, 43, 32, 29, 19, 19, 27, 31]),
                   'categorical': np.array(['A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C', 'C'])})

I've seen a lot of answers referring to the intraclass correlation (but I don't have a square matrix, nor do I have subjects being rated by several judges). I've also seen one-way ANOVA suggested frequently, but it does not seem to solve the problem, because it does not translate into a clear strength-of-association coefficient the way Pearson's r does.

Can you suggest a metric, or is it impossible to have one for this case?

Karabiner

TS Contributor
ANOVA gives you R² as a measure of strength of association.
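In a one-way ANOVA, R² is the between-group sum of squares divided by the total sum of squares (often called eta squared, or the squared correlation ratio). A minimal sketch on the toy data from the first post, assuming pandas and NumPy are available; the function name `correlation_ratio` is only an illustrative choice, not a library function:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "numerical": [19, 27, 31, 26, 39, 43, 32, 29, 19, 19, 27, 31],
    "categorical": ["A"] * 5 + ["B"] * 3 + ["C"] * 4,
})

def correlation_ratio(values, groups):
    """Return eta squared = SS_between / SS_total (hypothetical helper)."""
    values = pd.Series(values, dtype=float).reset_index(drop=True)
    groups = pd.Series(groups).reset_index(drop=True)
    grand_mean = values.mean()
    # Total variation of the numerical variable around the grand mean
    ss_total = ((values - grand_mean) ** 2).sum()
    # Replace each observation by its group mean, then measure that variation
    group_means = values.groupby(groups).transform("mean")
    ss_between = ((group_means - grand_mean) ** 2).sum()
    return ss_between / ss_total

eta_squared = correlation_ratio(df["numerical"], df["categorical"])
eta = np.sqrt(eta_squared)  # correlation ratio, on the same 0..1 scale as |r|
print(round(eta_squared, 3), round(eta, 3))
```

The value lies between 0 (group means are all equal) and 1 (the category fully determines the numerical value), and unequal group sizes are handled because each observation contributes its own group mean. Unlike Pearson's r it has no sign, since nominal categories have no direction.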

With kind regards

K.

hlopes

New Member
Thanks for the suggestion, Karabiner.

Unfortunately, I can't see a way to make the ANOVA give the R². Are you sure? From what I understand, it gives you an F-statistic and a p-value, and nothing like a correlation value. To recap, I'm looking for a metric that measures the strength of association (analogous to the Pearson correlation coefficient) between a nominal/categorical variable with more than two unique values ('A', 'B', 'C', ...) and a continuous/numerical variable (e.g., age or income).

Did I make myself clear with this explanation?
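For reference, when a routine reports only the F-statistic, R² can be recovered from it and the degrees of freedom: R² = F·df_between / (F·df_between + df_within). A hedged sketch using `scipy.stats.f_oneway` on the same toy data, assuming SciPy is installed; the variable names are illustrative only:

```python
from scipy.stats import f_oneway

# Toy data from the first post, split by category
groups = {
    "A": [19, 27, 31, 26, 39],
    "B": [43, 32, 29],
    "C": [19, 19, 27, 31],
}

# One-way ANOVA: returns the F-statistic and its p-value
f_stat, p_value = f_oneway(*groups.values())

k = len(groups)                              # number of categories
n = sum(len(g) for g in groups.values())     # total number of observations
df_between = k - 1
df_within = n - k

# Convert F back into the proportion of variance explained (R² / eta squared)
r_squared = (f_stat * df_between) / (f_stat * df_between + df_within)
print(round(f_stat, 3), round(r_squared, 3))
```

The p-value answers whether the group means differ at all, while R² describes how much of the numerical variable's variance the grouping accounts for, so the two are best reported together.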
