# How to know that two categorical variables interact

#### Ashiok

##### New Member
I have a problem that I'm trying to solve, but I can't find an immediate answer on the internet (probably because I don't know the correct terminology to search for).

Assume that you have a sample of 50 individuals from a population of aliens that you have no prior information whatsoever about.

You notice that, from these 50 individuals, there are 3 distinct characteristics:
• 15 have horns; 35 don't
• 32 have feathers; 18 don't
• 7 have tails; 43 don't
Now, you are interested in answering the following question: given that an alien presents a characteristic A, does it make it more or less likely that it will also present characteristic B? Or, in other words, if an alien has horns, does it make it more or less likely that it will also have feathers? Or a tail?

Assume that you know, for each of the 50 aliens, whether it has horns, feathers and a tail. With only the data that I presented, can I answer the question above? If so, which test should I use? Thanks.
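To make the question concrete, here is a minimal sketch that assumes each alien is stored as a record of three yes/no values. It cross-tabulates two traits and compares the observed proportions with and without trait A. The eight records are made up for illustration (they are not the 50 aliens from the post), and the names `crosstab` and `conditional_proportions` are hypothetical helpers.

```python
from collections import Counter

# Hypothetical record-level data: one dict per alien (NOT the real sample).
aliens = [
    {"horns": True,  "feathers": True,  "tail": False},
    {"horns": True,  "feathers": True,  "tail": True},
    {"horns": True,  "feathers": False, "tail": False},
    {"horns": False, "feathers": True,  "tail": False},
    {"horns": False, "feathers": False, "tail": False},
    {"horns": False, "feathers": False, "tail": True},
    {"horns": False, "feathers": True,  "tail": False},
    {"horns": False, "feathers": False, "tail": False},
]


def crosstab(records, trait_a, trait_b):
    """Return the 2x2 counts (a, b, c, d):
    a = A and B, b = A and not B, c = not A and B, d = neither."""
    counts = Counter((r[trait_a], r[trait_b]) for r in records)
    return (counts[(True, True)], counts[(True, False)],
            counts[(False, True)], counts[(False, False)])


def conditional_proportions(records, trait_a, trait_b):
    """Return the observed P(B | A) and P(B | not A)."""
    a, b, c, d = crosstab(records, trait_a, trait_b)
    return a / (a + b), c / (c + d)


table = crosstab(aliens, "horns", "feathers")
p_with, p_without = conditional_proportions(aliens, "horns", "feathers")
print(table)
print(f"P(feathers | horns) = {p_with:.2f}")
print(f"P(feathers | no horns) = {p_without:.2f}")
```

The proportions only describe the sample; whether the difference could be due to chance is a separate question for a significance test. The division fails if no alien (or every alien) has trait A, so that case would need handling in real use.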


#### hlsmith

##### Less is more. Stay pure. Stay poor.
The Cochran-Mantel-Haenszel test, along with a test of homogeneity of odds ratios (e.g., Breslow-Day), may be of interest.

You could also look at interaction terms in logistic regression.

In both scenarios your dataset is small, so you risk low statistical power.

#### Ashiok

##### New Member
Hey, thanks for your answer! I'm sorry, I'm not well versed in statistics and I'm having some trouble understanding how I would organize the data that I described to apply the test that you suggested (Cochran-Mantel-Haenszel). Could you clarify that if possible?

Also, according to the internet, it seems that one of the assumptions of this test is independence of observations, which I'm not sure applies here, since the characteristics I'm looking at all come from the same set of subjects.

#### hlsmith

##### Less is more. Stay pure. Stay poor.
Naive Bayes is another approach, perhaps. Correct, you need the actual dataset for calculations.

#### Ashiok

##### New Member
Hey! What if I organized the data in 2x2 contingency tables like:

Could I apply Fisher's exact test?
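For reference, a two-sided Fisher's exact test on a single 2x2 table can be sketched with the standard library alone. This is an illustrative implementation under stated assumptions, not the code used in the thread: the function name `fisher_exact_two_sided` is hypothetical, and the joint counts below are invented so that they match the posted margins (15 with horns, 32 with feathers), since the actual cross-tabulation was not given.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more probable than the observed one.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2
    denom = comb(n, col1)

    def prob(x):
        # Probability that the top-left cell equals x given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo = max(0, col1 - row2)   # smallest feasible top-left count
    hi = min(row1, col1)       # largest feasible top-left count
    p_obs = prob(a)
    # Small relative tolerance guards against floating-point ties.
    p = sum(prob(x) for x in range(lo, hi + 1)
            if prob(x) <= p_obs * (1 + 1e-7))
    return min(p, 1.0)


# Hypothetical counts (rows: horns yes/no, columns: feathers yes/no).
p_value = fisher_exact_two_sided(12, 3, 20, 15)
print(f"two-sided p = {p_value:.4f}")
```

In practice a library routine such as `scipy.stats.fisher_exact` does the same job; the sketch is only meant to show what the test computes.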

#### hlsmith

##### Less is more. Stay pure. Stay poor.
Do you have record-level data? That was unclear to me, given your description. So you have an observation for each individual stating the status of the three variables? Also, what have you covered in class?

Also, did you come up with the word "interact" or was that a part of the question? It would be easiest if you just posted the exact question and set as posed to you.

#### Ashiok

##### New Member
Hey!

This isn't a homework assignment. I'm a master's student and this is one of the analyses that I included in my thesis. I used the alien example to better translate the problem that I'm dealing with, sorry if something wasn't clear. To answer your question: yes, I have for each alien the status of the three variables that I described.

I studied statistics as an undergraduate, but only at a very, very introductory level (my major is biology). I came up with the word "interact" because I thought that was the correct term to use here. Given all that, can I use the Fisher test as I described? Thanks for the attention.

#### hlsmith

##### Less is more. Stay pure. Stay poor.
Yes. The options I was proposing were for examining an interaction, i.e., examining three or more variables simultaneously. Given your sample size, you should stick with Fisher's exact test rather than chi-square. You could also generate odds ratios with 95% confidence intervals, so you would end up with statements like:

Aliens with feathers have a non-significant 3.33 (95% CI: 0.63, 17.5) times greater odds of having horns compared to aliens without feathers.
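A statement of that form can be produced from one 2x2 table. The sketch below uses the log-based (Woolf) normal approximation for the interval, which is one common choice; the thread does not say which method produced the interval quoted above, and the counts here are hypothetical (chosen only to match the posted margins), so the numbers will differ. The function name `odds_ratio_ci` is also an assumption.

```python
from math import exp, log, sqrt


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and approximate 95% CI (Woolf / log method).

    Table layout: a = exposed with outcome, b = exposed without,
                  c = unexposed with outcome, d = unexposed without.
    Requires all four cells to be non-zero.
    """
    odds_ratio = (a * d) / (b * c)
    se_log_or = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = exp(log(odds_ratio) - z * se_log_or)
    upper = exp(log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts: feathers & horns, feathers & no horns,
# no feathers & horns, no feathers & no horns.
or_, lower, upper = odds_ratio_ci(12, 20, 3, 15)
print(f"OR = {or_:.2f} (95% CI: {lower:.2f}, {upper:.2f})")
```

If the interval contains 1, the association is not significant at the 5% level under this approximation. With cell counts this small, an exact interval may be preferable to the normal approximation.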

P.S. If you had prospective data (so alien data prior to the collection of phenotypical presentation and then data on phenotypical presentations), you would use relative risks.

#### Ashiok

##### New Member
Thank you very much!

It was relatively simple to code that in MATLAB. The only issue is that, if you have many categories, you have to build lots and lots of 2x2 tables. I am interested in understanding how you could examine them simultaneously, as you said, because I have a similar analysis that I wish I could run, but the number of categories is greater (12 or so categories, as if in the alien example you would also add 'having a third eye', 'having wings', etc.). If you have the time, could you show me how I would set up a table with multiple categories and which test I would use? Thank you again.
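Building the many 2x2 tables does not have to be done by hand. As a rough sketch, assuming the data is one record of yes/no values per alien, every pair of traits can be enumerated with `itertools.combinations`. This only automates the pairwise approach; it is not a joint model of all traits at once. The data is randomly generated for illustration, and every trait name beyond those mentioned in the thread is a placeholder.

```python
import random
from collections import Counter
from itertools import combinations

# 12 binary traits; names after "wings" are hypothetical placeholders.
traits = ["horns", "feathers", "tail", "third_eye", "wings"] + [
    f"trait_{i}" for i in range(6, 13)
]

# Made-up data: 50 aliens, each trait present with probability 0.3.
rng = random.Random(0)
aliens = [{t: rng.random() < 0.3 for t in traits} for _ in range(50)]


def crosstab(records, trait_a, trait_b):
    """2x2 counts (A&B, A&notB, notA&B, neither) for two traits."""
    counts = Counter((r[trait_a], r[trait_b]) for r in records)
    return (counts[(True, True)], counts[(True, False)],
            counts[(False, True)], counts[(False, False)])


# One table per unordered pair of traits.
tables = {
    (t1, t2): crosstab(aliens, t1, t2)
    for t1, t2 in combinations(traits, 2)
}

print(len(tables), "pairwise tables")
print(tables[("horns", "feathers")])
```

With 12 traits this yields 66 tables, each of which could then be passed to whatever test is applied to a single pair.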

#### hlsmith

##### Less is more. Stay pure. Stay poor.
A note: if you are making lots and lots of comparisons, you need to correct your alpha level (i.e., make it smaller than, say, 0.05) because of the threat of false discovery (fishing until you find something)!
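As a sketch of what such a correction can look like: the Bonferroni approach divides alpha by the number of tests, and Holm's step-down procedure is a somewhat less conservative variant. These two methods are my choice of illustration, not something prescribed in the thread, and the p-values below are invented.

```python
from math import comb


def bonferroni_alpha(alpha, n_tests):
    """Per-test significance threshold under Bonferroni correction."""
    return alpha / n_tests


def holm_reject(p_values, alpha=0.05):
    """Holm step-down procedure: returns one True/False per p-value,
    in the original order, indicating which hypotheses are rejected."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        # The k-th smallest p-value is compared with alpha / (m - k).
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # once one test fails, all larger p-values fail too
    return reject


# With 12 binary traits there are C(12, 2) pairwise comparisons.
n_pairs = comb(12, 2)
threshold = bonferroni_alpha(0.05, n_pairs)
print(f"{n_pairs} comparisons, per-test alpha = {threshold:.5f}")

# Invented p-values, for illustration only.
example_p = [0.001, 0.04, 0.03, 0.2]
print(holm_reject(example_p))
```

Both methods control the chance of any false positive across the whole family of tests; procedures that control the false discovery rate instead (such as Benjamini-Hochberg) are another option when the number of comparisons is large.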