I have a simple (yet, to me, still non-trivial) problem to submit to you.

I have a dataset of a group of patients affected by a disease, for which the presence of mutations in several genes was inferred.

Each gene is a binary variable: 0 for negative (not mutated) and 1 for positive (mutated).

I need to assess the associations between these genes, to establish whether some tend to be co-mutated while others tend to be mutually exclusive. To do this, I first analyzed every possible pair of genes in a 2x2 contingency table such as:

In this case, for example, the p-value is highly significant, so I thought it could be useful to compute the odds ratio (OR) to characterize the relationship. Here the OR obtained from the formula OR = (A x D) / (B x C) is 0.53, which should mean that the two genes are discordant (0-1 or 1-0) more often than concordant (0-0 or 1-1). However, my concern is that this does not make clear whether the two genes are positively or negatively correlated. Should I just compare the double positives (1-1) against the total of discordant cases (0-1 and 1-0)? In this case it would be 69/(131+428) = 69/559 = 0.12. Is this useful?
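To make the arithmetic above concrete, here is a minimal Python sketch. The four counts are the ones quoted in this post; which discordant count belongs to which gene is an assumption (inferred from the mutation frequencies), and it does not change the OR. The function and variable names are purely illustrative.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio of the 2x2 table [[a, b], [c, d]]: (a * d) / (b * c)."""
    if b == 0 or c == 0:
        raise ValueError("odds ratio is undefined when a discordant cell is 0")
    return (a * d) / (b * c)


# Counts quoted in the text: a = both negative (0-0), d = both positive (1-1).
# Assumption: b = only gene 2 mutated (0-1), c = only gene 1 mutated (1-0).
a, b, c, d = 431, 428, 131, 69

or_observed = odds_ratio(a, b, c, d)

# Double positives relative to all discordant cases, as proposed above.
double_pos_vs_discordant = d / (b + c)

print(f"OR = {or_observed:.2f}")
print(f"1-1 / discordant = {double_pos_vs_discordant:.2f}")
```

Swapping the two discordant counts leaves both numbers unchanged, since they only enter through their product or their sum.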

However, each gene has a different mutation frequency in the population: for example, gene 1 here has a 0.18 probability of being mutated, whilst gene 2 has a 0.46 probability. Should I take this into account?

I played around and tried to see what these 4 combinations would look like if they were due only to the expected mutation frequency of each of the two genes, and something like this came out:

The totals are the same, but the counts are redistributed according to the expected frequencies (i.e., the total number of gene 1 mutated cases is 195/1059 = 0.18, which is the expected mutation frequency of the gene). I then computed another OR for these numbers (12.76) and compared it with the previous one using Tarone's test of homogeneity between the two tables (in this case, the p-value is significant).
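For reference, this is a sketch of the textbook construction of expected counts under independence, where each cell is (row total x column total) / N. It is not necessarily how the table above was built, and the row/column layout of the observed counts is an assumption.

```python
def expected_under_independence(table):
    """Expected counts of a two-way table if rows and columns were independent."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    # Each expected cell is (row total * column total) / grand total.
    return [[r * c / n for c in col_totals] for r in row_totals]


# Assumed layout: rows = gene 1 (0, 1), columns = gene 2 (0, 1).
observed = [[431, 428],
            [131, 69]]

expected = expected_under_independence(observed)
for row in expected:
    print([round(x, 1) for x in row])
```

By construction, a table built this way keeps the observed row and column totals and has an odds ratio of exactly 1.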

By simply dividing each cell of the "real life" table by the corresponding cell of the "expected frequencies" table, I obtained a ratio (e.g., 0/0 ratio = 431/542 = 0.79, so there are fewer double negatives than expected). Do you think this reasoning is correct? If so, should I use the 1/1 ratio to tell whether the relationship is positive or negative (in this case 69/172 = 0.4, so there are fewer double positives than expected and the genes would be inversely correlated)?

I thank you in advance and look forward to your help!
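The ratio described above can be sketched in a few lines of Python. Only the two cells whose observed and expected counts are quoted in this post are used; the helper name `obs_exp_ratio` is hypothetical.

```python
def obs_exp_ratio(observed, expected):
    """Observed / expected count for one cell; below 1 means fewer than expected."""
    if expected <= 0:
        raise ValueError("expected count must be positive")
    return observed / expected


# (observed, expected) pairs as quoted in the text for the 0/0 and 1/1 cells.
cells = {"0/0": (431, 542), "1/1": (69, 172)}

ratios = {label: obs_exp_ratio(obs, exp) for label, (obs, exp) in cells.items()}
for label, ratio in ratios.items():
    direction = "fewer" if ratio < 1 else "more"
    print(f"{label}: ratio = {ratio:.3f} ({direction} than expected)")
```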

Best,

Luca