Combining multiple data sets in a single contingency table

I am wondering whether there is a test that one can do to decide whether it is legitimate to combine data from multiple data sets into a single contingency table. Let's say that I think that the sex ratio in offspring from crossing two mutant fruit flies differs from the one you get from crossing two wildtype (non-mutant) flies. I can make a 2 x 2 contingency table with rows "mutant versus wildtype" and columns "number of males versus number of females". But let's say that I have collected my progeny in 4 separate crosses carried out in different places and at different times. I'd like to combine all of my numbers into a single contingency table, but I don't know if there is reason to be suspicious that something is off about one or more of my crosses.

I have thought about ANOVA, but I don't think it is applicable to a situation where I have a binary phenotype like this (male versus female). Googling led me to the concept of "overdispersion", which I am having some trouble understanding and which seems to me to be an overly complicated approach. My problem seems common enough that I would have thought that there was a straightforward answer to it, but maybe that's not true.




It's not clear what you have done. You say 4 separate crosses - does that mean 4 different place/time experiments, and that you can make a valid 2x2 table for each of them? And you want to know whether the four tables can be combined?

Hi Kaxt,

I think it's what you say, but I'll try to clarify it to make sure.

I have four separate data sets, and each data set consists of something like this:

"A mutant cross yielded 401 males and 352 females, and a wildtype cross yielded 704 males and 680 females."

Clearly I can combine all four sets of numbers to generate a single contingency table, but it seems like there would be some test that people would run to tell you that the result you're getting isn't an artifact of something having been weird in one of your data sets. In the extreme, my numbers could have been:

1. mutant: 50 males and 50 females; wildtype: 50 males and 50 females
2. mutant: 50 males and 50 females; wildtype: 50 males and 50 females
3. mutant: 50 males and 50 females; wildtype: 50 males and 50 females
4. mutant: 500 males and 0 females; wildtype: 50 males and 50 females

If I combine all of those into a single contingency table, I'll find that the mutant cross gives more males than females, but clearly it's just because there's something funny about the fourth data set. It seems to me like there will be situations where you want to combine several data sets to increase your statistical power, but you want some sort of test that says, "It's OK that I combined them all, because none of these data sets looks like an outlier."

Does that make sense?
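
To make the question concrete, here is a rough sketch of one standard way to frame it: a chi-square test of homogeneity run across the replicate crosses of a single genotype, asking whether the male:female ratio is the same in all four. This is an assumption about what might fit, not something anyone in this thread has recommended. The function names (chi2_sf, homogeneity_test) are made up for illustration, and the counts are the extreme example above. It uses only the Python standard library; scipy.stats.chi2_contingency would do the same job on the same table.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square distribution (integer df >= 1)."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: closed-form finite sum.
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: erfc term plus a finite sum over half-integer powers.
    total = math.erfc(math.sqrt(half))
    for i in range(1, (df - 1) // 2 + 1):
        total += math.exp(-half) * half ** (i - 0.5) / math.gamma(i + 0.5)
    return total


def homogeneity_test(rows):
    """Pearson chi-square test that all rows share one male:female ratio.

    rows: list of (males, females) counts, one pair per replicate cross.
    Assumes both column totals are non-zero (otherwise an expected count is 0).
    """
    row_totals = [m + f for m, f in rows]
    col_totals = [sum(r[0] for r in rows), sum(r[1] for r in rows)]
    grand = sum(row_totals)
    stat = 0.0
    for (m, f), rt in zip(rows, row_totals):
        for observed, ct in zip((m, f), col_totals):
            expected = rt * ct / grand
            stat += (observed - expected) ** 2 / expected
    df = len(rows) - 1
    return stat, df, chi2_sf(stat, df)


# Counts from the extreme example: (males, females) per cross.
mutant_crosses = [(50, 50), (50, 50), (50, 50), (500, 0)]
wildtype_crosses = [(50, 50), (50, 50), (50, 50), (50, 50)]

mutant_stat, mutant_df, mutant_p = homogeneity_test(mutant_crosses)
wild_stat, wild_df, wild_p = homogeneity_test(wildtype_crosses)

print(f"mutant:   chi2 = {mutant_stat:.2f}, df = {mutant_df}, p = {mutant_p:.3g}")
print(f"wildtype: chi2 = {wild_stat:.2f}, df = {wild_df}, p = {wild_p:.3g}")
```

On these made-up counts the four wildtype crosses agree exactly (statistic 0, p = 1), while the mutant crosses give a very large statistic on 3 degrees of freedom, which flags that pooling them would be driven by the fourth cross. A small p-value only says the crosses are not homogeneous; it does not say which one is odd or why.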


Hi Kat,

Yes, I'd love to get the workshop activity. I assume that I don't need to post my email address and that you can just contact me via TalkStats. If that isn't true, please let me know, and I'll post my email here.

Thanks so much for the help. I appreciate it.