Multiple comparisons vs multiple pairwise t tests

#1
Hi all,

I am comparing the mRNA expression of multiple genes between 2 groups from one microarray dataset (n = 338) that has been normalized and log2-transformed. I am admittedly not very familiar with biostats, but my approach was to just run a separate two-group t test for each gene. However, I am wondering whether this is correct or whether I need a multiple comparisons approach, since I will end up looking at 21 genes or so. Any guidance and explanations of when to use multiple comparisons for analyzing mRNA expression would be much appreciated!

Thanks!
 

katxt

Well-Known Member
#3
I think hlsmith has a good idea.
More false positives generally mean fewer false negatives, so to some extent what you do depends on the relative cost of false positives and false negatives. Do you mind exploring some blind alleys if it means that you might find something special you might otherwise have missed? q values formalize this (sort of).
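As a rough illustration of the q value idea, here is a minimal sketch of the Benjamini-Hochberg adjustment, which is one common false discovery rate procedure (an assumption on my part about which method is meant; Storey's q values are a related but different calculation). The function name `benjamini_hochberg` and the p values below are made up for illustration and are not from the dataset in this thread.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p values, in the input order."""
    m = len(p_values)
    # Indices of the p values, sorted from smallest to largest p.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down to the smallest so that the
    # adjusted values never increase as the raw p values decrease.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p values (illustration only).
raw_p = [0.001, 0.008, 0.039, 0.041, 0.20, 0.74]
adj_p = benjamini_hochberg(raw_p)
for p, q in zip(raw_p, adj_p):
    print(f"raw p = {p:.3f}   BH-adjusted = {q:.4f}")
```

Calling a gene a "discovery" when its adjusted value is below, say, 0.10 means accepting that roughly 10% of the discoveries may be blind alleys, which is the trade-off described above.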
 
#4
Thank you both! So you would do multiple comparisons instead of head to head single t tests even with a limited number of genes?
 

katxt

Well-Known Member
#5
"even with a limited number of genes"
Does this mean that you have already chosen about 21 genes that you think are likely to be different and you are only going to test them? And you anticipate that a good proportion of that 21 will have p values less than 0.05?
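For a sense of scale, here is a small back-of-the-envelope sketch of what 21 tests at the 0.05 level would produce if none of the genes actually differed. It assumes the tests are independent, which is a simplification for genes from one receptor family, so treat the second number as a rough guide only.

```python
n_tests = 21   # number of genes tested, as mentioned in the thread
alpha = 0.05   # per-test significance level

# Expected number of p values below alpha if no gene truly differs.
expected_false_positives = n_tests * alpha

# Chance of at least one p value below alpha, ASSUMING independent tests.
prob_at_least_one = 1 - (1 - alpha) ** n_tests

print(f"expected false positives: {expected_false_positives:.2f}")
print(f"P(at least one false positive): {prob_at_least_one:.3f}")
```

So one or two small p values out of 21 would be unremarkable on their own, while a good proportion of them below 0.05 would be harder to put down to chance.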
 
#6
Yes! I've chosen a family of receptors and am using a publicly available microarray dataset to look at differences in their expression.
 

katxt

Well-Known Member
#7
The problem of multiple testing and false positives is one that statisticians haven't solved. Realistically there isn't any reliable way to distinguish between real and false positives. What you say depends on who the report is for. This is an approach I have taken in the past.
Explain the multiple p value problem in your report. Do all the t tests. Report the p values as they are.
Then say something along these lines and let the reader decide.
p values < 0.01 indicate that there is very likely a difference.
p values between 0.01 and 0.05 suggest that there probably is a difference, but more work is needed to establish this with suitable confidence.
p values > 0.05 mean that there may well be a difference, but if there is one it is too small to be detected by these data.
To me, this is more honest and useful than trying to decide where to cut the significance off.
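A minimal sketch of how that wording could be attached to a table of results, assuming the per-gene p values have already been computed. The function name `describe_p`, the gene names, and the p values are all hypothetical placeholders.

```python
def describe_p(p):
    """Map an unadjusted p value to the wording suggested above."""
    if p < 0.01:
        return "very likely a difference"
    if p <= 0.05:
        return "probably a difference, but more work is needed"
    return "no difference detected; any difference is too small for these data to show"


# Hypothetical gene names and p values (illustration only).
results = {"GENE_A": 0.004, "GENE_B": 0.03, "GENE_C": 0.41}

report = {gene: describe_p(p) for gene, p in results.items()}
for gene, p in results.items():
    print(f"{gene}: p = {p:.3f} -> {report[gene]}")
```

The p values are reported as they are, with no adjustment, so the explanation of the multiple p value problem still needs to go in the report alongside the table.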
 
#8
Thank you so much, this is very helpful!!