ANOVA for Sampling bias detection

#1
I hope that this is the right place for such a question.

Which test should I use to detect whether a selected sample is biased?

We have a protein that naturally binds certain short, known DNA sequences (let's say 5 nucleotides each). What I want to show is that, out of all possible 5-nucleotide sequences, this protein selected these ones because of how often they occur in gene coding regions.

So I counted how many times each 5-nucleotide sequence appears across all genes, then compared the mean count over all sequences with the mean count over the sequences bound by the protein, to suggest that the protein favors these because of how often they appear. The difference is significant.
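For concreteness, the counting step could look roughly like the sketch below. The gene sequences and the list of bound 5-mers are made-up toy data, and the names `count_kmers`, `genes`, and `bound` are hypothetical, not taken from my actual pipeline. It counts overlapping 5-mers on one strand only and gives a count of zero to any 5-mer that never occurs.

```python
from collections import Counter
from itertools import product
from statistics import mean

def count_kmers(sequences, k=5):
    """Count overlapping k-mers over all sequences (single strand only)."""
    # Start every possible k-mer at zero so unseen ones still enter the mean.
    counts = Counter({"".join(p): 0 for p in product("ACGT", repeat=k)})
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if kmer in counts:  # skips windows containing N or other symbols
                counts[kmer] += 1
    return counts

# Toy data (assumption): two short "genes" and two protein-bound 5-mers.
genes = ["ATGCGATGCGA", "TTATGCGATAA"]
bound = ["ATGCG", "GCGAT"]

counts = count_kmers(genes)
population_mean = mean(counts.values())        # mean over all 4**5 5-mers
sample_mean = mean(counts[m] for m in bound)   # mean over bound 5-mers only
print(population_mean, sample_mean)
```

Note that the bound 5-mers are themselves part of the full set of 1024, so the two groups being compared are not independent.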

I used ANOVA (specifically Welch's ANOVA, because the sample and the population are unbalanced in size and their variances differ: the population variance is 15.0842 and the sample variance is 181.3176). The p-value was extremely small, which means rejecting the null hypothesis of equal means. Is this correct, or should I look into a different measure?
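To make the comparison explicit: with only two groups, Welch's ANOVA reduces to Welch's unequal-variance t-test (the F statistic equals t squared). Below is a minimal sketch of that statistic using only the standard library. The function name `welch_t` and the input numbers are illustrative assumptions, not my real counts. To get a p-value you would compare `t` against a t distribution with `df` degrees of freedom, for example via `scipy.stats.ttest_ind(a, b, equal_var=False)` if SciPy is available.

```python
from statistics import mean, variance

def welch_t(a, b):
    """Return Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    na, nb = len(a), len(b)
    # Squared standard error of each group mean (sample variance / n).
    se2_a = variance(a) / na
    se2_b = variance(b) / nb
    t = (mean(a) - mean(b)) / (se2_a + se2_b) ** 0.5
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (na - 1) + se2_b ** 2 / (nb - 1))
    return t, df

# Illustrative numbers only (assumption), not the real k-mer counts.
t, df = welch_t([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
print(t, df)
```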

So, is my approach correct?

Thank you for your input and time.
 

fed2

Active Member
#2
Why ANOVA? It seems like you have just two populations, protein genes and non-protein genes, so it would be a t-test, wouldn't it?
 
#3
fed2 said: "Why ANOVA? It seems like you have just two populations, protein genes and non-protein genes, so it would be a t-test, wouldn't it?"
Thank you for your answer and suggestion.
True, but what I am really interested in knowing is whether I can test a sample against the population it was drawn from. I have used ANOVA and Student's t-test before, but always sample vs. sample (three or more samples when using ANOVA). Now I need to test whether a sample was selected randomly from a population. In other words, does this protein show a preference when it comes to selecting sequences? Is it biased? What kind of test would I run in such a case? That was my question. Sorry if I was not clear.
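To illustrate what I mean by "selected randomly from a population", one way to frame it is a Monte Carlo check: draw many random subsets of the same size from the population of counts and see how often their mean is at least as large as the observed one. This is only a sketch of the idea, not necessarily the right test. The function name `random_subset_pvalue`, the one-sided direction, and the toy numbers are all assumptions.

```python
import random
from statistics import mean

def random_subset_pvalue(population, observed, n_draws=10000, seed=0):
    """One-sided Monte Carlo p-value: the share of random same-size subsets
    (drawn without replacement) whose mean is >= the observed mean."""
    rng = random.Random(seed)
    obs_mean = mean(observed)
    n = len(observed)
    hits = sum(
        mean(rng.sample(population, n)) >= obs_mean
        for _ in range(n_draws)
    )
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_draws + 1)

# Toy data (assumption): counts 0..99 stand in for the per-5-mer counts,
# and the "bound" sequences happen to be the three most frequent ones.
population = list(range(100))
p = random_subset_pvalue(population, [97, 98, 99], n_draws=2000)
print(p)
```

A small value would suggest that a subset with such a high mean count is unlikely under random selection, though it would not by itself show that frequency is the reason for the selection.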