# Which method shall I use?

#### AJLRB

##### New Member
Hello,

I am working on my first paper and I feel quite lost with the statistical method. Hope you can help me!

I am comparing two samples from different populations (sample 1, n = 478; sample 2, n = 502), each divided into 10 categories (A, B, C, ...), to see whether there is a significant difference between the two samples. The data are shown below. For this purpose I used a chi-squared test and obtained a p-value < 0.05, which, as far as I know, means that the two samples differ significantly from each other. However, looking at the data, it seems clear that only some of the categories really differ. I would like to know:
1. Whether chi-squared was the right test, or whether I should use another one.
2. If chi-squared was the right test, whether there is another method to show that the significant differences lie in only some of the categories studied.
3. A fairly large number of subjects are common to both groups (142 in total). Is there anything I should add or take into account to avoid bias because of this?

Here I add the data in case they are useful:

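| Category | A | B | C | D | E | F | G | H | I | J |
|---|---|---|---|---|---|---|---|---|---|---|
| Sample 1 | 42 | 68 | 68 | 38 | 73 | 31 | 32 | 37 | 53 | 36 |
| Sample 2 | 50 | 70 | 83 | 65 | 41 | 25 | 31 | 40 | 58 | 39 |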

#### NellyB

##### New Member
I don't believe the chi-square was correct.

If I am understanding you correctly, then the data in Categories A, B, C... are within-subjects. Have you considered conducting a Mixed Factorial ANOVA? The Samples would be your between-subjects variable.

I'm not sure what this statement means: "There is a quite important number of common subjects in both groups (142 in total). Is there anything I should add or take into account to avoid any bias because of this?". Can you please clarify this?

#### AJLRB

##### New Member
Thanks for your answer, and sorry for not expressing myself correctly. I'll try to clarify what I meant.

I am working with two lists of words that were obtained from two different corpora of texts. The first list has 478 words and the second list has 502 words. In my analysis I have to classify each word from each list into 1 of 10 syntactic categories (let's say "noun", "adjective", "verb", etc.). The purpose is to compare the lists in terms of syntactic categories, so I want to know to what extent the lists differ from each other. I thought the chi-squared test would show this (so p < 0.05 means that the lists are significantly different from each other), but it would not tell me whether the "real" difference is, for instance, only in the "verbs" or "nouns" categories. Should I use ANOVA to find this out?

Furthermore, there are 142 common words between the lists, and I don't know how much this could affect the analysis (maybe they are not distributed homogeneously among the categories). Is there any way to solve this?

Thanks again, you are really helpful!

#### GretaGarbo

##### Human
I cannot see anything incorrect in doing a chi-squared test on these data:

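| Category | A | B | C | D | E | F | G | H | I | J |
|---|---|---|---|---|---|---|---|---|---|---|
| Sample 1 | 42 | 68 | 68 | 38 | 73 | 31 | 32 | 37 | 53 | 36 |
| Sample 2 | 50 | 70 | 83 | 65 | 41 | 25 | 31 | 40 | 58 | 39 |

But it would only give an overall test of whether the rows and columns are statistically independent.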
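For anyone who wants to reproduce the overall test, here is a minimal sketch in plain Python that computes the Pearson chi-squared statistic for the table above. The function name `chi_squared_statistic` is just an illustrative choice, and the critical value is the standard tabulated one for 9 degrees of freedom at the 5% level. The sketch treats all 980 words as independent observations, so it does not account for the 142 words shared between the lists.

```python
# Observed word counts per category (A..J), copied from the table above.
sample_1 = [42, 68, 68, 38, 73, 31, 32, 37, 53, 36]
sample_2 = [50, 70, 83, 65, 41, 25, 31, 40, 58, 39]


def chi_squared_statistic(rows):
    """Return (Pearson X^2, degrees of freedom) for a table of counts."""
    row_totals = [sum(row) for row in rows]
    col_totals = [sum(col) for col in zip(*rows)]
    grand_total = sum(row_totals)

    x2 = 0.0
    for i, row in enumerate(rows):
        for j, observed in enumerate(row):
            # Expected count if sample and category were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            x2 += (observed - expected) ** 2 / expected

    dof = (len(rows) - 1) * (len(col_totals) - 1)
    return x2, dof


x2, dof = chi_squared_statistic([sample_1, sample_2])

# Tabulated chi-squared critical value for alpha = 0.05 and 9 df.
CRITICAL_5PCT_DF9 = 16.919

print(f"X^2 = {x2:.2f} on {dof} df")
print("Exceeds the 5% critical value:", x2 > CRITICAL_5PCT_DF9)
```

A statistic above the critical value corresponds to p < 0.05, but it says nothing about which categories drive the difference.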

Another possibility is to do a Poisson ANOVA, where the dependent variable (the number of words) is Poisson distributed and categories and samples are explanatory main effects. That would show which categories deviate from each other (which is obvious from the table), but it would also give significance tests for these.
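As a rough sketch of this idea in plain Python: for a two-way table, the Poisson model with only the two main effects has fitted counts equal to row total × column total / grand total, so its deviance can be computed without any fitting library. The per-category follow-up used here is the adjusted (standardized) residual with a Bonferroni-corrected cutoff, which is my own choice of follow-up and not necessarily the test meant above. Like the chi-squared test, it assumes independent observations and ignores the 142 shared words.

```python
import math
from statistics import NormalDist

categories = list("ABCDEFGHIJ")
sample_1 = [42, 68, 68, 38, 73, 31, 32, 37, 53, 36]
sample_2 = [50, 70, 83, 65, 41, 25, 31, 40, 58, 39]

n1, n2 = sum(sample_1), sum(sample_2)
total = n1 + n2

deviance = 0.0   # G^2 of the main-effects-only Poisson model
adjusted = {}    # adjusted residual of each category in sample 1

for cat, o1, o2 in zip(categories, sample_1, sample_2):
    col = o1 + o2
    # Fitted counts under the main-effects model (no interaction).
    e1 = n1 * col / total
    e2 = n2 * col / total
    deviance += 2.0 * (o1 * math.log(o1 / e1) + o2 * math.log(o2 / e2))

    # Adjusted residual: approximately standard normal under independence.
    # With two samples, the sample 2 residual is the same with opposite sign.
    variance = e1 * (1 - n1 / total) * (1 - col / total)
    adjusted[cat] = (o1 - e1) / math.sqrt(variance)

# Two-sided Bonferroni cutoff for 10 category-wise comparisons.
alpha = 0.05
threshold = NormalDist().inv_cdf(1 - alpha / (2 * len(categories)))
flagged = [cat for cat, z in adjusted.items() if abs(z) > threshold]

print(f"Deviance G^2 = {deviance:.2f} on {len(categories) - 1} df")
for cat, z in adjusted.items():
    print(f"{cat}: adjusted residual = {z:+.2f}")
print(f"Beyond +/-{threshold:.2f} after Bonferroni correction: {flagged}")
```

The deviance plays the same role as the chi-squared statistic (an overall test on 9 df), while the residuals point to the individual categories where the two lists differ most.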

#### NellyB

##### New Member
Now that I understand the data better, I agree with you. A chi-square sounds right.