# Chi-square influential categories

#### bioniclime

##### New Member
I have counts of events in a number of categories and an a priori expected frequency for those events. Clearly, this is a standard chi-squared goodness-of-fit test.

However, I am looking for a test that will tell me whether the chi-squared test would still be significant if I eliminated certain categories from consideration. For instance, if I removed the category with the highest (O-E)^2/E, would the chi-squared still be significant? What if I removed the highest and then the second highest, and so forth?

Essentially, I want the question answered, "Which of the categories actually make the chi-squared test significant?" I hope that makes sense...

If there is no such test, then it seems like a simple solution would be to do it iteratively, as I described above -- remove the most influential category remaining, and then re-run the chi-squared test on the remaining categories. When the chi-squared becomes non-significant, the remaining categories can be considered non-influential.

As an example, suppose there were five categories, with a priori expected probabilities of {0.2, 0.2, 0.2, 0.2, 0.2}. The observed number of events that we get is {10, 10, 10, 34, 36}. Suppose this comes out as a significant chi-squared. Since the "36" is the most influential observation (i.e., its (O-E)^2/E is the greatest), we would mark it as a "significant influencer" (SI), eliminate it, and go to the next step. The a priori expected probabilities change to {0.25, 0.25, 0.25, 0.25} and the observed counts are {10, 10, 10, 34}. Let's suppose that this comes out with a significant chi-squared test -- we would then mark the "34" as an SI, and eliminate it. The a priori probabilities would then go to {0.333, 0.333, 0.333} and the observed counts would go to {10, 10, 10}. This is obviously non-significant, and thus the remaining observations would not be SIs.
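To make the arithmetic of this example concrete, here is a minimal Python sketch of the iterative removal described above. The function names and the small table of 5% critical values are assumptions introduced for illustration (the table only covers up to five categories), and the sketch only reproduces the steps; it does not settle whether the procedure is statistically sound.

```python
# Upper 5% critical values of the chi-squared distribution by degrees of freedom.
# Assumption: alpha = 0.05 and at most five categories (df <= 4).
CRIT_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488}


def chi2_contributions(observed, probs):
    """Return the per-category terms (O - E)^2 / E, with E = total * p."""
    total = sum(observed)
    return [(o - total * p) ** 2 / (total * p) for o, p in zip(observed, probs)]


def iterative_removal(observed, probs):
    """Drop the largest contributor until the test is no longer significant.

    Returns the indices of the removed categories, in order of removal.
    """
    remaining = list(range(len(observed)))
    flagged = []
    while len(remaining) > 1:
        obs = [observed[i] for i in remaining]
        # Renormalise the a priori probabilities over the remaining categories.
        p_sum = sum(probs[i] for i in remaining)
        p = [probs[i] / p_sum for i in remaining]
        contrib = chi2_contributions(obs, p)
        stat = sum(contrib)
        df = len(remaining) - 1
        if stat <= CRIT_05[df]:
            break  # not significant at the 5% level: stop removing
        worst = remaining[contrib.index(max(contrib))]
        flagged.append(worst)
        remaining.remove(worst)
    return flagged


observed = [10, 10, 10, 34, 36]
probs = [0.2] * 5
print(sum(chi2_contributions(observed, probs)))  # full-table statistic, about 37.6
print(iterative_removal(observed, probs))        # indices of the "36" and then the "34"
```

With these numbers the full table gives a statistic of about 37.6 on 4 degrees of freedom, the four-category table gives 27.0 on 3, and the last three categories give 0, so the sketch flags the "36" and then the "34", matching the description above. Note that each step reuses the same data with no adjustment for repeated testing.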

While it seems this would get at what I'm interested in, I'm not sure how statistically kosher it is.

Any input would be appreciated. Thank you.

#### gianmarco

##### TS Contributor
Hello,
why do you need such an iterative procedure?
If you want to answer the question "Which of the categories actually make the chi-squared test significant?", why don't you inspect the table of standardized residuals as a follow-up step after the chi-squared test?

cheers
Gm
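As a rough illustration of the residual approach, the sketch below computes standardized residuals for the five-category example from the first post. It assumes the goodness-of-fit form (O - E) / sqrt(E(1 - p)), which is one common definition; the function name is made up for this example, and the usual rule of thumb of |residual| > 1.96 is only approximate.

```python
import math


def standardized_residuals(observed, probs):
    """Standardized residuals (O - E) / sqrt(E * (1 - p)), with E = n * p.

    Assumption: a one-way goodness-of-fit table with fixed a priori
    probabilities, so each count has variance n * p * (1 - p).
    """
    n = sum(observed)
    return [(o - n * p) / math.sqrt(n * p * (1 - p)) for o, p in zip(observed, probs)]


observed = [10, 10, 10, 34, 36]
probs = [0.2] * 5
for count, resid in zip(observed, standardized_residuals(observed, probs)):
    flag = "*" if abs(resid) > 1.96 else ""
    print(f"{count:3d}  {resid:+.2f} {flag}")
```

For this example the residuals come out near -2.5 for each "10", +3.5 for the "34", and +4.0 for the "36". All five exceed 1.96 in absolute value, so the low counts deviate from expectation as well as the high ones.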

#### bioniclime

##### New Member
Thank you... I knew there was something like that, but I'd never used it before.

#### hlsmith

##### Less is more. Stay pure. Stay poor.
GM, thanks as well.

I had forgotten about this. I was going to propose, which may or may not be quicker or more exact, that they run all of the 2x2 tables and see which had the biggest test statistic. And I would guess anything > 1.96**2 would be of interest.
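One possible reading of this suggestion, and it is only an assumption about what was meant, is to collapse the table to "this category versus all the others" and compute a 1-degree-of-freedom chi-squared statistic for each category against its expected counts. The sketch below does that for the example from the first post; the function name and `THRESHOLD` are hypothetical.

```python
THRESHOLD = 1.96 ** 2  # about 3.84, the cutoff mentioned above


def one_vs_rest_chi2(observed, probs):
    """Chi-squared statistic (1 df) for each category versus all others pooled."""
    n = sum(observed)
    stats = []
    for o, p in zip(observed, probs):
        e_in, e_out = n * p, n * (1 - p)  # expected counts inside / outside the category
        stats.append((o - e_in) ** 2 / e_in + ((n - o) - e_out) ** 2 / e_out)
    return stats


observed = [10, 10, 10, 34, 36]
probs = [0.2] * 5
for count, stat in zip(observed, one_vs_rest_chi2(observed, probs)):
    print(count, round(stat, 2), stat > THRESHOLD)
```

Under this reading each statistic works out to (O - E)^2 / (E(1 - p)), which is the square of the standardized residual, so comparing it with 1.96**2 leads to the same conclusions as comparing the residual with 1.96.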