# All Chi sq residuals significant

#### Chisq23

##### New Member
Hi everyone,
today I ran a chi-square test that turned out to be significant (p<0.05), but when I plotted the residuals they were all >1.96 or <-1.96.
I'm going to check the data again since I worry there might be something wrong in my file, but I was wondering whether this is a possible scenario and, if so, how you would interpret the residuals?

#### Karabiner

##### TS Contributor
I guess you mean the standardized residuals.

If the Chi² test is statistically significant, then I would expect that one or more cells show
a "significantly" decreased (or increased, respectively) cell frequency, compared with the
frequency expected under the null hypothesis. Such a decrease or increase is indicated by
a standardized cell residual < -1.96 or > 1.96.

With kind regards

Karabiner

#### Chisq23

##### New Member
Hi Karabiner

Yes, I mean the standardised residuals. I was also expecting some values higher than 1.96 or lower than -1.96. What really confused me is that every cell has one of these values. How can I tell which cell(s) are significant if all of them are? Hope that makes sense.

#### hlsmith

##### Less is more. Stay pure. Stay poor.
Well, if all of the groups differ, I could imagine that happening. Say you are looking at gender distributions between two groups. If one group had 25% men and 75% women and the comparison group had 75% men and 25% women, with equal group sizes, I would think the residuals may all be large and comparable, right?

Hayden

#### hlsmith

##### Less is more. Stay pure. Stay poor.
I just ran my example with group sizes = 100, and the absolute values of the residuals were 7.0711 for all cells. This is a simplified version based on a 2x2 table, where if one cell count increases the other has to decrease, and vice versa.
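
For anyone who wants to reproduce this, here is a minimal sketch in plain Python. The thread does not say which software or residual definition was used; the sketch assumes the adjusted standardized residual, (observed - expected) / sqrt(expected * (1 - row share) * (1 - column share)), which does reproduce the 7.0711 figure. The function name `adjusted_residuals` is just an illustrative choice.

```python
from math import sqrt

def adjusted_residuals(table):
    """Adjusted standardized residuals for an r x c table of counts.

    Assumed definition (not confirmed by the thread):
    (O - E) / sqrt(E * (1 - row_total/n) * (1 - col_total/n))
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    result = []
    for i, row in enumerate(table):
        out_row = []
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns
            expected = row_totals[i] * col_totals[j] / n
            variance = expected * (1 - row_totals[i] / n) * (1 - col_totals[j] / n)
            out_row.append((observed - expected) / sqrt(variance))
        result.append(out_row)
    return result

# Two groups of 100: 25% men / 75% women versus 75% men / 25% women
table = [[25, 75], [75, 25]]
residuals = adjusted_residuals(table)
for row in residuals:
    print([round(r, 4) for r in row])
# [-7.0711, 7.0711]
# [7.0711, -7.0711]
```

In a 2x2 table the four adjusted residuals always share the same absolute value (the square root of the chi-square statistic, here sqrt(50)), so in that case "all cells significant" follows automatically from a significant test.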

#### gianmarco

##### TS Contributor
That means that all the cells significantly contribute to the departure from the Null Hypothesis (in either direction; i.e., either featuring counts significantly lower or higher than expected).

Just out of curiosity: how large is your table, and what is its grand total? If you want to visually represent your table and the association between row and column categories, you may want to consider Correspondence Analysis.

Below is an example of the above technique used to analyse the association between grams of tobacco smoked and some diseases (toy dataset from my CAinterprTools R package).

Best
Gm


#### hlsmith

##### Less is more. Stay pure. Stay poor.
I saw you have posted on this @gianmarco - but just looking at the above figure it has no intuitive meaning to me. Can you explain what I am looking for?

Thanks.

#### gianmarco

##### TS Contributor
@hlsmith, thanks for the question, which is not so easy to answer in a few words.

CA is a dimensionality reduction technique used for cross-tabulations.
Essentially, it allows you to display the deviation from independence by plotting the rows and columns of a cross-tab in a low-dimensional space. The spread of points out from the origin is related to the amount of variability in the table (i.e., the "discrepancy" between expected and observed counts).

By and large, row points (diseases) that are close together feature a similar proportion of column categories (and vice versa). Again by and large, the chart indicates an opposition between diseases more "related" to the "none" category (right-hand side) and diseases that are more "related" to the highest dose of tobacco (left-hand side). As you can see, the first (horizontal) dimension captures 80% of the data variability.

Think of CA as a sort of PCA tailored for categorical data.
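
To make the PCA analogy a bit more concrete, here is a rough sketch of the textbook computation behind simple CA: a singular value decomposition of the matrix of standardized deviations from independence. It uses NumPy rather than the CAinterprTools R package, the function name `correspondence_analysis` is only illustrative, and the counts are made up for the example (they are not the tobacco dataset).

```python
import numpy as np

def correspondence_analysis(counts):
    """Simple CA of a contingency table via SVD (illustrative sketch)."""
    counts = np.asarray(counts, dtype=float)
    P = counts / counts.sum()          # correspondence matrix (proportions)
    r = P.sum(axis=1)                  # row masses
    c = P.sum(axis=0)                  # column masses
    # Standardized deviations from independence, cell by cell
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    U, sv, Vt = np.linalg.svd(S, full_matrices=False)
    inertia = sv ** 2                  # inertia (variability) per dimension
    # Principal coordinates: what gets plotted on the CA map
    row_coords = (U * sv) / np.sqrt(r)[:, None]
    col_coords = (Vt.T * sv) / np.sqrt(c)[:, None]
    return row_coords, col_coords, inertia

# Hypothetical 3x3 table of counts, invented for illustration only
table = np.array([[20, 10, 5],
                  [10, 15, 10],
                  [5, 10, 25]])
row_coords, col_coords, inertia = correspondence_analysis(table)

# Share of total inertia captured by each dimension
explained = inertia / inertia.sum()
print(np.round(explained, 3))
# Total inertia times the grand total equals the chi-square statistic
print(inertia.sum() * table.sum())
```

The percentage reported on each axis of a CA map corresponds to `explained` here, and the total inertia is the chi-square statistic divided by the grand total, which is why the spread of points reflects the departure from independence.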

Hope this helps
Best
Gm

P.S. A better description can be found in the literature (of course) and on my website (link already provided).