Outliers in categorical data?


I am working on a pre-analysis plan and have to specify what I am going to do with outliers. I have two categorical variables (5 levels and 2 levels) and I will be performing a chi-square test for independence.

I thought of using a boxplot to detect outliers, but now I am not sure if it is even possible to have outliers in categorical data. With so few possible values, there is not much room for variation in the data. The only outlier I could think of is wrong data (data which falls outside the possible range due to mistakes). I have looked online and in my statistics books, but was unable to find a solution, so I really hope someone here can help me out.
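A range check of the kind described above could look something like the following. This is only a minimal sketch: the function name `find_invalid_levels`, the 1 to 5 coding of the five-level variable, and the sample responses are all made up for illustration.

```python
from collections import Counter

def find_invalid_levels(values, allowed):
    """Count observations whose value is not one of the allowed category codes."""
    allowed = set(allowed)
    return Counter(v for v in values if v not in allowed)

# Hypothetical responses for a 5-level variable coded 1..5
responses = [1, 3, 5, 2, 7, 4, 0, 3]
invalid = find_invalid_levels(responses, allowed=range(1, 6))

# Each key is an out-of-range code, each value is how often it occurs
for code, count in sorted(invalid.items()):
    print(f"invalid code {code}: {count} occurrence(s)")
```

Anything this flags would be a data-entry or coding error rather than an outlier in the usual sense.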

To summarize, is it possible to have outliers in categorical data and if yes, how do I detect them?

Thank you so much for your time and have a nice day!:D


No cake for spunky
I have never heard it suggested that you cannot have outliers in categorical data. The Tukey boxplot supposedly makes no distributional assumption (which is likely wrong, since it is influenced by skew). The problem with outlier detection is that most or all tests make assumptions about the distribution of the data, so if the distribution of your data does not match that assumption, your results will likely be wrong.

In other words, if you use a common test for outliers that assumes normality, and many do, your results will be wrong if the data is not in fact normal.


Less is more. Stay pure. Stay poor.
You can examine the residuals from the chi-square test, but I don't think they would be referred to as outliers, or that the term translates over.
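To make the residual idea concrete, here is a rough sketch in plain Python that computes adjusted standardized residuals for a contingency table. The 5x2 counts are invented for illustration, the function name `standardized_residuals` is hypothetical, and the cutoff of |residual| > 2 is only a common rule of thumb, not a formal test.

```python
import math

def standardized_residuals(table):
    """Adjusted standardized residuals for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    residuals = []
    for i, row in enumerate(table):
        out = []
        for j, observed in enumerate(row):
            # Expected count under independence
            expected = row_totals[i] * col_totals[j] / n
            # Scale by the estimated standard deviation of (observed - expected)
            denom = math.sqrt(
                expected * (1 - row_totals[i] / n) * (1 - col_totals[j] / n)
            )
            out.append((observed - expected) / denom)
        residuals.append(out)
    return residuals

# Hypothetical counts: 5 levels (rows) by 2 levels (columns)
counts = [[20, 30], [25, 25], [30, 20], [22, 28], [40, 10]]
res = standardized_residuals(counts)

# Cells whose residual exceeds the rule-of-thumb cutoff of 2 in absolute value
flagged = [(i, j) for i, row in enumerate(res) for j, r in enumerate(row) if abs(r) > 2]
print(flagged)
```

Cells with large residuals show where the observed counts depart most from what independence would predict. That points to the cells driving the association, not to individual observations that should be removed.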