Interaction effects, distributions not normal: is ANOVA justified?


I have a 2*4*6 factorial design. The sample distributions in most cases are NOT normal (tested with the Kolmogorov-Smirnov test). Non-parametric tests are usually recommended in such cases, but my question is this: what test should I use to test for interaction effects?

Thank you!


TS Contributor
Remember that the important thing is not whether the raw data are normally distributed. Rather, the important thing is whether the residuals of the model are normally distributed. So, the first thing you need to do is examine the distribution of your response variable, not your independent variables. Once you get that normally distributed (through a link function or, less preferably, a data transformation), you should be able to tell what your next step should be. I would not go the non-parametric route yet!
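To illustrate the difference between raw data and residuals, here is a minimal sketch on simulated data (the group names, means and sample sizes are made up for illustration, not taken from this thread). It pools two groups whose errors are normal but whose means differ, then compares how straight the normal Q-Q line is for the raw values and for the residuals (observation minus its own group mean). The straightness is summarised by the correlation between the sorted values and the normal quantiles.

```python
import random
from statistics import NormalDist, mean

def qq_correlation(values):
    """Correlation between sorted values and standard normal quantiles.

    Values close to 1 mean the normal Q-Q plot is close to a straight line.
    """
    n = len(values)
    ordered = sorted(values)
    # Plotting positions (i + 0.5) / n for i = 0 .. n-1 (an assumed, common choice).
    theo = [NormalDist().inv_cdf((i + 0.5) / n) for i in range(n)]
    mx, my = mean(theo), mean(ordered)
    sxy = sum((x - mx) * (y - my) for x, y in zip(theo, ordered))
    sxx = sum((x - mx) ** 2 for x in theo)
    syy = sum((y - my) ** 2 for y in ordered)
    return sxy / (sxx * syy) ** 0.5

rng = random.Random(1)
# Hypothetical groups: same normal error, very different means.
groups = {
    "control": [rng.gauss(0, 1) for _ in range(200)],
    "treated": [rng.gauss(10, 1) for _ in range(200)],
}

raw = [y for ys in groups.values() for y in ys]
# Residual = observation minus the mean of its own group (the fitted value).
residuals = [y - mean(ys) for ys in groups.values() for y in ys]

raw_r = qq_correlation(raw)
resid_r = qq_correlation(residuals)
print(f"Q-Q correlation, raw data:  {raw_r:.3f}")
print(f"Q-Q correlation, residuals: {resid_r:.3f}")
```

The pooled raw data are bimodal, so they look non-normal even though the errors are normal. The same logic applies to a factorial design, where the fitted value is the cell mean of the model being used.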
OK, that's something I wanted to hear. But now I'm confused. What I did, and thought I was supposed to do, was test whether, for example, both of the 2 independent groups had their values on a dependent variable normally distributed. So what is the difference between that and examining the distribution of my response variable? And isn't the distribution of residuals something different from both of those things?
OK, I tried what you said. I used linear regression to get the standardized residuals and then I used the Kolmogorov-Smirnov test to see if the distributions of residuals in each group are different from normal. And I still got the same results as before. Most of the distributions are not normal. I don't really know how to use a link function or data transformation to get normal distributions. Could you please explain what my next step should be? Or do you have any suggestions about what procedure other than ANOVA I could use to test the interaction effects?
The sample sizes are between 50 and 100. Only one group has n=210. I should also mention that the variances are equal.

And most of my Q-Q plots look like the one on the image I attached.
You have a design with 2*4*6 = 48 cells.

How many observations do you have in total and in each of the cells? I suggest you give a few values of “n” from among the 48 cells.

Even if you had perfectly normally distributed random error terms (which is almost the same as “residuals”) but also an imbalanced design (an unequal “n” in each cell), it is questionable whether it is meaningful to estimate interactions.

Maybe someone else has some comments on this?
Oh, yeah, sorry. Yeah, the n's in each of the 48 cells are quite small (<10). I see your point. The interactions are, btw, all non-significant. I wonder what would happen if I combined some groups together, thus decreasing the number of cells and enlarging the sample size in each cell. But somehow I think that is not advisable and should have been done before conducting the experiment.
Specifically talking about the sample size:

How large is the sample in total?

If you pick out arbitrarily (or even better randomly) 5 to 10 cells, how many observations do you have in each of these cells?
Oh, sorry, I didn't see that. Altogether the whole sample has N=400. Individual cells range from the highest n=64 to the lowest n=1, but most of them are between 5 and 20. I think that's what you were asking.
I was asking how many observations you have in each cell to get some idea of how unbalanced your design is. It is a matter of the “quality” of the estimates. If the design is exactly balanced, it will be “orthogonal”, so including or excluding an interaction term will not influence the estimates of the remaining terms. (The question, in a way, is whether you should include or exclude higher-order interactions.)

I have the impression that different users have different opinions about this. I don't want to suggest arbitrary recommendations since I have not seen the data.

So, it would be nice if someone else had some suggestions about this.
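A small sketch of why balance matters, using made-up, noise-free numbers (the cell means and cell sizes below are hypothetical). Factor A adds 2 and factor B adds 10, with no interaction. The code compares the unadjusted estimate of the A effect (difference between raw marginal means) with the estimate adjusted for B (difference within each level of B, then averaged).

```python
from statistics import mean

# Hypothetical additive cell means: A adds 2, B adds 10, no interaction.
CELL_MEAN = {
    ("a0", "b0"): 0.0, ("a1", "b0"): 2.0,
    ("a0", "b1"): 10.0, ("a1", "b1"): 12.0,
}

def a_effect_estimates(cell_n):
    """Return (unadjusted, adjusted) estimates of the a1 - a0 difference."""
    # Noise-free data: every observation equals its cell mean.
    data = [(a, b, CELL_MEAN[a, b])
            for (a, b), n in cell_n.items() for _ in range(n)]
    # Unadjusted: difference of raw marginal means, ignoring B.
    unadjusted = (mean(y for a, _, y in data if a == "a1")
                  - mean(y for a, _, y in data if a == "a0"))
    # Adjusted: difference within each level of B, averaged over the levels.
    adjusted = mean(
        mean(y for a, b, y in data if a == "a1" and b == level)
        - mean(y for a, b, y in data if a == "a0" and b == level)
        for level in ("b0", "b1")
    )
    return unadjusted, adjusted

balanced = {cell: 10 for cell in CELL_MEAN}
unbalanced = {("a0", "b0"): 18, ("a1", "b0"): 2,
              ("a0", "b1"): 2, ("a1", "b1"): 18}

balanced_result = a_effect_estimates(balanced)
unbalanced_result = a_effect_estimates(unbalanced)
print("balanced:  ", balanced_result)
print("unbalanced:", unbalanced_result)
```

With equal cell sizes the two estimates agree, so it does not matter which other terms are in the model. With unequal cell sizes the unadjusted estimate of A absorbs part of the B effect, which is why estimates in an unbalanced design depend on which other terms are included.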


TS Contributor
If it is exactly balanced it will be “orthogonal” so including or excluding an interaction term will not influence the estimates of the remaining.
This is a good point. You have a huge range of sample sizes across the cells. As GretaGarbo said, you need to be careful about this.

Another point is about the plot you provided. What does your residual plot for your full model look like?

But, more to the heart of the matter... we don't know what your data look like, so, as Greta says, I think many of us will be cautious about giving specific recommendations. That said, I think most of us agree that models with three-way interactions are a bit tricky. They have to be based on huge sample sizes, but even then interpretation can be difficult. Though I don't entirely recommend this, one option is to drop the highest-order interaction from the model (the three-way term) and then conduct a model selection approach on all lower-order models, with the following as your most parameterized (or global) model:

A + B + C + AxB + AxC + BxC

And then these models:

A + B + C + AxC + BxC
A + B + C + AxB + BxC
A + B + C + AxB + AxC
A + B + C + BxC
A + B + C + AxC
A + B + C + AxB
A + B + C
B + C + BxC
B + C
A + C + AxC
A + C
A + B + AxB
A + B

Inference, I think, would be a bit easier and would still be strong. Of course, there are trade-offs with this approach. The only major downside I see is that model selection approaches are generally not philosophically compatible with a designed experiment. But I'm not sure if your factorial design was an experiment or observational study, etc. People, of course, have different opinions on this, so take what I say with a few handfuls of salt.
I have the impression that people who have worked a lot with experimental design can only accept small deviations from perfect balance. But people who are used to observational data, data with a lot of collinearity (that is, severe imbalance), would gladly run an imbalanced model. Well, opinions can differ, so it is interesting to hear different views.

I started asking about sample size because an imbalanced model can be viewed as “not-acceptable” if the imbalance is severe, even if the data is normally distributed.

If you estimate a full model with 48 parameters there is plenty of room to make the residuals look normal, since least squares gives the maximum likelihood estimates under the normal distribution.

“The interactions are, btw, all non-significant.”
And the variance is constant. Maybe the situation is not that bad.

“I wonder what would happen if I combined some groups together, thus decreasing the number of cells and enlarging the sample size in each cell.”
I think that could cause other problems, so I would avoid that.

Some people want to start with the full model, from the top of jpkelly's (great) list. Others want to start with just the main effects, from the bottom of jpkelly's list.

One possibility is to include significant terms (if you start from the bottom) or to drop non-significant effects (if you start from the top). Then you can draw the normal Q-Q plot for the 400 residuals and see whether the points fall on a straight line. If they do, the residuals are approximately normally distributed.

Having said that, I don't want to make any suggestions, since I don't want to come up with an arbitrary suggestion.

But I believe that knedlica has said that the data are still non-normal. Then what remains is a link function to some other distribution, or a normalising transformation. And what would be wrong with that?


TS Contributor
Greta's post made me realize that I forgot to specify the kind of model selection to which I was referring. That was lazy of me.

Stepwise model selection is one route (starting either from the simplest or from the most complex, global, model). This isn't my favorite approach (either addition or elimination), since I've found that it tends to encourage people to fudge a bit with where to stop their process ("oh, that's p=0.055... that's good enough, I think"). The list I provided was meant to be submitted to a model selection approach using an information criterion like AIC. This allows you to throw all the candidate models into one pot and let the [AIC] math do the work of choosing the model. Then, given the output, you can proceed with model averaging of the top models.

Greta is right that the issue remains that the data are still non-normal. For the unbalanced design and the non-normal data, then I would encourage you to delve into the realm of generalized linear (or non-linear) mixed-effect models. These can manage unbalanced designs fairly well (to a certain limit) and allow specification of whatever link function is appropriate for the distribution of your response data. You still might need a data transformation, but I wouldn't do this before you try one of the link functions. Unless your data are really funky (zero-inflated, or whatnot), you should be golden.
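As a rough sketch of what "letting the AIC math do the work" can look like, the snippet below turns a set of AIC values into Akaike weights, which are the usual starting point for model averaging. The AIC values and the helper name `akaike_weights` are hypothetical, chosen only for illustration; they are not results from the data in this thread.

```python
import math

def akaike_weights(aic_by_model):
    """Convert AIC values into Akaike weights that sum to 1."""
    best = min(aic_by_model.values())
    # Relative likelihood of each model given the data: exp(-delta_AIC / 2).
    relative = {name: math.exp(-0.5 * (aic - best))
                for name, aic in aic_by_model.items()}
    total = sum(relative.values())
    return {name: value / total for name, value in relative.items()}

# Hypothetical AIC values, for illustration only.
aic_values = {
    "A + B + C": 1502.3,
    "A + B + C + AxB": 1503.1,
    "A + B": 1509.8,
}

weights = akaike_weights(aic_values)
for name, w in sorted(weights.items(), key=lambda item: -item[1]):
    print(f"{name:20s} weight = {w:.3f}")
```

Models whose AIC is within a couple of units of the best one keep a sizeable weight, which is the argument for averaging over the top models rather than picking a single winner.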
I don't like stepwise procedures, since a lot of strange models can be selected mechanically. (Someone said: “stepwise = unwise”.)

I like AIC (the Akaike Information Criterion), which essentially makes a trade-off between the fit (log-likelihood) and the number of parameters: AIC = -2*(log(L) - p) = -2*log(L) + 2*p, where L is the likelihood and p is the number of estimated parameters.
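A minimal sketch of that formula for a normal-errors model fitted by least squares. The data are made up, and the function name `gaussian_aic` is just an illustrative helper. It compares a model with one common mean against a model with a separate mean per group, counting the error variance as one of the estimated parameters.

```python
import math
from statistics import mean

def gaussian_aic(groups, common_mean):
    """AIC = -2*(log(L) - p) for a normal-errors model fitted by least squares."""
    values = [y for ys in groups for y in ys]
    n = len(values)
    if common_mean:
        fitted_means = [mean(values)] * len(groups)
    else:
        fitted_means = [mean(ys) for ys in groups]
    rss = sum((y - m) ** 2 for ys, m in zip(groups, fitted_means) for y in ys)
    # Maximised Gaussian log-likelihood, with the variance estimated as rss / n.
    log_l = -0.5 * n * (math.log(2 * math.pi) + math.log(rss / n) + 1)
    # p counts the mean parameters plus the error variance.
    p = (1 if common_mean else len(groups)) + 1
    return -2 * (log_l - p)

# Hypothetical responses for two groups (illustration only).
groups = [[1, 2, 3, 2, 1, 2], [5, 6, 5, 7, 6, 5]]

aic_common = gaussian_aic(groups, common_mean=True)
aic_by_group = gaussian_aic(groups, common_mean=False)
print(f"AIC, one common mean: {aic_common:.2f}")
print(f"AIC, mean per group:  {aic_by_group:.2f}")
```

The model with the lower AIC is preferred. Here the extra parameter for a second mean is easily paid for by the better fit.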

I would still like to have a hierarchy of models. At the least, each main effect (A, B or C) should be tested to see whether it should be included. And if any higher-order interaction effect is included, then its corresponding main effects should be included too, even if they are non-significant.

It would be a very strange model if it only included A+ B*C. If the interaction B*C is included then the main effects B+ C should also be included.

This is just my humble opinion. It is interesting to hear jpkelly's and others' views.

Still, we haven't seen the data. This is a little bit like sitting on the beach practising swimming. I would like to jump into the water!

Who knows, maybe this is state secrets for CIA or KGB(sic!). But it would be easier if the participants showed their data, in an easy to read format. Then we could discuss the actual data and not practice dry swimming.


TS Contributor
It would be a very strange model if it only included A+ B*C. If the interaction B*C is included then the main effects B+ C should also be included.
I agree. This is exactly why my previous model list included only those interaction effects for which there were also main effects!
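The rule agreed on here (an interaction only enters a model if both of its main effects are present) can be sketched as a small enumeration. This is only an illustration using the factor labels A, B and C from the thread and excluding the three-way term. It lists every candidate that obeys the rule, including single-factor and intercept-only models, so it is longer than the list given earlier.

```python
from itertools import chain, combinations

FACTORS = ("A", "B", "C")

def subsets(items):
    """All subsets of items, from the empty set up to the full set."""
    items = list(items)
    return chain.from_iterable(
        combinations(items, k) for k in range(len(items) + 1))

def hierarchical_models(factors=FACTORS):
    """Models with main effects and two-way interactions obeying the hierarchy rule."""
    models = []
    for mains in subsets(factors):
        # Only pairs of factors that are both in the model may interact.
        allowed_pairs = list(combinations(mains, 2))
        for interactions in subsets(allowed_pairs):
            terms = list(mains) + [f"{a}x{b}" for a, b in interactions]
            # "1" denotes the intercept-only model.
            models.append(" + ".join(terms) if terms else "1")
    return models

models = hierarchical_models()
for model in models:
    print(model)
```

A model such as A + BxC never appears, because BxC is only generated when both B and C are among the main effects.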
OK, I've read everything you wrote, and although you lost me on some points with technicalities I am not familiar with, here's what I have so far:

1) I am new to residuals and Q-Q plots, so I'm not sure if I did this correctly, but I calculated the residuals for the whole sample using the formula residuals = X - M(X) (the Q-Q plot is shown below). (There are only 7 values for the dependent variable; I guess that is why there are so few dots.) From what I can see, the distribution is almost normal except for the right-end part, which suggests a slightly negatively asymmetrical distribution (the dependent variable is the number of symptoms presented after the treatment (0-6), so this makes sense to me). What I want to know is: do I have enough arguments for using parametric tests? I compared one-way ANOVAs with the Kruskal-Wallis and Mann-Whitney U tests and they give the same results.

2) You both made some interesting points regarding the use of stepwise procedures. I know that they are used in regression models. Is that what you were talking about, or can they be used in ANOVAs as well? If so, can that be done in SPSS automatically, as it can for regression models, or do I have to do it myself? And while we are on the subject, a more general question: can I detect mediation effects using ANOVA like I can using hierarchical regression? When I use a one-way ANOVA for my independent variable (age) with 6 groups it's significant, but when I put it in a factorial model with the other independent variable (length of treatment), the treatment variable is significant but the age variable is no longer significant. Is this normal, and can I conclude that age doesn't have an effect on the number of symptoms but that it nearly affects the length of the treatment (mediator variable), which in turn affects the number of symptoms?

3) It's not a CIA project ^^ I just don't know how to present the data to you in a simple and not time-consuming way.