I am a student who has carried out a renewable energy experiment for a dissertation project. My dependent variable is total biogas production. I designed an experiment to look at the influence of pre-treatment (Yes or No), enzyme (Yes or No) and sample volume (100, 200, 300 and 1000 ml) on biogas production. I have data for a fully factorial 2 × 2 × 4 experiment. Each condition was run in triplicate, so my overall sample size is N = 48.
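
To make the layout concrete, here is a minimal sketch (plain Python, standard library only) that enumerates the design in "long" format, one record per tube, which is the shape most statistics software expects for a factorial model. The field names are illustrative assumptions. It also shows that the full three-way model estimates one mean per cell (16 cells), leaving 48 − 16 = 32 residual degrees of freedom.

```python
from collections import Counter
from itertools import product

# Factor levels as described above; the field names are illustrative only
PRETREATMENT = ("No", "Yes")
ENZYME = ("No", "Yes")
VOLUME_ML = (100, 200, 300, 1000)
REPLICATES = 3

def design_runs():
    """One record per tube in the fully crossed 2 x 2 x 4 design, in triplicate."""
    runs = []
    for pre, enz, vol in product(PRETREATMENT, ENZYME, VOLUME_ML):
        for rep in range(1, REPLICATES + 1):
            runs.append({"pretreatment": pre, "enzyme": enz,
                         "volume_ml": vol, "replicate": rep})
    return runs

runs = design_runs()
cell_sizes = Counter((r["pretreatment"], r["enzyme"], r["volume_ml"]) for r in runs)
# The full three-way model fits one mean per cell, leaving N - cells residual df
residual_df = len(runs) - len(cell_sizes)
print(len(cell_sizes), "cells,", len(runs), "runs,", residual_df, "residual df")
# prints: 16 cells, 48 runs, 32 residual df
```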

I have been using Shapiro-Wilk tests to examine normality and have tried transforming my data a fair few ways to get a normal distribution, but this failed in some cases. When normality was met I used an ANOVA, which was fine. For non-normal data I used the Kruskal-Wallis test, followed by multiple Mann-Whitney U tests as post-hoc tests.
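
Running several Mann-Whitney U tests as post-hoc comparisons inflates the chance of at least one false positive unless the p-values are adjusted. Holm's step-down method is one standard adjustment (Dunn's test is another common follow-up to Kruskal-Wallis). A minimal stdlib sketch; the raw p-values are hypothetical, and statsmodels' `multipletests(..., method="holm")` implements the same adjustment.

```python
def holm_adjust(p_values):
    """Holm-Bonferroni adjusted p-values, returned in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # Multiply the rank-th smallest p-value by (m - rank) and cap at 1
        candidate = min(1.0, (m - rank) * p_values[i])
        # Enforce monotonicity: an adjusted p-value never drops below an earlier one
        running_max = max(running_max, candidate)
        adjusted[i] = running_max
    return adjusted

# Hypothetical raw p-values from three pairwise Mann-Whitney U tests
raw = [0.01, 0.04, 0.03]
print(holm_adjust(raw))
```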

My complication is that I am also interested in all the interaction effects. I have been taught that when normality tests or Levene's tests are significant, I should switch to non-parametric tests. However, in my own reading here and elsewhere, people seem happy to break this rule and use parametric tests anyway. I have read in more than one place that an ANOVA and multiple comparisons can be used when formal normality tests fail but “the distribution of each group is close to normal”. Sorry if this is a silly question, but what counts as satisfactorily close to normal?
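
A common reading of the ANOVA assumption is that normality applies to the residuals (each observation minus its cell mean), not to each raw group separately. With only three values per cell, a per-cell Shapiro-Wilk test has very little power either way. A minimal stdlib sketch, assuming the data arrive as (cell, value) pairs: it pools the residuals and pairs them with standard-normal quantiles for a Q-Q plot, where "close to normal" means the points lie roughly on a straight line without strong curvature or isolated outliers. The plotting positions (i − 0.5)/n are one convention among several, and the biogas values are hypothetical.

```python
from collections import defaultdict
from statistics import NormalDist, mean

def cell_residuals(rows):
    """rows: (cell_key, value) pairs. Returns each value minus its cell mean."""
    rows = list(rows)  # iterated twice below
    by_cell = defaultdict(list)
    for key, value in rows:
        by_cell[key].append(value)
    means = {key: mean(vals) for key, vals in by_cell.items()}
    return [value - means[key] for key, value in rows]

def qq_points(residuals):
    """Pair sorted residuals with standard-normal quantiles at (i - 0.5) / n."""
    n = len(residuals)
    theoretical = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, sorted(residuals)))

# Hypothetical biogas totals (ml) for two cells only, to show the mechanics
rows = [(("Yes", "Yes", 100), 410.0), (("Yes", "Yes", 100), 395.0),
        (("Yes", "Yes", 100), 430.0), (("No", "No", 100), 120.0),
        (("No", "No", 100), 150.0), (("No", "No", 100), 135.0)]
res = cell_residuals(rows)
for q, r in qq_points(res):
    print(f"{q:+.2f}  {r:+.1f}")
```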

Also, my supervisor says that interactions can’t be tested when not all of the comparable groups are normally distributed. Is that the case? After a web search, I think a generalized linear model may be able to tell me about the significance and effect size of any interactions, if the data can be fitted to a suitable skewed distribution such as a Poisson.
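
One detail worth checking: a Poisson GLM is designed for counts, whereas for a continuous, positive response such as gas volume a Gamma family with a log link is a common choice (an assumption here, not something the data have confirmed). With a log link, interactions are tested on a multiplicative scale. For a 2 × 2 table of cell means (for instance, the four pre-treatment × enzyme cells at one sample volume), the saturated log-link model's fitted means equal the cell means, so its interaction coefficient is the "ratio of ratios" computed below. A value of 1 means the enzyme multiplies yield by the same factor with and without pre-treatment. A minimal stdlib sketch with hypothetical numbers.

```python
from math import exp, log
from statistics import mean

def log_link_interaction(cells):
    """
    cells: dict mapping (pretreatment, enzyme) -> list of biogas totals, with keys
    ("No","No"), ("No","Yes"), ("Yes","No"), ("Yes","Yes").
    Returns the interaction on the log scale and as a ratio of ratios.
    """
    m = {key: mean(vals) for key, vals in cells.items()}
    log_contrast = (log(m[("Yes", "Yes")]) - log(m[("Yes", "No")])
                    - log(m[("No", "Yes")]) + log(m[("No", "No")]))
    return log_contrast, exp(log_contrast)

# Hypothetical totals (ml): enzyme doubles yield without pre-treatment, triples it with
cells = {("No", "No"): [100, 110, 90], ("No", "Yes"): [200, 190, 210],
         ("Yes", "No"): [150, 160, 140], ("Yes", "Yes"): [450, 440, 460]}
log_c, ratio = log_link_interaction(cells)
print(f"log-scale interaction {log_c:.3f}, ratio of ratios {ratio:.2f}")
```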

I (crudely) studied the interaction between enzyme and pre-treatment by using Mann-Whitney U tests to compare pre-treatment within Enzyme-Yes and within Enzyme-No (and vice versa). However, I believe this only tells me something useful if one factor causes the other to become no longer significant, newly significant, or significant in the opposite direction. That is, I would like an approach that shows whether a significant increase caused by the enzyme is significantly increased further when pre-treatment is used.
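
The quantity described here is a "difference of differences": the enzyme effect with pre-treatment minus the enzyme effect without it. In a balanced 2 × 2 layout this is the same contrast that the interaction term of a two-way ANOVA tests. A minimal stdlib sketch that estimates it and attaches a percentile bootstrap interval by resampling within each cell; the data are hypothetical. With only three values per cell, each cell has very few distinct resamples, so such an interval is rough and likely too narrow.

```python
import random
from statistics import mean

def interaction_contrast(cells):
    """(Enzyme effect with pre-treatment) minus (enzyme effect without), in ml."""
    m = {key: mean(vals) for key, vals in cells.items()}
    with_pre = m[("Yes", "Yes")] - m[("Yes", "No")]
    without_pre = m[("No", "Yes")] - m[("No", "No")]
    return with_pre - without_pre

def bootstrap_ci(cells, n_boot=5000, alpha=0.05, seed=1):
    """Percentile bootstrap CI, resampling with replacement within each cell."""
    rng = random.Random(seed)
    stats = []
    for _ in range(n_boot):
        resampled = {key: rng.choices(vals, k=len(vals)) for key, vals in cells.items()}
        stats.append(interaction_contrast(resampled))
    stats.sort()
    lo = stats[int((alpha / 2) * n_boot)]
    hi = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi

# Hypothetical totals (ml), keyed by (pretreatment, enzyme)
cells = {("No", "No"): [100, 110, 90], ("No", "Yes"): [200, 190, 210],
         ("Yes", "No"): [150, 160, 140], ("Yes", "Yes"): [450, 440, 460]}
print(interaction_contrast(cells), bootstrap_ci(cells))
```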

In summary, my questions are: where can I learn (or can someone please explain) how to know when it is OK to break the normality ‘rules’ and proceed with an ANOVA?

And, hypothetically, when all data comparisons are non-normally distributed, can interactions be tested? If so, is a three-way comparison a bad idea, given that the sample size in each cell would be n = 3 (my triplicate of each condition)?
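
Part of the n = 3 concern can be made concrete. With three values in each of two cells, an exact two-sided Mann-Whitney U test can never give p below 0.1: there are only C(6, 3) = 20 ways to split the ranks, and even complete separation matches just the 2 most extreme splits, so p = 2/20. A minimal stdlib sketch that enumerates those splits, assuming no tied values; scipy's `mannwhitneyu` with `method="exact"` should give the same figure.

```python
from itertools import combinations

def exact_mwu_two_sided_p(x, y):
    """Exact two-sided Mann-Whitney U p-value by enumerating rank splits (no ties)."""
    pooled = sorted(x + y)
    n1, n = len(x), len(x) + len(y)
    ranks = {v: r for r, v in enumerate(pooled, start=1)}  # assumes no tied values

    def u_stat(rank_set):
        # U for the first sample = its rank sum minus the smallest possible rank sum
        return sum(rank_set) - n1 * (n1 + 1) / 2

    observed = u_stat([ranks[v] for v in x])
    mean_u = n1 * (n - n1) / 2
    splits = list(combinations(range(1, n + 1), n1))
    extreme = sum(1 for s in splits
                  if abs(u_stat(s) - mean_u) >= abs(observed - mean_u))
    return extreme / len(splits)

# Two hypothetical triplicate cells that do not overlap at all
print(exact_mwu_two_sided_p([10, 11, 12], [20, 21, 22]))  # prints 0.1
```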

I don’t expect anyone to do my homework, but any pointers would be greatly appreciated.

Thanks in advance, as this forum has already given me much of this basic knowledge.