# How can you prove normal distribution enough to use e.g. ANOVA not Kruskal Wallis?

#### Geoff

##### New Member
Hi everyone,

I'm trying to build an automated framework for choosing hypothesis tests and carrying them out. I really am new to statistics, so sorry if I'm asking really obvious questions, but I've tried for a long time to find an answer and am not getting anywhere. This also isn't homework help, so sorry if it's in the wrong subforum, but it is a request for help and I'm not exactly an expert yet.

So the problem is that, as we know, some hypothesis tests assume a normal distribution, so you should only use them if your data is normally distributed. Generally non parametric tests don't require a normal distribution whereas parametric do. Suppose you have more than 2 groups of continuous data which are not paired. If the choice is between ANOVA and Kruskal-Wallis, you should use Kruskal-Wallis unless you know that your data is normally distributed.

My problem is, how can you ever know that your data is normally distributed? I've spent ages looking up tests for normality and there are plenty about: Shapiro-Wilk, Shapiro-Francia, Lilliefors etc. But the problem is that they all have a normal distribution as their null hypothesis. This means, as I understand it, that if you get a P value from them below your chosen threshold (e.g. 0.05) then you have fairly strong evidence that your data is not normal, but if the P value is higher then you haven't proved anything: your data still may or may not be normal. I wanted to know whether you could test against the null hypothesis that your data isn't normal, so I googled for this but didn't get anywhere. A Stack Overflow thread told me that such a test doesn't even make mathematical sense.

http://stackoverflow.com/questions/...ribution-not-test-for-non-normal-distribution.

It looks to me as though you can never prove with any quantifiable certainty that your data is normally distributed, only that it isn't.
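To make that asymmetry concrete, here is a minimal simulation sketch, assuming NumPy and SciPy are available. It draws small samples from a uniform distribution (which is definitely not normal) and counts how often Shapiro-Wilk fails to reject normality. The sample size, the choice of a uniform distribution and the function name `non_rejection_rate` are illustrative assumptions, not something taken from the linked thread.

```python
import numpy as np
from scipy import stats


def non_rejection_rate(n=15, n_sims=2000, alpha=0.05, seed=0):
    """Fraction of uniform (non-normal) samples for which
    Shapiro-Wilk does NOT reject the null hypothesis of normality."""
    rng = np.random.default_rng(seed)
    not_rejected = 0
    for _ in range(n_sims):
        sample = rng.uniform(0.0, 1.0, size=n)  # known to be non-normal
        _, p_value = stats.shapiro(sample)
        if p_value >= alpha:
            not_rejected += 1
    return not_rejected / n_sims


rate = non_rejection_rate()
print(f"Normality not rejected in {rate:.0%} of non-normal samples")
```

With small samples like these, a P value above 0.05 turns up most of the time even though the data is not normal, which is exactly why a non-significant result can't be read as proof of normality.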

What I don't understand is that this would surely mean that KW would always have to be used in favour of ANOVA. In fact, given that the same thing seems to apply to heteroscedasticity tests, it looks like it's a general rule that you have no choice but to always use the test that makes the fewest assumptions, e.g. always use something like the Brunner Dette Munk test, which doesn't even assume homoscedasticity. But people must somehow be finding justification for using tests like ANOVA, especially as I keep on seeing advice to use tests like ANOVA if I do know my data is normally distributed, as parametric tests can give you more power. Are people simply choosing ANOVA by looking at their data by eye to see if it looks normal? Or maybe using Bayesian rather than frequentist methods? But if you do that, you still have to somehow choose a threshold below which you'll use the test which doesn't assume normality, and I don't know how to choose that threshold. Am I missing something here?
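For reference, this is roughly what the two candidate tests look like when run side by side on the same data. It's only a sketch assuming SciPy's `f_oneway` and `kruskal`; the three groups are simulated normal data with made-up means, purely for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Three unpaired groups of continuous data (simulated, illustrative values).
group_a = rng.normal(loc=0.0, scale=1.0, size=30)
group_b = rng.normal(loc=0.0, scale=1.0, size=30)
group_c = rng.normal(loc=1.5, scale=1.0, size=30)  # shifted group

# Parametric: one-way ANOVA (assumes normal residuals, equal variances).
f_stat, p_anova = stats.f_oneway(group_a, group_b, group_c)

# Nonparametric: Kruskal-Wallis (rank-based, no normality assumption).
h_stat, p_kw = stats.kruskal(group_a, group_b, group_c)

print(f"ANOVA:          F = {f_stat:.2f}, p = {p_anova:.4g}")
print(f"Kruskal-Wallis: H = {h_stat:.2f}, p = {p_kw:.4g}")
```

An automated framework would need some rule for picking between `p_anova` and `p_kw`, and that rule is exactly the part I don't know how to justify.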

Thanks a lot,

Geoff

#### Karabiner

##### TS Contributor
> Generally non parametric tests don't require a normal distribution whereas parametric do.

No, they don't. First of all, it's the residuals of the model (e.g. ANOVA or linear regression) that matter, not the observed data. So you first have to build your model before you can make statements about normality. Second, it is not the residuals in the sample that have to be normally distributed, but the residuals in the population from which the data are sampled. Only because of this could statistical tests (like Shapiro-Wilk) make any sense at all (many people doubt that they make much sense). Third, with a large enough sample (n > 50, or better n > 100), the central limit theorem ensures that the test results will be approximately correct, even if the sample residuals are from a nonnormal population.
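A rough sketch of the first and third points, assuming NumPy and SciPy. The helper names `anova_residuals` and `type_i_error_rate` are hypothetical, and the exponential population, group count and sample sizes are illustrative choices. For a one-way ANOVA the residuals are simply each observation minus its group mean. The simulation then draws all groups from the same skewed population, so every rejection is a false positive, and checks how often ANOVA rejects at n = 100 per group.

```python
import numpy as np
from scipy import stats


def anova_residuals(groups):
    """Residuals of a one-way ANOVA model: observation minus its group mean."""
    return np.concatenate([np.asarray(g) - np.mean(g) for g in groups])


def type_i_error_rate(n_per_group=100, n_groups=3, n_sims=2000,
                      alpha=0.05, seed=1):
    """Share of false rejections when all groups come from the
    same skewed (exponential) population, i.e. the null is true."""
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(n_sims):
        groups = [rng.exponential(scale=1.0, size=n_per_group)
                  for _ in range(n_groups)]
        _, p_value = stats.f_oneway(*groups)
        if p_value < alpha:
            rejections += 1
    return rejections / n_sims


rng = np.random.default_rng(0)
demo_groups = [rng.exponential(scale=1.0, size=100) for _ in range(3)]
residuals = anova_residuals(demo_groups)  # these are what a normality check would use

error_rate = type_i_error_rate()
print(f"Empirical type I error at nominal 0.05: {error_rate:.3f}")
```

If the rejection rate comes out close to the nominal 0.05, the F test is behaving about as intended despite the clearly nonnormal residuals. This only checks one population shape and one design, so it is an illustration rather than a guarantee.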

With kind regards

K.

#### shahnawaz

##### New Member
But since you only have the sample available, how can you get the residuals of the population?