# Test for normality

#### lynnar

##### New Member
Hi, I have a question on my test of normality. I have 1 independent variable (type of map), so if my data is normal, I should be conducting a one-way ANOVA.

I ran tests of normality and here is what I have:

For task 4, map c (as outlined in red), I have P<.05, so it means that it is not normally distributed.

So, does it mean that I would need to conduct a one-way ANOVA for the other tasks and a Kruskal-Wallis test (the non-parametric one-way ANOVA) for task 4?

Thanks

#### lynnar

##### New Member
Hi,

I don't really understand the discussion over there, as I can hardly imagine the result without a screenshot of the SPSS result.

Thanks

#### trinker

##### ggplot2orBust
ledzep from the other post said:
First of all, data are not always normal (e.g., count data, yes/no data etc.). So, what type of variable is it? A continuous variable with the real line as its support?

Second point, assuming your variable is continuous, why do you want your values to religiously follow a normal distribution? When performing analysis and checking the model fit, it is the normality of the residuals which is important (not the normality of the data).

Last point, even if the data (or residuals I should say) are not normal, the associated F-tests are pretty robust to non-normality.

And what sort of statistical tests/analysis are you running? It appears to me that you just want the histograms of your raw data to look normal, which is not necessarily needed (unless there are some good reasons for it to look/be normal).

I repeat it: it is the normality of the residuals which is important (not the normality of the data).
The assumption of normality is that your residuals are normal. That means you run the analysis and then look at some sort of a plot of the residuals (or a test but a plot is usually better IMHO). I generally use a qqplot of the residuals as it's easy to distinguish departures from normality with this method.

#### SiBorg

##### New Member
Trinker, when you say the 'residuals' are normal, do you mean that the means of the data would follow a normal distribution? What do you mean exactly by residuals as this is an important point usually completely missed in the medical literature.

#### trinker

##### ggplot2orBust
No, the residuals are the error terms of the model: the part of each observation of your dependent variable that the model does not account for. Think in simple bivariate terms: this is how far the points on a scatter plot are away from the regression line. So for each observation you will have an amount that is not accounted for by the model (the error term or residual). These error terms are what are assumed to be normal. I think most if not all stats programs can give you these. I used to do it with my TI 83. The residuals can be found by taking the observed value minus the predicted value.

So let's say we had the following variables:
x = 1, 2, 3, 4, 5, 6, 7, 8, 9, 10
y = 0.9, 1.4, 2.8, 5.5, 4.8, 5.2, 6.5, 7.0, 8.3, 10.1

We calculate the model and get the following slope and intercept:
(Intercept) beta
0.06667 0.94242

So our model is:
y = .067 + .942x

We can then plug the observations back into that equation (it is a linear equation, meaning it is the formula for a straight line; the fitted line is the one that minimizes the sum of squared vertical distances between the points and the line).

So if we plug all our x values from above back into the equation we get the predicted values:

1.009091, 1.951515, 2.893939, 3.836364, 4.778788, 5.721212, 6.663636, 7.606061, 8.548485, 9.490909

Now our observations aren't exactly these values (unless you had a perfect model but that never happens in real life). So we take the original y values and subtract the predicted value from each, giving us the error terms or residuals below:

-0.10909091, -0.55151515, -0.09393939, 1.66363636, 0.02121212, -0.52121212, -0.16363636, -0.60606061, -0.24848485, 0.60909091
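The steps above (fit the line, compute the predicted values, subtract them from the observed y values) can be sketched in a few lines of Python. This is only a minimal illustration using the x and y values from this post; the function name `fit_line` is made up for the example, and any stats package will do the same thing for you.

```python
# Least-squares fit, predicted values, and residuals for the example data.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [0.9, 1.4, 2.8, 5.5, 4.8, 5.2, 6.5, 7.0, 8.3, 10.1]


def fit_line(x, y):
    """Return (intercept, slope) of the ordinary least-squares line."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


intercept, slope = fit_line(x, y)

# Predicted value for each x, then residual = observed y minus predicted y.
predicted = [intercept + slope * xi for xi in x]
residuals = [yi - pi for yi, pi in zip(y, predicted)]

print(round(intercept, 5), round(slope, 5))
print([round(r, 4) for r in residuals])
```

The residuals of a least-squares line with an intercept always sum to zero (up to rounding), which is a quick sanity check on the arithmetic.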

These are what we're interested in being normal. Generally it is recommended to plot these residuals in a qqplot as seen below:

[Image: qqplot of the residuals]

Here's a link for help on interpreting the qqplot (LINK). For a better interpretation of these plots see Cohen, Cohen, Aiken and West pages 138-139.

Here is a link that discusses assumptions of regression (an ANOVA is a linear model like regression and carries many of the same assumptions) (LINK)
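For anyone who wants to see what goes into a qqplot, here is a rough sketch of how its points can be computed by hand: sort the residuals and pair each one with the standard normal quantile for its rank. The helper name `qq_points` is hypothetical, and the plotting position (i - 0.5) / n is an assumption; it is one common convention among several, so a stats package may place the points slightly differently.

```python
from statistics import NormalDist

# Residuals from the worked example (observed y minus predicted y).
residuals = [-0.10909091, -0.55151515, -0.09393939, 1.66363636, 0.02121212,
             -0.52121212, -0.16363636, -0.60606061, -0.24848485, 0.60909091]


def qq_points(values):
    """Pair each sorted value with the standard normal quantile for its rank."""
    n = len(values)
    ordered = sorted(values)
    # Plotting positions (i - 0.5) / n: an assumed convention, not the only one.
    theoretical = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, ordered))


points = qq_points(residuals)
for theo, obs in points:
    print(f"{theo:7.3f}  {obs:8.4f}")
```

Plotting the first column against the second gives the qqplot. If the residuals are roughly normal, the points fall close to a straight line; a point far off the line (here the largest residual, 1.66) is worth a second look.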

#### SiBorg

##### New Member
Am I right in thinking a T-test is a special case of an ANOVA? So this works for the T-test too? So let's say I'm doing a paired t-test and want to decide if my residuals are normal. How do I draw the line y = ax + b? What if I'm doing a non-paired t-test?

This is constantly done incorrectly in the medical literature - authors always perform the tests of normality on the dataset which I knew was wrong but I don't know how to do it correctly. This discussion is very helpful!

#### trinker

##### ggplot2orBust
An anova really is an extension of a t-test to more groups. So yes it will work for a t-test. I remember having to prove that regression and t-tests were equivalent in my elementary stats class. But please consult a quality text or article to determine the assumptions specific to each test you're using.
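The equivalence can be checked numerically. The sketch below uses made-up data (the two groups are purely hypothetical) and compares the pooled-variance two-sample t statistic with the t statistic for the slope when the same values are regressed on a 0/1 group indicator. It assumes the equal-variance form of the t-test; Welch's unequal-variance version does not match the regression in the same way.

```python
from math import sqrt

# Hypothetical measurements for two independent groups (illustration only).
group_a = [4.1, 5.0, 5.6, 6.2, 4.8]
group_b = [6.9, 7.4, 6.1, 8.0, 7.1]


def pooled_t(a, b):
    """Two-sample t statistic with pooled variance (equal-variance t-test)."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    ss = sum((v - ma) ** 2 for v in a) + sum((v - mb) ** 2 for v in b)
    pooled_var = ss / (na + nb - 2)
    return (mb - ma) / sqrt(pooled_var * (1 / na + 1 / nb))


def slope_t(x, y):
    """t statistic for the slope of a simple least-squares regression."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    intercept = my - slope * mx
    sse = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    return slope / sqrt(sse / (n - 2) / sxx)


# Code group membership as a 0/1 dummy variable and regress the values on it.
dummy = [0] * len(group_a) + [1] * len(group_b)
values = group_a + group_b

print(pooled_t(group_a, group_b), slope_t(dummy, values))
```

The two printed numbers agree. In the regression the intercept is the mean of the first group and the slope is the difference between the group means, so the "line" behind a t-test simply joins the two group means.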

To draw the line y = .067 + .942x:

For a line in the form y = ax + b, b is the intercept (aka y intercept), which is where the line crosses the y axis. Put a dot there on the y axis at .067 (in red). Now a is the slope. Remember rise over run here for slope. If you have a slope of .942 we can put that over 1 so we have [TEX]\frac{.942}{1}[/TEX]. Now we go up .942 (rise) and over 1 (run). This makes your points pretty close together and hard to draw a line between, so I recommend multiplying both the numerator and denominator by a constant: [TEX]\frac{.942 \times 8}{1 \times 8}=\frac{7.536}{8}[/TEX]. So we go up 7.536 from the red point and over 8 (to the right, because it's positive). This is the orange dot. Now connect the two points with a line.

In real life you'd never do any of this by hand but it's good to know the theory behind everything.

#### SiBorg

##### New Member
Sorry trinker I wasn't clear in my last post. I meant to say that if I do a T-test how do I work out my residuals using the logic you've just told me. Is it the difference of each value from the mean? Or do I somehow have to work out what 'line' the t-test is based on and then calculate the residuals? I'm not quite sure what the T-test is doing and how I calculate residuals from it.

#### noetsi

##### No cake for spunky
It is depressing that one proves t-test and anova are the same in an elementary class.

#### Dason

SiBorg said:
Sorry trinker I wasn't clear in my last post. I meant to say that if I do a T-test how do I work out my residuals using the logic you've just told me. Is it the difference of each value from the mean? Or do I somehow have to work out what 'line' the t-test is based on and then calculate the residuals? I'm not quite sure what the T-test is doing and how I calculate residuals from it.
Residuals are just the actual value minus the predicted value. In the case of the t-test the predicted value is just the group mean so your residuals just reduce down to the observation minus the group mean.
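As a rough sketch of what that means in practice, the snippet below computes the residuals for both kinds of t-test. The data are hypothetical and the helper names are made up for the example. For the unpaired test each observation has its own group mean subtracted; a paired t-test is a one-sample test on the within-pair differences, so there the residuals are the differences minus the mean difference.

```python
# Hypothetical data (illustration only): the same five subjects measured twice.
before = [12.1, 11.4, 13.0, 12.6, 11.9]
after = [11.2, 11.0, 12.1, 12.9, 10.8]


def mean(values):
    return sum(values) / len(values)


def group_residuals(values):
    """Residuals when the predicted value is the group's own mean."""
    m = mean(values)
    return [v - m for v in values]


# Unpaired (two-sample) t-test: each observation minus its own group mean.
unpaired_residuals = group_residuals(before) + group_residuals(after)

# Paired t-test: a one-sample test on the differences, so the residuals
# are the differences minus the mean difference.
differences = [a - b for a, b in zip(after, before)]
paired_residuals = group_residuals(differences)

print([round(r, 3) for r in unpaired_residuals])
print([round(r, 3) for r in paired_residuals])
```

Either list can then be put into a qqplot to check the normality assumption.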

#### victorxstc

##### Pirate
WOWWWWWWWWWW!!!

THANKS Trinker and other guys! In all these years, this is the first time I am hearing that the normality assumption of ANOVA and the t-test depends on the distribution of the residuals (not the sample distribution), which really shocked me. No statistician ever recommended (or did) it, and no journal statistician or reviewer ever requested that we revise the statistics and redo the normality test on the error terms rather than on the sample distribution. I have never seen any article in which the authors justified the use or exclusion of an ANOVA by checking the normality of the residuals. All of them, I mean all!, had checked the sample distribution.

So, can an ANOVA be applied to non-normal data if the residuals are still normally distributed (which, as I remember, Dason stressed as the real assumption for linear regression)? Could somebody please tell me how to find the residuals in an ANOVA output? In regression, SPSS plots the residuals so it's very easy to check them, but I have never seen any option for plotting residuals in ANOVA, nor have I seen any easy articles talking about slopes in ANOVA or at least calculating them (except some tutorials about fitting slopes in ANCOVA). Is the answer again R?!

Maybe I should do it independently using a q-q plot first.

But again I don't know why I haven't been asked or corrected in any research project of mine, to actually look for normal distribution of residuals in an ANOVA rather than normal distribution of sample?

--------------------

No, not basically after an ANOVA, but it can draw a q-q plot.

Edit:
I tested it. It works quite fine. Thanks! But again, why don't our or the journals' statisticians know this?! Why do they always perform a KS test and incorrectly rely on it without looking at the residuals' distribution?

Edit 2:
I guess perhaps since the KS test itself uses residuals to check the normality of sample and perhaps if it reports a non-normal sample (P<0.05), it is actually implying a non-normal distribution of residuals. So when a sample is normally distributed, its residuals might fit the normal curve in a way that the q-q plot gets smooth and in line. So statisticians might know and do it without wanting to confuse medical researchers by statistical details which are usually frightening to them.

--------------------

Dason said:
Residuals are just the actual value minus the predicted value. In the case of the t-test the predicted value is just the group mean so your residuals just reduce down to the observation minus the group mean.
Could you please kindly tell what is the predicted value in ANOVA? For example I can draw a q-q plot and check it with normal or other types of distributions, but should I check it against something else in an ANOVA? Again, is it the mean?

AFAIK, in linear regression, residuals are checked in terms of being normally distributed; does it apply to ANOVA too? If so, again the normality of sample might tell the normality of residuals.

#### Dason

Does SPSS give you a qqplot after an anova? That would typically be a qqplot of the residuals and is used to assess the normality of the residuals.
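If the software does not hand you the residuals directly, they are easy to compute for a one-way ANOVA: as with the t-test, the predicted value is the mean of the group an observation belongs to. The sketch below uses invented scores for three map types (the numbers and names are hypothetical, loosely modelled on the original question) and also computes the F statistic, to show that its denominator is built from exactly these residuals.

```python
# Hypothetical task scores for three map types (illustration only).
scores = {
    "map_a": [12.0, 14.5, 13.2, 15.1, 13.8],
    "map_b": [16.4, 15.9, 17.2, 18.0, 16.8],
    "map_c": [13.5, 12.9, 14.8, 13.1, 14.2],
}


def anova_residuals(groups):
    """Observation minus its own group mean, for every observation."""
    residuals = []
    for values in groups.values():
        group_mean = sum(values) / len(values)
        residuals.extend(v - group_mean for v in values)
    return residuals


def f_statistic(groups):
    """One-way ANOVA F: between-group mean square over within-group mean square."""
    all_values = [v for values in groups.values() for v in values]
    grand_mean = sum(all_values) / len(all_values)
    ss_between = 0.0
    for values in groups.values():
        group_mean = sum(values) / len(values)
        ss_between += len(values) * (group_mean - grand_mean) ** 2
    # The within-group sum of squares is the sum of squared residuals.
    ss_within = sum(r ** 2 for r in anova_residuals(groups))
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


residuals = anova_residuals(scores)
print([round(r, 2) for r in residuals])
print(f_statistic(scores))
```

The `residuals` list is what would go into the qqplot.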

#### SiBorg

##### New Member
@Dason

Just had another think about this. If, for a t-test, the residuals are simply the deviations of the observed values from the sample mean, surely, in this specific case of the t-test, if the residuals are normally distributed, then so is the data?

If this were true, a test of normality on the data should be a surrogate for a test of the residuals in this specific case.
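One way to probe this is a small numerical sketch. For a one-sample or paired t-test, or within a single group, the residuals are just the data shifted by a constant, so the two checks come to the same thing. With two groups that have different means, pooling the raw data is a different matter. The example below is an idealised illustration, not real data: each group is built from evenly spaced normal quantiles, and the means (0 and 6) are chosen to be far apart. The function names are made up, and the qqplot correlation is used only as a rough measure of how straight the qqplot would be.

```python
from statistics import NormalDist


def normal_scores(n, mu=0.0, sigma=1.0):
    """n evenly spaced quantiles of N(mu, sigma): an idealised 'normal' sample."""
    dist = NormalDist(mu, sigma)
    return [dist.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]


def qq_correlation(values):
    """Correlation between the sorted values and standard normal quantiles."""
    n = len(values)
    ys = sorted(values)
    xs = normal_scores(n)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical groups: both normal in shape, but with very different means.
group_a = normal_scores(30, mu=0.0)
group_b = normal_scores(30, mu=6.0)

pooled = group_a + group_b
mean_a = sum(group_a) / len(group_a)
mean_b = sum(group_b) / len(group_b)
residuals = [v - mean_a for v in group_a] + [v - mean_b for v in group_b]

print(qq_correlation(pooled), qq_correlation(residuals))
```

The pooled data give a noticeably lower correlation (the pooled sample has two humps), while the residuals stay very close to a straight line. So testing the pooled raw data can flag non-normality that is really just the difference between the group means; checking each group separately avoids that particular problem.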