# Help with ANOVA

#### greyg8r

##### New Member
Here are my data:

I developed a protocol to measure soil oxygen concentration and want to test its efficacy in three different types of soils over three time periods after treatment. In each soil, I measure the soil oxygen concentration in three replicates of my treatment and, simultaneously, in three replicates of my control at one hour, four hours, and eight hours after treatment.

The times and types of soils are independent variables. The oxygen concentration is the dependent variable.
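For readers who want to picture the layout, here is a minimal sketch of the design in long format, one row per planned measurement. The soil labels and field names are placeholders of my own (the post does not name the soil types), and the treatment/control grouping is included as a column because the description above measures both.

```python
from itertools import product

# Factor levels from the description above. The soil labels are
# hypothetical placeholders; the post does not name the three soil types.
SOILS = ["soil_A", "soil_B", "soil_C"]
HOURS = [1, 4, 8]
GROUPS = ["treatment", "control"]
REPLICATES = [1, 2, 3]


def design_rows():
    """Return one dict per planned measurement (long format)."""
    return [
        {"soil": s, "hours": h, "group": g, "replicate": r, "oxygen": None}
        for s, h, g, r in product(SOILS, HOURS, GROUPS, REPLICATES)
    ]


rows = design_rows()
# 3 soils x 3 times x 2 groups x 3 replicates
print(len(rows))
```

Each (soil, hours, group) cell holds only three rows, which is the small-sample issue raised in the replies.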

I am most interested in determining if there is a significant difference between the treatment samples and the control samples in each of the three soil types at each time (i.e., 1 hour, 4 hours, and 8 hours).

Any help in advance would be greatly appreciated and acknowledged. If further clarification is needed, I can produce some typical data.

PS: I used a Randomized Complete Block Design ANOVA for my MS thesis in biology 29 years ago, but have had very little use of stats since then.

Regards,

Richard

#### staassis

##### Member
There is very little data. That is the problem. In each (soil, time) group you have only 3 observations. For that reason you will have to approach the task in two stages if you want to use only strictly valid methods. And this may require brushing up your statistics knowledge substantially.

The design calls for a Repeated Measures ANOVA. The distribution of residuals in your data is unclear. Most statistical software packages allow for only two cases:

a) normally distributed residuals,
b) large sample size in each group, where the definition of "large" depends on the data but is certainly above 30.

Hence the need for two stages in your case.

1] Run regular Repeated Measures ANOVA with

soil = the only between-subjects factor and
time = the only within-subjects factor.

Save the residuals. Examine them for normality visually and using the Kolmogorov-Smirnov or Shapiro-Wilk test. If the residuals are normal, run Mauchly's test for sphericity. Depending on the verdict of the test, use one of the following:

a) Sphericity Assumed test,
b) the Greenhouse-Geisser correction,
c) the Huynh-Feldt correction.
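To make the corrections above more concrete, here is a minimal sketch of how the Greenhouse-Geisser epsilon can be computed from the covariance matrix of the repeated measurements. It is an illustration only: it treats a single group (in the mixed design described here, the covariance would normally be pooled across the soil groups), the function name is my own, and the readings are made-up numbers. Packages such as SPSS report this value automatically.

```python
from statistics import mean


def greenhouse_geisser_epsilon(data):
    """Greenhouse-Geisser epsilon for one group.

    data: list of subjects, each a list of k repeated measurements.
    The result lies between 1/(k-1) (strongest violation) and 1 (sphericity).
    """
    n, k = len(data), len(data[0])
    col_means = [mean(row[j] for row in data) for j in range(k)]
    # Sample covariance matrix of the k repeated measurements.
    cov = [
        [
            sum((row[i] - col_means[i]) * (row[j] - col_means[j]) for row in data)
            / (n - 1)
            for j in range(k)
        ]
        for i in range(k)
    ]
    # Double-centre the covariance matrix (remove row, column and grand means).
    row_means = [mean(cov[i]) for i in range(k)]
    grand_mean = mean(row_means)
    centred = [
        [cov[i][j] - row_means[i] - row_means[j] + grand_mean for j in range(k)]
        for i in range(k)
    ]
    trace = sum(centred[i][i] for i in range(k))
    sum_sq = sum(v * v for row in centred for v in row)
    # epsilon = tr(S)^2 / ((k - 1) * tr(S^2)) for the centred matrix S.
    return trace ** 2 / ((k - 1) * sum_sq)


# Synthetic readings (made up for illustration): 4 replicates x 3 time points.
readings = [
    [8.1, 6.9, 5.2],
    [7.8, 7.1, 5.9],
    [8.4, 6.5, 5.0],
    [7.9, 6.8, 5.6],
]
print(round(greenhouse_geisser_epsilon(readings), 3))
```

The corrected test multiplies both degrees of freedom of the within-subjects F test by epsilon, so a value near 1 changes little, while a value near 1/(k-1) makes the test much more conservative.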

You can read about which option to choose here:

https://en.wikipedia.org/wiki/Mauchly's_sphericity_test

SPSS offers the most convenient implementation of Repeated Measures ANOVA. To get started, read this:

https://www.spss-tutorials.com/spss-repeated-measures-anova/

2] If the residuals are not normal, estimate their distribution from the data using the empirical distribution function. Then program a likelihood ratio test specific to the estimated distribution of residuals. This programming task would be most convenient in R or Matlab. R can be downloaded for free from here:

https://www.r-project.org/
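As a rough illustration of the first half of step 2, here is a minimal Python sketch of an empirical distribution function built from residuals. It makes two simplifying assumptions that are mine, not part of the advice above: residuals are taken as deviations from each (soil, time) cell mean, which ignores the subject effects a full Repeated Measures model would remove, and the numbers are made up. The likelihood ratio test itself is not shown.

```python
from bisect import bisect_right
from statistics import mean


def cell_residuals(cells):
    """cells: dict mapping (soil, hours) -> list of observations.

    Returns every observation minus the mean of its own cell.
    """
    residuals = []
    for values in cells.values():
        m = mean(values)
        residuals.extend(v - m for v in values)
    return residuals


def make_ecdf(sample):
    """Return F, where F(x) is the fraction of sample values <= x."""
    ordered = sorted(sample)
    n = len(ordered)
    return lambda x: bisect_right(ordered, x) / n


# Synthetic data (made up for illustration): two cells, three replicates each.
cells = {
    ("soil_A", 1): [8.1, 7.8, 8.4],
    ("soil_A", 4): [6.9, 7.1, 6.5],
}
residuals = cell_residuals(cells)
ecdf = make_ecdf(residuals)
print(ecdf(0.0))  # share of residuals at or below zero
```

With only three observations per cell the estimated function is a coarse step function, which is one reason more replicates help.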

#### greyg8r

##### New Member

I think the easiest way for me to move forward is to greatly increase the number of replicates. It would not be unbearable to go to 40 replicates each.

Does that change the assumptions?

#### GretaGarbo

##### Human
> ... in three different types of soils over three time periods after treatment.
>
> ... in three replicates

Can you tell us where the number three comes from? It is not a holy number, is it? (Joseph, Maria and Jesus).

I believe that they don't want to do just one. They want to replicate. But with two there could be one outlier. So they recommend three, because then they can compare the two with the third and see if it is an outlier. That is my guess. Or why do they choose three?

Please Greyg8er, tell us who advised you to take three? The suggestion of "three" has nothing to do with statistics. It is a sociological habit. Tell us about it.

#### greyg8r

##### New Member
> Can you tell us where the number three comes from? It is not a holy number, is it? (Joseph, Maria and Jesus).
>
> I believe that they don't want to do just one. They want to replicate. But with two there could be one outlier. So they recommend three, because then they can compare the two with the third and see if it is an outlier. That is my guess. Or why do they choose three?
>
> Please Greyg8er, tell us who advised you to take three? The suggestion of "three" has nothing to do with statistics. It is a sociological habit. Tell us about it.

They were Joseph, Mary and Jesus and I am Greyg8r (Richard). I detect a heavy dose of condescension, but I must be mistaken because I can't imagine someone being patronizing to a new member seeking advice. So, assuming I misread your intent and you really aren't a bully jerk but really were trying to be helpful, I will address your questions:

First, there is nothing magical about:

1. Three soil types. Soils are generally classified into three categories for my purposes.

2. Three time periods. 1 hour, 4 hours and 8 hours are convenient sampling periods for wetland scientists assessing a sampling site. 1 hour allows a quick assessment. The 4- and 8-hour (half- and full-day) periods are logical. I could add 2 hours to avoid your "sociological habit" of 3 periods, but that would be extraneous.

3. Three replicates. When I did a simple Student's t-test after 8 hours in each soil type, using 3 replicates, the results were very significant (p<.001). So, I used 3 replicates as a starting point for my ANOVA.

There you go.

Richard

#### GretaGarbo

##### Human
> I detect a heavy dose of condescension, but I must be mistaken because I can't imagine someone being patronizing to a new member seeking advice.

That was absolutely not my intention. I apologize. But you are not the first one to use the number three. I just wonder where people get these things from.

> the results were very significant (p<.001).

So, OK, there you go. You have a pilot plant estimate.

> There is very little data.

And it is not too little data. (And I do not intend to be "condescending" now either.) I just note the fact.

#### staassis

##### Member
Richard, with 40 observations per group you have a much better chance of satisfying the assumptions of regular Repeated Measures ANOVA and performing step 1 only. Still, the normality of the sample mean has to be tested in each group (via bootstrap, for example).
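One way to picture the bootstrap check is the sketch below. It is only an illustration under my own assumptions: the function names are made up, the 40 observations are simulated rather than real soil readings, and skewness and excess kurtosis are used as an informal gauge of how close the bootstrap distribution of the mean is to normal (both near 0 for a normal distribution).

```python
import random
from statistics import mean, pstdev


def bootstrap_means(sample, n_boot=2000, seed=0):
    """Resample with replacement and return the mean of each resample."""
    rng = random.Random(seed)
    n = len(sample)
    return [mean(rng.choices(sample, k=n)) for _ in range(n_boot)]


def skew_and_excess_kurtosis(values):
    """Moment-based skewness and excess kurtosis (both 0 for a normal)."""
    m, s = mean(values), pstdev(values)
    z = [(v - m) / s for v in values]
    skew = mean(x ** 3 for x in z)
    excess_kurtosis = mean(x ** 4 for x in z) - 3.0
    return skew, excess_kurtosis


# Simulated group of 40 observations (not real data), deliberately skewed.
rng = random.Random(42)
group = [rng.expovariate(1.0) for _ in range(40)]

means = bootstrap_means(group)
skew, kurt = skew_and_excess_kurtosis(means)
print(f"skewness={skew:.2f}, excess kurtosis={kurt:.2f}")
```

The same check would be repeated for each (soil, time) group; a formal normality test could be applied to the bootstrap means instead of the informal moment check used here.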

In general, getting as much data as possible is more important than any particular statistical method or stage of analysis. If you can get 40, do it by all means, no matter whether simple statistical solutions are available already. With 40 observations per group your conclusions are likely to be quite different from the conclusions with 3 observations per group.

#### hlsmith

##### Not a robit
I have read none of the above posts, but wanted to say that multi-level models can be flexible and handle missing data. I usually recommend them over repeated-measures ANOVA-style analyses.