One-way ANOVA with three treatments: one treatment has some atypically high results, so its variance is obscuring the result. What should I do?

#1
Hi, I've conducted an experiment to assess the impact of plastic film mulch on the weed seed bank. I used a randomized block design with three treatments (mulched, unmulched and weeded, and unmulched and unweeded), then counted seedlings emerging from soil samples. Most results are in the 0-9 range and roughly normally distributed, but 4 of the 9 results in the last treatment are very much higher, 18 to 51. Running an ANOVA shows the treatments as significantly different, but a Tukey test shows only the third treatment to be different. However, I'm pretty sure the first two are different and the large variance in the third treatment is obscuring this.
Could someone advise me on the correct approach here? Should I use a t-test on the first two treatments? The data from the third treatment isn't really normally distributed because of the high counts in some samples, so is the ANOVA result valid, and if not, how should I show this?
Thanks
Martin
 
#2
One-way ANOVA only tells you that at least two groups are different, so it seems OK to follow it with a Tukey test.
One of the ANOVA assumptions is that each sample comes from a normal distribution, but ANOVA (and the t-test) is not very sensitive to moderate deviations from normality if the sample data is reasonably symmetric around the mean.
Is this the case?
If the answer is yes, I assume you can run a t-test (or an ANOVA on only the two groups, ignoring the problematic one).
Better to run Welch's t-test if you aren't sure both samples have equal variance.
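For anyone who wants to see what Welch's version actually computes, here is a minimal sketch using only the Python standard library. The function name `welch_t` and the two lists of counts are made up for illustration; they are not the data from this thread. It returns the t statistic and the Welch-Satterthwaite degrees of freedom; a p-value would normally come from a statistics package.

```python
import math
import statistics

def welch_t(a, b):
    """Return Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    na, nb = len(a), len(b)
    # Sample variances (n - 1 denominator), one per group: no pooling.
    va, vb = statistics.variance(a), statistics.variance(b)
    se2_a, se2_b = va / na, vb / nb
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (na - 1) + se2_b ** 2 / (nb - 1))
    return t, df

# Hypothetical counts for illustration only, NOT the data from this thread.
group_a = [0, 1, 0, 2, 1, 0, 3, 1, 1]
group_b = [1, 4, 2, 7, 3, 5, 2, 6, 4]

t_stat, df = welch_t(group_a, group_b)
print(f"t = {t_stat:.3f}, df = {df:.1f}")
```

Because each group keeps its own variance, a noisy third group would not inflate the standard error of a comparison between the first two.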
 
#3
Hi,
Thanks for that reply.
The data looks normal for only 1 of the 3 treatments. The first has 4 out of 9 results at 0 and the rest at 1 or 3. The second has reasonably normal data in the range 1-7, and the third has 5 out of 9 results in the range 3-9 and then 4 very high counts, 18-51. These are counts of weeds germinating from the different treatments. ANOVA and Kruskal-Wallis show the results as significant, but post hoc tests (Tukey and Dunn's) show that the first two treatments aren't significantly different from each other. However, if I ignore the third treatment and use a t-test or a Mann-Whitney test, it shows the first two to be significantly different, p<0.001 or p=0.004.
I understand that using multiple t-tests isn't a good way to separate lots of different treatment results, but surely the two-sample tests can't be totally wrong. If I hadn't done the third treatment, or hadn't measured it, the results of the first two would have been the same, and I would have used a t-test or Mann-Whitney and the results would have been valid. My supervisor says just to go with the ANOVA, but I can't help thinking that's missing out on a clearly significant result.
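The usual worry with multiple t-tests can be put as a small calculation. This is only a rough sketch: the helper name `familywise_error_rate` is made up, and it assumes the tests are independent, which pairwise comparisons sharing a group are not, so treat the number as an approximation.

```python
def familywise_error_rate(alpha, n_tests):
    """Chance of at least one false positive across n independent tests."""
    return 1 - (1 - alpha) ** n_tests

# Three treatments give three pairwise comparisons.
n_treatments = 3
n_pairs = n_treatments * (n_treatments - 1) // 2

print(round(familywise_error_rate(0.05, n_pairs), 4))  # 0.1426
```

With only three comparisons the inflation is modest, which is why a single extra comparison is less alarming than separating many treatments this way.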
Some authoritative validation of what I'm trying to do or an explanation of why I'm wrong would be really appreciated.
Martin
 
#4
Since you are "counting seedlings emerging from soil samples", it seems more appropriate to replace the normal distribution assumption (which is implicit in traditional ANOVA) with a distribution that is based on counts, e.g. the Poisson distribution (start with that) or the negative binomial (that is more difficult, so try it afterwards). You can estimate this with a generaLIZED linear model, which exists in most statistical software.
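A quick way to judge whether Poisson is plausible before fitting anything is the variance-to-mean ratio within each treatment, since a Poisson variable has variance equal to its mean. The sketch below uses only the standard library; the function name `dispersion_index` and both lists of counts are invented for illustration and are not the data from this thread.

```python
import statistics

def dispersion_index(counts):
    """Variance-to-mean ratio; roughly 1 for Poisson-like counts."""
    # Undefined when every count is zero (mean of 0).
    return statistics.variance(counts) / statistics.mean(counts)

# Hypothetical counts for illustration only, NOT the data from this thread.
even_counts = [3, 5, 4, 6, 2, 5, 4, 3, 4]
clumped_counts = [3, 5, 4, 9, 6, 18, 25, 40, 51]

for name, counts in [("even", even_counts), ("clumped", clumped_counts)]:
    print(name, round(dispersion_index(counts), 2))
```

A ratio far above 1 within a treatment points to overdispersion, which is the situation where the negative binomial is usually considered instead of the Poisson.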

Instead of Tukey HSD, search for the Holm-Bonferroni method.
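The Holm step-down adjustment is simple enough to write out by hand. This is a minimal sketch, not a replacement for the implementation in your statistics package; `holm_adjust` is a made-up name and the raw p-values are hypothetical.

```python
def holm_adjust(p_values):
    """Return Holm step-down adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # Smallest p is multiplied by m, the next by m - 1, and so on.
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Adjusted values must not decrease as the raw p-values increase.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted

# Hypothetical raw p-values for three pairwise comparisons.
raw = [0.04, 0.001, 0.03]
print(holm_adjust(raw))
```

Each adjusted value is compared with the usual threshold (e.g. 0.05). Holm is never more conservative than plain Bonferroni, which multiplies every p-value by m.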
 
#5
Hi,
Thanks. The data definitely isn't Poisson distributed. I'd kind of expect it to be normal, but I think there are complicating factors, and with only 9 samples for each treatment it's difficult to tell.
However, I have discovered a method that seems to work: transforming the data with a Johnson transformation. Is there any reason that shouldn't be used here? It seems to give good normal data and sensible results.
Cheers
 
#7
Sure OK.
So I'm measuring the number of weeds that germinate from soil samples from three different treatments: treatment 1 had been covered with a biodegradable plastic mulch film during the growing season, treatment 2 wasn't mulched but was weeded, and treatment 3 wasn't mulched or weeded. The trial used a randomized block design. There were three replicates, and 3 samples were taken from each of these, making nine samples for each treatment.
The data comes out like this:
1518035459543.png
I have attached the data, but try as I might I don't seem to be able to attach any kind of spreadsheet file.
I think I know exactly what is going on: there are three sources of seeds that are germinating. In treatment 1, only seeds that remained ungerminated through the season are coming up. In treatment 2, some are also blowing in on the wind. In treatment 3, some weeds are going to seed, resulting in large numbers of weeds if the sample happens to contain soil from under a weed that has dropped its seed. There might also be differences in the number of weed seeds surviving from the previous season due to different conditions.
My supervisor seems quite happy with the transformation, but he's an agronomist and not a statistician.
Thanks for taking an interest.
 
