choosing the right statistical test for comparing data, when do you use which test

#1
Hello all,

I would like some feedback on the right approach of analysing data, comparing means of groups of data.

Currently I work with the following 5 steps to analyse my data. Is this approach correct or how do you analyse your data if you do it differently?

1) I test for normality with boxplots, histograms and the Shapiro-Wilk test
2) I check for homoscedasticity with Levene's test
3) if both checks pass I use parametric tests, otherwise non-parametric tests

4) PARAMETRIC TESTS
4a) if there are only two groups to compare I use the t-test
4b) if there are more groups to compare I use ANOVA to check for differences in means
4c) if there are differences I use the Tukey test for pairwise comparison

5) NON PARAMETRIC TESTS
5a) if there are only 2 groups to compare I use the Wilcoxon rank sum test. Is this test the same as the Mann-Whitney test?
5b) if there are more groups to compare I use the Kruskal-Wallis test to check for differences in means
5c) if there are differences I do pairwise comparison with the kruskalmc test. Are there good alternatives for this test?
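For clarity, the branching in steps 1-5 can be sketched as a small function. The helper name, the p-value inputs and the alpha threshold are illustrative only; the p-values are assumed to come from the Shapiro-Wilk and Levene checks, computed elsewhere.

```python
# Sketch of the test-selection flow above; shapiro_p and levene_p are
# assumed to be p-values from the normality and homoscedasticity checks.
def choose_test(shapiro_p, levene_p, n_groups, alpha=0.05):
    """Return the comparison test the flow above would select."""
    parametric = shapiro_p > alpha and levene_p > alpha
    if parametric:
        return "t-test" if n_groups == 2 else "ANOVA + Tukey post hoc"
    return ("Wilcoxon rank sum / Mann-Whitney" if n_groups == 2
            else "Kruskal-Wallis + pairwise post hoc")

print(choose_test(0.20, 0.35, n_groups=2))  # both checks pass, 2 groups
print(choose_test(0.01, 0.35, n_groups=5))  # normality rejected, 5 groups
```

This only encodes the decision logic as stated; whether the flow itself is appropriate is exactly the question of this thread.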

Kind Regards
Joachim
 

Karabiner

TS Contributor
#2

Could you please describe your study first, i.e. research area,
research questions, study design, sampling procedure, sample
size (!), and measurements taken (which, when, from whom)?

With kind regards

K.
 
#3

Hello,

I perform plant experiments in which I would like to compare different treatments on groups of plants. Treatments are usually chemical or biological control agents (pesticides or beneficial insects). My data mostly consists of counts of the number of pests or beneficial insects on plants before and/or some time after treatment. Usually we work with 4 or 5 replications (and always a control treatment), but some experiments have more, up to 10. We usually compare 5 to 20 different treatments.

I'm not a statistician, but I would like to know if my plan of action (the steps in my previous message) is suitable as a basis for the statistical analysis of my different experiments, because all experiments are basically quite similar. Plants may vary, different insects will be counted, the number of replications can change and the number of treatments also differs between experiments. But basically the experimental setup is very similar. Please correct me if I'm wrong. I would like some advice on what the best method is for analysing my data.

Kind Regards
Joachim
 

Karabiner

TS Contributor
#4

What still remains unclear to me is your design and the nature
of your dependent measure. First of all, if you have
counts of the number of pests or beneficial insects on plants before and/or some time after treatment,
does this mean that you sometimes have both measures
and sometimes only follow-up? If you sometimes want to take
both measures into account, then in those cases the data analysis
will be different from the instances where you only have
follow-up measures.

As to your dependent measure, in what range will the
numbers of insects be, e.g. between 0 and 5? Or 0 and 40? 100
and 1000? If you have very small counts, then you should
resort to tests for ordinal scaled (ranked) variables.

And, do you analyse only 1 dependent variable in each analysis,
or several of them (such as counts of pest1 and pest2 and
beneficial1) at the same time?

We usually compare 5 to 20 different treatments.
Do these treatments differ qualitatively (for example 15 different
pesticides), or quantitatively (15 different doses of the same pesticide)?

With kind regards

K
 
#5

Hello,

Indeed, for some experiments we have counts before and after, so then we usually compare the means of the differences. In other cases, when the pest is randomised and spread evenly, we only count afterwards and thus compare the number of pests after treatment. Should I use different tests for those 2 cases?

About the numbers in the counts, this strongly depends on the pest type. Sometimes the range is 0-500; in other cases numbers will be smaller and vary from 0-20 or even less.

Sometimes (quite rarely) we count more than one pest, but most commonly we will focus on only 1 pest per experiment.

About the products, both options may occur: either 15 different products, or it could be five products each in three doses.

I am trying to find a general way of analysing my different datasets, but now that I see all the questions, I'm probably wrong in assuming that I can just do a generalised analysis? Is there a general approach possible?
 

Karabiner

TS Contributor
#6

Indeed, the most simple (and robust) thing would be
Kruskal-Wallis (or Wilcoxon rank sum / Mann-Whitney
U-test in case of 2 groups). This is o.k. especially with
small sample sizes and with count data. Mind that
these procedures do not compare means.

There are several possible scenarios which you described,
where other types of analyses could be used, but
it seems as if you want to have one type for all analyses.

With kind regards

K.
 
#7

Hello,

When the Kruskal-Wallis test shows me there are differences, what would the best test be to do pairwise comparison of my data groups? Currently I use the pairwise Wilcoxon (Mann-Whitney) test with Benjamini-Hochberg correction. Would this be a good solution? If you are familiar with R, this is my line: pairwise.wilcox.test(value, variable, p.adjust.method = "BH")
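For what it's worth, the "BH" adjustment in that call is the standard Benjamini-Hochberg step: sort the p-values ascending, scale each by m/rank, and take cumulative minima from the largest down. A plain-Python sketch of that formula (illustrative only, not the internals of pairwise.wilcox.test):

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values (controls the FDR)."""
    m = len(pvalues)
    # sort ascending, remembering each p-value's original position
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # walk from the largest p-value down, taking cumulative minima
    for rank in range(m - 1, -1, -1):
        i = order[rank]
        running_min = min(running_min, pvalues[i] * m / (rank + 1))
        adjusted[i] = running_min
    return adjusted

print(bh_adjust([0.01, 0.04, 0.03, 0.20]))
```

With these four p-values the smallest becomes 0.01 * 4/1 = 0.04, matching what R's p.adjust(..., method = "BH") would return for the same input.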

thank you very much for the help.

Kind Regards Joachim
 
#9

There are several possible scenarios which you described,
where other types of analyses could be used, but
it seems as if you want to have one type for all analyses.
(K., #6)

I would prefer one type for all analyses, but it should be statistically correct too, of course. Could you give me some tips on the different tests or strategies to be used for my different types of experiments?

Kind Regards
Joachim
 

Karabiner

TS Contributor
#10

KW tests for differences in medians.
I beg to differ a bit. If you look at the formula (or that of the Wilcoxon), you'll see that
it just tests whether the values from one group tend to have higher ranks than the values
from the other group. This usually means that the medians differ, but it ain't necessarily so.
If one wants to test the median, then the "median test" (sic!) could be used (which has lower
power, though).
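A small worked example of that point, with hypothetical numbers: two groups can share the same median while one still tends toward higher ranks. The sketch below computes the per-group rank sums (mid-ranks for ties) that the Wilcoxon/Kruskal-Wallis statistics are built from.

```python
def rank_sums(group_a, group_b):
    """Rank sum of each group over the pooled data, using mid-ranks
    for ties; this is the quantity the Wilcoxon rank sum test uses."""
    pooled = sorted(group_a + group_b)

    def midrank(v):
        first = pooled.index(v) + 1          # 1-based rank of first tie
        count = pooled.count(v)              # number of tied values
        return first + (count - 1) / 2       # average rank of the ties

    return (sum(midrank(v) for v in group_a),
            sum(midrank(v) for v in group_b))

# Both groups have median 5, yet their rank sums clearly differ:
a = [1, 2, 5, 6, 7]
b = [3, 4, 5, 8, 9]
print(rank_sums(a, b))  # group b tends toward higher ranks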

With kind regards

K.
 

Karabiner

TS Contributor
#11

When the Kruskal-Wallis test shows me there are differences, what would the best test be to do pairwise comparison of my data groups?
Dunn's test.

I would prefer one type for all analyses, but it should be statistically correct too of course.
Using Kruskal-Wallis is correct, and it is quite robust and reliable,
but it doesn't necessarily represent the "best" solution in all scenarios.
E.g. you could use one-way ANOVA or t-tests, if their assumptions are
fulfilled (which could be difficult to assess, especially in case of small
samples), because they test for differences in means and are more
powerful than the rank-based Kruskal-Wallis. Regarding pre-post measures,
I initially thought about repeated-measures analyses, but AFAIK using
difference scores as the dependent variable is o.k. in experimental designs
with random allocation of subjects to groups.

With kind regards

K.
 
#12

Hello,

Indeed, in most cases, because the sample sizes are small, I can't use the parametric t-test or ANOVA. So I have to go for the non-parametric tests, which means Kruskal-Wallis. When Kruskal-Wallis tells me there are differences between the groups, what would be the best option to find out which groups differ? Currently I use a pairwise Wilcoxon test with Benjamini-Hochberg correction for multiple comparison. Would this be a good choice? I use this after some searching in books and on the internet, but I have to admit I don't really know what it does exactly.

Kind Regards and thanks for all the help already
Joachim
 
#14

Hello,

I checked out Dunn's test and have an extra question. There are several options available to adjust p-values for multiple comparisons (in R at least, which I use for my statistics). Would one of these options be the better choice, or doesn't it really matter for my kind of data?

"bonferroni" the FWER is controlled using Dunn's (1961) Bonferroni adjustment, and adjusted p-values = max(1, pm).

"sidak" the FWER is controlled using Šidák's (1967) adjustment, and adjusted p-values = max(1, 1 - (1 - p)^m).

"holm" the FWER controlled using Holm's (1979) progressive step-up procedure to relax control on subsequent tests. p values are ordered from smallest to largest, and adjusted p-values = max[1, p(m+1-i)], where i indexes the ordering. All tests after and including the first test to not be rejected at the alpha/2 level are not rejected.

"hs" the FWER is controlled using the Holm-Šidák adjustment (Holm, 1979): another progressive step-up procedure but assuming dependence between tests. p values are ordered from smallest to largest, and adjusted p-values = max[1, 1 - (1 - p)^(m+1-i)], where i indexes the ordering. All tests after and including the first test to not be rejected at the alpha/2 level are not rejected.

"hochberg" the FWER is controlled using Hochberg's (1988) progressive step-down procedure to increase control on successive tests. p values are ordered from largest smallest, and adjusted p-values = max[1, p*i], where i indexes the ordering. All tests after and including the first to be rejected at the alpha/2 level are rejected.

"bh" the FDR is controlled using the Benjamini-Hochberg adjustment (1995), a step-down procedure appropriate to independent tests or tests that are positively dependent. p-values are ordered from largest to smallest, and adjusted p-values = max[1, pm/(m+1-i)], where i indexes the ordering. All tests after and including the first to be rejected at the alpha/2 level are rejected.

"by" the FDR is controlled using the Benjamini-Yekutieli adjustment (2011), a step-down procedure appropriate to depenent tests. p-values are ordered from largest to smallest, and adjusted p-values = max[1, pmC/(m+1-i)], where i indexes the ordering, and the constant C = 1 + 1/2 + . . . + 1/m. All tests after and including the first to be rejected at the alpha/2 level are rejected.

Kind Regards
Joachim