Comparison of the power of tests: parametric test versus non-parametric test

#1
How can one compare, using a simple experiment or simulation, whether a parametric test (e.g. the t-test) is more powerful than its non-parametric counterpart (the Wilcoxon signed-rank test)?

I understand that statistical power is defined as the probability of rejecting the null hypothesis when it is in fact false and should be rejected. The power of the parametric tests can be calculated easily from formulae, tables and graphs based on their underlying distributions.
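For instance, the power of a one-sample t-test against a normal shift alternative follows from the noncentral t distribution. A minimal sketch in Python (the effect size and sample size below are just illustrative assumptions):

```python
import numpy as np
from scipy import stats

# Illustrative values: standardized effect size d, sample size n, two-sided alpha
d, n, alpha = 0.5, 25, 0.05

df = n - 1
nc = d * np.sqrt(n)                      # noncentrality parameter
t_crit = stats.t.ppf(1 - alpha / 2, df)  # two-sided critical value

# Power = P(|T| > t_crit) when T follows a noncentral t distribution
power = 1 - stats.nct.cdf(t_crit, df, nc) + stats.nct.cdf(-t_crit, df, nc)
print(f"Analytical power of the t-test: {power:.3f}")
```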

I have learned that power for non-parametric tests can be calculated using Monte Carlo simulation methods. I am not sure if I have understood the procedure correctly. Please correct me if the procedure below is wrong.
• An alternative hypothesis (Ha) is specified together with a sample size.
• Sample data are generated pseudo-randomly from the distribution specified under Ha, and the test is carried out.
• This process is repeated many times (e.g. 1,000 times) and the proportion of failures to reject the null is recorded. Since the data are generated under Ha, the null hypothesis is false, so every failure to reject is a Type II error; this proportion therefore estimates β, and its complement 1 − β estimates the power. (A sketch implementing this procedure follows the list.)
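For concreteness, here is a minimal sketch of this procedure in Python (assuming scipy is available; the normal shift alternative, effect size, and sample size are purely illustrative), estimating the power of both tests so they can be compared directly:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

n, alpha, n_sims = 25, 0.05, 10_000
mu_a = 0.5  # alternative: data shifted away from the null value of 0

reject_t = reject_w = 0
for _ in range(n_sims):
    x = rng.normal(loc=mu_a, scale=1.0, size=n)  # sample under Ha
    if stats.ttest_1samp(x, popmean=0).pvalue < alpha:
        reject_t += 1
    if stats.wilcoxon(x).pvalue < alpha:
        reject_w += 1

# Proportion of rejections estimates power (1 - beta) under Ha
print(f"t-test power:   {reject_t / n_sims:.3f}")
print(f"Wilcoxon power: {reject_w / n_sims:.3f}")
```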

Can we readily compare the calculated power of the parametric test with the calculated power of the non-parametric test? Or should we compare the variances of the t-test statistic and the Wilcoxon test statistic, and conclude that whichever has the smaller variance is the more powerful test?
 

Jake

Cookie Scientist
#2
Can we readily compare the calculated power of the parametric test with the calculated power of the non-parametric test? Or should we compare the variances of the t-test statistic and the Wilcoxon test statistic, and conclude that whichever has the smaller variance is the more powerful test?
I think the first way is more straightforward.
 

Dason

Ambassador to the humans
#3
It also answers the actual question more directly (namely, what the power of each test is). I don't see how comparing the variances would answer the question of interest.
 

Jake

Cookie Scientist
#4
My impression is that it is not irrelevant, but that it is at best an indirect answer to the question. If we consider, for a given set of data/parameters, the distributions of estimates given by the parametric and non-parametric models, and if the two distributions are centered at exactly the same point (above or below 0), then whichever distribution has lower variance has greater power. Of course, a precise answer to the question of how much greater the power is entails doing the first kind of analysis anyway. And if the distributions of estimates differ in both mean and variance, then the first approach is really the only informative option.
 

Dason

Ambassador to the humans
#5
Gotcha. But we're talking about power, which typically implies we're not looking at the case where the null is true, so we really wouldn't expect the mean to be centered at 0 (for a test statistic where 0 implies no difference). So the variance really isn't of interest unless the means are the same and the critical value is the same. And even then the variance might not be relevant if the distribution of the test statistic is skewed. Really, I just don't see the point. If we're interested in power calculations, then just simulate and calculate the estimated power. You'd have to do the simulations to get an estimate of the variance anyway, so it seems silly to take a roundabout route to the question of interest when the same simulation answers it directly.
 

Jake

Cookie Scientist
#6
So the variance really isn't of interest unless the means are the same and the critical value is the same. And even then the variance might not be relevant if the distribution of the test statistic is skewed.
Both good points.
 

Dason

Ambassador to the humans
#7
I'd also like to add that I think 1,000 simulations is quite small for a simulation study. Using 1,000 gives a margin of error of roughly ±3% on the estimated power, which, if the two tests are close to each other in power, wouldn't be good enough (in my mind) to adequately differentiate between them. Plus, for something simple like this you could do 100,000 in a matter of seconds...
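For reference, the ±3% figure comes from the usual normal-approximation margin of error for an estimated proportion, which is largest when the true proportion is near 0.5. A quick sketch:

```python
import math

# 95% margin of error for an estimated proportion, worst case p = 0.5
for n_sims in (1_000, 10_000, 100_000):
    moe = 1.96 * math.sqrt(0.5 * 0.5 / n_sims)
    print(f"{n_sims:>7} simulations: +/- {moe:.4f}")
```

With 1,000 replications the margin is about ±0.031; with 100,000 it drops to about ±0.003.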