
The Mann-Whitney U test compares only two groups. The Kruskal-Wallis test is the non-parametric equivalent of the one-way ANOVA mentioned above, with H0 that the ranks of all groups are equal. So it is a good suggestion.

If you want to compare each pair (a-b, a-c, b-c), then you must use a pairwise post-hoc test, as the three groups have the same shape but possibly different locations.

(For symmetrical distributions, the mean equals the median.)
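As a sketch of what such pairwise comparisons could look like (with made-up data; the group names a, b, c and the chi-square populations are assumptions for illustration), base R's `pairwise.wilcox.test` with a Holm correction is one common option after a Kruskal-Wallis test:

```r
# Hypothetical data: three skewed groups with possibly different locations
set.seed(1)
x <- c(rchisq(20, df = 4), rchisq(20, df = 4) + 1, rchisq(20, df = 4) + 2)
g <- factor(rep(c("a", "b", "c"), each = 20))

kruskal.test(x, g)  # overall non-parametric test across all three groups

# All pairwise comparisons (a-b, a-c, b-c) with Holm-adjusted p-values
pairwise.wilcox.test(x, g, p.adjust.method = "holm")
```

The multiplicity correction matters here: running the three pairwise tests at 0.05 each would inflate the overall type I error rate.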

My data is non-normal.

(Who told you so, by the way?)

If the sample size is small, then the prediction errors (residuals) of your ANOVA should preferably be sampled from a normally distributed population of residuals.

If your sample size is large, then the t-test or the F-test, respectively, is considered robust against violations of even that assumption.

How large is your sample size?

With kind regards

Karabiner

Hi Karabiner

Correct, but I may rephrase it another way:

The t-test uses the t distribution.

The t-distribution assumes normal data (it is used instead of z when you don't know the population standard deviation).

Due to the central limit theorem, when the sample size is large enough the sampling distribution of the mean is approximately normal.

So you can use the t-test.
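That CLT argument can be sketched directly (the choice of a chi-square population with df = 4 is an assumption for illustration, matching the simulation below): the distribution of sample means stays centered on the population mean and becomes tighter and more symmetric as n grows.

```r
# Population: chi-square with df = 4 (right-skewed; true mean = 4)
set.seed(1)
means_n5  <- replicate(10000, mean(rchisq(5,  df = 4)))
means_n40 <- replicate(10000, mean(rchisq(40, df = 4)))

# The means stay near 4, while the spread shrinks as n grows
c(mean(means_n5), mean(means_n40))
c(sd(means_n5), sd(means_n40))

hist(means_n40, breaks = 50)  # close to bell-shaped for n = 40
```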

I also tried to run a simple simulation (hopefully correct?).

The expected result should be a rejection rate of 0.05, the probability of rejecting a true H0 if the t-test were exact for this data.

I should also show the power to reject a false H0...

I use a chi-square distribution with df=4, independent of the sample size, just to have a non-symmetrical distribution. (Usually the df goes up with the sample size.)

I tried the same also for a normal distribution.

```
df <- 4             # degrees of freedom
reps <- 200000      # number of simulations per sample size
sample_size <- c(2, 4, 6, 8, 10, 15, 20, 25, 30, 35, 40)
mean_pvalues <- numeric(length(sample_size))
set.seed(1)
j <- 1
for (n in sample_size) {
  pvalues <- numeric(reps)
  for (i in 1:reps) {
    x1 <- rchisq(n, df, ncp = 0)
    x2 <- rchisq(n, df, ncp = 0)
    pvalues[i] <- t.test(x2, x1, alternative = "greater")$p.value
  }
  mean_pvalues[j] <- mean(pvalues < 0.05)
  j <- j + 1
}
mean_pvalues
plot(sample_size, mean_pvalues)
lines(sample_size, mean_pvalues)
```

And compare the chi-squared distribution (blue) with a normal one (red).


Hi Dason,

I used the following code; I added the normal distribution and colors, and increased reps.

Code:

```
df <- 4             # degrees of freedom
reps <- 800000      # number of simulations per sample size
sample_size <- c(2, 4, 6, 8, 10, 15, 20, 25, 30, 35, 40)
mean_pvalues <- numeric(length(sample_size))
set.seed(1)
j <- 1
for (n in sample_size) {
  pvalues <- numeric(reps)
  for (i in 1:reps) {
    x1 <- rchisq(n, df, ncp = 0)
    x2 <- rchisq(n, df, ncp = 0)
    pvalues[i] <- t.test(x2, x1, alternative = "greater")$p.value
  }
  mean_pvalues[j] <- mean(pvalues < 0.05)
  j <- j + 1
}
mean_pvalues
plot(sample_size, mean_pvalues)
lines(sample_size, mean_pvalues, col = "blue")

#-2.-------------------
mu <- 10     # mean under the null hypothesis
sigma <- 20  # standard deviation under the null hypothesis
mean_pvalues2 <- numeric(length(sample_size))
set.seed(1)
j <- 1
for (n in sample_size) {
  pvalues <- numeric(reps)
  for (i in 1:reps) {
    x1 <- rnorm(n, mu, sigma)
    x2 <- rnorm(n, mu, sigma)
    pvalues[i] <- t.test(x2, x1, alternative = "greater")$p.value
  }
  mean_pvalues2[j] <- mean(pvalues < 0.05)
  j <- j + 1
}
mean_pvalues2
#plot(sample_size, mean_pvalues2)
lines(sample_size, mean_pvalues2, col = "red")
```
