equal sample sizes required for Kruskal-Wallis?

#1
Why does R require equal sample sizes for the Kruskal-Wallis and Wilcoxon tests?

If calculating by hand, m and n (the sizes of the two samples) can have different values, at least according to textbook tables of critical values for the Mann–Whitney (Wilcoxon) statistic.

The other thing that is really confusing me is how R calculates df for each of these tests. For example, if m = 10 and n = 10, shouldn't df = (m - 1) + (n - 1)? Yet when I run a Kruskal-Wallis test it says df = 3 :confused:
 

Dason

Ambassador to the humans
#2
It doesn't? I'm thinking you have some syntax wrong. Would you mind posting your code so we can see what the actual problem is?
 
#3
Here is the syntax:

kruskal.test(X, C)

Here X is a sample of ordinal data (n = 10) and C is a second sample of ordinal data (n = 7).

The outcome is:

Error in kruskal.test.default(X, C) :
'x' and 'g' must have the same length



Alternatively, if I run the same test on two equally sized ordinal datasets (n=10 for both) I get:

kruskal.test(X, Y)

Kruskal-Wallis rank sum test

data: X and Y
Kruskal-Wallis chi-squared = 2.305, df = 3, p-value = 0.5116


And here we see the df = 3 I was referring to earlier. Is this right?
It seems strange to me. I thought that when using a chi-square table, df = k - 1, which in this example would be 2 - 1 = 1?
 

Dason

#4
Yup. You've got the syntax wrong. You should give the help page a gander. Just type "?kruskal.test".

The problem is that the first vector needs to hold all of your data, and the second vector identifies which group each observation belongs to. You could fix this pretty easily with something like:
Code:
dat <- c(X, C)                                   # all observations in one vector
nam <- c(rep(1, length(X)), rep(2, length(C)))   # group label for each observation
kruskal.test(dat, nam)
Another way would be to give them more descriptive names and use a formula in the test call instead:
Code:
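dat <- c(X, C)                                        # all observations in one vector
nam <- c(rep("X", length(X)), rep("C", length(C)))    # descriptive group labels
j <- data.frame(value = dat, group = nam)             # one row per observation
kruskal.test(value ~ group, data = j)                 # formula interface: response ~ grouping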
I kind of like the second way a little more because you can examine "j" and make sure everything is the way it should be a little bit easier.
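To check the fix end to end, here is a minimal sketch with made-up values standing in for X (n = 10) and C (n = 7). The numbers are hypothetical, not the original poster's data. It also passes the samples as a list, which `kruskal.test` accepts as an alternative to a values-plus-groups pair.

```r
# Hypothetical ordinal scores standing in for X (n = 10) and C (n = 7)
X <- c(3, 5, 4, 2, 5, 1, 4, 3, 5, 2)
C <- c(2, 1, 3, 1, 2, 4, 1)

# One long vector of values plus a parallel vector of group labels
scores <- data.frame(
  value = c(X, C),
  group = factor(rep(c("X", "C"), times = c(length(X), length(C))))
)

# Formula interface: unequal group sizes are not a problem
res_formula <- kruskal.test(value ~ group, data = scores)

# Same test with the samples passed as a list
res_list <- kruskal.test(list(X, C))

# df is the number of groups minus one
res_formula$parameter
```

With two groups the reported df is 2 - 1 = 1, regardless of how many observations each group has.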
 

Dason

#5
Quoting #3: "And here we see this df=3 I was referring to earlier. Is this right? [...] I thought if using a chi-square table, then df = k-1, which in this example would be 2-1 = 1?"
Yes, that is what R computes for that call, and this is why it's important to know the syntax of the function you're using. If you had let it slide, you would have ended up with a completely meaningless test. R took the values in Y as group identifiers and used them to split the data in X. Apparently there are only 4 distinct values in Y, so it ran a 4-group test of whether those groups differ, which is where df = 4 - 1 = 3 comes from.
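This behaviour can be reproduced with made-up data. In the sketch below, X and Y are hypothetical vectors (not the original poster's), with Y chosen to contain exactly 4 distinct values.

```r
# Hypothetical data: Y takes only 4 distinct values
X <- c(3, 5, 4, 2, 5, 1, 4, 3, 5, 2)
Y <- c(1, 2, 3, 4, 1, 2, 3, 4, 1, 2)

# Misuse: Y is silently treated as the grouping variable for X
wrong <- kruskal.test(X, Y)
n_groups <- length(unique(Y))
wrong$parameter   # df equals n_groups - 1

# Intended comparison: X and Y as two separate samples
right <- kruskal.test(list(X, Y))
right$parameter   # df equals 2 - 1
```

The first call raises no error because X and Y happen to have the same length, so nothing flags the mistake except the unexpected df.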
 
#6
Ahh, silly me - it expected factor groupings for the second vector; this all makes perfect sense now :)

I guess my syntax for the Wilcoxon test was also wrong, because at one point it complained that X and Y needed to be of equal length, although I cannot recall the exact syntax I used now :confused:

The other thing: is there a way to display the df for the two-sample Wilcoxon test?

Thanks for your help btw :)
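On the Wilcoxon questions, a hedged sketch with hypothetical, tie-free data: the two-sample `wilcox.test` accepts samples of different lengths and reports a W statistic with no df, since the rank-sum test is not referred to a chi-square distribution. The exact call that produced the length complaint is not given in the thread; one plausible cause, assumed here, is `paired = TRUE`, which does require equal lengths.

```r
# Hypothetical samples of unequal size, with no tied values
X <- c(3.1, 5.2, 4.4, 2.7, 5.9, 1.3, 4.8, 3.6, 5.5, 2.2)
C <- c(2.0, 1.1, 3.3, 1.8, 2.5, 4.1, 1.6)

# Two independent samples: unequal lengths are fine
res <- wilcox.test(X, C)
res$statistic   # W
res$parameter   # NULL: no df is reported for this test

# W computed by hand: number of (x, c) pairs with x > c
w_by_hand <- sum(outer(X, C, ">"))

# Assumed cause of the error: a paired test needs equal lengths
paired_err <- tryCatch(
  wilcox.test(X, C, paired = TRUE),
  error = function(e) conditionMessage(e)
)
paired_err
```

The hand count matching W mirrors the textbook Mann–Whitney calculation, where m and n are free to differ.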