Two-Sample, Nonparametric test for "stochastic equality" with discrete numerical data



I would like to present a statistical analysis I performed on discrete numerical data. I am fairly new to statistics and would really appreciate another person’s input on the statistical methods employed and how to correctly interpret the results. I have supplied relevant publications, data, descriptive statistics, and histograms. Any comments, questions, recommendations, and critiques are welcomed. Thanks,

Research Question:

I am trying to show that two samples (X and Y) of discrete numerical data are equal, where X and Y are different locations in the body (Shoulder, Knee, Ankle, Hip, etc.; n=11 locations). The data is scaled in 2.5 mm increments. The data represents the radius of curvature (RoC) measured in 27 male and female cadavers (cadavers = deceased people who donated their bodies to scientific research). The data was practically identical for the left and right sides, so I decided to combine those measurements. Also, two previous studies have shown no statistical difference between the RoC on the left and right sides. This resulted in approximately 54 measurements for each location. However, due to variations in each cadaver, some locations could not be measured. The null hypothesis of this study is H0: X=Y (e.g. the RoC in the hip = the RoC in the shoulder).

The Problem:

(A) The data is heteroscedastic and violates other parametric assumptions. I used the Shapiro-Wilk test in SPSS and all locations (Hip, Knee, Ankle, Shoulder) violated the normality assumption.
(B) The sample sizes are unequal
(C) Data cannot be dichotomized into male and female.
(D) The distributions are unequal (e.g. uniform vs. skewed left or right). The two-sample Kolmogorov-Smirnov test was significant for most comparisons between X and Y (Hip vs. Knee, Hip vs. Shoulder, etc.). Also, the histograms looked very different between locations.

First Impression:

I avoided parametric tests, including the t-test that assumes unequal variances. As a result, I started investigating non-parametric alternatives like the Mann-Whitney-Wilcoxon test. Unfortunately, the distributions were too different. Just a note, the only statistical test I knew before starting this analysis was the t-test. I was lost and like a normal stereotypical “male” I didn’t ask for directions.


I did what most researchers do when they need a miracle: I prayed to the search engines. After a “grueling” literature review, I encountered the Brunner-Munzel (BM) test for stochastic equality. Based on a 2002 study by Delaney and Vargha, I decided this method should be the best one for analyzing the current data. I used R and the R package “lawstat” to run the test. I also found a BM permutation test that can be used for smaller sample sizes. Please see the following Statistical Methods section for how I implemented this test and the Analysis and Example section for how I interpreted results from the BM test.
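To make the idea behind the permutation version concrete, here is a rough pure-Python sketch. It is not the R program by Schulz and Neuhaeuser and not lawstat's code; it is a Monte Carlo approximation (random reshuffles rather than all permutations), the function names are my own, and the numbers are made-up toy values on a 2.5 mm grid, not the cadaver data.

```python
import math
import random

def midranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def bm_statistic(x, y):
    """Brunner-Munzel statistic; positive when y tends to exceed x."""
    n1, n2 = len(x), len(y)
    pooled = midranks(list(x) + list(y))
    r1, r2 = pooled[:n1], pooled[n1:]          # ranks in the pooled sample
    w1, w2 = midranks(list(x)), midranks(list(y))  # ranks within each sample
    m1, m2 = sum(r1) / n1, sum(r2) / n2
    s1 = sum((r - w - m1 + (n1 + 1) / 2) ** 2 for r, w in zip(r1, w1)) / (n1 - 1)
    s2 = sum((r - w - m2 + (n2 + 1) / 2) ** 2 for r, w in zip(r2, w2)) / (n2 - 1)
    num = n1 * n2 * (m2 - m1)
    den = (n1 + n2) * math.sqrt(n1 * s1 + n2 * s2)
    if den == 0:  # completely separated samples, or every value tied
        return 0.0 if num == 0 else math.copysign(math.inf, num)
    return num / den

def bm_permutation_pvalue(x, y, n_perm=2000, seed=0):
    """Two-sided Monte Carlo permutation p-value for the BM statistic."""
    rng = random.Random(seed)
    observed = abs(bm_statistic(x, y))
    pooled = list(x) + list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if abs(bm_statistic(pooled[:len(x)], pooled[len(x):])) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)  # +1 counts the observed arrangement

# Hypothetical toy samples (mm), not the measured RoC values.
x = [25.0, 27.5, 30.0, 25.0, 27.5]
y = [22.5, 25.0, 25.0, 27.5, 20.0]
statistic = bm_statistic(x, y)
p_value = bm_permutation_pvalue(x, y)
print(statistic, p_value)
```

The statistic is recomputed for every reshuffle, so the within-sample variance estimates are re-estimated each time instead of being held fixed.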

Again, any help, comments, recommendations, or critiques are more than welcomed. Please let me know if you have any questions. Thanks for reading.


P.S. I have attached relevant publications and histograms of each data.

General Questions

1. Is this a valid application of the Brunner-Munzel test?
2. Should I use a different statistical test? (e.g. Cliff’s delta)
3. Is it possible to use an alternative to the two-sample Kolmogorov-Smirnov test for overlapping distributions?


1. Schulz A, Neuhaeuser M. (2008) R-Program to perform Brunner and Munzel’s generalized Wilcoxon test as a permutation test.
2. Noguchi et al. (2009). lawstat: An R package for biostatistics, public policy, and law. R package version 2.3.
3. Brunner E, Munzel U. The Nonparametric Behrens-Fisher Problem: Asymptotic Theory and a Small-Sample Approximation. Biometrical Journal. 2000;42:17-25.
4. Delaney HD, Vargha A. Comparing several robust tests of stochastic equality with ordinally scaled variables and small to moderate sized samples. Psychol Methods. 2002;7:485-503.
5. Cliff, N. (1996). Ordinal methods for behavioral data analysis. Mahwah, NJ: Erlbaum.
6. Wilcox RR, Keselman HJ. Modern robust data analysis methods: measures of central tendency. Psychol Methods. Sep 2003;8(3):254-274.

Statistical Methods

Statistical tests were performed using SPSS® (SPSS version 20.0, Chicago, IL), PASS 11 (NCSS, LLC, Kaysville, Utah) and R (R version 2.15.0). The significance level was set at 0.05. The mean, median, inter-quartile range and standard deviation were calculated. The radius of curvature was measured on a discrete numerical scale in 2.5 mm increments. Nonparametric and parametric tests were applied when indicated. Equality of error variance was tested using Levene’s test, centered on either the mean or the median. The null hypothesis that the data followed a normal distribution was assessed using the Shapiro-Wilk test. The two-sample Kolmogorov-Smirnov test was used to test the assumption of overlapping distributions. The two-sided Brunner-Munzel test for stochastic equality was used to evaluate differences in the radii of curvature between anatomical sites. The relative effect size for stochastic equality (AXY) is a generalized equality of the following form:
where Prob = probability, X = Sample X, and Y = Sample Y. The Brunner-Munzel method is used to test the null hypothesis AXY=0.5.
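The formula referred to above did not come through in the post. Assuming AXY is the estimate that lawstat's brunner.munzel.test reports (which would be consistent with the values in the example below), it would read:

$$
A_{XY} = \mathrm{Prob}(X < Y) + 0.5\,\mathrm{Prob}(X = Y)
$$

Note that some papers define the measure with the inequality reversed, Prob(X > Y) + 0.5 Prob(X = Y); the two versions sum to 1, so the direction matters when reading a value such as 0.40.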

Analysis and Example

X = measured RoC in the Shoulder (n=47, median=25mm, IQR (25-30), mean=26.4+/-2.8)
Y1 = measured RoC in the Knee (n=45, median=25mm, IQR (21.9-30), mean=25.1+/-3.8)
Y2 = measured RoC in the Ankle (n=52, median=25mm, range (17.5-30), mean=25.1+/-1.9)




Two-sided Brunner-Munzel Test Results

X vs. Y1
AXY = 0.40, p = 0.083, CI (0.29 - 0.51)
|AXY - 0.5| = 0.10. Based on this result, can I conclude there is a 10 % chance Y1 (the RoC in the Knee) will be less than X (the RoC in the Shoulder)?

X vs. Y2
AXY = 0.41, p = 0.041, CI (0.33 - 0.50)

Y1 vs. Y2
AXY = 0.48, p = 0.755, CI (0.36 - 0.60)
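To show what an AXY value is actually counting, here is a small Python sketch that computes the estimate directly from all pairs of observations. It assumes the definition Prob(X < Y) + 0.5 Prob(X = Y) that lawstat reports; the function name is my own and the numbers are made-up toy values on a 2.5 mm grid, not the cadaver data.

```python
from itertools import product

def relative_effect(x, y):
    """Estimate Prob(X < Y) + 0.5 * Prob(X = Y) from all (x, y) pairs."""
    less = sum(1 for a, b in product(x, y) if a < b)   # pairs with x below y
    ties = sum(1 for a, b in product(x, y) if a == b)  # tied pairs count half
    return (less + 0.5 * ties) / (len(x) * len(y))

# Hypothetical toy samples (mm), not the measured RoC values.
x = [25.0, 27.5, 30.0, 25.0]
y = [22.5, 25.0, 25.0, 27.5]

a_xy = relative_effect(x, y)
a_yx = relative_effect(y, x)
print(a_xy, a_yx)  # the two directions always sum to 1
```

With heavily tied data like measurements on a 2.5 mm grid, the tie term can make up a large share of the estimate, which is why it is counted at half weight rather than dropped.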