Reporting Non-Parametric Data: Median or Mean (or both)?

#1
At the simplest level, my question is: which descriptive statistics are the most kosher to report when using nonparametric tests like Mann-Whitney and Kruskal-Wallis?

My reason for asking is a recent kerfuffle over some stats reporting I did years ago for a paper in STEM education. We had several variables being tested, but these variables faced a few issues when it came to applying ANOVA and t-tests: some deviation from normality, uneven n's, etc. Most of the normality problems were due to floor effects, since these were time measures and could not be negative. For consistency, we ended up doing nonparametric tests throughout the paper: 3-way Kruskal-Wallis, then post hoc Mann-Whitney (with adjustments). When it came to publishing decisions, we decided to report the means and SDs along with the test statistics.

Half a decade later, we've received some complaints that we should have reported medians instead. From what I've read and seen as a stats consultant working in education and STEM research, it's debatable which is better to report. Medians make sense because the tests are median-based. However, my argument was and still is that the data were mostly normal and that the mean and standard deviation give more information.

Of course, the sensible answer is to report mean, median, and SD. We didn't because the table was already at the point of illegibility and the authors refused to add a second table.

So, is there any consensus here? Published opinions? Or is this a horrible, unending debate like the vi versus emacs debates in computer science?

-- kate
 

hlsmith

Omega Contributor
#2
No solid consensus; authors just have to justify their choices. I believe it makes more sense to include medians and interquartile ranges, since with skewed data the means and SDs may not fit the traditional interpretation.
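
To make the skewness point concrete, here is a minimal Python sketch using only the standard library. The completion times are made up for illustration (they are not from the paper), and `describe` is a hypothetical helper name.

```python
import statistics


def describe(values):
    """Return mean, SD, median, and quartiles for a list of numbers."""
    # method="inclusive" treats the data as the whole population of interest;
    # other quartile conventions give slightly different Q1/Q3.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),
        "median": statistics.median(values),
        "q1": q1,
        "q3": q3,
    }


# Made-up time measures (seconds): floor near zero, one long right-tail value.
times = [2, 3, 3, 4, 4, 5, 5, 6, 8, 40]
summary = describe(times)
print(summary)
```

With these made-up values, mean minus one SD falls below zero, which a time measure cannot be, while the median and quartiles stay within the observed range.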

In support of your option, there is the stance that a sample size of around 30 can be adequate for generalizing to a normal distribution.
 

gianmarco

TS Contributor
#3
Hi!
Just my two cents.
I agree with hlsmith. But a general concern of mine is that, in principle, the KW test (and also MW, for that matter) does not test the equality of medians. I do not remember where (but you could search this same forum), but we have discussed cases in which the medians were similar, and yet the MW test showed a significant difference....
Maybe it would be useful to go back to the original publication by Mann and Whitney to have a first-hand look at the mechanics of the test (I mean, at the test as they conceived it).

Cheers,
Gm

edit:
Further, the MW test could also be conceived in terms of probability of superiority (i.e., the probability that a randomly selected observation from one sample is greater than a randomly selected one from the other sample). Seen from this standpoint, reporting mean or median could be put aside (provided that the context of the study, or the referee, will allow that).
 
#4
The Wilcoxon-Mann-Whitney (WMW) test does not test whether the medians are equal. It tests the null hypothesis that P(Y1<Y2) = 0.5. But it is often described as testing the medians.

A number of papers by Fagerland and Sandvik investigate tests such as the WMW, t-tests, and so on.

I have thought that the most “fair” description would be to show a plot of the empirical cumulative distribution function for both groups. If the entire distribution of Y1 is to the left of the distribution of Y2, then there is no strange interpretation.
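
As a sketch of that check (without the plot), the following computes both empirical distribution functions and tests whether one lies entirely to the left of the other. The data and the helper names `ecdf` and `lies_to_the_left` are made up for illustration.

```python
from bisect import bisect_right


def ecdf(sample):
    """Return a function t -> proportion of the sample that is <= t."""
    ordered = sorted(sample)
    n = len(ordered)

    def f(t):
        return bisect_right(ordered, t) / n

    return f


def lies_to_the_left(a, b):
    """True if the ECDF of a is >= the ECDF of b at every observed value,
    i.e. a is shifted toward smaller values across the whole range."""
    fa, fb = ecdf(a), ecdf(b)
    grid = sorted(set(a) | set(b))  # ECDFs only change at observed values
    return all(fa(t) >= fb(t) for t in grid)


y1 = [1, 2, 3, 4]
y2 = [3, 5, 6, 7]
crossing = [1, 2, 9, 10]   # low and high values, nothing in the middle
middle = [4, 5, 6, 7]

print(lies_to_the_left(y1, y2))
print(lies_to_the_left(crossing, middle), lies_to_the_left(middle, crossing))
```

When neither sample lies to the left of the other, the curves cross, and a single-number summary is harder to interpret.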

But OP asked for a one-number-summary. We have been asked to vote for either the median or the mean. Sorry, but here I bring in another number.

O'Brien et al. suggest using an odds parameter.

That is surprising but seems like a good suggestion.
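
A small sketch of what such an odds number could look like. I am assuming the odds parameter is p / (1 - p), where p = P(Y1 < Y2) + 0.5 * P(Y1 = Y2); that definition is not spelled out above, so check it against the paper. The data and the name `wmw_odds` are invented.

```python
import math


def wmw_odds(y1, y2):
    """Return (p, odds), with p estimated over all pairs and odds = p / (1 - p).

    ASSUMPTION: this is the odds parameter meant above; ties count as 0.5.
    """
    score = 0.0
    for a in y1:
        for b in y2:
            if a < b:
                score += 1.0
            elif a == b:
                score += 0.5
    p = score / (len(y1) * len(y2))
    odds = math.inf if p == 1.0 else p / (1.0 - p)
    return p, odds


y1 = [1, 2, 3, 4]
y2 = [3, 5, 6, 7]
p, odds = wmw_odds(y1, y2)
print(p, odds)
```

Under the null hypothesis P(Y1<Y2) = 0.5, the odds equal 1, so the number reads like a familiar odds against an even bet.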

- - -

Also the user Cowboybear has discussed these issues on this site several times.
 
#5
Hi there!

Sorry to revive an old thread; this came up in my searches to try and answer my specific question.

What is considered best practice when presenting a descriptives table and you have a mixture of normal and non-normal variables? Some variables required parametric testing and others required non-parametric testing. Is it too confusing to report mean (SD) and median (IQR) for each respective variable all in the same table? If so, what would be the recommendation to simplify the reading? Mean, median (SD), as suggested by OP, perhaps?
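
To make the layout in question concrete, here is one possible way to build such a mixed table in Python. It is only a sketch: the variable names and numbers are invented, `summarize` is a hypothetical helper, and using round brackets for mean (SD) and square brackets for median [Q1, Q3] is just one convention for keeping the two apart.

```python
import statistics


def summarize(values, normal):
    """Format one table cell: 'mean (SD)' if normal, else 'median [Q1, Q3]'."""
    if normal:
        return f"{statistics.mean(values):.2f} ({statistics.stdev(values):.2f})"
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return f"{statistics.median(values):.2f} [{q1:.2f}, {q3:.2f}]"


# Invented variables; the True/False flag records which summary was chosen.
variables = {
    "score": ([10, 12, 14], True),
    "time_s": ([2, 3, 3, 4, 4, 5, 5, 6, 8, 40], False),
}

rows = {name: summarize(vals, normal) for name, (vals, normal) in variables.items()}
for name, cell in rows.items():
    print(f"{name:<8} {cell}")
print("Values are mean (SD) or median [Q1, Q3].")
```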
 

Karabiner

TS Contributor
#6
Sorry to revive an old thread;
Yes. Does it mean that we should re-read the whole thread before turning to your question? Or should we just re-read some specific parts of the whole thread (which ones)? Or would it perhaps make no sense to re-read the old thread, since your problem can be sufficiently discussed without reference to it?

With kind regards

Karabiner