Mean and Median for testing data distribution

DanR

New Member
#1
Hi,

I noticed that the following articles test the data distribution by checking for a >10% difference between the mean and the median.

http://www.sciencedirect.com/science/article/pii/0022175996000130

http://www.sciencedirect.com/science/article/pii/002217599500078O

However, when I follow the references they lead to a resource in German, which I unfortunately do not speak.

I have searched online and in various textbooks but I cannot find any reference indicating what sort of difference between mean and median is acceptable in normal distributions.

While for research I would always use a more robust test, e.g. Anderson-Darling or Shapiro-Wilk, it would be nice to have an idea of the range so I could get a feel for data stored in Excel.

Can anyone direct me to a reference which suggests some sort of rule of thumb?

All help is much appreciated.
Regards
Dan
 

BGM

TS Contributor
#2
Without looking at the details, I guess you are looking at the statistic

\( \frac {|\bar{X}_n - M_n|} {S_n} \)

where \( \bar{X}_n \) is the sample mean, \( M_n \) is the sample median, \( S_n \) is the sample standard deviation and \( n \) is the sample size.

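The exact distribution of this statistic is complicated, whatever distribution the sample comes from. For normal data it should not depend on the nuisance parameters (the mean and variance), because the statistic is unchanged when the data are shifted or rescaled, but it does depend on the sample size \( n \).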

Anyway, for a given sample size \( n \), you can always use simulation to estimate the quantiles of this distribution. Simulate \( n \) normal random variates, calculate this statistic, store it, and repeat many times independently. From the stored values you can pick out a sample quantile as an estimate.

Naturally, the larger this statistic is, the stronger the evidence against the population being normal. So you may pick out, e.g., the 95% quantile of the stored values as your cutoff point, and check whether the test statistic observed in your data exceeds this cutoff.
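
The simulation described above can be sketched in a few lines of Python using only the standard library. This is a minimal sketch, not a reference implementation: the function names, the number of repetitions, the seed, and the simple empirical-quantile rule are all assumptions made for illustration.

```python
import math
import random
import statistics


def mean_median_stat(sample):
    """Return |mean - median| / sample standard deviation."""
    gap = abs(statistics.fmean(sample) - statistics.median(sample))
    return gap / statistics.stdev(sample)


def simulate_cutoff(n, reps=10_000, level=0.95, seed=0):
    """Estimate the `level` quantile of the statistic for normal samples of size n.

    reps, level and seed are illustrative defaults, not recommendations.
    """
    rng = random.Random(seed)
    # Standard normal draws suffice: the statistic does not change
    # when the data are shifted or rescaled.
    stats = sorted(
        mean_median_stat([rng.gauss(0.0, 1.0) for _ in range(n)])
        for _ in range(reps)
    )
    # Simple empirical quantile: the ceil(level * reps)-th smallest value.
    idx = min(reps - 1, max(0, math.ceil(level * reps) - 1))
    return stats[idx]
```

To use it, compute `mean_median_stat(your_data)` and compare it with `simulate_cutoff(len(your_data))`; a value above the cutoff would count as evidence against normality at roughly the 5% level. The cutoff has to be re-simulated for each sample size.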
 

DanR

New Member
#3
Hi,

Thanks for getting back to me. I have never seen the statistic you detailed, though it does look interesting - do you have a reference for it? Or some sort of name I could Google? It would be handy to get an idea of what sort of range of results I could expect to see.

With regard to the original question, the equation looks more like (mean-median)/mean. In the downloadable spreadsheet, the author uses an 'IF' function to confirm the result is within the range 0.9-1.1.
If the result is within this range, the analysis approach is parametric; if it is outside the range, the approach is non-parametric.
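
For anyone wanting to try this rule outside Excel, here is a rough Python sketch. It rests on an assumption about the spreadsheet, which is not reproduced here: that the 0.9-1.1 range applies to the ratio median/mean. Since median/mean = 1 - (mean-median)/mean, that is the same as requiring (mean-median)/mean to lie between -0.1 and 0.1, which matches the ">10% difference" wording. The function names and the `tolerance` parameter are made up for illustration.

```python
import statistics


def relative_mean_median_gap(values):
    """Return (mean - median) / mean; undefined when the mean is zero."""
    mean = statistics.fmean(values)
    if mean == 0:
        raise ValueError("relative gap is undefined when the mean is zero")
    return (mean - statistics.median(values)) / mean


def suggested_approach(values, tolerance=0.10):
    """Apply the 10% rule of thumb (assumed reading of the spreadsheet's IF check)."""
    gap = relative_mean_median_gap(values)
    return "parametric" if abs(gap) <= tolerance else "non-parametric"
```

One thing the sketch makes easy to see: because the gap is divided by the mean, the result depends on where the data sit. `[1, 2, 3, 4, 20]` fails the check, but adding 1000 to every value makes it pass, even though the shape of the distribution is unchanged.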
 
#4
Do you really want to test whether the data are normally distributed?

The normal distribution is not the only parametric distribution. There are many parametric distributions for skewed data (where the mean is larger than the median).

The easiest way to check the distribution of the data is to plot a histogram.
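
If no plotting tool is at hand, a quick text histogram is enough to spot obvious skew. The following is a minimal sketch in plain Python; the function names, the equal-width binning, and the default of 10 bins are illustrative assumptions.

```python
def histogram_counts(values, bins=10):
    """Count values in `bins` equal-width bins between min and max."""
    lo, hi = min(values), max(values)
    counts = [0] * bins
    if lo == hi:
        # All values identical: put everything in the first bin.
        counts[0] = len(values)
        return counts
    width = (hi - lo) / bins
    for v in values:
        # The maximum value falls into the last bin.
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    return counts


def text_histogram(values, bins=10):
    """Return one line per bin: left bin edge, then one '#' per observation."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    lines = []
    for i, count in enumerate(histogram_counts(values, bins)):
        lines.append(f"{lo + i * width:10.3f} | {'#' * count}")
    return "\n".join(lines)
```

Calling `print(text_histogram(data))` gives a sideways bar chart; a long tail of short bars on one side suggests skewed data.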

The t-test is fairly robust to non-normality, so don't rush to a Mann-Whitney test.
See this link: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3445820/
 
#5
Dan,

That is a very interesting comment you posted about the ScienceDirect articles that check for a >10% difference between mean and median as a test for a normal distribution. (I asked a similar question; see http://www.talkstats.com/showthread...normal-judging-from-the-mean-median-and-modus.) However, I cannot access the articles, and they cost a lot of money. Do you have the reference details of the German resource you mention? As a return favor, I can try to translate the most important parts. I am Dutch and speak German quite well, although statistical terminology can be quite different in German.