"...and the mean is not an appropriate measure of central tendency when data are skewed."
I should have said that it is precisely this statement I disagree with.
Whether you are interested in the mean or the median (both are, of course, good measures of location) depends on your objective. Suppose you apply for a job and are going to work there for 100 weeks. Which would you rather be told: the median weekly salary or the mean weekly salary? I would want to know the mean, since it corresponds directly to the wage sum: your total earnings over the 100 weeks are 100 times the mean weekly salary.
Similarly, if you work in a hospital department and must choose between treatment A and treatment B to cure patients, you are more interested in the mean, since it is related to the long-run sum over all the patients you treat.
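A small sketch of the wage-sum argument, assuming a purely hypothetical pay pattern (500 most weeks, 3000 in every tenth week because of a bonus). Multiplying the mean by the number of weeks reproduces the total exactly, while multiplying the median does not:

```r
# Hypothetical weekly wages over 100 weeks (assumed values):
# nine ordinary weeks at 500 followed by one bonus week at 3000, repeated 10 times
wage <- rep(c(rep(500, 9), 3000), times = 10)

median(wage)        # 500, the "typical" week
mean(wage)          # 750
sum(wage)           # 75000, total earned over the 100 weeks
100 * mean(wage)    # 75000, identical to the sum
100 * median(wage)  # 50000, understates the total by a third
```

The same reasoning carries over to the treatment example: if the outcome per patient is skewed, the mean times the number of patients gives the expected long-run total, the median does not.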
Consider these two vectors, a and b:
Code:
> a <- c(1, 2, 4, 8, 16)
> b <- c(1/4, 1, 4, 16, 64)
> median(a)
[1] 4
> median(b)
[1] 4
> mean(a)
[1] 6.2
> mean(b)
[1] 17.05
> sum(a)
[1] 31
> sum(b)
[1] 85.25
Both medians are 4, but the mean of b is nearly three times the mean of a, and so is the sum.
Also, there is a tendency to treat outliers as some kind of error. But many natural data are highly skewed, e.g. income distributions and concentrations of environmental pollutants. There is nothing wrong with these extreme values, and deleting them would be wrong.
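To illustrate what is lost, here is a sketch that reuses vector b from the example above and applies a hypothetical "outlier removal" rule (simply dropping the largest value). The value 64 is assumed to be a legitimate observation, not an error:

```r
# Same skewed vector b as in the example above
b <- c(1/4, 1, 4, 16, 64)

# Hypothetical outlier rule: discard the maximum as if it were a mistake
b_trimmed <- b[b < max(b)]

sum(b)             # 85.25
sum(b_trimmed)     # 21.25, about three quarters of the total is gone
mean(b_trimmed)    # 5.3125
median(b_trimmed)  # 2.5
```

Even the median shifts, and any estimate of the total (wage sum, pollutant load) built from the trimmed data would be badly biased.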