- Thread starter ooostats
- Tags non-normality outliers tukey z-scores

and the mean is not an appropriate measure of central tendency when data are skewed.

I don't agree. I think that the mean is an appropriate measure of central tendency, not only because the sample mean is an unbiased estimate of the population mean, but also because you are often interested in the mean or the sum. Suppose you apply for a job: it is the wage sum over a longer period that is relevant, not the week-by-week median. You will want to know the sum, i.e. the mean.

You could say that both the mean and the median are good measures of central tendency, each capturing a different aspect. Now the question is: which one is better for detecting outliers?

Using the mean with a skewed distribution gives uneven tails, so you will potentially flag more outliers in one tail than in the other.
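To make the uneven-tails point concrete, here is a minimal sketch in R that compares a mean-based z-score rule with Tukey's fences on a small right-skewed sample. The data values, the variable names and the z-score cutoff of 2 are all assumptions chosen for illustration; they are not taken from anyone's data in this thread.

```r
# Made-up, right-skewed sample (illustrative values only)
x <- c(1, 2, 2, 3, 3, 4, 4, 5, 6, 8, 30, 40)

# Mean-based rule: z-scores with an assumed cutoff of |z| > 2
z_cut <- 2
z <- (x - mean(x)) / sd(x)
flagged_z <- x[abs(z) > z_cut]

# Quartile-based rule: Tukey's fences with the usual multiplier 1.5
q <- quantile(x, c(0.25, 0.75), names = FALSE)
iqr <- q[2] - q[1]
lower <- q[1] - 1.5 * iqr
upper <- q[2] + 1.5 * iqr
flagged_tukey <- x[x < lower | x > upper]

range(z)        # the z-scores reach much further above 0 than below it
flagged_z       # values flagged by the z-score rule
flagged_tukey   # values flagged by Tukey's fences
```

With these made-up values the z-score rule flags only 40, while Tukey's fences flag both 30 and 40, because the two large values inflate the standard deviation. Neither rule flags anything in the lower tail, which is the asymmetry being discussed.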

......and the mean is not an appropriate measure of central tendency when data are skewed.

Whether you are interested in the mean or the median (of course both are good measures of location) depends on your objective. If you apply for a job and are going to work there for 100 weeks, which would you rather be told: the weekly median salary or the mean salary? I would like to know the mean, since it corresponds best to the wage sum.

Or if you are at a hospital department and want to cure patients with treatment A or B, then you are more interested in the mean, since it is related to the long-run sum.

Consider these two results for a and b:

Code:

```
> a <- c(1 , 2, 4, 8, 16)
> b <- c(1/4, 1, 4, 16, 64)
> mean(a)
[1] 6.2
> mean(b)
[1] 17.05
> sum(a)
[1] 31
> sum(b)
[1] 85.25
```
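The session above prints the means and sums but not the medians. As a small supplementary check (my addition, reusing the same `a` and `b`), the following sketch shows that the two vectors share the same median even though their means and sums differ, and that the mean times the number of observations recovers the sum.

```r
a <- c(1, 2, 4, 8, 16)
b <- c(1/4, 1, 4, 16, 64)

# Both vectors have the same middle value
median(a)
median(b)

# The mean is just the sum rescaled by the number of observations,
# so mean * n gives back the total (compared with a tolerance for floating point)
all.equal(mean(a) * length(a), sum(a))
all.equal(mean(b) * length(b), sum(b))
```

So a report of the median alone would not distinguish `a` from `b`, whereas the mean does, and it maps directly onto the total.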

Also, there is a tendency to talk about outliers as some kind of error. But lots of natural data are highly skewed, e.g. income distributions and concentrations of pollutants in the environment. There is nothing wrong with these values, and deleting them would be wrong.