# z scores for anomaly detection

#### nkport

##### New Member
hi all,

I'm exploring z-scores as a method to spot anomalies in my data. What I don't really understand is the benefit of using a z-score over just looking at the percentage difference between a value and the mean of the data.

For example, if I calculate that a value is 70% different from the mean, isn't that enough to assess whether it's an anomaly? What is the added value of calculating its z-score?

Thanks for any tips!
Pat

#### hlsmith

##### Less is more. Stay pure. Stay poor.
Z-scores tell you how many standard deviations a value is away from the mean, and we know roughly how much of the data standard deviations cover. For example, about 68%, 95%, and 99.7% of the data land within 1, 2, and 3 standard deviations of the mean.
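To make the contrast with a percentage difference concrete, here is a minimal sketch using only Python's standard library. The two reference samples and the value 170 are made-up numbers, and `pct_diff` and `z_score` are hypothetical helper names. Both samples have the same mean, so the percentage difference is identical, but the z-score changes a lot with the spread of the data.

```python
from statistics import mean, stdev

def pct_diff(value, reference):
    """Percentage difference of value from the mean of reference."""
    m = mean(reference)
    return 100 * (value - m) / m

def z_score(value, reference):
    """Number of sample standard deviations value lies from the mean."""
    return (value - mean(reference)) / stdev(reference)

# Two made-up reference samples with the same mean (100) but different spread.
tight = [98, 99, 100, 101, 102]
wide = [20, 60, 100, 140, 180]
value = 170  # 70% above the mean in both cases

for name, ref in [("tight", tight), ("wide", wide)]:
    print(name, round(pct_diff(value, ref), 1), round(z_score(value, ref), 2))
```

In the tight sample, 170 is dozens of standard deviations out; in the wide sample it is only about one. The percentage difference cannot tell those two situations apart, and it also becomes unstable when the mean is close to zero.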

@Miner - any input for a person looking for outliers/anomalies?

#### Dason

##### Ambassador to the humans
> For example, about 68%, 95%, and 99.7% of the data land within 1, 2, and 3 standard deviations of the mean.

That's only true if the data follow a normal distribution and you're using the true parameters rather than estimates. For large enough sample sizes the estimates will mostly work, but you might want to consider a robust estimate if you expect some values that don't actually follow the distribution.
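One common robust choice (an assumption here, since no specific estimator was named) replaces the mean with the median and the standard deviation with the median absolute deviation (MAD). The factor 1.4826 rescales the MAD so that it is comparable to a standard deviation for normal data. The data, the cutoff of 3, and the function name `robust_z_scores` are made up for illustration.

```python
from statistics import median

def robust_z_scores(values):
    """Z-like scores based on the median and the scaled MAD."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        # More than half the values are identical; the scale is undefined.
        raise ValueError("MAD is zero, cannot compute robust z-scores")
    scale = 1.4826 * mad  # consistent with the standard deviation under normality
    return [(v - med) / scale for v in values]

# Made-up sample with one suspicious value at the end.
data = [10, 11, 9, 10, 12, 10, 11, 50]
scores = robust_z_scores(data)

# A cutoff of 3 is an illustrative assumption, not a universal rule.
flagged = [v for v, z in zip(data, scores) if abs(z) > 3]
print(flagged)
```

Because the median and MAD barely move when a single extreme value is present, the suspicious point cannot hide by inflating its own yardstick.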

For the general case you can use Chebyshev's Inequality.
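Chebyshev's inequality says that for any distribution with a finite mean and variance, P(|X - μ| ≥ kσ) ≤ 1/k², so at least 1 - 1/k² of the data lies within k standard deviations of the mean. A short sketch comparing that guaranteed minimum with the normal-distribution coverage (the function names are hypothetical):

```python
import math

def chebyshev_min_coverage(k):
    """Lower bound on the share of data within k standard deviations (any distribution)."""
    return max(0.0, 1 - 1 / k**2)

def normal_coverage(k):
    """Share of data within k standard deviations for a normal distribution."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(k, round(chebyshev_min_coverage(k), 4), round(normal_coverage(k), 4))
```

The Chebyshev bound is much looser (at least 75% within 2 standard deviations, versus about 95% for normal data), which is the price of making no distributional assumption.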

#### hlsmith

##### Less is more. Stay pure. Stay poor.
Correct, I thought about saying that, but it is Monday. I would also imagine that if the value really were an anomaly, it would be pulling the mean toward itself, so an erroneous anomaly would actually be further from the true mean than the above process suggests.
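A quick sketch of that effect with made-up numbers: score the suspect value once with itself included in the mean and standard deviation, and once against the remaining points only. The variable names are just for illustration.

```python
from statistics import mean, stdev

# Made-up sample: seven ordinary values plus one suspect value.
clean = [10, 11, 9, 10, 12, 10, 11]
suspect = 50
contaminated = clean + [suspect]

# Z-score with the suspect value included in the estimates.
z_in = (suspect - mean(contaminated)) / stdev(contaminated)

# Z-score against estimates from the other points only.
z_out = (suspect - mean(clean)) / stdev(clean)

print(round(mean(clean), 2), round(mean(contaminated), 2))
print(round(z_in, 2), round(z_out, 2))
```

Including the suspect value drags the mean toward it and inflates the standard deviation, so its own z-score ends up below 3 even though it sits far from everything else.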

thanks all.