Hi All,

First of all, thanks for showing an interest in this thread, and secondly, if you're kind enough to share your thoughts, I'd be very grateful.

Background

- We work with financial bank statements for fraud investigation.

- I have converted a bank statement's debit transactions into a scatter plot and a one-dimensional probability density function (PDF) curve, using kernel density estimation (KDE) with an optimised bandwidth and an Epanechnikov kernel.

- It works perfectly: we now have a probability density plot that represents our underlying dataset well. The hard work is done!

- The resulting curve has very low density values: the maximum is around 0.002, and the lowest outliers are much smaller still (close to 0, if not exactly 0).

- Plugging a value x into the KDE gives the estimated density at that point (strictly a density, not a probability; probabilities come from integrating the curve over an interval).

- I need a definitive answer on whether a given transaction is an outlier or not. How do I choose the threshold for flagging an outlier? (e.g. if p(x) > 0.01, etc.)
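For reference, here is the estimator written out, assuming the standard form of a 1D KDE with an Epanechnikov kernel K and bandwidth h. Because the estimate integrates to 1, its values are densities with units of 1/(currency unit), so a peak of about 0.002 is roughly what you would expect when amounts are spread over several hundred units. The last line is one common way (an assumption here, not the only option) of setting a cut-off: choose a small tail fraction α and use the α-quantile of the fitted densities at the observed transactions.

$$
\hat f_h(x) = \frac{1}{nh}\sum_{i=1}^{n} K\!\left(\frac{x - x_i}{h}\right), \qquad K(u) = \tfrac{3}{4}\,(1 - u^2)\,\mathbf{1}\{|u| \le 1\}
$$

$$
\int_{-\infty}^{\infty} \hat f_h(x)\,dx = 1, \qquad \Pr(a \le X \le b) \approx \int_a^b \hat f_h(x)\,dx
$$

$$
t_\alpha = \alpha\text{-quantile of } \{\hat f_h(x_1), \dots, \hat f_h(x_n)\}, \qquad x \text{ is flagged if } \hat f_h(x) \le t_\alpha
$$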

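A minimal, dependency-free Python sketch of deriving a threshold from the KDE itself rather than picking a fixed p(x) value. Everything here is illustrative: the function names (`epanechnikov`, `silverman_bandwidth`, `kde`, `loo_density`, `density_threshold`, `flag_outliers`) and the `alpha` parameter are hypothetical, Silverman's rule of thumb stands in for whatever optimised bandwidth you already have, and the amounts are made up. Densities at the training points are computed leave-one-out so a transaction does not count towards its own density.

```python
import math
import statistics


def epanechnikov(u: float) -> float:
    """Epanechnikov kernel: 0.75 * (1 - u^2) for |u| <= 1, zero elsewhere."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0


def silverman_bandwidth(data: list[float]) -> float:
    """Rule-of-thumb bandwidth; a stand-in for an optimised bandwidth."""
    sd = statistics.stdev(data)
    q1, _, q3 = statistics.quantiles(data, n=4)
    # Use the robust IQR-based spread when it is smaller than the std dev.
    spread = min(sd, (q3 - q1) / 1.34) if q3 > q1 else sd
    return 0.9 * spread * len(data) ** (-1 / 5)


def kde(x: float, data: list[float], h: float) -> float:
    """Density estimate f_hat(x) = (1 / (n * h)) * sum_i K((x - x_i) / h)."""
    return sum(epanechnikov((x - xi) / h) for xi in data) / (len(data) * h)


def loo_density(i: int, data: list[float], h: float) -> float:
    """Leave-one-out density at data[i], so a point cannot vouch for itself."""
    xi = data[i]
    total = sum(epanechnikov((xi - xj) / h) for j, xj in enumerate(data) if j != i)
    return total / ((len(data) - 1) * h)


def density_threshold(data: list[float], h: float, alpha: float = 0.01) -> float:
    """Hypothetical rule: threshold = alpha-quantile of leave-one-out densities."""
    dens = sorted(loo_density(i, data, h) for i in range(len(data)))
    k = max(0, math.ceil(alpha * len(dens)) - 1)
    return dens[k]


def flag_outliers(data: list[float], h: float, alpha: float = 0.01) -> list[float]:
    """Return the amounts whose leave-one-out density is at or below the threshold."""
    t = density_threshold(data, h, alpha)
    return [x for i, x in enumerate(data) if loo_density(i, data, h) <= t]


# Illustrative (made-up) debit amounts: 60 routine payments plus one large one.
amounts = [20 + (i % 10) * 3.5 for i in range(60)] + [950.0]
h = silverman_bandwidth(amounts)
print(flag_outliers(amounts, h, alpha=0.01))  # [950.0]
```

A new transaction `x_new` can be checked against the same cut-off with `kde(x_new, amounts, h) <= density_threshold(amounts, h, alpha)`. By construction roughly a fraction α of past transactions fall at or below the threshold, so α is really a decision about how many transactions you can afford to review; the density alone cannot give a "definitive" cut-off. The loops are O(n²), fine for one statement but worth replacing with a vectorised or tree-based KDE for large volumes.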