x^T * S^-1 * x

S is the covariance matrix of the data from which an observation x is taken.

x^T is the transpose of x

This is my reasoning for how it's derived. The covariance matrix S is orthogonally diagonalizable, and if you plot all your observations in a scatterplot you'll see that they have been scaled along the orthogonal axes (basis vectors) defined by the eigenvectors of S, by magnitudes equal to their corresponding eigenvalues. Therefore you cannot take the Euclidean distance of x from the origin (the mean) as it stands.
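To make the "orthogonally diagonalizable" part concrete, here is a minimal sketch in plain Python that diagonalizes a small symmetric matrix by hand. The 2x2 matrix S and the helper name eigen_sym_2x2 are made up for illustration, and the closed-form eigenvector used here assumes the off-diagonal entry is nonzero. It only checks that the eigenvectors are orthogonal and that S can be rebuilt from them; it says nothing about how the data is scaled.

```python
import math

def eigen_sym_2x2(m):
    """Eigenvalues and unit eigenvectors of a symmetric 2x2 matrix.

    Assumes the off-diagonal entry b is nonzero (hypothetical helper).
    """
    (a, b), (_, d) = m
    mean = (a + d) / 2.0
    radius = math.sqrt(((a - d) / 2.0) ** 2 + b * b)
    pairs = []
    for lam in (mean + radius, mean - radius):
        v = (b, lam - a)  # solves (a - lam) * v1 + b * v2 = 0
        norm = math.hypot(*v)
        pairs.append((lam, (v[0] / norm, v[1] / norm)))
    return pairs

# Made-up covariance matrix, used only as an example.
S = [[4.0, 2.0], [2.0, 3.0]]
(lam1, q1), (lam2, q2) = eigen_sym_2x2(S)

# The two eigenvectors should be orthogonal (dot product close to 0).
orthogonality = q1[0] * q2[0] + q1[1] * q2[1]

# Rebuild S as lam1 * q1 q1^T + lam2 * q2 q2^T.
rebuilt = [
    [lam1 * q1[i] * q1[j] + lam2 * q2[i] * q2[j] for j in range(2)]
    for i in range(2)
]

print("eigenvalues:", lam1, lam2)
print("q1 . q2:", orthogonality)
print("rebuilt S:", rebuilt)
```

Up to floating-point rounding, the rebuilt matrix should match S and the dot product of the two eigenvectors should be zero.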

You first need to transform x into the vector it would have been if it weren't affected by any covariances; that is, you have to undo the transformation applied to it by S.

This vector is simply z = S^-1 * x

Now you can use the Euclidean distance to find the distance of z from the mean. Its squared distance is given by the dot product of z with itself:

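z^T * z = (S^-1 * x)^T * (S^-1 * x) = x^T * (S^-1)^T * S^-1 * x = x^T * S^-1 * S^-1 * x

(The last step uses the fact that S, and therefore S^-1, is symmetric.)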

As you can see, I've obtained a second S^-1 factor, so my derivation is wrong.
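A quick numerical check shows that the two expressions really do give different values. This is only a sketch in plain Python: the 2x2 matrix S, the vector x, and the helper names (inverse_2x2, mat_vec, dot) are made up for illustration, and x is assumed to be measured relative to the mean.

```python
def inverse_2x2(m):
    """Inverse of a 2x2 matrix via the adjugate formula."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]

def mat_vec(m, v):
    """Matrix-vector product."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in m]

def dot(u, v):
    """Dot product of two vectors."""
    return sum(u_i * v_i for u_i, v_i in zip(u, v))

# Made-up covariance matrix and centered observation.
S = [[4.0, 2.0], [2.0, 3.0]]
x = [2.0, 1.0]

S_inv = inverse_2x2(S)
z = mat_vec(S_inv, x)   # z = S^-1 * x, as in my derivation

target = dot(x, z)      # x^T * S^-1 * x, the formula I want to derive
mine = dot(z, z)        # z^T * z = x^T * S^-1 * S^-1 * x, what I got

print(target, mine)     # prints: 1.0 0.25
```

So for this example the formula gives 1.0 while my derivation gives 0.25, which confirms that the extra S^-1 factor is not just a notational slip.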

Could anyone please help me find my mistake and also help me understand how to derive the formula above?

Thanks in advance.