Deriving the Mahalanobis distance formula: where is the mistake in my reasoning?

The squared Mahalanobis distance (length) of an observation vector x from its mean, assuming the mean is the zero vector, is given by

x^T * S^-1 * x

S is the covariance matrix of the distribution from which the observation x is drawn.
x^T is the transpose of x

This is my reasoning for how it's derived. The covariance matrix S is orthogonally diagonalizable, and if you plot all your observations in a scatterplot you'll see that they have been scaled along the orthogonal axes (basis vectors) defined by the eigenvectors of S, by magnitudes equal to their corresponding eigenvalues. Therefore you cannot take the Euclidean distance of x from the origin (the mean) as it stands.
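To make the diagonalization claim concrete, here is a minimal NumPy sketch. The 2x2 matrix S below is a made-up example, not taken from any real data; the sketch only checks that a symmetric covariance matrix can be written as Q * D * Q^T with orthonormal eigenvectors in Q. It does not check the claim about how much the data is scaled along each axis.

```python
import numpy as np

# Hypothetical symmetric covariance matrix, chosen only for illustration.
S = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# eigh is intended for symmetric matrices: it returns the eigenvalues in
# ascending order and the corresponding orthonormal eigenvectors as columns of Q.
eigenvalues, Q = np.linalg.eigh(S)

# Q is orthogonal: Q^T * Q should be the identity.
is_orthogonal = np.allclose(Q.T @ Q, np.eye(2))

# S is recovered from its eigendecomposition: S = Q * D * Q^T.
reconstructs_S = np.allclose(Q @ np.diag(eigenvalues) @ Q.T, S)

print(eigenvalues, is_orthogonal, reconstructs_S)
```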

You first need to transform x into the vector it would have been if it weren't affected by any covariances; that is, you have to undo the transformation applied to it by S.

This vector is simply z = S^-1 * x

Now you can use the Euclidean distance to find the distance of z from the mean. The squared distance is given by the dot product of z with itself:

z^T * z = (S^-1 * x)^T * (S^-1 * x) = x^T * S^-1 * S^-1 * x

As you can see, I've obtained a second S^-1 factor, so my derivation is wrong.
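A quick numerical check suggests the two expressions really do disagree, so this is not just a matter of notation. This is a minimal sketch with NumPy; the matrix S and the vector x are hypothetical values picked so the arithmetic is easy to follow by hand.

```python
import numpy as np

# Hypothetical covariance matrix and observation, chosen only for illustration.
S = np.array([[4.0, 0.0],
              [0.0, 1.0]])
x = np.array([2.0, 1.0])

S_inv = np.linalg.inv(S)

# The formula I am trying to derive: x^T * S^-1 * x
mahalanobis_sq = x @ S_inv @ x

# The result of my derivation: z^T * z with z = S^-1 * x,
# which equals x^T * S^-1 * S^-1 * x
z = S_inv @ x
derived_sq = z @ z

print(mahalanobis_sq)  # about 2.0  (4 * 0.25 + 1 * 1)
print(derived_sq)      # about 1.25 (0.5^2 + 1^2)
```

With these values the first component of x is divided by 4 once in the target formula but twice in my version, which is exactly the effect of the extra S^-1 factor.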

Could anyone please help me find my mistake and also help me understand how to derive the formula above?

Thanks in advance.