Distance Measures, High Dimensions

#1
Hi all,

I am learning how to handle high-dimensional data. I am trying to cluster a matrix of about 2000×5000 cells containing log-likelihood values. As a first step, to be able to visualize my data, I used Principal Component Analysis (PCA) and Principal Coordinate Analysis (PCoA). Since the former works from the variance-covariance matrix and the latter from a distance matrix, I thought this would let me look at the data from multiple angles and perhaps discover interesting trends (e.g. dense spots, possible clusters).

However, for PCoA there are many distance measures to choose from, and I have heard that Euclidean distance (the most common measure) does not work well in high dimensions. For that reason, I tried many distance measures (the ones I could find implemented in R), but I am not sure how to choose among them.
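
To make the concern about Euclidean distance concrete, here is a minimal Python sketch of the effect that claim usually refers to. It is not my actual analysis (which is in R): the data are uniform random points, and the function name `relative_contrast` and its parameters are made up for illustration. It compares the farthest and nearest neighbour of one query point; as far as I understand, this ratio tends to shrink as the dimension grows, so all points start to look roughly equally far away.

```python
import math
import random


def relative_contrast(dim, n_points=200, seed=0):
    """Return (max - min) / min over the Euclidean distances from one
    random query point to n_points random points, all drawn uniformly
    from the unit cube [0, 1]^dim (an assumption, not real data)."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        dists.append(math.dist(query, point))
    return (max(dists) - min(dists)) / min(dists)


if __name__ == "__main__":
    # Print the contrast for a low, a medium, and a high dimension.
    for dim in (2, 20, 2000):
        print(dim, round(relative_contrast(dim), 3))
```

Real log-likelihood vectors are of course not uniform or independent across dimensions, so this only shows the general tendency, not how strong the effect would be on my matrix.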

Could someone please explain:
i) why Euclidean distance does not perform well in high dimensions,
ii) what concerns should be taken into account when measuring distance in high dimensions, and
iii) which distance measures are well suited to numeric (log-likelihood) valued vectors of dimension ~2000, and why?

Thank you very much in advance for your answers!
 