Best way to present results from a large data frame (26x1522) in an easily understandable manner

#1
I am currently doing nucleotide diversity analysis for a large number of genes. I have one data frame (26 columns x a varying number of rows) for each gene. Each of the 26 columns corresponds to one population (African, American, etc.). The rows correspond to the nucleotide positions studied for each gene. I now have to present this data as a chart in order to compare the nucleotide diversity distribution across the 26 populations. What would be the best visual presentation of this data that does not lose the information at each position? Nucleotide diversity values range from 0 to 1 (with most values occurring between 0 and 0.5). I have tried principal component analysis to see how the data cluster, but I lose much of the information at each nucleotide position. I am doing my analysis in R. Any suggestions would be much appreciated!
 

staassis

Active Member
#2
You can run PCA on the rows rather than the columns. This would show you how the diversity values at different nucleotide positions move together. As your study conjectures, different genetic groups may show different patterns. You can then create a heat map of [K principal components] x [26 populations].
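
A minimal sketch of this in R, under some assumptions: the matrix `div` below is simulated stand-in data (1522 positions x 26 populations), the population names `Pop1` to `Pop26` are hypothetical, and K = 5 is an arbitrary choice. It also assumes that "PCA on rows" means treating the populations as observations and the positions as variables, which is what gives one score per population per component.

```r
# Simulated stand-in for one gene: 1522 positions (rows) x 26 populations (columns).
# With real data, use something like div <- as.matrix(your_data_frame) instead.
set.seed(1)
n_pos <- 1522
n_pop <- 26
div <- matrix(rbeta(n_pos * n_pop, shape1 = 1, shape2 = 4),
              nrow = n_pos, ncol = n_pop,
              dimnames = list(paste0("pos", seq_len(n_pos)),
                              paste0("Pop", seq_len(n_pop))))

# Transpose so that populations are the observations and positions the variables.
# scale. = FALSE avoids dividing by zero at positions that do not vary at all.
pca <- prcomp(t(div), center = TRUE, scale. = FALSE)

K <- 5                                   # number of components to display (arbitrary)
scores <- pca$x[, seq_len(K)]            # 26 x K matrix of population scores
var_explained <- pca$sdev^2 / sum(pca$sdev^2)

# Heat map with one row per principal component and one column per population.
heatmap(t(scores), Rowv = NA, Colv = NA, scale = "none",
        xlab = "Population", ylab = "Principal component")
```

Checking `var_explained[seq_len(K)]` gives a rough idea of how much of the variation the K displayed components cover. The loadings in `pca$rotation` show which positions drive each component, so the per-position information can still be looked up.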
 

ahusn

New Member
#3
I tried the PCA plot in R. Please find the file attached. Are you able to draw any meaningful interpretation (in terms of clustering by population) from the plots? The first one is OK, but I'm not sure I can interpret the other two. Thanks.
 

Attachments