r, k-means, pca

Get clusters from PCA in R


I have a PCA that shows two really big clusters, and I don't know how to figure out which of my samples are in each cluster.

[Image: PCA plot with sample labels]

If it helps, I'm using prcomp to generate the PCA:

library(ggfortify)  # provides autoplot() for prcomp objects
pca1 <- autoplot(prcomp(df), label = TRUE, label.size = 2)

My approach has been to attempt to cluster the PCA output using kmeans with 2 groups to get the clusters:

pca   <- prcomp(df, scale.=TRUE)
clust <- kmeans(pca$x[,1:2], centers=2)$cluster

I can then make a beautiful plot, but I am still lost as to which samples are in each cluster. For reference, here is the plot generated if I graph the kmeans output:

[Image: plot of the kmeans clusters]

As you can see in the first PCA plot, the labels literally say which sample each dot is. My ideal output would be a two column txt file with the sample name in one column, and the group it belongs to in the other column.

All that aside, if there is a better way, please let me know.

Thanks in advance.

Here is a chunk of my data:

              a        b       c       b      e
Sample_1013 312011  624559  625898  534309  220415
Sample_1046 474774  949458  951145  843049  366136
Sample_104  645363  1290450 1292520 919474  272200
Sample_1057 267319  534685  535294  690574  422645
Sample_106  414065  830571  834527  657354  234130
Sample_107  299289  602483  603756  566256  262153

Solution

  • In my question, clust is the name of the output from my kmeans:

    clust <- kmeans(pca$x[,1:2], centers=2)$cluster
    

    clust is a named vector, so I typed clust at the R console and it printed the group each sample belongs to:

    > clust
    Sample_1013     Sample_1046      Sample_104     Sample_1057      Sample_106      Sample_107 
              1               1               1               1               1               1 
    Sample_1098      Sample_109     Sample_1109     Sample_1129     Sample_1130     Sample_1140 
              1               1               1               1               1               1 
    Sample_1149      Sample_115      Sample_118     Sample_1220     Sample_1223     Sample_1225 
              1               1               1               1               1               1 
    

    Hopefully this helps someone.
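
For the two-column text file asked for in the question, something along these lines might work. This is only a minimal sketch, not a tested recipe for the full data set: it rebuilds df from the six rows shown in the question (the second "b" column is renamed "d" purely so the column names are unique, which is an assumption), and the file name clusters.txt is hypothetical. Because kmeans starts from random centers, the labels 1 and 2 can swap between runs, so the sketch fixes a seed.

```r
# Toy data: the six rows shown in the question. The second "b" column is
# renamed "d" here only to keep column names unique (an assumption).
df <- data.frame(
  a = c(312011, 474774, 645363, 267319, 414065, 299289),
  b = c(624559, 949458, 1290450, 534685, 830571, 602483),
  c = c(625898, 951145, 1292520, 535294, 834527, 603756),
  d = c(534309, 843049, 919474, 690574, 657354, 566256),
  e = c(220415, 366136, 272200, 422645, 234130, 262153),
  row.names = c("Sample_1013", "Sample_1046", "Sample_104",
                "Sample_1057", "Sample_106", "Sample_107")
)

set.seed(42)  # kmeans uses random starting centers; fix the seed for repeatable labels
pca   <- prcomp(df, scale. = TRUE)
clust <- kmeans(pca$x[, 1:2], centers = 2, nstart = 25)$cluster

# clust is a named integer vector: names are sample names, values are cluster ids
groups <- data.frame(sample = names(clust),
                     cluster = unname(clust),
                     stringsAsFactors = FALSE)

# Write a two-column, tab-separated text file ("clusters.txt" is a hypothetical name)
write.table(groups, file = "clusters.txt", sep = "\t",
            quote = FALSE, row.names = FALSE)

# List the sample names in each cluster
split(groups$sample, groups$cluster)
```

The resulting file has a header line followed by one sample per row, and split() gives a quick per-cluster list of sample names at the console.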