I have a PCA plot that shows two very large clusters, and I don't know how to figure out which of my samples fall into each cluster.
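If it helps, I'm using prcomp to generate the PCA and autoplot (from the ggfortify package) to draw it:
library(ggfortify)  # provides autoplot() for prcomp objects
pca1 <- autoplot(prcomp(df), label = TRUE, label.size = 2)  # note: unscaled PCA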
My approach has been to cluster the PCA scores with kmeans, using 2 centers, to get the cluster assignments:
pca <- prcomp(df, scale. = TRUE)
clust <- kmeans(pca$x[, 1:2], centers = 2)$cluster  # cluster on PC1 and PC2 only
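A minimal sketch of the same pipeline, assuming R's built-in USArrests data as a stand-in for my df. It shows two things: the PC scores in pca$x keep the row names of the input, so the vector returned by kmeans is already labeled by sample; and because kmeans starts from random centers, set.seed() plus nstart (the seed and nstart values here are arbitrary choices) make the 1/2 labels reproducible between runs.

```r
# Stand-in data: USArrests ships with base R (datasets package), one row per state
pca <- prcomp(USArrests, scale. = TRUE)

# kmeans() picks random starting centers, so fix the seed and keep the
# best of 25 random starts to get stable cluster labels
set.seed(42)
km <- kmeans(pca$x[, 1:2], centers = 2, nstart = 25)

# km$cluster is an integer vector named by the row names of pca$x,
# which are the row names of the original data
clust <- km$cluster
head(clust)
```

Without a fixed seed, the same two groups can come back with labels 1 and 2 swapped, which matters if the assignments are written to a file.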
I can then make a beautiful plot, but I am still lost as to which samples are in each cluster. For reference, here is the plot generated when I graph the kmeans output:
[Figure: PCA plot with points colored by the two k-means clusters]
As you can see in the first PCA plot, the labels show which sample each dot is. My ideal output would be a two-column text file with the sample name in one column and the cluster it belongs to in the other.
All that aside, if there is a better way to do this, please let me know.
Thanks in advance.
Here is a chunk of my data:
a b c b e
Sample_1013 312011 624559 625898 534309 220415
Sample_1046 474774 949458 951145 843049 366136
Sample_104 645363 1290450 1292520 919474 272200
Sample_1057 267319 534685 535294 690574 422645
Sample_106 414065 830571 834527 657354 234130
Sample_107 299289 602483 603756 566256 262153
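To make this reproducible, here is a sketch that rebuilds just the six rows above and runs the same scaled PCA plus kmeans on them. Assumptions: the header is posted with one field fewer than each data row, so read.table() uses the first column as row names, and it renames the duplicated "b" column to "b.1". With only six samples the two groups are illustrative, not the two large clusters in my full data.

```r
# The six rows posted above, as text
txt <- "a b c b e
Sample_1013 312011 624559 625898 534309 220415
Sample_1046 474774 949458 951145 843049 366136
Sample_104 645363 1290450 1292520 919474 272200
Sample_1057 267319 534685 535294 690574 422645
Sample_106 414065 830571 834527 657354 234130
Sample_107 299289 602483 603756 566256 262153"

# Header has one field fewer than the data rows, so the sample names become row names
df <- read.table(text = txt, header = TRUE)

# Same pipeline as in the question, with a fixed seed for reproducible labels
pca <- prcomp(df, scale. = TRUE)
set.seed(1)
clust <- kmeans(pca$x[, 1:2], centers = 2, nstart = 10)$cluster

# Sample names grouped by cluster number
split(names(clust), clust)
```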
It turns out the answer was already in my question: clust is the name of the output from my kmeans call:
clust <- kmeans(pca$x[,1:2], centers=2)$cluster
I typed clust into the R console, and it shows which cluster each sample belongs to:
> clust
Sample_1013 Sample_1046 Sample_104 Sample_1057 Sample_106 Sample_107
1 1 1 1 1 1
Sample_1098 Sample_109 Sample_1109 Sample_1129 Sample_1130 Sample_1140
1 1 1 1 1 1
Sample_1149 Sample_115 Sample_118 Sample_1220 Sample_1223 Sample_1225
1 1 1 1 1 1
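To get the two-column text file I originally wanted, the named vector can be turned into a data frame and written out. A minimal sketch, assuming a tab-separated file; to keep it self-contained, clust is filled in with the first six values printed above (in a real session it would come straight from kmeans), and "clusters.txt" in the session's temporary directory is an illustrative file name.

```r
# Stand-in for the kmeans output: the first six entries printed above
clust <- c(Sample_1013 = 1L, Sample_1046 = 1L, Sample_104 = 1L,
           Sample_1057 = 1L, Sample_106 = 1L, Sample_107 = 1L)

# Column 1: sample name (from the vector's names); column 2: cluster number
out <- data.frame(sample = names(clust), cluster = unname(clust))

# Tab-separated, no quotes, no extra row-name column
out_file <- file.path(tempdir(), "clusters.txt")
write.table(out, out_file, sep = "\t", quote = FALSE, row.names = FALSE)

readLines(out_file)
```

Setting row.names = FALSE matters here: otherwise write.table() adds an unnamed row-index column and the file ends up with three columns instead of two.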
Hopefully this helps someone.