I am using the fviz_cluster function from the factoextra package. There is a nice tutorial at https://afit-r.github.io/kmeans_clustering that shows how to use it to visualize clusters. That is all straightforward. The data used in the tutorial is df <- USArrests. When viewing the data, it shows as
'data.frame': 50 obs. of 4 variables:
$ Murder : num 13.2 10 8.1 8.8 9 7.9 3.3 5.9 15.4 17.4 ...
$ Assault : int 236 263 294 190 276 204 110 238 335 211 ...
$ UrbanPop: int 58 48 80 50 91 78 77 72 80 60 ...
$ Rape : num 21.2 44.5 31 19.5 40.6 38.7 11.1 15.8 31.9 25.8 ...
The data frame's first column, which holds the key of each observation, does not have a column header. With that, the function works great.
My data obviously has a column header. How should I make my data look like USArrests so that this works, given that I then need to append the cluster number back to the data?
My data has 11 columns, including the first column with the observation name, so I did the clustering while skipping the first column using
[,2:11]
When I visualize this using
fviz_cluster(allLfit, data = allLdf[,2:11])
it works, but the plot uses ambiguous names.
Any suggestions?
Thanks!!!
structure(list(PIN = structure(1:5, .Label = c("a", "b", "c",
"d", "e"), class = "factor"), v1 = c(0.8, 0.36, 0.21, 0.84, 0.43
), v2 = c(0.87, 0.01, 0.56, 0.75, 0.98), v3 = c(0.48, 0.13, 0.26,
0.34, 0.83)), row.names = c(NA, 5L), class = "data.frame")
Following the same procedure as in the link, scale the numeric columns to create 'df' while setting the row.names from the first column, run kmeans to get 'k2', and call fviz_cluster on 'k2', specifying 'df' as the scaled dataset
library(factoextra)
library(cluster)
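# Use the first column as row names, drop it, and scale the remaining columns
df <- scale(`row.names<-`(allLdf[-1], allLdf[[1]]))
# k-means with 2 centers and 25 random starts
k2 <- kmeans(df, centers = 2, nstart = 25)
# The points in the plot are labeled with the row names of df
fviz_cluster(k2, data = df)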
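The question also asks how to append the cluster number back to the original data. A minimal sketch, assuming the example data from the dput() output above and that the first column (PIN) uniquely identifies each row: kmeans returns a `cluster` vector named by the row names of its input, so it can be matched back by key. The seed value is arbitrary and only there for reproducibility.

```r
# Example data rebuilt from the dput() output in the question
allLdf <- data.frame(
  PIN = factor(c("a", "b", "c", "d", "e")),
  v1 = c(0.8, 0.36, 0.21, 0.84, 0.43),
  v2 = c(0.87, 0.01, 0.56, 0.75, 0.98),
  v3 = c(0.48, 0.13, 0.26, 0.34, 0.83)
)

# Same preparation as above: PIN becomes the row names, the rest is scaled
df <- scale(`row.names<-`(allLdf[-1], allLdf[[1]]))

set.seed(123)  # assumption: arbitrary seed, for reproducibility only
k2 <- kmeans(df, centers = 2, nstart = 25)

# k2$cluster is named by the row names of df, so look each PIN up by name;
# this stays correct even if the rows of allLdf are later reordered
allLdf$cluster <- unname(k2$cluster[as.character(allLdf$PIN)])

allLdf
```

Because the rows of df are in the same order as allLdf here, `allLdf$cluster <- k2$cluster` would give the same result; matching by name is just a little more defensive.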