
How to use fviz_cluster to visualize clusters when the observation names are in a named first column of my data


I am using the fviz_cluster function from the factoextra package. There is a nice tutorial at https://afit-r.github.io/kmeans_clustering that shows how to use it to visualize clusters. That is all straightforward. The data used in the tutorial is df <- USArrests. Its structure is:

'data.frame':   50 obs. of  4 variables:
 $ Murder  : num  13.2 10 8.1 8.8 9 7.9 3.3 5.9 15.4 17.4 ...
 $ Assault : int  236 263 294 190 276 204 110 238 335 211 ...
 $ UrbanPop: int  58 48 80 50 91 78 77 72 80 60 ...
 $ Rape    : num  21.2 44.5 31 19.5 40.6 38.7 11.1 15.8 31.9 25.8 ...

The key of each observation appears as a first column without a column header (it is stored in the data frame's row names rather than in a regular column). With that layout, fviz_cluster works great.

My data, however, has the observation names in an ordinary first column with a header. How can I make my data look like USArrests so that this works, and then append the cluster number back to the data?

My data has 11 columns, including the first column with the observation names, so I ran the clustering on the remaining columns by selecting

[,2:11]

When I visualize the result with

fviz_cluster(allLfit, data = allLdf[,2:11])

it works, but the plot labels the points with ambiguous names.

Any suggestions??

Thanks!!!

structure(list(PIN = structure(1:5, .Label = c("a", "b", "c", 
"d", "e"), class = "factor"), v1 = c(0.8, 0.36, 0.21, 0.84, 0.43
), v2 = c(0.87, 0.01, 0.56, 0.75, 0.98), v3 = c(0.48, 0.13, 0.26, 
0.34, 0.83)), row.names = c(NA, 5L), class = "data.frame")



Solution

  • Follow the same procedure as in the linked tutorial: set the row names from the first column, scale the numeric columns to create 'df', fit the kmeans model ('k2'), and call fviz_cluster on 'k2' with 'df' as the scaled dataset.

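    library(factoextra)
    library(cluster)
    # Drop the key column, use its values as the row names, then scale the numeric columns
    df <- scale(`row.names<-`(allLdf[-1], allLdf[[1]]))
    k2 <- kmeans(df, centers = 2, nstart = 25)
    # Points in the plot are labeled with the row names of df
    fviz_cluster(k2, data = df)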
    

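The question also asks how to append the cluster number back to the original data, which the snippet above does not show. A minimal sketch, assuming the sample data posted in the question and only base R (so the fviz_cluster call is left out): the `cluster` column name and the `set.seed` value are illustrative choices, not part of the original answer.

```r
# Sample data from the question: PIN holds the observation names.
allLdf <- data.frame(
  PIN = factor(c("a", "b", "c", "d", "e")),
  v1 = c(0.8, 0.36, 0.21, 0.84, 0.43),
  v2 = c(0.87, 0.01, 0.56, 0.75, 0.98),
  v3 = c(0.48, 0.13, 0.26, 0.34, 0.83)
)

# Same idea as the one-liner above, written in steps:
# drop the key column, move its values into the row names, then scale.
num <- allLdf[-1]
rownames(num) <- allLdf[[1]]
df <- scale(num)

set.seed(123)  # arbitrary seed, only to make the sketch reproducible
k2 <- kmeans(df, centers = 2, nstart = 25)

# k2$cluster is named by the row names of df, so match on the key
# instead of relying on row order. 'cluster' is an illustrative column name.
allLdf$cluster <- unname(k2$cluster[as.character(allLdf$PIN)])
allLdf
```

Matching on the names of `k2$cluster` keeps the assignment correct even if the rows of the original data frame are reordered after clustering.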