Tags: r, cluster-analysis, som

How to show the cluster number on each node of a SOM plot?


After training a SOM, I want to find out which node each wine is mapped to.

So the first step is to build a data.frame with the name (ID) of each wine and the cluster that wine belongs to. The next step would be to show the cluster number on the plot itself, but I don't know how :)

library(kohonen)   # som(), somgrid(), add.cluster.boundaries() and the wines data

data(wines)
View(wines)

# add an ID to each wine
wines <- as.data.frame(wines)
wines$ID <- seq.int(nrow(wines))

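# drop the ID column (column 14) before training; the ID stays in `wines` as the wine's "name"

som_wines <- wines[, -14]
som_model <- som(scale(som_wines), grid = somgrid(5, 5, "hexagonal"))
som_codes <- as.data.frame(som_model$codes[[1]])   # codebook vectors (kohonen >= 3 stores them in a list)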

# elbow plot to choose the number of clusters

mydata <- som_codes
wss <- (nrow(mydata) - 1) * sum(apply(mydata, 2, var))
for (i in 2:15) {
  wss[i] <- sum(kmeans(mydata, centers = i)$withinss)
}
plot(1:15, wss, type = "b", xlab = "Number of clusters", ylab = "Within-groups sum of squares")

# SOM plot coloured by cluster

som_cluster <- cutree(hclust(dist(som_codes)), 3)
plot(som_model, type = "codes", bgcol = som_cluster, main = "Clusters")
add.cluster.boundaries(som_model, som_cluster)

# Here we get 3 clusters. Build a data frame that maps each wine ID to its cluster.

cluster_details <- data.frame(id = wines$ID, cluster = som_cluster[som_model$unit.classif])
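To also record which node each wine is mapped to, the winning unit index in `som_model$unit.classif` can be stored next to the cluster, together with that node's position from `som_model$grid$pts`. The sketch below is a minimal, self-contained version, assuming the kohonen package (3.x, where the codebook vectors live in `som_model$codes[[1]]`); the names `wine_df`, `node`, `x`, `y` and `wines_per_node` are chosen here for illustration only.

```r
library(kohonen)

data(wines)
wine_df <- as.data.frame(wines)
wine_df$ID <- seq_len(nrow(wine_df))

set.seed(1)  # som() starts from randomly chosen codebook vectors
som_model <- som(scale(wine_df[, -14]), grid = somgrid(5, 5, "hexagonal"))

# cluster the 25 codebook vectors into 3 groups, as in the question
som_cluster <- cutree(hclust(dist(som_model$codes[[1]])), 3)

# one row per wine: its winning node, that node's grid position and its cluster
node <- som_model$unit.classif
cluster_details <- data.frame(
  id      = wine_df$ID,
  node    = node,
  x       = som_model$grid$pts[node, "x"],
  y       = som_model$grid$pts[node, "y"],
  cluster = som_cluster[node]
)
head(cluster_details)

# list of wine IDs per node (only nodes that received at least one wine appear)
wines_per_node <- split(cluster_details$id, cluster_details$node)
```

`split()` makes it easy to look up all wines on a given node, e.g. `wines_per_node[["7"]]` for node 7 if that node is non-empty.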

Now I want the cluster numbers to be shown on the SOM plot itself. Are there any suggestions on how to do that? I would appreciate any answer :)


Solution

  • The answer can be found in the question "add clusters and nodes from SOMbrero package to training data".

    In particular, these lines:

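    library(kohonen)
    
    # TrainingMatrix (scaled numeric matrix), GridDefinition (a somgrid() object) and
    # OrginalData (the data frame the rows came from) are placeholders from the linked answer
    SomModel <- som(
        TrainingMatrix,              # training data passed positionally (first argument)
        grid = GridDefinition,
        rlen = 10000,                # number of passes over the data
        alpha = c(0.05, 0.01),       # learning rate, declining from 0.05 to 0.01
        keep.data = TRUE
    )
    
    # number of observations mapped to each unit, used as weights for Ward clustering;
    # assumes every unit got at least one observation, otherwise length(nb) != number of units
    nb <- table(SomModel$unit.classif)
    groups <- 5
    # cluster the codebook vectors (kohonen >= 3 stores them in a list, hence [[1]])
    tree.hc <- cutree(hclust(d = dist(SomModel$codes[[1]]), method = "ward.D2", members = nb), groups)
    
    # give each original observation its cluster and the x/y position of its winning node
    result <- OrginalData
    result$Cluster <- tree.hc[SomModel$unit.classif]
    result$X <- SomModel$grid$pts[SomModel$unit.classif, "x"]
    result$Y <- SomModel$grid$pts[SomModel$unit.classif, "y"]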
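To get the cluster numbers onto the SOM plot itself, one option is to write them with base R's `text()` at the node centres stored in `som_model$grid$pts`, which is the coordinate system kohonen plots in. The sketch below assumes the kohonen package (3.x) and the same 5 x 5 hexagonal grid and 3 clusters as in the question; the light colours in `cluster_cols` are an arbitrary choice so that black labels stay readable (with `bgcol = som_cluster`, cluster 1 is filled black).

```r
library(kohonen)

data(wines)
set.seed(1)  # som() starts from randomly chosen codebook vectors
som_model <- som(scale(wines), grid = somgrid(5, 5, "hexagonal"))

# 3 clusters of codebook vectors, as in the question
som_cluster <- cutree(hclust(dist(som_model$codes[[1]])), 3)

# light fill colours so the black labels remain visible
cluster_cols <- c("lightblue", "lightgreen", "lightpink")
plot(som_model, type = "codes", bgcol = cluster_cols[som_cluster], main = "Clusters")
add.cluster.boundaries(som_model, som_cluster)

# grid$pts holds the centre of every node in plot coordinates,
# so text() writes each node's cluster number at its centre
text(som_model$grid$pts[, "x"], som_model$grid$pts[, "y"],
     labels = som_cluster, cex = 1.5, font = 2)
```

Changing `labels` to `seq_along(som_cluster)` would print the node index instead, which can then be matched against `som_model$unit.classif` to see which wines sit on which node.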