Tags: r, classification, cluster-analysis, k-means, centroid

K-means clustering with pre-defined centroids


I'm trying to run the k-means algorithm with predefined centroids. I have had a look at the following posts:

1. R k-means algorithm custom centers

2. Set static centers for kmeans in R

However, every time I run the command:

km = kmeans(df_std[,c(10:13)], centers = centroids)

I get the following error:

**Error: empty cluster: try a better set of initial centers**

I have defined the centroids as:

centroids = matrix(c(140.12774, 258.62615, 239.36800, 77.43235,
                      33.37736, 58.73077,  68.80000,  12.11765,
                     0.8937264, 0.8118462, 0.8380000, 0.8052941,
                     11.989858, 12.000000, 8.970000,  1.588235),
ncol = 4, byrow = TRUE)

My data is a subset of a data frame, say df_std, and it has already been scaled:

df_std[,c(10:13)]

Why does R give the above error? Any help on this would be highly appreciated!


Solution

  • While searching for the specific error that I posted above:

    Error: empty cluster: try a better set of initial centers
    

    I found the following link to a conversation:

    http://r.789695.n4.nabble.com/Empty-clusters-in-k-means-possible-solution-td4667114.html

    Broadly speaking, the above error is raised when at least one of the initial centroids ends up with no data points assigned to it, i.e. the centroids don't match the data.

    It can happen when centers is a number k: because k-means then starts from randomly chosen centres, there is a possibility that the centres do not match the data.

    It may also happen when centers is a matrix of centroids (my case). The problem was that my data was scaled but my centroids were unscaled, so the centroids have to be put on the same scale as the data.

    The linked conversation made me realise that there was a bug in my code. Hope it will help someone in a similar situation!
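
To illustrate, here is a minimal sketch in R of the mismatch and one possible fix. It assumes the data was standardised with scale(), so that the column means and standard deviations can be read back from its attributes. The data frame raw, its column names, and the centroid values are made up for the example; they are not the data from the question.

```r
set.seed(42)

# Hypothetical raw data: four numeric columns on very different scales,
# drawn from two well-separated groups of 50 rows each.
raw <- data.frame(
  a = c(rnorm(50, 140, 10),    rnorm(50, 250, 10)),
  b = c(rnorm(50, 33, 3),      rnorm(50, 60, 3)),
  c = c(rnorm(50, 0.89, 0.02), rnorm(50, 0.81, 0.02)),
  d = c(rnorm(50, 12, 1),      rnorm(50, 2, 1))
)

# Standardise the data; scale() keeps the column means and SDs as attributes.
std <- scale(raw)
mu  <- attr(std, "scaled:center")
sdv <- attr(std, "scaled:scale")

# Centroids in the ORIGINAL units: one row per cluster, one column per variable.
centroids_raw <- matrix(c(140, 33, 0.89, 12,
                          250, 60, 0.81,  2),
                        ncol = 4, byrow = TRUE)

# Unscaled centroids against scaled data: every point is closest to the same
# centroid, so the other cluster starts empty and kmeans() stops with an error.
res_bad <- tryCatch(kmeans(std, centers = centroids_raw),
                    error = function(e) conditionMessage(e))
print(res_bad)

# Possible fix: put the centroids on the same scale as the data
# by reusing the data's own means and SDs.
centroids_std <- scale(centroids_raw, center = mu, scale = sdv)
km <- kmeans(std, centers = centroids_std)
print(km$size)
```

If the data was scaled some other way (for example min-max scaling), the same transformation would have to be applied to the centroids instead. Note also that kmeans() expects one row per centre and one column per variable, so it is worth checking that the centroid matrix is laid out that way.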