I'm trying to run the k-means algorithm with predefined centroids. I have looked at the following posts:
1. R k-means algorithm custom centers
2. Set static centers for kmeans in R
However, every time I run the command:
km = kmeans(df_std[,c(10:13)], centers = centroids)
I get the following error:
**Error: empty cluster: try a better set of initial centers**
I have defined the centroids as:
centroids = matrix(c(140.12774, 258.62615, 239.36800, 77.43235,
33.37736, 58.73077, 68.80000, 12.11765,
0.8937264, 0.8118462, 0.8380000, 0.8052941,
11.989858, 12.000000, 8.970000, 1.588235),
ncol = 4, byrow = TRUE)
My data is a subset of a data frame, df_std, which has already been scaled:
df_std[,c(10:13)]
Why does R give the above error? Any help would be highly appreciated!
While browsing for the specific error that I posted above:
Error: empty cluster: try a better set of initial centers
I found the following link to a conversation:
http://r.789695.n4.nabble.com/Empty-clusters-in-k-means-possible-solution-td4667114.html
Broadly speaking, the above error is raised when at least one initial centre ends up with no data points assigned to it, which happens when the centroids do not match the data.
It can happen when centers is a number k: because the k-means algorithm starts from randomly chosen centres, there is a possibility that the centres do not match the data.
It can also happen when centers is a matrix of centroids (my case). The problem was that my data was scaled but my centroids were unscaled.
The linked conversation made me realise that there was a bug in my code. I hope this helps someone in a similar situation!
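
For anyone hitting the same mismatch, here is a minimal sketch of one way to put the centroids on the same scale as the data. It assumes the data were standardised with scale(), so that the centring and scaling values are still available as attributes. The names raw, scaled and raw_centroids and the simulated values are hypothetical stand-ins, not my actual df_std.

```r
set.seed(42)

# Hypothetical unscaled data: two well-separated groups, two variables
raw <- data.frame(
  a = c(rnorm(50, mean = 80, sd = 10), rnorm(50, mean = 250, sd = 10)),
  b = c(rnorm(50, mean = 12, sd = 3),  rnorm(50, mean = 60,  sd = 3))
)

# scale() returns a matrix and stores the column means and standard
# deviations in the "scaled:center" and "scaled:scale" attributes
scaled <- scale(raw)

# Hypothetical centroids in the ORIGINAL units:
# one row per cluster, one column per variable
raw_centroids <- matrix(c( 80, 12,
                          250, 60),
                        ncol = 2, byrow = TRUE)

# Apply the same centring and scaling to the centroids as to the data
scaled_centroids <- scale(raw_centroids,
                          center = attr(scaled, "scaled:center"),
                          scale  = attr(scaled, "scaled:scale"))

# Data and centroids now live on the same scale
km <- kmeans(scaled, centers = scaled_centroids)
km$size
```

If the data frame was scaled some other way, the same means and standard deviations used for the data would have to be applied to the centroids by hand. It is also worth checking that the centroid matrix has one row per cluster and one column per variable, since that is the layout kmeans expects for centers.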