
How to fix the kmeans error in R: 'more cluster centers than distinct data points'


When I run the kmeans algorithm I receive this error:

Error in kmeans(x, 2, 15) : 
  more cluster centers than distinct data points.

How can this error be fixed, and what does it mean? As far as I can tell, my data points are distinct.

Here are my files and the R code I am using to run kmeans:

rnames.csv : 
"a1","a2","a3"

cells.csv : 
0,1,2,1,4,3,5,3,4

cnames.csv : 
"google","so","test"

cells = c(read.csv("c:\\data-files\\kmeans\\cells.csv", header = TRUE))
rnames = c(read.csv("c:\\data-files\\kmeans\\rnames.csv", header = TRUE))
cnames = c(read.csv("c:\\data-files\\kmeans\\cnames.csv", header = TRUE))

x <- matrix(cells, nrow=3, ncol=3, byrow=TRUE, dimnames=list(rnames, cnames))

# run K-Means
km <- kmeans(x, 2, 15)

Solution

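  • The fix is to read the files with header = FALSE. Each file contains a single line, and header = TRUE treats that line as column names, so no data rows are left for kmeans to cluster. Use: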

    cells = c(read.csv("c:\\data-files\\kmeans\\cells.csv", header = FALSE))
    rnames = c(read.csv("c:\\data-files\\kmeans\\rnames.csv", header = FALSE))
    cnames = c(read.csv("c:\\data-files\\kmeans\\cnames.csv", header = FALSE))
    

    instead of:

    cells = c(read.csv("c:\\data-files\\kmeans\\cells.csv", header = TRUE))
    rnames = c(read.csv("c:\\data-files\\kmeans\\rnames.csv", header = TRUE))
    cnames = c(read.csv("c:\\data-files\\kmeans\\cnames.csv", header = TRUE))
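The effect of the header argument can be checked directly. The following is a minimal sketch, assuming the three files contain exactly the single lines shown in the question. The inline strings passed through `text =` stand in for the files on disk, and the `unlist()` flattening and the `nrow(unique(x))` check are additions for illustration, not part of the original answer.

```r
# Inline stand-ins for the three one-line CSV files from the question.
cells_txt  <- "0,1,2,1,4,3,5,3,4"
rnames_txt <- '"a1","a2","a3"'
cnames_txt <- '"google","so","test"'

# header = TRUE consumes the only line as column names: zero data rows remain.
with_header <- read.csv(text = cells_txt, header = TRUE)
nrow(with_header)

# header = FALSE keeps that line as a single row of data.
without_header <- read.csv(text = cells_txt, header = FALSE)
nrow(without_header)

# Flatten the one-row data frames into plain vectors before building the matrix.
cells  <- as.numeric(unlist(without_header))
rnames <- as.character(unlist(read.csv(text = rnames_txt, header = FALSE)))
cnames <- as.character(unlist(read.csv(text = cnames_txt, header = FALSE)))

x <- matrix(cells, nrow = 3, ncol = 3, byrow = TRUE,
            dimnames = list(rnames, cnames))

# kmeans needs at least as many distinct rows as requested centers.
nrow(unique(x))

# Run k-means with 2 centers and at most 15 iterations.
km <- kmeans(x, centers = 2, iter.max = 15)
km$cluster
```

Comparing `nrow(unique(x))` with the number of centers before calling `kmeans` is a quick way to see whether this error is about to occur. The cluster labels themselves depend on the random starting centers, so they can differ between runs.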