python numpy scikit-learn k-means

k-means with selected initial centers


I am trying to run k-means clustering with selected initial centroids. The scikit-learn documentation says this about specifying your initial centers:

init : {‘k-means++’, ‘random’ or an ndarray} 

If an ndarray is passed, it should be of shape (n_clusters, n_features) and gives the initial centers.

My code in Python:

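import numpy as np
from sklearn.cluster import KMeans

# Initial centers, shape (n_clusters, n_features) = (3, 2)
X = np.array([[-19.0748, -8.536],
              [22.0108, -10.9737],
              [12.6597, 19.2601]], np.float64)
# data is the (n_samples, 2) array being clustered (defined elsewhere)
km = KMeans(n_clusters=3, init=X).fit(data)
centers = km.cluster_centers_
print(centers)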

This produces the following message:

RuntimeWarning: Explicit initial center position passed: performing only one init in k-means instead of n_init=10
  n_jobs=self.n_jobs)

and returns the same initial centers. Any idea how to format the initial centers so that they are accepted?


Solution

  • By default, KMeans runs the algorithm several times, each time starting from a different set of initial centroids. Those centroids are chosen with k-means++ by default, or by picking random observations (i.e. the Forgy method) when init='random'. The number of runs is controlled by the n_init= parameter (docs):

    n_init : int, default: 10

    Number of times the k-means algorithm will be run with different centroid seeds. The final result will be the best output of n_init consecutive runs in terms of inertia.

    If you pass an array as the init= argument then only a single initialization will be performed using the centroids explicitly specified in the array. You are getting a RuntimeWarning because you are still passing the default value of n_init=10 (here are the relevant lines of source code).

    It's actually totally fine to ignore this warning, but you can make it go away completely by passing n_init=1 if your init= parameter is an array.
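
As a minimal sketch of the fix described above: the synthetic data, the random seed, and the names true_centers and init_centers are made up for illustration, and only the initial center values come from the question. Passing n_init=1 together with an array for init= should run a single initialization without triggering the warning:

```python
import warnings

import numpy as np
from sklearn.cluster import KMeans

# Hypothetical data: three well-separated 2-D blobs, generated only for illustration.
rng = np.random.default_rng(0)
true_centers = np.array([[-19.0, -8.5], [22.0, -11.0], [12.7, 19.3]])
data = np.vstack([c + rng.normal(scale=1.0, size=(50, 2)) for c in true_centers])

# Initial centers from the question, shape (n_clusters, n_features) = (3, 2).
init_centers = np.array([[-19.0748, -8.536],
                         [22.0108, -10.9737],
                         [12.6597, 19.2601]], dtype=np.float64)

# Record warnings so we can check whether the "Explicit initial center" one appears.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # n_init=1 matches the single initialization that an explicit array implies.
    km = KMeans(n_clusters=3, init=init_centers, n_init=1).fit(data)

init_warnings = [w for w in caught if "Explicit initial center" in str(w.message)]

print(km.cluster_centers_)
print("explicit-init warnings:", len(init_warnings))
```

Each cluster starts from the corresponding row of init_centers, so the rows of km.cluster_centers_ should come back in the same order as the array you passed in.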