Search code examples
python-3.xscikit-learnk-means

Why does kmeans give exactly the same results everytime?


I have re-run kmeans 4 times and get

enter image description here

From other answers, I got that

Everytime K-Means initializes the centroid, it is generated randomly.

Could you please explain why the results are exactly the same each time?

import numpy as np 
import pandas as pd
import matplotlib.pyplot as plt 
from sklearn.cluster import KMeans
%config InlineBackend.figure_format = 'svg' # Change the image format to svg for better quality
don = pd.read_csv('https://raw.githubusercontent.com/leanhdung1994/Deep-Learning/main/donclassif.txt.gz', sep=';')

fig, ax = plt.subplots(nrows=2, ncols=2, figsize= 2 * np.array(plt.rcParams['figure.figsize']))

for row in ax:
    for col in row:
        kmeans = KMeans(n_clusters = 4)
        kmeans.fit(don)
        y_kmeans = kmeans.predict(don)
        col.scatter(don['V1'], don['V2'], c = y_kmeans, cmap = 'viridis')
        centers = kmeans.cluster_centers_
        col.scatter(centers[:, 0], centers[:, 1], c = 'red', s = 200, alpha = 0.5);

plt.show()

Solution

  • I post @AEF's comment to remove this question from unanswered list.

    Random initialziation does not necessarily mean random result. Easiest example: k-means with k=1 always finds the mean in one step, regardless of where the center is initialised.