python, data-science, cluster-analysis, k-means

When using scikit-learn K-Means Clustering, how can you extract the centroids in the original data domain?


I am using the sklearn KMeans k-means clustering algorithm. Before clustering, I normalize my data to the range [0,1] using

from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler()
scaled_features = scaler.fit_transform(data)

Now, I can run the K-means algorithm.

from sklearn.cluster import KMeans

kmeans = KMeans(
    init="random",
    n_clusters=3,
    n_init=10,
    max_iter=3000,
)
kmeans.fit(scaled_features)

Then, I can extract the 3 cluster centroids using kmeans.cluster_centers_. However, these centroids are in the normalized domain [0,1]. How can I transform them back to the original data domain?


Solution

  • Get the centroid coordinates in the [0,1] scale from kmeans.cluster_centers_, then pass them to scaler.inverse_transform to convert them back to the original coordinates.
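
A minimal end-to-end sketch of that approach follows. The toy `data` array and the `random_state=0` argument are assumptions added here so the example runs on its own and is repeatable; they are not part of the original question. Because MinMaxScaler applies a per-feature affine map, inverting it on the centroids should give the same result as undoing the scaling by hand.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import MinMaxScaler

# Hypothetical toy data: two features on very different scales.
data = np.array([
    [1.0, 100.0], [1.2, 110.0],
    [5.0, 500.0], [5.2, 510.0],
    [9.0, 900.0], [9.2, 910.0],
])

# Scale each feature to [0, 1]; the scaler remembers the per-feature min and max.
scaler = MinMaxScaler()
scaled_features = scaler.fit_transform(data)

# Cluster in the scaled domain (random_state is an assumption, for repeatability).
kmeans = KMeans(init="random", n_clusters=3, n_init=10, max_iter=300, random_state=0)
kmeans.fit(scaled_features)

# Centroids in the normalized [0, 1] domain, shape (n_clusters, n_features).
centroids_scaled = kmeans.cluster_centers_

# Map the centroids back to the original units using the same fitted scaler.
centroids_original = scaler.inverse_transform(centroids_scaled)
print(centroids_original)
```

The important detail is to reuse the same fitted `scaler` object that produced `scaled_features`; a freshly constructed MinMaxScaler has not learned the data's minimum and maximum and cannot invert the transform.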