I am having difficulty interpreting the results of the cluster_centers_
array output.
Consider the following MWE:
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
import numpy as np
# Load the data
iris = load_iris()
X, y = iris.data, iris.target
# shuffle the data
shuffle = np.random.permutation(np.arange(X.shape[0]))
X = X[shuffle]
# scale X
X = (X - X.mean()) / X.std()
# plot K-means centroids
km = KMeans(n_clusters=2, n_init=10)  # establish the model
# fit the data
km.fit(X)
# km centers
km.cluster_centers_
array([[ 1.43706001, -0.29278015, 0.75703227, -0.89603057],
[ 0.78079175, -0.04797174, -0.96467783, -1.60799713]])
In the array above, it is unclear to me how to use these values to identify the cluster centers. I asked K-Means for 2 clusters, yet it returns 8 values, and they cannot be (x, y) coordinates for all 4 features.
If I plot (1.43706001, -0.29278015), this makes intuitive sense: it's a point right in the middle of a predicted cluster.
So if this is the case, and my second cluster center is (0.78079175, -0.04797174), what are the values in columns 2 and 3 for?
From the documentation, cluster_centers_ is an ndarray of shape (n_clusters, n_features).
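The iris dataset has 4 features (X.shape = (150, 4)), and you asked KMeans for two centroids in that 4-dimensional feature space. cluster_centers_ gives you exactly that: each row of the array holds the coordinates of one centroid in R^4, one value per feature. The pair you plotted is only the projection of a centroid onto the first two features; columns 2 and 3 are the same centroid's coordinates along the remaining two features (petal length and petal width).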
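To make this concrete, here is a minimal sketch that repeats the setup from the question and labels each centroid coordinate with its feature name. The random_state=0 argument and the variable name centers are my own additions for reproducibility and readability; they are not part of the original code. The last loop checks that each centroid is (up to a small tolerance) the per-feature mean of the points assigned to it, which is what I would expect once the algorithm has converged.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris

iris = load_iris()
X = iris.data
# Same global scaling as in the question (one mean/std over the whole array)
X = (X - X.mean()) / X.std()

# random_state is an assumption added here so the run is repeatable
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
centers = km.cluster_centers_

# One row per cluster, one column per feature
print(centers.shape)

# Label every coordinate of every centroid with its feature name
for k, center in enumerate(centers):
    print(f"cluster {k}:")
    for name, value in zip(iris.feature_names, center):
        print(f"  {name}: {value:.3f}")

# Each centroid should match the per-feature mean of its assigned points
for k in range(km.n_clusters):
    members = X[km.labels_ == k]
    print(k, np.allclose(members.mean(axis=0), centers[k], atol=1e-3))
```

For a 2-D scatter plot you have to pick two of the four columns, for example centers[:, 0] and centers[:, 1] for the sepal features or centers[:, 2] and centers[:, 3] for the petal features. Each such pair is a projection of the same two centroids.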