I have data on which I want to run k-means clustering:
from sklearn.cluster import KMeans

num_clusters = 5
km = KMeans(n_clusters=num_clusters, init="random", max_iter=100, n_init=1)
km.fit(X)
print(km.labels_)
Output:
[3 0 1 ... 2 0 0]
Then I made a plot:
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns; sns.set()
plt.scatter(X[:,0],X[:,1], c=km.labels_, cmap='rainbow')
But I got this result:
What could be the reason I got these results?
You're plotting the first two dimensions of X (race and gender), with the colors being the clusters found by k-means. So it's no surprise the plot looks this way.
I believe what you're looking for is a way to visually check that the clustering done by k-means makes sense. For that, you'd have to visualize all the features k-means used to form the clusters: but there are 41 of them, and we can't visualize more than three or four dimensions at once.
An interesting solution here is dimensionality reduction: most of the information in the 41 features can be summarized in fewer (e.g. 2). For example, using principal component analysis (PCA) you can compress X down to two features. Try the following:
from sklearn.decomposition import PCA

# Project X onto its two leading principal components
X_pca = PCA(n_components=2).fit_transform(X)
plt.scatter(X_pca[:, 0], X_pca[:, 1], c=km.labels_, cmap='rainbow')
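One caveat: two components may capture only part of the variance in 41 features, so it's worth checking `explained_variance_ratio_` to see how faithful the 2-D plot is. A minimal sketch on synthetic data (`X_demo` here is a hypothetical stand-in for your X, since I don't have your dataset):

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA

# Synthetic stand-in for X: 500 samples, 41 features, 5 clusters
X_demo, _ = make_blobs(n_samples=500, n_features=41, centers=5, random_state=0)

pca = PCA(n_components=2)
X_2d = pca.fit_transform(X_demo)

# Fraction of the total variance captured by each of the two components,
# and by both together
print(pca.explained_variance_ratio_)
print(pca.explained_variance_ratio_.sum())
```

If the summed ratio is low, the scatter plot may hide structure that k-means actually used, and the clusters can look more mixed in 2-D than they really are.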