I have implemented a k-means elbow plot to find the optimal K for my data (after applying PCA), and I got the elbow plot shown below. I think the optimal K in my case is 3, since that is where the sudden drop occurs (where the curve bends). But looking at my X_PCA_1 vs. X_PCA_2 scatter plot, I think the data can be split into only 2 clusters. Am I mistaken?
Note: I am still a beginner.
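For reference, an elbow plot of this kind is usually produced by fitting k-means for a range of K values and plotting the inertia (the within-cluster sum of squared distances to the centroids) against K. Below is a minimal sketch assuming scikit-learn and matplotlib; the `make_blobs` data is a hypothetical stand-in for the real scaled data, not the asker's dataset.

```python
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for the real data: 3 Gaussian blobs in 5 dimensions
data, _ = make_blobs(n_samples=300, centers=3, n_features=5, random_state=0)
scaled = StandardScaler().fit_transform(data)

k_values = range(1, 10)
inertias = []
for k in k_values:
    # inertia_ is the within-cluster sum of squared distances to the centroids
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(scaled)
    inertias.append(model.inertia_)

plt.plot(list(k_values), inertias, marker="o")
plt.xlabel("Number of clusters K")
plt.ylabel("Inertia (within-cluster SSE)")
plt.title("Elbow plot")
plt.show()
```

The "elbow" is the K after which adding more clusters stops reducing the inertia much; it is a visual heuristic, so it can be ambiguous.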
To see the clusters more clearly in a plot, you can first use PCA to reduce the data to 3 components:
```python
from sklearn.decomposition import PCA

pca = PCA(3)  # keep the first 3 principal components
X_pca = pca.fit_transform(scaled_df)
```
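Before plotting, it can be worth checking how much of the total variance the 3 components actually retain, using the fitted PCA object's `explained_variance_ratio_` attribute. A sketch, assuming scikit-learn; the `make_blobs` data is again a hypothetical stand-in for `scaled_df`.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for scaled_df
data, _ = make_blobs(n_samples=300, centers=3, n_features=5, random_state=0)
scaled = StandardScaler().fit_transform(data)

pca = PCA(n_components=3)
X_pca = pca.fit_transform(scaled)

# Fraction of the total variance captured by each component, then cumulatively
ratios = pca.explained_variance_ratio_
cumulative = np.cumsum(ratios)
print(ratios)
print(cumulative)
```

If the cumulative value for 3 components is low, the 3D plot may hide structure that the clustering (done on the full scaled data) still sees.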
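Then, split the projected points into one list per component:

```python
# Collect the 1st, 2nd and 3rd principal component of every point
X = []
Y = []
Z = []
for point in X_pca:
    X.append(point[0])
    Y.append(point[1])
    Z.append(point[2])
```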
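The loop above can also be written with NumPy column slicing, since `fit_transform` returns a 2-D array with one column per component. A small sketch on a hypothetical 4 x 3 array standing in for `X_pca`:

```python
import numpy as np

# Hypothetical (4 x 3) array standing in for the PCA output
X_pca = np.arange(12, dtype=float).reshape(4, 3)

# Column slicing gives one 1-D array per component, no loop needed
X, Y, Z = X_pca[:, 0], X_pca[:, 1], X_pca[:, 2]

# Equivalent shorthand: unpack the rows of the transposed array
X2, Y2, Z2 = X_pca.T
```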
From here, you can use any library that supports 3D plots. For example, with matplotlib, coloring each point by its k-means cluster label:
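```python
import matplotlib.pyplot as plt
import pandas as pd
from mpl_toolkits.mplot3d import Axes3D  # noqa: F401 (registers the '3d' projection on older matplotlib)
from sklearn.cluster import KMeans

# Cluster in the scaled feature space; the PCA components are only used for plotting
model = KMeans(n_clusters=3, n_init=10)
cluster_kmeans = model.fit_predict(scaled_df)

df_graph = pd.DataFrame({'X': X,
                         'Y': Y,
                         'Z': Z,
                         'labels': cluster_kmeans})

fig = plt.figure(figsize=(20, 10))
ax = fig.add_subplot(111, projection='3d')
# One scatter call per cluster so each gets its own color and legend entry
for s in df_graph.labels.unique():
    subset = df_graph[df_graph.labels == s]
    ax.scatter(subset.X, subset.Y, subset.Z, label=s)
ax.legend()
plt.show()
```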
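To compare K = 2 against K = 3 numerically rather than by eye, one common complement to the elbow plot is the mean silhouette score (scikit-learn's `silhouette_score`), which is a different technique from the elbow method. A minimal sketch on hypothetical `make_blobs` data standing in for `scaled_df`:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for scaled_df
data, _ = make_blobs(n_samples=300, centers=3, n_features=5, random_state=0)
scaled = StandardScaler().fit_transform(data)

scores = {}
for k in range(2, 7):  # the silhouette is undefined for K = 1
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(scaled)
    # Mean silhouette in [-1, 1]; higher means tighter, better-separated clusters
    scores[k] = silhouette_score(scaled, labels)

for k, s in scores.items():
    print(f"K={k}: silhouette={s:.3f}")
print("Best K by silhouette:", max(scores, key=scores.get))
```

On real data the elbow and the silhouette may disagree; neither is definitive, and a 2D view of the first two principal components can make separate clusters look merged.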