python machine-learning scikit-learn k-means

How to run K-means on specific columns?

I would like to run K-means on specific columns of my data set. Since these columns are categorical, I plan to one-hot encode them. Is it possible to run K-means on only those columns and then display the results (for example, the rows of one group) with all the columns?

For example, I have col1, col2 and col3. I want to run K-means on col2 and col3, which are one-hot encoded, and display the results with col1, col2 and col3. I hope I have expressed my question clearly.

Solution

This follows the basic example in the scikit-learn KMeans documentation:

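import pandas as pd
from sklearn.cluster import KMeans

# select only the columns to cluster on and one-hot encode them
X = pd.get_dummies(df[['col2', 'col3']], dtype=float)
kmeans = KMeans(n_clusters=2, random_state=0).fit(X)
# predict returns one cluster label per row; store it next to the original columns
df['cluster'] = kmeans.predict(X)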

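kmeans.predict returns one cluster label per row, in the same order as the rows of df, so you can add the labels to your original data as a new column. Filtering on that column then shows a group with all of its columns, including the ones that were not used for clustering.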
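
A self-contained sketch of the whole flow may help. It assumes a small made-up DataFrame; the values, the `cluster` column name and `n_init=10` are illustrative choices, not taken from the question. Only col2 and col3 are encoded and passed to K-means, while col1 stays in df for display:

```python
import pandas as pd
from sklearn.cluster import KMeans

# Hypothetical toy data: col1 is only displayed, col2 and col3 are clustered on
df = pd.DataFrame({
    "col1": [10, 20, 30, 40, 50, 60],
    "col2": ["a", "a", "b", "b", "a", "b"],
    "col3": ["x", "x", "y", "y", "x", "y"],
})

# One-hot encode only the columns used for clustering
X = pd.get_dummies(df[["col2", "col3"]], dtype=float)

# Fit K-means on the encoded columns and attach one label per row to df
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
df["cluster"] = kmeans.fit_predict(X)

# Display one group with all original columns
group0 = df[df["cluster"] == 0]
print(group0)
```

Because the labels come back in row order, `df[df["cluster"] == k]` shows col1, col2 and col3 for group `k`, even though col1 was never seen by the model.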