python machine-learning scikit-learn k-means

How to run K-means on specific columns?


I would like to run K-means on specific columns of my dataset. Since these columns contain categorical data, I plan to one-hot encode them. Is it possible to run K-means on only those columns and then display the results (for example, the members of one group) together with all the columns?

For example, I have col1, col2 and col3. I want to run K-means on col2 and col3, which are one-hot encoded, and then display the results with col1, col2 and col3. I hope I have expressed my question clearly.


Solution

  • This follows the basic usage shown in the scikit-learn KMeans documentation:

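    import pandas as pd
    from sklearn.cluster import KMeans
    # df is your existing DataFrame with col1, col2 and col3
    # select only the columns to cluster on and one-hot encode them
    X = pd.get_dummies(df[['col2', 'col3']], dtype=float)
    kmeans = KMeans(n_clusters=2, random_state=0, n_init=10).fit(X)
    # predict returns the group (cluster label) of each row; attach it to the full frame
    df['cluster'] = kmeans.predict(X)
    # df now shows col1, col2 and col3 together with each row's group
    print(df)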
    

    kmeans.predict returns the cluster label for each row, which you can add as a new column to your original DataFrame so that every row's group is displayed alongside all of its columns.
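A minimal self-contained sketch of the whole workflow, assuming a small hypothetical DataFrame in place of the real data: only col2 and col3 are one-hot encoded with pd.get_dummies and used for clustering, while col1 is carried along purely for display. The choice of n_init=10 is an assumption made to avoid version-dependent defaults, and fit_predict is used as a shorthand for calling fit and then predict on the same data.

```python
import pandas as pd
from sklearn.cluster import KMeans

# Hypothetical toy data: col1 is only displayed, col2/col3 are categorical features
df = pd.DataFrame({
    "col1": ["a", "b", "c", "d", "e", "f"],
    "col2": ["red", "red", "blue", "blue", "red", "blue"],
    "col3": ["small", "small", "large", "large", "small", "large"],
})

# One-hot encode only the columns used for clustering (0.0/1.0 floats)
X = pd.get_dummies(df[["col2", "col3"]], dtype=float)

# Fit K-means on the encoded columns and store each row's cluster label
kmeans = KMeans(n_clusters=2, random_state=0, n_init=10)
df["cluster"] = kmeans.fit_predict(X)

# Show all original columns for the rows in the same cluster as the first row
first_group = df[df["cluster"] == df.loc[0, "cluster"]]
print(first_group)

# Or print every cluster with all of its columns
for label, group in df.groupby("cluster"):
    print(f"cluster {label}:")
    print(group[["col1", "col2", "col3"]])
```

The cluster numbers themselves (0 or 1) are arbitrary; only which rows share a label is meaningful. If you later need to assign new rows to the existing clusters, pd.get_dummies may produce a different set of columns for the new data, so scikit-learn's OneHotEncoder (for example with handle_unknown="ignore") fitted once on the training data is a safer way to keep the encoded columns consistent.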