I'm trying to run a K-means analysis on a dataframe like this:
   URBAN AREA    PROVINCE  DENSITY
0           1    TRUJILLO     0.30
1           2    TRUJILLO     0.03
2           3    TRUJILLO     0.80
3           1        LIMA     1.20
4           2        LIMA     0.04
5           1  LAMBAYEQUE     0.90
6           2  LAMBAYEQUE     0.10
7           3  LAMBAYEQUE     0.08
(You can download it from here)
As you can see, the df lists urban areas (each with its own urban density value) inside provinces. I want to run the K-means classification on a single column: DENSITY. To do so, I execute this code:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
df=pd.read_csv('C:/Path/to/example.csv')
clustering=KMeans(n_clusters=2, max_iter=300)
clustering.fit(df[['DENSITY']])
df['KMeans_Clusters']=clustering.labels_
df
And I get this result, which is OK for this first part of the example:
   URBAN AREA    PROVINCE  DENSITY  KMeans_Clusters
0           1    TRUJILLO     0.30                0
1           2    TRUJILLO     0.03                0
2           3    TRUJILLO     0.80                1
3           1        LIMA     1.20                1
4           2        LIMA     0.04                0
5           1  LAMBAYEQUE     0.90                1
6           2  LAMBAYEQUE     0.10                0
7           3  LAMBAYEQUE     0.08                0
But now I want to do the k-means classification of urban areas by province, that is, to repeat the same process separately inside each province. So I tried this code:
df=pd.read_csv('C:/Users/rojas/Desktop/example.csv')
clustering=KMeans(n_clusters=2, max_iter=300)
clustering.fit(df[['DENSITY']]).groupby('PROVINCE')
df['KMeans_Clusters']=clustering.labels_
df
but I get this message:
AttributeError Traceback (most recent call last)
<ipython-input-4-87e7696ff61a> in <module>
3 clustering=KMeans(n_clusters=2, max_iter=300)
4
----> 5 clustering.fit(df[['DENSITY']]).groupby('PROVINCE')
6
7 df['KMeans_Clusters']=clustering.labels_
AttributeError: 'KMeans' object has no attribute 'groupby'
Is there a way to do so?
Try this: fit a separate KMeans model for each province with groupby().apply():
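def k_means(row):
    # `row` is the sub-DataFrame holding all urban areas of one province
    clustering = KMeans(n_clusters=2, max_iter=300)
    model = clustering.fit(row[['DENSITY']])
    row['KMeans_Clusters'] = model.labels_
    return row

# group_keys=False keeps the original index (pandas >= 2.0 would otherwise add PROVINCE to it)
df = df.groupby('PROVINCE', group_keys=False).apply(k_means)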
Results:
URBAN AREA PROVINCE DENSITY KMeans_Clusters
0 0 1 TRUJILLO 0.30 0
1 1 2 TRUJILLO 0.03 0
2 2 3 TRUJILLO 0.80 1
3 3 1 LIMA 1.20 1
4 4 2 LIMA 0.04 0
5 5 1 LAMBAYEQUE 0.90 0
6 6 2 LAMBAYEQUE 0.10 1
7 7 3 LAMBAYEQUE 0.08 1
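Two side notes, in case they help. The original AttributeError happens because fit() returns the fitted KMeans estimator itself, not a DataFrame, so there is nothing to call groupby on. Also, KMeans label numbers are arbitrary per fit, so 0 and 1 need not mean the same thing in every province (in the results above, the densest LAMBAYEQUE area got 0 while the densest TRUJILLO area got 1). Below is a rough sketch of a variant that uses groupby().transform() and renumbers the labels so that 0 is always the lower-density cluster. The helper name cluster_density, the inline copy of the data, and the n_init=10 / random_state=0 settings are my own assumptions, not something from the question.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

# Same data as in the question, built inline so the sketch runs on its own.
df = pd.DataFrame({
    "URBAN AREA": [1, 2, 3, 1, 2, 1, 2, 3],
    "PROVINCE": ["TRUJILLO"] * 3 + ["LIMA"] * 2 + ["LAMBAYEQUE"] * 3,
    "DENSITY": [0.30, 0.03, 0.80, 1.20, 0.04, 0.90, 0.10, 0.08],
})


def cluster_density(density, n_clusters=2):
    """Hypothetical helper: cluster the DENSITY values of one province.

    Labels are renumbered so that 0 is the cluster with the lowest centre.
    """
    # random_state is fixed only to make the sketch reproducible (assumption).
    model = KMeans(n_clusters=n_clusters, n_init=10, random_state=0)
    labels = model.fit_predict(density.to_frame())

    # Sort the cluster centres, then map each raw label to its rank.
    order = np.argsort(model.cluster_centers_.ravel())
    rank = np.empty_like(order)
    rank[order] = np.arange(n_clusters)
    return pd.Series(rank[labels], index=density.index)


# transform() calls the helper once per province and realigns the output
# with the original rows, so the index and all columns stay untouched.
df["KMeans_Clusters"] = (
    df.groupby("PROVINCE")["DENSITY"].transform(cluster_density).astype(int)
)
print(df)
```

Like the apply() version, this still needs at least n_clusters rows in every province, otherwise KMeans raises an error for that group.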