I'm working on a k-means algorithm to cluster a list of numbers. Suppose I have an array (X):
X=array([[0.85142858],[0.85566274],[0.85364912],[0.81536489],[0.84929932],[0.85042336],[0.84899714],[0.82019115], [0.86112067],[0.8312496 ]])
Then I run the following code:
from sklearn.cluster import AgglomerativeClustering
# Note: newer scikit-learn releases (1.2+) rename `affinity` to `metric`.
cluster = AgglomerativeClustering(n_clusters=5, affinity='euclidean', linkage='ward')
cluster.fit_predict(X)
for i in range(len(X)):
    print("%4d " % cluster.labels_[i], end="")
    print(X[i])
I got the results:
1 1 [0.85142858]
2 3 [0.85566274]
3 3 [0.85364912]
4 0 [0.81536489]
5 1 [0.84929932]
6 1 [0.85042336]
7 1 [0.84899714]
8 0 [0.82019115]
9 4 [0.86112067]
10 2 [0.8312496]
How can I get the maximum number in each cluster, together with its value of (i)? Like this:
0: 0.82019115 8
1: 0.85142858 1
2: 0.8312496 10
3: 0.85566274 2
4: 0.86112067 9
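First pair each label with its value using zip.
Then sort the pairs by value (the second element of each pair) in increasing order and build a dict from them. Because a later pair overwrites an earlier one with the same key, each label ends up mapped to its largest value.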
Try:
res = list(zip(cluster.labels_, X))
max_num = dict(sorted(res, key=lambda x: x[1], reverse=False))
max_num:
{0: array([0.82019115]),
2: array([0.8312496]),
1: array([0.85142858]),
3: array([0.85566274]),
4: array([0.86112067])}
Edit:
If you also want the position (i) of each maximum, is this what you are after?
elem = list(zip(res, range(1,len(X)+1)))
e = sorted(elem, key=lambda x: x[0][1], reverse=False)
final_dict = {k[0]:(k[1], v) for (k,v) in e}
for key in sorted(final_dict):
    print(f"{key}: {final_dict[key][0][0]} {final_dict[key][1]}")
0: 0.82019115 8
1: 0.85142858 1
2: 0.8312496 10
3: 0.85566274 2
4: 0.86112067 9
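For comparison, here is a minimal sketch of the same idea done in a single pass without sorting. It assumes the labels and values have already been pulled out into plain lists (copied here from the question's output; in practice they would come from cluster.labels_ and X.ravel()). The helper name max_per_cluster is made up for this illustration.

```python
# Labels and values copied from the question's output (assumed inputs).
labels = [1, 3, 3, 0, 1, 1, 1, 0, 4, 2]
values = [0.85142858, 0.85566274, 0.85364912, 0.81536489, 0.84929932,
          0.85042336, 0.84899714, 0.82019115, 0.86112067, 0.8312496]


def max_per_cluster(labels, values):
    """Return {label: (max_value, 1-based position)} in a single pass."""
    best = {}
    for pos, (label, value) in enumerate(zip(labels, values), start=1):
        # Keep this entry if the label is new or the value beats the current best.
        if label not in best or value > best[label][0]:
            best[label] = (value, pos)
    return best


result = max_per_cluster(labels, values)
for label in sorted(result):
    value, pos = result[label]
    print(f"{label}: {value} {pos}")
```

This tracks the running maximum per label directly, so it does not rely on dict overwrite order and keeps the first occurrence when two values tie.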
Another option is to use pandas:
import pandas as pd
df = pd.DataFrame(zip(cluster.labels_,X))
df[1] = df[1].str[0]
df = df.sort_values(1).drop_duplicates([0],keep='last')
df.index = df.index+1
df = df.sort_values(0)
df:
    0         1
8   0  0.820191
1   1  0.851429
10  2  0.831250
2   3  0.855663
9   4  0.861121
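A possible variation on the pandas approach, sketched under the assumption that the labels and values are available as flat lists (hardcoded here from the question's output): group by label and use idxmax to find the row holding the largest value in each group. The column names "label" and "value" are chosen for this example only.

```python
import pandas as pd

# Labels and values copied from the question's output (assumed inputs).
labels = [1, 3, 3, 0, 1, 1, 1, 0, 4, 2]
values = [0.85142858, 0.85566274, 0.85364912, 0.81536489, 0.84929932,
          0.85042336, 0.84899714, 0.82019115, 0.86112067, 0.8312496]

# Use a 1-based index so it matches the row numbers shown in the question.
df = pd.DataFrame({"label": labels, "value": values},
                  index=range(1, len(values) + 1))

# For each label, idxmax returns the index of the row with the largest value;
# df.loc then selects those rows, ordered by label.
top_rows = df.loc[df.groupby("label")["value"].idxmax()]
print(top_rows)
```

Compared with sorting and dropping duplicates, this states the intent (maximum per group) directly and leaves the original DataFrame untouched.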